1. Directory organization

The workflow for setting up, running, and analysing a simulation consists of several quite different steps. It is useful to perform each step in its own directory to avoid overwriting files or using the wrong files.

1.1. Create working directories

The following directory structure is recommended; the tutorial steps through the directories in sequence:

coord/
top/
solvation/
emin/
posres/
MD/
analysis/

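Create the working directories using the command below. The coord directory is not included because the tutorial package already provides coord/4ake_a.pdb; create it as well if you are starting from scratch.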

mkdir top solvation emin posres MD analysis

Description of directories

coord
original PDB (structural) files
top
generating topology files (.top, .itp)
solvation
adding solvent and ions to the system
emin
performing energy minimization
posres
short MD simulation with position restraints on the heavy protein atoms, to allow the solvent to equilibrate around the protein without disturbing the protein structure
MD
MD simulation (typically, you will transfer the md.tpr file to a supercomputer, run the simulation there, then copy the output back to this directory)
analysis
post-processing a production trajectory to facilitate easy visualization (for example, with VMD); analyses of the simulations can be placed in (sub)directories under analysis, e.g.

analysis/RMSD
analysis/RMSF
...

The subdirectories depend on the specific analysis tasks that you want to carry out. The above directory layout is only a suggestion, but, in practice, some sort of ordered directory hierarchy will facilitate reproducibility, improve efficiency, and maintain your sanity.
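As a small illustration, the example analysis subdirectories named above can be created in one step. The -p option makes mkdir create the parent analysis directory if it is missing and stay silent if a directory already exists. The names RMSD and RMSF are just the examples from the text; substitute whatever analyses you plan to run.

```shell
# Create example analysis subdirectories (names taken from the text above).
# -p: create parents as needed, no error if the directory already exists.
mkdir -p analysis/RMSD analysis/RMSF
```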

Important

The command snippets in this tutorial assume the directory layout given above, because the workflow depends on each step being carried out inside the appropriate directory. In particular, relative paths are used to access files from previous steps. It should be clear from context in which directory the commands are to be executed. If you get a File input/output error from grompp (or any of the other commands), first check that you can see the file by running ls ../path/to/file from your current location in the file system. If you can’t see the file, check (1) that you are in the correct directory and (2) that you created the file in a previous step.
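The check described above can be wrapped in a small shell function. This is only a sketch: check_input is a hypothetical helper name, not part of GROMACS or of the tutorial package. It reports whether a path is visible from the current directory and, if not, prints the current directory so that a wrong location is easy to spot.

```shell
# Hypothetical helper (not part of GROMACS): verify that an input file
# from a previous step is visible before running a command on it.
check_input() {
    if [ -e "$1" ]; then
        echo "found: $1"
    else
        # Print the current directory to help spot a wrong location.
        echo "missing: $1 (current directory: $(pwd))" >&2
        return 1
    fi
}
```

For example, from inside the top directory, check_input ../coord/4ake_a.pdb should report the starting structure as found.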

1.2. Obtain starting structure

Note

The starting structure coord/4ake_a.pdb has been provided as part of the tutorial package, so the instructions that follow are optional for this tutorial. However, these steps provide an idea of what may be required in obtaining a suitable starting structure for MD simulation.

  1. Download 4AKE from the Protein Data Bank (PDB) through the web interface

  2. Create a new PDB file with just chain A

    Modify the downloaded PDB file. For a relatively simple protein like AdK, one can just open the PDB file in a text editor and remove all the lines that are not needed. (For more complex situations, molecular modeling software can be used.)

  • Remove all comment lines (but keep TITLE and HEADER).
  • Remove all crystal waters (HOH). [1]
  • Remove all chain B ATOM records.
  • Save as coord/4ake_a.pdb.
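As an alternative to editing by hand, the steps above can be sketched with awk. This is an illustration, not the tutorial's own method. It assumes the standard fixed-column PDB layout (record name in columns 1-6, chain identifier in column 22) and an input file named 4ake.pdb. The function name extract_chain_a is hypothetical. The sketch keeps only HEADER, TITLE, chain A ATOM and TER records, and END. It therefore drops every HETATM record, which removes the crystal waters but would also discard any ligands, and it drops other records such as CRYST1.

```shell
# Sketch: write a chain-A-only PDB to standard output.
# Assumes fixed-column PDB format: record name in columns 1-6,
# chain identifier in column 22.
extract_chain_a() {
    awk '
        { rec = substr($0, 1, 6) }
        # Keep the descriptive header records.
        rec == "HEADER" || rec == "TITLE " { print; next }
        # Keep ATOM and TER records that belong to chain A.
        (rec == "ATOM  " || rec == "TER   ") && substr($0, 22, 1) == "A" { print; next }
        # Keep the final END record; everything else is dropped.
        $1 == "END" { print }
    ' "$1"
}

# Usage (file names are assumptions):
#   curl -O https://files.rcsb.org/download/4AKE.pdb   # alternative to the web interface
#   extract_chain_a 4ake.pdb > coord/4ake_a.pdb
```

Inspect the result before using it, for example by loading it in a molecular viewer, because a whitelist like this silently discards any record type it does not name.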

Footnotes

[1] Often you would actually want to retain crystallographic water molecules as they might have biological relevance. In our example this is likely not the case, and by removing all of them we simplify the preparation step somewhat. If you keep them, gmx pdb2gmx in the next step will actually create entries in the topology for them.