1. Directory organization¶
The workflow for setting up, running, and analysing a simulation consists of multiple and rather different steps. It is useful to perform these different steps in separate directories in order to avoid overwriting files or using wrong files.
1.1. Create working directories¶
It is recommended that the following directory structure be used, as the tutorial steps through them sequentially:
coord/
top/
solvation/
emin/
posres/
MD/
analysis/
Create these directories using:
mkdir top solvation emin posres MD analysis
Description of directories
coord
- original PDB (structural) files
top
- generating topology files (
.top
,.itp
) solvation
- adding solvent and ions to the system
emin
- performing energy minimization
posres
- short MD simulation with position restraints on the heavy protein atoms, to allow the solvent to equilibrate around the protein without disturbing the protein structure
MD
- MD simulation (typically, you will transfer the
md.tpr
file to a supercomputer, run the simulation there, then copy the the output back to this trajctory) analysis
post-processing a production trajectory to facilitate easy visualization (i.e., using VMD); analysis of the simulations can be placed in (sub)directories under analysis, e.g.
analysis/RMSD analysis/RMSF ...
The subdirectories depend on the specific analysis tasks that you want to carry out. The above directory layout is only a suggestion, but, in practice, some sort of ordered directory hierarchy will facilitate reproducibility, improve efficiency, and maintain your sanity.
Important
The command snippets in this tutorial assume the directory layout given
above as the workflow depends on each step’s being carried out
inside the appropriate directory. In particular, relative paths are used
to access files from previous steps. It should be clear from context
in which directory the commands are to be executed. If you get a
File input/output error
from grompp (or any of the
other commands), first check that you are able to see the file by just
doing a ls ../path/to/file
from where you are in the file system.
If you can’t see the file then check (1) that you are in the correct
directory, (2) that you have created the file in a previous step.
1.2. Obtain starting structure¶
Note
The starting structure coord/4ake_a.pdb
has been
provided as part of the tutorial package, so the instructions that
follow are optional for this tutorial. However, these steps provide an
idea of what may be required in obtaining a suitable starting
structure for MD simulation.
Download 4AKE the Protein Data Bank (PDB) through the web interface
Create a new PDB file with just chain A
Modify the downloaded PDB file. For a relatively simple protein like AdK, one can just open the PDB file in a text editor and remove all the lines that are not needed.(For more complex situations, molecular modeling software can be used.)
- Remove all comment lines (but keep TITLE, HEADER)
- Remove all crystal waters (HOH) [1]
- Remove all chain B ATOM records.
- Save as
coord/4ake_a.pdb
.
Footnotes
[1] | Often you would actually want to retain crystallographic water molecules as they might have biological relevance. In our example this is likely not the case and by removing all of them we simplify the preparation step somewhat. If you keep them, gmx pdb2gmx in the next step will actually create entries in the topology for them. |