Main workflow¶
1. Constraints creation¶
When tertiary constraints are to be used, a .cst file should be created and placed in the constraints subdirectory. The syntax is explained in the Rosetta documentation.
The file can be created by hand or using the following tools:
1a. Generation from DCA files¶
To create tertiary constraints from DCA predictions, use the following command:
rna_predict make-constraints \
--dca-file <dca.dat> \
--mapping-mode <mapping_mode> \
--dca-count <count> \
--cst-function <function> \
--cst-out-file <output_name> \
--filter <filter>
The input DCA file should contain at least two columns holding the residue numbers of each contact. The file should be sorted by DCA score. Optionally, the first line of the file may contain a comment specifying a PDB mapping such as the following:
# pdb-mapping: 10-12,44,80-90
This defines how the residue numbers in the DCA file are mapped. Rosetta always numbers residues 1, 2, 3, ... internally, so with the mapping above residue 10 in the DCA file becomes prediction residue 1, residue 12 becomes prediction residue 3, and residue 44 becomes prediction residue 4.
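The mapping rule can be illustrated with a short Python sketch. The function names are hypothetical and are not part of rna_predict; the sketch only assumes that the mapping string is a comma-separated list of single residues and inclusive ranges, as in the example above, and it does not handle negative residue numbers.

```python
def expand_pdb_mapping(spec):
    """Expand a spec such as '10-12,44,80-90' into an ordered list of residue numbers."""
    residues = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Inclusive range, e.g. '10-12' -> 10, 11, 12
            start, end = (int(x) for x in part.split("-"))
            residues.extend(range(start, end + 1))
        else:
            residues.append(int(part))
    return residues


def build_residue_map(spec):
    """Map each listed residue number to its 1-based prediction residue index."""
    return {res: idx for idx, res in enumerate(expand_pdb_mapping(spec), start=1)}


mapping = build_residue_map("10-12,44,80-90")
print(mapping[12])  # 3
print(mapping[44])  # 4
```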
The --mapping-mode parameter specifies the method to map residue-residue contacts to atom-atom contacts. Options are:
- minAtom
- pOnly
For details about the mapping, see Residue-residue to atom-atom mapping.
The --dca-count option limits the number of predictions read from the DCA input file.
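As a rough illustration of the input format and of limiting the number of predictions, the following Python sketch reads contacts from DCA-style text. It assumes whitespace-separated columns and treats every line starting with `#` as a comment; the function name and the sample data are hypothetical and do not come from rna_predict.

```python
def read_dca_contacts(text, count=None):
    """Return up to `count` (res_i, res_j) pairs from DCA-style text.

    Only the first two columns are used; the file is assumed to be
    sorted by DCA score already, so the first entries are the best ones.
    """
    contacts = []
    for line in text.splitlines():
        if count is not None and len(contacts) >= count:
            break
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as '# pdb-mapping: ...'
        fields = line.split()
        contacts.append((int(fields[0]), int(fields[1])))
    return contacts


# Hypothetical sample data: two residue columns followed by a score column.
example = """# pdb-mapping: 10-12,44,80-90
10 85 0.91
11 84 0.87
12 44 0.52
"""
top_two = read_dca_contacts(example, count=2)
print(top_two)  # [(10, 85), (11, 84)]
```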
The --cst-function option sets the Rosetta function to use. See https://www.rosettacommons.org/docs/latest/constraint-file.html#Function-Types for details. The default function for constraint creation (FADE -100 26 20 -2 2) uses a spline-smoothed square-well potential (the “FADE” function) with a default parameter set. After the .cst file has been generated, it can be fine-tuned further in any text editor.
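To get a feel for the default parameters, the following Python sketch evaluates a spline-smoothed square well. It assumes the parameter order `FADE lb ub d wd [wo]` (lower bound, upper bound, fade width, well depth, optional offset) described in the Rosetta documentation linked above, and it assumes a cubic smoothing polynomial in the boundary zones; it is an approximation for illustration, not Rosetta's own code.

```python
def fade(x, lb, ub, d, wd, wo=0.0):
    """Smoothed square well: wd + wo inside the well, wo outside [lb, ub]."""
    if x < lb or x > ub:
        shape = 0.0
    elif x < lb + d:
        # Lower boundary zone: b runs from 1 (at lb) down to 0 (at lb + d)
        b = (lb + d - x) / d
        shape = 2 * b**3 - 3 * b**2 + 1
    elif x > ub - d:
        # Upper boundary zone: b runs from 0 (at ub - d) up to 1 (at ub)
        b = (x - (ub - d)) / d
        shape = 2 * b**3 - 3 * b**2 + 1
    else:
        shape = 1.0
    return wd * shape + wo


# Default parameter set from the text: FADE -100 26 20 -2 2
for distance in (3.0, 16.0, 30.0):
    print(distance, fade(distance, -100, 26, 20, -2, 2))
```

Under these assumptions the default function gives no penalty for distances up to 6, rises smoothly between 6 and 26, and stays at a constant penalty of 2 beyond that.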
The --cst-out-file option specifies an output filename.
The --filter option allows the DCA contacts to be passed through a chain of filters first. For the filter documentation see DCA Filtering Syntax.
1b. Editing existing file¶
To replace only the Rosetta function in an existing .cst file, use:
rna_predict edit-constraints \
--cst <input_cst> \
--cst-function <function> \
--cst-out-file <output_name>
For an explanation of the options, see above.
This is essentially the same as a search-and-replace in any text editor.
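A minimal Python sketch of such a replacement is given below. It assumes that each constraint line follows the `AtomPair <atom1> <res1> <atom2> <res2> <function...>` layout from the Rosetta constraint file documentation; if the .cst files in your project use a different layout, the field positions need to be adjusted. The function name and the sample lines are hypothetical.

```python
def replace_cst_function(cst_text, new_function):
    """Replace the function part of every AtomPair line, leaving other lines untouched."""
    out = []
    for line in cst_text.splitlines():
        fields = line.split()
        if len(fields) > 5 and fields[0] == "AtomPair":
            # Keep 'AtomPair atom1 res1 atom2 res2', swap everything after it
            out.append(" ".join(fields[:5] + [new_function]))
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical input: two constraints using the default FADE function.
original = "AtomPair P 3 P 40 FADE -100 26 20 -2 2\nAtomPair N1 5 N3 38 FADE -100 26 20 -2 2"
print(replace_cst_function(original, "HARMONIC 8.0 2.0"))
```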
2. Prepare constraints¶
To run a simulation with a specific set of constraints (or none), another preparation step needs to be run:
rna_predict prepare-cst \
--cst <cst> \
--override-motifs-cst <motif_cst>
The --cst option selects the constraints from the constraints directory to be prepared. If not given, a prediction called ‘none’ for no tertiary constraints is created.
Optionally, a different set of motifs can be used for the assembly. For example, you can create a common set of motif models and reuse it in all future assemblies. To do this, specify the --override-motifs-cst option.
3. Motif creation¶
For all non-helical parts (loop regions, etc.), multiple models need to be created. To do this, run the following:
rna_predict create-motifs \
--cst <cst> \
--cycles <cycles> \
--nstruct <nstruct> \
--seed <random_seed> \
--use-native
As always, --cst selects the constraints.
The --cycles option sets the number of Monte Carlo cycles to run for generating each model.
The --nstruct option sets the number of models created for each motif.
To override the initial random seed, you can specify --seed.
To have Rosetta automatically calculate RMSD values to a native structure, supply the --use-native option.
4. Assembly¶
To combine the helix and motif models, an assembly simulation is run:
rna_predict assemble \
--cst <cst> \
--cycles <cycles> \
--nstruct <nstruct> \
--seed <random_seed> \
--use-native
The options are the same as the ones for create-motifs, but their default values vary.
Note: The assembly step does not check how many models have already been created.
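Because create-motifs and assemble share the same options, a small helper can build either command line when scripting the workflow. The Python sketch below only constructs the argument list using the flags documented above; the helper name and the constraint name `my_constraints` are hypothetical, and omitted options are left to the tool's defaults.

```python
def sampling_command(step, cst=None, cycles=None, nstruct=None, seed=None, use_native=False):
    """Build the argument list for 'rna_predict create-motifs' or 'rna_predict assemble'."""
    if step not in ("create-motifs", "assemble"):
        raise ValueError("step must be 'create-motifs' or 'assemble'")
    argv = ["rna_predict", step]
    options = (("--cst", cst), ("--cycles", cycles), ("--nstruct", nstruct), ("--seed", seed))
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    if use_native:
        argv.append("--use-native")
    return argv


motifs = sampling_command("create-motifs", cst="my_constraints", nstruct=50, seed=1)
assembly = sampling_command("assemble", cst="my_constraints", nstruct=100, use_native=True)
print(" ".join(motifs))
print(" ".join(assembly))
# The lists could be passed to subprocess.run(argv, check=True) to execute them.
```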
5. Evaluation¶
5a. Evaluation using Rosetta clustering and scoring¶
When the assembly has finished, you can evaluate the simulation. This means:
- Cluster the models
- Calculate RMSD values to the native structure, if available, and to the model with the best score.
Usage:
rna_predict evaluate \
--cst <cst> \
--cluster-cutoff <cutoff> \
--cluster-limit <limit> \
--full-eval
The --cluster-cutoff option specifies the RMSD radius in ångströms beyond which a new cluster is created.
The --cluster-limit option limits the maximum number of clusters to be created.
The --full-eval option forces the whole evaluation to be run again, ignoring any previously stored results.
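The effect of the cutoff and the limit can be illustrated with a simple greedy clustering sketch in Python. This is not Rosetta's clustering code: it assumes that models are visited in order of score, that a model joins the first cluster whose center lies within the cutoff, and that models fitting no cluster are left unassigned once the limit is reached (the text does not say what happens in that case). The one-dimensional "models" and the distance function are toy stand-ins for real structures and RMSD.

```python
def cluster_models(models, rmsd, cutoff, limit):
    """Greedy clustering; the first member of each cluster acts as its center."""
    clusters = []
    for model in models:
        for cluster in clusters:
            if rmsd(cluster[0], model) <= cutoff:
                cluster.append(model)
                break
        else:
            # No existing cluster is close enough: open a new one if allowed
            if len(clusters) < limit:
                clusters.append([model])
    return clusters


# Toy example: numbers stand in for models, absolute difference for RMSD.
toy_models = [0.0, 0.5, 3.0, 3.2, 9.0]
toy_rmsd = lambda a, b: abs(a - b)
print(cluster_models(toy_models, toy_rmsd, cutoff=1.0, limit=2))
```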
5b. Custom scoring¶
Because DCA predictions are not perfect, a custom scoring method was created. For each DCA prediction, neighboring residues are included, and if any of the resulting residue pairs is in contact, the score is increased. Usage:
rna_predict evaluate-custom \
--cst <cst> \
--dca-file <dca_file> \
--dca-count <count> \
--radius <radius> \
--threshold <threshold> \
--full-eval
For the --dca-file and --dca-count options see make-constraints.
The --radius option sets the number of neighboring residues to take into account.
The --threshold option sets the distance threshold below which a residue pair is treated as being in contact.
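The idea behind the custom score can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual rna_predict implementation: the text does not say by how much the score is increased, so the sketch adds 1 per satisfied prediction, and the distance lookup, the residue count and the sample values are hypothetical.

```python
def custom_score(contacts, distance, radius, threshold, n_residues):
    """Count DCA contacts for which any neighboring residue pair is within the threshold."""
    score = 0
    for i, j in contacts:
        # Residues within `radius` positions of i and of j, clipped to the sequence
        near_i = range(max(1, i - radius), min(n_residues, i + radius) + 1)
        near_j = range(max(1, j - radius), min(n_residues, j + radius) + 1)
        if any(distance(a, b) < threshold for a in near_i for b in near_j):
            score += 1
    return score


# Hypothetical distances for a 15-residue model; unknown pairs count as far apart.
known = {(3, 8): 5.0, (4, 12): 20.0}
toy_distance = lambda a, b: known.get((a, b), float("inf"))
toy_contacts = [(2, 8), (4, 12)]
print(custom_score(toy_contacts, toy_distance, radius=1, threshold=8.0, n_residues=15))
```

In this toy case the prediction (2, 8) is not itself in contact, but its neighbor pair (3, 8) is, so it still contributes to the score when the radius is 1.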