Usage
=====

``rna_predict`` comes with a command line interface. To see all available options, run::

    rna_predict --help

Preparation workflow
--------------------

These steps need to be run only once for every structure.

1. Directory Preparation
^^^^^^^^^^^^^^^^^^^^^^^^

The first step is to prepare a dedicated prediction directory. You need at least a FASTA file containing the sequence and another file containing the secondary structure (in dot-bracket notation). Additionally, if available, you can supply a PDB file containing the native crystal structure.

To prepare the directory, run the following::

    rna_predict prepare \
        --name \
        --sequence \
        --secstruct \
        --native

All options are optional; default values are used for any that are not given. See the output of ``rna_predict prepare --help`` for additional information.

This will parse out stems and motifs into the ``preparation`` subdirectory.

Note: When using the ``--native`` option, make sure the PDB file has been adjusted to work with Rosetta. This means that the residues have to be reordered starting at 1, so that Rosetta's ``pdbslice.py`` program uses the correct ones. To achieve this, prepare the file using the Rosetta tool::

    make_rna_rosetta_ready.py

2. Create ideal helix models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For each helical part of the structure, an ideal helix model needs to be generated::

    rna_predict create-helices

Main workflow
-------------

1. Constraints creation
^^^^^^^^^^^^^^^^^^^^^^^

When tertiary constraints are to be used, a ``.cst`` file should be created and put into the ``constraints`` subdirectory. The syntax is explained in the Rosetta documentation.

The file can be created by hand or using the following tools:

1a. Generation from DCA files
"""""""""""""""""""""""""""""

To create tertiary constraints from DCA predictions, use the following command::

    rna_predict make-constraints \
        --dca-file \
        --mapping-mode \
        --dca-count \
        --cst-function \
        --cst-out-file \
        --filter

The input DCA file should contain at least two columns containing the residue numbers of the contact. The file should be sorted by DCA score. Optionally, the first line of the file may contain a comment specifying a PDB mapping such as the following::

    # pdb-mapping: 10-12,44,80-90

This defines how the residue numbers in the DCA file are to be mapped. Rosetta always uses 1, 2, 3, ... internally, so with the mapping above, residue number 12 in the DCA file would, for example, be mapped to prediction residue 3.

The ``--mapping-mode`` parameter specifies the method used to map residue-residue contacts to atom-atom contacts. Options are:

* minAtom
* pOnly

For details about the mapping, see :ref:`atom-mapping`.

The ``--dca-count`` option limits the number of predictions used from the DCA input file.

The ``--cst-function`` option sets the Rosetta function to use. See ``_ for details. The default function for constraints creation (``FADE -100 26 20 -2 2``) uses a spline-smoothed square well potential (represented by the "FADE" function) and a default parameter set. After the ``.cst`` file has been generated, it can be fine-tuned further in any text editor.

The ``--cst-out-file`` option specifies an output filename.

The ``--filter`` option allows the DCA contacts to be passed through a chain of filters first. For the filter documentation, see :ref:`filters`.

1b. Editing existing file
"""""""""""""""""""""""""

To simply replace the Rosetta function in an existing ``.cst`` file, you can use::

    rna_predict edit-constraints \
        --cst \
        --cst-function \
        --cst-out-file

For an explanation of the options, see above. This is essentially the same as using search-and-replace in a text editor.

2. Prepare constraints
^^^^^^^^^^^^^^^^^^^^^^

To run a simulation with a specific set of constraints (or none), another preparation step needs to be run::

    rna_predict prepare-cst \
        --cst \
        --override-motifs-cst

The ``--cst`` option selects the constraints from the ``constraints`` directory to be prepared. If it is not given, a prediction called 'none' (no tertiary constraints) is created.

Optionally, it is possible to use a different set of motifs for the assembly. For example, you can create a common set of motif models and use it in all future assemblies. To do this, specify the ``--override-motifs-cst`` option.

3. Motif creation
^^^^^^^^^^^^^^^^^

For all non-helical parts (loop regions, etc.), multiple models need to be created. To do this, run the following::

    rna_predict create-motifs \
        --cst \
        --cycles \
        --nstruct \
        --seed \
        --use-native

As always, ``--cst`` selects the constraints. The ``--cycles`` option sets the number of Monte Carlo cycles to run for generating each model. The ``--nstruct`` option sets the number of models created for each motif. To override the initial random seed, you can specify ``--seed``. To have Rosetta automatically calculate RMSD values to a native structure, you can supply the ``--use-native`` option.

4. Assembly
^^^^^^^^^^^

To combine helix and motif models, an assembly simulation is run::

    rna_predict assemble \
        --cst \
        --cycles \
        --nstruct \
        --seed \
        --use-native

The options are the same as the ones for ``create-motifs``, but their default values differ.

Note: The assembly step does not check how many models have already been created.

5. Evaluation
^^^^^^^^^^^^^

5a. Evaluation using Rosetta clustering and scoring
"""""""""""""""""""""""""""""""""""""""""""""""""""

When the assembly has finished, you can evaluate the simulation. This means:

* Cluster the models
* Calculate RMSD values to the native structure, if available, and to the model with the best score.

Usage::

    rna_predict evaluate \
        --cst \
        --cluster-cutoff \
        --cluster-limit \
        --full-eval

The ``--cluster-cutoff`` option specifies the RMSD radius in angstrom after which a new cluster is created. The ``--cluster-limit`` option limits the maximum number of clusters to be created. The ``--full-eval`` option forces the whole evaluation to be run again, ignoring any previously stored results.

5b. Custom scoring
""""""""""""""""""

Because DCA predictions are not perfect, a custom scoring method was created. For each DCA prediction, neighboring residues are included, and if any of these residue pairs is in contact, the score is increased.

Usage::

    rna_predict evaluate-custom \
        --cst \
        --dca-file \
        --dca-count \
        --radius \
        --threshold \
        --full-eval

For the ``--dca-file`` and ``--dca-count`` options, see ``make-constraints``. The ``--radius`` option sets the number of neighboring residues to take into account. The ``--threshold`` option sets the distance threshold below which a residue pair is treated as being in contact.

Utilities
---------

Status information
^^^^^^^^^^^^^^^^^^

To print a summary of all predictions and their current state, run::

    rna_predict status \
        [--compare] \
        [cst [cst ...]]

The output table contains the following columns:

* P: preparation step
* M: motif generation
* A: assembly
* E: evaluation

If a step is completed, *X* is shown, *-* otherwise. For motif generation, ``*`` may be shown to indicate that models from a different set of constraints are used.

When the ``--compare`` option is given, a comparison to the native structure is performed and the output is extended with the following columns:

* 1: Native RMSD score of the first cluster
* 5: Lowest native RMSD score of the first 5 clusters
* 10: Lowest native RMSD score of the first 10 clusters
* n: Number of models

Model information and extraction
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To print model information or extract PDB files, use the following subcommands::

    rna_predict print-models|extract-models \
        --cst \
        --mode \
        model [model ...]

The ``--mode`` option selects the way to look up models:

* cluster: Cluster number to reference the cluster primary model
* cluster_ntop: Clusters sorted by the RMSDs of their representatives
* ntop: Models sorted by RMSD to native structure
* tag: Internal model name
* top: Models sorted by Rosetta score

The ``model`` arguments may be strings (if ``mode`` is 'tag') or numbers. For ``mode=cluster_ntop``, an argument may also have the form ``n/m``, meaning the ``n``-th best cluster out of the first ``m`` clusters.

Examples::

    rna_predict extract-models --mode=tag S_000289 S_000100     # extract two models by tag
    rna_predict extract-models --mode=top 1 2 3 4 5             # extract the five best-scoring models
    rna_predict extract-models --mode=ntop 1                    # extract the model with the lowest native RMSD
    rna_predict extract-models --mode=cluster 1 2 3 4 5         # extract the cluster primaries of the first 5 clusters
    rna_predict extract-models --mode=cluster_ntop 1/5          # extract the lowest native RMSD cluster out of the first 5 clusters

Evaluation tools
^^^^^^^^^^^^^^^^

Plot generation and other tools can be accessed using::

    rna_predict tools ...

plot-clusters
"""""""""""""

Plot score over native RMSD.

Usage::

    rna_predict tools plot-clusters \
        --score-weights \
        --max-models \
        cst

The ``--score-weights`` option allows a different total model score to be calculated from the individual Rosetta scores.
The score name "default" can be given to set a default value for all other, non-specified scores. For example, to visualize only the score of additional constraints, use::

    --score-weights default:0,atom_pair_constraint:1

For a list of score names, refer to the Rosetta documentation or use the ``print-models`` command.

The ``--max-models`` option limits the number of models by specifying either a number (greater than 1) or a fraction (smaller than or equal to 1.0).

plot-constraint-quality
"""""""""""""""""""""""

This visualizes the distances of constraints by comparing them to a reference (native) PDB structure.

Usage::

    rna_predict tools plot-constraint-quality \
        --dca-mode \
        reference-pdb \
        cst|dca|filter [cst|dca|filter ...]

When ``--dca-mode`` is given, residue-residue distances are plotted; otherwise, atom-atom contacts are plotted.

For the ``filter`` syntax, see :ref:`filters`.

plot-contact-atoms
""""""""""""""""""

Plots the atoms involved in forming nucleotide contacts that satisfy the cutoff condition in the contact database.

Usage::

    rna_predict tools plot-contact-atoms \
        --mean-cutoff \
        --std-cutoff

The ``--mean-cutoff`` and ``--std-cutoff`` options select the limits for the average contact distances and their standard deviations.

plot-contact-distances
""""""""""""""""""""""

Plots a histogram for each nucleotide pair contact containing the distances of the atoms involved.

Usage::

    rna_predict tools plot-contact-distances

plot-dca-contacts-in-pdb
""""""""""""""""""""""""

Visualizes how well DCA contacts are fulfilled in PDB files.

Usage::

    rna_predict tools plot-dca-contacts-in-pdb \
        dca-file \
        pdb-file [pdb-file ...]

plot-pdb-comparison
"""""""""""""""""""

Compares PDB files by plotting the distances of the residues.

Usage::

    rna_predict tools plot-pdb-comparison \
        ref-pdb \
        sample-pdb [sample-pdb ...]

plot-gdt
""""""""

Creates a GDT (global distance test) plot. This plots a distance cutoff on the y-axis and the percentage of residues that are below the cutoff on the x-axis.

Usage::

    rna_predict tools plot-gdt \
        ref-pdb \
        sample-pdb [sample-pdb ...]
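
To illustrate what a GDT plot shows, the following is a minimal sketch of how the underlying curve could be computed. It is not the code used by ``rna_predict``: the function name ``gdt_curve`` and the example numbers are hypothetical, and the sketch assumes that the sample structure has already been superimposed on the reference and that one distance per residue (in angstrom) is available.

```python
def gdt_curve(distances, cutoffs):
    """Return (cutoff, percent) pairs for a GDT-style curve.

    Hypothetical helper, not part of rna_predict.
    distances: per-residue distances between a sample and the reference
               structure (assumed to be superimposed already).
    cutoffs:   distance cutoffs to evaluate.
    """
    if not distances:
        raise ValueError("distances must not be empty")
    total = len(distances)
    curve = []
    for cutoff in cutoffs:
        # Count the residues whose distance is below the current cutoff.
        below = sum(1 for d in distances if d < cutoff)
        curve.append((cutoff, 100.0 * below / total))
    return curve


# Made-up distances for a five-residue structure.
distances = [0.5, 1.2, 2.8, 4.1, 7.9]
curve = gdt_curve(distances, [1.0, 2.0, 4.0, 8.0])
for cutoff, percent in curve:
    print(f"cutoff {cutoff:4.1f} A: {percent:5.1f} % of residues")
```

Each pair corresponds to one point of the plot, with the cutoff on the y-axis and the percentage on the x-axis. A curve that reaches high percentages at small cutoffs indicates a sample that is close to the reference.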