Usage

rna_predict comes with a command-line interface. To see all available options, run:

rna_predict --help

Preparation workflow

These steps need to be run only once for every structure.

1. Directory Preparation

The first step is to prepare a dedicated prediction directory. You need at least a FASTA file containing the sequence and a second file containing the secondary structure (in dot-bracket notation). If available, you can also supply a PDB file containing the native crystal structure. To prepare the directory, run the following:

rna_predict prepare \
    --name <name> \
    --sequence <fasta_file> \
    --secstruct <secstruct_file> \
    --native <native_pdb_file>

All options are optional; default values are used for any that are omitted. See the output of rna_predict prepare --help for additional information.

This will parse out stems and motifs into the preparation subdirectory.
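To illustrate what the secondary structure file encodes, the following minimal sketch extracts base pairs from a dot-bracket string. It is not the parser used by rna_predict; the function name parse_dot_bracket is hypothetical, and the sketch assumes that only "(", ")" and "." occur (no pseudoknot brackets).

```python
def parse_dot_bracket(secstruct):
    """Return base pairs as sorted 1-based (i, j) tuples for a dot-bracket string."""
    stack = []   # positions of currently unmatched opening brackets
    pairs = []
    for pos, char in enumerate(secstruct, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif char != ".":
            raise ValueError("unexpected character %r at position %d" % (char, pos))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


pairs = parse_dot_bracket("((..((...))..))")
print(pairs)  # [(1, 15), (2, 14), (5, 11), (6, 10)]
```

Consecutive stacked pairs such as (1, 15) and (2, 14) form a helical stem; the unpaired stretches between stems correspond to the motifs.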

Note: When using the --native option, make sure the PDB file has been adjusted to work with Rosetta. This means that the residues have to be renumbered starting at 1, so that Rosetta’s pdbslice.py program uses the correct ones. To achieve this, prepare the file using the Rosetta tool:

make_rna_rosetta_ready.py <native_pdb>

2. Create ideal helix models

For each helical part of the structure, an ideal helix model needs to be generated:

rna_predict create-helices

Main workflow

1. Constraints creation

When tertiary constraints are to be used, a .cst file should be created and put into the constraints subdirectory. The syntax is explained in the Rosetta documentation.

The file can be created by hand or using the following tools:

1a. Generation from DCA files

To create tertiary constraints from DCA predictions, use the following command:

rna_predict make-constraints \
    --dca-file <dca.dat> \
    --mapping-mode <mapping_mode> \
    --dca-count <count> \
    --cst-function <function> \
    --cst-out-file <output_name> \
    --filter <filter>

The input DCA file should contain at least two columns holding the residue numbers of each contact. The file should be sorted by DCA score. Optionally, the first line of the file may contain a comment specifying a PDB mapping such as the following:

# pdb-mapping: 10-12,44,80-90

This defines how the residue numbers in the DCA file are to be mapped. Rosetta always uses 1,2,3,... internally, so with the mapping above, residue number 12 in the DCA file, for example, is mapped to prediction residue 3.
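The mapping rule can be sketched in a few lines of Python. This is an illustration only, not the rna_predict implementation; the function name parse_pdb_mapping is hypothetical, and the sketch assumes the comment consists of comma-separated single residues and inclusive ranges, as in the example above.

```python
def parse_pdb_mapping(line):
    """Map DCA/PDB residue numbers to consecutive 1-based prediction residues."""
    prefix = "# pdb-mapping:"
    if not line.startswith(prefix):
        raise ValueError("not a pdb-mapping comment")
    residues = []
    for part in line[len(prefix):].strip().split(","):
        if "-" in part:
            start, end = part.split("-")
            residues.extend(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            residues.append(int(part))
    # Prediction residues are numbered 1, 2, 3, ... in the order listed.
    return {pdb_res: index for index, pdb_res in enumerate(residues, start=1)}


mapping = parse_pdb_mapping("# pdb-mapping: 10-12,44,80-90")
print(mapping[12], mapping[44], mapping[80])  # 3 4 5
```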

The --mapping-mode parameter specifies the method to map residue-residue contacts to atom-atom contacts. Options are:

  • minAtom
  • pOnly

For details about the mapping, see Residue-residue to atom-atom mapping.

The --dca-count option limits the number of predictions that are used from the DCA input file.
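Because the file is sorted by DCA score, limiting the count amounts to keeping the leading entries. The sketch below shows the idea; it assumes whitespace-separated columns and treats every line starting with "#" as a comment, neither of which is confirmed by this documentation. The function name read_dca_contacts and the sample data are hypothetical.

```python
import io


def read_dca_contacts(handle, count=None):
    """Read (res1, res2) pairs from a DCA file, keeping at most `count` entries."""
    contacts = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as the pdb-mapping line
        if count is not None and len(contacts) >= count:
            break
        fields = line.split()
        contacts.append((int(fields[0]), int(fields[1])))
    return contacts


# Made-up example data: two residue columns followed by a score column.
example = io.StringIO(
    "# pdb-mapping: 10-12,44,80-90\n"
    "10 85 0.91\n"
    "11 44 0.80\n"
    "12 90 0.42\n"
)
top_two = read_dca_contacts(example, count=2)
print(top_two)  # [(10, 85), (11, 44)]
```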

The --cst-function option sets the Rosetta function to use. See https://www.rosettacommons.org/docs/latest/constraint-file.html#Function-Types for details. The default function for constraints creation (FADE -100 26 20 -2 2) uses a spline-smoothed square well potential (represented by the “FADE” function) and a default parameter set. After the .cst file has been generated, it can be fine-tuned in any text editor.
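To give a feeling for the default parameters, here is a small sketch of the FADE potential. It assumes the parameter order "FADE lb ub d wd wo" (lower bound, upper bound, fade zone width, well depth, offset) and the cubic-spline form described in the Rosetta constraint file documentation; treat it as an approximation and consult the Rosetta documentation for the authoritative definition.

```python
def fade(x, lb=-100.0, ub=26.0, d=20.0, wd=-2.0, wo=2.0):
    """Sketch of a FADE-style smoothed square well (assumed functional form)."""
    if x < lb or x > ub:
        fade_value = 0.0                      # outside the well
    elif x < lb + d:
        b = -(x - (lb + d)) / d               # fade zone at the lower boundary
        fade_value = 2 * b**3 - 3 * b**2 + 1
    elif x > ub - d:
        b = (x - (ub - d)) / d                # fade zone at the upper boundary
        fade_value = 2 * b**3 - 3 * b**2 + 1
    else:
        fade_value = 1.0                      # flat bottom of the well
    return wd * fade_value + wo


for distance in (5.0, 16.0, 26.0, 40.0):
    print(distance, fade(distance))
```

Under these assumptions, the default parameters give a score contribution of 0 for atom pairs closer than 6 Å, a smooth rise between 6 Å and 26 Å, and a constant penalty of 2 beyond that.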

The --cst-out-file option specifies an output filename.

The --filter option allows the DCA contacts to be passed through a chain of filters first. For the filter documentation see DCA Filtering Syntax.

1b. Editing an existing file

To simply replace the Rosetta function in an existing .cst file you can use:

rna_predict edit-constraints \
    --cst <input_cst> \
    --cst-function <function> \
    --cst-out-file <output_name>

For an explanation of the options, see above.

This is pretty much the same as using search-and-replace in any text editor.

2. Prepare constraints

To run a simulation with a specific set of constraints (or none), another preparation step needs to be run:

rna_predict prepare-cst \
    --cst <cst> \
    --override-motifs-cst <motif_cst>

The --cst option selects the constraints from the constraints directory to be prepared. If not given, a prediction called ‘none’ for no tertiary constraints is created.

Optionally, it is possible to use a different set of motifs for the assembly. For example, you can create a common set of motif models and use it in all future assemblies. To do this, specify the --override-motifs-cst option.

3. Motif creation

For all non-helical parts (loop regions, etc.) multiple models need to be created. To do this, run the following:

rna_predict create-motifs \
    --cst <cst> \
    --cycles <cycles> \
    --nstruct <nstruct> \
    --seed <random_seed> \
    --use-native

As always, --cst selects the constraints.

The --cycles option sets the number of Monte Carlo cycles to run for generating each model.

The --nstruct option sets the number of models created for each motif.

To override the initial random seed, you can specify --seed.

To have Rosetta automatically calculate RMSD values to a native structure, supply the --use-native option.

4. Assembly

To combine helix and motif models an assembly simulation is run:

rna_predict assemble \
    --cst <cst> \
    --cycles <cycles> \
    --nstruct <nstruct> \
    --seed <random_seed> \
    --use-native

The options are the same as the ones for create-motifs, but their default values vary.

Note: The assembly step does not check how many models have already been created so far.

5. Evaluation

5a. Evaluation using Rosetta clustering and scoring

When the assembly has finished, you can evaluate the simulation. This means:

  • Cluster the models
  • Calculate RMSD values to the native structure, if available, and to the model with the best score.

Usage:

rna_predict evaluate \
    --cst <cst> \
    --cluster-cutoff <cutoff> \
    --cluster-limit <limit> \
    --full-eval

The --cluster-cutoff option specifies the RMSD radius in ångström beyond which a new cluster is created.

The --cluster-limit option limits the maximum number of clusters to be created.

The --full-eval option forces the whole evaluation to be run again, ignoring any previously stored results.
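The roles of the cutoff and the limit can be illustrated with a generic greedy clustering sketch. This is not Rosetta's clustering code: the function greedy_cluster is hypothetical, models are assumed to arrive sorted by score, the first member of each cluster acts as its primary, and models that fit no cluster once the limit is reached are simply dropped. The toy "models" are plain numbers, with the absolute difference standing in for RMSD.

```python
def greedy_cluster(models, distance, cutoff, limit):
    """Assign each model to the first cluster whose primary is within `cutoff`."""
    clusters = []  # each cluster is a list of models; element 0 is the primary
    for model in models:
        for cluster in clusters:
            if distance(cluster[0], model) <= cutoff:
                cluster.append(model)
                break
        else:
            # No existing cluster is close enough: open a new one if allowed.
            if len(clusters) < limit:
                clusters.append([model])
    return clusters


toy_models = [0.0, 0.5, 4.0, 4.2, 9.0]
clusters = greedy_cluster(toy_models, lambda a, b: abs(a - b), cutoff=1.0, limit=2)
print(clusters)  # [[0.0, 0.5], [4.0, 4.2]]
```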

5b. Custom scoring

Because DCA predictions are not perfect, a custom scoring method was created. For each DCA prediction, neighboring residues are included, and if any of the resulting residue pairs is in contact, the score is increased. Usage:

rna_predict evaluate-custom \
    --cst <cst> \
    --dca-file <dca_file> \
    --dca-count <count> \
    --radius <radius> \
    --threshold <threshold> \
    --full-eval

For the --dca-file and --dca-count options see make-constraints.

The --radius option sets the number of neighboring residues to take into account.

The --threshold option sets the distance threshold under which a residue pair is treated as in-contact.
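The scoring idea can be sketched as follows. This is an illustration under stated assumptions rather than the actual rna_predict implementation: the function custom_score is hypothetical, each satisfied prediction is assumed to add exactly 1 to the score, and the distance callback stands in for whatever residue-residue distance the tool measures in a model.

```python
from itertools import product


def custom_score(contacts, distance, radius, threshold):
    """Count DCA predictions that have at least one neighboring pair in contact."""
    score = 0
    offsets = range(-radius, radius + 1)
    for i, j in contacts:
        # Check the predicted pair and all pairs shifted by up to `radius` residues.
        if any(distance(i + a, j + b) < threshold
               for a, b in product(offsets, offsets)):
            score += 1
    return score


# Toy distances in ångström; unknown pairs count as infinitely far apart.
toy = {(5, 20): 6.0, (11, 30): 25.0}


def toy_distance(i, j):
    return toy.get((i, j), float("inf"))


score = custom_score([(4, 21), (11, 30)], toy_distance, radius=1, threshold=8.0)
print(score)  # 1
```

Here the prediction (4, 21) is not in contact itself, but its neighbor pair (5, 20) is, so it still contributes to the score.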

Utilities

Status information

To print a summary of all predictions and their current state, run:

rna_predict status \
    [--compare] \
    [cst [cst ...]]

The output table contains the following columns:

  • P: preparation step
  • M: motif generation
  • A: assembly
  • E: evaluation

If a step is completed, X is shown, - otherwise. For motif generation a * * * may be shown to indicate that models from a different set of constraints are used.

When the --compare option is given, comparison to the native structure is performed and the output is extended with the following columns:

  • 1: Native RMSD score of the first cluster
  • 5: Lowest native RMSD score of the first 5 clusters
  • 10: Lowest native RMSD score of the first 10 clusters
  • n: Number of models

Model information and extraction

To print model information or extract PDB files use the following subcommands:

rna_predict print-models|extract-models \
    --cst <cst> \
    --mode <selection_mode> \
    model [model ...]

The --mode option selects the way to look up models:

  • cluster: Cluster number to reference the cluster primary model
  • cluster_ntop: Clusters sorted by the RMSDs of their representatives
  • ntop: Models sorted by RMSD to native structure
  • tag: Internal model name
  • top: Models sorted by Rosetta score

The model arguments may be strings (if the mode is ‘tag’) or numbers. For mode=cluster_ntop, an argument may also take the form n/m, meaning the nth best cluster out of the first m clusters.

Examples:

rna_predict extract-models --mode=tag S_000289 S_000100  # extract two models by tag
rna_predict extract-models --mode=top 1 2 3 4 5  # extract the five best-scoring models
rna_predict extract-models --mode=ntop 1  # extract the model with the lowest native RMSD
rna_predict extract-models --mode=cluster 1 2 3 4 5  # extract the cluster primaries of the first 5 clusters
rna_predict extract-models --mode=cluster_ntop 1/5  # extract the lowest native RMSD cluster out of the first 5 clusters
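The n/m form of cluster_ntop can be read as "restrict to the first m clusters, then rank them by native RMSD and take the nth". A minimal sketch of that lookup, with a hypothetical function name and made-up RMSD values, might look like this:

```python
def cluster_ntop(cluster_rmsds, selector):
    """Return the 1-based cluster number selected by an 'n/m' string."""
    n, m = (int(value) for value in selector.split("/"))
    # Keep the original cluster numbers while restricting to the first m clusters.
    first_m = list(enumerate(cluster_rmsds[:m], start=1))
    ranked = sorted(first_m, key=lambda item: item[1])  # lowest native RMSD first
    return ranked[n - 1][0]


# Native RMSD of each cluster primary, listed in cluster order (invented values).
rmsds = [12.3, 8.1, 15.0, 7.4, 9.9, 3.2]
best = cluster_ntop(rmsds, "1/5")
print(best)  # 4
```

Cluster 6 has the lowest RMSD overall in this toy data, but 1/5 only considers the first five clusters and therefore selects cluster 4.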

Evaluation tools

Plot generation and other tools can be accessed using:

rna_predict tools <tool> ...

plot-clusters

Plot score over native RMSD. Usage:

rna_predict tools plot-clusters \
    --score-weights <score:weight,score:weight,...> \
    --max-models <max> \
    cst

The --score-weights option allows a different total model score to be calculated from the individual Rosetta scores. The score name “default” can be given to set a default weight for all other, non-specified scores.

For example, to only visualize the score of additional constraints, use:

--score-weights default:0,atom_pair_constraint:1

For a list of score names, refer to the Rosetta documentation or use the print-models command.
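The reweighting can be pictured as a weighted sum over the individual score terms. The sketch below is an assumption about how such a specification could be evaluated, not the tool's code: the function names are hypothetical, the score values are invented, and a weight of 1 is assumed for unlisted scores when no “default” entry is given.

```python
def parse_score_weights(spec):
    """Parse 'name:weight,name:weight,...' into a dictionary of floats."""
    weights = {}
    for item in spec.split(","):
        name, value = item.split(":")
        weights[name] = float(value)
    return weights


def total_score(scores, weights):
    """Weighted sum of individual scores; unlisted scores use the default weight."""
    default = weights.get("default", 1.0)  # assumption: weight 1 if not specified
    return sum(weights.get(name, default) * value for name, value in scores.items())


weights = parse_score_weights("default:0,atom_pair_constraint:1")
# Invented per-model scores; the term names are only illustrative.
scores = {"fa_stack": -120.5, "rna_vdw": 3.25, "atom_pair_constraint": 14.0}
total = total_score(scores, weights)
print(total)  # 14.0
```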

The --max-models option limits the number of models by either specifying a number (greater than 1) or a fraction (smaller than or equal to 1.0).
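The two interpretations of the value can be made explicit with a short sketch. The function resolve_max_models is hypothetical, and rounding the fractional case to the nearest integer is an assumption.

```python
def resolve_max_models(max_models, total):
    """Translate a --max-models style value into an absolute number of models."""
    if max_models > 1:
        # Values greater than 1 are absolute counts, capped at what is available.
        return min(int(max_models), total)
    # Values up to and including 1.0 are fractions of all models.
    return int(round(max_models * total))


print(resolve_max_models(500, 2000))   # 500
print(resolve_max_models(0.25, 2000))  # 500
print(resolve_max_models(1.0, 2000))   # 2000
```

Note that under this rule a value of 1 means "all models", not "one model".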

plot-constraint-quality

This visualizes the distances of constraints by comparing them to a reference (native) PDB structure. Usage:

rna_predict tools plot-constraint-quality \
    --dca-mode \
    reference-pdb \
    cst|dca|filter [cst|dca|filter ...]

When --dca-mode is given, residue-residue distances are plotted, atom-atom contacts otherwise.

For the filter-syntax see DCA Filtering Syntax.

plot-contact-atoms

Plots atoms involved in forming nucleotide contacts that satisfy the cutoff condition in the contact database. Usage:

rna_predict tools plot-contact-atoms \
    --mean-cutoff <cutoff> \
    --std-cutoff <cutoff>

The --mean-cutoff and --std-cutoff options set the limits for the mean contact distance and its standard deviation, respectively.

plot-contact-distances

Plots a histogram for each nucleotide pair contact, showing the distances of the atoms involved. Usage:

rna_predict tools plot-contact-distances

plot-dca-contacts-in-pdb

Visualizes how well DCA contacts are fulfilled in PDB files. Usage:

rna_predict tools plot-dca-contacts-in-pdb \
    dca-file \
    pdb-file [pdb-file ...]

plot-pdb-comparison

Compare PDB files by plotting the distances of the residues. Usage:

rna_predict tools plot-pdb-comparison \
    ref-pdb \
    sample-pdb [sample-pdb ...]

plot-gdt

Create a GDT (global distance test) plot.

This plots a distance cutoff on the y-axis and the percentage of residues that are below the cutoff on the x-axis.

Usage:

rna_predict tools plot-gdt \
    ref-pdb \
    sample-pdb [sample-pdb ...]
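The quantity behind a GDT plot can be sketched as follows. The example assumes that the per-residue deviations between the superimposed sample and reference structures are already available; the superposition itself is not shown, and the function name gdt_curve and the deviation values are hypothetical.

```python
def gdt_curve(deviations, cutoffs):
    """For each cutoff, return the percentage of residues deviating less than it."""
    total = len(deviations)
    return [
        100.0 * sum(1 for d in deviations if d < cutoff) / total
        for cutoff in cutoffs
    ]


# Invented per-residue deviations in ångström.
deviations = [0.5, 1.2, 2.8, 3.9, 7.5]
curve = gdt_curve(deviations, [1.0, 2.0, 4.0, 8.0])
print(curve)  # [20.0, 40.0, 80.0, 100.0]
```

A curve that reaches high percentages at small cutoffs indicates a sample structure that is close to the reference over most of its residues.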