rna_predict package

Submodules

rna_predict.dcatools module

class rna_predict.dcatools.DcaContact(res1, res2, use_contact=True, weight=1)[source]

Bases: object

Class representing a DCA contact.

__init__(res1, res2, use_contact=True, weight=1)[source]

Create new DCA contact.

Parameters:
  • res1 – number of first residue (from 1)
  • res2 – number of second residue (from 1)
  • use_contact – True or False if contact is to be used
  • weight – assign a different weight to the contact (default 1)
get_rosetta_function(function='FADE -100 26 20 -2 2')[source]

Return Rosetta function of the contact as a list while applying weight.

Parameters:function – rosetta function as text
Returns:rosetta function as a list with applied weight
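
As a rough illustration of turning the function text into a weighted list, the following minimal sketch may help. It is not the package's implementation: the name weighted_rosetta_function is hypothetical, and which parameters the weight is applied to is an assumption (here, every parameter after the first three, which for FADE would be the well depth and offset).

```python
def weighted_rosetta_function(function="FADE -100 26 20 -2 2", weight=1):
    """Split a Rosetta function string and scale its trailing parameters."""
    parts = function.split()
    name = parts[0]
    params = [float(p) for p in parts[1:]]
    # Assumption: only the parameters after the first three are scaled
    # by the weight; the real method may treat the parameters differently.
    params[3:] = [p * weight for p in params[3:]]
    return [name] + params


print(weighted_rosetta_function(weight=2))
```

With a weight of 1 the list simply mirrors the input text, with the numeric parameters converted to floats.
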
exception rna_predict.dcatools.DcaException[source]

Bases: exceptions.Exception

Custom exception class used for foreseeable, DCA related errors.

class rna_predict.dcatools.DcaFilter[source]

Bases: object

Filter base class.

apply(contact, quiet=False)[source]

Apply filter to a DCA contact.

Parameters:
  • contact – DCA contact
  • quiet – reduce verbosity
class rna_predict.dcatools.DcaFilterThreshold(pdb_chain, threshold, keep_below=True, mode='minimum_heavy')[source]

Bases: rna_predict.dcatools.DcaFilter

Filter that skips a DCA contact if it is below or above a threshold.

__init__(pdb_chain, threshold, keep_below=True, mode='minimum_heavy')[source]

Create a new threshold filter.

Parameters:
  • pdb_chain – PDB chain
  • threshold – threshold below or above to keep a contact
  • keep_below – True to keep below threshold, False to keep above
  • mode – What distance to compare to (average_heavy, minimum_heavy)
apply(contact, quiet=False)[source]
rna_predict.dcatools.build_cst_info_from_dca_contacts(dca_data, sequence, mapping_mode, cst_function, number_dca_predictions, quiet=False)[source]

Maps DCA residue contacts to atom-atom constraints.

Parameters:
  • dca_data – list of DcaContact objects
  • sequence – sequence as text
  • mapping_mode – atom-to-atom mapping mode to use, supported values: “minAtom” or “pOnly”
  • cst_function – rosetta function and parameters as text string
  • number_dca_predictions – maximum number of DCA predictions to use
  • quiet – reduce output verbosity
Returns:

list of constraint information

rna_predict.dcatools.filter_dca_data(dca_data, dca_filter_chain, quiet=False)[source]

Run list of DCA contacts through a chain of filters.

Parameters:
  • dca_data – list of DcaContact objects
  • dca_filter_chain – list of DcaFilter objects
  • quiet – reduce output verbosity
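
The filter chain idea can be sketched with stand-in classes. Everything below is hypothetical and self-contained: Contact only mimics the DcaContact constructor, MinSeparationFilter is an invented filter, and the assumption that a filter marks a contact by clearing use_contact is not confirmed by this reference.

```python
class Contact(object):
    """Stand-in for DcaContact with the same constructor arguments."""

    def __init__(self, res1, res2, use_contact=True, weight=1):
        self.res1 = res1
        self.res2 = res2
        self.use_contact = use_contact
        self.weight = weight


class MinSeparationFilter(object):
    """Hypothetical filter: skip contacts between residues close in sequence."""

    def __init__(self, min_separation):
        self.min_separation = min_separation

    def apply(self, contact, quiet=False):
        if abs(contact.res1 - contact.res2) < self.min_separation:
            contact.use_contact = False
            if not quiet:
                print("skipping contact %d-%d" % (contact.res1, contact.res2))


def run_filter_chain(contacts, filter_chain, quiet=False):
    """Apply every filter to every contact until the contact is skipped."""
    for contact in contacts:
        for dca_filter in filter_chain:
            if not contact.use_contact:
                break
            dca_filter.apply(contact, quiet=quiet)
    return contacts


contacts = [Contact(1, 2), Contact(3, 40)]
run_filter_chain(contacts, [MinSeparationFilter(4)], quiet=True)
print([c.use_contact for c in contacts])
```
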
rna_predict.dcatools.get_atoms_for_res(res, term_phosphate=False)[source]

Get list of atoms for residue.

Parameters:
  • res – nucleotide (A,U,G,C)
  • term_phosphate – add P atoms
Returns:

list of atoms

rna_predict.dcatools.get_atoms_for_res_sequence(sequence)[source]

Get list of atoms for a sequence of nucleotides

Parameters:sequence – sequence as text
Returns:list of atoms
rna_predict.dcatools.get_contact_distance_map(structure_directory='/home/sebastian/.rna_predict/structure_info', westhof_vector=None, force_rebuild=False)[source]

Returns contact distance map

The contact distance map is cached in the user directory and updated when newer files are found.

Parameters:
  • structure_directory – directory to look up structure information text files
  • westhof_vector – list of factors to apply different weights to the bonding family classes (defaults to [1, 1, ... ])
  • force_rebuild – force rebuilding the distance map
rna_predict.dcatools.get_contact_distance_map_mean(distance_map, mean_cutoff=None, std_cutoff=None)[source]

Return an average distance map containing only those contacts whose average distance and standard deviation satisfy a cutoff.

Parameters:
  • distance_map – full distance map
  • mean_cutoff – limit for average
  • std_cutoff – limit for standard deviation
Returns:

average distance map

rna_predict.dcatools.get_contact_information_in_pdb_chain(dca_contact, pdb_chain, heavy_only=True)[source]

Returns distance information about a DCA contact in a realized PDB chain

Return value is a tuple of:

  • Average distance: Mean distance of all atoms in the contacts.
  • Minimum distance: Minimum distance between two atoms in the contact.
  • Minimum pair: List of [atom1, atom2] forming the minimal contact
Parameters:
  • dca_contact – DcaContact object
  • pdb_chain – PDB chain structure object
  • heavy_only – Only use heavy atoms
Returns:

tuple (average_dist, minimum_dist, minimum_pair). In case the contact cannot be found in the PDB file (0, 0, None) is returned.
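
The shape of the returned tuple can be illustrated with plain coordinates. This is a sketch under assumptions, not the package's code: atoms are represented as hypothetical (name, (x, y, z)) pairs instead of PDB atom objects, and the average is taken over all atom pairs between the two residues.

```python
import math


def contact_distances(atoms1, atoms2):
    """Return (average_dist, minimum_dist, minimum_pair) for two atom lists."""
    if not atoms1 or not atoms2:
        # Mirrors the documented fallback when the contact cannot be found.
        return (0, 0, None)
    pairs = []
    for name1, (x1, y1, z1) in atoms1:
        for name2, (x2, y2, z2) in atoms2:
            dist = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
            pairs.append((dist, [name1, name2]))
    average_dist = sum(dist for dist, _ in pairs) / len(pairs)
    minimum_dist, minimum_pair = min(pairs, key=lambda pair: pair[0])
    return (average_dist, minimum_dist, minimum_pair)


res1_atoms = [("P", (0.0, 0.0, 0.0)), ("N1", (0.0, 0.0, 1.0))]
res2_atoms = [("P", (0.0, 0.0, 4.0)), ("N3", (0.0, 0.0, 3.0))]
print(contact_distances(res1_atoms, res2_atoms))
```
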

rna_predict.dcatools.parse_dca_data(dca_prediction_filename)[source]

Read a DCA file, adjust the sequence numbers to match the alignment of the PDB file, and create a list of DcaContacts

Parameters:dca_prediction_filename – DCA input filename
Returns:list of DcaContact objects
rna_predict.dcatools.read_pdb_mapping_from_file(dca_prediction_filename)[source]

Read a PDB mapping from DCA file if present and return it as text

Parameters:dca_prediction_filename – DCA input filename
Returns:PDB mapping text

rna_predict.main module

rna_predict.main.main()[source]

Main command-line parser.

rna_predict.pdbtools module

rna_predict.pdbtools.align_structure(ref_pdb, moving_pdb, assign_b_factors=True)[source]

Align one PDB structure to another.

Parameters:
  • ref_pdb – reference PDB structure object
  • moving_pdb – moving PDB structure object
  • assign_b_factors – place distance values in the b-factor field
Returns:

tuple (res_dists, atom_dists, rmsd, transformation_matrix)

rna_predict.pdbtools.filter_atoms(atoms, heavy_only=False, p_only=False)[source]

Filter list of atoms.

Parameters:
  • atoms – list of atoms
  • heavy_only – only keep heavy atoms
  • p_only – only keep P atoms
Returns:

filtered list of atoms

rna_predict.pdbtools.get_center_of_res(res)[source]

Calculate the center of a residue.

Parameters:res – residue object
Returns:center coordinates
rna_predict.pdbtools.get_pdb_by_code(pdb_code, pdb_directory='/home/sebastian/.rna_predict/pdbs')[source]

Get PDB file by code. Download if necessary.

Parameters:
  • pdb_code – PDB code to download
  • pdb_directory – directory in which to look up and store the PDB file, defaults to the rna_predict PDB cache directory
Returns:

PDB filename

rna_predict.pdbtools.parse_pdb(pdb_code, pdb_file)[source]

Parse PDB file using Biopython.

Parameters:
  • pdb_code – internal id
  • pdb_file – PDB filename
Returns:

PDB structure object

rna_predict.simulation module

class rna_predict.simulation.Command(command, add_suffix=None, dry_run=False, print_commands=True, stdin=None, quiet=False)[source]

Bases: object

Helper class to store external program calls plus additional flags.

__init__(command, add_suffix=None, dry_run=False, print_commands=True, stdin=None, quiet=False)[source]

Create a new command wrapper.

Parameters:
  • command – list of external command and parameters
  • add_suffix – type of suffix to add to the first entry in command; currently only None or rosetta are supported
  • dry_run – don’t actually execute anything
  • print_commands – print commandline when running
  • stdin – optional text to use as standard input
  • quiet – hide output of external program
get_full_command(sysconfig)[source]

If necessary, add the suffix to the command, depending on the user / system configuration.

Parameters:sysconfig – instance of SysConfig
Returns:suffixed command as a list
class rna_predict.simulation.EvalData[source]

Bases: object

Evaluation data storing model and cluster information.

__init__()[source]

Create empty EvalData object.

get_model_count()[source]

Returns the number of models in the evaluation data

Returns:number of models
get_models(model_list, kind='tag')[source]

Returns a selection of models with specific properties

Parameters:
  • model_list – list of models to get, might be a list of numbers, or a list of model names
  • kind

    model selection mode:

    • tag: str: internal name such as S_000123_5
    • top: int: models ordered by score
    • ntop: int: models ordered by rmsd_native
    • cluster: int: nth cluster decoy
    • cluster_ntop: n[/m]: nth cluster ordered by native rmsd [of first m]
Returns:

list of selected models

static get_weighted_model_score(model, score_weights)[source]

Calculate a model score based on different weights for the individual Rosetta scores

Parameters:
  • model – model to reweight
  • score_weights – dict of rosetta score names and their weights. default to set the default weight for all scores. Example: {'default':0, 'atom_pair_constraint': 1} to only use atom pair constraints.
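
The weighting scheme can be sketched as a weighted sum. This is only an illustration: the model is reduced to a hypothetical dict of score names and values, and a fallback weight of 1 when no default entry is given is an assumption.

```python
def weighted_score(scores, score_weights):
    """Sum all scores, each multiplied by its weight or the default weight."""
    # Assumption: scores without an explicit weight use the 'default' entry,
    # and the default weight is 1 if 'default' itself is missing.
    default_weight = score_weights.get("default", 1)
    return sum(
        value * score_weights.get(name, default_weight)
        for name, value in scores.items()
    )


model_scores = {"fa_atr": -10.0, "atom_pair_constraint": 2.5}
print(weighted_score(model_scores, {"default": 0, "atom_pair_constraint": 1}))
```
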
static load_from_cst(constraints)[source]

Load evaluation data from a prediction.

Parameters:constraints – constraints selection
static load_from_file(filename)[source]

Load evaluation data from a file.

Parameters:filename – path to a file containing the evaluation data
print_models(model_list, kind='tag')[source]

Print a list of models

Parameters:
  • model_list – see get_models
  • kind – see get_models
save_to_file(filename)[source]

Dump evaldata to a file.

Parameters:filename – path to a file to store the data
class rna_predict.simulation.RNAPrediction(sysconfig)[source]

Bases: object

Base class used for prediction simulation

CONFIG_FILE = '.config'
__init__(sysconfig)[source]

Create a prediction simulation for the current directory and try to load an existing configuration.

assemble(nstruct=50000, cycles=20000, constraints=None, dry_run=False, seed=None, use_native_information=False, threads=1)[source]

Assembly step. Assemble helices and motifs into complete models.

Parameters:
  • nstruct – number of models to create
  • cycles – number of monte-carlo cycles per model
  • constraints – constraints selection
  • dry_run – don’t actually run any external command
  • seed – optionally override random seed
  • use_native_information – use native pdb file to calculate rmsd for each model
  • threads – number of threads to use
check_config()[source]

Check if current directory contains a valid configuration.

create_helices(dry_run=False, threads=1)[source]

Helix creation step. Create one ideal helix model for each helix.

Parameters:
  • dry_run – don’t actually run any external command
  • threads – number of threads to use
create_motifs(nstruct=50000, cycles=20000, dry_run=False, seed=None, use_native_information=False, threads=1, constraints=None, motif_subset=None)[source]

Motif generation step. Generate models for each motif.

Parameters:
  • nstruct – number of models to create for each motif
  • cycles – number of monte-carlo cycles per model
  • dry_run – don’t actually run any external command
  • seed – optionally override random seed
  • use_native_information – use native pdb file to calculate rmsd for each model
  • threads – number of threads to use
  • constraints – constraints selection
  • motif_subset – specify a list of motifs to generate instead of all, counting starts at 1 (e.g. [1, 3, 4])
edit_constraints(constraints, output_filename=None, cst_function='FADE -100 26 20 -2 2')[source]

Edit an existing .cst file, replacing the rosetta function.

Parameters:
  • constraints – constraints selection
  • output_filename – fixed output filename or None to automatically create
  • cst_function – rosetta function and parameters as text string
evaluate(constraints=None, cluster_limit=10, cluster_cutoff=4.1, full_evaluation=False)[source]

Evaluation step. Extract model information, cluster models and calculate rmsd values.

Parameters:
  • constraints – constraints selection
  • cluster_limit – maximum number of clusters to produce
  • cluster_cutoff – rmsd distance in Å at which a new cluster is created
  • full_evaluation – discard existing evaluation data, re-extract model information and re-calculate rmsd values.
evaluate_custom(constraints=None, dca_prediction_filename='dca/dca.txt', full_evaluation=False, threshold=7.5, radius=2, number_dca_predictions=100, threads=4)[source]

Custom scoring algorithm that inspects neighboring residues of DCA contact pairs

Parameters:
  • constraints – constraints selection
  • dca_prediction_filename – dca filename
  • full_evaluation – discard existing evaluation data and extracted models, re-extract and re-calculate distance information
  • threshold – threshold in Å to count a contact
  • radius – number of adjacent residues to inspect
  • number_dca_predictions – maximum number of DCA predictions to use
  • threads – number of threads to use for extraction
execute_command(command, add_suffix=None, dry_run=False, print_commands=True, stdin=None, quiet=False)[source]

Execute an external command.

Parameters:
  • command – list of external command and parameters
  • add_suffix – type of suffix to add to the first entry in command; currently only None or rosetta are supported
  • dry_run – don’t actually execute anything
  • print_commands – print commandline when running
  • stdin – optional text to use as standard input
  • quiet – hide output of external program
execute_command_and_capture(command, add_suffix=None, dry_run=False, print_commands=True, stdin=None, quiet=False)[source]

Execute an external command while capturing output.

Parameters:
  • command – list of external command and parameters
  • add_suffix – type of suffix to add to the first entry in command; currently only None or rosetta are supported
  • dry_run – don’t actually execute anything
  • print_commands – print commandline when running
  • stdin – optional text to use as standard input
  • quiet – hide output of external program
Returns:

generator over lines of standard output

execute_commands(commands, threads=1)[source]

Execute a list of commands in parallel.

Parameters:
  • commands – list of Commands
  • threads – number of parallel invocations
extract_models(constraints, model_list, kind='tag')[source]

Extract PDB files of a set of models

Parameters:
  • constraints – constraints selection
  • model_list – see EvalData.print_models
  • kind – see EvalData.print_models
extract_pdb(constraints, model)[source]

Extract PDB file of a model to the tmp directory.

Parameters:
  • constraints – constraints selection
  • model – tag of the model
Returns:

path to the extracted PDB file

get_csts()[source]

Returns a list of all constraints.

Returns:list of all constraints names
static get_models(constraints, model_list, kind='tag')[source]

Returns a selection of models with specific properties

Parameters:
  • constraints – constraints selection
  • model_list – see EvalData.print_models
  • kind – see EvalData.get_models
get_status(constraints=None, include_evaluation_data=False, include_motif_model_count=False, include_assembly_model_count=False)[source]

Returns a dict containing status information for a cst.

Parameters:
  • constraints – constraints selection
  • include_motif_model_count – set to True to include model count (longer processing time)
  • include_assembly_model_count – set to True to include model count (longer processing time)
  • include_evaluation_data – set to True to include evaluation model count and native RMSD values (best of first 1,5,10 clusters) if available (longer processing time)
Returns:

status dict

load_config()[source]

Load simulation configuration of current directory

make_constraints(dca_prediction_filename='dca/dca.txt', output_filename=None, number_dca_predictions=100, cst_function='FADE -100 26 20 -2 2', filter_text=None, mapping_mode='minAtom')[source]

Create a set of constraints from a DCA prediction.

Parameters:
  • dca_prediction_filename – DCA input file
  • output_filename – fixed output filename or None to automatically create
  • number_dca_predictions – maximum number of DCA predictions to use
  • cst_function – rosetta function and parameters as text string
  • filter_text – optional: List of DCA filters (see parse_dca_filter_string for details)
  • mapping_mode – atom-to-atom mapping mode to use, supported values: minAtom or pOnly
modify_config(key, value)[source]

Modify configuration entry.

Parameters:
  • key – setting to modify
  • value – new value (“-” to store None)
static parse_cst_file(constraints_file)[source]

Parse .cst file as a list of constraints

Parameters:constraints_file – path to a .cst file
Returns:list of constraints
static parse_cst_name_and_filename(constraints)[source]

Find and clean up constraints by name or filename

Parameters:constraints – constraints name or filename
Returns:tuple of constraints name and filename
parse_dca_filter_string(line)[source]

Parse a text string and turn it into a list of DCA filters

Multiple filters are separated by commas and have the following format:

<filter>:<arg>:<arg>:...

Threshold filter: Lookup dca contacts in a PDB file, discard or keep contact depending on whether the contact is realized:

Format: threshold:<n>:<cst>:<mode>:<model>

  • n: threshold (< 0: keep below, > 0: keep above)
  • cst: constraints selection to look up PDB file
  • mode, model: model selection mode (see EvalData.get_models for details)

Example: threshold:8.0:100rnaDCA_FADE_-100_26_20_-2_2:cluster:1,threshold:-6.0:100rnaDCA_FADE_-100_26_20_-2_2:cluster:1

None filter: Empty filter

Format: none

Parameters:line – string to parse
Returns:list of DcaFilter objects
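
To make the text format concrete, here is a minimal parsing sketch. It is not the method itself: it returns plain dicts instead of DcaFilter objects, the function name is hypothetical, and storing the absolute value of n together with a keep_below flag is an assumption based on the sign convention described above.

```python
def parse_filter_string(line):
    """Split a filter string into one dict per filter."""
    filters = []
    for entry in line.split(","):
        fields = entry.strip().split(":")
        name = fields[0]
        if name == "none":
            filters.append({"filter": "none"})
        elif name == "threshold":
            if len(fields) != 5:
                raise ValueError("expected threshold:<n>:<cst>:<mode>:<model>")
            n = float(fields[1])
            filters.append({
                "filter": "threshold",
                "threshold": abs(n),   # assumption: magnitude is the cutoff
                "keep_below": n < 0,   # < 0: keep below, > 0: keep above
                "cst": fields[2],
                "mode": fields[3],
                "model": fields[4],
            })
        else:
            raise ValueError("unknown filter: %s" % name)
    return filters


example = ("threshold:8.0:100rnaDCA_FADE_-100_26_20_-2_2:cluster:1,"
           "threshold:-6.0:100rnaDCA_FADE_-100_26_20_-2_2:cluster:1")
for parsed in parse_filter_string(example):
    print(parsed)
```
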
prepare(fasta_file='sequence.fasta', params_file='secstruct.txt', native_pdb_file=None, data_file=None, torsions_file=None, name=None)[source]

Preparation step. Parse out stems and motifs from sequence and secondary structure information and create necessary base files.

Parameters:
  • fasta_file – fasta file containing the sequence
  • params_file – text file containing the secondary structure
  • native_pdb_file – native pdb file if available
  • data_file – additional data file if available
  • torsions_file – additional torsions file if available
  • name – optional name for this set of predictions
prepare_cst(constraints=None, motifs_override=None)[source]

Constraints preparation step. Prepare constraints files for motif generation and assembly.

Parameters:
  • constraints – constraints selection
  • motifs_override – optional name of a different set of constraints to use as motifs.
print_config()[source]

Print simulation configuration.

static print_models(constraints, model_list, kind='tag')[source]

Print a set of models

Parameters:
  • constraints – constraints selection
  • model_list – see EvalData.print_models
  • kind – see EvalData.print_models
print_status(native_compare=False, csts=None)[source]

Print summary of predictions and their current status.

Output format always contains the following columns:

  • P: preparation step
  • M: motif generation
  • A: assembly
  • E: evaluation

If a step is completed, X is shown, - otherwise.

For motif generation a * may be shown to indicate that models from a different set of constraints are used.

If native_compare is set to True another 4 columns are printed:

  • 1: native rmsd score of the first cluster
  • 5: lowest native rmsd score of the first 5 clusters
  • 10: lowest native rmsd score of the first 10 clusters
  • n: number of models
Parameters:
  • native_compare – print rmsd comparison to native structure
  • csts – list of constraints to include in output (default: all)
save_config()[source]

Save simulation configuration of current directory

exception rna_predict.simulation.SimulationException[source]

Bases: exceptions.Exception

Custom exception class for foreseeable prediction errors.

rna_predict.simulation.check_dir_existence(path, alternative_error_text=None)[source]

Make sure a directory exists and raise an exception otherwise.

Parameters:
  • path – directory to check
  • alternative_error_text – alternative error text passed to exception
rna_predict.simulation.check_file_existence(path, alternative_error_text=None)[source]

Make sure a file exists and raise an exception otherwise.

Parameters:
  • path – filename to check
  • alternative_error_text – alternative error text passed to exception
rna_predict.simulation.delete_glob(pattern, print_notice=True)[source]

Helper function to delete files while expanding shell globs.

Parameters:
  • pattern – pattern to delete
  • print_notice – if True print verbose notice
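
Deleting files by shell glob can be sketched with the standard library. The helper name delete_matching and the wording of the notice are hypothetical; the actual function may behave differently, for example when nothing matches.

```python
import glob
import os
import tempfile


def delete_matching(pattern, print_notice=True):
    """Delete every file matching a shell-style glob pattern."""
    for path in glob.glob(pattern):
        if print_notice:
            print("deleting %s" % path)
        os.remove(path)


# Demonstration in a throwaway directory.
workdir = tempfile.mkdtemp()
for filename in ("a.out", "b.out", "keep.txt"):
    open(os.path.join(workdir, filename), "w").close()
delete_matching(os.path.join(workdir, "*.out"), print_notice=False)
print(sorted(os.listdir(workdir)))
```
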
rna_predict.simulation.fix_atom_names_in_cst(cst_info, sequence)[source]

Switches N1 and N3 atom names in a residue.

Parameters:
  • cst_info – list of constraints
  • sequence – residue sequence in lower case
Returns:

modified list of constraints

rna_predict.simulation.get_model_count_in_silent_file(silent_file)[source]

Returns the number of models present in a Rosetta silent file.

Parameters:silent_file – filename
Returns:model count
rna_predict.simulation.merge_silent_files(target, sources)[source]

Merges rosetta silent files (.out).

All source files are appended to target. Model numbers are incremented uniquely. Source files are deleted after a successful merge.

Parameters:
  • target – target filename
  • sources – source filenames
Returns:

model count

rna_predict.simulation.natural_sort_key(s, _nsre=<_sre.SRE_Pattern object at 0x7fc8a47b0df0>)[source]

Helper function to be used as sort key in sorted() to naturally sort numeric parts.
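
A common way to build such a key is to split the string on digit runs and compare those runs as integers. The sketch below follows that idiom; the name natural_key and the lower-casing of text parts are assumptions, not necessarily what the package does.

```python
import re

_DIGITS = re.compile(r"([0-9]+)")


def natural_key(s, _nsre=_DIGITS):
    """Split a string into text and integer parts for natural ordering."""
    return [int(part) if part.isdigit() else part.lower()
            for part in _nsre.split(s)]


names = ["motif10.out", "motif2.out", "motif1.out"]
print(sorted(names, key=natural_key))
```

Without the key, plain string sorting would place motif10.out before motif2.out.
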

rna_predict.sysconfig module

class rna_predict.sysconfig.SysConfig[source]

Bases: object

Stores user configuration

SYSCONFIG_FILE = '/home/sebastian/.rna_predict/sysconfig'
SYSCONFIG_LOCATION = '/home/sebastian/.rna_predict'
__init__()[source]

Load system configuration.

check_sysconfig()[source]

Check if all external programs are accessible.

Returns:dict containing lists of failed (fail) and successful (success) program accesses
load_sysconfig()[source]

Load system configuration.

print_sysconfig()[source]

Pretty-print configuration.

rna_predict.tools module

rna_predict.tools.plot_clusters(cst, max_models=0.99, score_weights=None)[source]

Plot score over native rmsd.

Parameters:
  • cst – constraints
  • max_models – limit to number of models if > 1, or relative percentage if <= 1
  • score_weights – see EvalData.get_weighted_model_score
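
The two meanings of max_models can be made explicit with a small helper. This is a sketch only: the name resolve_model_limit is hypothetical and rounding the fractional case to the nearest integer is an assumption.

```python
def resolve_model_limit(total_models, max_models=0.99):
    """Translate max_models into an absolute number of models."""
    if max_models > 1:
        # Absolute limit, capped at the number of available models.
        return min(total_models, int(max_models))
    # Values <= 1 are treated as a fraction of all models.
    return int(round(total_models * max_models))


print(resolve_model_limit(200))        # fraction of all models
print(resolve_model_limit(200, 50))    # absolute limit
```
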
rna_predict.tools.plot_constraint_quality(comparison_pdb, sources, dca_mode=False)[source]

Plot constraint quality.

This visualizes the distances of constraints by comparing it to a reference (native) PDB structure.

Parameters:
  • comparison_pdb – filename to a pdb file to compare to
  • sources – list of dca files or constraints (depending on dca_mode). may also use ‘filter:’ to filter on-the-fly
  • dca_mode – visualize residue-residue DCA instead of atom-atom constraints
rna_predict.tools.plot_contact_atoms(mean_cutoff, std_cutoff)[source]

Plots atoms involved in forming nucleotide contacts that satisfy the cutoff condition in the contact database.

Parameters:
  • mean_cutoff – limit for average distance
  • std_cutoff – limit for the standard deviation
rna_predict.tools.plot_contact_distances()[source]

Plots histogram for each nucleotide pair contact containing the distances of the atoms involved.

rna_predict.tools.plot_contact_map(native_filename='native.pdb', first_filename='dca/dca.txt', second_filename='dca/mi.txt', native_cutoff=8.0)[source]
rna_predict.tools.plot_dca_contacts_in_pdb(dca_prediction_filename, pdb_files)[source]

Visualize how well DCA contacts are fulfilled in PDB files.

This plots the distances of dca contacts in one or more PDB files.

Parameters:
  • dca_prediction_filename – input DCA filename
  • pdb_files – list of PDB filenames
rna_predict.tools.plot_gdt(pdb_ref_filename, pdbs_sample_filenames)[source]
rna_predict.tools.plot_pdb_comparison(pdb_ref_filename, pdbs_sample_filenames)[source]

Compare PDB files by plotting the distance of the residues.

Parameters:
  • pdb_ref_filename – reference PDB filename
  • pdbs_sample_filenames – list of sample PDB filenames
rna_predict.tools.plot_tp_rate(pdb_ref_filename, dca_filenames, tp_cutoff=8.0)[source]

rna_predict.utils module

rna_predict.utils.comma_separated_entries_to_dict(s, type_key, type_value)[source]

Parses a string containing comma separated key-value pairs (colon-separated) into a dict with fixed key and value types.

Example: Turns foo:3,bar:4 into {"foo": 3, "bar": 4}

Parameters:
  • s – input string
  • type_key – type of the keys
  • type_value – type of the values
Returns:

dict
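
The conversion described above can be approximated as follows. This is a minimal sketch with a hypothetical function name; skipping empty entries is an assumption.

```python
def entries_to_dict(s, type_key, type_value):
    """Parse 'key:value,key:value' into a dict with converted types."""
    result = {}
    for entry in s.split(","):
        if not entry.strip():
            continue  # assumption: ignore empty entries
        key, value = entry.split(":", 1)
        result[type_key(key.strip())] = type_value(value.strip())
    return result


print(entries_to_dict("foo:3,bar:4", str, int))
```
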

rna_predict.utils.comma_separated_ranges_to_list(s)[source]

Parses a string containing comma separated ranges to a list.

Example: Turns 1-3,10,20-22 into [1, 2, 3, 10, 20, 21, 22]

Parameters:s – comma separated ranges
Returns:list of ints
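
A minimal sketch of the range expansion, assuming non-negative integers and inclusive range ends as in the example above (the function name is hypothetical):

```python
def ranges_to_list(s):
    """Expand '1-3,10,20-22' into [1, 2, 3, 10, 20, 21, 22]."""
    result = []
    for part in s.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            result.extend(range(int(start), int(end) + 1))
        else:
            result.append(int(part))
    return result


print(ranges_to_list("1-3,10,20-22"))
```
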
rna_predict.utils.mkdir_p(path)[source]

Creates directories recursively and does not error when they already exist.

Parameters:path – directory to create
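
In Python 3 the same behavior is available directly from the standard library. The sketch below uses a hypothetical wrapper name and is not the package's own implementation.

```python
import os
import tempfile


def make_dirs(path):
    """Create a directory and its parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)


base = tempfile.mkdtemp()
target = os.path.join(base, "predictions", "motifs")
make_dirs(target)
make_dirs(target)  # second call is a no-op instead of an error
print(os.path.isdir(target))
```
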
rna_predict.utils.read_file_line_by_line(filename, skip_empty=True)[source]

Yields lines in a file while stripping whitespace.

Parameters:
  • filename – filename to read
  • skip_empty – True if empty lines should be skipped
Returns:

next line in file
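
A generator with this behavior might look like the sketch below. The name read_lines is hypothetical, and the demonstration writes its own temporary file.

```python
import os
import tempfile


def read_lines(filename, skip_empty=True):
    """Yield whitespace-stripped lines, optionally skipping empty ones."""
    with open(filename) as handle:
        for line in handle:
            line = line.strip()
            if skip_empty and not line:
                continue
            yield line


# Demonstration with a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as handle:
    handle.write("  first  \n\nsecond\n")
print(list(read_lines(path)))
```
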

Module contents