rna_predict package

Submodules

rna_predict.dcatools module

class rna_predict.dcatools.DcaContact(res1, res2, use_contact=True, weight=1)

Bases: object

Class representing a DCA contact.

__init__(res1, res2, use_contact=True, weight=1)

Create a new DCA contact.

Parameters:

res1 – number of the first residue (counting from 1)
res2 – number of the second residue (counting from 1)
use_contact – whether the contact is to be used (True or False)
weight – assign a different weight to the contact (default 1)

get_rosetta_function(function='FADE -100 26 20 -2 2')

Return the Rosetta function of the contact as a list, with the contact's weight applied.

Parameters:	function – rosetta function as text
Returns:	rosetta function as a list with applied weight

exception rna_predict.dcatools.DcaException

Bases: exceptions.Exception

Custom exception class used for foreseeable DCA-related errors.

class rna_predict.dcatools.DcaFilter

Bases: object

Filter base class.

apply(contact, quiet=False)

Apply filter to a DCA contact.

Parameters:

contact – DCA contact
quiet – reduce verbosity

class rna_predict.dcatools.DcaFilterThreshold(pdb_chain, threshold, keep_below=True, mode='minimum_heavy')

Bases: rna_predict.dcatools.DcaFilter

Filter that skips a DCA contact if its distance is below or above a threshold.

__init__(pdb_chain, threshold, keep_below=True, mode='minimum_heavy')

Create a new threshold filter.

Parameters:

pdb_chain – PDB chain
threshold – threshold below or above which a contact is kept
keep_below – True to keep contacts below the threshold, False to keep those above
mode – which distance to compare against (average_heavy, minimum_heavy)

apply(contact, quiet=False)

rna_predict.dcatools.build_cst_info_from_dca_contacts(dca_data, sequence, mapping_mode, cst_function, number_dca_predictions, quiet=False)

Maps DCA residue contacts to atom-atom constraints.

Parameters:

dca_data – list of DcaContact objects
sequence – sequence as text
mapping_mode – atom-to-atom mapping mode to use, supported values: “minAtom” or “pOnly”
cst_function – rosetta function and parameters as text string
number_dca_predictions – maximum number of DCA predictions to use
quiet – reduce output verbosity
Returns:	list of constraint information

rna_predict.dcatools.filter_dca_data(dca_data, dca_filter_chain, quiet=False)

Run a list of DCA contacts through a chain of filters.

Parameters:

dca_data – list of DcaContact objects
dca_filter_chain – list of DcaFilter objects
quiet – reduce output verbosity
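The filter chain can be illustrated with a small, self-contained sketch. `Contact`, `ThresholdFilter` and `filter_contacts` are hypothetical simplifications, not the package's classes. Distances are passed in directly instead of being measured in a PDB chain, and the sketch assumes that a filter discards a contact by clearing its `use_contact` flag.

```python
class Contact:
    """Hypothetical stand-in for DcaContact with a precomputed distance."""

    def __init__(self, res1, res2, distance, use_contact=True):
        self.res1 = res1
        self.res2 = res2
        self.distance = distance  # assumed distance in Å, normally looked up in a PDB chain
        self.use_contact = use_contact


class ThresholdFilter:
    """Hypothetical stand-in for DcaFilterThreshold."""

    def __init__(self, threshold, keep_below=True):
        self.threshold = threshold
        self.keep_below = keep_below

    def apply(self, contact):
        # Discard the contact when it lies on the unwanted side of the threshold.
        below = contact.distance < self.threshold
        if below != self.keep_below:
            contact.use_contact = False


def filter_contacts(contacts, filter_chain):
    """Run every contact through each filter of the chain in order."""
    for contact in contacts:
        for dca_filter in filter_chain:
            if not contact.use_contact:
                break  # already discarded by an earlier filter
            dca_filter.apply(contact)
    return contacts
```

With a chain that keeps contacts below 10 Å and then keeps contacts above 5 Å, only contacts in the 5 to 10 Å band keep `use_contact=True`.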

rna_predict.dcatools.get_atoms_for_res(res, term_phosphate=False)

Get the list of atoms for a residue.

Parameters:

res – nucleotide (A, U, G, C)
term_phosphate – add P atoms
Returns:	list of atoms

rna_predict.dcatools.get_atoms_for_res_sequence(sequence)

Get the list of atoms for a sequence of nucleotides.

Parameters:	sequence – sequence as text
Returns:	list of atoms

rna_predict.dcatools.get_contact_distance_map(structure_directory='/home/sebastian/.rna_predict/structure_info', westhof_vector=None, force_rebuild=False)

Returns the contact distance map.

The contact distance map is cached in the user directory and updated when newer files are found.

Parameters:

structure_directory – directory in which to look up structure information text files
westhof_vector – list of factors to apply different weights to the bonding family classes (defaults to `[1, 1, ... ]`)
force_rebuild – force rebuilding the distance map

rna_predict.dcatools.get_contact_distance_map_mean(distance_map, mean_cutoff=None, std_cutoff=None)

Return an average distance map containing only those contacts whose average distance and standard deviation satisfy the cutoffs.

Parameters:

distance_map – full distance map
mean_cutoff – limit for the average
std_cutoff – limit for the standard deviation
Returns:	average distance map
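The averaging and cutoff step can be sketched as follows. This sketch makes several hypothetical assumptions. The full distance map is taken to be a dict from a contact key to a list of observed distances. A contact is kept when its mean and population standard deviation do not exceed the cutoffs, and `None` disables a cutoff. The real map's layout may differ.

```python
from statistics import mean, pstdev


def distance_map_mean(distance_map, mean_cutoff=None, std_cutoff=None):
    """Hypothetical sketch: reduce lists of distances to (mean, std) pairs.

    Entries whose mean or standard deviation exceeds the respective cutoff
    are dropped; a cutoff of None disables that check.
    """
    result = {}
    for key, distances in distance_map.items():
        if not distances:
            continue  # nothing observed for this contact
        avg = mean(distances)
        std = pstdev(distances)  # population standard deviation (assumption)
        if mean_cutoff is not None and avg > mean_cutoff:
            continue
        if std_cutoff is not None and std > std_cutoff:
            continue
        result[key] = (avg, std)
    return result
```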

rna_predict.dcatools.get_contact_information_in_pdb_chain(dca_contact, pdb_chain, heavy_only=True)

Returns distance information about a DCA contact in a realized PDB chain.

Return value is a tuple of:

Average distance: Mean distance between all atoms in the contact.
Minimum distance: Minimum distance between two atoms in the contact.
Minimum pair: List of [atom1, atom2] forming the minimal contact.

Parameters:

dca_contact – DcaContact object
pdb_chain – PDB chain structure object
heavy_only – only use heavy atoms
Returns:	tuple `(average_dist, minimum_dist, minimum_pair)`. In case the contact cannot be found in the PDB file `(0, 0, None)` is returned.
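The returned tuple can be illustrated with plain coordinates instead of Biopython objects. This sketch assumes the average is taken over all atom pairs between the two residues. The function name and the dict-of-coordinates input are hypothetical, and `math.dist` requires Python 3.8+.

```python
import math
from itertools import product


def contact_information(atoms1, atoms2):
    """Hypothetical sketch of (average_dist, minimum_dist, minimum_pair).

    atoms1 and atoms2 map atom names to (x, y, z) coordinates of the two
    residues of a contact. Returns (0, 0, None) if either residue has no
    atoms, mirroring the documented not-found case.
    """
    if not atoms1 or not atoms2:
        return 0, 0, None
    total = 0.0
    count = 0
    minimum_dist = math.inf
    minimum_pair = None
    # Visit every atom pair between the two residues.
    for (name1, xyz1), (name2, xyz2) in product(atoms1.items(), atoms2.items()):
        dist = math.dist(xyz1, xyz2)
        total += dist
        count += 1
        if dist < minimum_dist:
            minimum_dist = dist
            minimum_pair = [name1, name2]
    return total / count, minimum_dist, minimum_pair
```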

rna_predict.dcatools.parse_dca_data(dca_prediction_filename)

Read a DCA file, adjust the sequence numbers to match the alignment of the PDB file, and create a list of DcaContacts

Parameters:	dca_prediction_filename – DCA input filename
Returns:	list of DcaContact objects

rna_predict.dcatools.read_pdb_mapping_from_file(dca_prediction_filename)

Read a PDB mapping from DCA file if present and return it as text

Parameters:	dca_prediction_filename – DCA input filename
Returns:	PDB mapping text

rna_predict.main module

rna_predict.main.main(): Main command-line parser.

rna_predict.pdbtools module

rna_predict.pdbtools.align_structure(ref_pdb, moving_pdb, assign_b_factors=True)

Align one PDB structure to another.

Parameters:

ref_pdb – reference PDB structure object
moving_pdb – moving PDB structure object
assign_b_factors – place distance values in the b-factor field
Returns:	tuple (res_dists, atom_dists, rmsd, transformation_matrix)

rna_predict.pdbtools.filter_atoms(atoms, heavy_only=False, p_only=False)

Filter list of atoms.

Parameters:

atoms – list of atoms
heavy_only – only keep heavy atoms
p_only – only keep P atoms
Returns:	filtered list of atoms
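The two filter flags can be sketched on simple `(name, element)` tuples. The tuple representation and the name `filter_atom_tuples` are assumptions, since the real function receives atom objects. Hydrogens are identified by element `H`, and phosphorus atoms by the atom name `P`.

```python
def filter_atom_tuples(atoms, heavy_only=False, p_only=False):
    """Hypothetical sketch operating on (name, element) tuples."""
    if p_only:
        # Keep only backbone phosphorus atoms.
        atoms = [atom for atom in atoms if atom[0] == "P"]
    if heavy_only:
        # Heavy atoms are all non-hydrogen atoms.
        atoms = [atom for atom in atoms if atom[1] != "H"]
    return list(atoms)
```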

rna_predict.pdbtools.get_center_of_res(res)

Calculate the center of a residue.

Parameters:	res – residue object
Returns:	center coordinates
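Assuming the center is the unweighted geometric mean of the atom coordinates rather than a mass-weighted center, it reduces to a per-axis average. The helper below is hypothetical and takes a list of `(x, y, z)` tuples.

```python
def center_of_coordinates(coords):
    """Hypothetical sketch: unweighted geometric center of (x, y, z) tuples."""
    if not coords:
        raise ValueError("residue has no atoms")
    n = len(coords)
    # zip(*coords) groups all x values, then all y values, then all z values.
    return tuple(sum(axis) / n for axis in zip(*coords))
```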

rna_predict.pdbtools.get_pdb_by_code(pdb_code, pdb_directory='/home/sebastian/.rna_predict/pdbs')

Get PDB file by code. Download if necessary.

Parameters:

pdb_code – PDB code to download
pdb_directory – directory in which to look up and store the PDB file; defaults to the rna_predict PDB cache directory
Returns:	PDB filename

rna_predict.pdbtools.parse_pdb(pdb_code, pdb_file)

Parse PDB file using Biopython.

Parameters:

pdb_code – internal id
pdb_file – PDB filename
Returns:	PDB structure object

rna_predict.simulation module

class rna_predict.simulation.Command(command, add_suffix=None, dry_run=False, print_commands=True, stdin=None, quiet=False)

Bases: object

Helper class to store external program calls plus additional flags.

__init__(command, add_suffix=None, dry_run=False, print_commands=True, stdin=None, quiet=False)

Create a new command wrapper.

Parameters:

command – list of external command and parameters
add_suffix – type of suffix to add to the first entry in command; currently only `None` or `rosetta` are supported
dry_run – don’t actually execute anything
print_commands – print command line when running
stdin – optional text to use as standard input
quiet – hide output of external program

get_full_command(sysconfig)

If necessary, add the suffix to the command, depending on the user/system configuration.

Parameters:	sysconfig – instance of SysConfig
Returns:	suffixed command as a list

class rna_predict.simulation.EvalData

Bases: object

Evaluation data storing model and cluster information.

__init__(): Create empty EvalData object.

get_model_count()

Returns the number of models in the evaluation data.

Returns:	number of models

get_models(model_list, kind='tag')

Returns a selection of models with specific properties.

Parameters:

model_list – list of models to get; may be a list of numbers or a list of model names
kind – model selection mode:
  `tag`: str: internal name such as S_000123_5
  `top`: int: models ordered by score
  `ntop`: int: models ordered by rmsd_native
  `cluster`: int: nth cluster decoy
  `cluster_ntop`: n[/m]: nth cluster ordered by native rmsd [of first m]
Returns:	list of selected models
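The `tag`, `top` and `ntop` modes can be sketched as lookups and sorts over a list of model records. This is an illustration only. The dict layout and the 1-based ranks are assumptions. So is the rule that lower score and lower `rmsd_native` are better, which is the usual Rosetta convention. The cluster modes are omitted.

```python
def select_models(models, model_list, kind="tag"):
    """Hypothetical sketch of the 'tag', 'top' and 'ntop' selection modes.

    models is a list of dicts with 'tag', 'score' and 'rmsd_native' keys.
    Ranks in model_list are assumed to be 1-based.
    """
    if kind == "tag":
        by_tag = {model["tag"]: model for model in models}
        return [by_tag[tag] for tag in model_list]
    if kind == "top":
        ordered = sorted(models, key=lambda model: model["score"])
    elif kind == "ntop":
        ordered = sorted(models, key=lambda model: model["rmsd_native"])
    else:
        raise ValueError("unsupported selection mode: %s" % kind)
    return [ordered[rank - 1] for rank in model_list]
```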

static get_weighted_model_score(model, score_weights)

Calculate a model score based on different weights for the individual Rosetta scores.

Parameters:

model – model to reweight
score_weights – dict of rosetta score names and their weights; use `default` to set the default weight for all scores. Example: `{'default':0, 'atom_pair_constraint': 1}` to only use atom pair constraints.
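The reweighting idea can be sketched as a weighted sum. The sketch assumes a model's scores are available as a dict of Rosetta score term names to values. `weighted_model_score` is a hypothetical name. Falling back to a weight of 1 when no `default` is given is also an assumption.

```python
def weighted_model_score(model_scores, score_weights):
    """Hypothetical sketch: weighted sum of individual Rosetta score terms.

    Terms missing from score_weights use score_weights['default']
    (assumed to be 1 if no default is given).
    """
    default = score_weights.get("default", 1)
    return sum(value * score_weights.get(name, default)
               for name, value in model_scores.items())
```

With the documented example weights `{'default': 0, 'atom_pair_constraint': 1}`, every term except `atom_pair_constraint` is multiplied by zero.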

static load_from_cst(constraints)

Load evaluation data from a prediction.

Parameters:	constraints – constraints selection

static load_from_file(filename)

Load evaluation data from a file.

Parameters:	filename – path to a file containing the evaluation data

print_models(model_list, kind='tag')

Print a list of models

Parameters:

model_list – see get_models
kind – see get_models

save_to_file(filename)

Dump evaluation data to a file.

Parameters:	filename – path to a file to store the data

class rna_predict.simulation.RNAPrediction(sysconfig)

Bases: object

Base class used for prediction simulation

CONFIG_FILE = '.config'

__init__(sysconfig): Create a prediction simulation for the current directory and try to load an existing configuration.

assemble(nstruct=50000, cycles=20000, constraints=None, dry_run=False, seed=None, use_native_information=False, threads=1)

Assembly step. Assemble helices and motifs into complete models.

Parameters:

nstruct – number of models to create
cycles – number of monte-carlo cycles per model
constraints – constraints selection
dry_run – don’t actually run any external command
seed – optionally override random seed
use_native_information – use native pdb file to calculate rmsd for each model
threads – number of threads to use

check_config(): Check if the current directory contains a valid configuration.

create_helices(dry_run=False, threads=1)

Helix creation step. Create one ideal helix model for each helix.

Parameters:

dry_run – don’t actually run any external command
threads – number of threads to use

create_motifs(nstruct=50000, cycles=20000, dry_run=False, seed=None, use_native_information=False, threads=1, constraints=None, motif_subset=None)

Motif generation step. Generate models for each motif.

Parameters:

nstruct – number of models to create for each motif
cycles – number of monte-carlo cycles per model
dry_run – don’t actually run any external command
seed – optionally override random seed
use_native_information – use native pdb file to calculate rmsd for each model
threads – number of threads to use
constraints – constraints selection
motif_subset – specify a list of motifs to generate instead of all; counting starts at 1 (e.g. [1, 3, 4])

edit_constraints(constraints, output_filename=None, cst_function='FADE -100 26 20 -2 2')

Edit an existing .cst file, replacing the rosetta function.

Parameters:

constraints – constraints selection
output_filename – fixed output filename or None to automatically create
cst_function – rosetta function and parameters as text string

evaluate(constraints=None, cluster_limit=10, cluster_cutoff=4.1, full_evaluation=False)

Evaluation step. Extract model information, cluster models and calculate rmsd values.

Parameters:

constraints – constraints selection
cluster_limit – maximum number of clusters to produce
cluster_cutoff – rmsd distance in Å at which a new cluster is created
full_evaluation – discard existing evaluation data, re-extract model information and re-calculate rmsd values.
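The documentation does not state which clustering algorithm is used. One common scheme that fits `cluster_limit` and `cluster_cutoff` is greedy clustering over score-ordered models, sketched below with a caller-supplied `rmsd` function. Treat it as an illustration of that scheme, not as the package's method. `greedy_cluster` is a hypothetical name.

```python
def greedy_cluster(models, rmsd, cluster_limit=10, cluster_cutoff=4.1):
    """Hypothetical sketch of greedy, score-ordered clustering.

    models must already be sorted by score (best first). Each model joins
    the first cluster whose representative lies within cluster_cutoff;
    otherwise it opens a new cluster until cluster_limit clusters exist.
    Models that fit no cluster after the limit is reached are dropped.
    """
    clusters = []  # each cluster is a list; its first entry is the representative
    for model in models:
        for cluster in clusters:
            if rmsd(cluster[0], model) < cluster_cutoff:
                cluster.append(model)
                break
        else:
            if len(clusters) < cluster_limit:
                clusters.append([model])
    return clusters
```

For a quick check, numbers can stand in for models and `abs(a - b)` for the RMSD.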

evaluate_custom(constraints=None, dca_prediction_filename='dca/dca.txt', full_evaluation=False, threshold=7.5, radius=2, number_dca_predictions=100, threads=4)

Custom scoring algorithm that inspects neighboring residues of DCA contact pairs.

Parameters:

constraints – constraints selection
dca_prediction_filename – dca filename
full_evaluation – discard existing evaluation data and extracted models, re-extract and re-calculate distance information
threshold – threshold in Å to count a contact
radius – number of adjacent residues to inspect
number_dca_predictions – maximum number of DCA predictions to use
threads – number of threads to use for extraction

execute_command(command, add_suffix=None, dry_run=False, print_commands=True, stdin=None, quiet=False)

Execute an external command.

Parameters:

command – list of external command and parameters
add_suffix – type of suffix to add to the first entry in command; currently only `None` or `rosetta` are supported
dry_run – don’t actually execute anything
print_commands – print command line when running
stdin – optional text to use as standard input
quiet – hide output of external program

execute_command_and_capture(command, add_suffix=None, dry_run=False, print_commands=True, stdin=None, quiet=False)

Execute an external command while capturing output.

Parameters:

command – list of external command and parameters
add_suffix – type of suffix to add to the first entry in command; currently only `None` or `rosetta` are supported
dry_run – don’t actually execute anything
print_commands – print command line when running
stdin – optional text to use as standard input
quiet – hide output of external program
Returns:	generator over lines of standard output

execute_commands(commands, threads=1)

Execute a list of commands in parallel.

Parameters:

commands – list of Commands
threads – number of parallel invocations
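The following Python 3 sketch runs external commands with a bounded number of parallel invocations, using `concurrent.futures.ThreadPoolExecutor` and `subprocess.call`. It takes plain argument lists instead of `Command` objects and ignores `dry_run`, suffixes and output handling. `run_commands` is a hypothetical name.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_commands(commands, threads=1):
    """Hypothetical sketch: run argument lists, at most `threads` at a time.

    Returns the exit codes in the same order as commands.
    """
    # Threads suffice here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(subprocess.call, commands))
```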

extract_models(constraints, model_list, kind='tag')

Extract PDB files of a set of models

Parameters:

constraints – constraints selection
model_list – see EvalData.print_models
kind – see EvalData.print_models

extract_pdb(constraints, model)

Extract PDB file of a model to the tmp directory.

Parameters:

constraints – constraints selection
model – tag of the model
Returns:	path to the extracted PDB file

get_csts()

Returns a list of all constraints.

Returns:	list of all constraints names

static get_models(constraints, model_list, kind='tag')

Returns a selection of models with specific properties.

Parameters:

constraints – constraints selection
model_list – see EvalData.print_models
kind – see EvalData.get_models

get_status(constraints=None, include_evaluation_data=False, include_motif_model_count=False, include_assembly_model_count=False)

Returns a dict containing status information for a set of constraints.

Parameters:

constraints – constraints selection
include_motif_model_count – set to True to include model count (longer processing time)
include_assembly_model_count – set to True to include model count (longer processing time)
include_evaluation_data – set to True to include evaluation model count and native RMSD values (best of first 1,5,10 clusters) if available (longer processing time)

Returns:

status dict

load_config(): Load simulation configuration of current directory

make_constraints(dca_prediction_filename='dca/dca.txt', output_filename=None, number_dca_predictions=100, cst_function='FADE -100 26 20 -2 2', filter_text=None, mapping_mode='minAtom')

Create a set of constraints from a DCA prediction.

Parameters:

dca_prediction_filename – DCA input file
output_filename – fixed output filename or None to automatically create
number_dca_predictions – maximum number of DCA predictions to use
cst_function – rosetta function and parameters as text string
filter_text – optional: List of DCA filters (see parse_dca_filter_string for details)
mapping_mode – atom-to-atom mapping mode to use, supported values: minAtom or pOnly

modify_config(key, value)

Modify configuration entry.

Parameters:

key – setting to modify
value – new value (“-” to store None)

static parse_cst_file(constraints_file)

Parse .cst file as a list of constraints

Parameters:	constraints_file – path to a .cst file
Returns:	list of constraints

static parse_cst_name_and_filename(constraints)

Find and clean up constraints by name or filename

Parameters:	constraints – constraints name or filename
Returns:	tuple of constraints name and filename

parse_dca_filter_string(line)

Parse a text string and turn it into a list of DCA filters.

Multiple filters are separated by commas and have the following format:

<filter>:<arg>:<arg>:...

Threshold filter: Look up DCA contacts in a PDB file, discard or keep contact depending on whether the contact is realized:

Format: threshold:<n>:<cst>:<mode>:<model>

n: threshold (< 0: keep below, > 0: keep above)
cst: constraints selection to look up PDB file
mode, model: model selection mode (see EvalData.get_models for details)

Example: threshold:8.0:100rnaDCA_FADE_-100_26_20_-2_2:cluster:1,threshold:-6.0:100rnaDCA_FADE_-100_26_20_-2_2:cluster:1

None filter: Empty filter

Format: none

Parameters:	line – string to parse
Returns:	list of DcaFilter objects
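The documented format can be parsed as sketched below, including the sign convention of `<n>`. The sketch returns plain dicts rather than `DcaFilter` objects. `parse_filter_string` is a hypothetical name, and the starred assignment requires Python 3.

```python
def parse_filter_string(line):
    """Hypothetical sketch: parse 'threshold:<n>:<cst>:<mode>:<model>,...'."""
    filters = []
    for entry in line.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, *args = entry.split(":")
        if name == "none":
            continue  # empty filter: nothing to apply
        if name != "threshold":
            raise ValueError("unknown filter: %s" % name)
        if len(args) != 4:
            raise ValueError("expected threshold:<n>:<cst>:<mode>:<model>, got %s" % entry)
        n, cst, mode, model = args
        n = float(n)
        filters.append({
            "threshold": abs(n),
            "keep_below": n < 0,  # documented: < 0 keeps below, > 0 keeps above
            "cst": cst,
            "mode": mode,
            "model": model,
        })
    return filters
```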

prepare(fasta_file='sequence.fasta', params_file='secstruct.txt', native_pdb_file=None, data_file=None, torsions_file=None, name=None)

Preparation step. Parse out stems and motifs from sequence and secondary structure information and create necessary base files.

Parameters:

fasta_file – fasta file containing the sequence
params_file – text file containing the secondary structure
native_pdb_file – native pdb file if available
data_file – additional data file if available
torsions_file – additional torsions file if available
name – optional name for this set of predictions

prepare_cst(constraints=None, motifs_override=None)

Constraints preparation step. Prepare constraints files for motif generation and assembly.

Parameters:

constraints – constraints selection
motifs_override – optional name of a different set of constraints to use as motifs.

print_config(): Print simulation configuration.

static print_models(constraints, model_list, kind='tag')

Print a set of models

Parameters:

constraints – constraints selection
model_list – see EvalData.print_models
kind – see EvalData.print_models

print_status(native_compare=False, csts=None)

Print summary of predictions and their current status.

Output format always contains the following columns:

P: preparation step
M: motif generation
A: assembly
E: evaluation

If a step is completed, X is shown, - otherwise.

For motif generation a * may be shown to indicate that models from a different set of constraints are used.

If native_compare is set to True another 4 columns are printed:

1: native rmsd score of the first cluster
5: lowest native rmsd score of the first 5 clusters
10: lowest native rmsd score of the first 10 clusters
n: number of models

Parameters:

native_compare – print rmsd comparison to native structure
csts – list of constraints to include in output (default: all)

save_config(): Save simulation configuration of current directory.

exception rna_predict.simulation.SimulationException

Bases: exceptions.Exception

Custom exception class for foreseeable prediction errors.

rna_predict.simulation.check_dir_existence(path, alternative_error_text=None)

Make sure a directory exists and raise an exception otherwise.

Parameters:

path – directory to check
alternative_error_text – alternative error text passed to exception

rna_predict.simulation.check_file_existence(path, alternative_error_text=None)

Make sure a file exists and raise an exception otherwise.

Parameters:

path – filename to check
alternative_error_text – alternative error text passed to exception
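Both existence checks follow the same pattern, sketched here with `os.path.isfile` and `os.path.isdir`. `SimulationError`, `require_file` and `require_dir` are hypothetical stand-ins, and the default messages are assumptions.

```python
import os


class SimulationError(Exception):
    """Hypothetical stand-in for SimulationException."""


def require_file(path, alternative_error_text=None):
    """Raise SimulationError unless path is an existing regular file."""
    if not os.path.isfile(path):
        raise SimulationError(alternative_error_text or "File not found: %s" % path)


def require_dir(path, alternative_error_text=None):
    """Raise SimulationError unless path is an existing directory."""
    if not os.path.isdir(path):
        raise SimulationError(alternative_error_text or "Directory not found: %s" % path)
```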

rna_predict.simulation.delete_glob(pattern, print_notice=True)

Helper function to delete files while expanding shell globs.

Parameters:

pattern – pattern to delete
print_notice – if True print verbose notice
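Glob-based deletion can be sketched with the standard `glob` and `os` modules. The sketch removes regular files only, because `os.remove` fails on directories. Whether the package also removes directories is not documented. `delete_matching` is a hypothetical name.

```python
import glob
import os


def delete_matching(pattern, print_notice=True):
    """Hypothetical sketch: delete every file matching a shell glob pattern."""
    for path in glob.glob(pattern):
        if print_notice:
            print("deleting %s" % path)
        os.remove(path)  # regular files only
```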

rna_predict.simulation.fix_atom_names_in_cst(cst_info, sequence)

Switches N1 and N3 atom names in a residue.

Parameters:

cst_info – list of constraints
sequence – residue sequence in lower case
Returns:	modified list of constraints

rna_predict.simulation.get_model_count_in_silent_file(silent_file)

Returns the number of models present in a Rosetta silent file.

Parameters:	silent_file – filename
Returns:	model count

rna_predict.simulation.merge_silent_files(target, sources)

Merges rosetta silent files (.out).

All source files are appended to target. Model numbers are incremented uniquely. Source files are deleted after a successful merge.

Parameters:

target – target filename
sources – source filenames
Returns:	model count

rna_predict.simulation.natural_sort_key(s, _nsre=<_sre.SRE_Pattern object>): Helper function to be used as a sort key in sorted() to sort numeric parts naturally.
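The `_nsre` default suggests a precompiled regular expression. A common natural-sort key splits the string on runs of digits and converts those runs to integers. The pattern and the name `natural_key` below are assumptions, not taken from the package.

```python
import re

# Capturing group: re.split keeps the digit runs, so they land at odd indices.
_DIGITS = re.compile(r"([0-9]+)")


def natural_key(s):
    """Hypothetical sketch: "model_10" sorts after "model_2"."""
    return [int(chunk) if i % 2 else chunk
            for i, chunk in enumerate(_DIGITS.split(s))]
```

Because text and digit chunks alternate at fixed positions, the key only ever compares strings with strings and integers with integers.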

rna_predict.sysconfig module

class rna_predict.sysconfig.SysConfig

Bases: object

Stores user configuration

SYSCONFIG_FILE = '/home/sebastian/.rna_predict/sysconfig'

SYSCONFIG_LOCATION = '/home/sebastian/.rna_predict'

__init__(): Load system configuration.

check_sysconfig()

Check if all external programs are accessible.

Returns:	dict containing lists of failed (`fail`) and successful (`success`) program accesses
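The documented `fail`/`success` dict can be produced with `shutil.which` (Python 3.3+), as sketched below. Which programs the package checks, and how it applies the Rosetta suffix, is not shown. `check_programs` is a hypothetical name.

```python
import shutil


def check_programs(programs):
    """Hypothetical sketch: report which executables can be found."""
    result = {"fail": [], "success": []}
    for program in programs:
        # shutil.which searches PATH, or checks the path directly if one is given.
        key = "success" if shutil.which(program) else "fail"
        result[key].append(program)
    return result
```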

load_sysconfig(): Load system configuration.

print_sysconfig(): Pretty-print configuration.

rna_predict.tools module

rna_predict.tools.plot_clusters(cst, max_models=0.99, score_weights=None)

Plot score over native rmsd.

Parameters:

cst – constraints
max_models – limit to this number of models if > 1, or to this fraction of all models if <= 1
score_weights – see EvalData.get_weighted_model_score
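The dual meaning of `max_models` can be expressed as a small helper. `model_limit` is hypothetical, and rounding the fractional case down is an assumption.

```python
def model_limit(total_models, max_models=0.99):
    """Hypothetical sketch: number of models to plot.

    Values above 1 are an absolute count; values of at most 1 are a
    fraction of all available models.
    """
    if max_models > 1:
        return min(int(max_models), total_models)
    # Fraction of all models, rounded down (rounding mode is an assumption).
    return int(total_models * max_models)
```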

rna_predict.tools.plot_constraint_quality(comparison_pdb, sources, dca_mode=False)

Plot constraint quality.

This visualizes the distances of constraints by comparing them to a reference (native) PDB structure.

Parameters:

comparison_pdb – filename of a PDB file to compare to
sources – list of DCA files or constraints (depending on dca_mode); may also use ‘filter:’ to filter on-the-fly
dca_mode – visualize residue-residue DCA instead of atom-atom constraints

rna_predict.tools.plot_contact_atoms(mean_cutoff, std_cutoff)

Plots atoms involved in forming nucleotide contacts that satisfy the cutoff condition in the contact database.

Parameters:

mean_cutoff – limit for average distance
std_cutoff – limit for the standard deviation

rna_predict.tools.plot_contact_distances(): Plots a histogram for each nucleotide pair contact containing the distances of the atoms involved.

rna_predict.tools.plot_contact_map(native_filename='native.pdb', first_filename='dca/dca.txt', second_filename='dca/mi.txt', native_cutoff=8.0)

rna_predict.tools.plot_dca_contacts_in_pdb(dca_prediction_filename, pdb_files)

Visualize how well DCA contacts are fulfilled in PDB files.

This plots the distances of DCA contacts in one or more PDB files.

Parameters:

dca_prediction_filename – input DCA filename
pdb_files – list of PDB filenames

rna_predict.tools.plot_gdt(pdb_ref_filename, pdbs_sample_filenames)

rna_predict.tools.plot_pdb_comparison(pdb_ref_filename, pdbs_sample_filenames)

Compare PDB files by plotting the distance of the residues.

Parameters:

pdb_ref_filename – reference PDB filename
pdbs_sample_filenames – list of sample PDB filenames

rna_predict.tools.plot_tp_rate(pdb_ref_filename, dca_filenames, tp_cutoff=8.0)

rna_predict.utils module

rna_predict.utils.comma_separated_entries_to_dict(s, type_key, type_value)

Parses a string containing comma-separated key-value pairs (colon-separated) into a dict with fixed key and value types.

Example: Turns foo:3,bar:4 into {"foo": 3, "bar": 4}

Parameters:

s – input string
type_key – type of the keys
type_value – type of the values
Returns:	dict
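The sketch below reproduces the documented example. `entries_to_dict` is a hypothetical name. Splitting each entry only at the first colon and skipping empty entries are assumptions. A string in this format could also describe a `score_weights` dict for `EvalData.get_weighted_model_score`.

```python
def entries_to_dict(s, type_key, type_value):
    """Hypothetical sketch: parse "foo:3,bar:4" into {"foo": 3, "bar": 4}."""
    result = {}
    for entry in s.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing or doubled commas
        key, value = entry.split(":", 1)
        result[type_key(key)] = type_value(value)
    return result
```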

rna_predict.utils.comma_separated_ranges_to_list(s)

Parses a string containing comma separated ranges to a list.

Example: Turns 1-3,10,20-22 into [1, 2, 3, 10, 20, 21, 22]

Parameters:	s – comma separated ranges
Returns:	list of ints
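The sketch below reproduces the documented example. `ranges_to_list` is a hypothetical name. Ranges are treated as inclusive. Negative numbers are not supported, because `-` is treated as the range separator.

```python
def ranges_to_list(s):
    """Hypothetical sketch: parse "1-3,10,20-22" into [1, 2, 3, 10, 20, 21, 22]."""
    result = []
    for part in s.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            result.extend(range(int(start), int(end) + 1))  # inclusive range
        else:
            result.append(int(part))
    return result
```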

rna_predict.utils.mkdir_p(path)

Creates directories recursively and does not error when they already exist.

Parameters:	path – directory to create
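In Python 3 the same behavior is available directly through `os.makedirs` with `exist_ok=True`. The following is a Python 3 equivalent, not the package's code. The package appears to target Python 2, where `exist_ok` does not exist.

```python
import os


def mkdir_p(path):
    """Create path and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
```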

rna_predict.utils.read_file_line_by_line(filename, skip_empty=True)

Yields lines in a file while stripping whitespace.

Parameters:

filename – filename to read
skip_empty – True if empty lines should be skipped
Returns:	next line in file
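The documented behavior can be sketched as a generator. `read_stripped_lines` is a hypothetical name.

```python
def read_stripped_lines(filename, skip_empty=True):
    """Hypothetical sketch: yield each line of filename with whitespace stripped."""
    with open(filename) as handle:
        for line in handle:
            line = line.strip()
            if skip_empty and not line:
                continue
            yield line
```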
