evcouplings.fold package
evcouplings.fold.cns module
evcouplings.fold.filter module
Functions for detecting ECs that should not be included in 3D structure prediction
Most functions in this module are rewritten from older pipeline code in choose_CNS_constraint_set.m
- Authors:
- Thomas A. Hopf
evcouplings.fold.filter.detect_secstruct_clash(i, j, secstruct)
Detect whether an EC pair (i, j) is geometrically impossible given a predicted secondary structure.
Direct port of the logic implemented in choose_CNS_constraint_set.m from the original pipeline, lines 351-407.
Use secstruct_clashes() to annotate an entire table of ECs.
Returns: clashes – True if (i, j) clashes with secondary structure
Return type: bool
evcouplings.fold.filter.disulfide_clashes(ec_pairs, output_column='cys_clash')
Add disulfide bridge clashes to EC table (i.e. flag cases where a cysteine residue is coupled to more than one other cysteine). This flag is necessary if disulfide bridges are created during folding, since only one bridge is possible per cysteine.
Parameters: - ec_pairs (pandas.DataFrame) – Table with EC pairs that will be tested for the occurrence of multiple cys-cys pairings (with columns i, j, A_i, A_j)
- output_column (str, optional (default: "cys_clash")) – Target column indicating if pair is in a clash or not
Returns: Annotated EC table with clashes
Return type: pandas.DataFrame
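The flagging rule can be illustrated without the package. The following is a minimal sketch, not the library's implementation: it assumes that pairs are listed from strongest to weakest coupling and that the first cys-cys pairing seen for a residue is the one kept, so only later pairings that reuse a cysteine are flagged. The function name `flag_cys_clashes` and the list-of-dicts input are hypothetical stand-ins for the DataFrame described above.

```python
def flag_cys_clashes(pairs, output_column="cys_clash"):
    """Return copies of `pairs` with a boolean clash flag added.

    Assumption: `pairs` is ordered from strongest to weakest coupling,
    and the first cys-cys pairing of a residue is the one that is kept.
    """
    used = set()  # cysteine positions already part of a cys-cys pair
    annotated = []
    for pair in pairs:
        row = dict(pair)  # do not modify the caller's data
        is_cys_pair = row["A_i"] == "C" and row["A_j"] == "C"
        # clash: a cysteine in this pair is already bridged elsewhere
        row[output_column] = is_cys_pair and (
            row["i"] in used or row["j"] in used
        )
        if is_cys_pair:
            used.update((row["i"], row["j"]))
        annotated.append(row)
    return annotated


ecs = [
    {"i": 5, "j": 40, "A_i": "C", "A_j": "C"},
    {"i": 5, "j": 62, "A_i": "C", "A_j": "C"},   # residue 5 is reused
    {"i": 12, "j": 30, "A_i": "A", "A_j": "C"},  # not a cys-cys pair
]
flags = [row["cys_clash"] for row in flag_cys_clashes(ecs)]
```

Here only the second pair is flagged, because residue 5 already took part in the first cys-cys pairing.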
evcouplings.fold.filter.secstruct_clashes(ec_pairs, residues, output_column='ss_clash', secstruct_column='sec_struct_3state')
Add secondary structure clashes to EC table
Parameters: - ec_pairs (pandas.DataFrame) – Table with EC pairs that will be tested for clashes with secondary structure (with columns i, j)
- residues (pandas.DataFrame) – Table with residues in sequence and their secondary structure (column i, plus the column named by secstruct_column).
- output_column (str, optional (default: "ss_clash")) – Target column indicating if pair is in a clash or not
- secstruct_column (str, optional (default: "sec_struct_3state")) – Source column in residues with secondary structure states (H, E, C)
Returns: Annotated EC table with clashes
Return type: pandas.DataFrame
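The annotation step amounts to building a position-to-state lookup from the residue table and evaluating a clash predicate for every pair. The sketch below shows that pattern only. The predicate `toy_clash` and its threshold are hypothetical and do not reproduce the rules ported from choose_CNS_constraint_set.m, and plain dicts stand in for the DataFrames.

```python
def toy_clash(i, j, secstruct, max_helix_separation=4):
    """Hypothetical rule: flag pairs inside one uninterrupted helix
    whose sequence separation exceeds `max_helix_separation`."""
    lo, hi = sorted((i, j))
    same_helix = all(secstruct.get(pos) == "H" for pos in range(lo, hi + 1))
    return same_helix and (hi - lo) > max_helix_separation


def annotate_clashes(pairs, residues, output_column="ss_clash",
                     secstruct_column="sec_struct_3state"):
    # map each position to its predicted state (H, E or C)
    secstruct = {r["i"]: r[secstruct_column] for r in residues}
    return [
        {**p, output_column: toy_clash(p["i"], p["j"], secstruct)}
        for p in pairs
    ]


# positions 2-11 helix, 13-15 strand, everything else coil
states = "C" + "H" * 10 + "C" + "E" * 3 + "C"
residues = [
    {"i": pos, "sec_struct_3state": s}
    for pos, s in enumerate(states, start=1)
]
pairs = [{"i": 2, "j": 10}, {"i": 3, "j": 6}, {"i": 5, "j": 14}]
annotated = annotate_clashes(pairs, residues)
clash_flags = [p["ss_clash"] for p in annotated]
```

With this toy rule only the pair (2, 10) is flagged: both residues sit in the same helix, eight positions apart.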
evcouplings.fold.protocol module
evcouplings.fold.ranking module
evcouplings.fold.restraints module
Functions for generating distance restraints from evolutionary couplings and secondary structure predictions
- Authors:
- Thomas A. Hopf
- Anna G. Green (docking restraints)
evcouplings.fold.restraints.docking_restraints(ec_pairs, output_file, restraint_formatter, config_file=None)
Create .tbl file with distance restraints for docking
Parameters: - ec_pairs (pandas.DataFrame) – Table with EC pairs that will be turned into distance restraints (with columns i, j, A_i, A_j, segment_i, segment_j)
- output_file (str) – Path to file in which restraints will be saved
- restraint_formatter (function) – Function called to create string representation of restraint
- config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
evcouplings.fold.restraints.ec_dist_restraints(ec_pairs, output_file, restraint_formatter, config_file=None)
Create .tbl file with distance restraints based on evolutionary couplings
Logic based on choose_CNS_constraint_set.m, lines 449-515
Parameters: - ec_pairs (pandas.DataFrame) – Table with EC pairs that will be turned into distance restraints (with columns i, j, A_i, A_j)
- output_file (str) – Path to file in which restraints will be saved
- restraint_formatter (function) – Function called to create string representation of restraint
- config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
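The restraint_formatter argument is a callable that turns one restraint into a line of text. The signature the package expects is not documented on this page, so the sketch below uses a hypothetical formatter, `format_dist_restraint`, that renders a distance restraint in the CNS/X-PLOR `assign` syntax (target distance, lower margin, upper margin). The atom names and distance values are illustrative and are not the defaults from restraints.yml.

```python
import io


def format_dist_restraint(resid_i, atom_i, resid_j, atom_j, dist, lower, upper):
    """Hypothetical formatter: one CNS-style distance restraint line."""
    return (
        "assign (resid {} and name {}) (resid {} and name {}) "
        "{:.1f} {:.1f} {:.1f}"
    ).format(resid_i, atom_i, resid_j, atom_j, dist, lower, upper)


def write_restraints(pairs, handle, restraint_formatter):
    """Write one restraint per EC pair using the supplied formatter."""
    for p in pairs:
        # illustrative values: CA-CA restraint at 5.0 A, margins 5.0 / 2.0
        line = restraint_formatter(p["i"], "CA", p["j"], "CA", 5.0, 5.0, 2.0)
        handle.write(line + "\n")


buffer = io.StringIO()  # stands in for the .tbl output file
write_restraints(
    [{"i": 5, "j": 40}, {"i": 12, "j": 30}], buffer, format_dist_restraint
)
tbl_text = buffer.getvalue()
```

Passing the formatter as an argument keeps the pair selection independent of the output syntax, so the same table could be written for a different folding engine by swapping the callable.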
evcouplings.fold.restraints.secstruct_angle_restraints(residues, output_file, restraint_formatter, config_file=None, secstruct_column='sec_struct_3state')
Create .tbl file with dihedral angle restraints based on secondary structure prediction
Logic based on make_cns_angle_constraints.pl
Parameters: - residues (pandas.DataFrame) – Table containing positions (column i), residue type (column A_i), and secondary structure for each position
- output_file (str) – Path to file in which restraints will be saved
- restraint_formatter (function) – Function called to create string representation of restraint
- config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
- secstruct_column (str, optional (default: "sec_struct_3state")) – Name of the column in the residues DataFrame from which secondary structure will be extracted (values must be H, E, or C).
evcouplings.fold.restraints.secstruct_dist_restraints(residues, output_file, restraint_formatter, config_file=None, secstruct_column='sec_struct_3state')
Create .tbl file with distance restraints based on secondary structure prediction
Logic based on choose_CNS_constraint_set.m, lines 519-1162
Parameters: - residues (pandas.DataFrame) – Table containing positions (column i), residue type (column A_i), and secondary structure for each position
- output_file (str) – Path to file in which restraints will be saved
- restraint_formatter (function) – Function called to create string representation of restraint
- config_file (str, optional (default: None)) – Path to config file with folding settings. If None, will use default settings included in package (restraints.yml).
- secstruct_column (str, optional (default: "sec_struct_3state")) – Name of the column in the residues DataFrame from which secondary structure will be extracted (values must be H, E, or C).
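The restraint rules themselves come from the ported script and the config file and are not reproduced here. A plausible first step for any restraint derived from secondary structure is to group consecutive positions with the same state into segments. The sketch below illustrates that step only, under the assumption that positions are contiguous. `secstruct_segments` is a hypothetical helper, not a function of this module.

```python
from itertools import groupby


def secstruct_segments(residues, secstruct_column="sec_struct_3state"):
    """Return (state, first_position, last_position) for each run of
    identical secondary structure states.

    Assumption: positions in `residues` are contiguous (no gaps).
    """
    ordered = sorted(residues, key=lambda r: r["i"])
    segments = []
    for state, run in groupby(ordered, key=lambda r: r[secstruct_column]):
        run = list(run)
        segments.append((state, run[0]["i"], run[-1]["i"]))
    return segments


residues = [
    {"i": pos, "sec_struct_3state": s}
    for pos, s in enumerate("CCHHHHCEEEC", start=1)
]
segments = secstruct_segments(residues)
helices = [seg for seg in segments if seg[0] == "H"]
```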
evcouplings.fold.tools module
Wrappers for tools for 3D structure prediction from evolutionary couplings
- Authors:
- Thomas A. Hopf
evcouplings.fold.tools.parse_maxcluster_clustering(clustering_output)
Parse maxcluster clustering output into a DataFrame
Parameters: clustering_output (str) – stdout output from maxcluster after clustering
Returns: Parsed result table (columns: filename, cluster, cluster_size)
Return type: pandas.DataFrame
evcouplings.fold.tools.parse_maxcluster_comparison(comparison_output)
Parse maxcluster comparison output into a DataFrame
Parameters: comparison_output (str) – stdout output from maxcluster after comparison
Returns: Parsed result table (columns: filename, num_pairs, rmsd, maxsub, tm, msi). Refer to the maxcluster documentation for an explanation of the score fields.
Return type: pandas.DataFrame
evcouplings.fold.tools.read_psipred_prediction(filename, first_index=1)
Read a psipred secondary structure prediction file in horizontal or vertical format (auto-detected).
Returns: pred – Table containing secondary structure prediction, with the following columns:
- i: position
- A_i: amino acid
- sec_struct_3state: prediction (H, E, C)
If reading vformat, the table also contains columns for the individual state scores (score_coil, score_helix, score_strand).
If reading hformat, the table also contains a confidence score between 1 and 9 (sec_struct_conf).
Return type: pandas.DataFrame
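To make the column layout concrete, here is a minimal sketch of parsing psipred's vertical (.ss2) format into rows with the columns listed above. It is an illustration, not the package's parser: `parse_psipred_vformat` is a hypothetical name, rows are plain dicts rather than a DataFrame, the sample text is made up, and it assumes that first_index is the number assigned to the first residue.

```python
SS2_EXAMPLE = """# PSIPRED VFORMAT (PSIPRED V3.3)

   1 M C   0.999  0.000  0.001
   2 K H   0.100  0.850  0.050
   3 V E   0.200  0.100  0.700
"""


def parse_psipred_vformat(text, first_index=1):
    """Parse .ss2 text; score columns are coil, helix, strand in that order."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # skip the header line and blank lines
        if line.startswith("#") or len(fields) != 6:
            continue
        _, aa, state, coil, helix, strand = fields
        rows.append({
            "i": first_index + len(rows),  # renumber from first_index
            "A_i": aa,
            "sec_struct_3state": state,
            "score_coil": float(coil),
            "score_helix": float(helix),
            "score_strand": float(strand),
        })
    return rows


pred = parse_psipred_vformat(SS2_EXAMPLE, first_index=10)
```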
evcouplings.fold.tools.run_cns(inp_script=None, inp_file=None, log_file=None, binary='cns')
Run CNSsolve 1.21 (without worrying about environment setup)
Note that the user is responsible for verifying the output products of CNS, since their paths are determined by .inp scripts and hard to check automatically and in a general way.
Either inp_script or inp_file has to be specified.
Parameters: - inp_script (str, optional (default: None)) – CNS “.inp” input script (actual commands, not file)
- inp_file (str, optional (default: None)) – Path to .inp input script file. Will override inp_script if also specified.
- log_file (str, optional (default: None)) – Save CNS stdout output to this file
- binary (str, optional (default: "cns")) – Absolute path of CNS binary
Raises: - ExternalToolError – If call to CNS fails
- InvalidParameterError – If no input script (file or string) given
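The argument handling described above (a script file takes precedence over a script string, and at least one must be given) can be sketched as a generic wrapper around an external program. This is not the package's code. It assumes the tool reads its script from standard input, uses `ValueError` and `RuntimeError` as stand-ins for the documented exception types, and runs the Python interpreter in place of the CNS binary so the example works on any machine.

```python
import subprocess
import sys


def run_with_script(cmd, inp_script=None, inp_file=None, log_file=None):
    """Feed a script to `cmd` on stdin and return the tool's stdout."""
    if inp_file is not None:
        # the file overrides the string, mirroring the documented behaviour
        with open(inp_file) as f:
            inp_script = f.read()
    if inp_script is None:
        raise ValueError("no input script (file or string) given")

    result = subprocess.run(
        cmd, input=inp_script, capture_output=True, text=True
    )
    if log_file is not None:
        with open(log_file, "w") as f:
            f.write(result.stdout)
    if result.returncode != 0:
        raise RuntimeError(
            "command failed with exit code {}".format(result.returncode)
        )
    return result.stdout


# stand-in for the CNS binary: echoes its stdin in upper case
fake_binary = [
    sys.executable, "-c",
    "import sys; print(sys.stdin.read().strip().upper())",
]
output = run_with_script(fake_binary, inp_script="stop")
```

Checking the return code covers only the case where the tool itself fails. As noted above, the files a script writes still have to be verified by the caller.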
evcouplings.fold.tools.run_cns_13(inp_script=None, inp_file=None, log_file=None, source_script=None, binary='cns')
Run CNSsolve 1.3
Note that the user is responsible for verifying the output products of CNS, since their paths are determined by .inp scripts and hard to check automatically and in a general way.
Either inp_script or inp_file has to be specified.
Parameters: - inp_script (str, optional (default: None)) – CNS “.inp” input script (actual commands, not file)
- inp_file (str, optional (default: None)) – Path to .inp input script file. Will override inp_script if also specified.
- log_file (str, optional (default: None)) – Save CNS stdout output to this file
- source_script (str, optional (default: None)) – Script to set CNS environment variables. This should typically point to .cns_solve_env_sh in the CNS installation main directory (the shell script itself needs to be edited to contain the path of the installation)
- binary (str, optional (default: "cns")) – Name of CNS binary
Raises: - ExternalToolError – If call to CNS fails
- InvalidParameterError – If no input script (file or string) given
evcouplings.fold.tools.run_maxcluster_cluster(predictions, method='average', rmsd=True, clustering_threshold=None, binary='maxcluster')
Cluster a set of predicted structures using maxcluster.
To compare predictions against an experimental structure, use the run_maxcluster_compare() function.
Parameters: - predictions (list(str)) – List of PDB files that should be clustered
- method ({"single", "average", "maximum", "pairs_min", "pairs_abs"}, optional (default: "average")) – Clustering method (single / average / maximum linkage, or min / absolute size neighbour pairs)
- clustering_threshold (float, optional (default: None)) – Initial clustering threshold (maxcluster -T option)
- rmsd (bool, optional (default: True)) – Use RMSD-based clustering (faster)
- binary (str, optional (default: "maxcluster")) – Path to maxcluster binary
Returns: Clustering result table (see parse_maxcluster_clustering for more detailed explanation)
Return type: pandas.DataFrame
evcouplings.fold.tools.run_maxcluster_compare(predictions, experiment, normalization_length=None, distance_cutoff=None, binary='maxcluster')
Compare a set of predicted structures to an experimental structure using maxcluster.
For clustering functionality, use the run_maxcluster_cluster() function.
For a high-level wrapper around this function that removes problematic atoms and compares multiple models, please look at evcouplings.fold.protocol.compare_models_maxcluster().
Parameters: - predictions (list(str)) – List of PDB files that should be compared against experiment
- experiment (str) – Path of experimental structure PDB file. Note that the numbering and residues in this file must agree with the predicted structure, and that the structure must not contain duplicate atoms (multiple models, or alternative locations for the same atom).
- normalization_length (int, optional (default: None)) – Use this length to normalize the Template Modeling (TM) score (-N option of maxcluster). If None, will normalize by length of experiment.
- distance_cutoff (float, optional (default: None)) – Distance cutoff for MaxSub search (-d option of maxcluster). If None, will use maxcluster auto-calibration.
- binary (str, optional (default: "maxcluster")) – Path to maxcluster binary
Returns: Comparison result table (see parse_maxcluster_comparison for more detailed explanation)
Return type: pandas.DataFrame
evcouplings.fold.tools.run_psipred(fasta_file, output_dir, binary='runpsipred')
Run psipred secondary structure prediction
psipred output file convention: run_psipred creates output files <rootname>.ss2 and <rootname>.horiz in the current working directory, where <rootname> is extracted from the basename of the input file (e.g. /home/test/<rootname>.fa)
Returns: - ss2_file (str) – Absolute path to prediction output in “VFORMAT”
- horiz_file (str) – Absolute path to prediction output in “HFORMAT”
Raises: ExternalToolError – If call to psipred fails
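The output naming convention can be written down in a few lines. The sketch below derives the two expected output paths from the input FASTA path. `psipred_output_paths` is a hypothetical helper, and it assumes the files end up in output_dir (for example because psipred is executed with that directory as its working directory), which this page does not state explicitly.

```python
import os


def psipred_output_paths(fasta_file, output_dir):
    """Return absolute paths of the expected .ss2 and .horiz files."""
    # <rootname> is the input file's basename without its extension
    rootname = os.path.splitext(os.path.basename(fasta_file))[0]
    base = os.path.join(os.path.abspath(output_dir), rootname)
    return base + ".ss2", base + ".horiz"


ss2_file, horiz_file = psipred_output_paths("/home/test/myprotein.fa", "psipred_out")
```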