metacyc
MetacycParser(sbml, reactions=None, atom_mapping=None, pathways=None, compounds=None, publications=None, classes=None)
Bases: SBMLParser
Converting MetaCyc files to a Neo4j database. Documentation on the MetaCyc files and format FAQs can be found at:
- MetaCyc data files download: https://metacyc.org/downloads.shtml
- MetaCyc file formats: https://bioinformatics.ai.sri.com/ptools/flatfile-format.html
- SBML FAQ: https://synonym.caltech.edu/documents/faq
See `sbml.SBMLParser` for more information on SBML parsing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sbml | Union[str, Path] | The path to the SBML file. | required |
| reactions | Optional[Union[str, Path]] | The path to the | None |
| atom_mapping | Optional[Union[str, Path]] | The path to the | None |
| pathways | Optional[Union[str, Path]] | The path to the | None |
| compounds | Optional[Union[str, Path]] | The path to the | None |
| publications | Optional[Union[str, Path]] | The path to the | None |
| classes | Optional[Union[str, Path]] | The path to the | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| sbml_file | | Filepath to the input SBML file. |
| input_files | | A dictionary of the paths to the input |
| missing_ids | dict[str, set[str]] | A dictionary of sets of IDs that were not found in the input files. This is helpful for collecting IDs that appear to be in one class but are actually in another. :meth: |
Source code in parser/metacyc.py
collect_atom_mapping_dat_nodes(rxn_ids, smiles)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rxn_ids | Iterable[str] | The reaction name from the graph database. | required |
| smiles | dict[str, str] | The reaction ID -> SMILES dictionary from the atom mapping file. | required |
collect_citation_dat_nodes(cit_ids, pub_dat)
Annotate a citation node with data from the publication.dat file.
If there are multiple fields in the given cit_id, then the fields are
separated by colons. The first field is the citation ID, the second is the
evidence type (in classes.dat), the third is not documented, and the
fourth is the curator's name.
In most cases the citation ID should match: PUB-[A-Z0-9]+$,
with a few exceptions containing double dashes, e.g. PUB--8,
or some dashes within author names, e.g. PUB-CHIH-CHING95.
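The field layout and ID pattern described above can be sketched as follows. This is only an illustration: `split_citation`, `is_common_form`, the field names, and the sample string are hypothetical and not taken from the parser, and the pattern is assumed to be anchored at the start as well as the end.

```python
import re

# Assumed pattern: the documented common form, anchored at both ends.
COMMON_CITATION_ID = re.compile(r"^PUB-[A-Z0-9]+$")

# Hypothetical field names, following the order given in the description.
FIELD_NAMES = ["citation_id", "evidence_type", "undocumented", "curator"]


def split_citation(cit_id: str) -> dict[str, str]:
    """Split a colon-separated citation string into named fields."""
    fields = cit_id.split(":")
    # zip stops at the shorter sequence, so absent trailing fields are omitted.
    return dict(zip(FIELD_NAMES, fields))


def is_common_form(citation_id: str) -> bool:
    """Check whether a citation ID has the common PUB-<alphanumeric> form."""
    return COMMON_CITATION_ID.match(citation_id) is not None


# Made-up example input, not real MetaCyc data.
parsed = split_citation("PUB-12345:EV-EXP-IDA:3567386784:curator")
```

The documented exceptions (`PUB--8`, `PUB-CHIH-CHING95`) do not match the common pattern, so a caller would need a looser check or a fallback lookup for them.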
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cit_ids | Iterable[str] | The citation | required |
| pub_dat | dict[str, list[list[str]]] | The publication.dat data. | required |
collect_reactions_dat_nodes(rxn_ids, rxn_dat)
Parse entries from the reaction attribute-value file, and prepare nodes to add to the graph database in one transaction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rxn_ids | Iterable[str] | Reaction full | required |
| rxn_dat | dict[str, list[list[str]]] | Output of self.read_dat_file for the | required |
Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | A list of dictionaries, each of which contains the information |
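To make the `dict[str, list[list[str]]]` shape more concrete, here is a rough sketch of reading an attribute-value file and turning selected entries into node dictionaries. It is not the parser's own code: `parse_dat` and `collect_nodes` are hypothetical stand-ins, the inner lists are assumed to be `[attribute, value]` pairs, and the sample text is invented. The format handling (`#` comments, `/` continuation lines, `//` record terminators) follows the Pathway Tools flat-file conventions.

```python
from typing import Any, Iterable


def parse_dat(text: str) -> dict[str, list[list[str]]]:
    """Map each UNIQUE-ID to its [attribute, value] pairs (assumed shape)."""
    records: dict[str, list[list[str]]] = {}
    current: list[list[str]] = []
    unique_id = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.strip() == "//":
            # End of record: store it and reset the accumulators.
            if unique_id is not None:
                records[unique_id] = current
            current, unique_id = [], None
            continue
        if line.startswith("/") and current:
            # Continuation of the previous attribute's value.
            current[-1][1] += "\n" + line[1:]
            continue
        attr, sep, value = line.partition(" - ")
        if not sep:
            continue  # not an attribute-value line
        if attr == "UNIQUE-ID":
            unique_id = value
        else:
            current.append([attr, value])
    if unique_id is not None:  # file did not end with a terminator
        records[unique_id] = current
    return records


def collect_nodes(
    rxn_ids: Iterable[str], rxn_dat: dict[str, list[list[str]]]
) -> list[dict[str, Any]]:
    """Build one dictionary per reaction ID that is present in rxn_dat."""
    nodes = []
    for rxn_id in rxn_ids:
        if rxn_id not in rxn_dat:
            continue  # unknown IDs are skipped in this sketch
        props: dict[str, list[str]] = {}
        for attr, value in rxn_dat[rxn_id]:
            props.setdefault(attr, []).append(value)
        nodes.append({"id": rxn_id, "props": props})
    return nodes


# Invented sample data for illustration only.
SAMPLE = """\
# comment line
UNIQUE-ID - RXN-1
TYPES - Small-Molecule-Reactions
EC-NUMBER - EC-1.1.1.1
COMMENT - first line
/second line
//
UNIQUE-ID - RXN-2
TYPES - Chemical-Reactions
//
"""

dat = parse_dat(SAMPLE)
nodes = collect_nodes(["RXN-2", "RXN-404"], dat)
```

Collecting all node dictionaries into a single list first is what allows them to be written to the database in one transaction, as the description mentions.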
fix_pathway_nodes(pw_nodes, all_rxns)
Some fields in the list of Pathway nodes require preprocessing before being fed into
the database. Specifically:
- `predecessors` contains a list of reaction IDs wrapped in parentheses. We need to extract the first ID as the target reaction, and take all the others as preceding events of the first one.
- `reactionLayout` tells us the primary reactants and products of the reactions in a given pathway.
- `pathwayLinks` links the pathway to other pathways through intermediate Compounds.
While parsing these fields, we don't add any new Reaction nodes.
These are often hypothetical reactions and not part of the SBML file.
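The handling of a `predecessors` entry might look roughly like the sketch below. `split_predecessors` is a hypothetical helper, and the input format (whitespace-separated IDs inside one pair of parentheses, optionally double-quoted) is an assumption. Reaction IDs missing from `all_rxns` are dropped rather than created, in line with the note above.

```python
from typing import Optional


def split_predecessors(
    entry: str, all_rxns: set[str]
) -> Optional[tuple[str, list[str]]]:
    """Return (target reaction, preceding reactions), or None if unusable."""
    # Strip the surrounding parentheses, then any quotes around each ID.
    ids = [tok.strip('"') for tok in entry.strip().strip("()").split()]
    if not ids:
        return None
    target, *preceding = ids
    if target not in all_rxns:
        # The target is not a known reaction, so no new node is created.
        return None
    # Keep only preceding reactions that already exist.
    return target, [rxn for rxn in preceding if rxn in all_rxns]


# Invented reaction names for illustration only.
known = {"RXN-A", "RXN-B"}
result = split_predecessors("(RXN-A RXN-B RXN-X)", known)
```

Here `RXN-X` is discarded because it is absent from the set of valid reaction names.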
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pw_nodes | list[dict[str, Any]] | Output of :meth: | required |
| all_rxns | set[str] | All valid reaction names. | required |