metabolike.db package

Submodules

metabolike.db.metacyc module

class metabolike.db.metacyc.MetacycClient(uri='neo4j://localhost:7687', neo4j_user='neo4j', neo4j_password='neo4j', database='neo4j', create_db=True, drop_if_exists=False, reaction_groups=True)

Bases: SBMLClient

add_props_to_nodes(node_label, node_prop_key, nodes, desc, **kwargs)

Add properties to a list of nodes.

Parameters

node_label (str) – Label of the node.
node_prop_key (str) – Property key used to locate each node.
nodes (List[Dict[str, Any]]) – Properties to add to the nodes.
desc (str) – see create_nodes().

get_all_compounds()

Fetch all Compound nodes. All Compound nodes with RDF nodes have BioCyc IDs.

Return type: List[Tuple[str, str]]

get_all_nodes(label, prop)

Fetch a property of nodes with a certain label.

Return type: List[str]

merge_nodes(n1_label, n2_label, n1_attr, n2_attr, attr_val)

Merge two nodes that share the same attribute value. Note that the properties option is hard-coded to “override”, which means all attributes of n2 override the corresponding attributes of n1.

Parameters

n1_label (str) – The label of the first node.
n2_label (str) – The label of the second node.
n1_attr (str) – The attribute of the first node for filtering.
n2_attr (str) – The attribute of the second node for filtering.
attr_val (str) – The value of the attributes for filtering.
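The “override” policy can be illustrated with plain Python dictionaries standing in for node properties. This is only an analogy for how conflicting properties resolve after the merge, not the code metabolike runs; the property names and values are made up.

```python
from typing import Any, Dict


def merge_override(n1_props: Dict[str, Any], n2_props: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two property maps so that values from n2 win on conflicts.

    Keys present only in n1 are kept; keys present in both take n2's value.
    """
    return {**n1_props, **n2_props}


n1 = {"metaId": "cpd_1", "name": "glucose", "charge": 0}  # hypothetical n1
n2 = {"name": "D-glucose", "formula": "C6H12O6"}           # hypothetical n2
merged = merge_override(n1, n2)
print(merged)
# {'metaId': 'cpd_1', 'name': 'D-glucose', 'charge': 0, 'formula': 'C6H12O6'}
```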

metacyc_default_cyphers = {'compounds': '\nUNWIND $batch_nodes AS n\n MATCH (c:Compound {name: n.name})\n SET c += n.props\n FOREACH (cit IN n.citations |\n MERGE (x:Citation {metaId: cit})\n MERGE (c)-[:hasCitation]->(x)\n );\n', 'pathways': "\nUNWIND $batch_nodes AS n\n MERGE (pw:Pathway {metaId: n.metaId})\n ON CREATE SET pw.name = n.metaId, pw += n.props\n ON MATCH SET pw += n.props\n FOREACH (cit IN n.citations |\n MERGE (c:Citation {metaId: cit})\n MERGE (pw)-[:hasCitation]->(c)\n )\n FOREACH (taxa IN n.species |\n MERGE (t:Taxa {metaId: taxa})\n MERGE (pw)-[:hasRelatedSpecies]->(t)\n )\n FOREACH (taxa IN n.taxonomicRange |\n MERGE (t:Taxa {metaId: taxa})\n MERGE (pw)-[:hasExpectedTaxonRange]->(t)\n )\n FOREACH (rxn IN n.rateLimitingStep |\n MERGE (pw)-[l:hasReaction]->(:Reaction {name: rxn})\n ON MATCH SET l.isRateLimitingStep = true\n )\n FOREACH (cpd IN n.primaryReactants |\n MERGE (c:Compound)-[:hasRDF {bioQualifier: 'is'}]->(:RDF {biocyc: cpd})\n MERGE (pw)-[:hasPrimaryReactant]->(c)\n )\n FOREACH (cpd IN n.primaryProducts |\n MERGE (c:Compound)-[:hasRDF {bioQualifier: 'is'}]->(:RDF {biocyc: cpd})\n MERGE (pw)-[:hasPrimaryProduct]->(c)\n )\n FOREACH (super_pw IN n.superPathways |\n MERGE (spw:Pathway {metaId: super_pw})\n ON CREATE SET spw.name = super_pw\n MERGE (spw)-[:hasSubPathway]->(pw)\n )\n FOREACH (super_pw IN n.inPathway |\n MERGE (spw:Pathway {metaId: super_pw})\n ON CREATE SET spw.name = super_pw\n MERGE (spw)-[:hasSubPathway]->(pw)\n )\n FOREACH (pred IN n.predecessors |\n MERGE (r1: Reaction {canonicalId: pred.r1})\n FOREACH (rxn IN pred.r2 |\n MERGE (r2:Reaction {canonicalId: rxn})\n MERGE (r2)-[l:isPrecedingEvent]->(r1)\n ON CREATE SET l.hasRelatedPathway = n.metaId\n )\n );\n", 'reactions': '\nUNWIND $batch_nodes AS n\n MERGE (r:Reaction {name: n.name})\n ON CREATE SET r += n.props\n ON MATCH SET r += n.props\n FOREACH (pw IN n.inPathway |\n MERGE (p:Pathway {metaId: pw})\n ON CREATE SET p.name = pw\n MERGE (p)-[:hasReaction]->(r)\n )\n 
FOREACH (cpt IN n.rxnLocations |\n MERGE (c:Compartment {metaId: cpt})\n MERGE (r)-[:hasCompartment]->(c)\n )\n FOREACH (cit IN n.citations |\n MERGE (c:Citation {metaId: cit})\n MERGE (r)-[:hasCitation]->(c)\n );\n'}
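Each query reads one entry of $batch_nodes as a map. To make the predecessors clause of the 'pathways' query concrete, the pure-Python sketch below lists the isPrecedingEvent relationships it would produce for a single pathway. The field names r1 and r2 come from the query; the pathway and reaction IDs are hypothetical, and preceding_edges is an illustrative helper, not part of metabolike.

```python
from typing import Any, Dict, List, Set, Tuple


def preceding_edges(pathway_id: str, predecessors: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Mirror the predecessors FOREACH of the 'pathways' query.

    Returns (r2, r1, pathway_id) triples, one per relationship
    (r2)-[:isPrecedingEvent]->(r1), with pathway_id as hasRelatedPathway.
    """
    edges: List[Tuple[str, str, str]] = []
    seen: Set[Tuple[str, str]] = set()
    for pred in predecessors:
        r1 = pred["r1"]
        for r2 in pred["r2"]:
            # MERGE never creates a second copy of an existing relationship.
            if (r2, r1) not in seen:
                seen.add((r2, r1))
                edges.append((r2, r1, pathway_id))
    return edges


batch_node = {
    "metaId": "PWY-X",
    "predecessors": [
        {"r1": "RXN-B", "r2": ["RXN-A"]},
        {"r1": "RXN-C", "r2": ["RXN-B", "RXN-A"]},
    ],
}
print(preceding_edges(batch_node["metaId"], batch_node["predecessors"]))
# [('RXN-A', 'RXN-B', 'PWY-X'), ('RXN-B', 'RXN-C', 'PWY-X'), ('RXN-A', 'RXN-C', 'PWY-X')]
```

Because hasRelatedPathway is only set ON CREATE, a relationship first created while loading another pathway keeps that pathway's ID.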

metacyc_to_graph(parser)

Entry point for setting up the database.

The process is as follows:

1. Parse the SBML file. All parsing errors are logged as warnings.
2. If db_name is not given, use the metaid attribute of the SBML file to name the database.
3. Create the database and constraints.
4. Feed the SBML file into the database. This populates Compartment, Reaction, Compound, GeneProduct, GeneProductSet, GeneProductComplex, and RDF nodes.
5. If reactions.dat is given, parse the file and add the standard Gibbs free energy, standard reduction potential, reaction direction, reaction balance status, systematic name, and comment attributes to Reaction nodes. Also link Reaction nodes to Pathway and Citation nodes.
6. If pathways.dat is given:
   - Add synonyms, types, comments, and common names to Pathway nodes.
   - Link Pathway nodes to super-pathway Pathway nodes and taxonomy Taxa nodes.
   - Link Reaction nodes within the pathway with isPrecedingEvent relationships.
   - Link Pathway nodes with their rate-limiting steps (Reaction nodes).
   - Link Pathway nodes with primary reactant and product Compound nodes.
   - Link Pathway nodes with other Pathway nodes and annotate the shared Compound nodes.
   - For Reaction nodes, mark their links to Compound nodes with isPrimary[Reactant|Product]InPathway.
7. If atom-mappings-smiles.dat is given, parse the file and add SMILES mappings to Reaction nodes.
8. If compounds.dat is given, parse the file and add the standard Gibbs free energy, logP, molecular weight, monoisotopic molecular weight, polar surface area, pKa 1, pKa 2, pKa 3, comment, and synonyms to Compound nodes. Also add SMILES and InChI strings to the related RDF nodes, and link the Compound nodes to Citation nodes.
9. If pubs.dat is given, parse the file and add DOI, PubMed, and MEDLINE IDs, title, source, year, URL, and REFERENT-FRAME to Citation nodes.
10. If classes.dat is given, parse the file and:
    - Add common name and synonyms to Compartment nodes.
    - Add common name, strain name, comment, and synonyms to Taxa nodes.
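The pipeline above has fixed SBML steps followed by optional stages that run in a fixed order, each only when its MetaCyc file is supplied. A simplified sketch of that control flow, assuming the available files are known up front; plan_steps, BASE_STEPS, and OPTIONAL_STEPS are hypothetical names, and the step descriptions are abbreviated from the list.

```python
from typing import Iterable, List, Tuple

# Steps that always run (the database-naming step is folded into these).
BASE_STEPS: List[str] = [
    "parse SBML file",
    "create database and constraints",
    "load SBML nodes",
]

# Optional MetaCyc files, in the order they are processed.
OPTIONAL_STEPS: List[Tuple[str, str]] = [
    ("reactions.dat", "annotate Reaction nodes, link Pathway and Citation nodes"),
    ("pathways.dat", "annotate and link Pathway nodes"),
    ("atom-mappings-smiles.dat", "add SMILES mappings to Reaction nodes"),
    ("compounds.dat", "annotate Compound and RDF nodes, link Citation nodes"),
    ("pubs.dat", "annotate Citation nodes"),
    ("classes.dat", "annotate Compartment and Taxa nodes"),
]


def plan_steps(available_files: Iterable[str]) -> List[str]:
    """Return the ordered steps that would run for the given input files."""
    present = set(available_files)
    steps = list(BASE_STEPS)
    for filename, description in OPTIONAL_STEPS:
        if filename in present:
            steps.append(f"{filename}: {description}")
    return steps


# The order follows the pipeline, not the order the files are passed in.
for step in plan_steps(["pubs.dat", "reactions.dat"]):
    print(step)
```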

metabolike.db.neo4j module

class metabolike.db.neo4j.Neo4jClient(uri='neo4j://localhost:7687', neo4j_user='neo4j', neo4j_password='neo4j', database='neo4j')

Bases: object

Set up the Neo4j driver.

Parameters

uri (str) – URI of the Neo4j server. Defaults to neo4j://localhost:7687. For more details, see neo4j.driver.Driver.
neo4j_user (str) – Neo4j user. Defaults to “neo4j”.
neo4j_password (str) – Neo4j password. Defaults to “neo4j”.
database (str) – Name of the database. Defaults to “neo4j”.

driver: neo4j.Neo4jDriver or neo4j.BoltDriver.

database: str, name of the database to use.

close()

create(force=False)

Helper function to create a database.

Parameters: force (bool) – If True, the database will be dropped if it already exists.
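Neo4j creates databases with administration commands sent to the system database. A minimal sketch of the statement a force flag could map to, assuming a recent Neo4j Enterprise Edition (Community Edition does not support multiple databases); create_database_cypher is a hypothetical helper, and this is not necessarily the exact statement metabolike issues.

```python
def create_database_cypher(database: str, force: bool = False) -> str:
    """Build a Neo4j administration command that creates `database`.

    With force=True an existing database of the same name is dropped and
    recreated; otherwise an existing database is left untouched.
    """
    # Backticks allow names containing characters such as dashes.
    if force:
        return f"CREATE OR REPLACE DATABASE `{database}`"
    return f"CREATE DATABASE `{database}` IF NOT EXISTS"


print(create_database_cypher("metacyc"))
print(create_database_cypher("metacyc", force=True))
```

Either statement would be run in a session opened with database="system".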

read(cypher, **kwargs)

Helper function to read from the database. Collects all records returned by the query into a list of dictionaries.

Parameters

cypher (str) – Query to read from the database.
**kwargs – Parameters to pass to the Cypher query.

Return type

List[Dict[str, Any]]
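A read helper of this kind can be written with the official neo4j Python driver's Session.run() and Record.data(). The sketch below is a standalone function rather than the method itself; it assumes `driver` comes from neo4j.GraphDatabase.driver(), and the example query and the `n` parameter are hypothetical.

```python
from typing import Any, Dict, List


def read(driver: Any, database: str, cypher: str, **kwargs: Any) -> List[Dict[str, Any]]:
    """Run a read query and collect every record as a plain dictionary.

    Keyword arguments are passed through as Cypher query parameters.
    """
    with driver.session(database=database) as session:
        result = session.run(cypher, **kwargs)
        # Record.data() turns one record into a {key: value} dict.
        return [record.data() for record in result]


# Usage against a live server (not executed here):
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "neo4j"))
# rows = read(driver, "neo4j", "MATCH (c:Compound) RETURN c.name AS name LIMIT $n", n=5)
```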

read_tx(tx_func, **kwargs)

Helper function to read from the database by running a transaction function.

Parameters

tx_func (Callable) – A transaction function to run.
**kwargs – Keyword arguments to pass to tx_func or parameters for the Cypher query.

Return type

List[Any]
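A transaction-function reader maps onto the neo4j Python driver's managed read transactions. This sketch assumes driver version 5.x, where the method is Session.execute_read() (4.x drivers call it read_transaction()); read_tx is written as a standalone function, and get_compound_names is a hypothetical transaction function.

```python
from typing import Any, Callable, List


def read_tx(driver: Any, database: str, tx_func: Callable[..., List[Any]], **kwargs: Any) -> List[Any]:
    """Run `tx_func` inside a managed read transaction and return its result."""
    with driver.session(database=database) as session:
        # execute_read calls tx_func(tx, **kwargs) and retries it on
        # transient errors, so tx_func should be free of side effects.
        return session.execute_read(tx_func, **kwargs)


def get_compound_names(tx: Any, limit: int) -> List[str]:
    """Example transaction function; results must be consumed inside it."""
    result = tx.run("MATCH (c:Compound) RETURN c.name AS name LIMIT $limit", limit=limit)
    return [record["name"] for record in result]


# Usage against a live server (not executed here):
# names = read_tx(driver, "neo4j", get_compound_names, limit=10)
```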

write(cypher, **kwargs)

Helper function to write to the database. Ignores returned output.

Parameters

cypher (str) – Query to write to the database.
**kwargs – Keyword arguments to pass to BaseDB.run().
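Ignoring the output of a write query can be expressed with the neo4j Python driver's Result.consume(), which exhausts the result and returns only its summary. A minimal standalone sketch, assuming a driver created with neo4j.GraphDatabase.driver(); the example query is hypothetical.

```python
from typing import Any


def write(driver: Any, database: str, cypher: str, **kwargs: Any) -> None:
    """Run a write query and discard whatever it returns."""
    with driver.session(database=database) as session:
        # consume() makes sure the query has fully executed before the
        # session closes and throws away any returned records.
        session.run(cypher, **kwargs).consume()


# Usage against a live server (not executed here):
# write(driver, "neo4j", "MERGE (:Compartment {metaId: $cpt})", cpt="c")
```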

metabolike.db.sbml module

class metabolike.db.sbml.SBMLClient(uri='neo4j://localhost:7687', neo4j_user='neo4j', neo4j_password='neo4j', database='neo4j', create_db=True, drop_if_exists=False, reaction_groups=True)

Bases: Neo4jClient

In addition to the Neo4j driver, this class also includes a set of helper methods for creating nodes and relationships in the graph with data from the SBML file.

Parameters

uri (str) – URI of the Neo4j server. Defaults to neo4j://localhost:7687. For more details, see neo4j.driver.Driver.
neo4j_user (str) – Neo4j user. Defaults to neo4j.
neo4j_password (str) – Neo4j password. Defaults to neo4j.
database (str) – Name of the database. Defaults to neo4j.
create_db (bool) – Whether to create the database. See setup_graph_db().
drop_if_exists (bool) – Whether to drop the database if it already exists.

driver: neo4j.Neo4jDriver or neo4j.BoltDriver.

database: str, name of the database to use.

available_node_labels: tuple of strings indicating the possible node labels in the graph.

create_nodes(desc, nodes, query, batch_size=1000, progress_bar=False)

Create nodes in batches with the given label and properties.

For Compartment nodes, simply create them with given properties.

Each Compound node is linked to its Compartment node. If it has related RDF nodes, these are also created and linked to the Compound node.

GeneProduct nodes don’t have relationships to Compartment nodes, but they are linked to corresponding RDF nodes.

Parameters

desc (str) – Label of the nodes shown in the log and progress bar.
nodes (List[Dict[str, Any]]) – List of properties of the nodes.
query (str) – Cypher query to create the nodes.
batch_size (int) – Number of nodes to create in each batch.
progress_bar (bool) – Show progress bar for slow queries.
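Every query in default_cyphers starts with UNWIND $batch_nodes, so batching amounts to slicing the node list and binding each slice to the batch_nodes parameter. A sketch under that assumption; batched and create_nodes_sketch are hypothetical helpers, `run` stands for any callable with a (cypher, **params) signature such as a neo4j Session.run, and logging and the progress bar are left out.

```python
from typing import Any, Callable, Dict, Iterator, List


def batched(nodes: List[Dict[str, Any]], batch_size: int = 1000) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most batch_size nodes."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


def create_nodes_sketch(
    run: Callable[..., Any],
    nodes: List[Dict[str, Any]],
    query: str,
    batch_size: int = 1000,
) -> int:
    """Send nodes to `query` in batches and return the number of batches."""
    n_batches = 0
    for batch in batched(nodes, batch_size):
        # The query sees each slice as the $batch_nodes parameter.
        run(query, batch_nodes=batch)
        n_batches += 1
    return n_batches
```

Larger batches mean fewer round trips but bigger transactions; the documented default is 1000 nodes per batch.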

default_cyphers = {'Compartment': '\nUNWIND $batch_nodes AS n\n MERGE (x:Compartment {metaId: n.metaId})\n ON CREATE SET x += n.props;\n', 'Compound': '\nUNWIND $batch_nodes AS n\n MATCH (cpt:Compartment {metaId: n.compartment})\n MERGE (c:Compound {metaId: n.metaId})-[:hasCompartment]->(cpt)\n ON CREATE SET c += n.props\n FOREACH (rdf IN n.rdf |\n CREATE (r:RDF)\n SET r = rdf.rdf\n MERGE (r)<-[rel:hasRDF]-(c)\n ON CREATE SET rel.bioQualifier = rdf.bioQual\n );\n', 'GeneProduct': '\nUNWIND $batch_nodes AS n\n MERGE (gp:GeneProduct {metaId: n.metaId})\n ON CREATE SET gp += n.props\n FOREACH (rdf IN n.rdf |\n CREATE (r:RDF)\n SET r = rdf.rdf\n MERGE (r)<-[rel:hasRDF]-(gp)\n ON CREATE SET rel.bioQualifier = rdf.bioQual\n );\n', 'GeneProductComplex': '\nUNWIND $batch_nodes AS n\n CREATE (gpc:GeneProductComplex:GeneProduct {metaId: n.metaId})\n FOREACH (g IN n.components |\n MERGE (gp:GeneProduct {metaId: g})\n MERGE (gpc)-[:hasComponent]->(gp)\n );\n', 'GeneProductSet': '\nUNWIND $batch_nodes AS n\n MERGE (gps:GeneProduct {metaId: n.metaId})\n SET gps:GeneProductSet:GeneProduct\n FOREACH (g IN n.members |\n MERGE (m:GeneProduct {metaId: g})\n MERGE (gps)-[:hasMember]->(m)\n );\n', 'Group': '\nUNWIND $batch_nodes AS n\n MERGE (g:Group {metaId: n.metaId})\n ON CREATE SET g += n.props\n FOREACH (member IN n.members |\n MERGE (m:Reaction {metaId: member})\n MERGE (g)-[:hasGroupMember]->(m)\n );\n', 'Reaction': "\nUNWIND $batch_nodes AS n\n MERGE (r:Reaction {metaId: n.metaId})\n ON CREATE SET r += n.props\n FOREACH (rdf IN n.rdf |\n CREATE (x:RDF)\n SET x = rdf.rdf\n MERGE (x)<-[rel:hasRDF]-(r)\n ON CREATE SET rel.bioQualifier = rdf.bioQual\n )\n FOREACH (reactant IN n.reactants |\n MERGE (c:Compound {metaId: reactant.cpdId}) // can't MATCH here\n MERGE (r)-[rel:hasLeft]->(c)\n ON CREATE SET rel = reactant.props\n )\n FOREACH (product IN n.products |\n MERGE (c:Compound {metaId: product.cpdId})\n MERGE (r)-[rel:hasRight]->(c)\n ON CREATE SET rel = product.props\n );\n", 
'Reaction-GeneProduct': '\nUNWIND $batch_nodes AS n\n MERGE (r:Reaction {metaId: n.reaction})\n MERGE (gp:GeneProduct {metaId: n.target})\n MERGE (r)-[:hasGeneProduct]->(gp);\n'}
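The 'Compound' query reads n.metaId, n.compartment, n.props, and, for each item of n.rdf, rdf.bioQual and rdf.rdf. Because it uses MATCH on the Compartment node, a row whose compartment does not exist yet is silently dropped, so Compartment batches have to be loaded first. The helper below only builds one entry of that shape; compound_batch_node is a hypothetical name, and the IDs and RDF values are made up.

```python
from typing import Any, Dict, List, Optional


def compound_batch_node(
    meta_id: str,
    compartment: str,
    props: Optional[Dict[str, Any]] = None,
    rdf: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build one $batch_nodes entry in the shape the 'Compound' query reads."""
    return {
        "metaId": meta_id,
        "compartment": compartment,  # must match an existing Compartment metaId
        "props": dict(props or {}),  # set on the Compound node ON CREATE
        "rdf": list(rdf or []),      # each item: {"bioQual": ..., "rdf": {...}}
    }


node = compound_batch_node(
    "CPD_GLC_c",  # hypothetical metaid
    "c",          # hypothetical compartment metaId
    props={"name": "D-glucose"},
    rdf=[{"bioQual": "is", "rdf": {"biocyc": "Glucopyranose"}}],  # hypothetical
)
print(node["rdf"][0]["bioQual"])
```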

sbml_to_graph(parser)

Populate the Neo4j database with SBML data. The process is as follows:

1. Parse the SBML file. All parsing errors are logged as warnings.
2. Create the database and constraints.
3. Feed the SBML file into the database. This populates Compartment, Reaction, Compound, GeneProduct, GeneProductSet, GeneProductComplex, and RDF nodes.

Nodes are created for each SBML element using MERGE statements: https://neo4j.com/docs/cypher-manual/current/clauses/merge/#merge-merge-with-on-create
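The MERGE ... ON CREATE SET pattern (for example, MERGE (x:Compartment {metaId: n.metaId}) ON CREATE SET x += n.props) looks a node up by its key and applies properties only when the node is new. A dictionary analogy of that behavior follows; merge_on_create is a hypothetical helper, and it models a single label keyed by metaId only.

```python
from typing import Any, Dict


def merge_on_create(store: Dict[str, Dict[str, Any]], meta_id: str, props: Dict[str, Any]) -> Dict[str, Any]:
    """Analogy of MERGE (x {metaId: id}) ON CREATE SET x += props.

    The node is looked up by metaId; props are applied only when the node is
    created, so repeating a load leaves existing nodes unchanged.
    """
    if meta_id not in store:
        node = {"metaId": meta_id}
        node.update(props)
        store[meta_id] = node
    return store[meta_id]


store: Dict[str, Dict[str, Any]] = {}
merge_on_create(store, "c", {"name": "cytosol"})
merge_on_create(store, "c", {"name": "something else"})  # no effect: node exists
print(store)
```

This is what makes re-running a load idempotent for these node types; queries that also have ON MATCH SET, such as the MetaCyc 'reactions' query, update existing nodes instead.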
