metabolike.db package
Submodules
metabolike.db.metacyc module
- class metabolike.db.metacyc.MetacycClient(uri='neo4j://localhost:7687', neo4j_user='neo4j', neo4j_password='neo4j', database='neo4j', create_db=True, drop_if_exists=False, reaction_groups=True)
Bases: SBMLClient
- add_props_to_nodes(node_label, node_prop_key, nodes, desc, **kwargs)
Add properties to a list of nodes.
- Parameters
node_label (str) – Label of the node.
node_prop_key (str) – Property key of the node used to locate the node.
nodes (List[Dict[str, Any]]) – Properties to add to the node.
desc (str) – see create_nodes().
- get_all_compounds()
Fetch all Compound nodes. All Compound nodes with RDF have BioCyc IDs.
- Return type
List[Tuple[str, str]]
- get_all_nodes(label, prop)
Fetch a property of nodes with a certain label.
- Return type
List[str]
- merge_nodes(n1_label, n2_label, n1_attr, n2_attr, attr_val)
Merge two nodes with the same attribute value. Note that properties is hard-coded to “override”, which means all node attributes of n2 will override the attributes of n1.
- Parameters
n1_label (str) – The label of the first node.
n2_label (str) – The label of the second node.
n1_attr (str) – The attribute of the first node for filtering.
n2_attr (str) – The attribute of the second node for filtering.
attr_val (str) – The value of the attributes for filtering.
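The “override” rule described above can be illustrated with plain dictionaries. This is only a sketch of the property-precedence behaviour, not of how merge_nodes talks to Neo4j; the helper name merge_props_override and the sample property values are hypothetical.

```python
from typing import Any, Dict


def merge_props_override(n1: Dict[str, Any], n2: Dict[str, Any]) -> Dict[str, Any]:
    """Return the properties of the merged node.

    Keys present in both inputs take their value from n2; keys present in
    only one input are kept unchanged.
    """
    merged = dict(n1)   # start from a copy of n1's properties
    merged.update(n2)   # n2 wins on every shared key
    return merged


# Hypothetical property maps for two nodes sharing the same attribute value.
n1 = {"name": "WATER", "commonName": "water"}
n2 = {"name": "WATER", "commonName": "H2O", "molecularWeight": 18.015}

merged = merge_props_override(n1, n2)
```

Here commonName ends up with the value from n2, while molecularWeight is simply added because n1 did not have it.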
- metacyc_default_cyphers = {'compounds': '\nUNWIND $batch_nodes AS n\n MATCH (c:Compound {name: n.name})\n SET c += n.props\n FOREACH (cit IN n.citations |\n MERGE (x:Citation {metaId: cit})\n MERGE (c)-[:hasCitation]->(x)\n );\n', 'pathways': "\nUNWIND $batch_nodes AS n\n MERGE (pw:Pathway {metaId: n.metaId})\n ON CREATE SET pw.name = n.metaId, pw += n.props\n ON MATCH SET pw += n.props\n FOREACH (cit IN n.citations |\n MERGE (c:Citation {metaId: cit})\n MERGE (pw)-[:hasCitation]->(c)\n )\n FOREACH (taxa IN n.species |\n MERGE (t:Taxa {metaId: taxa})\n MERGE (pw)-[:hasRelatedSpecies]->(t)\n )\n FOREACH (taxa IN n.taxonomicRange |\n MERGE (t:Taxa {metaId: taxa})\n MERGE (pw)-[:hasExpectedTaxonRange]->(t)\n )\n FOREACH (rxn IN n.rateLimitingStep |\n MERGE (pw)-[l:hasReaction]->(:Reaction {name: rxn})\n ON MATCH SET l.isRateLimitingStep = true\n )\n FOREACH (cpd IN n.primaryReactants |\n MERGE (c:Compound)-[:hasRDF {bioQualifier: 'is'}]->(:RDF {biocyc: cpd})\n MERGE (pw)-[:hasPrimaryReactant]->(c)\n )\n FOREACH (cpd IN n.primaryProducts |\n MERGE (c:Compound)-[:hasRDF {bioQualifier: 'is'}]->(:RDF {biocyc: cpd})\n MERGE (pw)-[:hasPrimaryProduct]->(c)\n )\n FOREACH (super_pw IN n.superPathways |\n MERGE (spw:Pathway {metaId: super_pw})\n ON CREATE SET spw.name = super_pw\n MERGE (spw)-[:hasSubPathway]->(pw)\n )\n FOREACH (super_pw IN n.inPathway |\n MERGE (spw:Pathway {metaId: super_pw})\n ON CREATE SET spw.name = super_pw\n MERGE (spw)-[:hasSubPathway]->(pw)\n )\n FOREACH (pred IN n.predecessors |\n MERGE (r1: Reaction {canonicalId: pred.r1})\n FOREACH (rxn IN pred.r2 |\n MERGE (r2:Reaction {canonicalId: rxn})\n MERGE (r2)-[l:isPrecedingEvent]->(r1)\n ON CREATE SET l.hasRelatedPathway = n.metaId\n )\n );\n", 'reactions': '\nUNWIND $batch_nodes AS n\n MERGE (r:Reaction {name: n.name})\n ON CREATE SET r += n.props\n ON MATCH SET r += n.props\n FOREACH (pw IN n.inPathway |\n MERGE (p:Pathway {metaId: pw})\n ON CREATE SET p.name = pw\n MERGE (p)-[:hasReaction]->(r)\n )\n 
FOREACH (cpt IN n.rxnLocations |\n MERGE (c:Compartment {metaId: cpt})\n MERGE (r)-[:hasCompartment]->(c)\n )\n FOREACH (cit IN n.citations |\n MERGE (c:Citation {metaId: cit})\n MERGE (r)-[:hasCitation]->(c)\n );\n'}
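The 'reactions' query above reads five keys from each element of $batch_nodes: name, props, inPathway, rxnLocations, and citations. A minimal sketch of a helper that builds one such element follows; the helper name reaction_batch_entry, the example identifiers, and the gibbs0 property are assumptions made for illustration and are not part of the documented API.

```python
from typing import Any, Dict, List, Optional


def reaction_batch_entry(
    name: str,
    props: Dict[str, Any],
    in_pathway: Optional[List[str]] = None,
    rxn_locations: Optional[List[str]] = None,
    citations: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build one element of $batch_nodes for the 'reactions' query."""
    return {
        "name": name,                          # used in MERGE (r:Reaction {name: n.name})
        "props": props,                        # copied onto the node with r += n.props
        "inPathway": in_pathway or [],         # one Pathway MERGE per entry
        "rxnLocations": rxn_locations or [],   # one Compartment MERGE per entry
        "citations": citations or [],          # one Citation MERGE per entry
    }


entry = reaction_batch_entry(
    "RXN-EXAMPLE",                  # hypothetical reaction name
    {"gibbs0": -1.23},              # hypothetical property
    in_pathway=["PWY-EXAMPLE"],     # hypothetical pathway ID
)
```

Missing lists are filled with empty lists so that the corresponding FOREACH clauses iterate over nothing.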
- metacyc_to_graph(parser)
Entry point for setting up the database.
The process is as follows:
Parse the SBML file. All parsing errors are logged as warnings.
If db_name is not given, use the metaid attribute of the SBML file to name the database.
Create the database and constraints.
Feed the SBML file into the database. This will populate Compartment, Reaction, Compound, GeneProduct, GeneProductSet, GeneProductComplex, and RDF nodes.
If reactions.dat is given, parse the file and add standard Gibbs free energy, standard reduction potential, reaction direction, reaction balance status, systematic name, and comment attributes to Reaction nodes. Also link Reaction nodes to Pathway and Citation nodes.
If pathways.dat is given:
- Add synonyms, types, comments, common names to Pathway nodes.
- Link Pathway nodes to super-pathway Pathway and taxonomy Taxa nodes.
- Link Reaction nodes within the pathway with isPrecedingEvent relationships.
- Link Pathway nodes with their rate limiting steps (Reaction nodes).
- Link Pathway nodes with primary reactant and product Compound nodes.
- Link Pathway nodes with other Pathway nodes and annotate the shared Compound nodes.
- For Reaction nodes, add isPrimary[Reactant|Product]InPathway labels in their links to Compound nodes.
If atom-mappings-smiles.dat is given, parse the file and add SMILES mappings to Reaction nodes.
If compounds.dat is given, parse the file and add standard Gibbs free energy, logP, molecular weight, monoisotopic molecular weight, polar surface area, pKa {1,2,3}, comment, and synonyms to Compound nodes. Also add SMILES and INCHI strings to related RDF nodes, and link the Compound nodes to Citation nodes.
If pubs.dat is given, parse the file and add DOI, PUBMED, MEDLINE IDs, title, source, year, URL, and REFERENT-FRAME to Citation nodes.
If classes.dat is given, parse the file and:
- Add common name and synonyms to Compartment nodes.
- Add common name, strain name, comment, and synonyms to Taxa nodes.
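The conditional structure of this process (the SBML file is always loaded, and each .dat file adds a step only if it is given) can be sketched as follows. The names OPTIONAL_FILES and planned_steps are hypothetical, and the ordering simply follows the list above; the real method may organize its work differently.

```python
from typing import Iterable, List

# Optional MetaCyc files, in the order they appear in the process above.
OPTIONAL_FILES: List[str] = [
    "reactions.dat",
    "pathways.dat",
    "atom-mappings-smiles.dat",
    "compounds.dat",
    "pubs.dat",
    "classes.dat",
]


def planned_steps(available_files: Iterable[str]) -> List[str]:
    """Return the inputs that would be processed, in order."""
    available = set(available_files)
    steps = ["SBML"]  # the SBML file is always parsed and loaded first
    steps += [name for name in OPTIONAL_FILES if name in available]
    return steps


steps = planned_steps({"compounds.dat", "reactions.dat"})
```

With only reactions.dat and compounds.dat supplied, the pathway, atom-mapping, publication, and class steps are skipped.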
metabolike.db.neo4j module
- class metabolike.db.neo4j.Neo4jClient(uri='neo4j://localhost:7687', neo4j_user='neo4j', neo4j_password='neo4j', database='neo4j')
Bases: object
Set up the Neo4j driver.
- Parameters
uri (str) – URI of the Neo4j server. Defaults to neo4j://localhost:7687. For more details, see neo4j.driver.Driver.
neo4j_user (str) – Neo4j user. Defaults to “neo4j”.
neo4j_password (str) – Neo4j password. Defaults to “neo4j”.
database (str) – Name of the database. Defaults to “neo4j”.
- driver
neo4j.Neo4jDriver or neo4j.BoltDriver.
- database
str, name of the database to use.
- close()
- create(force=False)
Helper function to create a database.
- Parameters
force (bool) – If True, the database will be dropped if it already exists.
- read(cypher, **kwargs)
Helper function to read from the database. Streams all records in the query into a list of dictionaries.
- Parameters
cypher (str) – Query to read from the database.
**kwargs – Parameters to pass to the Cypher query.
- Return type
List[Dict[str,Any]]
- read_tx(tx_func, **kwargs)
Helper function to read from the database.
- Parameters
tx_func (Callable) – A transaction function to run.
**kwargs – Keyword arguments to pass to tx_func or parameters for the Cypher query.
- Return type
List[Any]
- write(cypher, **kwargs)
Helper function to write to the database. Ignores returned output.
- Parameters
cypher (str) – Query to write to the database.
**kwargs – Keyword arguments to pass to BaseDB.run().
metabolike.db.sbml module
- class metabolike.db.sbml.SBMLClient(uri='neo4j://localhost:7687', neo4j_user='neo4j', neo4j_password='neo4j', database='neo4j', create_db=True, drop_if_exists=False, reaction_groups=True)
Bases: Neo4jClient
In addition to the Neo4j driver, this class also includes a set of helper methods for creating nodes and relationships in the graph with data from the SBML file.
- Parameters
uri (str) – URI of the Neo4j server. Defaults to neo4j://localhost:7687. For more details, see neo4j.driver.Driver.
neo4j_user (str) – Neo4j user. Defaults to neo4j.
neo4j_password (str) – Neo4j password. Defaults to neo4j.
database (str) – Name of the database. Defaults to neo4j.
create_db (bool) – Whether to create the database. See setup_graph_db().
drop_if_exists (bool) – Whether to drop the database if it already exists.
- driver
neo4j.Neo4jDriver or neo4j.BoltDriver.
- database
str, name of the database to use.
- available_node_labels
tuple of strings indicating the possible node labels in the graph.
- create_nodes(desc, nodes, query, batch_size=1000, progress_bar=False)
Create nodes in batches with the given label and properties.
For Compartment nodes, simply create them with given properties.
Each Compound node is linked to its Compartment node. If it has related RDF nodes, these are also created and linked to the Compound node.
GeneProduct nodes don’t have relationships to Compartment nodes, but they are linked to corresponding RDF nodes.
- Parameters
desc (str) – Label of the node in log and progress bar.
nodes (List[Dict[str, Any]]) – List of properties of the nodes.
query (str) – Cypher query to create the nodes.
batch_size (int) – Number of nodes to create in each batch.
progress_bar (bool) – Show progress bar for slow queries.
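Batching here means splitting nodes into chunks of at most batch_size and sending each chunk as the $batch_nodes parameter of the query. A minimal sketch of the chunking step is shown below; iter_batches is a hypothetical helper name and the sample nodes are invented, so treat it as an illustration rather than the method's actual code.

```python
from typing import Any, Dict, Iterator, List


def iter_batches(
    nodes: List[Dict[str, Any]], batch_size: int = 1000
) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `nodes`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(nodes), batch_size):
        yield nodes[start : start + batch_size]


# Invented nodes shaped like the input of the 'Compartment' query
# (a metaId plus a props map).
nodes = [{"metaId": f"c{i}", "props": {"name": f"compartment {i}"}} for i in range(2500)]

# Each batch would be passed to the query as $batch_nodes.
batch_sizes = [len(batch) for batch in iter_batches(nodes, batch_size=1000)]
```

With 2500 nodes and the default batch size, the query would run three times: twice with 1000 nodes and once with the remaining 500.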
- default_cyphers = {'Compartment': '\nUNWIND $batch_nodes AS n\n MERGE (x:Compartment {metaId: n.metaId})\n ON CREATE SET x += n.props;\n', 'Compound': '\nUNWIND $batch_nodes AS n\n MATCH (cpt:Compartment {metaId: n.compartment})\n MERGE (c:Compound {metaId: n.metaId})-[:hasCompartment]->(cpt)\n ON CREATE SET c += n.props\n FOREACH (rdf IN n.rdf |\n CREATE (r:RDF)\n SET r = rdf.rdf\n MERGE (r)<-[rel:hasRDF]-(c)\n ON CREATE SET rel.bioQualifier = rdf.bioQual\n );\n', 'GeneProduct': '\nUNWIND $batch_nodes AS n\n MERGE (gp:GeneProduct {metaId: n.metaId})\n ON CREATE SET gp += n.props\n FOREACH (rdf IN n.rdf |\n CREATE (r:RDF)\n SET r = rdf.rdf\n MERGE (r)<-[rel:hasRDF]-(gp)\n ON CREATE SET rel.bioQualifier = rdf.bioQual\n );\n', 'GeneProductComplex': '\nUNWIND $batch_nodes AS n\n CREATE (gpc:GeneProductComplex:GeneProduct {metaId: n.metaId})\n FOREACH (g IN n.components |\n MERGE (gp:GeneProduct {metaId: g})\n MERGE (gpc)-[:hasComponent]->(gp)\n );\n', 'GeneProductSet': '\nUNWIND $batch_nodes AS n\n MERGE (gps:GeneProduct {metaId: n.metaId})\n SET gps:GeneProductSet:GeneProduct\n FOREACH (g IN n.members |\n MERGE (m:GeneProduct {metaId: g})\n MERGE (gps)-[:hasMember]->(m)\n );\n', 'Group': '\nUNWIND $batch_nodes AS n\n MERGE (g:Group {metaId: n.metaId})\n ON CREATE SET g += n.props\n FOREACH (member IN n.members |\n MERGE (m:Reaction {metaId: member})\n MERGE (g)-[:hasGroupMember]->(m)\n );\n', 'Reaction': "\nUNWIND $batch_nodes AS n\n MERGE (r:Reaction {metaId: n.metaId})\n ON CREATE SET r += n.props\n FOREACH (rdf IN n.rdf |\n CREATE (x:RDF)\n SET x = rdf.rdf\n MERGE (x)<-[rel:hasRDF]-(r)\n ON CREATE SET rel.bioQualifier = rdf.bioQual\n )\n FOREACH (reactant IN n.reactants |\n MERGE (c:Compound {metaId: reactant.cpdId}) // can't MATCH here\n MERGE (r)-[rel:hasLeft]->(c)\n ON CREATE SET rel = reactant.props\n )\n FOREACH (product IN n.products |\n MERGE (c:Compound {metaId: product.cpdId})\n MERGE (r)-[rel:hasRight]->(c)\n ON CREATE SET rel = product.props\n );\n", 
'Reaction-GeneProduct': '\nUNWIND $batch_nodes AS n\n MERGE (r:Reaction {metaId: n.reaction})\n MERGE (gp:GeneProduct {metaId: n.target})\n MERGE (r)-[:hasGeneProduct]->(gp);\n'}
- sbml_to_graph(parser)
Populate Neo4j database with SBML data. The process is as follows:
Parse the SBML file. All parsing errors are logged as warnings.
Create the database and constraints.
Feed the SBML file into the database. This will populate Compartment, Reaction, Compound, GeneProduct, GeneProductSet, GeneProductComplex, and RDF nodes.
Nodes are created for each SBML element using MERGE statements: https://neo4j.com/docs/cypher-manual/current/clauses/merge/#merge-merge-with-on-create
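The effect of MERGE combined with ON CREATE SET, as used in the 'Compartment' query in default_cyphers, can be emulated with a dictionary keyed by metaId. This is a conceptual sketch only: merge_on_create is a hypothetical name, and a Python dict stands in for the graph.

```python
from typing import Any, Dict


def merge_on_create(
    store: Dict[str, Dict[str, Any]], meta_id: str, props: Dict[str, Any]
) -> Dict[str, Any]:
    """Emulate `MERGE (x {metaId: ...}) ON CREATE SET x += props`."""
    if meta_id not in store:
        # Node did not exist: create it and apply the properties once.
        store[meta_id] = {"metaId": meta_id, **props}
    # Node already existed: it is matched and left unchanged.
    return store[meta_id]


store: Dict[str, Dict[str, Any]] = {}
merge_on_create(store, "c", {"name": "cytosol"})
merge_on_create(store, "c", {"name": "renamed"})  # matched, so props are not applied
```

Running the same load twice therefore does not duplicate nodes, and with ON CREATE SET alone it also does not update properties of nodes that already exist.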