metacyc

MetacycClient

Bases: SBMLClient

add_props_to_nodes(node_label, node_prop_key, nodes, desc, **kwargs)

Add properties to a list of nodes.

Parameters:

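| Name | Type | Description | Default |
| --- | --- | --- | --- |
| node_label | str | Label of the node. | required |
| node_prop_key | str | Property key of the node used to locate the node. | required |
| nodes | list[dict[str, Any]] | Nodes to update; each dict holds the value of node_prop_key and a props map of properties to add. | required |
| desc | str | See create_nodes. | required |
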
Source code in db/metacyc.py
def add_props_to_nodes(
    self,
    node_label: str,
    node_prop_key: str,
    nodes: list[dict[str, Any]],
    desc: str,
    **kwargs,
):
    """Add properties to a list of nodes.

    Args:
        node_label: Label of the node.
        node_prop_key: Property key of the node used to locate the node.
        nodes: Nodes to update; each dict holds the value of ``node_prop_key``
            and a ``props`` map of properties to add.
        desc: See :meth:`create_nodes`.
    """
    if node_label not in self.available_node_labels:
        logger.warning(f"Invalid label: {node_label}")
        return

    query = f"""
    UNWIND $batch_nodes AS n
      MATCH (x:{node_label} {{{node_prop_key}: n.{node_prop_key}}})
      SET x += n.props;
    """
    self.create_nodes(desc, nodes, query, **kwargs)
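
To make the expected shape of `nodes` concrete, here is a minimal sketch that only rebuilds the Cypher string and a sample batch; it does not connect to Neo4j. The helper name `build_add_props_query` and the sample `Compound` values are illustrative assumptions and are not part of `MetacycClient`.

```python
def build_add_props_query(node_label: str, node_prop_key: str) -> str:
    """Rebuild the query string used by add_props_to_nodes (hypothetical helper)."""
    return f"""
    UNWIND $batch_nodes AS n
      MATCH (x:{node_label} {{{node_prop_key}: n.{node_prop_key}}})
      SET x += n.props;
    """


# Each entry carries the lookup key plus a "props" map, because the query
# reads n.<node_prop_key> and n.props. The values below are made-up samples.
batch_nodes = [
    {"name": "WATER", "props": {"commonName": "H2O"}},
    {"name": "PROTON", "props": {"commonName": "H+"}},
]

query = build_add_props_query("Compound", "name")
print(query)
```

Because the label and property key are interpolated into the query text while the batch is passed as a parameter, only the batch values are protected by parameter binding; the label is checked against `available_node_labels`, but the property key is not checked in the listing above.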

get_all_compounds()

Fetch all Compound nodes.

All Compound nodes linked to an RDF node have BioCyc IDs.

Source code in db/metacyc.py
def get_all_compounds(self) -> list[tuple[str, str]]:
    """Fetch all ``Compound`` nodes.

    All ``Compound`` nodes linked to an ``RDF`` node have BioCyc IDs.
    """
    res = self.read(
        """
        MATCH (c:Compound)-[:hasRDF {bioQualifier: 'is'}]->(r:RDF)
        RETURN DISTINCT c.name, r.biocyc;
        """
    )  # TODO: 38 POLYMER nodes don't have BioCyc IDs
    return [(cpd["c.name"], cpd["r.biocyc"]) for cpd in res]

get_all_nodes(label, prop)

Fetch a property of nodes with a certain label.

Source code in db/metacyc.py
def get_all_nodes(self, label: str, prop: str) -> list[str]:
    """Fetch a property of nodes with a certain label."""
    if label not in self.available_node_labels:
        raise ValueError(f"Invalid label: {label}")
    # TODO: check for valid properties
    res = self.read(
        f"""
        MATCH (n:{label})
        RETURN DISTINCT n.{prop} AS prop;
        """
    )
    return [n["prop"] for n in res]
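
The `TODO` about checking for valid properties could be addressed in several ways. The sketch below shows one possibility under stated assumptions: it rejects any property name that is not a plain identifier before the name is interpolated into the query. The function `build_get_all_nodes_query`, the `available_labels` argument, and the regular expression are hypothetical and are not taken from the module.

```python
import re

# Conservative pattern for an unquoted identifier: a letter or underscore
# followed by letters, digits, or underscores. This is an assumption chosen
# for safety, not the full Cypher grammar.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_get_all_nodes_query(label: str, prop: str, available_labels: set[str]) -> str:
    """Validate the label and property name, then build the query text."""
    if label not in available_labels:
        raise ValueError(f"Invalid label: {label}")
    if not _IDENTIFIER.fullmatch(prop):
        raise ValueError(f"Invalid property name: {prop}")
    return f"MATCH (n:{label}) RETURN DISTINCT n.{prop} AS prop;"


labels = {"Compound", "Reaction"}
print(build_get_all_nodes_query("Compound", "name", labels))
```

A stricter variant would compare `prop` against the property keys that actually exist for the label, at the cost of an extra database round trip.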

merge_nodes(n1_label, n2_label, n1_attr, n2_attr, attr_val)

Merge two nodes that share the same attribute value. The properties option of apoc.refactor.mergeNodes is hard-coded to "override", so the attributes of n2 override those of n1.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| n1_label | str | The label of the first node. | required |
| n2_label | str | The label of the second node. | required |
| n1_attr | str | The attribute of the first node for filtering. | required |
| n2_attr | str | The attribute of the second node for filtering. | required |
| attr_val | str | The value of the attributes for filtering. | required |

Source code in db/metacyc.py
def merge_nodes(self, n1_label: str, n2_label: str, n1_attr: str, n2_attr: str, attr_val: str):
    """Merge two nodes that share the same attribute value.

    The ``properties`` option of ``apoc.refactor.mergeNodes`` is hard-coded to
    ``"override"``, so the attributes of ``n2`` override those of ``n1``.

    Args:
        n1_label: The label of the first node.
        n2_label: The label of the second node.
        n1_attr: The attribute of the first node for filtering.
        n2_attr: The attribute of the second node for filtering.
        attr_val: The value of the attributes for filtering.
    """
    self.write(
        f"""
        MATCH (n1:{n1_label} {{{n1_attr}: $val}}),
              (n2:{n2_label} {{{n2_attr}: $val}})
        CALL apoc.refactor.mergeNodes([n1, n2], {{properties: 'override'}})
        YIELD node
        RETURN node;
        """,
        val=attr_val,
    )
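
The "override" behaviour described in the docstring can be pictured with plain dictionaries. This is only a rough emulation of the property handling, with made-up sample values; the real APOC procedure also deals with labels and relationships, which this sketch does not model. The function `merge_override` is hypothetical.

```python
def merge_override(n1_props: dict, n2_props: dict) -> dict:
    """Emulate 'override': keys present on n2 replace those on n1."""
    return {**n1_props, **n2_props}


n1 = {"name": "WATER", "formula": "H2O"}
n2 = {"name": "WATER", "formula": "HOH", "charge": 0}

merged = merge_override(n1, n2)
print(merged)  # {'name': 'WATER', 'formula': 'HOH', 'charge': 0}
```

Properties that exist only on `n1` survive the merge, while any key shared by both nodes takes the value from `n2`.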

metacyc_to_graph(parser)

Entry point for setting up the database.

The process is as follows:

1. Parse the SBML file. All parsing errors are logged as warnings.
2. If db_name is not given, use the metaid attribute of the SBML file to name the database.
3. Create the database and constraints.
4. Feed the SBML file into the database. This populates Compartment, Reaction, Compound, GeneProduct, GeneProductSet, GeneProductComplex, and RDF nodes.
5. If reactions.dat is given, parse the file and add standard Gibbs free energy, standard reduction potential, reaction direction, reaction balance status, systematic name, and comment attributes to Reaction nodes. Also link Reaction nodes to Pathway and Citation nodes.
6. If pathways.dat is given:
   • Add synonyms, types, comments, and common names to Pathway nodes.
   • Link Pathway nodes to super-pathway Pathway and taxonomy Taxa nodes.
   • Link Reaction nodes within the pathway with isPrecedingEvent relationships.
   • Link Pathway nodes with their rate-limiting steps (Reaction nodes).
   • Link Pathway nodes with primary reactant and product Compound nodes.
   • Link Pathway nodes with other Pathway nodes and annotate the shared Compound nodes.
   • For Reaction nodes, add isPrimary[Reactant|Product]InPathway labels in their links to Compound nodes.
7. If atom-mappings-smiles.dat is given, parse the file and add SMILES (https://en.wikipedia.org/wiki/SMILES) mappings to Reaction nodes.
8. If compounds.dat is given, parse the file and add standard Gibbs free energy, logP, molecular weight, monoisotopic molecular weight, polar surface area, pKa {1,2,3}, comment, and synonyms to Compound nodes. Also add SMILES and INCHI strings to related RDF nodes, and link the Compound nodes to Citation nodes.
9. If pubs.dat is given, parse the file and add DOI, PUBMED, MEDLINE IDs, title, source, year, URL, and REFERENT-FRAME to Citation nodes.
10. If classes.dat is given, parse the file and:
   • Add common name and synonyms to Compartment nodes.
   • Add common name, strain name, comment, and synonyms to Taxa nodes.

Source code in db/metacyc.py
def metacyc_to_graph(self, parser: MetacycParser):
    """Entry point for setting up the database.

    The process is as follows:

    #. Parse the SBML file. All parsing errors are logged as warnings.
    #. If ``db_name`` is not given, use the ``metaid`` attribute of the SBML
       file to name the database.
    #. Create the database and constraints.
    #. Feed the SBML file into the database. This will populate
       ``Compartment``, ``Reaction``, ``Compound``, ``GeneProduct``,
       ``GeneProductSet``, ``GeneProductComplex``, and ``RDF`` nodes.
    #. If ``reactions.dat`` is given, parse the file and add standard Gibbs
       free energy, standard reduction potential, reaction direction,
       reaction balance status, systematic name, and comment attributes to
       ``Reaction`` nodes. Also link ``Reaction`` nodes to ``Pathway`` and
       ``Citation`` nodes.
    #. If ``pathways.dat`` is given:

       * Add synonyms, types, comments, and common names to ``Pathway`` nodes.
       * Link ``Pathway`` nodes to super-pathway ``Pathway`` and taxonomy
         ``Taxa`` nodes.
       * Link ``Reaction`` nodes within the pathway with
         ``isPrecedingEvent`` relationships.
       * Link ``Pathway`` nodes with their rate-limiting steps
         (``Reaction`` nodes).
       * Link ``Pathway`` nodes with primary reactant and product
         ``Compound`` nodes.
       * Link ``Pathway`` nodes with other ``Pathway`` nodes and annotate
         the shared ``Compound`` nodes.
       * For ``Reaction`` nodes, add ``isPrimary[Reactant|Product]InPathway``
         labels in their links to ``Compound`` nodes.

    #. If ``atom-mappings-smiles.dat`` is given, parse the file and add
       SMILES_ mappings to ``Reaction`` nodes.

    #. If ``compounds.dat`` is given, parse the file and add standard Gibbs
       free energy, logP, molecular weight, monoisotopic molecular weight,
       polar surface area, pKa {1,2,3}, comment, and synonyms to ``Compound``
       nodes. Also add SMILES and INCHI strings to related ``RDF`` nodes, and
       link the ``Compound`` nodes to ``Citation`` nodes.
    #. If ``pubs.dat`` is given, parse the file and add DOI, PUBMED, MEDLINE IDs,
       title, source, year, URL, and ``REFERENT-FRAME`` to ``Citation`` nodes.
    #. If ``classes.dat`` is given, parse the file and:

       * Add common name and synonyms to ``Compartment`` nodes.
       * Add common name, strain name, comment, and synonyms to ``Taxa`` nodes.

    .. _SMILES: https://en.wikipedia.org/wiki/SMILES
    """
    # Populate Neo4j database with data from SBML file
    self.sbml_to_graph(parser)

    rxn_dat = self._reactions_dat_to_graph(parser)
    if rxn_dat is not None:
        self._pathways_to_graph(rxn_dat, parser)

    self._smiles_dat_to_graph(parser)
    self._compounds_dat_to_graph(parser)
    self._citations_dat_to_graph(parser)
    self._classes_dat_to_graph(parser)

    parser.report_missing_ids()
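
One detail of the control flow is easy to miss: `_pathways_to_graph` is called only when `_reactions_dat_to_graph` returns something other than `None`. The sketch below mirrors that ordering with plain strings instead of database calls. It assumes that the reactions step returns `None` when `reactions.dat` is not supplied, which the listing does not show; the function `planned_steps` is hypothetical.

```python
def planned_steps(available_files: set[str]) -> list[str]:
    """Return the load steps in the order metacyc_to_graph would attempt them."""
    steps = ["sbml"]  # the SBML file is always loaded first

    # Assumption: the reactions step yields None when reactions.dat is absent,
    # in which case the pathways step is skipped entirely.
    if "reactions.dat" in available_files:
        steps.append("reactions.dat")
        if "pathways.dat" in available_files:
            steps.append("pathways.dat")

    # The remaining files are independent of one another.
    for name in ("atom-mappings-smiles.dat", "compounds.dat", "pubs.dat", "classes.dat"):
        if name in available_files:
            steps.append(name)
    return steps


print(planned_steps({"reactions.dat", "pathways.dat", "pubs.dat"}))
print(planned_steps({"pathways.dat"}))
```

Under that assumption, supplying `pathways.dat` without `reactions.dat` leaves the pathway data unloaded.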