brenda
Reading and parsing the BRENDA text file.
A lot of the code is translated from the brendaDb R package.
The BRENDA text file is organised in an EC-number specific format. The
information on each EC number is given in a short, compact form in its
own part of the file. Each line begins with a two- or three-letter
acronym that describes its contents, and the contents themselves
start after a TAB. Whitespace at the beginning of a line
indicates a continuation line.
.. brendaDb R package: https://bioconductor.org/packages/release/bioc/html/brendaDb.html
.. BRENDA text file: https://www.brenda-enzymes.org/download_brenda_without_registration.php
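The line layout described above (acronym, TAB, content, with leading whitespace marking a continuation) can be sketched in plain Python. This is a minimal illustration only, not the Lark-based parser documented on this page; the function name `iter_fields` and the sample text are invented for the example.

```python
from typing import Iterator


def iter_fields(text: str) -> Iterator[tuple[str, str]]:
    """Yield (acronym, content) pairs from one EC-number specific part.

    Hypothetical helper for illustration; not part of the documented API.
    """
    acronym, parts = None, []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("///"):
            break  # end of the EC-number specific part
        if line[0] in " \t" and acronym is not None:
            # Leading whitespace marks a continuation of the previous entry.
            parts.append(line.strip())
        elif "\t" in line:
            # A new entry starts: flush the previous one first.
            if acronym is not None:
                yield acronym, " ".join(parts)
            acronym, _, content = line.partition("\t")
            parts = [content.strip()]
    if acronym is not None:
        yield acronym, " ".join(parts)


# Invented sample text that follows the layout described above.
sample = (
    "ID\t1.1.1.1\n"
    "PR\t#1# Homo sapiens <1,2>\n"
    "SN\talcohol:NAD+\n"
    "    oxidoreductase\n"
    "///\n"
)
fields = list(iter_fields(sample))
```

Lines without a TAB and without leading whitespace (section headings, for example) are simply skipped in this sketch; a real parser would need to decide how to treat them.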
The contents are organised in ~40 information fields as given
below. Protein information is enclosed in #...#, literature
citations in <...>, commentaries in (...) and field-special
information in {...}.
It's not officially documented, but some fields also have commentaries
wrapped in |...|. These are usually for reaction-related fields, so
(...) would be for substrates and |...| for products.
Protein information is given as the combination organism/UniProt accession number where available. When this information is not given in the original paper, only the organism is given.
/// indicates the end of an EC-number specific part.
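To make the delimiter conventions concrete, the following sketch pulls the marked-up parts out of a single entry with regular expressions. It is a deliberately naive approximation, assuming delimiters are never nested within their own kind; real commentaries can contain parentheses (in chemical names, for instance), which is one reason a grammar-based parser is a better fit. The names `PATTERNS` and `extract_annotations`, and the sample entry, are invented for the example.

```python
import re

# One pattern per delimiter pair described above (hypothetical names).
PATTERNS = {
    "proteins": re.compile(r"#([^#]*)#"),
    "references": re.compile(r"<([^<>]*)>"),
    "commentary": re.compile(r"\(([^()]*)\)"),
    "field_info": re.compile(r"\{([^{}]*)\}"),
    "product_commentary": re.compile(r"\|([^|]*)\|"),
}


def extract_annotations(entry: str) -> dict[str, list[str]]:
    """Return the text found inside each kind of delimiter.

    Markers that occur inside a commentary are matched as well,
    because each pattern scans the whole entry independently.
    """
    return {name: pattern.findall(entry) for name, pattern in PATTERNS.items()}


# Invented entry content in the style described above.
entry = "#1,2# 0.5 {ethanol} (#1# pH 7.5 <3>) <3,4>"
annotations = extract_annotations(entry)
```

Here the protein and reference markers inside the commentary are reported alongside the top-level ones; separating the two would require removing the commentary before extracting the rest.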
parse_brenda(filepath, cache=False, ec_nums=None)
Parse the BRENDA text file into a dict.
This implementation focuses on extracting information from the text file
and feeding the data into a Neo4j database. The parser is implemented
using Lark. A series of :class:`lark.visitors.Transformer` classes is used
to clean the data and convert it into the format required by Neo4j.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | Union[str, Path] | The path to the BRENDA text file. | required |
| cache | bool | Whether to cache the parsed data to a parquet file. | False |
| ec_nums | Optional[Iterable[str]] | A list of EC numbers to extract. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | A dict with the |