LipiNet Build Walkthrough (SwissLipids + Rhea)
This notebook demonstrates, step by step, how lipinet.build_lipinet constructs the combined LipiNet nodes and edges tables, and shows the behind-the-scenes pieces: loading/parsing sources, linking ChEBI IDs across sources, and merging nodes/edges.
Prereqs: You should have the lipinet package (your repo) importable in this environment.
Python: 3.12.4
Imported lipinet OK.
1) Cache status
swisslipids cache: present
rhea cache: present
lipinet cache: present
2) Build the combined LipiNet (public API)
↪ Loading LipiNet (combined) from cache
Nodes: (2817072, 41) Edges: (7002161, 8)
|   | node_id | layer | origin_vertex | rhea_Equation | rhea_ChEBI identifier | rhea_chebi_name | rhea_EC number | rhea_Enzymes | rhea_Gene Ontology | rhea_Cross-reference (Reactome) | ... | sl_Exact m/z of [M+NH4]+ | sl_Exact m/z of [M-H]- | sl_Exact m/z of [M+Cl]- | sl_Exact m/z of [M+OAc]- | sl_CHEBI | sl_LIPID MAPS | sl_HMDB | sl_MetaNetX | sl_PMID | sl_Components_parsed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | RHEA:21252 | rhea_reactionid | rhea | (S)-2-hydroxyglutarate + A = 2-oxoglutarate + AH2 | CHEBI:16782;CHEBI:13193;CHEBI:16810;CHEBI:17499 | (S)-2-hydroxyglutarate;A;2-oxoglutarate;AH2 | EC:1.1.99.2 | 4258.0 | GO:0047545 2-hydroxyglutarate dehydrogenase ac... | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | RHEA:21256 | rhea_reactionid | rhea | 3-phosphoshikimate + phosphoenolpyruvate = 5-O... | CHEBI:145989;CHEBI:58702;CHEBI:57701;CHEBI:43474 | 3-phosphoshikimate;phosphoenolpyruvate;5-O-(1-... | EC:2.5.1.19 | 44340.0 | GO:0003866 3-phosphoshikimate 1-carboxyvinyltr... | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | RHEA:21260 | rhea_reactionid | rhea | [thioredoxin]-disulfide + L-methionine + H2O =... | CHEBI:50058;CHEBI:57844;CHEBI:15377;CHEBI:5877... | L-cystine residue;L-methionine;H2O;L-methionin... | EC:1.8.4.14 | 3112.0 | GO:0033745 L-methionine-(R)-S-oxide reductase ... | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 rows × 41 columns
|   | source_layer | source_id | target_layer | target_id | interlayer | edge_type | origin_edge | ec_level |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | rhea_ec | EC:1 | rhea_ec | EC:1.1 | False | ec_hierarchy | rhea | main_class->subclass |
| 1 | rhea_ec | EC:2 | rhea_ec | EC:2.5 | False | ec_hierarchy | rhea | main_class->subclass |
| 2 | rhea_ec | EC:1 | rhea_ec | EC:1.8 | False | ec_hierarchy | rhea | main_class->subclass |
| layer | count |
| --- | --- |
| swisslipids | 779312 |
| sl_abbreviation | 736949 |
| sl_synonyms | 534781 |
| sl_metanetx | 504880 |
| sl_parent | 184620 |
| rhea_reactionid | 17783 |
| sl_hmdb | 17232 |
| rhea_chebiid | 13723 |
| sl_lipidmaps | 12112 |
| rhea_ec | 6489 |
| sl_chebi | 4277 |
| sl_components | 1708 |
| sl_components_parsed | 1677 |
| sl_pmid | 1529 |
Interlayer edge count: 6114475
Edge counts by (source_layer, target_layer):
| source_layer | target_layer | edges |
| --- | --- | --- |
| swisslipids | sl_components | 1852844 |
| swisslipids | sl_components_parsed | 1852844 |
| swisslipids | sl_abbreviation | 786750 |
| swisslipids | swisslipids | 779247 |
| swisslipids | sl_synonyms | 568257 |
| swisslipids | sl_metanetx | 505003 |
| swisslipids | sl_parent | 493491 |
| rhea_reactionid | rhea_chebiid | 83885 |
| swisslipids | sl_hmdb | 26026 |
| rhea_reactionid | rhea_ec | 18072 |
| swisslipids | sl_lipidmaps | 12117 |
| swisslipids | sl_pmid | 10109 |
| rhea_ec | rhea_ec | 6482 |
| swisslipids | sl_chebi | 4278 |
| sl_chebi | rhea_chebiid | 2756 |
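As a rough illustration of how a per-pair edge count like the one above can be produced, here is a minimal standard-library sketch. It assumes each edge is a plain dict with `source_layer` and `target_layer` keys; the helper name `count_layer_pairs` and the placeholder IDs are hypothetical and not part of lipinet, which presumably works on DataFrames instead.

```python
from collections import Counter


def count_layer_pairs(edges):
    """Count edges per (source_layer, target_layer), largest group first."""
    counts = Counter((e["source_layer"], e["target_layer"]) for e in edges)
    return counts.most_common()


# Toy rows: layer names come from the table above, the IDs are placeholders.
toy_edges = [
    {"source_layer": "swisslipids", "source_id": "a", "target_layer": "sl_chebi", "target_id": "x"},
    {"source_layer": "swisslipids", "source_id": "b", "target_layer": "sl_chebi", "target_id": "y"},
    {"source_layer": "sl_chebi", "source_id": "x", "target_layer": "rhea_chebiid", "target_id": "CHEBI:x"},
]

pair_counts = count_layer_pairs(toy_edges)
for (source_layer, target_layer), n in pair_counts:
    print(source_layer, target_layer, n)
```

Summing the per-pair counts gives back the total number of edge rows, which is a quick way to confirm that no pair was dropped from the summary.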
3) Behind the scenes: parse each source
↪ Loading SwissLipids cache
↪ Loading Rhea (processed) from cache
SwissLipids: (2779077, 32) nodes / (6890966, 5) edges
Rhea: (37995, 10) nodes / (108439, 7) edges
3.1) Layers present
| layer | count |
| --- | --- |
| swisslipids | 779312 |
| sl_abbreviation | 736949 |
| sl_synonyms | 534781 |
| sl_metanetx | 504880 |
| sl_parent | 184620 |
| sl_hmdb | 17232 |
| sl_lipidmaps | 12112 |
| sl_chebi | 4277 |
| sl_components | 1708 |
| sl_components_parsed | 1677 |
| sl_pmid | 1529 |

| layer | count |
| --- | --- |
| rhea_reactionid | 17783 |
| rhea_chebiid | 13723 |
| rhea_ec | 6489 |
4) Cross-source linking: ChEBI (SwissLipids ↔ Rhea)
Linked ChEBI edges: 2756
Linked ChEBI interlayer edges: (2756, 7)
|   | source_layer | source_id | target_layer | target_id | interlayer | edge_type | origin_edge |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | sl_chebi | 10036 | rhea_chebiid | CHEBI:10036 | True | same_id_chebi | lipinet |
| 1 | sl_chebi | 10362 | rhea_chebiid | CHEBI:10362 | True | same_id_chebi | lipinet |
| 2 | sl_chebi | 11152 | rhea_chebiid | CHEBI:11152 | True | same_id_chebi | lipinet |
| 3 | sl_chebi | 1156 | rhea_chebiid | CHEBI:1156 | True | same_id_chebi | lipinet |
| 4 | sl_chebi | 116314 | rhea_chebiid | CHEBI:116314 | True | same_id_chebi | lipinet |
| 5 | sl_chebi | 11641 | rhea_chebiid | CHEBI:11641 | True | same_id_chebi | lipinet |
| 6 | sl_chebi | 1178 | rhea_chebiid | CHEBI:1178 | True | same_id_chebi | lipinet |
| 7 | sl_chebi | 11867 | rhea_chebiid | CHEBI:11867 | True | same_id_chebi | lipinet |
| 8 | sl_chebi | 1189 | rhea_chebiid | CHEBI:1189 | True | same_id_chebi | lipinet |
| 9 | sl_chebi | 11893 | rhea_chebiid | CHEBI:11893 | True | same_id_chebi | lipinet |
Sample mapped CHEBI IDs (SwissLipids → Rhea):
|   | source_layer | source_id | target_layer | target_id | interlayer | edge_type | origin_edge |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 352 | sl_chebi | 138100 | rhea_chebiid | CHEBI:138100 | True | same_id_chebi | lipinet |
| 855 | sl_chebi | 48946 | rhea_chebiid | CHEBI:48946 | True | same_id_chebi | lipinet |
| 883 | sl_chebi | 52639 | rhea_chebiid | CHEBI:52639 | True | same_id_chebi | lipinet |
| 1801 | sl_chebi | 76591 | rhea_chebiid | CHEBI:76591 | True | same_id_chebi | lipinet |
| 1774 | sl_chebi | 76475 | rhea_chebiid | CHEBI:76475 | True | same_id_chebi | lipinet |
| 1695 | sl_chebi | 76291 | rhea_chebiid | CHEBI:76291 | True | same_id_chebi | lipinet |
| 1604 | sl_chebi | 75587 | rhea_chebiid | CHEBI:75587 | True | same_id_chebi | lipinet |
| 1087 | sl_chebi | 62243 | rhea_chebiid | CHEBI:62243 | True | same_id_chebi | lipinet |
| 396 | sl_chebi | 138569 | rhea_chebiid | CHEBI:138569 | True | same_id_chebi | lipinet |
| 1255 | sl_chebi | 71567 | rhea_chebiid | CHEBI:71567 | True | same_id_chebi | lipinet |
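The linked rows above pair a bare numeric ID in the sl_chebi layer with the same number carrying a `CHEBI:` prefix in the rhea_chebiid layer. The sketch below shows one way such same-ID links could be generated. It is an illustration only: the function name `link_chebi` is hypothetical, the actual lipinet implementation may differ, and the ID `99999` is a made-up non-matching value.

```python
def link_chebi(sl_chebi_ids, rhea_chebi_ids):
    """Build interlayer edges where 'CHEBI:' + SwissLipids ID equals a Rhea ID."""
    rhea_ids = set(rhea_chebi_ids)
    edges = []
    # Sorting the IDs as strings keeps the output order deterministic.
    for sl_id in sorted(set(sl_chebi_ids)):
        rhea_id = f"CHEBI:{sl_id}"
        if rhea_id in rhea_ids:
            edges.append({
                "source_layer": "sl_chebi",
                "source_id": sl_id,
                "target_layer": "rhea_chebiid",
                "target_id": rhea_id,
                "interlayer": True,
                "edge_type": "same_id_chebi",
                "origin_edge": "lipinet",
            })
    return edges


# "99999" is a placeholder with no Rhea counterpart in this toy input.
linked = link_chebi(
    ["10036", "1156", "99999"],
    ["CHEBI:10036", "CHEBI:1156", "CHEBI:15377"],
)
print(len(linked))
```

Only IDs present on both sides produce an edge, so the number of linked edges can never exceed the size of the smaller layer.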
5) Merging node frames (origin tagging & prefixing unique columns)
Joined nodes: (2817072, 41)
|   | node_id | layer | origin_vertex | rhea_Equation | rhea_ChEBI identifier | rhea_chebi_name | rhea_EC number | rhea_Enzymes | rhea_Gene Ontology | rhea_Cross-reference (Reactome) | ... | sl_Exact m/z of [M+NH4]+ | sl_Exact m/z of [M-H]- | sl_Exact m/z of [M+Cl]- | sl_Exact m/z of [M+OAc]- | sl_CHEBI | sl_LIPID MAPS | sl_HMDB | sl_MetaNetX | sl_PMID | sl_Components_parsed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | RHEA:21252 | rhea_reactionid | rhea | (S)-2-hydroxyglutarate + A = 2-oxoglutarate + AH2 | CHEBI:16782;CHEBI:13193;CHEBI:16810;CHEBI:17499 | (S)-2-hydroxyglutarate;A;2-oxoglutarate;AH2 | EC:1.1.99.2 | 4258.0 | GO:0047545 2-hydroxyglutarate dehydrogenase ac... | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | RHEA:21256 | rhea_reactionid | rhea | 3-phosphoshikimate + phosphoenolpyruvate = 5-O... | CHEBI:145989;CHEBI:58702;CHEBI:57701;CHEBI:43474 | 3-phosphoshikimate;phosphoenolpyruvate;5-O-(1-... | EC:2.5.1.19 | 44340.0 | GO:0003866 3-phosphoshikimate 1-carboxyvinyltr... | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | RHEA:21260 | rhea_reactionid | rhea | [thioredoxin]-disulfide + L-methionine + H2O =... | CHEBI:50058;CHEBI:57844;CHEBI:15377;CHEBI:5877... | L-cystine residue;L-methionine;H2O;L-methionin... | EC:1.8.4.14 | 3112.0 | GO:0033745 L-methionine-(R)-S-oxide reductase ... | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 rows × 41 columns
Column order (shared first, then prefixed):
['node_id', 'layer', 'origin_vertex', 'rhea_Equation', 'rhea_ChEBI identifier', 'rhea_chebi_name', 'rhea_EC number', 'rhea_Enzymes', 'rhea_Gene Ontology', 'rhea_Cross-reference (Reactome)', 'rhea_ec_level', 'sl_Lipid ID', 'sl_Level', 'sl_Name', 'sl_Abbreviation*'] ...
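The column order above suggests that `node_id` and `layer` are shared, `origin_vertex` is added as a tag, and every remaining column gets a per-source prefix (`rhea_` or `sl_`). A minimal plain-Python sketch of that idea follows. The function `merge_nodes`, the `SLM:example` ID and the lipid name are hypothetical, and missing values are shown as `None` rather than the NaN a DataFrame would hold.

```python
SHARED = ("node_id", "layer")  # assumed shared columns, based on the printed order


def merge_nodes(sources):
    """Stack node rows from several sources, tagging origin and prefixing unique columns.

    sources: list of (origin, prefix, rows), where rows is a list of dicts.
    Returns (columns, merged_rows); cells with no value are set to None.
    """
    columns = list(SHARED) + ["origin_vertex"]
    merged = []
    for origin, prefix, rows in sources:
        for row in rows:
            out = {key: row[key] for key in SHARED}
            out["origin_vertex"] = origin
            for key, value in row.items():
                if key in SHARED:
                    continue
                prefixed = prefix + key
                out[prefixed] = value
                if prefixed not in columns:
                    columns.append(prefixed)
            merged.append(out)
    # Give every row the full column set so the result is rectangular.
    merged = [{col: row.get(col) for col in columns} for row in merged]
    return columns, merged


rhea_rows = [{"node_id": "RHEA:21252", "layer": "rhea_reactionid",
              "Equation": "(S)-2-hydroxyglutarate + A = 2-oxoglutarate + AH2",
              "EC number": "EC:1.1.99.2"}]
# Placeholder SwissLipids row; the ID and name are invented for illustration.
sl_rows = [{"node_id": "SLM:example", "layer": "swisslipids", "Name": "example lipid"}]

columns, merged = merge_nodes([("rhea", "rhea_", rhea_rows), ("swisslipids", "sl_", sl_rows)])
print(columns)
```

Prefixing keeps source-specific columns from colliding, at the cost of a wide, sparse table: each row only fills the columns of its own source, which matches the NaN-heavy preview shown above.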
6) Merging edge frames (source edges + interlayer links)
Joined edges: (7002161, 8)
|   | source_layer | source_id | target_layer | target_id | interlayer | edge_type | origin_edge | ec_level |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | rhea_ec | EC:1 | rhea_ec | EC:1.1 | False | ec_hierarchy | rhea | main_class->subclass |
| 1 | rhea_ec | EC:2 | rhea_ec | EC:2.5 | False | ec_hierarchy | rhea | main_class->subclass |
| 2 | rhea_ec | EC:1 | rhea_ec | EC:1.8 | False | ec_hierarchy | rhea | main_class->subclass |
| 3 | rhea_ec | EC:1 | rhea_ec | EC:1.5 | False | ec_hierarchy | rhea | main_class->subclass |
| 4 | rhea_ec | EC:6 | rhea_ec | EC:6.3 | False | ec_hierarchy | rhea | main_class->subclass |
Interlayer edges in merged set: 6114475
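The shapes printed in this walkthrough can be cross-checked with simple arithmetic: the joined node rows should equal the SwissLipids plus Rhea node rows, and the joined edge rows should equal both sources' edges plus the linked ChEBI edges. The column check additionally assumes that `node_id` and `layer` are the only shared node columns and that `origin_vertex` is the single column added during the merge; that is inferred from the printed column order, not confirmed by the code.

```python
# Shapes as printed earlier in this walkthrough.
sl_nodes, sl_edges = (2779077, 32), (6890966, 5)
rhea_nodes, rhea_edges = (37995, 10), (108439, 7)
linked_chebi_edges = 2756
joined_nodes, joined_edges = (2817072, 41), (7002161, 8)

# Rows: nodes are stacked, edges are stacked plus the cross-source links.
node_rows = sl_nodes[0] + rhea_nodes[0]
edge_rows = sl_edges[0] + rhea_edges[0] + linked_chebi_edges

# Columns (assumption): 2 shared columns + origin_vertex + prefixed unique columns.
shared = 2
node_cols = shared + 1 + (sl_nodes[1] - shared) + (rhea_nodes[1] - shared)

print("node rows match:", node_rows == joined_nodes[0])
print("edge rows match:", edge_rows == joined_edges[0])
print("node columns match:", node_cols == joined_nodes[1])
```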
7) Quick QC: orphans & coverage by layer
|   | nodes | touched | orphans |
| --- | --- | --- | --- |
| sl_chebi | 4277 | 4277 | 0 |
| rhea_chebiid | 13723 | 13723 | 0 |
| rhea_reactionid | 17783 | 17783 | 0 |
| rhea_ec | 6489 | 6490 | -1 |
Note that the orphan count of -1 for rhea_ec (6490 touched IDs against 6489 nodes) is most likely due to NaN values not being filtered out. We should later correct _join edges to handle this by sanitizing for it.
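The negative count can be reproduced on toy data. The sketch below assumes that orphans is computed as nodes minus touched (which matches the table's numbers) and that a missing EC value ends up as an edge endpoint. The helper names and the `RHEA:example` row are hypothetical and not taken from lipinet.

```python
import math


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def layer_coverage(node_ids, edges, layer, drop_missing=False):
    """Return (nodes, touched, orphans) for one layer, with orphans = nodes - touched.

    edges: iterable of (source_layer, source_id, target_layer, target_id).
    """
    touched = set()
    for source_layer, source_id, target_layer, target_id in edges:
        if source_layer == layer:
            touched.add(source_id)
        if target_layer == layer:
            touched.add(target_id)
    if drop_missing:
        touched = {node_id for node_id in touched if not is_missing(node_id)}
    nodes = len(set(node_ids))
    return nodes, len(touched), nodes - len(touched)


ec_nodes = ["EC:1", "EC:1.1", "EC:1.8"]
edges = [
    ("rhea_ec", "EC:1", "rhea_ec", "EC:1.1"),
    ("rhea_ec", "EC:1", "rhea_ec", "EC:1.8"),
    # Hypothetical reaction with no EC number: the NaN becomes an endpoint.
    ("rhea_reactionid", "RHEA:example", "rhea_ec", math.nan),
]

raw = layer_coverage(ec_nodes, edges, "rhea_ec")
clean = layer_coverage(ec_nodes, edges, "rhea_ec", drop_missing=True)
print(raw, clean)
```

Dropping missing endpoints before counting brings touched back in line with the node count, which is the kind of sanitizing the note above proposes for the edge join.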
8) Optional: Write outputs to .data/processed/