lipinet package
Submodules
lipinet.databases module
- lipinet.databases.clean(df, name_of_resource, verbose=False)
Some of the data sources need specialised cleaning to make them nicer to work with.
- lipinet.databases.clean_columns(df, cols, strip_chars=None, trim_substrings=None, verbose=False)
- For each col in cols:
strip strip_chars from both ends via .str.strip(strip_chars); if strip_chars is None, whitespace is stripped
remove any of the trim_substrings that appear at the start or end
- Parameters:
df (DataFrame) – your DataFrame
cols (list[str]) – list of column names to clean (if empty, all columns)
strip_chars (str | None) – string of characters to strip from ends (None → whitespace)
trim_substrings (list[str] | None) – list of literal substrings to drop if they appear at start or end
verbose (bool) – print before/after samples
- Return type:
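The cleaning steps described for clean_columns can be sketched with plain pandas string methods. This is an illustrative re-implementation under stated assumptions, not lipinet's actual code: the name clean_columns_sketch is hypothetical, the function is assumed to return a cleaned copy, and the cleaned columns are converted to pandas' string dtype (requires pandas 1.4+ for removeprefix/removesuffix).

```python
import pandas as pd


def clean_columns_sketch(df, cols, strip_chars=None, trim_substrings=None):
    """Hypothetical stand-in for clean_columns; returns a cleaned copy (assumption)."""
    out = df.copy()
    # Per the parameter docs, an empty `cols` means "all columns".
    target_cols = list(cols) if cols else list(out.columns)
    for col in target_cols:
        # Assumption: cleaned columns are treated as text.
        s = out[col].astype("string")
        # strip_chars=None makes .str.strip remove whitespace.
        s = s.str.strip(strip_chars)
        # Drop each literal substring when it sits at the start or the end.
        for sub in trim_substrings or []:
            s = s.str.removeprefix(sub).str.removesuffix(sub)
        out[col] = s
    return out


example = pd.DataFrame({"name": ["  LIPID:abc  ", "xyz (lipid)"]})
cleaned = clean_columns_sketch(
    example, ["name"], trim_substrings=["LIPID:", " (lipid)"]
)
print(cleaned["name"].tolist())
```

Stripping happens before trimming here, so a substring is only removed once surrounding whitespace is gone; whether the real function uses the same order is not stated in the docs.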
- lipinet.databases.download_and_load_data(filename, url, file_format='csv', compressed=False, sep=',', encoding='utf-8', verbose=False)
Checks if the specified file exists locally. If not, downloads it from the provided URL. Supports loading compressed files and handling different formats.
Parameters:
- filename (str): The name of the file to be saved within the data directory.
- url (str): The URL to download the file from if it's not found locally.
- file_format (str): The format of the file ('json' or 'csv'). Defaults to 'csv'.
- compressed (bool): If True, expects the downloaded file to be in gzip format. Defaults to False.
- sep (str): Separator to use if loading CSV/TSV data. Defaults to ','.
- encoding (str): Encoding to use for reading files. Defaults to 'utf-8'.
- verbose (bool): If True, prints additional information during the process. Defaults to False.
Returns:
- data (DataFrame, dict, or list): The loaded data from the file, in the format specified.
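The "use the local copy, otherwise download" pattern described above can be sketched with the standard library and pandas. This is a minimal sketch, not the lipinet implementation: load_or_download_sketch is a hypothetical name, and it takes a full path rather than a filename resolved against lipinet's data directory, whose location is not documented here.

```python
import gzip
import json
import urllib.request
from pathlib import Path

import pandas as pd


def load_or_download_sketch(path, url, file_format="csv", compressed=False,
                            sep=",", encoding="utf-8"):
    """Hypothetical sketch: fetch `url` to `path` only if `path` is missing."""
    if file_format not in ("csv", "json"):
        raise ValueError(f"unsupported file_format: {file_format!r}")
    path = Path(path)
    if not path.exists():
        # Download once; later calls reuse the cached local file.
        path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, path)
    # gzip.open and open share the same text-mode interface.
    opener = gzip.open if compressed else open
    with opener(path, "rt", encoding=encoding) as handle:
        if file_format == "json":
            return json.load(handle)
        return pd.read_csv(handle, sep=sep)
```

Because the existence check runs first, the URL is never contacted when the file is already present, which keeps repeated notebook runs offline and fast.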
- lipinet.databases.get_prior_knowledge(name_of_resource, verbose=False)
lipinet.parse_swisslipids module
A standalone module that uses lipinet to load SwissLipids data and process it into a df_nodes DataFrame.
This module provides a helper function parse_swisslipids_data that can be imported into notebooks or other scripts. A thin wrapper in the main() function allows command-line execution.
- lipinet.parse_swisslipids.main()
Thin wrapper for command-line execution.
- lipinet.parse_swisslipids.parse_swisslipids_data(verbose=False)
Core function to process SwissLipids data and return nodes and edges dataframes.
lipinet.parse_rhea module
A standalone module that loads and processes Rhea data into node and edge DataFrames for LipiNet. Provides a helper function parse_rhea_data and a CLI entrypoint.
- lipinet.parse_rhea.build_rhea_ec_edges_and_nodes(df_ec)
- Given a DataFrame with EC hierarchy columns:
Main_Class, Subclass, Subsubclass, EC_number,
- this function creates:
A DataFrame of edges linking each hierarchical level.
A DataFrame of unique nodes with an 'ec_level' column indicating the node's level in the hierarchy.
- Parameters:
df_ec (DataFrame)
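One way to derive such edge and node tables is to pair each hierarchy column with the next one. The sketch below is illustrative only: build_ec_hierarchy_sketch and the output column names source, target and node_id are hypothetical, and it assumes each level column already holds a hierarchy-unique identifier (for example '1', '1.1', '1.1.1'), which the docs do not confirm.

```python
import pandas as pd

# Hierarchy columns named in the docs, ordered from broadest to most specific.
LEVELS = ["Main_Class", "Subclass", "Subsubclass", "EC_number"]


def build_ec_hierarchy_sketch(df_ec):
    """Hypothetical sketch returning (edges, nodes) DataFrames."""
    # Edges: link every level to the level directly below it.
    edge_frames = []
    for parent, child in zip(LEVELS, LEVELS[1:]):
        pairs = df_ec[[parent, child]].dropna().drop_duplicates()
        edge_frames.append(pairs.rename(columns={parent: "source", child: "target"}))
    edges = pd.concat(edge_frames, ignore_index=True)

    # Nodes: unique values per level, tagged with the level they came from.
    node_frames = [
        pd.DataFrame({"node_id": df_ec[level].dropna().unique(), "ec_level": level})
        for level in LEVELS
    ]
    nodes = pd.concat(node_frames, ignore_index=True)
    return edges, nodes


df_ec = pd.DataFrame({
    "Main_Class": ["1", "1"],
    "Subclass": ["1.1", "1.1"],
    "Subsubclass": ["1.1.1", "1.1.1"],
    "EC_number": ["EC:1.1.1.1", "EC:1.1.1.2"],
})
edges, nodes = build_ec_hierarchy_sketch(df_ec)
print(edges)
print(nodes)
```

If the level columns held only the bare parts (such as '1' at every level), node identifiers would collide across levels, so some prefixing scheme would be needed first.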
- lipinet.parse_rhea.explode_columns(df, columns, delimiter=';')
Split and explode the specified columns of a DataFrame.
- Parameters:
- Returns:
A new DataFrame with the specified columns exploded.
- Return type:
pd.DataFrame
Note
Each row in the specified columns must produce lists of the same length.
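The split-then-explode behaviour, including the same-length requirement from the note, can be illustrated with pandas' multi-column explode (pandas 1.3+). This is a sketch under the assumption that the listed columns hold delimiter-separated strings; explode_columns_sketch is a hypothetical name, not the lipinet function.

```python
import pandas as pd


def explode_columns_sketch(df, columns, delimiter=";"):
    """Hypothetical sketch: split each column on `delimiter`, then explode together."""
    out = df.copy()
    for col in columns:
        # Turn "x;y" into ["x", "y"] so the cell can be exploded.
        out[col] = out[col].str.split(delimiter)
    # Exploding several columns at once keeps their elements aligned row by row;
    # pandas raises ValueError when the per-row list lengths differ.
    return out.explode(list(columns), ignore_index=True)


df = pd.DataFrame({"id": ["r1"], "a": ["x;y"], "b": ["1;2"]})
print(explode_columns_sketch(df, ["a", "b"]))
```

With mismatched inputs such as a = "x;y" and b = "1", pandas raises a ValueError rather than silently misaligning values, which is the failure mode the note warns about.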
- lipinet.parse_rhea.main()
- lipinet.parse_rhea.parse_rhea_data(verbose=False)
Core function to load and process Rhea data.
- lipinet.parse_rhea.process_ec_numbers(df)
Process the ‘EC number’ column of the input DataFrame.
- Parameters:
df (pd.DataFrame) – A DataFrame containing an ‘EC number’ column.
- Returns:
- A new DataFrame with the following columns:
'EC_number': The reassembled EC number in the format 'EC:Main_Class.Subclass.Subsubclass.Serial_Number'
'Main_Class': The first part of the EC number.
'Subclass': The second part of the EC number.
'Subsubclass': The third part of the EC number.
'Serial_Number': The fourth part of the EC number.
- Return type:
pd.DataFrame
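The column layout above can be reproduced with a single regular-expression extract. The sketch below is an assumption-laden illustration rather than lipinet's code: process_ec_numbers_sketch is a hypothetical name, input values are assumed to look like '1.1.1.1' with an optional 'EC:' prefix, and rows that do not have four dot-separated parts are dropped (the real function's handling of such rows is not documented).

```python
import pandas as pd

# Optional "EC:" prefix followed by four dot-separated parts.
EC_PATTERN = r"^\s*(?:EC:)?\s*([^.\s]+)\.([^.\s]+)\.([^.\s]+)\.([^.\s]+)\s*$"
PART_COLUMNS = ["Main_Class", "Subclass", "Subsubclass", "Serial_Number"]


def process_ec_numbers_sketch(df):
    """Hypothetical sketch: split 'EC number' into parts and reassemble it."""
    parts = df["EC number"].astype("string").str.extract(EC_PATTERN)
    parts.columns = PART_COLUMNS
    # Assumption: entries that do not match the four-part pattern are discarded.
    parts = parts.dropna().reset_index(drop=True)
    ec_number = (
        "EC:" + parts["Main_Class"] + "." + parts["Subclass"]
        + "." + parts["Subsubclass"] + "." + parts["Serial_Number"]
    )
    parts.insert(0, "EC_number", ec_number)
    return parts


df = pd.DataFrame({"EC number": ["EC:1.1.1.1", "2.7.1.40", "not-an-ec"]})
print(process_ec_numbers_sketch(df))
```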
lipinet.utils module
- lipinet.utils.check_for_split_characters(df, delimiter='|')
- lipinet.utils.create_nodedf_from_edgedf(edge_df, props=['layer', 'id'], cols=['layer', 'node_id'])
- lipinet.utils.split_and_expand_large(df, split_col, delimiter, expand_cols)
Splits a column by a delimiter and expands specified columns for large DataFrames, handling None/NaN values.
Parameters:
- df (pd.DataFrame): The original DataFrame.
- split_col (str): The name of the column to split.
- delimiter (str): The delimiter to split the column by.
- expand_cols (list): List of column names to be expanded with the split column.
Returns: pd.DataFrame: A new DataFrame with the split and expanded rows.
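A rough sketch of the split-and-expand idea follows. The docs leave the exact role of expand_cols open, so this sketch makes an explicit assumption: expand_cols are the columns carried along and repeated for every value produced by the split, and rows whose split_col is None/NaN are kept as a single row. The name split_and_expand_sketch is hypothetical.

```python
import pandas as pd


def split_and_expand_sketch(df, split_col, delimiter, expand_cols):
    """Hypothetical sketch: one output row per delimited value in `split_col`."""
    # Keep only the split column and the columns assumed to be carried along.
    out = df[[split_col, *expand_cols]].copy()
    # A single-character delimiter such as "|" is treated literally by str.split;
    # missing values stay missing instead of raising.
    out[split_col] = out[split_col].str.split(delimiter)
    # explode repeats the other columns and leaves missing cells as one row.
    return out.explode(split_col, ignore_index=True)


df = pd.DataFrame({"genes": ["a|b", None], "rxn": ["r1", "r2"]})
print(split_and_expand_sketch(df, "genes", "|", ["rxn"]))
```

For very large frames, the same logic could be applied in chunks and concatenated, though whether the real function does so is not stated.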