fairmd.lipids.databankio module
Input/Output auxiliary functions.
Input/Output module with some small useful functions. It includes:
- Downloading files.
- Resolving URLs.
- Calculating file hashes for fingerprinting.
- fairmd.lipids.databankio.calc_file_sha1_hash(fi: str, step: int = 67108864, *, one_block: bool = True) → str
Calculate the SHA1 hash of a file.
By default only the first block of the file is hashed. If one_block is False, the whole file is read in chunks so that large files are handled efficiently.
- Parameters:
fi (str) – The path to the file.
step (int) – The chunk size in bytes for reading the file. Defaults to 64MB. Only used if one_block is False.
one_block (bool) – If True, reads the first step bytes of the file. If False, reads the entire file in chunks of step bytes. Defaults to True.
- Returns:
The hexadecimal SHA1 hash of the file content.
- Return type:
str
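The two reading modes described above can be sketched with the standard library's hashlib. This is a minimal illustration, not the library's implementation; the function name sha1_of_file is hypothetical, and only the parameter semantics (step, one_block) are taken from the documentation.

```python
import hashlib


def sha1_of_file(path: str, step: int = 64 * 1024 * 1024, *, one_block: bool = True) -> str:
    """Hypothetical stand-in that mirrors the documented parameters."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        if one_block:
            # Hash only the first `step` bytes: a cheap fingerprint for very large files.
            digest.update(handle.read(step))
        else:
            # Hash the whole file, holding at most `step` bytes in memory at a time.
            while chunk := handle.read(step):
                digest.update(chunk)
    return digest.hexdigest()
```

With one_block=True, two files that share their first step bytes yield the same hash, so this mode works as a fast fingerprint rather than as a full integrity check.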
- fairmd.lipids.databankio.create_simulation_directories(software: str, sim_hashes: Mapping, out: str, *, dry_run_mode: bool = False) → str
Create a nested output directory structure to save simulation results.
The directory structure is generated based on the hashes of the simulation input files.
- Parameters:
software (str) – The MD engine software (taken from the simulation metadata).
sim_hashes (Mapping) – A dictionary mapping file types (e.g., “TPR”, “TRJ”) to their hash information. The structure is expected to be {‘TYPE’: [(‘filename’, ‘hash’)]}.
out (str) – The root output directory where the nested structure will be created.
dry_run_mode (bool) – If True, the directory path is resolved but not created. Defaults to False.
- Returns:
The full path to the created output directory.
- Return type:
str
- Raises:
FileExistsError – If the target output directory already exists and is not empty.
NotImplementedError – If the simulation software is not supported.
RuntimeError – If the target output directory could not be created.
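The documented behaviour (path derived from hashes, optional dry run, refusal to reuse a non-empty directory) can be sketched as follows. The documentation does not state the actual directory layout, so the nesting scheme here is an assumption made purely for illustration, as are the function name make_nested_dir and the single supported engine name "gromacs".

```python
import os
from collections.abc import Mapping


def make_nested_dir(software: str, sim_hashes: Mapping, out: str, *, dry_run_mode: bool = False) -> str:
    """Hypothetical sketch; the real layout may differ."""
    if software != "gromacs":  # ASSUMPTION: only one engine is handled in this sketch
        raise NotImplementedError(f"Unsupported software: {software}")

    # sim_hashes follows the documented shape {'TYPE': [('filename', 'hash')]}.
    tpr_hash = sim_hashes["TPR"][0][1]
    trj_hash = sim_hashes["TRJ"][0][1]

    # ASSUMPTION: nest by 3-character hash prefixes, then by the full hashes.
    path = os.path.join(out, tpr_hash[:3], trj_hash[:3], tpr_hash, trj_hash)

    # Refuse to reuse a directory that already holds files.
    if os.path.isdir(path) and os.listdir(path):
        raise FileExistsError(f"Output directory is not empty: {path}")

    if not dry_run_mode:
        os.makedirs(path, exist_ok=True)
    return path
```

In dry-run mode the sketch only resolves and returns the path, which makes it possible to check where results would be written without touching the file system.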
- fairmd.lipids.databankio.download_resource_from_uri(uri: str, dest: str, *, override_if_exists: bool = False, max_bytes: bool = False) → int
Download file resource from a URI to a local destination.
Checks if the file already exists and has the same size before downloading. Can also perform a partial “dry-run” download.
- Parameters:
uri (str) – The URL of the file resource.
dest (str) – The local destination path to save the file.
override_if_exists (bool) – If True, the file will be re-downloaded even if it already exists. Defaults to False.
max_bytes (bool) – If True, only a partial download is performed (up to MAX_DRYRUN_SIZE). Defaults to False.
- Returns:
A status code indicating the result. 0: Download was successful. 1: Download was skipped because the file already exists. 2: File was re-downloaded due to a size mismatch.
- Return type:
int
- Raises:
ConnectionError – An error occurred after multiple download attempts.
OSError – The downloaded file size does not match the expected size.
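The status codes and the size check can be illustrated without any network access. In this minimal sketch the transfer itself is replaced by a hypothetical fetch callable and the expected size is passed in directly; the function name fetch_if_needed and the exact order of the checks are assumptions, not the library's code.

```python
import os
from typing import Callable


def fetch_if_needed(
    dest: str,
    expected_size: int,
    fetch: Callable[[str], None],
    *,
    override_if_exists: bool = False,
) -> int:
    """Hypothetical sketch of the documented 0/1/2 status logic."""
    status = 0  # 0: plain successful download
    if os.path.isfile(dest) and not override_if_exists:
        if os.path.getsize(dest) == expected_size:
            return 1  # 1: file already present with the right size, skip
        status = 2  # 2: size mismatch, download again

    fetch(dest)  # stand-in for the real network transfer

    # Verify the result, as the documented OSError suggests.
    if os.path.getsize(dest) != expected_size:
        raise OSError(f"Downloaded size does not match expected size for {dest}")
    return status
```

Returning distinct codes instead of a boolean lets the caller log whether a file was reused, freshly downloaded, or repaired.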
- fairmd.lipids.databankio.download_with_progress_with_retry(uri: str, dest: str, *, tqdm_title: str = 'Downloading', stop_after: int | None = None) → None
Download a file with a progress bar and retry logic.
Uses tqdm to display a progress bar during the download.
- Parameters:
uri (str) – The URL of the file to download.
dest (str) – The local destination path to save the file.
tqdm_title (str) – The title used for the progress bar description.
stop_after (int | None) – The maximum number of bytes to download. Defaults to None (no limit).
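The two ideas behind this function, a byte cap and a retry loop, can be sketched with the standard library alone. The progress bar is left out to keep the example dependency-free, and the names copy_with_limit and with_retry, the chunk size, and the number of attempts are all hypothetical choices rather than the library's values.

```python
import time
from typing import BinaryIO, Callable, Optional, TypeVar

T = TypeVar("T")


def copy_with_limit(src: BinaryIO, dst: BinaryIO, *, stop_after: Optional[int] = None,
                    chunk_size: int = 8192) -> int:
    """Copy src to dst in chunks, stopping once `stop_after` bytes are written."""
    written = 0
    while True:
        to_read = chunk_size
        if stop_after is not None:
            to_read = min(chunk_size, stop_after - written)
            if to_read <= 0:
                break  # byte cap reached
        chunk = src.read(to_read)
        if not chunk:
            break  # end of stream
        dst.write(chunk)
        written += len(chunk)
    return written


def with_retry(action: Callable[[], T], *, attempts: int = 3, delay: float = 0.0) -> T:
    """Run `action`, retrying on ConnectionError up to `attempts` times."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay)  # pause before the next attempt
    raise ConnectionError(f"Failed after {attempts} attempts") from last_error
```

In a real download, src would be the HTTP response stream and the whole copy would be wrapped in with_retry, so that a dropped connection restarts the transfer.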
- fairmd.lipids.databankio.resolve_file_url(doi: str, fi_name: str, *, validate_uri: bool = True) → str
Resolve the download URI of a file from a Zenodo record’s DOI and the filename.
Currently supports Zenodo DOIs.
- Parameters:
doi (str) – The DOI identifier for the repository (e.g., “10.5281/zenodo.1234”).
fi_name (str) – The name of the file within the repository.
validate_uri (bool) – If True, checks if the resolved URL is a valid and reachable address. Defaults to True.
- Returns:
The full, direct download URL for the file.
- Return type:
str
- Raises:
HTTPError or other connection errors – If the URL cannot be opened after multiple retries.
NotImplementedError – If the DOI provider is not supported.
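A rough idea of how a Zenodo DOI might be mapped to a file URL is sketched below. The URL layout used here (records/<id>/files/<name> on zenodo.org) is an assumption about Zenodo's public file links, not something the documentation above specifies, and the function name zenodo_file_url is hypothetical. The sketch skips the reachability check that validate_uri performs, since that needs network access.

```python
import re
from urllib.parse import quote

# Zenodo DOIs have the form 10.5281/zenodo.<record id>.
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.(\d+)$")


def zenodo_file_url(doi: str, fi_name: str) -> str:
    """Hypothetical sketch: build a file URL from a Zenodo DOI and a filename."""
    match = ZENODO_DOI.match(doi.strip())
    if match is None:
        # Mirrors the documented behaviour for unsupported DOI providers.
        raise NotImplementedError(f"DOI provider not supported: {doi}")
    record_id = match.group(1)
    # ASSUMPTION: file link layout on zenodo.org; the filename is percent-encoded.
    return f"https://zenodo.org/records/{record_id}/files/{quote(fi_name)}"
```

Keeping the DOI parsing separate from any network validation makes the mapping easy to test offline.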