Scripts.DatabankLib.databankio module
Input/Output auxiliary module DatabankLib.databankio.
Provides small utility functions for network communication, downloading files, checking links, resolving DOIs, and calculating file hashes.
- Scripts.DatabankLib.databankio.retry_with_exponential_backoff(max_attempts: int = 3, delay_seconds: int = 1) Callable [source]
Retry a function with exponential backoff.
- Parameters:
max_attempts (int) – The maximum number of attempts. Defaults to 3.
delay_seconds (int) – The initial delay between retries in seconds. The delay doubles after each failed attempt. Defaults to 1.
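The behaviour described above can be sketched as a decorator factory. This is a minimal illustration, not the module's actual implementation: the name retry_with_backoff_sketch is hypothetical, and it assumes that any exception triggers a retry and that the last exception is re-raised once the attempts are exhausted.

```python
import functools
import time
from typing import Callable


def retry_with_backoff_sketch(max_attempts: int = 3, delay_seconds: float = 1) -> Callable:
    """Hypothetical stand-in for a retry decorator with exponential backoff."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = delay_seconds
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    # Assumption: every exception type is retried.
                    if attempt == max_attempts:
                        raise  # out of attempts: propagate the last error
                    time.sleep(delay)
                    delay *= 2  # the delay doubles after each failed attempt

        return wrapper

    return decorator
```

With the defaults, a call that keeps failing waits 1 s and then 2 s between its three attempts before the error propagates.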
- Scripts.DatabankLib.databankio.get_file_size_with_retry(uri: str) int [source]
Fetch the size of a file from a URI with retry logic.
- Parameters:
uri (str) – The URL of the file.
- Returns:
The size of the file in bytes (int), or 0 if the ‘Content-Length’ header is not present.
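One way to obtain such a size is to read the ‘Content-Length’ header from a header-only request. The sketch below is an assumption about how this could be done with the standard library; the helper names content_length_or_zero and fetch_size_sketch are hypothetical, and the module itself may use a different HTTP client.

```python
import urllib.request
from typing import Mapping


def content_length_or_zero(headers: Mapping[str, str]) -> int:
    """Return the Content-Length header as an int, or 0 if it is absent."""
    value = headers.get("Content-Length")
    return int(value) if value is not None else 0


def fetch_size_sketch(uri: str, timeout: float = 10.0) -> int:
    """Hypothetical helper: request headers only and read the reported size."""
    request = urllib.request.Request(uri, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return content_length_or_zero(response.headers)
```

Returning 0 for a missing header means a caller cannot tell an empty file from an unknown size, so size comparisons are only meaningful when the server reports a length.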
- Scripts.DatabankLib.databankio.download_with_progress_with_retry(uri: str, dest: str, fi_name: str) None [source]
Download a file with a progress bar and retry logic.
Uses tqdm to display a progress bar during the download.
- Parameters:
uri (str) – The URL of the file to download.
dest (str) – The local destination path to save the file.
fi_name (str) – The name of the file, used for the progress bar description.
- Scripts.DatabankLib.databankio.download_resource_from_uri(uri: str, dest: str, *, override_if_exists: bool = False, dry_run_mode: bool = False) int [source]
Download file resource from a URI to a local destination.
Checks if the file already exists and has the same size before downloading. Can also perform a partial “dry-run” download.
- Parameters:
uri (str) – The URL of the file resource.
dest (str) – The local destination path to save the file.
override_if_exists (bool) – If True, the file will be re-downloaded even if it already exists. Defaults to False.
dry_run_mode (bool) – If True, only a partial download is performed (up to MAX_DRYRUN_SIZE). Defaults to False.
- Returns:
A status code indicating the result. 0: Download was successful. 1: Download was skipped because the file already exists. 2: File was re-downloaded due to a size mismatch.
- Return type:
int
- Raises:
ConnectionError – An error occurred after multiple download attempts.
OSError – The downloaded file size does not match the expected size.
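The three status codes follow from comparing the local file with the expected remote size. A minimal sketch of that decision is given here; download_status_sketch is a hypothetical helper, and the code returned when override_if_exists forces a re-download is an assumption, since the documentation does not state it.

```python
import os


def download_status_sketch(dest: str, expected_size: int, override_if_exists: bool = False) -> int:
    """Return the status code a download call would be expected to end with."""
    if not os.path.isfile(dest):
        return 0  # nothing on disk yet: plain download
    if override_if_exists:
        # Assumption: a forced re-download is reported as a normal success.
        return 0
    if os.path.getsize(dest) == expected_size:
        return 1  # same size on disk: the download is skipped
    return 2  # size mismatch: the file is downloaded again
```

A caller can use the code to log whether data was actually transferred, for example treating 1 as "already cached".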
- Scripts.DatabankLib.databankio.resolve_doi_url(doi: str, validate_uri: bool = True) str [source]
Resolve a DOI to a full URL and, if validate_uri is True, validate that it is reachable.
- Parameters:
doi (str) – The DOI identifier (e.g., “10.5281/zenodo.1234”).
validate_uri (bool) – If True, checks if the resolved URL is a valid and reachable address. Defaults to True.
- Returns:
The full, validated DOI link (e.g., “https://doi.org/…”).
- Return type:
str
- Raises:
urllib.error.HTTPError – If the DOI resolves to a URL, but the server returns an HTTP error code (e.g., 404 Not Found).
ConnectionError – If the server cannot be reached after multiple retries.
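Resolving a DOI amounts to prefixing it with the doi.org resolver address and, optionally, probing the result. The following sketch assumes a plain standard-library check without retries; doi_to_url_sketch is a hypothetical name, and the actual function may validate differently.

```python
import urllib.request


def doi_to_url_sketch(doi: str, validate: bool = False, timeout: float = 10.0) -> str:
    """Build the resolver URL for a DOI and optionally check that it responds."""
    url = f"https://doi.org/{doi}"
    if validate:
        # urlopen raises urllib.error.HTTPError on HTTP error codes such as 404.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            pass
    return url


print(doi_to_url_sketch("10.5281/zenodo.1234"))
```

Validation is off by default in this sketch so that it runs without network access; the documented function defaults to validating.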
- Scripts.DatabankLib.databankio.resolve_download_file_url(doi: str, fi_name: str, validate_uri: bool = True) str [source]
Resolve a download file URI from a supported DOI and filename.
Currently supports Zenodo DOIs.
- Parameters:
doi (str) – The DOI identifier for the repository (e.g., “10.5281/zenodo.1234”).
fi_name (str) – The name of the file within the repository.
validate_uri (bool) – If True, checks if the resolved URL is a valid and reachable address. Defaults to True.
- Returns:
The full, direct download URL for the file.
- Return type:
str
- Raises:
RuntimeError – If the URL cannot be opened after multiple retries.
NotImplementedError – If the DOI provider is not supported.
- Scripts.DatabankLib.databankio.calc_file_sha1_hash(fi: str, step: int = 67108864, one_block: bool = True) str [source]
Calculate the SHA1 hash of a file.
If one_block is False, reads the file in chunks so that large files can be hashed without loading them into memory at once.
- Parameters:
fi (str) – The path to the file.
step (int) – The block size in bytes for reading the file. Defaults to 67108864 (64 MiB). Sets the size of the single block read if one_block is True, or of each chunk if one_block is False.
one_block (bool) – If True, reads only the first step bytes of the file. If False, reads the entire file in chunks of step bytes. Defaults to True.
- Returns:
The hexadecimal SHA1 hash of the file content.
- Return type:
str
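The two reading modes can be illustrated with hashlib. This is a sketch that follows the parameter descriptions above rather than the module's source; sha1_sketch is a hypothetical name.

```python
import hashlib


def sha1_sketch(path: str, step: int = 64 * 1024 * 1024, one_block: bool = True) -> str:
    """Hash either the first `step` bytes of a file or the whole file in chunks."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        if one_block:
            # Only the first block contributes to the hash.
            sha1.update(f.read(step))
        else:
            # Feed the file chunk by chunk to keep memory use bounded.
            while chunk := f.read(step):
                sha1.update(chunk)
    return sha1.hexdigest()
```

For files smaller than step both modes give the same digest; for larger files the one-block digest covers only the leading step bytes, so it is faster but does not detect changes further into the file.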
- Scripts.DatabankLib.databankio.create_databank_directories(sim: Mapping, sim_hashes: Mapping, out: str, *, dry_run_mode: bool = False) str [source]
Create a nested output directory structure to save simulation results.
The directory structure is generated based on the hashes of the simulation input files.
- Parameters:
sim (Mapping) – A dictionary containing simulation metadata, including the “SOFTWARE” key.
sim_hashes (Mapping) – A dictionary mapping file types (e.g., “TPR”, “TRJ”) to their hash information. The structure is expected to be {‘TYPE’: [(‘filename’, ‘hash’)]}.
out (str) – The root output directory where the nested structure will be created.
dry_run_mode (bool) – If True, the directory path is resolved but not created. Defaults to False.
- Returns:
The full path to the created output directory.
- Return type:
str
- Raises:
FileExistsError – If the target output directory already exists and is not empty.
NotImplementedError – If the simulation software is not supported.
RuntimeError – If the target output directory could not be created.