Scripts.DatabankLib.databankio module
- module databankio:
Input/Output auxiliary module
- description:
Input/Output auxiliary module with some small, useful functions. It covers network communication, downloading files, checking links, resolving DOIs, and calculating file hashes.
- Scripts.DatabankLib.databankio.retry_with_exponential_backoff(max_attempts=3, delay_seconds=1)
Decorator that retries a function with exponential backoff.
- Parameters:
max_attempts (int) – The maximum number of attempts. Defaults to 3.
delay_seconds (int) – The initial delay between retries in seconds. The delay doubles after each failed attempt. Defaults to 1.
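The retry behaviour described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the name retry_with_backoff_sketch and the injectable sleep argument are hypothetical, and the choice to retry on every exception and to raise ConnectionError at the end is an assumption.

```python
import functools
import time


def retry_with_backoff_sketch(max_attempts=3, delay_seconds=1, sleep=time.sleep):
    """Hypothetical stand-in that mimics the documented retry behaviour."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = delay_seconds
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:  # assumption: every exception is retried
                    if attempt == max_attempts:
                        # Assumption: the final failure surfaces as ConnectionError.
                        raise ConnectionError(
                            f"{func.__name__} failed after {max_attempts} attempts"
                        ) from exc
                    sleep(delay)
                    delay *= 2  # the delay doubles after each failed attempt

        return wrapper

    return decorator


waits = []  # records the requested delays instead of really sleeping
calls = {"count": 0}


@retry_with_backoff_sketch(max_attempts=4, delay_seconds=1, sleep=waits.append)
def flaky():
    """Fails three times, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 4:
        raise OSError("temporary failure")
    return "ok"


result = flaky()
```

Passing a recording function as sleep keeps the example instant while still showing the delays 1, 2, and 4 that would be requested.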
- Scripts.DatabankLib.databankio.get_file_size_with_retry(uri: str) → int
Fetches the size of a file from a URI with retry logic.
- Parameters:
uri (str) – The URL of the file.
- Returns:
The size of the file in bytes, or 0 if the ‘Content-Length’ header is not present.
- Return type:
int
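The fallback to 0 can be illustrated with a small sketch. The helper names size_from_headers and fetch_size_sketch are hypothetical, and using a HEAD request is an assumption about how the size might be obtained; the network function is defined but not called here.

```python
import urllib.request
from typing import Mapping


def size_from_headers(headers: Mapping[str, str]) -> int:
    """Return Content-Length as an int, or 0 when the header is absent."""
    value = headers.get("Content-Length")
    return int(value) if value is not None else 0


def fetch_size_sketch(uri: str, timeout: float = 10.0) -> int:
    """Ask the server for headers only (assumption: HEAD is supported)."""
    request = urllib.request.Request(uri, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return size_from_headers(response.headers)


known = size_from_headers({"Content-Length": "2048"})
missing = size_from_headers({})
```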
- Scripts.DatabankLib.databankio.download_with_progress_with_retry(uri: str, dest: str, fi_name: str) → None
Downloads a file with a progress bar and retry logic.
Uses tqdm to display a progress bar during the download.
- Parameters:
uri (str) – The URL of the file to download.
dest (str) – The local destination path to save the file.
fi_name (str) – The name of the file, used for the progress bar description.
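A chunked transfer with progress reporting might look like the sketch below. It swaps tqdm for a plain callback so that it runs with the standard library only; copy_with_progress and on_progress are hypothetical names, and in-memory streams stand in for the network response and the destination file.

```python
import io


def copy_with_progress(source, sink, chunk_size=8192, on_progress=None):
    """Copy source to sink in chunks, reporting the size of each chunk."""
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        copied += len(chunk)
        if on_progress is not None:
            on_progress(len(chunk))  # a tqdm bar's update method fits here
    return copied


updates = []
source = io.BytesIO(b"x" * 20)
sink = io.BytesIO()
total = copy_with_progress(source, sink, chunk_size=8, on_progress=updates.append)
```

With tqdm installed, the bound method of a progress bar (its update method) could be passed as on_progress in place of updates.append.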
- Scripts.DatabankLib.databankio.download_resource_from_uri(uri: str, dest: str, override_if_exists: bool = False, dry_run_mode: bool = False) → int
Download file resource from a URI to a local destination.
Checks if the file already exists and has the same size before downloading. Can also perform a partial “dry-run” download.
- Parameters:
uri (str) – The URL of the file resource.
dest (str) – The local destination path to save the file.
override_if_exists (bool) – If True, the file will be re-downloaded even if it already exists. Defaults to False.
dry_run_mode (bool) – If True, only a partial download is performed (up to MAX_DRYRUN_SIZE). Defaults to False.
- Returns:
A status code indicating the result:
0: Download was successful.
1: Download was skipped because the file already exists.
2: File was re-downloaded due to a size mismatch.
- Return type:
int
- Raises:
ConnectionError – If the download still fails after multiple attempts.
OSError – If the downloaded file size does not match the expected size.
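Callers may find it convenient to give the three documented status codes readable names. The sketch below is only an illustration: the DownloadStatus enum and the describe helper are hypothetical and are not part of the module.

```python
from enum import IntEnum


class DownloadStatus(IntEnum):
    """Hypothetical names for the documented integer status codes."""

    DOWNLOADED = 0
    SKIPPED_EXISTS = 1
    REDOWNLOADED_SIZE_MISMATCH = 2


def describe(code: int) -> str:
    """Turn a returned status code into a short log message."""
    messages = {
        DownloadStatus.DOWNLOADED: "download was successful",
        DownloadStatus.SKIPPED_EXISTS: "skipped, file already exists",
        DownloadStatus.REDOWNLOADED_SIZE_MISMATCH: "re-downloaded after size mismatch",
    }
    return messages[DownloadStatus(code)]


message = describe(1)
```

Because IntEnum members compare equal to plain integers, a value returned by download_resource_from_uri could be passed to describe directly.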
- Scripts.DatabankLib.databankio.resolve_doi_url(doi: str, validate_uri: bool = True) → str
Resolves a DOI to a full URL and, if validate_uri is True, validates that it is reachable.
- Parameters:
doi (str) – The DOI identifier (e.g., “10.5281/zenodo.1234”).
validate_uri (bool) – If True, checks if the resolved URL is a valid and reachable address. Defaults to True.
- Returns:
The full, validated DOI link (e.g., “https://doi.org/…”).
- Return type:
str
- Raises:
urllib.error.HTTPError – If the DOI resolves to a URL, but the server returns an HTTP error code (e.g., 404 Not Found).
ConnectionError – If the server cannot be reached after multiple retries.
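The resolution step can be sketched as below. It assumes the link is formed by prefixing the DOI with the doi.org resolver address; doi_to_url and check_reachable are hypothetical names, and the reachability check is defined but not executed here because it needs network access.

```python
import urllib.request


def doi_to_url(doi: str) -> str:
    """Build the resolver link (assumption: plain prefix + DOI)."""
    return "https://doi.org/" + doi


def check_reachable(url: str, timeout: float = 10.0) -> int:
    """Open the URL and return the HTTP status.

    urllib raises urllib.error.HTTPError for error responses such as 404.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


url = doi_to_url("10.5281/zenodo.1234")
```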
- Scripts.DatabankLib.databankio.calc_file_sha1_hash(fi: str, step: int = 67108864, one_block: bool = True) → str
Calculates the SHA1 hash of a file.
If one_block is False, the file is read in chunks so that large files can be hashed efficiently.
- Parameters:
fi (str) – The path to the file.
step (int) – The block size in bytes used when reading the file. Defaults to 67108864 (64 MiB).
one_block (bool) – If True, only the first step bytes of the file are read. If False, the entire file is read in chunks of step bytes. Defaults to True.
- Returns:
The hexadecimal SHA1 hash of the file content that was read.
- Return type:
str
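The difference between the two reading modes can be shown with a short sketch built on hashlib. The function sha1_sketch is a hypothetical stand-in written from the parameter descriptions above, not the module's own code.

```python
import hashlib
import os
import tempfile


def sha1_sketch(path: str, step: int = 67108864, one_block: bool = True) -> str:
    """Hash either the first `step` bytes or the whole file in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        if one_block:
            digest.update(handle.read(step))  # first block only
        else:
            # Read until an empty bytes object signals end of file.
            for chunk in iter(lambda: handle.read(step), b""):
                digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as handle:
        handle.write(b"abcdefghij")
    first_block = sha1_sketch(path, step=4, one_block=True)
    whole_file = sha1_sketch(path, step=4, one_block=False)
```

With a step of 4 bytes, the first call hashes only b"abcd", so the two digests differ; for files no larger than step they would be identical.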
- Scripts.DatabankLib.databankio.create_databank_directories(sim: Mapping, sim_hashes: Mapping, out: str, dry_run_mode: bool = False) → str
Creates a nested output directory structure to save simulation results.
The directory structure is generated based on the hashes of the simulation input files.
- Parameters:
sim (Mapping) – A dictionary containing simulation metadata, including the “SOFTWARE” key.
sim_hashes (Mapping) – A dictionary mapping file types (e.g., “TPR”, “TRJ”) to their hash information. The structure is expected to be {‘TYPE’: [(‘filename’, ‘hash’)]}.
out (str) – The root output directory where the nested structure will be created.
dry_run_mode (bool) – If True, the directory path is resolved but not created. Defaults to False.
- Returns:
The full path to the created output directory.
- Return type:
str
- Raises:
FileExistsError – If the target output directory already exists and is not empty.
NotImplementedError – If the simulation software is not supported.
RuntimeError – If the target output directory could not be created.
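The sketch below illustrates the general idea of a hash-based nested directory, including the dry-run and non-empty-directory behaviour described above. The layout is an assumption made for illustration only (one directory level per file type, named by its hash); the real function may nest differently. The name nested_dir_sketch and the keys argument are hypothetical, and the software check is omitted.

```python
import os
import tempfile
from typing import Mapping, Sequence


def nested_dir_sketch(
    sim_hashes: Mapping,
    out: str,
    keys: Sequence[str] = ("TPR", "TRJ"),
    dry_run_mode: bool = False,
) -> str:
    """Resolve (and optionally create) out/<hash of keys[0]>/<hash of keys[1]>/..."""
    # Expected structure: {'TYPE': [('filename', 'hash')]}
    parts = [sim_hashes[key][0][1] for key in keys]
    target = os.path.join(out, *parts)
    if os.path.isdir(target) and os.listdir(target):
        raise FileExistsError(f"{target} already exists and is not empty")
    if not dry_run_mode:
        os.makedirs(target, exist_ok=True)
    return target


hashes = {"TPR": [("run.tpr", "aaa111")], "TRJ": [("run.xtc", "bbb222")]}
with tempfile.TemporaryDirectory() as root:
    planned = nested_dir_sketch(hashes, root, dry_run_mode=True)
    planned_exists = os.path.isdir(planned)  # dry run: nothing is created
    created = nested_dir_sketch(hashes, root)
    created_exists = os.path.isdir(created)
    relative = os.path.relpath(created, root)
```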