Scripts.DatabankLib.databankio module

module databankio:

Input/Output auxiliary module

description:

Input/Output auxiliary module with some small useful functions. It includes:
  • Network communication.

  • Downloading files.

  • Checking links.

  • Resolving DOIs.

  • Calculating file hashes.

Scripts.DatabankLib.databankio.retry_with_exponential_backoff(max_attempts=3, delay_seconds=1)

Decorator that retries a function with exponential backoff.

Parameters:
  • max_attempts (int) – The maximum number of attempts. Defaults to 3.

  • delay_seconds (int) – The initial delay between retries in seconds. The delay doubles after each failed attempt. Defaults to 1.
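The behaviour described by these two parameters can be sketched as follows. This is a minimal illustration, not the module's implementation: it assumes that any exception triggers a retry and that the last exception is re-raised once the attempts are exhausted, neither of which is stated above.

```python
import functools
import time


def retry_with_exponential_backoff(max_attempts=3, delay_seconds=1):
    """Retry the wrapped function, doubling the delay after each failure."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = delay_seconds
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:  # assumption: the real module may catch narrower types
                    if attempt == max_attempts:
                        raise  # out of attempts: propagate the last error
                    time.sleep(delay)
                    delay *= 2  # the delay doubles after each failed attempt

        return wrapper

    return decorator
```

With the defaults, a function that keeps failing is called three times, with pauses of 1 s and 2 s between the calls.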

Scripts.DatabankLib.databankio.get_file_size_with_retry(uri: str) → int

Fetches the size of a file from a URI with retry logic.

Parameters:

uri (str) – The URL of the file.

Returns:

The size of the file in bytes, or 0 if the ‘Content-Length’ header is not present.

Return type:

int

Scripts.DatabankLib.databankio.download_with_progress_with_retry(uri: str, dest: str, fi_name: str) → None

Downloads a file with a progress bar and retry logic.

Uses tqdm to display a progress bar during the download.

Parameters:
  • uri (str) – The URL of the file to download.

  • dest (str) – The local destination path to save the file.

  • fi_name (str) – The name of the file, used for the progress bar description.

Scripts.DatabankLib.databankio.download_resource_from_uri(uri: str, dest: str, override_if_exists: bool = False, dry_run_mode: bool = False) → int

Download file resource from a URI to a local destination.

Checks if the file already exists and has the same size before downloading. Can also perform a partial “dry-run” download.

Parameters:
  • uri (str) – The URL of the file resource.

  • dest (str) – The local destination path to save the file.

  • override_if_exists (bool) – If True, the file will be re-downloaded even if it already exists. Defaults to False.

  • dry_run_mode (bool) – If True, only a partial download is performed (up to MAX_DRYRUN_SIZE). Defaults to False.

Returns:

A status code indicating the result.

  • 0: Download was successful.

  • 1: Download was skipped because the file already exists.

  • 2: File was re-downloaded due to a size mismatch.

Return type:

int

Raises:
  • ConnectionError – An error occurred after multiple download attempts.

  • OSError – The downloaded file size does not match the expected size.
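The relationship between the existence check and the three status codes can be sketched as below. `expected_status` is a hypothetical helper, not part of the module; it only decides which code would apply and performs no transfer. Returning 0 when override_if_exists is True is an assumption, since the status for that case is not spelled out above.

```python
import os


def expected_status(dest, remote_size, override_if_exists=False):
    """Return the status code a download to `dest` would end with.

    Hypothetical helper for illustration only; `remote_size` stands for the
    size reported by the server for the resource.
    """
    if override_if_exists or not os.path.isfile(dest):
        return 0  # plain download (assumed to also cover the override case)
    if os.path.getsize(dest) == remote_size:
        return 1  # same size on disk: the download is skipped
    return 2  # size mismatch: the file is downloaded again
```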

Scripts.DatabankLib.databankio.resolve_doi_url(doi: str, validate_uri: bool = True) → str

Resolves a DOI to a full URL and, if validate_uri is True, checks that it is reachable.

Parameters:
  • doi (str) – The DOI identifier (e.g., “10.5281/zenodo.1234”).

  • validate_uri (bool) – If True, checks if the resolved URL is a valid and reachable address. Defaults to True.

Returns:

The full, validated DOI link (e.g., “https://doi.org/…”).

Return type:

str

Raises:
  • urllib.error.HTTPError – If the DOI resolves to a URL, but the server returns an HTTP error code (e.g., 404 Not Found).

  • ConnectionError – If the server cannot be reached after multiple retries.
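A minimal sketch of the resolution step, assuming the link is built by prefixing the DOI with https://doi.org/ (as the example return value suggests) and that validation is a plain request to that address. `doi_to_url` is a hypothetical name, and the retry logic and ConnectionError handling of the real function are left out.

```python
import urllib.request


def doi_to_url(doi, validate_uri=True):
    """Build the doi.org link for `doi` and optionally check that it opens."""
    url = "https://doi.org/" + doi
    if validate_uri:
        # urlopen follows redirects and raises urllib.error.HTTPError
        # when the server answers with an error status such as 404.
        with urllib.request.urlopen(url, timeout=10):
            pass
    return url
```

Calling it with validate_uri=False needs no network access, which can be convenient for offline runs.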

Scripts.DatabankLib.databankio.calc_file_sha1_hash(fi: str, step: int = 67108864, one_block: bool = True) → str

Calculates the SHA1 hash of a file.

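Depending on one_block, either hashes only the first step bytes of the file or reads the whole file in chunks of step bytes, which keeps memory use bounded for large files.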

Parameters:
  • fi (str) – The path to the file.

  • step (int) – The chunk size in bytes for reading the file. Defaults to 64 MiB (67108864 bytes). If one_block is True, only the first step bytes are read.

  • one_block (bool) – If True, reads the first step bytes of the file. If False, reads the entire file in chunks of step bytes. Defaults to True.

Returns:

The hexadecimal SHA1 hash of the bytes read (the first step bytes if one_block is True, otherwise the entire file).

Return type:

str
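The two reading modes described for one_block can be sketched with hashlib as follows. `sha1_of_file` is a hypothetical stand-in written for illustration; it follows the parameter description above and is not copied from the module.

```python
import hashlib


def sha1_of_file(path, step=64 * 1024 * 1024, one_block=True):
    """Return the SHA1 hex digest of the first block or of the whole file."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        if one_block:
            # Hash only the first `step` bytes (fewer if the file is shorter).
            sha1.update(fh.read(step))
        else:
            # Read until EOF in chunks of `step` bytes to bound memory use.
            for block in iter(lambda: fh.read(step), b""):
                sha1.update(block)
    return sha1.hexdigest()
```

For files no larger than step the two modes give the same digest; for larger files they differ, so hashes are only comparable when computed with the same settings.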

Scripts.DatabankLib.databankio.create_databank_directories(sim: Mapping, sim_hashes: Mapping, out: str, dry_run_mode: bool = False) → str

Creates a nested output directory structure to save simulation results.

The directory structure is generated based on the hashes of the simulation input files.

Parameters:
  • sim (Mapping) – A dictionary containing simulation metadata, including the “SOFTWARE” key.

  • sim_hashes (Mapping) – A dictionary mapping file types (e.g., “TPR”, “TRJ”) to their hash information. The structure is expected to be {‘TYPE’: [(‘filename’, ‘hash’)]}.

  • out (str) – The root output directory where the nested structure will be created.

  • dry_run_mode (bool) – If True, the directory path is resolved but not created. Defaults to False.

Returns:

The full path to the created output directory.

Return type:

str

Raises:
  • FileExistsError – If the target output directory already exists and is not empty.

  • NotImplementedError – If the simulation software is not supported.

  • RuntimeError – If the target output directory could not be created.
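How a hash-based path could be derived and guarded is sketched below. `make_output_dir` is hypothetical, and the nesting it uses (a three-character prefix of the “TPR” hash, then the full “TPR” and “TRJ” hashes) is an assumption made for illustration; the layout actually used by the module is not specified above. The software check and the RuntimeError case are omitted.

```python
import os


def make_output_dir(sim_hashes, out, dry_run_mode=False):
    """Resolve, and unless in dry-run mode create, a hash-based directory."""
    # sim_hashes is expected to look like {'TYPE': [('filename', 'hash')]}.
    tpr_hash = sim_hashes["TPR"][0][1]
    trj_hash = sim_hashes["TRJ"][0][1]
    # Assumed layout, for illustration only.
    path = os.path.join(out, tpr_hash[:3], tpr_hash, trj_hash)
    if os.path.isdir(path) and os.listdir(path):
        raise FileExistsError(f"Output directory is not empty: {path}")
    if not dry_run_mode:
        os.makedirs(path, exist_ok=True)
    return path
```

In dry-run mode the function only returns the path, so it can be used to preview where results would be written.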