Welcome to NMRlipids databank’s documentation!
NMRlipids databank is a community-driven catalogue containing atomistic MD simulations of biologically relevant lipid membranes emerging from the NMRlipids open collaboration.
NMRlipids databank is an overlay databank.
Each databank entry is a simulation described by a README.yaml file, which contains all the information essential for data upcycling and reuse. This includes the permanent location of each simulation file; the raw data itself is stored in distributed locations outside the NMRlipids databank. The content of README.yaml files is described in User input and content of README.yaml files. The README.yaml files are stored in the NMRlipids databank git in subfolders named after file hash identities. For details about the overlay databank structure, see the NMRlipids databank manuscript.
NMRlipids Databank-GUI provides easy access to the NMRlipids Databank content through a graphical user interface (GUI). Simulations can be searched based on their molecular composition, force field, temperature, membrane properties, and quality; the search results are ranked by simulation quality as evaluated against experimental data, when available. Membranes can be visualized, and properties can be compared between different simulations and experiments.
The NMRlipids Databank-API provides programmatic access to all simulation data in the NMRlipids Databank. This enables a wide range of novel data-driven applications, from the construction of machine learning models that predict membrane properties to the automatic analysis of virtually any property across all simulations in the Databank. For examples of novel analyses enabled by the NMRlipids databank API, see the NMRlipids databank manuscript.
Functions available for simulation analyses are described in NMRlipids databank API functions. A project template designed to initialize projects that analyse data from the NMRlipids databank contains a minimal example for looping over available simulations. For further examples, see the codes that analyze the area per lipid, C-H bond order parameters, X-ray scattering form factors, and principal component equilibration. For these analyses, the universal molecule and atom names are connected to simulation-specific names using README.yaml and mapping files, as described in Universal molecule and atom names.
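As a rough illustration of what looping over databank entries involves, the sketch below walks a local clone of the databank git and collects every README.yaml file it finds. It is not the project template and does not use the databank API functions; the names `find_readme_files` and `load_entry` are hypothetical, and parsing assumes that PyYAML is installed.

```python
import os


def find_readme_files(root):
    """Yield paths of all README.yaml files below root, in sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes the traversal order deterministic
        if "README.yaml" in filenames:
            yield os.path.join(dirpath, "README.yaml")


def load_entry(path):
    """Parse one README.yaml file into a dict (requires PyYAML)."""
    import yaml  # imported here so that file discovery works without PyYAML

    with open(path) as handle:
        return yaml.safe_load(handle)
```

Calling `load_entry` on each yielded path gives a dict whose keys are the README.yaml fields, for example `SOFTWARE` or `COMPOSITION`.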
Adding simulations into the NMRlipids databank
The NMRlipids Databank is open for additions of simulation data by anyone. For detailed instructions on adding new data, updating databank analyses, and running quality evaluations, see Adding simulations into the NMRlipids databank. The minimal steps to add a new simulation are:
Add the trajectory and topology files (tpr for Gromacs; pdb or the corresponding format for other programs) to a Zenodo repository.
Create an info.yaml file containing the essential information on your simulation by filling in the template. For instructions, see User input and content of README.yaml files and examples. Mapping files are described in Universal molecule and atom names and are available from here.
Save the created info.yaml file into a new directory, named with the next free integer, in the Scripts/BuildDatabank/info_files/ folder of the NMRlipids databank git, and make a pull request to the main branch.
Do not hesitate to ask for assistance via GitHub issues.
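Before opening a pull request, it can help to check that a draft info.yaml contains every compulsory key. The sketch below is one possible way to do that on an already parsed file (a plain dict). The key list is copied from the entries marked compulsory in the contents listing of this documentation; the function name `missing_keys` and the placeholder values are hypothetical, and the check is not part of the databank scripts.

```python
# Keys marked "compulsory" in the contents listing of this documentation.
COMPULSORY_KEYS = (
    "DOI", "TRJ", "TPR", "SOFTWARE", "PREEQTIME",
    "TIMELEFTOUT", "COMPOSITION", "DIR_WRK", "TYPEOFSYSTEM",
)


def missing_keys(info, united_atom=False):
    """Return the compulsory keys that are absent from an info.yaml-like dict."""
    required = list(COMPULSORY_KEYS)
    if united_atom:
        # UNITEDATOM_DICT is compulsory only for united atom trajectories.
        required.append("UNITEDATOM_DICT")
    return [key for key in required if key not in info]


# Placeholder draft: the values are illustrative, not a valid entry.
draft = {"DOI": "<Zenodo DOI>", "SOFTWARE": "<simulation program>"}
print(missing_keys(draft))
```

The check only looks at key presence; whether the values are valid is still decided by the databank's own build scripts.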
Adding experimental data into the NMRlipids databank
Instructions are available at Adding experimental data into the NMRlipids databank.
The code has been tested in a Linux environment with Python 3.7 or newer and a recent Gromacs version installed.
Set up the environment using conda:
conda create --name databank python==3.7.16 MDAnalysis MDAnalysisTests
conda activate databank
(databank) pip install tqdm pyyaml
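After the setup, a quick check along the following lines can confirm that the environment matches the requirements stated above. This is only a sketch: the function name `check_environment` is hypothetical, and it assumes the Gromacs executable is called `gmx` (some installations use a different name, such as an MPI-suffixed variant).

```python
import importlib.util
import shutil
import sys


def check_environment(modules=("MDAnalysis", "tqdm", "yaml"), gmx="gmx"):
    """Return a list of human-readable problems; the list is empty if none were found."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7 or newer is required")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("missing Python module: " + name)
    # Assumption: the Gromacs executable is named "gmx" unless told otherwise.
    if shutil.which(gmx) is None:
        problems.append("Gromacs executable not found on PATH: " + gmx)
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks fine"]:
        print(line)
```

Note that PyYAML is installed as `pyyaml` but imported as `yaml`, which is why the module list differs from the pip command.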
- List and descriptions of NMRlipids databank files
- User input and content of README.yaml files
  - DOI (compulsory)
  - TRJ (compulsory)
  - TPR (compulsory)
  - SOFTWARE (compulsory)
  - PREEQTIME (compulsory)
  - TIMELEFTOUT (compulsory)
  - COMPOSITION (compulsory)
  - DIR_WRK (compulsory)
  - UNITEDATOM_DICT (compulsory for united atom trajectories)
  - TYPEOFSYSTEM (compulsory)
  - Individual force field names for molecules
  - CPT (Gromacs)
  - LOG (Gromacs)
  - TOP (Gromacs)
  - COMPOSITION (output)
- Adding simulations into the NMRlipids databank
- Adding experimental data into the NMRlipids databank
- Universal molecule and atom names
- NMRlipids databank API functions
- Examples and tutorials