Data Organization
The Databank includes three types of objects: Simulation, Experiment, and Molecule. Each is described by it’s own metadata file. Metadata is stored in YAML format. Simulation object stores precomputed properties in JSON format. Experiment object stores experimental data in JSON format.
Databank objects are stored in the separated repository; by default, it’s a BilayerData but any of its forks or even completely new repo could be used instead.
It is connected into the NMRlipids Databank repository as a submodule into ./Data
folder.
Data/
├── Simulations/
│ └── rw5/
│ └── g8v/
│ └── au3/
│ └── b8c/
│ ├── data-piece.json
│ └── README.yaml
├── experiments/
│ ├── FormFactors/
│ │ └── 10.1001/
│ │ └── j.some.thing.3242.11/
│ │ └── 1/
│ │ ├── form-factor-data.json
│ │ └── README.yaml
│ └── OrderParameter/
│ │ └── 10.1001/
│ │ └── j.some.thing.3242.11/
│ │ └── 1/
│ │ ├── lipid1-data.json
│ │ ├── lipid2-data.json
│ │ └── README.yaml
├── info_files/
│ └── some-folder/
│ ├── info1.yaml
│ └── info2.yaml
├── lipid_json_buildH/
│ ├── ua-dictionary-1.json
│ └── ua-dictionary-2.json
├── Ranking/
│ ├── lipid1-headgroup-ranking.json
│ └── lipid2-tail1-ranking.json
└── Molecules/
├── membrane/
│ └── lipid1/
│ ├── lipid1-forcefield2-mapping.yaml
│ └── metadata.yaml
└── solution/
└── ion1/
├── ion1-forcefield3-mapping.yaml
└── metadata.yaml
Simulation record
Simulation metadata README.yaml
A file containing all the relevant information of each simulation entry. These files are stored at Scripts.DatabankLib.NMLDB_SIMU_PATH
.
For more information see Simulation metadata (README.yaml).
Simulation computed properties
1. apl.json
contains area per lipid (Å2) as a function of time (ps) for a simulation.
See Scripts.DatabankLib.analyze.computeAPL()
for computing details.
2. POPC_OrderParameters.json
contains C-H bond order parameters and their uncertainties.
Key in json format gives the universal C and H atom names.
Values are average over lipids, standard deviation, and the standard error of the mean, respectively.
See Scripts.DatabankLib.analyze.computeOP()
for computing details.
3. FormFactor.json
contains simulated X-ray scattering form factor.
X-axis value unit are Å-1 and y-axis value units are e/nm2.
See Scripts.DatabankLib.analyze.computeFF()
for computing details.
4. XXXXDensity.json
contains electron densities calculated from trajectory.
We store separately lipid, water, and total densities.
X-axis units are nm and y-axis units e/nm3.
See Scripts.DatabankLib.analyze.computeFF()
for computing details.
5. thickness.json
contains thickness (nm) calculated from the trajectory.
See Scripts.DatabankLib.analyze.computeThickness()
for computing details.
6. eq_times.json
contains a dictionary with relative equilibration time of molecule principal components. It demonstrates how long time is required in this particular simulation to sample the ensemble for each lipid molecule.
See Scripts.DatabankLib.analyze.computeNMRPCA()
for computing details.
Simulation quality files
POPC_OrderParameters_quality.json
contains a dictionary with the quality of each C-H bond against experiments if available. First key is the DOI for the source of experimental data. Second key gives C-H pair univeral atom names. Third key gives values for simulation order parameter, its stardard deviation, standard error of the mean, experimental order parameter, its error, and finally the quality of the order parameter. Quality is the probability for the agreement between simulated and experimental values taking into account the error bars.
For more details, see the NMRlipids databank manuscript.
POPC_FragmentQuality.json
contains fragment qualities determined separately for each lipid in the simulation with experimental data available using Eq. (4) in the NMRlipids databank manuscript.SYSTEM_quality.json
contains total qualities averaged over different lipids for different membrane parts calculated from Eq. (5) in the NMRlipids databank manuscript.FormFactorQuality.json
contains quality of form factor against experimental data determined as described in the NMRlipids databank manuscript. Second term is the scaling coefficient for experimental intensities (Eq. (6) in the NMRlipids databank manuscript).
Experiment record
Order parameter NMR experiment
TODO: write the block
README.yaml
contains metadata of the experiment, such as DOI, lamellar phase preparation protocol, temperature, B0 field, ions concentration and other relevant information.POPC_OrderParameters.json
contains C-H bond order parameters and their experimental uncertainties defined for universal atom names.
Form factor X-ray experiment metadata
TODO: write the block
README.yaml
contains metadata of the experiment, such as DOI, properties of the source, type of liposome suspension, and other relevant information.FormFactor.json
contains X-ray scattering form factor data.
Molecule record
Molecule metadata
TODO: write the block
Mapping files
Describe connection between universal atom names and simulation specific atom names. These enable automatic analyses over simulations with different naming conventions. For more information, see Universal molecule and atom names
Other files
info.yaml
Contains information given by contributor when adding data into the NMRlipids databank, i.e. all non-recomputable fields of simulation README.yaml. This file is given as an input to Scripts.BuildDatabank.AddData
to create README.yaml
files: python3 AddData.py -f info.yaml
. These files are not required but currently are stored historically in info_files subfolder of Scripts.DatabankLib.NMLDB_DATA_PATH
.
For more information see Simulation metadata (README.yaml) and Adding simulations.
Lipid united-atom dictionary
TODO: write the block