Usage
All datasets in MDAnalysisData are accessible via fetch_* functions in the MDAnalysisData.datasets module. Datasets are organized in submodules by the type of simulations that they represent. The currently included datasets are:
- AdK equilibrium trajectory without water.
- Ensembles of AdK transitions.
- Molecular dynamics trajectory of a single PEG chain in TIP3P water.
- MD simulation of I-FABP with water.
- NhaA equilibrium trajectory without water.
- YiiP equilibrium trajectory with water.
- Large vesicles library (coarse grained).
- Coarse-grained molecular dynamics of an amphiphilic fiber.
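Because every dataset is exposed through a fetch_* function, the available fetchers can also be discovered programmatically. The helper below is a hypothetical sketch (list_fetchers is not part of MDAnalysisData); it assumes only that the fetch_* functions are importable from MDAnalysisData.datasets, as described above, and simply filters the module namespace by name.

```python
from __future__ import annotations

import types


def list_fetchers(module: types.ModuleType) -> list[str]:
    """Return the sorted names of callables in *module* whose names start with 'fetch_'.

    Hypothetical helper for illustration; not part of MDAnalysisData.
    """
    return sorted(
        name
        for name, obj in vars(module).items()
        if name.startswith("fetch_") and callable(obj)
    )


# Usage (assumes MDAnalysisData is installed):
#   from MDAnalysisData import datasets
#   print(list_fetchers(datasets))
```

Filtering on callability as well as on the name prefix avoids picking up any non-function attribute that happens to share the fetch_ prefix.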
Accessing a dataset
As an example, we can access the AdK equilibrium dataset with the fetch_adk_equilibrium() function:
>>> from MDAnalysisData import datasets
>>> adk = datasets.fetch_adk_equilibrium()
This will download the dataset from figshare (doi: 10.6084/m9.figshare.5108170.v1) and unpack it into a cache directory. Only the first call to fetch_adk_equilibrium() is therefore slow; later calls use the cached files. The returned Bunch object can be introspected to see what the dataset includes. In particular, it has a DESCR attribute with a human-readable description of the dataset:
>>> print(adk.DESCR)
AdK equilibrium trajectory dataset
==================================
MD trajectory of apo adenylate kinase with CHARMM27 force field and
simulated with explicit water and ions in NPT at 300 K and 1
bar. Saved every 240 ps for a total of 1.004 µs. Produced on PSC
Anton. The trajectory only contains the protein and all solvent
stripped. Superimposed on the CORE domain of AdK by RMSD fitting.
The topology is contained in the PSF file (CHARMM format). The
trajectory is contained in the DCD file (CHARMM/NAMD format).
Notes
-----
Data set characteristics:
:size: 161 MB
:number of frames: 4187
:number of particles: 3341
:creator: Sean Seyler
:URL: `10.6084/m9.figshare.5108170.v1 <https://doi.org/10.6084/m9.figshare.5108170.v1>`_
:license: `CC-BY 4.0 <https://creativecommons.org/licenses/by/4.0/legalcode>`_
:reference: [Seyler2017]_
.. [Seyler2017] Seyler, Sean; Beckstein, Oliver (2017): Molecular dynamics
trajectory for benchmarking
MDAnalysis. figshare. Fileset. doi:
`10.6084/m9.figshare.5108170.v1
<https://doi.org/10.6084/m9.figshare.5108170.v1>`_
The topology and trajectory files can be accessed through the topology and trajectory attributes:
>>> print(adk.topology)
>>> print(adk.trajectory)
and one can immediately load them into an MDAnalysis.Universe:
>>> import MDAnalysis as mda
>>> u = mda.Universe(adk.topology, adk.trajectory)
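The attribute-style access used above (adk.DESCR, adk.topology, adk.trajectory) follows the "bunch" container pattern: a dictionary whose keys can also be read as attributes. The class below is an illustrative sketch of that pattern, not MDAnalysisData's own Bunch; it assumes the library's object behaves like a dictionary (as the scikit-learn class of the same name does), in which case adk.keys() would list everything the dataset provides. The file names in the test are made up.

```python
from __future__ import annotations

from typing import Any


class Bunch(dict):
    """Dictionary whose keys can also be read and set as attributes.

    Illustrative sketch of the container pattern; not MDAnalysisData's own class.
    """

    def __getattr__(self, key: str) -> Any:
        try:
            return self[key]
        except KeyError:
            # Raise AttributeError so hasattr() and getattr(obj, name, default) work.
            raise AttributeError(key) from None

    def __setattr__(self, key: str, value: Any) -> None:
        # Store attributes as dictionary items so both access styles stay in sync.
        self[key] = value
```

Keeping the data in the dictionary itself means that the usual dict tools (keys(), items(), in) double as the introspection interface.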
Managing data
When data are downloaded from a remote location, they are copied to a local data directory and cached. Subsequently, the cached copy is used. By default, data are stored locally in the data directory ~/MDAnalysis_data (i.e., under the user’s home directory).
The location of the data directory can be changed by setting the environment variable MDANALYSIS_DATA, for instance:
export MDANALYSIS_DATA=/tmp/MDAnalysis_data
All fetch_* functions also have a keyword argument data_home that can be used to set an alternative data directory.
The location of the data directory can be obtained with MDAnalysisData.base.get_data_home().
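The three ways of choosing the data directory can be summarized as a precedence order: an explicit data_home argument wins, otherwise MDANALYSIS_DATA is used if set, otherwise ~/MDAnalysis_data. The function below is a hypothetical re-implementation of that order for illustration only (resolve_data_home is not a library function; use MDAnalysisData.base.get_data_home() in practice). Creating the directory when it is missing is an assumption of this sketch.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_data_home(data_home: str | os.PathLike | None = None) -> Path:
    """Pick the data directory: explicit argument, then $MDANALYSIS_DATA, then ~/MDAnalysis_data.

    Hypothetical sketch; the real function is MDAnalysisData.base.get_data_home().
    """
    if data_home is None:
        # Fall back to the environment variable, then to the default under $HOME.
        data_home = os.environ.get("MDANALYSIS_DATA", os.path.join("~", "MDAnalysis_data"))
    path = Path(data_home).expanduser()
    # Assumption of this sketch: make sure the directory exists before returning it.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Setting os.environ["MDANALYSIS_DATA"] from Python has the same effect as the shell export above, provided it happens before the directory is resolved.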
If a dataset or the whole data directory is removed, the data are downloaded again the next time they are needed. Datasets distributed as archives (zip or tar files) are stored both as the archive and as the unpacked data; removing the archive triggers a re-download because only the archive itself is verified against the checksum.
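The archive check described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes a SHA-256 checksum, and it treats both a missing archive and a checksum mismatch as reasons to download again. The helper names sha256sum and archive_needs_download are hypothetical. Note that the unpacked files are never inspected, which is why deleting only the archive forces a fresh download.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_needs_download(archive: Path, expected_sha256: str) -> bool:
    """Return True if the archive is missing or its checksum does not match.

    Hypothetical sketch: only the archive is checked, not the unpacked files.
    """
    if not archive.exists():
        return True
    return sha256sum(archive) != expected_sha256
```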
Only datasets that are needed are downloaded. However, the full data directory can take up more than 2 GB of space. Subdirectories (e.g., datasets that are currently not needed) may be deleted manually, and the whole data directory can be wiped (removed) with the function MDAnalysisData.base.clear_data_home().
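Before deleting subdirectories by hand, it can help to see which datasets take up the most space. The helper below is a hypothetical sketch (subdirectory_sizes is not part of MDAnalysisData) that sums file sizes under each top-level subdirectory of the data directory; it assumes, as described above, that datasets are stored in subdirectories of that directory.

```python
from __future__ import annotations

import os
from pathlib import Path


def subdirectory_sizes(data_home: str | os.PathLike) -> dict[str, int]:
    """Map each top-level subdirectory of *data_home* to its total file size in bytes.

    Hypothetical helper for illustration; loose files at the top level are ignored.
    """
    sizes: dict[str, int] = {}
    for entry in Path(data_home).iterdir():
        if entry.is_dir():
            # rglob("*") walks the whole subtree; only regular files contribute bytes.
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
    return sizes


# Usage (assumes MDAnalysisData is installed):
#   from MDAnalysisData.base import get_data_home
#   for name, size in sorted(subdirectory_sizes(get_data_home()).items(),
#                            key=lambda item: item[1], reverse=True):
#       print(f"{size / 1e6:8.1f} MB  {name}")
```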