03 Aug 2020
On June 18, 2020, MDAnalysis was pleased to release its first major version, 1.0.0. As described in our 2019 roadmap, this is the last version that supports Python 2.7. We will continue backporting relevant bug fixes where feasible (e.g. in the upcoming 1.0.1), but the next major release will be 2.0.0, which will support only Python 3.6 and later.
As we look forward to this next milestone, it is time to consider the future directions of MDAnalysis. The development of MDAnalysis has always been driven by the growing need for standardised, accessible analysis tools for open, reproducible, and collaborative research. While many major packages for molecular dynamics simulation provide their own set-up and analysis software, these tools are necessarily tailored to each package’s own standards. MDAnalysis aims to provide analysis tools for simulation data in general, so historically a key objective has been to expand the number of supported package-specific data formats. As of version 1.0.0, we support over 40 file formats used in major packages for both molecular dynamics and quantum chemistry.
In 1.0.0 we also began to explore an exciting new approach: direct interoperability with other popular packages for molecular analysis through API compatibility, rather than file-format compatibility alone. This approach was also reinforced by discussions at the 2019 MolSSI Workshop: Molecular Dynamics Software Interoperability. Our new converters, which are distinct from topology parsers and coordinate readers, provide a third avenue for loading data into MDAnalysis. In 1.0.0 we added converters for two libraries: the molecular editor ParmEd, and chemfiles, a library for reading data from computational chemistry formats.
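To make the difference between file-format and API compatibility concrete, here is a minimal, library-free sketch of the converter idea: instead of writing and re-reading a file, a converter translates directly between one package’s in-memory objects and another’s. All names below (`ToyAtomGroup`, `CONVERTERS`, `register`, `to_dictlib`, `from_dictlib` and the “dictlib” package) are hypothetical; only the method name `convert_to` echoes MDAnalysis’s `AtomGroup.convert_to()`, and this is not MDAnalysis’s actual implementation.

```python
# Minimal, library-free sketch of API-level conversion.
# Hypothetical names throughout; this is NOT MDAnalysis's implementation.
from dataclasses import dataclass

# Registry: upper-case package name -> function that converts a ToyAtomGroup.
CONVERTERS = {}


def register(lib):
    """Decorator that registers a converter function under a package name."""
    def decorator(func):
        CONVERTERS[lib.upper()] = func
        return func
    return decorator


@dataclass
class ToyAtomGroup:
    """Hypothetical stand-in for a group of atoms: names and coordinates."""
    names: list
    positions: list  # one (x, y, z) tuple per atom

    def convert_to(self, package):
        """Dispatch to the converter registered for *package* (case-insensitive)."""
        try:
            converter = CONVERTERS[package.upper()]
        except KeyError:
            raise TypeError(f"no converter registered for {package!r}") from None
        return converter(self)


@register("DICTLIB")
def to_dictlib(ag):
    """Export to the in-memory objects of a hypothetical library "dictlib"."""
    return [{"name": n, "xyz": p} for n, p in zip(ag.names, ag.positions)]


def from_dictlib(records):
    """Import direction: build a ToyAtomGroup from dictlib objects, no file I/O."""
    return ToyAtomGroup(
        names=[r["name"] for r in records],
        positions=[r["xyz"] for r in records],
    )
```

In this sketch the round trip `convert_to("dictlib")` followed by `from_dictlib()` never touches the file system, and supporting a further library would only require registering one more pair of functions.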
The general lack of interoperability between software packages in the molecular modelling community was highlighted in the 2019 report of the NSF MolSSI on Molecular Dynamics Software Interoperability, which noted consequences such as great duplication of effort in developing and maintaining similar tools for different formats; significant barriers to collaborating and transferring data; and the need for scientists to learn multiple packages and languages to access the full breadth of available analysis algorithms.
Moving forward, our plan is to increase the range of analyses and formats accessible to users by becoming interoperable with other relevant libraries. This reduces the need to duplicate and maintain existing tools within our own framework, and allows MDAnalysis to become a general-purpose analysis toolkit. We are already expanding the set of compatible libraries for 2.0.0 by adding support for the widely used RDKit cheminformatics toolkit, through a Google Summer of Code project carried out by Cédric Bouysset (@cbouy).
By the end of 2021, we aim to have expanded the range of our Converters framework to include packages in three categories: widely-used analysis libraries, such as MDTraj and pytraj; libraries that can expand the range of formats we can support, such as OpenBabel; and direct interfaces with computational chemistry engines such as OpenMM and Psi4.
Ensuring robust interoperability is best done as a community effort. If you are interested in contributing, or have comments or suggestions on our future directions, please get in touch!
— @MDAnalysis/coredevs
20 May 2020
We are happy to announce that MDAnalysis is hosting three GSoC
students this year – @hmacdope, @cbouy, and @yuxuanzhuang. This is
the first year that MDAnalysis has been accepted as its own
organization with GSoC, and we are grateful to Google for granting us
three student slots for three exciting projects.
Trajectory storage has always proved problematic for the molecular
simulation community, as large volumes of data can be generated
quickly. Traditional trajectory formats suffer from poor portability,
large file sizes and limited ability to include metadata relevant to
simulation. The Trajectory New Generation (TNG) format, developed by
the GROMACS team, is the first trajectory format to combine small
file sizes, metadata storage, archive integrity verification and
user/software signatures. The primary goal of this project is for
@hmacdope to refactor the existing TNG code into C++ to provide
clarity and usability for GROMACS, other simulation packages and
analysis tools. Thin Fortran and Python layers, which would
encourage widespread adoption, are a secondary goal of the
project. An efficient and transferable implementation of the TNG
format will represent a major step forward for the computational
molecular sciences community, enabling easy storage and replication of
simulations.
This project is a collaboration with the GROMACS developer team,
with @acmnpv from GROMACS serving as co-mentor.
Hugo MacDermott-Opeskin is a PhD student in computational chemistry at
the Australian National University. His work focuses on studying
membrane biophysics through molecular dynamics simulations coupled
with enhanced sampling techniques. Hugo can be found on GitHub as
@hmacdope and on Twitter as @hugomacdermott.
When not hard at work, Hugo can be found running or mountain biking
in the Canberra hills.
Through GSoC, Hugo aims to bring the next-generation TNG trajectory
format to the simulation community, and he will document his experience
on his “Biophysics Bonanza” blog.
Cédric Bouysset: From RDKit to the Universe and back
The aim of the RDKit interoperability project is to give MDAnalysis
the ability to use RDKit’s Chem.Mol structure as an input to an
MDAnalysis Universe, but also to convert a Universe or AtomGroup to
an RDKit molecule. RDKit is one of the most complete and most
commonly used cheminformatics packages, yet it lacks file readers for
formats typically encountered in MD simulations. @cbouy will
implement in MDAnalysis the ability to switch back and forth between
a Universe and an RDKit molecule, so that typical cheminformatics
calculations can be performed on simulation data, adding a lot of
value to both packages.
Cédric is a PhD student in molecular modelling at Université Côte
d’Azur, France. His research aims to decipher the molecular basis
of chemosensory perception (smell and taste) using computational
tools. His day-to-day work includes modelling bitter taste receptors,
building machine-learning models to search for molecules with
interesting olfactive or sapid properties, maintaining the website
of the Global Consortium of Chemosensory Researchers, and a bit of
teaching. In his free time he enjoys cooking and playing video games.
Cédric can be found on GitHub as @cbouy and on Twitter as
@cedricbouysset.
Cédric will describe his progress in his blog.
Yuxuan Zhuang: Serialize Universes for parallel
As we approach the exascale barrier, researchers are handling
increasingly large volumes of molecular dynamics (MD) data. Whilst
MDAnalysis is a flexible and relatively fast framework for complex
analysis tasks in MD simulations, a parallel computing framework
would play a pivotal role in reducing the time to solution for such
large datasets. Parallel frameworks typically send objects to worker
processes by serializing (pickling) them, so @yuxuanzhuang will first
implement serialization support for Universe, the core of
MDAnalysis. He will then use this new serialization functionality to
accelerate MDAnalysis’ analysis modules with parallel and distributed
computing frameworks such as Dask, multiprocessing, or MPI.
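A minimal sketch of why serialization comes first: an object that wraps an open file cannot be pickled as-is, but it can be made picklable by storing only what is needed to re-open the file (here the filename and precomputed frame offsets) and re-opening it when unpickled. Everything below is hypothetical and uses only the Python standard library: a text file with one number per line stands in for a trajectory, and `PicklableTrajectory`, `block_sum` and `parallel_mean` are illustrative names, not MDAnalysis APIs. The split-apply-combine analysis at the end is one common parallel pattern, not necessarily the design the project will adopt.

```python
# Sketch: a reader that wraps an open file yet can still be pickled,
# so it can be shipped to worker processes. PicklableTrajectory,
# block_sum and parallel_mean are hypothetical, not MDAnalysis APIs.


class PicklableTrajectory:
    """Toy trajectory: a text file with one float per line, one line per frame."""

    def __init__(self, filename):
        self.filename = filename
        self._fh = open(filename, "rb")
        # Index the byte offset of every frame once, up front.
        self._offsets = []
        while True:
            pos = self._fh.tell()
            if not self._fh.readline():
                break
            self._offsets.append(pos)

    def __len__(self):
        return len(self._offsets)

    def frame(self, i):
        """Seek to frame i and parse its value."""
        self._fh.seek(self._offsets[i])
        return float(self._fh.readline().decode())

    def __getstate__(self):
        # Open file objects cannot be pickled: keep everything except the handle.
        state = self.__dict__.copy()
        del state["_fh"]
        return state

    def __setstate__(self, state):
        # Runs in the receiving process: restore attributes and re-open the file.
        self.__dict__.update(state)
        self._fh = open(self.filename, "rb")

    def close(self):
        self._fh.close()


def block_sum(task):
    """Work unit: sum the frames in [start, stop) of one trajectory copy."""
    traj, start, stop = task
    return sum(traj.frame(i) for i in range(start, stop))


def parallel_mean(traj, n_blocks=2, map_func=map):
    """Split frames into blocks, map block_sum over them, combine the results.

    Passing multiprocessing.Pool.map as map_func pickles each task, and
    therefore each trajectory copy, to a worker process.
    """
    n = len(traj)
    bounds = [n * k // n_blocks for k in range(n_blocks + 1)]
    tasks = [(traj, bounds[k], bounds[k + 1]) for k in range(n_blocks)]
    return sum(map_func(block_sum, tasks)) / n
```

Called with the default `map_func`, `parallel_mean` runs serially. Passing the `map` method of a `multiprocessing.Pool` (created inside an `if __name__ == "__main__":` guard on platforms that spawn new processes) sends a pickled copy of the trajectory with every task, which is exactly where `__getstate__` and `__setstate__` come into play.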
Yuxuan is a PhD student at Stockholm University. He mainly works on
understanding pentameric ligand-gated ion channels from MD simulations.
His daily workflow involves setting up and running simulations
on lab clusters or HPC centers, and performing various analyses of the
MD trajectories in his Jupyter notebook. Yuxuan can be found on GitHub
as @yuxuanzhuang.
Yuxuan will chronicle his work on his blog.
— @richardjgowers @IAlibay @acmnpv @fiona-naughton @orbeckst (mentors)
10 Mar 2020
The inaugural Google Season of Docs (GSoD) 2019 has wrapped up. Google
sponsored technical writers to work with open source projects on
their documentation. MDAnalysis was one of the GSoD projects, with
technical writer @lilyminium.
She successfully completed her project “A user guide structured by
topic” and shared her thoughts in her blog post “Project report: A
user guide for MDAnalysis”.
Quick Start Guide
Especially for new users, @lilyminium created the new Quick Start
Guide, which is now the recommended first tutorial when learning
MDAnalysis.
User Guide
The new User Guide is meant to make it easy for all users to
quickly become productive with MDAnalysis.
It starts with a Getting Started section with installation
instructions, examples, the Quick Start Guide, and a FAQ. A discussion
of the key data structures follows because understanding how to
work with Universe and AtomGroup is fundamental to MDAnalysis. A
section on selections explains how to create AtomGroups. The
next chapters explain working with trajectories (including the new
on-the-fly transformations) and general input/output. Most analysis
classes are described and explained with examples, making the analysis
section especially useful for anyone who “quickly wants to run
analysis X” on their own trajectories.
The User Guide also documents a number of important internals and
usage patterns as well as the development process, which makes it a key
reference for intermediate users and developers.
As one seasoned core developer said: “Amazing, reading this I can
still learn new things about MDAnalysis!”
You can already see the pre-1.0 version of the new User Guide on
our website; an expanded version of the User Guide will be released
together with the upcoming 1.0 release of MDAnalysis.
More to come…
Furthermore, the new MDAnalysis docs will follow the
layout and style of the User Guide.
Finally, @lilyminium will continue working with MDAnalysis as
our newest core developer!
— @richardjgowers, @orbeckst