Blog

Google Summer of Code 2022

Google Summer of Code with
MDAnalysis 2022

MDAnalysis has been accepted as an organization for Google Summer of Code 2022! If you are interested in working with us this summer and you are new to open source, please read the advice and links below and write to us on the mailing list.

We are looking forward to all applications. Note that the GSoC program was updated compared to previous years: GSoC will welcome not just students, but any new and beginner open source contributors over 18 years old. Projects are also now scoped as either 175-hour (medium) or 350-hour (long) size. Finally, the duration can be extended from the standard 12 weeks to 22 weeks.

The application window deadline is April 19, 2022 18:00 (UTC). As part of the application process you must familiarize yourself with Google Summer of Code 2022.

If you are interested in working with us, please read on and contact us on our mailing list. Apply as soon as possible at https://summerofcode.withgoogle.com; the application window opens on April 4, 2022, but potential GSoC Contributors are expected to familiarize themselves with the application requirements and mentoring organizations before then. It’s also never too early to discuss application ideas with us!

Project Ideas

If you have your own idea about a potential project we’d love to work with you to develop this idea; please write to us on the developer list to discuss it there.

We have also listed several possible projects for you to work on. Our initial list of ideas (see summaries in the table below) contains projects of different scope and with different skill requirements. However, check the ideas page as well: we might add more ideas after this post is published.

The listed skills are suggested rather than essential, although they will be used as part of our decision criteria in choosing GSoC contributors. Our only essential requirement is that you need to demonstrate to us that you’re able and keen to learn anything that you don’t know yet, and we will be happy to help you learn during your project with us.

# | project name | difficulty | project size | description | skills | mentors
1 | Generalise Groups | medium | 350 hours | Generalise concept of groups | Python, NetworkX, Molecular modelling | @lilyminium, @fiona-naughton, @richardjgowers, @IAlibay, @micaela-matta
2 | Type hinting | medium | 175 hours | Add type hints to the MDAnalysis library | Python | @IAlibay, @jbarnoud
3 | Extend MDAnalysis Interoperability | medium | 350 hours | Extend converters module to other relevant packages | Python, Molecular modelling | @lilyminium, @IAlibay, @fiona-naughton, @hmacdope
4 | Benchmarking and performance optimization | medium | 175 hours | Write benchmarks for automated performance analysis and address performance bottlenecks | Python | @hmacdope, @orbeckst, @jbarnoud
5 | Context-aware guessers | medium | 350 hours | Extend how the library guesses properties such as bonds, masses or atom symbols; and write guessers that know about the context of the system (database of origin, force field…) | Python, Molecular modelling | @jbarnoud, @micaela-matta, @IAlibay

Information for prospective GSoC Contributors

You must meet our own requirements if you want to be a GSoC Contributor with MDAnalysis this year (read all the docs behind these links!). You must also meet the eligibility criteria. Our GSoC FAQ collects common questions from applicants.

The MDAnalysis community values diversity and is committed to providing a productive, harassment-free environment to every member. Our Code of Conduct explains the values that we as a community uphold. Every member (and every GSoC Contributor) agrees to follow the Code of Conduct.

As a start to get familiar with MDAnalysis and open source development you should follow these steps:

Complete the Quick Start Guide

We have a Quick Start Guide explaining the basics of MDAnalysis. You should go through it at least once to understand how MDAnalysis is used. Continue reading the User Guide to learn more.

Introduce yourself to us

Introduce yourself on the mailing list. Tell us your GitHub handle, what you plan to work on during the summer, or what you have already done with MDAnalysis.

Close an issue of MDAnalysis

You must have at least one commit in the development branch of MDAnalysis in order to be eligible, i.e., you must demonstrate that you have been seriously engaged with the MDAnalysis project. We have a list of easy bugs and suggested GSoC Starter issues to work on in our issue tracker on GitHub. We only accept one GSoC Starter issue per applicant so that everybody gets a chance. If you want to dive deeper, we encourage you to tackle some of the other issues in our issue tracker.

We also appreciate contributions which add more tests or update/improve our documentation.

We recommend you start your application by working on an issue. It will give you a better understanding of MDAnalysis as a project and improve the quality of your application.

To start developing for MDAnalysis have a look at our guide on contributing to MDAnalysis and write to us on the mailing list if you have more questions about setting up a development environment or how to contribute.

— Hugo @hmacdope, Micaela @micaela-matta, Jonathan @jbarnoud, Richard @richardjgowers, Lily @lilyminium, Fiona @fiona-naughton, Irfan @IAlibay, Oliver @orbeckst

Call for participation - NumFOCUS open-source DEI research project

MDAnalysis is working with our fiscal sponsor NumFOCUS on a research project funded by the Gordon & Betty Moore Foundation to understand the barriers to participation that contributors, particularly those from historically underrepresented groups, face in the open-source software community. The NumFOCUS research team would like to talk to past and current contributors (both new and long-term), project developers, and maintainers about their experiences joining and contributing to MDAnalysis.

Interested in sharing your experiences?

Please complete this brief “Participant Interest” form which also contains additional information on the research goals, privacy, and confidentiality considerations. Your participation will be valuable to enabling the growth and sustainability of diverse and inclusive open-source software communities. Accepted participants will participate in a 30-minute interview with a NumFOCUS research team member.

Thank you for your interest!

MDAnalysis Command Line Interface (CLI)

The Logo of the MDAnalysis command line interface

We are happy to announce the first version of a command-line interface (CLI) for MDAnalysis! The CLI allows users to calculate radial distribution functions (RDF), root-mean-square deviations (RMSD), and many other quantities directly from the command line. For example, extracting the RDF between the water oxygens takes just one line:

mda interrdf -s topol.tpr -f traj.trr -g1 "name OW" -g2 "name OW"

For a detailed synopsis and a list of all available modules check the sections below as well as the documentation or the repository.

By providing easier access to the library for day-to-day analysis, mdacli serves not only the existing MDAnalysis community but also users who do not work in Python. The project evolved from the experiences of @joaomcteixeira in TaurenMD and of @PicoCentauri in MAICoS.

Installation

Install mdacli from PyPI

pip install mdacli

A Conda build will be released soon.

Examples

To use mdacli, after installation open your terminal and run

mda -h

This command prints a list of all available modules, similar to the list in the documentation. The synopsis for calculating a radial distribution function (RDF) between two groups is given by

mda interrdf -h

A sample trajectory of rigid SPC/E water is provided in the testfiles. topol.tpr is a GROMACS topology file and traj.trr is the corresponding trajectory. The oxygen-oxygen RDF can be calculated using

mda interrdf -s topol.tpr -f traj.trr -g1 "name OW" -g2 "name OW"

The oxygens are selected with the -g1 and -g2 flags. More verbose output is achieved by using the -v flag. Even more information is provided with the --debug flag. All warnings of the run can also be stored directly in a log file using the --logfile flag:

mda --debug --logfile rdf.log interrdf -v -s topol.tpr -f traj.trr -g1 "name OW" -g2 "name OW"
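For comparison, the same oxygen-oxygen RDF can be sketched directly with the MDAnalysis Python library. This is only an illustration of what the one-line command replaces, not a description of how mdacli is implemented. The function name water_oo_rdf is hypothetical, and excluding each atom's pairing with itself via exclude_block is a choice made here; whether the CLI applies the same setting is not stated in this post.

```python
def water_oo_rdf(topology="topol.tpr", trajectory="traj.trr"):
    """Return (bins, rdf) for the oxygen-oxygen RDF of a water trajectory.

    Hypothetical helper for illustration; it needs MDAnalysis (2.0 or newer)
    and the two input files to be available.
    """
    # Imported here so the sketch can be loaded even without MDAnalysis installed.
    import MDAnalysis as mda
    from MDAnalysis.analysis.rdf import InterRDF

    universe = mda.Universe(topology, trajectory)
    oxygens = universe.select_atoms("name OW")  # same selection as -g1 / -g2

    # exclude_block=(1, 1) skips the distance of every atom to itself,
    # which matters because both groups are identical (an assumption here).
    analysis = InterRDF(oxygens, oxygens, exclude_block=(1, 1))
    analysis.run()
    return analysis.results.bins, analysis.results.rdf
```

Calling water_oo_rdf() in a directory that contains topol.tpr and traj.trr would return the bin centers and the RDF values as arrays.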

The results of each analysis are stored by default in the current directory. The output directory can be changed with the -o flag and an additional prefix can be set using the -pre flag.

The results of the RDF calculation are two .csv files. The actual RDF is saved in the 2nd and 3rd columns of InterRDF_count_bins_rdf.csv. The header rows of each file provide information about the stored data. Simple results such as bare numbers or strings are stored as JSON dumps. More complex data, such as arrays with four or more dimensions, are saved as a set of CSV files zipped together.
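A result file of this kind could be read back into Python roughly as follows. This is a minimal sketch using only the standard library: the helper read_numeric_columns is hypothetical, and the sample text, including its header lines and column names, is made up for illustration, since the exact header layout written by mdacli is not shown in this post. The sketch only relies on the statement above that the 2nd and 3rd columns hold the RDF.

```python
import csv
import io


def read_numeric_columns(handle, columns):
    """Return the requested columns as lists of floats, skipping header rows."""
    data = [[] for _ in columns]
    for row in csv.reader(handle):
        try:
            values = [float(row[i]) for i in columns]
        except (ValueError, IndexError):
            continue  # header, comment, or blank row: nothing numeric to keep
        for store, value in zip(data, values):
            store.append(value)
    return data


# Made-up stand-in for InterRDF_count_bins_rdf.csv (layout is an assumption).
sample = io.StringIO(
    "# illustrative header\n"
    "count,bins,rdf\n"
    "0,0.1,0.0\n"
    "12,0.3,2.5\n"
    "9,0.5,1.1\n"
)

# Columns are 0-indexed, so (1, 2) selects the 2nd and 3rd columns.
bins, rdf = read_numeric_columns(sample, columns=(1, 2))
print(bins, rdf)
```

For a real file, the in-memory sample would be replaced by a handle from open("InterRDF_count_bins_rdf.csv", newline="").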

Contributions

Contributions are always welcome on GitHub, not only as code contributions via pull requests but also as issues, especially since the project is in an early development phase.

Motivation and Background

The CLI was developed because we regularly faced the same problem in our labs: new students and scientists start writing their own analysis scripts and run into the same questions, such as:

  • How to initialize the universe and loop through frames without copying many lines of code?
  • How to write a CLI parser to analyze several of their simulations?
  • How to process and save their trajectories in a clever way?

Some of these problems can be solved by using the MDAnalysis.analysis.base.AnalysisBase class of MDA. However, this class is limited to Python. A generic CLI wrapper for all classes based on AnalysisBase could therefore help people analyze their simulation data with the least effort. This approach is also easier for MDA users, since they stay within their known universe, with familiar selection commands and results structures. An existing framework also makes it more attractive for users and developers to write their analyses using base.AnalysisBase. For more technical details visit the documentation.
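The idea of a generic wrapper can be illustrated with a small sketch that builds a command-line parser from the constructor signature of a class. It is not the mdacli implementation: ToyAnalysis is a hypothetical stand-in for a class derived from AnalysisBase, and build_parser and instantiate_from_cli are names invented here. The sketch also assumes that every constructor parameter carries a simple type annotation such as str, int, or float.

```python
import argparse
import inspect


class ToyAnalysis:
    """Hypothetical stand-in for an AnalysisBase-derived class."""

    def __init__(self, selection: str, nbins: int = 75, rmax: float = 15.0):
        self.selection = selection
        self.nbins = nbins
        self.rmax = rmax


def build_parser(cls):
    """Create an argparse parser with one option per constructor parameter."""
    parser = argparse.ArgumentParser(
        prog=cls.__name__.lower(), description=cls.__doc__
    )
    for name, param in inspect.signature(cls).parameters.items():
        # Fall back to str when a parameter has no annotation.
        if param.annotation is inspect.Parameter.empty:
            arg_type = str
        else:
            arg_type = param.annotation
        if param.default is inspect.Parameter.empty:
            # No default in the signature: the option is mandatory.
            parser.add_argument(f"--{name}", type=arg_type, required=True)
        else:
            parser.add_argument(f"--{name}", type=arg_type, default=param.default)
    return parser


def instantiate_from_cli(cls, argv):
    """Parse argv and pass the resulting values to the class constructor."""
    args = build_parser(cls).parse_args(argv)
    return cls(**vars(args))


analysis = instantiate_from_cli(
    ToyAnalysis, ["--selection", "name OW", "--nbins", "100"]
)
print(analysis.selection, analysis.nbins, analysis.rmax)
```

Because the parser is derived from the signature, a new analysis class gets its command-line options without any hand-written parsing code, which is the kind of effort the wrapper is meant to save.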

@PicoCentauri and @joaomcteixeira