760 Open Source Applications on O2 [BioGrids]
QuickStart
BioGrids provides biomedical software titles on O2, e.g. AlphaFold, samtools and R. The full list of titles is available on the BioGrids website. All applications in the collection are curated daily by software engineers embedded in BCMP at Harvard Medical School. The collection uses the capsules technology developed at Harvard, which lets end users run every application in the collection without further configuration. BioGrids can also be installed on Linux workstations and Mac computers.
To start using BioGrids on O2 immediately, source the BioGrids environment (the same command is described in section 3 below):
$ source /programs/biogrids.shrc
All BioGrids applications are now available in your current shell as if they were natively installed on the system.
Detailed information on each title and its available versions is also easy to obtain. For samtools, for example, the bundled executables and the version report look like this:
ace2sam maq2sam-long plot-bamstats soap2sam.pl
blast2sam.pl maq2sam-short psl2sam.pl wgsim
bowtie2sam.pl md5fa sam2vcf.pl wgsim_eval.pl
export2sam.pl md5sum-lite samtools zoom2sam.pl
fasta-sanitize.pl novo2sam.pl samtools.pl
interpolate_sam.pl plot-ampliconstats seq_cache_populate.pl
Version information for: /programs/x86_64-linux/samtools
Default version: 1.20
In-use version: 1.20
Installed versions: 1.20 1.19.2 1.18 1.17 1.16.1 1.16 1.15.1 1.15 1.14 1.13 1.12 1.10 1.9 1.8 1.7 1.6 1.5 1.4.1 1.3.1 1.3 0.1.19
Other available versions: 1.19.2 1.18 1.17 1.16.1 1.16 1.15.1 1.15 1.14 1.13 1.12 1.10 1.9 1.8 1.7 1.6 1.5 1.4.1 1.3.1 1.3 0.1.19
Overrides use this shell variable: SAMTOOLS_X
You can list all installed titles and versions with the biogrids-cli command.
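The version report above can also be checked from a script. The following is a minimal sketch and not part of BioGrids: `version_installed` is a hypothetical helper that reads a report in the format shown above from standard input and tests whether a given version appears on the "Installed versions" line. The sample text is shortened from the samtools report above; how the report is produced on your system is deliberately left out.

```shell
# Hypothetical helper (not a BioGrids command): succeed if version $1
# is listed on the "Installed versions:" line of a report read from stdin.
version_installed() {
    sed -n 's/^Installed versions:[[:space:]]*//p' |
        tr -s ' ' '\n' |
        grep -Fxq -- "$1"
}

# Sample report text, shortened from the samtools output shown above.
report='Default version: 1.20
In-use version: 1.20
Installed versions: 1.20 1.19.2 1.18 1.17 1.9'

if printf '%s\n' "$report" | version_installed 1.9; then
    echo "samtools 1.9 is installed"
else
    echo "samtools 1.9 is not installed"
fi
```

Matching whole fields (`grep -Fx`) rather than substrings keeps a request for 1.19 from being satisfied by 1.19.2.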
BioGrids can be installed on Linux and Mac computers for both single and multi-user systems. See https://biogrids.org for more information.
For inquiries about SBGrid, please contact help@sbgrid.org (and see the section on SBGrid below).
BioGrids Details
Overview
The Academic Software Platform (ASP) integrates two major stacks of scientific software: the BioGrids stack of ~700 biomedical applications (https://biogrids.org/software/) and the SBGrid stack of ~500 structural biology applications. The BioGrids stack is ready for execution on the O2 cluster at Harvard Medical School and on the E2 cluster at Boston Children’s Hospital. Multiple versions of each application are maintained, and users can easily install the same software stack on laptops, research workstations or cloud resources running Linux or Mac OS X (Fig. 1). Access to BioGrids is offered to all research groups at Harvard and affiliated institutions.
Fig. 1. BioGrids applications are supported on a wide variety of computational platforms.
1. What are the benefits of using BioGrids compiled versions of applications?
An experienced team of software engineers, who complete over 200 software installations per year, professionally compile all BioGrids titles. The team (https://biogrids.org/about/staff/), which is embedded in the Department of Biological Chemistry and Molecular Pharmacology at Harvard Medical School, takes care of all software dependencies, completes extensive testing of each title, and preconfigures the execution environment for each application. All BioGrids applications are executed through the ASP environment, which ensures that there are no clashes in execution and library paths. Once an application is included in the BioGrids collection, all you have to do to launch the application is to call it by its name (Figure 2). No shell modifications, library inclusions or path modifications are needed.
2. What software titles are available?
As of March 2021, BioGrids supports 378 open source applications, with extensive coverage of bioinformatics, imaging, and scientific visualization. The full list of applications is always available on the BioGrids website at https://biogrids.org/software/.
Additional titles and new versions of existing applications can be requested by completing the form at https://biogrids.org/help/?tab=software or by email to help@biogrids.org.
Based on user requests, new applications and new versions of existing applications are added to the collection on a monthly basis, or more often on an as-needed basis. Please note that server-based software (web servers, database servers, docker containers, etc.) is generally not supported.
Notably, we also provide access to two commercial applications:
Human Gene Mutation Database (HGMD) - Tools For Variant Discovery software is available, as BioGrids coordinates a Longwood-wide license. Access is provided on a fiscal year basis, July-June of the following year. We grant complimentary access to users who submit requests within the current fiscal year, and ask those users to contribute to the shared license upon renewal (around $300/year/user for online access, and more for access to the downloadable database). To request access please complete the following form https://www.biogrids.org/hgmd/register. It typically takes ~5 business days to approve requests.
Schrodinger Drug Discovery platform (access limited to Harvard Medical School groups). As part of our long-standing engagement with Schrodinger, SBGrid (Morin et al., 2013) offers access to an extensive collection of Schrodinger Drug Discovery platform software. In conjunction with software access, we provide additional services, including technical support and installation, dedicated computational resources, compound libraries, scientific advice, and collaborative projects. There is no additional fee for exploratory use of this special medicinal chemistry platform. For moderate and heavy use, we will assess an additional fee to offset license and infrastructure costs. Please contact insilico@sbgrid.org to inquire about our support for small molecule discovery projects.
3. How can I run BioGrids applications on HMS and BCH clusters?
All BioGrids applications are preinstalled on O2 at HMS and on E2 at BCH in the /programs directory. To use these applications, source the biogrids environment with the following command (Figure 2):
$ source /programs/biogrids.shrc
Afterwards, users can execute all BioGrids applications without further configuration. For example, to run bowtie, just type bowtie at your command prompt. Alternatively, the applications can also be loaded as modules (see below), but the advantage of our native environment is the ability to mix and match any applications from BioGrids with simple calls to binaries.
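The same command can be placed at the top of a batch job. The following is a minimal sketch, assuming the cluster schedules jobs with Slurm (the `#SBATCH` lines are plain comments to any other shell). The job name, time limit and memory request are illustrative values, the partition is omitted because it is site-specific, and `BIOGRIDS_RC` is a hypothetical variable introduced here only so the path can be overridden.

```shell
#!/bin/bash
#SBATCH --job-name=biogrids-demo   # illustrative values, adjust for your job
#SBATCH --time=00:10:00
#SBATCH --mem=1G

# Path to the BioGrids environment file named in the text above.
# BIOGRIDS_RC is a hypothetical override, not a BioGrids setting.
BIOGRIDS_RC="${BIOGRIDS_RC:-/programs/biogrids.shrc}"

# "." is the portable spelling of "source".
if [ -r "$BIOGRIDS_RC" ]; then
    . "$BIOGRIDS_RC"
    status="loaded"
else
    status="missing"
fi
echo "BioGrids environment: $status"

# Once the environment is loaded, applications are called by name.
if command -v bowtie >/dev/null 2>&1; then
    bowtie --version
fi
```

Checking that the file is readable before sourcing it lets the same script fail with a clear message on a machine where BioGrids is not installed.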
Figure 2. BioGrids shell supports access to a stack of ~300 biomedical applications without further configuration.
4. Are applications supported as modules?
There are two ways to run BioGrids applications in a module environment (Figure S1). You can preload the entire BioGrids collection with ~700 applications as a single module with the following command:
$ module load biogrids/latest
Alternatively, you can activate modules for individual applications. To do so, first extend your MODULEPATH by adding the following line to your $HOME/.bashrc file:
export MODULEPATH=$MODULEPATH:/programs/share/modulefiles/x86_64-linux/biogrids
After this modification, the “module avail” command should display an additional 900 entries for all versions of BioGrids supported software.
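Because $HOME/.bashrc can be read more than once (for example in nested shells), the plain export above may append the same directory repeatedly. A minimal sketch of an idempotent variant follows; `add_biogrids_modulepath` is a hypothetical helper name, and only the directory path comes from the text above.

```shell
# Directory with the per-application BioGrids modulefiles (from the text above).
BIOGRIDS_MODULES=/programs/share/modulefiles/x86_64-linux/biogrids

# Hypothetical helper: append the directory to MODULEPATH only if it is
# not already there, and avoid a leading ":" when MODULEPATH is empty.
add_biogrids_modulepath() {
    case ":${MODULEPATH}:" in
        *":${BIOGRIDS_MODULES}:"*)
            ;;  # already present, nothing to do
        *)
            MODULEPATH="${MODULEPATH:+${MODULEPATH}:}${BIOGRIDS_MODULES}"
            export MODULEPATH
            ;;
    esac
}

add_biogrids_modulepath
echo "MODULEPATH=${MODULEPATH}"
```

Calling the function a second time leaves MODULEPATH unchanged.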
5. Where can I find full documentation and tutorials?
The software page (https://biogrids.org/software/) on the BioGrids website provides an excellent starting point to explore functionality of individual applications (Figure S2). For each software title we meticulously curate the following information: a) a brief description and relevant keywords, b) Find Out More page with additional version and citation information, c) link to software website and d) link to relevant web forums.
Additional documentation resources include:
Support wiki at https://biogrids.org/wiki, with information mostly focused on installation and configuration of the BioGrids environment
A YouTube Channel with training videos https://www.youtube.com/user/SBGridTV, which is currently mostly focused on structural biology software.
6. Are BioGrids titles available through Open OnDemand?
Open OnDemand (OOD) is an NSF-funded, open-source project for High Performance Computing (HPC) (Figure S3). It provides web-based graphical access to HPC compute nodes, enabling GUI-based applications, file management, and graphical desktops on compute clusters.
BioGrids has piloted Jupyter, RStudio, IGV and several other applications in OOD on HPC clusters at BCH and HMS. We provide templates for implementing other GUI-based BioGrids applications. No additional software installations are needed. The BioGrids environment greatly simplifies OOD templating: a template typically only needs the name of the cluster, the queue, and the particular BioGrids application. Further customization can be made to provide additional resources.
7. How can I run BioGrids applications on other computers/AWS?
In order to access BioGrids applications outside of O2/E2, you will need a Linux or Mac computer preconfigured with BioGrids software. You can complete this installation yourself (Figure S4), on a single computer or laptop, using our Installation Manager (https://biogrids.org/wiki/client_install), or work with your local system administrator to install the software on your network. Installation in cloud environments is also supported.
8. Can I run a particular version of a BioGrids application?
Definitely. The ASP environment supports access to multiple versions of each application. By default, you have access to the most recent, stable version of an application. To change the version in use, simply set a single environment variable, for example: $ export SAMTOOLS_X=1.9. Please visit the BioGrids wiki for full documentation: https://biogrids.org/wiki/versions. If you would like to use a version not currently available at your site, please contact help@biogrids.org.
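As a minimal sketch of the override in practice, the snippet below pins samtools to 1.9 for the current shell session and then removes the pin. Setting the variable before sourcing the environment is an assumption made here for safety; the text above only states that the variable needs to be set. The environment file path is the one given in section 3.

```shell
# Pin samtools to version 1.9 for this shell session only.
export SAMTOOLS_X=1.9

# Assumption: set the override before sourcing the environment.
# The file is only sourced if it exists on this machine.
if [ -r /programs/biogrids.shrc ]; then
    . /programs/biogrids.shrc
fi

# Show which version is picked up, if samtools is available here.
if command -v samtools >/dev/null 2>&1; then
    samtools --version | head -n 1
fi
echo "SAMTOOLS_X=${SAMTOOLS_X}"

# Remove the override so later calls are no longer pinned to 1.9.
unset SAMTOOLS_X
```

Keeping the export in a job script rather than in $HOME/.bashrc limits the pin to the analysis that needs it.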
9. I urgently need access to a software title/version which is not included in BioGrids. What to do?
Please email help@biogrids.org immediately and indicate that your request is urgent. We will review your request and provide an installation timeline within 1 business day. In the meantime, you can always install the application on your own in your home directory. We recommend that you transition to the BioGrids version once the application becomes available.
10. Can Research Computing advise me on the use of BioGrids software?
All Research Computing consultants at HMS and at BCH are familiar with the BioGrids setup and use BioGrids software. In addition, the following HMS and BCH staff are involved with the project:
BioGrids Campus Champions, who participate in regular operational huddles and are BioGrids experts
Alex Truong, Harvard Medical School
BioGrids Steering Committee, which meets annually and advises on the overall direction of the project
Bill Barnett - Harvard Medical School
Shira Rockowitz - Boston Children’s Hospital
11. BioGrids Updates and Newsletter
The BioGrids newsletter is distributed on a monthly basis in conjunction with the monthly software upgrade cycle. Information on recently added or updated software is available at https://biogrids.org/software/recent/archive/.
12. What is BioGrids and how is it supported?
BioGrids is part of the larger Academic Software Initiative (ASI, formerly SBGrid) - a non-profit, academic effort by the Sliz Group in the Department of Biological Chemistry and Molecular Pharmacology at Harvard Medical School to develop a distributed research computing infrastructure, with a focus on scientific software. The key elements of the initiative include:
Development and support of an Academic Software Platform (ASP)
Curation and licensing of ASP collections of scientific software
Development of an infrastructure to support software trainings (e.g. SBGrid YouTube channel, Quo Vadis Structural Biology Conference series)
Engagement in cutting edge biomedical research, with emphasis on piloting new software tools
Development of local and global computing resources (e.g. CryoEM Computing Center at Harvard Medical School, Open Science Grid WSMR portal, or HMS in-silico Environment), with emphasis on enabling cutting edge science
The specific focus of BioGrids is to maintain a stack of biomedical research applications. Its efforts support the BioGrids Consortium - a community of academics who use the BioGrids software stack.
13. Is the Academic Software Platform utilized more broadly? How is SBGrid related to BioGrids?
The ASP provides a general software execution platform that can support various collections of scientific applications. It was developed at Harvard Medical School in 2001, initially to support the structural biology community with the SBGrid stack of applications. The ASP SBGrid stack of 453 applications (https://sbgrid.org/software/) includes software collections such as X-ray Crystallography, Electron Microscopy, NMR, Computational Chemistry, Structure Visualization and Structure Analysis. The SBGrid stack is now used in 22 countries by 395 research groups (https://sbgrid.org/members/). Access to the SBGrid stack of applications is available to structural biologists who are SBGrid Consortium members. The BioGrids stack of applications, which uses the same ASP infrastructure, was established in 2016 and is now used by research groups at Harvard Medical School and Boston Children’s Hospital.
14. How is the software licensed?
Please note that use of BioGrids software is subject to the terms of the end-user licenses of the individual titles. In particular, commercial use might be limited. A full copy of each license is typically available on the website of the corresponding software; for your reference, we provide links to software websites on the BioGrids page https://biogrids.org/software/. SBGrid support is limited to technical guidance, and unless additional written arrangements are in place, all intellectual rights are retained by the users.
15. How to Acknowledge and Cite BioGrids?
In all publications which rely on the use of BioGrids-compiled software, please reference the primary SBGrid citation URL (Morin et al., 2013). Users can also include the BioGrids logo in the acknowledgement slides of their presentations.
References:
Supplemental Figures
Fig. S1. BioGrids modules loaded on the O2 cluster.
Fig. S2. A listing of all BioGrids applications is available on the BioGrids website (http://biogrids.org). Information about each software title includes a brief description, a link to the software website, citation information and links to relevant software forums.
Fig. S3. RStudio application available through HMS O2 Open OnDemand system.
Fig. S4. BioGrids software installer running on OS X operating system.