Using Java on O2

Because Java programs can be resource-intensive, they should only be run on compute nodes. To do so, submit a job through the Slurm scheduler, load a Java module, and then run commands as needed. See this wiki page for general information on submitting interactive or batch jobs on O2.

Java versions

We have several versions of Java on O2, which are accessible through our module system. Use the module spider command to search for available Java modules. 

mfk8@login01:~$ srun --pty -p interactive -t 0-1 bash   # request interactive job for 1 hour
mfk8@compute-a-01-01:~$ module spider java   # search for all Java modules
mfk8@compute-a-01-01:~$ module spider java/jdk-1.8u112   # get more information on a specific Java module, including how to load it
mfk8@compute-a-01-01:~$ module load java/jdk-1.8u112   # load a Java module
# can run Java now
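The same steps can be run non-interactively as a batch job. The following is a minimal sketch, assuming a partition named short and placeholder time and memory values. None of these settings come from this page, so check the O2 job submission documentation for appropriate values. The file name java_job.sh is hypothetical.

```shell
# Write a hypothetical batch script, java_job.sh, to the current directory.
cat > java_job.sh <<'EOF'
#!/bin/bash
#SBATCH -p short        # partition (assumed name)
#SBATCH -t 0-1          # time limit of 1 hour, as in the interactive example
#SBATCH --mem=2G        # memory request (illustrative value)

module load java/jdk-1.8u112
# ... should be replaced with the actual parameters to run the tool
java -jar /path/to/my/file.jar ...
EOF

# Submit it with:
#   sbatch java_job.sh
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the script while it is being written.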


Basic usage

Specifying path to jar file

There are multiple programs on O2, such as Trimmomatic, Picard, or IGVTools, for which the underlying code is written in Java. When you load a module for such a program, it will also automatically co-load a corresponding Java module for you. Then, when you run the program, you need to specify the full path to the JAR (archive file format that Java uses to package code and files required for program execution) like so:

mfk8@compute-a-01-01:~$ java -jar /path/to/my/file.jar ... # ... should be replaced with the actual parameters to run the tool


We have made it easier to construct commands for each Java-based program that has a module on O2. When this type of module is loaded, an environment variable containing the directory of the JAR is set. You can run module help or module spider on a specific module to learn the appropriate usage. Here is an example demonstrated using the trimmomatic/0.36 module:

# learn how to load trimmomatic/0.36
# as well as what the variable containing the directory of the JAR file is named
mfk8@compute-a-01-01:~$ module spider trimmomatic/0.36

----------------------------------------------------------------------------------
  trimmomatic: trimmomatic/0.36
----------------------------------------------------------------------------------
    Description:
      Trimmomatic performs a variety of useful trimming tasks for illumina paired-end and single ended data.

    This module can be loaded directly: module load trimmomatic/0.36

    Help:
      For detailed instructions, go to:
         http://www.usadellab.org/cms/?page=trimmomatic

      To use, type
         java -jar $TRIMMOMATIC/trimmomatic-0.36.jar [options]

# load module
mfk8@compute-a-01-01:~$ module load trimmomatic/0.36

# run trimmomatic
mfk8@compute-a-01-01:~$ java -jar $TRIMMOMATIC/trimmomatic-0.36.jar ...   # ... should be replaced with the actual parameters to run the tool

Specifying memory

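To constrain the memory usage of your Java program, add -Xmx<size>, where <size> is replaced with an amount of memory. This parameter sets the maximum heap size of the Java virtual machine (JVM). Adding -Xmx helps keep your program within the memory you requested for your job from the scheduler. Because the JVM also uses some memory outside the heap, the heap size should be somewhat smaller than the job's memory request. For example, to set the heap size to 5G:

mfk8@compute-a-01-01:~$ java -Xmx5g -jar /path/to/my/file.jar ...   # ... should be replaced with the actual parameters to run the tool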
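Inside a job script, the -Xmx value can also be derived from the job's memory request rather than typed by hand. The sketch below is illustrative only: heap_flag is a hypothetical helper, and the 1024 MB of headroom left for the JVM's non-heap memory is an assumed figure, not an O2 recommendation.

```shell
# Sketch: build an -Xmx flag that stays below the memory requested for the job.
# heap_flag and the default 1024 MB headroom are illustrative, not O2 defaults.
heap_flag() {
    local job_mem_mb=$1           # memory requested from the scheduler, in MB
    local headroom_mb=${2:-1024}  # memory left for JVM overhead outside the heap
    echo "-Xmx$(( job_mem_mb - headroom_mb ))m"
}

# A job that requested 6G (6144 MB) gets a 5G (5120 MB) heap.
heap_flag 6144    # prints -Xmx5120m
```

In a job script the result could be passed straight to Java, for example java $(heap_flag 6144) -jar /path/to/my/file.jar ..., where ... stands for the tool's own parameters. When a job is submitted with --mem, Slurm exports the request in MB as SLURM_MEM_PER_NODE, which could serve as the first argument.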

Commonly encountered problems

GUIs, like IGV, lag over X11

If you run a Java program with a graphical user interface (GUI) on O2 but display the graphics on your local computer using X11, you will find that the program is slow to respond, or lags. This behavior is expected, as HPC clusters are not designed for this use case. An example of this scenario is visualizing BAM files stored on O2 with IGV without moving the alignment files off of the cluster. Unfortunately, a graphical program like IGV running on O2 and displayed over X11 will not be as responsive as the same program running locally. Preferable solutions include: (1) downloading files from the cluster to open in IGV on a local computer, or (2) using the /n/groups/genomebrower_uploads storage space, which will create a URL for your file that can be used on your local computer. Access to /n/groups/genomebrower_uploads is granted by request only, so write to rchelp@hms.harvard.edu if you would like to use this space.

Overefficient jobs necessitate adjusting garbage collection settings

Sometimes, Java applications can be overefficient and try to use more cores than were requested through the scheduler. To prevent this problem, we now set the garbage collector to -XX:+UseSerialGC by default. As a result, explicitly requesting a parallel garbage collector (for example, with -XX:+UseParallelGC) will cause a conflict error.
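To catch this conflict before a job starts, a wrapper script could scan the options it is about to pass to java. The following is a minimal sketch; has_gc_conflict is a hypothetical helper, not something provided on O2, and the list of collector flags it checks is an illustrative assumption rather than a complete one.

```shell
# Sketch: return success (0) if any option selects a collector other than the
# serial one, which would clash with a default of -XX:+UseSerialGC.
# The flag list below is illustrative and may not cover every Java version.
has_gc_conflict() {
    local opt
    for opt in "$@"; do
        case "$opt" in
            -XX:+UseParallelGC|-XX:+UseG1GC|-XX:+UseConcMarkSweepGC)
                return 0 ;;
        esac
    done
    return 1
}

# Example: warn instead of launching when a conflicting flag is present.
if has_gc_conflict -Xmx5g -XX:+UseParallelGC; then
    echo "remove the explicit garbage collector flag before running java"
fi
```

Removing the explicit collector flag from the command, and leaving the default serial collector in place, avoids the error.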