Using Java on O2
Because Java programs can be resource-intensive, they should only be run on compute nodes. You can submit a job through the Slurm scheduler, load a Java module, and then run commands as needed. See this wiki page for general information on submitting interactive or batch jobs on O2.
Java versions
We have several versions of Java on O2, which are accessible through our module system. Use the module spider command to search for available Java modules.
mfk8@login01:~$ srun --pty -p interactive -t 0-1 bash # request interactive job for 1 hour
mfk8@compute-a-01-01:~$ module spider java # search for all Java modules
mfk8@compute-a-01-01:~$ module spider java/jdk-1.8u112 # get more information on a specific Java module, including how to load it
mfk8@compute-a-01-01:~$ module load java/jdk-1.8u112 # load a Java module
# can run Java now
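Once a module is loaded, it can be worth confirming which Java the shell will actually use. The following is a minimal sketch, not an O2-provided tool: check_java is a hypothetical helper name, and it relies only on the standard command -v shell builtin and java -version.

```shell
# Hypothetical helper: report which Java (if any) is on the PATH in this shell.
check_java() {
    if command -v java >/dev/null 2>&1; then
        echo "java found at: $(command -v java)"
        # java prints its version banner to stderr, so redirect it to stdout
        java -version 2>&1 | head -n 1
    else
        echo "java not found; load a Java module first"
    fi
}

check_java
```

If the reported path or version is not the one you expected, check which modules are loaded with module list.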
Basic usage
Specifying path to jar file
There are multiple programs on O2, such as Trimmomatic, Picard, or IGVTools, for which the underlying code is written in Java. When you load a module for such a program, it will also automatically co-load a corresponding Java module for you. Then, when you run the program, you need to specify the full path to the JAR (archive file format that Java uses to package code and files required for program execution) like so:
mfk8@compute-a-01-01:~$ java -jar /path/to/my/file.jar ...
# ... should be replaced with the actual parameters to run the tool
We have made it easier to construct commands for each Java-based program that has a module on O2. When this type of module is loaded, an environment variable containing the directory of the JAR is set. You can run module help or module spider on a specific module to learn the appropriate usage. Here is an example demonstrated using the trimmomatic/0.36 module:
# learn how to load trimmomatic/0.36
# as well as what the variable containing the directory of the JAR file is named
mfk8@compute-a-01-01:~$ module spider trimmomatic/0.36
----------------------------------------------------------------------------------
trimmomatic: trimmomatic/0.36
----------------------------------------------------------------------------------
Description:
Trimmomatic performs a variety of useful trimming tasks for illumina
paired-end and single ended data.
This module can be loaded directly: module load trimmomatic/0.36
Help:
For detailed instructions, go to:
http://www.usadellab.org/cms/?page=trimmomatic
To use, type
java -jar $TRIMMOMATIC/trimmomatic-0.36.jar [options]
# load module
mfk8@compute-a-01-01:~$ module load trimmomatic/0.36
# run trimmomatic
mfk8@compute-a-01-01:~$ java -jar $TRIMMOMATIC/trimmomatic-0.36.jar ...
# ... should be replaced with the actual parameters to run the tool
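If the java -jar command fails because the JAR cannot be found, the usual cause is that the module is not loaded in the current shell, so the directory variable is empty. The sketch below shows one way to check before running. jar_path is a hypothetical helper, not something provided on O2; the variable name TRIMMOMATIC and the JAR file name are taken from the module help shown above.

```shell
# Hypothetical helper: print the full JAR path, or fail with a reason.
#   $1 = directory held in the module's variable (e.g. "$TRIMMOMATIC")
#   $2 = JAR file name (e.g. trimmomatic-0.36.jar)
jar_path() {
    local dir="$1"
    local name="$2"
    if [ -z "$dir" ]; then
        echo "JAR directory variable is empty; is the module loaded?" >&2
        return 1
    fi
    if [ ! -f "$dir/$name" ]; then
        echo "no such JAR: $dir/$name" >&2
        return 1
    fi
    echo "$dir/$name"
}

# Safe to run even when the module is not loaded: it only reports the problem.
jar_path "${TRIMMOMATIC:-}" trimmomatic-0.36.jar || echo "load the module, then retry"

# After `module load trimmomatic/0.36`, the tool could then be run as:
#   java -jar "$(jar_path "$TRIMMOMATIC" trimmomatic-0.36.jar)" ...
```

Quoting the variable also guards against directory names that contain spaces.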
Specifying memory
To constrain the memory usage of your Java program, add -Xmx<size>, where <size> is replaced with an amount of memory. This parameter sets the maximum heap size of the Java virtual machine. Adding -Xmx helps keep your program within the memory you requested for your job from the scheduler. For example, -Xmx5g sets the maximum heap size to 5G.
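A minimal sketch of such a command, assuming the 5G heap limit from the text and the placeholder JAR path used earlier on this page. The variable names are illustrative only, and the snippet prints the command rather than running it so that it can be checked first.

```shell
# Heap limit from the text above.
heap_size="5g"

# Placeholder JAR path from the earlier example; substitute a real one.
jar="/path/to/my/file.jar"

# -Xmx must come before -jar so that the JVM, not the program, reads it.
cmd="java -Xmx${heap_size} -jar ${jar}"
echo "$cmd"

# To run it, type the command with the tool's own parameters in place of "...":
#   java -Xmx5g -jar /path/to/my/file.jar ...
```

Because the JVM also uses some memory outside the heap, it is reasonable to request slightly more from the scheduler than the -Xmx value (for example, 6G for a 5G heap; the exact margin here is an assumption, not an O2 recommendation).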
Commonly encountered problems
GUIs, like IGV, lag over X11
If you run a Java program with a graphical user interface (GUI) on O2 but display the graphics on your local computer using X11, you will find that the program is slow to respond, or lags. This behavior is expected, as HPC clusters are not designed for this use case. An example of this scenario is if you have BAM files on O2 that you would like to visualize with IGV without moving the alignment files off of the cluster. Unfortunately, you will not see the same responsiveness from a graphical program like IGV running on O2 and displayed on your local computer using X11 as you would when running the program locally. Preferable solutions include: (1) downloading files from the cluster to open in IGV on a local computer, or (2) using the /n/groups/genomebrower_uploads storage space, which will create a URL for your file that can be used on your local computer. Access to /n/groups/genomebrower_uploads is granted by request only, so write to rchelp@hms.harvard.edu if you would like to use this space.
Overefficient jobs necessitate adjusting garbage collection settings
Sometimes, Java applications can be overefficient and try to use more cores than were requested through the scheduler. To prevent this problem, we have set the garbage collector to -XX:+UseSerialGC by default. As a result, a conflict error will occur if a user explicitly requests a parallel garbage collector.
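To see which garbage collector settings are in effect in your current shell, you can ask the JVM to print its final flag values. This is a sketch under assumptions: show_gc_flags is a hypothetical helper name, and how the default is applied on O2 is not described here, so the helper simply reports what the JVM ends up with. -XX:+PrintFlagsFinal is a standard HotSpot option.

```shell
# Hypothetical helper: list the collector flags the JVM resolves in this shell.
show_gc_flags() {
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found; load a Java module first"
        return 0
    fi
    # -XX:+PrintFlagsFinal prints the final value of every JVM flag to stdout;
    # the version banner goes to stderr and is discarded here.
    java -XX:+PrintFlagsFinal -version 2>/dev/null \
        | grep -E 'Use(Serial|Parallel|G1)GC ' \
        || echo "no matching collector flags reported"
}

show_gc_flags
```

If your own command or script passes an explicit collector option such as -XX:+UseParallelGC, removing that option is the simplest way to avoid the conflict described above.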