Batch small jobs together as a big job

 

Many bioinformatics workflows run the same command on multiple files. When the files are small, each run may take only a few seconds to finish. Submitting each file as a separate job often causes the job scheduler to complain about short-running jobs.

Here we show, with an example, how to batch multiple small jobs into one larger job.

You can copy and paste the commands below into the O2 command line to see how this works.

Note: When copying and pasting commands, you can include any text starting with #. The shell treats it as a comment and ignores it.

Log on to O2

If you need help connecting to O2, please review the How to login to O2 wiki page.

From Windows, use MobaXterm or PuTTY to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.

From a Mac Terminal, use the ssh command, inserting your HMS ID instead of user123:

ssh user123@o2.hms.harvard.edu

Start an interactive job and create a working folder

Start an interactive job, then create a working directory on scratch and change into the newly created directory. For example, for user abc123, the working directory will be /n/scratch/users/a/abc123/testBatchJob:

srun --pty -p interactive -t 0-12:0:0 --mem 2000M -n 1 /bin/bash
mkdir /n/scratch/users/a/abc123/testBatchJob
cd /n/scratch/users/a/abc123/testBatchJob

Copy some test data to the current folder

cp /n/groups/shared_databases/rcbio/rsem/two_group_input/group1/* .

Take a look at the files
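
A minimal sketch of this step, assuming the test files were copied into the current directory. The file names and count depend on what the shared folder contains, and the variable name file_count is only illustrative.

```shell
# List the copied files with human-readable sizes
ls -lh

# Count the regular files in the current directory
# (file_count is an illustrative variable name)
file_count=$(find . -maxdepth 1 -type f | wc -l | tr -d ' ')
echo "Files to process: $file_count"
```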

Let us work on them one by one:
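
The one-job-per-file approach might look like the sketch below. Everything specific here is an assumption for illustration: the function name submit_each, the DRY_RUN switch, the partition name short, the resource limits, and wc -l as a stand-in for the real per-file command. With DRY_RUN left at its default of 1, the sbatch commands are only printed, not submitted.

```shell
# Submit one job per file (the pattern that triggers short-running-job warnings).
# submit_each and DRY_RUN are hypothetical names used only in this sketch.
submit_each() {
    local file
    for file in "$@"; do
        [ -f "$file" ] || continue   # skip anything that is not a regular file
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Print the command instead of submitting it
            echo "sbatch -p short -t 0-0:05 --mem 100M --wrap \"wc -l '$file'\""
        else
            # wc -l is a placeholder for the real per-file command
            sbatch -p short -t 0-0:05 --mem 100M --wrap "wc -l '$file'"
        fi
    done
}

# Print one sbatch command for each file in the current directory
submit_each ./*
```

Each file becomes its own job here, so a folder of small files produces many jobs that each finish within seconds.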

Batch them together:
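
One way to batch the work is a single job script that loops over all the files, so the scheduler sees one longer job instead of many short ones. This is a sketch under stated assumptions, not necessarily the exact commands intended here: the script name batch_job.sh, the partition name short, the time and memory limits, and wc -l as the per-file command are all illustrative.

```shell
# Write a job script that processes every file passed to it as an argument.
# batch_job.sh is a hypothetical name; adjust the #SBATCH lines to your needs.
cat > batch_job.sh <<'EOF'
#!/bin/bash
#SBATCH -p short      # assumed partition name
#SBATCH -t 0-0:30     # time limit for all files combined
#SBATCH --mem 100M

for file in "$@"; do
    case "$file" in
        *batch_job.sh|*slurm-*.out) continue ;;   # skip this script and Slurm logs
    esac
    [ -f "$file" ] || continue                    # skip non-regular files
    echo "working on $file"
    wc -l "$file"     # placeholder for the real per-file command
done
EOF

# Dry run in the current shell to check the loop before submitting
bash batch_job.sh ./*

# On the cluster, submit everything as one job:
#   sbatch batch_job.sh ./*
```

Passing the file list as arguments means the glob is expanded when the job is submitted, so log files created later by the job are not picked up as input.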

Let us know if you have any questions. Please include your working folder and the commands you used in your email. Comments and suggestions are welcome!