Many bioinformatics workflows run the same command on multiple files. When the files are small, the command may take only a few seconds to finish. Submitting the processing of each file as a separate job often causes the job scheduler to complain about short-running jobs.

...

From Windows, use MobaXterm or PuTTY to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.

From a Mac Terminal, use the ssh command, inserting your HMS ID instead of user123:

Code Block
ssh user123@o2.hms.harvard.edu

...

Create a working directory on scratch and change into the newly created directory. For example, for user abc123 (the first command starts an interactive session):

Code Block
srun --pty -p interactive -t 0-12:0:0 --mem 2000M -n 1 /bin/bash
mkdir /n/scratch/users/a/abc123/testBatchJob
cd /n/scratch/users/a/abc123/testBatchJob
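
The directory in the example follows a pattern: the user's scratch folder sits under a subdirectory named after the first letter of the user name. As a minimal sketch, assuming that layout holds for other users as it does for abc123, the path can be built from a user name. The function name `scratch_workdir` and the default project name are hypothetical and used only for illustration.

```shell
# Hypothetical helper: build the scratch working directory for a user name,
# ASSUMING the /n/scratch/users/<first letter>/<user> layout seen in the example.
scratch_workdir() {
    local user=$1
    local project=${2:-testBatchJob}
    # Remove everything after the first character, leaving the first letter.
    local first=${user%"${user#?}"}
    printf '/n/scratch/users/%s/%s/%s\n' "$first" "$user" "$project"
}

# Example with the user name from the text above.
scratch_workdir abc123
```

In an interactive session this could be combined with `mkdir -p "$(scratch_workdir "$USER")"` followed by `cd "$(scratch_workdir "$USER")"`, provided the assumed layout matches your account.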

...

Code Block
# We want to convert each of them into FASTA format; the command is available in the fastx module.
# You can submit the command for each file as a separate job, as shown below:
module load fastx/0.0.13

for fq in *.fq; do
    echo "submitting job for $fq"
    sbatch -p short -t 0-0:10:0 --mem 20M --mail-type=END --wrap "fastq_to_fasta -Q33 -i $fq -o ${fq%.fq}.fa"
done


# you should see output:
submitting job for t1_s1_1.fq
Submitted batch job 34710674

submitting job for t1_s1_2.fq
Submitted batch job 34710675

submitting job for t1_s2_1.fq
Submitted batch job 34710676

submitting job for t1_s2_2.fq
Submitted batch job 34710677

# After a few minutes, if you check the job reports, you will see that each job took only a few seconds to finish.
O2_jobs_report -j 34710674

# Output
       JobID  Partition          State               NodeList                Start      Timelimit        Elapsed    CPUTime   TotalCPU                 AllocTRES     MaxRSS
------------ ---------- -------------- ---------------------- -------------------- -------------- -------------- ---------- ---------- ------------------------- ----------
34710674                     COMPLETED       compute-a-16-166  2019-02-22T07:36:51                      00:00:00   00:00:00  00:02.829
34710674.ba+                 COMPLETED       compute-a-16-166  2019-02-22T07:36:44                      00:00:07   00:00:07  00:02.829    cpu=1,mem=0.02G,node=1      0.00G

# Each job runs for only a few seconds. This is inefficient for the scheduler, so it is better to process all 4 files in the same job.


# Or, if you prefer to use a Slurm script to submit the jobs:
module load fastx/0.0.13
for fq in *.fq; do
    echo "submitting job for $fq"
    sbatch job.sh "$fq"
done

# In job.sh: 
#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-00:10:00
#SBATCH --mem=20M
#SBATCH --mail-type=END

# $1 is the input .fq file; ${1%.fq} strips the .fq suffix to name the output
fastq_to_fasta -Q33 -i "$1" -o "${1%.fq}.fa"
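
The point made above, that the four conversions belong in one job, can be sketched as a single job script that loops over the files itself. This is a minimal sketch under stated assumptions, not necessarily the approach this page goes on to describe. The script name `batch_job.sh`, the function `convert_all`, and the `CONVERT_CMD` override (there only so the loop can be dry-run without the fastx module) are hypothetical. The partition, time, and memory values are copied from the per-file job above and may need adjusting for larger inputs.

```shell
#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-00:10:00
#SBATCH --mem=20M
#SBATCH --mail-type=END

# Hypothetical batch_job.sh: convert every .fq file given on the command line
# inside ONE job, instead of submitting one job per file.

# Command used for the conversion; set CONVERT_CMD=echo for a dry run.
CONVERT_CMD=${CONVERT_CMD:-fastq_to_fasta}

convert_all() {
    local fq
    for fq in "$@"; do
        # ${fq%.fq} strips the trailing ".fq", so t1_s1_1.fq becomes t1_s1_1.fa
        "$CONVERT_CMD" -Q33 -i "$fq" -o "${fq%.fq}.fa"
    done
}

# Process whatever files were passed to the script (nothing happens with no arguments).
convert_all "$@"
```

With the module loaded as before, this would be submitted once for all files, for example `sbatch batch_job.sh *.fq`, so the scheduler sees one job rather than four very short ones.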

...