Many bioinformatics workflows run the same command on multiple files. When the files are small, the command may take only a few seconds to finish. Submitting the processing of each file as a separate job often causes the job scheduler to complain about short-running jobs.
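To see whether your jobs fall into this category, the standard Slurm sacct command reports how long a finished job actually ran. A minimal sketch, with a placeholder job ID:

Code Block
# Elapsed vs. TotalCPU shows how briefly a job actually ran
# (12345 is a placeholder job ID; substitute your own)
sacct -j 12345 --format=JobID,Elapsed,TotalCPU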
...
From Windows, use MobaXterm or PuTTY to connect to o2.hms.harvard.edu and make sure the port is set to the default value of 22.
From a Mac Terminal, use the ssh command, inserting your HMS ID instead of user123:
Code Block
ssh user123@o2.hms.harvard.edu
...
Start an interactive session, create a working directory on scratch, and change into the newly-created directory. For example, for user abc123, the working directory will be
Code Block
srun --pty -p interactive -t 0-12:0:0 --mem 2000M -n 1 /bin/bash
mkdir /n/scratch/users/a/abc123/testBatchJob
cd /n/scratch/users/a/abc123/testBatchJob
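The loop below assumes a handful of small .fq files in this directory. If you need stand-in inputs to follow along, a quick sketch (the file names match the job-report output shown below; the one-read contents are placeholders):

Code Block
# create four tiny FASTQ files (one read each) to experiment with
for f in t1_s1_1 t1_s1_2 t1_s2_1 t1_s2_2; do
    printf '@read1\nACGT\n+\nIIII\n' > "$f.fq"
done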
...
Code Block
# We want to convert each of them into FASTA format; the command is available in module fastx.
# You can run the command on the files one by one as shown below:
module load fastx/0.0.13
for fq in *fq; do
    echo submitting job for $fq
    sbatch -p short -t 0-0:10:0 --mem 20M --mail-type=END --wrap "fastq_to_fasta -Q33 -i $fq -o ${fq%.fq}.fa"
done

# You should see output:
submitting job for t1_s1_1.fq
Submitted batch job 34710674
submitting job for t1_s1_2.fq
Submitted batch job 34710675
submitting job for t1_s2_1.fq
Submitted batch job 34710676
submitting job for t1_s2_2.fq
Submitted batch job 34710677

# After a few minutes, if you check the job reports, you will see the jobs took only a few seconds to finish.
O2_jobs_report -j 34710674

# Output
JobID         Partition  State      NodeList          Start                Timelimit  Elapsed   CPUTime   TotalCPU   AllocTRES               MaxRSS
------------  ---------  ---------  ----------------  -------------------  ---------  --------  --------  ---------  ----------------------  ------
34710674                 COMPLETED  compute-a-16-166  2019-02-22T07:36:51             00:00:00  00:00:00  00:02.829
34710674.ba+             COMPLETED  compute-a-16-166  2019-02-22T07:36:44             00:00:07  00:00:07  00:02.829  cpu=1,mem=0.02G,node=1  0.00G

# The jobs run for only a few seconds each. This is not efficient for the scheduler,
# so it is better to process all 4 files in the same job.

# Or, if you prefer to use a Slurm script to submit the jobs:
module load fastx/0.0.13
for fq in *fq; do
    echo submitting job for $fq
    sbatch job.sh $fq
done

# In job.sh:
#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-00:10:00
#SBATCH --mem=20M
#SBATCH --mail-type=END
fastq_to_fasta -Q33 -i $1 -o ${1%.fq}.fa
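For reference, here is a minimal sketch of the batched approach recommended above: a single job script (batch_job.sh is a hypothetical name) that converts all four files in one allocation.

Code Block
# In batch_job.sh:
#!/bin/bash
#SBATCH -p short
#SBATCH -t 0-00:10:00
#SBATCH --mem=20M
#SBATCH --mail-type=END
module load fastx/0.0.13
# convert every .fq file in the working directory within a single job
for fq in *.fq; do
    fastq_to_fasta -Q33 -i $fq -o ${fq%.fq}.fa
done

Submit it once with sbatch batch_job.sh; the scheduler then sees one job of reasonable length instead of four very short ones.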
...