Install and run HiC-Pro-2.10.0
Start an interactive job with a walltime of 12 hours, 16 GB of memory and 2 CPU cores, then load the required modules:
srun --pty -p interactive -t 0-12:0:0 --mem 16G -c 2 /bin/bash
module purge
module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
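Before going further, it can help to confirm that the modules really put the expected tools on your PATH. This is a minimal bash sketch; `check_tools` is a hypothetical helper name, and the tool list in the example simply mirrors the modules loaded above.

```shell
# Hypothetical helper: report every required executable that is not on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: executables the modules above are expected to provide.
# check_tools gcc python R samtools bowtie2
```

If anything is reported missing, re-run the `module load` line before continuing.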
Create a work directory in your home directory and change into it:
mkdir -p /home/$USER/HiC-Pro
cd /home/$USER/HiC-Pro
The software needs a few R and Python packages that are not included in the R and Python modules. Here are the commands I used to install them. You can install your own copy, or use my installation instead.
The commands below install into your own home directory through $USER. If you would rather use my installation, skip them and replace $USER with ld32 in the dependency paths:
mkdir -p /home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1 /home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib
R -e "install.packages(c('ggplot2', 'RColorBrewer'), repos='http://cran.us.r-project.org', lib='/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1')"
pip install --install-option="--prefix=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib" bx-python==0.8.1 iced==0.5.1 pysam==0.14.1 pandas==0.23.1
Set up the paths so that downstream commands can find the packages:
export PYTHONPATH=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
export R_LIBS_USER=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1
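The `export PYTHONPATH=...` line above replaces whatever PYTHONPATH already held. If something else in your session also sets it (whether any loaded module does is an assumption you should check with `echo $PYTHONPATH`), prepending keeps both. A minimal bash sketch; `path_prepend` is a hypothetical helper name.

```shell
# Hypothetical helper: prepend a directory to a colon-separated variable
# (e.g. PYTHONPATH) without discarding its current value and without
# adding the same directory twice.
path_prepend() {
  local var=$1 dir=$2
  local current=${!var}          # bash indirect expansion: value of $var
  case ":$current:" in
    *":$dir:"*) ;;               # already present: leave unchanged
    *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
  esac
  export "$var"
}

# Example, using the same site-packages directory as above:
# path_prepend PYTHONPATH "/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages"
```

Placing the HiC-Pro dependency directory first means its pinned package versions win over any other copies on the path.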
Download and unpack the software:
wget https://github.com/nservant/HiC-Pro/archive/v2.10.0.tar.gz
tar -xvzf v2.10.0.tar.gz
cd HiC-Pro-2.10.0
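An interrupted `wget` can leave a truncated archive. The bash sketch below lists the archive first and only unpacks it if `tar` can read it all the way through; `safe_untar` is a hypothetical helper name, and the same check works for the test-data tarball used later.

```shell
# Hypothetical helper: unpack a .tar.gz only if tar can read the whole
# archive, so a truncated download fails loudly instead of half-extracting.
safe_untar() {
  local archive=$1
  if ! tar -tzf "$archive" >/dev/null 2>&1; then
    echo "corrupt or incomplete archive: $archive" >&2
    return 1
  fi
  tar -xzf "$archive"
}

# Example:
# safe_untar v2.10.0.tar.gz && cd HiC-Pro-2.10.0
```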
Modify the config file:
# open the config file with the text editor nano
nano config-install.txt
# set the following values in the file, replacing my username (ld32) with yours
PREFIX = /home/ld32/pub/HiC-Pro-2.10.0
BOWTIE2_PATH =
SAMTOOLS_PATH =
R_PATH = /home/ld32/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1
PYTHON_PATH = /home/ld32/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7
CLUSTER_SYS = SLURM
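If you prefer not to edit the file by hand, the `KEY = VALUE` lines can also be set from the command line. This is a sketch under stated assumptions: `set_config` is a hypothetical helper, it relies on GNU sed (`-i` without a suffix), and it assumes values contain no `|`, `&` or `\` characters, which sed would treat specially.

```shell
# Hypothetical helper: set "KEY = VALUE" in a HiC-Pro-style config file.
# Replaces an existing KEY line if there is one, otherwise appends it.
set_config() {
  local key=$1 value=$2 file=$3
  if grep -q "^${key}[[:space:]]*=" "$file"; then
    # '|' is used as the sed delimiter because values are paths containing '/'
    sed -i "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}

# Example: the values listed above, with $USER in place of ld32.
# prefix=/home/$USER/pub/HiC-Pro-2.10.0
# set_config PREFIX "$prefix" config-install.txt
# set_config R_PATH "$prefix/dependencies/rlib/3.4.1" config-install.txt
# set_config PYTHON_PATH "$prefix/dependencies/pythonlib/lib/python2.7" config-install.txt
# set_config CLUSTER_SYS SLURM config-install.txt
```

The same helper could be used for the test configuration further down (BOWTIE2_IDX_PATH, JOB_QUEUE, JOB_MAIL in config_test_latest.txt).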
Finally, install the software:
mkdir -p /home/$USER/pub/HiC-Pro-2.10.0 /home/$USER/bin
make configure
# disable a Python package installation step in the Makefile; otherwise it fails with a permission error
sed -i "s/\${PYTHON_PATH}\/python setup.py install;//g" Makefile
make CONFIG_SYS=config-install.txt install
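A quick way to confirm the install landed where later steps expect it: the PATH used in the test runs points at `$PREFIX/HiC-Pro_2.10.0/bin`, so the `HiC-Pro` launcher should be an executable file there. A minimal bash sketch; `hicpro_bin_ok` is a hypothetical helper name and the default prefix is the one configured above.

```shell
# Hypothetical check: after "make install", the HiC-Pro launcher is expected
# under $PREFIX/HiC-Pro_2.10.0/bin (the directory added to PATH for the runs).
hicpro_bin_ok() {
  local prefix=${1:-/home/$USER/pub/HiC-Pro-2.10.0}
  local exe="$prefix/HiC-Pro_2.10.0/bin/HiC-Pro"
  if [ -x "$exe" ]; then
    echo "found: $exe"
  else
    echo "not found or not executable: $exe" >&2
    return 1
  fi
}

# Example:
# hicpro_bin_ok
```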
Download the test data and run a test in serial mode (without submitting additional sbatch jobs):
# download and untar the data
wget https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz
tar -xvzf HiCPro_testdata.tar.gz
# modify the config file for the test run:
vi config_test_latest.txt
# modify row 39 and set the path for BOWTIE2_IDX_PATH to /n/groups/shared_databases/bowtie2_indexes
BOWTIE2_IDX_PATH = /n/groups/shared_databases/bowtie2_indexes
# load modules and set up paths to the newly installed software (you can add these lines to your ~/.bashrc so that you don't have to run them manually)
module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
export PATH=$PATH:/home/$USER/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/bin/
export PYTHONPATH=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
export R_LIBS_USER=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1
# run the software in single machine mode
HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test
# the software should run from step 1 through the last step, as described at: https://github.com/nservant/HiC-Pro
# for your reference, here is the test data downloading page: https://zerkalo.curie.fr/partage/HiC-Pro/
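Following the suggestion to put the module and path lines in ~/.bashrc, the bash sketch below appends them once, guarded by a marker line so that running it again does not add a duplicate block. `add_hicpro_env_to_bashrc` and the marker text are hypothetical; the heredoc is quoted, so `$PATH` and `$USER` are written literally and expand at each login. Loading modules from ~/.bashrc affects every shell and batch job you start, so you may prefer to source these lines only when needed.

```shell
# Hypothetical helper: append the HiC-Pro environment setup to ~/.bashrc once.
# The marker line prevents duplicate blocks if this is run more than once.
add_hicpro_env_to_bashrc() {
  local rc="$HOME/.bashrc"
  local marker="# >>> HiC-Pro 2.10.0 environment >>>"
  touch "$rc"
  if grep -qxF "$marker" "$rc"; then
    echo "already present in $rc"
    return 0
  fi
  # Quoted 'EOF': nothing below is expanded now; it expands when ~/.bashrc runs.
  cat >> "$rc" <<'EOF'
# >>> HiC-Pro 2.10.0 environment >>>
module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
export PATH=$PATH:/home/$USER/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/bin/
export PYTHONPATH=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
export R_LIBS_USER=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1
EOF
  echo "appended to $rc"
}
```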
Download the test data and run a test in parallel mode (the work is submitted to Slurm as additional sbatch jobs):
# download and untar the data
wget https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz
tar -xvzf HiCPro_testdata.tar.gz
# modify the config file for the test run:
nano config_test_latest.txt
# modify row 22 and set default partition to short
JOB_QUEUE = short
# modify row 23 and set email to your email address
JOB_MAIL = xxx@hms.harvard.edu
# modify row 39 and set the path for BOWTIE2_IDX_PATH to /n/groups/shared_databases/bowtie2_indexes
BOWTIE2_IDX_PATH = /n/groups/shared_databases/bowtie2_indexes
# load modules and set up paths to the newly installed software (you can add these lines to your ~/.bashrc so that you don't have to run them manually)
module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
export PATH=$PATH:/home/$USER/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/bin/
export PYTHONPATH=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
export R_LIBS_USER=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1
# run the software in parallel mode using Slurm
HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test_para -p
# You should see:
--------------
Run HiC-Pro 2.10.0 parallel mode
The following command will launch the parallel workflow through 2 torque jobs:
sbatch HiCPro_step1_IMR90_split.sh
The following command will merge the processed data and run the remaining steps per sample:
sbatch HiCPro_step2_IMR90_split.sh
--------------
# Now you can submit the step1 jobs:
cd hicpro_latest_test_para
sbatch HiCPro_step1_IMR90_split.sh
# After the step 1 jobs finish (you should receive an email), submit the next step:
sbatch HiCPro_step2_IMR90_split.sh
# Or, if you don't want to wait, submit both steps at once (starting from the directory where you ran HiC-Pro).
# Note the -d option: it sets a job dependency so that step 2 waits until step 1 has finished successfully:
cd hicpro_latest_test_para
jobid=$(sbatch --parsable HiCPro_step1_IMR90_split.sh)
sbatch -d afterok:$jobid HiCPro_step2_IMR90_split.sh
# Or, even better, modify the script that generates the Slurm job files so that it submits the jobs for you:
vi /home/$USER/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/scripts/make_slurm_script.sh
# Comment out the reminder messages at lines 90, 91, 130 and 131 so they read:
# echo "The following command will launch the parallel workflow through $count torque jobs:"
# echo sbatch ${torque_script}
# echo "The following command will merge the processed data and run the remaining steps per sample:"
# echo sbatch ${torque_script_s2}
# Then add these commands at the bottom of the script to submit the jobs:
jobid=$(sbatch --parsable $torque_script)
echo Submitted batch job $jobid
sbatch -d afterok:$jobid $torque_script_s2
echo "Both jobs have been submitted. Please use sacct and your email notifications to monitor them."
# for your reference, here is the test data downloading page: https://zerkalo.curie.fr/partage/HiC-Pro/
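The job chaining and the edit of make_slurm_script.sh described above can also be scripted. This bash sketch is illustrative only: `submit_chain` and `comment_out_reminders` are hypothetical helper names, it assumes GNU sed, and it assumes the reminder lines look like the ones quoted above. Matching the lines by content rather than by line number avoids commenting out the wrong lines if the file differs slightly, and running it twice does no harm because already-commented lines no longer match.

```shell
# Hypothetical helper: submit step 1, then step 2 with an afterok dependency.
# "sbatch --parsable" prints "jobid" or "jobid;cluster", so keep the part
# before the first ';'.
submit_chain() {
  local step1=$1 step2=$2 out jobid
  out=$(sbatch --parsable "$step1") || return 1
  jobid=${out%%;*}
  if [ -z "$jobid" ]; then
    echo "could not parse job ID from: $out" >&2
    return 1
  fi
  echo "step1 job: $jobid"
  sbatch --parsable -d "afterok:${jobid}" "$step2"
}

# Hypothetical helper: comment out the "echo" reminder lines by pattern,
# keeping their original indentation. Requires GNU sed for -i.
comment_out_reminders() {
  local script=$1
  sed -i \
    -e 's/^\([[:space:]]*\)\(echo "The following command\)/\1# \2/' \
    -e 's/^\([[:space:]]*\)\(echo sbatch \${torque_script\)/\1# \2/' \
    "$script"
}

# Examples:
# cd hicpro_latest_test_para && submit_chain HiCPro_step1_IMR90_split.sh HiCPro_step2_IMR90_split.sh
# comment_out_reminders /home/$USER/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/scripts/make_slurm_script.sh
```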