Install and run HiC-Pro-2.10.0

Quick start: if you don't want to install the software yourself, you can use my installation directly, as shown below:

    # start an interactive job with 2 CPUs and 2000 MB of memory
    srun --pty -p interactive -t 0-12:0:0 --mem 2000MB -c 2 /bin/bash

    # load the required modules
    module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9

    # make a testing folder and go to it
    mkdir -p /n/scratch3/users/${USER:0:1}/$USER/HiC-Pro-test && cd /n/scratch3/users/${USER:0:1}/$USER/HiC-Pro-test

    # download and untar the data
    wget https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz
    tar -xvzf HiCPro_testdata.tar.gz

    # modify the config file for the test run:
    nano config_test_latest.txt

    # modify row 22 and set default partition to short
    JOB_QUEUE = short

    # modify row 23 and set email to your email address
    JOB_MAIL = xxx@hms.harvard.edu

    # modify row 39 and set the path for BOWTIE2_IDX_PATH to /n/groups/shared_databases/bowtie2_indexes
    BOWTIE2_IDX_PATH = /n/groups/shared_databases/bowtie2_indexes

    # load modules and set up the paths to the installed software (you can add them to your ~/.bashrc, so that you don't have to run them manually)
    module purge
    module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
    export PATH=$PATH:/home/ld32/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/bin/
    export PYTHONPATH=/home/ld32/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
    export R_LIBS_USER=/home/ld32/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1

    # run the software in parallel mode using Slurm
    HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test_para -p

    # You should see:
    Run HiC-Pro 2.10.0 parallel mode
    Submitted batch job 17841812
    Submitted batch job 17841813
    Two jobs are all submitted. Please use sacct and your email to monitor them.

    # Or if you want to run the workflow in serial mode:
    HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test
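The scratch path in the quick start uses `${USER:0:1}`, a bash substring expansion that yields the first character of your username. As a rough sketch of how that path is put together, the function below builds the same string for any username. The function name `scratch_test_dir` is made up for this example, and it uses `cut` instead of the bash-only expansion so that it also runs under plain `sh`.

```shell
# Build the scratch test directory path for a given username.
# scratch_test_dir is a hypothetical helper name, not part of HiC-Pro or the cluster tools.
scratch_test_dir() {
    user=$1
    # first character of the username, equivalent to ${user:0:1} in bash
    first=$(printf '%s' "$user" | cut -c1)
    printf '/n/scratch3/users/%s/%s/HiC-Pro-test\n' "$first" "$user"
}

# Example: print the path for the username ld32
scratch_test_dir ld32
```

You could then run `mkdir -p "$(scratch_test_dir "$USER")"` to create the folder.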

### Stop here if you are not installing the software yourself.

-----------------------------------------------------------------------------------------------------------------------------------------------------------

If you want to install the software on your own, please read on:

Start an interactive job with a walltime of 12 hours, 16 GB of memory and 2 CPU cores, then load the required modules:

    srun --pty -p interactive -t 0-12:0:0 --mem 16G -c 2 /bin/bash
    module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9

Create a work directory in your home directory, and change into the newly created directory:

    mkdir -p /home/$USER/HiC-Pro
    cd /home/$USER/HiC-Pro

The software needs a few R and Python packages that the R and Python modules do not include. Here are the commands I use to install them. You can install your own copy or use my installation.

If you do want to install them yourself, replace ld32 with your username in the commands below:

    mkdir -p /home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1 /home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib
    R -e 'install.packages(c("ggplot2", "RColorBrewer"), repos="http://cran.us.r-project.org", lib="/home/ld32/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1")'
    pip install --install-option="--prefix=/home/ld32/pub/HiC-Pro-2.10.0/dependencies/pythonlib" bx-python==0.8.1 iced==0.5.1 pysam==0.14.1 pandas==0.23.1

Set the library paths so that the following commands can find the packages:

    export PYTHONPATH=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
    export R_LIBS_USER=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1
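Before continuing, it can help to confirm that both variables point at directories that actually exist, since a typo here only shows up later as a missing-package error. The check below is a minimal sketch: `check_lib_dirs` is a made-up helper name, and it assumes each variable holds a single directory (as in this guide) rather than a colon-separated list.

```shell
# Report whether PYTHONPATH and R_LIBS_USER each name an existing directory.
# check_lib_dirs is a hypothetical helper, not part of HiC-Pro.
# Assumption: each variable holds exactly one directory, as in this guide.
check_lib_dirs() {
    status=0
    for name in PYTHONPATH R_LIBS_USER; do
        # indirect lookup of the variable called $name (portable to plain sh)
        eval "dir=\${$name:-}"
        if [ -d "$dir" ]; then
            echo "$name ok: $dir"
        else
            echo "$name missing: ${dir:-<unset>}"
            status=1
        fi
    done
    return $status
}
```

Run `check_lib_dirs` after the two `export` commands; it returns a non-zero status if either directory is missing.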

Download and unpack the software:

    wget https://github.com/nservant/HiC-Pro/archive/v2.10.0.tar.gz
    tar -xvzf v2.10.0.tar.gz
    cd HiC-Pro-2.10.0

Modify the config file: 

    # open the config file using text editor 'nano'
    nano config-install.txt

    # copy the following to the file. Make sure to replace my username (ld32) with your username
    PREFIX = /home/ld32/pub/HiC-Pro-2.10.0
    BOWTIE2_PATH =
    SAMTOOLS_PATH =
    R_PATH = /home/ld32/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1
    PYTHON_PATH = /home/ld32/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7
    CLUSTER_SYS = SLURM
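If you would rather not edit the file by hand, the same six settings can be generated with your username already filled in. This is only a sketch: `render_install_config` is a made-up helper name, and it assumes these six keys are all that `config-install.txt` needs, as in the listing above.

```shell
# Print the install config shown above, with the given username in place of ld32.
# render_install_config is a hypothetical helper, not part of HiC-Pro.
render_install_config() {
    prefix="/home/$1/pub/HiC-Pro-2.10.0"
    cat <<EOF
PREFIX = $prefix
BOWTIE2_PATH =
SAMTOOLS_PATH =
R_PATH = $prefix/dependencies/rlib/3.4.1
PYTHON_PATH = $prefix/dependencies/pythonlib/lib/python2.7
CLUSTER_SYS = SLURM
EOF
}
```

For example, `render_install_config "$USER" > config-install.txt` overwrites the config file in the current directory, so keep a copy of the original if you want to compare.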

Finally, install the software: 

    mkdir -p /home/$USER/pub/HiC-Pro-2.10.0
    make configure
    # disable a Python package installation command, otherwise it gives a permission error
    sed -i "s/\${PYTHON_PATH}\/python setup.py install;//g" Makefile
    make CONFIG_SYS=config-install.txt install
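To see what the `sed` expression in the install step removes, here is a small sketch that applies the same expression to a made-up recipe line. The sample line is hypothetical and only resembles what the Makefile might contain; check your own Makefile (for example with `grep -n 'setup.py' Makefile`) before and after the edit.

```shell
# Same expression as in the install step, but reading stdin instead of editing Makefile in place.
# Inside double quotes, \$ keeps the dollar sign literal, so the shell does not expand ${PYTHON_PATH}.
strip_setup_install() {
    sed "s/\${PYTHON_PATH}\/python setup.py install;//g"
}

# Hypothetical recipe line, for illustration only
printf '%s\n' '(cd ice_mod; ${PYTHON_PATH}/python setup.py install;)' | strip_setup_install
# prints: (cd ice_mod; )
```

Only the `${PYTHON_PATH}/python setup.py install;` fragment is deleted; the rest of the line is left alone.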

Download the test data and run a test in serial mode (without submitting additional sbatch jobs):

    # download and untar the data
    wget https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz
    tar -xvzf HiCPro_testdata.tar.gz

    # modify the config file for the test run:
    vi config_test_latest.txt

    # modify row 39 and set the path for BOWTIE2_IDX_PATH to /n/groups/shared_databases/bowtie2_indexes
    BOWTIE2_IDX_PATH = /n/groups/shared_databases/bowtie2_indexes

    # load modules and set up the paths to the newly installed software (you can add them to your ~/.bashrc, so that you don't have to run them manually)
    module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
    export PATH=$PATH:/home/$USER/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/bin/
    export PYTHONPATH=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
    export R_LIBS_USER=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1

    # run the software in single machine mode
    HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test

    # you should see the software run from step1 to the last step as mentioned on page: https://github.com/nservant/HiC-Pro

    # for your reference, here is the test data downloading page: https://zerkalo.curie.fr/partage/HiC-Pro/
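The config edits above are done in an editor, by row number. As an alternative, they can be scripted by key name, which does not depend on the row numbers staying the same. The helper below is a minimal sketch: `set_config_key` is a made-up name, it assumes each setting sits on its own line in `KEY = value` form, and it will misbehave if the value contains `|`, `&` or a backslash.

```shell
# Replace the line "KEY = ..." in FILE with "KEY = VALUE".
# set_config_key is a hypothetical helper, not part of HiC-Pro.
# Assumptions: one "KEY = value" setting per line; VALUE contains no '|', '&' or backslash.
set_config_key() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # '|' is used as the sed delimiter so that paths containing '/' need no escaping
    sed "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage, from the directory that holds the test config:
#   set_config_key config_test_latest.txt BOWTIE2_IDX_PATH /n/groups/shared_databases/bowtie2_indexes
#   set_config_key config_test_latest.txt JOB_QUEUE short
```

Afterwards, `grep -n 'BOWTIE2_IDX_PATH' config_test_latest.txt` shows the resulting line so you can confirm the change.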

Download the test data and run a test in parallel mode (submitting additional sbatch jobs to run the software):

    # download and untar the data
    wget https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz
    tar -xvzf HiCPro_testdata.tar.gz

    # modify the config file for the test run:
    nano config_test_latest.txt

    # modify row 22 and set default partition to short
    JOB_QUEUE = short

    # modify row 23 and set email to your email address
    JOB_MAIL = xxx@hms.harvard.edu

    # modify row 39 and set the path for BOWTIE2_IDX_PATH to /n/groups/shared_databases/bowtie2_indexes
    BOWTIE2_IDX_PATH = /n/groups/shared_databases/bowtie2_indexes

    # load modules and set up the paths to the newly installed software (you can add them to your ~/.bashrc, so that you don't have to run them manually)
    module load gcc/6.2.0 python/2.7.12-ucs4 R/3.4.1 samtools/1.3.1 bowtie2/2.2.9
    export PATH=$PATH:/home/$USER/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/bin/
    export PYTHONPATH=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/pythonlib/lib/python2.7/site-packages/
    export R_LIBS_USER=/home/$USER/pub/HiC-Pro-2.10.0/dependencies/rlib/3.4.1

    # run the software in parallel mode using Slurm
    HiC-Pro -c config_test_latest.txt -i test_data -o hicpro_latest_test_para -p

    # You should see:
    --------------
    Run HiC-Pro 2.10.0 parallel mode
    The following command will launch the parallel workflow through 2 torque jobs:
    sbatch HiCPro_step1_IMR90_split.sh
    The following command will merge the processed data and run the remaining steps per sample:
    sbatch HiCPro_step2_IMR90_split.sh
    --------------

    # Now you can submit the step1 jobs:
    cd hicpro_latest_test_para
    sbatch HiCPro_step1_IMR90_split.sh

    # After the step1 jobs finish (you should receive an email), submit the next step:
    sbatch HiCPro_step2_IMR90_split.sh

    # Or if you don't want to wait, you can also submit the two steps at the same time.
    # Notice the -d option: it sets a job dependency so that step2 waits for step1 to finish successfully:
    cd hicpro_latest_test_para
    jobid=`sbatch --parsable HiCPro_step1_IMR90_split.sh`
    sbatch -d afterok:$jobid HiCPro_step2_IMR90_split.sh

    # Or even better, you can modify the Slurm job script generator to submit the jobs for you:
    vi /home/ld32/pub/HiC-Pro-2.10.0/HiC-Pro_2.10.0/scripts/make_slurm_script.sh

    # Comment out the user reminder messages at rows 90, 91, 130 and 131 like:
    # echo "The following command will launch the parallel workflow through $count torque jobs:"
    # echo sbatch ${torque_script}
    # echo "The following command will merge the processed data and run the remaining steps per sample:"
    # echo sbatch ${torque_script_s2}

    # And at the bottom of the script, add these commands to submit the jobs:
    jobid=`sbatch --parsable $torque_script`
    echo Submitted batch job $jobid
    sbatch -d afterok:$jobid $torque_script_s2
    echo Two jobs are all submitted. Please use sacct and your email to monitor them.

    # for your reference, here is the test data downloading page: https://zerkalo.curie.fr/partage/HiC-Pro/
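The two-step submission with a job dependency can be wrapped in a small function, which also makes it possible to rehearse the chaining without a Slurm cluster. The sketch below is only an illustration: `submit_chain` and `fake_sbatch` are made-up names, and the job ID 12345 is a placeholder. Passing `sbatch` as the first argument would submit real jobs.

```shell
# Submit step1, then submit step2 so that it starts only if step1 succeeds.
# submit_chain is a hypothetical helper; SUBMIT_CMD is normally "sbatch".
submit_chain() {
    submit=$1
    step1=$2
    step2=$3
    # --parsable makes sbatch print only the job ID (plus ";cluster" on multi-cluster setups)
    jobid=$("$submit" --parsable "$step1") || return 1
    jobid=${jobid%%;*}
    echo "Submitted batch job $jobid"
    "$submit" -d "afterok:$jobid" "$step2"
}

# Stand-in for sbatch, for a dry run only; 12345 is a placeholder job ID
fake_sbatch() {
    if [ "$1" = "--parsable" ]; then
        echo 12345
    else
        echo "would run: sbatch $*"
    fi
}

# Dry run: nothing is submitted
submit_chain fake_sbatch HiCPro_step1_IMR90_split.sh HiCPro_step2_IMR90_split.sh
```

On the cluster, run `submit_chain sbatch HiCPro_step1_IMR90_split.sh HiCPro_step2_IMR90_split.sh` from inside the output folder.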