File Transfer

NOTE: Do not transfer files while connected to the HMS VPN. The HMS VPN is designed for secure remote access and management, and with increased remote work as part of the COVID-19 response, it is seeing much heavier use. It cannot, and was not designed to, support large data transfers. Doing so hinders access to the HMS network for the HMS community and will cause us to terminate your data transfer.

NOTE: For Harvard's Accellion Kiteworks file transfer appliance, which allows you to "mail" large attachments from your desktop, please see more details at: https://it.hms.harvard.edu/kiteworks

NOTE: For information on /n/files (aka research.files.med.harvard.edu), see the bottom of this page.

NOTE: For guidelines on transferring data from O2 to NCBI's FTP for GEO submission, please reference this wiki page.

NOTE: When using scp/sftp/ssh to connect to a transfer server from outside the HMS network, Duo will simply hang if you do not have a default Duo method set up. Reference this page for instructions on setting up a default Duo authentication method.


Tools For Copying



There are a number of secure ways to copy files to and from O2. The tools listed below encrypt your data and login credentials during the transfer over the internet. Be aware of which file systems you want to copy from and to. You might be copying from your laptop or desktop hard drive, or from some other site on the Internet.

Graphical tools

  • FileZilla - a standalone sftp client for Mac/Linux/Windows

  • WinSCP - a Windows scp/sftp client

How To Copy Data to O2

Connection parameters:

  • host: transfer.rc.hms.harvard.edu

  • port: 22  (the SFTP port) 

  • username: your HMS ID (formerly known as eCommons ID), the ID you use to log in to O2, in lowercase, e.g., ab123 (not your Harvard ID or Harvard Key)

  • password: your HMS ID password, the password you use when logging in to O2

Command line tools available on the O2 File Transfer Servers

  • scp, sftp, rsync - these are installed by default on Mac and Linux

  • pscp, psftp - Windows-only. These can be installed with the PuTTY ssh program.

  • ftp - available on O2 for downloading from external sites which only accept FTP logins. However, O2 does not accept incoming FTP logins.

  • aspera - a data transport and streaming technology, now owned by IBM

  • awscli - Amazon AWS command line interface

  • basemount - an Illumina tool to mount BaseSpace Sequence Hub data.

  • bbcp - a point-to-point network file copy application from NERSC

  • lftp - can transfer files via FTP, FTPS, HTTP, HTTPS, FISH, SFTP, BitTorrent, and FTP over HTTP proxy.

  • gcloud - Google Cloud command line interface, including the gsutil command

  • rclone - rsync for cloud storage



For graphical tools, see the documentation that came with the program. Also, see our instructions on how to use these tools with two-factor authentication. Many tools will by default copy to your /home directory, which has a small 100GB storage quota. Make sure to explicitly specify whether you want to copy there or to a different location, such as: /n/scratch3/users/m/mfk8/



If you just have a single file to copy and you're on a Mac, you can also run a command like the following from the Terminal application:

me@mydesktop:~$ scp myfile my_o2_id@transfer.rc.hms.harvard.edu:/n/scratch3/users/m/mfk8/

By default, scp will copy to/from your home directory on the remote computer. You need to give the full path, starting with a /, in order to copy to other filesystems.

Transfers on the O2 File Transfer Servers

You can connect to the transfer nodes using ssh at the hostname: transfer.rc.hms.harvard.edu . If you're on Linux or Mac, you can use the native terminal application to connect to the transfer nodes. If you're on Windows, you will need to install a program to connect to the transfer servers; we recommend MobaXterm. In either terminal or MobaXterm, type the following command:

ssh yourecommons@transfer.rc.hms.harvard.edu

where you replace yourecommons with your actual eCommons ID in lowercase. Once you authenticate, you'll be on one of the transfer servers. From here, you can enter commands like scp, sftp, rsync, etc. See the above section on command line tools available on the O2 File Transfer Servers for more details. You can run transfer commands directly on the transfer servers after logging in. We do not have a job scheduler running on the transfer cluster, so you do not need to submit an sbatch job or request an interactive session with srun to run such transfer processes. The transfer servers also do not have any research applications (modules) available.

If you have a large amount of data to transfer, please keep in mind that your session must stay active for the transfer to complete. For example, if your computer disconnects from your wifi network, the transfer will abort. You can prevent this from happening by modifying your transfer command to "ignore hang ups" with the nohup command like so:


nohup your_transfer_command &



The nohup command tells the system not to cancel a running command, even if the user disconnects from the session. The & means to run the process in the background. Any text which would normally be printed to the screen will be put into a file named nohup.out. Substitute your_transfer_command with your actual scp, rsync, rclone, etc. command.
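As a concrete illustration, here is a minimal local sketch of the nohup pattern, using rsync between two directories on the same machine (mydata and backup are placeholder names, not real O2 paths; on the transfer servers you would use your actual scp/rsync/rclone command instead):

```shell
# Minimal local sketch of the nohup pattern (mydata and backup are placeholder names).
mkdir -p mydata backup
echo "example" > mydata/sample.txt

# nohup keeps the command running even if the session disconnects;
# & runs it in the background. Output is redirected into nohup.out.
nohup rsync -av mydata/ backup/ > nohup.out 2>&1 &

# Wait for the background transfer to finish before logging out.
wait
```

The same nohup/& wrapping works unchanged around any long-running transfer command on the transfer servers.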

Interactive command line copying on O2

File transfer processes are too resource intensive to run on the O2 login servers, but you can run these interactively from a compute node as you would any other application. Launch an interactive session with the following srun command, and then run your commands once logged into a compute node:

mfk8@login02:~$ srun --pty -p interactive -t 0-12:00 /bin/bash
mfk8@compute-a-01-1:~$



Batch Copying on O2

Experienced users can set up batch copies using rsync or recursive cp. Please do not run large transfers on the O2 login nodes (login0X). They will be slow and subject to suspension, as they are competing with dozens of simultaneous logins and programs.

If you want to copy a large set of files, it may be best to submit as a job to O2. For example:

mfk8@login02:~$ sbatch -p short -t 0-12:00 --wrap="rsync mydir1/ mydir2/"

This will run in the short partition like any other job.
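One detail worth noting in the rsync example: the trailing slash on the source path changes what gets copied. A minimal local sketch (directory names are illustrative):

```shell
# Local demonstration of rsync's trailing-slash rule (directory names are placeholders).
mkdir -p mydir1 mydir2a mydir2b
echo hi > mydir1/file.txt

# With a trailing slash, rsync copies the *contents* of mydir1:
rsync -a mydir1/ mydir2a/    # yields mydir2a/file.txt

# Without it, rsync copies the directory itself:
rsync -a mydir1 mydir2b/     # yields mydir2b/mydir1/file.txt
```

Pick whichever form matches the layout you want at the destination before submitting a large batch copy.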

The main advantage of batch copying is that you can make it part of a workflow. For example, you can use dependencies to run your analysis only when the job copying input files to O2 has finished. For example:

mfk8@login02:~$ sbatch --dependency=afterok:<jobid> your_analysis_job.sh

Very big copies

Contact Research Computing if you want to copy multiple tebibytes (a tebibyte is 1.0995 terabytes). We may be able to speed up the process.

Special considerations for the '/n/files' filesystem, aka research.files.med.harvard.edu

The O2 login nodes and most compute nodes do not currently mount /n/files. There are 2 ways to access this filesystem from O2:

  1. Use O2's dedicated file transfer servers

    1. SSH login to the hostname transfer.rc.hms.harvard.edu. You will be connected to a system which has access to /n/files.

    2. Once logged in, just run your commands (e.g. rsync, scp, cp) normally without using sbatch.

    3. Transfer servers cannot submit jobs to the cluster, and research applications (modules) are not available from those systems.

  2. If you have a batch job workflow that must use /n/files , you can request access to be able to use the "transfer" job partition. This partition has access to a few lower performance compute nodes which mount /n/files . They are only recommended when using the transfer servers is not an option, as these nodes are slower and generally less available.

Using the transfer job partition

Please note that we have restricted use of the `transfer` partition, to ensure that only those who need to access /n/files on O2 will run jobs in this partition. You can contact us to request access to the transfer partition. Here are examples of jobs using this partition:

mfk8@login02:~$ sbatch -p transfer -t 0-12:00 --wrap="rsync /n/files/directory ."



mfk8@login02:~$ srun --pty -p transfer -t 0-12:00 /bin/bash
mfk8@compute-a-01-1:~$ ls -l /n/files

Special considerations for the '/n/standby' filesystem, aka Standby

The O2 login nodes and compute nodes do not currently mount /n/standby. To access this filesystem from O2:

  • Use O2's dedicated file transfer servers.

  • SSH login to the hostname transfer.rc.hms.harvard.edu. You will be connected to a system which has access to /n/standby.

  • Once logged in, just run your transfer commands (rsync, cp, or mv) normally without using sbatch.

  • Transfer servers cannot submit jobs to the cluster, and research applications (modules) are not available from those systems.

Here are the commands you can run:

# If you are transferring a large data set, you can start a screen session,
# so that you can get back to the session in case you lose your connection.
# For details: https://wiki.rc.hms.harvard.edu/pages/viewpage.action?pageId=20676715

# Copy the screen default settings file. You only need to run this once,
# though it does not hurt to run it more than once.
mfk8@login02:~$ cp /n/shared_db/misc/rcbio/data/screenrc.template.txt ~/.screenrc

# Start a new screen session. If you already have a screen session, you can
# attach to it instead. See the link above for how to attach a screen session.
mfk8@login02:~$ screen

# Log in to the transfer cluster:
mfk8@login02:~$ ssh transfer
mfk8@transfer01:~$ rsync -av --remove-source-files /n/groups/lab/tier2 /n/standby/hms/dept/lab/

# If the transfer stops for any reason, rerun the rsync command;
# rsync will resume from where it left off:
mfk8@transfer01:~$ rsync -av --remove-source-files /n/groups/lab/tier2 /n/standby/hms/dept/lab/
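The --remove-source-files flag used in the rsync commands above deletes each source file only after it has been transferred, leaving the (now empty) source directories behind. A minimal local sketch of that behavior, with placeholder directory names:

```shell
# Local sketch of rsync --remove-source-files (tier2 and archive are placeholder names).
mkdir -p tier2 archive
echo data > tier2/results.txt

# Copies each file, then deletes the source copy once it has transferred.
# Empty source directories are left in place.
rsync -av --remove-source-files tier2/ archive/

ls archive/results.txt   # the file now lives in the destination
ls tier2/                # the source directory remains, but is empty
```

Because the source file is only removed after a successful copy, rerunning the command after an interruption is safe.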



For more information on the Standby Storage option, please reference the HMS Research Computing Storage page, or the dedicated Standby page.