Researchers often need to submit their experimental data to the NCBI Gene Expression Omnibus (GEO) database before publication or at the completion of a study. GEO accepts data from methods such as ChIP-seq, RNA-seq, and bisulfite sequencing, among many others.
This page is a guide to transferring data stored on the O2 cluster to NCBI's FTP server. Please note that this page is not a replacement for the submission instructions provided by GEO; it is a supplement that clarifies how to perform the transfer using HMS RC resources.
Here is a description of the transfer and submission preparation process:
- Ensure your experiment and data type are accepted by the GEO database. See here and here for more details.
- Create an NCBI account if you do not already have one.
- Collect the files necessary for the GEO submission process: the metadata spreadsheet, raw data files, and processed data files. These files should all be on the O2 cluster.
- To reduce transfer times, you can compress your raw data files with either bzip2 or gzip.
- You should include md5 checksums for your raw data files in your metadata spreadsheet; these will be used to identify whether any files were corrupted or transferred incompletely. You can calculate these checksums with the md5sum command on the O2 cluster.
- Create a folder named with your NCBI username, and move all of the files you want to submit there.
- If you will transfer 1 TB or more of data, contact GEO before you do so. See here.
- Transfer the data to GEO using the transfer cluster:
  - Your username and password for the transfer cluster are the same credentials you use to log in to O2 (eCommons ID and password). Your username must be in lowercase.
  - You can use lftp to connect to the NCBI FTP server. To obtain the appropriate lftp command, log into GEO and navigate to this page. The command will be listed under "Uploading your submission" > "FTP instructions" > "Linux/Unix" > "Here is a typical 'lftp' session". The command will be in the form of:
  - Once you're connected to the server, you can transfer your data like so:
    mirror -R NCBIaccount_directory
- Send an email to NCBI when your transfer has completed. More information can be found here.
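
The file preparation steps above (compressing, staging, and checksumming) can be sketched as a short shell session. This is only an illustrative outline, not an official GEO or HMS RC procedure: the username `my_ncbi_username`, the file `sample1.fastq`, and the output file `raw_md5sums.txt` are hypothetical placeholders, and a temporary directory with a dummy file stands in for your real project directory so the sketch can run on its own. It also assumes that the checksums should describe the files exactly as they will be uploaded, which is why they are computed after compression.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: substitute your own values.
ncbi_user="my_ncbi_username"   # assumption: your NCBI username
work_dir="$(mktemp -d)"        # assumption: stand-in for your project directory on O2

# Create a tiny dummy FASTQ file so this sketch runs end to end.
printf '@read1\nACGT\n+\nIIII\n' > "$work_dir/sample1.fastq"

cd "$work_dir"

# Compress the raw data; gzip replaces sample1.fastq with sample1.fastq.gz.
gzip sample1.fastq

# Stage everything in a folder named after your NCBI username.
mkdir -p "$ncbi_user"
mv sample1.fastq.gz "$ncbi_user/"

# Compute md5 checksums of the files as they will be uploaded.
# Copy these values into the metadata spreadsheet.
( cd "$ncbi_user" && md5sum *.fastq.gz > ../raw_md5sums.txt )
cat raw_md5sums.txt

# Optional sanity check: confirm the staged files still match the recorded checksums.
( cd "$ncbi_user" && md5sum -c ../raw_md5sums.txt )
```

With real data, you would skip the dummy-file step and point `work_dir` at the directory holding your files. Once connected to the NCBI FTP server with lftp, the staged folder is the directory you would pass to `mirror -R`.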
If you have any difficulty following these instructions, feel free to contact us!