Download data from GEO
Until about a year ago, NCBI still offered FTP downloads (wget with the FTP path) and click-to-download links for GEO data (in the supplementary section). Now they only support batch command-line downloading with their tool, sratools.
0.SRAtools Installation: build in the order ncbi-vdb -> ngs -> sratools. Detailed installation guidelines are here. On our server (ramiz01-05) and on CHTC HTCondor we have to install it locally, so here are my commands:
mkdir ncbi
cd ncbi
git clone https://github.com/ncbi/sra-tools.git
git clone https://github.com/ncbi/ngs.git
git clone https://github.com/ncbi/ncbi-vdb.git
cd ncbi-vdb
./configure --prefix=path/to/ncbi/sra-tools
make
make install
cd ../ngs
./configure --prefix=path/to/ncbi/sra-tools
make
make install
## I didn't have any issue with the ngs installation on our server ramiz01-05, but the commands above fail on HTCondor. Following the error message, I built and installed the sub-packages one at a time:
make -C ngs-sdk
make -C ngs-java
make -C ngs-sdk install
make -C ngs-java install
## *******************************
cd ../sra-tools
./configure --prefix=path/to/ncbi/sra-tools
make
make install
Also, the gcc version on CHTC HTCondor is too old to configure sra-tools, so I installed the latest gcc locally with the help of this blog.
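Once the three `make install` steps have finished, a quick check can confirm that the tools used later in this README ended up under the install prefix. This is only a sketch: `check_sra_install` is a hypothetical helper name, and it assumes the binaries land in `bin/` under the `--prefix` directory.

```shell
# Hypothetical helper: report whether the tools used below exist under a prefix.
# Assumption: binaries are installed to <prefix>/bin.
check_sra_install() {
    prefix=$1
    missing=0
    for tool in prefetch fastq-dump; do
        if [ -x "$prefix/bin/$tool" ]; then
            echo "found:   $prefix/bin/$tool"
        else
            echo "missing: $prefix/bin/$tool"
            missing=1
        fi
    done
    # Exit status is 0 only when every tool was found
    return "$missing"
}
```

For example, `check_sra_install path/to/ncbi/sra-tools` prints one line per tool and returns non-zero if either is missing.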
1.Usage:
To download the sra file: prefetch SRR**
To get the fastq file: fastq-dump [parameters] SRR**
For example, in my case I downloaded paired-end Hi-C reads and saved them in two fastq files (SRR5339829_1.fastq and SRR5339829_2.fastq), each containing one end:
$sraDir/bin/fastq-dump -F --split-files -O $outPath SRR5339829
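When several runs are needed, the same call can be wrapped in a loop. The following is a minimal sketch, not a tested pipeline: `dump_paired` and the `FASTQ_DUMP` variable are names made up for illustration, and the output check assumes every accession is a paired-end run that yields `_1.fastq` and `_2.fastq`.

```shell
# Assumed override point for the binary location, e.g. FASTQ_DUMP=$sraDir/bin/fastq-dump
FASTQ_DUMP=${FASTQ_DUMP:-fastq-dump}

# dump_paired OUTDIR ACCESSION...
# Runs fastq-dump with the same flags as above and stops at the first failure.
dump_paired() {
    out_dir=$1
    shift
    mkdir -p "$out_dir" || return 1
    for acc in "$@"; do
        "$FASTQ_DUMP" -F --split-files -O "$out_dir" "$acc" || return 1
        # Expect one non-empty file per read end
        for end in 1 2; do
            if [ ! -s "$out_dir/${acc}_${end}.fastq" ]; then
                echo "missing or empty: $out_dir/${acc}_${end}.fastq" >&2
                return 1
            fi
        done
    done
}
```

A call would look like `dump_paired "$outPath" SRR5339829 SRR5339835`.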
2.Warning message: sratools still seems to be under development, and combined with network issues you may get warning messages like
2018-07-22T20:47:35 fastq-dump.2.9.1 sys: timeout exhausted while reading the file within network system module - mbedtls_ssl_read returned -76 ( NET - Reading information from the socket failed )
Stay calm and ignore such warnings! :) Seriously! As long as you get a mini report like the following at the end, the download was successful:
Read 256975996 spots for SRR5339835
Written 256975996 spots for SRR5339835
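To avoid reading the report by eye, the two counts can be compared automatically. This sketch assumes the messages from fastq-dump were captured to a log file (for example by redirecting stdout and stderr); `spots_match` is a hypothetical helper name.

```shell
# spots_match LOGFILE
# Succeeds only if both a "Read N spots" and a "Written N spots" line exist
# and the two counts are equal. Warning lines are ignored.
spots_match() {
    log=$1
    read_n=$(awk '$1 == "Read" && $3 == "spots" { n = $2 } END { print n }' "$log")
    written_n=$(awk '$1 == "Written" && $3 == "spots" { n = $2 } END { print n }' "$log")
    [ -n "$read_n" ] && [ "$read_n" = "$written_n" ]
}
```

For example, `spots_match SRR5339835.log || echo "check SRR5339835"` flags a run whose report is missing or inconsistent.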
3.Network failure on HTCondor: After successfully installing sra-tools locally, every command I tried gave the same error:
2018-07-19T21:49:06 test-sra.2.9.1 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -9984 ( X509 - Certificate verification failed, e.g. CRL, CA or signature check failed )
2018-07-19T21:49:06 test-sra.2.9.1 sys: mbedtls_ssl_get_verify_result returned 0x8 ( !! The certificate is not correctly signed by the trusted CA )
2018-07-19T21:49:06 test-sra.2.9.1 sys: connection failed while opening file within cryptographic module - ktls_handshake failed while accessing '128.105.244.82' from '128.105.244.177'
2018-07-19T21:49:06 test-sra.2.9.1 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -31104 ( SSL - Processing of the ServerHello handshake message failed )
2018-07-19T21:49:06 test-sra.2.9.1 sys: connection failed while opening file within cryptographic module - ktls_handshake failed while accessing '128.105.244.82' from '128.105.244.177'
This is caused by the network proxy. The solution recommended by Christina from CHTC is:
Adding the following to a job’s executable (the shell script) fixed the sra download error:
unset http_proxy
unset HTTPS_PROXY
unset FTP_PROXY
export HOME=$_CONDOR_SCRATCH_DIR
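The fix can be kept in one place at the top of the job script. The sketch below wraps the four lines in a function; `prepare_sra_env` is a hypothetical name, and two things go beyond the recommended fix and are assumptions: it also unsets the other-case spellings of the proxy variables, and it leaves `HOME` alone when `_CONDOR_SCRATCH_DIR` is not set (i.e. outside a running HTCondor job).

```shell
# Hypothetical preamble for a job executable that downloads from SRA.
prepare_sra_env() {
    # Variables named in the fix above
    unset http_proxy HTTPS_PROXY FTP_PROXY
    # Assumption: the other-case spellings may be set as well
    unset HTTP_PROXY https_proxy ftp_proxy
    # _CONDOR_SCRATCH_DIR is set by HTCondor inside a running job;
    # the guard for the unset case is an addition, not part of the original fix
    if [ -n "${_CONDOR_SCRATCH_DIR:-}" ]; then
        export HOME=$_CONDOR_SCRATCH_DIR
    fi
}
```

Calling `prepare_sra_env` before the first `prefetch` or `fastq-dump` command applies the workaround for the whole job.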
License
Copyright 2017-present Ye Zheng.
Released under the MIT license.