Download data from GEO

Until about a year ago, NCBI still offered FTP downloads (wget with the FTP path) and point-and-click downloads from the supplementary section of a GEO record. Right now, however, they only support batch command-line downloading through their tool, sratools.

0.SRAtools Installation: build in the order ncbi-vdb -> ngs -> sratools. Detailed installation guidelines are available here. On our server (ramiz01-05) or on CHTC condor, it has to be installed locally, so here are my commands:

mkdir ncbi
cd ncbi
git clone https://github.com/ncbi/sra-tools.git
git clone https://github.com/ncbi/ngs.git

git clone https://github.com/ncbi/ncbi-vdb.git

cd ncbi-vdb

./configure --prefix=path/to/ncbi/sra-tools

make
make install

cd ../ngs
./configure --prefix=path/to/ncbi/sra-tools

make
make install


## I didn't have any issue with the ngs installation on our server ramiz01-05, but the commands above fail on HTCondor. Following the error message, I built and installed the sub-packages manually, one at a time:

make -C ngs-sdk
make -C ngs-java
make -C ngs-sdk install

make -C ngs-java install
## *******************************


cd ../sra-tools
./configure --prefix=path/to/ncbi/sra-tools

make
make install

Besides, on CHTC condor the gcc version is too old to configure sra-tools, so I installed the latest version of gcc locally with the help of this blog.
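
Once the three `make install` steps finish, it can help to confirm that the tools used later in this guide actually landed under the prefix. The following is a minimal sketch and not part of my original setup: the function name `check_sra_install` is hypothetical, and it assumes the binaries are installed into `<prefix>/bin`, which matches the `$sraDir/bin/fastq-dump` path used in the usage section.

```shell
#!/bin/sh
# check_sra_install PREFIX
# Hypothetical helper: report whether prefetch and fastq-dump exist and are
# executable under PREFIX/bin. Returns 0 if both are found, 1 otherwise.
check_sra_install() {
    prefix=$1
    missing=0
    for tool in prefetch fastq-dump; do
        if [ -x "$prefix/bin/$tool" ]; then
            echo "found: $prefix/bin/$tool"
        else
            echo "missing: $prefix/bin/$tool" >&2
            missing=1
        fi
    done
    return $missing
}
```

For example, `check_sra_install path/to/ncbi/sra-tools` should list both tools if the installation went through.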

1.Usage:

To download the sra file: prefetch SRR**

To get the fastq file: fastq-dump [parameters] SRR**

For example, in my case, I downloaded paired-end Hi-C reads and saved them in two fastq files (SRR5339829_1.fastq and SRR5339829_2.fastq), each containing one end:

$sraDir/bin/fastq-dump -F --split-files -O $outPath SRR5339829
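
When several runs need to be downloaded, the prefetch and fastq-dump steps can be wrapped in a loop. The sketch below is only an illustration under a few assumptions: `download_runs` is a hypothetical function name, `prefetch` is assumed to sit next to `fastq-dump` in `$sraDir/bin`, and the `-F --split-files -O` options are simply reused from my example. Setting the hypothetical `DRY_RUN=1` variable prints the commands instead of running them.

```shell
#!/bin/sh
# download_runs SRADIR OUTPATH ACCESSION...
# Hypothetical helper: prefetch each accession, then convert it to fastq.
# Stops at the first command that fails.
download_runs() {
    sra_dir=$1
    out_path=$2
    shift 2
    for acc in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: only show what would be executed.
            echo "$sra_dir/bin/prefetch $acc"
            echo "$sra_dir/bin/fastq-dump -F --split-files -O $out_path $acc"
        else
            "$sra_dir/bin/prefetch" "$acc" || return 1
            "$sra_dir/bin/fastq-dump" -F --split-files -O "$out_path" "$acc" || return 1
        fi
    done
}
```

A call such as `DRY_RUN=1 download_runs "$sraDir" "$outPath" SRR5339829 SRR5339835` lets you inspect the commands before starting a long download.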

2.Warning message: sratools seems to be still under development, and because of network issues you may get warning messages like

2018-07-22T20:47:35 fastq-dump.2.9.1 sys: timeout exhausted while reading the file within network system module - mbedtls_ssl_read returned -76 ( NET - Reading information from the socket failed )

Stay calm and ignore such warnings! :) Seriously! As long as you get a mini report like the following at the end, the download was successful:

Read 256975996 spots for SRR5339835
Written 256975996 spots for SRR5339835
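
Because the warnings can be noisy, one way to check the outcome in a script is to compare the two spot counts in the final report. The sketch below assumes the output of fastq-dump was saved to a log file and that the report lines have exactly the `Read N spots ...` / `Written N spots ...` form shown above; the function name `spots_match` is hypothetical.

```shell
#!/bin/sh
# spots_match LOGFILE
# Hypothetical helper: succeeds only if the log contains a "Read N spots"
# line and a "Written N spots" line with the same N. If a line type occurs
# more than once, the last occurrence is used. Timestamped warning lines
# are skipped because their first field is not "Read" or "Written".
spots_match() {
    log=$1
    read_n=$(awk '$1 == "Read" && $3 == "spots" { n = $2 } END { print n }' "$log")
    written_n=$(awk '$1 == "Written" && $3 == "spots" { n = $2 } END { print n }' "$log")
    [ -n "$read_n" ] && [ "$read_n" = "$written_n" ]
}
```

To use it, redirect both stdout and stderr of the fastq-dump command into a file (for example with `> dump.log 2>&1`) and then run `spots_match dump.log`.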

3.Network failure on HTCondor: After successfully installing sra-tools locally, every command I tried gave the same error message:

2018-07-19T21:49:06 test-sra.2.9.1 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -9984 ( X509 - Certificate verification failed, e.g. CRL, CA or signature check failed )
2018-07-19T21:49:06 test-sra.2.9.1 sys: mbedtls_ssl_get_verify_result returned 0x8 (  !! The certificate is not correctly signed by the trusted CA  )
2018-07-19T21:49:06 test-sra.2.9.1 sys: connection failed while opening file within cryptographic module - ktls_handshake failed while accessing '128.105.244.82' from '128.105.244.177'
2018-07-19T21:49:06 test-sra.2.9.1 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -31104 ( SSL - Processing of the ServerHello handshake message failed )
2018-07-19T21:49:06 test-sra.2.9.1 sys: connection failed while opening file within cryptographic module - ktls_handshake failed while accessing '128.105.244.82' from '128.105.244.177'

This is caused by the network proxy. The solution recommended by Christina from CHTC is:

Adding the following to a job’s executable (the shell script) fixed the sra download error:

unset http_proxy
unset HTTPS_PROXY
unset FTP_PROXY
export HOME=$_CONDOR_SCRATCH_DIR
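
Put together, the start of such a job script might look like the sketch below. It goes slightly beyond the quoted advice as a precaution: proxy variables are commonly set in either lower or upper case, so the hypothetical helper `clear_proxy_env` unsets both spellings. Whether that is actually needed on CHTC is an assumption on my side; the three `unset` lines above are what was recommended. The second helper, `use_scratch_home` (also a hypothetical name), only changes HOME when `_CONDOR_SCRATCH_DIR` is set, so the script can be tried outside a Condor job too.

```shell
#!/bin/sh
# Hypothetical helper: remove proxy settings in both spellings so that
# sra-tools connects directly instead of going through the proxy.
clear_proxy_env() {
    for name in http_proxy https_proxy ftp_proxy HTTP_PROXY HTTPS_PROXY FTP_PROXY; do
        unset "$name"
    done
}

# Hypothetical helper: point HOME at the job's scratch directory.
# Does nothing when _CONDOR_SCRATCH_DIR is empty or unset.
use_scratch_home() {
    if [ -n "${_CONDOR_SCRATCH_DIR:-}" ]; then
        HOME=$_CONDOR_SCRATCH_DIR
        export HOME
    fi
}

# In the job executable, call both before any sra-tools command:
#   clear_proxy_env
#   use_scratch_home
#   $sraDir/bin/fastq-dump -F --split-files -O $outPath SRR5339829
```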

License

Copyright 2017-present Ye Zheng.

Released under the MIT license.