Extract paired-end reads from (NCBI) SRA files

SRA stores the sequencing data from GEO experiments in files in .sra format. These files are handled with the SRA Toolkit.

I recently downloaded some .sra files from GEO corresponding to paired-end sequencing data. To my surprise, when I ran the fastq-dump utility (from the SRA Toolkit) I got only one file rather than two.

From the documentation of the tool, it seems that the option --split-files should be enough, but it was not for me. We need to add the --split-3 option. If we run fastq-dump with this configuration on a single-end experiment, a single .fastq file will be created. Otherwise, two .fastq files with suffixes _1 and _2 will contain the matched paired reads, while a possible third file (no suffix) will contain the unmatched reads.

I currently run fastq-dump as:

fastq-dump --split-files --split-3 SRR1813404.sra -O SRR1813404
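
To check which of the cases above a run produced, it is enough to look at the file names in the output directory. The following is a minimal sketch, assuming the naming described above (suffixes _1 and _2 for the pairs, no suffix for the unmatched or single-end reads). classify_layout is a hypothetical helper written for this post, not part of the SRA Toolkit.

```shell
# Hypothetical helper (not part of the SRA Toolkit).
# Usage: classify_layout ACCESSION OUTPUT_DIR
# Prints which set of .fastq files is present in OUTPUT_DIR,
# assuming the _1 / _2 / no-suffix naming described in the text.
classify_layout() {
    acc="$1"
    dir="$2"
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
        if [ -f "$dir/${acc}.fastq" ]; then
            # Both mates plus a file of reads without a partner
            echo "paired+unmatched"
        else
            echo "paired"
        fi
    elif [ -f "$dir/${acc}.fastq" ]; then
        # Only the un-suffixed file: single-end experiment
        echo "single"
    else
        echo "none"
    fi
}
```

After the fastq-dump command above has finished, calling classify_layout SRR1813404 SRR1813404 would report the layout found in the SRR1813404 output directory.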
Carles Hernandez-Ferrer

Bioinformatics, data analysis and software development
