Converting BAM files to fastq
There are a lot of resources on the internet that describe how to convert bam files to fastq files for realignment, but many of them are using different methods and different tools. I have tested and evaluated many of these conversion tools and here are my findings.
Navigation
Why is it important to choose the right tool?
To preface, converting a .bam
file to a .fastq
file may seem to be a trivial task since they both contain the same information, but the .bam
file is aligned to a reference genome rather than just scattered reads. It turns out there are several edge cases that as a whole do not happen very often, but on a NGS scale can become abundant due to the amount of reads.
Therefore, although most tools that convert from bam to fastq are able to do what they do, there are slight differences in how these tools handle these edge cases that make some better than others.
Which is the best tool?
The tools I tested are samtools 1.9’s fastq
, Picard 2.10.9’s SamToFastq
, and biobambam 2.0.106. In my testing, I have found that samtools delivers the best results with the least amount of system resources and time.
Samtools bam to fastq workflow
Now that we have chosen which tool to use, we need to know how to properly use it. Samtools thankfully has very robust documentation and support, which makes it one of the easier packages to work with.
Converting the entire BAM file to fastq for realignment to a new reference genome
If you have an old bam file that was aligned to an older build of the reference genome, you may want to realign the reads to an updated reference. In the case of the human genome, this is a common problem that people have right now since chromosomal coordinates are completely different between the old GRCh37 and the new GRCh38 reference assemblies. This makes analysis much more difficult since, for example, the location of genes in GRCh37 are different in GRCh38.
To realign the bam file, you will need to convert to fastq and then run alignment again.
Step 1. Sorting by read name
This step is critical since the resulting paired-end fastq files need to be in pairs. You sort the bam file like this:
samtools sort -n -@ $(nproc) -o ${sorted_bam} ${original_bam}
-n Sort by read names (i.e., the QNAME field) rather than by chromosomal coordinates. -@ Set number of sorting and compression threads. By default, operation is single-threaded. $(nproc) will give the number of actual processors you have. -o Write the final sorted output to FILE, rather than to standard output.
Step 2. Converting to fastq
Now that the bam file is sorted, we can now convert it to a fastq.
samtools fastq -@ $(nproc) -1 ${fastq1} -2 ${fastq2} -s ${singleton} -0 ${ambiguous} ${sorted}
-1 Write reads with the READ1 FLAG set (and READ2 not set) to FILE instead of outputting them. If the -s option is used, only paired reads will be written to this file. -2 Write reads with the READ2 FLAG set (and READ1 not set) to FILE instead of outputting them. If the -s option is used, only paired reads will be written to this file. -s Write singleton reads to FILE. -0 Write reads where the READ1 and READ2 FLAG bits set are either both set or both unset to FILE instead of outputting them.
Step 3. Align to new reference genome
Now that you have fresh fastq files, you are ready for alignment. There are many different alignment algorithms and programs designed for different purposes, so I will not cover that here.’
Converting a region of the bam file to fastq
If you have a bam file where you only want reads from specific regions, you can do that with samtools view
. Note: run this command before you sort the bam file by read name otherwise you will have trouble finding the region.
samtools view -h -b ${original_bam} chr2:100-1000 > ${region_bam}
-h Include the header in the output. -b Output in the BAM format.
DanielprivY
http://interpharm.pro/# best online pharmacy without prescriptions
indian pharmacy no prescription – internationalpharmacy.icu Their mobile app makes managing my medications so easy.