#samtools

  • 计算覆盖度

  • 计算比对率

1 统计 参考序列的长度,比对上的reads数目


samtools idxstats in.sam >in.stat

cat in.stat

#    chr1 16616 8633270 0

#    chr2 275406953 3220386 0

#    .....

samtools view -c in.sam #直接输出 number of alignment 

    #44385909

the output is TAB-delimited with each line consisting of reference sequence name, sequece length, #mapped reads and #unmapped reads. it is written to stdout.

in.stat 中第二列为每条染色体的长度,第三列为比对上的reads数,即alignment数目,因为某些reads会比对至参考序列中的多个位置,所以alignment数目比实际上的reads数目多。

如果用bowtie进行mapping,align_summary.txt中的比对记录与samtools idxstats的结果会不一致。

` samtools view -c in.bam`得到的reads数目同idxstats的结果一致,都是alignments 数目,比实际reads数目多(因为部分reads有多个alignments)

##2 samtools中的flag

Flag Chr Description

|————|————-|

0x0001 p the read is paired in sequencing
0x0002 P the read is mapped in a proper pair
0x0004 u the query sequence itself is unmapped
0x0008 U the mate is unmapped
0x0010 R strand of the mate
0x0040 1 the read is the first read in a pair
0x0080 2 the read is the second read in a pair
0x0100 s the alignment is not primary
0x0200 f the read fails platform/vendor quality checks
0x0400 d the read is either a PCR or an optical duplicate

新的document

flag CHR Description

—|—|—

0x1 PAIRED paired-end (or multiple-segment) sequencing technology
0x2 PROPER_PAIR each segment properly aligned according to the aligner
0x4 UNMAP segment unmapped
0x8 MUNMAP next segment in the template unmapped
0x10 REVERSE SEQ is reverse complemented
0x20 MREVERSE SEQ of the next segment in the template is reverse complemented
0x40 READ1 the first segment in the template
0x80 READ2 the last segment in the template
0x100 SECONDARY secondary alignment
0x200 QCFAIL not passing quality controls
0x400 DUP PCR or optical duplicate
0x800 SUPPLEMENTARY supplementary alignment

###2.1 samtools view -f 和 -F

samtools中的view工具使用-f INT来依照INT与标记致保留reads,使用-F INT来跳过reads。当然我们还可使用它们来过滤其它的信息。比如我们想知道有多少paired end reads有mate并且都有map时,可以使用-f 1 -F 12来过滤。



samtools view -c -f 1 -F 12 test.bam

其中-f 1指定只包含那些paired end reads,-F 12是不包含那些unmapped(flag 0x0004)以及mate是unmapped(flag 0x0008)。0x0004 + 0x0008 = 12.

###2.2 samtools flagstat


samtools flagstat test.bam

 44385909 + 0 in total (QC-passed reads + QC-failed reads) #总共的alignment read数(QC-passed + QC-failed)

 3877817 + 0 secondary #

 0 + 0 supplementary

 0 + 0 duplicates

 44385909 + 0 mapped (100.00%:nan%) #

 40508090 + 0 paired in sequencing

 20371796 + 0 read1

 20136294 + 0 read2

 36191000 + 0 properly paired (89.34%:nan%)

 38130076 + 0 with itself and mate mapped

 2378014 + 0 singletons (5.87%:nan%)

 1057880 + 0 with mate mapped to a different chr

 146040 + 0 with mate mapped to a different chr (mapQ>=5)

calculate depth



samtools depth -a *bamfile*  |  awk '{sum+=$3} END { print "Average = ",sum/NR}'


#vcftools、bcftools

  • 提取高质量位点

samtools mpileup -g -f genomes/NC_008253.fna alignments/sim_reads_aligned.sorted.bam > variants/sim_variants.bcf
bcftools call -c -v variants/sim_variants.bcf > variants/sim_variants.vcf