#samtools
1 统计 参考序列的长度,比对上的reads数目
samtools idxstats in.sam >in.stat
cat in.stat
# chr1 16616 8633270 0
# chr2 275406953 3220386 0
# .....
samtools view -c in.sam #直接输出 number of alignment
#44385909
the output is TAB-delimited with each line consisting of reference sequence name, sequece length, #mapped reads and #unmapped reads. it is written to stdout.
in.stat 中第二列为每条染色体的长度,第三列为比对上的reads数,即alignment数目,因为某些reads会比对至参考序列中的多个位置,所以alignment数目比实际上的reads数目多。
如果用bowtie进行mapping,align_summary.txt中的比对记录与samtools idxstats的结果会不一致。
` samtools view -c in.bam`得到的reads数目同idxstats的结果一致,都是alignments 数目,比实际reads数目多(因为部分reads有多个alignments)
##2 samtools中的flag
|————|————-|
0x0001 |
p |
the read is paired in sequencing |
0x0002 |
P |
the read is mapped in a proper pair |
0x0004 |
u |
the query sequence itself is unmapped |
0x0008 |
U |
the mate is unmapped |
0x0010 |
R |
strand of the mate |
0x0040 |
1 |
the read is the first read in a pair |
0x0080 |
2 |
the read is the second read in a pair |
0x0100 |
s |
the alignment is not primary |
0x0200 |
f |
the read fails platform/vendor quality checks |
0x0400 |
d |
the read is either a PCR or an optical duplicate |
新的document
—|—|—
0x1 |
PAIRED |
paired-end (or multiple-segment) sequencing technology |
0x2 |
PROPER_PAIR |
each segment properly aligned according to the aligner |
0x4 |
UNMAP |
segment unmapped |
0x8 |
MUNMAP |
next segment in the template unmapped |
0x10 |
REVERSE |
SEQ is reverse complemented |
0x20 |
MREVERSE |
SEQ of the next segment in the template is reverse complemented |
0x40 |
READ1 |
the first segment in the template |
0x80 |
READ2 |
the last segment in the template |
0x100 |
SECONDARY |
secondary alignment |
0x200 |
QCFAIL |
not passing quality controls |
0x400 |
DUP |
PCR or optical duplicate |
0x800 |
SUPPLEMENTARY |
supplementary alignment |
###2.1 samtools view -f 和 -F
samtools中的view工具使用-f INT来依照INT与标记致保留reads,使用-F INT来跳过reads。当然我们还可使用它们来过滤其它的信息。比如我们想知道有多少paired end reads有mate并且都有map时,可以使用-f 1 -F 12来过滤。
samtools view -c -f 1 -F 12 test.bam
其中-f 1指定只包含那些paired end reads,-F 12是不包含那些unmapped(flag 0x0004)以及mate是unmapped(flag 0x0008)。0x0004 + 0x0008 = 12.
###2.2 samtools flagstat
samtools flagstat test.bam
44385909 + 0 in total (QC-passed reads + QC-failed reads) #总共的alignment read数(QC-passed + QC-failed)
3877817 + 0 secondary #
0 + 0 supplementary
0 + 0 duplicates
44385909 + 0 mapped (100.00%:nan%) #
40508090 + 0 paired in sequencing
20371796 + 0 read1
20136294 + 0 read2
36191000 + 0 properly paired (89.34%:nan%)
38130076 + 0 with itself and mate mapped
2378014 + 0 singletons (5.87%:nan%)
1057880 + 0 with mate mapped to a different chr
146040 + 0 with mate mapped to a different chr (mapQ>=5)
calculate depth
samtools depth -a *bamfile* | awk '{sum+=$3} END { print "Average = ",sum/NR}'
#vcftools、bcftools
samtools mpileup -g -f genomes/NC_008253.fna alignments/sim_reads_aligned.sorted.bam > variants/sim_variants.bcf
bcftools call -c -v variants/sim_variants.bcf > variants/sim_variants.vcf