Pipeline
Here we present an example pipeline for analyzing pooled or individual sequencing data (paired-end, Illumina).
You may need to modify it for other situations, such as single-end reads or other platforms, e.g. SOLiD.
Note that pooled targeted sequencing data usually have very high coverage. To speed up the analysis, you can
downsample the input BAM/SAM files with Picard before running SNVerPool.
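As a rough sketch of the downsampling step, the fraction of reads to keep can be derived from the current and the desired mean coverage. The helper name downsample_fraction and the coverage numbers are illustrative assumptions, not part of SNVer or Picard. The commented command assumes Picard's DownsampleSam tool with its PROBABILITY (P) option, written in the same per-tool jar style as the MarkDuplicates call in this pipeline; file names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: fraction of reads to keep so that mean coverage
# drops from CURRENT ($1) to TARGET ($2). The result is capped at 1,
# i.e. keep everything if the data are already below the target.
downsample_fraction() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN {
        if (cur <= 0) exit 1          # refuse a non-positive coverage
        p = tgt / cur
        if (p > 1) p = 1
        printf "%.4f\n", p
    }'
}

# Example with assumed numbers: reduce ~5000x coverage to ~500x.
#   P=$(downsample_fraction 5000 500)
#   /path/to/java -jar /path/to/picard/DownsampleSam.jar \
#       I=pool.bam O=pool.down.bam P="$P" VALIDATION_STRINGENCY=SILENT
```

Downsampling keeps each read (pair) with the given probability, so the resulting coverage is only approximately the target.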
1. Mapping
Build reference index:
    /path/to/bwa index -a bwtsw ref.fasta
Align reads to reference (-I tells bwa that base qualities are in Illumina 1.3+ encoding, i.e. Phred+64; omit it for Phred+33 FASTQ files):
    /path/to/bwa aln -I ref.fasta pe_1.fq > pe_1.sai
    /path/to/bwa aln -I ref.fasta pe_2.fq > pe_2.sai
    /path/to/bwa sampe ref.fasta pe_1.sai pe_2.sai pe_1.fq pe_2.fq > pe.sam
Filter and sort:
    /path/to/samtools view -Suh -F 12 -f 2 -q 20 pe.sam \
        | /path/to/samtools sort - pe.sorted
Notes: The filters here are only an example. They keep reads that are mapped in a proper pair and have a mapping quality of at least 20. You may skip the filtering altogether, since SNVer applies the same criteria by default.
Build index for bam:
    /path/to/samtools index pe.sorted.bam
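To make the example filter concrete, the sketch below re-implements the three samtools view conditions (-f 2, -F 12, -q 20) on a read's FLAG and MAPQ values. The function name passes_filter is hypothetical and serves only as an illustration; samtools performs this check internally.

```shell
#!/bin/sh
# Hypothetical helper: return success if a read with SAM FLAG $1 and
# MAPQ $2 would survive "samtools view -f 2 -F 12 -q 20".
passes_filter() {
    # -f 2: the "properly paired" bit (0x2) must be set
    [ $(( $1 & 2 )) -ne 0 ] || return 1
    # -F 12: neither "read unmapped" (0x4) nor "mate unmapped" (0x8) may be set
    [ $(( $1 & 12 )) -eq 0 ] || return 1
    # -q 20: mapping quality must be at least 20
    [ "$2" -ge 20 ] || return 1
    return 0
}

# Example: FLAG 99 = paired, proper pair, mate reverse, first in pair
#   passes_filter 99 60 && echo "kept"
```

Since a proper pair normally implies that both mates are mapped, -F 12 is mostly a safeguard on top of -f 2.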
2. Duplication Removal
    /path/to/java -jar /path/to/picard/MarkDuplicates.jar REMOVE_DUPLICATES=true \
        I=pe.sorted.bam O=pe.sorted.dedup.bam M=pe.sorted.bam.metrics \
        ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT
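After this step it can be useful to look at the duplication rate recorded in the metrics file (M=pe.sorted.bam.metrics). The sketch below assumes the usual Picard metrics layout: a "## METRICS CLASS" line, then a tab-separated header row containing a PERCENT_DUPLICATION column, then one data row per library up to the first blank line. The function name percent_duplication is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the PERCENT_DUPLICATION value from a
# Picard MarkDuplicates metrics file ($1), one line per library.
percent_duplication() {
    awk -F '\t' '
        # The column header row directly follows the METRICS CLASS line.
        /^## METRICS CLASS/ { want_header = 1; next }
        want_header {
            for (i = 1; i <= NF; i++)
                if ($i == "PERCENT_DUPLICATION") col = i
            want_header = 0; in_table = 1; next
        }
        # Data rows run until the first blank line.
        in_table && NF == 0 { in_table = 0 }
        in_table && col     { print $col }
    ' "$1"
}

# Example:
#   percent_duplication pe.sorted.bam.metrics
```

The value is a fraction between 0 and 1, not a percentage, despite the column name.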
3. SNV Detection
a) For individual sequencing data
    /path/to/java -jar /path/to/SNVer-0.2.0/SNVerIndividual.jar \
        -i pe.sorted.dedup.bam -o prefix_of_output -r ref.fasta -l target.bed
b) For pooled sequencing data
    /path/to/java -jar /path/to/SNVer-0.2.0/SNVerPool.jar -c pool.info \
        -i input_bam -o prefix_of_output -r ref.fasta -l target.bed
or
    /path/to/java -jar /path/to/SNVer-0.2.0/SNVerPool.jar -n 96 \
        -i input_bam -o prefix_of_output -r ref.fasta -l target.bed
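The second SNVerPool form passes a single number with -n. Assuming that -n is the number of haploid genomes in the pool (this section does not say so explicitly, so please check the SNVer manual), the value can be computed from the number of pooled individuals and their ploidy. The helper name haploid_count is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: number of haploid genomes in a pool of
# $1 individuals with ploidy $2 (ploidy defaults to 2, i.e. diploid).
# Assumption: this is the quantity SNVerPool expects for -n.
haploid_count() {
    individuals=$1
    ploidy=${2:-2}
    echo $(( individuals * ploidy ))
}

# Example: a pool of 48 diploid individuals
#   N=$(haploid_count 48)
#   /path/to/java -jar /path/to/SNVer-0.2.0/SNVerPool.jar -n "$N" \
#       -i input_bam -o prefix_of_output -r ref.fasta -l target.bed
```

Under that assumption, the value 96 used above would correspond to 48 diploid individuals per pool.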
4. Annotation
    /path/to/annovar/convert2annovar.pl -format vcf4 pe.vcf > input
    /path/to/annovar/summarize_annovar.pl --verdbsnp 132 --buildver hg19 \
        --outfile sum input /path/to/humandb