SNVer is a statistical tool for calling common and rare variants in pooled or individual next-generation sequencing data. It reports a single overall p-value for the significance of each candidate locus being a variant, on which multiple-testing control can be based. Loci with any coverage, including low coverage, can be tested, and depth of coverage is quantitatively factored into the final significance calculation. SNVer runs fast enough to make analysis of whole-exome, or even whole-genome, sequencing data feasible.

Pipeline

Here we present a reference pipeline for analyzing pooled or individual sequencing data (paired-end, Illumina). You may need to modify it for other situations, such as single-end reads or other platforms, e.g. SOLiD. Note that pooled targeted sequencing data usually have very high coverage. To speed up the analysis, you can downsample the input BAM/SAM files with Picard before running SNVerPool.
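As a rough sketch of the downsampling step, the snippet below computes the fraction of reads to keep for a given target coverage and prints a Picard command that uses it. The coverage values (5000 and 500), the file names, and the helper downsample_fraction are hypothetical. DownsampleSam.jar and its P (probability of keeping a read) option are assumed to be present in your Picard installation, so check them against your version. The input here is the deduplicated BAM produced by steps 1 and 2 below.

```shell
#!/bin/sh
# Hypothetical helper: fraction of reads to keep so that mean coverage
# drops from the current value ($1) to the target value ($2).
downsample_fraction() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN {
        if (cur <= 0) { exit 1 }   # coverage must be positive
        p = tgt / cur
        if (p > 1) p = 1           # never keep more than all reads
        printf "%.4f\n", p
    }'
}

# Assumed example values: about 5000x observed, 500x wanted.
P=$(downsample_fraction 5000 500)

# Print the command instead of running it; remove "echo" to execute.
# DownsampleSam.jar and P= are assumptions about your Picard version.
echo /path/to/java -jar /path/to/picard/DownsampleSam.jar \
    I=pool.sorted.dedup.bam O=pool.downsampled.bam P="$P" \
    VALIDATION_STRINGENCY=SILENT
```

Printing the command first makes it easy to inspect the computed fraction before committing to a long run.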

1. Mapping


Build reference index:
/path/to/bwa index -a bwtsw ref.fasta
	
Align reads to reference:
/path/to/bwa aln -I ref.fasta pe_1.fq > pe_1.sai
/path/to/bwa aln -I ref.fasta pe_2.fq > pe_2.sai

/path/to/bwa sampe ref.fasta pe_1.sai pe_2.sai pe_1.fq pe_2.fq > pe.sam
	
Filter and sort:
/path/to/samtools view -Suh -F 12 -f 2 -q 20 pe.sam \
| /path/to/samtools sort - pe.sorted

Notes: This is just an example of applying filters. It keeps only properly
paired reads with mapping quality of at least 20 for downstream analysis.
You may skip the filters here, since SNVer applies the same criteria by
default.
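To make the flag options concrete, here is a small sketch that tests whether a given SAM FLAG value would survive "-f 2 -F 12". The function name passes_flag_filter and the two example FLAG values are illustrative only. The bit meanings follow the SAM format (0x2 proper pair, 0x4 read unmapped, 0x8 mate unmapped).

```shell
#!/bin/sh
# Decide whether a SAM FLAG value survives "-f 2 -F 12".
#   -f 2  : bit 0x2 (read mapped in proper pair) must be set
#   -F 12 : bits 0x4 (read unmapped) and 0x8 (mate unmapped) must be clear
passes_flag_filter() {
    flag=$1
    if [ $(( flag & 2 )) -eq 2 ] && [ $(( flag & 12 )) -eq 0 ]; then
        echo keep
    else
        echo drop
    fi
}

passes_flag_filter 99   # paired, proper pair, mate reverse, first in pair
passes_flag_filter 77   # paired, read unmapped, mate unmapped, first in pair
```

The first call prints "keep" and the second prints "drop". The -q 20 option is independent of the FLAG field and is applied to the MAPQ column.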
	
Build index for bam:
/path/to/samtools index pe.sorted.bam

2. Duplication Removal


/path/to/java -jar /path/to/picard/MarkDuplicates.jar REMOVE_DUPLICATES=true \
I=pe.sorted.bam O=pe.sorted.dedup.bam M=pe.sorted.bam.metrics \
ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT 

3. SNV Detection


a) For individual sequencing data

/path/to/java -jar /path/to/SNVer-0.2.0/SNVerIndividual.jar \
-i pe.sorted.dedup.bam -o prefix_of_output -r ref.fasta -l target.bed
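If you have several individual samples, the same command can be generated per BAM file. The following is only a sketch: the sample file names and the helper snver_individual_cmd are hypothetical, and it uses nothing beyond the -i, -o, -r and -l options shown above. It prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print one SNVerIndividual command for a BAM file,
# using the file name without ".sorted.dedup.bam" as the output prefix.
snver_individual_cmd() {
    bam=$1
    prefix=$(basename "$bam" .sorted.dedup.bam)
    echo "/path/to/java -jar /path/to/SNVer-0.2.0/SNVerIndividual.jar" \
         "-i $bam -o $prefix -r ref.fasta -l target.bed"
}

# Assumed sample names; replace with your own deduplicated BAM files.
for bam in sampleA.sorted.dedup.bam sampleB.sorted.dedup.bam; do
    snver_individual_cmd "$bam"
done
```

Pipe the output to sh once the printed commands look right.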

b) For pooled sequencing data

/path/to/java -jar /path/to/SNVer-0.2.0/SNVerPool.jar -c pool.info \
-i input_bam -o prefix_of_output -r ref.fasta -l target.bed
or
/path/to/java -jar /path/to/SNVer-0.2.0/SNVerPool.jar -n 96 \
-i input_bam -o prefix_of_output -r ref.fasta -l target.bed

4. Annotation


/path/to/annovar/convert2annovar.pl -format vcf4 pe.vcf > input

/path/to/annovar/summarize_annovar.pl --verdbsnp 132 --buildver hg19 \
--outfile sum input /path/to/humandb