SNVer

Pipeline

Here we present a reference pipeline for analyzing pooled or individual sequencing data (paired-end, Illumina). You may need to modify it for other situations, such as single-end reads or other platforms, e.g. SOLiD. Note that pooled targeted sequencing data usually have very high coverage. To speed up the analysis, you can downsample the input BAM/SAM files with Picard before running SNVerPool.
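As a minimal sketch of the downsampling idea, the helper below computes the fraction of reads to keep for a given target coverage. The function name downsample_prob, the coverage values and the file names are illustrative assumptions, not part of SNVer. The commented call assumes the DownsampleSam.jar tool from the same Picard distribution as MarkDuplicates.jar, with its PROBABILITY option.

```shell
#!/bin/sh
# Hypothetical helper: fraction of reads to keep so that mean coverage
# drops from $1 (current) to $2 (target). Capped at 1 (no downsampling).
downsample_prob() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN {
        p = (cur > 0 && tgt < cur) ? tgt / cur : 1
        printf "%.4f\n", p
    }'
}

# Illustrative numbers only: reduce ~20000x mean coverage to ~2000x.
P=$(downsample_prob 20000 2000)
echo "PROBABILITY=$P"

# Corresponding Picard call (paths and file names are placeholders):
# /path/to/java -jar /path/to/picard/DownsampleSam.jar \
# I=pool.sorted.bam O=pool.sorted.down.bam PROBABILITY=$P \
# VALIDATION_STRINGENCY=SILENT
```

The downsampled BAM would then take the place of the original as input to SNVerPool. Choose the target coverage with the pool size in mind, since rare variants in large pools need more depth to be detected.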

1. Mapping

Build reference index:
/path/to/bwa index -a bwtsw ref.fasta
	
Align reads to reference:
/path/to/bwa aln -I ref.fasta pe_1.fq > pe_1.sai
/path/to/bwa aln -I ref.fasta pe_2.fq > pe_2.sai

/path/to/bwa sampe ref.fasta pe_1.sai pe_2.sai pe_1.fq pe_2.fq > pe.sam
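For single-end reads, the alignment step might be adapted roughly as follows, using bwa samse in place of bwa sampe. This is a sketch only: the RUN switch, the run and align_single_end helpers, and the se.fq file name are assumptions made for illustration. By default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of a single-end variant of the alignment step.
# RUN is a hypothetical switch: commands are only printed unless RUN=1.
BWA=/path/to/bwa
REF=ref.fasta

run() {
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$1"
    else
        echo "$1"
    fi
}

align_single_end() {
    fq=$1                 # input FASTQ, e.g. se.fq
    base=${fq%.fq}        # strip the .fq suffix, e.g. se
    run "$BWA aln -I $REF $fq > $base.sai"
    run "$BWA samse $REF $base.sai $fq > $base.sam"
}

align_single_end se.fq
```

Keep -I only if the base qualities are in the Illumina 1.3+ (ASCII-64) format. Note also that the proper-pair filter (-f 2) used in the next step does not apply to single-end data, because single-end reads never carry the proper-pair flag.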
	
Filter and sort:
/path/to/samtools view -Suh -F 12 -f 2 -q 20 pe.sam \
| /path/to/samtools sort - pe.sorted

Notes: This is just an example of applying filters. It keeps only properly 
paired reads with mapping quality of at least 20 for downstream analysis. 
You may also skip the filters here, since SNVer applies the same criteria 
by default. 
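To make the filter options concrete, the sketch below applies the same three tests to a SAM FLAG and MAPQ value using shell arithmetic. The passes_filter function and the example records are illustrative assumptions. The bit meanings follow the SAM flag definition: 2 = proper pair, 4 = read unmapped, 8 = mate unmapped.

```shell
#!/bin/sh
# Succeeds (exit status 0) if a read with SAM FLAG $1 and MAPQ $2 would
# pass "samtools view -F 12 -f 2 -q 20".
passes_filter() {
    flag=$1
    mapq=$2
    # -F 12: neither the read (4) nor its mate (8) is unmapped
    [ $(( flag & 12 )) -eq 0 ] || return 1
    # -f 2: read is mapped in a proper pair
    [ $(( flag & 2 )) -eq 2 ] || return 1
    # -q 20: mapping quality of at least 20
    [ "$mapq" -ge 20 ]
}

# Example records given as "FLAG MAPQ" (made up for illustration).
for rec in "99 60" "99 10" "73 60" "65 60"; do
    set -- $rec
    if passes_filter "$1" "$2"; then
        echo "FLAG=$1 MAPQ=$2 kept"
    else
        echo "FLAG=$1 MAPQ=$2 dropped"
    fi
done
```

Only the first record is kept. The second fails on mapping quality, the third has an unmapped mate, and the fourth is paired but not a proper pair.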
	
Build index for bam:
/path/to/samtools index pe.sorted.bam

2. Duplication Removal

/path/to/java -jar /path/to/picard/MarkDuplicates.jar REMOVE_DUPLICATES=true \
I=pe.sorted.bam O=pe.sorted.dedup.bam M=pe.sorted.bam.metrics \
ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT

3. SNV Detection

a) For individual sequencing data

/path/to/java -jar /path/to/SNVer-0.2.0/SNVerIndividual.jar \
-i pe.sorted.dedup.bam -o prefix_of_output -r ref.fasta -l target.bed

b) For pooled sequencing data

/path/to/java -jar /path/to/SNVer-0.2.0/SNVerPool.jar -c pool.info \
-i input_bam -o prefix_of_output -r ref.fasta -l target.bed
or
/path/to/java -jar /path/to/SNVer-0.2.0/SNVerPool.jar -n 96 \
-i input_bam -o prefix_of_output -r ref.fasta -l target.bed

4. Annotation

/path/to/annovar/convert2annovar.pl -format vcf4 pe.vcf > input

/path/to/annovar/summarize_annovar.pl --verdbsnp 132 --buildver hg19 \
--outfile sum input /path/to/humandb
