A brief description of the procedure for clinical tumor NGS testing

Step	Description	Tools and databases	Output
Base calling and demultiplexing	Base calling and demultiplexing, also known as primary analysis	Sequencing platform configuration software	FASTQ format
Primer removal	Primer sequences for amplicon sequencing must be removed from the reads	CutAdapt, BWA, etc.	FASTQ or BAM format
Adaptor removal	Adaptor sequences are removed from the ends of reads. If not trimmed, they may interfere with alignment and cause false-positive or false-negative variant calls	CutAdapt, BWA, Trimmomatic, SeqPrep, etc.	FASTQ or BAM format
Low-quality base removal	Low-quality bases may also interfere with alignment and cause false results. These bases should usually be trimmed from the ends of reads	CutAdapt, BWA, Trimmomatic, SeqPrep, etc.	FASTQ or BAM format
Alignment	In the alignment step, paired-end or single-end reads are aligned to the reference genome. SNVs and small indels can be recognized in this step	BWA, Novoalign, Stampy, SOAP2, LifeScope, Bowtie, etc.	BAM format
Duplicate removal (optional)	Duplicates can be introduced by PCR amplification in the library construction and sequencing steps. Duplicates that are unlikely to represent distinct original DNA molecules decrease calling accuracy and should be removed. Probe hybridization capture sequencing generates fewer duplicates, because DNA is randomly fragmented during library construction. Amplicon sequencing does not require deduplication if there are no allele barcodes, and requires it if there are	Picard MarkDuplicates, SAMtools, etc.	BAM format
Indel realignment (optional)	Misalignment often occurs around indels, especially near the beginning or end of reads, and can cause false results. Local realignment can identify these locations, reduce this error, and increase accuracy	GATK RealignerTargetCreator and IndelRealigner, SRMA, etc.	BAM format
Base quality score recalibration (optional)	Base quality scores can be recalibrated after alignment or realignment to decrease the false-positive rate	GATK BaseRecalibrator and PrintReads, ReQON, etc.	BAM format
Variant calling	Variant calling detects and describes variants (including SNVs and small indels) based on differences between the sequencing data and the reference genome	GATK UnifiedGenotyper, GATK HaplotypeCaller, SAMtools, MuTect, VarScan, Platypus, etc.	VCF format
Annotation	Variant interpretation relies on detailed annotation. Basic annotation includes gene name, gene structure region (exon, splicing region, intron, intergenic region, etc.), and coding information. SNP information, pathogenicity, and other references can also be included	ANNOVAR, SnpEff, Cartagenia Bench Lab NGS, dbSNP, 1000 Genomes, ESP6500, SIFT, PhyloP, MutationTaster, COSMIC, OMIM, ClinVar, HGMD, etc.	CSV, TSV, TXT, Excel, etc.
Filtering	Disease-related variants can be identified by strictly filtering the large number of annotated variant calls. Typical filtering criteria remove low-quality variants, variants in non-coding regions (e.g., intronic and intergenic regions), synonymous SNVs, and known low-frequency SNPs in healthy populations. Labs should set up an internal database to track the false positives that recur on their own platforms and rigorously filter them out	Cartagenia Bench Lab NGS, SnpSift, etc.	CSV, TSV, TXT, Excel, database, etc.
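
The filtering criteria in the last row can be illustrated with a minimal Python sketch. It is not the implementation of any tool named in the table. The record layout (Variant), the region and effect labels, the quality threshold (MIN_QUAL), and the lab-internal set of recurrent false positives are all hypothetical names and values chosen for illustration; real annotation output uses tool-specific columns, and thresholds must be validated by each laboratory.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# Hypothetical variant record; the field names are assumptions for this sketch,
# not the column names of any real annotation tool.
@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float                 # variant quality score
    region: str                 # e.g. "exon", "splicing", "intron", "intergenic"
    effect: str                 # e.g. "missense", "synonymous"
    known_population_snp: bool  # flagged as a known SNP in healthy populations

VariantKey = Tuple[str, int, str, str]

MIN_QUAL = 30.0                               # assumed cutoff, illustration only
NONCODING_REGIONS = {"intron", "intergenic"}  # assumed region labels


def passes_filters(v: Variant,
                   recurrent_false_positives: Set[VariantKey],
                   min_qual: float = MIN_QUAL) -> bool:
    """Return True if the variant survives every filtering criterion."""
    if v.qual < min_qual:                 # low-quality variant
        return False
    if v.region in NONCODING_REGIONS:     # non-coding region
        return False
    if v.effect == "synonymous":          # synonymous SNV
        return False
    if v.known_population_snp:            # known SNP in healthy populations
        return False
    # Lab-internal database of platform-specific recurrent false positives.
    if (v.chrom, v.pos, v.ref, v.alt) in recurrent_false_positives:
        return False
    return True


def filter_variants(variants: Iterable[Variant],
                    recurrent_false_positives: Set[VariantKey],
                    min_qual: float = MIN_QUAL) -> List[Variant]:
    """Keep only the variants that pass all filters, preserving input order."""
    return [v for v in variants
            if passes_filters(v, recurrent_false_positives, min_qual)]


if __name__ == "__main__":
    # Made-up coordinates, used only to exercise the filters.
    internal_fp = {("chr1", 2000, "G", "A")}
    calls = [
        Variant("chr1", 1000, "A", "T", 80.0, "exon", "missense", False),
        Variant("chr1", 2000, "G", "A", 90.0, "exon", "missense", False),
        Variant("chr2", 3000, "C", "T", 12.0, "exon", "missense", False),
    ]
    for kept in filter_variants(calls, internal_fp):
        print(kept.chrom, kept.pos, kept.ref, kept.alt)
```

Keeping the lab-internal false-positive set as a separate argument lets it grow independently of the generic criteria, in line with the recommendation that each laboratory maintain its own database.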