WEScover


WEScover helps users check whether genes of interest can be sufficiently covered, in terms of breadth and depth, by whole exome sequencing (WES). For each transcript, breadth of coverage was calculated at 10x, 20x, and 30x read depth from the 1000 Genomes Project (1KGP) (N = 2,692). Users can minimize the chance of false negatives by selecting a targeted gene panel test for the genes that WES cannot cover well.

Breadth and depth of coverage for NOTCH1 are illustrated below. For some of the exons, breadth of coverage appears to be sub-optimal, which could lead to false negative results with WES.

Coverage from gnomAD project for NOTCH1

WEScover provides detailed coverage information, including differences in breadth of coverage between continent-level populations.


Continental population breadth of coverage violin plot for CCDS43905.1/NOTCH1

Phenotype, genetic test names, or gene symbols can be used to retrieve coverage information in the query window. The output summary helps users to choose WES vs. targeted gene panel testing.

User input




1. Contact

General inquiries may be addressed to:

Sek Won Kong

For specific questions regarding the use of our website, data generation, or technical assistance, please send an email to:

In-Hee Lee

2. Introduction

WEScover provides an interface where users can search for genes of interest by phenotype, targeted gene panel test, or gene symbol to check their breadth of coverage across whole exome sequencing (WES) datasets. Breadth and depth of coverage data were collected from the 1000 Genomes Project (1KGP) using the GRCh38 human reference genome. Depth of coverage is the average number of times a given region has been sequenced by independent reads. Breadth of coverage is the proportion of a gene that is covered at a given per-site read depth (e.g., 10x, 20x, or 30x), summarized on a population scale. Users may check genes related to a phenotype of interest and determine whether they could be comprehensively covered by WES instead of targeted gene panel testing. Conversely, if candidate genes have a mean breadth of coverage lower than 95% in two population-scale WES datasets, then targeted gene panel testing should be considered to minimize potential false negatives. This user guide provides an overview of WEScover.
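As a rough illustration of the breadth of coverage definition above, the following Python sketch computes the fraction of positions in a region that reach each depth threshold. The function names and the per-base depth values are made up for illustration; this is not the code WEScover itself uses.

```python
from typing import Dict, Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int) -> float:
    """Fraction of positions whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def breadth_profile(depths: Sequence[int],
                    thresholds: Sequence[int] = (10, 20, 30)) -> Dict[int, float]:
    """Breadth of coverage at each depth threshold (10x, 20x, 30x by default)."""
    return {t: breadth_of_coverage(depths, t) for t in thresholds}


# Hypothetical per-base read depths for a ten-base region of one sample.
example_depths = [35, 32, 28, 22, 18, 12, 9, 40, 31, 25]
profile = breadth_profile(example_depths)
print(profile)  # {10: 0.9, 20: 0.7, 30: 0.4}
```

In this toy region, nine of ten bases reach 10x but only four reach 30x, so breadth drops as the required depth increases.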


3. Using the Genetic Testing Registry to identify targeted gene panel tests and candidate genes.

As an example, we illustrate how the Genetic Testing Registry (GTR) can be used to identify a list of genetic tests for nonalcoholic fatty liver disease (NAFLD). First, the user must go to the GTR website and query for conditions and/or phenotypes of interest, in this case NAFLD, as shown in the figure below.


This search brings the user to the phenotype entry in GTR. The panels available for this phenotype can be found by clicking the first link in the “Available tests” section. A table listing test names is returned, and clicking the Genes and analytes option shows the genes included in these gene panel tests (GPTs).


PNPLA3 is the gene associated with NAFLD and with five GTR-registered genetic tests. Before deciding whether to recommend WES or a genetic testing panel, users may check the breadth of coverage of PNPLA3 and decide whether it is comprehensively covered by WES.


4. Using WEScover


4.1. User input

The User input panel, shown in the figure below, allows queries to WEScover by phenotype, gene panel test (GPT), or gene symbol. All fields support autocomplete, which helps users quickly find their search terms of interest. Phenotypes, that is, disease conditions that users may want to test for, may be typed into the Phenotypes field. Pressing the filter button next to Phenotypes filters the database for all gene panels that test for the given phenotype; these may be seen by clicking GPT name. The filter button next to the GPT name field returns all genes that are targeted by the queried tests; these are listed when Gene symbol is clicked. Depth of coverage may be fixed at 10x, 20x, or 30x; the default is 20x.


Following the example described previously, users may type "Fatty liver disease, nonalcoholic 1", as written in GTR, under Phenotype then click the Filter button to return all genes associated with this phenotype. When Gene symbol is clicked, only PNPLA3 will be listed. After all inputs are set, the Submit query button may be clicked.
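The chained filtering described above (phenotype to gene panel tests, then tests to gene symbols) can be pictured with a small Python sketch. The lookup tables, panel names (`panel_A`, `panel_B`, `panel_C`), the placeholder phenotype, and the function names are all hypothetical; only the NAFLD phenotype string, the PNPLA3 gene, and the depth options come from this guide.

```python
# Toy lookup tables. Panel names and the second phenotype are placeholders,
# not real GTR entries.
PANELS_BY_PHENOTYPE = {
    "Fatty liver disease, nonalcoholic 1": ["panel_A", "panel_B"],
    "placeholder phenotype": ["panel_C"],
}
GENES_BY_PANEL = {
    "panel_A": ["PNPLA3"],
    "panel_B": ["PNPLA3"],
    "panel_C": ["GENE_X", "GENE_Y"],
}
ALLOWED_DEPTHS = ("10x", "20x", "30x")


def filter_panels(phenotype):
    """Panels that test for the phenotype (first filter button)."""
    return list(PANELS_BY_PHENOTYPE.get(phenotype, []))


def filter_genes(panels):
    """Unique gene symbols targeted by the given panels (second filter button)."""
    return sorted({gene for p in panels for gene in GENES_BY_PANEL.get(p, [])})


def build_query(phenotype, depth="20x"):
    """Assemble the inputs that would be sent with the Submit query button."""
    if depth not in ALLOWED_DEPTHS:
        raise ValueError(f"depth must be one of {ALLOWED_DEPTHS}")
    panels = filter_panels(phenotype)
    return {"phenotype": phenotype, "panels": panels,
            "genes": filter_genes(panels), "depth": depth}


query = build_query("Fatty liver disease, nonalcoholic 1")
print(query["genes"])  # ['PNPLA3']
```

With these toy tables, filtering on the NAFLD phenotype leaves PNPLA3 as the only gene symbol, mirroring the example in the text.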

4.2. Table

Breadth of coverage is reported for the exons of queried genes by their Consensus Coding Sequence identifier (CCDS ID). Clicking “Submit query” generates the table shown below, with the following columns:


  • Gene symbol: Official symbol for the gene by HUGO Gene Nomenclature Committee (HGNC).
  • CCDS ID is the unique accession identifier in the CCDS Project for the exon of the gene.
  • Global coverage reports the values for mean, minimum, and maximum breadth of coverage, written as percentages, in three separate columns. These values were computed from five super populations: African (AFR), Admixed American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS). Mean coverage by population may be found in the Details section. Each row is colored as follows: green for global mean coverage greater than 99%, yellow for coverage between 95% and 99%, and red for coverage less than 95%.
  • ANOVA F-statistic reports the resulting statistic for a one-way ANOVA test done to compare the means between the five super populations. In this context, ANOVA tests the null hypothesis that the five super populations share the same mean breadth of coverage.
  • Raw P-Value is reported from the resulting ANOVA for the given gene.
  • Adj. P-Value is the raw p-value adjusted for false discovery rate (FDR) to account for performing this test on each of the 28,161 exons in the dataset.
  • Action contains a Detail button that reports population summary, coverage plots, and gene panel tests for the gene.
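To make the table columns more concrete, here is a minimal Python sketch of the row coloring, the one-way ANOVA F-statistic, and an FDR adjustment. It rests on several assumptions: the function names and toy percentages are invented, values of exactly 95% and 99% are treated as yellow, and the Benjamini-Hochberg procedure is used for the FDR step because this guide does not name the method WEScover applies.

```python
from statistics import mean


def row_color(global_mean_pct):
    """Map a global mean breadth of coverage (percent) to the row color."""
    if global_mean_pct > 99:
        return "green"
    if global_mean_pct >= 95:  # assumption: exactly 95% and 99% count as yellow
        return "yellow"
    return "red"


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of per-population value lists."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # rank 1 is the smallest p-value
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical breadth of coverage (percent) for two samples per super population.
toy = {"AFR": [90.0, 92.0], "AMR": [94.0, 96.0], "EAS": [95.0, 97.0],
       "EUR": [98.0, 100.0], "SAS": [96.0, 98.0]}
all_values = [v for vals in toy.values() for v in vals]
summary = {"mean": mean(all_values), "min": min(all_values), "max": max(all_values)}
print(summary, row_color(summary["mean"]))
print(anova_f(list(toy.values())))
```

A large F-statistic indicates that the between-population differences in mean breadth are large relative to the variation within populations.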

Because the global mean coverage for PNPLA3 is reported to be less than 95%, the gene is considered poorly covered by WES. Additional information may be found by clicking the Detail button in the last column of the table. Detail brings up a small window that contains several tabs, described in the next section.

4.3. Details

Population summary reports the mean breadth of coverage for each CCDS ID by super population. This information may be used to highlight differences in coverage between different populations. For example, exons that are comprehensively covered by WES in Europeans may have a lower mean breadth of coverage in another ancestry, suggesting the use of gene panels instead.
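The per-population comparison described above amounts to grouping samples by super population and averaging their breadth of coverage. The Python sketch below shows that idea; the sample identifiers, breadth values, and function names are hypothetical, and the 95% cutoff is the threshold used elsewhere in this guide.

```python
from statistics import mean


def population_summary(breadth_by_sample, population_by_sample):
    """Mean breadth of coverage (percent) per super population."""
    grouped = {}
    for sample, breadth in breadth_by_sample.items():
        grouped.setdefault(population_by_sample[sample], []).append(breadth)
    return {pop: mean(vals) for pop, vals in sorted(grouped.items())}


def poorly_covered_populations(summary, cutoff_pct=95.0):
    """Super populations whose mean breadth falls below the cutoff."""
    return [pop for pop, value in summary.items() if value < cutoff_pct]


# Hypothetical samples for one CCDS ID at a fixed depth threshold.
breadth = {"s1": 99.0, "s2": 97.0, "s3": 92.0, "s4": 94.0}
population = {"s1": "EUR", "s2": "EUR", "s3": "AFR", "s4": "AFR"}

summary_by_pop = population_summary(breadth, population)
print(summary_by_pop)                              # {'AFR': 93.0, 'EUR': 98.0}
print(poorly_covered_populations(summary_by_pop))  # ['AFR']
```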


Coverage plots shows two plots: the breadth of coverage distribution and the coverage metric across genomic loci for the gene. On the left, a violin plot shows the breadth of coverage distribution by super population. The black horizontal line in the plot marks the mean breadth of coverage from exomes in the Genome Aggregation Database (gnomAD) as a global estimate from large-scale data (over 123,000 exomes in the latest release 2.0.2).

The second plot shows the coverage metric (from exomes in the gnomAD browser) over genomic positions in the selected gene for the hg19 reference genome. The plot consists of three parts: the coverage metric (top), exons and transcripts in the gene (middle), and the position of the gene in the chromosome (bottom). The coverage metric is defined as the proportion of exomes in gnomAD (y-axis) that achieved the target depth of coverage (10x, 20x, or 30x) at the given locus (x-axis). The coverage metrics at different target depths are represented by different colors: 10x as light blue, 20x as medium blue, and 30x as dark blue.


If a given position in the gene is well covered in most gnomAD exomes, the position will have high metric values (and will appear in dark colors if it is also well covered at higher target depths). For the PNPLA3 gene, most coding exons (except for the leftmost one) are well covered at all depths. On the other hand, if a region has high values only in light colors (e.g., 90% of exomes attained 10x coverage, but only 10% reached 30x), or if the region has low metric values at every target depth (e.g., only 5% of exomes covered the region at any level), the region is not well covered among exomes in gnomAD. In the previous figure, the leftmost exon of the PNPLA3 gene is less well covered than the other exons. Clicking on the image opens the entry for the corresponding gene in the gnomAD browser, which provides more detailed information. The gene models and genomic positions follow the coverage metric plot as a guide for matching genomic positions to the gene. Because the two data sources use different reference genomes (breadth of coverage values from 1KGP are based on the latest human reference genome, hg38, while the coverage metric from gnomAD is based on the previous version, hg19), some genes may not have a corresponding coverage metric plot.
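The coverage metric defined above, the proportion of exomes reaching a target depth at each locus, can be sketched in Python as follows. The depth matrix and function name are invented for illustration; gnomAD publishes these proportions precomputed, so this is only meant to show what the plotted values represent.

```python
def coverage_metric(depth_matrix, target_depth):
    """Per-locus proportion of exomes with depth >= target_depth.

    depth_matrix[i][j] is the read depth of exome i at locus j.
    """
    n_exomes = len(depth_matrix)
    n_loci = len(depth_matrix[0])
    return [
        sum(1 for row in depth_matrix if row[j] >= target_depth) / n_exomes
        for j in range(n_loci)
    ]


# Hypothetical depths for four exomes (rows) at three loci (columns).
depths = [
    [35, 12, 3],
    [31, 25, 0],
    [28, 18, 11],
    [40, 9, 2],
]

metric = {t: coverage_metric(depths, t) for t in (10, 20, 30)}
print(metric[10])  # [1.0, 0.75, 0.25]
print(metric[20])  # [1.0, 0.25, 0.0]
print(metric[30])  # [0.75, 0.0, 0.0]
```

In this toy data, the first locus is well covered at all depths, the second has a high value only at 10x (the "light colors only" case), and the third is poorly covered at every depth.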

Gene panels provides a list of all panels registered in GTR that target this CCDS by gene symbol. Each panel is listed by its unique accession version and provides a hyperlink to its entry in GTR when clicked. For genes that WEScover reports as poorly covered by WES, users may browse these panels and read their entries in GTR to learn how to gain access to these tests.

5. References

Chang, W., Cheng, J., Allaire, J.J., Xie, J. and McPherson, J. (2017). shiny: Web Application Framework for R. R package version 1.0.5. https://CRAN.R-project.org/package=shiny.

Kong, S.W., et al. Measuring coverage and accuracy of whole-exome sequencing in clinical context. Genet Med 2018. https://www.ncbi.nlm.nih.gov/pubmed/29789557.

Lek, M., et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016;536(7616):285-291. https://www.ncbi.nlm.nih.gov/pubmed/27535533.

Meienberg, J., et al. New insights into the performance of human whole-exome capture platforms. Nucleic Acids Res 2015;43(11):e76. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4477645.

Meynert, A.M., et al. Variant detection sensitivity and biases in whole genome and exome sequencing. BMC Bioinformatics 2014;15:247. https://www.ncbi.nlm.nih.gov/pubmed/25038816.

Park, J.H., et al. I148M variant in PNPLA3 reduces central adiposity and metabolic disease risks while increasing nonalcoholic fatty liver disease. Liver Int. 2015 Dec;35(12):2537-46. doi: 10.1111/liv.12909. https://www.ncbi.nlm.nih.gov/pubmed/26148225.

Pruitt, K.D., et al. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res 2009;19(7):1316-1323. https://www.ncbi.nlm.nih.gov/pubmed/19498102.

Rubinstein, W.S., et al. The NIH genetic testing registry: a new, centralized database of genetic tests to enable access to comprehensive information and improve transparency. Nucleic Acids Res 2013;41 (Database issue):D925-935. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531155.

Stavropoulos, D.J., et al. Whole Genome Sequencing Expands Diagnostic Utility and Improves Clinical Management in Pediatric Medicine. NPJ Genom Med 2016;1. https://www.ncbi.nlm.nih.gov/pubmed/28567303.

Wang, J., et al. Diagnostic yield of clinical next-generation sequencing panels for epilepsy. JAMA Neurol 2014;71(5):650-651. https://www.ncbi.nlm.nih.gov/pubmed/24818677.

Wang, Q., et al. Novel metrics to measure coverage in whole exome sequencing datasets reveal local and global non-uniformity. Sci Rep 2017;7(1):885. https://www.ncbi.nlm.nih.gov/pubmed/28408746.

Data



Breadth of coverage distribution by super population



Exome coverage in gnomAD



Source


  • The source code of this Shiny app can be found on GitHub