knomeBASE™

knomeBASE provides geneticists with the genome interpretation informatics and tools they need to shortlist and validate candidate variants, genes, and pathways.

knomeBASE is for geneticists and other researchers who have the capacity to interpret genomes on their own—but need the appropriate informatics support and software tools. Researchers who seek a fully outsourced discovery solution should review knomeDISCOVERY.

knomeBASE process

Our process starts with raw sequence data (client-supplied or Knome-supplied). We then complete project-specific curation, review sequence quality, and run your sequence data through kGAP, our automated informatics engine.

kGAP richly annotates all known and novel variants with allele frequencies, effects on protein structure and function, and phenotypic associations, drawing on more than a dozen integrated public and private data sources. In addition, kGAP creates a compact, easily queryable database of annotated genotypes for each genome, as well as a comparison database covering all genomes in the study.

These queryable databases are staged and loaded into proprietary desktop software tools designed to help geneticists and molecular biologists interpret multiple genomes efficiently.

[Figure: knomeBASE process map]

Informatics

Underlying the knomeBASE service is kGAP™, an informatics engine that automates the process of annotating, comparing and distilling whole genome sequence data—transforming raw sequence data into a format optimized for interpretation. Designed to process many genomes at once, kGAP completes in a day what would otherwise require months of effort and a team of specialists.

Standardization
To allow many genomes to be compared at once (even if they were sequenced at different times, to different specifications, or on different platforms), we convert raw sequence data into a standard diploid genome format, aligned to the reference genome, that thoroughly captures both simple and complex variants. We call this format the Human Genome Format (HGF).
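HGF itself is proprietary and its layout is not described here. As a rough illustration of what standardizing calls from different platforms involves, the sketch below (all function names hypothetical) converts VCF-style diploid genotypes into one canonical allele-index pair and trims redundant shared bases, so the same variant gets the same representation regardless of which pipeline reported it. It does not left-align indels, which would require the reference sequence.

```python
from typing import Optional, Tuple


def canonical_genotype(gt: str) -> Optional[Tuple[int, int]]:
    """Turn a VCF-style diploid genotype ('0/1', '1|0', '1/1') into a sorted
    pair of allele indices. Returns None for a no-call ('./.').
    Note: sorting discards phase ('1|0' and '0|1' both become (0, 1))."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected a diploid genotype, got {gt!r}")
    if "." in alleles:
        return None
    first, second = sorted(int(a) for a in alleles)
    return (first, second)


def trim_alleles(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Remove bases shared by REF and ALT (suffix first, then prefix), keeping
    at least one base in each allele, and shift POS for any removed prefix.
    This gives a parsimonious representation; it does NOT left-align indels."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Usage: a phased heterozygous call and an over-padded SNV representation.
print(canonical_genotype("1|0"), trim_alleles(100, "CTG", "CAG"))  # (0, 1) (101, 'T', 'A')
```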

Annotation
To annotate known and novel variants, we have curated and harmonized reference data from more than a dozen sources, including dbSNP, 1000 Genomes, HapMap III, HGMD, Ensembl, RefGene, GeneAlias, HPRD, MSigDB, KEGG, Reactome, SIFT, GO terms, and PubMed. Using the resulting reference database (>100,000 reports), we richly annotate the genomes in your study with gene-associated phenotypes; ranges of genotype-, sex-, and ethnicity-appropriate risk estimates for site-phenotype associations; and direct links to publications.
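The curated reference database itself is not public. The sketch below assumes a simplified, hypothetical stand-in for it: a Python dictionary keyed by (chromosome, position, ref, alt) that maps each known variant to a harmonized annotation record. Annotating a genome then amounts to a keyed lookup, and any variant with no match is kept and flagged as novel rather than dropped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
VariantKey = Tuple[str, int, str, str]


@dataclass
class Annotation:
    """Hypothetical harmonized record combining fields from several sources."""
    gene: Optional[str] = None
    population_frequency: Optional[float] = None
    phenotypes: List[str] = field(default_factory=list)
    pubmed_ids: List[str] = field(default_factory=list)


def annotate(variants: List[VariantKey],
             reference: Dict[VariantKey, Annotation]) -> List[dict]:
    """Attach the matching reference annotation to each variant.
    Variants absent from the reference are kept and marked novel."""
    annotated = []
    for key in variants:
        ann = reference.get(key)
        annotated.append({
            "key": key,
            "novel": ann is None,
            "annotation": ann if ann is not None else Annotation(),
        })
    return annotated


# Usage with made-up placeholder data.
reference = {("chr1", 1000, "A", "G"): Annotation(gene="GENE1", population_frequency=0.02)}
result = annotate([("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")], reference)
print([r["novel"] for r in result])  # [False, True]
```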

Distillation
To speed interpretation, we distill each annotated genome into a compact, easily queryable Variable Site Database (~9 GB) that details: genotypes at all 40 million sites known to vary in the human genome, including base substitutions and short indels; call confidence, including reference matches and no-calls; variant frequencies in appropriate populations; gene IDs and associated phenotypes; site-associated phenotypes, with estimated odds-ratio ranges and p-values; predicted variant-specific effects on protein sequence and function; and protein-protein interactions.

Variable Site Databases are small enough to be managed on a desktop computer, yet detailed enough to provide the rich information needed for interpretation.
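The actual schema of a Variable Site Database is not documented here. To illustrate the kind of per-site record listed above and the queries it is meant to support, the sketch below builds a small in-memory SQLite table with hypothetical column names and runs one typical filter: confidently called, rare, non-reference variants with a predicted protein effect.

```python
import sqlite3

# Hypothetical schema: one row per variable site in a single genome.
SCHEMA = """
CREATE TABLE variable_site (
    chrom     TEXT    NOT NULL,
    pos       INTEGER NOT NULL,
    ref       TEXT    NOT NULL,
    alt       TEXT    NOT NULL,
    genotype  TEXT    NOT NULL,  -- unphased, e.g. '0/1'; './.' marks a no-call
    call_conf REAL,              -- call confidence, 0..1
    pop_freq  REAL,              -- population allele frequency, NULL if unknown
    gene_id   TEXT,
    effect    TEXT,              -- predicted protein effect, e.g. 'missense'
    PRIMARY KEY (chrom, pos, ref, alt)
);
"""


def build_db(rows):
    """Create an in-memory database and load one tuple per variable site."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO variable_site VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def rare_coding_variants(conn, max_freq=0.01, min_conf=0.9):
    """Confidently called, rare (or unseen), non-reference sites with a predicted effect."""
    cur = conn.execute(
        """SELECT chrom, pos, gene_id, effect FROM variable_site
           WHERE genotype NOT IN ('0/0', './.')
             AND call_conf >= ?
             AND (pop_freq IS NULL OR pop_freq <= ?)
             AND effect IS NOT NULL
           ORDER BY chrom, pos""",
        (min_conf, max_freq))
    return cur.fetchall()


# Usage with made-up placeholder rows.
rows = [
    ("chr1", 100, "A", "G", "0/1", 0.99, 0.001, "GENE1", "missense"),
    ("chr1", 200, "C", "T", "0/0", 0.99, 0.30, "GENE1", None),
    ("chr2", 300, "G", "A", "1/1", 0.95, None, "GENE2", "nonsense"),
    ("chr2", 400, "T", "C", "0/1", 0.50, 0.002, "GENE3", "missense"),
]
print(rare_coding_variants(build_db(rows)))
```

Using a single-file embedded database such as SQLite is one way to keep per-genome data small enough for a desktop machine while still supporting indexed queries.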

Comparison
In addition to creating a Variable Site Database for each genome, we create a single compact database that summarizes the distribution of variants among all genomes within a study. This Variable Site Comparison Database enables the fast and flexible querying of multiple genomes.
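One simple way to picture a comparison database, assuming each genome's calls are available as a mapping from variant key to genotype, is an index from each variant to the set of genomes that carry it. The hypothetical functions below build that index and answer a typical cross-genome question (which variants are shared by at least N of a chosen group of genomes). The real Variable Site Comparison Database format may well differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt); hypothetical


def carries_alt(genotype: str) -> bool:
    """True if a VCF-style genotype contains at least one non-reference allele."""
    return any(a not in ("0", ".") for a in genotype.replace("|", "/").split("/"))


def build_comparison(genomes: Dict[str, Dict[VariantKey, str]]) -> Dict[VariantKey, Set[str]]:
    """Map each variant to the set of genome IDs carrying it."""
    carriers: Dict[VariantKey, Set[str]] = defaultdict(set)
    for genome_id, calls in genomes.items():
        for key, genotype in calls.items():
            if carries_alt(genotype):
                carriers[key].add(genome_id)
    return dict(carriers)


def shared_by(carriers: Dict[VariantKey, Set[str]],
              genome_ids: Iterable[str], min_count: int) -> List[VariantKey]:
    """Variants carried by at least min_count of the given genomes, sorted by key."""
    wanted = set(genome_ids)
    return sorted(k for k, ids in carriers.items() if len(ids & wanted) >= min_count)


# Usage with made-up placeholder calls.
genomes = {
    "G1": {("chr1", 100, "A", "G"): "0/1", ("chr1", 200, "C", "T"): "0/0"},
    "G2": {("chr1", 100, "A", "G"): "1|1", ("chr1", 200, "C", "T"): "0/1"},
    "G3": {("chr1", 100, "A", "G"): "./."},
}
comparison = build_comparison(genomes)
print(shared_by(comparison, ["G1", "G2", "G3"], min_count=2))  # [('chr1', 100, 'A', 'G')]
```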

knomeBASE deliverables

The deliverables for knomeBASE include your enhanced data and our genome interpretation tools, scripts, and libraries.

Enhanced data. Your genomes are delivered on a secure hard drive and are accessible through an easy-to-use dashboard interface. A single click opens the standardized data sets in Human Genome Format (HGF) for each genome, the Variable Site Database for each genome, and the Variable Site Comparison Database for all genomes within a study.

knomeVARIANTS™. This query tool helps pinpoint candidate causal variants. It includes a query interface, scripting libraries, and data conversion utilities. Simply designate cases and controls and a putative inheritance mode, then add appropriate filter criteria to automatically generate a sorted shortlist of leading candidates.
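The knomeVARIANTS query interface and scripting libraries are not documented here, so the following is only a sketch of the workflow described above, with hypothetical names throughout. A site is kept when its genotypes fit a dominant or recessive model (every case carries the required number of alternate alleles and no control does), a population-frequency cutoff stands in for the user's filter criteria, and the survivors are sorted rarest first.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Site:
    """Hypothetical annotated site with per-sample genotypes."""
    key: Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
    gene: str
    pop_freq: float                 # population allele frequency
    genotypes: Dict[str, str]       # sample ID -> genotype, e.g. '0/1'


def alt_count(genotype: str) -> int:
    """Number of non-reference alleles in a diploid genotype; no-calls count as 0."""
    return sum(1 for a in genotype.replace("|", "/").split("/") if a not in ("0", "."))


def fits_model(site: Site, cases: List[str], controls: List[str], mode: str) -> bool:
    """Dominant: cases need >= 1 alt allele; recessive: cases need 2.
    No control may meet the same threshold. Missing samples count as no-calls."""
    if mode not in ("dominant", "recessive"):
        raise ValueError(f"unknown inheritance mode: {mode!r}")
    need = 1 if mode == "dominant" else 2

    def count(sample: str) -> int:
        return alt_count(site.genotypes.get(sample, "./."))

    return all(count(s) >= need for s in cases) and not any(count(s) >= need for s in controls)


def shortlist(sites: List[Site], cases: List[str], controls: List[str],
              mode: str, max_freq: float = 0.01) -> List[Site]:
    """Apply the frequency filter and inheritance model, then sort rarest first."""
    hits = [s for s in sites if s.pop_freq <= max_freq and fits_model(s, cases, controls, mode)]
    return sorted(hits, key=lambda s: s.pop_freq)


# Usage with made-up placeholder sites and samples.
sites = [
    Site(("chr1", 100, "A", "G"), "GENE1", 0.001, {"case1": "0/1", "case2": "0/1", "ctrl1": "0/0"}),
    Site(("chr1", 200, "C", "T"), "GENE2", 0.005, {"case1": "1/1", "case2": "1/1", "ctrl1": "0/1"}),
    Site(("chr2", 300, "G", "A"), "GENE3", 0.20, {"case1": "0/1", "case2": "0/1", "ctrl1": "0/0"}),
]
cases, controls = ["case1", "case2"], ["ctrl1"]
print([s.gene for s in shortlist(sites, cases, controls, "dominant")])   # ['GENE1']
print([s.gene for s in shortlist(sites, cases, controls, "recessive")])  # ['GENE2']
```

Under the recessive model a heterozygous control is still allowed, which is why GENE2 passes there but fails the dominant filter.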

knomePATHWAYS™. This visualization tool overlays variants found in your genomes onto known gene interaction and coexpression networks, helping identify functional interactions between variants in distinct genes.
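knomePATHWAYS is a visual tool, but the core overlay idea can be sketched in a few lines. Assuming (hypothetically) that a gene interaction or coexpression network is given as a list of gene-pair edges, and that the set of genes carrying variants in your genomes is known, the function below returns the edges where both partners carry variants, along with how many such variant-bearing partners each gene has.

```python
from typing import Dict, Iterable, List, Set, Tuple


def overlay_variants(edges: Iterable[Tuple[str, str]],
                     variant_genes: Set[str]) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Return (edges whose two genes both carry variants,
    gene -> number of such variant-bearing interaction partners)."""
    hit_edges: List[Tuple[str, str]] = []
    partner_counts: Dict[str, int] = {}
    for a, b in edges:
        if a in variant_genes and b in variant_genes:
            hit_edges.append((a, b))
            partner_counts[a] = partner_counts.get(a, 0) + 1
            partner_counts[b] = partner_counts.get(b, 0) + 1
    return hit_edges, partner_counts


# Usage with a made-up three-edge network.
network = [("GENE1", "GENE2"), ("GENE2", "GENE3"), ("GENE3", "GENE4")]
edges_hit, counts = overlay_variants(network, {"GENE1", "GENE2", "GENE4"})
print(edges_hit, counts)  # [('GENE1', 'GENE2')] {'GENE1': 1, 'GENE2': 1}
```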

Demos of our software tools

The demos below show our tools in action.

knomePATHWAYS: overlaying variants onto gene interaction networks to spot important patterns (part 1).
knomePATHWAYS: overlaying variants onto gene interaction networks to spot important patterns (part 2).
knomeVARIANTS: examining shared germline variants that may influence inherited cancer risk.
knomeVARIANTS: examining a genome for nonsense variants and selecting those in genes already implicated in a disease or other phenotype.
knomeVARIANTS: examining a single genome to better understand drug response.
knomeVARIANTS: comparing the genomes of tumor and healthy tissue from the same person.

Pricing

knomeBASE, including informatics and software tools, is available for $500 to $750 per genome or exome, depending on volume. For more information, please contact us at research@knome.com or call (617) 715-1000.