11.1: ‘Omics Technologies - Biology

The complete set of DNA within an organism is called its genome. Genomics is therefore the large-scale study, using the techniques of molecular biology, of many genes or even whole genomes at once. This type of research is facilitated by technologies that increase throughput (i.e. the rate of analysis) and decrease cost. The –omics suffix has been used to indicate high-throughput analysis of many types of molecules, including transcripts (transcriptomics), proteins (proteomics), and the products of enzymatic reactions, or metabolites (metabolomics). Interpretation of the large data sets generated by –omics research depends on a combination of computational, biological, and statistical knowledge provided by experts in bioinformatics. Attempts to combine information from different types of ‘omics studies are sometimes called systems biology.


Omics Technologies and Bio-Engineering

Omics Technologies and Bio-Engineering: Towards Improving Quality of Life, Volume 1 is a unique reference that brings together multiple perspectives on omics research, providing in-depth analysis and insights from an international team of authors. The book delivers pivotal information that will inform and improve medical and biological research by helping readers gain more direct access to analytic data, an increased understanding of data evaluation, and a comprehensive picture of how to use omics data in molecular biology, biotechnology and human health care.


AMPs as a Disease Marker

The appeal of “omics” research lies in the possibility of identifying an AMP profile that distinguishes individual patients in a clinically useful manner, for example categorizing them as high- or low-risk for developing a post-operative infection (e.g., pneumonia). Several studies have demonstrated the importance of cathelicidin (Braff et al., 2007; Kovach et al., 2012), beta-defensins (Chong et al., 2008; Scharf et al., 2012), and several other AMPs in regulating epithelial immunity (Cole and Waring, 2002; Tecle et al., 2010). Consequently, integrating these variables simultaneously may reveal unexplored or intriguing connections and identify clinically significant correlations.

Antimicrobial peptides have been extensively evaluated for their role in inflammation (Lai and Gallo, 2009), while little research has investigated their potential role in rejection following organ transplantation. Although these patients are typically immunosuppressed, abnormal alterations in AMPs following transplantation may contribute to, or serve as a marker of, inflammation and rejection. A diagnostic tool yielding a molecular AMP profile of a transplant patient could serve as a prognostic indicator of organ failure. Nuclear magnetic resonance (NMR) based metabolomic technologies have also been used to identify other urine biomarkers as indicators of chronic renal failure and renal transplant function (Bell et al., 1991; Foxall et al., 1993). Similar technologies could be employed to identify how AMPs correlate with clinical outcomes in transplant patients. Likewise, trauma and burn patients exhibit profound defects in immune regulation following injury, including perturbations in AMPs (Steinstraesser et al., 2004; Bhat and Milner, 2007). A diagnostic AMP profile may again provide invaluable data to predict healing, immune integrity, or graft survival. The clinical utility of such targeted profiles could undoubtedly be applied to numerous disease states involving infection and/or inflammation, to serve as markers of prognosis.

Currently, NMR and mass spectrometry are the two major platforms on which metabolomic analyses are performed and then evaluated with bioinformatic tools. Several skin AMPs were recently implicated in tumorigenesis of cutaneous squamous cell carcinoma (Scola et al., 2012). Significant metabolic alterations usually ensue as normal cells are transformed into a malignant phenotype. Under a limited rationale, AMPs may simply reflect alterations in the local environment caused by the presence of the malignancy, or may be irrelevant to it. A more sophisticated rationale suggests that alterations in AMPs may serve as a biomarker for disease severity and/or progression, and denote a significant underlying process that contributes to a malignancy. Interestingly, the wound repair process and cancer progression are both associated with alterations in the inflammatory/immune microenvironment. During wound repair, AMPs are released from epithelial and infiltrating immune cells to stimulate re-epithelialization, new vessel formation, and extracellular matrix (ECM) remodeling (Radek and Gallo, 2007). However, the dynamics of cancer progression and tissue repair differ in that wound healing is a self-limiting process, while tumor formation is characterized by continuous, uncontrolled activation of similar pathways that facilitate tumor growth and metastasis. One key observation is that prominent associations exist between the cytokines, chemokines, and growth factors present in healing wounds and wound fluid and those present in tumors. In parallel, a striking difference in the temporal regulation of these factors was determined by the combination of several genomic technologies (Pedersen et al., 2003). Furthermore, proteomic and genomic methods are now being employed through a multidisciplinary translational research approach to improve the bioactive components in matrix therapies for non-healing wounds, to specifically modulate the temporal and local release of these molecules (Sweitzer et al., 2006). Degradomics is emerging in the wound healing field as a new technology that assimilates the current knowledge base of ECM regulation and deciphers the complex interactions between proteases and their respective inhibitors using systems biology, as a means to improve wound integrity in chronic wounds (Hermes et al., 2011). Since AMPs are an integral part of wound healing and inflammation, the knowledge gained from these evolving “omics” technologies may be extrapolated to other disease states that share similar mechanisms of disease progression.

Antimicrobial peptide regulation can clearly modulate, and be influenced by, the composition of the microbial flora of the human host. Several AMPs are induced in response to both invasive pathogens and commensal strains of bacteria, generating specific downstream innate or adaptive immune signaling events. For instance, the cutaneous commensal Staphylococcus epidermidis induces human β-defensin-2 and -3 via a TLR-2-dependent signaling mechanism (Lai et al., 2010). This interaction is beneficial for both the host and the microbe: it facilitates the eradication of pathogens on the skin via AMP induction, while simultaneously allowing S. epidermidis to proliferate with fewer competitors for metabolic resources. Further complicating these interactions, microbes have evolved several mechanisms to evade host AMPs, including altered cell surface charge, efflux transporters, proteases, trapping proteins, and direct adaptations of host cellular processes (Nizet, 2006). These dynamic interactions between the host and the resident microbiota can significantly influence the overall homeostatic balance.

The integration of multiple “omics” disciplines is applicable to several tangible clinical situations where infection is a risk factor. For example, identification of the patients most at risk for a urinary tract infection (UTI) would improve prophylactic therapies for susceptible patient populations, including burn-injured, surgical, or bedridden individuals. Recent studies deliver a long-overdue confirmation that urine is not sterile, challenging the current dogma (Nelson et al., 2010; Dong et al., 2011; Wolfe et al., 2012). Thus, integration of multiple “omics” technologies, such as 16S rRNA gene sequencing and proteomics, may reveal correlations between specific AMPs and distinct genera of bacteria, identifying unique patterns that could be employed as a diagnostic tool to predict which individuals are at higher risk for a UTI. Furthermore, the development of a rapid, high-throughput assay that integrates multiple “omics” technologies to correlate AMPs with tissue-specific microbiota would be invaluable to clinicians for the prediction of UTI or other disease states.
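As a concrete, if simplified, illustration of the kind of integration proposed here, the sketch below correlates hypothetical AMP abundances (a stand-in for proteomic measurements) with bacterial genus abundances (a stand-in for 16S rRNA profiling) across patient samples, using Spearman correlation with a Bonferroni-corrected threshold. All values, AMP names, and genera are invented for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_patients = 40

# Hypothetical proteomic AMP abundances (rows = patients)
amps = pd.DataFrame(
    rng.lognormal(size=(n_patients, 3)),
    columns=["cathelicidin", "beta_defensin_2", "beta_defensin_3"],
)

# Hypothetical 16S relative abundances for a few genera
genera = pd.DataFrame(
    rng.dirichlet(np.ones(4), size=n_patients),
    columns=["Escherichia", "Lactobacillus", "Staphylococcus", "Enterococcus"],
)

# Spearman correlation between every AMP and every genus, reporting only
# pairs that pass a Bonferroni-corrected threshold. With purely random
# data, as here, usually nothing is reported.
n_tests = amps.shape[1] * genera.shape[1]
for amp in amps:
    for genus in genera:
        rho, p = spearmanr(amps[amp], genera[genus])
        if p < 0.05 / n_tests:
            print(f"{amp} vs {genus}: rho={rho:.2f}, p={p:.1e}")
```

In a real integration study the same logic would run over hundreds of peptides and taxa, with correction for clinical covariates and batch effects.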


INTRODUCTION

Pathway Tools [1–3] is a software environment for the management, analysis and visualization of integrated collections of genome, pathway and regulatory data. Pathway Tools handles many types of information beyond pathways, and its capabilities are very extensive. The software has been under continuous development by the Bioinformatics Research Group at SRI International since the early 1990s. Pathway Tools serves several different use cases in bioinformatics and systems biology, enumerated below. This article provides a comprehensive description of Pathway Tools. It describes both what the software does and how it does it. Where possible it references earlier publications that provide more algorithmic detail; however, in some cases those earlier publications have been superseded by newer developments in the software that are described here. This article also emphasizes new aspects of the software that have not been reported in earlier publications.

It supports development of organism-specific databases (DBs) [also called model-organism databases (MODs)] that integrate many bioinformatics datatypes.

It supports scientific visualization, web publishing and dissemination of those organism-specific DBs.

It performs computational inferences including prediction of metabolic pathways, prediction of metabolic pathway hole fillers and prediction of operons, which can be used for genome analysis.

It provides visual tools for analysis of omics datasets.

It provides tools for analysis of biological networks.

It provides comparative analyses of organism-specific DBs.

It supports metabolic engineering.

Pathway Tools is focused around a type of MOD called a Pathway/Genome Database (PGDB). A PGDB integrates information about the genes, proteins, metabolic network and regulatory network of an organism.

Pathway Tools has several components. The PathoLogic component allows users to create a new PGDB from the annotated genome of an organism. PathoLogic generates a new PGDB that contains the genes, proteins, biochemical reactions and predicted metabolic pathways and operons of the organism.

The Pathway/Genome Editors let PGDB developers interactively refine the contents of a PGDB, such as editing a metabolic pathway or an operon, or defining the function of a newly characterized gene.

The Pathway/Genome Navigator supports querying, visualization and analysis of PGDBs. The Navigator can run as a local desktop application and as a web server. The Navigator allows scientists to find information quickly, to display that information in familiar graphical forms and to publish a PGDB to the scientific community via the web. The Navigator provides a platform for systems-level analysis of functional-genomics data by providing tools for painting combinations of gene expression, protein expression and metabolomics data onto a full metabolic map of the cell, onto the full genome, and onto a diagram of the regulatory network of the cell.

Pathway Tools includes a sophisticated ontology and DB application programming interface (API) that allows programs to perform complex queries, symbolic computations and data mining on the contents of a PGDB. For example, the software has been used for global studies of the Escherichia coli metabolic network [4] and genetic network [5].

Pathway Tools is seeing widespread use across the bioinformatics community to create PGDBs in all domains of life. The software has been licensed by more than 1700 users to date. As well as supporting the development of the EcoCyc [6] and MetaCyc [7] DBs at SRI, and SRI's BioCyc collection of 500 PGDBs [7], the software is in use by genome centers, by experimental biologists, and by groups that are creating curated MODs for bacteria (such as the National Institute of Allergy and Infectious Diseases Bioinformatics Resource Centers PATRIC, BioHealthBase, Pathema and EuPathDB), for fungi (such as the Saccharomyces Genome Database and the Candida Genome Database), for mammals (such as the Jackson Laboratory's MouseCyc) and for plants (such as Arabidopsis thaliana). See Section 9 for a more detailed listing of available PGDBs.

The organization of this article is as follows. Section ‘Pathway Tools use cases’ articulates in more detail the use cases for which Pathway Tools was designed. ‘Creating and curating a PGDB’ section relates how a new PGDB is created, and describes the computational inference procedures within Pathway Tools. It summarizes the interactive editing capabilities of Pathway Tools, and the associated author crediting system. It also describes tools for automatic upgrading of a PGDB schema, and for bulk updating of the genome annotation within a PGDB. ‘The pathway Tools schema’ section describes the schema of a PGDB. ‘Visualization and querying of PGDBs’ section relates the querying and visualization facilities of Pathway Tools. ‘Computational access to PGDBs’ section summarizes the mechanisms for importing and exporting data from Pathway Tools, and for accessing and updating PGDB data via APIs. ‘Systems biology analyses’ section describes multiple Pathway Tools modules for performing systems analyses of PGDBs including a tool for interactively tracing metabolites through the metabolic network, tools for performing network reachability analysis and for identifying dead-end metabolites, a tool for predicting antimicrobial drug targets by identifying metabolic network choke points and a set of comparative analysis tools. ‘Software and DB architecture’ section describes the software architecture of Pathway Tools. ‘Survey of pathway tools compatible DBs’ section lists the large family of PGDBs that have been created by Pathway Tools users outside SRI International, and describes a peer-to-peer data sharing facility within Pathway Tools that allows users to easily exchange their PGDBs. ‘Comparison with related software environments’ section compares Pathway Tools to related efforts.
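As an illustration of one of the systems biology analyses mentioned above, the sketch below identifies choke points in a toy reaction network, using the common working definition of a choke point as a reaction that is the sole consumer or sole producer of some metabolite. The reactions and metabolites are invented and do not come from any PGDB; this is a minimal sketch of the idea, not the Pathway Tools implementation.

```python
from collections import defaultdict

# Toy metabolic network: reaction -> (consumed metabolites, produced metabolites).
# All reaction and metabolite names are invented for illustration.
reactions = {
    "R1": ({"A"}, {"B"}),   # A -> B
    "R2": ({"A"}, {"B"}),   # parallel route A -> B
    "R3": ({"B"}, {"C"}),   # B -> C
}

consumers = defaultdict(set)   # metabolite -> reactions that consume it
producers = defaultdict(set)   # metabolite -> reactions that produce it
for rxn, (substrates, products) in reactions.items():
    for m in substrates:
        consumers[m].add(rxn)
    for m in products:
        producers[m].add(rxn)

# A choke point uniquely consumes or uniquely produces some metabolite.
choke_points = {
    rxn
    for rxn, (substrates, products) in reactions.items()
    if any(consumers[m] == {rxn} for m in substrates)
    or any(producers[m] == {rxn} for m in products)
}
print(sorted(choke_points))   # only R3 is a choke point in this toy network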


Approaches to integrative analysis of multiple omics data

Multi-omics approaches have been applied to a wide range of biological problems and we have grouped these into three categories, “genome first”, “phenotype first”, and “environment first”, depending on the initial focus of the investigation. Thus, the genome first approach seeks to determine the mechanisms by which GWAS loci contribute to disease. The phenotype first approach seeks to understand the pathways contributing to disease without centering the investigation on a particular locus. And the environment first approach examines the environment as a primary variable, asking how it perturbs pathways or interacts with genetic variation. We then discuss briefly some statistical issues around data integration across omics layers and network modeling.

The genome first approach

In the absence of somatic mutations, primary DNA sequence remains unaltered throughout life and is not influenced by environment or development. Thus, for disease-associated genetic variants, it is assumed that a specific variant contributes to, and is not a consequence of, disease. Such variants constitute a very powerful anchor point for mechanistic studies of disease etiology and modeling interactions of other omics layers. GWASs often identify loci harboring the causal variants, but lack sufficient power to distinguish them from nearby variants that are associated with disease only by virtue of their linkage to the causative variant. Moreover, the identified loci typically contain multiple genes, which from a genomic point of view could equally contribute to disease. Thus, although GWAS results may be immediately useful for risk prediction purposes, they do not directly implicate a particular gene or pathway, let alone suggest a therapeutic target. Locus-centered integration of additional omics layers can help to identify causal single nucleotide polymorphisms (SNPs) and genes at GWAS loci and then to examine how these perturb pathways leading to disease.

Analyses of causal variants at GWAS loci focused originally on coding regions, but it has become clear that for many common diseases regulatory variation explains most of the risk burden [21]. Thus, transcriptomics, employing either expression arrays or RNA-Seq (Box 1), has proven particularly useful for identifying causal genes at GWAS loci [7, 9, 16, 22–24]. A number of statistical methods have been developed for examining causality based on eQTLs at GWAS loci, including conditional analysis and mediation analysis (Fig. 2). Large datasets of eQTLs are now available for a number of tissues in humans and animal models [17, 22, 25, 26].
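To make the logic of eQTL-based approaches concrete, the short sketch below regresses a gene's expression level on genotype dosage (0, 1 or 2 copies of a risk allele) across simulated individuals. It is only a minimal illustration with invented data and effect sizes, not a re-analysis of any published dataset; dedicated eQTL pipelines additionally handle covariates, population structure, and genome-wide multiple testing.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
n = 500

# Simulated genotype dosages (0, 1 or 2 risk alleles) at a candidate SNP
genotype = rng.binomial(2, 0.3, size=n)

# Simulated expression: a true effect of 0.5 per allele plus noise
expression = 0.5 * genotype + rng.normal(scale=1.0, size=n)

# A cis-eQTL test is, at its simplest, a regression of expression on dosage
fit = linregress(genotype, expression)
print(f"beta = {fit.slope:.2f}, p = {fit.pvalue:.2e}")
```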

Fig. 2 Usage of omics approaches to prioritize GWAS variants. A locus zoom plot for a complex GWAS locus shows that several candidate genes could be causal. A heatmap using various omics approaches summarizes evidence supporting or refuting candidate causal genes. Beyond literature queries for candidates, various omics technologies and databases can be used to identify causal genes, including: searching for expression in relevant tissues [173,174,175], summary data-based Mendelian randomization (SMR) [176], mediation analysis [177], conditional analysis [23], correlation analyses, searching for overlapping pQTLs [178, 179], and/or implementing epigenetic data to narrow candidates (discussed for the FTO locus [16])

Identification of causal DNA variants affecting gene expression is complicated because a variety of elements, both within the gene and hundreds of kilobases away from it, can contribute. Results from the ENCODE (Encyclopedia of DNA Elements) and Roadmap Epigenomics consortia have been particularly useful in this regard for defining enhancers and promoters in a variety of tissues in mice and humans (Box 1, Fig. 3). Once the causal variants or gene have been established, other omics layers can help identify the downstream interactions or pathways. As discussed further below, transcript levels often exhibit poor correlation with protein levels, and thus proteomics data are expected to be more proximal to disease mechanisms. Moreover, proteomics techniques such as yeast two-hybrid screens or “pulldown” analyses can be used to identify interacting pathways contributing to disease [27]. For certain disorders, metabolomics can also be used to bridge genotype to phenotype [28].

Fig. 3 Genome first approach at the FTO GWAS locus. Claussnitzer et al. [16] combined genomics, epigenomics, transcriptomics, and phylogenetic analysis to identify the functional element, the causative SNP, and the downstream genes mediating the genetic effect at the FTO locus in obesity. Circles represent genes in the locus and yellow circles represent genes implicated by the respective omics data. (a) Genomics: the FTO locus, containing several genes (circles), harbors the most significant obesity-associated haplotype in humans. SNPs that are in linkage disequilibrium with the risk allele are color coded: blue represents the non-risk (normal) haplotype and red the risk haplotype. (b) Epigenomics: publicly available epigenomic maps and functional assays were used to narrow down the original associated region to 10 kb containing an adipose-specific enhancer. Chromatin capture (Hi-C) was used to identify genes interacting with this enhancer. (c) Transcriptomics: this technique was used to identify which of the candidate genes are differentially expressed between the risk and normal haplotypes, identifying IRX3 and IRX5 as the likely downstream targets. In addition, conservation analysis suggested that rs1421085 (the SNP that disrupts an ARID5B binding motif) is the causative SNP at the FTO locus. CRISPR-Cas9 editing of rs1421085 from the background (TT) to the risk allele (CC) was sufficient to explain the observed differences in expression of IRX3 and IRX5. (d) Functional mechanism: correlation and enrichment analysis were then used to identify potentially altered pathways that were then confirmed by in vitro and in vivo studies

A good example of a genome first approach is the study by Claussnitzer and colleagues [16], which analyzed the FTO locus, the locus harboring the strongest association with obesity (Fig. 3). To identify the cell type in which the causal variant acts, they examined chromatin state maps of the region across 127 cell types that had previously been profiled by the Roadmap Epigenomics Project (Box 1). A long enhancer active in mesenchymal adipocyte progenitors was shown to differ in activity between the risk and non-risk haplotypes. They then surveyed long-range three-dimensional chromatin (Hi-C) interactions involving the enhancer and identified two genes, IRX3 and IRX5, whose expression correlated with the risk haplotype across 20 risk-allele and 18 non-risk-allele carriers. To identify the affected biologic processes, Claussnitzer and colleagues examined correlations between the expression of IRX3 and IRX5 and that of other genes in adipose tissue from a cohort of ten individuals. Substantial enrichment for genes involved in mitochondrial functions and lipid metabolism was observed, suggesting possible roles in thermogenesis. Further work using trans-eQTL analysis of the FTO locus suggested an effect on genes involved in adipocyte browning. Adipocyte size and mitochondrial DNA content were then studied in 24 risk-allele and 34 non-risk-allele carriers and shown to differ significantly, consistent with an adipocyte-autonomous effect on energy balance. Claussnitzer and colleagues confirmed the roles of IRX3 and IRX5 using experimental manipulation in primary adipocytes and in mice. Finally, the causal variant at the FTO locus was predicted using cross-species conservation, and targeted editing with CRISPR-Cas9 identified a single nucleotide variant that disrupts ARID5B repressor binding.

The phenotype first approach

A different way to utilize omics data to augment our understanding of disease is to simply test for correlations between disease, or factors associated with disease, and omics-based data. Once different entities of omics data are found to correlate with a particular phenotype, they can be fitted into a logical framework that indicates the affected pathways and provides insight into the role of different factors in disease development.

For example, Gjoneska et al. [20] used transcriptomic and epigenomic data to show that genomic and environmental contributions to Alzheimer's disease (AD) act through different cell types. The authors first identified groups of genes that reflect transient or sustained changes in gene expression and cell populations during AD development. Consistent with the pathophysiology of AD, the transcriptomic data showed a sustained increase in immune-related genes, while synaptic and learning functions showed a sustained decrease. The authors then used chromatin immunoprecipitation and next-generation sequencing (NGS) to profile seven different epigenetic modifications that mark distinct functional chromatin states. They were able to identify thousands of promoters and enhancers that showed significantly different chromatin states in AD versus control. Next, the authors showed that these epigenetic changes correspond to the observed changes in gene expression, and used enrichment analysis to identify five transcription factor motifs enriched in the activated promoters and enhancers and two in the repressed elements. Finally, the authors used available GWAS data to see whether genetic variants associated with AD overlap any of the functional regions they identified. Notably, they found that AD-associated genetic variants are significantly enriched in the immune function-related enhancers but not in promoters or neuronal function-related enhancers. This led the authors to suggest that the genetic predisposition to AD acts mostly through dysregulation of immune functions, whereas epigenetic changes in the neuronal cells are mostly environmentally driven.

In another example, Lundby and colleagues [29] used quantitative tissue-specific interaction proteomics, combined with data from GWAS studies, to identify a network of genes involved in cardiac arrhythmias. The authors began by selecting five genes underlying Mendelian forms of long QT syndrome, and immunoprecipitated the corresponding proteins from lysates of mouse hearts. Using mass spectrometry (MS), they then identified 584 proteins that co-precipitated with the five target proteins, reflecting potential protein–protein interactions. Notably, many of these 584 proteins had previously been shown to interact with ion channels, further validating the physiological relevance of the experiment. They then compared this list of proteins with the genes located in 35 GWAS loci for common forms of QT-interval variation, and identified 12 genes that overlapped between the two sets. This study provides a mechanistic link between specific genes at some of the GWAS loci and the phenotype in question, suggesting likely causative genes within those loci.

The environment first approach

In this approach, multi-omics analyses are used to investigate the mechanistic links to disease, using an environmental factor such as diet as the variable. Accurately assessing or controlling environmental factors such as diet in humans is very difficult, so animal models have proven particularly valuable for examining the impact of the environment on disease. Here, we give three examples of multi-omic study designs used to examine the impact of the environment on disease.

One kind of study design examines multiple environmental conditions to determine how these perturb physiologic, molecular, and clinical phenotypes. For example, Solon-Biet and colleagues [30] explored the contribution of 25 different diets to the overall health and longevity of over 800 mice. They examined the interactions between macronutrient ratios and a myriad of cardiometabolic traits (such as lifespan, serum profiles, hepatic mitochondrial activity, blood pressure, and glucose tolerance) in order to elucidate specific dietary compositions associated with improved health. The ratio of protein to carbohydrate in the diet was shown to have profound effects on health parameters later in life, offering mechanistic insight into how this is achieved.

The second study design seeks to understand the interactions between genetics and the environment. For example, Parks and coworkers [31, 32] recently studied the effects of a high-fat, high-sucrose diet across about 100 different inbred strains of mice. By examining global gene expression in multiple tissues and metabolites in plasma, they were able to identify pathways and genes contributing to diet-induced obesity and diabetes. In the case of dietary factors, the gut microbiome introduces an additional layer of complexity, as it is highly responsive to dietary challenges and also contributes significantly to host physiology and disease. Recent multi-omic studies [31, 33, 34] have revealed an impact of gut microbiota on host responses to dietary challenge and on epigenetic programming.

The third type of study design involves statistical modeling of metabolite fluxes in response to specific substrates. For example, the integration of bibliographic, metabolomic, and genomic data has been used to reconstruct the dynamic range of metabolome flow of organisms, first performed in Escherichia coli [35] and since extended to yeast [36, 37] and to individual tissues in mice [38] and humans [39]. Other applications have explored various connections between metabolome models and other layers of information, including the transcriptome [40] and proteome [41,42,43]. Refinement of these techniques and subsequent application to larger population-wide datasets will likely lead to the elucidation of novel key regulatory nodes in metabolite control.
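A minimal sketch of constraint-based flux modeling of the kind referred to here (flux balance analysis) is shown below: fluxes through a made-up three-reaction network are constrained so that internal metabolites stay at steady state, and an objective flux is maximized by linear programming. The stoichiometry, bounds, and objective are all invented for illustration and are far simpler than any genome-scale model.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network: uptake (-> A), conversion (A -> B), export (B ->).
# Rows = metabolites (A, B), columns = reactions; entries are stoichiometric coefficients.
S = np.array([
    [1, -1,  0],   # A: produced by uptake, consumed by conversion
    [0,  1, -1],   # B: produced by conversion, consumed by export
])

bounds = [(0, 10), (0, 1000), (0, 1000)]   # flux bounds; uptake capped at 10

# Maximize the export flux (a stand-in for a biomass objective).
# linprog minimizes, so the objective coefficient is negated.
c = np.array([0, 0, -1])

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal fluxes:", res.x)            # expected: [10, 10, 10]
```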

Integration of data across multi-omics layers

A variety of approaches can be used to integrate data across multiple omics layers, depending on the study design [44]. Two frequently used approaches involve simple correlation or co-mapping: if two omics elements share a common driver, or if one perturbs the other, they will exhibit correlation or association (Fig. 4). A number of specialized statistical approaches that often rely on conditioning have also been developed. In these approaches a statistical model is used to assess whether each element of the model (for example, a SNP and an expression change) contributes to the disease independently, versus one being a function of the other. For example, a regression-based method termed “mediation analysis” was developed to integrate SNP and gene expression data, treating the gene expression as the mediator in the causal chain from SNP to disease [45, 46]. Similar approaches have been applied to other omics layers [46, 47]. More broadly, multi-layer omics can be modeled as networks, based on a data-driven approach or with the support of prior knowledge of molecular networks. A practical consideration in multi-omic studies is matching the identities of the same objects across omics layers, known as ID conversion. This is performed using pathway databases such as KEGG and cross-reference tables [47]. Ideally, the multi-omics datasets will be collected on the same set of samples, but this is not always possible; GWAS and expression data are frequently collected from different subjects. In such cases, it is possible to infer genetic signatures (eQTLs) or phenotypes based on genotypes [48,49,50].
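The following is a minimal sketch of the regression logic behind mediation analysis as described above: the SNP effect on the trait is estimated with and without conditioning on expression, and attenuation of the SNP coefficient once the mediator is included is the pattern consistent with mediation. All data are simulated, and statsmodels is assumed to be available; published methods add formal tests of the indirect effect and adjustment for confounders.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000

# Simulated causal chain: SNP -> expression -> trait (all values invented)
snp = rng.binomial(2, 0.4, size=n).astype(float)
expression = 0.8 * snp + rng.normal(size=n)
trait = 0.6 * expression + rng.normal(size=n)

def ols(y, X):
    """Ordinary least squares with an intercept."""
    return sm.OLS(y, sm.add_constant(X)).fit()

total = ols(trait, snp)                                  # total SNP effect
full = ols(trait, np.column_stack([snp, expression]))    # SNP effect conditional on the mediator

print("SNP effect, marginal:    %.3f" % total.params[1])
print("SNP effect, conditional: %.3f" % full.params[1])
print("Expression effect:       %.3f" % full.params[2])
# The marginal SNP coefficient (about 0.48 here) shrinks toward zero once
# expression is included, while the expression coefficient stays near 0.6,
# the pattern expected when expression mediates the SNP's effect.
```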

Fig. 4 The flow of biologic information from liver DNA methylation to liver transcripts, proteins, metabolites, and clinical traits. A panel of 90 different inbred strains of mice was examined for DNA methylation levels in liver using bisulfite sequencing. CpGs with hypervariable methylation were then tested for association with (a) clinical traits such as obesity and diabetes, (b) liver metabolite levels, (c) liver protein levels, and (d) liver transcript levels. Each dot is a significant association, at the corresponding Bonferroni threshold, between a CpG and the clinical traits or the metabolite, protein, and transcript levels in liver. The genomic positions of hypervariable CpGs are plotted on the x-axis and the positions of the genes encoding the proteins or transcripts are plotted on the y-axis. The positions of clinical traits and metabolites on the y-axis are arbitrary. The diagonal line of dots in the protein and transcript data represents local eQTLs and pQTLs. The vertical lines represent “hotspots” where many proteins or transcripts are associated with CpG methylation at a particular locus. Figure taken with permission from [180], Elsevier

Investigating the quantitative rules that govern the flow of information from one layer to another is also important when modeling multiple data types. For example, one of the fundamental assumptions behind many RNA co-expression networks is that fluctuations in RNA abundance are mirrored by proteins. However, while the tools for effective interrogation of the transcriptome are widely available and commonly used, effective interrogation of proteomes at the population level is a relatively new possibility (Box 1). A number of studies have now shown that while the levels of many proteins are strongly correlated with their transcript levels, with coincident eQTLs and protein QTLs (pQTLs), the correlations for most protein–transcript pairs are modest [51,52,53,54,55,56,57,58]. The observed discordance of transcript and protein levels is likely explained by regulation of translation, post-translational modifications, and protein turnover. Together these studies suggest that RNA may be a good predictor of the abundance of only some proteins, identifying groups of genes that conform to this rule and those that do not. In the context of disease-oriented research, such studies constitute an important step toward creating an analytical framework that can later be applied to the interpretation of disease-specific datasets. In addition, especially in the context of the limited availability of human samples, such studies are useful for choosing among possible experimental approaches.
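As a concrete illustration of comparing transcript and protein levels, the sketch below computes a per-gene Pearson correlation between made-up transcript and protein matrices measured on the same samples; the degree to which each protein "tracks" its transcript is an invented parameter standing in for translation and turnover effects.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_samples, n_genes = 60, 5
genes = [f"gene{i}" for i in range(n_genes)]

# Made-up transcript levels for the same genes and samples
transcripts = pd.DataFrame(rng.normal(size=(n_samples, n_genes)), columns=genes)

# Proteins track transcripts to varying degrees plus independent noise
# (a stand-in for regulation of translation, modifications, and turnover)
tracking = np.linspace(0.9, 0.1, n_genes)
proteins = transcripts * tracking + rng.normal(size=(n_samples, n_genes))

# Per-gene Pearson correlation between transcript and protein levels
r = transcripts.corrwith(proteins)
print(r.sort_values(ascending=False))
```

Genes with high tracking values show strong transcript–protein correlation, while the rest are only modestly correlated, mirroring the heterogeneity reported in the studies cited above.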

A key concept of modern biology is that genes and their products participate in complex, interconnected networks, rather than linear pathways [59]. One way to model such networks is as graphs consisting of elements that exhibit specific interactions with other elements [60,61,62,63,64]. Such networks were first constructed based on metabolic pathways, with the metabolites corresponding to the nodes and the enzymatic conversions to the edges [65, 66]. Subsequently, networks were modeled based on co-expression across a series of perturbations with the genes encoding the transcripts corresponding to the nodes and the correlations to the edges [67,68,69]. In the case of proteins, edges can be based on physical interactions, such as those identified from global yeast two-hybrid analyses or a series of “pulldowns” [27]. Networks can also be formed based on genomic interactions captured by HiC data [70, 71], and physical interactions can also be measured across different layers, such as in ChIP-Seq, which quantifies DNA binding by specific proteins.

For studies of disease, co-expression networks can be constructed based on variations in gene expression that occur among control and affected individuals separately [72,73,74]. Comparison of network architecture between control and disease groups allows the identification of closely connected nodes (“modules”) most correlated with disease status. In general, co-expression or interaction networks are “undirected” in the sense that the causal nature of the interactions is unknown. Interaction networks can be experimentally tested, although the high number of suggestive interactions identified in each study makes indiscriminate testing prohibitive. If genetic data, such as GWAS loci for disease or eQTLs for genes, are available it may be possible to infer causality using DNA as an anchor [75,76,77]. Such integration of genetic information with network modeling has been used to highlight pathways that contribute to disease and to identify “key drivers” in biologic processes [72,73,74, 78]. For example, Marbach and colleagues [79] combined genomics, epigenomics, and transcriptomics to elucidate tissue-specific regulatory circuits in 394 human cell types. They then overlaid the GWAS results of diseases onto tissue-specific regulatory networks in the disease-relevant tissues and identified modules particularly enriched for genetic variants in each disease. In another example, Zhang and coworkers [64] examined transcript levels from brains of individuals with late onset AD and analyzed co-expression and Bayesian causal modeling to identify modules associated with disease and key driver genes important in disease regulatory pathways. Together, these studies illustrate how network analysis can be used to narrow down the focus of disease research into specific functional aspects of particular cell types or tissues, considerably facilitating downstream mechanistic efforts and hypothesis generation.
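A small sketch of the co-expression network construction described above is given below: genes are nodes, an edge connects gene pairs whose expression correlation exceeds an (arbitrary) threshold, and connected components stand in for modules. The expression matrix is simulated with two hidden co-regulation factors; real analyses typically use dedicated methods such as weighted correlation networks rather than a hard threshold.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(4)
n_samples, n_genes = 80, 30

# Simulate two co-regulated gene groups driven by hidden factors, plus noise-only genes
factors = rng.normal(size=(n_samples, 2))
loadings = np.zeros((2, n_genes))
loadings[0, :10] = 1.0      # genes 0-9 follow factor 0
loadings[1, 10:20] = 1.0    # genes 10-19 follow factor 1
expr = factors @ loadings + rng.normal(scale=0.5, size=(n_samples, n_genes))

# Build the co-expression graph: edge if |Pearson r| exceeds a threshold
corr = np.corrcoef(expr.T)
threshold = 0.6
G = nx.Graph()
G.add_nodes_from(range(n_genes))
for i in range(n_genes):
    for j in range(i + 1, n_genes):
        if abs(corr[i, j]) > threshold:
            G.add_edge(i, j)

# Connected components serve as a crude stand-in for modules
modules = [sorted(c) for c in nx.connected_components(G) if len(c) > 1]
print(modules)   # expect roughly genes 0-9 and genes 10-19 as two modules
```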


Big Data in Biology Has Produced a Surfeit of Hypotheses – Perhaps Too Many

Even if you aren’t a biologist yourself, you might have heard the terms ‘big data’ and ‘multi-omics technologies’ thrown around. Next-generation sequencing – the multi-omics technology that spawns ‘biological big data’ – has evolved rapidly over a matter of two decades, becoming ubiquitous in research. And DNA sequencing is at the foundation of the multi-omics technologies. First developed by Frederick Sanger in the late 1970s, sequencing has evolved into an automated process that allows us to rapidly and repeatedly sequence a single stretch of DNA, resulting in highly accurate sequence readouts.

Genomics refers to the information we obtain by sequencing genomes, or the entire DNA, of different organisms. Genomics has expanded to give birth to transcriptomics (involving RNA sequencing) as well as proteomics, epigenomics and metabolomics. Sequencing and analysis have transformed our knowledge of biological systems and their inner workings. We can now access information that we never before dreamed of possessing, such as heterogeneity across single cells within a cancerous tumour, implication of little-known genes in the cause and progression of many debilitating human illnesses, the evolutionary trajectories of our ancestors and an improved understanding of the model systems we work with in our laboratories.

These insights have been aided by the steep fall in the price of sequencing over the years, enabling more and more laboratories to contribute to the ever-expanding repository of ‘-omics’ data. This ultimately should bring us to a question: is more data always better? While it’s hard to overstate the benefits of ‘-omics’ technologies, how this data is used and built on is also important. And this is where we may be checking the progress of modern biology.

First, there’s no denying that multi-omics technologies have significant diagnostic and therapeutic implications that have been translated into tangible human benefits. But while there is no doubt that the medical applications of biology are essential to human welfare, there also exists a fundamental kind of research that seeks to understand biology for its own sake. And I fear that the present ubiquity of multi-omics technologies, coupled with their success in clinical applications, may coax biologists to steer their research interests to align more with the clinical side of biology than they did before. While the advancement of medical research and its applications is very important, we need to be careful to ensure the pursuit and quality of basic science doesn’t suffer as a result.

Big data does have its place in fundamental research. In the realm of sub-organismal biology, multi-omics technologies have allowed us to discover hitherto unknown cell types, correlate new genes with pathways and define biomolecular associations over entire genomes. Put differently: multi-omics has allowed us to produce testable hypotheses. The reports that detail these findings often gush about how the stage has been set for more research from the starting point that is big data. But has this really happened?

Of course, there’s an argument to be made that establishing causal relationships from data correlations is in the jurisdiction of the experimental biologist. But has someone experimentally investigated the biological mechanisms involved? I don’t think the answer is a resounding “yes”. Instead, I suspect that we are drowning in data, sidelining the traditional experimental validation of hypotheses while generating even more hypotheses.

So while next-generation sequencing has done a great deal of good for biology research, it’s possible we have reached a moment where we need to pause and look at the big picture that emerges from our individual preoccupations with sculpting a single pixel each. What are we doing with the data we generate? Are we using the data to test hypotheses and gain critical insights into biological mechanisms? Are we furthering fundamental research? Are multi-omics tools and traditional experimental research working in harmony?

I raise these questions as a novice – a student proposing to enter biology research at the point where -omics technology is increasingly towering over the life sciences, and I’m sure that these questions plague my peers, too. The ‘-omics’ revolution is well underway but I believe it’s the answers to questions like these that will determine what sort of mark this ongoing revolution will leave on the future of science.

Amruta Swaminathan is a master’s student at the Indian Institute of Science Education and Research (IISER), Pune.


Single-Cell Multi-Omics Market, 2025: One of the Most Rapidly Evolving Markets - Expansion into New Research Applications such as Single-Cell Metabolomics - ResearchAndMarkets.com

Healthcare experts have found the single-cell multi-omics market to be one of the most rapidly evolving markets, which is predicted to grow at a CAGR of 21.16% during the forecast period, 2020-2025.

The market is driven by the need for the development of an advanced solution based on single-cell technology for clinical research in various applications such as cancer, rare disease, cell biology, and synthetic biology, among others.

The market is favored by the development of single-cell technology-based solutions for visualization and analysis of cell heterogeneity, the tumor micro-environment, and antibody development. The gradual global increase in the prevalence of cancer and rare diseases has further driven the single-cell multi-omics market.

Furthermore, several contract research organizations are focusing on the development of single-cell technology-based services, which enable simultaneous analysis of genomics, proteomics, and transcriptomics, providing deeper insights into disease progression.

Competitive Landscape

The exponential rise in the application of precision medicine on the global level has created a buzz among companies to invest in the development of high-resolution multiplex diagnostics providing information on cellular interaction and tissue heterogeneity to understand disease biology and pathology. Due to technologically advanced solutions and intense market penetration, 10x Genomics, Inc. has been a pioneer and a significant competitor in this market.

North America holds the largest share of the single-cell multi-omics market due to improved healthcare infrastructure, rise in per capita income, and availability of state-of-the-art research laboratories and institutions in the region. Apart from this, Asia-Pacific is anticipated to grow at the fastest CAGR of 21.53% during the forecast period 2020-2025.

The market utilizes several technologies, such as barcoding, sequencing, mass cytometry, and microscopy, for the development of instruments and assays for single-cell analysis of tissue and cells to gain an understanding of cell heterogeneity and cellular mechanisms. Each solution offered by the leading players is a combination of next-generation omics tools for application in several clinical areas, such as oncology, neurology, immunology, and pathology.

Key Topics Covered:

1 Product Definition

2 Scope of Work

3 Research Methodology

3.2 Secondary Data Sources

3.3 Market Estimation Model

3.4 Criteria for Company Profiling

4 Market Overview

4.1.1 Technological Development in Single-Cell Sequencing

4.1.1.1 Advancements in Imaging Techniques for Single-Cell Sequencing

4.1.1.2 Advancements in Single-Cell Collection and Analysis System

5 Market Dynamics

5.3.1 Increasing number of Large-Scale Genomics Studies Leveraging Single-Cell RNA Sequencing (sc-RNA)

5.3.2 Increasing Adoption of Personalized Medicine for the Screening and Diagnostics of Genetic Disorders

5.3.3 Increasing Disposable Income in Emerging Economies

5.4.1 High Cost of Single-Cell Analysis and Integration of Data

5.4.2 Limited Availability of Large Online Data Storage and Analysis Platforms

5.5.1 Massive Scope for Adoption of Genomic-Based Medicine in Emerging Nations

5.5.2 Requirement for the Development of Advanced Solutions Based on Single-Cell Technology

5.5.3 Increased Use of Single-Cell Technology Solutions for the Development of Therapeutics Drugs and Comprehensive Treatment Plan

5.5.4 Expansion into New Research Applications such as Single-Cell Metabolomics

6 Competitive Insights

6.1 Market Share Analysis (by Company), 2019

6.2 Growth Share Analysis (Opportunity Mapping)

7 Global Single-Cell Multi-Omics Market (by Product Type)

8 Global Single-Cell Multi-Omics Market (by Omics Type)

8.2 Single-Cell Transcriptomics

9 Global Single-Cell Multi-Omics Market (by Sample Type)

10 Global Single-Cell Multi-Omics Market (by Technique)

10.1 Single-Cell Isolation and Dispensing

10.1.1 Fluorescence-Activated Cell Sorting (FACS)

10.1.3 Magnetic-Activated Cell Sorting (MACS)

10.1.4 Laser Capture Microdissection

10.2.2 Polymerase Chain Reaction

10.2.3 Next-Generation Sequencing

11 Global Single-Cell Multi-Omics Market (by Application)


Challenges for the application of OMICS in OEH

The development of new OMICS technologies is an important first step towards implementation of OMICS markers in OEH. However, similar to other (bio)markers of exposure, susceptibility and effect, the successful implementation of OMICS markers in OEH requires appropriate study designs, thorough validation of markers, and careful interpretation of study results.49–51

Study design

As indicated in table 1, the transcriptome, proteome and metabolome are highly variable over time and are likely to be influenced by the disease process. This means that great care should be given to the timing of biological sample collection and to adequate processing (eg, field stabilisation of mRNA) of the sample, to minimise measurement error and to avoid potential differential misclassification biases. Table 2 lists the advantages and disadvantages of the different human observational study (HOS) designs with regard to the collection and use of biological markers. In general, hospital-based case-control studies are the least suitable for the application of these technologies in HOS research, as they are more prone to selection and differential bias, while prospective studies or cross-sectional studies seem most suitable for such approaches. Moreover, hospital case-control studies are problematic because it is impossible to determine whether changes in biomarkers are the cause or the consequence of a disease. Semi-longitudinal studies, in which biological measures are taken before and after exposure or a change in disease status, might be extremely powerful for some OMICS technologies such as transcriptomics, proteomics and metabolomics. In these study designs each individual serves as their own control, eliminating the influence of population variance.

Comparison of advantages and limitations relevant to the collection of biological specimens and data interpretation in molecular epidemiology study designs (adapted from Garcia-Closas et al, 2006)49

Validation of biomarkers

The value of an OMICS-based biomarker in OEH depends on the reliability of the assay used to qualitatively and quantitatively assess the biomarker, and on the association between the biomarker and the biological endpoint of interest (exposure, susceptibility or health effect). The reliability of an assay can be tested by investigating its variability within and between laboratories and comparing the results to the variability of existing assays (standards). A necessary step towards increasing the reliability of OMICS assays is standardisation. Several initiatives have developed standards for new OMICS assays with regard to comparison with existing techniques (microarray quality control (MAQC)), data formats for describing experimental details (minimum information about a microarray experiment (MIAME)) and assessment of sample quality (external RNA controls consortium (ERCC)).52 53 Once the reliability of an assay has been established in the laboratory, transitional studies that assess the association between the biomarker and biological endpoints in humans are needed.49 To achieve an accurate estimate of the association between a biomarker and a biological endpoint, reliable and valid measurements of exposure and covariates are needed as well.

A true association between a biomarker and a biological endpoint can be obscured by measurement error. To acquire insight into the impact of measurement error on the observed association between a biomarker and a biological endpoint, a repeated sampling design, at least in part of the population, is necessary. Repeated sampling of individuals allows researchers to compare biomarker variability within individuals to biomarker variability between individuals. One measure that can be used to assess the variability of biomarkers within and between individuals is the intraclass correlation coefficient, which represents the proportion of the total variance that can be attributed to the between-individual variance.49 The level of measurement error that is acceptable for a biomarker depends on the magnitude of the true association between the biomarker and the biological endpoint of interest. For biomarkers with a dichotomous outcome (eg, genotyping), the accuracy of the biomarker is based on its sensitivity (eg, the probability of correctly identifying an SNP that is present) and its specificity (eg, the probability of correctly identifying the absence of an SNP).
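A minimal sketch of the intraclass correlation coefficient computed from repeated biomarker measurements, using the one-way ANOVA estimator, is shown below; the measurements are simulated with known between- and within-individual variances, so the expected value of the estimate is known in advance.

```python
import numpy as np

rng = np.random.default_rng(5)
n_subjects, k = 30, 3   # 30 individuals, 3 repeated measurements each

# Simulated biomarker: stable between-individual differences plus within-individual noise
subject_means = rng.normal(loc=10.0, scale=2.0, size=n_subjects)
measurements = subject_means[:, None] + rng.normal(scale=1.0, size=(n_subjects, k))

# One-way ANOVA estimator of ICC(1): the proportion of total variance
# attributable to between-individual variance
grand_mean = measurements.mean()
ms_between = k * ((measurements.mean(axis=1) - grand_mean) ** 2).sum() / (n_subjects - 1)
ms_within = ((measurements - measurements.mean(axis=1, keepdims=True)) ** 2).sum() / (n_subjects * (k - 1))

icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC = {icc:.2f}")   # expected around 0.8 for these simulated variances
```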

Interpretation of study results

In recent years technological developments have had a major impact on the design of OMICS-based studies. One trend that has been seen consistently across the different OMICS fields is the enormous increase in the resolution of the assays (the number of “endpoints” that can be assessed in a single assay) and in their throughput (the number of samples that can be analysed per time period). Many of these improvements are based on the introduction of chip-based assays such as DNA microarrays. A major implication of the ability to investigate many endpoints (eg, up to 1 000 000 SNPs in a single assay) in large populations is that researchers can move away from hypothesis-based studies (focused on a limited set of endpoints) towards hypothesis-free (agnostic) study designs (including much larger sets of endpoints). Although hypothesis-free studies might contribute considerably to the elucidation of the complex biological processes that underlie clinically manifested health effects, it is important to realise that the interpretation of the data they generate requires a different approach than the interpretation of data generated by more traditional hypothesis-based studies. In hypothesis-based study designs, “frequentist” measures such as 95% confidence intervals or p values provide a reasonably good measure of the statistical significance of a study's findings. However, the interpretation of such measures is based on the inclusion of a limited number of hypotheses for which the researchers assume there is a good possibility that the null hypothesis will be rejected (ie, there is a high prior probability of a true positive finding). In a hypothesis-free analytic approach, a study is initiated without a well-defined hypothesis for each endpoint investigated (ie, a flat prior probability for each finding). As a result of chance, however, the increased number of endpoints in a study is accompanied by a higher probability of detecting statistically significant false-positive results.54 Therefore, the traditional statistical approaches commonly used in epidemiology are of less value in hypothesis-free studies. A current challenge for the OMICS field is the development of (statistical) approaches that can be used for the interpretation of the high-dimensional data generated by these high-throughput techniques. Several statistical strategies (and also approaches in study design) have been developed to reduce the probability of false-positive results; examples are the Bonferroni adjustment for multiple significance testing or more sophisticated Bayesian approaches that include estimation of the false-positive report probability.15–17 54 55 However, replication of the initial findings in follow-up studies remains the strongest safeguard against false-positive results. Studies that incorporate thousands of biological endpoints should therefore primarily be seen as discovery studies that can aid the generation of new hypotheses, and new OMICS studies should incorporate strategies for built-in replication of the study findings. Application of a different analytical technique to test the hypothesis a priori in a second/validation set of samples will reduce the possibility that the initial finding was an artefact of the technology used.
A potential strategy for built-in replication is to perform the initial analysis on a subset of well-characterised samples matched on potential confounders and effect modifiers, and to confirm the findings by using alternative analysis methods on the remaining, often larger, sample set. A potential problem in OEH research, however, is that replication is often complicated because there are usually only a limited number of relatively small studies on a single exposure. Even if another large study of the same exposure can be found, replication might still be complicated by the fact that the populations are exposed to different levels.
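The effect of a simple Bonferroni adjustment in a hypothesis-free setting can be illustrated with simulated data: below, 10 000 null endpoints are tested between two groups, and the number of nominally significant results is compared with the number surviving the corrected threshold. The endpoint count and group sizes are arbitrary.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(6)
n_endpoints, n_per_group = 10_000, 20

# Simulated hypothesis-free study: every endpoint is null (no true group difference)
exposed = rng.normal(size=(n_endpoints, n_per_group))
controls = rng.normal(size=(n_endpoints, n_per_group))
pvals = ttest_ind(exposed, controls, axis=1).pvalue

alpha = 0.05
print("nominally significant:", (pvals < alpha).sum())                  # ~500 false positives expected
print("Bonferroni significant:", (pvals < alpha / n_endpoints).sum())   # ~0 expected
```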

In addition to aspects that contribute to random error, systematic error (bias) is also a potential threat to the validity of HOS utilising OMICS technologies.56–58 The types of bias that might occur will be largely similar to types of bias that might occur in all HOS. However, issues such as sample collection, handling and storage of samples and analysis technique-specific biases might be especially relevant for studies applying OMICS technologies.57 59 60 Very recently guidelines for the reporting of genetic association studies (STREGA) have been published.61 These guidelines underline the necessity of detailed reporting in publications on genetic association studies to allow scientists to assess the potential of bias in study outcomes. Development of similar guidelines for the other OMICS fields will contribute to the identification of relevant types of bias.

Pathway analysis and systems biology

OMICS technologies will enable researchers to look at the complete complement, expression, and regulation of genes, proteins and metabolites. At present, however, most statistical analyses are based on a (simplistic) one-by-one comparison of markers between exposure and/or disease groups. Recently, analytical tools and databases have become available to perform more integrated analyses of biological functions and of changes in biological functions resulting from environmental factors. Examples of such approaches are gene ontology (GO) analysis, pathway analysis and structural equation modelling (SEM).62–65 GO analysis is based on a library of gene profiles that are associated with biological processes.66 Gene sets identified in microarray experiments as differentially expressed are tested for their association with a profile in the GO library.63 In pathway analysis, not only the profile of genes associated with a specific biological process is tested, but also the functional interactions between the genes in a profile.62 While large gaps in the knowledge of biological pathways still exist, each new study will contribute to building the base of knowledge necessary for these types of analyses. SEM is a statistical approach that can be used to simultaneously model multiple genes, and multiple SNPs within a gene, in a hierarchical manner that reflects their underlying role in a biological system.65
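A minimal sketch of the kind of gene-set test underlying GO and pathway analysis is shown below: a hypergeometric test asks whether a list of differentially expressed genes overlaps a GO category more often than expected by chance. The gene counts are invented for illustration; real tools repeat this test across thousands of categories and correct for multiple testing.

```python
from scipy.stats import hypergeom

# Invented counts for illustration
N = 20_000   # genes on the array (background)
K = 150      # genes annotated to the GO category of interest
n = 400      # differentially expressed genes
k = 12       # differentially expressed genes that fall in the category

# Under random sampling of n genes from N, about n * K / N = 3 genes would be
# expected in the category; the survival function gives P(overlap >= k).
p_enrichment = hypergeom.sf(k - 1, N, K, n)
print(f"enrichment p = {p_enrichment:.3g}")
```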

The increasing knowledge of biological pathways will facilitate the integration of the separate OMICS fields into systems biology approaches. Systems biology has been described as the global quantitative analysis of the interactions of all components in a biological system in order to determine its phenotype.67–69 This integration is facilitated by a continuous increase in computing power and in the possibilities for data sharing.


MD, PhD, DMD, PharmD, DNP, ScD, or equivalent

  • Free for Harvard-affiliated institutions
  • CTSA member: $935.00
    (Note: this is a 25% discount off the standard fee.)
  • Non-CTSA member: $1250.00
  • Course fees can be modified for participants unable to meet financial requirements due to their geographical location. Email us to inquire about potential eligibility.

The Harvard Catalyst Education Program is accredited by the Massachusetts Medical Society to provide continuing medical education for physicians.

Harvard Catalyst Education Program’s policy requires full participation and the completion of all activity surveys to be eligible for CME credit; no partial credit is allowed.


Evolution of Translational Omics: Lessons Learned and the Path Forward (2012)

The completion of the human genome sequence in 2001 and the technologies that have emerged from the Human Genome Project have ushered in a new era in biomedical science. Using technologies in genomics, proteomics, and metabolomics, together with advanced analytical methods in biostatistics, bioinformatics, and computational biology, scientists are developing a new understanding of the molecular and genetic basis of disease. By measuring, in each patient sample, thousands of genetic variations, mutations, or changes in gene and protein expression and activity, scientists are identifying previously unknown, molecularly defined disease states and searching for complex biomarkers that predict responses to therapy and disease outcome.

This new understanding is beginning to shape both the ways in which diseases are managed and how new drugs and tests are being developed and used. For example, Oncotype DX (Paik et al., 2004) is a multiparameter gene expression test that helps determine which patients with early stage breast cancer are at higher risk of recurrence and thus may be more likely to benefit from chemotherapy, while allowing women at lower risk to safely forgo chemotherapy. These patients avoid the toxicities, cost, and quality-of-life issues associated with treatment. Increasingly, drugs are being developed to target specific disease subtypes or mutations, and companion diagnostic tests are being developed to identify the subsets of patients most likely to respond or least likely to suffer serious side effects.

Despite great promise, progress in translating such “omics-based” tests into direct clinical applications has been slower than anticipated. This has been attributed to the time-consuming, expensive, and uncertain development pathway from disease biomarker discovery to clinical test; the underdeveloped and inconsistent standards of evidence used to assess biomarker validity; the heterogeneity of patients with a given diagnosis; and the lack of appropriate study designs and analytical methods for these analyses (IOM, 2007). Some also have questioned the excitement afforded omics-based discoveries, suggesting that advancements will have primarily modest effects in patient care (Burke and Psaty, 2007).

Nevertheless, patients themselves recognize the promise of molecularly driven medicine and are looking to the scientific community to provide validated, reliable clinical tests that accurately measure and predict response to treatment and provide more effective ways of screening for disease. Among scientists and clinicians, omics-based tests are seen as opening opportunities for important new clinical trial design strategies and, it is hoped, for reducing the time and cost of developing new treatments (Macconaill and Garraway, 2010).

As is true in all areas of scientific research, rigorous standards must be applied to assess the validity of any study results, particularly if the study involves patients. Recently, the scientific community raised serious concerns about several omics-based tests, developed by investigators at Duke University, intended to predict sensitivity to chemotherapeutic agents. The initial papers describing these omics-based tests garnered extensive attention because the results suggested a potential major advance in the discovery and use of omics-based tests to direct choice of therapy for individual cancer patients. Almost from the time of initial publication, however, concerns were raised about the validity of these gene expression–based tests. Keith Baggerly and Kevin Coombes of MD Anderson Cancer Center first approached the Duke University principal investigators, Anil Potti and Joseph Nevins, with questions on November 8, 2006 (Baggerly, 2011), soon after the October 22 electronic publication of the article (PubMed, 2006). Clinical investigators at their institution were interested in using the methods, but the statisticians could not reproduce the results with the publicly available data and information. These concerns were heightened by the publication of an article by Baggerly and Coombes (2009) detailing several errors in the development of the tests, inconsistencies between the primary data and the data used in the articles, and the inability to reproduce the results reported by the investigators. In addition, in July 2010, a letter to the director of the National Cancer Institute (NCI), signed by a group of more than 30 respected statisticians and bioinformatics scientists, brought additional scrutiny to these concerns, especially because these omics-based tests were being used in clinical trials to direct patient care (Baron et al., 2010).

Between October 2007 and April 2008, three cancer clinical trials were launched at Duke University, in which patients with lung cancer or breast cancer were assigned to a chemotherapy regimen on the basis of the test results (see Appendix B for additional details).

Dr. Harold Varmus, the director of NCI, asked the IOM to conduct an independent analysis of the omics-based tests developed at Duke and to define evaluation criteria for ensuring high standards of evidence in the development of omics-based tests prior to their use in clinical trials. In an interview for the Cancer Letter, Dr. Varmus summarized the committee’s task:

The Duke episode, from my perspective, was simply another way of illustrating the dangers of not doing it right, not having the right kinds of safeguards. And with my various colleagues, including colleagues at Duke, I asked the Institute of Medicine to do a study. The intention was not to investigate wrongdoing, because that was going to be taken care of in other ways, but to think about what needs to be in place to ensure that correct evaluation of new approaches to cancer care had been undertaken, that we met competing standards, and that the evidence base for changing diagnosis itself or evaluation of responses or, more importantly, choice of therapies, was based on good evidence. I asked the IOM … to think carefully about what kinds of hoops people need to jump through before new information about cancer is actually used in the clinical setting. The risks are high here. (Goldberg, 2011, p. 4)

NCI biostatistician Lisa McShane provided further motivation for the committee’s work:

I have witnessed the birth of many omics technologies and remain excited about their potential for providing important biological insights and their potential to lead to clinical tests that might improve care for cancer patients. It is important, however, that we understand the challenges and potential pitfalls that can be encountered with use of these technologies. Some unfortunate events at Duke University involving the use of genomic predictors in cancer clinical trials were a major impetus for the formation of this committee. We need to take a step back to evaluate the process by which tests based on omics technologies are developed and determined to be fit for use as a basis for clinical trial designs in which they may be used to determine patient therapy. (McShane, 2010, p. 1-2)

The scientific community needs to address these gaps if we are to realize the full potential of omics research in patient care. Omics technologies not only hold great promise, but also pose substantial risks if not properly developed and validated for clinical use.

With support from NCI, the Food and Drug Administration (FDA), the Centers for Disease Control and Prevention, the U.S. Department of Veterans Affairs, the American Society for Clinical Pathology, and the College of American Pathologists, an IOM committee was charged to identify appropriate evaluation criteria for developing clinically applicable omics-based tests and to recommend an evaluation process for determining when predictive tests using omics-based technologies are fit for use in clinical trials, especially those in which the assay is used to direct patient care (Box 1-1). The IOM appointed a 20-member committee with a broad range of expertise and experience, including experts in discovery and development of omics-based technologies, clinical oncology, biostatistics and bioinformatics, clinical pathology, ethics, patient advocacy, development and regulation of diagnostic tests, university administration, and scientific publication.

An ad hoc committee will review the published literature to identify appropriate evaluation criteria for tests based on “omics” technologies (e.g., genomics, epigenomics, proteomics, and metabolomics) that are used as predictors of clinical outcomes. The committee will recommend an evaluation process for determining when predictive tests based on omics technologies are fit for use as a basis for clinical trial design, including stratification of patients and predicting response to therapy in clinical trials. The committee will identify criteria important for the analytical validation, qualification, and utilization components of test evaluation.

The committee will apply these evaluation criteria to predictive tests used in three cancer clinical trials conducted by Duke University investigators (NCT00509366, NCT00545948, NCT00636441). For example, the committee may assess the analytical methods used to generate and validate the predictive models, examine how the source data that were used to develop and test the predictive models were generated or acquired, assess the quality of the source data, and evaluate the appropriateness of the use of the predictive models in clinical trials.

The committee will issue a report with recommendations regarding criteria for using models that predict clinical outcomes from genomic expression profiles and other omics profiles in future clinical trials, as well as recommendations on appropriate actions to ensure adoption and adherence to the recommended evaluation process. The report will also include the committee’s findings regarding the three trials in question.

Before the IOM committee convened for its first meeting, investigators at Duke concluded that the omics-based tests used in the three clinical trials were invalid. They terminated the clinical trials, and began the process of retracting the papers describing the development of the tests. As a result, the committee did not undertake a detailed analysis of the data and computer code used in the development of those tests. Rather, the committee focused on how errors in the development process resulted in those tests being used in clinical trials before they were fully validated, and on developing best practices that would prevent invalid tests from progressing to the clinical testing stage in the future.

A rigorous process was undertaken in the development of the committee’s recommendations that included a review of the field of omics-based research, the processes necessary for verification and validation of omics-based tests, examination of what transpired in the development of the omics-based tests listed in the statement of task as well as other case studies of omics-based test development selected by the committee, and identification of the parties responsible for funding, oversight, and publication of results. Recommendations developed by the committee should be considered a roadmap critical to omics-based test development. The recommendations address the roles and responsibilities of all partners involved in the process, including individual scientists, their institutions, funding agencies that support the work, journals that publish the results of these studies, and FDA, which ultimately helps to define how these tests will make their way to clinical application.

The processes and criteria for adoption and use of omics-based tests in standard clinical practice are outside the scope of this report. The process of taking an omics-based test into clinical trials to evaluate a test for clinical utility and use is described, but no recommendation is made on how, finally, to take a test from the clinical trial setting into clinical practice. However, discussion of this step is critical for understanding the recommendations of the committee because this step may involve using an omics-based test to direct patient management in clinical trials, which is within the charge of the committee. Regardless, if an omics-based test is to be considered for use in clinical practice, one of three pathways needs to be followed to determine clinical utility, and all of these require a fully specified and validated omics-based test. When considering the parties responsible in the development of omics-based tests, the committee considered international funders to be outside the scope of the recommendations. Issues specific to tests that fall outside the committee’s definition of omics-based tests, such as single gene tests and whole genome sequencing, are also not addressed.

It is important to note that the IOM’s study is in no way linked to the concurrent scientific misconduct investigation at Duke University, and that inquiries about misconduct were not within this committee’s purview.

Precise definitions and use of correct terminology are important for ensuring understanding, especially given the complexity of the rapidly expanding field of omics. The committee defined terminology that was central to its deliberations and recommendations (Box 1-2). Where possible, the committee used widely accepted definitions, such as those from the Biomarkers Definition Working Group. The terms “analytical validation,” “clinical validation,” and “clinical utility” have been adapted from the widely used definitions of the Evaluation of Genomic Applications in Practice and Prevention initiative, established by the Centers for Disease Control and Prevention (Teutsch et al., 2009). The committee has adapted this terminology by incorporating statistics and bioinformatics validation through use of the term “clinical/biological validation.”

Analytical Validation: Traditionally, “assessing [an] assay and its measurement performance characteristics, determining the range of conditions under which the assay will give reproducible and accurate data.” With respect to omics, assessing a test’s “ability to accurately and reliably measure the … analyte[s] … of interest in the clinical laboratory, and in specimens representative of the population of interest.”

Biomarker: “A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a[n] … intervention.”

Clinical Utility: “Evidence of improved measurable clinical outcomes, and [a test’s] usefulness and added value to patient management decision-making compared with current management without [omics] testing.”

Clinical/Biological Validation: Assessing a test’s “ability to accurately and reliably predict the clinically defined disorder or phenotype of interest.”

Cross-validation: A statistical method for preliminary confirmation of a computational model’s performance using a single dataset, by dividing the data into multiple segments, and iteratively fitting the model to all but one segment and then evaluating its performance on the remaining segment. (A minimal worked sketch follows these definitions.)

Effect Modifier: A measure that identifies patients most likely to be sensitive or resistant to a specific treatment regimen or agent. An effect modifier is particularly useful when that measure can be used to identify the subgroup of patients for whom treatment will have a clinically meaningful, favorable benefit-to-risk profile.

High-Dimensional Data: Large datasets characterized by the presence of many more predictor variables than observations, such as datasets that result from measurements of hundreds to thousands of molecules in a relatively small number of biological samples. The analysis of such datasets requires appropriate computing power and statistical methods.
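To make the cross-validation and high-dimensional data definitions above concrete, the Python sketch below runs stratified k-fold cross-validation of a penalised classifier on a simulated dataset with far more predictors than samples. The data, the model and the fold count are illustrative assumptions, not the committee’s recommended procedure.

```python
# Minimal sketch of k-fold cross-validation on high-dimensional data
# (many more predictors than samples). All data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5000))       # 60 samples, 5,000 "omics" features
y = rng.integers(0, 2, size=60)       # simulated binary phenotype

aucs = []
for train, test in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    # Fit on all folds but one, then evaluate on the held-out fold.
    model = LogisticRegression(penalty="l2", C=0.1, max_iter=1000).fit(X[train], y[train])
    aucs.append(roc_auc_score(y[test], model.predict_proba(X[test])[:, 1]))

print(f"cross-validated AUC: {np.mean(aucs):.2f} (sd {np.std(aucs):.2f})")
```

Note that the model is refit from scratch inside every fold; fitting the model, or even selecting features, on the full dataset before splitting would leak information from the held-out samples and inflate the apparent performance of the computational model.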

