From RNA-seq reads to differential expression results
Many methods and tools are available for preprocessing high-throughput RNA sequencing data and detecting differential expression.
High-throughput sequencing technologies are now in common use in biology. These technologies produce millions of short sequence reads and are routinely being applied to genomes, epigenomes and transcriptomes. Sequencing steady-state RNA in a sample, known as RNA-seq, is free from many of the limitations of previous technologies, such as the dependence on prior knowledge of the organism, as required for microarrays and PCR (see Box 1: Comparisons of microarrays and sequencing for gene expression analysis). In addition, RNA-seq promises to unravel previously inaccessible complexities in the transcriptome, such as allele-specific expression and novel promoters and isoforms [1–4]. However, the datasets produced are large and complex and interpretation is not straightforward. As with any high-throughput technology, analysis methodology is critical to interpreting the data, and RNA-seq analysis procedures are continuing to evolve. Therefore, it is timely to review currently available data analysis methods and comment on future research directions.
Making sense of RNA-seq data depends on the scientific question of interest. For example, determining differences in allele-specific expression requires accurate determination of the prevalence of transcribed single nucleotide polymorphisms (SNPs). Alternatively, fusion genes or aberrations in cancer samples can be detected by finding novel transcripts in RNA-seq data [6, 7]. In the past year, several methods have emerged that use RNA-seq data for abundance estimation [8, 9], detection of alternative splicing [10–12], RNA editing and novel transcripts [11, 14]. However, the primary objective of many biological studies is gene expression profiling between samples. Thus, in this review we focus on the methodologies available to detect differences in gene-level expression between samples. This sort of analysis is particularly relevant for controlled experiments comparing expression in wild-type and mutant strains of the same tissue, comparing treated versus untreated cells, cancer versus normal, and so on. For example, comparison of expression changes between the cultured pathogen Acinetobacter baumannii and the pathogen grown in the presence of ethanol - which is known to increase virulence - revealed 49 differentially expressed genes belonging to a range of functional categories. Here we outline the processing pipeline used for detecting differential expression (DE) in RNA-seq and examine the available methods and open-source software tools to perform the analysis. We also highlight several areas that require further research.
Most RNA-seq experiments take a sample of purified RNA, shear it, convert it to cDNA and sequence it on a high-throughput platform, such as the Illumina GA/HiSeq, SOLiD or Roche 454. This process generates millions of short (25 to 300 bp) reads taken from one end of the cDNA fragments. A common variant on this process is to generate short reads from both ends of each cDNA fragment, known as 'paired-end' reads. The platforms differ substantially in their chemistry and processing steps, but regardless of the precise details, the raw data consist of a long list of short sequences with associated quality scores; these form the entry point for this review.
An overview of the typical RNA-seq pipeline for DE analysis is outlined in Figure 1. First, reads are mapped to the genome or transcriptome. Second, mapped reads for each sample are assembled into gene-level, exon-level or transcript-level expression summaries, depending on the aims of the experiment. Next, the summarized data are normalized in concert with the statistical testing of DE, leading to a ranked list of genes with associated P-values and fold changes. Finally, biological insight from these lists can be gained by applying systems biology approaches, similar to those used for microarray experiments. We critique below the currently available methodologies for each of these steps for RNA-seq data analysis. Rather than providing a complete list of all available tools, we focus on examples of commonly used open-source software that illustrate the methodology (Table 1). For a complete list of RNA-seq analysis software, see [17, 18].
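As a toy illustration of the summarize-normalize-test steps above, the sketch below scales a table of counts to counts per million (CPM) and ranks genes by log fold change. This is a minimal sketch, not any of the tools discussed in this review: the function names and the count values are invented, and real DE methods additionally model count variability statistically rather than ranking raw fold changes.

```python
import math

def cpm(counts):
    """Scale one sample's gene counts to counts per million (library-size normalization)."""
    total = sum(counts)
    return [1e6 * c / total for c in counts]

def log2_fold_changes(counts_a, counts_b, pseudo=0.5):
    """Per-gene log2 fold change between two CPM-normalized samples.
    A small pseudocount avoids taking the log of zero."""
    a, b = cpm(counts_a), cpm(counts_b)
    return [math.log2((y + pseudo) / (x + pseudo)) for x, y in zip(a, b)]

# Toy count table: one row per gene, two conditions (invented numbers).
wild_type = [100, 50, 0, 2000]
treated = [400, 50, 30, 1000]
lfc = log2_fold_changes(wild_type, treated)  # gene 0 up, gene 3 down
```

Sorting genes by the absolute value of these log fold changes gives the kind of ranked gene list the pipeline produces, which downstream systems biology tools then consume.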
Overview of the RNA-seq analysis pipeline for detecting differential expression. The steps in the pipeline are shown in red boxes; the methodological components of the pipeline are shown in blue boxes with bold text; software examples and methods for each step (a non-exhaustive list) are shown in regular text in blue boxes. References for the tools and methods shown are listed in Table 1. First, reads are mapped to the reference genome or transcriptome (using junction libraries to map reads that cross exon boundaries); mapped reads are assembled into expression summaries (tables of counts showing how many reads fall in each coding region, exon, gene or junction); the data are normalized; and statistical testing of differential expression (DE) is performed, producing a ranked list of genes with associated P-values and fold changes. Systems biology approaches can then be used to gain biological insights from these lists.
Types of Centrifugation Methods
There are commonly three kinds of centrifugation techniques:
Density gradient Centrifugation
Density gradient centrifugation is a method in which a sample containing different molecules is introduced into a high-density solution. It makes use of solutes such as sucrose or glycerol to prepare a solution that develops a density gradient.
To separate the different components, the solution into which the sample is introduced should have a relatively low concentration and a high diffusing capacity. Before centrifugation, the sample solution should be left to stand for some time so that it forms a uniform mixture.
On centrifugation, molecules of different densities occupy different positions within the density gradient according to their masses. Finally, the concentrated layers of particles are recovered by puncturing the centrifuge tube.
- Rate zonal centrifugation: In this type, the sample is centrifuged in a pre-established density gradient.
- Isopycnic centrifugation: Here, a self-generating gradient forms during the centrifugation of the sample.
Rate zonal Centrifugation
Rate zonal centrifugation, also called velocity centrifugation, separates particles on the basis of their sedimentation rate, which depends directly on particle mass or size. The shape of the particles also influences the rate of sedimentation.
It uses a 5-20% sucrose solution to create the density gradient. Dense particles sediment toward the bottom of the centrifuge tube, while light particles remain near the top. Zonal centrifugation therefore develops discrete bands or zones corresponding to the densities of the particles present in the sample.
To collect the separated bands, the centrifuge tube is punctured at the bottom. This technique can be used to determine the sedimentation coefficient and the mass of a molecule of interest, since the sedimentation coefficient is proportional to the mass of a particle.
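The relation behind this proportionality is v = s·ω²·r, where s is the sedimentation coefficient, ω the angular velocity of the rotor and r the distance from the rotation axis. A minimal sketch of this standard formula, with illustrative (not prescriptive) particle and rotor values:

```python
import math

SVEDBERG = 1e-13  # one Svedberg unit expressed in seconds

def sedimentation_velocity(s_svedberg, rpm, radius_m):
    """Sedimentation velocity v = s * omega^2 * r for a particle with
    sedimentation coefficient s (in Svedbergs), rotor speed in rpm,
    and distance from the rotation axis in meters."""
    omega = 2 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return s_svedberg * SVEDBERG * omega ** 2 * radius_m

# Illustrative values (assumed, not from the text): a 70S particle
# spun at 50,000 rpm, 7 cm from the rotor axis.
v = sedimentation_velocity(70, 50_000, 0.07)  # velocity in m/s
```

Doubling s (roughly, doubling the effective mass-to-friction ratio) doubles v, which is why heavier particles form bands further down the gradient in the same run time.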
Isopycnic Centrifugation
It refers to equilibrium density gradient centrifugation, in which particles are concentrated on the basis of their buoyant density, independent of particle size and shape. The duration of centrifugation does not affect the equilibrium positions of the particles.
This method is employed to concentrate particles of the same size but different density. In isopycnic centrifugation, a 20-70% sucrose solution develops the gradient used to separate the molecules. Each biomolecule sediments until its buoyant density equals the density of the surrounding gradient.
Differential Centrifugation
This centrifugation method depends upon differences in sedimentation rate, which directly govern the separation of the components according to their relative mass, shape and density.
It is generally used to separate subcellular components. In differential centrifugation, the cells are first homogenized, and the ruptured cell contents are poured into a centrifuge tube containing a solution that preserves the integrity of the cell components.
Then the components suspended in solution are subjected to an initial centrifugation at low centrifugal force for a defined time. This initial spin sediments the larger components at the bottom of the tube (as a pellet), leaving the supernatant at the top.
The supernatant is then decanted and re-centrifuged at a higher centrifugal force, and this is repeated to sediment successively smaller cellular components; the centrifugal force and the centrifugation time increase after each step.
Ultracentrifugation
This method applies a centrifugal force up to 1,000,000 times greater than the earth's gravitational force. Ultracentrifugation causes sedimentation of particles as small as 10 kDa.
In a homogeneous mixture, the dense particles collect as a pellet at the bottom of the solvent, and the lighter particles remain in the supernatant at the top. Ultracentrifugation uses a dense solution of sucrose or caesium chloride to concentrate particles.
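The applied force is conventionally expressed as relative centrifugal force (RCF), in multiples of g, computed from rotor speed and radius with the standard approximation RCF ≈ 1.118 × 10⁻⁵ × r(cm) × rpm². A small sketch; the speed and radius below are illustrative values, not a protocol:

```python
def relative_centrifugal_force(rpm, radius_cm):
    """Relative centrifugal force (in multiples of g) from rotor speed (rpm)
    and rotor radius (cm), using RCF = 1.118e-5 * r * rpm^2."""
    return 1.118e-5 * radius_cm * rpm ** 2

# Illustrative ultracentrifuge run: 100,000 rpm at an 8 cm radius
# gives a force on the order of the million-fold figure quoted above.
rcf = relative_centrifugal_force(100_000, 8)
```

Note the quadratic dependence on speed: doubling the rpm quadruples the force, which is why ultracentrifuges reach such extreme RCF values.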
Uses of Centrifugation
Following are the applications of the centrifugation technique:
- Environmental science: In the treatment of wastewater.
- Molecular science: Helps in the extraction of biomolecules like DNA, RNA, protein etc.
- Medical research: Helps in the separation of different components from urine, blood serum etc.
- Chemical science: Helps in the process of uranium enrichment.
- Food science: Helps in the production of skimmed milk by removing fat.
Therefore, we can conclude that the centrifugation technique has broad applicability in research in biochemistry, molecular biology, cellular biology and medical science.
- Preface to the First Edition
- Pedagogical Struggles
- Crystallizing and Focusing – My Way
- How to Use this Book in the Classroom
- Modeling Definitions
- Modeling Essential System Features
- Primary Focus: Dynamic (Dynamical) System Models
- Measurement Models & Dynamic System Models Combined: Important!
- Top-Down & Bottom-Up Modeling
- Source & Sink Submodels: One Paradigm for Biomodeling with Subsystem Components
- Systems, Integration, Computation & Scale in Biology
- Overview of the Modeling Process & Biomodeling Goals
- Looking Ahead: A Top-Down Model of the Chapters
- Some Basics & a Little Philosophy
- Algebraic or Differential Equation Models
- Differential & Difference Equation Models
- Different Kinds of Differential & Difference Equation Models
- Linear & Nonlinear Mathematical Models
- Piecewise-Linearized Models: Mild/Soft Nonlinearities
- Solution of Ordinary Differential (ODE) & Difference Equation (DE) Models
- Special Input Forcing Functions (Signals) & Their Model Responses: Steps & Impulses
- State Variable Models of Continuous-Time Systems
- Linear Time-Invariant (TI) Discrete-Time Difference Equations (DEs) & Their Solution
- Linearity & Superposition
- Laplace Transform Solution of ODEs
- Transfer Functions of Linear TI ODE Models
- More on System Stability
- Looking Ahead
- Initial-Value Problems
- Graphical Programming of ODEs
- Time-Delay Simulations
- Multiscale Simulation and Time-Delays
- Normalization of ODEs: Magnitude- & Time-Scaling
- Numerical Integration Algorithms: Overview
- The Taylor Series
- Taylor Series Algorithms for Solving Ordinary Differential Equations
- Computational/Numerical Stability
- Self-Starting ODE Solution Methods
- Algorithms for Estimating and Controlling Stepwise Precision
- Taylor Series-Based Method Comparisons
- Stiff ODE Problems
- How to Choose a Solver?
- Solving Difference Equations (DEs) Using an ODE Solver
- Other Simulation Languages & Software Packages
- Two Population Interaction Dynamics Simulation Model Examples
- Taking Stock & Looking Ahead
- Compartmentalization: A First-Level Formalism for Structural Biomodeling
- Mathematics of Multicompartmental Modeling from the Biophysics
- Nonlinear Multicompartmental Biomodels: Special Properties & Solutions
- Dynamic System Nonlinear Epidemiological Models
- Compartment Sizes, Concentrations & the Concept of Equivalent Distribution Volumes
- General n-Compartment Models with Multiple Inputs & Outputs
- Data-Driven Modeling of Indirect & Time-Delayed Inputs
- Pools & Pool Models: Accommodating Inhomogeneities
- Recap & Looking Ahead
- Output Data (Dynamical Signatures) Reveal Dynamical Structure
- Multicompartmental Model Dimensionality, Modal Analysis & Dynamical Signatures
- Model Simplification: Hidden Modes & Additional Insights
- Biomodel Structure Ambiguities: Model Discrimination, Distinguishability & Input–Output Equivalence
- *Algebra and Geometry of MC Model Distinguishability
- Reducible, Cyclic & Other MC Model Properties
- Tracers, Tracees & Linearizing Perturbation Experiments
- Recap and Looking Ahead
- Kinetic Interaction Models
- Law of Mass Action
- Reaction Dynamics in Open Biosystems
- Enzymes & Enzyme Kinetics
- Enzymes & Introduction to Metabolic and Cellular Regulation
- Extensions: Quasi-Steady State Assumption Theory
- Enzyme-Kinetics Submodels Extrapolated to Other Biomolecular Systems
- Coupled-Enzymatic Reactions & Protein Interaction Network (PIN) Models
- Production, Elimination & Regulation Combined: Modeling Source, Sink & Control Components
- The Stoichiometric Matrix N
- Special Purpose Modeling Packages in Biochemistry, Cell Biology & Related Fields
- Stochastic Dynamic Molecular Biosystem Modeling
- When a Stochastic Model is Preferred
- Stochastic Process Models & the Gillespie Algorithm
- Physiologically Based (PB) Modeling
- Experiment Design Issues in Kinetic Analysis (Caveats)
- Whole-Organism Parameters: Kinetic Indices of Overall Production, Distribution & Elimination
- Noncompartmental (NC) Biomodeling & Analysis (NCA)
- Recap & Looking Ahead
- Stability of NL Biosystem Models
- Stability of Linear System Models
- Local Nonlinear Stability via Linearization
- Bifurcation Analysis
- Oscillations in Biology
- Other Complex Dynamical Behaviors
- Nonlinear Modes
- Recap & Looking Ahead
- Basic Concepts
- Formal Definitions: Constrained Structures, Structural Identifiability & Identifiable Combinations
- Unidentifiable Models
- SI Under Constraints: Interval Identifiability with Some Parameters Known
- SI Analysis of Nonlinear (NL) Biomodels
- What’s Next?
- Sensitivity to Parameter Variations: The Basics
- State Variable Sensitivities to Parameter Variations
- Output Sensitivities to Parameter Variations
- *Output Parameter Sensitivity Matrix & Structural Identifiability
- *Global Parameter Sensitivities
- Recap & Looking Ahead
- Biomodel Parameter Estimation (Identification)
- Residual Errors & Parameter Optimization Criteria
- Parameter Optimization Methods 101: Analytical and Numerical
- Parameter Estimation Quality Assessments
- Other Biomodel Quality Assessments
- Recap and Looking Ahead
- Prospective Simulation Approach to Model Reliability Measures
- Constraint-Simplified Model Quantification
- Model Reparameterization & Quantifying the Identifiable Parameter Combinations
- The Forcing-Function Method
- Multiexponential (ME) Models & Use as Forcing Functions
- Model Fitting & Refitting With Real Data
- Recap and Looking Ahead
- Physiological Control System Modeling
- Neuroendocrine Physiological System Models
- Structural Modeling & Analysis of Biochemical & Cellular Control Systems
- Transient and Steady-State Biomolecular Network Modeling
- Metabolic Control Analysis (MCA)
- Recap and Looking Ahead
- Statistical Criteria for Discriminating Among Alternative Models
- Macroscale and Mesoscale Models for Elucidating Biomechanisms
- Mesoscale Mechanistic Models of Biochemical/Cellular Control Systems
- Candidate Models for p53 Regulation
- Recap and Looking Ahead
- A Formal Model for Experiment Design
- Input–Output Experiment Design from the TF Matrix
- Graphs and Cutset Analysis for Experiment Design
- Algorithms for Optimal Experiment Design
- Sequential Optimal Experiment Design
- Recap and Looking Ahead
- Local and Global Parameter Sensitivities
- Model Reduction Methodology
- Parameter Ranking
- Added Benefits: State Variables to Measure and Parameters to Estimate
- Global Sensitivity Analysis (GSA) Algorithms
- What’s Next?
- Transform Methods
- Laplace Transform Representations and Solutions
- Key Properties of the Laplace Transform (LT) & its Inverse (ILT)
- Short Table of Laplace Transform Pairs
- Laplace Transform Solution of Ordinary Differential Equations (ODEs)
- Vector Spaces (V.S.)
- Linear Equation Solutions
- Measures & Orthogonality
- Matrix Analysis
- Matrix Differential Equations
- Singular Value Decomposition (SVD) & Principal Component Analysis (PCA)
- Inputs & Outputs
- Dynamic Systems, Models & Causality
- Input–Output (Black-Box) Models
- Time-Invariance (TI)
- Continuous Linear System Input–Output Models
- Structured State Variable Models
- Discrete-Time Dynamic System Models
- Composite Input–Output and State Variable Models
- State Transition Matrix for Linear Dynamic Systems
- The Adjoint Dynamic System
- Equivalent Dynamic Systems: Different Realizations of State Variable Models – Nonuniqueness Exposed
- Illustrative Example: A 3-Compartment Dynamic System Model & Several Discretized Versions of It
- Transforming Input–Output Data Models into State Variable Models: Generalized Model Building
- Basic Concepts and Definitions
- Observability and Controllability of Linear State Variable Models
- Linear Time-Varying Models
- Linear Time-Invariant Models
- Output Controllability
- Output Function Controllability
- Controllability and Observability with Constraints
- Positive Controllability
- Relative Controllability (Reachability)
- Conditional Controllability
- Structural Controllability and Observability
- Observability and Identifiability Relationships
- Controllability and Observability of Stochastic Models
- Realizations (Modeling Paradigms)
- The Canonical Decomposition Theorem
- How to Decompose a Model
- Controllability and Observability Tests Using Equivalent Models
- Observable and Controllable Canonical Forms from Arbitrary State Variable Models Using Equivalence Properties
- Additional Predictor-Corrector Algorithms
- Derivation of the Akaike Information Criterion (AIC)
- The Stochastic Fisher Information Matrix (FIM): Definitions & Derivations
“Professor Joe” – as he is called by his students – is a Distinguished Professor of Computer Science and Medicine and Chair of the Computational & Systems Biology Interdepartmental Program at UCLA – an undergraduate research-oriented program he nurtured and honed over several decades. As an active full-time member of the UCLA faculty for nearly half a century, he also developed and led innovative graduate PhD programs, including Computational Systems Biology in Computer Science, and Biosystem Science and Engineering in Biomedical Engineering. He has mentored students from these programs since 1968, as Director of the UCLA Biocybernetics Laboratory, and was awarded the prestigious UCLA Distinguished Teaching Award and Eby Award for Creative Teaching in 2003, and the Lockheed Martin Award for Teaching Excellence in 2004. Professor Joe also is a Fellow of the Biomedical Engineering Society. Visiting professorships included stints at universities in Canada, Italy, Sweden and the UK, and he was a Senior Fulbright-Hays Scholar in Italy in 1979.
Professor Joe has been very active in the publishing world. As an editor, he founded and was Editor-in-Chief of the Modeling Methodology Forum – a department in seven of the American Journals of Physiology – from 1984 through 1991. As a writer, he authored or coauthored both editions of Feedback and Control Systems (Schaum-McGraw-Hill 1967 and 1990), more than 200 research articles, and recently published his opus textbook: Dynamic Systems Biology Modeling and Simulation (Academic Press/Elsevier November 2013 and February 2014).
Much of his research has been based on integrating experimental neuroendocrine and metabolism studies in mammals and fishes with data-driven mathematical modeling methodology – strongly motivated by his experiences in the “wet-lab”. His seminal contributions to modeling theory and practice are in structural identifiability (parameter ambiguity) analysis, driven by experimental encumbrances. He introduced the notions of interval and quasi-identifiability of unidentifiable dynamic system models, and his lab has developed symbolic algorithmic approaches and new internet software (web app COMBOS) for computing identifiable parameter combinations. These are the aggregate parts of otherwise unidentifiable models that can be quantified – with broad application in model reduction (simplification) and experiment design. His long-term contributions to quantitative understanding of thyroid hormone production and metabolism in mammals and fishes have recently been crystallized into web app THYROSIM – for internet-based research and teaching about thyroid hormone dynamics in humans.
Last but not least, Professor Joe is a passionate straight-ahead jazz saxophone player (alto and tenor), an alternate career begun in the 1950s in NYC at Stuyvesant High School – temporarily suspended when he started undergrad school, and resumed again in middle age. He recently added flute to his practice schedule, and he and his band – Acoustically Speaking – can be found occasionally gigging in Los Angeles or Honolulu haunts.
Affiliations and Expertise
Distinguished Professor of Computer Science, Medicine & Biomedical Engineering; Chair, Computational & Systems Biology Interdepartmental Program; UCLA, Los Angeles, CA
Chad Brassil, a colleague in the School of Biological Sciences, does research in the area of theoretical ecology by utilizing mathematics, principally nonlinear differential equations, as a tool for understanding ecology and evolution. A primary theoretical interest of his has been examining the implications of temporal variation for fundamental ecological theory. Current work is incorporating evolutionary dynamics into models of tropical diversity. In addition, he utilizes maximum likelihood techniques to bridge the gap between theoretical and empirical ecology.
Bo Deng has interests in Mathematical Biology which include: the origins and the evolution of DNA codes, electrical neurophysiology and neural communication, foodweb chaos and ecological stability, disease dynamics and epidemic modeling. The main tools which Professor Deng uses in his research activities include: information and communication theory, circuitry, differential equations, qualitative theory of dynamical systems, and applied nonlinear analysis. Through modeling, Professor Deng hopes to use mathematics to gain better understanding on biological processes.
Huijing Du's main research develops computational and mathematical models for studying biological problems in a quantitative manner. With a new 3D hybrid framework which includes the discrete stochastic Subcellular Element Model, continuum reaction-diffusion-advection partial differential equations and a stochastic decision-making model for cell lineage transition, her results demonstrate how a modeling approach coupling biologically relevant scales can provide new insights into the complex biological problems related to intestinal crypt structure, embryonic development and epidermal tissue regeneration.
Yu Jin has research interest in applied mathematics with the main focus on dynamical systems and mathematical biology. Her research work is the conjoining of nonlinear dynamics and biology. This includes the establishment of appropriate mathematical models (mainly ordinary/partial/functional differential equations and difference equations) for phenomena in spatial ecology, population dynamics, and epidemiology, as well as mathematical and computational analysis for models. Her current research is mainly focused on spatial population dynamics, especially on population spread and persistence in streams or rivers.
Emeritus Professor Glenn Ledder works in mathematical modeling for life sciences and physical sciences. He is currently working with an international team of plant scientists (including Sabrina Russo of the UNL School of Biological Sciences) to develop a tree physiology model that connects water flow and photosynthesis to soil properties and environmental conditions. The long-term goal of the project is to incorporate the tree physiology model into a dynamic energy budget model that can track the growth of a tree over time in various settings. Such a model would be useful in predicting responses of tree communities to climate change. He is also interested in models that combine population dynamics of predator-prey systems with optimal foraging.
Richard Rebarber has research interests in Mathematical Ecology and in Distributed Parameter Control Theory. His research in Ecology is in population dynamics, including: the effect of parameter uncertainty (such as modeling error) and perturbations (such as global warming) on long-term and transient population growth; the application of robust control theory methods to population analysis and management; stability properties of nonlinear models; and analysis of models with stochasticity.
Emeritus Professor Thomas Shores is interested in the numerical solutions of ordinary and partial differential equations, especially those singular and nonlinear equations which are amenable to sinc methods. He is also interested in numerical methods for problems in inverse theory, especially parameter identification problems in ordinary and partial differential equations. More generally, he has research interests in issues dealing with scientific computation. Finally, he is interested in the mathematical modeling of populations and porous medium problems.
Brigitte Tenhumberg uses stochastic, discrete time models tailored to specific biological systems to advance the understanding of ecological processes. The models she uses include stochastic dynamic programming, matrix models, and agent based simulation models. One area of research emphasis is optimal decision making of animals (foraging or life history decisions) or humans (management of wildlife populations). Recent work addresses topics in invasion ecology, in particular understanding ecological mechanisms promoting ecosystem resistance to invasions.
Drew Tyre is a colleague in the School of Natural Resources. His work focuses on using statistical and mechanistic models of single species population dynamics to help managers make better informed decisions about both game and non-game species. He is interested in applying robust control methods to structured population models, optimization methods to conservation and utilization decisions, and Bayesian Hierarchical models to survey and harvest data.
Emeritus Professor Steve Dunbar has research interests in nonlinear differential equations, and applied dynamical systems, particularly those which arise in mathematical biology. In conjunction with his work with differential equation models and systems of mathematical biology, he is also interested in stochastic processes, the numerical and computer-aided solution of differential equations, and mathematical modeling. He also is interested in issues of mathematical education at the high school and collegiate level. He is the Director of the American Mathematics Competitions program of the Mathematical Association of America which sponsors middle school and high school mathematical competitions leading to the selection and training of the USA delegation to the annual International Mathematical Olympiad. In addition, he has interests in documenting trends in collegiate mathematics course enrollments and using mathematical software to teach and learn mathematics.
Emeritus Professor Wendy Hines does research in dynamical systems. She is interested in the general theory and also applications to delay equations and partial differential equations. Currently she is working on a reaction-diffusion equation with nonlocal diffusion which models gene propagation through a population. This is a very interesting problem, as very little has been done on it and it defies the application of standard reaction-diffusion methods.
Emeritus Professor David Logan works in the areas of applied mathematics and ecological modeling. His interests include ordinary and partial differential equations, difference equations, and stochastic processes. His current research in mathematical ecology includes work on nutrient cycling, physiologically-structured population dynamics, the effects of global climate change on ecosystems and food webs, and insect eco-physiology.
What is Selective Media
Selective media refer to a type of growth media that allows the growth of only selected microorganisms in the medium. For example, if a particular microorganism is resistant to a particular antibiotic such as tetracycline or ampicillin, that antibiotic can be added to the medium to prohibit the growth of other microorganisms. Selective growth media also ensure the survival and proliferation of microorganisms with certain properties. The gene that confers the ability to grow in the selective medium is known as the marker. Eukaryotic cells can also be grown in selective media; the selective media for eukaryotic cells commonly contain neomycin. A microorganism that grows on a selective agar medium is shown in figure 1.
What is Continuous Variation?
In continuous variation, a particular characteristic in a population shows a series of successive changes from one extreme to the other without a break. Different characteristics of a population may show continuous variation. Such characteristics are produced by the combined effect of polygenes and environmental factors. Taking a population of cows as an example, milk yield is influenced not only by genetic factors but also by environmental factors. Even if the genetic factors for a high milk yield are present, their effect can be suppressed by environmental factors such as the quality of pasture, an inadequate diet, extreme weather conditions and disease.
The frequency distribution of a characteristic that shows continuous variation is a normal distribution curve with a typical bell shape. In such a curve, the mean, mode and median coincide. Human height, weight, hand span and shoe size are examples of continuous variation.
Figure 01: Shape of the Distribution of a Continuous Variation
As shown in the above figure, continuous variation fluctuates around the average (mean) of the species, producing a smooth bell-shaped curve within the population. Continuous variations are common, and they do not disturb the genetic system. Moreover, these variations are caused by polygenic inheritance and are often affected by environmental influences.
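The bell shape arises because a polygenic trait is the sum of many small, roughly independent allele effects, so the central limit theorem applies. A minimal simulation sketch of this idea; the locus number, effect size and population size below are arbitrary choices, not biological estimates:

```python
import random
import statistics

def polygenic_trait(n_loci=100, effect=1.0, rng=random):
    """Trait value as the sum of small additive allele effects: each locus
    contributes 0, 1 or 2 copies of a '+' allele (allele frequency 0.5)."""
    return effect * sum(rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_loci))

rng = random.Random(42)  # fixed seed so the simulation is repeatable
population = [polygenic_trait(rng=rng) for _ in range(5000)]
mean_trait = statistics.mean(population)   # expected near 100 (1 copy per locus on average)
spread = statistics.stdev(population)      # expected near sqrt(50), about 7.1
```

A histogram of `population` approximates the bell curve in Figure 01: most individuals cluster near the mean, with rare individuals at either extreme.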
Results and discussion
The conceptual framework of MethylPurify is shown in Figure 1. Under the assumption that tumor tissues often contain two major cell components, that is, tumor and normal, MethylPurify takes only WGBS or RRBS data from a tumor tissue as input and tries to infer the unknown fraction of normal cells within. After removing duplicated reads and mapping them with BS-map, MethylPurify divides the reference genome into small 300 bp bins and assigns the mapped reads to each bin. The true methylation levels in most bins are similar between the two components and thus not informative for tumor purity inference or differential methylation analysis. Instead, MethylPurify aims to find informative bins that are differentially methylated between normal and tumor cells, and uses them to infer the tumor purity and the methylation level of each component. It relies on the following characteristics of DNA methylome data: (1) all CpG cytosines within a short genomic interval (approximately 300 bp) in a pure cell population share similar methylation levels, which are either mostly methylated or mostly unmethylated; (2) the numbers of bisulfite reads mapped to each genomic interval from tumor and normal cells are in accordance with their relative compositions in the mixture, subject to standard sampling noise.
Overview of MethylPurify. (a) A differentially methylated region (DMR) between tumor and normal cells. Solid and hollow red circles represent methylated and unmethylated cytosines, respectively. (b) Short reads from two cell populations after bisulfite treatment and sonication. (c) A library of bisulfite reads in a mixture of two cell populations. (d) EM algorithm iteratively estimates three parameters: the minor composition (α1) and the methylation level of each population (m1, m2) in M step, and assigns reads to each population in E step. (e) Among all 300 bp bins, the parameters estimated from informative bins converge on a final mixing ratio estimate. (f) Top, density plot of predicted minor component from selected informative bins. Bottom, separated methylation level of tumor and normal cells based on the predicted mixing ratio, and DMRs are detected as consecutive differentially methylated bins (DMBs).
MethylPurify uses the following mixture model to estimate the two components in the tumor methylome data. Given a mixture of bisulfite reads from two components, the relative compositions of the minor and major components can be represented as α1 and 1 - α1, and the methylation levels of the two components within each 300 bp bin as m1 and m2, respectively. Given initial parameter values of α1, m1, and m2, each read in a bin can be assigned to its most likely component; given the read assignment in a bin, the parameter values of α1, m1, and m2 can be re-estimated to maximize the probability of the observed read assignment. For each 300 bp bin across the genome, MethylPurify uses expectation maximization (EM) to iteratively estimate parameters and assign reads until convergence (see Methods section for details).
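The alternation described above can be sketched for a single bin as follows. This is an illustrative re-implementation, not MethylPurify's code: each read is hard-assigned to its most likely component (E-step), and the three parameters are re-estimated from that assignment (M-step). For brevity, the constraint that α1 stay the minor component (≤0.5) is omitted.

```python
import math

def em_bin(reads, alpha1=0.3, m1=0.1, m2=0.9, n_iter=100, tol=1e-6):
    """Classification-EM sketch for one 300 bp bin.

    reads: list of (n_cpg, n_methylated) tuples, one per bisulfite read.
    Returns converged (alpha1, m1, m2).
    """
    eps = 1e-9

    def loglik(k, n, m):
        # Binomial log-likelihood of k methylated out of n CpGs at level m.
        m = min(max(m, eps), 1 - eps)
        return k * math.log(m) + (n - k) * math.log(1 - m)

    for _ in range(n_iter):
        # E-step: assign each read to its most likely component.
        comp1, comp2 = [], []
        for n, k in reads:
            ll1 = math.log(max(alpha1, eps)) + loglik(k, n, m1)
            ll2 = math.log(max(1 - alpha1, eps)) + loglik(k, n, m2)
            (comp1 if ll1 > ll2 else comp2).append((n, k))

        # M-step: re-estimate parameters from the assignment.
        new_alpha1 = len(comp1) / len(reads)

        def mlevel(group, default):
            tot = sum(n for n, _ in group)
            return sum(k for _, k in group) / tot if tot else default

        new_m1, new_m2 = mlevel(comp1, m1), mlevel(comp2, m2)
        converged = (abs(new_alpha1 - alpha1) < tol
                     and abs(new_m1 - m1) < tol and abs(new_m2 - m2) < tol)
        alpha1, m1, m2 = new_alpha1, new_m1, new_m2
        if converged:
            break
    return alpha1, m1, m2
```

On reads simulated from a 30%/70% mixture of a fully unmethylated and a fully methylated component, this sketch recovers α1 = 0.3 within a few iterations.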
Due to sampling noise and other confounding biases, α1 estimates from individual bins will be distributed around the true value. To reach a more reliable mixing ratio from all α1 estimates, MethylPurify uses the following bootstrapping approach to prioritize informative bins. First, it selects only bins with over 10 CpGs and over 10-fold read coverage (termed qualifying bins hereafter), then resamples the reads in each bin with replacement, keeping the actual read count, 50 times to obtain 50 sets of EM-converged α1, m1, and m2 parameters. To avoid complications from copy number aberrations (CNA) in cancer at this step, MethylPurify filters out bins in regions with frequent copy number alterations, as well as their 1,000 bp flanking regions, and selects only one qualifying bin within each CpG island. MethylPurify then finds the 500 bins with the smallest parameter variance across the 50 samplings and uses the mode of their α1 estimates as the α1 for the whole tumor sample (Figure 1e,f). With the sample-level α1 fixed, a few EM iterations in each bin quickly converge on the m1 and m2 estimates and the read assignment across the genome. To avoid local maxima of EM, MethylPurify starts from two distinct initial values of m1 and m2 in each bin, representing the α1 component being hyper- or hypo-methylated, and the convergence point with the higher likelihood is selected as the final prediction (see Methods section for details).
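The bootstrap ranking of bins can be sketched generically; `bootstrap_stability` and the toy `meth_fraction` estimator below are illustrative stand-ins of our own (a real run would refit all three EM parameters on each resample and rank bins by the spread of m1).

```python
import random
import statistics

def bootstrap_stability(reads, estimate, n_boot=50, seed=0):
    """Resample a bin's reads with replacement n_boot times, keeping the
    actual read count, and return the standard deviation of a parameter
    estimate across resamples; low spread marks an informative bin."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        sample = [reads[rng.randrange(len(reads))] for _ in range(len(reads))]
        values.append(estimate(sample))
    return statistics.stdev(values)

# Toy estimator: overall methylated fraction of the resampled reads,
# where each read is a (total CpGs, methylated CpGs) pair.
def meth_fraction(reads):
    return sum(k for _, k in reads) / sum(n for n, _ in reads)
```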
MethylPurify reports the mixing ratio of the two components (α1: 1 - α1) in the whole sample and the methylation level of each component (m1 and m2) in each qualifying bin across the genome. It also detects differentially methylated regions (DMRs) as consecutive differentially methylated bins (DMBs).
Inference of mixing ratio from simulated mixture of bisulfite reads from tumor and normal cell lines
To validate MethylPurify's estimation of the mixing ratio, we used a simulated mixture of whole genome bisulfite sequencing data from two separate breast cell lines. The HCC1954 cell line (hereafter referred to as HCC) is derived from an estrogen receptor (ER)/progesterone receptor (PR) negative and ERBB2 positive breast tumor, and the human mammary epithelial cell line (HMEC) is immortalized from normal breast epithelial cells. Bisulfite sequencing data for the two cell lines have slightly different read lengths (approximately 70 to 100 bp) and sequencing coverage (27-fold and 20-fold, respectively). We randomly sampled bisulfite reads from the two cell lines at 20-fold total coverage, with mixing ratios varying from 0:1 (all HMEC) to 1:0 (all HCC) in steps of 0.05.
We first examined how the parameter estimation varies with changing inputs. At different mixing ratios, the average variance (over all qualifying bins, by bootstrapping) of the minor component percentage α1 is very small and stable (Figure 2a). The variance of α1 initially increases with the mean of α1, but is suppressed as α1 approaches 0.5, since α1 is designated as the minor component and thus always ≤0.5 in our model. In contrast, the estimated methylation level of the minor component, m1, is the most variable. This is reasonable: at low α1 (close to 0), the minor component has very little read coverage; at high α1 (close to 0.5), it is sometimes difficult to determine which component is minor, so m1 can fluctuate depending on whether MethylPurify assigns the methylated or unmethylated reads to the minor component.
Parameter estimation and properties of informative bins. (a) Averaged standard deviations of the three free parameters at different mixing ratios, by bootstrap sampling of the HCC (breast cancer) and HMEC (normal mammary epithelial) cell lines, for all qualifying bins. (b) Predicted mixing ratios at different standard deviation cutoffs on the minor composition. (c, d) Properties of informative bins compared with the genome-wide background. (c) CpG counts in informative bins versus the remaining bins with over 10-fold read coverage. (d) Distribution of predicted methylation levels of informative bins in each cell line.
Since m1 is the most variable of the three parameters and dominates the sum of the variances, MethylPurify subsequently uses only the standard deviation (stdev) of m1 from bootstrapping to rank all qualifying bins. Indeed, the informative bins, defined as qualifying bins with m1 stdev <0.1 (after filtering CNA regions and selecting the bin with the smallest stdev from each CpG island), in general give very stable α1 estimates at different mixing ratios (Figure 2b). A closer examination of the informative bins found that they often contain significantly more CpGs (Figure 2c) and show a strong dichotomy of reads being either mostly methylated (1) or mostly unmethylated (0) (Figure 2d). In the remainder of the text, the top 500 informative bins with the smallest parameter variance by bootstrap were used to vote for the mixing ratio of the whole sample.
We then evaluated whether MethylPurify could correctly infer the mixing ratio of the two components. When given a pure cell line without mixing, MethylPurify correctly reported a warning for an insufficient number (66 and 322 for the HMEC and HCC cell lines, respectively) of informative bins. Further examination of such bins in the HCC cell line suggested that they have significant overlap with ASM regions (P = 0.0086 by Fisher’s exact test). For all samples with real mixing, MethylPurify identified a sufficient number of informative bins across the genome (see Additional file 1: Figure S1 as an example), and their respective α1 estimates are often centered around the true α1 (Figure 3a). Over 20 repeated simulations at each mixing ratio, MethylPurify's predicted α1 tightly surrounds the true α1, with two interesting twists (Figure 3b,c). First, since MethylPurify dictates that α1 represent the minor component, α1 estimates tend to be slightly lower when the mixing is close to 0.5:0.5. Second, MethylPurify tends to slightly underestimate the cancer component. This might be because, even as a cell line, the cancer HCC is more heterogeneous than the normal HMEC, as supported by the larger number of informative bins in HCC than in HMEC alone, causing the EM algorithm to assign a small portion of the HCC reads to the HMEC component. This implies that in tumor samples, MethylPurify might also slightly underestimate the tumor percentage due to tumor heterogeneity.
Prediction performance on simulated data. (a) Histograms of the predicted mixing ratio from selected informative bins when bisulfite reads of the HCC and HMEC cell lines are mixed at different ratios. Dotted blue lines highlight the true minor components. The black line is the density of the predicted mixing ratio. (b, c) Predicted minor compositions (α1) at different mixing ratios where the composition of the tumor cell line is above 50% (b) or below 50% (c). Error bars represent standard deviations derived from 20 mixing simulations.
Detection of differentially methylated bins in the simulated mixture
We next evaluated whether MethylPurify could correctly predict the methylation level of each component in the mixture and identify the differentially methylated regions between the two components. At an HCC to HMEC mixing ratio of 0.7:0.3, we analyzed all 90,748 qualifying bins (300 bp, with over 10 CpGs and over 10-fold coverage) to evaluate the performance. Under the gold standard of a methylation difference >0.5 between the two pure cell lines, we found that MethylPurify predicts differentially methylated bins with 96.5% sensitivity and 88.0% specificity (Figure 4a). Across coverage levels from 10-fold to 40-fold, the performance of MethylPurify decreases only slightly with decreasing coverage, although the number of qualifying bins with sufficient coverage decreases (Additional file 2: Figure S2).
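Sensitivity and specificity here follow the usual definitions over the set of qualifying bins; a minimal helper of our own (not part of MethylPurify) might look like:

```python
def sensitivity_specificity(predicted, truth, universe):
    """Compare a predicted DMB set against a gold-standard DMB set over a
    universe of qualifying bins; bins are arbitrary hashable identifiers."""
    tp = len(predicted & truth)             # called and truly differential
    fn = len(truth - predicted)             # truly differential but missed
    fp = len(predicted - truth)             # called but not differential
    tn = len(universe - predicted - truth)  # correctly left uncalled
    return tp / (tp + fn), tn / (tn + fp)
```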
Predicted DMBs (bins predicted by MethylPurify to have a methylation difference over 0.5 from the mixture reads) are compared with true DMBs (inferred from BS-seq reads in the two separate cell lines) in the HCC and HMEC cell lines. (a) Overlap between predicted (from mixture, red) and true (blue) DMBs in the mixture of HCC1954 (70%) and HMEC (30%). (b, c) Correlations of predicted and true methylation levels in the normal (b) and tumor (c) cell lines for qualifying bins. (d) An example of a differentially methylated region between the HCC and HMEC cell lines. Each point represents one qualifying bin of length 300 bp. HMEC.t and HCC.t are the true methylation profiles in this region, while HMEC.p and HCC.p are the methylation profiles predicted from the mixture reads. (e) Correlation of predicted and true methylation differences. (f) DMB prediction sensitivity and correlation between predicted and true differential methylation at different mixing ratios of the two cell lines.
Detailed examination revealed that in the true positive predicted regions, HMEC is often fully unmethylated (0) while HCC is fully methylated (1). This is consistent with many studies showing that cancer samples often have global hypomethylation and CpG promoter hypermethylation. In contrast, in the false positive bins, the methylation levels in the individual cell lines are often at intermediate levels (Additional file 3: Figure S3). These might represent the methylation variability regions previously reported in tumor DNA methylation studies, and might cause reads to be assigned to the wrong component. For example, suppose a tumor sample is 1/3 normal and 2/3 cancer, and in one region the methylation levels of the normal and cancer components are 0% and 50%, respectively. Assuming MethylPurify correctly estimated the minor component α1 to be 1/3, it would naturally assign the 1/3 methylated reads to the minor normal component, and the 2/3 unmethylated reads to the major cancer component. In this case, although MethylPurify incorrectly called the cancer component as hypomethylated, it nonetheless correctly identified this region as differentially methylated, whereas a standard cancer/normal differential call might miss it.
To reduce the above effect of tumor heterogeneity, we removed bins that show strong read methylation variability (var >0.1) in HCC (Additional file 4: Figure S4). We then examined whether the DNA methylation levels of the two components could be correctly estimated in the remaining qualifying bins. The correlation between the true and predicted methylation level is 0.89 for the minor normal component and 0.98 for the major tumor component (Figure 4b,c). Figure 4d shows an example of the true and predicted methylation levels from each cell line in the mixture. The methylation difference predicted from the cell line mixture is highly correlated with the methylation difference estimated by directly comparing the two individual cell lines (Figure 4e). Further examination of bins called in the wrong direction found that many have lower sequence coverage. This suggests that the mixture sampling might introduce biases; that is, the mixing at specific bins could deviate from the genome-wide ratio of 0.7:0.3. In fact, if we examine only bins with >15-fold coverage, the correlation of the methylation difference estimated from the individual cell lines versus the mixture increases from 0.71 to 0.75.
We then tested the performance of MethylPurify when the normal (HMEC) component of the mixture varies from 0.1 to 0.5. When the mixing ratio is close to 0.5:0.5, determining which component is hypo- or hyper-methylated becomes an unidentifiable problem, so the correlation between the true and predicted methylation difference in the two components drops. Nonetheless, our ability to correctly call regions of differential methylation increases with the minor component percentage, from 89.4% at 0.1:0.9 mixing to 98.4% at 0.5:0.5 mixing, because there is enough coverage on each component to confidently identify bins with discordant methylation reads (Figure 4f).
Application of MethylPurify to lung cancer tissues
With the success of MethylPurify on the cell line mixing simulations, we next tested it on real tumors. We conducted reduced representation bisulfite sequencing on five primary lung adenocarcinoma samples as well as their respective adjacent normal tissues, and obtained approximately 15 to 40 million 90 bp reads per sample. MethylPurify processed each tumor sample within 1 h on a single core, and estimated the normal component in the tumors to be between 18% and 33% (Figure 5a). In these samples, the true normal percentage of each tumor is unknown. In addition, methylation differences have been reported to well precede pathological differences and have been used to predict cancer risk. Therefore, we instead focused on evaluating the differentially methylated regions called by MethylPurify from the tumor samples alone, using the tumor to normal comparison as the gold standard. In this standard, a 300 bp bin is defined as differentially methylated if the average methylation difference between cancer and normal in the region exceeds 0.5. We also tried other cutoffs for calling differential methylation and obtained similar results (data not shown).
Application of MethylPurify to the lung adenocarcinoma samples. (a) Distribution of informative bins and the calculated minor components of five primary lung adenocarcinoma samples. (b) The numbers of DMBs inferred from normal-tumor comparison (blue, true DMB), predicted DMB by MethylPurify from only tumor tissue (red) and DMB inferred from TCGA (green) for each sample. (c-g) Violin plots show the distributions of CpG counts (c), read counts (d), tumor/normal methylation differences (e), tumor methylation levels (f), and numbers of lung adenocarcinoma samples with copy number alteration in TCGA (g) for false negative (FN), true positive (TP), false positive (FP), and true negative (TN) bins.
For each sample, we divided the genome into 300 bp bins and considered only qualifying bins with >10 CpGs and >10-fold read coverage. Because sequencing depth differs between samples, the number of qualifying bins varies from sample to sample. We then examined The Cancer Genome Atlas (TCGA) lung adenocarcinoma methylation data and used the differentially methylated regions in TCGA that overlap with the qualifying bins in each sample to determine the number of differentially methylated bins to call in each sample. Differentially methylated bins called either from the tumor samples alone or from the tumor to normal comparisons are each ranked by their absolute differential methylation levels. Using the tumor to normal comparison as the gold standard, MethylPurify calls from the tumor samples alone achieve a sensitivity of over 57% and a specificity of over 91% in the five samples tested (Figure 5b).
We then examined the false negative and false positive predictions MethylPurify made on the tumor samples alone. Using sample 137B as an example, since it has the best sequencing coverage, we found that regions with false negative predictions often have lower CpG counts (P <2.2e-16, t-test, Figure 5c), lower coverage (P = 0.0037, Figure 5d), and smaller methylation differences (P <2.2e-16, Figure 5e) between tumor and normal. In contrast, the false positive bins are similar to true positive ones in CpG count (P = 0.73) and read coverage (P = 0.83). Interestingly, their absolute DNA methylation levels in the tumor samples are more often intermediate rather than showing the dichotomy of 0 for unmethylated or 1 for methylated (Figure 5f), and they often contain many reads with discordant methylation levels. These observations suggest that such regions are indeed differentially methylated, but were not detected in the normal/cancer comparison because tumor heterogeneity reduced the observed normal to tumor methylation difference (Figure 5e). Indeed, among the false positive bins MethylPurify called from tumor alone, 25% to 32% have differential methylation support in the TCGA lung adenocarcinoma data (Figure 5b). This suggests that these 'false positives' should have been correct calls, but were missed by the tumor/normal comparison, potentially due to tumor heterogeneity. This percentage is similar to the 24% to 29% of true negative calls with TCGA support, implying that differential methylation called by MethylPurify from the tumor samples alone is as good as the tumor/normal comparison.
BMPRII-LF is unique among the TGF-β superfamily receptors due to a C-terminal extension of 512 amino acids in its cytoplasmic domain. The functional significance of this extension was supported by demonstration of its binding to diverse cellular factors, including the endocytic protein EPS15R (Hartung et al., 2006), the dynein light chain Tctex-1 (Machado et al., 2003), the kinases cGKI (Schwappacher et al., 2009) and LIMK (Lee-Hoeflich et al., 2004), and Trb3, a regulator of Smurf1 stability and Smad-dependent signaling output (Chan et al., 2007). In different cellular contexts, such interactions were proposed to influence BMPRII Smad-dependent and Smad-independent signaling, BMPRII trafficking, and BMP-induced differentiation of distinct cell types. Moreover, the lethality of mice homozygous for BMPRII-SF but lacking BMPRII-LF (Leyton et al., 2013), the presence of disease-causing mutations within this C-terminal extension (exon 12) in PAH (Thomson et al., 2000; Machado et al., 2001), and the proposed role for differences in the expression ratio between BMPRII-SF and BMPRII-LF in determining the penetrance of PAH (Cogan et al., 2012) further demonstrate the importance of this molecular domain. However, the role(s) of this C-terminal extension (either at the level of coding sequence or protein) in the synthesis, degradation, and trafficking of BMPRII had not been addressed. Here we show that molecular determinants within the mRNA sequence and/or encoded by exon 12 of BMPRII regulate its synthesis and clathrin-mediated internalization. This differential regulation of alternatively spliced forms of BMPRII has direct implications for the overall and plasma membrane–localized steady-state levels of the BMPRII receptor forms, the kinetics of their degradation, and the intensity of their ability to activate the Smad1/5/8 pathway in response to ligand.
Our studies on the steady-state expression levels of BMPRII-SF, BMPRII-LF, and their mutants demonstrate that the C-terminal extension unique to BMPRII-LF has two elements that reduce its expression relative to BMPRII-SF. The major effects are contributed by the very C-terminal end of BMPRII-LF, accompanied by a contribution from the region between TC6 and TC7 (Figures 1 and 2). These differences are detected also at the level of the cell surface expression of the receptors, for which the contribution of the region between TC6 and TC7 is even more accentuated (Figure 4). To delineate further the mechanisms involved, we used an experimental setup based on metabolic pulse labeling of exogenously expressed isoforms and mutants of BMPRII (at equimolar levels) under the same promoter and with the same 5’-untranslated region (UTR) and 3’-UTR (Figure 2). This allowed us to identify the regulation of the synthesis of BMPRII isoforms at the level of translation. We show that the most-3’ region of exon 12 (99 nucleotides, numbers 4181–4279, encoding 32 amino acids and a stop codon), which is unique to BMPRII-LF and is predicted to fold into a stem-loop–based secondary structure (Figure 3B), attenuates the expression of BMPRII (Figures 2 and 3) on a translational level (Figure 2). Multiple molecular mechanisms have been proposed to regulate protein translation, including adaptation to the tRNA pool (codon usage), charge of amino acids that are incorporated in the polypeptide chain, mRNA folding energy, and different activities of RNA-binding proteins (Kozak, 1986; Wells, 2006; Ingolia et al., 2011; Tuller et al., 2011; Pop et al., 2014).
Here we show that the reduced expression of BMPRII-LF and BMPRII-SFM (the BMPRII-SF mutant extended by addition of 99 coding nucleotides, numbers 4181–4279, from the 3’ end of BMPRII-LF) correlates with the presence of an RNA sequence with structure-forming tendencies, which can also attenuate the expression of unrelated proteins such as GFP (Figure 3, E and F). Of note, such secondary RNA structures are the basis of recognition by numerous RNA-binding proteins (Draper, 1995). In addition, when translation is carried out in endoplasmic reticulum-tethered polysomes, negative regulation of elongation would be expected to result in an overall decrease in the level of the protein being synthesized (e.g., BMPRII-LF). Within the TGF-β superfamily of receptors and ligands, TGF-β1 and TGF-β3 have been suggested to be regulated at the level of translation (Arrick et al., 1991; Fraser et al., 2002). To our knowledge, the present study is the first to report on such regulation of a receptor from this superfamily. Note that the differences in expression of BMPRII-SF and BMPRII-LF are greater than those between BMPRII-SF and BMPRII-SFM, attesting to the additional regulatory element(s) (e.g., the endocytosis signal that enhances BMPRII-LF degradation; Figure 8).
Numerous studies on the endocytosis of receptors of the TGF-β superfamily suggested CME as the major internalization pathway; a potential contribution by caveolar endocytosis has been contentious (Ehrlich et al., 2001; Yao et al., 2002; Di Guglielmo et al., 2003; Mitchell et al., 2004; Hartung et al., 2006; Chen, 2009; Hirschhorn et al., 2012; Shapira et al., 2012). Moreover, conflicting results were obtained when addressing the roles for endocytosis of the receptors in the regulation of activation of Smad signaling pathways (Hayes et al., 2002; Penheiter et al., 2002; Di Guglielmo et al., 2003; Hartung et al., 2006; Chen et al., 2009; Chen, 2009; Hirschhorn et al., 2012; Kim et al., 2012; Shapira et al., 2012, 2014; Umasankar et al., 2012). Such conflicts may reflect the reliance on treatments (chemical or genetic) that alter/inhibit altogether CME and/or caveolar endocytosis, since such general treatments affect not only the endocytosis of the respective receptors, but also the endocytosis, distribution, and trafficking of numerous cellular factors. In the present study, we identified CME as the major endocytosis pathway of BMPRII and reported marked differences in the endocytic potential of the alternatively spliced forms of BMPRII (i.e., lack of endocytosis of BMPRII-SF; Figure 5). Moreover, we identified a previously unknown CME-targeting signal, of the dileucine class of endocytic motifs, localized at the C-terminus of BMPRII-LF (Figure 6). On the basis of this finding, we generated an endocytosis-defective BMPRII-LF mutant (BMPRII-LF-AA) and used it, together with the naturally alternatively spliced forms of BMPRII, to investigate the relationship between BMPRII endocytosis, expression level, degradation, and signaling (Figures 6–9).
Note that the different endocytosis rates of BMPRII-LF and BMPRII-LF-AA affect their cell surface expression levels, as measured by the proteinase K digestion assay (Figure 7). The approximately twofold difference between their surface expression levels correlates with a similarly higher degradation rate for the endocytosis-capable BMPRII-LF (Figure 8). The slower degradation of BMPRII-LF-AA appears to be due to the fact that it does not undergo endocytosis, since it is similar to that of BMPRII-SF, which also lacks the endocytosis-targeting motif (Figure 8). This conclusion is in line with the very similar decrease in the cell surface expression level between TC6 (which lacks the endocytosis motif) and TC7 (Figure 4). Moreover, in contrast to the endocytosis-defective BMPRII-SF and BMPRII-LF-AA, the degradation of BMPRII-LF is sensitive to both proteasomal and lysosomal inhibitors, in line with its significantly faster endocytosis (Figure 8G). These results are in line with a report that chloroquine increases cell surface BMPRII-LF levels and restores BMP9 signaling in endothelial cells harboring PAH-related BMPRII mutations (Dunmore et al., 2013). The notion of a positive correlation between the cell surface expression levels of BMPRII and the activation intensity of pSmad1/5/8 by BMP is also supported by the results depicted in Figure 9A, where the higher surface expression of BMPRII-SF correlated with increased levels of pSmad1/5/8. Because this BMPRII variant is hardly endocytosed, these findings may imply that endocytosis of BMPRII is dispensable for Smad1/5/8 activation. This notion is validated by the insensitivity of endogenous Smad1/5/8 activation by BMP2 to the CME inhibitor PitStop (Figure 9, B–D).
Taken together, the data in the present study support the notion that the expression levels and plasma membrane levels of BMPRII are determined by two molecular processes—translational regulation of protein synthesis (which provides the major contribution) and endocytosis/degradation (mild modulatory effect). Both mechanisms enhance the expression of BMPRII-SF relative to BMPRII-LF at the cell surface (where the receptors are exposed to stimulation by exogenous ligands), resulting in activation of the Smad1/5/8 pathway at elevated intensities. Of interest, BMPRII-SF was reported to be incapable of associating with and activating at least a subset of non-Smad BMP signals (Foletta et al., 2003; Lee-Hoeflich et al., 2004), implying that the alternative splicing of BMPRII may be an important regulator of the balance of activation of canonical versus noncanonical signals by BMPs.
There are two different types of DSC: heat-flux DSC, which measures the difference in heat flux between the sample and a reference, and power differential DSC, which measures the difference in power supplied to the sample and a reference.
Heat-flux DSC
With heat-flux DSC, the changes in heat flow are calculated by integrating the curve of ΔT versus the reference temperature. For this kind of experiment, a sample and a reference crucible are placed on a sample holder with integrated temperature sensors for measuring the crucible temperatures. This arrangement is located in a temperature-controlled oven. In contrast to this classic design, the distinctive attribute of MC-DSC is the vertical configuration of planar temperature sensors surrounding a planar heater. This arrangement allows a very compact, lightweight and low-heat-capacitance structure with the full functionality of a DSC oven.
Power differential DSC
In this kind of setup, also known as power-compensating DSC, the sample and reference crucibles are placed in thermally insulated furnaces rather than next to each other in the same furnace, as in heat-flux DSC experiments. The temperature of both chambers is then controlled so that the same temperature is always present on both sides. The electrical power required to attain and maintain this state is recorded, instead of the temperature difference between the two crucibles.
The basic principle underlying this technique is that when the sample undergoes a physical transformation such as phase transitions, more or less heat will need to flow to it than the reference to maintain both at the same temperature. Whether less or more heat must flow to the sample depends on whether the process is exothermic or endothermic. For example, as a solid sample melts to a liquid, it will require more heat flowing to the sample to increase its temperature at the same rate as the reference. This is due to the absorption of heat by the sample as it undergoes the endothermic phase transition from solid to liquid. Likewise, as the sample undergoes exothermic processes (such as crystallization) less heat is required to raise the sample temperature. By observing the difference in heat flow between the sample and reference, differential scanning calorimeters are able to measure the amount of heat absorbed or released during such transitions. DSC may also be used to observe more subtle physical changes, such as glass transitions. It is widely used in industrial settings as a quality control instrument due to its applicability in evaluating sample purity and for studying polymer curing.   
An alternative technique, which shares much in common with DSC, is differential thermal analysis (DTA). In this technique it is the heat flow to the sample and reference that remains the same rather than the temperature. When the sample and reference are heated identically, phase changes and other thermal processes cause a difference in temperature between the sample and reference. Both DSC and DTA provide similar information. DSC measures the energy required to keep both the reference and the sample at the same temperature whereas DTA measures the difference in temperature between the sample and the reference when the same amount of energy has been introduced into both.
The result of a DSC experiment is a curve of heat flux versus temperature or versus time. There are two different conventions: exothermic reactions in the sample are shown with either a positive or a negative peak, depending on the kind of technology used in the experiment. This curve can be used to calculate enthalpies of transitions by integrating the peak corresponding to a given transition: ΔH = KA, where ΔH is the enthalpy of transition, K is the calorimetric constant (which varies from instrument to instrument and is determined by calibration against a well-characterized standard), and A is the area under the peak.
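Numerically, the peak integration amounts to summing the heat-flow signal above a baseline. The sketch below uses the trapezoidal rule with a straight baseline drawn between the peak's endpoints; the data and the calorimetric constant are illustrative, not from any particular instrument.

```python
def peak_enthalpy(temps, heat_flow, k_cal=1.0):
    """Estimate a transition enthalpy from a DSC peak: integrate heat flow
    versus temperature above a linear baseline drawn between the first and
    last points, then scale by the calorimetric constant k_cal."""
    t0, t1 = temps[0], temps[-1]
    y0, y1 = heat_flow[0], heat_flow[-1]

    def baseline(t):
        # Straight line between the two endpoints of the peak region.
        return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

    area = 0.0
    for i in range(len(temps) - 1):
        # Trapezoidal rule on the baseline-subtracted signal.
        f_a = heat_flow[i] - baseline(temps[i])
        f_b = heat_flow[i + 1] - baseline(temps[i + 1])
        area += 0.5 * (f_a + f_b) * (temps[i + 1] - temps[i])
    return k_cal * area
```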
Differential scanning calorimetry can be used to measure a number of characteristic properties of a sample. Using this technique it is possible to observe fusion and crystallization events as well as glass transition temperatures Tg. DSC can also be used to study oxidation, as well as other chemical reactions.   
Glass transitions may occur as the temperature of an amorphous solid is increased. These transitions appear as a step in the baseline of the recorded DSC signal, because the sample undergoes a change in heat capacity; no formal phase change occurs.
As the temperature increases, an amorphous solid will become less viscous. At some point the molecules may obtain enough freedom of motion to spontaneously arrange themselves into a crystalline form. This is known as the crystallization temperature (Tc). This transition from amorphous solid to crystalline solid is an exothermic process, and results in a peak in the DSC signal. As the temperature increases the sample eventually reaches its melting temperature (Tm). The melting process results in an endothermic peak in the DSC curve. The ability to determine transition temperatures and enthalpies makes DSC a valuable tool in producing phase diagrams for various chemical systems. 
Differential scanning calorimetry can also be used to obtain valuable thermodynamic information about proteins. Thermodynamic analysis of proteins can reveal important information about their global structure and about protein/ligand interactions. For example, many mutations lower the stability of proteins, while ligand binding usually increases protein stability. Using DSC, this stability can be measured by obtaining Gibbs free energy values at any given temperature. This allows researchers to compare the free energy of unfolding between ligand-free protein and protein/ligand complex, or between wild-type and mutant proteins. DSC can also be used in studying protein/lipid interactions, nucleotides, and drug/lipid interactions. In studying protein denaturation using DSC, the thermal melt should be at least to some degree reversible, as the thermodynamic calculations rely on chemical equilibrium.
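A common way to obtain those Gibbs free energy values from DSC parameters (melting temperature Tm, unfolding enthalpy ΔHm, and heat capacity change ΔCp) is the Gibbs–Helmholtz relation. The sketch below uses entirely hypothetical numbers for a wild-type protein and a destabilized mutant, just to show the comparison described above:

```python
import math

def gibbs_unfolding(T, Tm, dHm, dCp):
    """Gibbs free energy of unfolding (kJ/mol) at temperature T (K),
    from DSC-derived Tm (K), dHm (kJ/mol) and dCp (kJ/mol/K), via the
    Gibbs-Helmholtz equation with a temperature-independent dCp."""
    return dHm * (1 - T / Tm) - dCp * ((Tm - T) + T * math.log(T / Tm))

# Hypothetical DSC results: the mutant melts ~10 K lower than wild type.
dG_wt  = gibbs_unfolding(298.0, Tm=335.0, dHm=400.0, dCp=8.0)
dG_mut = gibbs_unfolding(298.0, Tm=325.0, dHm=370.0, dCp=8.0)
print(f"ΔG(unfold) wild type: {dG_wt:.1f} kJ/mol, mutant: {dG_mut:.1f} kJ/mol")
```

With these illustrative inputs the wild type comes out more stable (larger positive ΔG of unfolding) than the mutant at 25 °C, which is the kind of comparison the text describes.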
The technique is widely used across a range of applications, both as a routine quality test and as a research tool. The equipment is easy to calibrate, using low-melting indium at 156.5985 °C for example, and is a rapid and reliable method of thermal analysis.
DSC is used widely for examining polymeric materials to determine their thermal transitions. Important thermal transitions include the glass transition temperature (Tg), crystallization temperature (Tc), and melting temperature (Tm). The observed thermal transitions can be utilized to compare materials, although the transitions alone do not uniquely identify composition. The composition of unknown materials may be determined using complementary techniques such as IR spectroscopy. Melting points and glass transition temperatures for most polymers are available from standard compilations, and the method can show polymer degradation by the lowering of the expected melting temperature. Tm depends on the molecular weight of the polymer and its thermal history.
The percent crystalline content of a polymer can be estimated from the crystallization/melting peaks of the DSC graph using reference heats of fusion found in the literature. DSC can also be used to study thermal degradation of polymers using an approach such as oxidative onset temperature/time (OOT); however, the user risks contamination of the DSC cell, which can be problematic. Thermogravimetric analysis (TGA) may be more useful for determining decomposition behavior. Impurities in polymers can be determined by examining thermograms for anomalous peaks, and plasticisers can be detected at their characteristic boiling points. In addition, examination of minor events in first-heat thermal analysis data can be useful, as these apparently "anomalous" peaks can in fact be representative of the process or storage thermal history of the material, or of polymer physical aging. Comparison of first- and second-heat data collected at consistent heating rates can allow the analyst to learn about both polymer processing history and material properties.
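The crystallinity estimate mentioned above is a simple ratio: the measured melting enthalpy (less any cold-crystallization exotherm that formed on heating) divided by a literature heat of fusion for the fully crystalline polymer. The numbers below are illustrative values for a PET-like sample; the reference heat of fusion varies between literature sources:

```python
# Hypothetical DSC results for a PET-like sample (all values illustrative).
dH_melt = 45.0         # J/g, measured melting endotherm
dH_cold_cryst = 12.0   # J/g, cold-crystallization exotherm seen on heating
dH_fusion_100 = 140.1  # J/g, literature heat of fusion for 100% crystalline PET

# Crystallinity present before the scan: subtract the crystals formed
# during heating, then normalize by the fully crystalline reference value.
crystallinity = (dH_melt - dH_cold_cryst) / dH_fusion_100 * 100
print(f"Crystallinity: {crystallinity:.1f} %")
```

Subtracting the cold-crystallization term matters for quenched samples, since crystals formed during the heating scan would otherwise inflate the apparent as-received crystallinity.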
Liquid crystals
DSC is used in the study of liquid crystals. As some forms of matter go from solid to liquid they go through a third state, which displays properties of both phases. This anisotropic liquid is known as a liquid crystalline or mesomorphous state. Using DSC, it is possible to observe the small energy changes that occur as matter transitions from a solid to a liquid crystal and from a liquid crystal to an isotropic liquid. 
Oxidative stability
Using differential scanning calorimetry to study the stability to oxidation of samples generally requires an airtight sample chamber. It can be used to determine the oxidative-induction time (OIT) of a sample. Such tests are usually done isothermally (at constant temperature) by changing the atmosphere of the sample. First, the sample is brought to the desired test temperature under an inert atmosphere, usually nitrogen. Then, oxygen is added to the system. Any oxidation that occurs is observed as a deviation in the baseline. Such analysis can be used to determine the stability and optimum storage conditions for a material or compound. 
Safety screening
DSC makes a reasonable initial safety screening tool. In this mode the sample is housed in a non-reactive crucible (often gold or gold-plated steel) that can withstand pressure (typically up to 100 bar). The presence of an exothermic event can then be used to assess the stability of a substance to heat. However, due to a combination of relatively poor sensitivity, slower-than-normal scan rates (typically 2–3 °C/min, because of the much heavier crucible) and unknown activation energy, it is necessary to deduct about 75–100 °C from the initial start of the observed exotherm to suggest a maximal temperature for the material. A much more accurate data set can be obtained from an adiabatic calorimeter, but such a test may take 2–3 days from ambient at a rate of a 3 °C increment per half-hour.
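The screening heuristic above amounts to a simple subtraction; a minimal sketch, with a hypothetical observed onset temperature and the conservative end of the stated margin:

```python
# Hypothetical screening result: exotherm onset observed at 180 °C.
# Apply the conservative 100 °C margin from the 75-100 °C range above
# to suggest a maximum processing temperature for the material.
onset_C = 180.0
safety_margin_C = 100.0
max_safe_temp_C = onset_C - safety_margin_C
print(f"Suggested maximum temperature: {max_safe_temp_C:.0f} °C")
```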
Drug analysis
DSC is widely used in the pharmaceutical and polymer industries. For the polymer chemist, DSC is a handy tool for studying curing processes, which allows fine-tuning of polymer properties. The cross-linking of polymer molecules that occurs in the curing process is exothermic, and appears as a peak in the DSC curve that usually occurs soon after the glass transition.
In the pharmaceutical industry it is necessary to have well-characterized drug compounds in order to define processing parameters. For instance, if it is necessary to deliver a drug in the amorphous form, it is desirable to process the drug at temperatures below those at which crystallization can occur. 
General chemical analysis
Freezing-point depression can be used as a purity analysis tool when analysed by differential scanning calorimetry. This is possible because the temperature range over which a mixture of compounds melts depends on their relative amounts. Consequently, less pure compounds will exhibit a broadened melting peak that begins at a lower temperature than that of a pure compound.
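This purity analysis is commonly formalized with the van 't Hoff relation, which predicts the sample temperature Ts as a function of the fraction F of the sample melted: Ts = T0 − (R·T0²·x2/ΔHf)·(1/F), so that the slope of Ts against 1/F yields the impurity mole fraction x2. The sketch below uses a hypothetical compound and simulated fraction-melted data to show how x2 is recovered from such a plot:

```python
import numpy as np

R = 8.314        # J/(mol·K), gas constant
T0 = 400.0       # K, melting point of the pure compound (hypothetical)
dH_f = 25_000.0  # J/mol, enthalpy of fusion (hypothetical)

# Simulated data: fraction melted F (from partial integration of the DSC
# melting peak) and the sample temperature Ts at each fraction, generated
# from the van 't Hoff relation with a known impurity level.
x2_true = 0.02   # mole fraction of impurity used to generate the data
F = np.array([0.10, 0.20, 0.35, 0.50, 0.70, 1.00])
Ts = T0 - (R * T0**2 * x2_true / dH_f) / F

# Fit Ts against 1/F; the slope of the line gives the impurity fraction.
slope, intercept = np.polyfit(1.0 / F, Ts, 1)
x2_est = -slope * dH_f / (R * T0**2)
print(f"Estimated impurity: {x2_est * 100:.2f} mol% "
      f"(purity {100 * (1 - x2_est):.2f} mol%)")
```

With real data the Ts-versus-1/F plot is often curved, and a linearization correction is applied before fitting; the idealized data here recover the impurity level exactly.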