35: Genome Editing - Biology



Working like genetic scissors, the Cas9 nuclease opens both strands of the targeted sequence of DNA to introduce the modification by one of two methods. Knock-in mutations, facilitated via homology directed repair (HDR), are the traditional pathway of targeted genomic editing approaches. [8] This allows for the introduction of targeted DNA damage and repair. HDR employs similar DNA sequences to drive the repair of the break via the incorporation of exogenous DNA that functions as the repair template. [8] This method relies on the periodic and isolated occurrence of DNA damage at the target site in order for the repair to commence. Knock-out mutations caused by CRISPR-Cas9 result from the repair of the double-stranded break by means of non-homologous end joining (NHEJ). NHEJ can often result in random deletions or insertions at the repair site, which may disrupt or alter gene functionality. Therefore, genomic engineering by CRISPR-Cas9 gives researchers the ability to generate targeted random gene disruption. Because of this, the precision of genome editing is a great concern. Genomic editing leads to irreversible changes to the genome.
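As a rough illustration of why random NHEJ insertions or deletions can disrupt a gene, the sketch below classifies an edit by its net length change. It assumes the edit falls inside a protein-coding sequence, where a length change that is not a multiple of three shifts the reading frame. The function name and the example sequences are hypothetical and are not taken from the cited sources.

```python
def classify_indel(reference: str, edited: str) -> str:
    """Classify an edit inside a coding sequence by its net length change.

    Assumption: both strings cover the same coding region, so only the
    difference in length matters for the reading frame.
    """
    delta = len(edited) - len(reference)
    if delta == 0:
        return "no length change"
    # Codons are three bases long, so only multiples of three keep the frame.
    return "in-frame" if delta % 3 == 0 else "frameshift"


if __name__ == "__main__":
    ref = "ATGGCCAAGTTC"
    print(classify_indel(ref, "ATGGCAAGTTC"))  # one base deleted
    print(classify_indel(ref, "ATGAAGTTC"))    # three bases deleted
```

A frameshift usually changes every downstream codon, which is one reason NHEJ repair is commonly used for knock-outs.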

While genome editing in eukaryotic cells has been possible using various methods since the 1980s, the methods employed had proved to be inefficient and impractical to implement on a large scale. With the discovery of CRISPR and specifically the Cas9 nuclease molecule, efficient and highly selective editing is now a reality. Cas9 derived from the bacterial species Streptococcus pyogenes has facilitated targeted genomic modification in eukaryotic cells by allowing for a reliable method of creating a targeted break at a specific location as designated by the crRNA and tracrRNA guide strands. [9] The ease with which researchers can insert Cas9 and template RNA in order to silence or cause point mutations at specific loci has proved invaluable to the quick and efficient mapping of genomic models and biological processes associated with various genes in a variety of eukaryotes. Newly engineered variants of the Cas9 nuclease have been developed that significantly reduce off-target activity. [10]

CRISPR-Cas9 genome editing techniques have many potential applications, including in medicine and agriculture. The use of the CRISPR-Cas9-gRNA complex for genome editing [11] was the AAAS's choice for Breakthrough of the Year in 2015. [12] Many bioethical concerns have been raised about the prospect of using CRISPR for germline editing, especially in human embryos. [13]

Other methods

In the early 2000s, German researchers began developing zinc finger nucleases (ZFNs), synthetic proteins whose DNA-binding domains enable them to create double-stranded breaks in DNA at specific points. ZFNs have higher precision and the advantage of being smaller than Cas9, but they are not as commonly used as CRISPR-based methods. Sangamo provides ZFNs via industry and academic partnerships but holds the modules, expertise, and patents for making them. In 2010, synthetic nucleases called transcription activator-like effector nucleases (TALENs) provided an easier way to target a double-stranded break to a specific location on the DNA strand. Both zinc finger nucleases and TALENs require the design and creation of a custom protein for each targeted DNA sequence, which is a much more difficult and time-consuming process than designing guide RNAs. CRISPRs are much easier to design because the process requires synthesizing only a short RNA sequence, a procedure that is already widely used for many other molecular biology techniques (e.g. creating oligonucleotide primers). [14]

Whereas methods such as RNA interference (RNAi) do not fully suppress gene function, CRISPR, ZFNs, and TALENs provide full irreversible gene knockout. [15] CRISPR can also target several DNA sites simultaneously simply by introducing different gRNAs. In addition, the costs of employing CRISPR are relatively low. [15] [16] [17]

Discovery

In 2012 Jennifer Doudna and Emmanuelle Charpentier published their finding that CRISPR-Cas9 could be programmed with RNA to edit genomic DNA, now considered one of the most significant discoveries in the history of biology. [18]

Patents and commercialization

As of November 2013, SAGE Labs (part of Horizon Discovery group) had exclusive rights from one of those companies to produce and sell genetically engineered rats and non-exclusive rights for mouse and rabbit models. [19] By 2015, Thermo Fisher Scientific had licensed intellectual property from ToolGen to develop CRISPR reagent kits. [20]

As of December 2014, patent rights to CRISPR were contested. Several companies formed to develop related drugs and research tools. [21] As companies ramped up financing, doubts as to whether CRISPR could be quickly monetized were raised. [22] In February 2017 the US Patent Office ruled on a patent interference case brought by University of California with respect to patents issued to the Broad Institute, and found that the Broad patents, with claims covering the application of CRISPR-Cas9 in eukaryotic cells, were distinct from the inventions claimed by University of California. [23] [24] [25] Shortly after, University of California filed an appeal of this ruling. [26] [27]

Recent events

In March 2017, the European Patent Office (EPO) announced its intention to allow claims for editing all kinds of cells to Max-Planck Institute in Berlin, University of California, and University of Vienna, [28] [29] and in August 2017, the EPO announced its intention to allow CRISPR claims in a patent application that MilliporeSigma had filed. [28] As of August 2017, the patent situation in Europe was complex, with MilliporeSigma, ToolGen, Vilnius University, and Harvard contending for claims, along with University of California and Broad. [30]

In July 2018, the ECJ ruled that gene editing for plants was a sub-category of GMO foods and therefore that the CRISPR technique would henceforth be regulated in the European Union by their rules and regulations for GMOs. [31]

In February 2020, a US trial showed that CRISPR gene editing could be used safely in three cancer patients. [32]

In October 2020, researchers Emmanuelle Charpentier and Jennifer Doudna were awarded the Nobel Prize in Chemistry for their work in this field. [33] [34] They made history as the first two women to share this award without a male contributor. [35]

CRISPR-Cas9 genome editing is carried out with a Type II CRISPR system. When utilized for genome editing, this system includes Cas9, crRNA, and tracrRNA along with an optional section of DNA repair template that is utilized in either non-homologous end joining (NHEJ) or homology directed repair (HDR).

Major components

Component: Function
crRNA: Contains the guide RNA that locates the correct segment of host DNA along with a region that binds to tracrRNA (generally in a hairpin loop form), forming an active complex.
tracrRNA: Binds to crRNA and forms an active complex.
sgRNA: Single-guide RNAs are a combined RNA consisting of a tracrRNA and at least one crRNA.
Cas9: An enzyme whose active form is able to modify DNA. Many variants exist with different functions (e.g. single-strand nicking, double-strand breaking, DNA binding) due to each enzyme's DNA site recognition function.
Repair template: DNA molecule used as a template in the host cell's DNA repair process, allowing insertion of a specific DNA sequence into the host segment broken by Cas9.

CRISPR-Cas9 often employs a plasmid to transfect the target cells. [36] The main components of this plasmid are displayed in the image and listed in the table. The crRNA is uniquely designed for each application, as this is the sequence that Cas9 uses to identify and directly bind to specific sequences within the host cell's DNA. The crRNA must bind only where editing is desired. The repair template is also uniquely designed for each application, as it must complement to some degree the DNA sequences on either side of the cut and also contain whatever sequence is desired for insertion into the host genome.

Multiple crRNAs and the tracrRNA can be packaged together to form a single-guide RNA (sgRNA). [37] This sgRNA can be included alongside the gene that codes for the Cas9 protein and made into a plasmid in order to be transfected into cells. Many online tools are available to aid in designing effective sgRNA sequences. [38] [39]

Structure

CRISPR-Cas9 offers a high degree of fidelity and relatively simple construction. It depends on two factors for its specificity: the target sequence and the protospacer adjacent motif (PAM) sequence. The target sequence is 20 bases long as part of each CRISPR locus in the crRNA array. [36] A typical crRNA array has multiple unique target sequences. Cas9 proteins select the correct location on the host's genome by utilizing the sequence to bond with base pairs on the host DNA. The sequence is not part of the Cas9 protein and as a result is customizable and can be independently synthesized. [40] [41]

The PAM sequence on the host genome is recognized by Cas9. Cas9 cannot be easily modified to recognize a different PAM sequence. However, this is ultimately not too limiting, as it is typically a very short and nonspecific sequence that occurs frequently at many places throughout the genome (e.g. the SpCas9 PAM sequence is 5'-NGG-3' and in the human genome occurs roughly every 8 to 12 base pairs). [36]
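The two specificity factors described above, a 20-base target sequence followed by the SpCas9 PAM 5'-NGG-3', can be illustrated with a small scan over a DNA string. This is a minimal sketch, not a guide-design tool: the function names are hypothetical, it ignores off-target scoring entirely, and positions on the minus strand are reported in reverse-complement coordinates.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_spcas9_targets(dna: str, guide_len: int = 20):
    """List (strand, start, protospacer, pam) for every site followed by NGG.

    'start' is the 0-based index on the scanned strand; for the '-' strand
    that strand is the reverse complement of the input.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    example = "TTGACCTGAATCGGATCCAAGTCATGGCGT"  # made-up sequence
    for hit in find_spcas9_targets(example):
        print(hit)
```

Because the PAM is so short, most sequences of realistic length contain many candidate sites, which is why the choice of the 20-base target dominates guide design.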

Once these sequences have been assembled into a plasmid and transfected into cells, the Cas9 protein with the help of the crRNA finds the correct sequence in the host cell's DNA and – depending on the Cas9 variant – creates a single- or double-stranded break at the appropriate location in the DNA. [42]

Properly spaced single-stranded breaks in the host DNA can trigger homology directed repair, which is less error-prone than the non-homologous end joining that typically follows a double-stranded break. Providing a DNA repair template allows for the insertion of a specific DNA sequence at an exact location within the genome. The repair template should extend 40 to 90 base pairs beyond the Cas9-induced DNA break. [36] The goal is for the cell's native HDR process to utilize the provided repair template and thereby incorporate the new sequence into the genome. Once incorporated, this new sequence is now part of the cell's genetic material and passes into its daughter cells.
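The repair template described above can be sketched as a simple string construction: sequence flanking the break on each side, with the desired insert between them. This sketch assumes that the 40 to 90 base pair extension applies to each side of the break (the source does not spell this out), and the function and parameter names are hypothetical.

```python
def build_repair_template(genome: str, cut_index: int, insert: str,
                          arm_len: int = 60) -> str:
    """Return left homology arm + insert + right homology arm.

    cut_index is the 0-based position of the break: the left arm ends just
    before it and the right arm starts at it. arm_len is kept within the
    40 to 90 bp range mentioned in the text (assumed to apply per side).
    """
    if not 40 <= arm_len <= 90:
        raise ValueError("arm_len should be between 40 and 90 bases")
    if cut_index - arm_len < 0 or cut_index + arm_len > len(genome):
        raise ValueError("not enough flanking sequence around the cut site")
    left_arm = genome[cut_index - arm_len:cut_index]
    right_arm = genome[cut_index:cut_index + arm_len]
    return left_arm + insert + right_arm


if __name__ == "__main__":
    toy_genome = "A" * 100 + "C" * 100  # made-up sequence
    template = build_repair_template(toy_genome, cut_index=100, insert="GATTACA")
    print(len(template), template[55:72])
```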

Delivery

Delivery of Cas9, sgRNA, and associated complexes into cells can occur via viral and non-viral systems. Electroporation of DNA, RNA, or ribonucleocomplexes is a common technique, though it can result in harmful effects on the target cells. [43] Chemical transfection techniques utilizing lipids and peptides have also been used to introduce sgRNAs in complex with Cas9 into cells. [44] [45] Types of cells that are more difficult to transfect (e.g. stem cells, neurons, and hematopoietic cells) require more efficient delivery systems, such as those based on lentivirus (LVs), adenovirus (AdV), and adeno-associated virus (AAV). [46] [47] [48]

Controlled genome editing

Several variants of CRISPR-Cas9 allow gene activation or genome editing with an external trigger such as light or small molecules. [49] [50] [51] These include photoactivatable CRISPR systems developed by fusing light-responsive protein partners with an activator domain and a dCas9 for gene activation, [52] [53] or by fusing similar light-responsive domains with two constructs of split-Cas9, [54] [55] or by incorporating caged unnatural amino acids into Cas9, [56] or by modifying the guide RNAs with photocleavable complements for genome editing. [57]

Methods to control genome editing with small molecules include an allosteric Cas9, with no detectable background editing, that will activate binding and cleavage upon the addition of 4-hydroxytamoxifen (4-HT), [49] 4-HT responsive intein-linked Cas9, [58] or a Cas9 that is 4-HT responsive when fused to four ERT2 domains. [59] Intein-inducible split-Cas9 allows dimerization of Cas9 fragments, [60] and a rapamycin-inducible split-Cas9 system was developed by fusing two constructs of split-Cas9 with FRB and FKBP fragments. [61] Other studies have been able to induce transcription of Cas9 with a small molecule, doxycycline. [62] [63] Small molecules can also be used to improve homology directed repair, [64] often by inhibiting the non-homologous end joining pathway. [65] These systems allow conditional control of CRISPR activity for improved precision, efficiency, and spatiotemporal control.

The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system is a gene-editing technology that can induce double-strand breaks (DSBs) anywhere guide ribonucleic acids (gRNA) can bind with the protospacer adjacent motif (PAM) sequence. [66] Single-strand nicks can also be induced by Cas9 active-site mutants, [67] also known as Cas9 nickases. [68] By simply changing the sequence of the gRNA, the Cas9 endonuclease can be delivered to a gene of interest and induce DSBs. [69] The efficiency of the Cas9 endonuclease and the ease with which genes can be targeted led to the development of CRISPR-knockout (KO) libraries for both mouse and human cells, which can cover either specific gene sets of interest or the whole genome. [70] [71] CRISPR screening helps scientists create systematic, high-throughput genetic perturbations within live model organisms. This genetic perturbation is necessary for fully understanding gene function and epigenetic regulation. [72] The advantage of pooled CRISPR libraries is that more genes can be targeted at once.

Knock-out libraries are created so as to achieve equal representation and performance across all expressed gRNAs, and they carry an antibiotic or fluorescent selection marker that can be used to recover transduced cells. [73] There are two plasmid systems in CRISPR/Cas9 libraries. The first is an all-in-one plasmid, in which sgRNA and Cas9 are produced simultaneously in a transfected cell. The second is a two-vector system, in which the sgRNA and Cas9 plasmids are delivered separately. [72] It is important to deliver the thousands of unique sgRNA-containing vectors to a single vessel of cells by viral transduction at a low multiplicity of infection (MOI, typically 0.1 to 0.6). This reduces the probability that an individual cell clone receives more than one type of sgRNA, which could otherwise lead to incorrect assignment of genotype to phenotype. [70]
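The effect of a low MOI can be estimated with a short calculation. The sketch below assumes that the number of viral integrations per cell follows a Poisson distribution with mean equal to the MOI; this is a common modelling assumption and is not stated in the cited sources. The function name is hypothetical.

```python
import math


def multi_sgrna_fraction(moi: float) -> float:
    """Fraction of transduced cells expected to carry two or more sgRNAs.

    Assumes integrations per cell are Poisson distributed with mean `moi`.
    """
    if moi <= 0:
        raise ValueError("moi must be positive")
    p_zero = math.exp(-moi)        # cells with no vector
    p_one = moi * math.exp(-moi)   # cells with exactly one vector
    transduced = 1.0 - p_zero      # cells with at least one vector
    return (transduced - p_one) / transduced


if __name__ == "__main__":
    for moi in (0.1, 0.3, 0.6):
        print(f"MOI {moi}: {multi_sgrna_fraction(moi):.1%} of transduced cells")
```

Under this assumption, roughly 5% of transduced cells carry more than one sgRNA at an MOI of 0.1, rising to roughly 27% at an MOI of 0.6, which illustrates the trade-off between transduction efficiency and clean genotype-to-phenotype assignment.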

Once the pooled library is prepared, deep sequencing (NGS, next-generation sequencing) of PCR-amplified plasmid DNA is carried out to reveal the abundance of each sgRNA. Cells of interest can subsequently be infected with the library and then selected according to phenotype. There are two types of selection: negative and positive. Negative selection efficiently detects dead or slow-growing cells. It can identify survival-essential genes, which can further serve as candidates for molecularly targeted drugs. Positive selection, on the other hand, yields a collection of populations that acquired a growth advantage through random mutagenesis. [73] After selection, genomic DNA is collected and sequenced by NGS. Depletion or enrichment of sgRNAs is detected by comparison with the original sgRNA library, annotated with the target gene that each sgRNA corresponds to. Statistical analysis then identifies genes that are significantly likely to be relevant to the phenotype of interest. [70]
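The comparison of sgRNA abundance before and after selection can be sketched as a log2 fold-change calculation on read counts. This is a simplified illustration, not the statistical analysis used in the cited studies: counts are normalized to reads per million, a pseudocount (an assumption chosen here) avoids division by zero, and the guide names and counts are made up.

```python
import math


def log2_fold_changes(before: dict, after: dict, pseudocount: float = 1.0) -> dict:
    """Return log2(after / before) per sgRNA on reads-per-million values.

    Negative values indicate depletion after selection, positive values
    indicate enrichment. Guides missing from `after` are treated as zero.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("both samples need at least one read")
    changes = {}
    for guide, count_before in before.items():
        rpm_before = count_before / total_before * 1e6
        rpm_after = after.get(guide, 0) / total_after * 1e6
        changes[guide] = math.log2((rpm_after + pseudocount) /
                                   (rpm_before + pseudocount))
    return changes


if __name__ == "__main__":
    library = {"geneA_sg1": 500, "geneB_sg1": 480, "control_sg1": 520}
    selected = {"geneA_sg1": 40, "geneB_sg1": 900, "control_sg1": 560}
    for guide, lfc in log2_fold_changes(library, selected).items():
        print(f"{guide}: {lfc:+.2f}")
```

In practice several sgRNAs target each gene, and dedicated statistical methods combine their fold changes into a gene-level score; that step is outside the scope of this sketch.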

Apart from knock-out libraries, there are also knock-down (CRISPRi) and activation (CRISPRa) libraries, which use the ability of catalytically inactive Cas9 fusion proteins (dCas9) to bind target DNA, so that the gene of interest is not cut but is repressed or over-expressed. This has made the CRISPR/Cas9 system even more attractive as a tool. Inactive dCas9 protein modulates gene expression by directing dCas9-repressors or dCas9-activators toward promoters or transcriptional start sites of target genes. For gene repression, dCas9 can be fused to the KRAB effector domain, which forms a complex with the gRNA, whereas CRISPRa utilizes dCas9 fused to different transcriptional activation domains, which are directed by gRNA to promoter regions to upregulate expression. [75] [76] [77]

Disease models

Cas9 genomic modification has allowed for the quick and efficient generation of transgenic models within the field of genetics. Cas9 can be easily introduced into the target cells along with sgRNA via plasmid transfection in order to model the spread of diseases and the cell's response to and defense against infection. [78] The ability of Cas9 to be introduced in vivo allows for the creation of more accurate models of gene function and mutation effects, all while avoiding the off-target mutations typically observed with older methods of genetic engineering.

The CRISPR and Cas9 revolution in genomic modeling does not extend only to mammals. Traditional genomic models such as Drosophila melanogaster, one of the first model organisms, have seen further refinement in their resolution with the use of Cas9. [78] Cas9 can be expressed under cell-specific promoters, allowing controlled use of the enzyme. Because the Cas9 enzyme then affects only certain cell types, it offers an accurate method of treating diseases. The cells undergoing the Cas9 therapy can also be removed and reintroduced to provide amplified effects of the therapy. [79]

CRISPR-Cas9 can be used to edit the DNA of organisms in vivo and to eliminate individual genes or even entire chromosomes from an organism at any point in its development. Chromosomes that have been successfully deleted in vivo using CRISPR techniques include the Y chromosome and X chromosome of adult lab mice and human chromosomes 14 and 21, in embryonic stem cell lines and aneuploid mice respectively. This method might be useful for treating genetic disorders caused by abnormal numbers of chromosomes, such as Down syndrome and intersex disorders. [80]

Successful in vivo genome editing using CRISPR-Cas9 has been shown in numerous model organisms, including Escherichia coli, [81] Saccharomyces cerevisiae, [82] Candida albicans, [83] Caenorhabditis elegans, [84] Arabidopsis spp., [85] Danio rerio, [86] and Mus musculus. [87] [88] Successes have been achieved in the study of basic biology, in the creation of disease models, [84] [89] and in the experimental treatment of disease models. [90]

Concerns have been raised that off-target effects (editing of genes besides the ones intended) may confound the results of a CRISPR gene editing experiment (i.e. the observed phenotypic change may not be due to modifying the target gene, but some other gene). Modifications to CRISPR have been made to minimize the possibility of off-target effects. Orthogonal CRISPR experiments are often recommended to confirm the results of a gene editing experiment. [91] [92]

CRISPR simplifies the creation of genetically modified organisms for research which mimic disease or show what happens when a gene is knocked down or mutated. CRISPR may be used at the germline level to create organisms in which the targeted gene is changed everywhere (i.e. in all cells/tissues/organs of a multicellular organism), or it may be used in non-germline cells to create local changes that only affect certain cell populations within the organism. [93] [94] [95]

CRISPR can be utilized to create human cellular models of disease. [96] For instance, when applied to human pluripotent stem cells, CRISPR has been used to introduce targeted mutations in genes relevant to polycystic kidney disease (PKD) and focal segmental glomerulosclerosis (FSGS). [97] These CRISPR-modified pluripotent stem cells were subsequently grown into human kidney organoids that exhibited disease-specific phenotypes. Kidney organoids from stem cells with PKD mutations formed large, translucent cyst structures from kidney tubules. The cysts were capable of reaching macroscopic dimensions, up to one centimeter in diameter. [98] Kidney organoids with mutations in a gene linked to FSGS developed junctional defects between podocytes, the filtering cells affected in that disease. This was traced to the inability of podocytes to form microvilli between adjacent cells. [99] Importantly, these disease phenotypes were absent in control organoids of identical genetic background, but lacking the CRISPR modifications. [97]

A similar approach was taken to model long QT syndrome in cardiomyocytes derived from pluripotent stem cells. [100] These CRISPR-generated cellular models, with isogenic controls, provide a new way to study human disease and test drugs.

Biomedicine

CRISPR-Cas technology has been proposed as a treatment for multiple human diseases, especially those with a genetic cause. [101] Its ability to modify specific DNA sequences makes it a tool with potential to fix disease-causing mutations. Early research in animal models suggests that therapies based on CRISPR technology have potential to treat a wide range of diseases, [102] including cancer, [103] progeria, [104] beta-thalassemia, [105] [106] sickle cell disease, [107] hemophilia, [108] cystic fibrosis, [109] Duchenne's muscular dystrophy, [110] Huntington's disease, [111] [112] and heart disease. [113] CRISPR has also been used to cure malaria in mosquitoes, which could eliminate the vector and the disease in humans. [114] CRISPR may also have applications in tissue engineering and regenerative medicine, such as by creating human blood vessels that lack expression of MHC class II proteins, which often cause transplant rejection. [115]

In addition, clinical trials to cure beta thalassemia and sickle cell disease in human patients using CRISPR-Cas9 technology have shown promising results. [116] [117]

CRISPR in the treatment of infection

CRISPR-Cas-based "RNA-guided nucleases" can be used to target virulence factors, genes encoding antibiotic resistance, and other medically relevant sequences of interest. This technology thus represents a novel form of antimicrobial therapy and a strategy by which to manipulate bacterial populations. [118] [119] Recent studies suggest a correlation between interference with the CRISPR-Cas locus and the acquisition of antibiotic resistance. [120] The CRISPR-Cas system protects bacteria against invading foreign DNA, such as transposons, bacteriophages, and plasmids. It was shown to be a strong selective pressure for the acquisition of antibiotic resistance and virulence factors in bacterial pathogens. [120]

Therapies based on CRISPR–Cas3 gene editing technology delivered by engineered bacteriophages could be used to destroy targeted DNA in pathogens. [121] Cas3 is more destructive than the better known Cas9. [122] [123]

Research suggests that CRISPR is an effective way to limit replication of multiple herpesviruses. It was able to eradicate viral DNA in the case of Epstein-Barr virus (EBV). Anti-herpesvirus CRISPRs have promising applications such as removing cancer-causing EBV from tumor cells, helping rid donated organs for immunocompromised patients of viral invaders, or preventing cold sore outbreaks and recurrent eye infections by blocking HSV-1 reactivation. As of August 2016, these were awaiting testing. [124]

CRISPR may revive the concept of transplanting animal organs into people. Retroviruses present in animal genomes could harm transplant recipients. In 2015, a team eliminated 62 copies of a particular retroviral DNA sequence from the pig genome in a kidney epithelial cell. [125] Researchers recently demonstrated the ability to birth live pig specimens after removing these retroviruses from their genome using CRISPR for the first time. [126]

CRISPR and cancer

CRISPR has also found many applications in developing cell-based immunotherapies. [127] The first clinical trial involving CRISPR started in 2016. It involved removing immune cells from people with lung cancer, using CRISPR to edit out the gene expressing PD-1, then administering the altered cells back to the same person. As of 2017, 20 other trials were under way or nearly ready, mostly in China. [103]

In 2016, the United States Food and Drug Administration (FDA) approved a clinical trial in which CRISPR would be used to alter T cells extracted from people with different kinds of cancer and then administer those engineered T cells back to the same people. [128]

In November 2020, CRISPR was used effectively to treat glioblastoma (a fast-growing brain tumor) and metastatic ovarian cancer, two cancers that have some of the worst best-case prognoses and are typically diagnosed at later stages. The treatments resulted in inhibited tumor growth and an 80% increase in survival for metastatic ovarian cancer, and in tumor cell apoptosis, 50% inhibition of tumor growth, and a 30% improvement in survival for glioblastoma. [129]

Knockdown/activation

Using "dead" versions of Cas9 (dCas9) eliminates CRISPR's DNA-cutting ability, while preserving its ability to target desirable sequences. Multiple groups added various regulatory factors to dCas9s, enabling them to turn almost any gene on or off or adjust its level of activity. [125] Like RNAi, CRISPR interference (CRISPRi) turns off genes in a reversible fashion by targeting, but not cutting a site. The targeted site is methylated, epigenetically modifying the gene. This modification inhibits transcription. These precisely placed modifications may then be used to regulate the effects on gene expressions and DNA dynamics after the inhibition of certain genome sequences within DNA. Within the past few years, epigenetic marks in different human cells have been closely researched and certain patterns within the marks have been found to correlate with everything ranging from tumor growth to brain activity. [11] Conversely, CRISPR-mediated activation (CRISPRa) promotes gene transcription. [130] Cas9 is an effective way of targeting and silencing specific genes at the DNA level. [131] In bacteria, the presence of Cas9 alone is enough to block transcription. For mammalian applications, a section of protein is added. Its guide RNA targets regulatory DNA sequences called promoters that immediately precede the target gene. [132]

Cas9 was used to carry synthetic transcription factors that activated specific human genes. The technique achieved a strong effect by targeting multiple CRISPR constructs to slightly different locations on the gene's promoter. [132]

RNA editing

In 2016, researchers demonstrated that CRISPR from an ordinary mouth bacterium could be used to edit RNA. The researchers searched databases containing hundreds of millions of genetic sequences for those that resembled CRISPR genes. They considered the fusobacterium Leptotrichia shahii. It had a group of genes that resembled CRISPR genes, but with important differences. When the researchers equipped other bacteria with these genes, which they called C2c2, they found that the organisms gained a novel defense. [133] C2c2 was later renamed Cas13a to fit the standard nomenclature for Cas genes. [134]

Many viruses encode their genetic information in RNA rather than DNA, and they repurpose it to make new viruses. HIV and poliovirus are such viruses. Bacteria with Cas13 make molecules that can dismember RNA, destroying the virus. Tailoring these genes opened any RNA molecule to editing. [133]

CRISPR-Cas systems can also be employed for editing of micro-RNA and long-noncoding RNA genes in plants. [135]

Gene drive

Gene drives may provide a powerful tool to restore the balance of ecosystems by eliminating invasive species. Concerns have been raised regarding efficacy and unintended consequences in the target species as well as non-target species, particularly the potential for accidental release from laboratories into the wild. Scientists have proposed several safeguards for ensuring the containment of experimental gene drives, including molecular, reproductive, and ecological ones. [136] Many recommend that immunization and reversal drives be developed in tandem with gene drives in order to overwrite their effects if necessary. [137] There remains consensus that long-term effects must be studied more thoroughly, particularly the potential for ecological disruption that cannot be corrected with reversal drives. [138] As such, DNA computing would be required.

In vitro genetic depletion

Unenriched sequencing libraries often have abundant undesired sequences. Cas9 can specifically deplete the undesired sequences with double strand breakage with up to 99% efficiency and without significant off-target effects as seen with restriction enzymes. Treatment with Cas9 can deplete abundant rRNA while increasing pathogen sensitivity in RNA-seq libraries. [139]

Prime editing

Prime editing [140] (a technique related to, but distinct from, base editing) is a CRISPR refinement to accurately insert or delete sections of DNA. CRISPR edits are not always perfect, and the cuts can end up in the wrong place. Both issues are a problem for using the technology in medicine. [141] Prime editing does not cut the double-stranded DNA but instead uses the CRISPR targeting apparatus to shuttle an additional enzyme to a desired sequence, where it converts a single nucleotide into another. [142] The new guide, called a pegRNA, contains an RNA template for a new DNA sequence to be added to the genome at the target location. That requires a second protein, attached to Cas9: a reverse transcriptase enzyme, which can make a new DNA strand from the RNA template and insert it at the nicked site. [143] Those three independent pairing events each provide an opportunity to prevent off-target sequences, which significantly increases targeting flexibility and editing precision. [142] Prime editing was developed by researchers at the Broad Institute of MIT and Harvard in Massachusetts. [144] More work is needed to optimize the methods. [144] [143]

Human germline modification

As of March 2015, multiple groups had announced ongoing research with the intention of laying the foundations for applying CRISPR to human embryos for human germline engineering, including labs in the US, China, and the UK, as well as US biotechnology company OvaScience. [145] Scientists, including a CRISPR co-discoverer, urged a worldwide moratorium on applying CRISPR to the human germline, especially for clinical use. They said "scientists should avoid even attempting, in lax jurisdictions, germline genome modification for clinical application in humans" until the full implications "are discussed among scientific and governmental organizations". [146] [147] These scientists support further low-level research on CRISPR and do not see CRISPR as developed enough for any clinical use in making heritable changes to humans. [148]

In April 2015, Chinese scientists reported results of an attempt to alter the DNA of non-viable human embryos using CRISPR to correct a mutation that causes beta thalassemia, a lethal heritable disorder. [149] [150] The study had previously been rejected by both Nature and Science in part because of ethical concerns. [151] The experiments resulted in successfully changing only some of the intended genes, and had off-target effects on other genes. The researchers stated that CRISPR is not ready for clinical application in reproductive medicine. [151] In April 2016, Chinese scientists were reported to have made a second unsuccessful attempt to alter the DNA of non-viable human embryos using CRISPR – this time to alter the CCR5 gene to make the embryo resistant to HIV infection. [152]

In December 2015, an International Summit on Human Gene Editing took place in Washington under the guidance of David Baltimore. Members of national scientific academies of the US, UK, and China discussed the ethics of germline modification. They agreed to support basic and clinical research under certain legal and ethical guidelines. A specific distinction was made between somatic cells, where the effects of edits are limited to a single individual, and germline cells, where genome changes can be inherited by descendants. Heritable modifications could have unintended and far-reaching consequences for human evolution, genetically (e.g. gene-environment interactions) and culturally (e.g. social Darwinism). Altering of gametocytes and embryos to generate heritable changes in humans was defined to be irresponsible. The group agreed to initiate an international forum to address such concerns and harmonize regulations across countries. [153]

In February 2017, the United States National Academies of Sciences, Engineering, and Medicine (NASEM) Committee on Human Gene Editing published a report reviewing ethical, legal, and scientific concerns of genomic engineering technology. The report concluded that heritable genome editing is impermissible now but could be justified for certain medical conditions; however, it did not justify the use of CRISPR for enhancement. [154]

In November 2018, Jiankui He announced that he had edited two human embryos to attempt to disable the gene for CCR5, which codes for a receptor that HIV uses to enter cells. He said that twin girls, Lulu and Nana, had been born a few weeks earlier. He said that the girls still carried functional copies of CCR5 along with disabled CCR5 (mosaicism) and were still vulnerable to HIV. The work was widely condemned as unethical, dangerous, and premature. [155] An international group of scientists called for a global moratorium on genetically editing human embryos. [156]

Policy barriers to genetic engineering

Policy regulations for the CRISPR-Cas9 system vary around the globe. In February 2016, British scientists were given permission by regulators to genetically modify human embryos by using CRISPR-Cas9 and related techniques. However, researchers were forbidden from implanting the embryos and the embryos were to be destroyed after seven days. [157]

The US has an elaborate, interdepartmental regulatory system to evaluate new genetically modified foods and crops. For example, the Agriculture Risk Protection Act of 2000 gives the United States Department of Agriculture the authority to oversee the detection, control, eradication, suppression, prevention, or retardation of the spread of plant pests or noxious weeds to protect the agriculture, environment, and economy of the US. The act regulates any genetically modified organism that utilizes the genome of a predefined "plant pest" or any plant not previously categorized. [158] In 2015, Yinong Yang successfully deactivated 16 specific genes in the white button mushroom to make them non-browning. Since he had not added any foreign-species (transgenic) DNA to his organism, the mushroom could not be regulated by the USDA under Section 340.2. [159] Yang's white button mushroom was the first organism genetically modified with the CRISPR-Cas9 protein system to pass US regulation. [160]

In 2016, the USDA sponsored a committee to consider future regulatory policy for upcoming genetic modification techniques. With the help of the US National Academies of Sciences, Engineering, and Medicine, special interest groups met on April 15 to contemplate the possible advancements in genetic engineering within the next five years and any new regulations that might be needed as a result. [161] In 2017, the Food and Drug Administration proposed a rule that would classify genetic engineering modifications to animals as "animal drugs", subjecting them to strict regulation if offered for sale and reducing the ability of individuals and small businesses to make them profitable. [162] [163]

In China, where social conditions sharply contrast with those of the West, genetic diseases carry a heavy stigma. [164] This leaves China with fewer policy barriers to the use of this technology. [165] [166]

Recognition

In 2012 and 2013, CRISPR was a runner-up in Science Magazine's Breakthrough of the Year award. In 2015, it was the winner of that award. [125] CRISPR was named as one of MIT Technology Review's 10 breakthrough technologies in 2014 and 2016. [167] [168] In 2016, Jennifer Doudna and Emmanuelle Charpentier, along with Rudolph Barrangou, Philippe Horvath, and Feng Zhang, won the Gairdner International Award. In 2016, Charpentier, Doudna, and Zhang won the Tang Prize in Biopharmaceutical Science. [169] In 2017, Doudna and Charpentier were awarded the Japan Prize in Tokyo, Japan for their revolutionary invention of CRISPR-Cas9. In 2020, Charpentier and Doudna were awarded the Nobel Prize in Chemistry, the first such prize for an all-female team, "for the development of a method for genome editing." [170]


To assess the reconstruction tools, we performed both a qualitative and a quantitative evaluation. As a first step, we created a list of relevant features for genome-scale reconstruction and software quality, and we scored each tool on its performance for each feature (1: poor, 5: outstanding). These features are related to software performance, ease of use, similarity of output networks to high-quality manually curated models and adherence to common data standards. In addition, we evaluated 18 specific features related mostly to the second stage (refinement) of the protocol for generating high-quality genome-scale metabolic reconstructions [5]. The criteria for assigning a particular score to each feature are specified in Additional file 1: Table S2. Note that not all the tools were designed for the second stage, so they scored poorly on quite a few features. Many of these features have not been assessed in previous reviews [8, 9].

Subsequently, to assess how similar the generated draft networks are to high-quality models, we used the different reconstruction tools to reconstruct the metabolic networks of two bacteria for which high-quality manually curated genome-scale models were already available. We chose to reconstruct the metabolic networks of Lactobacillus plantarum and Bordetella pertussis, representatives of gram-positive and gram-negative bacteria, respectively. These microorganisms were selected for three reasons. First, the corresponding GSMMs are not stored in the BIGG database, so tools that are able to use the BIGG database (AuReMe, CarveMe, MetaDraft, RAVEN) in the reconstruction process cannot use the specific information for these microorganisms. If Escherichia coli or Bacillus subtilis had been chosen instead, we would have favored these tools because high-quality models for E. coli or B. subtilis already exist in the BIGG database and they would have been used as templates or inputs. Second, we chose these microorganisms because we were fully informed of the quality of the reconstructions, as we built them ourselves, and they have proven to be able to accurately replicate experimental data [11, 12, 42, 43], even by independent researchers [44, 45]. Third, these networks were reconstructed almost entirely in a manual way, so we do not expect any bias for any particular tool.

In addition to the two previous species, we also reconstructed draft networks with all the tools for Pseudomonas putida, for which four lab-independent genome-scale models have been reconstructed. We compared the draft reconstructions with iJP962 [46], a model that is not in the BiGG database and that has been proven to accurately replicate experimental data and to be free of inconsistencies [47].

The networks were generated using seven tools: AuReMe, CarveMe, Merlin, MetaDraft, ModelSEED, Pathway Tools and RAVEN. These cover most of the freely available software platforms. The general features of these tools are listed in Table 1.

General assessment overview

None of the tools got a perfect score for all of the evaluated features and, usually, strengths in some tools are weaknesses in others (Fig. 1; see Additional file 1: Figure S3, Tables S25 and S26 for the detailed evaluation). For example, on the one hand, ModelSEED and CarveMe were evaluated as outstanding when we checked whether the whole reconstruction process is automatic; Merlin was evaluated as poor because users must intervene more to get a network ready to perform FBA. On the other hand, we consider Merlin outstanding with respect to a workspace for manual refinement and information to assist users during this step; CarveMe and ModelSEED provide neither further information for manual refinement nor a workspace for manual curation, so they were evaluated as poor in this category.

Qualitative assessment of the studied genome-scale metabolic reconstruction tools. We evaluated each of the tools (AU: AuReMe. CA: CarveMe. MD: MetaDraft. ME: Merlin. MS: ModelSEED. PT: Pathway Tools. RA: RAVEN) from an unsatisfactory (red) to an outstanding performance (dark green). In some categories such as continuous software maintenance and proper support, on the top of the figure, all the tools got the maximum score while in others such as automatic refinement using experimental data, none of the tools got the maximum. In most of the cases, strengths in some tools are weaknesses in others

In some cases, all the tools got the maximum score possible. For instance, all the tested tools are properly supported by specialist teams and also maintain up-to-date databases. In other cases, none of the tools got the maximum score. This was the case for automatic refinement of networks using experimental data. Some of the tools, such as ModelSEED and CarveMe, can use media composition to gap-fill the network. AuReMe and Pathway Tools can use, in addition to media composition, known metabolic products to gap-fill the network. In spite of that, none of the tools can also use Biolog phenotype arrays, knockout experiments and different types of omics data (transcriptomic, proteomic, metabolomic, etc.) to automatically curate the network. Although some efforts have been made in this area [48,49,50,51], this seems like a major challenge for future tool development that should lead to improved metabolic reconstructions.

Compliance with the latest SBML standards has been pointed out as one of the critical points for sharing and representing models [52]. Consequently, we evaluated whether the tools use the latest SBML features in the import (inputs) and export (outputs) of networks. For inputs, we checked whether the tools were able to read networks in SBML level 3 [22]. We additionally checked whether the output networks satisfy the following three features: use of SBML level 3 [22] with FBC annotations [23], SBML groups [24], and MIRIAM compliant CV annotations [22, 53]. These features are used, for example, for models in the BIGG database and they ensure that the information is stored in a standard way. For inputs, we found that all of the tools that are able to import and use networks (AuReMe, MetaDraft, RAVEN) can use SBML level 3, but AuReMe generated slightly different networks when using SBML level 2. For outputs, MetaDraft, Merlin, and RAVEN were the only ones that exported the networks with all three features. Be aware that networks created with RAVEN have to be exported to SBML using the specific functions of RAVEN (not COBRA functions, as a regular COBRA user would expect) because otherwise there will be no MIRIAM annotations in the SBML files. In addition, AuReMe and CarveMe lack MIRIAM compliant CV annotations and SBML groups, and Pathway Tools and ModelSEED exported the networks in SBML level 2.

Network comparison

We reconstructed draft networks for Lactobacillus plantarum WCFS1, Bordetella pertussis Tohama I and Pseudomonas putida KT2440 with each reconstruction tool. L. plantarum is a lactic acid bacterium (LAB), used in the food fermentation industry and as a probiotic [54,55,56]. Its GSMM comprises 771 unique reactions, 662 metabolites, and 728 genes, and it has been used to design a defined medium for this LAB [43], to explore interactions with other bacteria [57] and as a reference for reconstructing other LAB [58]. In contrast to this LAB, B. pertussis is a gram-negative bacterium and the causative agent of whooping cough, a highly contagious respiratory disease [59]. The metabolic network of this pathogen was recently reconstructed, and it comprises 1672 unique reactions, 1255 metabolites, and 770 genes. Like B. pertussis, Pseudomonas putida is a gram-negative bacterium, but the interest in this species lies in its capability as a cell factory to produce a wide variety of bulk and fine chemicals of industrial importance [60]. Its metabolic network comprises 1069 unique reactions, 987 metabolites, and 962 genes. While L. plantarum and B. pertussis are the main subjects of the network comparisons, P. putida was used, as a model developed independently from us, to validate the tendencies obtained with the two previous species.

In total, 29 networks were created for L. plantarum, 27 for B. pertussis, and 27 for P. putida. The specific inputs and parameters for creating each network can be found in Additional file 1: File S1. Genes, metabolites, and reactions were extracted from the SBML files and compared with those in the manually curated model. For convenience, the manually curated model of L. plantarum, B. pertussis, and P. putida will be called hereafter iLP728, iBP1870, and iJP962, respectively.

Comparison of gene sets

Genes are the basis from which the genome-scale model is reconstructed. When a gene is included in a metabolic reconstruction, there is at least one biochemical reaction associated with that gene. When a gene is not in the reconstruction, either the reconstruction tool could not find an orthologous gene in the reference database or an orthologous gene was found but no biochemical reaction is associated with that gene. Gene sets are interesting to compare because if a gene present in the manually curated model is absent in a draft reconstruction, that could explain why some biochemical reactions are missing in the draft. Alternatively, if a gene is absent in the manually curated model but present in a draft reconstruction, that could explain the presence of reactions that should not be in the reconstruction. Moreover, gene sets are simple to compare among reconstructions because gene identifiers in all the cases are the same (the locus tag in the genome annotation) and so, in contrast to metabolites and reactions, there is no mapping-related bias in the comparison.

To assess how similar the draft networks were to the corresponding manually curated networks, we calculated the Jaccard distance (JD) as well as the ratio between the percentage of covered genes and the percentage of additional genes (R) (Additional file 1: Tables S4–S7). The JD has been used before to measure the distance between genome-scale metabolic reconstructions based on reaction sets [61]; here, we also applied it to compare reconstructions in terms of genes and metabolites. We denote by JDg, JDr, and JDm the JD between two reconstructions when they are compared in terms of genes, reactions and metabolites, respectively. Analogously, we denote by Rg, Rr, and Rm the R when reconstructions are compared in terms of genes, reactions and metabolites, respectively. In general terms, a value of 0 in the JD means that the networks are identical and a value of 1 means that the networks do not share any element. For the R, higher values reflect a higher similarity to the original network and lower values reflect a lower similarity to the original network.
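As a minimal sketch of the two metrics described above, the following Python functions compute the JD and the R for any pair of identifier sets. The function names and the toy gene identifiers are illustrative assumptions and are not taken from the study; both percentages are expressed relative to the size of the curated set, and the handling of empty inputs and of a draft without additional elements is also an assumption.

```python
def jaccard_distance(draft, curated):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of identifiers."""
    draft, curated = set(draft), set(curated)
    union = draft | curated
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - len(draft & curated) / len(union)


def coverage_ratio(draft, curated):
    """Return (% of curated elements covered) / (% of additional elements).

    Both percentages are relative to the number of elements in the curated set.
    """
    draft, curated = set(draft), set(curated)
    if not curated:
        raise ValueError("the curated set must not be empty")
    covered = 100.0 * len(draft & curated) / len(curated)
    additional = 100.0 * len(draft - curated) / len(curated)
    if additional == 0:
        return float("inf")  # assumption: no additional elements gives an unbounded ratio
    return covered / additional


# Toy example with hypothetical gene identifiers.
curated_genes = {"g1", "g2", "g3", "g4"}
draft_genes = {"g1", "g2", "g3", "g9"}

jd_g = jaccard_distance(draft_genes, curated_genes)  # 3 shared of 5 in total
r_g = coverage_ratio(draft_genes, curated_genes)     # 75% covered / 25% additional
print(jd_g, r_g)
```

Because the JD penalizes missing and additional elements symmetrically while the R rewards coverage relative to additions, the two metrics need not rank a set of draft networks in the same order.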

The values of the JDg ranged from 0.38 to 0.60 in L. plantarum and from 0.43 to 0.67 in B. pertussis (Additional file 1: Tables S4 and S5), while values of the Rg ranged from 1.18 to 13.16 in L. plantarum and from 0.84 to 3.52 in B. pertussis (Additional file 1: Tables S6 and S7). Although the similarity of the generated draft networks seems slightly better for L. plantarum than for B. pertussis, we found that it depends on which metric is analyzed. With the exception of one network, the Rg showed that all the draft networks of L. plantarum were more similar to iLP728 than the draft networks of B. pertussis were to iBP1870, using analogous parameter settings. In contrast, the JDg showed that AuReMe, ModelSEED, RAVEN, and Merlin generated draft networks of L. plantarum which are more similar to iLP728 than the draft networks of B. pertussis are to iBP1870, and that CarveMe, MetaDraft, and Pathway Tools generated draft networks slightly more similar for B. pertussis. In general, similar values of JDg and Rg were obtained for P. putida (Additional file 1: File S3).

Additionally, when sorting the values of both metrics, we noticed that the JDg order does not correspond to the Rg order. The lowest JDg among the draft reconstructions for L. plantarum was obtained in the network generated with AuReMe when the gram-positive set of templates was used; for B. pertussis, it was obtained with MetaDraft. In contrast, the highest Rg among the draft reconstructions for L. plantarum was obtained in the network generated with AuReMe when only Lactococcus lactis was used as template; for B. pertussis, it was obtained with MetaDraft when the Escherichia coli template was used.

Although the similarity scores for both metrics are not entirely consistent, some trends were observed. The networks most similar, in terms of genes, to the manually curated models were generated by MetaDraft, AuReMe, and RAVEN (Fig. 2). However, since parameter settings and inputs have a big effect on the similarity scores, the usage of these tools does not automatically ensure obtaining a draft network similar, in terms of genes, to a manually curated model. This is particularly true for RAVEN, which also generated some networks with high JDg and low Rg scores. The same trends were obtained for P. putida (Additional file 1: Figure S2).

Jaccard distance versus the ratio between coverage and additional genes for draft reconstructions. We used the Jaccard distance and the ratio to measure the similarity between the draft reconstructions and the corresponding manually curated models, in this case, when the networks are analyzed in terms of genes. Draft reconstructions for Lactobacillus plantarum and Bordetella pertussis are represented in panels a and b, respectively. For both cases, the networks more similar to the manually curated models are located on the top left side of each plot. Thus, the draft reconstructions more similar to the manually curated models were created by AuReMe, MetaDraft, and RAVEN

We further analyzed the percentage of genes covered in the manually curated models and the percentage of genes not in the manually curated models to explain differences in Rg. For all the species we observed a wide variation in both variables (Figs. 3, 4 and Additional file 1: Figure S7). Among the five networks of L. plantarum with the highest coverage, two were created with AuReMe and three with RAVEN; for B. pertussis, four were created with RAVEN and one with CarveMe. However, the networks created with RAVEN that recovered the highest percentages of genes also added a large number of genes which were not present in the manually curated models, decreasing the values of the Rg. In addition, AuReMe and MetaDraft created conservative draft networks with the lowest number of additional genes, which explains the higher values of the Rg. Finally, tools such as ModelSEED, Pathway Tools, and Merlin consistently created reconstructions whose gene coverage was not among the highest (in comparison with other networks) and which added a relatively large number of genes not present in the manually curated models, which explains why they had lower values of the Rg.

Overlap of genes in draft reconstructions for Lactobacillus plantarum with those in the manually curated model. In total, 29 networks were reconstructed with 7 tools (CarveMe: CA, MetaDraft: MD, AuReMe: AU, Pathway Tools: PT, ModelSEED: MS, RAVEN: RA, Merlin: ME). Several reconstructions, which are represented with different sub-indices, were generated for each tool using different parameter settings. Numbers inside bars represent percentages with respect to the total number of genes in iLP728. The coverage (blue bars) ranged from 49.7 to 87.8% while the percentage of additional genes (yellow bars) ranged from 4.3 to 65.0%. Most of the genes that were not recovered (dark green bars) are related to very specific metabolic functions that were carefully incorporated during the manual curation of iLP728 such as polysaccharide biosynthesis and transport

Overlap of genes in draft reconstructions for Bordetella pertussis with those in the manually curated model. In total, 27 networks were reconstructed with 7 tools (CarveMe: CA, MetaDraft: MD, AuReMe: AU, Pathway Tools: PT, ModelSEED: MS, RAVEN: RA, Merlin: ME). Several reconstructions, which are represented with different sub-indices, were generated for each tool using different parameter settings. Numbers inside bars represent percentages with respect to the total number of genes in iBP1870. The coverage (blue bars) ranged from 49.4 to 83.0% while the percentage of additional genes (yellow bars) ranged from 18.6 to 99.0%. The genes that were not recovered (dark green bars) are related to very specific metabolic functions that were carefully incorporated during the manual curation of iBP1870 such as transport and ferredoxin/thioredoxin-related reactions

For L. plantarum we found 1613 different genes in total with all the tools, of which 885 were not present in iLP728. For B. pertussis, 1888 different genes were found, of which 1118 were not present in iBP1870. In addition, 79 genes were correctly predicted in all the draft networks for iLP728; for iBP1870, this was 131 genes. The distribution of metabolic pathways associated with those genes is wide for both species, with carbohydrate metabolism and amino acid metabolism accounting for more than 50% of the metabolic processes (Additional file 1: Tables S8 and S9). Additionally, 35 and 39 genes were not recovered in any network for iLP728 and iBP1870, respectively. The metabolic functions associated with those genes were very specific, with polysaccharide biosynthesis (63%) and transport (22%) at the top of the list for L. plantarum and transport (41%) and ferredoxin/thioredoxin-related reactions (30%) for B. pertussis. Finally, one gene in L. plantarum, which was associated with riboflavin biosynthesis, was recovered by all the networks but was not present in iLP728. For B. pertussis, three such genes were found. These genes were associated with alternate carbon metabolism and cell envelope biosynthesis.
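The counts reported above are the kind of result that plain set operations over the gene lists produce. The sketch below is only an illustration under that assumption: the function name, the tool labels and the gene identifiers are hypothetical, and it expects at least one draft network.

```python
def summarize_recovery(curated, drafts):
    """Classify genes by how consistently the draft networks recover them.

    curated: iterable of gene identifiers in the manually curated model.
    drafts:  dict mapping a draft-network label to its gene identifiers
             (assumed to contain at least one entry).
    """
    curated = set(curated)
    draft_sets = [set(genes) for genes in drafts.values()]
    in_all = set.intersection(*draft_sets)  # genes present in every draft
    in_any = set.union(*draft_sets)         # genes present in at least one draft
    return {
        "found_by_any_tool": in_any,
        "not_in_curated": in_any - curated,
        "recovered_by_all": in_all & curated,
        "recovered_by_none": curated - in_any,
        "in_all_drafts_not_curated": in_all - curated,
    }


# Toy example with hypothetical identifiers.
curated = {"g1", "g2", "g3", "g4"}
drafts = {
    "toolA_1": {"g1", "g2", "g7", "g8"},
    "toolB_1": {"g1", "g3", "g7"},
}
summary = summarize_recovery(curated, drafts)
for category, genes in summary.items():
    print(category, sorted(genes))
```

The same function applies unchanged to reaction or metabolite identifiers once they have been mapped to a common namespace.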

Comparison of reaction sets

Genes and biochemical reactions are connected within a reconstruction through gene-protein-reaction (GPR) associations. However, gene-reaction relationships are ultimately represented in reconstructions as boolean rules known as gene-reaction rules. With the exception of exchange, sink, demand, spontaneous and some transport reactions (e.g., those governed by diffusion), each reaction has a defined gene-reaction rule in the reference database used by each reconstruction tool. During the process of reconstruction, if orthologous genes are found that satisfy the gene-reaction rule of a particular reaction, that reaction is included in the draft reconstruction. Other reactions may be added to the draft reconstruction based on other criteria, such as the probability that a particular pathway exists in the microorganism under study or the need to fill particular gaps in the network in order to produce biomass. Nonetheless, we expect that networks which are more similar in terms of genes will also be more similar in terms of reactions.
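To illustrate how a boolean gene-reaction rule decides whether a reaction enters a draft, the sketch below evaluates a rule against the set of genes for which orthologs were found. It is not the implementation of any of the evaluated tools; the function name, the rule syntax (gene identifiers combined with "and", "or" and parentheses) and the locus tags are assumptions made for the example.

```python
import re


def rule_satisfied(rule, found_genes):
    """Return True if the boolean gene-reaction rule holds for found_genes.

    Assumes the rule only contains gene identifiers, 'and', 'or' and parentheses.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", rule)
    if not tokens:
        return False  # assumption: reactions without a rule are not gene-supported
    parts = []
    for token in tokens:
        if token in ("(", ")"):
            parts.append(token)
        elif token.lower() in ("and", "or"):
            parts.append(token.lower())
        else:
            # Replace each gene identifier by its truth value.
            parts.append(str(token in found_genes))
    # The expression now contains only True/False, and/or and parentheses.
    return bool(eval(" ".join(parts), {"__builtins__": {}}, {}))


# Hypothetical rule: an enzyme complex (both subunits needed) or an isozyme.
rule = "(lp_0001 and lp_0002) or lp_0003"
print(rule_satisfied(rule, {"lp_0003"}))             # isozyme alone is enough
print(rule_satisfied(rule, {"lp_0001"}))             # one subunit is not enough
print(rule_satisfied(rule, {"lp_0001", "lp_0002"}))  # complete complex
```

In this reading, "and" encodes subunits of a complex and "or" encodes isozymes, which is why a single missing ortholog can remove a reaction from the draft even when the other associated genes are found.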

In contrast to genes, however, reactions are labeled with different identifiers in different databases. Thus, the same reaction can be stored with two different identifiers in two different databases. During the reconstruction process, reactions are added from the reference database to the draft reconstruction, and tools using different databases will generate reconstructions comprising reactions with different identifiers. We, therefore, used MetaNetX [62] to map reactions among reconstructions built with different databases. In this approach, reactions were compared using their identifiers (case-sensitive string comparison). In addition, we compared networks using reaction equations, i.e., we compared reactions using their attributes instead of their identifiers. In this second approach, we considered two reactions to be the same if they had the same metabolites with the same stoichiometric coefficients. Some exceptions were made to also match reactions that differ only in proton stoichiometry (due to differences in metabolite charge) or to catch reactions which are written in the opposite direction (reactants on the side of products). We decided to include exchange reactions in the network comparison for completeness because CarveMe and ModelSEED automatically generate them; as they are non-gene-associated reactions, this automatically lowers the scores for the other tools that do not add exchange reactions. For most networks, comparison through reaction identifiers resulted in a lower percentage of coverage than comparison through reaction equations (Additional file 1: Tables S10 and S11). This lower coverage was due to some missing relationships between different databases in MetaNetX, which we discovered when comparing the reaction equations. In total, 220 new unique reaction synonym pairs were automatically discovered for both species with the second approach (Additional file 1: Table S12). To further overcome the missing relationships in MetaNetX, a semi-automatic algorithm was developed to assist the discovery of new metabolite synonyms. In total, 187 new metabolite synonyms were discovered (Additional file 1: Table S13), which led to the discovery of 282 additional reaction synonyms (Additional file 1: Table S14).
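The equation-based comparison can be sketched as follows, assuming each reaction is a mapping from metabolite identifier to stoichiometric coefficient (negative for reactants, positive for products) and that metabolite identifiers have already been mapped to a common namespace. The function names, the labels "match", "partial" and "different", and the proton identifiers are assumptions for the illustration, not the code used in the study.

```python
def _reverse(reaction):
    """Write the same reaction in the opposite direction."""
    return {met: -coef for met, coef in reaction.items()}


def _without_protons(reaction, proton_ids):
    """Drop protons so that charge-related differences are ignored."""
    return {met: coef for met, coef in reaction.items() if met not in proton_ids}


def compare_reactions(a, b, proton_ids=("h_c", "h_e")):
    """Classify two reactions as 'match', 'partial' or 'different'.

    'match':   same metabolites and coefficients, in either direction.
    'partial': equal only after ignoring proton stoichiometry.
    proton_ids holds hypothetical identifiers for protons per compartment.
    """
    if a == b or a == _reverse(b):
        return "match"
    a_np = _without_protons(a, proton_ids)
    b_np = _without_protons(b, proton_ids)
    if a_np == b_np or a_np == _reverse(b_np):
        return "partial"
    return "different"


# Toy example: a kinase reaction written three ways.
curated = {"glc__D_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1, "h_c": 1}
reversed_draft = {"glc__D_c": 1, "atp_c": 1, "g6p_c": -1, "adp_c": -1, "h_c": -1}
no_proton_draft = {"glc__D_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1}

print(compare_reactions(curated, reversed_draft))   # match
print(compare_reactions(curated, no_proton_draft))  # partial
```

Comparing by equation rather than by identifier is what allows missing cross-references to surface: two reactions with different identifiers that nevertheless compare as a match are candidate synonyms.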

The comparison through reaction equations showed a wide variation in reaction coverage and percentage of additional reactions for all the species (Figs. 5 and 6 and Additional file 1: Figure S8). In addition, for those networks created with RAVEN (KEGG), ModelSEED, and Merlin, we observed a considerable number of reactions with a partial match with the manually curated model. These partial matches emerge from differences in proton stoichiometry, which indicates the existence of metabolites with different charge than those found in the manually curated models. In contrast to the gene sets comparison, where the coverage was as high as 88% and 83%, we only observed a maximum coverage of 72% and 58%, for L. plantarum and B. pertussis, respectively, even when considering partial matches. We classified the reactions that were not recovered in different categories (Additional file 1: Figures S3–S6) and we found that the low reaction coverage can be explained mainly by three reasons.

Overlap of reactions in draft reconstructions for Lactobacillus plantarum with those in the manually curated model. In total, 29 networks were reconstructed with 7 tools (CarveMe: C, MetaDraft: D, AuReMe: A, Pathway Tools: P, ModelSEED: S, RAVEN: R, Merlin: E). Several reconstructions, which are represented with different sub-indices, were generated for each tool using different parameter settings. Numbers inside bars represent percentages with respect to the corrected number of reactions in iLP728, which is the total number of reactions in iLP728 minus the biomass-related reactions (light green). We observed a wide variation in the coverage (blue bars) and the percentage of additional reactions (yellow bars). In addition, a considerable number of reactions in the networks built with ModelSEED, RAVEN (KEGG), and Merlin contained different stoichiometry for protons than those in iLP728 (dark green bars)

Overlap of reactions in draft reconstructions for Bordetella pertussis with those in the manually curated model. In total, 27 networks were reconstructed with 7 tools (CarveMe: C, MetaDraft: D, AuReMe: A, Pathway Tools: P, ModelSEED: S, RAVEN: R, Merlin: E). Several reconstructions, which are represented with different sub-indices, were generated for each tool using different parameter settings. Numbers inside bars represent percentages with respect to the corrected number of reactions in iBP1870, which is the total number of reactions minus the biomass-related reactions (light green). We observed a wide variation in the coverage (blue bars) and the percentage of additional reactions (yellow bars). In addition, a considerable number of reactions in the networks built with ModelSEED, RAVEN (KEGG), and Merlin contained different stoichiometry for protons than those in iBP1870 (dark green bars)

First, both manually curated models contain a considerable number of reactions without gene associations, including spontaneous, transport, and exchange reactions, reactions added during the manual gap-filling and biomass-related reactions. For L. plantarum and B. pertussis, there are 241 and 657 such reactions, representing 31% and 39% of the network, respectively. With the exception of CarveMe and ModelSEED, which can perform automatic gap-filling, the tools are not able to recover most of the non-gene-associated reactions, mainly because all the tools predict reactions based on genomic evidence. Thus, for both species, around 50% of the reactions that were not recovered do not have gene-reaction associations in the manually curated model. Without considering exchange reactions, the coverage roughly increased by 15% and 12% for L. plantarum and B. pertussis, respectively, except for CarveMe and ModelSEED. Second, for around 30% of the reactions that were not recovered, at least 50% of the associated genes are missing in the draft reconstructions. Third, even when all the genes associated with a particular reaction are recovered, specific substrate and cofactor usage is difficult to predict. Often, the tools predict the correct metabolic activity but fail to predict the specific substrate used in the manually curated models. We created a collection of plain text files containing hundreds of examples where the associated genes were recovered by the tool but the reaction does not correspond to the one in the manually curated model because of different substrates (see section availability of data for details).

We again calculated the JDr and the Rr to assess how similar the networks were, in this case in terms of reactions. The first observation we made is that, independent of the metric and for both species, each reconstruction was less similar in terms of reactions than in terms of genes, which is consistent with the decrease in coverage. In addition, as in the gene comparison, the order of scores for the Rg and the Rr by magnitude was not the same. If we compare the similarity scores for reaction sets with the ones for gene sets, we see almost the same trend but with one difference. AuReMe and MetaDraft are still the tools with the best similarity scores, but now CarveMe goes up in the list of scores and RAVEN goes down (Fig. 7, Additional file 1: Tables S4–S7). This was particularly true for B. pertussis, where two networks reconstructed with CarveMe took the first two places in the JDr list. Almost the same trend was observed for P. putida (Additional file 1: Figure S2), the main difference being the higher scores for RAVEN instead of CarveMe.

Jaccard distance versus the ratio between coverage and percentage of additional reactions for draft reconstructions. We used the Jaccard distance and the ratio to measure the similarity between the draft reconstructions and the corresponding manually curated model, in this case, when the networks are analyzed in terms of reactions. Draft reconstructions for Lactobacillus plantarum and Bordetella pertussis are represented in panels a and b, respectively. For both cases, the networks more similar to the manually curated models are located on the top left side of the plot. Thus, the draft reconstructions more similar, in terms of reactions, to the manually curated models were created by AuReMe, MetaDraft, and CarveMe

Although RAVEN generated some reconstructions with high gene set similarity to the manually curated models, it did not do so for reaction set similarity. We therefore analyzed in more detail one of the networks reconstructed with RAVEN, one that was consistently in the top 5 list for both species and both metrics. We found one main reason for the decrease in performance. The analyzed network was created based on KEGG, so metabolites were not labeled as intracellular or extracellular. Hence, no transport or exchange reactions were present. Although RAVEN has functions to incorporate this kind of reaction, using them counts as manual curation because users must specify which compounds should be transported, and here we only tested how much work it would take to transform these draft networks into high-quality reconstructions.

We further analyzed reactions that were present or absent in all the reconstructions to understand which kinds of metabolic processes they were related to. Sixty-six reactions in iLP728 and 98 in iBP1870 were found in all the draft networks. In agreement with the gene sets analysis, the associated metabolic processes are mainly amino acid metabolism, nucleotide metabolism, and carbohydrate metabolism (Additional file 1: Tables S15 and S16). Additionally, 165 reactions in iLP728 and 598 in iBP1870 were not found by any tool. In both species, around 10% of those reactions were biomass-related reactions; of the rest, most were exchange reactions, transport reactions without gene associations, and reactions in other categories that were not in the BIGG database (Additional file 1: Tables S17 and S18). Only one reaction, associated with amino acid metabolism, was found in all the draft networks of L. plantarum but not in iLP728; four reactions, associated mainly with carbohydrate metabolism, were found in all the draft networks of B. pertussis but not in iBP1870.

Comparison of metabolite sets

Other important elements within metabolic reconstructions are metabolites. When a biochemical reaction is added to the draft network during the reconstruction process, all the reactants and products are added to the network too. As the draft metabolic networks were created with different tools, each of which using its own set of databases, they had different identifiers for the same metabolite. For those networks whose identifiers were different from BIGG, we again used MetaNetX and our own additional dictionary to map metabolites.

We calculated the JDm and the Rm to assess the similarity of the metabolite sets. For almost all the draft networks in both species, the JDm values were between the JDg and the JDr; we found the same for the Rm (Additional file 1: Tables S4–S7). Again, when sorting the networks according to their metric scores, we found the same trends as for reaction sets. The first positions in the lists were occupied by networks reconstructed with MetaDraft, AuReMe, or CarveMe. Moreover, independently of the metric and the species, MetaDraft reconstructed 40% of the networks among those in the top 5.

Two hundred six metabolites in iLP728 and 271 in iBP1870 were correctly predicted in all the draft networks. These metabolites were in both cases mainly associated with carbohydrate metabolism and amino acid metabolism (Additional file 1: Tables S19 and S20). Eighty-one metabolites in iLP728 and 278 in iBP1870 were not recovered in any network. Of those, 16 were related to the biomass of L. plantarum and 16 others were not in the BIGG database. For iBP1870, 44 were biomass-related and 47 others were not in the BIGG database. Finally, 9 and 11 metabolites were recovered in all the networks but were not present in iLP728 and iBP1870, respectively. They were mainly associated with the metabolism of cofactors and vitamins and with amino acid metabolism in the case of L. plantarum, and with carbohydrate metabolism and glycan biosynthesis in the case of B. pertussis (Additional file 1: Tables S21 and S22).

Topological analysis

To compare the topological features of each network, we calculated the number of dead-end metabolites, the number of orphan reactions, the number of unconnected reactions and other metrics (Additional file 1: Tables S23 and S24).

iLP728 has 113 dead-end metabolites while iBP1870 has 59. This is consistent with the observation that many pathways are disrupted in L. plantarum, leading for example to well-known auxotrophies for many amino acids [42, 43]. With the exception of CarveMe, all the tools generated networks with a high number of dead-end metabolites, ranging from 244 to 999 and from 379 to 976 for L. plantarum and B. pertussis, respectively. The low number of dead-end metabolites in CarveMe is caused by the use of a manually curated universal model as a template, which lacks dead-end metabolites.
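As an illustration of what counting dead-end metabolites involves, the following is a minimal sketch. It assumes a common definition (a metabolite that the network can only produce or only consume) that the excerpt itself does not spell out, and it uses a hypothetical toy data layout: each reaction is a pair of a stoichiometry dictionary (negative coefficients for substrates, positive for products) and a reversibility flag.

```python
def dead_end_metabolites(reactions):
    """Return metabolites that can only be produced or only be consumed.

    reactions: dict mapping reaction id -> (stoichiometry, reversible),
    where stoichiometry maps metabolite id -> coefficient.
    """
    can_be_produced, can_be_consumed, seen = set(), set(), set()
    for stoichiometry, reversible in reactions.values():
        for metabolite, coefficient in stoichiometry.items():
            seen.add(metabolite)
            # A reversible reaction can run in either direction.
            if coefficient > 0 or reversible:
                can_be_produced.add(metabolite)
            if coefficient < 0 or reversible:
                can_be_consumed.add(metabolite)
    return sorted(seen - (can_be_produced & can_be_consumed))


# Toy network: A is exchanged with the medium, A -> B -> C, and nothing uses C.
toy_network = {
    "EX_A": ({"A": -1}, True),
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "C": 1}, False),
}

print(dead_end_metabolites(toy_network))  # ['C']
```

A template that already lacks such metabolites, as described for CarveMe, would return an empty list before any reactions are removed.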

Without considering exchange and demand/sink reactions, 127 and 449 reactions without gene associations (called orphan reactions) were found in iLP728 and iBP1870, respectively. These reactions are mainly associated with transport, amino acid metabolism, and biomass formation. MetaDraft, AuReMe, and RAVEN returned metabolic networks with no orphan reactions. These tools only include reactions with genomic evidence; reactions lacking this support are not included. ModelSEED returned networks with a low number of orphan reactions, which are related to exchange reactions. In contrast, CarveMe, Pathway Tools, and Merlin returned networks with a significantly larger number of orphan reactions (ranging from 66 to 491 in L. plantarum and from 115 to 736 in B. pertussis). For CarveMe, this is due to the inclusion of transport and spontaneous reactions as well as reactions needed to create biomass (from gap-filling); for Pathway Tools, it is because of the addition of reactions to complete probable pathways and spontaneous reactions; and for Merlin, this is solely due to spontaneous reactions.
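The orphan-reaction count described above amounts to a simple filter. The sketch below assumes a hypothetical record layout (fields "id", "genes", and "kind") that does not come from any of the tools discussed; it only shows the selection logic of keeping reactions with no gene association while skipping exchange, demand, and sink reactions.

```python
# Reaction kinds left out of the orphan count, following the text above.
EXCLUDED_KINDS = {"exchange", "demand", "sink"}


def orphan_reactions(reactions):
    """Return ids of reactions with no associated genes, ignoring excluded kinds."""
    return [
        reaction["id"]
        for reaction in reactions
        if not reaction["genes"] and reaction["kind"] not in EXCLUDED_KINDS
    ]


# Toy model with made-up identifiers.
toy_reactions = [
    {"id": "PGI", "genes": ["gene_0001"], "kind": "metabolic"},
    {"id": "GLCt", "genes": [], "kind": "transport"},
    {"id": "SPONT_1", "genes": [], "kind": "spontaneous"},
    {"id": "EX_glc", "genes": [], "kind": "exchange"},
]

print(orphan_reactions(toy_reactions))  # ['GLCt', 'SPONT_1']
```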

Genome editing in the human liver: Progress and translational considerations

Liver-targeted genome editing offers the prospect of life-long therapeutic benefit following a single treatment and is set to rapidly supplant conventional gene addition approaches. Combining progress in liver-targeted gene delivery with genome editing technology makes this not only feasible but realistically achievable in the near term. However, important challenges remain to be addressed. These include achieving therapeutic levels of editing, particularly in vivo, avoiding off-target effects on the genome, and the potential impact of pre-existing immunity to bacteria-derived nucleases, when used to improve editing rates. In this chapter, we outline the unique features of the liver that make it an attractive target for genome editing, the impact of liver biology on therapeutic efficacy, and disease-specific challenges, including whether the approach targets a cell autonomous or non-cell autonomous disease. We also discuss strategies that have been used successfully to achieve genome editing outcomes in the liver and address translational considerations as genome editing technology moves into the clinic.

Keywords: AAV; CRISPR/Cas9; Gene correction; Genetic liver disease; Genome editing; Liver; OTC deficiency; User-designed nuclease.

Modeling PIDs Using Genome Editing

In vitro Models

Even though gene targeting using engineered endonucleases is not ready to be applied in a clinical setting, it already offers a valuable tool to model diseases at the cellular level. CRISPR/Cas9 has been shown to be particularly efficient in the generation of knockout cell lines. As described above, after a DSB has been introduced into the genomic region matching the protospacer sequence of the gRNA, the cell uses either HDR or NHEJ to repair the break. NHEJ can result in indels, which in turn can cause frameshifts and the occurrence of premature stop codons. A knockout cell generated in this way can be clonally expanded into a cell line that can be used for modeling studies. In addition, a useful property of the CRISPR/Cas9 system is that multiple genes can be knocked out simultaneously if several gRNAs are used together (multiplexing). A particular advantage compared to RNA interference is that CRISPR/Cas9 can be used to target regions in the non-coding genome [e.g., promoter and enhancer regions (98)].
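How an indel can produce a premature stop codon can be shown with a small worked sketch. The open reading frame below is a made-up toy sequence, and the helper names are hypothetical; the only biological facts used are the three standard stop codons and the reading of DNA in non-overlapping triplets.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(seq):
    """Return the codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete_bases(seq, start, length):
    """Return seq with `length` bases removed, starting at 0-based `start`."""
    return seq[:start] + seq[start + length:]


# Toy ORF: ATG GCT GAA CTG ACC TAA (stop codon at codon index 5).
orf = "ATGGCTGAACTGACCTAA"

one_bp_deletion = delete_bases(orf, 4, 1)    # frameshift
three_bp_deletion = delete_bases(orf, 3, 3)  # in-frame loss of one codon

print(first_stop_codon(orf))                # 5
print(first_stop_codon(one_bp_deletion))    # 3 (premature stop: ATG GTG AAC TGA)
print(first_stop_codon(three_bp_deletion))  # 4 (frame kept, one codon shorter)
```

The one-base deletion shifts every downstream codon and exposes an early TGA, whereas the three-base deletion removes a single codon and leaves the original stop in frame. This is why indels whose length is not a multiple of three are the ones expected to knock a gene out.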

The advent of next-generation sequencing has stimulated a new wave of discovery of novel inborn errors of immunity (102). The ability to correct patient-specific iPSCs, or conversely, to introduce patient-specific mutations into a wild-type iPSC line using endonucleases represents an invaluable tool to prove the pathogenicity of newly discovered mutations and to gain insight into disease mechanisms in different cell types, depending on patients’ phenotypes. This approach also makes it possible to study the contribution of genetic background to the phenotypes arising from specific mutations by comparing patient-derived iPSCs with wild-type iPSCs into which the same mutations are introduced (Figure 3). In a recent study, iPSCs were generated from patients with Parkinson disease caused by the G2019S mutation of the LRRK2 gene and from healthy controls. When comparing the whole-genome gene expression patterns, the investigators found a high degree of heterogeneity among the different iPSC lines. However, when they used ZFNs to correct the mutation in three of the patient-derived iPSC lines and compared these lines to the original lines, and when they introduced the mutation into a healthy control line and compared this line to the original line, the lines were much more closely matched with respect to gene expression (83). This shows the importance of comparing isogenic lines, as confounding due to differences in genetic background is minimized.

Figure 3. In vitro modeling. (A) Induced pluripotent stem cells (iPSCs) are reprogrammed from a patient(s) and from a healthy control(s). The iPSCs are differentiated into a cell type of interest, and the phenotypes of the patient-derived cells are compared to the phenotypes of the healthy control cells. The cells that are compared do not have the exact same genetic background (genetically and epigenetically unmatched). This can lead to confounding. (B) Using genome editing with engineered nucleases like ZFNs, TALENs, and CRISPR/Cas9, a pathogenetic mutation can be corrected in patient-derived cells or introduced into healthy control cells, and isogenic cell lines (i.e., identical genetic background) can be compared for relevant phenotypes.

Animal Models

Traditionally, animal models have been generated using homologous recombination: embryonic stem cells (ESCs) are electroporated with a highly homologous DNA template containing the sequence to be inserted, but without using an engineered endonuclease to introduce a DSB. This approach has a very low efficiency and requires the inclusion of an antibiotic resistance gene in the inserted sequence for the selection of cells in which HDR has occurred. ESCs with the desired inserted sequence are then expanded, injected into blastocysts, and subsequently implanted in pseudopregnant females. The resulting chimeric animals have to be further bred until the introduced mutation is transmitted through the germline. With the currently available genome-editing tools, this process can be greatly streamlined. Via the introduction of a DSB at the desired target site, a specific sequence can be efficiently introduced into ESCs without the need for an antibiotic resistance gene. Furthermore, to create a gene knockout, one can simply rely on NHEJ to produce indels leading to frameshifts and early stop codons.

In recent years, many animal models have been successfully generated using ZFNs, TALENs, or CRISPR/Cas9. TALENs and CRISPR/Cas9 have been used to generate knockout Caenorhabditis elegans models by injecting the endonucleases into the gonads (103). Similarly, more complicated animal models can be generated by injecting the endonucleases in mRNA form directly into zygotes (Figure 4). In the case of CRISPR/Cas9, this means that both the Cas9 and the gRNA are injected in RNA form. Knock-in models can be generated by adding a DNA template to the injection mix, usually in the form of a single-stranded DNA oligonucleotide. Zebrafish models have been generated by injecting ZFNs, TALENs, or CRISPR/Cas9 directly into the zygote (106). This has been done in murine zygotes with ZFNs (110), TALENs (113, 114), and extremely efficiently with CRISPR/Cas9 (115). New mouse models can be generated in just a few weeks, instead of taking 1–2 years as in the conventional strategy. With CRISPR/Cas9, the specific gRNA needed for the injections can be generated in a simple one-day procedure (118). NSG mice have been efficiently generated in this way (119). In other studies, the IgM locus has been successfully knocked out in rats via the injection of ZFNs and TALENs directly into the zygotes (120, 121). Similarly, a rat model of X-SCID has been generated using ZFNs (122). The multiplexing capacity of CRISPR/Cas9 has allowed multiple genes to be knocked out simultaneously (123). Endonucleases have been used to generate knockout models in animals not previously amenable to efficient genetic modification: rabbits with IL2RG, RAG1, or RAG2 knockout (124); hamsters with STAT2 knockout (128); mutant pigs (129); and, most impressively, monkeys with RAG1 knockout (132). These kinds of animal models will enable disease studies of unprecedented sophistication.

Figure 4. Schematic representation of zygote injection with CRISPR/Cas9. (A) Injection of gRNA and Cas9 will lead to indels that can lead to a frameshift and an early stop codon thereby creating Knockout (KO) mice. (B) Addition of a highly homologous DNA template containing a specific mutation will result in Knock-in (KI) mice, through the process of homology-directed repair (HDR). Reagents are injected in the cytoplasm of the zygote. Alternatively, these can be injected in the pronucleus of the zygote, but cytoplasmic microinjection is simpler and less toxic.




During the last decade, developments in DNA sequencing technology have led to a surge in the number of eukaryotic genomes being published. The bulk of these genomes belong to animals, plants, and fungi, while single-celled eukaryotes (protists) remain largely absent [1]. This is unfortunate as protists have an enormous diversity of cellular morphologies, physiology, and genetics, possibly even more so than their multicellular relatives [2]. Although there has been a recent increase in the number of available protist genomes (e.g. [3,4,5]), some groups are still completely devoid of any genomic information [6].

One protist group for which we have no genomic information is the green algal order Dasycladales, whose species have a very characteristic cellular morphology. Despite being unicellular and having only a single nucleus, some species can grow to more than 10 cm in length [7]. Acetabularia acetabulum is the most studied species of Dasycladales. This umbrella-shaped organism is elongated in an apical-basal direction, with the root-like rhizoid at the basal end and a disc-shaped cap at the apical end, separated by a long stalk [7,8].

The size and highly elaborate cellular morphology, together with a large and distinct nucleus, made Acetabularia an attractive model system for studies of cell biology and genetics. Already in the 1930s, Joachim Hämmerling used Acetabularia to prove that cellular morphogenesis was influenced by so-called “morphogenetic substances” (later confirmed to be RNA) which were produced by the nucleus and distributed to the rest of the cell [9]. By transplanting and exchanging the apical and basal parts between A. acetabulum and A. crenulata, he observed that the cap developed into the morphology of the basal donor, demonstrating that the nucleus-containing rhizoid was in control of the morphogenetic fate of the cell [10,11].

Despite its popularity and importance in early cell biology and genetics, the interest in A. acetabulum and its sister species dropped towards the end of the 1990s. As of yet, no attempt has been reported to sequence and assemble the genome of A. acetabulum, or any other dasycladalean species. The lack of dasycladalean genomes, and protist genomes in general, can to a large extent be explained by the challenge of obtaining sufficient amounts of genomic DNA for sequencing. The A. acetabulum cell has a life cycle of 3 months when cultivated in a highly nutritious medium [12], and cultures cannot be grown densely (maximum 25 algae in 50 ml) [11]. Typical library preparation protocols for whole genome sequencing depend on several hundred nanograms of input DNA, which equates to thousands of A. acetabulum individuals for a single sequencing sample. The potentially enormous size of the A. acetabulum genome, with the diploid nuclear genome estimated to be 1.85 pg (ca. 1.8 Gb) based on flow-cytometry measurements [13], further increases the demand for DNA input.
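The conversion from a DNA mass of 1.85 pg to roughly 1.8 Gb can be checked with a short sketch. It assumes the commonly used conversion factor of about 0.978 × 10^9 base pairs per picogram of double-stranded DNA; the excerpt does not state which factor was used, and the function name is hypothetical.

```python
# Assumption: 1 pg of double-stranded DNA corresponds to about 0.978e9 bp.
BASE_PAIRS_PER_PICOGRAM = 0.978e9


def picograms_to_gigabases(dna_mass_pg):
    """Convert a DNA mass in picograms to a sequence length in gigabases."""
    return dna_mass_pg * BASE_PAIRS_PER_PICOGRAM / 1e9


genome_size_gb = picograms_to_gigabases(1.85)
print(round(genome_size_gb, 2))  # 1.81
```

With this factor the estimate comes out at about 1.81 Gb, in line with the "ca. 1.8 Gb" quoted above.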

In order to solve the challenges of limited DNA material, several methods for amplification of genomic DNA have been developed. The earliest whole genome amplification (WGA) methods were based on short-length PCR amplifications with random or degenerate primers [14,15]. These methods often recovered only small fractions of the genomes and were hugely influenced by biases introduced by PCR amplification [16,17]. The most promising development in WGA has been the use of multiple displacement amplification (MDA). This method utilizes the phi29 polymerase, which copies DNA with high fidelity and incorporates more than 70,000 nucleotides without falling off the template, resulting in large stretches of amplified DNA [16,18]. However, there are several challenges associated with the phi29-based MDA method. First, as the MDA method relies on random priming, the priming and amplification do not distinguish between target and possible contaminant DNA in the sample. De novo assemblies can therefore be challenging if databases lack target genomes or contaminant sequences [19]. Second, and again like PCR-based amplification methods, MDA is also prone to amplification bias. Observations made on bacterial genomes amplified by MDA have shown that certain genomic regions seem to be more readily amplified than others, creating highly uneven coverage across the genome [19,20,21,22]. MDA-generated data therefore rarely result in full genome recovery. López-Escardó et al. [23] used MDA to amplify the genome of three cells of the protist Monosiga brevicollis and showed highly uneven coverage and a genome recovery of 6–36% from each cell when mapping to a reference assembly, and Mangot et al. [24] recovered about 20% of the genome when assembling cells of the protist group MAST-4. However, both studies highlighted the importance of amplifying the DNA from several cells, as this greatly increased recovery [23,24].

A promising method to reduce bias associated with MDA, and thereby increase genomic coverage, is to divide the amplification reaction into nano-sized droplets, a method referred to as droplet MDA (dMDA) [25,26,27]. The idea behind dMDA is to isolate the target DNA fragments in tiny droplets and thereby reduce the competition for polymerase, leading to a more uniform amplification and overall improved genome coverage. Marcy et al. [25] tested the effect of droplet MDA by detection of 10 gene loci in 14 dMDA and 12 standard MDA reactions of E. coli cells and found that all 10 loci were present in all 14 dMDA samples, but that several loci were missing from multiple standard MDA samples. In addition, samples generated with dMDA displayed a much more uniform amplification (measured by copies of loci/µl) than the samples generated by standard MDA. Likewise, the genome recovery from sequencing E. coli cells increased from 59% with standard MDA to 89% using dMDA [28].
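Genome recovery and coverage uniformity, as discussed above, can be made concrete with a minimal sketch. It assumes that recovery means the fraction of reference positions covered by at least one read and uses the coefficient of variation of per-base depth as one possible measure of unevenness; neither definition is taken from the cited studies. The depth vectors are made-up toy data and the function names are hypothetical.

```python
def genome_recovery(depth):
    """Fraction of reference positions covered by at least one read."""
    return sum(1 for d in depth if d > 0) / len(depth)


def pooled_depth(*cells):
    """Sum per-base depth across cells sequenced against the same reference."""
    return [sum(column) for column in zip(*cells)]


def coefficient_of_variation(depth):
    """Standard deviation of depth divided by its mean (higher = less uniform)."""
    mean = sum(depth) / len(depth)
    if mean == 0:
        return float("nan")
    variance = sum((d - mean) ** 2 for d in depth) / len(depth)
    return variance ** 0.5 / mean


# Toy per-base read depth over a 10-position reference for two single cells.
cell_a = [5, 0, 0, 9, 0, 0, 2, 0, 0, 0]
cell_b = [0, 0, 4, 0, 0, 7, 1, 0, 0, 3]

print(genome_recovery(cell_a))                        # 0.3
print(genome_recovery(cell_b))                        # 0.4
print(genome_recovery(pooled_depth(cell_a, cell_b)))  # 0.6
```

In this toy case each cell covers different regions, so pooling them raises recovery, which mirrors the observation that amplifying DNA from several cells increases recovery. A more uniform amplification would show up as a lower coefficient of variation.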

The goal of the present study was to sequence and de novo assemble the genome of A. acetabulum. To obtain sufficient genomic DNA for sequencing, we amplified DNA from single embryonic cells using dMDA. We present an assessment of the sequencing data produced by single-cell dMDA and its usefulness for assembly of large eukaryotic genomes. In addition, we compare three different assembly strategies: assembling each single-cell dMDA library separately, assembling these individual assemblies into a meta-assembly, and assembling all the sequencing libraries combined (co-assembly). This study is among the first to use single-cell dMDA for sequencing and de novo assembly of a eukaryote genome and should serve as a useful reference for future attempts to sequence species that are difficult to cultivate or are collected from the environment.

Gene Editing Now and in the Future

Over the last century, the world has seen awe-inspiring and thought-provoking scientific and technological advances across many sectors of the global economy. One of these advances is CRISPR, a medical technology poised to change the world. Maybe not immediately, but in a few years, CRISPR could be commonplace.

CRISPR is an acronym for Clustered Regularly Interspaced Short Palindromic Repeats and is commonly used as shorthand for CRISPR-Cas9, a revolutionary technology that can be used to alter or modify the genes of an organism. The Cas9 protein is an enzyme that functions as a pair of molecular scissors, able to cut strands of DNA.

Scientists use CRISPR to locate a specific stretch of DNA inside a cell. It helps researchers change DNA sequences easily and alter gene function. The technology has great potential and is being applied for many purposes, including correcting gene defects, treating infectious diseases and curbing their spread, and improving the development of organisms.

Before the technology was discovered and named by Francisco Mojica, a Spanish microbiologist, scientists already had ways of modifying the genes of plants and animals and the functions of those genes. However, these methods took many years and required large amounts of money. With the arrival of CRISPR, genome editing became much cheaper and easier. CRISPR is now widely used in scientific research across the globe, and sooner rather than later it may touch practically everything we see, from farm animals to garden plants.

Figure: CRISPR-Cas9. By National Human Genome Research Institute (NHGRI), Bethesda, MD, USA. CRISPR-Cas9 Editing of the Genome, CC BY 2.0.


CRISPR is a powerful tool widely used in the laboratory by many scientists. Some common and unusual applications of this tool that are poised to change different industries include:

Gene Editing

This is the most popular application of CRISPR and, more or less, its primary use. Gene editing in medicine has taken the spotlight as CRISPR technology redefines the concept of genome editing, and adoption in the medical industry is accelerating rapidly. Besides helping researchers study the causes of human diseases and discover novel ways to treat them, CRISPR-based gene editing has tremendous potential to detect, cure, and prevent illnesses.

Green Fuels

In some parts of the globe, CRISPR is now being applied outside medical research and the treatment of diseases. Gene editing could facilitate the production of greener alternative fuels (biofuels) by algae. Without modification, algae produce levels of fat too low to support the economical mass production of biodiesel. With the gene-editing tool, however, algae can be modified to produce the amount of fat needed to make enough biodiesel. In the near future, oil companies could be producing about 25,000 algae alternative fuels daily.

Pet Breeding

CRISPR technology could become one of the innovations most sought after by pet owners looking to improve the quality of life of their animals. For example, gene editing tools are considered a promising way to eradicate genetic diseases that are typically seen in purebred dogs. Dalmatians, a breed of distinctively spotted dogs, generally carry a genetic mutation that makes them vulnerable to kidney stones. CRISPR technology could be used to edit their genes and reduce the percentage of Dalmatians with kidney stones.

Allergy-Free Food

Many individuals avoid certain foods not because they dislike them, but because of the allergic reactions they get from eating or touching them. Food allergies affect a considerable percentage of the world's population and can be life-threatening. With CRISPR technology, however, allergy-free foods may become possible. For example, a research group in the Netherlands is using CRISPR to modify the DNA of wheat to remove the gluten it contains. This would enable gluten-sensitive individuals, such as people with celiac disease, to eat wheat.

Pest Control

Pests can be a serious problem in some ecosystems. They can spread infectious diseases in plants and animals or invade a particular population. Researchers working with CRISPR have found ways to modify the genes of invasive pests in order to control them. For example, malaria-carrying mosquitoes can be curbed with a gene modification that prevents a carrier from laying eggs. This would drastically reduce the spread of the parasite.


Isn't it mind-boggling to think that extinct animals could be brought back with CRISPR technology? Some animals we have only seen in books and documentaries, because they lived and went extinct millions of years ago. CRISPR has given scientists a possible route to reviving them: they seek to introduce the genes of an extinct animal into a living relative, which is then bred for generations until the offspring's DNA matches that of the extinct species.

Therapeutic Applications

Several pharmaceutical companies, both start-ups and well-established firms, are racing to adopt CRISPR technology and create CRISPR-based therapeutics, including tissue regeneration, cancer immunotherapy, and gene therapy. Compared with many earlier gene modification tools, CRISPR offers a far less expensive, faster, and potentially safer option for gene editing. It is highly promising for treating diseases and creating disease-resistant genes.



As scientists continue their research and explore the possibilities of CRISPR, and as criticism from medical communities lessens, it is safe to say that the technology already occupies a significant place in the future of genome editing. CRISPR has the potential to transform our understanding of the biological mechanisms behind infectious diseases and gene defects, giving rise to new theories and helpful therapies that will advance the life sciences in general.

Genome-wide identification, classification, evolutionary analysis and gene expression patterns of the protein kinase gene family in wheat and Aegilops tauschii

In this study we systematically identified and classified PKs in Triticum aestivum, Triticum urartu and Aegilops tauschii. Domain distribution and exon-intron structure analyses of PKs were performed, and we found conserved exon-intron structures within the exon phases in the kinase domain. Collinearity events were determined, and we identified various T. aestivum PKs from polyploidizations and tandem duplication events. Global expression pattern analysis of T. aestivum PKs revealed that some PKs might participate in the signaling pathways of stress response and developmental processes. qRT-PCR of 15 selected PKs was performed under drought treatment and under infection with Fusarium graminearum to validate the microarray predictions.

The protein kinase (PK) gene superfamily is one of the largest families in plants and participates in various plant processes, including growth, development, and stress response. To better understand wheat PKs, we conducted genome-wide identification, classification, evolutionary analysis and expression profiling of wheat and Ae. tauschii PKs. We identified 3269, 1213 and 1448 typical PK genes in T. aestivum, T. urartu and Ae. tauschii, respectively, and classified them into major groups and subfamilies. Domain distributions and gene structures were analyzed and visualized. Some conserved intron-exon structures within the conserved kinase domain were found in T. aestivum, T. urartu and Ae. tauschii, as well as in the primitive land plants Selaginella moellendorffii and Physcomitrella patens, revealing the important roles and conserved evolutionary history of these PKs. We analyzed the collinearity events of T. aestivum PKs and identified PKs from polyploidizations and tandem duplication events. Global expression pattern analysis of T. aestivum PKs revealed tissue-specific and stress-specific expression profiles, hinting that some wheat PKs may regulate abiotic and biotic stress response signaling pathways. qRT-PCR of 15 selected PKs was performed under drought treatment and under infection with F. graminearum to validate the microarray predictions. Our results provide foundational information for further studies on the molecular functions of wheat PKs.

Keywords: Collinearity events, Conserved exon–intron structures, Expression pattern, Wheat protein kinase family.
