Publicly available resources for learning metagenomics


We are starting a metagenomics project in our research group to study the microbiota of the respiratory tract. Since there are no books about metagenomics yet, reading reviews and online tutorials seems to be the only alternative.

Could you suggest essential papers, websites, or other resources that would be useful for this purpose?


The Coursera Bioinformatic Methods courses include classes on metagenomics and, more importantly, tools to use when applying these methods. I have not taken this particular course, but Coursera is generally very good with getting hands-on experience using computational tools. A course-like setting may also be conducive to group learning.

The paper here compared many tools for the different parts of a metagenomics analysis. It is a comprehensive study in which several labs participated.

Keep in mind that the most popular tools are not necessarily the best. In the end, the choice will depend on what you want to do and how you do it.

Frontiers in Genetics



Catalogs of data portals and aggregators

While you can find separate portals that collect datasets on various topics, there are large dataset aggregators and catalogs that mainly do two things:

1. Provide links to other specific data portals. Examples of such catalogs are DataPortals and OpenDataSoft, described below. These services don’t provide direct access to data. Instead, they let users browse existing data portals on a map and then use those portals to drill down to the desired datasets.

2. Aggregate datasets from various providers. This allows users to find health, population, energy, education, and many more datasets from open providers in one convenient place.

Let’s have a look at the most popular representatives of this group.

DataPortals: meta-database with 524 data portals

This website’s domain name says it all. DataPortals has links to 588 data portals around the globe.

Data sources are listed alphabetically based on a city or region. Each portal is briefly described with tags (level regional/local, national, EU-official, Berlin, OSM, finance, etc.)

Users can contribute to the meta-database, whether a contribution entails adding a new feature and data portal, reporting a bug on GitHub, or joining the project team as an editor.

OpenDataSoft: a map with more than 2600 data portals

The open data portals register by OpenDataSoft is impressive – the company team has gathered more than 2600 of them. The homepage contains a zoomable interactive map, allowing users to search for data from organizations located in a region of interest.

You can also visit this page to browse sources in the listing, which are grouped by countries, dataset issuers, dataset names, themes, or typology (public sector or national level).

OpenDataSoft provides data management services by building data portals. With its platform, clients publish, maintain, process, and analyze their data.

Those who want to add their portal to the registry need to submit a form.

Knoema: home to nearly 3.2-billion time series data of 1040 topics from more than 1200 sources

This search engine was specifically designed for numeric data with limited metadata, the type of data specialists need for their machine learning projects. According to its representatives, Knoema has the biggest collection of publicly available data and statistics on the web. Users have access to nearly 3.2 billion time series on 1040 topics obtained from more than 1200 sources, and the information is updated daily.

Knoema offers several efficient data exploration options:

  • a search panel on the homepage,
  • the World Data Atlas with datasets clustered by countries, sources, indicators, as well as other data like commodities’ value change or county groups, and
  • the Data Bulletin section with the latest releases of new datasets and updates of existing sources.

Datasets are also listed in alphabetical order.

Data exploration options on Knoema

Data scientists can study data online in tables and charts, download it as a CSV or Excel file, or export it as a visualization. Knoema users can also access data via an API. Supported languages are Python, C#, and R; the JSON format and SDMX (the standard for exchanging statistical data and metadata) are also supported.

However, export isn’t free: it is available only to users with professional or enterprise plans.


Electronic supplementary material is available online at

Published by the Royal Society under the terms of the Creative Commons Attribution License, which permits unrestricted use, provided the original author and source are credited.




A total of 4791 microarrays were grouped into eight tumour entities (four solid tumours with a total of 2833 arrays and four haemic tumours with a total of 1958 arrays). The minimum sample size is 177 arrays for probes from CLL patients; the maximum sample size is 1834 arrays for breast cancer tissue (see Table 2). The phenotype information on the individual tumour probes is very sparse and is not considered in the following analysis.

Figure 2 shows the SHD for all six combinations of solid tumours (red triangles), all six combinations of haemic tumours (black triangles), and for all 16 haemic-solid combinations (blue triangles) when conditional independence graphs are estimated for each entity and compared by SHD.

SHD in single pathways for comparisons within solid tumours (black), haemic tumours (red) and between group comparisons (blue).

There is no obvious evidence in any pathway that the SHD for a between-group (haemic/solid) comparison is larger than the SHD for a within-group (haemic/haemic or solid/solid) comparison.
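As a rough illustration of the distance being compared here, the sketch below computes a structural Hamming distance between two undirected graphs. It assumes that the SHD counts the edges present in exactly one of the two conditional independence graphs; the function names and the two toy graphs on four hypothetical genes are invented for this example and are not taken from the study.

```python
from itertools import combinations


def edge_set(adjacency):
    """Return the undirected edges of a symmetric 0/1 adjacency matrix as (i, j) pairs with i < j."""
    n = len(adjacency)
    return {(i, j) for i, j in combinations(range(n), 2) if adjacency[i][j]}


def structural_hamming_distance(adj_a, adj_b):
    """Count the edges that are present in exactly one of the two graphs."""
    if len(adj_a) != len(adj_b):
        raise ValueError("both graphs must be defined on the same set of genes")
    return len(edge_set(adj_a) ^ edge_set(adj_b))


# Toy graphs on four hypothetical genes (illustrative values only).
graph_breast = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
graph_colon = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

shd = structural_hamming_distance(graph_breast, graph_colon)
print(shd)  # 2: edge (0, 2) is only in the first graph, edge (1, 2) only in the second
```

A distance of zero means the two estimated graphs have identical edge sets; larger values mean more edges differ between the two entities.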

The comparison within solid tumours can be summarized as follows:

  • The breast-colon comparison (# of arrays: 1834/197) is only distinct for the Wnt signalling pathway (04310).
  • The breast-lung comparison (# of arrays: 1834/386) shows a pronounced difference for most pathways, the exceptions being the AML pathway (05221) and the Mismatch repair pathway (03430).
  • The breast-prostate comparison (# of arrays: 1834/416) shows marginal or non-significant differences for the p53 signalling pathway (04115), the ECM-receptor interaction pathway (04512), the AML pathway (05221), the Non-small cell lung cancer pathway (05223), and the Mismatch repair pathway (03430).
  • The colon-lung comparison (# of arrays: 197/386) shows marginal or non-significant differences for the ECM-receptor interaction pathway (04512), the AML pathway (05221), and the Non-small cell lung cancer pathway (05223).
  • The colon-prostate comparison (# of arrays: 197/416) shows marginal or non-significant differences for the p53 signalling pathway (04115), Apoptosis (04210), the ECM-receptor interaction pathway (04512), the Prostate cancer pathway (05215), the AML pathway (05221), the Non-small cell lung cancer pathway (05223), and the Mismatch repair pathway (03430).
  • The lung-prostate comparison (# of arrays: 386/416) shows marginal or non-significant differences for the ECM-receptor interaction pathway (04512), the Non-small cell lung cancer pathway (05223), and the Mismatch repair pathway (03430).

The comparison within haemic tumours can be summarized as follows:

  • The ALL-AML comparison (# of arrays: 916/534) shows a distinct conditional correlation structure for each pathway.
  • The ALL-CLL comparison (# of arrays: 916/177) shows marginal or non-significant differences for all pathways except Cell cycle (04110).
  • The ALL-LYM comparison (# of arrays: 916/331) shows marginal or non-significant differences for p53 signalling (04115), ECM-receptor interaction (04512), and Non-small cell lung cancer (05223).
  • The AML-CLL comparison shows marginal or non-significant differences for ECM-receptor interaction (04512), Prostate cancer (05215), AML (05221), Non-small cell lung cancer (05223), and Mismatch repair (03430).
  • The AML-LYM comparison (# of arrays: 534/331) shows only the Mismatch repair pathway (03430) as marginally significant.
  • The CLL-LYM comparison (# of arrays: 177/331) shows marginal or non-significant differences for p53 signalling (04115), ECM-receptor interaction (04512), Colon cancer (05210), Prostate cancer (05215), AML (05221), Non-small cell lung cancer (05223), and Mismatch repair (03430).

Table 6 in the Appendix presents the SHD and P-values for the between groups comparisons. They result in distinctive conditional correlation structures in all pathways for most of the pairs. More than two marginal or non-significant P-values are found in the COL-CLL, COL-LYM, LUN-ALL comparisons (see Table 4 ). No clear evidence for a difference in the COL-CLL comparison is found for p53 signalling (04115), Apoptosis (04210), ECM-receptor interaction (04512), Prostate cancer (05215), AML (05221), Non-small cell lung cancer (05223), mTOR signalling (04150), Base excision repair (03410), Nucleotide excision repair (03420), and Mismatch repair (03430) pathway. No clear evidence for a difference in the COL-LYM comparison is found for p53 signalling (04115), ECM-receptor interaction (04512), Prostate cancer (05215), AML (05221), Non-small cell lung cancer (05223), mTOR signalling (04150), and Mismatch repair (03430). Finally, no clear evidence for a difference in the COL-CLL comparison is found for p53 signalling (04115), Apoptosis (04210), ECM-receptor interaction (04512), Prostate cancer (05215), AML (05221), Non-small cell lung cancer (05223), mTOR signalling (04150), Base excision repair (03410), Nucleotide excision repair (03420), and mismatch repair (03430) pathway.
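The P-values reported for these comparisons are described elsewhere in the text as permutation P-values. The sketch below shows the general shape of such a test under stated assumptions: the samples of two groups are pooled, the group labels are shuffled, and the statistic is recomputed on each shuffle. The function `permutation_p_value`, its parameters, and the stand-in statistic `mean_gap` are hypothetical; in the setting of this analysis the statistic would be the SHD between graphs re-estimated on each permuted split, which is not reproduced here.

```python
import random


def permutation_p_value(group_a, group_b, statistic, n_permutations=999, seed=0):
    """Estimate how often a random relabelling gives a statistic at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of samples to the two groups
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


def mean_gap(a, b):
    """Stand-in statistic (absolute difference of means); an assumption, not the SHD."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Two identical groups: every permutation is at least as extreme as the observed value.
p_same = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], mean_gap)

# Two clearly separated groups: almost no permutation reaches the observed gap.
a_far = [float(i) for i in range(10)]
b_far = [float(i) + 100.0 for i in range(10)]
p_far = permutation_p_value(a_far, b_far, mean_gap)

print(p_same, p_far)
```

Under this scheme a large P-value (for example above 0.1) means that the observed difference is no larger than what random relabelling typically produces, which is how "no evidence for a difference" is read in the text.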

Table 4.

Number of pathways with no evidence for difference in conditional correlation structure.

Solid tumors | Haemic tumors | Mixed comparison

We use the number of pathways with no evidence for differential conditional correlation structure as a measure for similarity between tumor entities. Table 4 and Figure 3 summarize the situation. Table 5 lists the number of comparisons between and within groups with a permutation P-value above 0.1. The highest ranked pathways with respect to no evidence for a difference are Mismatch repair (03430), Non-small cell lung cancer (05223), AML (05221), ECM-receptor interaction (04512), and p53 signalling (04115) pathway. The pathways Cell cycle (04110) and Wnt signalling (04310) show in all except one comparison a significant difference in conditional correlation structure.

Similarity between tumours in terms of pathways with no evidence for a difference in conditional correlation structure.

Table 5.

Number of comparisons with no evidence (p ≥ 0.1) for a difference in conditional correlation structure (per pathway).

Pathway | KEGG ID | Total (28 comparisons) | Solid tumors (6 comparisons) | Haemic tumors (6 comparisons) | Mixed tumors (16 comparisons)
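The comparison counts in the header of Table 5 follow from pairing eight entities, four solid and four haemic. As a small check, the sketch below enumerates all pairs and sorts them into within-solid, within-haemic, and mixed comparisons. The entity labels are taken from the text; the function name `classify_pairs` is made up for this illustration.

```python
from itertools import combinations

SOLID = ["BREAST", "COLON", "LUNG", "PROSTATE"]
HAEMIC = ["ALL", "AML", "CLL", "LYM"]


def classify_pairs(solid, haemic):
    """Group every unordered pair of entities by whether it is within or between tumour groups."""
    groups = {"solid": [], "haemic": [], "mixed": []}
    solid_set = set(solid)
    for first, second in combinations(solid + haemic, 2):
        membership = (first in solid_set, second in solid_set)
        if all(membership):
            key = "solid"
        elif not any(membership):
            key = "haemic"
        else:
            key = "mixed"
        groups[key].append((first, second))
    return groups


pairs = classify_pairs(SOLID, HAEMIC)
counts = {name: len(members) for name, members in pairs.items()}
print(counts)                 # {'solid': 6, 'haemic': 6, 'mixed': 16}
print(sum(counts.values()))   # 28
```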

The major similarities between the entity pairs are visualized in Figure 3. Every node represents an entity, and the width of each edge is the number of pathways with no evidence for a difference. Similar pathway structure can be found between ALL-CLL, BREAST-COLON, and COLON-CLL. In the haemic tumour entities there is a noticeable similarity between lymphatic tumours (ALL, CLL, Lymphoma). In the solid tumour entities there is a similarity between tumours (breast, colon, prostate) arising in gland tissues.


Use our subscription databases to find citations and full text articles. To get started, just click on the database that you would like to search. If you're outside the library, you may need to enter your library card number and PIN in order to access the database.


Academic OneFile

Find peer-reviewed, full-text articles from journals in the areas of the physical and social sciences, technology, medicine, engineering, the arts, literature, and more. More than 18,000 peer-reviewed journals and more than 9,200 in full text.

Available To: All Free Library locations and online with your library card.

  • Art
  • Homework Help & Study Aids
  • Literature
  • Medical
  • Newspapers, Magazines, & Journals
  • General Research
  • Science
  • Social Science
  • Technology
  • Adults
  • Teens

Academic Search Main Edition

Provides full-text and peer-reviewed journals essential for undergraduate & graduate studies. Subjects include biology, chemistry, engineering, physics, psychology, & religion from over 1,800 full-text and more than 1,200 full-text peer-reviewed journals.

Available To: All Free Library locations and online with your library card.

Access PA

An access portal to a suite of resources provided by the Commonwealth of Pennsylvania.

Available To: All Free Library locations and online with your library card.

Access World News

Explore events and issues at the local, national, and international levels from over 9,000 news sources. Includes the Philadelphia Tribune, Philadelphia Magazine, and 30 other Philadelphia-area news sources. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

American Broadsides and Ephemera

Rare printed documents that illuminate the history and culture of earlier Americans. Find over 30,000 searchable images of printed items including confessions, menus, playbills, music programs, and more from 1760-1900. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

American Song

Listen to over 50,000 recordings from America's past: songs by and about American Indians, miners, immigrants, slaves, children, pioneers, and cowboys; songs of Civil Rights, Prohibition, the Civil War, and more.

Available To: All Free Library locations and online with your library card.

American State Papers, 1789-1838

The essential record of the first decades of the United States. Browse legislative and executive documents of the first 14 U.S. Congresses and find information on the major events that shaped the Early Republic. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

America's Historical Imprints

Explore centuries of history through books, pamphlets, & other material on culture and daily life. Includes: American Broadsides and Ephemera and Early American Imprints, Series I: Evans & Series II: Shaw-Shoemaker. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

America's Historical Newspapers (formerly Early American Newspapers, Series I 1690-1876)

Search early American newspapers from all 50 states published across three centuries. Find cover-to-cover reproductions of over 750 fully searchable newspapers from 1690 to the recent past. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

America's Obituaries & Death Notices

A collection of newspaper obituaries and death notices from around the country.

Available To: All Free Library locations and online with your library card.

AP Images (formerly AccuNet/AP Multimedia Archive)

Current photos & a selection of pictures/images from the 50 million print & negative library of photographs at the Associated Press. Search for a picture/image of your favorite celebrity.

Available To: All Free Library locations and online with your library card.

Archive of Americana

The comprehensive historical collections that form the Archive of Americana contain books, pamphlets, newspapers, government documents and ephemera.

Available To: All Free Library locations and online with your library card.


Archive-It

Archive-It is a web archiving service that collects and stores the websites of Philadelphia government agencies and selected cultural institutions. You can search or browse the collection to view these sites as they existed in months or years past.

Available To: All Free Library locations and online with your library card.

Art Full Text

A comprehensive resource for art information featuring full-text articles dating back to 1995, high-quality indexing and abstracting dating as far back as 1984 as well as abstracting of over 13,000 art dissertations.

Available To: All Free Library locations and online with your library card.

Art Index Retrospective 1929-1984

Bibliographic database that cumulates citations to Art Index volumes 1-32 of the printed index, published between 1929 and 1984.

Available To: All Free Library locations and online with your library card.


Extensive information about over 120,000 artists from around the world.

Available To: Parkway Central Library only. Please visit the Art Department for access.

Audiobooks from Overdrive

Download audiobooks from Overdrive to your PC, Mac, or mobile device. Get the Libby app to borrow on the go. Watch this short video to learn more

Available To: All Free Library locations and online with your library card.

Biography in Context

Comprehensive online biographical reference database containing 414,000 biographies.

Available To: All Free Library locations and online with your library card.

Books & Authors

Books & Authors answers the question: What Do I Read Next? Explore authors, genres, books, and topics to discover books that match your interests. Search over 240,000 fiction and non-fiction librarian-recommended titles.

Available To: All Free Library locations and online with your library card.

  • Biography and Autobiography
  • Body, Mind and Spirit
  • Business and Economics
  • Fiction
  • Literature
  • Science
  • Summer Reading
  • Adults
  • Teens
  • Children
  • New Americans
  • Entrepreneurs
  • Romance
  • Seniors

Business Insights Essentials

Find full-text articles and statistical data about companies and industries for business owners, marketing professionals, and investors. Analyze and compare financial and statistical data with interactive charting tools. 8,900+ full-text journals.

Available To: All Free Library locations and online with your library card.

Business Plans Handbook Online

Search sample business plans from a wide range of businesses

Available To: All Free Library locations and online with your library card.

Career Cruising

Explore careers, discover your dream job, polish your skills and more

Available To: All Free Library locations and online with your library card.

Classical Music Library

A comprehensive database of distinguished classical recordings offered in streaming format. The Classical Music Library also includes anthology playlists linked to the major music history and appreciation textbooks.

Available To: All Free Library locations and online with your library card.


Learn to code for free at Codecademy.

Available To: Unrestricted, free web resource

Contemporary Authors

A bio-bibliographical guide to current writers in fiction, general nonfiction, poetry, journalism, drama, motion pictures, television, and other fields.

Available To: All Free Library locations and online with your library card.

Contemporary World Music

This resource includes contemporary and traditional world music including tracks of reggae, worldbeat, neo-traditional, world fusion, Balkanic jazz, African film, Bollywood, Arab swing and jazz, and other genres.

Available To: All Free Library locations and online with your library card.


A collection of ebooks and resources that teaches kids and teens how they can stay safe on the web.

Available To: All Free Library locations and online with your library card.

D&B Global Business Browser (formerly OneSource: D&B Business Browser Pro)

Find company information and market research, including RMA and Datamonitor reports, in-depth company and executive profiles, and detailed financial information. Great for research on international and multinational companies.

Available To: All Free Library locations and online with your library card.

Dictionary of Literary Biography

Biographical and critical essays on the lives, works, and careers of the world's most influential literary figures. Includes more than 16,000 articles and thousands of images.

Available To: All Free Library locations and online with your library card.


Learn Spanish, English, French, German, Portuguese, or Italian

Available To: Unrestricted, free web resource

Early American Imprints, Series I: Evans (1639-1800)

Based on the renowned American Bibliography by Charles Evans. The definitive resource for every aspect of life in 17th- and 18th-century America. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

Early American Imprints, Series II: Shaw-Shoemaker (1801-1819)

Covering every aspect of American life during the early decades of the United States, this rich primary source collection provides full-text access to the 36,000 books, pamphlets and broadsides published from 1801 to 1819. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

EBooks from Overdrive

Available in a variety of formats, here you will find the Library's Overdrive ebook collection. Get the Libby app to borrow on the go. Watch this short video to learn more

Available To: All Free Library locations and online with your library card.

EBooks on EBSCOhost (formerly NetLibrary)

Approximately 17,000 electronic books in all subject areas from careers to classics to Cliffs notes.

Available To: All Free Library locations and online with your library card.

ERIC (Educational Resource Information Center)

Index to education related materials

Available To: All Free Library locations and online with your library card.

Expanded Academic ASAP

Academic periodical database

Available To: All Free Library locations and online with your library card.


Downloadable ebooks for Free Library of Philadelphia library card holders.

Available To: All Free Library locations and online with your library card.

Free Library of Philadelphia Online Public Access Catalog

Catalog of The Free Library of Philadelphia's Collection

Available To: All Free Library locations and online with your library card.

Free Library of Philadelphia's Digital Library Collections

Historical images in digital format.

Available To: All Free Library locations and online with your library card.

Free Library Podcasts

The Free Library Podcast is an easy way to participate in the author events and lectures that take place at the Central Library. Visit Author Events for upcoming events and information.

Available To: Unrestricted, free web resource

Gale eBooks (formerly Gale Virtual Reference Library)

Explore over 500 nonfiction, fully searchable reference titles. Access accurate information on science, literature, history, business, and more. Watch this video to learn more

Available To: All Free Library locations and online with your library card.

Gale Health and Wellness (formerly Health and Wellness Resource Center)

Find accurate, up-to-date information on a full range of health topics, from current diseases and disorders to in-depth coverage of alternative medicine practices. Access full-text medical journals, magazines, reference works, multimedia, and more.

Available To: All Free Library locations and online with your library card.

Gale in Context | Middle School (formerly Research in Context)

Gale in Context | Middle School provides support for middle grade students working on papers and projects. Includes full-text magazine & news articles, multimedia, biographies, and primary sources.

Available To: All Free Library locations and online with your library card.

  • Biography and Autobiography
  • Government Information
  • Homework Help & Study Aids
  • Literature
  • Newspapers, Magazines, & Journals
  • General Research
  • Science
  • Social Science
  • Teens
  • Children
  • History

Gale in Context | Opposing Viewpoints (formerly Opposing Viewpoints in Context)

Examine the many sides of today's hottest social issues. Explore pro/con viewpoints, articles, and infographics. Get to know this valuable resource with a video tutorial.

Available To: All Free Library locations and online with your library card.

Gale Literary Sources (formerly Artemis)

Cross-search all of Gale’s literature databases from a single digital space. Includes Dictionary of Literary Biography, Something About the Author, LitFinder, Literature Criticism Online, and Literature Resource Center.

Available To: All Free Library locations and online with your library card.

Gale OneFile | High School Edition (formerly InfoTrac Student Edition)

Content for high school students from magazines, journals, newspapers, reference books, and rich media (images, audio, video) covering a range of subjects, from science, history, and literature to political science, sports, and environmental studies.

Available To: All Free Library locations and online with your library card.

  • Environment and Nature
  • Homework Help & Study Aids
  • Literature
  • Newspapers, Magazines, & Journals
  • Politics
  • General Research
  • Science
  • Sports & Recreation
  • Adults
  • Teens
  • History

GCF LearnFree

Easy-to-follow tutorials on Microsoft Excel, Word, and Office, as well as Internet Basics, social media, and more. GCF LearnFree also offers free tutorials in math and reading.

Available To: Unrestricted, free web resources.

General OneFile

A broad collection of news articles, magazines, reference books, images, audio, and video that support general interest research and exploration. Offering more than 8,800 full-text titles, with millions of articles available.

Available To: All Free Library locations and online with your library card.

GREENR - Global Reference on the Environment, Energy, and Natural Resources

GREENR is an interdisciplinary resource for environment and sustainability studies and provides news, video, primary source documents, and statistics on energy systems, healthcare, food, climate change, population, economic development, and more.

Available To: All Free Library locations and online with your library card.

Gun Regulation and Legislation in America

This HeinOnline collection brings together more than 550 periodicals, legislative histories, and government documents on gun regulation. It includes an extensive bibliography and a balanced selection of external resources for further research.

Available To: All Free Library locations and online with your library card.

Historical Newspapers - Black Newspapers

Primary source material from ten historic Black newspapers, including the Chicago Defender, The Baltimore Afro-American, New York Amsterdam News, Pittsburgh Courier, Los Angeles Sentinel, Atlanta Daily World, and the Cleveland Call and Post

Available To: All Free Library locations and online with your library card.

HistoryMakers Digital Archive

Explore the nation's largest African American video oral history collection. Access high-quality primary source content, with fully searchable transcripts, from thousands of people from a broad range of backgrounds and experiences.

Available To: All Free Library locations and online with your library card.

Homework Help Online

Live, online tutoring every day from 10 a.m. - midnight for K-12, college, and adults. (Spanish 2 p.m. - 2 a.m.) Upload a writing sample and get feedback. Watch this video tutorial

Available To: All Free Library locations and online with your library card.


Stream movies, TV, music, and audiobooks directly to your computer or mobile device. These tutorials will show you how to get the most out of hoopla

Available To: All Free Library locations and online with your library card.

Informe Academico

Informe Académico proporciona acceso a periódicos y revistas especializadas de lengua española y portuguesa. La base de datos ofrece una amplia gama de contenidos sobre América Latina. (Full-text scholarly journals and magazines in Spanish & Portuguese.)

Available To: All Free Library locations and online with your library card.

Infotrac Newsstand (formerly National Newspaper Index)

Provides access to full-text newspapers and allows users to search articles instantly by title, headline, date, newspaper section, or other fields. The database offers a one-stop source for current news and a searchable archive.

Available To: All Free Library locations and online with your library card.


Input was a local Philadelphia panel discussion program, airing Sunday mornings from 1968 through early 1971

Available To: All Free Library locations and online with your library card.

Jazz Music Library

Jazz Music Library will be the largest and most comprehensive collection of jazz available online.

Available To: All Free Library locations and online with your library card.


Digital archive of back issues of over 300 journals. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.


Instant access to thousands of critically acclaimed movies, documentaries, and kids favorites. Stream up to four movies per month with your library card. Create an account to get started

Available To: All Free Library agencies and Internet with authentication.

Khan Academy

Free for students young and old, teachers, coaches, home schoolers, and you, Khan Academy is online education for the world

Available To: Unrestricted, free web resource

LearningExpress Library

Practice tests and test preparation study materials for academic and professional tests, including the SAT and GRE. Watch our video tutorial.

Available To: All Free Library locations and online with your library card.

LearningExpress Library - Recursos para Hispanohablantes

A resource for academic and professional development that offers skill building in math, science, and reading/writing for school-age students and adults, as well as practice tests for the new GED, SAT, NCLEX-RN, and more.

Available To: All Free Library locations and online with your library card.

LearningExpress Library Career Center (formerly Job & Career Help Center)

This LearningExpress Library module offers sections on business writing, the job search, skills for success, and skills for successful interviews.

Available To: All Free Library locations and online with your library card.

LinkedIn Learning (formerly

Learn business, creative, and technology skills to achieve your personal and professional goals.

Available To: All Free Library locations and online with your library card.

Literature Criticism Online

This extensive compilation of literary commentary represents a range of modern and historical views on authors and their works across regions, eras, and genres. Covers Children’s, Classical, Contemporary, Drama, Poetry, Shakespearean, and more.

Available To: All Free Library locations and online with your library card.

Literature Resource Center

Access to biographies, bibliographies and critical analysis of authors from every age and literary discipline

Available To: All Free Library locations and online with your library card.


Find the sonnets of Shakespeare, the poetry of Angelou, presidential speeches, and short stories by Poe. Discover full-text literature from more than 150,000 poems, 7,100 short stories & novels, 3,800 essays, 2,400 speeches, and 1,250 plays.

Available To: All Free Library locations and online with your library card.

Magazines on OverDrive (formerly RBdigital Magazines)

Read and download over 3,000 digital magazines on OverDrive or the Libby app.

Available To: All Free Library locations and online with your library card.

Mango Languages

An online language-learning system for a variety of languages.

Available To: All Free Library locations and online with your library card.


Practice materials for the ASVAB, SAT, and ACT exams. Perfect for students considering a military career.

Available To: All Free Library locations and online with your library card.


Learn about the latest treatments, drugs, or supplements; define medical words; view videos; get the latest research; or find out about clinical trials.

Available To: All Free Library locations and online with your library card.

Music Online

Find and listen to music from a variety of genres

Available To: All Free Library locations and online with your library card.

New York Times Anywhere

Enjoy an unlimited number of free 72-hour passes to, including historical coverage and international Spanish and Chinese editions. Crossword access is not included. Registration at is required.

Available To: Free Library card holders

New York Times at the Library

Enjoy free access to from all Free Library locations, including historical coverage and international Spanish and Chinese editions. Crossword access is not included. Registration at is required.

Available To: All Free Library locations

NewsBank Hot Topics

Trending news and topics for your next assignment! Covers current events, business & economics, civics, government, & politics, social issues, science, technology & health, sports, arts & literature, and people in the news. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

  • Art
  • Business and Economics
  • Health and Fitness
  • Homework Help & Study Aids
  • Literature
  • Newspapers, Magazines, & Journals
  • Politics
  • General Research
  • Sports & Recreation
  • Technology
  • Adults
  • Teens

NewsBank Main Menu

The main portal to Newsbank Inc.'s databases, providing international, national, local, and historical news coverage.

Available To: All Free Library locations and online with your library card.

Newsbank Selected America's Historical Newspapers

Articles from selected American newspapers covering many topics from 1690-1922.

Available To: All Free Library locations and online with your library card.

Newsbank Special Reports

NewsBank's Special Reports focus on topics of current interest. They include content from sources throughout the world to provide a global perspective, current and background information, statistics, maps, images, websites, and suggested search terms.

Available To: All Free Library locations.

OverDrive Sala de Lectura Electrónica en Español

Descargue libros electrónicos y audiolibros en español para niños, adolescentes, y adultos. (Download ebooks and audiobooks in Spanish for kids, teens, and adults.)

Available To: All Free Library locations and online with your library card.

Overdrive World Languages eReading Room

Download ebooks and audiobooks in Spanish, Chinese, Vietnamese, Russian, French, and other languages. Get the Libby app to borrow on the go. Watch this short video to learn more

Available To: All Free Library locations and online with your library card.

Oxford Music Online (Includes Grove Music Online)

The world's premier authority on all aspects of music. (Formerly Grove Music Online, New Grove Dictionary of Music, and Musicians Online)

Available To: All Free Library locations and online with your library card.

Pennsylvania Historical Newspapers

174 full-text historical Pennsylvania newspapers from 1790-1922. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

Philadelphia Daily News

Philadelphia Daily News in full text from 1/4/1978 – Present (includes sporadic gaps in coverage)

Available To: All Free Library locations and online with your library card.

Philadelphia Evening Telegraph

Find full text issues of this afternoon daily published between 7/1/1864 and 6/30/1871 online here. These and additional years from 6/30/1871 to 6/28/1918 are available on microfilm at Parkway Central Library. Many coverage gaps.

Available To: All Free Library locations and online with your library card.

Philadelphia Inquirer

Philadelphia Inquirer in full text from 1/1/1981 – Present (includes sporadic gaps in coverage)

Available To: All Free Library locations and online with your library card.

Philadelphia Inquirer Digital Archive 1860-2001

Search and browse the full text of the Philadelphia Inquirer as published 1860-2001.

Available To: All Free Library locations and online with your library card.

Philadelphia Inquirer Historical Archive (formerly Civil War Archive)

Full text of the Philadelphia Inquirer from 1829-1922. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

Philadelphia Press Index

Electronic version of a paper index to the Philadelphia Press, one of Philadelphia's oldest newspapers.

Available To: All Free Library locations and online with your library card.

Philadelphia Tribune (1912-2001)

Full access to the oldest continuously published daily Black newspaper in the United States.

Available To: All Free Library locations and online with your library card.

Philadelphia Works Resume Builder

Resume Builder is a simple online tool for creating a resume. It uses a fill-in-the-blanks template, provides job descriptions from O*NET OnLine service, and produces a Microsoft Word document that can be saved and edited.

Available To: Unrestricted, free web resource

Popular Music Library

Find a wide range of popular music from around the world, including pop music, alternative, country, Christian, electronic, hip-hop, metal, punk, new age, R&B, reggae, rock, soundtracks and more.

Available To: All Free Library locations and online with your library card.

POWER Library

Pennsylvania Online World of Electronic Resources

Available To: All Free Library locations and online with your library card.

Readers' Guide Retrospective: 1890-1982

Search more than 3 million articles from 375 leading magazines from 1890 to 1982.

Available To: Parkway Central Library only.

Reference Solutions (Formerly ReferenceUSA)

A powerful tool providing access to in-depth information on businesses and residents. Research companies, locate addresses, phone numbers, conduct market research, and more. Watch this video to learn more.

Available To: All Free Library locations and online with your library card.

Salem Press Reference

Searchable database of reference titles developed by Salem Press

Available To: All Free Library locations and online with your library card.

Sanborn Maps, 1867–1970 (Formerly Sanborn Maps Geo Edition)

Explore America’s building history through over 660,000 black-and-white, large-scale maps, which chart the growth of more than 12,000 towns and cities. Read this blog post to learn more.

Available To: All Free Library locations and online with your library card.

Science Reference Center

Contains full text for hundreds of science encyclopedias, reference books, periodicals, and other reliable sources. In addition, Science Reference Center includes more than 280,000 high-quality science images.

Available To: All Free Library locations and online with your library card.

Serial Set Maps

The more than 70,000 maps in the U.S. Congressional Serial Set include great atlases, small individual maps, triangulation surveys, weekly weather maps, and more. Covers the years of publication 1817-1994. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

Slavery in America and the World: History, Culture, and Law

This HeinOnline collection brings together essential legal materials on slavery in the U.S. and the English-speaking world. It includes nearly 2,000 titles with every statute and case on slavery, and hundreds of historic texts and modern histories.

Available To: All Free Library locations and online with your library card.

Smithsonian Global Sound for Libraries

With an extraordinary array of more than 35,000 individual tracks of streaming music, spoken word, and natural and human-made sounds, users can listen to performances by American folk icons such as Woody Guthrie, Lead Belly, Pete Seeger and many more.

Available To: All Free Library locations and online with your library card.

Something About the Author

Something About the Author examines the lives and works of authors and illustrators for children and young adults and is the preeminent source on literature for young people. Includes over 12,000 entries and nearly 17,000 images.

Available To: All Free Library locations and online with your library card.

Stanford Encyclopedia of Philosophy

The Stanford Encyclopedia of Philosophy is an authoritative, comprehensive Web-based reference work about philosophy, useful to scholars of all levels as well as the general public.

Available To: All Free Library locations and online with your library card.

Student Resources In Context

Primary documents, biographies, topical essays, background information, critical analyses, full-text coverage of 800 magazines, photographs and illustrations, audio and video clips

Available To: All Free Library locations and online with your library card.

Student Search: General Encyclopedia

Search across a range of digital encyclopedias to find information on any topic.

Available To: All Free Library locations and online with your library card.


Listing of full-text journals available electronically at the Free Library. Replaces JournalWebCite.

Available To: All Free Library locations and online with your library card.

U.S. Citizenship Test Prep

Find information about the U.S. Citizenship test as well as study materials and practice tests.

Available To: All Free Library locations and online with your library card.

U.S. Congressional Serial Set (1817-1980)

The bound, sequentially numbered volumes of all the reports, documents, and journals of the U.S. Senate and House of Representatives constitute a rich source of primary source material on all aspects of American history. *Chrome browser not supported.

Available To: All Free Library locations and online with your library card.

U.S. History In Context

U.S. History In Context provides a complete overview of U.S. history that covers the most-studied events, issues and current information.

Available To: All Free Library locations and online with your library card.


Take world-class courses with students from around the globe at Udacity.

Available To: Unrestricted, free web resources.

Universal Class

Take online continuing education courses in a wide variety of subjects at your own pace from real instructors.

Available To: All Free Library locations and online with your library card.

World History In Context

A multicultural, global resource that moves chronologically from antiquity to the present and geographically around the globe, to ensure that the events, movements and individuals that defined and shaped our world are covered with a sense of balance.

Available To: All Free Library locations and internet with authentication.

6 Biomedical hypothesis generation

The increasing growth rate of the scientific literature makes it challenging for researchers to stay up-to-date with all relevant research in order to formulate novel research hypotheses in their specific disciplines. The generation of hypotheses, also known as literature-based discovery (LBD), attempts to make novel biomedical discoveries from the literature with computational approaches.

6.1 Task definition

Automated LBD, which uses published articles to discover new biomedical knowledge via generating new hypotheses, is a sub-field of BLM. The goal of hypothesis generation is to detect underlying relations that are not present in the text but instead are inferred by the presence of other explicit relations. In other words, it is a way to identify implicit connections by logically combining explicit facts scattered throughout different studies.

Specifically, hypothesis generation usually refers to the process of connecting two pieces of knowledge previously regarded as unrelated [138]. For example, it may be known that disease A is caused by chemical B, and that drug C reduces the amount of chemical B in the body. However, because the respective articles were published separately from one another (called ‘disjoint data’), the relationship between disease A and drug C may be unknown. Hypothesis generation aims to detect these implicit relations from biomedical articles. Figure 11 presents examples of hypothesis generation by inferring unseen relations.

Figure 11. Examples of hypothesis generation by inferring unseen relations from biomedical articles.

It is worth mentioning that biomedical hypothesis generation is different from relation extraction. Relation extraction focuses on extracting relationships between entities that are explicitly stated in the text, while hypothesis generation attempts to reveal relationships that are not yet known.

Thilakaratne et al. [8] surveyed the existing computational techniques used in the LBD process with a set of key milestones over a timeline of topics including LBD validation checks, major LBD tools, application areas, domains and generalizability of LBD methodologies. The survey did not cover applications of deep learning to LBD or the associated challenges, which are covered in this paper.

6.2 Problem settings

The core objective of hypothesis generation is to predict a possible relationship between two biomedical terms based on a text corpus [2, 139]. Unlike typical link prediction problems, which are usually based on either triangle closing models [140] or positive semi-definite graph kernels [141], hypothesis generation aims at providing a rationale and evidence in the form of connecting terms. There are two variations of the problem setting: closed discovery and open discovery. The former enables people to perform confirmatory analysis, while the latter is for scenarios that require more exploratory paradigms [142]. As a famous example, consider the queries ‘Are fish oils and Raynaud’s disease connected?’ versus ‘What are the therapeutic options for Raynaud’s disease?’. The first question is a closed discovery problem, where the answer is either yes or no. If the answer is yes, the following step should be identifying the evidence that supports this claim. The second question is an open discovery problem. The answer needs to be obtained by exploring all concepts that have Raynaud’s disease as a potential therapeutic indication. Such questions usually have a ‘grounded’ biomedical concept on one side and a meta-type that defines the characteristics of the possible terms that can appear on the other side [143].
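The difference between the two settings can be illustrated with a minimal Python sketch. It assumes a toy map of explicit term-to-term links; the map, the term names and the function names `closed_discovery` and `open_discovery` are hypothetical and only loosely follow the fish oil example. The meta-type restriction on open discovery is not modelled here.

```python
from typing import Dict, List, Set

# Toy map of explicit links: term -> terms it co-occurs with in the literature.
# The entries are illustrative assumptions, not data from any cited system.
LINKS: Dict[str, Set[str]] = {
    "fish oil": {"blood viscosity", "vascular reactivity"},
    "blood viscosity": {"fish oil", "raynaud disease"},
    "vascular reactivity": {"fish oil", "raynaud disease"},
    "raynaud disease": {"blood viscosity", "vascular reactivity"},
}


def closed_discovery(links: Dict[str, Set[str]], a: str, c: str) -> List[str]:
    """Closed discovery: return the terms linking a and c (empty list means 'no')."""
    return sorted(links.get(a, set()) & links.get(c, set()))


def open_discovery(links: Dict[str, Set[str]], a: str) -> Dict[str, List[str]]:
    """Open discovery: find every term not directly linked to a, with its linking terms."""
    direct = links.get(a, set())
    found: Dict[str, Set[str]] = {}
    for b in direct:
        for c in links.get(b, set()):
            # Skip the start term itself and terms that are already explicit neighbours.
            if c != a and c not in direct:
                found.setdefault(c, set()).add(b)
    return {c: sorted(bs) for c, bs in found.items()}


if __name__ == "__main__":
    print(closed_discovery(LINKS, "fish oil", "raynaud disease"))
    print(open_discovery(LINKS, "fish oil"))
```

In the closed case the linking terms double as the evidence for a ‘yes’ answer, whereas the open case starts from one grounded term and enumerates candidates.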

Biomedical hypothesis generation has led to many discoveries such as potential disease treatments [144], as well as understanding and discovering new health benefits of supplements [148]. It is particularly promising in a number of applications in the pharmaceutical industry such as drug development [149], drug repurposing [144, 150, 152] and pharmacovigilance [152, 158], which are further highlighted below.

Drug discovery and development are expensive and time-consuming processes. Drug repurposing refers to the identification of disease targets as alternative potential indications of existing drugs. Successful repurposing can save considerable time and cost in drug development because it does not need to go through the initial in-silico phase and part of the in-vitro phase. For example, COVID-19 is now a global epidemic. By March 2020 there had been nearly 300K confirmed cases with over 11K deaths in 185 countries and areas. There is an urgent need for the development of effective treatments for COVID-19. Complete de-novo drug discovery would be very time consuming in this case. Remdesivir, a drug that was originally developed for treating Ebola, has been reported to be effective in treating COVID-19 [161], as has hydroxychloroquine, whose original indication is malaria [162]. LBD can provide essential help for the drug repurposing process. Andronis et al. [163] reviewed various LBD methods that are critical for the detection of hidden connections between biomedical entities and suggested that visualization techniques can help scientists perform tests. Tari et al. [153] used a declarative programming language, AnsProlog, to automate reasoning over incomplete information about indirect relationships for drug indications. They also introduced several publicly available knowledge resources such as chemical structures, side effects and signaling pathways for identifying alternative drug indications [154].

Pharmacovigilance refers to ‘the pharmacological science relating to the collection, detection, assessment, monitoring and prevention of adverse effects with pharmaceutical products’ [164]. On this topic, Shang et al. [158] developed a scalable LBD method which uses distributional statistics to infer and apply discovery patterns to evaluate the plausibility of drug/adverse drug reaction pairs for pharmacovigilance. Hristovski et al. [159] presented a tool for providing pharmacological and pharmacogenomics explanations for known adverse drug effects through genes or proteins that link the drugs to the adverse effects. Mower et al. [160] extended this paradigm by evaluating machine learning classifiers when applied to high-dimensional representations of relationships extracted from the literature as a way to identify substantiated drug/adverse drug reaction pairs.

6.3 Methods

Most LBD systems are based on or derived from Swanson’s ABC co-occurrence model [139]. In such a model, explicit knowledge is encoded in the text in the form of ‘A implies B’ and ‘B implies C’ relations. Implicit knowledge can be discovered by drawing a ‘therefore A implies C’ conclusion. For instance, dietary fish oil is mentioned in articles with blood viscosity and vascular reactivity. These two terms are also mentioned in articles with Raynaud’s disease. Swanson proposed that it is reasonable that dietary fish oil and Raynaud’s disease might be associated. This result has been validated experimentally [165] (see Figure 12).

Figure 12. An example of the ABC model connecting fish oil and Raynaud’s disease.

Various tools have been developed to use the ABC co-occurrence model for hypothesis generation [166]. For example, Swanson’s Arrowsmith tool utilizes the co-occurrence of biomedical terms in the titles of MEDLINE records to identify existing associations [167]. The user of the system inputs a starting term and is offered a choice of appropriate intermediate terms by the system. The predicted target terms are ranked by counting the number of intermediate terms. Similar ideas have been explored in other systems as well (e.g. CoPub [168] and FACTA+ [169]). These systems typically use the text in the abstract, not just the title. The BITOLA system uses both the number of intermediate concepts and the number of publications that support these intermediate links as the score for ranking the association candidates [170]. These methods only consider local knowledge in terms of the intermediate terms that co-occur with the starting and the target terms. Recently, Zhao et al. [100] discussed different situations of ABC co-occurrence and proposed a factor graph model, CausalTriad, which utilizes more holistic textual and structural knowledge to infer causal hypotheses.
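The two ranking signals mentioned above, the number of intermediate terms and the number of supporting publications, can be sketched as follows. This is a toy illustration and not the implementation of Arrowsmith or BITOLA: the corpus, the function names and in particular the `support` score (the sum over intermediate terms of the weaker of the two link counts) are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus: each "publication" is reduced to the set of terms it mentions.
DOCS = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "vascular reactivity"},
    {"blood viscosity", "raynaud disease"},
    {"vascular reactivity", "raynaud disease"},
    {"blood viscosity", "migraine"},
]


def cooccurrence_counts(docs):
    """Count, for each unordered term pair, the publications that mention both."""
    counts = defaultdict(int)
    for terms in docs:
        for x, y in combinations(sorted(terms), 2):
            counts[(x, y)] += 1
    return counts


def pair_count(counts, x, y):
    """Look up the co-occurrence count of an unordered pair."""
    return counts.get(tuple(sorted((x, y))), 0)


def rank_targets(docs, start):
    """Rank terms never seen with `start` by (number of intermediates, support)."""
    counts = cooccurrence_counts(docs)
    vocab = set().union(*docs)
    neighbours = {t for t in vocab if t != start and pair_count(counts, start, t) > 0}
    scores = {}
    for target in vocab - neighbours - {start}:
        linking = [b for b in neighbours if pair_count(counts, b, target) > 0]
        if linking:
            # Hypothetical support score: weaker of the two link counts, summed over B terms.
            support = sum(
                min(pair_count(counts, start, b), pair_count(counts, b, target))
                for b in linking
            )
            scores[target] = (len(linking), support)
    # Sort by intermediate count, then support, then name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1][0], -kv[1][1], kv[0]))


if __name__ == "__main__":
    for target, (n_links, support) in rank_targets(DOCS, "fish oil"):
        print(target, n_links, support)
```

Both signals are local in the sense described above: they only look at terms that co-occur directly with the starting term or the target term.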

In addition to the above ABC co-occurrence based modeling paradigms, there are also other methodologies for LBD. One example is the rarity principle [171], which looks at infrequently co-occurring terms rather than frequently co-occurring ones. Bibliometric-based systems use citation information to find the linkage and target literature [173]. Sang et al. [29] further developed a biomedical knowledge graph-based LBD approach for drug discovery.

Figure 13 presents a typical end-to-end pipeline of LBD. This system takes a pair of medical terms (a medical term and meta information in the case of open discovery) as input. The task of the ‘Hypotheses Generation Module’ is to list a set of hypotheses that relate the two inputs through an intermediate term, e.g. ‘Fish oils → Beta-Thromboglobulin → Raynaud Disease’. The ‘Ranking Module’ is then responsible for ranking these postulates, which are finally given to the end-user for further validation. The generated hypotheses can be ranked via selected tools and algorithms in the ranking module. Due to the cascading nature of these modules, it is assumed that the output quality of the generation module will affect the overall quality of the final result.

Figure 13. An overview schematic of a general end-to-end pipeline of LBD.

6.4 Potential applications with deep learning

Most of the studies in hypothesis generation are based on the ABC model. Deep learning models have rarely been used directly in this task, potentially due to the high interpretability requirements of the LBD process. It can be envisioned that deep learning models trained on large annotated biomedical text corpora should be able to achieve better numerical performance on the generated hypotheses. However, it is essential that those hypotheses are explainable. Therefore, effective deep learning interpretability mechanisms [174, 175] are needed.

6.5 Challenges

Despite the promise, challenges remain for biomedical literature-based hypothesis generation. In particular: (1) the assumptions in certain methods (such as the ABC co-occurrence based approach) are too simple to capture the complexity of biomedical processes, and enriching these technologies with comprehensive biomedical context is important and challenging; (2) many existing LBD methodologies and systems are developed for research purposes, and it is important to deploy them in real application settings where they can help, such as basic science research, pharmaceutical research and development, and clinical care, so that new discoveries can be prospectively evaluated; (3) the contents of biomedical articles can be biased towards their specialized disciplines, and the discoveries from different articles can sometimes be contradictory, so obtaining reliable and convincing hypotheses in this scenario is challenging.

7. Pew Internet

Curated by: Pew Research Center
Example data set: Teens, Social Media & Technology 2018

The Pew Research Center’s mission is to collect and analyze data from all over the world. They cover all sorts of topics like politics, social media, journalism, the economy, online privacy, religion, and demographic trends. While they do their own nonpartisan, non-advocacy research and analysis, they also offer their raw data for public access. Access simply requires a brief registration on the site and credit to Pew Research Center as the source of the data, with a waiver that Pew is not responsible for alternative data conclusions.

In a way, making data accessible is also another research project for Pew. They already have all the information about how they use the data in their research and they are interested in learning how others use their data as well. They have one request — to contact them by email if anything is published as a result of the data acquired.

Next seminar:

7. 6. 2019 (Friday) 15:00 Lucas Moitinho-Silva, Institute of Clinical Molecular Biology, CAU

Exploration of microbial abundance patterns in sponge microbiomes

Animal-microbe symbiosis research requires a broad range of techniques from different fields of science. In my talk, I will explore the dichotomy between high microbial abundance (HMA) and low microbial abundance (LMA) sponges with ecological and machine learning methods. I will present the analysis of microbiomes of about 170 sponge species (1800 samples), the data of which are publicly available as part of the sponge microbiome project. In the final part of my talk, I will show how I am transferring these approaches to my current projects about the human microbiome.

10. 5. 2019 (Friday) 15:00 Silke Szymczak, Institute of Clinical Molecular Biology, CAU

Looking into the black box of random forests

Machine learning methods and in particular random forests are promising approaches for classification and regression based on omics data sets. I will first describe in layman's terms how the random forest algorithm works and how a prediction model can be built. However, these complex models are not easy to interpret and one strategy for better understanding is variable selection, i.e. the identification of variables that are important for prediction.
In the second part of my talk I will then present our novel method called surrogate minimal depth (SMD). It is based on the structure of the decision trees in the forest and additionally takes into account relationships between variables. In simulation studies we showed that correlation patterns can be reconstructed and that SMD is more powerful than existing variable selection methods. Thus, SMD is a promising approach to get more insight into the complex interplay of predictor variables and outcomes in a high dimensional data setting.

8. 3. 2019 (Friday) 15:00 Martin Jahn, GEOMAR Kiel

Implications of the virome on marine sponge holobionts

Phages are increasingly recognized as important members of host-associated microbial communities. While phage-bacteria interactions have been studied for more than a century, comparatively little is known about how phages interact with their animal hosts. An attractive model that allows us to study host-microbe interactions in a natural environment is the marine sponge, which is associated with stable, highly complex and specific microbial communities. As filter-feeding animals, sponges pump up to 24,000 litres of seawater through their system per day, exposing them to high amounts of external viruses. High exposure to phages, a major bacteriolytic element, raises questions on how microbiome homeostasis can be maintained. Moreover, the diversity and function of residual phages on the sponge microbial community and their distribution in the animal's landscape are largely unexplored.

Therefore, I recently investigated 36 DNA/RNA viromes of four Mediterranean sponge species and nearby seawater references using viral metagenomics. In this seminar, I will walk you through the steps from sampling design to taxonomic and functional analysis and discuss the methodological aspects. Finally, I will highlight possibilities to connect sequencing with supplementary approaches such as microscopy and functional assays, which should be widely applicable to other systems.

8. 2. 2019 (Friday) 15:00 Ribana Roscher, Institute of Geodesy and Geoinformation (IGG), Universität Bonn

Machine Learning for Earth Remote Sensing

Remote sensing observations play an important role in the geo- and bioscientific community, since they enable various applications to accurately monitor the Earth and its changes, on a microscopic level as well as from space. Besides the challenge of dealing with large amounts of data and limited class label information, current and future challenges comprise the definition and integration of prior and domain knowledge, the learning of sophisticated features and the fusion of multiple sensor data. This talk will cover several remote sensing applications, with a focus on the deep learning methods addressed in my group, and I will present my vision of future methods to learn better models of complex geo- and biophysical processes and phenomena.

18. 1. 2019 (Friday) 15:00 Ana Filipa Moutinho, Max Planck Institute for Evolutionary Biology, Plön

The genomic and structural drivers of protein adaptive evolution

The frequency and nature of adaptive mutations is a long-standing focus of the study of molecular evolution. Here, we address the impact of structural architecture among protein coding regions on the rate of adaptive mutations. We used population genetics to study molecular evolution on a fine scale by analysing the impact of genetic variants in the different conformations of protein structure. With this, we aimed to understand how protein biophysics and coding sequence evolution influence fitness and adaptation. By using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions both at the protein and amino-acid residue scale, across different categories of protein function, chaperone affinity, protein-protein interactions, intrinsic protein disorder and structural motifs. We found that most of the adaptive mutations occur at the surface of proteins and that gene age strongly influences the rate of adaptation. Moreover, we observe that the functional class of proteins also plays a role in adaptation, with genes encoding for processes of protein regulation and signaling pathways exhibiting the highest values for the rate of adaptive substitutions. We therefore propose that the rate of adaptive mutations in proteins is driven by new inter-molecular interactions, both at the intra-organism, within protein networks, and at the inter-organism level, through the coevolution with pathogens, and/or by the acquisition of new biochemical activities.

7. 12. 2018 (Friday) 15:00 Mario Stanke, Institut für Mathematik und Informatik, Universität Greifswald

Comparative Genome Annotation

The talk will treat the clade annotation problem: many genomes of different species or strains of a clade are given and the clade is so narrow that larger parts of these genomes can be aligned, e.g. the clade of murine species.
Instead of annotating each genome one-by-one, we develop methods that simultaneously annotate all genomes, thereby exploiting evidence from selection and introducing a coupling of the previously independent sequential labeling problems in order to increase accuracy and consistency of the structural genome annotations. I will present ongoing efforts to improve the AUGUSTUS gene prediction tool.

9. 11. 2018 (Friday) 15:00 Johannes Zimmermann, Institute of Experimental Medicine, CAU Kiel

Fishing with metabolic networks - crafting, catching, curating

Metabolic networks are repositories of knowledge about the metabolic processes that occur in an organism. They are successfully used to examine various phenomena at an integrative pathway level rather than at the gene level only. In my talk, I want to give an introduction to metabolic network theory, focusing on the construction, analysis and curation of such networks. I will discuss our own contributions and, finally, the application of metabolic networks to community modeling, with a spotlight on the metaorganism paradigm.

1. 10. 2018 (Monday) 15:00 Dan Graur, University of Houston

Something Old, Something New, Something Borrowed, Something Blue: Applying the Concept of Mutational Load to Genomic Sequences to Determine an Upper Limit on the Functional Fraction of the Human Genome

For the human population to maintain a constant size from generation to generation, an increase in fecundity must compensate for the reduction in the mean fitness of the population caused by deleterious mutations. The required increase depends on the deleterious-mutation rate and the number of sites in the genome that are functional. These dependencies and the fact that there exists a maximum tolerable replacement level fertility (e.g., humans cannot have 100 children) allow us to estimate an upper limit for the fraction of the human genome that can be functional. By estimating the fraction of deleterious mutation out of all mutations in known functional regions, we conclude that the fraction of the human genome that can be functional cannot exceed 25%, and is almost certainly much lower.
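The dependency described in this abstract can be made concrete with a small Python sketch. It uses the textbook multiplicative-load approximation, in which mean fitness is exp(-U) for a genomic deleterious mutation rate U, and a baseline of two offspring per couple. These modelling choices, the function names and every number in the usage example are assumptions for illustration, not the speaker's exact model or results.

```python
import math


def deleterious_rate(mutations_per_generation, functional_fraction, deleterious_fraction):
    """Expected new deleterious mutations per genome per generation (U)."""
    return mutations_per_generation * functional_fraction * deleterious_fraction


def required_fertility(mutations_per_generation, functional_fraction, deleterious_fraction):
    """Offspring per couple needed to keep the population size constant.

    Assumes mean fitness exp(-U) and a baseline of 2 offspring per couple.
    """
    u = deleterious_rate(mutations_per_generation, functional_fraction, deleterious_fraction)
    return 2.0 * math.exp(u)


def max_functional_fraction(mutations_per_generation, deleterious_fraction, max_fertility):
    """Largest functional fraction compatible with a ceiling on fertility."""
    if max_fertility <= 2.0:
        return 0.0
    # Solve 2 * exp(mu * f * d) = max_fertility for f, capped at the whole genome.
    bound = math.log(max_fertility / 2.0) / (mutations_per_generation * deleterious_fraction)
    return min(1.0, bound)


if __name__ == "__main__":
    # Illustrative inputs only: 100 new mutations per generation, 40% of
    # mutations in functional sites deleterious, at most 4 offspring per couple.
    f = max_functional_fraction(100.0, 0.4, 4.0)
    print(f"upper limit on functional fraction: {f:.4f}")
    print(f"fertility required at that fraction: {required_fertility(100.0, f, 0.4):.2f}")
```

Under these assumptions the required fertility grows exponentially with the functional fraction, which is why a ceiling on fertility translates into an upper limit on that fraction.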

15. 6. 2018 (Friday) 15:00 Andreas Tauch, de.NBI Administration Office - ELIXIR Germany, Bielefeld University, Bielefeld, Germany

Bioinformatics in Germany: toward a national-level infrastructure

The German Network for Bioinformatics Infrastructure (de.NBI) is a national initiative funded by the Federal Ministry of Education and Research (BMBF). The mission of the de.NBI initiative is (i) to provide high-quality bioinformatics services to users in basic and applied life sciences research from academia, industry and biomedicine, (ii) to offer bioinformatics training to users in Germany and Europe through a wide range of workshops and courses, and (iii) to foster the cooperation of the German bioinformatics community with international network structures. The infrastructure network was launched by the BMBF in March 2015 and, after two national calls, now includes 40 service projects operated by 30 project partners that are organized in eight service centers. Scientists from Kiel University and from the Fritz Lipmann Institute Jena joined the de.NBI network as associated partners in 2017. The staff of de.NBI further develops and maintains almost 100 bioinformatics services for the human, plant and microbial research fields and provides comprehensive training courses to support users with different expertise levels in bioinformatics. The network is currently expanding its activities to the European level, as the de.NBI consortium was assigned by the BMBF to establish and run the German node of ELIXIR, the European life-sciences Infrastructure for biological Information. Like de.NBI on the national level, ELIXIR-DE is coordinated from Bielefeld University and includes over twenty partner institutes across Germany.

20. 4. 2018 (Friday) 15:00 Eli Levy Karin, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany

Statistical techniques in molecular evolution // Tools to explore eukaryotic metagenomics

My PhD focused on developing computational and statistical methods in the field of molecular evolution. In my talk I will give a brief overview of my work, which dealt with various aspects of sequence analysis. I will then present in greater detail one of the projects, TraitRateProp. TraitRateProp is a probabilistic method that allows testing whether the rate of sequence evolution is associated with changes in a binary phenotypic character trait. The method further allows the detection of specific sequence sites whose evolutionary rate is most noticeably affected following the character transition, suggesting a shift in functional/structural constraints. TraitRateProp was first evaluated in simulations and then applied to study the evolutionary process of plastid plant genomes upon a transition to a heterotrophic lifestyle.
Finally, I will present my current work on developing and applying computational tools for the analysis of eukaryotic metagenomics data. Metagenomics is revolutionizing the study of microbes and their fundamental roles in biological, geological, and chemical processes on earth. Despite the important roles eukaryotes play in most environments, they have received little research attention, due to their lower abundance in samples and to the complexity of their gene and genome architectures. To date, we generally cannot reliably predict eukaryotic genes in metagenomics sequences. However, being able to analyze eukaryotic metagenomics data is of great importance to numerous scientific fields, including biotechnology and medicine, ecology and evolution. In my study, I work on developing computational tools for the high-throughput discovery of eukaryotic gene sequences in metagenomics data and for their functional annotation.

23. 3. 2018 (Friday) 15:00 Christoph Kaleta, Institute of Experimental Medicine, Kiel University

Transcriptomic alterations during ageing reflect the shift from cancer to degenerative diseases in the elderly

Disease epidemiology during ageing shows a transition from cancer to degenerative chronic disorders as dominant contributors to mortality in the old. Nevertheless, it has remained unclear to what extent molecular signatures of ageing reflect this phenomenon. Here we report on the identification of a conserved transcriptomic signature of ageing based on gene expression data from four vertebrate species across four tissues. We find that ageing-associated transcriptomic changes follow trajectories similar to the transcriptional alterations observed in degenerative ageing diseases but are in opposite direction to the transcriptomic alterations observed in cancer. We confirm the existence of a similar antagonism on the genomic level, where a majority of shared risk alleles that increase the risk of cancer decrease the risk of chronic degenerative disorders and vice versa. These results reveal a fundamental trade-off between cancer and degenerative ageing diseases that sheds light on the pronounced shift in their epidemiology during ageing.

23. 2. 2018 (Friday) 15:00 Tim Lachnit, Zoological Institute, Kiel University

Viruses, the neglected part of metaorganisms

Eukaryotic organisms are associated with, and have co-evolved with, a complex bacterial community. Together, host and bacteria form a synergistic relationship. Disturbance of the homeostasis between the host and its associated partners may contribute to disease development. While research in recent decades has focused on bacteria-host interactions, viruses have been disregarded, although they represent the most abundant entity in the world, outnumbering bacterial cells, and are one of the key regulators of bacterial communities, killing 20-40% of all bacterial cells each day. In this seminar I’ll introduce you to the viral world living in association with diverse organisms of different habitats, ranging from marine algae, sponges and freshwater polyps to fecal samples of mice. On the basis of these examples I’ll emphasize different methodological problems, including viral isolation, library preparation and sequence data analysis, that challenge viral research and have to be taken into account when working with viruses.

8. 12. 2017 (Friday) 15:00 Bernhard Haubold, Max Planck Institute for Evolutionary Biology, Plön

Sequence Complexity and Gene Function in the Human Genome

Genome sequences vary locally in their complexity due to duplication events in their evolutionary past. As a result, there is long-standing interest in elucidating the relationship between sequence complexity and the function of the encoded genes. However, measuring local sequence complexity is problematic as most metrics have no bounds that coincide with a known minimum for completely ordered sequences, and a maximum for random sequences. An exception to this rule is our complexity measure CM, which is bounded by 0 and an expectation of 1. This measure is robust to variation in GC-content, and can be computed efficiently. We have implemented CM in our program macle for MAtch CompLExity. Macle takes as input a genome sequence in FASTA format for indexing. In the case of the complete human genome, indexing takes 3.5 h using 128 GB RAM. Given the resulting index, macle computes CM in sliding windows of arbitrary width across the entire genome in roughly 19 s using 25 GB RAM.
To investigate the relationship between sequence complexity and gene function, we determined which genes were enriched in regions of a given complexity. We found that high complexity regions were strongly enriched for regulatory genes active in development. In contrast, low complexity regions were enriched for genes involved in immunity. We end by speculating on the role of the few unannotated regions of high complexity found.

10. 11. 2017 (Friday) 15:00 Tobias Marschall, Max Planck Institute for Informatics, Saarbrücken

Structural Genomic Variation and Horizontal Gene Transfer

Structural variation (SV) is of key importance for the evolution of genomes across the tree of life. This talk presents a tour of methodological developments for SV calling, genotyping, and haplotyping. First, I will explain methods (Clever, Mate-Clever) we developed and applied in the frame of the Genome of the Netherlands (GoNL) project, which sequenced 250 Dutch families, and highlight some of the results of this study. Second, I will venture into the world of bacterial genomics and show how lessons learned from detecting human structural variation can be applied to design a tool (Daisy) to detect recent horizontal gene transfer. Third, I will discuss the impact of technological developments for detecting SVs, using the data produced by the Human Genome Structural Variation Consortium (HGSVC) as an example. The HGSVC sequenced nine human genomes each on seven different platforms (Illumina paired ends, Tru-seq synthetic long reads, jumping libraries, 10X Genomics, PacBio, BioNano optical maps, Strand-seq). In the frame of this project, we particularly explored the abilities of these technologies to resolve haplotypes by employing our WhatsHap method, which I will briefly explain. As a result, the HGSVC has produced a map of haplotype-specific structural variation that highlights SVs as substantially more prevalent in humans than was previously appreciated.

16. 10. 2017 (Monday) 16:00 Itzhak Mizrahi, Ben-Gurion University, Israel

Insights into the rumen microbiome

The mammalian gut microbiota is essential in shaping many of its host's functional attributes. Relationships between gut bacterial communities and their mammalian hosts have been shown in recent years to play an important role in the well-being and proper function of their hosts. A classic example of these relationships is found in the bovine digestive tract in a compartment termed the rumen. The rumen microbiota is necessary for the proper physiological development of the rumen and for the animal’s ability to digest and convert plant mass into basic food products, making it highly significant to humans. In my lecture I will discuss our recent findings regarding this ecosystem's development, and interaction with the host.

14. 7. 2017 (Friday) 15:00 Christian Woehle, Institute of Microbiology, CAU Kiel

Tracing back the evolution of the eukaryotic redox proteome

The redox-sensitive proteome (RSP) consists of protein thiols that undergo redox reactions, playing an important role in coordinating cellular processes. Here, we applied a large-scale phylogenomics approach to map the evolutionary origins of the eukaryotic RSP. Based on a current-day snapshot of the diatom Phaeodactylum tricornutum, we inferred ancestral sequence states and traced the evolution of the RSP stepwise back to the origin of eukaryotes. Our results show that the majority of P. tricornutum redox-sensitive cysteines (76%) are specific to eukaryotes, yet these are encoded in genes that are mostly of prokaryotic origin (57%). Furthermore, we find a threefold enrichment in redox-sensitive cysteines in genes that were gained by endosymbiotic gene transfer during the primary plastid acquisition. The secondary endosymbiosis event coincides with frequent introduction of reactive cysteines into existing proteins. While the plastid acquisition imposed an increase in the production of reactive oxygen species, our results suggest that it was accompanied by significant expansion of the RSP, providing redox regulatory networks the ability to cope with fluctuating environmental conditions.

9.6.2017 (Friday) 15:00 Giorgio Gonnella, Center for Bioinformatics, University of Hamburg

GFApy: a convenient and extensible Python library for handling sequence graphs

The Graphical Fragment Assembly formats 1 and 2 (GFA1 and GFA2) are recently defined formats for representing sequence graphs, such as assembly graphs (de Bruijn and string graphs), sequence variation graphs and gene splicing graphs. The formats are adopted by several software tools, including sequence assemblers, read mappers, variant analysis tools and interactive visualization tools.
We present a scripting language library for handling GFA files in Python (GFApy). The library allows the user to conveniently parse, edit and write GFA files. Complex operations, such as the separation of the implicit instances of repeats and the merging of linear paths, are also supported. Furthermore, the library is easily extensible: we show an example of how to define custom record types for metagenomic analysis.
GFApy is the first library which allows for convenient handling of GFA files using Python and the first publicly available implementation in any language fully supporting the GFA2 specification.
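
To give a feel for what a GFA1 file contains, here is a minimal sketch in plain Python. It does not use the GFApy API; the function name `parse_gfa1` and the example graph are hypothetical and only illustrate the tab-separated record layout of GFA1, where `S` lines carry a segment name and sequence and `L` lines carry a link between two oriented segments with an overlap CIGAR.

```python
from collections import defaultdict


def parse_gfa1(text):
    """Collect segment (S) and link (L) records from GFA1 text.

    Hypothetical helper for illustration; all other record types
    (H, P, ...) are skipped.
    """
    segments = {}              # segment name -> sequence
    links = defaultdict(list)  # (from, orient) -> [(to, orient, overlap)]
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        record_type = fields[0]
        if record_type == "S":
            segments[fields[1]] = fields[2]
        elif record_type == "L":
            src, src_orient, dst, dst_orient, overlap = fields[1:6]
            links[(src, src_orient)].append((dst, dst_orient, overlap))
    return segments, dict(links)


# A made-up two-segment graph: s1 and s2 overlap by two bases.
EXAMPLE = "H\tVN:Z:1.0\nS\ts1\tACGT\nS\ts2\tGTTA\nL\ts1\t+\ts2\t+\t2M\n"
segments, links = parse_gfa1(EXAMPLE)
```

A dedicated library is still preferable for real work, since it also validates records and handles optional tags, paths and the GFA2 record types that this sketch ignores.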

12.5.2017 (Friday) 15:00 Fernando Tria, Institute of Microbiology, CAU Kiel

Phylogenetic rooting using minimal ancestor deviation

Ancestor-descendent relations play a cardinal role in evolutionary theory. Those relations are determined by rooting phylogenetic trees. Existing rooting methods are hampered by evolutionary rate heterogeneity or the unavailability of auxiliary phylogenetic information. We present a novel rooting approach, the minimal ancestor deviation (MAD) method, which embraces heterotachy by utilizing all pairwise topological and metric information in unrooted trees. We demonstrate the method in comparison to existing rooting methods by the analysis of phylogenies from eukaryotes and prokaryotes. MAD correctly recovers the known root of eukaryotes and uncovers evidence for cyanobacteria origins in the ocean. MAD is more robust and consistent than existing methods, provides measures of the root inference quality, and is applicable to any tree with branch lengths.

10.3.2017 (Friday) 15:00 Malte Rühlemann, IKMB Kiel

Genome-wide association studies of the human gut microbiota

In 1,800 individuals from Northern Germany, we investigated the influence of host-genetic variation on core members of the gut microbiota, as well as on the overall beta-diversity of the community. The results show an overlap with candidate genes for host-microbe interactions previously known from functional studies, shared loci with association studies of inflammatory disorders, and new candidate genes that shed light on the mechanisms by which the host genome influences the bugs in our guts.

10.2.2017 (Friday) 15:00 Axel Wedemeyer, Institute of Informatics, CAU Kiel

Filtering reads for De Novo Assembly

These days, sequencing projects often produce huge data sets. Especially for single-cell projects it is necessary to sequence with a very high mean coverage in order to make sure that all parts of the sample DNA get covered by the reads produced. This leads to datasets with large amounts of redundant data. Metagenomic data sets often show a high coverage for abundant species and a low one for rare species.

For a de novo assembly, the assembler has to reconstruct the genetic information from these data sets alone, a puzzle with sometimes billions of pieces. This is a demanding task, particularly with regard to the amount of RAM needed. Common assemblers like metaSPAdes or ALLPATHS-LG regularly need more than the 250 GB of memory that a typical server has these days.

But is all the data necessary to solve the problem? The basic idea of our work is to filter out redundant reads in order to reduce the memory and time requirements of the assembly process. The decision whether to keep or discard a given read is based on a probabilistic counting scheme for the k-mers (substrings of reads of length k) seen so far and on the phred score. While this method has been shown to be very effective on single-cell and transcriptomic data sets, we are currently working on adapting it to metagenomic data sets.
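
The filtering idea can be sketched in a few lines of Python. This is not the speaker's implementation: the count-min sketch, the median rule, the mean-phred cutoff and all names and thresholds (`CountMinSketch`, `filter_reads`, `max_coverage`, `min_quality`) are assumptions chosen only to illustrate how approximate k-mer counts and base qualities could drive a keep-or-discard decision.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: may overestimate, never underestimates."""

    def __init__(self, width=4096, depth=4):
        self.width = width
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, item):
        # One independent-looking hash per row, derived from SHA-256.
        for row in range(len(self.tables)):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        for row, col in self._slots(item):
            self.tables[row][col] += 1

    def count(self, item):
        return min(self.tables[row][col] for row, col in self._slots(item))


def mean_phred(quality, offset=33):
    """Mean phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_reads(reads, k=5, max_coverage=2, min_quality=20):
    """Keep a read only if it is of good quality and still adds coverage."""
    sketch = CountMinSketch()
    kept = []
    for seq, qual in reads:
        words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        if not words or mean_phred(qual) < min_quality:
            continue  # too short or too noisy
        counts = sorted(sketch.count(w) for w in words)
        if counts[len(counts) // 2] >= max_coverage:
            continue  # median k-mer already seen often enough: redundant
        for w in words:
            sketch.add(w)
        kept.append((seq, qual))
    return kept


# Toy input: five copies of one read, one novel read, one low-quality read.
reads = [("ACGTTGCAAG", "I" * 10)] * 5
reads += [("TTGGCCAATC", "I" * 10), ("ACGTTGCAAG", "#" * 10)]
kept = filter_reads(reads)
```

Because the sketch uses a fixed amount of memory regardless of how many distinct k-mers are seen, the counts are approximate; the trade-off is that occasional hash collisions can make a read look more redundant than it is.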

13.1.2017 (Friday) 15:00 Beate Slaby, GEOMAR Kiel

Metagenomic binning of a marine sponge microbiome reveals unity in defense but metabolic specialization

Marine sponges are ancient metazoans that are populated by distinct and highly diverse microbial communities. In order to obtain deeper insights into the functional gene repertoire of the Mediterranean sponge Aplysina aerophoba, we combined Illumina short-read and PacBio long-read sequencing followed by un-targeted metagenomic binning. We identified a total of 37 high-quality bins representing 11 bacterial phyla and 2 candidate phyla. Statistical comparison of symbiont genomes with selected reference genomes revealed a significant enrichment of genes related to bacterial defense (restriction-modification systems, toxin-antitoxin systems) as well as genes involved in host colonization and extracellular matrix utilization in sponge symbionts. A within-symbionts genome comparison revealed a nutritional specialization of at least two symbiont guilds, where one appears to metabolize carnitine and the other sulfated polysaccharides, both of which are abundant molecules in the sponge extracellular matrix. A third guild of symbionts may be viewed as nutritional generalists that perform largely the same metabolic pathways but lack such extraordinary numbers of the relevant genes. This study characterizes the genomic repertoire of sponge symbionts at an unprecedented resolution and it provides greater insights into the molecular mechanisms underlying microbial-sponge symbiosis.

9.12.2016 (Friday) 15:00 Matthias Merker, Molecular and Experimental Mycobacteriology, Research Center Borstel

Evolution of multidrug-resistant tuberculosis strains in Eastern Europe

Bacterial factors favoring the unprecedented multidrug-resistant tuberculosis (MDR-TB) epidemic in Eastern Europe remain unclear. We analyzed whole genome sequences from 1,436 clinical MDR Mycobacterium tuberculosis complex (MTBC) strains from different Eastern European settings. The vast majority (70%) of M/XDR-TB infections were caused by three closely related MTBC strain types. Bayesian coalescent analysis revealed that particular MTBC clones with patterns of low fitness cost resistance mutations (first- and second-line drugs) in combination with compensatory mutations existed prior to the introduction of standardized TB treatment in the late 1990s. The dominance of particularly fit and highly resistant strains further challenges the application of standard treatment regimens, including the new short MDR-TB regimen, and highlights the need for universal, rapid and comprehensive drug susceptibility testing, especially in high-burden settings.

11.11.2016 (Friday) 15:00 Silvio Waschina, Institute of Experimental Medicine, CAU Kiel

Costs and necessity of the Black Queen: the impact of metabolic trade-offs on the evolution of microbial community structure and dynamics

Microbial cells often exchange costly produced metabolites with neighbouring cells within their communities, creating a vast network of interdependencies where co-occurring organisms perform complementary metabolic functions. The Black Queen hypothesis aims to explain the evolution of such dependencies through the loss of metabolic functions by a sub-group of cells while the function is retained by coexisting cells that share the function’s essential product. Testing this hypothesis requires knowledge of (i) the fitness consequences of metabolic gene loss as well as (ii) the costs that are associated with the biosynthesis of exchanged metabolites. Both quantities, however, usually remain elusive.

Here we addressed this issue using data mining approaches and constraint-based modelling of bacterial metabolism. The computational estimates and predictions were complemented with laboratory experiments with Escherichia coli and Acinetobacter baylyi. The results suggest that loss of conditionally essential biosynthetic functions is highly prevalent in natural bacterial populations. This rampant loss of anabolic functions can be explained by selective advantages of biosynthetic gene loss in the presence of the focal metabolites. In addition, epistatic interactions frequently affected fitness after losing multiple genes. We also identified a carbon source-dependent trade-off between the production costs of different classes of amino acids. Such biochemical trade-offs are known to play a crucial role in the ecology and evolution of microorganisms because coexisting lineages can mutually save metabolic costs by specialising in the production of different essential metabolites. Taken together, our observations demonstrate potential molecular causes underlying the evolution of metabolic interdependency and complementarity within microbial communities.

14.10.2016 (Friday) 15:00 Astrid Dempfle, Institute of Medical Informatics and Statistics, CAU Kiel

Statistical aspects of gene-environment interaction

The concept of gene-environment interaction is relevant both in the etiology of complex diseases and in personalized treatment. Statistical aspects in the identification or utilization of such interactions will be highlighted, in particular relating to study design and statistical analysis for disease gene identification or pharmacogenetic clinical trials.

15.7.2016 (Friday) 15:00 Elisabeth Kaltenegger, Botanical Institute, CAU

Interference between paralogues at the protein level affects the dynamics after gene duplication

A common feature of proteins is their assembly into homomeric structures to act as functional units. Usually, the subunits are derived from a single genetic locus. When such a gene is duplicated, the gene products are thought to cross-interact initially when co-expressed, resulting in the phenomenon of paralogue interference. In this talk, I will present a case study of protein evolution in which paralogue interference after duplication might have facilitated neofunctionalization of one duplicate. I will also explore further ways in which paralogue interference can shape the fate of a duplicated gene and present further illustrative examples. One important outcome is a prolonged time window in which both copies remain under selection, increasing the chance to accumulate mutations and to develop new properties. Thereby, paralogue interference can mediate the co-evolution of duplicates.

13.5.2016 (Friday) 15:00 Frederic Bertels, Max Planck Institute for Evolutionary Biology, Plön

Parallel evolution in a long term experiment with HIV-1

One of the most intriguing puzzles in biology is the degree to which evolution is repeatable. The repeatability of evolution or parallel evolution has been studied in a variety of model systems, but has rarely been investigated with clinically relevant viruses. To investigate parallel evolution of HIV-1, we passaged two replicate HIV-1 populations for almost one year in each of two human T-cell lines. For each of the four replicate lines, we determined the genetic composition of the viral population at nine time points by sequencing the entire genome. Mutations that were carried by the majority of the virus population showed an extreme degree of parallel evolution. In one of our evolutionary lines, all 19 majority mutations also occur in another line but appear in a different order. This repeatable pattern of HIV-1 evolution is indicative of a predictable process, which is maximally inconsistent with evolutionary neutrality.

15.4.2016 (Friday) 15:00 Marc Hoeppner, IKMB

Workflow systems in bioinformatics

Within just a few years, the steadily decreasing cost of next-generation sequencing has turned biology into one of the most data-intensive research disciplines in the world. While this age of "big data" promises exciting new insights, it also threatens to outpace our ability to make sense of the flood of information and handle it efficiently. Here, one particular challenge is the use of high-performance computing infrastructures and the detailed record keeping (data provenance) necessary for good scientific practice. In this presentation, I will discuss the challenges of big data and how dedicated workflow systems can help accelerate bioinformatics, including some hands-on examples to show that the adoption of such purpose-built solutions does not need to be complicated.

11.3.2016 (Friday) 15:00 Transcriptomics Symposium

  • Rainer Kiko (GEOMAR):
    Accounting for differential RNA yield in RNA-Seq: a copepod example
  • Christian Woehle (IFAM, CAU):
    De-novo RNA-Seq analysis of non-model organisms: a foraminifera example
  • Wentao Yang (Zool. Inst., CAU):
    ABSSeq: a new RNA-Seq analysis method based on modeling absolute expression differences

12.2.2016 (Friday) 15:00 Dirk Fleischer, Kiel Marine Science

Start smart - Data capturing at the point of origin

15.1.2016 (Friday) 15:00 Tobias Lenz, Max Planck Institute for Evolutionary Biology, Plön

Evolutionary genomics of an optimal adaptive immune response

11.12.2015 (Friday) 15:00 Steffen Möller, University of Rostock

eQTL: intertwining disease decomposition and drug repositioning

Expression QTL (eQTL) further annotate disease-associated genetic loci with co-observed changes in the transcriptome. With drugs selected to compensate for the disturbance caused at single loci, one may derive a recipe for a drug cocktail for a genotyped patient with a multifactorial disease. This presentation reviews resources available today and emergent algorithms, exemplified on murine data for experimental autoimmune encephalomyelitis, a mouse model for neuroinflammation.

13.11.2015 (Friday) 15:00 Wilhelm Hasselbring, Dept. Computer Science, CAU

Workflows for Scientific Data Processing and Publication

In this presentation, I'll present three related topics: (1) Our PubFlow approach to automate publication workflows for scientific data. The PubFlow workflow management system employs established technology. We integrate institutional repository systems and world data centers (in marine science). PubFlow collects provenance data automatically via our monitoring framework Kieker. In our evaluation in marine science, we collaborate with the GEOMAR Helmholtz Centre for Ocean Research Kiel. (2) Data processing in genomics: I'll briefly sketch bioinformatics tools such as Bioconductor and Galaxy, and indicate how these tools may be combined with advanced data-analysis systems for Internet-scale data processing such as MapReduce/Hadoop, including our own tools ExplorViz and TeeTime. (3) For good scientific practice, it is important that research results may be properly checked by reviewers and possibly repeated and extended by other researchers. I'll discuss publishing code, in addition to data.

Prof. Dr. Wilhelm (Willi) Hasselbring is professor of Software Engineering at Kiel University. In the competence cluster Software Systems Engineering (KoSSE), he coordinates technology transfer projects with (local) industry. In the excellence cluster Future Ocean, he is principal investigator and co-coordinator of the research area Ocean Observations.

9.10.2015 (Friday) 15:00 Anne Kupczok, IFAM CAU

Studying genetic heterogeneity within microbial populations using high-resolution metagenomics

9.7.2015 (Thursday) 15:00 David Ellinghaus, IKMB Kiel

A systematic cross-disease study of five chronic inflammatory diseases

11.6.2015 (Thursday) 16:00 Corinna Breusing, GEOMAR

Population connectivity and dispersal of vent mussels from the Mid-Atlantic Ridge

30.4.2015 (Thursday) 15:00 Elie Jami, IKMB Kiel

Characterization of the bovine rumen microbiome from birth to adulthood and its potential effect on host physiology

27.3.2015 (Friday) 15:00 Fabian Kloetzl, Max Planck Institute for Evolutionary Biology, Plön

Efficient Estimation of Evolutionary Distances

26.2.2015 (Thursday) 16:00 Oscar Puebla, GEOMAR, Kiel

Genomic atolls of differentiation in coral reef fishes (Hypoplectrus spp.)

30.1.2015 (Friday) 15:00 Prof. Dr. Christoph Kaleta, Institute of Experimental Medicine, CAU Kiel

Microbial survival in challenging environments - Be quick or be social

18.12.2014 (Thursday) 16:00 Dr. Ben Krause-Kyora, IKMB CAU, Kiel

Microbial genomics from ancient DNA

5.12.2014 (Friday) 15:00 Dr. Ingram Iaccarino, Institute of Human Genetics UKSH, Kiel

Identification of novel downstream players in MYC-induced cellular transformation

30.10.2014 (Thursday) 16:00 Dr. Julien Y. Dutheil, MPI Plön

The evolution of primates X chromosome and Human-Chimp speciation.

26.9.2014 (Friday) 15:00 Dr. Giddy Landan, Institute of General Microbiology, CAU, Kiel

Origins of major archaeal clades correspond to gene acquisitions from bacteria.

The 13 higher taxonomical groups of archaea unexpectedly correspond to 2,264 group-specific gene acquisitions from bacteria. Interdomain gene transfer is highly asymmetric: transfers from bacteria to archaea are 11-fold more frequent than vice versa.

Gene transfers identified at major evolutionary transitions among archaea specifically implicate gene acquisitions for metabolic functions from bacteria as key innovations in the origin of higher archaeal taxa.

Comparison of sets of trees without a reference phylogeny.

Tree compatibility measures tuned to detect a non-vertical, LGT, signal.

Statistics for integration of layered data with small per-layer samples.

267,568 protein-coding genes of 134 sequenced archaeal genomes, in the context of their homologs from 1,847 reference bacterial genomes.

2014, June 6th, 15:00: Dr. Silke Szymczak, Institute of Clinical Molecular Biology, CAU

Comparison of variable selection methods in random forests for genomic data sets

2014, May 16th: Prof. Dr. Anand Srivastav, Institute of Informatics, CAU

Streaming Algorithms for Big Data Problems in Bioinformatics

2014, April 27th-30th: SMBE Satellite meeting on Reticulated Microbial Evolution

2014, March 18th, 15:00: Prof. David Bryant, University of Otago, New Zealand

Phylogenetic analysis of species radiations using SNPs and AFLPs. (Bio/Bioinformatics/Genetics)

Technological wonders such as next generation sequencing mean that we can now, in principle, obtain SNP (single nucleotide polymorphism) data from multiple individuals in multiple species. This promises enormous benefits for population genetic and phylogenetic analysis, particularly of closely related or poorly resolved species. My interest is in how to analyse these data effectively and responsibly. We have developed an algorithm which estimates species trees, divergence times, and population sizes from independent (binary) markers such as well-spaced SNPs. The method is based on coalescent theory (like the BEAST software), though it uses mathematical trickery to avoid having to consider all the possible gene trees. As a 'full likelihood' method, it should be more accurate than alternative FST-based approaches. I'll talk about our experiences applying this method to AFLP data from alpine plants, and some recent discoveries about the usefulness (or uselessness) of SNP data for estimating population sizes.

2014, March 14th, 15:00: Dr. Till Bayer. GEOMAR

16S metagenomic analysis of the coral microbiome

2014, January 31st, 15:00: Dr. Johann-Mattis List, Forschungszentrum Deutscher Sprachatlas, Philipps Universität Marburg

Using bioinformatics to study the lateral component of language evolution

Ever since August Schleicher (1821-1868) first proposed the idea that language history is best visualized “bei dem Bilde eines sich verästelnden Baumes” (“by the image of a branching tree”), this view has been controversially discussed by linguists, leading to various opposing theories, ranging from wave-like evolutionary scenarios to early network proposals. The reluctance of many scholars to accept the tree as the natural metaphor for language evolution was due to conflicting signals in linguistic data: many resemblances would simply not point to a unique tree. In the last two decades, historical linguistics has been experiencing a “quantitative revolution” and many automatic approaches from evolutionary biology have been applied to linguistic data. Given the important role that language contact and lexical borrowing play during language history, it is surprising that the majority of the new automatic approaches in historical linguistics assume a strict “eukaryotic framework” for language evolution and only focus on the reconstruction of language trees. I will argue that a “prokaryotic framework” for language evolution – based on biological network approaches that help to distinguish vertical from lateral processes during genome evolution – offers a fruitful alternative to current linguistic “dendrophilia” and provides more comprehensive insights into the complexities of language evolution.

2014, January 10th, 15:00: Prof. Dr. Bernhard Haubold, MPI Plön

Alignment-Free Tools for Genome Comparison

Whole genome sequencing has become routine. However, comparing whole genomes by alignment remains challenging. I therefore present three fast computer programs for comparing unaligned genomes. All three are based on calculating the lengths of exact matches between pairs of genomes. This quantity can be looked up efficiently by indexing sequences. I explain how we combine genome indexing with mathematical modeling to construct programs for estimating pairwise substitution rates, closest local homologues, and detecting recombination.
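
The quantity at the heart of this approach can be illustrated with a deliberately naive Python sketch. It is not one of the programs presented in the talk: the function names `match_lengths` and `mean_match_length` are hypothetical, and the repeated substring search used here stands in for the sequence index (for example a suffix array or suffix tree) that an efficient implementation would use.

```python
def match_lengths(query, subject):
    """For every query position, return the length of the longest exact
    match that starts there and occurs anywhere in the subject."""
    lengths = []
    for i in range(len(query)):
        length = 0
        # Extend the match one base at a time while it still occurs.
        while i + length < len(query) and query[i:i + length + 1] in subject:
            length += 1
        lengths.append(length)
    return lengths


def mean_match_length(query, subject):
    """Average match length; shorter on average for more diverged pairs."""
    lengths = match_lengths(query, subject)
    return sum(lengths) / len(lengths)


identical = match_lengths("ACGT", "ACGT")  # query equals subject
mutated = match_lengths("ACGA", "ACGT")    # last base differs
```

Intuitively, every substitution interrupts an exact match, so diverged sequences give shorter matches on average. Turning that observation into a substitution rate estimate needs the mathematical modeling mentioned in the abstract, which this sketch does not attempt.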

2013, November 29th, 15:00: Launch Symposium II

Dr. Steffen Möller, Institut für Neuro- und Bioinformatik, Universität zu Lübeck: Computational Biology @ Dermatology in Lübeck

Dr. Volkmar Sauerland, Institut für Informatik, CAU:
Introduction: Research Group Discrete Optimization

Dr. Abhishek Kumar, AG Kempken CAU:
Marine Fungi: Application of Next-generation genome and RNA-Seq based methods for the exploration of cancer drug and other antibacterial natural compounds.

Prof. Dr. Tal Dagan, IFAM, CAU:
Introduction: Genomic Microbiology Group
Institute of Microbiology

2013, November 8th, 15:00: Launch Symposium I

Tal Dagan, Genomic Microbiology, IFAM CAU: Introduction

Andre Franke, CAU: Institute of Clinical Molecular Biology

Georg Hemmrich, CAU: bioinformatics resources

Ingo Thomsen, IKMB: Git and Git Server

Bernhard Haubold, MPI Plön: Alignment-Free Phylogeny Reconstruction

Ke Xiao, Plant Breeding Institute, CAU: Identification of bolting genes of sugar beet by whole genome resequencing