September 2009 No. 5 NEWSLETTER OF FEMS
FEDERATION OF EUROPEAN MICROBIOLOGICAL SOCIETIES
Genomics and RNomics: Taking the code further
Genomics. Post-genomics. These are hot topics in contemporary research, contributing vast amounts of data and opening new avenues for science. To explore the role of genomics and post-genomics in contemporary science, FEMS Focus interviewed a major personality in the field, Professor Pascale Cossart. She is the head of the Bacteria-Cell Interactions Unit of the Pasteur Institute in Paris, France. Read on for her views on genomics and post-genomics, with emphasis on RNomics/transcriptomics, on the connection between these fields and microbiology, and on the challenges that these novel opportunities bring.
Based on her experience in the field and recent RNA-based work, Dr. Cossart and her co-workers explored the role of RNomics with a focus on the bacterium Listeria monocytogenes. This bacterium is ubiquitous in the environment and has recently emerged as a multifaceted model in pathogenesis. It can cause severe foodborne infections with fatality rates exceeding those of Salmonella infections. How this bacterium switches from a saprophyte to a pathogen is largely unknown. Using tiling arrays and RNAs from wild-type and mutant bacteria grown in vitro, ex vivo and in vivo in animal models, Cossart has analysed the entire Listeria transcriptome.
What is your most important finding in Listeria?
We have provided the complete Listeria operon map and have discovered far more diverse types of RNAs than expected: in addition to 50 small RNAs, at least two of which are involved in virulence in mice, we have identified antisense RNAs covering several open reading frames and long overlapping 5’ and 3’ untranslated regions (UTRs). Long transcripts of 1-5 kb were discovered opposite operons. These transcripts, which had not been annotated in the Listeria genome, may be remnants of genes or may play other roles that remain to be analysed. We have also discovered that riboswitches can act as terminators for upstream genes. Interestingly, several non-coding RNAs absent in the non-pathogenic species Listeria innocua exhibit the same expression patterns as the virulence genes. Taken together, our data unravel successive and coordinated global transcriptional changes during infection and point to previously unknown regulatory mechanisms in bacteria.
Prof. Pascale Cossart (with permission)
From the Editorial Team
The limelight for this issue of FEMS Focus falls on a topic that currently generates enormous volumes of information and pushes science to immeasurable dimensions: genomics fuelling post-genomics disciplines such as RNomics. The rapid emergence of these new disciplines has provided a new looking-glass for scientists and non-scientists, microbiologists and non-microbiologists alike. It is also providing a great deal of new input, energizing the fields of life, earth and environmental sciences. With the boom of the genomics era, the decoding of complete human and microbial genome sequences has become possible. Suddenly, man and microbes are spelled differently, with As, Ts, Gs and Cs. With that come endless discoveries: diseases illustrated in detail, probable solutions and new promise for the future. The race is on to sequence the genome of every living organism on the planet, to understand them better and to provide answers to open questions. This is having an impact on microbiology, decoding its vast diversity. Indeed, we have entered the post-genomics era. In this new era, every minute counts.
Tone Tønjum, Editor & Chared Verschuur, Communications Assistant
Professor Dr. Pascale Cossart heads the Bacteria-Cell Interactions Unit at the Institut Pasteur in Paris, France. Her research focuses on the foodborne pathogen Listeria monocytogenes and employs molecular and cellular biology, RNomics, bioinformatics and animal models to elucidate pathogenesis. Dr. Cossart received her Ph.D. in Biochemistry from the University of Paris in 1977. Her postdoctoral research was conducted at the Institut Pasteur. In 1998, she received the Richard Lounsbery Prize and the L’Oréal/UNESCO Award for Women in Science Leadership. In 2000, the Swedish Society of Medicine awarded her the Louis Pasteur Gold Medal. Dr. Cossart is an “Officier de la Légion d’honneur” (Officer of the French Legion of Honour), and a member of the French Académie des Sciences and the German Leopoldina. She received an ERC Advanced Investigator grant in 2008, is a member of EMBO and, just this year, became a member of the National Academy of Sciences in the US.
What is characteristic of the interface between RNomics and microbiology?
The field of RNomics, or transcriptomics, is exploding, and it demands a lot of bioinformatics power to handle all the data. It really is time for transcriptomics in microbiology. The diversity of the field and the unique traits of all the microbial entities to be unraveled make this a most timely challenge. A significant opportunity in this context is to combine RNomics/transcriptomics with deep sequencing in order to sort out the nature of the 3’/5’ UTR markings.
What are the future impacts of RNomics on microbiology and beyond?
RNomics will clearly open new avenues in microbiology. In this context, it is important to realize that the RNAs/sDNAs we currently put on the tiling arrays represent an average of the population under investigation. Each bacterium might have a different transcriptome from its neighbour, and only the differential expression profile might reveal what is really significant. We need to reach the level of single-cell analysis in order to resolve gene expression levels across the diversity of a population. In medicine, RNomics has identified and characterized disease-associated non-protein-coding RNAs for applications as diagnostic markers and therapeutic targets, and experimental and bioinformatics methods have been developed for this task. Meta-transcriptomic analyses show that such data sets can reveal new information about the diversity, taxonomic distribution and abundance of sRNAs in naturally occurring microbial communities, and indicate their involvement in environmentally relevant processes including carbon metabolism and nutrient acquisition.
The “-omics” of post-genomics
Some typical professional concepts and terms in the wake of HTP genomics are:
Transcriptomics: whole cell or tissue gene expression measurements by DNA microarrays or serial analysis of gene expression
RNomics: defining the complete transcriptome (all gene transcripts) of a cell or cell population
Proteomics: complete identification of proteins and protein expression patterns of a cell or tissue through two-dimensional gel electrophoresis and mass spectrometry or multi-dimensional protein identification techniques
Metabolomics: identification and measurement of all small-molecule metabolites within a cell or tissue
Glycomics: identification of the entirety of all carbohydrates in a cell or tissue
Genomics facilitating post-genomics
In recent years, major technological developments have enabled complete genomes and other large-scale sequencing projects to be sequenced quickly, outside sequencing factories and at a reasonable cost. However, current methods can directly sequence only relatively short DNA fragments in a single reaction. The main obstacle to sequencing DNA fragments above this size limit is insufficient power of separation for resolving large DNA fragments that differ in length by only one nucleotide. As of July 2009, 934 microbial genome sequences had been completed, while 1870 were in progress (source: http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi).
Parallelized sequencing
DNA molecules are physically bound to a surface and sequenced in parallel. Sequencing by synthesis, like dye-termination electrophoretic sequencing, uses a DNA polymerase to determine the base sequence. Reversible terminator methods (used by Illumina and Helicos) use reversible versions of dye-terminators: one nucleotide is added at a time, the fluorescence at each position is detected in real time, and the blocking group is then removed to allow polymerization of another nucleotide. Pyrosequencing (used by 454) also uses DNA polymerization, adding one nucleotide species at a time and detecting and quantifying the number of nucleotides added at a given location through the light emitted upon release of the attached pyrophosphates.
Sequencing by ligation
This enzymatic sequencing method uses a DNA ligase to determine the target sequence. Used in the polony method and in the SOLiD technology, it relies on a pool of all possible oligonucleotides of a fixed length, labeled according to the sequenced position. Oligonucleotides are annealed and ligated; the preferential ligation of matching sequences by DNA ligase results in a signal informative of the nucleotide at that position. Other sequencing technologies are currently being developed.
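As a rough illustration of the pyrosequencing principle described above (one nucleotide species is offered at a time, and the light signal scales with the number of bases incorporated), the following Python sketch simulates an idealised, noise-free flowgram and decodes it back into a sequence. The flow order "TACG", the function names and the example sequence are assumptions chosen for illustration; none of them is taken from the instrument software.

```python
def flowgram(synthesised: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Return (nucleotide, signal) pairs for an idealised pyrosequencing run.

    `synthesised` is the strand being built by the polymerase. Each flow
    offers one nucleotide species; the signal is the number of bases
    incorporated in that flow (0 if the next base does not match).
    """
    unknown = set(synthesised) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not present in flow order: {sorted(unknown)}")

    signals = []
    pos = 0
    while pos < len(synthesised):
        for base in flow_order:
            run = 0
            # A homopolymer stretch is incorporated within a single flow.
            while pos < len(synthesised) and synthesised[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
            if pos >= len(synthesised):
                break
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Rebuild the sequence from (nucleotide, signal) pairs."""
    return "".join(base * count for base, count in signals)


if __name__ == "__main__":
    example = "TTAGC"  # hypothetical sequence
    flows = flowgram(example)
    print(flows)
    print(decode(flows))
```

In this idealised model a homopolymer such as "TT" gives a single flow with signal 2. On a real instrument the signal is an analogue light intensity, so long homopolymers are harder to quantify than the integer counts used here suggest.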
Large-scale sequencing aims at sequencing very long pieces of DNA, such as whole chromosomes. Common approaches consist of cutting or shearing large DNA fragments into shorter fragments. The fragmented DNA is cloned into a DNA vector and amplified in Escherichia coli. Short DNA fragments purified from individual bacterial colonies are individually sequenced and assembled electronically into one long, contiguous sequence. This method does not require any pre-existing information about the sequence of the DNA and is referred to as de novo sequencing. Gaps in the assembled sequence may be filled by primer walking. The different strategies have different tradeoffs in speed and accuracy; shotgun methods are often used for sequencing large genomes, but their assembly is complex and difficult, particularly because sequence repeats often cause gaps in the genome assembly.
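The electronic assembly step can be sketched with a toy greedy overlap assembler in Python. This is a minimal illustration under stated assumptions, not the algorithm used by any production assembler: reads are error-free, all come from the same strand, and the function names, the example reads and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(seqs: list[str]) -> list[str]:
    """Remove duplicates and sequences fully contained in another sequence."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != o and s in o for o in unique)]


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = drop_contained(reads)
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # no overlaps left: the remaining gaps stay open
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


if __name__ == "__main__":
    reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]  # hypothetical short reads
    print(greedy_assemble(reads))
```

When no sufficiently long overlap remains, the function returns several contigs rather than one sequence, which corresponds to the gaps mentioned above. Repeats are also the weak point of this greedy strategy: two reads ending in the same repeat can be merged incorrectly, which is one reason why real assembly is difficult.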
Next generation sequencing methods
High-throughput sequencing
The high demand for low-cost sequencing has driven the development of high-throughput sequencing technologies that parallelize the sequencing process, producing thousands or millions of sequences at once. All the new systems aim to avoid cloning of DNA fragments by employing different forms of parallel amplification and various forms of solid-phase approaches for signal detection. High-throughput (HTP) sequencing technologies are intended to lower the cost of DNA sequencing below what is possible with standard dye-terminator methods.
In vitro clonal amplification
Molecular detection methods are not yet sensitive enough for single-molecule sequencing, so most approaches use an in vitro cloning step to amplify individual DNA molecules. Emulsion PCR isolates individual DNA molecules along with primer-coated beads in aqueous droplets within an oil phase. Polymerase chain reaction (PCR) then coats each bead with clonal copies of the DNA molecule, followed by immobilization for later sequencing. Another method for in vitro clonal amplification is bridge PCR, where fragments are amplified upon primers attached to a solid surface. The single-molecule method developed by Stephen Quake’s laboratory (later commercialized by Helicos) skips this amplification step, directly fixing DNA molecules to a surface. The revolutionary aspect of the next generation technologies, compared to the original Sanger method for sequencing, is that amplification of the single-stranded DNA generated does not occur by cloning. This is advantageous, since not all DNA fragments have equal cloning efficiency. Instead, amplification is based on emulsion PCR. The DNA molecules to be sequenced are bound to monodisperse and hydrophilic beads and amplified in a water-and-oil mixture, where the beads are located in the water drops.
The 50S subunit of a bacterial ribosome with 90000 atoms (image: www.ballview.org)
DNA Translocation, the complexity of postgenomics (image by Tremani)
Major landmarks in DNA sequencing
1953 Discovery of the structure of the DNA double helix by Watson and Crick
1972 Development of recombinant DNA technology, which permits isolation of defined fragments of DNA; prior to this, the only accessible samples for sequencing were from bacteriophage or virus DNA
1975 The first complete DNA genome to be sequenced is that of bacteriophage φX174
1977 Allan Maxam and Walter Gilbert publish “DNA sequencing by chemical degradation”. Frederick Sanger, independently, publishes “DNA sequencing by enzymatic synthesis”
1980 Frederick Sanger and Walter Gilbert receive the Nobel Prize in Chemistry
1984 Medical Research Council scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb
1986 Leroy E. Hood’s laboratory at the California Institute of Technology and Smith announce the first semi-automated DNA sequencing machine
1987 Applied Biosystems markets the first automated sequencing machine, the model ABI 370
1990 The U.S. National Institutes of Health (NIH) begins large-scale sequencing trials on Mycoplasma capricolum, Escherichia coli, Caenorhabditis elegans, and Saccharomyces cerevisiae
1995 Richard Mathies et al. publish dye-based sequencing
1995 The first complete bacterial genome sequence, that of Haemophilus influenzae, is published by Fleischmann et al.
2004 Large-scale pyrosequencing is introduced
July 2009 2800 bacterial complete genome sequences are available online, more or less completely annotated
Source: Wikipedia
Genomics in industry – the example of pyrosequencing
454 Life Sciences, a Roche biotechnology company, specializes in high-throughput DNA sequencing using a novel massively parallel sequencing-by-synthesis approach. 454 has experienced rapid growth since its acquisition by Roche Diagnostics and the release in 2005 of the GS20 sequencing machine, the first next-generation DNA sequencer on the market. The Genome Sequencer FLX instrument was released in 2007. In 2008, 454 Sequencing launched the GS FLX Titanium series reagents for use on the current instrument, with the ability to sequence 400-600 million base pairs with read lengths of 400-500 base pairs. Attracted by its high accuracy, low cost and long reads, many researchers have migrated away from traditional Sanger capillary sequencing instruments and toward the 454 Sequencing platform for a variety of genome projects. 454 Life Sciences was founded by Jonathan Rothberg. The underlying technology is based on pyrosequencing and was conceived while he was on paternity leave and wanted a way to sequence the genome of his newborn son, who had been placed in newborn intensive care.
Bioinformatics for HTP genomics
The amounts of raw sequence data that HTP genome sequencers generate are vast, with over 100 million bases per instrument per run. In addition, the reads are short. For HTP genomics studies, these very large sets of short reads pose new bioinformatics challenges that need to be addressed. In fact, bioinformatics analysis represents the new bottleneck in realizing the full output of next-generation technologies.
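To give a feel for these data volumes, the short Python sketch below turns the figure of 100 million bases per run into a number of reads and an average sequencing depth. The 400-base read length follows the range quoted for the 454 platform, while the 3-million-base genome size is a hypothetical value for a bacterial genome. The last function uses the standard Lander-Waterman approximation, which assumes that reads land at random, uniformly distributed positions; all function names are illustrative.

```python
import math


def reads_in_run(total_bases: int, read_length: int) -> int:
    """Approximate number of reads needed to produce `total_bases`."""
    return total_bases // read_length


def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def expected_unsequenced_fraction(coverage: float) -> float:
    """Lander-Waterman estimate: probability that a base is hit by no read."""
    return math.exp(-coverage)


if __name__ == "__main__":
    run_bases = 100_000_000   # "over 100 million bases per instrument per run"
    read_length = 400         # assumed read length in bases
    genome_size = 3_000_000   # hypothetical bacterial genome size in bases

    depth = mean_coverage(run_bases, genome_size)
    print(f"reads per run: {reads_in_run(run_bases, read_length):,}")
    print(f"mean coverage: {depth:.1f}x")
    print(f"expected unsequenced fraction: {expected_unsequenced_fraction(depth):.2e}")
```

Under these assumptions a single run corresponds to hundreds of thousands of reads, each of which has to be placed relative to the others, which is where the computational burden comes from.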
Future developments / Demands and incentives driving technology
Current demands in HTP sequencing are driving technology forward. The four major HTP sequencing technology platforms available today are 454, Solexa/Illumina, ABI (SOLiD) and Helicos. Among these, 454 yields the longest sequence reads. The other three platforms, on the other hand, generate a greater number of reads, making them ideal for re-sequencing/deep sequencing but less useful for de novo sequencing. In addition, stakeholders are offering incentives to fuel technology development. Within the next few years, a next-next generation platform is expected to emerge, based on single-molecule reads (without amplification of DNA). This is the way to reach the ultimate goal: to be able to sequence the human genome in a couple of days for 1000 Euro. If the “1000 Euro” genome becomes a reality, it will represent a major change in genome-based medicine; for instance, the mutation rate in various somatic tissues and bacteria could be monitored at any time.
Post-genomics funding opportunities
Distinguished Visiting Scientist Stipend programme, Netherlands Genomics Initiative. Closing date: 1 April 2010. Budget: 0.5 million euro
Frontiers of Functional Genomics, European Science Foundation Calls for Proposals. Call for Science Meeting proposals: deadline 25 September 2009, 17:00 CET. Call for Short Visit and Exchange Grant applications: deadline 25 September 2009, 17:00 CET
JOIN US AT
4th Congress of European Microbiologists, Geneva, Switzerland, June 26-30, 2011
Advancing Knowledge on Microbes
www.kenes.com/fems-microbiology
FEMS grants deadlines
FEMS Meeting Attendance Grants: September 1, 2009 and April 1, 2010
FEMS Advanced Fellowship: October 1, 2009
FEMS Research Fellowships: December 15, 2009
FEMS Meeting Grants: March 1 of every year preceding the year in which the meeting takes place
The FEMS Jensen Award is open for application until November 30, 2009.
Post-genomics events
5th International DNA Sampling Conference, 16-18 September 2009, Alberta, Canada
The Genomics of Common Diseases 2009, 23-26 September 2009, Cambridge, UK
Mapping the Genomic Era: Measurements and Meanings, 7-9 October 2009, Cardiff, UK
Life Sciences Momentum 2009, 20 October 2009, World Forum, The Hague, The Netherlands
Genome Informatics, 27-30 October 2009, Cold Spring Harbor, New York
IPG (Integrative Post-Genomics), Lyon’s International Multidisciplinary Meeting on Post-Genomics, 18-20 November 2009, Lyon, France
Gene 2009, 1-7 December 2009, Foshan, China
Recent publication highlights on microbial RNomics
Camejo et al. In vivo transcriptional profiling of Listeria monocytogenes and mutagenesis identify new virulence factors involved in infection. PLoS Pathog 5(5):e1000449, 2009.
Toledo-Arana et al. The Listeria transcriptional landscape from saprophytism to virulence. Nature 459:950-6, 2009.
Shi et al. Metatranscriptomics reveals unique microbial small RNAs in the ocean’s water column. Nature 459(7244):266-9, 2009.
Liu JM, Livny J, Lawrence MS, Kimball MD, Waldor MK, Camilli A. Experimental discovery of sRNAs in Vibrio cholerae by direct cloning, 5S/tRNA depletion and parallel sequencing. Nucleic Acids Res 37:e46, 2009.
Post-genomics links and resources
http://www.genomics.nl/
http://genomicsnetwork.ac.uk/
http://www.genome.gov/
http://www.cdc.gov/genomics/
http://www.nerc.ac.uk/research/programmes/proteomics/
http://www.genomenewsnetwork.org/
Register as a FEMS Affiliate! www.fems-microbiology.org The FEMS Focus is published by the FEMS Central Office. Whom to contact? Prof. Dr. Tone Tønjum (tone.tonjum@medisin.uio.no). Design & production: ilumina@ilumina.si FEMS is a registered charity (no. 1072117) and also a company limited by guarantee (no. 3565643). © 2009 Federation of European Microbiological Societies
Thematic issue now available Online subscription to the full set of FEMS journals for only 175 Euro
FEMS Central Office Keverling Buismanweg 4 2628 CL Delft The Netherlands Tel: +31-15-269 3920 Fax: +31-15-269 3921 E-mail: fems@fems-microbiology.org