
Transcriptome profiling by high-throughput sequencing: zooming in on the process of transcription

Figure 1. Schematic general overview of the procedures involved in high-throughput parallel sequencing of DNA. For RNA-seq an intermediate step of cDNA generation is required.
Figure 2. Mapping of exons in eukaryotic genes using RNA-seq. The histogram is a representation of the number of reads per nucleotide position. cDNA is generated from post-splicing mRNA, and hence contains only the exons, which can be mapped onto the genome sequence.
Figure 3. Schematic overview of the use of microbial strand-specific RNA-seq for determining whether genes are transcribed as monocistronic RNA (blue) or multicistronic RNA (red). The green histogram is located in an area without genes, and is shown as an example of how a bacterial sRNA can be identified using RNA-seq.

The recent developments in high-throughput parallel sequencing techniques (next or second generation sequencing) are now transforming the fields of functional genomics and transcriptomics. Using the sequencing of cDNA libraries created from cellular (m)RNA, scientists can now investigate the process of transcription at an unparalleled resolution. The technique has already demonstrated that both eukaryotic and prokaryotic organisms transcribe a wealth of coding and non-coding RNA species. This ‘zooming in’ on the process of transcription is changing paradigms in cell biology, as the secrets behind transcription and post-transcriptional regulation are slowly being unveiled.
by Arnoud H.M. van Vliet

It is almost fifteen years since The Institute for Genomic Research (TIGR) published the first complete genome sequence of a free-living organism. The decoding of the genome of the pathogenic bacterium Haemophilus influenzae [1] laid the foundations for the field of functional genomics. At that time, sequencing required the generation of plasmid libraries, which were then subjected to Sanger dideoxy sequencing on parallel running, gel-based (and later capillary-based) systems. Both techniques were labour-intensive and expensive to run; however, the cost was outweighed by the development of many new experimental and bioinformatic technologies. Such technologies coupled the rapid increase in availability of complete and incomplete genome sequences to novel approaches like the microarray-based comparison of gene expression [2]. Since then, many genome sequences have been determined. In October 2009, there were almost 1,000 complete microbial genomes available online, and also a large number of (partial or complete) eukaryotic genomes.

NextGen sequencing technologies
Since the early days of genome sequencing, a lot of progress has been made, culminating in what is generally known as Next Generation sequencing, aka NextGen sequencing [3] (although in view of the anticipated new developments, a better name is probably Second Generation sequencing). NextGen sequencing is based on the parallel and high-throughput sequencing of DNA fragments in a single machine, without the need for cloning or separate reactions [Figure 1]. The first sequencing platform to be available commercially was the 454 pyrosequencing technology, which gives relatively long reads of up to 450 nucleotides with a running time measured in hours. The two major commercial competitors are the Illumina GA platform, which is capable of reads of approximately 75 nucleotides (with a running time measured in days), and the AB SOLiD platform, which is capable of reads of 35–50 nucleotides (and also runs for several days). While the running time of the Illumina GA and AB SOLiD is much longer, both systems give much larger datasets at a similar cost, and also have a higher fidelity of sequencing. Other sequencing platforms will become available soon, such as the Helicos platform [4]. Further explanation of the details and application of these NextGen sequencing technologies can be found in recent reviews [3, 5].

Functional genomics
The primary focus of sequencing applications has long been on determining complete genome sequences (for example the $1000 genome challenge). Such work has allowed the sequencing of many small microbial and large eukaryotic and mammalian genomes, including the human genome. However, the presence of sequences does not give any information on their usage, and hence it is important to couple genomics data to post-genomic information, such as the levels of RNA (transcriptomics), protein (proteomics), protein modification, e.g. glycosylation (glycomics), and metabolites (metabolomics). All these techniques together are often referred to as ‘functional genomics’, as they provide a functional insight into the information contained in genome sequences. In this review, the focus will be on the analysis of the transcriptome, i.e. the full complement of sequences transcribed in a cell or microbe.

To date, the most common way to investigate transcription is by using microarrays. DNA fragments are immobilised or synthesised on a slide, using either microarray robots to populate slides with oligonucleotides or PCR fragments, or the more recently developed on-chip DNA synthesis. The probes are placed on a grid and are long enough to warrant specificity when washed at sufficient stringency. The RNA samples to be compared are converted into fluorescently labelled cDNA, and these cDNA samples are hybridised to the probes on the slide, either competitively (Type I) or against a common DNA or cDNA standard (Type II) [2]. The fluorescence is measured with a high-resolution scanner, and, using complex normalisation techniques, it is possible to give ratios of RNA levels. Due to their relative ease of use and flexibility, microarrays have been instrumental in our understanding of transcription in eukaryotes and prokaryotes, but even the high density tiled microarrays currently available lack the resolution to study transcription at the nucleotide level [6]. In addition, microarrays have potential weaknesses, such as cross- or false-hybridisation and signal saturation. Many of these problems can be avoided by using NextGen sequencing of cDNA libraries [7], a technique which is now known as RNA-sequencing or more commonly RNA-seq [8].

RNA-seq analysis: the method
The principle behind RNA-seq is relatively simple: RNA is purified from eukaryotic cells or microbes, and converted into cDNA without cloning procedures. The cDNA library thus constructed is subjected to sequencing using either 454, Illumina or SOLiD sequencing, and the resulting sequencing reads are computationally mapped to the genome sequence. Subsequently the number of reads is determined per nucleotide position, and this score is visualised in a histogram. The histograms can then be interpreted, as the coverage per nucleotide position correlates to the relative level of each RNA molecule present in the RNA preparation. As with every novel technique, there are challenges that need to be met. Depending on the RNA purification method, cDNA generation and sequencing approach chosen, coverage may vary, and artifacts may be introduced. One of the problems with the currently used approaches is that amplification steps are required to get sufficient material for sequencing, and such amplification steps can distort the balance between abundant and rare RNA species. Also, size selection and positive or negative hybridisation to poly-T or tRNA/rRNA-specific oligonucleotides are often used to separate mRNA from other RNA species, and this may result in specific RNAs being over- or under-represented, or even being absent. Finally, the large datasets require new bioinformatic analyses and improved software packages, and there is a considerable risk of a “data deluge”. However, despite these potential disadvantages, the RNA-seq methods are opening up new avenues for the investigation of the process of transcription and (post) transcriptional regulation. The next generation (Third Generation) sequencing platforms are being actively developed, and may allow for RNA-seq analysis without amplification prior to sequencing [4].
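The read-counting step described above can be illustrated with a minimal Python sketch. It assumes that the reads have already been mapped to the genome and are supplied as hypothetical (start, end) coordinates, 0-based with the end position exclusive; the function names are illustrative only and are not taken from any existing RNA-seq software package.

```python
def coverage_per_position(genome_length, reads):
    """Count how many mapped reads cover each nucleotide position.

    reads: iterable of (start, end) tuples, 0-based, end exclusive
    (an assumed input format for this sketch).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1

    # A running sum over the difference array gives the depth per position.
    coverage = []
    depth = 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def text_histogram(coverage):
    """Render the per-position read counts as a simple text histogram."""
    return "\n".join(
        f"{pos:>4} {'#' * depth}" for pos, depth in enumerate(coverage)
    )


# Three hypothetical reads mapped onto a 10-nucleotide toy genome.
example_reads = [(0, 4), (2, 6), (3, 8)]
example_coverage = coverage_per_position(10, example_reads)
print(text_histogram(example_coverage))
```

In practice the raw counts would still need to be corrected for sequencing depth and for the amplification and selection biases mentioned above before RNA levels are compared between samples.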

RNA-seq analysis of transcription and gene splicing in eukaryotes
The RNA-seq technology was pioneered with eukaryotic organisms, which is understandable in view of the relative ease of working with eukaryotic RNA. In most cells, the majority (usually 60-90%) of RNA species consist of ribosomal RNA (rRNA) and transfer RNA (tRNA), and hence there is a preference to remove these during RNA preparation. In eukaryotes, mRNA is usually polyadenylated at the 3’ end, and hence can be isolated from the total RNA pool via selection on poly-T columns, thus removing rRNA, tRNA and other non-polyadenylated RNA species. The poly-A tail can subsequently be employed for cDNA generation for subsequent high-throughput sequencing analysis. This approach has been successfully used in several organisms, such as yeast, plants and mammalian cells [8], and has allowed the identification of alternative splicing events and accurate exon identification [Figure 2]. Adaptations of the RNA-seq technique also allow for the identification of microRNAs and other non-coding RNAs [9], and have been used for analysis of transcription in single cells and the genome-wide determination of transcription start sites using a modification of the 5’ RACE technology.
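As a rough illustration of how exons can be read off such a coverage histogram (compare Figure 2), the sketch below reports contiguous stretches of the genome whose read depth reaches a threshold, and treats the gaps between them as putative introns. The threshold `min_depth`, the toy coverage values and the function names are assumptions made for this example; real splice-site detection also relies on reads that span exon-exon junctions.

```python
def covered_regions(coverage, min_depth=1):
    """Return (start, end) intervals where depth >= min_depth.

    Coordinates are 0-based with the end exclusive.
    """
    regions = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = pos  # a covered region begins here
        elif depth < min_depth and start is not None:
            regions.append((start, pos))  # the region ended just before pos
            start = None
    if start is not None:
        regions.append((start, len(coverage)))  # region runs to the end
    return regions


def gaps_between(regions):
    """Return the uncovered intervals between consecutive regions."""
    return [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(regions, regions[1:])
    ]


# Hypothetical per-nucleotide read counts across a two-exon gene.
toy_coverage = [0, 5, 6, 5, 0, 0, 0, 4, 4, 0]
exons = covered_regions(toy_coverage, min_depth=2)
introns = gaps_between(exons)
print("putative exons:", exons)
print("putative introns:", introns)
```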

RNA-seq analysis of transcription levels and novel non-coding RNAs in microbes
Working with bacterial RNA has always been challenging, since bacterial mRNA lacks a poly-A tail, and hence cannot be selectively isolated from other RNA sources. Furthermore, bacterial RNA preparations usually contain up to 80% ribosomal and transfer RNA, and the mRNA often has a very short half-life. These challenges are now being met, using techniques adapted for use with microbial RNA. Size selection of RNA and RNase digestion can be used to remove rRNA and/or tRNA from total RNA fractions, or rRNA and tRNA levels can be depleted using capture via specific oligonucleotides. The lack of a poly-A tail on the RNA can be overcome by in vitro polyadenylation, or alternatively specific or random primers can be used to generate cDNA [7]. To date, there are relatively few publications on the use of RNA-seq in microbes, but this is likely to change with the developments and adaptations described above.

The major use of microbial RNA-seq has been for the identification of transcript levels and transcript boundaries, to determine whether genes are transcribed on mono- or multi-cistronic mRNAs, and for the identification of non-coding small RNA species (ncRNA or sRNA) [Figure 3]. These sRNAs are the microbial equivalent of microRNAs in eukaryotes, and it is now becoming clear that they play important roles in post-transcriptional and transcriptional regulation in microbes, by interacting either with mRNA or with proteins like RNA polymerase or regulatory proteins. An example is the recent survey of transcription in the important human pathogen Salmonella enterica serovar Typhi (S. Typhi) [10], where Illumina sequencing was used to sequence cDNA derived from total RNA depleted of 16S and 23S rRNA. In this study it was shown that genomic DNA removal by DNase treatment of the RNA fraction significantly improves downstream sequencing applications. The RNA-seq information was subsequently used to correct the annotation of the S. Typhi genome sequence, and to identify 40 novel non-coding RNA sequences [10]. A targeted approach was used to specifically identify sRNAs in Vibrio cholerae by size selecting RNA, followed by removal of tRNA and 5S RNA using RNaseH [11]. That dataset contained the 20 known V. cholerae sRNAs, as well as a multitude of putative novel sRNAs and antisense RNAs. As indicated, this is only the beginning of the application of RNA-seq to microbes, and we can expect many novel applications to become available in the near future, such as the sequencing of immunoprecipitated DNA (IP-seq, as an alternative for ChIP-on-chip experiments).
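The logic summarised in Figure 3 can be sketched in a few lines of Python. The example assumes that transcript boundaries have already been derived from strand-specific coverage and that all genes and transcripts shown lie on the same strand; the gene names, coordinates and the function `classify_transcripts` are hypothetical. A transcript spanning two or more annotated genes is labelled multicistronic, one spanning a single gene monocistronic, and one in a region without annotated genes is flagged as a candidate sRNA that would still need experimental confirmation.

```python
def classify_transcripts(transcripts, genes):
    """Label each transcript by the number of annotated genes it overlaps.

    transcripts: list of (start, end) intervals, 0-based, end exclusive.
    genes: dict mapping gene name to a (start, end) interval on the
           same strand as the transcripts (an assumption of this sketch).
    """
    results = []
    for t_start, t_end in transcripts:
        # Two half-open intervals overlap if each starts before the other ends.
        overlapping = [
            name
            for name, (g_start, g_end) in genes.items()
            if g_start < t_end and t_start < g_end
        ]
        if not overlapping:
            label = "candidate sRNA"
        elif len(overlapping) == 1:
            label = "monocistronic"
        else:
            label = "multicistronic"
        results.append(((t_start, t_end), label, overlapping))
    return results


# Hypothetical annotation and transcript boundaries on one strand.
toy_genes = {"geneA": (100, 400), "geneB": (450, 900), "geneC": (1200, 1500)}
toy_transcripts = [(90, 910), (1190, 1510), (1700, 1780)]

for interval, label, names in classify_transcripts(toy_transcripts, toy_genes):
    print(interval, label, names)
```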

Concluding remarks
RNA was long seen as a relatively inert molecule, either encoding proteins or assisting in the production of proteins. This view of RNA has radically changed in the last decade due to the discovery of catalytic and regulatory RNAs, such as the now well-known microRNAs used for RNA interference (RNAi). We are now also learning that microbes have functionally similar, though structurally different, pathways for such regulatory activities. When RNA-seq is combined with the latest developments in microarray technology [6], we can now get an increasingly detailed view of the processes underlying (post) transcriptional regulation. Such information will undoubtedly show us the varied approaches used by living organisms to survive in their respective environments.

Acknowledgements
Research at the author’s laboratory is supported by the BBSRC Institute Strategic Programme Grant to the Institute of Food Research.

References
1. Fleischmann RD et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 1995; 269: 496-512.
2. Hinton JC et al. Benefits and pitfalls of using microarrays to monitor bacterial gene expression during infection. Curr Opin Microbiol 2004; 7: 277-282.
3. MacLean D et al. Application of ‘next-generation’ sequencing technologies to microbial genetics. Nat Rev Microbiol 2009; 7: 287-296.
4. Ozsolak F et al. Direct RNA sequencing. Nature 2009; 461: 814-818.
5. Shendure J, Ji H. Next-generation DNA sequencing. Nature Biotechnol 2008; 26: 1135-1145.
6. Toledo-Arana A et al. The Listeria transcriptional landscape from saprophytism to virulence. Nature 2009; 459: 950-956.
7. van Vliet AHM. Next generation sequencing of microbial transcriptomes: challenges and opportunities. FEMS Microbiol Lett 2009; Epub ahead of print, 21 August 2009.
8. Wang Z et al. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009; 10: 57-63.
9. Moxon S et al. Deep sequencing of tomato short RNAs identifies microRNAs targeting genes involved in fruit ripening. Genome Res 2008; 18: 1602-1609.
10. Perkins TT et al. A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet 2009; 5: e1000569.
11. Liu JM et al. Experimental discovery of sRNAs in Vibrio cholerae by direct cloning, 5S/tRNA depletion and parallel sequencing. Nucleic Acids Res 2009; 37: e46.

The author
Arnoud H.M. van Vliet
Institute of Food Research
Foodborne Bacterial Pathogens Programme
Colney Lane
Norwich NR4 7UA
UK
email: arnoud.vanvliet@bbsrc.ac.uk

