SAGE: Current Technologies and Applications Chapter Abstracts


Chapter 1
Evaluation of SAGE Tags for Transcriptome Study

Erin D. Pleasance and Steven J.M. Jones

Abstract
Serial analysis of gene expression, or SAGE, is a powerful technique that provides absolute measures of gene expression based on sequencing of mRNA-derived fragments, or SAGE tags. Here we evaluate the issues surrounding the use of SAGE for transcriptome analysis. Our study shows that (1) the SAGE procedure is subject to potential inaccuracies, such as those arising from PCR biases and sequence errors; (2) comparison of SAGE with other methods, such as RT-PCR or hybridization-based procedures, allows its overall accuracy in measuring transcript abundance to be evaluated; (3) the assignment of SAGE tags to genes is a significant issue in SAGE analysis, and a number of different methods have been developed that make use of expressed sequences, genomic sequence, and gene predictions for this purpose. In particular, SAGE tag length has an important effect on unique gene identification, and different SAGE procedures produce tags of different lengths; (4) the identification of SAGE tags is also complicated by biological variation such as polymorphisms, alternative splicing and polyadenylation; (5) there is a small subset of genes that SAGE is not able to profile, due to the restriction enzymes used; and (6) as sequenced tags often do not match known genes, SAGE has a particularly important role to play in novel gene identification.


Chapter 2
CAGE: A Novel Approach for Rapid Gene Discovery and Gene Network Identification

Matthias Harbers and Piero Carninci

Abstract
With the rapid sequencing of many genomes, including those of human and mouse, the genomic sciences now face the need for novel approaches to understand how genomic information is used and regulated in larger gene networks. The focus of genomic research has thus moved on from genome sequencing to efforts to analyze the transcriptome. The comprehensive characterization of the transcriptome, which comprises tens of thousands of genes regulated at various levels and differentially expressed at the cellular level, is a challenging task. Towards this goal, we developed CAGE (Cap Analysis Gene Expression) for expression profiling and promoter identification.


Chapter 3
SuperSAGE: A Potent Transcriptome Tool for Eukaryotic Organisms

Hideo Matsumura, Stefanie Reich, Monika Reuter, Detlev H. Krüger, Peter Winter, Günter Kahl and Ryohei Terauchi

Abstract
To improve the efficiency of tag-to-gene identification of the conventional SAGE procedure, while maintaining its power for accurate and quantitative gene expression analysis, we used the Type III restriction endonuclease EcoP15I to isolate tags of 26 bp in length from defined positions of cDNAs. We call this substantially improved variant of the conventional SAGE procedure "SuperSAGE". The resulting 26 bp tags allow precise identification of the gene of origin and, at the same time, accurate quantitative gene expression analysis. SuperSAGE will be especially useful for transcriptome profiling of two or more interacting organisms, such as hosts and pathogens (or parasites), and of organisms for which no DNA database is available. Furthermore, SuperSAGE tags can be directly spotted onto microarrays and employed as RNAi for gene function analysis (functional genomics).


Chapter 4
An Improved Protocol for SAGE Tag-to-gene Allocation

Ute Kannbley, Jason R. Potas, and George Trendelenburg

Abstract
Serial analysis of gene expression (SAGE) yields digital information on transcript abundance by the use of short sequence fragments (tags). Because SAGE does not require a priori knowledge of the expressed genes in the starting material, SAGE is valuable for gene discovery. Unfortunately, correct tag-to-gene allocation after SAGE remains difficult or even impossible when the short sequence of the tag corresponds to more than one gene in the reference database or when novel, as yet uncloned genes are detected. To overcome this problem, longer fragments of the corresponding transcripts have to be isolated. This chapter gives a brief overview of existing technologies and describes an improved protocol for the accurate identification of tag-corresponding genes. It relies on the isolation of 3'-terminal cDNA restriction fragments by the use of paramagnetic streptavidin beads, and the ligation of linkers prior to the amplification step. Because the principle is related to rapid amplification of cDNA ends (RACE)-PCR, this approach was termed SARA-PCR (SA for SAGE; RA for RACE). The success of this protocol is attributed to additional information encoded in each SAGE tag, namely its location immediately 3' of the last NlaIII restriction site in the cDNA. In contrast to previous protocols, stringent PCR conditions that enable higher specificity can be applied because of the length of the specific primers, which are composed of linker- and tag-specific sequences. Additionally, the protocol yields quantitative information, which can be used for further expression analysis of specific SAGE tags.
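
The positional information described in the abstract above can be illustrated with a short sketch. It is not taken from the chapter: it simply locates the 3'-most NlaIII site (recognition sequence CATG) in a sense-strand cDNA sequence and returns the bases that follow it. The function name `extract_sage_tag` and the default tag length of 10 bases are assumptions made for illustration; other SAGE variants yield longer tags.

```python
from typing import Optional

ANCHOR_SITE = "CATG"  # NlaIII recognition sequence


def extract_sage_tag(cdna: str, tag_length: int = 10) -> Optional[str]:
    """Return the tag immediately 3' of the last NlaIII site.

    `cdna` is assumed to be the sense strand written 5' to 3'.
    `tag_length` is an illustrative default, not a value from the chapter.
    Returns None when no tag can be derived.
    """
    seq = cdna.upper()
    site_start = seq.rfind(ANCHOR_SITE)  # 3'-most occurrence of CATG
    if site_start == -1:
        return None  # transcript has no NlaIII site, so it yields no tag
    tag_start = site_start + len(ANCHOR_SITE)
    tag = seq[tag_start:tag_start + tag_length]
    if len(tag) < tag_length:
        return None  # too little sequence 3' of the site for a full tag
    return tag


if __name__ == "__main__":
    # Hypothetical sequence with two CATG sites; only the last one counts.
    example = "GGCATGAAACCCGGGTTTA" + "CATG" + "ACGTACGTAG" + "CTAGCAAAA"
    print(extract_sage_tag(example))
```

Returning `None` for a sequence without a CATG site mirrors the point made in Chapter 1 that a small subset of genes cannot be profiled because of the restriction enzymes used.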


Chapter 5
Using HPLC to Purify Ditags for SAGE Library Construction

Mette Damgaard Nielsen and Knud Josefsen

Abstract
Many SAGE workers have encountered difficulties generating concatemers of a satisfactory length. We propose that many of these problems can be attributed to purification of the 26 bp ditags by polyacrylamide gel electrophoresis (PAGE). Low yields, gel contaminants, potential exposure to degrading enzymes during handling and lengthy separation all disfavour the method. We have developed a method for the purification of 26 bp ditags based on reverse-phase high performance liquid chromatography (HPLC), using polystyrene/divinylbenzene (PS/DVB) columns and triethylammonium acetate buffer (TEAA) with acetonitrile (ACN) as the mobile phase. The whole process is fast and gives excellent results. Ditags purified by HPLC ligate more efficiently to yield high molecular weight concatemers leading to long-insert clones. The method substantially facilitates the construction of SAGE libraries.


Chapter 6
Web Tools for Statistical Analysis of SAGE Data

Chiara Romualdi and Stefania Bortoluzzi

Abstract
Serial Analysis of Gene Expression (SAGE) is an experimental technique for genome-wide analysis of gene expression. SAGE expression data are schematized as n x m matrices of values (tag counts), with n rows (cases) and m columns (values) representing different experimental conditions. Differential expression in SAGE data can be assessed with test statistics suitable for frequency-like data, while multivariate statistical techniques (classification and dimension reduction techniques) allow possible structure in the data to be investigated. In this chapter we briefly describe and compare statistical methodologies proposed for SAGE data analyses, focusing in particular on available web resources. Dedicated web pages and tools for SAGE data analysis and data retrieval have been implemented by the Johns Hopkins Oncology Center (SAGEnet), the National Center for Biotechnology Information (SAGEMap) and the Cancer Genome Anatomy Project (SAGEgenie). In addition, independent research groups have proposed web tools for more complete SAGE data analyses (IDEG6, USAGE).
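
As a rough illustration of a test statistic for frequency-like data, the sketch below applies a Pearson chi-square test (1 degree of freedom) to the counts of a single tag in two libraries. This is one common choice for count data, not necessarily the method implemented by the web tools named above. The function name, the example tags, their counts and the library sizes are all hypothetical, and the sketch assumes that rows of the matrix are tags and columns are libraries.

```python
import math


def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square test for one tag observed in two libraries.

    count_x: occurrences of the tag in library x
    total_x: total number of tags sequenced in library x
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    table = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    grand = total_a + total_b
    row_sums = [total_a, total_b]
    col_sums = [count_a + count_b, grand - count_a - count_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical n x m matrix: rows are tags, columns are two libraries.
    counts = {"TAG_A": [30, 10], "TAG_B": [25, 25]}
    library_sizes = [50000, 50000]  # assumed total tags per library
    for tag, (a, b) in counts.items():
        chi2, p = chi_square_2x2(a, library_sizes[0], b, library_sizes[1])
        print(f"{tag}: chi2={chi2:.3f}, p={p:.4g}")
```

For very low tag counts the chi-square approximation becomes unreliable, and an exact test is usually preferable. When many tags are tested at once, a multiple-testing correction would also be needed.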


Chapter 7
Conversion of MPSS Orphan Tags into 3' ESTs

Anamaria A. Camargo, Ana Paula M. Silva, Dirce M. Cararro, Jianjun Chen, and San Ming Wang

Abstract
Massively Parallel Signature Sequencing (MPSS) is a powerful technique for genome-wide gene expression analysis, which, like SAGE, relies on the production of short tags proximal to mRNA polyadenylation sites. However, because it combines in vitro cloning of cDNA molecules with non-gel-based high-throughput signature sequencing, a single MPSS experiment can generate over 10^7 tags. MPSS thus provides an unprecedented depth of analysis of the human transcriptome. A significant fraction of MPSS tags cannot be assigned to known transcripts and are likely to be derived from novel transcripts and transcript variants expressed at a very low level (~ 1 copy per cell). In order to explore the potential of MPSS for the characterization of the human transcriptome, we have adapted the GLGI protocol (Generation of Longer cDNA fragments from SAGE tags for Gene Identification) to convert MPSS tags into their corresponding 3' cDNA fragments. The extended 3' cDNAs provide higher specificity than MPSS tags for characterizing their corresponding transcripts and for distinguishing real tags from artefactual tags. Our results indicate that a significant fraction of MPSS tags corresponds to bona fide transcripts, including those not yet identified in the human genome, those generated from alternatively spliced transcripts and those from polymorphic tag sequences. The whole process is rapid and highly efficient, and will certainly accelerate the definition of the complete catalog of human transcripts.


Chapter 8
Mathematical Analysis and Modeling of the SAGE Transcriptome

Vladimir A. Kuznetsov

Abstract
In this work we show how statistical analysis and mathematical modeling can improve SAGE data analysis and estimate the number of expressed genes and their underlying gene expression level probability function (GELPF) in a cellular transcriptome. We present evidence that the underlying functional form of the GELPF is almost invariant across the different cell types of a multi-cellular organism and across different organisms. We describe a Pareto-like skewed distribution function and use it to derive a probabilistic model of the growth of a population (e.g., the number of transcripts) containing many distinct classes (e.g., transcripts encoded by distinct genes) in a collection (e.g., a SAGE library) as the sample size increases. The statistical modeling of the growth of a SAGE library shows how we can compare differentially expressed genes based on SAGE data. The model exhibits predictive power even when the SAGE database is essentially incomplete and contains erroneous tags and ambiguous tag-to-gene assignments. We develop a procedure for removing major erroneous tags and correcting redundancies in SAGE data, which allows us to overcome the limitation of using SAGE to identify the underlying GELPFs and to estimate the numbers of distinct transcripts in the yeast and human transcriptomes. The results of this work suggest that the "statistical mechanics" of the gene expression process in cells follows simple probabilistic rules, which are different from the rules of the so-called "scale-free network" models.


Chapter 9
Statistical Analysis of SAGE Data

Michael Man

Abstract
Statistical analysis is an integral part of a successful SAGE study. A simple comparison of two libraries and a more complicated comparison of two or more groups with multiple libraries in each group are discussed with examples and program code to run the analyses. Issues in planning a SAGE study and related software tools are also considered.


Chapter 10
Studies of Plant Gene Expression Using SAGE

W. Walter Lorenz and Jeffrey F.D. Dean

Abstract
SAGE has so far seen fairly limited use in plant studies, but those published include work with several of the most important model systems used for plant research -- Arabidopsis, rice, maize, tobacco and pine. The technique has been used to identify genes involved in pathogen resistance, environmental and nutritional stress responses, and the metabolism of toxic compounds. SAGE has also been used to identify genes whose promoters respond to specific transcription factors. Plant researchers using SAGE were among the first to highlight the abundance of polyadenylated antisense transcripts in eukaryotic transcriptomes, and improved LongSAGE techniques developed for plant studies hold promise for making the technique more useful for organisms that do not have abundant genome sequence available. Increasing interest in the plant community for using SAGE in genome annotation and gene discovery suggests more widespread adoption and application of this technique for plant research.


Chapter 11
SAGE Analysis of Embryonic Stem Cells

Kirill V. Tarasov, Sergey V. Anisimov, Yelena S. Tarasova, Daniel R. Riordon, Antoine Younes, Michael D. Stern, Anna M. Wobus and Kenneth R. Boheler

Abstract
Stem cells represent natural units of embryonic development and tissue regeneration. Embryonic stem (ES) cells, in particular, possess both a nearly unlimited capacity for self-renewal and a developmental potential to differentiate, in vitro and in vivo, into virtually any type of cell. The ability to coax ES cells in vitro to defined lineages is currently limited by a lack of knowledge about the mechanisms involved in stem cell fate decisions. The identification of signals that regulate pluripotentiality and self-renewal is therefore fundamental to our understanding of stem cell biology and differentiation. One method to identify these signals is through Serial Analysis of Gene Expression (SAGE), a functional genomics technique that can be used for global profiling of gene transcripts. We have employed SAGE to analyze both undifferentiated stem cells and those undergoing early steps of differentiation to cardiomyocytes. Through these analyses, we have begun the identification of molecular signals responsible for pluripotentiality and early steps of differentiation.


Chapter 12
Determining Transcriptome Differences Between Brain Territories

Michel de Chaldée, Marie-Claude Gaillard, and Jean-Marc Elalouf

Abstract
The mammalian brain is subdivided into different regions defined by anatomical and functional features that are expected to rely on unique gene expression patterns. In order to address the diversity of brain functions, we compared the transcriptomes of various adult mouse brain structures using a microadaptation of the SAGE method. We recently published our results on the striatum, nucleus accumbens, somatosensory cortex and on the whole brain. Quantitative expression data for over 11,000 transcripts in each region were deposited in the GEO repository, and a procedure was set up to find region-specific transcripts. This work demonstrates the relevance of our approach for the discovery of novel molecular markers. Here we provide a detailed commentary on the design of our project and guidelines to extract information from our data. This should be helpful for both the investigators interested in searching our data, and those planning to initiate a comparable project.


Chapter 13
Identifying Selection Signatures in Mammalian Genes Through the Analysis of Patterns of Gene Expression

Araxi O. Urrutia and Laurence D. Hurst

Abstract
SAGE and chip array large-scale expression profile datasets have opened new opportunities for the study of processes shaping gene evolution, and particularly for detecting selection signals related to protein synthesis. However, both methods have biases and sources of error. Here we discuss the relative advantages of the different forms of data and provide suggestions as to how best to curate the datasets so as to minimise biases. We then review the recent advances in the identification of selection signals in mammalian genes through whole genome and expression profile analyses. Contrary to what was previously thought, mammalian genes show signs of selection for optimisation of protein synthesis in a variety of parameters, including base codon bias, gene length, intron content and even gene position along chromosomes.


Chapter 14
SAGE Identifies Transcripts Involved in Enhanced Performance of Endurance Athlete's Muscle

Jonny St-Amand, Hiroaki Tanakam, Naoko Shono, Eric E. Snyder, Munehiro Shindo and Mayumi Yoshioka

Abstract
Physical exercise produces several adaptive changes in skeletal muscle in order to enhance physical performance. However, the molecular mechanisms of these effects are poorly understood. By using human muscle biopsies and the serial analysis of gene expression (SAGE) method, we identify the transcripts involved in enhanced endurance performance in the endurance athlete's muscle. Endurance-trained individuals had higher expression of transcripts involved in the molecular chaperone and oxidative pathways, whereas they had lower expression of genes involved in the glycogenolysis and glycolysis pathways. Moreover, twenty-three novel transcripts with elevated expression in the endurance-trained muscle were identified. The molecular characteristics of human skeletal muscle showed that, regardless of the physical activity level attained, the genes that were highly expressed were those encoding proteins of the contractile apparatus and those involved in energy metabolism. The current study analyses the most highly expressed genes and therefore allows a better understanding of global muscle characteristics in normal and endurance-trained individuals. Moreover, the current data suggest novel candidate genes that may be responsible for the enhanced endurance performance.


Chapter 15
Using SAGE to Analyze Signaling and Development in Drosophila

Heinrich Jasper

Abstract
Drosophila melanogaster has been studied extensively to gain insight into the genetic basis of biological processes ranging from development and signal transduction to cell proliferation, growth and aging. SAGE has been successfully applied to explore transcriptional regulation through signal transduction during Drosophila development. It has emerged as the method of choice for accurate expression profiling of minute amounts of tissue. Here, I review recent SAGE studies in Drosophila and discuss the accuracy, efficiency and limitations of transcript identification by SAGE in this model organism. Finally, I present a slightly modified MicroSAGE protocol that has been used successfully for transcriptome analysis of purified cell populations from the developing fly.


Chapter 16
Use of SAGE Technology to Reveal Changes in Gene Expression in Arabidopsis Undergoing Cold Stress

Dong-Hee Lee, Yun-Jung Byun, Ji-Yeon Lee, and Sun-Hee Jung

Abstract
We have characterized the global gene expression patterns of Arabidopsis leaves and pollen using SAGE. A total of 85,226 SAGE tags were identified. The transcript profiles of leaves and pollen accurately reflect the characteristics of leaves as a principal site of photosynthesis and pollen as a reproductive organ, respectively. Functional analysis of annotated tags indicated that a significant proportion of the genes expressed in normal leaves were involved in energy and metabolism, especially in photosynthesis. In contrast, genes involved in cellular biogenesis, such as polygalacturonase, pectate lyase and pectin methylesterase, comprise more than 40% of the total transcripts in pollen. However, genes involved in energy and protein synthesis, which are prevalent in leaves, were expressed at a relatively low level. Interestingly, the number of unique tags in pollen was low compared to the SAGE library of the leaf constructed on a similar scale. To systematically analyze differential gene expression profiles under cold stress, a SAGE tag library from leaves cold-treated at 0 °C for 72 hr was constructed and analyzed. A comparison of the tags derived from the cold-treated leaves with those identified in the normal leaves revealed 272 differentially expressed genes (P < 0.01): 82 genes were highly expressed in the normal leaves and 190 genes were highly expressed in the cold-treated leaves. After cold stress, in general, many of the genes involved in cell rescue/defense/cell death/aging, protein synthesis, metabolism, transport facilitation, and protein destination were induced. They included various COR genes, lipid transfer protein genes, alcohol dehydrogenase, β-amylase and many novel genes. By comparison, down-regulated genes were mostly photosynthesis-related genes involved in energy metabolism.
While the expression of many genes was altered in leaves by cold treatment, the expression level of the majority of transcripts in pollen was unaffected by cold treatment. Interestingly, many genes thought to be responsible for cold acclimation, such as COR, lipid transfer protein and β-amylase, which are highly induced in leaves, were only expressed at their normal level or weakly induced in the pollen, suggesting that poor accumulation of proteins that play a role in stress tolerance may explain why Arabidopsis pollen is cold sensitive. Altogether, the data presented here will provide useful information for understanding tissue-specific gene expression and the mechanism of freezing tolerance in plants.


Chapter 17
SAGE Analysis of Age- and Sex-Associated Changes in Cardiac Gene Expression

Sergey V. Anisimov, Kirill V. Tarasov, Edward G. Lakatta and Kenneth R. Boheler

Abstract
Aging and age-related diseases are associated with altered patterns of gene expression, involving quantitative and qualitative changes in the abundance of gene transcripts. A complete and simultaneous analysis of transcript abundance should therefore lead to important insights into the transcriptional mechanisms underlying the aging process. Serial analysis of gene expression (SAGE) allows rapid, large-scale expression profiling, which provides information about the dynamics of total gene expression with age, and can be employed to identify candidate genes that may serve as diagnostic and prognostic markers in age-associated cardiac diseases. The accompanying gene predictions from high-throughput gene expression profiling provide a starting point to understand the function, the complexity of interactions, and the role of genes in promoting cellular/organismal phenotypes during senescence and in response to disease. We have employed this high-throughput gene expression profiling technique to establish a reference dataset from C57Bl/6 mouse heart. We demonstrate, by comparisons with microarrays, EST libraries and PCR-based techniques that this catalog is quantitatively representative of the female mouse cardiac transcriptome, and that it is a valuable resource for comparative analyses. More recently, we generated SAGE libraries from young and old adult male mouse hearts. Although preliminary, our comparisons between males and females demonstrate that sex (gender) is a major contributor to altered gene expression in mice, indicating that it is critical to account for biological diversity (e.g., sex) when evaluating transcriptomes among mouse tissues with aging.


Chapter 18
Reverse Transcriptome: Identification of Novel Transcripts Through Novel SAGE Tags

San Ming Wang

Abstract
In eukaryotic genomes, a minority of genes are expressed at high levels and contribute the majority of total transcripts, whereas the majority of genes are expressed at low levels and contribute only a minority of the total transcripts. The quantitative differences among the transcripts span over six orders of magnitude. It is therefore a challenge to identify the full set of transcripts expressed in a eukaryotic genome, particularly the low-abundance transcripts. While the EST approach has reached its limit of sensitivity for identifying the lower-abundance transcripts, SAGE, with its higher sensitivity than EST, has collected a large number of novel SAGE tags representing novel, lower-abundance transcripts. Here I propose a "Reverse Transcriptome" approach for large-scale identification of novel transcripts through the use of novel SAGE tags.
