1. Introduction

 

Scientists routinely lecture and write about gene expression and the abundance of transcripts, but in reality they extrapolate this information from measurements provided by a variety of technologies. Indeed, there are many reasons why different technologies applied to the question of transcript abundance may give different results, whether because of an incomplete understanding of the gene in question or because of shortcomings in how the technologies are applied.

 

The first key factor to appreciate in measuring gene expression is how genes are organized and how this organization influences the transcripts present in a cell. Figure 1 depicts some of the scenarios that sequence analyses of the human genome have revealed. Most genes are composed of multiple exons that are transcribed together with intron sequences and then spliced together. Some genes lie entirely between the exons of other genes, in either the forward or the reverse orientation. This poses a problem because a recovered fragment or clone could belong to multiple genes, be derived from an unspliced transcript, or result from genomic DNA contaminating the RNA preparation. Each of these events can produce confusing and confounding results. Additionally, gene duplication events in more complex organisms have produced families of closely related genes, which may also lie near one another in the genome. Finally, although there are probably fewer than 50,000 human genes, the exons within those genes can be spliced together in a variety of ways, and some genes have been documented to produce more than 100 different transcripts (1).
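To see how quickly alternative splicing multiplies transcript numbers, consider a minimal Python sketch. It assumes a hypothetical gene model (the names `GENE` and `enumerate_isoforms` are illustrative, not taken from any library) in which each cassette exon is independently included or skipped. Real splicing is more constrained, with mutually exclusive exons and alternative splice sites, so this only illustrates the combinatorics.

```python
from itertools import product

# Hypothetical gene model: exons in genomic order, each flagged True if it is an
# optional (cassette) exon and False if it is constitutive (always spliced in).
GENE = [("E1", False), ("E2", True), ("E3", False),
        ("E4", True), ("E5", True), ("E6", False)]


def enumerate_isoforms(gene):
    """Return every exon combination, assuming cassette exons are independent."""
    cassettes = [name for name, optional in gene if optional]
    isoforms = []
    # Each cassette exon is either skipped (False) or included (True).
    for choices in product((False, True), repeat=len(cassettes)):
        keep = dict(zip(cassettes, choices))
        isoforms.append(tuple(name for name, optional in gene
                              if not optional or keep[name]))
    return isoforms


isoforms = enumerate_isoforms(GENE)
print(len(isoforms))  # 2**3 = 8 isoforms from three cassette exons
```

Under this simplified model, n independent cassette exons give 2**n transcripts, so a gene with seven such exons would already exceed 100 distinct forms.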

 

( Methods in Embryo Transplant Microscopes and Emyro Transplant Microscopy  Vol. 258: Gene Expression Profiling: Methods and Protocols )

Therefore, there may be several hundred thousand distinct transcripts, many of which share sequences. Gene biology is more complex still, because genetic variation in the form of single nucleotide polymorphisms (SNPs) frequently gives humans and other diploid or polyploid model systems two (or more) distinct versions of the same transcript. Together, these facts rule out the possibility that a single, simple technology can accurately measure the abundance of a specific transcript. Most technologies probe for the presence of pieces of a transcript, and these probes can be confounded by closely related genes, overlapping genes, incomplete splicing, alternative splicing, genomic DNA contamination, and genetic polymorphisms. Independent methods that verify the results in different ways, excluding these confounding variables, are therefore necessary to gain a clear understanding of expression data, although they are frequently not employed. Specific means of working around these confounding variables are mentioned here, but a blend of techniques will be necessary to achieve success.
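The point that most technologies detect pieces of transcripts, rather than whole transcripts, can be sketched in Python. The sequences and names below are invented for illustration, and exact substring matching stands in for hybridization, which in reality tolerates mismatches. A probe inside a shared exon reports all related transcripts together, whereas a probe spanning a specific exon junction separates one isoform from the others and from unspliced or genomic sequence.

```python
# Invented exon and intron sequences for a hypothetical gene X.
EXON_A = "ATGGCCTTAGCAGGTC"
EXON_B = "GGATCCAAGTTCGA"
EXON_C = "TTCAGGCATCGATG"
INTRON = "GTAAGTCCTTAG"

TRANSCRIPTS = {
    "geneX_isoform1": EXON_A + EXON_B,            # A-B splice form
    "geneX_isoform2": EXON_A + EXON_C,            # A-C splice form
    "geneX_unspliced": EXON_A + INTRON + EXON_B,  # retained intron (or genomic DNA)
}


def probe_hits(probe, transcripts):
    """Names of all transcripts containing the probe as an exact substring."""
    return sorted(name for name, seq in transcripts.items() if probe in seq)


shared_exon_probe = EXON_A                  # lies in an exon common to all forms
junction_probe = EXON_A[-6:] + EXON_B[:6]   # spans the A-B splice junction

print(probe_hits(shared_exon_probe, TRANSCRIPTS))  # all three transcripts
print(probe_hits(junction_probe, TRANSCRIPTS))     # only geneX_isoform1
print(probe_hits(INTRON, TRANSCRIPTS))             # only the unspliced form
```

An intron-only probe, as in the last line, is one way to flag signal that may come from unspliced RNA or contaminating genomic DNA.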

 

2. Methods and Considerations

 

There are nine basic considerations in choosing a technology for quantitating gene expression: architecture, specificity, sensitivity, sample requirement, coverage, throughput, cost, reproducibility, and data management.
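One way to make the trade-offs among these nine considerations explicit is a weighted decision matrix. The Python sketch below is an illustration only: the weights, the 1 to 5 scores, and the platform names are hypothetical placeholders that each laboratory would set for its own project.

```python
CONSIDERATIONS = ("architecture", "specificity", "sensitivity",
                  "sample_requirement", "coverage", "throughput",
                  "cost", "reproducibility", "data_management")


def weighted_score(weights, scores):
    """Weighted mean of per-consideration scores (higher is better)."""
    missing = [c for c in CONSIDERATIONS if c not in weights or c not in scores]
    if missing:
        raise KeyError(f"missing considerations: {missing}")
    total_weight = sum(weights[c] for c in CONSIDERATIONS)
    return sum(weights[c] * scores[c] for c in CONSIDERATIONS) / total_weight


# Hypothetical project that values sensitivity three times as much as the rest.
weights = {c: 1 for c in CONSIDERATIONS}
weights["sensitivity"] = 3

# Hypothetical scores for two unnamed platforms.
platform_a = {c: 3 for c in CONSIDERATIONS}
platform_a["sensitivity"] = 5
platform_b = {c: 4 for c in CONSIDERATIONS}
platform_b["sensitivity"] = 2

for name, scores in (("platform_a", platform_a), ("platform_b", platform_b)):
    print(name, round(weighted_score(weights, scores), 2))
```

Here platform_a ranks first only because of the heavy sensitivity weight; with equal weights the ranking reverses, which is why the weights deserve as much scrutiny as the scores.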

 

2.1. Architecture

 

We define the architecture of a gene-expression analysis system as either open, in which case it is possible to discover novel genes, or closed, in which case only known genes are queried. Depending on the application, open systems have numerous advantages. For example, an open system may detect a relevant biological event that affects splicing or genetic variation. In addition, the most innovative biological discovery processes have involved the discovery of novel genes. In an era in which multiple genomes have been sequenced, however, this may no longer be the case. Even so, the genomic sequence of an organism has not proven sufficient to determine all of the transcripts encoded by that genome, so prospects for novelty remain regardless of the biological system. In model systems that are relatively uncharacterized at the genomic or transcript level, entire technology platforms may be ruled out. For example, if one is studying transcript levels in a rabbit, one cannot comprehensively apply a hybridization technology, because too few rabbit transcripts are known for this to be of value. If one simply wants to know the levels of a set of known genes in an organism, a hybridization technology may be the most cost-effective choice, provided the number of genes is large enough to justify the cost of producing a gene array.
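The difference between the two architectures can be sketched with a toy tag-counting example in Python. The observed tags, the gene panel, and the function names are hypothetical. The point is only that a closed system reports values for the genes it was built to query (including zeros), while an open system counts whatever it sees and can therefore flag tags that match no known gene.

```python
from collections import Counter


def closed_system(observed_tags, panel):
    """Report counts only for genes on a predefined panel; others are invisible."""
    counts = Counter(observed_tags)
    return {gene: counts[gene] for gene in panel}


def open_system(observed_tags, known_genes):
    """Count every observed tag and separate out those matching no known gene."""
    counts = Counter(observed_tags)
    novel = {tag: n for tag, n in counts.items() if tag not in known_genes}
    return dict(counts), novel


observed = ["actin", "actin", "gapdh", "tag_unknown_1"]
panel = ["actin", "gapdh", "epo"]

print(closed_system(observed, panel))        # epo reported as 0; tag_unknown_1 never seen
print(open_system(observed, set(panel))[1])  # {'tag_unknown_1': 1}
```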

 

2.2. Specificity

 

The evolution of genomes through duplications of genes or chromosomal fragments, and the subsequent selection for their retention, has produced many gene families, some of which are substantially conserved at the protein and nucleotide levels. A technology's ability to discriminate between closely related gene sequences must be evaluated in this context to determine whether one is measuring the level of a single transcript or the combined levels of multiple transcripts detected by the same probe. This is a double-edged sword, because a highly specific technology may fail to detect one allele of a genetic polymorphism, or may detect it to a different degree than the other allele. This can produce a false-positive expression differential or a false-negative result in which no expression is detected at all. Many methods address this by surveying multiple samples of the same class and by probing multiple points on the same gene. Methods that do this effectively are preferred to those that do not.
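The double-edged nature of specificity can be illustrated with a simplified Python model in which a probe "detects" a target if some ungapped window of the target differs from it by at most a chosen number of mismatches. Real hybridization depends on thermodynamics rather than a simple mismatch count, and the sequences below are invented: allele 2 carries a single SNP relative to allele 1, and the paralog differs from the probe at two positions.

```python
PROBE = "ACGTTGCA"
ALLELE_1 = "TTTTACGTTGCATTTT"  # perfect match
ALLELE_2 = "TTTTACGATGCATTTT"  # one SNP (T -> A) inside the probed region
PARALOG = "TTTTACGATGGATTTT"   # related gene: two differences


def best_mismatches(probe, target):
    """Fewest mismatches between the probe and any ungapped window of the target."""
    k = len(probe)
    if k > len(target):
        raise ValueError("probe is longer than target")
    return min(sum(p != t for p, t in zip(probe, target[i:i + k]))
               for i in range(len(target) - k + 1))


def detected(probe, target, max_mismatches):
    """Simplified detection call: best window within the mismatch tolerance."""
    return best_mismatches(probe, target) <= max_mismatches


for tolerance in (0, 1, 2):
    calls = [detected(PROBE, t, tolerance) for t in (ALLELE_1, ALLELE_2, PARALOG)]
    print(tolerance, calls)
```

At zero tolerance the probe misses allele 2, a false negative that could also masquerade as an expression differential between individuals of different genotypes; at a tolerance of two it can no longer separate the gene from its paralog. Probing several independent regions of the same gene reduces the chance that a single SNP decides the outcome.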

 

2.3. Sensitivity

 

The ability to detect low-abundance transcripts is integral to gene discovery programs. Low-abundance transcripts, in principle, have properties of particular importance to the study of complex organisms. Rare transcripts frequently encode proteins present at low physiological concentrations, which in many cases are potent by their very nature. Erythropoietin is a classic example of such a rare transcript: Amgen scientists functionally cloned erythropoietin long before it appeared in the public expressed sequence tag (EST) database. Genes are frequently discovered in order of transcript abundance, and a simple analysis of EST databases correctly reveals high-, medium-, and low-abundance transcripts by a direct correlation with the number of occurrences in that database (data not shown). Thus, a more sensitive technology has the potential to identify novel transcripts even in a well-studied system. Published sensitivity values for available technologies range from 1 part in 50,000 to 1 part in 500,000. These figures should be interpreted cautiously, however, by examining both the method by which the sensitivity was determined and the sensitivity needed for the intended use. For example, if one intends to study appetite-signaling factors and uses an entire rat brain for expression analysis, the target cells are diluted to anywhere from 1 part in 10,000 to 1 part in 100,000, so that only the most abundant transcripts in those rare cells can be measured, even with the most sensitive technology available. Relying on cell models for the same type of analysis, where possible, introduces the confounding variable that isolated cells or cell lines may respond differently in culture at the level of gene expression.
An ideal scenario would be to carefully microdissect or sort the cells of interest and study them directly, provided enough sample can be obtained.

In addition to a technology's ability to measure rare transcripts, its sensitivity in discerning small differences in transcript levels must be considered. The reported limit for detecting differentials ranges from 1.5-fold to 5-fold across techniques, so the user must decide how important small modulations are to the overall project and take this property into account when choosing a technology.
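The rat-brain dilution argument and the fold-change limit can both be made concrete with a few lines of Python. The sketch assumes that every cell contributes the same amount of mRNA, so a transcript making up a fraction f of the rare cells' mRNA is present at f times the cell fraction in the whole tissue. The figures plugged in are the ranges quoted above, and the function names are hypothetical.

```python
def min_share_in_rare_cells(detection_limit, cell_fraction):
    """Smallest share of the rare cells' mRNA a transcript must represent to be
    detectable in the whole tissue (values above 1.0 mean undetectable).
    Assumes all cells contribute equal amounts of mRNA."""
    return detection_limit / cell_fraction


def resolvable(fold_change, threshold):
    """True if an up- or down-regulation meets the technology's fold-change limit."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    magnitude = fold_change if fold_change >= 1 else 1 / fold_change
    return magnitude >= threshold


for limit in (1 / 50_000, 1 / 500_000):
    for cells in (1 / 10_000, 1 / 100_000):
        share = min_share_in_rare_cells(limit, cells)
        print(f"limit 1 in {round(1 / limit):,}, target cells 1 in {round(1 / cells):,}: "
              f"transcript must be {share:.0%} of rare-cell mRNA")

# A 2-fold change is resolvable at a 1.5-fold limit but not at a 5-fold limit.
print(resolvable(2.0, 1.5), resolvable(2.0, 5.0), resolvable(0.5, 1.5))
```

Under these assumptions, even the best quoted sensitivity requires a transcript to account for 2% to 20% of the rare cells' mRNA, and the least sensitive requires 20% or more (an impossible 200% in the most diluted case), which is why only the most abundant messages in those cells can be measured.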

 

2.4. Sample Requirement

 

Studying transcript abundance requires a cell or tissue substrate, and the amount of material needed can be prohibitively high for many technologies in many model systems. To use the example above, dozens of dissected rat hypothalami may be required for a global gene expression study, depending on the quantitation technology chosen. Samples procured by laser-capture microdissection can be used to measure only a small number of transcripts, and only with some technologies, unless they are first subjected to amplification, which risks artificially altering transcript ratios.
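Why amplification can distort transcript ratios is easiest to see with a simple exponential model, sketched here in Python. The model assumes PCR-style amplification in which each transcript is copied with a constant per-cycle efficiency; the text does not specify an amplification method, and the efficiencies and cycle number below are hypothetical.

```python
def ratio_after_amplification(initial_ratio, eff_a, eff_b, cycles):
    """A:B ratio after `cycles` rounds in which A and B are copied with
    per-cycle efficiencies eff_a and eff_b (0 = no copying, 1 = perfect doubling)."""
    return initial_ratio * ((1 + eff_a) / (1 + eff_b)) ** cycles


# Two transcripts present at 1:1 whose amplification efficiencies differ by 5 points.
print(round(ratio_after_amplification(1.0, 0.95, 0.90, 20), 2))
```

Under these assumptions, a 5-percentage-point efficiency difference turns a 1:1 ratio into roughly 1.7:1 after 20 cycles, a distortion comparable to the smallest differentials some technologies aim to detect.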

 

2.5. Coverage

 

For open-architecture systems, where the objective is to profile as many transcripts as possible and identify new genes, the number of independent transcripts being measured is an important metric. However, it is one of the most difficult parameters to measure, because it is not possible to determine directly what fraction of unknown transcripts is missing. Despite this difficulty, predictive models can be built to estimate coverage, and an intuitive understanding of the technology is a good gauge of the relevance and accuracy of the predictive model.
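A common predictive model for coverage, applicable to sampling-based open methods such as EST or SAGE tag counting, treats each sequenced tag as an independent random draw from the transcript pool. The Python sketch below assumes that model and ignores cloning, sequencing, and tag-mapping biases; the function names are hypothetical. The probability of seeing a transcript of fractional abundance p at least once in n tags is then 1 - (1 - p)^n.

```python
import math


def detection_probability(abundance, tags_sampled):
    """P(a transcript at this fractional abundance is seen at least once)."""
    # 1 - (1 - p)**n, computed with log1p/expm1 for accuracy when p is tiny.
    return -math.expm1(tags_sampled * math.log1p(-abundance))


def tags_needed(abundance, confidence):
    """Smallest number of tags giving at least `confidence` chance of detection."""
    return math.ceil(math.log1p(-confidence) / math.log1p(-abundance))


for abundance in (1 / 50_000, 1 / 500_000):
    print(f"1 in {round(1 / abundance):,}: ~{tags_needed(abundance, 0.95):,} tags for 95%")
```

Under this model, detecting a 1-in-50,000 transcript with 95% probability takes roughly 150,000 tags, and a 1-in-500,000 transcript roughly 1.5 million, which shows how coverage claims depend on sampling depth.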

Incomplete coverage offers one of the most embarrassing explanations for why hundreds of scientific publications produced in the 1970s and 1980s have relatively little value. Many of these papers reported the identification of a single differentially expressed gene in some model system and expounded upon the overwhelmingly important new biological pathway it had uncovered. Modern analysis has demonstrated that even in the most similar biological systems or states, it is common to find differences in 1% of transcripts, and this number rises to 20% or more when major changes in growth or activation state are signaled. In fact, the activation of a single transcription factor can induce the expression of hundreds of genes. Studying one abundantly altered transcript without knowing which other transcripts are altered is like the independent observers who each describe the small part of an elephant they can see. The person looking at the trunk describes the elephant as long and thin, the person observing an ear believes it to be flat, soft, and furry, and the observer examining a foot describes it as hard and wrinkly. Seeing the list of most of the transcripts altered in a system is like looking at the entire elephant; only then can it be accurately described. Separating the key regulatory genes on a gene list from the irrelevant changes remains one of the biggest challenges in transcript profiling.

 

2.6. Throughput

 

The throughput of a technology, defined as the number of transcript samples measured per unit time, is an important consideration for some projects. When quick turnaround is desired, printing microarrays is impractical, but where large numbers of data points must be generated, techniques that require individual reactions are impractical. Where large experiments on new models carry significant expense, it may be practical to run a smaller, lower-throughput assay as a control before making the large investment. For example, before conducting a comprehensive gene profiling experiment in a drug dose-response model, it might be practical to first use a low-throughput technique to confirm the relevance of the samples.
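The trade-off between per-reaction techniques and massively parallel ones can be expressed as simple arithmetic on data points per run. In this Python sketch, the run capacities and run times are hypothetical placeholders rather than specifications of any instrument, and each run is assumed to be limited only by how many measurements it can hold (array printing lead time is not modeled).

```python
import math


def hours_required(samples, transcripts_per_sample, measurements_per_run, hours_per_run):
    """Instrument time needed when each run yields a fixed number of measurements."""
    data_points = samples * transcripts_per_sample
    runs = math.ceil(data_points / measurements_per_run)
    return runs * hours_per_run


# Hypothetical profiling project: 20 samples x 10,000 transcripts.
print(hours_required(20, 10_000, measurements_per_run=10_000, hours_per_run=4))  # array-style run
print(hours_required(20, 10_000, measurements_per_run=96, hours_per_run=2))      # one reaction per well
# Hypothetical pilot: 4 samples x 10 marker transcripts fits in a single small run.
print(hours_required(4, 10, measurements_per_run=96, hours_per_run=2))
```

With these placeholder numbers, the per-reaction approach needs thousands of instrument hours for the full project but only one run for a pilot, which is the logic behind checking sample relevance with a low-throughput method first.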

 

2.7. Cost

 

Cost can be an important driver in deciding which technologies to employ. For some methods, substantial capital investment is required to obtain the equipment needed to generate the data. Thus, one must determine whether a microarray scanner or a capillary electrophoresis machine is obtainable, or whether X-ray film and a developer must suffice. It should be noted that as large companies change platforms, used equipment becomes available at prices dramatically lower than those of new models. In some cases, homemade equipment can serve the purpose as well as commercial apparatus at a fraction of the price.
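Whether a capital investment pays off depends on how many samples will be run. The Python sketch below computes a simple break-even point between a platform with high capital but low per-sample cost and one with the opposite profile. The dollar figures are hypothetical and ignore maintenance, labor, and depreciation.

```python
import math


def total_cost(capital, per_sample, samples):
    """Up-front equipment cost plus running cost for a number of samples."""
    return capital + per_sample * samples


def break_even_samples(capital_a, per_sample_a, capital_b, per_sample_b):
    """Smallest sample count at which platform A is strictly cheaper than B,
    or None if A's per-sample cost is never lower."""
    if per_sample_a >= per_sample_b:
        return None
    threshold = (capital_a - capital_b) / (per_sample_b - per_sample_a)
    return max(0, math.floor(threshold) + 1)


# Hypothetical: scanner-based platform (A) vs. film-based method (B).
n = break_even_samples(100_000, 50, 5_000, 250)
print(n, total_cost(100_000, 50, n), total_cost(5_000, 250, n))
```

With these figures, platform A becomes cheaper from the 476th sample onward; buying used equipment lowers `capital_a` and pulls the break-even point down proportionally.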

 

2.8. Reproducibility

 

Producing consistent, trustworthy data is desirable, but highly reproducible data have value beyond the confidence they lend to one's conclusions. The ability to integrate findings forward, comparing results achieved today with those achieved last year and next year without repeating the experiments, is key to managing large projects successfully. Changing transcript-profiling technologies often yields datasets that are not directly comparable, so choosing and persevering with a particular technology greatly benefits the analysis of data in aggregate. An excellent example is the serial analysis of gene expression (SAGE) technique, with which many investigators have generated directly comparable data over the course of many years.
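Part of what makes digital tag counts such as SAGE data comparable across laboratories and years is that they can be rescaled to a common library size. A minimal Python sketch, assuming simple tags-per-million scaling (the library contents are invented), shows two libraries of different depth giving identical normalized values for the same tag.

```python
def tags_per_million(counts):
    """Scale raw tag counts so that every library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty library")
    return {tag: 1_000_000 * n / total for tag, n in counts.items()}


# Invented libraries sequenced to different depths (50,000 vs 100,000 tags).
library_1 = {"tag_a": 50, "tag_b": 49_950}
library_2 = {"tag_a": 100, "tag_b": 99_900}

print(tags_per_million(library_1)["tag_a"], tags_per_million(library_2)["tag_a"])
```

This scaling does not correct for biases that differ between protocols, which is one reason staying with a single technology still matters.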

 

2.9. Data Management

 

Management and analysis of data are the natural continuation of the discussion of reproducibility and integration. Some techniques, such as differential display, produce complex data sets that are neither reproducible enough for subsequent comparisons nor easily digitized. Microarray and GeneCalling data, however, can be analyzed with software packages that determine the statistical significance of the findings and can even organize them by molecular function or biochemical pathway. Such tools are a substantial advance in the generation of cumulative data. The field of bioinformatics is flourishing because the number of data points generated by high-throughput technologies has rapidly exceeded the number of biologists available to analyze them.
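As an example of the kind of statistic such software reports, the Python sketch below applies a two-proportion z-test (normal approximation) to one tag's counts in two libraries. This generic textbook test is chosen for illustration only; it is not the method used by any particular microarray or GeneCalling package, and it is unreliable for very low counts, where an exact test is preferable.

```python
import math


def two_proportion_z(count_1, total_1, count_2, total_2):
    """Two-sided z-test for a difference in a tag's frequency between two libraries.
    Returns (z, p_value) using the normal approximation."""
    p1, p2 = count_1 / total_1, count_2 / total_2
    pooled = (count_1 + count_2) / (total_1 + total_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_1 + 1 / total_2))
    if se == 0:
        raise ValueError("tag absent (or ubiquitous) in both libraries")
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Invented counts: 50 vs 10 occurrences in two libraries of 50,000 tags each.
z, p = two_proportion_z(50, 50_000, 10, 50_000)
print(f"z = {z:.2f}, p = {p:.1e}")
```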

 

Reference

1. Ushkaryov, Y. A. and Südhof, T. C. (1993) Neurexin IIIα: extensive alternative splicing generates membrane-bound and soluble forms. Proc. Natl. Acad. Sci. USA 90, 6410–6414.

Gene Expression Quantitation Technology Summary

Bulaqueña, et al.

 

Summary

 

Scientists routinely talk and write about gene expression and the abundance of transcripts, but in reality they extrapolate this information from measurements provided by a variety of technologies. Indeed, there are many reasons why applying different technologies to the problem of transcript abundance may give different results, owing to an incomplete understanding of the gene in question or to shortcomings in how the technologies are applied. Nine basic considerations in choosing a technology for quantitating gene expression affect the overall outcome: architecture, specificity, sensitivity, sample requirement, coverage, throughput, cost, reproducibility, and data management. These considerations are discussed in the context of available technologies.

 

Key Words: Architecture, bioinformatics, coverage, quantitative, reproducibility,  sensitivity, specificity, throughput  



Author:
admin
Time:
Monday, May 26th, 2008 at 2:58 am
Category:
Embryo Transplant Microscope