Owing to the intense interest of many groups in determining transcript levels in a variety of biological systems, a large number of methods have been described for gene-expression profiling. Although the full catalog of techniques developed is quite extensive, many are variations on similar themes, and thus we have reduced what we present here to those techniques that represent a distinct technical concept. Within these groups, we found methods that are no longer applied in the scientific community, not even in the inventor’s laboratory. Thus, we have chosen to focus the methods chapters of this volume on techniques that are in common use in the community
( Methods in Embryo Transplant Microscopes and Emyro Transplant Microscopy Vol. 258: Gene Expression Profiling: Methods and Protocols )
at the time of this writing. This work also introduces two novel technologies, SEM-PCR and the Invader Assay, that have not been described previously. Although these methods have not yet been formally peer-reviewed by the scientific community, we feel these approaches merit serious consideration.
In general, methods for determining transcript levels can be based on transcript visualization, transcript hybridization, or transcript sequencing (Table 1). The principle of transcript visualization methods is to generate transcripts with some visible label, such as radioactivity or fluorescent dyes, to separate the different transcripts present, and then to quantify, by virtue of the label, the relative amount of each transcript present. Real-time methods, which measure label while a transcript is in the process of being linearly amplified, offer an advantage in some cases over methods where a single time-point is measured. Many of these methods employ the polymerase chain reaction (PCR), which is an effective way of increasing copies of rare transcripts and thus makes the techniques more sensitive than those without amplification steps. The risk of any amplification step, however, is the introduction of amplification biases that occur when different primer sets are used or when different sequences are amplified. For example, two different genes amplified with gene-specific primer sets in adjacent reactions may be at the same abundance level, but because of a thermodynamic advantage of one primer set over the other, one of the genes might give a more robust signal. This property is a challenge to control, except by multiple independent measurements of the same gene. In addition, two allelic variants of the same gene may amplify differently if the polymorphism affects the secondary structure of the amplified fragment, and thus genetic variation in the system may produce an incorrect result.
As one can imagine, transcript visualization methods do not provide an absolute quantity of transcripts per cell, but are most useful in comparing transcript abundance among multiple states.
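The amplification bias described above can be illustrated with a minimal sketch. It assumes the textbook model in which each PCR cycle multiplies the template pool by (1 + efficiency); the function name, the efficiencies, and the copy numbers are hypothetical values chosen for illustration, not measurements from any method in this volume.

```python
def amplified_copies(start_copies: float, efficiency: float, cycles: int) -> float:
    """Copies after `cycles` rounds, each multiplying the pool by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


# Two genes at the SAME starting abundance (hypothetical numbers),
# amplified with primer sets of slightly different efficiency.
gene_a = amplified_copies(1000, efficiency=0.95, cycles=25)
gene_b = amplified_copies(1000, efficiency=0.85, cycles=25)

# Apparent abundance ratio that arises purely from the primer sets.
signal_ratio = gene_a / gene_b
print(f"apparent A/B ratio after 25 cycles: {signal_ratio:.2f}")
```

Under these assumed efficiencies the two genes differ more than threefold in signal despite equal starting abundance, which is why comparisons of the same gene across states are more reliable than comparisons between genes.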
Transcript hybridization methods have a different set of advantages and disadvantages. Most hybridization methods utilize a solid substrate, such as a microarray, on which DNA sequences are immobilized. Labeled test DNA or RNA is then annealed to the solid support, and the locations and intensities of the signal on the support are measured. In another embodiment, transcripts present in two samples at the same levels are removed in solution, and only those present at differential levels are recovered. This suppression subtractive hybridization method can identify novel genes, unlike hybridization to a solid support, where the information generated is limited to the gene sequences placed on the array. Limitations to hybridization are those of specificity and sensitivity. In addition, the position of the probe sequence, typically 20–60 nucleotides in length, is critical to the detection of a single splice variant or of multiple splice variants. Hybridization methods employing cDNA libraries instead of synthetic oligonucleotides give inconsistent results, for example because of variations in splicing, and do not allow testing the levels of putative transcripts predicted from genomic DNA sequence. Hybridization specificity can be addressed directly when the genome sequence of the organism is known, because oligonucleotides can be designed specifically to detect a single gene and to exclude the detection of related genes. In the absence of this information, the oligonucleotides cannot be designed to assure specificity, but there are some guidelines that lead to success. Protein-coding regions are more conserved at the nucleotide level than untranslated regions, so avoiding translated regions in favor of regions less likely to be conserved is useful. However, a substantial amount of alternative splicing occurs immediately distal to the 3′ untranslated region, and thus designing probes in proximity to regions following the termination codon may be ideal in many cases.
Regions containing repetitive elements, which may occur in the untranslated regions of transcripts, should be avoided. Several issues make the measurement of transcript levels by hybridization a relative measurement and not an absolute measurement. Those experienced with hybridization reactions recognize the different properties of sequences annealing to their complementary sequences, and thus empirical optimization of temperatures and wash conditions has been integrated into these methods. Principal disadvantages of hybridization methods, in addition to those of any closed system, center around the analysis of what is actually being measured. Typically, small regions are probed, and if an oligonucleotide is designed to a region that is common to multiple transcripts or splice variants, the resulting intensity values may be misleading. If the oligonucleotide is designed to an exon that is not used in one sample of a comparison, the results will indicate lack of expression, which is incorrect. In addition, hybridization methods may be less sensitive and may yield a negative result when a positive result is clearly present through visualization.
The final class of technologies that measure transcript levels, transcript sequencing and counting methods, can provide absolute levels of a transcript in a cell. These methods involve capturing the identical piece of all genes of interest, typically the 3′ end of the transcript, and sequencing a small piece. The number of times each piece was sequenced can be a direct measurement of the abundance of that transcript in that sample. In addition to absolute measurement, other principal advantages of this method include the simplicity of data integration and analysis and a general lack of problems with similar or overlapping transcripts. Principal disadvantages include time and cost, as well as the fact that determining the identity of a novel gene from only the 10-nucleotide tag is not trivial.
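The counting principle behind the sequencing methods can be sketched in a few lines. This is an illustration only: the function name and the 10-nucleotide tags below are hypothetical, and a real pipeline would first extract tags from sequence reads and map them to genes.

```python
from collections import Counter


def tag_abundance(tags):
    """Return {tag: (count, fraction of all tags)} for a list of sequenced tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.items()}


# Hypothetical 10-nucleotide tags recovered from one sample.
observed = ["ACGTACGTAC", "ACGTACGTAC", "TTGACCATGA", "ACGTACGTAC", "GGCATTCAGT"]
abundance = tag_abundance(observed)

# List tags from most to least frequently sequenced.
for tag, (n, frac) in sorted(abundance.items(), key=lambda kv: -kv[1][0]):
    print(tag, n, f"{frac:.2f}")
```

Because every transcript contributes the same kind of tag, counts from different samples can be combined or compared directly, which reflects the simplicity of data integration noted above.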
We would like to mention two additional considerations before providing detailed descriptions of the most popular techniques. The first is contamination
Table 1
Common Gene Expression Profiling Methods
Technique | Class | Architecture | Kits Available | Service Available | Detect Alt. Splicing | Detect SNPs
5′-nuclease assay/real-time RT-PCR | Visualization | Open | Yes | No | No | No
AFLP (amplified-fragment length polymorphism fingerprinting) | Visualization | Open | No | No | No | Yes
Antisense display | Visualization | Open | No | No | No | No
DDRT-PCR (differential display RT-PCR) | Visualization | Open | Yes | No | No | No
DEPD (digital expression pattern display) | Visualization | Open | No | No | Yes | No
Differential hybridization (differential cDNA library screening) | Hybridization | Open | No | No | No | No
DSC (differential subtraction chain) | Hybridization | Open | No | No | No | No
GeneCalling | Visualization | Open | No | Yes | Yes | Yes
In situ Hybridization | Hybridization | Closed | Yes | No | No | No
Invader Assay | Visualization | Closed | Yes | Yes | No | Yes
Microarray hybridization | Hybridization | Closed | Yes | Yes | No | No
Molecular indexing (and computational methods) | Visualization | Open | No | No | No | No
MPSS (massively parallel signature sequencing) | Sequencing | Open | No | No | No | No
Northern-Blotting (Dot-/Slot-Blotting) | Hybridization | Closed | Yes | No | No | No
Nuclear run on assay/nuclease S1 analysis | Visualization | Closed | Yes | No | No | No
ODD (ordered differential display) | Visualization | Open | No | No | No | No
Quantitative RT-PCR | Visualization | Closed | Yes | Yes | No | No
RAGE (rapid analysis of gene expression) | Visualization | Open | No | No | Yes | No
RAP-PCR (RNA arbitrarily primed PCR fingerprinting) | Visualization | Open | No | No | No | No
RDA (representational difference analysis) | Visualization | Open | No | No | No | No
RLCS (restriction landmark cDNA scanning) | Visualization | Open | No | No | No | No
RPA (ribonuclease protection assay) | Visualization | Open | No | No | No | No
RSDD (reciprocal subtraction differential display) | Visualization | Open | No | No | No | No
SAGE (serial analysis of gene expression) | Sequencing | Open | Yes | No | No | No
SEM-PCR | Visualization | Closed | No | Yes | No | No
SSH (suppression subtractive hybridization) | Hybridization | Open | Yes | No | Yes | No
Suspension arrays with microbeads | Hybridization | Closed | No | No | No | No
TALEST (tandem arrayed ligation of expressed sequence tags) | Sequencing | Open | No | No | No | No
12 Bulaqueña, et al. (Embryo Transplant Microscopy)
of messenger RNA preparations with genomic or mitochondrial DNA or unspliced RNA. Even with oligo-dT selection and DNase digestion, DNA and unspliced RNA tend to persist in many RNA preparations. This is evidenced by an analysis of the human expressed sequence tag (EST) database for sequences that are clearly intronic or intergenic. These sequences tile the genome evenly and comprise from 0.5% up to 5% of the ESTs in a given sequencing project, across even the most experienced sequencing centers (unpublished observation). Extremely sensitive technologies can detect the contaminating genomic DNA and give false-positive results. A common mistake when using quantitative PCR methods with gene-specific primers is to design both primers within the same exon. This often yields a positive result because a few copies of genomic DNA targets will be present. When primer sets are designed to span large introns, a positive result excludes both genomic DNA contamination and unspliced transcripts. This is not always possible, of course, in the cases of single-exon genes like olfactory G protein-coupled receptors and in organisms like Saccharomyces and other fungi, where multi-exon genes are not common. In these cases, a control primer set that will only amplify genomic DNA can aid dramatically in the interpretation of the results. A final, practical consideration is to envision the completion of the project of interest, because using different quantitation methods will result in the need for different follow-up work. For example, if a transcript counting method that reveals 10 nucleotides of sequence is used, how will those data be followed up? What prioritization criteria for the analysis will be used, and how will the full-length sequences and full-length clones for those genes be obtained?
This may sound like a trivial concern, but in actuality, the generation of large sets of transcript-abundance data may create a quantity of follow-up work that is unwieldy or even unreasonable. Techniques that capture the protein-coding regions of transcripts, such as GeneCalling, reveal enough information about many novel genes to help prioritize their follow-up, unlike 3′-based methods, where there is little ability to prioritize follow-up without a larger effort. Beginning with the completion of the project in mind allows the researcher to optimize the timeline and the probability of completion, as well as to produce the best quality research result in the study of gene expression.
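The primer-placement advice given earlier for quantitative PCR (avoid placing both primers in one exon; prefer pairs that span a large intron) can be checked mechanically when exon coordinates are known. The sketch below is a hypothetical illustration: the function names, the inclusive genomic coordinates, and the 1000-base threshold for a "large" intron are all assumptions, not values taken from the text.

```python
def exon_index(exons, start, end):
    """Index of the exon that fully contains [start, end], or None if there is none."""
    for i, (ex_start, ex_end) in enumerate(exons):
        if ex_start <= start and end <= ex_end:
            return i
    return None


def spans_intron(exons, fwd, rev, min_intron=1000):
    """True if the primers lie in different exons with >= min_intron intronic bases between.

    `exons` is a list of (start, end) pairs, inclusive, sorted by position.
    `min_intron` is an assumed threshold for what counts as a large intron.
    """
    i = exon_index(exons, *fwd)
    j = exon_index(exons, *rev)
    if i is None or j is None or i == j:
        return False
    lo, hi = sorted((i, j))
    # Sum the lengths of all introns between the two primer-bearing exons.
    intron_bases = sum(exons[k + 1][0] - exons[k][1] - 1 for k in range(lo, hi))
    return intron_bases >= min_intron


# Hypothetical three-exon gene.
exons = [(100, 300), (2500, 2700), (2900, 3100)]
same_exon = spans_intron(exons, (150, 170), (250, 270))
large_intron = spans_intron(exons, (150, 170), (2600, 2620))
small_intron = spans_intron(exons, (2550, 2570), (2950, 2970))
print(same_exon, large_intron, small_intron)
```

A pair that fails this check is a candidate for redesign, or, for single-exon genes, for pairing with a genomic-DNA-only control primer set as suggested above.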
Standardized RT-PCR and the Standardized Expression Measurement
Summary
Standardized reverse transcriptase polymerase chain reaction (StaRT-PCR) is a modification of the competitive template (CT) RT-PCR method described by Gilliland et al. StaRT-PCR allows rapid, reproducible, standardized, quantitative measurement of expression for many genes simultaneously. An internal standard CT is prepared for each gene and cloned to generate enough for >10^9 assays, and CTs for up to 1000 genes are mixed together in a standardized mixture of internal standards (SMIS). Each target gene is normalized to a reference gene to control for the amount of cDNA loaded into the reaction. Each target gene and reference gene is measured relative to its respective internal standard within the SMIS. Because each target gene and reference gene is simultaneously measured relative to a known number of internal standard molecules in the SMIS, it is possible to report each gene expression measurement as a numerical value in units of target gene cDNA molecules/10^6 reference gene cDNA molecules. Calculation of data in this format allows for entry into a common databank, direct interexperimental comparison, and combination of values into interactive gene expression indices.
Key Words: cDNA, expression, mRNA, quantitative, RT-PCR, StaRT-PCR
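The reporting unit described in the summary, target gene cDNA molecules per 10^6 reference gene cDNA molecules, can be illustrated with a short calculation. This is a simplified sketch rather than the published StaRT-PCR procedure: it assumes the native-template (NT) and competitive-template (CT) signals are directly proportional to molecule numbers, ignores any assay-specific corrections, and uses hypothetical function names and numbers.

```python
def molecules_from_ratio(native_signal, ct_signal, ct_molecules):
    """Native-template molecules inferred from the NT/CT signal ratio
    and the known number of CT molecules added with the SMIS."""
    if ct_signal <= 0:
        raise ValueError("competitive template signal must be positive")
    return native_signal / ct_signal * ct_molecules


def per_million_reference(target_molecules, reference_molecules):
    """Express a target gene as molecules per 10^6 reference gene molecules."""
    if reference_molecules <= 0:
        raise ValueError("reference gene molecules must be positive")
    return target_molecules * 1e6 / reference_molecules


# Hypothetical signals and CT inputs for one cDNA sample.
target = molecules_from_ratio(native_signal=50, ct_signal=100, ct_molecules=6_000)
reference = molecules_from_ratio(native_signal=200, ct_signal=100, ct_molecules=600_000)
value = per_million_reference(target, reference)
print(f"{value:.0f} target molecules per 10^6 reference molecules")
```

Because both genes are measured against known numbers of internal standard molecules, the resulting value does not depend on how much cDNA was loaded, which is what allows results from different experiments to be compared directly.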
