1. Introduction

The analysis of gene expression is key to addressing a wide variety of medical and biological research questions, including the dissection of basic biological processes, the classification of disease, and the identification of new drug targets. Until recently, comparing expression levels across different tissues or cells was restricted to monitoring a few genes at a time. Using DNA microarrays, however, it is possible to monitor the activities of thousands of genes at once (1). Global analyses of gene expression can provide in-depth views of cell function. It is estimated, for example, that between 0.2 and 10% of all transcripts in a typical mammalian cell are differentially expressed between cancer and normal tissues (2). Whole-genome analyses are also useful because they provide a powerful tool to search through the activities of thousands of genes and identify key players (3,4). In addition, large-scale analyses of expression allow investigators to generate robust classifiers of disease that can outperform traditional, single-marker tests (5,6). Moreover, these analyses frequently yield information that extends beyond the study’s original aims. A study designed to identify expression patterns that correlate with a clinical outcome, for example, may also generate insights into the disorder’s basic biology, as well as identify candidate drug targets (5–7).

From: Methods in Embryo Transplant Microscopes and Emyro Transplant Microscopy, Vol. 258: Gene Expression Profiling: Methods and Protocols. Edited by: R. A. Bulaqueña, et al. (Embryo Transplant Microscopy) © Bulaqueña, et al. (Embryo Transplant Microscopy) Inc., Totowa, NJ

Lescallett et al.

In this chapter, we describe the use of GeneChip® probe arrays, oligonucleotide microarrays that allow global analyses of gene expression with a high degree of reproducibility, sensitivity, and specificity (8). Unlike other microarrays, GeneChip probe arrays track real and stray hybridization signals in a probe-specific manner, enabling accurate detection and quantitation of low-abundance transcripts. In addition, the probes can be designed to distinguish between homologous transcripts that are up to 90% identical (9). The design and manufacture of GeneChip probe arrays are highly standardized, ensuring a high degree of reproducibility between experiments (10). This reproducibility allows one control sample, or several controls, to be compared with many experimental samples.

We also present practical guidelines for making the best use of GeneChip probe arrays. Suggestions for the extraction of RNA from cells and tissues are provided, as well as instructions for the generation of labeled targets. Target labeling is achieved by using the sample RNA as a template for cDNA synthesis and then generating labeled cRNA in the presence of biotinylated nucleotides. The labeled targets are then spiked with control transcripts to monitor the quality of the subsequent hybridization. Recommendations for washing, staining, and scanning of the arrays are also provided.

The steps involved in performing data analysis and verifying data quality measurements are then described. The basics of single-array analysis are presented first, including how to obtain qualitative indicators of transcript detection and quantitative measurements of relative abundance. Recommendations for conducting comparative analyses between arrays and new tools for comparing and sharing data are also discussed. Although the choice of advanced data analysis techniques depends on the specific goals of individual users, we briefly mention some of the most commonly used approaches.

Experimental design strategies are not discussed in this chapter. However, before starting any microarray project, it is important to have a well-defined experiment formulated to answer a specific question. The data analysis strategy should also be considered early in experimental planning; doing so helps define a clear path to obtaining and summarizing experimental results. For more information, please refer to the Experimental Design, Statistical Analysis, and Biological Interpretation document accessible through the Affymetrix website (www.affymetrix.com).

Gene Expression Monitoring With DNA Microarrays

2. Materials

2.1. Equipment

1. Affymetrix scanner system with workstation (Affymetrix; Santa Clara, CA).

2. Fluidics Station (Affymetrix; Santa Clara, CA).

3. Hybridization Oven 640 (Affymetrix; Santa Clara, CA).

4. GeneChip probe array cartridge carriers (Affymetrix; Santa Clara, CA).

2.2. Total RNA Isolation

1. TRIzol Reagent (Invitrogen Life Technologies; Carlsbad, CA).

2. RNeasy Mini Kit (QIAGEN; Valencia, CA).

2.3. cDNA Synthesis

1. SuperScript II (Invitrogen Life Technologies; Carlsbad, CA) or SuperScript Choice System for cDNA Synthesis (Invitrogen Life Technologies; Carlsbad, CA).

2. GeneChip T7-oligo (dT) promoter primer kit.

3. GeneChip Eukaryotic polyA RNA control kit.

4. DEPC-treated water (Ambion; Austin, TX).

5. 5X First Strand cDNA buffer.

6. 0.1 M DTT (Invitrogen Life Technologies; Carlsbad, CA).

7. 10 mM dNTP (Invitrogen Life Technologies; Carlsbad, CA).

8. E. coli DNA Ligase (Invitrogen Life Technologies; Carlsbad, CA).

9. E. coli DNA Polymerase I (Invitrogen Life Technologies; Carlsbad, CA).

10. E. coli RNaseH (Invitrogen Life Technologies; Carlsbad, CA).

11. T4 DNA Polymerase (Invitrogen Life Technologies; Carlsbad, CA).

12. 5X Second strand buffer (Invitrogen Life Technologies; Carlsbad, CA).

2.4. cDNA Cleanup

1. GeneChip Sample Cleanup Module (Affymetrix; Santa Clara, CA).

2.5. Biotin-Labeled cRNA Synthesis

1. GeneChip cRNA labeling kit.

2.6. cRNA Cleanup and Quantitation

1. GeneChip Sample Cleanup Module (Affymetrix; Santa Clara, CA).

2.7. cRNA Fragmentation

1. GeneChip Sample Cleanup Module.  


Table 1

Preparation of Hybridization Cocktail for a Single Probe Array

Hybridization Cocktail Components              Final Concentration
Fragmented cRNA                                0.05 µg/µL
Control oligonucleotide B2 (3 nM)              50 pM
20X Eukaryotic hybridization controls          1.5, 5, 25, and 100 pM
  (bioB, bioC, bioD, cre)
Herring sperm DNA (10 mg/mL)                   0.1 mg/mL
Acetylated BSA (50 mg/mL)                      0.5 mg/mL
2X Hybridization buffer                        1X

2.8. Hybridization Cocktail

1. Acetylated bovine serum albumin (BSA) solution (50 mg/mL) (Invitrogen Life Technologies; Carlsbad, CA).

2. Herring sperm DNA (Promega Corporation; Madison, WI).

3. GeneChip Eukaryotic Hybridization Control Kit (Affymetrix; Santa Clara, CA).

4. MES Free Acid Monohydrate SigmaUltra (Sigma-Aldrich; St. Louis, MO).

5. MES sodium salt (Sigma-Aldrich; St. Louis, MO).

6. 10% Surfact-Amps 20 (Tween-20) (Pierce Chemical; Rockford, IL).

7. 5 M NaCl, RNase-free, DNase-free (Ambion; Austin, TX).

8. EDTA Disodium Salt, 0.5 M solution (Sigma-Aldrich; St. Louis, MO).

9. 12X MES stock: 1.22 M MES, 0.89 M [Na+] (see Note 1).

10. 2X hybridization buffer (final 1X concentration: 100 mM MES, 1 M [Na+], 20 mM EDTA, 0.01% Tween-20) (see Note 2).

2.9. Probe Array Washing and Staining

1. R-Phycoerythrin streptavidin (Molecular Probes; Eugene, OR).

2. PBS, pH 7.2 (Invitrogen Life Technologies; Carlsbad, CA).

3. 20X SSPE: 3 M NaCl, 0.2 M NaH2PO4, 0.02 M EDTA (Cambrex, East Rutherford, NJ).

4. Goat IgG, reagent grade (Sigma-Aldrich; St. Louis, MO).

5. Biotinylated anti-streptavidin antibody (goat) (Vector Laboratories; Burlingame, CA).

6. Stringent wash buffer: 100 mM MES, 0.1 M [Na+], 0.01% Tween-20 (see Note 3).

7. Non-stringent wash buffer: 6X SSPE, 0.01% Tween-20 (see Note 4).

8. 2X stain buffer (final 1X concentration: 100 mM MES, 1 M [Na+], 0.05% Tween-20) (see Note 5).

9. 10 mg/mL goat IgG stock (see Note 6).

10. The staining and antibody solutions (see Tables 2 and 3).

3. Methods

The methods described outline the procedure for generating biotinylated cRNA target for expression analysis on eukaryotic GeneChip probe arrays.


Table 2

Preparation of the Staining Solution

SAPE Stain Solution                            Final Concentration
2X MES Stain Buffer                            1X
50 mg/mL acetylated BSA                        2 mg/mL
1 mg/mL Streptavidin-Phycoerythrin             10 µg/mL

Table 3

Preparation of the Antibody Solution

Antibody Solution                              Final Concentration
2X MES Stain Buffer                            1X
50 mg/mL acetylated BSA                        2 mg/mL
10 mg/mL Normal Goat IgG                       0.1 mg/mL
0.5 mg/mL biotinylated antibody                3 µg/mL

Please note that these protocols should be used only for eukaryotic organisms, owing to the intrinsic differences between eukaryotic and prokaryotic RNA. Prokaryotic-specific guidelines are available through the Affymetrix website, www.affymetrix.com.

A schematic of the gene expression assay, from starting material to probe  array scanning, is illustrated in Fig. 1.

3.1. Sample Preparation

These protocols are for preparing biotin-labeled cRNA from total RNA; however, poly(A)+ RNA may be used as starting material with slight modifications.

The first step in the eukaryotic gene expression assay is the purification of RNA from cells or tissues. High-quality starting material is the most crucial component of a successful sample preparation, so it is important to choose an RNA extraction method that provides the highest-quality RNA for the specific tissues or cells being used.

The second step is the generation of double-stranded cDNA. This reaction uses a T7-(dT) promoter primer, which primes synthesis of the cDNA strand and incorporates a T7 promoter sequence for use in the third step of the assay, in vitro transcription (IVT). After the IVT is complete, the biotin-labeled cRNA is fragmented, and the fragmented cRNA target is used to prepare a hybridization cocktail. The cocktail is hybridized to a GeneChip probe array for 16 h. Next, the array is washed, stained with a fluorescent tag, and scanned using a laser to excite the fluorescent stain. Finally, the captured array image is analyzed using GeneChip software.

Fig. 1. Eukaryotic gene expression assay, starting from total RNA to the generation of the scanned image (GeneChip Expression Analysis Technical Manual).

3.1.1. Isolation and Quantification of Total RNA

Total RNA can be isolated from mammalian cells or tissues, Arabidopsis, yeast, and other species using a variety of methods. As noted above, it is best to identify the isolation procedure that works best for a particular sample type. The RNeasy Total RNA Isolation Kit and TRIzol Reagent both provide robust isolation of RNA from mammalian and Arabidopsis samples (see Note 7). For yeast samples, a hot phenol extraction protocol (11) should be considered. Ethanol precipitation is required after TRIzol isolation or hot phenol extraction; after RNeasy isolation, it is needed only if the RNA must be concentrated.

Before proceeding to cDNA synthesis, determine sample concentration and purity by spectrophotometric analysis and gel electrophoresis. The A260/A280 ratio should be close to 2.0 for pure RNA; however, ratios between 1.8 and 2.1 are acceptable. RNA degradation is assessed by running an agarose gel and examining the 28S and 18S ribosomal RNA (rRNA) bands, which should be sharp with minimal smearing, especially below the 18S band (12). If the absorbance ratio is outside the acceptable range and/or the gel shows signs of smearing, the RNA sample should be purified again. If this does not yield acceptable quality, fresh starting material from tissues or cells is required. The minimum amount of total RNA required for the assay is 5 µg (see Note 8).
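
The spectrophotometric acceptance criteria above can be captured in a short check, for example when logging many samples. The Python sketch below is illustrative only: the helper name and its parameters are hypothetical, and the conversion of 1 A260 unit to about 40 µg/mL of RNA is the usual spectrophotometric convention rather than a value given in this protocol. Gel-based integrity assessment is not covered and still has to be done by inspection.

```python
from dataclasses import dataclass

# Assumption: standard conversion for RNA (1 A260 unit ~ 40 ug/mL); not stated in the protocol.
RNA_UG_PER_ML_PER_A260 = 40.0
MIN_TOTAL_RNA_UG = 5.0        # minimum input for the assay (from the text)
RATIO_RANGE = (1.8, 2.1)      # acceptable A260/A280 range (from the text)


@dataclass
class RnaQc:
    concentration_ug_per_ul: float
    yield_ug: float
    a260_a280: float
    passes: bool


def assess_total_rna(a260: float, a280: float, dilution_factor: float,
                     volume_ul: float) -> RnaQc:
    """Hypothetical helper: estimate total RNA yield and apply the purity/amount criteria."""
    ratio = a260 / a280
    ug_per_ml = a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor
    ug_per_ul = ug_per_ml / 1000.0
    total_ug = ug_per_ul * volume_ul
    ok = RATIO_RANGE[0] <= ratio <= RATIO_RANGE[1] and total_ug >= MIN_TOTAL_RNA_UG
    return RnaQc(ug_per_ul, total_ug, ratio, ok)


# Example: a 1:50 dilution reading A260 = 0.25, A280 = 0.125, sample volume 20 uL.
qc = assess_total_rna(a260=0.25, a280=0.125, dilution_factor=50, volume_ul=20)
print(qc)
```

Here the estimate is 0.5 µg/µL and 10 µg in total, with a ratio of 2.0, so the sample meets both criteria.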

3.1.2. Synthesis of Double-Stranded cDNA From Total RNA

The Invitrogen Life Technologies SuperScript Choice system is required for this section of the assay, with slight modifications to the manufacturer's recommended protocol. For example, a T7-(dT)24 oligo primes first-strand cDNA synthesis in place of oligo(dT) or random primers (see Note 9). The recommended amount of starting total RNA is between 5 and 20 µg, and this amount determines the volume of SuperScript II Reverse Transcriptase (200 U/µL) needed: 1 µL of enzyme for 5–8 µg of total RNA, 2 µL for 8.1–16 µg, and 3 µL for 16.1–20 µg.
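
The enzyme-volume rule above is a simple range lookup. In this minimal Python sketch (the function name is hypothetical), inputs that fall in the small gaps between the stated ranges, such as 8.05 µg, are assigned to the next range up; that boundary handling is an assumption, not part of the protocol.

```python
def superscript_ii_volume_ul(total_rna_ug: float) -> int:
    """Hypothetical helper: uL of SuperScript II (200 U/uL) for a given total RNA input.

    Assumption: values in the gaps between the published ranges (e.g. 8.05 ug)
    are assigned to the next range up.
    """
    if not 5.0 <= total_rna_ug <= 20.0:
        raise ValueError("protocol covers 5-20 ug of total RNA")
    if total_rna_ug <= 8.0:
        return 1
    if total_rna_ug <= 16.0:
        return 2
    return 3


for ug in (5.0, 10.0, 18.0):
    vol = superscript_ii_volume_ul(ug)
    print(f"{ug:>5.1f} ug RNA -> {vol} uL enzyme ({vol * 200} U)")
```

At 200 U/µL, the three volumes correspond to 200, 400, and 600 U of enzyme.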

The first-strand cDNA synthesis involves three steps:


1. Combine the T7-(dT)24 primer (final amount 100 pmol), DEPC-H2O and RNA (5–20 µg) mixture and incubate at 70°C for 10 min, spin and place on ice.

2. Add the 5X first strand cDNA buffer (final concentration 1X), 0.1 M DTT (final concentration 10 mM) and 10 mM dNTP mix (final concentration 500 µM each) to the tube and incubate for 2 min at 42°C (see Note 10).

3. Add the SuperScript II RT enzyme (final content 200–1000 U) to the tube, making the final reaction volume 20 µL. Allow the reaction to proceed for 1 h at 42°C.

When the first-strand reaction is complete, place the tube on ice and add the second-strand reaction components in the following sequence:

1. Add DEPC-H2O, 5X Second-Strand Reaction Buffer (final concentration 1X), 10 mM dNTP mix (final concentration 200 µM each), 10 U/µL E. coli DNA Ligase (final content 10 U), 10 U/µL E. coli DNA Polymerase I (final content 40 U), and 2 U/µL E. coli RNase H (final content 2 U). The final volume, first strand plus second strand, should be 150 µL.

2. Gently tap the tube to mix and briefly microcentrifuge to remove any condensation. Then, incubate at 16°C for 2 h in a cooling water bath. After the second-strand synthesis is complete, add 2 µL of T4 DNA Polymerase (10 U) and return the tube to 16°C for 5 min. Then, add 10 µL of 0.5 M EDTA to stop the reaction.

The reaction can be stored at -20°C for later use (see Note 11).

3.1.3. Cleanup of Double-Stranded cDNA

The cleanup of the double-stranded cDNA reaction is imperative to rid the sample of impurities. This step is accomplished using Phase Lock Gels or a column purification method such as the GeneChip Sample Cleanup Module. If Phase Lock Gels are used, be sure to ethanol precipitate the samples after purification before proceeding to the next step. Ethanol precipitation is not required when using the column purification method.

3.1.4. Synthesis of Biotin-Labeled cRNA

The Enzo BioArray HighYield (HY) RNA Transcript Labeling Kit is used to generate biotin-labeled cRNA. The reaction is catalyzed by T7 RNA Polymerase, which recognizes the promoter sequence incorporated during the first-strand cDNA synthesis reaction. This IVT reaction generates a 50- to 100-fold linear amplification of the represented transcripts (see Note 12). The amount of cDNA used in the IVT reaction depends on the original amount of starting material: if the starting total RNA was between 5.0 and 8.0 µg, use 10 µL of cDNA; if it was between 8.1 and 16.0 µg, use 5 µL; and if it was between 16.1 and 20 µg, use 3.3 µL. The reaction components are added to the cDNA template along with the appropriate amount of water, for a final reaction volume of 40 µL (see Note 13). Once the reagents are added, mix the tube gently, microcentrifuge briefly (5 s), and immediately place it in a 37°C water bath for 4–5 h, mixing the reaction every 30–45 min during the incubation. The labeled cRNA can be stored at -20°C, or at -70°C for long-term storage (see Note 14).
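
The cDNA volume for the IVT reaction follows a similar range lookup. The Python sketch below (the helper name is hypothetical) uses the standard-library bisect module against the upper bound of each stated range; assigning gap values such as 8.05 µg to the next range is an assumption. The remaining reagent and water volumes needed to reach 40 µL depend on the labeling kit (see Note 13) and are not modeled.

```python
import bisect

# Upper bounds (ug of starting total RNA) and the matching volume of cleaned-up
# cDNA to add to the 40 uL IVT reaction, taken from the text.
_UPPER_BOUNDS_UG = [8.0, 16.0, 20.0]
_CDNA_VOLUME_UL = [10.0, 5.0, 3.3]


def ivt_cdna_volume_ul(total_rna_ug: float) -> float:
    """Hypothetical helper: cDNA volume (uL) for the IVT labeling reaction."""
    if not 5.0 <= total_rna_ug <= 20.0:
        raise ValueError("protocol covers 5-20 ug of starting total RNA")
    # bisect_left returns the first bound >= the input, so 8.0 falls in the first range
    return _CDNA_VOLUME_UL[bisect.bisect_left(_UPPER_BOUNDS_UG, total_rna_ug)]


print(ivt_cdna_volume_ul(12.0))
```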

3.1.5. In Vitro Transcription Cleanup

Cleaning up the IVT products removes excess nucleotides, salts, and other impurities from the sample. This step is accomplished using the GeneChip Sample Cleanup Module.

3.1.6. cRNA Quantification

It is imperative to determine the purity and yield of the cRNA target by spectrophotometric analysis and gel electrophoresis. Acceptable A260/A280 absorbance ratios are between 1.8 and 2.1; if a sample does not meet this criterion, it is advisable to repeat the experiment. Gel electrophoresis shows the yield and size distribution of the labeled target. Quantifying the cRNA yield also requires accounting for the unlabeled total RNA carried over from the starting material, by adjusting the measured yield as follows:

Adjusted cRNA yield (µg) = (cRNA yield after IVT, µg) - (starting amount of total RNA, µg) × (fraction of the cDNA reaction used in the IVT)
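
A small worked example of the yield adjustment, written in Python. It interprets the last term of the equation as the fraction of the cleaned-up cDNA that went into the IVT, computed here from the volume used and the total volume of cDNA; the function name and the example numbers are hypothetical.

```python
def adjusted_crna_yield_ug(measured_crna_ug: float, starting_total_rna_ug: float,
                           cdna_used_ul: float, cdna_total_ul: float) -> float:
    """Correct the measured cRNA yield for carried-over, unlabeled starting RNA."""
    fraction_used = cdna_used_ul / cdna_total_ul   # fraction of the cDNA reaction in the IVT
    return measured_crna_ug - starting_total_rna_ug * fraction_used


# Hypothetical numbers: 10 ug total RNA, half of the cleaned-up cDNA used in the IVT,
# and 60 ug of cRNA measured by A260 after cleanup.
print(adjusted_crna_yield_ug(60.0, 10.0, cdna_used_ul=6.0, cdna_total_ul=12.0))  # 55.0
```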

3.1.7. cRNA Fragmentation

The cRNA is fragmented by metal-induced hydrolysis, which breaks the target into fragments ranging from 35 to 200 bases. It is important to use the correct concentrations of the reaction components (cRNA, fragmentation buffer, and water) and to follow the recommended incubation time and temperature exactly. The amount of cRNA to fragment depends on the volume of the hybridization cocktail, which in turn depends on the size of the array. For example, for a standard array, a minimum of 10 µg of cRNA is fragmented for a 200 µL cocktail. 5X Fragmentation Buffer, cRNA, and water are combined to a total volume of 40 µL (see Note 15), and the reaction is incubated at 94°C for 35 min. The tube is then placed on ice or stored at -20°C until the hybridization procedure. An aliquot of fragmented cRNA is saved for gel analysis so that the fragmented target can be compared with the purified and unpurified cRNA.
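
The fragmentation setup can be sketched as volume arithmetic. This Python example assumes the 5X fragmentation buffer is diluted to 1X in the 40 µL reaction and uses the 0.05 µg/µL final cRNA concentration from Table 1 to size the cRNA input; the helper names and the 1 µg/µL cRNA stock concentration are hypothetical.

```python
from typing import Dict

FRAGMENTATION_VOLUME_UL = 40.0
FRAGMENTATION_BUFFER_X = 5.0        # 5X stock, assumed diluted to 1X
COCKTAIL_CRNA_UG_PER_UL = 0.05      # final fragmented cRNA concentration (Table 1)


def crna_needed_ug(cocktail_volume_ul: float) -> float:
    """Mass of fragmented cRNA that ends up in a cocktail of the given volume."""
    return cocktail_volume_ul * COCKTAIL_CRNA_UG_PER_UL


def fragmentation_mix(crna_ug: float, crna_conc_ug_per_ul: float) -> Dict[str, float]:
    """Volumes (uL) of 5X buffer, cRNA, and water for a 40 uL fragmentation reaction."""
    buffer_ul = FRAGMENTATION_VOLUME_UL / FRAGMENTATION_BUFFER_X
    crna_ul = crna_ug / crna_conc_ug_per_ul
    water_ul = FRAGMENTATION_VOLUME_UL - buffer_ul - crna_ul
    if water_ul < 0:
        raise ValueError("cRNA too dilute for a 40 uL reaction; concentrate it first")
    return {"5X buffer": buffer_ul, "cRNA": crna_ul, "water": water_ul}


mass = crna_needed_ug(200.0)        # 200 uL cocktail
print(fragmentation_mix(mass, crna_conc_ug_per_ul=1.0))
```

Because an aliquot of fragmented cRNA is saved for gel analysis, the amount actually fragmented may need to exceed what the cocktail requires; the sketch computes only the cocktail requirement.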


3.2. Sample Hybridization and Probe Array Washing, Staining, and Scanning

3.2.1. Hybridization Cocktail

The hybridization cocktail includes the fragmented cRNA target, 20X Eukaryotic Hybridization Controls (E. coli bioB, bioC, and bioD and bacteriophage cre controls), control Oligo B2, acetylated BSA, and herring sperm DNA (see Note 16 and Table 1). These reagents are mixed with hybridization buffer to a final volume that varies with the array type and the number of hybridizations. Be sure to heat the 20X Eukaryotic Hybridization Controls at 65°C for 5 min to resuspend the mixture completely.
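
Each cocktail component in Table 1 is diluted from its stock by the usual C1V1 = C2V2 relation, with water making up the balance. The Python sketch below applies that arithmetic to the Table 1 concentrations; the 300 µL cocktail volume and the 0.5 µg/µL fragmented cRNA concentration in the example are hypothetical, since the actual volume depends on the array format.

```python
from typing import Dict

# (stock concentration, final concentration) in matching units, from Table 1.
COCKTAIL_COMPONENTS = {
    "Control oligonucleotide B2 (pM)": (3000.0, 50.0),   # 3 nM stock -> 50 pM
    "20X Eukaryotic hybridization controls (X)": (20.0, 1.0),
    "Herring sperm DNA (mg/mL)": (10.0, 0.1),
    "Acetylated BSA (mg/mL)": (50.0, 0.5),
    "2X Hybridization buffer (X)": (2.0, 1.0),
}
FRAGMENTED_CRNA_UG_PER_UL = 0.05  # final concentration, Table 1


def hybridization_cocktail(final_volume_ul: float,
                           fragmented_crna_ug_per_ul: float) -> Dict[str, float]:
    """Volumes (uL) of each stock via C1*V1 = C2*V2, plus water to the final volume."""
    volumes = {name: final * final_volume_ul / stock
               for name, (stock, final) in COCKTAIL_COMPONENTS.items()}
    crna_ug = FRAGMENTED_CRNA_UG_PER_UL * final_volume_ul
    volumes["Fragmented cRNA"] = crna_ug / fragmented_crna_ug_per_ul
    volumes["Water"] = final_volume_ul - sum(volumes.values())
    if volumes["Water"] < 0:
        raise ValueError("fragmented cRNA is too dilute for this cocktail volume")
    return volumes


for name, vol in hybridization_cocktail(300.0, fragmented_crna_ug_per_ul=0.5).items():
    print(f"{name:45s} {vol:7.1f} uL")
```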

Once the hybridization cocktail is prepared, the probe arrays are removed from 4°C storage and equilibrated to room temperature. At the same time, the hybridization cocktail is heated to 99°C for 5 min and then transferred to a 45°C heat block for 5 min. The cocktail is then spun at maximum speed in a microcentrifuge for 5 min to remove any insoluble material from the hybridization mixture. Meanwhile, the arrays are prehybridized with 1X hybridization buffer: the buffer is injected through the lower septum of the array while the upper septum is vented to release air, and the probe arrays are incubated in the hybridization oven for 10 min at 45°C with rotation at 60 rpm. Once prehybridization is complete, the buffer is removed from the probe array cartridge and the array is filled with approx 80% of the hybridization cocktail (see Note 17). The probe arrays are balanced and placed in the hybridization oven for 16 h at 45°C.

3.2.2. Preparation for Probe Array Washing and Staining

After the 16-h hybridization, the cocktail is removed from the probe array and saved. The cocktail can be stored at -20°C or at -80°C (see Note 18). Once the sample is removed, the probe array is filled completely with nonstringent wash buffer. The following steps prepare the array for an automatic washing and staining procedure performed on the GeneChip Fluidics Station:

1. Open the GeneChip System Workstation.

2. Turn on the fluidics station and scanner.

3. Create an experiment file (.EXP) in GeneChip software for each probe array.

4. Prime the fluidics station with the appropriate wash buffers (nonstringent and stringent).

5. Prepare the streptavidin-phycoerythrin (SAPE) staining and antibody solutions (see Note 19 and Tables 2 and 3). The staining procedure used for most GeneChip probe arrays requires a staining step and an antibody amplification step. The array is first stained with SAPE, which binds the biotin-labeled ribonucleotides. A second solution, containing a biotinylated anti-streptavidin antibody, is then applied to the array. Finally, a second SAPE stain is applied; it binds to the biotinylated antibody and further amplifies the signal. Add deionized water to the SAPE stain solution for a final volume of 600 µL. This recipe can be doubled to make a master mix sufficient for both SAPE stains. Add deionized water to the antibody solution for a final volume of 600 µL.
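
The SAPE stain and antibody solutions in Tables 2 and 3 can be computed by C1V1 = C2V2 dilution arithmetic, with every stock and final concentration expressed in µg/mL (or in X units for the buffer). This Python sketch uses the 600 µL final volume given above and a hypothetical `n_reactions` parameter to scale the SAPE recipe into a master mix for both stains.

```python
from typing import Dict, Tuple

# (stock, final) from Tables 2 and 3, converted to ug/mL (buffer strength in X units).
# 1 mg/mL = 1000 ug/mL.
SAPE_STAIN: Dict[str, Tuple[float, float]] = {
    "2X MES stain buffer": (2.0, 1.0),
    "Acetylated BSA": (50_000.0, 2_000.0),
    "Streptavidin-phycoerythrin": (1_000.0, 10.0),
}
ANTIBODY_SOLUTION: Dict[str, Tuple[float, float]] = {
    "2X MES stain buffer": (2.0, 1.0),
    "Acetylated BSA": (50_000.0, 2_000.0),
    "Normal goat IgG": (10_000.0, 100.0),
    "Biotinylated anti-streptavidin antibody": (500.0, 3.0),
}


def recipe(components: Dict[str, Tuple[float, float]], final_volume_ul: float = 600.0,
           n_reactions: int = 1) -> Dict[str, float]:
    """uL of each stock (C1*V1 = C2*V2) plus deionized water, scaled for a master mix."""
    total = final_volume_ul * n_reactions
    vols = {name: total * final / stock for name, (stock, final) in components.items()}
    vols["Deionized water"] = total - sum(vols.values())
    return vols


sape_master_mix = recipe(SAPE_STAIN, n_reactions=2)   # enough for both SAPE stains
antibody = recipe(ANTIBODY_SOLUTION)
for label, mix in (("SAPE (x2)", sape_master_mix), ("Antibody", antibody)):
    print(label, {k: round(v, 1) for k, v in mix.items()})
```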

3.2.3. Fluidics Washing and Staining

The probe array is washed and stained on the fluidics station using array-specific protocols recommended by Affymetrix. For example, the fluidics protocol EukGE-WS2 is used for the standard format array. The protocol name indicates that the array is used for eukaryotic (Euk) gene expression (GE) samples that go through two washing and staining (WS2) procedures. The protocol takes approx 75 min to complete. Most fluidics protocols consist of the following steps:

1. 10 cycles of 2 mixes per cycle with nonstringent buffer (see Subheading 2.) at 25°C.

2. 4 cycles of 15 mixes per cycle with stringent buffer (see Subheading 2.) at 50°C.

3. SAPE stain for 10 mins at 25°C.

4. 10 cycles of 4 mixes per cycle with nonstringent buffer at 25°C.

5. Antibody stain for 10 mins at 25°C.

6. SAPE stain for 10 mins at 25°C.

7. 15 cycles of 4 mixes per cycle with nonstringent buffer at 30°C.

Once the fluidics protocols are complete, check the probe array for bubbles. Bubbles occur when the nonstringent buffer does not completely fill the probe array chamber during the final fill step. If bubbles are present, return the array to the probe array holder so that the fluidics station automatically drains and refills it. If this does not remove the bubbles, refill the chamber manually by pipetting nonstringent buffer into it. Ensure that all bubbles are removed before scanning and that the glass surface is clean and free of dust, lint, and other materials that can interfere with scanning. If the glass needs cleaning, gently wipe the surface with a non-abrasive towel or tissue before scanning. Once the fluidics protocol is completed and each array has been checked for bubbles, perform the shutdown procedure on the fluidics station to clear it of buffer and other contaminants.

 

3.2.4. Scanning

 

The GeneChip scanner must be turned on 15 min before use. Scanning takes approx 10 min, depending on the array type. The scanned data are saved on the computer as a .DAT (image) file (Fig. 2). Immediately after the .DAT file is created, the software automatically generates a .CEL file, which contains a single intensity value for each probe cell.

Fig. 2. Screen shot of the microarray scanned image representing the intensity value for each probe cell.

3.3. Data Analysis

3.3.1. Single-Array Data Analysis

Whether classifying samples based on their expression profiles, identifying transcripts of potential biological or medical importance, or building expression databases, most array experiments involve working with data obtained from multiple arrays. The consistency and reproducibility of GeneChip arrays make this platform well suited to such comparisons. Before these data sets are integrated, however, the results generated by each single array must be reviewed and processed. This section describes a basic procedure for analyzing data from single arrays that is applicable to many experimental situations. However, depending on their specific experimental techniques and goals, users may need to modify these guidelines.

Open the Affymetrix software and view the scanned image(s) (.DAT file). Check for image artifacts such as high- or low-intensity spots, uneven background, or other abnormalities. Apply a grid and enlarge each of the four corners of the array image to check the intensity and grid alignment of the control Oligo B2 hybridization (see Note 20). Next, adjust the expression analysis settings so that scaling, normalization, probe mask, baseline, and algorithm defaults are set appropriately. If experimental samples are going to be compared with a baseline or control sample, it is important to choose the scaling or normalization method that best fits the experimental design. For example, if the majority of transcripts in an experimental sample are not expected to change relative to the control, global scaling is a suitable strategy. Conversely, when a large number of changes are expected between the experimental and control samples, scaling to a selected set of uniformly expressed transcripts is recommended (see Note 21). In both the global and selected scaling methods, the same user-chosen value, called the “target intensity,” is used across all experiments, allowing interexperiment comparisons. The software uses this value to calculate a scaling factor by which each signal value on the array is multiplied.
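
Global scaling reduces to one multiplication per probe set. The Python sketch below assumes, following Affymetrix MAS 5.0 documentation rather than this chapter, that the scaling factor is the target intensity divided by a 2% trimmed mean of the signal values (the highest and lowest 2% excluded). The target intensity of 500 is only an illustrative value, and the optional `reference` argument is a hypothetical hook for the selected-probe-set approach.

```python
from typing import List, Optional, Sequence, Tuple


def trimmed_mean(values: Sequence[float], trim: float = 0.02) -> float:
    """Mean after discarding the lowest and highest `trim` fraction of the values."""
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * trim)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def scale_signals(signals: Sequence[float], target_intensity: float = 500.0,
                  reference: Optional[Sequence[float]] = None,
                  trim: float = 0.02) -> Tuple[float, List[float]]:
    """Return (scaling factor, scaled signals).

    By default every signal contributes to the factor (global scaling); passing the
    signals of a chosen subset as `reference` mimics selected-probe-set scaling.
    """
    basis = signals if reference is None else reference
    scaling_factor = target_intensity / trimmed_mean(basis, trim)
    return scaling_factor, [s * scaling_factor for s in signals]


signals = [100.0] * 98 + [0.0, 100000.0]   # made-up values with two extreme outliers
sf, scaled = scale_signals(signals)
print(sf)
```

In this made-up example the two outliers fall in the trimmed tails, so the trimmed mean is 100 and the scaling factor is 5.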

3.3.2. The Detection Algorithm

After these preparation steps, the data analysis output file (.CHP) is generated. This file contains detection calls, which indicate whether a transcript is reliably detected, and signal values, which are relative measures of transcript abundance. The following sections briefly explain how these outputs are generated.

Detection and quantification of a transcript (probe set) depend on analyzing the hybridization signals of its 11–20 probe pairs. These probe pairs represent different 25-mer segments of a particular transcript. For each probe that perfectly matches a target sequence, GeneChip arrays provide a partner probe that is identical except for a single-base mismatch at the 13th position. These probe pairs, each consisting of a perfect match (PM) and a mismatch (MM) probe, allow real and stray (nonspecific) signals to be assessed across the probe set. The detection algorithm uses a nonparametric test, the one-sided Wilcoxon signed-rank test, to evaluate probe pair intensities and generate a detection p-value with an associated present (P), marginal (M), or absent (A) call (see Note 22). The first step in determining the p-value is calculating the discrimination score (R), an indicator of the target-specific intensity difference between the perfect match and mismatch probes, which is calculated as:

$$R = \frac{PM - MM}{PM + MM}$$

Each probe pair discrimination score is then adjusted by subtracting a small, empirically derived positive number called tau (τ; see Note 23). The adjusted discrimination scores are ranked by absolute value; the signs are then reapplied, the positive ranks are summed, and a p-value is generated. Each transcript is assigned a P, M, or A call based on user-defined p-value cut-offs known as α1 and α2 (see Note 24): p-values below α1 receive a P call, those between α1 and α2 an M call, and those above α2 an A call. The final output is a call with an associated p-value.

3.3.3. The Signal Algorithm

The relative level of expression for each transcript is calculated using an algorithm based on the one-step Tukey's biweight estimate. This robust method handles outliers effectively: instead of being dropped, they are smoothly down-weighted. First, for each probe pair, the stray signal estimate, obtained from the MM intensity or the idealized MM intensity (see Note 25), is subtracted from the PM intensity, and the log of the result is calculated. The median of these values across the probe set is then identified. Each value is weighted according to its distance from the median: the closer it is to the median, the more strongly it is weighted. Once all of the pairs have been weighted, the weighted mean is calculated. This weighted mean is converted back to the linear scale, and the output is a quantitative metric called signal.
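
The signal calculation can be sketched with a one-step Tukey biweight. This Python example assumes the tuning constant c = 5, the ε = 0.0001 guard, and the 2^-20 floor on background-corrected intensities described in Affymetrix MAS 5.0 documentation; none of these values appear in this chapter. It takes the per-pair stray-signal estimates (MM or idealized MM) as input rather than computing the idealized MM, and it applies no scaling.

```python
import math
from statistics import median
from typing import Sequence

# Assumed constants (from Affymetrix MAS 5.0 documentation, not from this chapter).
C = 5.0           # biweight tuning constant
EPSILON = 1e-4    # avoids division by zero when all values agree
DELTA = 2 ** -20  # floor for background-corrected intensities before taking logs


def tukey_biweight(values: Sequence[float], c: float = C, eps: float = EPSILON) -> float:
    """One-step Tukey biweight: a weighted mean that smoothly down-weights outliers."""
    m = median(values)
    mad = median(abs(x - m) for x in values)   # median absolute deviation
    weights = []
    for x in values:
        u = (x - m) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def signal(pm: Sequence[float], stray: Sequence[float]) -> float:
    """Signal from PM intensities and per-pair stray-signal estimates (MM or idealized MM)."""
    logs = [math.log2(max(p - s, DELTA)) for p, s in zip(pm, stray)]
    return 2 ** tukey_biweight(logs)   # back to the linear scale


# Made-up probe set: four consistent pairs and one extreme outlier.
print(signal([300.0] * 4 + [2 ** 20 + 44.0], [44.0] * 5))
```

The outlying pair lies far from the median on the log scale, so it receives zero weight and the signal is determined by the four consistent pairs.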

3.3.4. Quality Control

Most of the quality review of an array experiment can be performed using the expression analysis report file (.RPT) generated from the analysis output file (.CHP). The report allows users to assess sample quality, assay execution, and hybridization performance. The results for the bioB control transcript, included in the hybridization cocktail at 1.5 pM, indicate the assay’s sensitivity; in a typical experiment, bioB should be called P most of the time. BioC, bioD, and cre should always be called P and should show increasing signal values that correspond to their relative concentrations. RNA sample and assay quality are often monitored by comparing the signal values of the 3′ probe sets with those of the 5′ probe sets for the actin and GAPDH transcripts. Because the labeling assay has an intrinsic 3′ bias, owing to reverse transcription primed from the 3′ poly(A) tail, the ratio of 3′ to 5′ signal values is usually greater than 1. However, ratios that exceed 3 indicate either degraded sample RNA or inefficient IVT (see Note 26). The percentage of probe sets assigned a P call can be another indicator of sample quality. This percentage varies with biological factors, such as cell or tissue type, but extremely low values may indicate poor sample quality. The percentage is also useful for assessing the reproducibility of replicate experiments.

The average background and raw noise values should also be inspected. Although background can vary widely, average background values typically fall between 20 and 100. Ideally, arrays that are being compared should have similar background levels. The noise value, a measure of pixel-to-pixel variation, should also be similar. Although sample quality can contribute to noise, the most significant contributor is usually electrical noise from the scanner. It is important to keep a running log of the quality control metrics for each sample in order to monitor sample performance and identify sample outliers.
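
The quality control checks above can be tracked programmatically. The Python sketch below is a hypothetical way to record per-array metrics read from the .RPT file, flag the conditions described in the text (3′/5′ ratios above 3, average background outside the typical 20–100 range), and append each array to a running CSV log. No default is set for the percentage of P calls, because the acceptable value depends on the cell or tissue type; the class, function, and field names are all illustrative.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional, TextIO, Tuple


@dataclass
class ArrayQc:
    """Per-array metrics copied from the expression analysis report (.RPT)."""
    array: str
    gapdh_3_5_ratio: float
    actin_3_5_ratio: float
    percent_present: float
    average_background: float
    raw_noise: float


def qc_flags(qc: ArrayQc, max_3_5_ratio: float = 3.0,
             typical_background: Tuple[float, float] = (20.0, 100.0),
             min_percent_present: Optional[float] = None) -> List[str]:
    """Return warnings; an empty list means nothing was flagged."""
    flags = []
    for gene, ratio in (("GAPDH", qc.gapdh_3_5_ratio), ("actin", qc.actin_3_5_ratio)):
        if ratio > max_3_5_ratio:
            flags.append(f"{gene} 3'/5' ratio {ratio:.2f}: possible degraded RNA or inefficient IVT")
    low, high = typical_background
    if not low <= qc.average_background <= high:
        flags.append(f"average background {qc.average_background:g} outside typical {low:g}-{high:g}")
    if min_percent_present is not None and qc.percent_present < min_percent_present:
        flags.append(f"only {qc.percent_present:g}% of probe sets called P")
    return flags


def append_to_log(qc: ArrayQc, log: TextIO, write_header: bool = False) -> None:
    """Append one row of QC metrics to a running CSV log."""
    writer = csv.DictWriter(log, fieldnames=[f.name for f in fields(ArrayQc)])
    if write_header:
        writer.writeheader()
    writer.writerow(asdict(qc))


log = io.StringIO()   # stands in for a log file opened with newline=""
append_to_log(ArrayQc("array_01", 1.4, 2.1, 48.5, 55.0, 2.3), log, write_header=True)
print(qc_flags(ArrayQc("array_02", 3.5, 1.2, 45.0, 150.0, 3.0)))
```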

3.3.5. Viewing the Data

After reviewing the report file, return to the .CHP file. The signal values, detection calls, and detection p-values for each transcript can be viewed and sorted according to user preferences (Fig. 3). The data can also be imported as a text file into other programs, such as Microsoft® Excel™.
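
Once the .CHP data are exported as text, they can also be filtered outside the Affymetrix software. This Python sketch assumes a tab-delimited export whose header includes columns named "Probe Set Name", "Signal", and "Detection"; those column names and the example rows are hypothetical, so check the header of your own file.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical column names: check the header row of your own exported file.
PROBE_SET_COL = "Probe Set Name"
SIGNAL_COL = "Signal"
DETECTION_COL = "Detection"


def present_by_signal(tsv_text: str) -> List[Tuple[str, float]]:
    """Keep probe sets called P and sort them by signal, highest first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    present = [(row[PROBE_SET_COL], float(row[SIGNAL_COL]))
               for row in reader if row[DETECTION_COL] == "P"]
    return sorted(present, key=lambda item: item[1], reverse=True)


# Made-up rows in the assumed layout.
example = (
    "Probe Set Name\tSignal\tDetection\tDetection p-value\n"
    "probe_set_1\t850.2\tP\t0.0005\n"
    "probe_set_2\t35.7\tA\t0.4210\n"
    "probe_set_3\t1200.0\tP\t0.0002\n"
)
print(present_by_signal(example))
```

For a real export, read the file with `open(path, newline="")` and pass its contents to the same function.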

3.4. Array Comparison Analysis

The goal of many gene expression experiments is to compare the transcription profiles of two samples. To begin analysis, obtain a .CHP file for each of the samples to be compared. Designate one of the arrays as the baseline and the other as the experimental array (the choice can be arbitrary but should be applied consistently throughout subsequent analyses) (see Note 27). The difference values (PM - MM) of each probe pair in the baseline array are compared with those of the matching probe pairs in the experimental array. As in single-array analysis, comparison analysis involves two algorithms: one generates a qualitative output with an associated p-value, and the other generates a quantitative metric with an associated confidence interval (CI). The qualitative output is called the change call, which indicates whether a transcript in the experimental array is increased, decreased, or unchanged relative to its baseline counterpart. The quantitative metric is called the signal log ratio and is an estimate of the change in gene expression.

Fig. 3. Data analysis output (.CHP file) for a Single-Array Analysis includes Stat Pairs, Stat Pairs Used, Signal, Detection, and Detection p-value for each probe set.

3.4.1. Change Algorithm

As in single-array analyses, comparison analyses rely on a Wilcoxon signed rank test. First, each probe pair is evaluated for intensity saturation. Then, each probe set in the experimental array is compared with the matching set in the baseline array to generate a change p-value. User-defined cutoff values, called gammas, are then applied to the p-values to generate discrete change calls (increase [I], marginal increase [MI], no change [NC], marginal decrease [MD], or decrease [D]). P-values range from 0.0 to 1.0:

- values close to 0.0 indicate a probable increase in the experimental probe set relative to the baseline set;
- values close to 1.0 indicate a likely decrease;
- values close to 0.5 indicate probe sets whose intensities are very similar in the baseline and experimental data sets.
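The gamma thresholds can be pictured as symmetric bands at both ends of the p-value range. The Python sketch below maps a change p-value to a call using two cutoffs, gamma1 < gamma2. It assumes that p < gamma1 gives I, p < gamma2 gives MI, p > 1 - gamma1 gives D, p > 1 - gamma2 gives MD, and everything else gives NC. The function name `change_call` is hypothetical. The gamma values in the example are placeholders; read the actual defaults from your analysis software's settings.

```python
def change_call(p, gamma1, gamma2):
    """Map a change p-value (0..1) to 'I', 'MI', 'NC', 'MD' or 'D'.

    Small p favours an increase in the experimental array and large p a
    decrease; gamma1 < gamma2 are the user-defined cutoffs (gammas).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if not 0.0 < gamma1 < gamma2 < 0.5:
        raise ValueError("expected 0 < gamma1 < gamma2 < 0.5")
    if p < gamma1:
        return "I"
    if p < gamma2:
        return "MI"
    if p > 1.0 - gamma1:
        return "D"
    if p > 1.0 - gamma2:
        return "MD"
    return "NC"


if __name__ == "__main__":
    # Placeholder gammas, for illustration only
    for p in (0.001, 0.0028, 0.5, 0.9971, 0.9999):
        print(p, change_call(p, gamma1=0.0025, gamma2=0.003))
```

Widening the gap between gamma1 and gamma2 assigns more probe sets to the marginal categories without changing which ones are called I or D.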

3.4.2. Signal Log Ratio Algorithm

The signal log ratio provides an estimate of the magnitude and direction of the change in transcript abundance between two arrays. Like the signal value derived from single-array analyses, the signal log ratio is calculated using a one-step Tukey's biweight method. The algorithm computes a robust, weighted mean of the log ratios of probe pair intensities across the two arrays (see Note 28). Ninety-five percent CIs are also calculated to measure the variation in the biweight estimate. A narrow CI indicates that the data are less variable and the estimate is more reliable.

Fig. 4. Data analysis output (.CHP file) for a Comparison Analysis includes Stat Common Pairs, Signal Log Ratio, Signal Log Ratio Low, Signal Log Ratio High, Change, and Change p-value for each probe set.
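A one-step Tukey biweight works in three stages:

1. Take the median of the values.
2. Scale each value's distance from that median by a multiple of the median absolute deviation (MAD).
3. Average the values with weights (1 - u²)², giving zero weight to points that fall too far out.

The Python sketch below applies this to per-probe-pair log2 ratios and then converts the result to a fold change as described in Note 28. It rests on these assumptions:

- The tuning constants c = 5 and epsilon = 0.0001 are the values we understand the vendor's algorithm description to use.
- The inputs are already background-adjusted, positive probe-pair values.
- The function names are hypothetical.
- The sketch omits the CI calculation.

```python
import math
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights outliers."""
    m = median(values)
    mad = median(abs(x - m) for x in values)
    scale = c * mad + epsilon  # epsilon avoids division by zero when MAD is 0
    num = den = 0.0
    for x in values:
        u = (x - m) / scale
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        num += w * x
        den += w
    return num / den  # den > 0: points nearest the median always get weight


def signal_log_ratio(baseline, experimental):
    """Biweight mean of per-probe-pair log2(experimental / baseline)."""
    if len(baseline) != len(experimental) or not baseline:
        raise ValueError("need matching, non-empty probe-pair lists")
    ratios = [math.log2(e / b) for b, e in zip(baseline, experimental)]
    return tukey_biweight(ratios)


def slr_to_fold_change(slr):
    """Signed fold change: 2.0 for SLR 1.0, -2.0 for SLR -1.0, 1.0 for SLR 0."""
    return 2.0 ** slr if slr >= 0 else -(2.0 ** -slr)


if __name__ == "__main__":
    # Toy probe-pair values; the last pair is an outlier
    baseline = [100, 200, 150, 120, 90]
    experimental = [200, 400, 300, 240, 5000]
    slr = signal_log_ratio(baseline, experimental)
    print(slr, slr_to_fold_change(slr))  # prints 1.0 2.0
```

In the toy data, the outlying probe pair receives zero weight, so it does not pull the estimate away from the twofold change shown by the other four pairs.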

3.4.3. Viewing the Data

After reviewing the report file, return to the .CHP file. The signal log ratio, change calls, and change p-values for each transcript on the experimental sample can be viewed and sorted according to user preferences (Fig. 4). The data can also be imported as a text file into other programs, such as Microsoft Excel.
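Once the comparison results are exported as text, they can be filtered with a few lines of code instead of a spreadsheet. The Python sketch below makes these assumptions:

- The export is tab-delimited.
- Its column headings match the labels in Fig. 4, plus a probe set identifier column that we assume is headed "Probe Set Name".
- The probe set identifiers in the example are hypothetical.
- The |signal log ratio| ≥ 1 cutoff (a twofold change, see Note 28) is an illustrative choice.

```python
import csv
import io


def select_changed(tsv_text, min_abs_slr=1.0, calls=("I", "D")):
    """Return (probe set, SLR, call) rows passing both filters, largest |SLR| first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        slr = float(row["Signal Log Ratio"])
        if row["Change"] in calls and abs(slr) >= min_abs_slr:
            hits.append((row["Probe Set Name"], slr, row["Change"]))
    return sorted(hits, key=lambda h: abs(h[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical export excerpt
    sample = (
        "Probe Set Name\tSignal Log Ratio\tChange\tChange p-value\n"
        "ps_a\t1.6\tI\t0.00002\n"
        "ps_b\t-2.3\tD\t0.99998\n"
        "ps_c\t0.2\tNC\t0.48\n"
        "ps_d\t1.1\tMI\t0.0027\n"
    )
    for hit in select_changed(sample):
        print(hit)
```

Requiring both a change call and a minimum signal log ratio combines the qualitative and quantitative outputs described above.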

3.5. Advanced Data Analysis and Mining

It is beyond the scope of this chapter to provide an in-depth guide to advanced microarray data analysis, but this section offers some general pointers regarding the available tools. A variety of algorithms have been described to group samples or genes with similar expression patterns. Clustering analyses are often used in studies aimed at discovering new disease classes or novel relationships between genes. These methods rely on unsupervised algorithms, which search for patterns of gene expression without taking into account any previously known biological, clinical, or demographic information. Although some of these algorithms allow users to impose a few constraints on the clusters generated (13), the main advantage of clustering is its ability to provide systematic and unbiased analyses of expression data. Studies using self-organizing maps (SOMs) (13), hierarchical algorithms (14), and k-means clustering algorithms (15) illustrate the capabilities of such techniques.

For some applications, however, supervised algorithms that incorporate prior knowledge into the analyses are more useful. These algorithms can be “trained” to search for expression patterns associated with particular traits, such as disease outcomes or responsiveness to drugs, and then used to predict those traits in new, unknown samples. Examples include k-nearest neighbors algorithms (5), weighted voting algorithms (16,17), the support vector machine method (18), Bayesian models (19), and artificial neural networks (20).

Whether applying supervised or unsupervised algorithms, users should be aware of the problem of “multiple comparisons.” Given the large number of results per array experiment, even a small percentage of false positives can result in a large absolute number of artifactual correlations. To minimize this problem, many investigators set aside samples for independent tests and apply permutation tests. In a permutation test, they introduce noise or scramble the data and then assess how much the identified correlations differ from correlations that could arise by chance. Although these statistical tests are powerful, it is important to note that expression patterns may still result from random associations.
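The scale of the multiple-comparisons problem is easy to estimate: if 12,600 probe sets were each tested at p < 0.05 and none were truly changed, about 12,600 × 0.05 = 630 would still pass by chance. A label-permutation test estimates this chance level directly from the data. The Python sketch below works on a genes × samples matrix:

1. Score each gene by the absolute difference of its two class means.
2. Count the genes that exceed a threshold under the true class labels.
3. Repeat the count after shuffling the labels many times.

The scoring statistic, the threshold, the function names, and the simulated data are all illustrative assumptions, not a method taken from the cited studies.

```python
import random


def gene_scores(matrix, labels):
    """Absolute difference of class means for each gene (rows = genes)."""
    scores = []
    for row in matrix:
        a = [x for x, lab in zip(row, labels) if lab == 1]
        b = [x for x, lab in zip(row, labels) if lab == 0]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return scores


def permutation_null_counts(matrix, labels, threshold, n_perm=200, seed=0):
    """Genes exceeding `threshold` in each of n_perm label shuffles."""
    rng = random.Random(seed)
    shuffled = list(labels)  # shuffling keeps the class sizes unchanged
    counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        counts.append(sum(s > threshold for s in gene_scores(matrix, shuffled)))
    return counts


if __name__ == "__main__":
    # Pure-noise toy data: any "hits" here are chance associations
    rng = random.Random(1)
    labels = [0] * 5 + [1] * 5
    matrix = [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(500)]
    observed = sum(s > 1.5 for s in gene_scores(matrix, labels))
    null = permutation_null_counts(matrix, labels, threshold=1.5)
    p_value = (1 + sum(c >= observed for c in null)) / (1 + len(null))
    print(observed, sum(null) / len(null), p_value)
```

If the observed number of hits is not clearly larger than the counts from shuffled labels, the apparent correlations are consistent with chance.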

3.6. Data Management

The number of genes that can be simultaneously monitored with the GeneChip platform is unequalled. Because GeneChip arrays generate large amounts of data, it is critical to set up consistent procedures for data storage and handling. The following practices are highly recommended:

- deciding on a clear and concise nomenclature for each project;
- performing regular back-ups of all files;
- employing database management software.

Affymetrix has developed software that employs a centralized data management system for moderate- to high-throughput laboratories. This software facilitates data sharing among groups, allows automation of data analysis, offers more sophisticated security capabilities, and increases throughput by freeing workstations from analysis tasks.

An important feature of both the workstation-based and the centralized systems is that they provide the flexibility of an open architecture design, allowing users to access a wide variety of tools for analyzing and exchanging data. This flexibility derives from the Affymetrix Analysis Data Model (AADM), a relational database schema that stores array results in a format that can be easily recognized and used by many software programs. Four related subschemas hold the data associated with each experiment:

- array design, which includes information about the array such as its number of rows and columns;
- experiment setup, which includes information about the target applied;
- analysis results, ranging from individual cell intensities to comparative analysis results;
- protocol parameters.

AADM's open design is proving particularly useful in light of the growing number of analytical algorithms being developed in academia and industry, and users' increasing need to share and compare their data.

An additional software tool that complements the flexibility of AADM-based databases is the NetAffx Analysis Center at Affymetrix.com. Through this online center, array users can efficiently collect and integrate a wide variety of information relevant to their specific experimental results and aims. The site provides access to a variety of public databases, including GenBank, dbEST, RefSeq, and UniGene. In addition, it links users to proprietary databases that offer annotations, such as protein domain alignments, as well as target and probe sequences for GeneChip arrays. Researchers can use the site to search array probe sets for particular sequences, review gene and protein annotations, and sort transcripts by a number of criteria, such as functional group, metabolic pathway, or disease association. The Gene Ontology Mining Tool provides a visual mapping of probe sets to gene groups, either in detail or at a broad level.
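The four AADM subschemas are easiest to picture as related tables linked by experiment and design identifiers. The sketch below uses Python's built-in `sqlite3` module and is purely illustrative. The table names, column names, and the query are hypothetical simplifications, not the actual AADM schema, and the rows are toy values.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the four subschemas
SCHEMA = """
CREATE TABLE array_design (
    design_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    n_rows INTEGER NOT NULL,
    n_cols INTEGER NOT NULL
);
CREATE TABLE experiment_setup (
    experiment_id INTEGER PRIMARY KEY,
    design_id INTEGER NOT NULL REFERENCES array_design(design_id),
    target_description TEXT
);
CREATE TABLE protocol_parameter (
    experiment_id INTEGER NOT NULL REFERENCES experiment_setup(experiment_id),
    name TEXT NOT NULL,
    value TEXT
);
CREATE TABLE analysis_result (
    experiment_id INTEGER NOT NULL REFERENCES experiment_setup(experiment_id),
    probe_set TEXT NOT NULL,
    signal REAL,
    detection TEXT
);
"""

# Join results back to the experiment and array design they came from
QUERY = """
SELECT d.name, e.target_description, r.probe_set, r.signal
FROM analysis_result AS r
JOIN experiment_setup AS e ON e.experiment_id = r.experiment_id
JOIN array_design AS d ON d.design_id = e.design_id
WHERE r.detection = 'P'
"""


def build_demo_db():
    """Create an in-memory database populated with toy rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO array_design VALUES (1, 'demo_array', 10, 10)")
    con.execute("INSERT INTO experiment_setup VALUES (100, 1, 'toy sample 1')")
    con.execute("INSERT INTO protocol_parameter VALUES (100, 'example_param', 'x')")
    con.executemany(
        "INSERT INTO analysis_result VALUES (100, ?, ?, ?)",
        [("ps_a", 812.5, "P"), ("ps_b", 12.1, "A")],
    )
    return con


if __name__ == "__main__":
    for row in build_demo_db().execute(QUERY):
        print(row)
```

Because every table is keyed to an experiment or design identifier, any tool that understands the schema can combine results with their context through ordinary SQL joins.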

4. An Array of Possibilities

A wealth of studies illustrate how the guidelines described in this chapter can be used to answer a variety of biological and medical questions. Applications range from probing biological processes, such as development (21,22) and circadian rhythms (23,24), to searching for predictors of disease and drug responsiveness (25). Cancer research is a rapidly growing field of application. Here, arrays have helped investigators discover new tumor classes, assign patient samples to known tumor classes, predict clinical outcomes, reveal cancer-associated alterations in molecular pathways, and identify new drug targets (26).

In one of the most comprehensive leukemia studies to date, for example, Yeoh and co-workers used GeneChip Human Genome U95A arrays to monitor the expression of more than 12,600 genes in leukemic blasts from 360 pediatric acute lymphoblastic leukemia (ALL) patients (6). The study showed that expression profiling can not only classify all known, prognostically relevant leukemia subtypes but also identify patients at risk of failing conventional treatments. In addition, the array data supplied molecular candidates for developing new treatments and suggested new diagnostic and subclassification markers. As often occurs when applying microarray techniques, the authors were able to extract genome-wide information relevant to multiple questions from their data sets.

5. Notes

1. 12X MES buffer (1000 mL): 70.4 g MES free acid monohydrate; 193.3 g MES sodium salt; 800 mL molecular biology grade water. Mix and adjust the volume to 1000 mL. The pH should be between 6.5 and 6.7. Pass through a 0.2 µm filter.

2. 2X hybridization buffer (50 mL): 8.3 mL 12X MES stock; 17.7 mL 5 M NaCl; 4.0 mL 0.5 M EDTA; 0.1 mL 10% Tween-20; 19.9 mL water. Store at 2–8°C and shield from light.

3. Stringent wash buffer (1000 mL): 83.3 mL 12X MES stock buffer; 5.2 mL 5 M NaCl; 1.0 mL 10% Tween-20; 910.5 mL water. Pass through a 0.2 µm filter. Store at 2–8°C and shield from light.

4. Nonstringent wash buffer (1000 mL): 300 mL 20X SSPE; 1.0 mL 10% Tween-20; 699 mL water. Pass through a 0.2 µm filter.

5. 2X stain buffer (250 mL): 41.7 mL 12X MES stock buffer; 92.5 mL 5 M NaCl; 2.5 mL 10% Tween-20; 113.3 mL water. Pass through a 0.2 µm filter. Store at 2–8°C and shield from light.

6. 10 mg/mL goat IgG stock: resuspend 10 mg in 1 mL 150 mM NaCl. Store at 4°C.

7. When TRIzol is used to isolate total RNA, a second cleanup of the total RNA is recommended to obtain sufficient cRNA yields. This can be done with the QIAGEN RNeasy Total RNA isolation kit.


8. The required amount of poly(A)+ starting material is 0.2–2.0 µg. A small-sample protocol is available for limited amounts of starting total RNA; refer to www.affymetrix.com or to the GeneChip Expression Analysis Technical Manual.

9. The oligo T7-(dT)24 primer (5′ GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG (dT)24-3′, 100 pmol/µL) must be HPLC-purified to achieve efficient cDNA synthesis and in vitro transcription. Poorly synthesized primer will lower cRNA yield.

10. If poly(A)+ RNA is used, lower the first-strand cDNA synthesis temperature from the 42°C used for total RNA to 37°C.

11. RNase treatment of the cDNA prior to the in vitro transcription is not recommended.

12. Prior to use, centrifuge all reagents briefly to ensure that the components are at the bottom of the tube. Do not use the product after the expiration date stated on the label. If precipitation occurs in the reaction buffer, centrifuge briefly to remove the precipitate before use; the precipitation does not interfere with the reaction.

13. The amount of cDNA used in the in vitro transcription reaction differs depending on whether poly(A)+ RNA or total RNA was the starting material.

14. It is useful to save an aliquot of the unpurified IVT reaction for analysis by gel electrophoresis.

15. The cRNA in the fragmentation reaction must be at a final concentration range of 0.5–2.0 µg/µL. If the sample is more dilute, perform an ethanol precipitation step before proceeding.

16. When preparing the hybridization cocktail, it is important to consider the probe array type being used because different arrays require different amounts of cRNA.

17. While pipetting the solution, be sure to avoid any insoluble material at the bottom of the tube.

18. Once the hybridization cocktail has been pipetted out of the array and the array chamber has been filled with nonstringent buffer, the array can be stored at 4°C for up to 4 h before proceeding to the washing and staining steps. Be sure to equilibrate the probe array to room temperature before washing and staining.

19. Always store the SAPE reagent in the dark at 4°C (do not freeze). Be sure to mix the SAPE thoroughly, but gently, before adding to the rest of the reaction components. Always prepare the SAPE stain solution immediately before use.

20. The control oligonucleotide B2 should generate hybridization signals that trace the boundaries of the probe area. The controls appear as an alternating pattern of intensities with a checkerboard pattern at each corner and spell out the name of the array. In addition to serving as a positive control, the pattern is used by the software to align the array image with a grid. If the intensity of the checkerboard patterns is too high or too low, or if the pattern is distorted, the grid must be aligned manually.

21. One option is to apply a normalization method based on the intensities of 100 control probe sets.


22. To establish whether a transcript is present in detectable amounts, first evaluate the level of signal saturation for each probe pair. If an MM probe is saturated (46,000 for the GeneArray 2500 Scanner), the signal from the corresponding PM probe is uninformative, and the probe pair is discarded.

23. The default value of Tau is set at 0.015. Tau can be adjusted to balance sensitivity and specificity. If the experiment is designed to achieve high sensitivity and avoid false negatives, while tolerating some miscalls, Tau can be decreased. If the experiment is designed to achieve high specificity, avoiding false positives while missing a few positive calls, Tau can be increased.

24. The default values of α1 and α2 change depending on the number of probe pairs.

25. The signal algorithm is designed to avoid generating negative signal values, which lack physiological meaning and can interfere with subsequent data processing. If an MM value is higher than its PM value as a result of cross-hybridization, the uninformative MM is replaced with one of two values:

- an adjusted MM value calculated from the mean of the PM:MM ratios of the other probes in the set;
- a value slightly lower than the PM, which results in an absent call.

26. If only one of the controls has a ratio above 3, do not automatically assume that the quality of the experimental data is compromised. The elevated ratio may result from transcript-specific changes rather than from low sample or assay quality. Before reaching a final assessment, compare the outcomes of the various quality indicators, as well as results accumulated from previous experiments.

27. Before running an analysis, check the Expression Analysis Settings with particular attention to the scaling or normalization criteria.

28. Logarithms are used because hybridization behavior is best described by exponential functions. In addition, signal log ratios can be more sensitive indicators of the differences between probe values than linear fold changes. When the experimental and baseline values are very similar, log ratios outperform fold-change measurements. Because the algorithm uses a base-2 log scale, the signal log ratio is easily converted to a fold change, if desired:

- a value of 1.0 indicates a twofold increase;
- a value of -1.0 indicates a twofold decrease;
- a value of 0 indicates no change at all.

The algorithm also estimates the amount of variation in the data in the form of CIs, which are calculated from the variation between probes in a set.
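The buffer recipes in Notes 2–5 are simple dilutions (C1V1 = C2V2). The Python sketch below checks that each recipe's volumes add up to its stated total and reports the working concentration each stock contributes. It can also scale a recipe to another batch size. It rests on these assumptions:

- MES and SSPE are expressed in "X" units relative to their stocks.
- Sodium contributed by the MES sodium salt is not counted in the NaCl figure.
- The names `Component`, `final_concentrations`, and `scale_recipe` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    stock: float      # stock concentration
    unit: str         # "X", "M" or "%"
    volume_ml: float  # volume of stock added


def final_concentrations(total_ml, components, water_ml):
    """Check that the volumes add up, then apply C1V1 = C2V2 to each stock."""
    added = sum(c.volume_ml for c in components) + water_ml
    if not math.isclose(added, total_ml, rel_tol=1e-6):
        raise ValueError(f"volumes sum to {added:.2f} mL, expected {total_ml} mL")
    return {c.name: (c.stock * c.volume_ml / total_ml, c.unit) for c in components}


def scale_recipe(components, water_ml, factor):
    """Scale every volume by `factor` (e.g. 0.2 for 10 mL of a 50 mL recipe)."""
    scaled = [Component(c.name, c.stock, c.unit, c.volume_ml * factor) for c in components]
    return scaled, water_ml * factor


# (total mL, stock components, water mL), transcribed from Notes 2-5
HYB_2X = (50.0, [Component("MES", 12.0, "X", 8.3), Component("NaCl", 5.0, "M", 17.7),
                 Component("EDTA", 0.5, "M", 4.0), Component("Tween-20", 10.0, "%", 0.1)], 19.9)
STRINGENT = (1000.0, [Component("MES", 12.0, "X", 83.3), Component("NaCl", 5.0, "M", 5.2),
                      Component("Tween-20", 10.0, "%", 1.0)], 910.5)
NONSTRINGENT = (1000.0, [Component("SSPE", 20.0, "X", 300.0),
                         Component("Tween-20", 10.0, "%", 1.0)], 699.0)
STAIN_2X = (250.0, [Component("MES", 12.0, "X", 41.7), Component("NaCl", 5.0, "M", 92.5),
                    Component("Tween-20", 10.0, "%", 2.5)], 113.3)

if __name__ == "__main__":
    recipes = {"2X hyb": HYB_2X, "stringent": STRINGENT,
               "nonstringent": NONSTRINGENT, "2X stain": STAIN_2X}
    for label, (total, comps, water) in recipes.items():
        conc = final_concentrations(total, comps, water)
        print(label, {k: f"{v:.4g} {u}" for k, (v, u) in conc.items()})
```

The volume check catches transcription errors in a recipe before any reagent is wasted; scaling keeps concentrations unchanged because every volume, including water, is multiplied by the same factor.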
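Notes 22–24 refer to parameters of the single-array detection algorithm. To our understanding, that algorithm works as follows:

1. Compute a discrimination score R = (PM - MM)/(PM + MM) for each usable probe pair.
2. Test whether the scores exceed Tau with a one-sided Wilcoxon signed rank test.
3. Compare the resulting detection p-value with α1 and α2.

The Python sketch below is built under stated assumptions and rests on these choices:

- It computes an exact signed-rank p-value, which may differ from the vendor's implementation, particularly for larger probe sets.
- α1 = 0.04 and α2 = 0.06 are placeholders (Note 24 states that the defaults depend on the number of probe pairs).
- Tau = 0.015 comes from Note 23.
- The saturation level of 46,000 comes from Note 22.
- All function names are hypothetical.

```python
def discrimination_scores(pairs, saturation=46000.0):
    """R = (PM - MM) / (PM + MM) for each usable (PM, MM) probe pair."""
    scores = []
    for pm, mm in pairs:
        if mm >= saturation or pm + mm <= 0:
            continue  # saturated MM (Note 22) or unusable intensities
        scores.append((pm - mm) / (pm + mm))
    return scores


def signed_rank_p_greater(diffs):
    """Exact one-sided Wilcoxon signed rank p-value for H1: diffs tend to be > 0."""
    d = [x for x in diffs if x != 0]  # zero differences carry no sign
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks2 = [0] * n  # ranks doubled so that averaged tied ranks stay integers
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # 2 * mean of ranks i+1 .. j+1
        i = j + 1
    observed = sum(r for r, x in zip(ranks2, d) if x > 0)
    # Null distribution: each rank is positive or negative with probability 1/2
    total = sum(ranks2)
    counts = [1] + [0] * total
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[observed:]) / 2 ** n


def detection_call(pairs, tau=0.015, alpha1=0.04, alpha2=0.06, saturation=46000.0):
    """Return (call, p): 'P' present, 'M' marginal or 'A' absent."""
    scores = discrimination_scores(pairs, saturation)
    p = signed_rank_p_greater([r - tau for r in scores])
    if p < alpha1:
        return "P", p
    if p < alpha2:
        return "M", p
    return "A", p


if __name__ == "__main__":
    # Toy probe sets of 11 pairs each
    present = [(1000 + 50 * i, 200 + 10 * i) for i in range(11)]
    absent = [(200 + 10 * i, 1000 + 50 * i) for i in range(11)]
    print(detection_call(present), detection_call(absent))
```

Raising `tau` shifts every score downward before the test, so fewer probe sets are called present. This matches the sensitivity and specificity trade-off described in Note 23.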

Acknowledgments

Some of the material in this review was derived from the Affymetrix GeneChip Expression Analysis Technical Manual. We are indebted to all who participated in its production. We would also like to thank Brian Shimada, Raji Pillai, Bob Kolovch, and Yan Zhang-Klompus for their editorial suggestions.

References

1. Lockhart, D. J. and Winzeler, E. A. (2000) Genomics, gene expression and DNA arrays. Nature 405, 827–836.


2. Notterman, D. A., Alon, U., Sierk, A. J., and Levine, A. J. (2001) Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Res. 61, 3124–3130.

3. Tice, D. A., Szeto, W., Soloviev, I., et al. (2002) Synergistic induction of tumor antigens by wnt-1 signaling and retinoic acid revealed by gene expression profiling. J. Biol. Chem. 277, 14329–14335.

4. Ferrando, A., Neuberg, D., Staunton, J., et al. (2002) Gene expression signatures define novel oncogenic pathways in T cell acute lymphoblastic leukemia. Cancer Cell 1, 75–87.

5. Pomeroy, S. L., Tamayo, P., Gaasenbeek, M., et al. (2001) Gene expression-based classification of outcome prediction of central nervous system embryonal tumors. Nature 415, 436–441.

6. Yeoh, E. J., Ross, M., Shurtleff, S., et al. (2002) Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 1, 133–143.

7. MacDonald, T. J., Brown, K. M., LaFleur, B., et al. (2001) Expression profiling of medulloblastoma: PDGFRA and the ras/mapk pathway as therapeutic targets for metastatic disease. Nat. Genet. 29, 143–152.

8. Lockhart, D. J., Dong, H., Byrne, M. C., et al. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675–1680.

9. Gerhold, D., Lu, M., Xu, J., Austin, C., Caskey, C. T., and Rushmore, T. (2001) Monitoring expression of genes involved in drug metabolism and toxicology using DNA microarrays. Physiol. Genomics 5, 161–170.

10. Fodor, S. P. A., Read, J. L., Pirrung, M. C., Stryer, L., Lu, A. T., and Solas, D. (1991) Light-directed, spatially addressable parallel chemical synthesis. Science 251, 767–773.

11. Schmitt, M. E., Brown, T. A., and Trumpower, B. L. (1990) A rapid and simple method for preparation of RNA from Saccharomyces cerevisiae. Nucleic Acids Res. 18, 3091–3092.

12. Farrell, R. (1998) RNA Methodologies, Academic Press.

13. Tamayo, P., Slonim, D., Mesirov, J., et al. (1999) Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA 96, 2907–2912.

14. Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863–14868.

15. Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J., and Church, G. M. (1999) Systematic determination of genetic network architecture. Nat. Genet. 22, 281–285.

16. Golub, T. R., Slonim, D. K., Tamayo, P., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537.

17. Shipp, M., Tamayo, P., Ross, K., et al. (2002) Diffuse large B-cell lymphoma outcome prediction by gene expression profiling. Nat. Med. 8, 68–74.


18. Brown, M. P., Grundy, W. N., Lin, D., et al. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. USA 97, 262–267.

19. West, M., Blanchette, C., Dressman, H., et al. (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl. Acad. Sci. USA 98, 11462–11467.

20. Khan, J., Wei, J. S., Ringner, M., et al. (2001) Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 7, 673–679.

21. Muller, H., Bracken, A. P., Vernell, R., et al. (2001) E2Fs regulate the expression of genes involved in differentiation, development, proliferation, and apoptosis. Genes Dev. 15, 267–285.

22. Mody, M., Cao, Y., Cui, Z., et al. (2001) Genome-wide gene expression profiles of the developing mouse hippocampus. Proc. Natl. Acad. Sci. USA 98, 8862–8867.

23. Storch, K. F., Lipan, O., Leykin, I., et al. (2002) Extensive and divergent circadian gene expression in liver and heart. Nature 417, 78–83.

24. Ueda, H. R., Matsumoto, A., Kawamura, M., Iino, M., Tanimura, T., and Hashimoto, S. (2002) Genome-wide transcriptional orchestration of circadian rhythms in Drosophila. J. Biol. Chem. 277, 14048–14052.

25. Chicurel, M. and Dalma-Weiszhausz, D. (2002) Microarrays in pharmacogenomics: Advances and future promise. Pharmacogenomics 5, 589–601.

26. Chicurel, M. E. and Dalma-Weiszhausz, D. D. (2003) Oligonucleotide Microarrays. In: Expression profiling of human tumors (Ladanyi, M. and Gerald, W. L., eds.), Humana Press, Inc., Totowa, NJ.

Amplified Differential Gene Expression Microarray

Zhijian J. Chen and Kenneth D. Tew

Summary

Amplified Differential Gene Expression (ADGE) combined with DNA microarray analysis introduces a new concept: the ratios of differentially expressed genes are magnified before they are detected. The ratio magnification is achieved by integrating DNA reassociation with polymerase chain reaction (PCR) amplification and is ensured by the design of the adapters and primers. The ADGE technique can be used either as a stand-alone method or in series with DNA microarray analysis. In the series combination, ADGE is used for sample preprocessing and the DNA microarray serves as the display system. The combination of ADGE and DNA microarrays lets each method complement the other's strengths:

- magnifying the ratios of differential gene expression improves detection sensitivity;
- PCR amplification and efficient labeling enhance signal intensity and reduce the requirement for large amounts of starting material;
- the high throughput of DNA microarrays is maintained.

Key Words: ADGE, amplified differential gene expression, DNA microarray, gene expression


