ToxSci Advance Access originally published online on April 6, 2006
Toxicological Sciences 2006 92(2):560-577; doi:10.1093/toxsci/kfj184
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
HIGHLIGHTED ARTICLE |
A System-Based Approach to Interpret Dose- and Time-Dependent Microarray Data: Quantitative Integration of Gene Ontology Analysis for Risk Assessment



* Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, Washington 98105;
GenMAPP Development Team, Bioinformatics Research Associate/Conklin Lab Gladstone Institute of Cardiovascular Disease/UCSF, San Francisco, California 94158;
Cell and Molecular Biology Branch, U.S. Army Medical Research Institute of Chemical Defense, Aberdeen Proving Ground, Maryland 21010; and
Institute for Risk Analysis and Risk Communication, Environmental and Occupational Health Sciences, Seattle, Wasington
1 To whom correspondence should be addressed at Institute for Risk Analysis and Risk Communication, Department of Environmental and Occupational Health Sciences, University of Washington, 4225 Roosevelt Way NE, Suite #100, Seattle, WA 98105. Fax: (206) 616-4875. E-mail: faustman{at}u.washington.edu.
Received March 30, 2006; accepted April 1, 2006
| ABSTRACT |
|---|
|
|
|---|
Although microarray technology has emerged as a powerful tool to explore expression levels of thousands of genes or even complete genomes after exposure to toxicants, the functional interpretation of microarray data sets still represents a time-consuming and challenging task. Gene ontology (GO) and pathway mapping have both been shown to be powerful approaches to generate a global view of biological processes and cellular components impacted by toxicants. However, current methods only allow for comparisons across two experimental settings at one particular time point. In addition, the resulting annotations are presented in extensive gene lists with minimal or limited quantitative information, data that are crucial in the application of toxicogenomic data for risk assessment. To facilitate quantitative interpretation of dose- or time-dependent genomic data, we propose to use combined average raw gene expression values (e.g., intensity or ratio) of genes associated with specific functional categories derived from the GO database. We developed an extended program (GO-Quant) to extract quantitative gene expression values and to calculate the average intensity or ratio for those significantly altered by functional gene category based on MAPPFinder results. To demonstrate its application, we applied this approach to a previously published dose- and time-dependent toxicogenomic data set (J. F. Dillman et al., 2005, Chem. Res. Toxicol. 18, 2834). Our results indicate that the above systems approach can describe quantitatively the degree to which functional gene systems change across dose or time. Additionally, this approach provides a robust measurement to illustrate results compared to single-gene assessments and enables the user to calculate the corresponding ED50 for each specific functional GO term, important for risk assessment.
Key Words: microarray; GO analysis; dose and time dependent; quantitative.
| INTRODUCTION |
|---|
|
|
|---|
Microarray technology has become a powerful tool to explore the expression levels of thousands of genes or even complete genomes after exposure to toxicants. A major challenge in the interpretation of toxicogenomic data is the functional interpretation, linking potentially interrelated alterations in gene expression to conventional toxicological end points (Hamadeh et al., 2002
An approach using the data average from many different genes categorized into broad "omic categories" has been proposed (Jansen and Gerstein, 2000
). However, there is no consensus about the omic categories, and no systemic approach has been proposed to identify and calculate the average data. Since then, there has been a rapid development of the GO project, a collaborative effort to address the need for consistent descriptions of gene products in different databases (Ashburner et al., 2000
; Camon et al., 2004
; Harris et al., 2004
). The GO collaborators are developing structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components, and molecular functions in a species-independent manner. The use of GO terms by several collaborating databases facilitates uniform queries across them. The Gene Ontology identifier (GOID) for the term seems to be a perfect candidate for the proposed omic categories. Computer programs such as MAPPFinder are available to link gene expression data to the GO hierarchy and to statistically identify the GO terms altered by toxicant exposure (Doniger et al., 2003
). To facilitate quantitative interpretation of dose- or time-dependent genomic data, we propose a system-based approach to integrate the raw gene expression data with the GO Ontological analysis. We developed a computer program called GO-Quant that automatically links functional gene category analysis result from MAPPFinder with the original gene expression data and calculates the average intensity or ratio for those significant genes within each GO term. We applied this approach to demonstrate its application in a published dose- and time-dependent toxicogenomic data set (Dillman et al., 2005
). Our results indicate that the above systematic approach can describe quantitatively the degree to which functional gene systems change across dose or time course. Additionally, this approach provides a global measurement to illustrate our results compared to single-gene assessments and enables the user to calculate the corresponding ED50 for each specific functional GO term.
| SYSTEMS AND METHODS |
|---|
|
|
|---|
Gene expression data set.
In this paper, dose and temporal genomic data were obtained from Dillman et al. (2005)
Statistical analysis and comparisons.
Raw gene expression data were imported to BRB Array Tools v3.3 for statistical analysis (Wright and Simon, 2003
). A log2 transformation was applied to the data set. Each array was normalized by using the median intensity over the entire set of arrays. Since in this paper we are demonstrating our system approach by applying it to dose or time course microarray data, we conducted one-way ANOVAs across doses or time points. Statistical comparisons were conducted across doses (vehicle control, 1, 3, or 6 mg/kg) for each time point (0.5, 1, 3, 6, and 24 h) or across time points for each dose by using the "randomized variance" F test (Long et al., 2001
; Wright and Simon, 2003
). Significantly changed genes across doses (dose response) at specific time points (6 and 24 h) were selected based on one-way ANOVA test at p < 0.005 while significantly changed genes across time points (time course) at a dose of 6 mg/kg were selected based on one-way ANOVA test at p < 0.001 since there was a large number of genes changed with time. Hierarchical clustering analysis of functional gene category based on MAPPFinder results was performed across doses at two time points (6 and 24 h) or across time points at a dose of 6 mg/kg for the significantly changed genes. For the selected significant genes, ratios were derived by dividing each normalized gene value by the average value of the controls (vehicle) and then transforming the ratios to their respective log2 ratios. These log2-transformed ratios were used as input for the TIGR Multiexperiment Viewer (MEV) (Saeed et al., 2003
). Hierarchical clustering analysis was conducted on the output genes using average linkage and Euclidean dissimilarity methods (Eisen et al., 1998
). The complete output of the cluster analysis is shown with the fold change indicated colorimetrically.
GO analysis.
To establish associations between treatment and the affected GO terms and pathways, we used MAPPFinder (http://www.GenMapp.org) (Doniger et al., 2003
), which links gene expression data to the GO hierarchy (Ashburner et al., 2000
; Camon et al., 2004
; Harris et al., 2004
). MAPPFinder calculates a Z score as well as a permutation test p value for the genes that are significantly altered across doses (p
0.005) at 6 or 24 h or across time points (p
0.001) at a dose of 6 mg/kg doses. Since we proposed to use combined average raw gene expression values (e.g., intensity or ratio) of genes associated with specific functional gene categories derived from the MAPPFinder results and there are two directions of gene expression changes at a certain dose (or time), either upregulation or downregulation within a specific gene category, the average of these two different directions of gene expression alteration will mask the degree of changes. Therefore, the upregulated or downregulated genes will be calculated separately in the MAPPFinder. In this study we found simple patterns of expression of a consistent upregulation or consistent downregulation across dose and time based on K-means cluster analysis (Soukas et al., 2000
). K-means cluster analysis was used to group individual genes across doses (or times) into two groups classified as up- or downexpression based on the trend with dose (or time) of the mean expression pattern. The number of clusters used in our analysis was chosen based upon having the mean expression patterns against dose (or time), and this assumption of using only two clusters was tested by examining using more than two clusters. To test the robustness of this method, we compared the K-means cluster analysis with a manual analysis of the trend. In the manual analysis of trend, we assigned an upregulation trend of a gene when the gene expression ratio was significantly elevated in at least three of the four groups or elevated at both SM 3 and 6 mg/kg groups. The trend of downregulation of a gene across groups was assigned when the gene expression level was significantly downregulated in at least three of the four groups or downregulated at both SM 3.0 and 6 mg/kg groups. The manual analysis of trend tended to create isolated clusters with few genes and larger clusters poorly represented by their mean expression patterns.
Each GO node and the cumulative total of the number of genes either upregulated or downregulated in a parent GO term combined with all its children were calculated. Z scores, a statistical measure of significance for gene expression in a given group, were calculated by subtracting the number of genes expected to be randomly changed in a GO term from the observed number of changed genes in that GO term. This value was then divided by the SD of the observed number of genes under a hypergeometric distribution. The equation can be written out as
![]() | (1) |
0.05.
|
|
Quantitative integration of GO analysis to evaluate dose- or time-dependent genomic data.
MAPPFinder links gene expression data to the GO hierarchy and statistically identifies the GO term altered by the toxicant exposure (Doniger et al., 2003
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
Framework for the Quantitative Functional Interpretation of Dose- and Time-Dependent Genomic Data
As shown in Figure 1, typically three main steps are included in the framework for the quantitative functional interpretation of dose- and time-dependent genomic data. The first step is a traditional microarray analysis, aimed at identifying the statistically significant genes across dose or time course by using the ANOVA method. The next step is to identify significantly altered functional GO gene categories or pathways by using unsupervised GO analysis or a pathway-mapping tool such as MAPPFinder. The last step is to extract the quantitative gene expression data and calculate the average intensity or ratio for those genes significantly altered within each functional gene category based on MAPPFinder by using our GO-Quant program. In order to demonstrate its power in quantitative evaluation of dose- and time-dependent microarray data, we applied this approach to a published dose- and time-dependent toxicogenomic data set in rats treated with SM.
|
Table 1 shows the outline of gene alteration across dose and time. Time course of the total number of significantly changed genes across doses (0, 1, 3, or 6 mg/kg) for each time point (0.5, 1, 3, 6, and 24 h) by one-way ANOVA comparisons (p
0.005) is listed at the right side of the table. The number of significantly changed genes increased in a time-dependent manner, peaking at 6 h. Dose-dependent increase in the number of significantly changed genes (p
0.005) across time points (0.5, 1, 3, 6, and 24 h) was listed in the left side of the table. Less than 137 genes changed with isopropyl alcohol across the time period observed, and more than 2600 genes changed with a dose of 6 mg/kg of SM.
|
Dose-Dependent Alteration of Gene Expression after Injection of SM
Hierarchical clustering analysis.
Figure 2 shows the hierarchical cluster analysis of gene expression alterations across dose. Significant gene expression changes were seen in a dose-dependent manner after injections of SM at 6 (A) and 24 h (B). Genes significantly changed across doses (p
0.005) at 6 (737 genes) or 24 h (441 genes) after injection were chosen to conduct the hierarchical clustering analysis. Examples of dose-dependent changes included downregulation of protein-tyrosine phosphatase receptor type G and upregulation of cyclin-dependent kinase inhibitor 1A and cyclin G1 transcription after 6 h postinjection (A and B clusters of Fig. 2A). Additionally, dose-dependent downregulation of T-cell receptor beta-chain and granzyme M and upregulation of cyclin-dependent kinase inhibitor 1A and cyclin G1 transcription were observed 24 h postinjection (A and B clusters in Fig. 2B). However, based solely on the above statistical analysis, it is difficult to interpret these example changes in gene expression within a biological context.
|
GO analysis of SM-induced genes alterations.
We used MAPPFinder to find significant changes within each GO functional category. MAPPFinder linked the 11,212 probe sets measured in this experiment to the GO database. A total of 4485 probes were linked to an Ensembl ID and 3144 genes linked to a GO term in the hierarchy. MAPPFinder calculates a Z score as well as a permutation test p value for the genes that are significantly altered across doses (p
0.005) at 6 or 24 h, or across time points (p
0.001) at a dose of 6 mg/kg. MAPPFinder calculated the percentage of genes meeting the criteria of either a statistically significant increase or decrease in gene expression across dose (or time). In addition, a Z score for each GO term was calculated based on Formula 1 to identify which GO terms had a significant number of altered genes. Since there are two directions of gene expression alteration after treatment at certain a gene category, the separation of the genes based on their trend (either upregulation or downregulation) of average gene expression pattern across dose (or time) allows to quantitatively evaluate the degree of changes across dose or time. In this study we identified two simple patterns of either consistent upregulation or consistent downregulation for gene expression across doses (or time) within GO terms based on K-means cluster analysis. MAPPFinder results represent a global picture of biological processes, cellular components, and molecular functions that are significantly altered (upregulated or downregulated) at the transcriptional level after treatment (Tables 2 and 3). Significant downregulations of genes involved in protein modification (GOID 6464), such as phosphorylation (GOID 6468) and signaling transduction (GOID 7165, 8277, 7242, 9966, and 9968), were observed based on biological process gene category 6 h after SM injection (Table 2). In contrast, significant upregulation of genes involved in the cell cycle (GOID 7049) was observed based on biological process gene category 6 h after SM injection (Table 2). As shown in Table 3, in the biological process, genes in cell cycle, especially in M phase, such as M phase of mitotic cell cycle (GOID 87), mitosis (GOID 7067), cell division (GOID 51301), M phase (GOID 279), and mitotic cell cycle (GOID 278), were upregulated. Transferase activity (GOID 16746) was also upregulated. Significant downregulation in immune responses (GOID 6955) such as cell activation (GOID 1775) and cell adhesion (GOID 7155) was observed.
|
Consistent with the previously published analysis (Dillman et al., 2005
Quantitative Integration of GO Analysis to Evaluate Dose Response
Toxicogenomic data can provide profoundly important and novel information to the field of risk assessment. However, we are currently facing the difficult challenge of how to quantitatively interpret dose- or time-dependent relationships resulting from genomic approaches. As presented, the conventional GO analysis does not provide any quantitative information about microarray data in a dose-dependent manner, and no methodology is available to achieve quantitative analysis of pathway-specific information. Thus, we developed a framework to evaluate functional genes from microarray data, as illustrated in Figure 1, and propose the use of such data within a statistically significant functional gene category, such as a specific GO term in the GO database. The above calculation includes a large number of genes within each functional gene category, and there is currently no tool available to conduct such a task. We developed a computer program called GO-Quant, which automatically links functional GO analysis results from MAPPFinder with the original microarray data set, to calculate the average intensity or ratio for those significant genes within each GO term. Figure 3 shows the hierarchical cluster analysis of quantitative GO results across SM treatments after 6 (Fig. 3A) or 24 h (Fig. 3B). Compared to the conventional qualitative GO analysis shown in Tables 2 and 3, Figure 3 shows the dose-dependent alterations of each functional gene category (GOID) following SM treatment. Cell cycle genes (GOID 7049) were dose dependently upregulated 6 h after SM injection (Fig. 3A), and cell cycle alterations, especially in M phase pathway genes (GOID 87, 279, 278, 7067, and 51301), were dose dependently upregulated 24 h after SM injection (Fig. 3B). Dose-dependent downregulation of protein modification gene pathways, such as phosphorylation, were observed at the 6 h time point, while dose-dependent downregulation of immune response gene pathway (GOID 6952, 1775, and 45321) was observed at 24 h after SM injection (Fig. 3B). Figures 4 and 5 further demonstrate the dose response of quantitative gene expression pathway alterations within each functional gene category in biological process (Fig. 4) and molecular function genes (Fig. 5). Significant upregulation in cell cycle gene pathways becomes obvious at 3 mg/kg both at 6 (Fig. 4A) and at 24 h (Fig. 5A) after injection. Our results confirm that the above system approach is a more robust and clear measurement to determine the degree to which a functional gene system changes across dose. As an alternative to representing annotations by the average of the annotated genes, the sum or the median expression profile could also be calculated using the GO-Quant. In most cases, the resulting plots are virtually identical (data not shown). Such an approach gives one the ability to calculate the ED50 for a specific functional GO term, which is important for risk assessment.
|
|
|
Time Course Gene Alteration after a Single Injection of SM at a Dose of 6 mg/kg
In order to demonstrate the above application to the time course within this microarray data set, we further applied the above approach as illustrated in Figure 1 to analyze the time course data after the single injection of SM at a dose of 6 mg/kg. Figure 6 shows the hierarchical clustering analysis of 1111 genes whose transcription was significantly (p
0.001) altered across the selected time points (6 mg/kg SM). Using cluster analysis, we identified changes in transcription in specific genes, e.g., upregulation of heme oxygenase 1, cyclin G, and cyclin-dependent kinase inhibitor 1A and downregulation of protein kinase C and chemokine ligand 12. As discussed, it is challenging to interpret these previous results in a biological context. Table 4 shows the list of GOID from the MAPPFinder analysis representing a global picture of biological processes, cellular components and molecular function genes that are significantly altered (upregulated or downregulated) at the transcriptional level following SM exposure at 6 mg/kg (Table 4). As shown in Figures 7 and 8, GO-Quant analysis revealed time-dependent alterations in various functional gene categories within biological process, cellular component, and molecular function genes. In the biological process gene category, time-dependent downregulation in immune cell activation (GOID 45321, 46649, and 1775), cell division (GOID 51310), and response to drug (GOID 42493) were observed, especially at 6 h after SM exposure. In addition, time-dependent upregulations in the regulation of cell cycle (GOID 7050 and 74), protein ubiquitination (GOID 16567) gene pathways were also observed, particularly at 6 h postexposure. Our results confirm that the above systems approach allows us to demonstrate time-dependent alteration in cellular and molecular gene function based on GO categorization. It is not our focus in this paper to describe why these functional transcriptional alterations after SM injection are observed. Our current analysis is intended to demonstrate that our systematic approach to applying the average data within the statistically significant GOID is a powerful tool to quantitatively describe and interpret dose- or time-dependent relationships discovered using microarray data.
|
|
|
| CONCLUSION |
|---|
|
|
|---|
Using gene expression profiling following SM exposure, we have conducted a conventional GO analysis and compared it with our proposed quantitative approach for functional GO analysis. Our results confirmed that GO analysis is a powerful approach to generate an unbiased view of the functional gene alterations by SM. However, the quantitative approach we propose using combined average raw gene expression values (e.g., intensity or ratio) of genes associated with specific functional gene categories derived from the MAPPFinder results in conjunction with the GO-Quant algorithm provides a powerful tool to interpret microarray data in a dose- or time-dependent manner. Additionally, this approach provides a global measurement based on pathway response as compared to single-gene assessments and enables one to calculate the corresponding ED50 for each specific functional GO term, which is important for risk assessment. The application of such quantitative interpretation of toxicogenomic data is likely to become increasingly useful for the interpretation of toxicogenomic data generated for evaluating mechanistic similarity of novel chemicals.
| ACKNOWLEDGMENTS |
|---|
This work was supported in part by a toxicogenomic grant from National Institute of Environmental Health Science (NIEHS) (U10 ES 11387), the US Environmental Protection Agency-NIEHS UW Center for Child Environmental Health Risks Research (EPA R826886 and NIEHS 1PO1ES09601), the NIEHS grant R01-ES10613, and Center for Oceans and Human Health Research (NIEHS: P50 ES012762 and NSF (National Science Foundation): OCE-0434087), and UW NIEHS Center for Ecogenetics and Environmental Health (5 P30 ES07033).
| REFERENCES |
|---|
|
|
|---|
Al-Shahrour, F., Diaz-Uriarte, R., and Dopazo, J. (2004). FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20, 578580.
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., et al. (2000). Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 2529.[CrossRef][ISI][Medline]
Bammler, T., Beyer, R. P., Bhattacharya, S., Boorman, G. A., Boyles, A., Bradford, B. U., Bumgarner, R. E., Bushel, P. R., Chaturvedi, K., Choi, D., et al. (2005). Standardizing global gene expression analysis between laboratories and across platforms. Nat. Methods 2, 351356.[CrossRef][ISI][Medline]
Beissbarth, T., and Speed, T. P. (2004). GOstat: Find statistically overrepresented gene ontologies within a group of genes. Bioinformatics 20, 14641465.
Camon, E., Magrane, M., Barrell, D., Lee, V., Dimmer, E., Maslen, J., Binns, D., Harte, N., Lopez, R., and Apweiler, R. (2004). The gene ontology annotation (GOA) database: Sharing knowledge in Uniprot with gene ontology. Nucleic Acids Res. 32, D262D266.
Currie, R. A., Bombail, V., Oliver, J. D., Moore, D. J., Lim, F. L., Gwilliam, V., Kimber, I., Chipman, K., Moggs, J. G., and Orphanides, G. (2005). Gene ontology mapping as an unbiased method for identifying molecular pathways and processes affected by toxicant exposure: Application to acute effects caused by the rodent non-genotoxic carcinogen diethylhexylphthalate. Toxicol. Sci. 86, 453469.
Dennis, G., Jr, Sherman, B. T., Hosack, D. A., Yang, J., Gao, W., Lane, H. C., and Lempicki, R. A. (2003). DAVID: Database for annotation, visualization, and integrated discovery. Genome Biol. 4, 3.
Dillman, J. F., III, Phillips, C. S., Dorsch, L. M., Croxton, M. D., Hege, A. I., Sylvester, A. J., Moran, T. S., and Sciuto, A. M. (2005). Genomic analysis of rodent pulmonary tissue following bis-(2-chloroethyl) sulfide exposure. Chem. Res. Toxicol. 18, 2834.[CrossRef][ISI][Medline]
Doniger, S. W., Salomonis, N., Dahlquist, K. D., Vranizan, K., Lawlor, S. C., and Conklin, B. R. (2003). MAPPFinder: Using gene ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol. 4, R7.[CrossRef][Medline]
Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 95, 1486314868.
Hamadeh, H. K., Bushel, P. R., Jayadev, S., Martin, K., DiSorbo, O., Sieber, S., Bennett, L., Tennant, R., Stoll, R., Barrett, J. C., et al. (2002). Gene expression analysis reveals chemical-specific profiles. Toxicol. Sci. 67, 219231.
Harris, M. A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., Eilbeck, K., Lewis, S., Marshall, B., Mungall, C., et al. (2004). The gene ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258D261.
Jansen, R., and Gerstein, M. (2000). Analysis of the yeast transcriptome with structural and functional categories: Characterizing highly expressed proteins. Nucleic Acids Res. 28, 14811488.
Long, A. D., Mangalam, H. J., Chan, B. Y., Tolleri, L., Hatfield, G. W., and Baldi, P. (2001). Improved statistical inference from DNA microarray data using analysis of variance and a Bayesian statistical framework. Analysis of global gene expression in Escherichia coli K12. J. Biol. Chem. 276, 1993719944.
Moggs, J. G. (2005). Molecular responses to xenoestrogens: Mechanistic insights from toxicogenomics. Toxicology 213, 177193.[CrossRef][ISI][Medline]
Saeed, A. I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N., Braisted, J., Klapa, M., Currier, T., Thiagarajan, M., et al. (2003). TM4: A free, open-source system for microarray data management and analysis. Biotechniques 34, 374378.[ISI][Medline]
Soukas, A., Cohen, P., Socci, N. D., and Friedman, J. M. (2000). Leptin-specific patterns of gene expression in white adipose tissue. Genes Dev. 14, 963980.
Wright, G. W., and Simon, R. M. (2003). A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics 19, 24482455.
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
R. S. Thomas, B. C. Allen, A. Nong, L. Yang, E. Bermudez, H. J. Clewell III, and M. E. Andersen A Method to Integrate Benchmark Dose Estimates with Genomic Data to Assess the Functional Effects of Chemical Exposure Toxicol. Sci., July 1, 2007; 98(1): 240 - 248. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||











