ToxSci Advance Access originally published online on April 25, 2008
Toxicological Sciences 2008 104(2):412-418; doi:10.1093/toxsci/kfn083
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Automated Quantitative Dose-Response Modeling and Point of Departure Determination for Large Toxicogenomic and High-Throughput Screening Data Sets
Toxicogenomic Informatics and Solutions, LLC, Lansing, Michigan 48909
1 To whom correspondence should be addressed at Toxicogenomic Informatics and Solutions, LLC, P.O. Box 27482, Lansing, MI 48909. E-mail: burgoonL{at}txisLLC.com. Fax: (888) 843-6822.
Received January 15, 2008; accepted April 18, 2008
| ABSTRACT |
|---|
|
|
|---|
Regulatory and homeland security agencies undertake safety and risk assessments to assess the potential hazards of radiation, chemical, biological, and pharmaceutical agents. By law, these assessments must be science-based to ensure public safety and environmental quality. These agencies use dose-response modeling and benchmark dose methods to identify points of departure across single end points elicited by the agent. Regulatory agencies have also begun to examine toxicogenomic data to identify novel biomarkers of exposure and assess potential toxicity. The ToxResponse Modeler streamlines analyses and point of departure (POD) calculations across hundreds of responses (e.g., differential gene expression, changes in metabolite levels) through an automated process capable of large-scale modeling and model selection. The application identifies the best-fit dose-response model utilizing particle swarm optimization and calculates the probabilistic POD. The application analyzed a publicly available 2,3,7,8-tetrachlorodibenzo-p-dioxin dose-response data set of hepatic gene expression data in C57BL/6 mice to identify putative biomarkers. The Gene Ontology mapped these responses to specific functions to differentiate adaptive effects from toxic responses. In principle, safety and risk assessors could use the automated ToxResponse Modeler to analyze any large dose-response data set including outputs from high-throughput screening assays to assist with the ranking and prioritization of compounds that warrant further investigation or development.
Key Words: bioinformatics; biomarkers; dose-response; mechanisms.
| INTRODUCTION |
|---|
|
|
|---|
The Food and Drug Administration (FDA), the Environmental Protection Agency (EPA), and their counterparts anticipate the submission of toxicogenomic data in support of sponsor submitted applications (Benson et al., 2007
Typically, regulatory agencies use benchmark dose (BMD) methods to calculate the point of departure (POD): the dose or concentration of an agent (e.g., chemical, drug, biological, radiation) where the level of a biomarker in an exposed population deviates from the level in an untreated population. Similarly, the EPA defines a POD as: "The dose-response point that marks the beginning of a low-dose extrapolation. This point is most often the upper bound on an observed incidence or on an estimated incidence from a dose-response model" (http://www.epa.gov/IRIS/gloss8_arch.htm). POD is used with a series of uncertainty factors, to set safe exposure and environmental target goals for drugs and chemicals (Gaylor and Aylward, 2004
). There are two predominant methods for calculating BMD: the relative ED01 and the risk-based BMD (Gaylor and Aylward, 2004
). The relative ED01 uses the 95% lower confidence limit (benchmark dose 95% lower confidence limit) calculated upon the dose that elicits 1% of the maximum response, beyond the mean background or untreated response. The risk-based BMD is calculated where the 1st or 99th percentile of untreated control levels are used to represent an abnormal response. The advantage of calculating the risk-based BMD is that it does not require a maximal response. This allows for risk assessments based on nonasymptotic models, thereby eliminating the requirement for high dose groups, with the potential of decreasing animal use.
Clustering toxicogenomic data (e.g., differential gene expression), and phenotypically anchoring to more apical responses (e.g., cell death, vacuolization, alanine aminotransferase [ALT]), may result in multiple PODs for different effects. These putative biomarkers of exposure and toxicity have potential uses in clinical trial monitoring, patient drug-compliance, and tort actions. In addition, exposure biomarkers have homeland security applications, including identification of an exposure agent based on biomarker profiles, thus facilitating countermeasure actions. However, prior to any use in safety and risk assessment, the most sensitive biomarkers must be validated as being associated with toxicity.
Although software currently exists for calculating BMD (http://www.epa.gov/ncea/bmds/index.html), and for calculating BMD based on gene expression (Thomas et al., 2007
), advanced knowledge of model fitting and statistics is required. Furthermore, the latter software calculates models based on functional groups of genes based on their Gene Ontology categories. Our cross-platform Java application fits dose-response curves in an automated fashion, on a per-feature (i.e., gene, protein, or metabolite) basis. Investigators can use the resulting models to calculate BMDs using either the relative ED50 or the risk-based BMD (Fig. 1). In this report, dose-response hepatic cDNA microarray data from immature ovariectomized C57BL/6 mice treated with 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) was used to illustrate the utility of the application.
|
| MATERIALS AND METHODS |
|---|
|
|
|---|
Dose-response data set.
A previously published mouse TCDD dose-response microarray data set was used for this study (Boverhof et al., 2005
Continuous dose-response models.
Dose-response data exhibit a range of shapes best fitted using exponential, linear, Gaussian, quadratic or sigmoidal dose-response models (Table 1). Gaussian models include five parameters capable of attaining multiple maxima (minima) due to their exponential nature. The advantages of the sigmoidal model, over the Hill model that is typically used, are the increased number of parameters that allow additional curve-fitting flexibility. However, it is possible to convert a sigmoidal model to the Hill model through algebraic manipulation. The algorithm calculates ED50 values as the dose that produces 50% of the maximal response. The Java 6 JRE (http://java.sun.com) is required to run the ToxResponse Modeler.
|
Particle swarm optimization for model fits.
A combination of the particle swarm optimization (PSO) algorithm (Eberhart and Kennedy, 1995
In PSO, each individual in the swarm is an independent dose-response model that belongs to a clique, or social group (Supplementary Fig. 1). Cliques are randomly assigned to individuals at the start of the PSO algorithm. Clique membership does not change during the course of the algorithm, and is not dependent upon parameter values. Members of the clique move toward the member whose model is the closest fit to the goal, but ignore members from other cliques. Distance from each member to the goal is determined using Euclidean distance. In this study, the application iteratively changes the model parameters to identify the best-fit model for a specific gene. Thus, the PSO algorithm is attempting to minimize the Euclidean distance.
This process of identifying the closest members within a clique, updating the parameters, and moving the members continues until the application finds an exact match, or it reaches the maximum number of iterations. Use of cliques prevents the model from falling into a local minima and not identifying an appropriate best-fit model. The application randomly assigns the initial model parameters and cliques. A weighted voting method (http://www.ctl.ua.edu/math103/POWER/wtvoting.htm) is then used to compare the best exponential, linear, Gaussian, quadratic and sigmoidal models, in order to select the overall best-fit model across all classes, analogous to selecting the best in show across all breeds in a dog show. After finding the overall best-fit model (e.g., best in show), the application reinitializes the PSO to examine the next response (e.g., gene) in the data file.
The number of iterations per class for each response, the number of models considered within a class, and the clique size are all user-defined parameters. In this study, the application used 1000 models within a class per gene divided into five member cliques (i.e., 200 cliques), and 3000 iterations within each class to identify the best dose-response model for a gene.
Overview of automated dose-response modeling and user interpretation.
An overview of the automated dose-response method is given (Fig. 1). Briefly, (1) a user starts the application, and loads a large-scale dose-response data set. (2) The application processes the data one feature at a time. Examples include mRNA, protein, or metabolite levels as well as enzyme or binding activities at each dose within a study. Note that the application works on each feature individually to identify the best in model within a class (PSO) for all five model classes, and then determine the best overall model among the five classes (weighted voting). (3) The application initializes the PSO algorithm by randomizing model parameters and assigning cliques. For instance, the application would randomly assign values for k, a, and c for each exponential model, and then assign each model to a clique. (4) The PSO identifies the closest (best-fit) model in each clique at the end of an iteration, and moves the members of each clique toward that model. The closest model is the one whose predicted dose-response pattern is the most similar to the feature's dose-response pattern as measured by the Euclidean distance. (5) The iterative PSO algorithm ends once it has identified an exact match best-fit model, or once the algorithm has used all of the iterations. The application repeats Steps 3 through 5 for each model class within the same feature, thus generating best-fit models for the exponential, Gaussian, quadratic, linear, and sigmoidal classes (the within-class best-fit models). (6) The application compares the Euclidean distance for each within-class best-fit model to identify the best overall model using a weighted vote method. The vote is weighted such that the within-class best-fit model with the smallest Euclidean distance relative to the feature dose-response will become the overall best-fit model. (7) The application uses the best overall model to calculate EDn and POD values (if applicable). These values are used to rank and prioritize putative biomarkers or chemical activities. 8) Model-based clusters may provide additional mechanistic insight through integration of potency and POD data with functional annotation and phenotypic anchoring. For example, EDn and POD data may generate model-based clusters for lipid metabolism and transport gene expression that may be associated with the occurrence of hepatic vacuolization and lipid accumulation. (9) Through complementary comparative studies using toxic and nontoxic congeners in responsive and nonresponsive species across time, data may emerge that differentiates biomarkers of exposure from toxicity-related responses which can use to support mechanistically based quantitative risk assessments, clinical monitoring, and drug candidate selection that warrant further development.
Probabilistic POD.
In this study, the POD is defined as the point at which the upper 95% confidence limit for the vehicle response intersects the lower 95% confidence limit for the treated response based on parametric assumptions.
| RESULTS |
|---|
|
|
|---|
TCDD Dose-Response Modeling Results
The ToxResponse Modeler analyzed 238 differentially expressed genes, represented by 278 features, from a previously published TCDD dose-response cDNA microarray data set (Boverhof et al., 2005
|
|
The potency analysis excluded genes/features exhibiting nonsigmoidal dose responses, as it requires ED50 values. ED50 values for the sigmoidal responses ranged from 0.01 to 177.70 µg/kg, with the majority exhibiting ED50 values from 0 to 10 µg/kg, whereas the PODs ranged from 0.01 to 266.09 µg/kg. Because several probes/features may represent different regions of the same gene, the application reports ED50 and POD values on a per-feature basis.
ED50 and POD Trends Across Gene Functions
Clustering sigmoidal dose-response genes based on functional annotation identified several interesting trends in ED50 and POD patterns. The application assigned retained features to one of four response categories based on their ED50 and POD values (Table 3). Oxidoreductases were the most sensitive dose responsive group with ED50 values ranging from 0.07 to 0.99 µg/kg across eight genes. The ubiquitin pathway (ED50s from 4.81 to 9.90 µg/kg for six genes) as well as lipid and fatty acid metabolism (ED50s from 0.72 to 34.54 µg/kg for six genes) groups were moderately sensitive. Genes associated with cell cycle (0.57–18.89 µg/kg for ten genes), oxidative stress (0.79–91.34 µg/kg across four genes), and apoptosis (0.74–44.26 µg/kg for five genes), exhibited a range of sensitivities.
|
Identification of Putative Biomarkers
We define putative biomarkers as those biomarkers, which differentiate treated from vehicle (or unexposed) animals. We base our criteria on the ranking and prioritization of each biomarker's POD. The most sensitive differential gene expression responses, included Adam2, Cyp1a1, Wap, Dspg3, CutA, and Pak1ip1, with PODs ranging from 0.01 to 0.12 µg/kg and maximal fold changes from 1.5 to 21.9, with an average of 6.4 (Table 4). However, further investigation is required to determine if they are mechanistically associated with TCDD elicited toxicity in order to differentiate adaptive effects from toxic responses.
|
| DISCUSSION |
|---|
|
|
|---|
Automating dose-response analysis, from model selection to parameter fitting, provides a systematic and consistent workflow for data handling, management, and the analysis of large-scale dose-response data. Extending this to include higher-order analyses (e.g., PODs) also supports objective computational efforts in mechanistically based quantitative risk assessment.
Figure 1 summarizes the workflow from data modeling to interpretation. The application reads in the data file, and performs the PSO to identify the best-fit dose-response model for each response (e.g., best in show for change in mRNA, metabolite, or enzyme activity level). The dose-response models calculate EDn and POD values for ranking and prioritization of putative biomarkers that warrant further investigation. Incorporating functional annotation, phenotypic anchoring, and complementary mechanistic studies can then associate these responses to exposure (i.e., biomarkers that can be used to identify low-dose exposures), or toxicity (i.e., biomarkers that are mechanistically tied to toxicity).
ToxResponse Modeler iteratively identifies the best fit among five model classes for each response (e.g., gene expression). For example, it identified 157 sigmoidal dose responses from 238 active genes, with ED50 values ranging from 0.01 to 177.70 µg/kg using a TCDD dose-response data set (Boverhof et al., 2005
). Not surprisingly, highly sensitive genes included members of the Ah receptor gene battery (Cyp1a1, Nqo1, and Xdh). Other responses included the oxidative stress gene Gsta4, cell cycle genes Cdc37, Ccnd3, Rbl2, and Jund1, oxidoreductases Dhrs3, Ugdh, and Aldh16a1, and the apoptosis related genes Hip1 and Hif1a, as well as the lipoprotein lipase Lpl.
Grouping genes based on function and their ED50 values identified patterns of sensitivity that may facilitate distinguishing adaptive effects from primary (activation of AhR) toxic responses (as opposed to secondary mechanisms of toxicity where gene products activated through AhR mediate further transcriptional responses) in this data set. For instance, all oxidoreductases were highly sensitive to TCDD, whereas differentially expressed genes associated with the ubiquitin pathway were only moderately sensitive. The oxidative stress and cell cycle groups exhibited highly sensitive to relatively resistant responsiveness. Most lipid and fatty acid metabolism genes were moderately sensitive, with the exception of Lpl, which was highly sensitive. Similar patterns emerged when PODs were used.
In addition, PODs and ED50s further elucidate mechanisms associated with low-dose TCDD responses. For example, the induction of oxidoreductase genes may increase overall reactive oxygen species (ROS) levels, requiring compensation through the induction of Gsta4 to conjugate increasing levels of secondary electrophiles produced from ROS (Hayes et al., 2005
). Thus, glutathione synthesis from Gss (moderately sensitive) would only be triggered as GSH levels become depleted at higher doses of TCDD. Consequently, the need for redox cycling of GSH becomes more urgent, resulting in the induction of the less responsive Gsr, glutathione reductase 1 gene (Hayes et al., 2005
).
These results are significant based on the pathology seen in the original paper. For instance, the mice experienced significant inflammation and triglyceride accumulation at 168-h postexposure. These results correlate with the clinical pathology results showing statistically significant increases in ALT at 168 h, triglycerides at 24 and 168 h, free fatty acids at 24 and 168 h, and a decrease in cholesterol at 72 and 168 h (Boverhof et al., 2005
). It has been shown that induction of the oxidoreductases that are part of the AhR gene battery contribute to reactive oxygen species (ROS) formation, which may lead to cellular oxidative stress, lipid peroxidation, and DNA fragmentation (Bagchi et al., 2002
; Barouki and Morel, 2001
). Thus, activation of the AhR gene battery is generally seen as being a significant part of the TCDD toxicity.
The genes involved in the fatty liver response appear to be moderately sensitive to TCDD. Their dose-dependent and temporal profiles precede or parallel the fatty liver response, suggesting they may mediate this response (Boverhof et al., 2005
). The role these genes play in the overall toxicity, and their regulation remains unclear. However, being moderately sensitive to TCDD suggests they may have lower affinity dioxin response elements than the AhR gene battery, that they may require additional cofactors to be present for transcriptional activation, or that their expression may be secondary to AhR activation.
Ranking and prioritizing genes by their PODs can also identify putative biomarkers. In addition to the well-characterized induction of Cyp1a1 (Hankinson, 2005
; Okino and Whitlock, 1995
; Wu and Whitlock, 1992
), Adam2, Wap, Dspg3, CutA, and Park1ip1 were also identified as potential biomarkers. However, further investigation is required to assess these responses as biomarkers and to determine their roles in toxicity. Complementary studies with other AhR ligands in other species and target tissues as well as by examining other end points are needed to further assess the robustness of these putative biomarkers. Comparable approaches could also be used to identify biomarkers for clinical or environmental monitoring of drugs, chemicals, nutriceuticals, and other regulated materials, including weaponized biological and chemical warfare agents.
| CONCLUSION |
|---|
|
|
|---|
The ToxResponse Modeler provides automated dose-response modeling capable of calculating EDn and POD to support drug/chemical ranking and prioritization that warrant further investigation and the identification of putative biomarkers of exposure and toxicity. Although this study examined microarray data, any large dose-response data set, including proteomic, metabolomic, and HTP screening assay outputs, would be amenable to ToxResponse Modeler analysis. Ideally, clustering function or structure with ED50 and POD values as well as phenotypic end points will not only identify more relevant biomarkers, but will also differentiate adaptive effects from adverse responses, thus further elucidating mechanisms of toxicity, and supporting mechanistically based quantitative risk assessment efforts.
| SUPPLEMENTARY DATA |
|---|
|
|
|---|
Supplementary data are available online at http://toxsci.oxfordjournals.org/.
| FUNDING |
|---|
|
|
|---|
This project was supported by internal funds from Toxicogenomic Informatics and Solutions, LLC.
| ACKNOWLEDGMENTS |
|---|
We would like to thank Dr Jeremy Burt for his comments on this manuscript.
| REFERENCES |
|---|
|
|
|---|
Bagchi D, Balmoori J, Bagchi M, Ye X, Williams CB, Stohs SJ. Comparative effects of TCDD, endrin, naphthalene and chromium (VI) on oxidative stress and tissue damage in the liver and brain tissues of mice. Toxicology (2002) 175:73–82.[CrossRef][Web of Science][Medline]
Barouki R, Morel Y. Repression of cytochrome P450 1A1 gene expression by oxidative stress: Mechanisms and biological implications. Biochem. Pharmacol. (2001) 61:511–516.[CrossRef][Web of Science][Medline]
Benson WH, Gallagher K, McClintock JT. U.S. Environmental Protection Agency's activities to prepare for regulatory and risk assessment applications of genomics information. Environ. Mol. Mutagen. (2007) 48:359–362.[CrossRef][Web of Science][Medline]
Boverhof DR, Burgoon LD, Tashiro C, Chittim B, Harkema JR, Jump DB, Zacharewski TR. Temporal and dose-dependent hepatic gene expression patterns in mice provide new insights into TCDD-mediated hepatotoxicity. Toxicol. Sci. (2005) 85:1048–1063.
Burgoon LD, Eckel-Passow JE, Gennings C, Boverhof DR, Burt JW, Fong CJ, Zacharewski TR. Protocols for the assurance of microarray data quality and process control. Nucleic Acids Res. (2005) 33:e172.
Dix DJ, Gallagher K, Benson WH, Groskinsky BL, McClintock JT, Dearfield KL, Farland WH. A framework for the use of genomics data at the EPA. Nat. Biotechnol. (2006) 24:1108–1111.[CrossRef][Web of Science][Medline]
Dix DJ, Houck KA, Martin MT, Richard AM, Setzer RW, Kavlock RJ. The ToxCast Program for prioritizing toxicity testing of environmental chemicals. Toxicol. Sci. (2007) 95:5–12.
Eberhart RC, Kennedy J. Particle swarm optimization. In: IEEE International Conference on Neural Networks (1995) Piscataway, NJ: IEEE. 1942–1948.
Eckel JE, Gennings C, Chinchilli VM, Burgoon LD, Zacharewski TR. Empirical Bayes gene screening tool for time-course or dose-response microarray data. J. Biopharm. Stat. (2004) 14:647–670.[CrossRef][Medline]
Eckel JE, Gennings C, Therneau TM, Burgoon LD, Boverhof DR, Zacharewski TR. Normalization of two-channel microarray experiments: A semiparametric approach. Bioinformatics (2005) 21:1078–1083.
Gaylor DW, Aylward LL. An evaluation of benchmark dose methodology for non-cancer continuous-data health effects in animals due to exposures to dioxin (TCDD). Regul. Toxicol. Pharmacol. (2004) 40:9–17.[CrossRef][Web of Science][Medline]
Goodsaid F, Frueh FW. Implementing the U.S. FDA guidance on pharmacogenomic data submissions. Environ. Mol. Mutagen. (2007) 48:354–358.[CrossRef][Web of Science][Medline]
Hankinson O. Role of coactivators in transcriptional activation by the aryl hydrocarbon receptor. Arch. Biochem. Biophys. (2005) 433:379–386.[CrossRef][Web of Science][Medline]
Hayes JD, Flanagan JU, Jowsey IR. Glutathione transferases. Annu. Rev. Pharmacol. Toxicol. (2005) 45:51–88.[CrossRef][Web of Science][Medline]
OECD. Report of the OECD/IPCS workshop on toxicogenomics. Organization for Economic Co-operation and Development (2005).
Okino ST, Whitlock JP Jr. Dioxin induces localized, graded changes in chromatin structure: Implications for Cyp1A1 gene transcription. Mol. Cell. Biol. (1995) 15:3714–3721.[Abstract]
Thomas RS, Allen BC, Nong A, Yang L, Bermudez E, Clewell HJ 3rd, Andersen ME. A method to integrate benchmark dose estimates with genomic data to assess the functional effects of chemical exposure. Toxicol. Sci. (2007) 98:240–248.
Wu L, Whitlock JP Jr. Mechanism of dioxin action: Ah receptor-mediated increase in promoter accessibility in vivo. Proc. Natl. Acad. Sci. U.S.A. (1992) 89:4811–4815.
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
L. D. Burgoon, Q. Ding, A. N'jai, E. Dere, A. R. Burg, J. C. Rowlands, R. A. Budinsky, K. E. Stebbins, and T. R. Zacharewski Automated Dose-Response Analysis of the Relative Hepatic Gene Expression Potency of TCDF in C57BL/6 Mice Toxicol. Sci., November 1, 2009; 112(1): 221 - 228. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||


