ToxSci Advance Access originally published online on January 10, 2006
Toxicological Sciences 2006 90(2):558-568; doi:10.1093/toxsci/kfj097

© The Author 2006. Published by Oxford University Press on behalf of the Society of Toxicology. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

dbZach: A MIAME-Compliant Toxicogenomic Supportive Relational Database

Lyle D. Burgoon*,†,‡, Paul C. Boutros†,§, Edward Dere†,§ and Timothy R. Zacharewski†,‡,§,1

* Department of Pharmacology & Toxicology, † National Food Safety & Toxicology Center, ‡ Center for Integrative Toxicology, and § Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, Michigan 48824

1 To whom correspondence should be addressed at Michigan State University, Department of Biochemistry & Molecular Biology, 224 Biochemistry Building, Wilson Road, East Lansing, MI 48824-1319. Fax: (517) 353-9334. E-mail: tzachare@msu.edu.

Received October 3, 2005; accepted January 3, 2006


    ABSTRACT
 
Quantitative risk assessment and the elucidation of mechanisms of toxicity require computational infrastructure and innovative analysis approaches that systematically consider available data at all levels of biological organization. dbZach (http://dbzach.fst.msu.edu) is a modular relational database with associated data insertion, retrieval, and mining tools that manages traditional toxicology and complementary toxicogenomic data to facilitate comprehensive data integration, analysis, and sharing. It consists of four Core Subsystems (i.e., Clones, Genes, Sample Annotation, and Protocols), four Experimental Subsystems (i.e., Microarray, Affymetrix, Real-Time PCR, and Toxicology), and three Computational Subsystems (i.e., Gene Regulation, Pathways, Orthology) that comply with the Minimum Information About a Microarray Experiment (MIAME) standard. It is capable of including emerging technologies and other model systems, including ecologically relevant species. dbZach represents an enterprise toxicogenomic data management system that facilitates data integration and analysis, and reduces uncertainties in the continuum from initial exposure to toxicity while facilitating more comprehensive elucidation of mechanisms of toxicity and supporting mechanistically based quantitative risk assessment.

Key Words: dbZach; database; MIAME compliant; toxicogenomic data management system.


    INTRODUCTION
 
Improvements in the quantitative risk assessment of chronic and subchronic exposure to synthetic and natural chemicals and their complex mixtures require reducing uncertainties associated with exposure as well as all the intermediate steps leading to the adverse effect (i.e., the source-to-outcome continuum) (Kavlock et al., 2003). Emerging technologies and computational toxicology will provide comprehensive mechanistic information for the development of more quantitative predictive models. However, disparate chemical, exposure, absorption, distribution, metabolism, excretion, toxicologic and omic data must be integrated to develop more predictive models. To support these efforts throughout the source-to-outcome continuum, innovative software tools are required. Enterprise data management solutions and analysis systems are an integral step in computational toxicology to facilitate mechanistically based quantitative risk assessments and regulatory decision-making.

Enterprise data management systems have proven to be indispensable in other fields, where they serve as the foundation for data integration across diverse sectors to support large data mining efforts. Similarly, toxicology and risk assessment involve combining disparate data throughout the source-to-outcome continuum to identify diagnostic profiles and relationships in susceptible populations and ecologically relevant species. These profiles may represent agglomerative biomarkers encompassing exposure, molecular responses, and adverse effects, which facilitate the reevaluation of historical data in light of new information, or allow comparisons across complementary technologies, chemical classes, or species. In addition, relational databases provide the infrastructure to develop computational tools that assist with data interpretation and the elucidation of mechanisms of toxicity.

Various toxicology-centric databases and knowledgebases are emerging that provide data management, and facilitate quality assurance, analysis, and deposition into public repositories (Bao et al., 2005; Bushel et al., 2001; Hayes et al., 2005; Mattes et al., 2004; Tong et al., 2003; Waters et al., 2003). In general, they support chemical class comparisons within the same platform (Hayes et al., 2005), or provide a public repository of genomic data (Brazma et al., 2003; Rocca-Serra et al., 2003). Although more specific toxicogenomic database efforts are in development, their focus is to support regulatory activities or serve as a public data warehouse (e.g., Chemical Effects in Biological Systems [CEBS]; Waters et al., 2003) as opposed to a toxicogenomic laboratory information management system (LIMS) to support investigator or collaborative group level research efforts prior to publication.

The dbZach System is not a public repository, but rather an enterprise computational toxicology analysis and management system developed to support ongoing traditional toxicology and toxicogenomic studies (Fig. 1), as well as the local development of computational toxicology data mining tools. It complies with the Minimum Information About a Microarray Experiment (MIAME) standard (Brazma et al., 2001), and with the Microarray and Gene Expression Markup Language (MAGE-ML; Spellman et al., 2002) for electronic data sharing. Although developed for toxicogenomic research efforts, the schematic designs and implementation of dbZach and associated tools are applicable to other biomedical research programs.


FIG. 1. The dbZach database subsystems. dbZach is a modular relational database organized into (1) Core Subsystems, (2) Computational Subsystems, and (3) Experimental Subsystems. The Core Subsystems are required and provide general annotation support for data produced from toxicogenomic studies. The Computational Subsystems manage data derived from computational analyses, such as pathway prediction (in development), gene regulation, and orthology. The Experimental Subsystems track information specific to wet lab experiments. Each subsystem represents a single technology, a biological concept/discipline, or a MIAME requirement, allowing investigators to incorporate only those subsystems necessary for their environment. Modularity of dbZach also facilitates the incorporation of new technology without affecting the existing architecture. Asterisks in the figure denote those subsystems populated within the local installation. Note that the Metabonomic subsystem is based on the SMRS standards and is currently being tested. The Protein subsystem is in development.

 
To illustrate its functionality, examples of the use of dbZach are provided which can be extrapolated to other independent research efforts. This report communicates the core functionalities of the system, and offers access to the schema and associated tools for establishing independent local installations or the incorporation of select subsystems into other existing databases.


    DATABASE DESIGN
 
dbZach consists of four Core Subsystems (i.e., Clones, Genes, Sample Annotation, and Protocols), four Experimental Subsystems (i.e., Microarray, Affymetrix, Real-Time PCR [RTPCR], and Toxicology), and three Computational Subsystems (i.e., Gene Regulation, Pathways, Orthology) (Fig. 1 and Table 1). The modular design reflects biological concepts and relationships to facilitate intuitive interactions and data interpretation. Each module is termed a subsystem, which manages data for a technology (e.g., quantitative Real-Time PCR, spotted microarray, Affymetrix), a biological concept/discipline (e.g., cDNA clones, genes, toxicology, pathway, gene regulation), or a MIAME required concept (e.g., protocols, sample annotation). Its modularity ensures the seamless incorporation of new subsystems for nascent technologies and the adoption of dbZach subsystems into other existing databases.


TABLE 1 Description of dbZach Subsystems

 
The tables representing definitive concepts (e.g., animals and organs) are structured to capture relevant biological relationships. For example, the Animal Table records data specific to the animal itself, such as arrival date, age at arrival, sex, and the cage identifier (Fig. 2). A separate table records information about harvested organs, such as the organ name and weights (e.g., Organ Table under Animal/Biosource). These tables are connected through a one-to-many relationship (depicted with crow's foot notation in Fig. 2), where one animal may have associated data from one or more organs. Moreover, organs can be assessed using independent histological sections. Each section may contain unique lesions that can be identified on a per section basis (e.g., the Organ Table has a required one-to-many relationship to the Pathology Table, as indicated in Fig. 2). Scores and remarks concerning each lesion (Pathology Lesions Table) are related back to the animal treatment annotation through the Pathology Section -> Pathology -> Organ -> Animal -> Biosource -> Biosource Treatment -> Treatment Chemical tables. Thus, chemical treatment/exposure annotation is not provided at every level (e.g., lesion, tissue section, and organ), but only at the animal level. This allows all information regarding experimental manipulations (e.g., route of treatment, surgeries, husbandry) that may influence the outcome (e.g., histopathology) to be associated with the level at which they occurred: the animal. In addition, it optimizes performance and prevents data inconsistencies by reducing redundancy (the same information would otherwise be associated with each experimental level, such as histopathology, clinical chemistry, and gross observations).


FIG. 2. Capturing relevant biological relationships. Biological relationships between animal husbandry, treatment, and histopathology are captured in dbZach through a series of tables. For example, animal husbandry and treatment data are in separate tables since a cage may hold more than one animal, and therefore a one-to-many relationship exists between the CAGE and ANIMAL tables. This allows data specific to the cage (e.g., bedding, feed type, water type) to be separated from the animal. Similar logic follows for treatments and histopathology data. Relationships between tables are depicted using crow's foot notation (the line symbols between tables), where the parent table (e.g., CAGE) is marked with either a double line symbol or a circle with a cross symbol, and the child table (e.g., ANIMAL) is marked with a crow's foot. In the one-to-many relationship, there is one parent that may contain many children (e.g., one cage may contain many animals). In practice, the one-to-many (i.e., parent-to-child) relationship is realized through a primary key (i.e., unique identifier from the parent table) to foreign key relationship (e.g., the CAGE_ID in the CAGE table is the primary key, while the CAGE_ID in the ANIMAL table is a foreign key).
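
The primary key to foreign key pattern described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the dbZach schema: the table and column names are hypothetical, the treatment chemical is collapsed into a single column on the animal table (dbZach routes it through the Biosource and Treatment tables), and SQLite stands in for Oracle or DB2.

```python
import sqlite3

# Hypothetical, simplified tables; the real dbZach schema has many more
# tables and columns. Each child table carries its parent's key.
SCHEMA = """
CREATE TABLE cage (
    cage_id INTEGER PRIMARY KEY,
    bedding TEXT
);
CREATE TABLE animal (
    animal_id INTEGER PRIMARY KEY,
    cage_id   INTEGER NOT NULL REFERENCES cage(cage_id),
    sex       TEXT,
    chemical  TEXT  -- treatment is recorded once, at the animal level
);
CREATE TABLE organ (
    organ_id  INTEGER PRIMARY KEY,
    animal_id INTEGER NOT NULL REFERENCES animal(animal_id),
    name      TEXT
);
CREATE TABLE lesion (
    lesion_id INTEGER PRIMARY KEY,
    organ_id  INTEGER NOT NULL REFERENCES organ(organ_id),
    section   INTEGER,
    score     INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cage VALUES (1, 'aspen')")
    conn.executemany(
        "INSERT INTO animal VALUES (?, ?, ?, ?)",
        [(1, 1, "F", "vehicle"), (2, 1, "F", "ethynyl estradiol")],
    )
    conn.executemany(
        "INSERT INTO organ VALUES (?, ?, ?)",
        [(1, 1, "liver"), (2, 2, "liver"), (3, 2, "uterus")],
    )
    conn.executemany(
        "INSERT INTO lesion VALUES (?, ?, ?, ?)",
        [(1, 2, 1, 2), (2, 3, 1, 3)],
    )
    return conn


def lesions_with_treatment(conn):
    """Walk lesion -> organ -> animal -> cage to recover the annotation."""
    return conn.execute(
        """
        SELECT l.lesion_id, o.name, a.chemical, c.bedding
        FROM lesion l
        JOIN organ  o ON o.organ_id  = l.organ_id
        JOIN animal a ON a.animal_id = o.animal_id
        JOIN cage   c ON c.cage_id   = a.cage_id
        ORDER BY l.lesion_id
        """
    ).fetchall()
```

Because the treatment is stored only on the animal row, a lesion inherits it through the joins rather than through a duplicated column, which is the redundancy argument made in the text.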

 

    THE SUBSYSTEMS
 
The following sections describe existing subsystems, and provide specific examples to illustrate the possible supportive roles of dbZach within independent laboratories. The Metabonomics Subsystem was developed in accord with the SMRS standards (Lindon et al., 2005). Protein, Metabonomics, and Pathway subsystems are still in development and will not be discussed in further detail.

Clones Subsystem
The current implementation of this subsystem manages information for 10,068 human, 25,313 mouse, and 8567 rat cDNA/EST clones used in the preparation of the in-house custom arrays cataloged within dbZach. Each cDNA clone is associated with one or more GenBank accession numbers to accommodate multiple high probability BLAST matches. It also manages in-house and GenBank sequence information and relates a clone to its location within the 96- or 384-well storage plates. The Clones Subsystem can be extended to manage oligonucleotide sets and in situ synthesized oligonucleotide arrays to accommodate laboratories using commercial products (e.g., Agilent arrays, Operon Clone sets).

Genes Subsystem
The Genes Subsystem manages annotation data for 24,837 human (UniGene Build: 180), 28,371 mouse (UniGene Build: 144), and 8176 rat (UniGene Build: 139) genes associated with multiple cDNA clones, real-time PCR primers, and pathways. Annotation includes chromosomal locations, Gene Ontology data, NCBI LocusLink, and RefSeq identifiers, and NCBI UniGene Cluster numbers. These clone-gene relationships are updated regularly, and are based on UniGene relationships of GenBank Accessions with cross-references to the Entrez Gene database.

Protocols Subsystem
All protocols and standard operating procedures used within the laboratory are stored within the Protocols Subsystem as required by MIAME. The subsystem follows a hierarchical structure, where the general concept of a protocol resides at the top, and encompasses different versions of a protocol divided into a series of steps to allow tracking of individually implemented modifications through time. This facilitates examination of method differences, to identify biases introduced into a given protocol, or to investigate the effect of varying protocols on data.
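
As a rough sketch of the protocol, version, and step hierarchy described above, the following Python fragment models a protocol as a list of versions, each holding an ordered list of steps, and reports which steps changed between two versions. The class and function names are hypothetical and are not taken from dbZach.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ProtocolVersion:
    version: int
    steps: List[str]


@dataclass
class Protocol:
    """Top of the hierarchy: one protocol concept with many versions."""
    name: str
    versions: List[ProtocolVersion] = field(default_factory=list)

    def add_version(self, steps: List[str]) -> ProtocolVersion:
        new = ProtocolVersion(version=len(self.versions) + 1, steps=list(steps))
        self.versions.append(new)
        return new


def changed_steps(
    old: ProtocolVersion, new: ProtocolVersion
) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Return (step_number, old_text, new_text) for every step that differs.

    A step present in only one version is reported with None on the other side.
    """
    changes = []
    for i in range(max(len(old.steps), len(new.steps))):
        before = old.steps[i] if i < len(old.steps) else None
        after = new.steps[i] if i < len(new.steps) else None
        if before != after:
            changes.append((i + 1, before, after))
    return changes
```

Comparing steps by position is the simplest choice; a sequence-alignment diff would cope better with inserted steps, at the cost of more code.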

Sample Annotation Subsystem
Unlike other MIAME supportive databases, dbZach captures sample annotation as in vivo and in vitro tracks, allowing more detailed, less ambiguous information to be managed (Fig. 3). The broadest categorization is the project, which includes all experimental data derived from an animal or cell culture model. Data within the animal track include cage conditions (e.g., bedding material, temperature, light/dark cycle, type of water, feed formulation), sex, age, and organs collected, while the in vitro track captures cell culture conditions such as medium formulation, passage number, flask type, and incubator conditions. Our implementation of dbZach currently contains sample annotation data for 815 mice and 484 rats and more than 700 human, mouse, and rat in vitro cultures. Both tracks also manage information regarding biological fluids such as serum for clinical chemistry analysis or urine for metabolomics. Information regarding extracted samples used in other assays, such as microarrays or metabolomic profiling of extracted lipids, is also carefully monitored.


FIG. 3. In vivo and in vitro sample annotation tracks. In vivo and in vitro sample annotation data are tracked separately, which minimizes table size, improves efficiency, and allows for more complete, less ambiguous annotation. For example, animals are not grown in medium, nor are surgeries performed on a cell culture sample. Yet both categories of information are necessary for complete sample annotation, and would be ineffectively captured using one large table containing both in vivo and in vitro data.

 
This systematic capture of data minimizes redundancy, and ensures proper integration across domains. Figure 2 illustrates generalization, aggregation, and composition relationships fostered by data integration at the sample annotation level between microarray and pathology data. A generalization is a relationship where common features are shared at a parent node (e.g., Organ), and specific data are captured at child nodes (e.g., specific organs). Aggregations are relationships where one object exists as a collection of other objects; for example, a cage is typically an aggregation of animals. Compositions are special aggregations where nonexistence of the collection object precludes existence of the member objects. For example, animals are composed of organs, but if the animal no longer exists within the database, neither can the organs. Thorough and accurate sample annotation is essential to distinguish subtle differences that may create discrepancies in otherwise comparable studies.
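
One way to make the difference between aggregation and composition concrete is through foreign key delete rules. The sketch below is an assumption about how such rules could be expressed, not a description of how dbZach enforces them: table names are hypothetical, SQLite stands in for the production RDBMS, and generalization is not covered.

```python
import sqlite3


def build_db():
    """Cage -> animal is an aggregation; animal -> organ is a composition."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE cage (cage_id INTEGER PRIMARY KEY);
        CREATE TABLE animal (
            animal_id INTEGER PRIMARY KEY,
            -- aggregation: the animal record outlives its cage record
            cage_id INTEGER REFERENCES cage(cage_id) ON DELETE SET NULL
        );
        CREATE TABLE organ (
            organ_id INTEGER PRIMARY KEY,
            -- composition: organ records cannot exist without the animal
            animal_id INTEGER NOT NULL
                REFERENCES animal(animal_id) ON DELETE CASCADE,
            name TEXT
        );
        INSERT INTO cage VALUES (1);
        INSERT INTO animal VALUES (1, 1), (2, 1);
        INSERT INTO organ VALUES (1, 1, 'liver'), (2, 1, 'uterus'), (3, 2, 'liver');
        """
    )
    return conn


def count(conn, table):
    """Row count for one of the three fixed demo tables."""
    if table not in {"cage", "animal", "organ"}:
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Deleting the cage leaves both animals in place with an empty cage reference, whereas deleting animal 1 also removes its two organ rows.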

Microarray Subsystem
This subsystem stores the TIFF image from the scanner, the quantified raw data from the analysis of the raw image, and the normalized data. It also tracks the location of all features with respect to pixel locations on the TIFF file and their grid locations. Features are associated in the database with their cDNA clones, which are in turn associated with their respective gene annotations and other functional information. The installation of dbZach within our lab currently manages data from 2470 microarrays, which includes 4940 images representing 31,386,657 features from in vivo and in vitro studies (as of May 2005).

Real-Time PCR Subsystem
Stored quantitative real-time PCR (QRT-PCR) data includes forward and reverse primer sequences, the TaqMan probe sequence, assay plate layout information, outputted raw data files, and processed expression data. The primers are associated with the template used for their design, and also with a gene, to provide up-to-date annotation which facilitates comparisons between microarray and QRT-PCR gene expression data. The plate layout is critical for monitoring the state of the real-time PCR equipment, and for quality control. There are currently 489 real-time human, mouse, and rat PCR primers within this lab's installation. Because this subsystem only manages QRT-PCR data, and is agnostic to the experimental purpose, it can be extended to manage both gene expression and chromatin immunoprecipitation (ChIP) data (Hinojos et al., 2005).

Toxicology Subsystem
All traditional toxicology data, including histopathology, in vitro assays, and cell morphology are stored within the Toxicology Subsystem and associated with the source organism in the Sample Annotation Subsystem. The National Toxicology Program Pathology Code (Boorman et al., 2002) has been adopted as the controlled vocabulary for pathology data.

Pathology data are stored in a section- and lesion-centric model where organs are divided into a series of sections, allowing any number of sections to be analyzed. Data are captured per section and lesion, allowing for more comprehensive annotation. This method also facilitates the electronic storage of section micrographs for reassessment or reference, and supports the creation of pathology image banks for the development of software to computationally identify lesions (Marchevsky and Wick, 2004).

Affymetrix Subsystem
Affymetrix GeneChip arrays are another common platform used for global gene expression analysis. A separate subsystem has been created due to the significant differences in platforms, data, and file structures relative to two-color arrays. All binary format Affymetrix data and images can be parsed and stored within dbZach.

Orthology Subsystem
Orthology is defined as the same entity (e.g., gene) that exists within two distinct species. Knowledge of orthologous relationships is critical for establishing conserved mechanisms of toxicity between species. For example, orthologous genes encode the same protein, and arose from a common ancestor. dbZach catalogues orthologous genes across human, mouse, and rat species, but is species independent, and can be extended to other models including the dog, non-human primate, and ecologically relevant species provided sequence information is available. Orthology relationships may be based on information from a number of databases, although this implementation is specific to the Ensembl database. Currently 155,553 orthologous gene relationships (i.e., 17,047 human-mouse, 16,358 human-rat, and 18,335 mouse-rat orthologous reciprocal best match gene relationships) are managed within dbZach. There are no limits to the number of orthologous entries with respect to species or entity type (i.e., genes or proteins).
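
The "reciprocal best match" criterion mentioned above can be illustrated with a short Python sketch. This is a generic version of the idea under stated assumptions: the input is a hypothetical dictionary of pairwise similarity scores, the gene identifiers are made up, and the code does not reproduce how Ensembl derives its orthology calls or how dbZach loads them.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    `scores` maps (query, target) pairs to a similarity score. On a tie,
    the pair encountered first is kept.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_matches(a_to_b, b_to_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_to_b)
    reverse = best_hits(b_to_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

A gene whose best hit points elsewhere in the reverse direction is dropped, which is why reciprocal matching yields fewer but more conservative pairs than one-directional best hits.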

Gene Regulatory Subsystem
This subsystem provides access to genomic sequence information (e.g., 10 kb of upstream sequence, 5' and 3' untranslated regions) for all human, mouse, and rat genes with RefSeq annotation. It also supports the identification of motifs that may serve important regulatory functions. In general, there are no restrictions regarding what sequence information can be stored.
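
Motif identification over a stored regulatory sequence can be sketched as a consensus scan. The function below is a minimal example and not the dbZach tool: it searches only the given strand, supports a subset of IUPAC codes, and the consensus used in any call (for instance GGTCANNNTGACC, a commonly cited estrogen response element consensus) is supplied by the caller as an illustration.

```python
import re

# Subset of IUPAC nucleotide codes translated to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_motif(sequence, consensus):
    """Return 0-based start positions of every match, overlaps included.

    Only the strand given in `sequence` is searched; scanning the reverse
    complement as well is left to the caller.
    """
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    # The lookahead consumes no characters, so overlapping hits are reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]
```

Real regulatory-element searches usually score candidates with position weight matrices instead of exact consensus matching; the regex form is used here only because it is short and easy to verify.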


    IMPLEMENTATION
 
Platform Independence
The dbZach system is not designed for any particular RDBMS or operating system. The database schemas are platform agnostic, and have been replicated in Oracle 9i and 10g as well as IBM's DB2. Interaction and data analysis software have been developed in Java and require at least version 1.5.0 of the JRE.

Bulk Data Insertions
Data are inserted using template spreadsheet files in MS Excel format. This format was chosen to facilitate bulk uploads as opposed to more involved graphical user interfaces (GUIs) and takes advantage of user familiarity with spreadsheets. Furthermore, template spreadsheets can be populated by cutting-and-pasting data from other files, simplifying the numerous one-to-many relationships present within data. For example, it is easier to visualize and enter data from one-to-many relationships in a single spreadsheet while minimizing user-based errors, such as typographical errors, or mouse-click errors in the case of GUI combo boxes. Moreover, spreadsheets can record multiple data records, and allow larger datasets to be simultaneously uploaded, thereby decreasing user interaction time. Following submissions, the Audit and Report Tools (ART) generate inspection reports to ensure the data have been faithfully loaded prior to further analysis. This allows data generators, the ones who know the data best, to act as their own curators.
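
The load-then-audit workflow described above might look roughly like the following. This is a sketch under several assumptions: a CSV string stands in for the Excel template so that only the Python standard library is needed, the `animal` table and its columns are hypothetical, and the report is far simpler than whatever the Audit and Report Tools actually produce.

```python
import csv
import io
import sqlite3

# Stand-in for a filled-in template; the last animal has no weight recorded.
TEMPLATE = """animal_id,sex,body_weight_g
1,F,21.4
2,F,20.9
3,M,
"""


def new_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animal ("
        "animal_id INTEGER PRIMARY KEY, sex TEXT, body_weight_g REAL)"
    )
    return conn


def load_template(conn, text):
    """Insert every template row in one batch; return the parsed rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    conn.executemany(
        "INSERT INTO animal (animal_id, sex, body_weight_g) VALUES (?, ?, ?)",
        [
            (
                int(r["animal_id"]),
                r["sex"],
                float(r["body_weight_g"]) if r["body_weight_g"] else None,
            )
            for r in rows
        ],
    )
    return rows


def audit(conn, rows):
    """Compare what was submitted with what the database now holds."""
    stored = conn.execute("SELECT COUNT(*) FROM animal").fetchone()[0]
    missing = conn.execute(
        "SELECT animal_id FROM animal "
        "WHERE body_weight_g IS NULL ORDER BY animal_id"
    ).fetchall()
    return {
        "submitted": len(rows),
        "stored": stored,
        "missing_weight": [m[0] for m in missing],
    }
```

The report gives the person who generated the data a quick check that every submitted row arrived and highlights gaps, such as the missing weight, before analysis begins.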

Quality Assurance
Databases serve as a rich source of data for generating quality assurance protocols. As the volume of information increases, a large pool of training data becomes available for the development of automated quality assurance and process control methods which provide non-biased quality assessments. Data within dbZach have been used to establish a protocol that ensures the consistency of microarray data across studies and between investigators in order to maintain intralaboratory quality standards. The protocol combines (1) diagnostic plots monitoring the degree of feature saturation, global feature and background intensities, and feature misalignments with (2) plots monitoring the intensity distributions within arrays and (3) a support vector machine (SVM) model to identify high and low quality microarray data sets (Burgoon et al., 2005).
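
Of the diagnostics listed above, the degree of feature saturation is the simplest to illustrate. The sketch below is not the published protocol: the 16-bit intensity ceiling and the 1% flagging threshold are assumptions chosen for the example, the function names are hypothetical, and the SVM component is not covered.

```python
SATURATION_16BIT = 65535  # assumption: intensities come from 16-bit images


def saturation_fraction(intensities, ceiling=SATURATION_16BIT):
    """Fraction of features whose intensity is at or above the ceiling."""
    if not intensities:
        raise ValueError("no feature intensities supplied")
    saturated = sum(1 for value in intensities if value >= ceiling)
    return saturated / len(intensities)


def flag_saturated_arrays(arrays, max_fraction=0.01):
    """Return sorted names of arrays whose saturated fraction exceeds the limit.

    `arrays` maps an array name to its list of feature intensities.
    """
    return sorted(
        name
        for name, values in arrays.items()
        if saturation_fraction(values) > max_fraction
    )
```

Running a check like this over every array stored in the database is what allows thresholds to be tuned against historical data instead of being set by eye.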

Database Querying
Databases provide the ability to effectively mine large datasets and identify relationships across domains and experiments. For example, a database can identify all active genes following the same treatment in different tissues or models, supporting hypothesis development regarding a putative mechanism of action. Similarly, queries of histopathology data may identify treatments that yield comparable lesions across tissues and/or species providing compelling evidence for a conserved mechanism of action, thus supporting cross species extrapolations in quantitative risk assessment. Therefore, databases provide the necessary infrastructure to begin to integrate disparate data and provide an effective solution to facilitate investigational queries across different studies.
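
The cross-tissue query described above, finding genes that are active after the same treatment in every tissue examined, can be written as a single grouped SQL statement. The example uses Python's sqlite3 module; the `expression` table, its columns, the gene names in any demo data, and the 0.99 posterior probability cutoff are all hypothetical.

```python
import sqlite3


def new_expression_db(records):
    """Create a demo table of (gene, tissue, chemical, p1) records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE expression (gene TEXT, tissue TEXT, chemical TEXT, p1 REAL)"
    )
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?, ?)", records)
    return conn


def genes_active_in_all_tissues(conn, chemical, cutoff=0.99):
    """Genes meeting the cutoff in every tissue assayed for `chemical`."""
    rows = conn.execute(
        """
        SELECT gene
        FROM expression
        WHERE chemical = :chem AND p1 >= :cutoff
        GROUP BY gene
        HAVING COUNT(DISTINCT tissue) =
               (SELECT COUNT(DISTINCT tissue)
                FROM expression
                WHERE chemical = :chem)
        ORDER BY gene
        """,
        {"chem": chemical, "cutoff": cutoff},
    )
    return [row[0] for row in rows]
```

The HAVING clause compares the number of tissues in which a gene passes the cutoff with the number of tissues assayed for that chemical, so adding a new tissue to the database tightens the result without any change to the query.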

Structured Query Language (SQL) is used by specific applications and tools that have been developed to interact with dbZach. Most investigator queries occur through Java interfaces and applications built on the Swing library of classes for GUI development (Table 2). This implementation of dbZach also includes limited public web access to genes represented on human, mouse, and rat cDNA arrays, information regarding our real-time PCR primer library, and routine protocols used in this laboratory.


TABLE 2 Applications for Data Mining, Upload, and Interaction with dbZach

 

    dbZach APPLICATIONS IN TOXICOGENOMICS
 
Although queries identify relationships, the output may span several hundred records, making interpretation too complex using standard approaches. Consequently, visualization and filtering methods have been developed to facilitate analysis and interpretation. For example, the Toxicogenomics Correlation Tool (TCT) visualizes comparisons to identify similarities and differences in response behavior (Fig. 4) (Burgoon et al., manuscript in preparation). This includes comparing responses within a chemical class to define a response signature, identifying conserved responses across species, and identifying shared responses between in vitro and in vivo models. TCT plots a significance index (SI) that represents the correlation coefficient of the p-value or posterior probability profile for the same gene in two different data sets as well as an activity index (AI) which represents the correlation coefficient of the biological response (e.g., gene expression) for the same gene in those data sets (Fig. 4A). Its use is not limited to gene expression profiles, and may be used to compare proteomic and metabolomic profiles, as well as platform comparisons (RTPCR vs. microarray, or spotted cDNA vs. Affymetrix).


FIG. 4. The toxicogenomic correlation tool. (A) Active gene lists for mouse and rat uterine gene expression following oral gavage with 100 µg/kg ethynyl estradiol for 3 consecutive days were investigated for orthologous genes that exhibit comparable expression and significance profiles (Kwekel et al., 2005). Pearson correlation coefficients were calculated for gene expression and posterior probability profiles for each orthologous mouse and rat gene pair. The x-axis (activity index, AI) represents the gene expression correlation between orthologous mouse and rat genes. AI is a measure of the similarity of the temporal expression patterns for each orthologous pair. The y-axis (significance index, SI) represents the correlation coefficient between the posterior probabilities for the orthologous mouse and rat genes. Therefore, SI is a measure of P1(t) similarity for each orthologous gene in the pair. Orthologous genes that have highly correlated gene expression profiles and P1(t) values are found in QI. Orthologous genes that are inversely correlated in both variables are represented in QIII. Pairs that are poorly correlated in both the gene expression and posterior probability profiles have indices close to zero, and are visualized close to the origin of the graph. (B) Examples of the different paired profiles that would be found in the different quadrants of the TCT. Correlation is a measure of profile shape similarity, and is independent of magnitude.

 
The TCT has been used to identify similarities and differences in orthologous gene expression in uterine time course experiments conducted in rats and mice using comparable study designs and treatment regimens (Fig. 4A). Each point represents a pair of correlation values, the Activity Index (AI, correlation of rat vs. mouse gene expression profiles) and the Significance Index (SI, correlation of rat vs. mouse P1(t) values at each time point), obtained when comparing rat and mouse uterine time courses in response to ethynyl estradiol (Kwekel et al., 2005). If the rat and mouse profiles are similar for the orthologous gene, both the AI and SI will be positive (QI in Fig. 4A and 4B). Points within the upper right hand quadrant (quadrants are numbered counter-clockwise, with this one denoted QI in Fig. 4A) represent orthologous genes that exhibit profiles with comparable expression patterns and significance profiles, suggesting a common mechanism of regulation in both species. In contrast, points in the upper left hand quadrant (QII in Fig. 4A) represent orthologous genes exhibiting divergent expression profiles with comparable P1(t) profiles (QII in Fig. 4B). This indicates that the mouse and rat gene expression profiles are different, but they have similar probability profiles. QIII represents orthologous mouse and rat genes with different expression and significance (P1(t)) patterns (QIII in Fig. 4B). In QIV, orthologous genes have similar expression patterns but exhibit divergent P1(t) values (QIV in Fig. 4B). Orthologous genes with dissimilar gene expression patterns in mouse and rat uterine samples and high P1(t) variability would fall in close proximity to the origin. Points in QII, QIII, and QIV as well as those close to the origin may represent orthologous genes with differences in gene regulation that impart a species-specific advantage or sensitivity, and/or species differences in pharmacokinetics and pharmacodynamics. AI and SI divergence may also be due to experimental factors such as rat and mouse probes querying different regions of the gene, splice variants present in only one species, and/or inaccurate annotation such that the genes are not true orthologs.
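
The two indices and the quadrant assignment described above can be written compactly. The sketch assumes only what the text states: AI is the Pearson correlation of the two expression profiles, SI is the Pearson correlation of the two P1(t) profiles, and quadrants are numbered counter-clockwise from the upper right. The function names are hypothetical and this is not the TCT source code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def tct_point(expr_a, expr_b, p1_a, p1_b):
    """Return (AI, SI) for one orthologous gene pair.

    AI correlates the expression time courses (x-axis);
    SI correlates the P1(t) time courses (y-axis).
    """
    return pearson(expr_a, expr_b), pearson(p1_a, p1_b)


def quadrant(ai, si):
    """Quadrant label, numbered counter-clockwise from the upper right."""
    if ai > 0 and si > 0:
        return "QI"
    if ai < 0 and si > 0:
        return "QII"
    if ai < 0 and si < 0:
        return "QIII"
    if ai > 0 and si < 0:
        return "QIV"
    return "axis"  # a point lying exactly on an axis belongs to no quadrant
```

Because Pearson correlation depends on profile shape and not magnitude, two genes whose expression differs by a constant fold factor still receive an AI of 1, consistent with the note in the figure legend.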

Other data analysis and interpretation applications in development include tools to identify highly represented functional gene categories, over-represented sequences within gene regulatory regions of similarly expressed genes, and novel visualization tools for the analysis of large pathology datasets.

Data Sharing
Growing interest in data sharing (Ball et al., 2004b; Brazma et al., 2001) and calls for the deposition of published data into public repositories (Ball et al., 2004a) require effective methods for data exchange. dbZach facilitates sharing by exporting data in the Microarray and Gene Expression (MAGE) Markup Language (MAGE-ML) format (Spellman et al., 2002) between databases, including ArrayExpress (Brazma et al., 2003; Rocca-Serra et al., 2003), the Gene Expression Omnibus (GEO), and eventually the Chemical Effects in Biological Systems (CEBS) Knowledgebase (Waters et al., 2003). Exporting MIAME-compliant data in MAGE-ML has the advantage of being less error prone than web-interaction based submissions since the data are written directly to a file without human intervention.


    dbZach SUPPORTS TOXICOGENOMIC RESEARCH
 
Toxicogenomic studies may be divided into (1) subject treatment and biological fluid and organ collection (in vivo) or biological sample harvesting (in vitro), (2) microarray assay, (3) gene functional analysis, and (4) phenotypic anchoring (Fig. 5). dbZach supports each level and facilitates adherence to generally accepted reporting and exchange standards, such as MIAME. Several published and ongoing studies have benefited from the support provided by dbZach (Boverhof et al., 2004, 2005; Burgoon et al., 2005; Fong et al., 2005; Kwekel et al., 2005).


 
FIG. 5. dbZach and associated tools support the systematic elucidation of mechanisms of toxicity. In addition to data storage, dbZach has facilitated the development of tools that support data analysis, integration and more comprehensive interpretation. Time course and dose response data from in vitro and in vivo studies examining different chemical treatments in human, mouse and rat models can be compared to identify conserved responses, providing important information regarding the appropriateness of extrapolations and uncertainties in the exposure-to-outcome continuum. The integration of histopathology, clinical chemistry, gene expression response clustering, functional annotation, orthologous gene responses, and the presence of putative functional response element predictions also facilitates discrimination between adaptive and toxic responses and ultimately, the development of pathways and networks involved in the etiology of toxic responses.

 
In all of these studies, dbZach supported subject treatment and collection, including cataloging sample annotation and information about reagents, husbandry, and growth conditions. It also maintained records of body and organ weights as well as comments regarding gross observations. All of the submitted information was verified using an audit and report tool. Pre-hybridization (e.g., array print date, labeled extract information), hybridization (e.g., RNA amounts, incubation times, washing conditions), and post-hybridization (e.g., scanning protocols) data are all captured to facilitate comparisons between studies.

Raw intensity data uploaded to dbZach are first verified by the submitter using the Microarray Audit Report Tool (MART) prior to the quality assurance analysis (Burgoon et al., 2005). Next, a standardized, unattended SAS application interacts directly with dbZach to extract the data required to identify significant changes in gene expression. This decreases statistical analysis time and ensures that all data are properly submitted and analyzed using a consistent protocol, which facilitates future study comparisons (e.g., ethynyl estradiol versus tamoxifen; rat versus mouse versus human; in vitro versus in vivo).

In these studies dbZach was also queried for up-to-date functional annotation on differentially expressed genes using the Gene Annotation Tool (GAT). GAT provides a frequency distribution of functions that gives initial insights into pathways perturbed by treatment. For instance, the functional annotation of differentially expressed hepatic genes in C57BL/6 mice treated with TCDD was associated with physiological processes involving oxidative stress, metabolism, differentiation, apoptosis, gluconeogenesis, and fatty acid uptake and metabolism (Boverhof et al., 2005), while in the same model treated with ethynyl estradiol the functional annotation was associated with growth and proliferation, cytoskeletal and extracellular matrix responses, microtubule-based processes, oxidative metabolism and stress, and lipid metabolism and transport (Boverhof et al., 2004). This brings some organization and priority to the list of differentially expressed genes, allowing investigators to further elucidate the mechanisms involved by initially focusing on disrupted pathways.
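A frequency distribution of this kind can be sketched as follows. This is an illustration only, not the GAT implementation: the function name, the input structure (a mapping from gene identifier to functional category labels), and the example genes and labels are all hypothetical.

```python
from collections import Counter


def function_frequencies(annotations):
    """Tabulate how many genes carry each functional category.

    annotations: dict mapping a gene identifier to an iterable of
    functional category labels (hypothetical input structure).
    Returns (category, gene_count, fraction_of_genes) tuples sorted by
    descending count, with ties broken alphabetically.
    """
    counts = Counter()
    for categories in annotations.values():
        # Count each category at most once per gene.
        counts.update(set(categories))
    n_genes = len(annotations)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(cat, n, n / n_genes) for cat, n in ranked]


if __name__ == "__main__":
    # Placeholder genes and labels, for illustration only.
    example = {
        "gene_a": ["metabolism", "oxidative stress"],
        "gene_b": ["metabolism"],
        "gene_c": ["apoptosis", "metabolism"],
    }
    for category, n, fraction in function_frequencies(example):
        print(f"{category}\t{n}\t{fraction:.2f}")
```

Sorting by descending count places the most frequently represented categories first, which corresponds to the prioritization of perturbed pathways described above.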

Consistent acquisition and proper management of large data sets also allow more comprehensive and systematic comparisons to be performed. dbZach facilitates these comparisons by providing the information necessary to determine if comparable protocols and analysis methods were used. In addition, specific dbZach subsystems and associated tools support comparative studies and provide visualization tools to assist with interpretation. For example, the availability of microarray and RT-PCR data within dbZach allows comparisons for verification purposes. Moreover, the Orthology Subsystem provides information to facilitate cross-species comparisons in support of risk assessment by assessing extrapolations between species.

A comparison of the uterotrophic gene expression programs in C57BL/6 mice and Sprague-Dawley rats identified 153 orthologous gene pairs that were positively correlated, suggesting these conserved transcriptional targets are important in uterine proliferation. Furthermore, functional annotation for these conserved responses was associated with angiogenesis, water and solute transport, cell cycle control, redox control, DNA replication, protein synthesis and transport, xenobiotic metabolism, cell-cell communication, energetics, and cholesterol and fatty acid regulation, consistent with complementary histopathology and morphometry also stored within dbZach (Kwekel et al., 2005).

Current efforts are expanding these capabilities to use the rich information available from the human, mouse, and rat genome databases and to incorporate more sophisticated bioinformatic approaches to support mechanistic research. Genomic sequence information has been computationally searched and compared to identify putative dioxin response elements (DREs) in orthologous human, mouse, and rat genes and subsequently integrated with gene expression data (Sun et al., 2004). Results from this study suggest that AhR-mediated gene expression may not be well conserved across species, which could have significant implications in risk assessment. Unsupervised search algorithms are also being developed to identify novel over-represented response elements in co-regulated genes in an effort to identify interactions between pathways relevant to toxicity.
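As a simplified illustration of a computational motif search, the sketch below scans a sequence on both strands for the widely reported DRE core, 5'-GCGTG-3'. It finds exact core matches only and does not reproduce the scoring used in the cited study; the function names and the example sequence are hypothetical.

```python
import re

# DRE core sequence; exact matching only (an assumption of this sketch).
DRE_CORE = "GCGTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_core_sites(sequence, core=DRE_CORE):
    """Return sorted (position, strand) pairs for each core occurrence.

    Positions are 0-based offsets of the leftmost matched base on the
    forward strand; "-" marks a match to the reverse complement.
    """
    seq = sequence.upper()
    sites = []
    for strand, motif in (("+", core), ("-", reverse_complement(core))):
        # A lookahead pattern also reports overlapping occurrences.
        for match in re.finditer("(?=" + re.escape(motif) + ")", seq):
            sites.append((match.start(), strand))
    return sorted(sites)


if __name__ == "__main__":
    # Hypothetical sequence, for illustration only.
    for position, strand in find_core_sites("AAGCGTGTTCACGCAA"):
        print(position, strand)
```

Running the same scan over the regulatory regions of orthologous human, mouse, and rat genes would give per-species site lists that could then be compared alongside expression data.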

Consequently, dbZach has driven the development of new software applications that move beyond the analysis of individual chemicals, species, and organs. In addition to data mining and cataloging capabilities, dbZach also serves as a platform for laboratory-wide quality assurance. Furthermore, its reporting applications facilitate the deposition of this information into public repositories.


    CONCLUSION
 
dbZach serves as a platform for the comprehensive management and integration of disparate data domains facilitating not only the phenotypic anchoring of "omic" data, but also the development of advanced analysis methods, quality assurance protocols, predictive toxicology tools, and the systematic examination of mechanisms of toxicity. These capabilities are currently being extended to include metabonomic data, with proteomic and pathway capabilities in development. Furthermore, dbZach supports the integration of toxicology, gene expression data, functional annotation, orthology, and genomic motif regulatory information. These efforts will not only facilitate the elucidation of comprehensive mechanisms of toxicity and the identification of mechanistically-based biomarkers, but will also engender computational toxicology, systems toxicology, and ultimately, more accurate mechanistically based quantitative risk assessments.


    NOTES
 
The schema for dbZach may be obtained by contacting the authors. Copyrighted code for generating the database, and the Java software associated with the database, can be licensed through arrangement with the Office of Intellectual Property at Michigan State University.


    ACKNOWLEDGMENTS
 
The authors would like to acknowledge Dr. Rob Halgren, Dr. Yan Sun, Shane Doran, Shraddha Pai, Raeka Aiyar, Jigger Vakharia, Rebecca Rotman, Bonny Lau, Andrea Adams, Jung-sup Lee, Willis Lang, Rahul Sarkar, and Stacy Hung for their efforts in developing code associated with this project. This work was supported by NIEHS grants ES 04911-12, ES 011271, ES 011777. L.D.B. was supported by T32 ES07255.


    REFERENCES
 
Ball, C. A., Brazma, A., Causton, H., Chervitz, S., Edgar, R., Hingamp, P., Matese, J. C., Parkinson, H., Quackenbush, J., Ringwald, M., et al. (2004a). Submission of microarray data to public repositories. PLoS Biol. 2, E317.

Ball, C. A., Sherlock, G., and Brazma, A. (2004b). Funding high-throughput data sharing. Nat. Biotechnol. 22, 1179–1183.

Bao, W., Schmid, J. E., Goetz, A. K., Ren, H., and Dix, D. J. (2005). A database for tracking toxicogenomic samples and procedures. Reprod. Toxicol. 19, 411–419.

Boorman, G. A., Haseman, J. K., Waters, M. D., Hardisty, J. F., and Sills, R. C. (2002). Quality review procedures necessary for rodent pathology databases and toxicogenomic studies: The National Toxicology Program experience. Toxicol. Pathol. 30, 88–92.

Boverhof, D. R., Burgoon, L. D., Tashiro, C., Chittim, B., Harkema, J. R., Jump, D. B., and Zacharewski, T. R. (2005). Temporal and dose-dependent hepatic gene expression patterns in mice provide new insights into TCDD-mediated hepatotoxicity. Toxicol. Sci. 85, 1048–1063.

Boverhof, D. R., Fertuck, K. C., Burgoon, L. D., Eckel, J. E., Gennings, C., and Zacharewski, T. R. (2004). Temporal and dose-dependent hepatic gene expression changes in immature ovariectomized mice following exposure to ethynyl estradiol. Carcinogenesis 25, 1277–1291.

Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C. A., Causton, H. C., et al. (2001). Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet. 29, 365–371.

Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Vilo, J., Abeygunawardena, N., Holloway, E., Kapushesky, M., Kemmeren, P., Lara, G. G., et al. (2003). ArrayExpress–a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 31, 68–71.

Burgoon, L. D., Eckel-Passow, J. E., Gennings, C., Boverhof, D. R., Burt, J. W., Fong, C. J., and Zacharewski, T. R. (2005). Protocols for the assurance of microarray data quality and process control. Nucleic Acids Res. 33, e172.

Bushel, P. R., Hamadeh, H., Bennett, L., Sieber, S., Martin, K., Nuwaysir, E. F., Johnson, K., Reynolds, K., Paules, R. S., and Afshari, C. A. (2001). MAPS: A microarray project system for gene expression experiment information and data validation. Bioinformatics 17, 564–565.

Fong, C. J., Burgoon, L. D., and Zacharewski, T. R. (2005). Comparative microarray analysis of basal gene expression in mouse Hepa-1c1c7 wild-type and mutant cell lines. Toxicol. Sci. 86, 342–353.

Hayes, K. R., Vollrath, A. L., Zastrow, G. M., McMillan, B. J., Craven, M., Jovanovich, S., Rank, D. R., Penn, S., Walisser, J. A., Reddy, J. K., et al. (2005). EDGE: A centralized resource for the comparison, analysis, and distribution of toxicogenomic information. Mol. Pharmacol. 67, 1360–1368.

Hinojos, C. A., Sharp, Z. D., and Mancini, M. A. (2005). Molecular dynamics and nuclear receptor function. Trends Endocrinol. Metab. 16, 12–18.

Kavlock, R. J., Ankley, G., Blancato, J., Collete, T., Francis, E., Gray, E., Hammerstrom, K., Swartout, J., Tilson, H., Toth, G., et al. (2003). A framework for computational toxicology research in ORD. http://www.epa.gov/comptox/comptox_framework.html.

Kwekel, J. C., Burgoon, L. D., Burt, J. W., Harkema, J. R., and Zacharewski, T. R. (2005). A cross-species analysis of the rodent uterotrophic program: Elucidation of conserved responses and targets of estrogen signaling. Physiol. Genomics 23, 327–342.

Lindon, J. C., Nicholson, J. K., Holmes, E., Keun, H. C., Craig, A., Pearce, J. T., Bruce, S. J., Hardy, N., Sansone, S. A., Antti, H., et al. (2005). Summary recommendations for standardization and reporting of metabolic analyses. Nat. Biotechnol. 23, 833–838.

Marchevsky, A. M., and Wick, M. R. (2004). Evidence-based medicine, medical decision analysis, and pathology. Hum. Pathol. 35, 1179–1188.

Mattes, W. B., Pettit, S. D., Sansone, S. A., Bushel, P. R., and Waters, M. D. (2004). Database development in toxicogenomics: Issues and efforts. Environ. Health Perspect. 112, 495–505.

Rocca-Serra, P., Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Contrino, S., Vilo, J., Abeygunawardena, N., Mukherjee, G., Holloway, E., et al. (2003). ArrayExpress: A public database of gene expression data at EBI. C R Biol. 326, 1075–1078.

Spellman, P. T., Miller, M., Stewart, J., Troup, C., Sarkans, U., Chervitz, S., Bernhart, D., Sherlock, G., Ball, C., Lepage, M., et al. (2002). Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol. 3, RESEARCH0046.

Sun, Y. V., Boverhof, D. R., Burgoon, L. D., Fielden, M. R., and Zacharewski, T. R. (2004). Comparative analysis of dioxin response elements in human, mouse and rat genomic sequences. Nucleic Acids Res. 32, 4512–4523.

Tong, W., Cao, X., Harris, S., Sun, H., Fang, H., Fuscoe, J., Harris, A., Hong, H., Xie, Q., Perkins, R., et al. (2003). ArrayTrack–supporting toxicogenomic research at the U.S. Food and Drug Administration National Center for Toxicological Research. Environ. Health Perspect. 111, 1819–1826.

Waters, M., Boorman, G., Bushel, P., Cunningham, M., Irwin, R., Merrick, A., Olden, K., Paules, R., Selkirk, J., Stasiewicz, S., et al. (2003). Systems toxicology and the Chemical Effects in Biological Systems (CEBS) knowledge base. EHP Toxicogenomics 111, 15–28.


