ToxSci Advance Access published online on August 3, 2007
Toxicological Sciences, doi:10.1093/toxsci/kfm185
Categorical QSAR Models for Skin Sensitization Based Upon Local Lymph Node Assay Classification Measures Part 2: 4D-Fingerprint 3-State and Two-2-State Logistic Regression Models
* Laboratory of Molecular Modeling and Design (MC 781), College of Pharmacy, University of Illinois at Chicago, 833 South Wood Street, Chicago, IL 60612-7231
College of Pharmacy, MSC09 5360, 1 University of New Mexico, Albuquerque, NM 87131-0001
Procter & Gamble Eurocor, Temselaan 100, B-1853 Strombeek-Bever, Belgium
The Procter & Gamble Company, Miami Valley Innovation Center, P.O. Box 538707, Cincinnati, OH 45253-8707
¶ The Chem21 Group, Inc., 1780 Wilson Drive, Lake Forest, IL 60045
|| Graduate Institute of Biomedical Engineering and Bioinformatics, Dept. of Computer Science and Information Engineering, National Taiwan University, No.1 Sec. 4, Roosevelt Road, Taipei, Taiwan 106
1 Corresponding Author, Voice: +886.2.3366.4888#529, Fax: +886.2.23628167, Email: yjtseng{at}csie.ntu.edu.tw
Received May 4, 2007; revision received June 26, 2007; accepted July 9, 2007
Abstract
Three- and four-state categorical QSAR models for skin sensitization have been constructed using data from murine Local Lymph Node Assay (LLNA) studies. These are the same data we previously used to build 2-state [sensitizer, non-sensitizer] QSAR models (Li et al., 2007). 4D-fingerprint descriptors derived from the 4D-molecular similarity paradigm were used to generate these models. A training set of 196 and a test set of 22 structurally diverse compounds were used in this study. Logistic regression (LR) and partial least squares coupled logistic regression (PLS-CLR) were used to build the models. The three-state QSAR model gives a classification accuracy of 73.4% for the training set and 63.6% for the test set, compared with an expected accuracy of 33.3% for random assignment on any three-state dataset. The two-2-state [four categories in total] QSAR model gives a classification accuracy of 83.2% for the training set and 54.6% for the test set, compared with an expected accuracy of 25% for random assignment on any two-2-state dataset. An analysis of the skin sensitization models developed in this study, as well as the 2-state QSAR models developed in our previous analysis, suggests that the "moderate" sensitizers may be the main source of limited model accuracy.
Key Words: Skin Sensitization; QSAR; Logistic Regression; 4D-fingerprints; Categorical Models.
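The random baselines quoted in the abstract (33.3% for three categories, 25% for four) follow from assigning each compound to one of k categories uniformly at random: every compound is then classified correctly with probability 1/k, whatever the class balance. The Python sketch below is only an illustration of that arithmetic, not part of the published method. The function names and the class counts are hypothetical; only the reported test-set accuracies come from the abstract.

```python
from fractions import Fraction


def chance_accuracy(class_counts):
    """Expected accuracy of uniform random guessing for the given per-class counts."""
    k = len(class_counts)
    total = sum(class_counts)
    if k == 0 or total == 0:
        raise ValueError("need at least one non-empty class")
    # A compound of any class is guessed correctly with probability 1/k,
    # so the class proportions cancel out of the weighted sum.
    return sum(Fraction(c, total) * Fraction(1, k) for c in class_counts)


def ratio_to_chance(accuracy_pct, n_classes):
    """Reported accuracy (in percent) divided by the uniform-guess baseline."""
    return accuracy_pct / 100 * n_classes


# Hypothetical class counts, chosen only to show that balance does not matter.
print(chance_accuracy([10, 5, 7]))    # three categories
print(chance_accuracy([3, 9, 4, 6]))  # four categories

# Test-set accuracies reported in the abstract.
print(round(ratio_to_chance(63.6, 3), 2))  # three-state model
print(round(ratio_to_chance(54.6, 4), 2))  # two-2-state model
```

Expressed this way, both test-set accuracies are roughly twice the uniform-guess baseline, which puts the three-state and four-category results on a common scale.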