Accurately predicting the carcinogenicity of chemicals has become a serious challenge for health assessment authorities around the globe. Animal experiments are expensive, and using animal models also raises ethical issues. In this study, we provide machine learning-based classification models for carcinogenicity and mutagenicity. We built these models from the carcinogenicity and mutagenicity data of 1481 chemically diverse molecules tested in various species and organism types (e.g. dog, hamster, rat, and single-cell and multi-cell organisms). The models include random forest classifiers based on physicochemical descriptors and on structural fingerprints. In addition, the sum of ranking differences (SRD) method was used to rank the developed models. The best models, based on the random forest approach, correctly classify more than 70% of the compounds in the test set. Furthermore, MACCS fingerprints were used to understand which structural features of the chemicals contribute to mutagenicity or carcinogenicity. These results, together with the qualitative models, could potentially be used to screen large numbers of chemicals for carcinogenicity and mutagenicity assessment.
- Machine learning
- Random forest
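To make the SRD model-ranking step more concrete, here is a minimal sketch of the generic sum of ranking differences calculation in plain Python. It is not the authors' implementation, and the following details are assumptions. Each model is a column of scores over the same set of objects. The reference column defaults to the row-wise mean, a common consensus choice when no gold standard is available. Ties receive averaged ranks. The function names `average_ranks` and `srd`, as well as the example data, are hypothetical. The usual validation of SRD values against a random (permutation) distribution is omitted.

```python
from typing import Dict, List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd(
    table: Dict[str, Sequence[float]],
    reference: Optional[Sequence[float]] = None,
) -> Dict[str, float]:
    """Sum of ranking differences for each model (column) against a reference column.

    Assumption: when no reference is given, the row-wise mean over all models is used.
    A smaller SRD means the model ranks the objects more like the reference does.
    """
    names = list(table)
    n_rows = len(table[names[0]])
    if reference is None:
        reference = [sum(table[name][r] for name in names) / len(names) for r in range(n_rows)]
    ref_ranks = average_ranks(reference)
    return {
        name: sum(abs(a - b) for a, b in zip(average_ranks(table[name]), ref_ranks))
        for name in names
    }


if __name__ == "__main__":
    # Hypothetical scores of two models on four objects, plus a known reference ordering.
    scores = {"model_a": [0.9, 0.7, 0.5, 0.3], "model_b": [0.3, 0.5, 0.7, 0.9]}
    print(srd(scores, reference=[4, 3, 2, 1]))
```

A model that orders the objects exactly as the reference does scores 0. A model that fully reverses the order gets the maximum SRD for that number of objects. Ranking models by their SRD values therefore gives a single ordering that summarizes agreement across all objects at once.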