Description
Breast cancer subtype classification is a critical task for accurate diagnosis, prognosis, and personalized treatment planning. This research introduces a cascade Deep Forest–based classification framework that leverages multi-omics data to identify breast cancer subtypes efficiently and accurately.
The proposed model utilizes a cascade ensemble of Random Forests and Completely Random Forests to learn high-level feature representations without relying on conventional deep neural networks. Unlike traditional deep learning models, the cascade Deep Forest approach mitigates overfitting and performs well on high-dimensional, imbalanced biological datasets.
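The cascade structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses `ExtraTreesClassifier` with `max_features=1` as a stand-in for a completely random forest, and runs on synthetic imbalanced data rather than METABRIC. The helper names (`make_level`, `fit_cascade`, `predict_cascade`), the number of forests per level, and the stopping rule are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split


def make_level(seed):
    """Build one cascade level: two random forests and two completely random forests."""
    forests = []
    for i in range(2):
        forests.append(RandomForestClassifier(n_estimators=50, random_state=seed + i))
        # Assumption: max_features=1 (one random feature per split) approximates
        # a completely random forest.
        forests.append(
            ExtraTreesClassifier(n_estimators=50, max_features=1, random_state=seed + 10 + i)
        )
    return forests


def fit_cascade(X, y, max_levels=4, cv=3):
    """Grow levels until the out-of-fold accuracy stops improving."""
    classes = np.unique(y)
    levels, best_acc, augmented = [], 0.0, X
    for depth in range(max_levels):
        forests = make_level(seed=100 * depth)
        # Out-of-fold class probabilities, so the next level never sees
        # predictions a forest made on its own training samples.
        probas = [
            cross_val_predict(f, augmented, y, cv=cv, method="predict_proba")
            for f in forests
        ]
        acc = np.mean(classes[np.mean(probas, axis=0).argmax(axis=1)] == y)
        if levels and acc <= best_acc:
            break  # no improvement: discard this level and stop growing
        best_acc = acc
        for f in forests:
            f.fit(augmented, y)
        levels.append(forests)
        # The next level receives the raw features plus every forest's class vector.
        augmented = np.hstack([X] + probas)
    return levels, classes


def predict_cascade(levels, classes, X):
    """Pass samples through every level, then average the last level's class vectors."""
    augmented = X
    for forests in levels:
        probas = [f.predict_proba(augmented) for f in forests]
        augmented = np.hstack([X] + probas)
    return classes[np.mean(probas, axis=0).argmax(axis=1)]


# Synthetic stand-in for a high-dimensional, imbalanced subtype dataset.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=10, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
levels, classes = fit_cascade(X_train, y_train)
pred = predict_cascade(levels, classes, X_test)
print(f"levels grown: {len(levels)}, test accuracy: {np.mean(pred == y_test):.3f}")
```

Each level adds one class-probability vector per forest to the original features, so with three classes and four forests every level after the first sees 12 extra columns. Depth is chosen by the data rather than fixed in advance, which is one reason this kind of model needs fewer hyperparameters than a deep neural network.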
Experiments are conducted on the METABRIC dataset, incorporating gene expression, clinical data, copy number aberrations, and copy number variations. The evaluations show that gene expression data alone achieves the highest classification accuracy, reaching 83.45% for PAM50 subtypes and 77.55% for IntClust subtypes, while significantly reducing computational time.
This research is particularly relevant to biomedical researchers, data scientists, and healthcare professionals working in cancer genomics, bioinformatics, and AI-driven diagnostic systems. Its results indicate that efficient ensemble learning methods can rival deep neural networks in medical decision-support applications.
