Description
Breast cancer is a highly heterogeneous disease, and accurate classification of aggressive subtypes such as triple-negative breast cancer is critical for effective clinical decision-making. This research paper presents a comprehensive machine learning–based framework for classifying breast cancer into triple-negative and non-triple-negative categories using gene expression data.
The study analyzes RNA-sequencing data obtained from The Cancer Genome Atlas and applies rigorous quality control, normalization, and differential gene expression analysis to identify thousands of biologically relevant genetic features. Four supervised machine learning algorithms (Support Vector Machines, K-Nearest Neighbors, Naïve Bayes, and Decision Trees) are evaluated for classification performance.
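As a rough illustration of how such a four-way comparison could be set up, the sketch below cross-validates the four classifier families with scikit-learn. It is not the paper's pipeline. The data are a synthetic stand-in for an expression matrix, and the library choice, the `compare_models` helper, the linear kernel, the neighbor count, and the fold count are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a gene expression matrix (assumption, not TCGA data):
# rows are samples, columns are genes, label 1 plays the role of the minority class.
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=10,
    weights=[0.8, 0.2], random_state=0,
)

# Hypothetical hyperparameters chosen only for illustration.
MODELS = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}


def compare_models(X, y, folds=5):
    """Return the mean cross-validated accuracy for each classifier."""
    return {
        name: float(cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean())
        for name, model in MODELS.items()
    }


scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

Scaling is wrapped into the SVM and kNN pipelines so that it is refit inside each fold, which avoids leaking test-fold statistics into training.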
Results show that the Support Vector Machine model outperforms the other three algorithms, achieving up to 90 percent accuracy along with strong sensitivity and specificity. Feature selection experiments further show that optimal performance can be achieved using smaller subsets of highly significant genes, improving computational efficiency without sacrificing accuracy.
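For readers less familiar with the three reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity follow from a binary confusion matrix. The `classification_metrics` helper and the toy labels are hypothetical and are not taken from the paper. Label 1 is assumed to denote the triple-negative class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Toy labels (assumption): 3 positive samples and 7 negative samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because triple-negative cases are typically the minority class, accuracy alone can look high even when many of them are missed, which is why sensitivity and specificity are reported alongside it.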
This research is valuable for biomedical researchers, clinical geneticists, data scientists, and healthcare professionals interested in precision medicine, cancer genomics, and AI-driven diagnostic systems. It provides strong evidence that machine learning can enhance molecular cancer classification and support early identification of high-risk patients.
