Breast cancer is the most common type of cancer among women in France, the European Union, and the United States. Although it was the leading cause of cancer death among women in 2018, the number of cases diagnosed each year has been decreasing since 2005. When breast cancer is detected at an early stage, the 5-year survival rate is 99%, so early detection has a significant impact on reducing mortality.
Several artificial intelligence tools already help healthcare professionals accelerate diagnosis and make treatment decisions. Combining genomic sequencing data with machine learning algorithms is one such way to fight cancer.
Machine learning can aid in the detection, treatment, and prognosis of the disease, as well as in the development of personalized treatments. This approach leverages data from multiple patients to identify similarities and correlations between them and thus better understand the disease.
However, artificial intelligence is currently hampered by the limited amount of accessible data. So how can we enable AI to break through this barrier and reach the next stage of its evolution? To answer this question, we present a use case based on the “Breast Cancer Wisconsin (Diagnostic)” dataset from the UCI Machine Learning Repository. The task on this dataset is to predict whether a tumor is malignant or benign. We augmented the training data of a classification model with synthetic data.
In this study, 1,000 digital twins complemented the 569 real patients. These digital twins are synthetic records generated by artificial intelligence algorithms that faithfully reproduce the characteristics of real patients while preserving their anonymity. This approach expanded the training dataset from 569 to 1,569 patients and offers new perspectives for artificial intelligence models.
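The augmentation step described above can be sketched as follows. This is only an illustration under stated assumptions: the article does not say which generator produced the digital twins, so the sketch swaps in a deliberately simple stand-in that samples each feature from a per-class Gaussian. The function names, the two-feature rows, and the toy values are all hypothetical, and this naive sampler offers no anonymity guarantee by itself.

```python
import random
import statistics


def fit_class_gaussians(rows, labels):
    """For each class, estimate (mean, std) of every feature column."""
    params = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        columns = list(zip(*members))
        params[cls] = [(statistics.mean(c), statistics.pstdev(c)) for c in columns]
    return params


def synthesize(rows, labels, n_synthetic, seed=0):
    """Draw synthetic rows from the per-class Gaussians (stand-in generator)."""
    rng = random.Random(seed)
    params = fit_class_gaussians(rows, labels)
    classes = sorted(params)
    # Keep the class balance of the real cohort.
    weights = [labels.count(c) for c in classes]
    synth_rows, synth_labels = [], []
    for _ in range(n_synthetic):
        cls = rng.choices(classes, weights=weights)[0]
        synth_rows.append([rng.gauss(mu, sigma) for mu, sigma in params[cls]])
        synth_labels.append(cls)
    return synth_rows, synth_labels


# Toy stand-in for the real cohort (hypothetical values, two features per patient).
real_rows = [[14.2, 0.09], [20.1, 0.12], [12.5, 0.08], [18.9, 0.11]]
real_labels = ["benign", "malignant", "benign", "malignant"]

synth_rows, synth_labels = synthesize(real_rows, real_labels, n_synthetic=6)

# The augmented training set is the real rows followed by the synthetic ones.
train_rows = real_rows + synth_rows
train_labels = real_labels + synth_labels
```

In the study itself, the same concatenation would combine the 569 real patients with the 1,000 digital twins; only the generator would differ.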
We compared the performance of several classification models. Models trained on a cohort combining real and virtual patients performed 5.2% better than models trained only on real patients. Adding synthetic data therefore improves artificial intelligence solutions for breast cancer classification, making the models more accurate and reliable at distinguishing malignant from benign tumors. This can have a direct impact on the treatment decisions made by healthcare professionals.
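A comparison of this kind can be sketched as follows. The article does not state which metric was used or whether the 5.2% figure is an absolute or a relative gain, so this sketch assumes accuracy and reports both. It also assumes both models are scored on the same held-out set of real patients. The labels and predictions are hypothetical toy values, not results from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def improvement(baseline, augmented):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = (augmented - baseline) * 100
    relative = (augmented - baseline) / baseline * 100
    return absolute, relative


# Hypothetical held-out labels ("M" = malignant, "B" = benign).
y_true = ["M", "B", "B", "M", "B", "B", "M", "B", "B", "M"]
# Hypothetical predictions from a model trained on real patients only.
pred_real_only = ["M", "B", "B", "B", "B", "B", "M", "B", "M", "M"]
# Hypothetical predictions from a model trained on real plus synthetic patients.
pred_augmented = ["M", "B", "B", "M", "B", "B", "M", "B", "M", "M"]

acc_real = accuracy(y_true, pred_real_only)
acc_aug = accuracy(y_true, pred_augmented)
abs_gain, rel_gain = improvement(acc_real, acc_aug)

print(f"real only: {acc_real:.3f}, real + synthetic: {acc_aug:.3f}")
print(f"absolute gain: {abs_gain:.1f} points, relative gain: {rel_gain:.1f}%")
```

Scoring on real patients only keeps the comparison fair, since synthetic rows in the test set would favor the model that was trained on them.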
Synthetic data makes it possible to significantly expand the training dataset, so models learn from a more diverse and representative sample. Synthetic data is also anonymous, which addresses privacy concerns and protects the sensitive data of real patients. Researchers and healthcare professionals can therefore use this data without fear of violating the privacy of the individuals concerned.
In this use case, synthetic data brought a significant improvement in breast cancer detection through artificial intelligence. Classification models performed better, which means better predictions, and research gains a larger database. By preserving patient confidentiality, synthetic data opens new avenues for innovation in the fight against breast cancer and paves the way for new advances in healthcare.
COPYRIGHT © 2023 ALIA SANTé