Molecular classification of breast cancer, an unfinished story
Author:
Camilla Fiz
Date of publication: 15 September 2025
Last update: 15 September 2025
Abstract
Over the last decades, breast cancer has been classified into four subtypes and characterised by other molecular features. Several European research groups were involved in this process, from the initial laboratory studies to large clinical trials. The leading goals were to predict the prognosis and improve the therapeutic regimen for each patient. Today, these goals have been partially met, and approaches both old and new are combined to optimise the diagnosis and treatment of breast cancer. As in the past, the latest technologies now have the chance to tackle the complexity of breast tumours further and improve the clinical applications of molecular classification.
Introduction
Past approaches to diagnosing and treating breast cancer are still used alongside a few applications of molecular classification. While the genetic perspective has only partially changed daily clinical practice, it has brought a significant shift in the understanding of breast tumours, one that is still ongoing.
New and past approaches are nowadays used together to diagnose and treat breast cancer. The analysis of disease volume and size is still crucial in defining the diagnosis, as is chemotherapy for treatment. However, innovative tools based on the detection of gene alterations, miRNAs and other molecular features have also entered the clinic in recent years. The current classification of breast cancer into four subtypes – luminal A and B, HER2-enriched and the basal-like Triple Negative Breast Cancer (TNBC) – is indeed crucial in guiding patients’ therapeutic regimens. In addition, a set of genetic tests can provide more information about the benefits of chemotherapy in women with a specific molecular profile.
Although the changes in daily breast cancer practice have been partial, the shift in mindset has been particularly striking. The idea of a one-size-fits-all approach has gradually been replaced by the goal of customised medicine, which is based on predicting how the disease will develop more than on its current features. The beginning of this change can be traced back 50 years, when advances in gene analysis techniques allowed researchers to gain deeper insight into cancer.
Genes acting in concert
“Before working on DNA, I was a protein biochemist thinking that everything we needed to know in genetics was at the protein level. The switch to gene studies started in the 1980s, when DNA was isolated and discovered to be easier to study than proteins.” Anne-Lise Børresen-Dale, a geneticist at the University of Oslo and Oslo University Hospital in Norway, was one of the first researchers in Europe to study breast cancer from a genetic perspective. Thanks to gene cloning and the polymerase chain reaction (PCR), the scientific community managed to give a name and a function to genetic mutations and to identify the first oncogenes and tumour suppressors. In the early 1990s, TP53 was the ‘cancer gene’ and the reference for the initial studies on the ‘mutagen fingerprints’ of cancer DNA. During the same years, the breast cancer susceptibility gene BRCA1 was discovered in the United States, followed by BRCA2 in the United Kingdom.
Nevertheless, the tools for broad gene analysis weren’t yet ready to grasp the complexity of cancer. “We understood quite early that looking at one gene at a time would have been a lot of work and still insufficient”, says Therese Sørlie, a molecular biologist at Oslo University Hospital, who started working with Børresen-Dale in the late 1990s. “In every cancer, a huge number of genes act together to regulate cell growth, death and differentiation and they can’t be singularly investigated”. What mattered was less the extreme precision of any single gene expression measurement than the concert of all these elements across multiple molecular levels.
The simultaneous analysis of thousands of genes became possible in 1995, when microarrays were developed by Patrick Brown and colleagues at the Stanford University School of Medicine, in the United States (Schena et al., 1995). After the first experiments in plants, microarrays were used to profile complex diseases such as cancer, and became the second major shift in biomedicine after the isolation of DNA.
Thank God it’s Frozen
Microarrays were not yet available in Europe in the late 1990s, when Børresen-Dale urged the adoption of new approaches to predicting patients’ responses to therapy. Together with P.E. Lønning, a collaborator at Haukeland University Hospital in Bergen, she had a set of breast cancer samples from Oslo University Hospital that warranted special techniques for investigation. The tissues were large and fresh-frozen, which preserves cellular structures and nucleic acids better and allows more accurate analysis. “Instead of ‘Thank God it’s Friday’, we used to say, ‘Thank God it’s Frozen’,” Børresen-Dale comments, laughing. “But the main thing was that part of them was collected twice from the same patients.” Of a total of 65 surgical tissue samples, 20 tumours were sampled twice: before treatment and four months later, after neoadjuvant treatment with doxorubicin. This practice was uncommon at the time, when samples were usually obtained once from surgical specimens before adjuvant treatment and rarely from the same patients during or after therapy.
Aware of the new microarray assay developed in the United States, Børresen-Dale flew to California with the samples and her colleague Therese Sørlie to join Patrick Brown, David Botstein and Charles Perou at the Stanford University School of Medicine. There, they began using microarrays to investigate the expression of many genes at the same time. Their purpose was to predict the response to therapy in those breast cancer patients rather than to classify tumours. However, when the data were ready and they started the analysis, they quickly recognised the contours of a new classification system. The paired samples provided a singular opportunity to explore differences and similarities between patients, and especially within the same patient over time.
The analyses revealed that the expression of a set of almost 500 genes, the so-called intrinsic genes, varies between tumours but is highly similar in successive samples from the same patient. This was a crucial step towards the understanding that each cancer is characterised by a unique molecular portrait, which remains stable over time and shapes the progression of the disease. The group proposed five subtypes of breast tumour in 2000 and extended the results the following year (Perou et al., 2000; Sørlie et al., 2001). The classification comprised the luminal A and luminal B plus C groups, characterised by oestrogen receptor positivity and higher survival, while the ERBB2+ and basal-like subtypes were associated with lower survival. The normal-like cluster was instead considered controversial, because its gene expression resembled that of normal breast epithelium.
Genes as galaxies in the sky
In the same period, other scientists became involved in this area of research and made significant contributions in Europe. Laura van ’t Veer was one of them. She is currently a molecular biologist at UCSF in California, but she started her career at the Netherlands Cancer Institute in Amsterdam. In the early 1990s, van ’t Veer was the first molecular biologist in the Institute’s hospital, and part of her work was to connect molecular techniques with the clinical setting. Communication was one of the first challenges she had to face: “We all had to learn something. For me, that was the language of the clinic; for the doctors, it was the language of molecular biology”, she says. “In the first years, we conducted a limited number of tests, as introductory work to improve patient diagnostics.” After a while, she and her colleagues began looking for a project that could have a high impact on diagnostics. They wanted to develop a new gene signature to distinguish breast cancer patients at low or high risk of developing metastasis in the first years after surgery.
Using microarrays, they compared gene expression in 78 patients with primary breast cancer, almost half of whom had developed distant metastases while the others remained metastasis-free after at least 5 years. When they analysed the mRNA on microarrays and used bioinformatics to detect patterns, it turned out that skills in astrophysics could be useful too. “Hongyue Dai and Yudong He, astrophysicists originally from Beijing, in China, used their knowledge of the mathematics for recognizing patterns in galaxies to recognize patterns on the microarrays”, she says. “Combining that with clinical statistics and previous knowledge, we identified a signature of 70 genes characterizing tumours with a different likelihood of recurrence.” According to their results, patients with a ‘poor-prognosis signature’ were expected to have a higher probability of developing distant metastases within 5 years than those with a ‘good-prognosis signature’. Therefore, only the former would benefit from chemotherapy.
Together, the evidence gathered by Børresen-Dale, Sørlie and van ’t Veer, with all the colleagues involved in the studies, showed that breast cancer subtypes could be defined by combining many molecular features. From different perspectives, they established common principles for classifying tumours by expression profiling and set the direction for subsequent research in the field. On one side, the study by Børresen-Dale and Sørlie laid the foundation for the subsequent molecular classification of breast cancer. On the other, the study by van ’t Veer was closer to clinical and commercial tools for predicting patients’ prognosis. Indeed, she later co-founded the company Agendia and commercialized the 70-gene signature under the brand MammaPrint.
The jump into clinics and new commercial opportunities
“In the early 2000s, breast cancer stopped being considered a single entity and came to be seen as a set of many subgroups based on endocrine sensitivity and gene expression. Phenotypes emerged as determinants in driving treatment selection”, reports Etienne Brain, a medical oncologist at the Institut Curie in Paris, who has spent the last 20 years studying the adjustment of adjuvant therapy in breast cancer patients. “Beyond simple immunohistochemistry we were more interested in tumour biology.” At that time, the molecular classification of breast cancer was emerging, while the biological one, based on the biochemical measurement of oestrogen receptors, was already established. Gradually, the Human Epidermal Growth Factor Receptor 2 (HER2) was added to the oestrogen receptors (ER) and progesterone receptors (PR) as a marker for assessing prognosis and predicting treatment response in breast cancer. Moreover, immunohistochemistry assays were developed, which made it possible to guide treatment regimens simply and inexpensively.
As a result, a few therapies began to be differentiated by group. In 2001, the International Consensus Panel on the Treatment of Primary Breast Cancer acknowledged endocrine treatment with tamoxifen as the standard of care for ER-positive patients, while trastuzumab, a monoclonal antibody that targets the HER2 receptor, was considered for patients who express it (Goldhirsch et al., 2001). Ten years later, endocrine responsiveness was associated with the intrinsic molecular subtypes identified by Børresen-Dale, Sørlie and colleagues for the first time (Gnant et al., 2011). Still, many doubts remained about adjuvant therapy. In the early 2000s, new solutions came from computer software and tools capable of combining many parameters. A leading example was Adjuvant!, an algorithm that helped clinicians assess patients for chemotherapy using standard features, such as tumour size and the nodes involved, and to a lesser extent ER status (Ravdin et al., 2001).
Other tools were instead based on gene signatures, which became fertile ground for business. At the very beginning, pharmaceutical companies tended to oppose reducing the number of patients for whom therapies were developed. As van ’t Veer told JAMA in 2004, things changed when drugs became much more specific and too expensive to be introduced to large populations, and firms began, for instance, working on introducing microarrays into clinics. Though revolutionary, the technology was too complex and expensive for routine clinical analysis. Many scientists and companies also collaborated to translate research results into commercial tools. MammaPrint, by Laura van ’t Veer, was the first multi-analyte test to be recognised by the FDA, in 2007, and others were later developed in the United States, chiefly Oncotype DX and Prosigna, the latter based on the intrinsic subtypes discovered by Børresen-Dale, Sørlie and colleagues. Large clinical trials then began testing the efficacy of these tools in patients, before the tools were introduced into the ESMO guidelines for the first time in 2015 (Senkus et al., 2015). For instance, the MINDACT trial evaluated the MammaPrint test on thousands of participants in hospitals across Europe (Cardoso et al., 2020).
Discrimination and other risks
Multiple challenges emerged as the number of clinical trials increased in the early 2000s. They mostly concerned the need for large groups of samples or participants to collect significant, controlled and reproducible data in laboratory studies as well as in clinical trials. Though this is a common problem in medicine, it is particularly acute in tumour and molecular studies. Cancer is an extremely variable and flexible disease: it differs between people and evolves within the same patient over time. This complexity increased further when subtypes and other molecular classifications started being considered and samples were split into smaller subgroups. In some cases, that resulted in ineffective tools for diagnosis and treatment in clinics, or in fewer tools than expected. In others, it led to discrimination against more vulnerable patients.
Although people over the age of 65 constitute a large proportion of breast cancer patients and could particularly benefit from disease classification, they have been ignored by most clinical trials since the beginning (Hutchins et al., 1999). In 2005, while treating a 79-year-old patient, Etienne Brain noted the “cruel lack of data, in women of this age, to guide [their] therapeutic regime”, as he told the newspaper Le Monde a few years ago. “Older women are usually much less represented than in real life, as they represent a multicomplex clinical context that creates ‘background noise’ in the development of innovations and slows it”, Brain adds. “This doesn’t happen with younger populations, where there is a better control of trial conditions and reproducible data are easier to be obtained.” Because of this underrepresentation, the tools for prediction and prognosis have not been accurate enough for older patients either (Brain, 2014).
The situation remains much the same today, with oncology tending to focus more on younger patients. Nevertheless, there have been a few advances towards a more inclusive approach, thanks to the development of alternative algorithms to predict the prognosis of older patients and to innovative clinical trials. An example is the ASTER 70s trial conducted by Etienne Brain and colleagues in a group of almost 2,000 breast cancer patients aged 70 and over. This randomised phase III trial recently found that adding adjuvant chemotherapy to endocrine therapy doesn’t significantly increase survival in women who were positive for ER, negative for HER2, and at high risk of recurrence according to the Genomic Grade Index (GGI) (Brain et al., 2025). The latter is a genomic signature developed to improve the prognostic power of histology for certain patients (Bertucci et al., 2009).
Old and new together
In the following years, important studies were also conducted in other European centres, such as the Vall d’Hebron Barcelona Hospital in Spain, the Karolinska Institute in Sweden, the Institut Gustave Roussy in France and the European Institute of Oncology in Italy. Over time, data from both laboratories and clinical trials have distinguished four molecular subtypes of breast cancer: luminal A and B, HER2-enriched and basal-like TNBC. The normal-like group was excluded, since it was attributed to the contamination of cancer samples with normal epithelial cells. Each of the four subtypes is currently defined by combining multiple parameters, including the status of the ER, PR and HER2 receptors, gene mutations such as BRCA and p53, specific sets of microRNAs, the proliferation marker Ki67 and others. Nevertheless, this classification has not replaced the previous ones but rather complements them. In daily practice, pathological assessment still requires performing a biopsy and examining the size and histology of the tumour, the involvement of lymph nodes, the tumour-node-metastasis staging system and many other parameters, which are not necessarily related to subtypes.
Generally, once breast cancer has been confirmed through mammography and lymph node assessment, immunohistochemistry is used to evaluate hormone receptors and HER2 and to distinguish patients’ prognoses. Positivity for ER and PR usually indicates a favourable prognosis, classifying the tumour as a luminal A or luminal B subtype. The HER2-enriched group comprises patients who test positive for HER2 and have an intermediate prognosis. In contrast, women with the Triple Negative Breast Cancer (TNBC) subtype test negative for ER, PR and HER2 and typically have a poor prognosis. Nowadays, a large proportion of breast cancer patients are classified as luminal A or B and opt for hormonal treatment, such as tamoxifen, experiencing fewer adverse effects than with chemotherapy. The others follow their own therapeutic regimen, which mostly consists of surgery, chemotherapy and radiotherapy.
Beyond this classification approach, certain patients have access to further genetic tests. MammaPrint, Prosigna and Oncotype DX can predict the need for, and benefits of, adjuvant chemotherapy in breast cancers that are positive for ER and negative for HER2 (Loibl et al., 2024), but they can’t always be applied. Unlike the cheap and standard immunohistochemistry used to identify molecular subtypes, these genetic tests are expensive and not always reimbursed by European countries’ national health services. When they are not, their use depends on patients’ financial means and on whether they have private insurance. Moreover, the kind of test employed depends on the reimbursement rules and resource management of each hospital.
The end is not yet
Evidently, the one-size-fits-all approach has not yet been overcome in breast cancer, as in other types of tumours. However, we now have a variety of tools that allow us to profile diseases and adapt treatments more accurately than was possible in the early 1970s. Thanks to this, many breast cancer patients can access targeted drugs, and the use of chemotherapy has been limited. Although the need for a molecular classification of breast cancer has not yet been fully met, it is a priority for current research projects.
Some studies are now exploring new genetic mutations, while others are looking beyond tumour cells to their surroundings. This time, artificial intelligence is the new technology with the potential to shape future research and clinical applications. New algorithms take the immune system into account to predict prognosis and the response to immunotherapy (Hu et al., 2023; Zheng et al., 2022), while machine learning systems are being used to predict metastases (Botlagunta et al., 2023). In the coming years, artificial intelligence could help improve gene expression tools and combine them with more data on patients’ clinical and family history or lifestyle. Even after several decades, the story of breast cancer molecular classification is therefore still in its infancy. Keeping an eye on its past can help us foresee upcoming challenges, such as the risk of discrimination in clinical trials and limited access to innovative approaches.
References
Bertucci, F., Le Doussal, J. M., Birnbaum, D., Tagett, R., Martinec, A., Hermitte, F., Marisa, L., Martin, A. L., Geneve, J., Roché, H., & Penault-Llorca, F. (2009). Prognostic value of the genomic grade index (GGi) as compared to centrally measured Ki-67 IHC and mitotic index in early breast cancer patients. Journal of Clinical Oncology, 27(15), 11094–11094.
Botlagunta, M., Botlagunta, M. D., Myneni, M. B., Lakshmi, D., Nayyar, A., Gullapalli, J. S., & Shah, M. A. (2023). Classification and diagnostic prediction of breast cancer metastasis on clinical data using machine learning algorithms. Scientific Reports, 13(1), 485.
Brain, E. (2014). Breast cancer in older women: Predicting adjuvant benefit. The Lancet Oncology, 15(7), 672–674.
Brain, E., Mir, O., Bourbouloux, E., Rigal, O., Ferrero, J.-M., Kirscher, S., Allouache, D., D’Hondt, V., Savoye, A.-M., Durando, X., Duhoux, F. P., Venat-Bouvet, L., Blot, E., Canon, J.-L., Rollot Trad, F., Bonnefoi, H., Roque, T., Lemonnier, J., Latouche, A., … Richard, V. (2025). Adjuvant chemotherapy and hormonotherapy versus adjuvant hormonotherapy alone for women aged 70 years and older with high-risk breast cancer based on the genomic grade index (ASTER 70s): A randomised phase 3 trial. The Lancet, 406(10502), 489–500.
Cardoso, F., Van ’T Veer, L., Poncet, C., Lopes Cardozo, J., Delaloge, S., Pierga, J.-Y., Vuylsteke, P., Brain, E., Viale, G., Kuemmel, S., Rubio, I. T., Zoppoli, G., Thompson, A. M., Matos, E., Zaman, K., Hilbers, F., Dudek-Perić, A., Meulemans, B., Piccart-Gebhart, M. J., & Rutgers, E. J. (2020). MINDACT: Long-term results of the large prospective trial testing the 70-gene signature MammaPrint as guidance for adjuvant chemotherapy in breast cancer patients. Journal of Clinical Oncology, 38(15), 506–506.
Gnant, M., Harbeck, N., & Thomssen, C. (2011). St. Gallen 2011: Summary of the Consensus Discussion. Breast Care, 6(2), 136–141.
Goldhirsch, A., Glick, J. H., Gelber, R. D., Coates, A. S., & Senn, H.-J. (2001). Meeting Highlights: International Consensus Panel on the Treatment of Primary Breast Cancer. Journal of Clinical Oncology, 19(18), 3817–3827.
Hu, H., Zou, M., Hu, H., Hu, Z., Jiang, L., Escobar, D., Zhu, H., Zhan, W., Yan, T., & Zhang, T. (2023). A breast cancer classification and immune landscape analysis based on cancer stem-cell related risk panel. Npj Precision Oncology, 7(1), 130.
Hutchins, L. F., Unger, J. M., Crowley, J. J., Coltman, C. A., & Albain, K. S. (1999). Underrepresentation of Patients 65 Years of Age or Older in Cancer-Treatment Trials. New England Journal of Medicine, 341(27), 2061–2067.
Loibl, S., André, F., Bachelot, T., Barrios, C. H., Bergh, J., Burstein, H. J., Cardoso, M. J., Carey, L. A., Dawood, S., Del Mastro, L., Denkert, C., Fallenberg, E. M., Francis, P. A., Gamal-Eldin, H., Gelmon, K., Geyer, C. E., Gnant, M., Guarneri, V., Gupta, S., … Harbeck, N. (2024). Early breast cancer: ESMO Clinical Practice Guideline for diagnosis, treatment and follow-up. Annals of Oncology, 35(2), 159–182.
Perou, C. M., Sørlie, T., Eisen, M. B., Van De Rijn, M., Jeffrey, S. S., Rees, C. A., Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., Fluge, Ø., Pergamenschikov, A., Williams, C., Zhu, S. X., Lønning, P. E., Børresen-Dale, A.-L., Brown, P. O., & Botstein, D. (2000). Molecular portraits of human breast tumours. Nature, 406(6797), 747–752.
Ravdin, P. M., Siminoff, L. A., Davis, G. J., Mercer, M. B., Hewlett, J., Gerson, N., & Parker, H. L. (2001). Computer Program to Assist in Making Decisions About Adjuvant Therapy for Women With Early Breast Cancer. Journal of Clinical Oncology, 19(4), 980–991.
Schena, M., Shalon, D., Davis, R. W., & Brown, P. O. (1995). Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray. Science, 270(5235), 467–470.
Senkus, E., Kyriakides, S., Ohno, S., Penault-Llorca, F., Poortmans, P., Rutgers, E., Zackrisson, S., & Cardoso, F. (2015). Primary breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Annals of Oncology, 26, v8–v30.
Sørlie, T., Perou, C. M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T., Eisen, M. B., Van De Rijn, M., Jeffrey, S. S., Thorsen, T., Quist, H., Matese, J. C., Brown, P. O., Botstein, D., Lønning, P. E., & Børresen-Dale, A.-L. (2001). Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceedings of the National Academy of Sciences, 98(19), 10869–10874.
Zheng, K., Luo, Z., Zhou, Y., Zhang, L., Wang, Y., Chen, X., Yao, S., Xiong, H., Yuan, X., Zou, Y., Wang, Y., & Xiong, H. (2022). A Framework to Predict the Molecular Classification and Prognosis of Breast Cancer Patients and Characterize the Landscape of Immune Cell Infiltration. Computational and Mathematical Methods in Medicine, 2022, 1–23.
1994
Mutations of the BRCA1 gene were found to play a role in breast cancer susceptibility. The discovery was soon followed by the identification of BRCA2.
1995
Microarray technology was developed at the Stanford University School of Medicine, in the United States.
2000
The concept of intrinsic genes and the molecular portraits of breast cancer were introduced by Anne-Lise Børresen-Dale and Therese Sørlie together with Patrick Brown, David Botstein and Charles Perou at the Stanford University School of Medicine.
2001
The International Consensus Panel on the Treatment of Primary Breast Cancer set the guidelines for endocrine treatment.
2003
The Human Genome Project released the completed sequence of the human genome, raising awareness of the role of genes in cancer and other diseases.
2007
MammaPrint was the first gene-expression test to be approved by the FDA to predict the prognosis of breast cancer patients.
2011
For the first time, the Saint Gallen Conference highlighted the relationship between endocrine responsiveness and the intrinsic molecular subtypes.
2015
Gene expression tests such as MammaPrint, Prosigna and Oncotype DX were included in the ESMO guidelines for the treatment of breast cancer.

