An integrative data clustering method is applied to reclassify human tumors
Cell-of-origin influences, but does not fully determine, tumor classification
Immune features and copy-number aberrations define the most mixed tumor groups
Multi-cancer groups reveal new features with potential clinical utility
We conducted comprehensive integrative molecular analyses of the complete set of tumors in The Cancer Genome Atlas (TCGA), consisting of approximately 10,000 specimens and representing 33 types of cancer. We performed molecular clustering using data on chromosome-arm-level aneuploidy, DNA hypermethylation, mRNA, and miRNA expression levels and reverse-phase protein arrays, of which all, except for aneuploidy, revealed clustering primarily organized by histology, tissue type, or anatomic origin. The influence of cell type was evident in DNA-methylation-based clustering, even after excluding sites with known preexisting tissue-type-specific methylation. Integrative clustering further emphasized the dominant role of cell-of-origin patterns. Molecular similarities among histologically or anatomically related cancer types provide a basis for focused pan-cancer analyses, such as pan-gastrointestinal, pan-gynecological, pan-kidney, and pan-squamous cancers, and those related by stemness features, which in turn may inform strategies for future therapeutic development.
Genomic and other molecular analyses across many types of cancer have revealed a striking diversity of genomic aberrations, altered signaling pathways, and oncogenic processes. We hypothesized that this diversity arises from endogenous factors, such as developmental and differentiation programs and epigenetic states of the originating cells, in conjunction with exogenous factors, such as mutagenic exposures, pathogens, and inflammation. Here, we performed an integrative analysis of approximately 10,000 human samples representing 33 different cancers, to provide the first comprehensive view of the molecular factors that distinguish different neoplasms in The Cancer Genome Atlas (TCGA).
In 2014, TCGA Research Network reported an interim analysis of 3,527 tumors from 12 different cancer types (Pan-Cancer-12), integrating six genome-wide platforms that assayed tumor DNA (exome sequencing, DNA methylation, and copy number), RNA (mRNA and microRNA sequencing), and a cancer-relevant set of proteins and phosphoproteins (Hoadley et al., 2014). The analysis tested the hypothesis that molecular signatures might provide a taxonomy that differed from the current organ- and tissue-histology-based pathology classification (Hoadley et al., 2014). This effort extended beyond cancer subtype classification by individual molecular platforms by employing an integrated clustering algorithm to identify higher-level structures and relationships. These integrated subtypes shared mutations, copy-number alterations, pathway commonalities, and microenvironment characteristics that appeared influential in the new molecular taxonomy, beyond any phenotypic contributions from tumor stage or tissue of origin. We estimated that at least one in ten cancer patients might be classified (and perhaps treated) differently using such a molecular taxonomy, rather than the current histopathology-based classification.
Given that the earlier analysis included only a third of the final set of TCGA tumors, it seemed appropriate to analyze all 33 tumor types (called the PanCancer Atlas) to address the intriguing questions left unanswered: whether the inclusion of many more tumors and tumor types enhances the number of cross-tissue associations, produces additional convergent and/or divergent integrated molecular subtypes, and significantly increases the fraction of cancer patients whose classification or treatment might be affected by this new taxonomic approach.
We present a new PanCancer Atlas integrative analysis using iCluster (Shen et al., 2009, Shen et al., 2012) identifying 28 distinct molecular subtypes arising from the 33 different tumor types analyzed across at least four different TCGA platforms. We confirmed significant taxonomic divergences from and convergences with the routinely used clinical tumor classification system. We employed a new 2D visualization approach, TumorMap (Newton et al., 2017), to interpret the relationships between the samples and iClusters. The PanCancer Atlas molecular classification also provides a rationale for several TCGA analyses based on organ systems or differentiation states, including pan-gastrointestinal (GI) (Liu et al., 2018), pan-gynecological (gyn) (Berger et al., 2018), pan-kidney (Ricketts et al., 2018), pan-squamous (Campbell et al., 2018), and cancer stemness features (Malta et al., 2018).
Specimens and Tumor Types
This PanCancer study encompassed 11,286 tumor samples from 33 cancer types, for which molecular data were available from at least one of the five assay platforms. Of these, 9,759 had complete data for 4 platforms: aneuploidy, DNA methylation, mRNA and miRNA. RPPA protein data were available for a subset of samples (7,858). Hematologic and lymphatic malignancies included acute myeloid leukemia (LAML), lymphoid neoplasm diffuse large B cell lymphoma (DLBC), and thymoma (THYM). Solid tumor types were from gynecologic (ovarian [OV], uterine corpus endometrial carcinoma [UCEC], cervical squamous cell carcinoma and endocervical adenocarcinoma [CESC], and breast invasive carcinoma [BRCA]), urologic (bladder urothelial carcinoma [BLCA], prostate adenocarcinoma [PRAD], testicular germ cell tumors [TGCT], kidney renal clear cell carcinoma [KIRC], kidney chromophobe [KICH], and kidney renal papillary cell carcinoma [KIRP]), endocrine (thyroid carcinoma [THCA] and adrenocortical carcinoma [ACC]), core gastrointestinal (esophageal carcinoma [ESCA], stomach adenocarcinoma [STAD], colon adenocarcinoma [COAD], and rectum adenocarcinoma [READ]), developmental gastrointestinal (liver hepatocellular carcinoma [LIHC], pancreatic adenocarcinoma [PAAD], and cholangiocarcinoma [CHOL]), head and neck (head and neck squamous cell carcinoma [HNSC]), and thoracic (lung adenocarcinoma [LUAD], lung squamous cell carcinoma [LUSC], and mesothelioma [MESO]) organ systems. Cancers of the central nervous system (glioblastoma multiforme [GBM] and brain lower-grade glioma [LGG]) and soft tissue (sarcoma [SARC] and uterine carcinosarcoma [UCS]) were represented, as were cancers from neural-crest-derived tissues, such as pheochromocytoma and paraganglioma (PCPG), and melanocytic cancers of the skin (skin cutaneous melanoma [SKCM]) and eye (uveal melanoma [UVM]). (For a complete list of the TCGA cancer-type abbreviations, please see .)
Clustering by Individual Platforms
We explored the sample groupings from each individual assay platform. Using aneuploidy (AN), CpG hypermethylation (METH), mRNA (MRNA), miRNA (MIR), and protein (P), the resultant number of groups ranged from 10 to 25 (Figure 1). While cell-of-origin was a dominant feature of the classification, we observed tumors from different cancer types grouping and samples within a cancer type dispersing across groups.
Hierarchical clustering of 10,522 samples by chromosome arm-level aneuploidy yielded ten groups (Figure 1A; Table S1). Samples were split mainly by those with few alterations (AN7), those with moderate alterations (AN6,8-10), and those with many alterations (AN1-5). Over one-third of the samples displayed relatively sparse aneuploidy in AN7; these were enriched for THCA, LAML, PRAD, and THYM. We observed more distinct clustering by cell-of-origin among higher-aneuploid tumors. For example, AN2, characterized by chromosome (chr) 13 gain and chr18 loss, was strongly enriched for gastrointestinal tumors (COAD, READ, and STAD), and chromosomal instability (CIN) ESCA. Consistent with previous results (Hoadley et al., 2014), squamous (lung, head and neck, and esophageal) tumors clustered together by aneuploidy patterns, particularly 3p loss and 3q gain (AN3).
Unsupervised clustering of 10,814 tumors using DNA methylation data with 3,139 CpG sites that were hypermethylated in at least one tumor type identified 25 groups. Despite the exclusion of loci known to be involved in tissue-specific DNA methylation, tumors originating from the same organ often aggregated by cancer-type-specific hypermethylation (Figure 1B; Table S2). This result suggests that cancer-associated DNA hypermethylation in human cancers is influenced by pre-existing cell-type-specific chromatin marks or transcriptional programs, and not just by cell-type-specific DNA methylation patterns. Tumors within an organ system tended to co-cluster. Consistent with the aneuploidy analysis, squamous cell carcinomas (HNSC, ESCA, LUSC, and CESC) associated closely in METH2 and METH3. Gastrointestinal adenocarcinomas (ESCA, STAD, COAD and READ) were represented in a branch containing METH10 through METH13.
Unsupervised consensus clustering of 10,165 tumors by mRNA expression profiles identified 25 groups that contained at least 40 samples (Figure 1C; Table S3). While tumor type was a driving feature for many groups, several groups comprised tumors from different organ types. Samples with squamous morphology components (BLCA, CESC, ESCA, HNSC, and LUSC) grouped together. Similarly, tumors with tissue or organ similarities or proximity also grouped together. These included neuroendocrine and glioma tumors (GBM, LGG and PCPG), melanomas of the skin and eye (SKCM and UVM), clear cell and papillary renal carcinomas (KIRC and KIRP), adrenal cortical and chromophobe renal (ACC and KICH), hepatocellular and cholangiocarcinomas (LIHC and CHOL), a gastrointestinal group (COAD, READ, non-squamous ESCA, and STAD), a digestive system group (PAAD, STAD, and a few ESCA), hematologic and lymphatic cancers (LAML, DLBC, and THYM), and two mixed lung cancer groups (LUAD and LUSC).
Unsupervised hierarchical clustering of miRNA expression profiles from 10,170 tumors yielded 15 groups (Figure 1D; Table S4). While six groups contained only a single cancer type, the remaining nine groups each represented a mix of cancer types. These included a squamous-enriched group (MIR2), a pan-kidney group (MIR11), and a pan-GI-enriched group (MIR6).
Hierarchical clustering of protein expression data from 7,858 samples across 32 tumor types (LAML did not have protein data) revealed ten distinct protein (P) groups (Figure 1E; Table S5). P1 (GBM, LGG) and P2 (DLBC, SARC, PCPG, UCS, THYM, and metastatic SKCM) were distinguished from the remaining 8 groups, largely corresponding to mesenchymal-like tumor types with high EMT signatures. Similar to the other individual data platforms, samples from related organ systems grouped together: luminal breast and gynecologic cancers (BRCA-Luminal, UCEC, and OV), plus some liver samples (LIHC) with high levels of ER-alpha, AR and IGFBP2 comprised the majority of the P3 and P4 groups. In addition, a pan-kidney (P6) and a pan-GI (P8) group were identified.
Integrative Clustering across Data Types
We used the clustering of cluster assignments (COCA) algorithm (Hoadley et al., 2014) to assess the overlap of platform-specific memberships from each of the five molecular platforms (aneuploidy, mRNA, miRNA, DNA methylation, and RPPA) (Figure 2A). Many samples similarly grouped together by multiple platform-specific cluster memberships, both in groups that were defined by a single tumor type and in tumor types that co-clustered, such as KIRC and KIRP (pan-kidney). Gastrointestinal tumors (COAD, READ, STAD, and ESCA adenocarcinomas) co-clustered in the mRNA, miRNA, and RPPA platforms but were represented by several distinct DNA methylation clusters. Squamous histology cancers (LUSC, HNSC, CESC, ESCA, and BLCA) were similarly classified by the miRNA, mRNA and RPPA data but were further divided by the aneuploidy and DNA methylation data. Within pan-gyn cancers (BRCA, OV, UCEC, and UCS), RPPA data suggested that ovarian serous cystadenocarcinoma (OV) and UCEC (and ER+ LIHC) shared similarities at the protein level, whereas miRNA, mRNA, and DNA methylation data were grouped by their organ sites. Also of note, 13% of BRCA formed a subtype distinct from the majority of other BRCA, influenced by the mRNA and DNA methylation platforms.
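The encoding step on which COCA relies can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: each sample's platform-specific cluster labels are turned into a binary indicator vector, and a simple co-membership score counts how often two samples share a cluster. The sample names, the label values, and the helper functions `build_indicator_matrix` and `shared_fraction` are hypothetical, and the consensus clustering that would normally be run on the indicator matrix is omitted.

```python
from typing import Dict, List, Optional, Tuple

Labels = Dict[str, Optional[str]]  # platform name -> cluster label (None = no data)

# Hypothetical memberships; the label values are illustrative only.
memberships: Dict[str, Labels] = {
    "kirc_1": {"AN": "AN6", "MRNA": "MRNA4", "MIR": "MIR11", "METH": "METH7", "P": "P6"},
    "kirp_1": {"AN": "AN7", "MRNA": "MRNA4", "MIR": "MIR11", "METH": "METH8", "P": "P6"},
    "laml_1": {"AN": "AN7", "MRNA": "MRNA9", "MIR": "MIR1", "METH": "METH20", "P": None},
}


def build_indicator_matrix(
    data: Dict[str, Labels],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """One-hot encode cluster labels: rows are samples, columns are platform clusters."""
    samples = sorted(data)
    columns = sorted(
        {label for labels in data.values() for label in labels.values() if label is not None}
    )
    matrix = [[1 if col in data[s].values() else 0 for col in columns] for s in samples]
    return samples, columns, matrix


def shared_fraction(a: Labels, b: Labels) -> float:
    """Fraction of platforms with data for both samples on which they share a cluster."""
    common = [p for p in a if a[p] is not None and b.get(p) is not None]
    if not common:
        return 0.0
    return sum(a[p] == b[p] for p in common) / len(common)


samples, columns, matrix = build_indicator_matrix(memberships)
kidney_pair = shared_fraction(memberships["kirc_1"], memberships["kirp_1"])
mixed_pair = shared_fraction(memberships["kirp_1"], memberships["laml_1"])
```

In this toy input the two kidney samples agree on three of five platforms, whereas the kidney and leukemia samples agree only on the low-aneuploidy group, which mirrors the weaker concordance reported for aneuploidy.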
While COCA showed high consistency across most data platforms, we found less concordance for aneuploidy, where more than a third of the samples were defined by few to no aneuploidy events. This group, AN7, included almost all the THCA and LAML samples, which, while not well defined by aneuploidy, had strong concordance among the other data platforms. COCA is less powerful when the molecular patterns are not strong enough to specify a distinct group on multiple individual platforms. To complement this analysis, we explored joint clustering across all platforms simultaneously.
We performed integrative molecular subtyping with iCluster using the four most complete data types (copy number, DNA methylation, mRNA, and miRNA) across 9,759 tumor samples, identifying 28 iClusters (Figure 2B; Table S6). The relative contribution of each platform to the overall clustering was quantified by summing the different platform feature weights on the iCluster latent variables. Copy-number alterations contributed 47% to the overall integrated clustering results, followed by the transcriptome (mRNA and miRNA) at 42%, and DNA methylation at 11%.
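The contribution calculation can be illustrated with a short sketch. It assumes that each platform supplies a matrix of feature weights on the latent variables and that weights are summed as absolute values before normalization; the use of absolute values, the function name `platform_contributions`, and all numbers below are assumptions for illustration and do not reproduce the reported 47%, 42%, and 11%.

```python
from typing import Dict, List


def platform_contributions(weights: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Return each platform's share (in percent) of the summed absolute feature weights.

    weights[platform] is a list of per-feature rows, one weight per latent variable.
    """
    totals = {
        platform: sum(abs(w) for row in rows for w in row)
        for platform, rows in weights.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all feature weights are zero")
    return {platform: 100.0 * total / grand_total for platform, total in totals.items()}


# Hypothetical weights: two latent variables, a handful of features per platform.
example_weights = {
    "copy_number": [[0.5, -0.25], [0.25, 0.0]],
    "mRNA": [[0.25, 0.25]],
    "methylation": [[-0.5, 0.0]],
}
shares = platform_contributions(example_weights)
```

With these toy weights, copy number accounts for half of the total and the other two platforms for a quarter each.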
For 16 of the tumor types, over 80% of samples grouped together in the same iCluster. Eight iClusters were dominated by a single tumor type (C24:LAML, C11:LGG [IDH1 mut], C6:OV, C8:UCEC, C12:THCA, C16:PRAD, C26:LIHC, C14:LUAD). Others contained tumors from similar or related cells or tissues: C28:pan-kidney (KIRC, KIRP), C15:SKCM/UVM-melanoma of the skin (SKCM) and eye (UVM), C23:GBM/LGG (IDH1wt), and C5:CNS/endocrine. Six tumor types had more diverse iCluster membership, with less than 50% of tumors represented in a given iCluster (BLCA, UCS, HNSC, ESCA, STAD, and CHOL).
The pan-GI cohort separated into three iClusters (C1, C4, and C18), primarily driven by differences in DNA methylation profiles. C1:STAD (Epstein-Barr virus [EBV]-CIMP) consisted of hypermethylated EBV-associated tumors, and C18:pan-GI (MSI) consisted mostly of microsatellite instability (MSI) tumors of STAD and COAD. C4:pan-GI (CRC) was predominantly COAD and READ with chromosomal instability (CIN) and a distinct aneuploidy profile (Figure 2B). The pan-squamous cohort formed three iClusters (C10, C25, and C27). The majority of LUSC fell into C10:pan-SCC, and nearly all CESC fell into C27:pan-SCC (human papillomavirus [HPV]). Even though all squamous iClusters were characterized by chromosome 3q amplification, unique features defined C10:pan-SCC (9p deletion) and C25:pan-SCC (Chr11 amp) (Figure 2B).
Among mixed tumor type iClusters, three were defined by copy-number alterations. C7:mixed was characterized by chr9 deletion, C2:BRCA (HER2 amp) mainly consisted of ERBB2-amplified tumors (BRCA, BLCA, and STAD), and C13:mixed (Chr8 del) contained highly aneuploid tumors, including a mixture of BRCA-Basal, UCEC (CN-high subtype), UCS, and BLCA. C3 and C20 were defined by their non-tumor-cell components including immune and stromal features.
We explored the non-tumor components of the iClusters in more detail. We estimated the stromal fraction as 1 minus tumor purity and the leukocyte fraction based on DNA methylation (Figure 3). C20 had the highest median stromal fraction followed by C14:LUAD, C10:pan-SCC, and C3 (Figure 3A). Each of these iClusters also displayed elevated leukocyte fractions (Figure 3B). To estimate how much of the stromal fraction was due to immune cell infiltration, we plotted the stromal fraction versus the leukocyte fraction (Figure 3C). In C3, more of the stromal fraction was defined by leukocytes than in C20. C3 contained predominately mesenchymal cancers, which we labeled C3:mesenchymal (immune). C20 tumors were predominately mixed epithelial cancers, which we labeled C20:mixed (stromal/immune).
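The arithmetic behind this comparison is simple enough to sketch. The function name `stromal_composition` and the input values are hypothetical; only the definition of the stromal fraction as 1 minus tumor purity comes from the text. Because purity and leukocyte fraction are estimated by different methods, the non-leukocyte remainder is clamped at zero here, which is an assumption of this sketch.

```python
from typing import Dict


def stromal_composition(tumor_purity: float, leukocyte_fraction: float) -> Dict[str, float]:
    """Split the non-tumor (stromal) fraction into leukocyte and non-leukocyte parts."""
    if not 0.0 <= tumor_purity <= 1.0:
        raise ValueError("tumor_purity must lie in [0, 1]")
    stromal = 1.0 - tumor_purity  # stromal fraction = 1 minus tumor purity
    share = leukocyte_fraction / stromal if stromal > 0 else 0.0
    return {
        "stromal_fraction": stromal,
        "leukocyte_share_of_stroma": share,
        # Clamp at zero: the two inputs come from independent estimates.
        "non_leukocyte_stroma": max(0.0, stromal - leukocyte_fraction),
    }


# Hypothetical sample: 75% tumor purity, 12.5% leukocytes.
example = stromal_composition(tumor_purity=0.75, leukocyte_fraction=0.125)
```

For this hypothetical sample, half of the stromal fraction is attributable to leukocytes; a cluster such as C3 would sit at higher values of this share than C20.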
To characterize composition and relative homogeneity of each iCluster, we computed the dominant-cancer-type proportion within each iCluster and plotted it against the mean iCluster silhouette width, a measure of within-group homogeneity (Figure 2C). The silhouette widths ranged from −0.05 to 0.59, with the highest silhouette widths belonging to single-cancer-type-dominant iClusters (C11:LGG [IDH1 mut], C12:THCA, C16:PRAD, and C24:LAML). Interestingly, 6 of the 7 pan-organ system iClusters (pan-GI: C1, C4, C18; pan-SCC: C25, C27, and pan-kidney: C28) had similar ranges of silhouette widths to those of single cancer-type dominant iClusters, suggesting that these were as robust as the cancer-type-dominant iClusters. iClusters driven by a shared specific chromosomal alteration (e.g., C13:mixed [chr8 del]) tended to comprise multiple tumor types and appeared to have among the lowest silhouette widths, suggesting substantial molecular heterogeneity.
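Both quantities plotted in this comparison can be sketched in a few lines. The example uses the standard silhouette definition with Euclidean distances on toy one-dimensional coordinates; the distance measure, the coordinates, the assignment of samples to clusters, and the helper names are assumptions of this sketch rather than details taken from the analysis.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def silhouette_widths(points: Sequence[Sequence[float]], labels: Sequence[str]) -> List[float]:
    """Per-sample silhouette width s = (b - a) / max(a, b) using Euclidean distance."""
    members: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in members[label] if j != i]
        if not own or len(members) < 2:
            widths.append(0.0)  # convention for singleton clusters or a single cluster
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in idxs) / len(idxs)
            for other, idxs in members.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def mean_width_by_cluster(widths: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Average the per-sample silhouette widths within each cluster."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for width, label in zip(widths, labels):
        grouped[label].append(width)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


def dominant_type(labels: Sequence[str], types: Sequence[str]) -> Dict[str, Tuple[str, float]]:
    """Most frequent cancer type in each cluster and its proportion of the cluster."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for label, tumor_type in zip(labels, types):
        counts[label][tumor_type] += 1
    result = {}
    for label, counter in counts.items():
        name, n = counter.most_common(1)[0]
        result[label] = (name, n / sum(counter.values()))
    return result


# Toy data: coordinates stand in for latent variables; assignments are hypothetical.
points = [(0.0,), (1.0,), (4.0,), (5.0,)]
clusters = ["C28", "C28", "C13", "C13"]
tumor_types = ["KIRC", "KIRP", "BRCA", "BRCA"]

mean_widths = mean_width_by_cluster(silhouette_widths(points, clusters), clusters)
dominant = dominant_type(clusters, tumor_types)
```

Plotting `dominant[c][1]` against `mean_widths[c]` for each cluster `c` gives the kind of comparison shown in Figure 2C.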
We used a Sankey diagram to further visualize the relationship between the iCluster classification, cancer types, and organ systems (Figure 2D). Pan-kidney mapped almost entirely to C28, except for KICH, which grouped with ACC in C9, characterized by a high frequency of hypodiploid samples (Davis et al., 2014, Zheng et al., 2016). However, pan-GI, pan-gyn, and pan-squamous were distributed among multiple iClusters. C20:mixed (stromal/immune) was fairly heterogeneous, including pan-GI, pan-gyn, and pan-squamous. Pan-gyn and pan-squamous overlapped, as cervical cancer is primarily a squamous cell carcinoma. This analysis demonstrated that the iClusters were strongly influenced by the cell type of origin for the individual cancers, though this relationship was not absolute.
Tumor Maps of Organ Systems
We visualized the samples by calculating Euclidean distances between the iCluster latent variables for all sample pairs and projecting the distances onto a 2D layout with TumorMap (Figure 4A; Table S7) (Newton et al., 2017). We overlaid the tumor-type colors to reveal that tumors systematically assembled along the major organ systems (Figure 4B), lending further support for the organ-system groups explored in accompanying papers (Figure 4C) (Berger et al., 2018, Campbell et al., 2018, Liu et al., 2018, Malta et al., 2018, Ricketts et al., 2018). More subtle differences within individual iClusters were apparent, potentially signifying important distinctions from the dominant cell-of-origin-associated signals. Kidney tumors separated into KICH, KIRC, and KIRP (Ricketts et al., 2018), and CIMP kidney tumors were positioned near the Pan-GI CIMP tumors, suggesting similarities driven by DNA hypermethylation data (Figure 4D). Pan-gyn subtypes displayed partial overlap (Berger et al., 2018) (Figure 4E). Pan-gyn samples were broadly distributed, accounting for at least 5% of samples in 11 of the 28 iClusters. However, the majority of cervical cancers fell into the squamous C27:pan-SCC (HPV) with HPV-positive HNSC and BLCA, whereas other samples fell primarily within C6:OV, C19:BRCA (luminal) and C8:UCEC, reflecting their cell-of-origin and hormonal dependency (Berger et al., 2018). The pan-GI tumors separated into distinct molecular subtypes represented by MSI tumors, hypermutated-SNV tumors, genome-stable tumors, CIN tumors, and EBV-associated gastric cancers (Liu et al., 2018) (Figure 4F).
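The distance computation that feeds the 2D layout can be sketched as below. Only the pairwise Euclidean distances are shown; the layout step performed by TumorMap is not reproduced, and the sample identifiers, latent values, and function name `pairwise_euclidean` are hypothetical.

```python
import math
from typing import Dict, Sequence


def pairwise_euclidean(latent: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Symmetric matrix of Euclidean distances between samples' latent-variable vectors."""
    names = sorted(latent)
    return {a: {b: math.dist(latent[a], latent[b]) for b in names} for a in names}


# Hypothetical two-dimensional latent variables for three samples.
latent_variables = {
    "s1": (0.0, 0.0),
    "s2": (3.0, 4.0),
    "s3": (0.0, 1.0),
}
distances = pairwise_euclidean(latent_variables)
```

Samples with small mutual distances, such as `s1` and `s3` here, would be placed close together by a layout that preserves these distances.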
The TumorMap landscape showed that tumors with similar pathologic classification tended to assemble together, even though histopathologic information was not used in the map generation (Figure 5A). This result underscores the influence of the cell of origin on the molecular patterns observed in cancer and provides further support for the pan-squamous sub-analysis (Campbell et al., 2018). Immune-signaling subtypes identified in Thorsson et al. (2018) also co-localized on the TumorMap, indicating relationships between the iClusters, histopathology, and the types of immune infiltration (Figure 5B). Pan-squamous tumors shared predominant wound healing and interferon (IFN)-gamma-dominant immune signatures.
Cancer stemness has been proposed as a possible mechanism for treatment resistance and as a driver of the ability of subpopulations to repopulate new metastatic niches (Jin et al., 2017). Two stemness indices (Malta et al., 2018), based on mRNA expression and on DNA methylation data, revealed aggregation of high stemness tumors across distinct regions of the TumorMap (Figures 5C and 5D). TGCT showed strong enrichment of both signatures while others, such as LAML, showed strong enrichment only for the mRNA-based signature.
Mutational Assessment of iClusters
We did not use tumor mutation data in generating iClusters due to sparsity of mutations; however, we did use mutational burden and signatures for characterization. Overall somatic mutation burden varied among iClusters. Melanomas and lung adenocarcinomas have been shown to have relatively high mutation rates, and we observed similar results with C15:SKCM/UVM and C14:LUAD (Lawrence et al., 2013). Pan-GI and pan-squamous were also associated with overall higher somatic mutational burdens (Figure 6A). Mutation frequencies varied widely within the two iClusters with the most diverse tumor compositions: C3:mesenchymal (immune) and C20:mixed (stromal/immune). Mutational signatures (Covington et al., 2016) also varied among iClusters. Expected signatures were apparent, such as enrichment for UVB signatures in C15:SKCM/UVM, smoking in C14:LUAD, and POLE mutation in hypermutated samples of C8:UCEC and C4:pan-GI (CRC) (Figure 6B). We also found enhanced signatures in a few of our pan-organ groups such as C18:pan-GI (MSI), which showed enrichment of known (CpG, toxins) and unknown mutational signatures, some of which are likely related to the high proportion of mismatch-repair deficient tumors in this group (Figure 6B).
Pathway Characteristics of the PanCancer iCluster Subtypes
We compared the PARADIGM-inferred activation of ~19,000 pathway features (Vaske et al., 2010), as well as expression-based scores of 22 gene programs defined previously (Hoadley et al., 2014), and 18 canonical targetable pathways, to identify differential pathway characteristics across the 28 iClusters (Figure 7; Table S8). C28:pan-kidney was characterized by high hypoxia signaling, retinoid metabolism, low proliferation, PPAR-RXR pathway and immune-related signaling, including immune checkpoints PD-1 and CTLA4. However, KICH co-clustered with ACC in C9:ACC/KICH, lacking hypoxic and immune signals and showing low activity in nearly all pathways. Both these tumor types have previously been characterized as hypodiploid (Davis et al., 2014, Zheng et al., 2016).
Despite having very different cancer type compositions, the pan-squamous iClusters C10:pan-SCC, C25:pan-SCC (chr11 amp), and C27:pan-SCC (HPV) shared many pathway characteristics. All had high levels of squamous-cell-related signaling (dNp63 and TAp63 complexes and GP6), proliferation-related pathways, relatively high hypoxia, immune-related signaling, and high basal signaling.
Although the Pan-GI iClusters C1:STAD (EBV-CIMP), C4:pan-GI (CRC), and C18:pan-GI (MSI) shared some common characteristics such as relatively high proliferation signaling, these iClusters diverged in some respects. Immune-related signaling was high in C1:STAD (EBV-CIMP) and C18:pan-GI (MSI), but not in C4:pan-GI (CRC). In addition, C20:mixed (stromal/immune) contained 32% Pan-GI samples and also displayed strong immune-related signaling. Beta-catenin/cell-cell adhesion signaling appeared high in C4:pan-GI (CRC), C18:pan-GI (MSI), and C20:mixed (stromal/immune), but not in the smaller C1:STAD (EBV-CIMP).
Most UCS co-clustered with a subset of Basal BRCA, UCEC and BLCA in C13:mixed (chr8 del), with high basal signaling and proliferation in the absence of immune activation. Interestingly, another subset of Basal breast cancers co-clustered with squamous cancers in the C20:mixed (stromal/immune), which also had high basal signaling and proliferation, but activated immune signaling. OV and UCEC shared a number of pathway similarities with cervical cancers and a subset of Basal breast cancers despite falling into different iClusters. These similarities included high proliferation and DNA repair pathways and basal signaling. Although the estrogen-signaling gene program (GP7) was very high in the breast cancer iClusters C2:BRCA (HER2 amp) and C19:BRCA (luminal), that program did not appear to be high in the other gynecological cancers.
With nearly three times more tumors and tumor types profiled in this PanCancer Atlas analysis, we were able to detect more integrated molecular subtypes than we had reported in the original Pan-Cancer-12 analysis (Hoadley et al., 2014). We first performed unsupervised consensus clustering of tumor profiles from each of the 5 platforms, revealing from 10 to 25 platform-specific molecular subsets within ~10,000 tumors, each showing significant compositional heterogeneity based on classical tumor taxonomy (Figure 1). Aneuploidy classifications were weakly consistent with other classifications, in part due to low numbers of arm-level copy-number events in one-third of the tumors. We explored cross-platform cluster relationships using COCA and employed iCluster to integrate the multiplatform molecular data simultaneously into a final 28-cluster solution.
While a third of iClusters were mostly homogeneous for a single tumor type, the other two-thirds showed varying degrees of heterogeneity. The most diverse group, C20:mixed (stromal/immune), contained a remarkable 25 tumor types (Figures 2C and 2D). Most of the heterogeneous iClusters, including C20:mixed (stromal/immune), contained tumor types that fell within four major cell-of-origin, or organ system, patterns (Figure 2D): pan-GI, pan-gyn, pan-squamous, and pan-kidney. Individual cluster assignments, COCA, and iCluster-determined molecular subsets were concordant, and confirmed the multiplatform co-clustering of different kidney malignancies (pan-kidney), various gastrointestinal malignancies (pan-GI), diverse squamous cell malignancies (pan-squamous) and most gynecological malignancies (pan-gyn) into molecular subgroups, each with subordinate platform-specific subsets (Figure 2A). Consequently, these four major cell-of-origin patterns are the subject of separate in-depth reports detailing their distinguishing genomic and molecular features (Berger et al., 2018, Campbell et al., 2018, Liu et al., 2018, Malta et al., 2018, Ricketts et al., 2018). These iCluster assignments have potential clinical utility, and their multi-platform basis suggests that this new subclassification system might further improve the management of the 1%–3% of all cancer patients newly diagnosed with cancer of unknown primary (CUP). Using either RNA (Hainsworth et al., 2013) or DNA methylation (Moran et al., 2016) profiling has recently led to improved patient outcomes by better defining the tissues of origin for this diverse group of life-threatening malignancies.
While separate spatial co-localization of the four major cell-of-origin patterns was generally evident in the TumorMap visualization (Figure 4), heterogeneity was also apparent between subsets within these individual iClusters, even those with generally similar tumor type, organ system, and histopathology. This indicates that while iCluster groupings were strongly influenced by organ and cell-of-origin patterns, this influence did not fully determine their molecular groupings, as seen in our largest and most heterogeneous iCluster, C20:mixed (stromal/immune), which contained 25 of our 33 tumor types. The spatial relationships of C20:mixed (stromal/immune) tumors to C10:pan-SCC and C13:mixed (chr8 del) tumors may be determined in part by their different mRNA and DNA methylation-based stemness signatures (Figures 5C and 5D).
Interrogation of individual iClusters for their differentiating PARADIGM pathway features, canonical pathways, and gene programs amenable to drug targeting identified strong immune-related signaling features for both C3:mesenchymal (immune) and C20:mixed (stromal/immune) tumors, suggesting that they may share potential susceptibility to immunotherapy. We noted that C20:mixed (stromal/immune) and C3:mesenchymal (immune) tumors were commonly enriched for gene programs representing PD1, CTLA4, and GP2-T cell/B cell activation (Figure 7B), indicating that new therapies targeting these specific immune pathways might be appropriate. Another potentially clinically relevant similarity was upregulation of different druggable growth factor signaling pathways (Figure 7B). In particular, our PARADIGM analysis showed that C3:mesenchymal (immune) and C20:mixed (stromal/immune) tumors shared upregulated JAK2/STAT1,3,6 signaling with C14:LUAD tumors and C10:pan-SCC, pointing to the possibility of treating these diverse iCluster tumors with JAK-STAT agents currently approved to treat rheumatoid arthritis, myelofibrosis, polycythemia vera, and other non-malignant diseases (Banerjee et al., 2017).
Compared to the seemingly discohesive groupings of the 17 heterogeneous iClusters, the 11 most homogeneous iClusters (C6:OV, C8:UCEC, C11:LGG [IDH1 mut], C12:THCA, C14:LUAD, C15:SKCM/UVM, C16:PRAD, C19:BRCA [luminal], C21:DLBC, C24:LAML, C26:LIHC) had higher silhouette widths, uniform tumor types, and histopathologies, but showed surprising degrees of spatial discohesion in the TumorMap. These anatomically homogeneous iClusters also showed mixed types of immune infiltration and variable degrees of stemness, attesting to their underlying molecular heterogeneity, as previously reported (Cancer Genome Atlas Network, 2015, Cancer Genome Atlas Research Network, 2011, Cancer Genome Atlas Network, 2012, Cancer Genome Atlas Research Network, 2014a, Cancer Genome Atlas Research Network, 2014b, Cancer Genome Atlas Research Network, 2015a, Cancer Genome Atlas Research Network et al., 2015b, Cancer Genome Atlas Research Network, 2017, Cancer Genome Atlas Research Network et al., 2013a, Cancer Genome Atlas Research Network et al., 2013b, Robertson et al., 2017).
While malignancies arising from the same anatomical site have traditionally been treated clinically as a single entity, histologic and molecular sub-classifications are now routinely used to determine treatments for subtypes of lung, breast, gastrointestinal, skin and bone marrow derived malignancies. As drugs become increasingly clinically available to target such cancer-driving pathway targets as ALK, EGFR, ERBB2, ERα, KIT, BRAF, and ABL1, the traditional system of anatomic cancer classification should be supplemented by a classification system based on molecular alterations shared by tumors across different tissue types (Hoadley et al., 2014, Saunders et al., 2012). This concept has led to the development of so-called basket or umbrella trials, such as the NCI-MATCH study, to investigate the feasibility and validity of this new clinical approach (Ramos et al., 2015). However, exceptions that challenge this concept have also become apparent from such notable examples as the unpredictable clinical responses to a potent BRAF inhibitor across diverse malignancies all expressing the same BRAF mutation (Saunders et al., 2012). Integrated molecular tumor profiling such as described here, and in our previous Pan-Cancer-12 analysis, may improve basket-trial design by considering both mutations and oncogenic signaling pathways along with consideration of each tumor’s tissue-specific or cell-of-origin context (Hoadley et al., 2014).
We are grateful to the patients and families who contributed to this study. We also thank the NCI TCGA Program Office and NHGRI counterpart for organizational and logistical support. This work was supported by NIH grants (U54 HG003273, U54 HG003067, U54 HG003079, U24 CA143799, U24 CA143835, U24 CA143840, U24 CA143843, U24 CA143845, U24 CA143848, U24 CA143858, U24 CA143866, U24 CA143867, U24 CA143882, U24 CA143883, U24 CA144025, and P30 CA016672).
Conceptualization: K.A.H., J.M.S., C.C.B., and P.W.L. Data Curation: K.A.H., A.D.C., V.T., R.A., R.B., and T.H. Formal Analysis: K.A.H., C.Y., T.H., D.M.W., E.D., R.S., A.M.T., A.D.C., V.T., R.A., R.B., C.K.W., F.S.-V., A.G.R., M.S.L., and T.M.M. Composition of Figures and Graphical Abstract: T.H., A.G.R., D.M.W., C.Y., and P.W.L. Writing – Original Draft: K.A.H., C.Y., T.H., D.M.W., A.J.L., A.M.T., V.T., R.A., M.W., A.G.R., B.G.S., C.C.B., and P.W.L. Writing – Review & Editing: K.A.H., C.Y., T.H., D.M.W., A.J.L., E.D., R.S., A.M.T., A.D.C., V.T., R.A., R.B., C.K.W., M.W., F.S.-V., A.G.R., B.G.S., M.S.L., H.N., T.M.M., J.M.S., C.C.B., and P.W.L. Supervision: K.A.H. and P.W.L.
Declaration of Interests
Michael Seiler, Peter G. Smith, Ping Zhu, Silvia Buonamici, and Lihua Yu are employees of H3 Biomedicine, Inc. Parts of this work are the subject of a patent application: WO2017040526 titled “Splice variants associated with neomorphic sf3b1 mutants.” Shouyoung Peng, Anant A. Agrawal, James Palacino, and Teng Teng are employees of H3 Biomedicine, Inc. Andrew D. Cherniack, Ashton C. Berger, and Galen F. Gao receive research support from Bayer Pharmaceuticals. Gordon B. Mills serves on the External Scientific Review Board of AstraZeneca. Anil Sood is on the Scientific Advisory Board for Kiyatec and is a shareholder in BioPath. Jonathan S. Serody receives funding from Merck, Inc. Kyle R. Covington is an employee of Castle Biosciences, Inc. Preethi H. Gunaratne is founder, CSO, and shareholder of NextmiRNA Therapeutics. Christina Yau is a part-time employee/consultant at NantOmics. Franz X. Schaub is an employee and shareholder of SEngine Precision Medicine, Inc. Carla Grandori is an employee, founder, and shareholder of SEngine Precision Medicine, Inc. Robert N. Eisenman is a member of the Scientific Advisory Boards and shareholder of Shenogen Pharma and Kronos Bio. Daniel J. Weisenberger is a consultant for Zymo Research Corporation. Joshua M. Stuart is the founder of Five3 Genomics and shareholder of NantOmics. Marc T. Goodman receives research support from Merck, Inc. Andrew J. Gentles is a consultant for Cibermed. Charles M. Perou is an equity stock holder, consultant, and Board of Directors member of BioClassifier and GeneCentric Diagnostics and is also listed as an inventor on patent applications on the Breast PAM50 and Lung Cancer Subtyping assays. Matthew Meyerson receives research support from Bayer Pharmaceuticals; is an equity holder in, consultant for, and Scientific Advisory Board chair for OrigiMed; and is an inventor of a patent for EGFR mutation diagnosis in lung cancer, licensed to LabCorp.
Eduard Porta-Pardo is an inventor of a patent for domainXplorer. Han Liang is a shareholder and scientific advisor of Precision Scientific and Eagle Nebula. Da Yang is an inventor on a pending patent application describing the use of antisense oligonucleotides against specific lncRNA sequence as diagnostic and therapeutic tools. Yonghong Xiao was an employee and shareholder of TESARO, Inc. Bin Feng is an employee and shareholder of TESARO, Inc. Carter Van Waes received research funding for the study of IAP inhibitor ASTX660 through a Cooperative Agreement between NIDCD, NIH, and Astex Pharmaceuticals. Raunaq Malhotra is an employee and shareholder of Seven Bridges, Inc. Peter W. Laird serves on the Scientific Advisory Board for AnchorDx. Joel Tepper is a consultant at EMD Serono. Kenneth Wang serves on the Advisory Board for Boston Scientific, Microtech, and Olympus. Andrea Califano is a founder, shareholder, and advisory board member of DarwinHealth, Inc. and a shareholder and advisory board member of Tempus, Inc. Toni K. Choueiri serves as needed on advisory boards for Bristol-Myers Squibb, Merck, and Roche. Lawrence Kwong receives research support from Array BioPharma. Sharon E. Plon is a member of the Scientific Advisory Board for Baylor Genetics Laboratory. Beth Y. Karlan serves on the Advisory Board of Invitae.