5  Genomics and Epigenomics Analysis

5.1 On this page

Biological insights and take-home messages are at the bottom of the page at Lesson Learnt: Section 5.5.

  • Here we investigate all the genomics and epigenomics data available for the three kidney carcinomas.
  • First, we analyse the gene Copy Number Variants (CNVs).
    • We test CNVs distribution across the three kidney carcinomas and investigate any signature associated with them.
    • We perform some exploratory analyses on samples, CNVs and clinical covariates.
  • The, we look into the somatic mutations detected across the three kidney carcinomas.
    • We test the somatic mutations distribution across the three kidney carcinomas and investigate any signature associated with them.
    • We perform some exploratory analyses on samples, somatic mutations and clinical covariates.
  • Finally, we focus on DNA methylation and epigenomics signals.
    • We perform a QC on probe intensities (corresponding to CpG islands different methylation levels), their distribution across samples and we impute eventual missing values.
    • We do some exploratory analyses on samples, probe intensities and clinical covariates.
    • We then run a formal Differential Expression analysis to identify probes that have different methylation levels across the three Kidney cancer types, and the genes associated to them.
    • We perform Gene Set Enrichment Analyses on the genes associated with differentially methylated CpG islands to investigate biological and molecular themes that discriminates between the three Kidney cancer types.

5.2 Copy Number Variants (CNVs)

5.2.1 Filtering & QC

The copy number variants are available at TCGA as information collapsed at the gene level. CNVs information are reported for 24,776 genes. For each gene, we have an integer value ranging from -2 (complete deletion of the genomic region) to +2 (complete duplication of both alleles), with 0 indicating no CNV event for that gene.

All of the 24,776 reported genes are affected by at least a CNV event across 877 kidney carcinomas samples, only for 10 samples we do not have CNV information.

Patients affected by KICH kidney carcinomas seem to have a significantly higher number of genes affected by CNVs, followed by KIRP patients. KIRC patients have the least amount of genes affected by CNVs. In KICH and KIRP patients, also, CNVs involving gene duplications are more abundant than the ones involving gene deletions.

Figure 1: CNVs distributions across kidney carcinomas.

5.2.2 CNVs signatures across Kidney cancers

Let’s now investigate the presence of any CNV signature across the three kidney carcinomas. As mentioned before, the provided TCGA CNVs data are reported at the gene level, but obviously, Whole Genome Duplication or Chromosomal Aberration events would usually be larger than a single gene and span several genes. Since we do not have access to the raw genomics data, a first naive approach would be to walk each chromosome and bridge together as single CNV event regions between consecutive genes having the same CNV score as reported by TCGA. This would allow to collapse the information of 24,776 different segments (the reported genes) into tens or hundreds of long, consecutive genomics ranges. The approach would work well under the assumption that large CNV events are much more likely to happen than short, isolated CNV events, i.e.: spanning just one or few genes. Moreover, the available CNV data are limited to the gene level (~1-2% of total human genome length), which implies that we have no direct information on CNV events (or lack thereof) over 98% of the genome. So, we have decided to do not bridge the genomic regions across consecutive genes with the same CNV score as single CNV events, and keep the downstream CNV analyses at the gene level.

Figure 2: CNVs distributions at the gene level across 877 kidney carcinomas biopsies.

As seen in Figure 1, biopsies from KICH patients show extensive CNV events that seems to span most of the coding genome. CNVs events seems to have similar patterns across the different kidney carcinomas, and they appear to span full-length chromosomes. Within each kidney carcinoma type, it seems like biopsies could be clustered in different subtypes dependeing on their CNV signatures.

In the next steps, we will try to correlates these CNV signatures to other clinical covariates.

5.2.3 Dimesionality Reduction and Dataset Exploration

5.2.3.1 Principal Component Analysis (PCA)

As we did for the other omics data, the next step in the dataset exploration is to perform the Principal Component Analysis.

Figure 3: PCA, exploration.

The first 18 Principal Components capture more than 80% of the variance in the Kidney cancers CNVs dataset, with the first two components (PC1 and PC2) capturing a bit more than 33% of the variance.

When we project the samples in the PC1 and PC2, we can see that the PC1 separates KIRC, KICH and KIRP, which instead cluster together. As observed in the UMAP, the clustering is more noisy than what observed for the other omics analyses (transcriptomics, proteomics and micro-RNAs).

Figure 4: PCA, biplot PC1 x PC2.

We can also investigate other dimensions Principal Components, to see if there is a component that manages to fully resove the three cancer types. PC1 PC4 seems to best separate samples from KIRC, KIRP and KICH.

Figure 5: PCA, pairs plots.

Let’s check the Pearson correlation with other clinical covariates.

Cancer_type correlates well with PC1 and PC4. Most of the TCGA-defined molecular subtypes correlates with PC1, PC2 and PC4. Interestingly, TCGA Subtype_CNA, which should be based on he CNV data, correlates with PC3 and PC4, but not PC1. PC3 is interesting since it correlates with follow_up_tumor_status, pathologic_t and tumor_stage, suggesting a link between CNVs signatures and cancer advancement, which is a common feature across all kidney carcinomas and not limited to KIRC, KIRP or KICH.

Figure 6: PCA, correlations between clinical covariates and Principal Components.

5.3 Somatic mutations

Let’s now dig into the Somatic Mutations detected in the biopsies of the kidney carcinomas patients. We have somatic information for 711 samples (~80% of the total 887 patients in the cohort). TCGA provides information on somatic mutations as binary information for each gene: 1, if the patient has a non-silent mutation for that gene, or 0, if the patient does not have a non-silent mutation on that gene. We could retrieve info for for 40,542 genes, and 13,584 of them had a non-silent mutations in at least one of our 711 kidney carcinomas patients. The first step was then to discard from downstream analysis the 26,958 that appeared not mutated in the kidney carcinomas biopsies.

Let’s now look at the distribution of somatic mutations across the different kidney carcinomas. KICH patients had an average of 31 genes with non-mutations, against the 53 and 56 average number of genes with non-silent mutations in KIRC and KIRP patients, respectively. Chromosomes 1, 2, 3, 11, 17 nad 19 where the chromosomes with more mutated genes on average.

Figure 7: Non-silent somatic mutations distributions.

5.3.1 Mutational signatures across Kidney cancers

Let’s now investigate the presence of any non-silent somatic mutation signature across the three kidney carcinomas.

Figure 8: non-silent somatic mutations distributions across 711 kidney carcinomas biopsies.

While for KICH and KIRP patients the non-silent somatic mutations seems to be evenly distributed across the genome, KIRC patients clearly show an over-abundance of 3 preferably mutated genes, one in chromosome 2 and two in chromosome 3.

Using external_gene_name as id variables

In KICH, ~30% of the patients (21 / 66) had a non-silent mutation in TP53, followed by 9% of patients (6 / 66) having a mutation on PTEN. As expected, KIRC patients show the highest mutational burden, with ~47% of patients (170 / 365) with mutation in VHL, ~40% of patients (145 / 365) with mutation on PBRM1 and ~20% of patients (72 / 365) with non-silent somatic mutations on TTN. As well, 16% of KIRP patients (45 / 280) has non-silent somatic mutations on TTN gene.

Figure 9: top 10 genes with non-silent somatic mutations across the kidney carcinomas.

In the table below are reported all the 19,573 somatic mutations detected across the biopsies of the three kidney carcinomas.

5.3.2 Dimesionality Reduction and Dataset Exploration

5.3.2.1 Principal Component Analysis (PCA)

As we did for the other omics data, the next step in the dataset exploration is to perform the Principal Component Analysis.

Figure 10: PCA, exploration.

The PCA captures little to no variance observed in the non-silent somatic mutations, moreover, it fails to cluster the biopsies based on the kidney carcinoma type of origin, as seen with the UMAP. Few samples seems to be outliers driving most of the variance observed in PC1 (1.82% variance) and PC2 (1.37% variance).

PC4 and PC6 seems to separate KIRC samples into four distinct clusters, one of which overlapping with a cluster containing as well KIRP and KICH biopsies.

Figure 11: PCA, pairs plots.

PC4 (0.66% of observed variance) seems to correlate with some clinical covariates, such as histological_grade and subtype_mRNA, but it is difficult to attach a biological relevance to this observation since we cannot properly cluster samples on the PCAs.

Figure 12: PCA, correlations between clinical covariates and Principal Components.

5.4 Epigenomics

Now, we move forward to analyse the epigenomics data, that provides information on the methylation status of the CpG islands across the genomes of the kidney carcinomas. The methylation status is captured with Illumina 450k arrays, and for each CpG island we have two measurements: a methylated intensity (M) and an unmethylated intensity (U) that allows for calculating the propostion of methilation at each CpG island.

For the TCGA epigenomics data, for each CpG island we have available the corresponding Beta value, which is calculated as:

              beta = M / (M / U)

Under ideal conditions, a value of zero indicates that all copies of the CpG site in the sample were completely unmethylated (no methylated molecules were measured) and a value of one indicates that every copy of the site was methylated.

5.4.1 Filtering & QC

We have methylation information for 646 out of the 887 kidney carcinoma biopsies: 65 KICH, 311 KIRC and 270 KIRP patients, respectively. TCGA provides methylation status as beta values for 148,068 whitelisted probes. Out of them, 6,364 probes consistently have missing values across the 646 samples, so after filtering them out we obtain 141,704 probes.

Plotting the beta-values distribution for the probes of each biopsies reveals their bimodal distribution, indicating that for each sample, the probes can either be unmethylated (beta value = 0) or methylated (beta value = 1).

Figure 13: Distribution of whitelisted beta values densities across the 646 kidney carcinoma biopsies.

5.4.2 Dimesionality Reduction and Dataset Exploration

5.4.2.1 Principal Component Analysis (PCA)

The next step in the dataset exploration is to perform the Principal Component Analysis.

Figure 14: PCA, exploration.

The first 10 Principal Components capture less than 50% of the variance in the Kidney cancers epigenomics dataset, with the first two components (PC1 and PC2) capturing a bit more than 20% of the variance. The screeplot, however, shows a more reasonable amount of variance capture in the top Principal Components, in contrasts to what seen using non-silent somatic mutations (see Section 5.3.2.1).

When we project the samples in the PC1 and PC2, we can see that the PC1 (~13% of observed variance) creates a gradient of samples that separates KICH, KIRC and KIRP. The second component PC2 (~10% of variance), instead, does not separates the samples based on their kidney carcinoma types.

Figure 15: PCA, biplot PC1 x PC2.

We can also investigate other dimensions Principal Components, to see if there is a component that manages to fully resolve the three cancer types. PC1xPC4 and PC1xPC5 indeed seem to separates better the three kidney carcinoma types.

Figure 16: PCA, pairs plots.

Let’s check the loadings for the top 5 Principal Components. These indicate which probes are the more responsible to explain the position of the samples along the components, and the direction of this separation.

Looking at the top 1% most variable probes, the following 11 probes are the top loadings for the first 5 Principal Components:

  • cg03830585
  • cg07037412
  • cg07159758
  • cg08135406
  • cg05098566
  • cg07635227
  • cg00868875
  • cg07124687
  • cg01916088
  • cg03893663
  • cg00204802
  • cg01233786
  • cg00806644

Let’s now check their beta values distributions of the five top genes identified with the PCA across the cancer types:

Figure 17: PCA, loadings.

cg03830585 is associated with the Inositol 1,4,5-Trisphosphate Receptor, Type 1 ITPR1. ITPR1 is a novel target of novel target of HIF2α and protects renal cancer cells against natural killer cells by inducing autophagy. cg03830585 appears hypermethylated in KIRP, but not in KICH (cold tumor) and KIRC (hot tumor, with immuno-suppressant and immuni-evasive profile). cg00868875 is linked to the Potassium Channel Tetramerization Domain Containing 1 KCTD1, which is a prognostic marker for KIRP. Finally, cg07037412 is linked to the Calcium Voltage-Gated Channel Subunit Alpha1 H CACNA1H. Lower methylation status in KIRC and KIRP is congruent with the observation that CACNA1H was specifically overexpressed relative to normal tissue samples in renal cancer, sarcoma and gastrointestinal stromal tumors.

Let’s now check the Pearson correlation with other clinical covariates.

Cancer type strongly correlates with PC1 (~13% of observed variance) and PC5 (~3% of observed variance), which is concordant with what we have observed in the PCA pairplots: the three kidney carcinoma types are well separated in these components. PC1 also correlates well with hystological grade, selected subtype, subtype miRNA and subtype mRNA. PC2 (~10% of observed variance) and PC3 (~6% of observed variance) strongly correlates with subtype integrative, but more interestingly with subtype DNAmeth, histological grade, tumor stage, follow up tumor status and vital status. Indeed, the PCA pairplots shows that PC2 and PC3 cluster and separate very well biopsies from KICH, KIRC and KIRP patients, despite these two components not correlating with cancer type. More interestingly, subtype DNAmeth co-correlates with residual disease, more advanced cancer and worse prognosis, suggesting that DNA methylations and epigenomics plays a major role in cancer evolution and disease outcome.

Figure 18: PCA, correlations between clinical covariates and Principal Components.

5.4.3 Differential Methylation Analysis

Let’s move forward now to the differential Methylation analysis. We will take two complementary approaches:

  • Probe-Wise Differential Methylation
  • Region-Wise Differential Methylation (DMR)

The Probe-Wise Differential Methylation has a higher granularity (down to the single probe) allowing for a more precise characterization of the methylation status across the kidney carcinomas, but with the shortcoming of a more complex interpretation of the results. Gene / Region-Wise Differential Methylation approach, on the other hand, summarize the methylation status at the gene or at the region level, making comparisons and interpretation easier, with the downside of losing the signal of the methylation status of each CpG island.

5.4.3.1 Probe-Wise Differential Methylation

For epigenomics, we use two arbitrary thresholds to retain genes that are significantly differentially methylated across the each comparison: an absolute logFold-Change (logFC) higher than 2, and an adjusted p-value lower than 0.05. This results in:

  • KIRC_vs_KICH, 5,672 differentially methylated probes, 3,867 hypermethylated and 1,805 hypomethylated
  • KIRP_vs_KICH, 8,024 differentially methylated probes, 6,403 hypermethylated and 1,621 hypomethylated
  • KIRC_vs_KIRP, 954 differentially methylated probes, 163 hypermethylated and 791 hypomethylated

As usual, we can visualize the logFC p-values relationships for all the probes with a volcano plot.

KICH appears to have the highest amount of hypomethylated probes, suggesting that many of their genes are expressed due to the hypomethylation of the corresponding CpG islands. On the other hand, KIRP seems to have the highest amount of hypermethylated probes of the three kidney carcinoma types, followed by KIRC.

Figure 19: Volcano plots of each contrats reporting the differentially methylated probes.

The UpSet plot shows that KICH has a strong signature of 3,193 hypomethylated probes that are instead hypermethylated in KIRC and KIRP, and 999 hypermethylated probes that are insted hypomethylated in KIRC and KIRP. This methylation pattern can be ascribed to the fact that KICH carcinomas in general maintain their epithelial differentiation status, which instead is more often lost in KIRC and KIRP carcinomas. KIRP appears as the kidney carcinomas with most hypermethylated probes when compared to KIRC and KICH.

Figure 20: UpSet reporting the genes differentially methylated probes in common across all contrasts.

Tables of differentially methylated probes.

Let’s see if any function is enriched in the genes of linked to the Differentially Methylated Probes by performing Gene Set Enrichment Analysis (GSEA). The first step is to identify the list of linked to the differentially methylated probes. Then, we will run standard GSEA on these list of differentially methylated genes.

5.4.3.1.1 GO Biological Process terms

Gene Ontology Biological Process are the larger processes or ‘biological programs’ accomplished by the concerted action of multiple molecular activities. Examples of broad BP terms are “DNA repair” or “signal transduction”. Examples of more specific terms are “cytosine biosynthetic process” or “D-glucose transmembrane transport”.

Figure 21: Dotplot of enriched GO Biological Process terms.

Figure 22: Term maps reporting relationships between GO Biological Process across the different contrasts.

Figure 23: gene maps reporting relationships between GO Biological Process enriched across the different contrasts.

Figure 24: GO cluster heatmaps.

The enriched GO BP terms in KIRC vs. KICH and KIRP vs. KICH suggest indeed a disregulated methylation on “embryonic organ morphogenesis”, “kidney epithelium development”, and “regulation of cell-cell adhesion”, supporting the idea that KICH biopsies mostly retained their epithelial differentiation that was instead lost in KIRC and in KIRP. These changes are also highlighted by the disregulation of “small GTPase mediated signal transduction” and the “Wnt signaling pathway”.

In addition to that, genes involved in the “regulation of macrophages chemotaxis” are differentially methylated in KIRC vs. KIRP, correlating nicely with the evidences of a high abundance of infiltrating Macrophages M0 and M2 in KIRP generated by deconvoluting the infiltrating immune cells based on bulk transcriptomics data (see Chapter 2).

The complete list of 1,881 GO BP terms enriched across the three contrasts is reported below.

5.4.3.1.2 GO Cellular Compartments terms

Gene Ontology Cellular Compartment serves to capture the cellular location where a molecular function takes place.

Figure 25: dotplot of enriched GO Cellular Compartments terms.

Figure 26: Term maps reporting relationships between GO Cellular Compartments across the different contrasts.

Figure 27: gene maps reporting relationships between GO Cellular Compartments enriched across the different contrasts.

GO CC enrichment analyses do not provides any further insights on the characterization of kidney carcinomas based on their methytlation profiles, except for a high disregulation of genes associated with “adherens junctions”, “focal adhesion” and “membrane raft”, ad well as potential transporters in the “apical plasma membrane” and “basal plasma membrane”.

The complete list of 245 GO CC terms enriched across the three contrasts is reported below.

5.4.3.1.3 GO Molecular Function terms

Gene Ontology Molecular Function serves to represent molecular-level activities performed by gene products.

Figure 28: dotplot of enriched GO Molecular Functions terms.

Figure 29: Term maps reporting relationships between GO Molecular Functions across the different contrasts.

Figure 30: gene maps reporting relationships between GO Molecular Functions enriched across the different contrasts.

Enrichment of GO MF terms confirm the methylation driven disregulation of multiple signaling pathways in KIRC and KIRP when compared to KICH, including “GTPase regulator activity” and “protein serine/threonine kinases activity”.

The complete list of 149 GO MF terms enriched across the three contrasts is reported below.

5.4.3.1.4 KEGG pathways

KEGG pathways are manually curated and capture our knowledge of the molecular interaction, reaction and relation networks.

Figure 31: dotplot of enriched KEGG pathways.

Figure 32: Term maps reporting relationships between KEGG pathways enriched across the different contrasts.

Figure 33: gene maps reporting relationships between KEGG pathways enriched across the different contrasts.

The enriched KEGG pathways revolved around the remodeling of the extracellular matrix (“Adherens junctions”, “Focal adhesion”) and a pro-inflammatory, hypoxic microenvironment in KIRC (“HIF-1 signaling pathway” and “Inflammatory mediator regulation of TRP channels”). Interestingly, KIRC also showed disregulation of the “Fructose and mannose metabolism” via differential methylation of GMDS, KHK, PFKFB2 and PFKFB4. PFKFB4 has been shown a non-canonical activity where it interacts with ICMT and activates RAS/AKT signaling-dependent cell migration in melanoma.

The complete list of 166 KEGG pathways across the three contrasts is reported below.

5.4.3.1.5 Reactome pathways

Similarly to KEGG, Reactome pathways are manually curated biological pathways.

Figure 34: dotplot of enriched Reactome pathways.

Figure 35: Term maps reporting relationships between Reactome pathways enriched across the different contrasts.

Figure 36: gene maps reporting relationships between Reactome pathways enriched across the different contrasts.

The enriched Reactome pathways provide us with an additional layer of information. KIRC and KIRP differentiate from KIRC due to differential methylation of phosphoinositol3-kinase associated secondary signal transduction pathways (“PI3K/AKT Signaling in Cancer”, “DAG and IP3 signaling”, “PLC beta mediated events”, “CaM pathway”, “CREB1 phosphorylation through NMDA receptor-mediated activation of RAS signaling”, “VEGFA-VEGFR2 Pathway”). In addition to that, we have as well “NRAGE signals death through JNK” and “RHO GTPase cycle” and “RAC1 GTPase cycle”. Integrin-based extracellular matrix remodelling is linked to KIRP via differentially methylation of the transcription factor RUNX2 and the cyclin CCND1, which is known to be overexpressed in KIRP.

The complete list of 160 Reactome pathways enriched across the three contrasts is reported below.

5.4.3.1.6 WikiPathways

Lastly, also WikiPathways provides another angle on a set of manually curated biological pathways.

Figure 37: dotplot of enriched Wikipathways pathways.

Figure 38: gene maps reporting relationships between WikiPathways enriched across the different contrasts.

Finally, the enriched WikiPathways confirmed the retention of eputhelial identity in KICH versus the pluripotency of KIRC and KIRP and their higher chance of epithelial-to-mesenchimal transition (“Ectoderm differentiation”, “Mesodermal commitment pathway”). Moreover, genes associated with “Macrophage-stimulating protein (MSP) signaling” were differentially methylated in KIRP when compared to KIRC and KICH.

The complete list of 122 WikiPathways pathways enriched across the three contrasts is reported below.

5.4.3.2 Region-wise Differential Methylation (DMR)

While the probe-wise analysis has the highest resolution in terms of detection of methylation at the single CpG island level, it suffers from several limitations. Methylation of a single CpG island could be hard to detect, and it is not very informative since gene promoters are regulated by the methylation status of multiple CpG islands simultaneously. Moreover, for the enrichment analysis, even if a single CpG island linked to a gene would be differentially methylated, that gene would appear in the GSEA, bloating the number of false positives and enriched pathways identified.

A common way to mitigate the single-probe analyses drawback is to look if the methylation status of several CpGs near to each other (or regions) are concordantly differentially methylated or not. To identify Differentially Methylated Regions, we gather the single-probes into predefined and manually annotated functional regions using the BioConductor package mCSEA, which contains three types of regions for 450K and EPIC arrays: promoter regions, gene body and CpG Islands. mCSEA algorithm is based on standard GSEA. Briefly, CpG sites are ranked according to a metric (logFC, t-statistic, …) and an enrichment score (ES) is calculated for each region. This is done by running through the entire ranked CpG list, increasing the score when a CpG in the region is encountered and decreasing the score when the gene encountered is not in the region. A high ES indicates these probes are found high up in the ranked list. In other words, a high (N)ES value means that for the CpG sites in this region there is - on average - a shift towards a higher methylation level. This approach has been shown to be more effective to detect smaller but consistent methylation differences.

When grouping the probes into functional regions, we identified the following:

  • KIRC vs. KICH: 106 DMRs promoter regions and 144 DMRs gene regions covering a total of 136 unique genes.
  • KIRP vs. KICH: 69 DMRs promoter regions and 114 DMRs gene regions covering a total of 175 unique genes.
  • KIRC vs. KIRP: 175 DMRs promoter regions and 89 DMRs gene regions covering a total of 256 unique genes.

The UpSet plot shows that KIRC has a strong signature of 153 DMRs that are differentially methylated when compared to KIRP or KICH. Moreover, KICH shows a 98 DMRs signature when compared to KIRC or KIRP.

Figure 39: UpSet reporting the genes Differentially Methylated Regions in common across all contrasts.

The next step is to run the GSEA analyses to gain insights on the different enriched pathways affected by the different methylation status of genes across the three kidney carcinomas.

5.4.3.2.1 GO Biological Process terms

Figure 40: dotplot of enriched GO Biological Process terms.

Figure 41: Term maps reporting relationships between GO Biological Process across the different contrasts.

Figure 42: gene maps reporting relationships between GO Biological Process enriched across the different contrasts.

Figure 43: GO cluster heatmaps.

One significant difference between KICH when compared to KIRC and KIRP is the differential methylation of 23 cadherins and protocadherin alpha and gamma genes PCDHA, PCDHGB, involved in “cell-cell adhesion via plasma membrane adhesion molecules”. Also, the methylation status of several homeobox genes (“embryonic organ development”, “pattern specification process”) seems significantly different in KICH when compared to KIRC and KIRP, supporting the idea that KICH maintains its epithelial differentiation status.

The complete list of 456 GO BP terms enriched across the three contrasts is reported below.

5.4.3.2.2 GO Cellular Compartments terms

Figure 44: dotplot of enriched GO Cellular Compartments terms.

Figure 45: Term maps reporting relationships between GO Cellular Compartments across the different contrasts.

Figure 46: gene maps reporting relationships between GO Cellular Compartments enriched across the different contrasts.

Enriched GO CC terms indicated a disregulated “collagen containing extracellular matrix” between KIRC and KIRP, driven by genes such as brevican (BCAN) and different laminin and collagen subunit genes.

The complete list of 5 GO CC terms enriched across the three contrasts is reported below.

5.4.3.2.3 GO Molecular Function terms

Figure 47: dotplot of enriched GO Molecular Functions terms.

Figure 48: Term maps reporting relationships between GO Molecular Functions across the different contrasts.

Figure 49: gene maps reporting relationships between GO Molecular Functions enriched across the different contrasts.

Enriched GO MF terms fall into three major clusters. The first one discriminating KIRC from KICH and KIRP, with function “DNA-binding transcription repressor activity, RNA polymerase II-specific”, driven by the differential methylation of genes such as:

These genes are all transcription factors (except for c-SKI), and support the idea of a higher aggressiveness of KIRC when compared to KIRP and KICH by disregulation of major signalling pathways.

The second cluster of GO terms discriminates between KICH and KIRC and KIRP, with function “DNA-binding transcription activator activity, RNA polymerase II-specific”, and it is driven by the differential methylation of genes such as:

Most of these are Homeobox genes, supporting the idea of a retention of epithelial differentiation in KICH biopsies and of a less aggressive kidney carcinoma type.

Finally, the third cluster discriminates between KIRC and KICH, with function “histone H3K9 methyltransferase activity”, and it is driven by the differential methylation of genes such as:

The complete list of 14 GO MF terms enriched across the three contrasts is reported below.

5.4.3.2.4 KEGG pathways

Figure 50: dotplot of enriched KEGG pathways.

Figure 51: gene maps reporting relationships between KEGG pathways enriched across the different contrasts.

The Enriched KEGG pathways are associated with differential methylation in KIRC when compared to KICH and KIRP. The pathways affected are “Focal adhesion” and “PI3K-Akt signaling pathway”.

The complete list of 17 KEGG pathways across the three contrasts is reported below.

5.4.3.2.5 Reactome pathways

Figure 52: dotplot of enriched Reactome pathways.

Figure 53: gene maps reporting relationships between Reactome pathways enriched across the different contrasts.

The enriched Reactome pathways revolve around differential methylation of KIRP when compared to KIRC and KICH. The pathway “MET-actiated PTK2 signaling” is differentially methylated, with different Laminin genes involved (LAMA4, LAMB2, LAMC2). This finding is consistent with the observation that ~10% of KIRP patients have non-silent somatic mutations in MET (see Section 5.3.1).

The complete list of 16 Reactome pathways enriched across the three contrasts is reported below.

5.4.3.2.6 WikiPathways

Figure 54: dotplot of enriched Wikipathways pathways.

Figure 55: gene maps reporting relationships between WikiPathways enriched across the different contrasts.

The enriched Wikipathways confirmed the general theme of epithelial differentiation retention in KICH (“Ectoderm Differentiation”, “GDN/RE7 signaling axis”, “Mesodermal committment pathway”), and stronger Epithelial-to-Mesenchimal Transition in KIRC, driven by differential methylation of genes such as CDH8, GATA3, JARID2, MEIS1, MIR125B1, SMAD2, WT1

The complete list of 9 WikiPathways pathways enriched across the three contrasts is reported below.

5.5 Lessons Learnt

Based on genomics and epigenomics data, we have learnt:

  • Copy Number Variants:
    • KICH has significantly more genome instability and CNV events than KIRC and KIRP
    • whole genome duplication seems more likely to occur than loss of chromosomes
    • there is a link between CNVs signatures and cancer advancement, which is a common feature across all kidney carcinomas and not limited to KIRC, KIRP or KICH.
  • Somatic Mutations:
    • KIRC is the kidney carcinoma with the highest amount of non-silent somatic mutations.
    • KICH:
      • ~30% of the patients (21 / 66) had a non-silent mutation in TP53.
      • 9% of patients (6 / 66) having a mutation on PTEN.
    • KIRC:
      • 47% of patients (170 / 365) with mutation in VHL
      • ~40% of patients (145 / 365) with mutation on PBRM1
      • ~20% of patients (72 / 365) with non-silent somatic mutations on TTN
    • KIRP:
      • 16% of KIRP patients (45 / 280) has non-silent somatic mutations on TTN
      • 9% of KIRP patients (25 / 280) with mutation on MUC16
      • 8% of KIRP patients (22 / 280) with mutation on MET
  • Epigenomics:
    • Exploratory analysis suggests:
      • Principal Component Analysis can discriminates between KICH, KIRC and KIRP.
      • subtype DNAmeth co-correlates with residual disease, more advanced cancer and worse prognosis, suggesting that DNA methylations and epigenomics plays a major role in cancer evolution and disease outcome.
    • Differential Gene Expression analysis:
      • strong methylation signature that discriminates KICH from KIRC and KIRP (> 4,000 probes).
      • GSEA on differentially methylated probes supports the idea that:
        • KICH biopsies mostly retained their epithelial differentiation that was instead lost in KIRC and in KIRP.
        • Higher abundance of infiltrating Macrophages M0 and M2 in KIRP.
        • A pro-inflammatory, hypoxic microenvironment in KIRC.
      • GSEA on Differentially Methylated function Regions support the idea that:
        • General theme of epithelial differentiation retention in KICH.
        • A stronger Epithelial-to-Mesenchimal Transition in KIRC, with a signature of extra-cellular matrix remodelling and cell migration.
        • MET pathway is differentially methylated in KIRP, which is consistent with the higher chance of MET non-silent somatic mutation in this kind of kidney carcinoma.

5.6 Session Information

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 24.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8    LC_NUMERIC=C            LC_TIME=C              
 [4] LC_COLLATE=en_US.UTF-8  LC_MONETARY=C           LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C              LC_NAME=C               LC_ADDRESS=C           
[10] LC_TELEPHONE=C          LC_MEASUREMENT=C        LC_IDENTIFICATION=C    

time zone: Europe/Brussels
tzcode source: system (glibc)

attached base packages:
 [1] parallel  grid      stats4    stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] UpSetR_1.4.0                                       
 [2] umap_0.2.10.0                                      
 [3] stringr_1.5.1                                      
 [4] scales_1.4.0                                       
 [5] RColorBrewer_1.1-3                                 
 [6] qs_0.27.3                                          
 [7] PCAtools_2.14.0                                    
 [8] missMethyl_1.36.0                                  
 [9] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[10] mCSEA_1.22.0                                       
[11] Homo.sapiens_1.3.1                                 
[12] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2            
[13] GO.db_3.18.0                                       
[14] OrganismDbi_1.44.0                                 
[15] GenomicFeatures_1.54.4                             
[16] mCSEAdata_1.22.0                                   
[17] org.Hs.eg.db_3.18.0                                
[18] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1 
[19] minfi_1.48.0                                       
[20] bumphunter_1.44.0                                  
[21] locfit_1.5-9.12                                    
[22] iterators_1.0.14                                   
[23] foreach_1.5.2                                      
[24] Biostrings_2.70.3                                  
[25] XVector_0.42.0                                     
[26] SummarizedExperiment_1.32.0                        
[27] MatrixGenerics_1.14.0                              
[28] matrixStats_1.5.0                                  
[29] Gviz_1.46.1                                        
[30] gridExtra_2.3                                      
[31] ggpubr_0.6.1                                       
[32] GenomicRanges_1.54.1                               
[33] GenomeInfoDb_1.38.8                                
[34] forcats_1.0.0                                      
[35] EnhancedVolcano_1.20.0                             
[36] ggrepel_0.9.6                                      
[37] ggplot2_3.5.2                                      
[38] edgeR_4.0.16                                       
[39] limma_3.58.1                                       
[40] DT_0.33                                            
[41] dplyr_1.1.4                                        
[42] DOSE_3.28.2                                        
[43] DMRcate_2.16.1                                     
[44] data.table_1.17.8                                  
[45] cowplot_1.2.0                                      
[46] clusterProfiler_4.10.1                             
[47] circlize_0.4.16                                    
[48] BiocSingular_1.18.0                                
[49] BiocParallel_1.36.0                                
[50] AnnotationDbi_1.64.1                               
[51] IRanges_2.36.0                                     
[52] S4Vectors_0.40.2                                   
[53] Biobase_2.62.0                                     
[54] BiocGenerics_0.48.1                                

loaded via a namespace (and not attached):
  [1] R.methodsS3_1.8.2             dichromat_2.0-0.1            
  [3] progress_1.2.3                nnet_7.3-19                  
  [5] HDF5Array_1.30.1              vctrs_0.6.5                  
  [7] RApiSerialize_0.1.4           digest_0.6.37                
  [9] png_0.1-8                     shape_1.4.6.1                
 [11] deldir_2.0-4                  permute_0.9-8                
 [13] magick_2.8.7                  MASS_7.3-60.0.1              
 [15] reshape_0.8.10                reshape2_1.4.4               
 [17] httpuv_1.6.16                 qvalue_2.34.0                
 [19] withr_3.0.2                   xfun_0.52                    
 [21] ggfun_0.2.0                   survival_3.8-3               
 [23] doRNG_1.8.6.2                 proxyC_0.5.2                 
 [25] memoise_2.0.1                 gson_0.1.0                   
 [27] simplifyEnrichment_1.12.0     tidytree_0.4.6               
 [29] GlobalOptions_0.1.2           gtools_3.9.5                 
 [31] R.oo_1.27.1                   Formula_1.2-5                
 [33] prettyunits_1.2.0             KEGGREST_1.42.0              
 [35] promises_1.3.3                httr_1.4.7                   
 [37] rstatix_0.7.2                 restfulr_0.0.16              
 [39] rhdf5filters_1.14.1           stringfish_0.17.0            
 [41] rhdf5_2.46.1                  rstudioapi_0.17.1            
 [43] generics_0.1.4                reactome.db_1.86.2           
 [45] base64enc_0.1-3               curl_6.4.0                   
 [47] zlibbioc_1.48.2               ScaledMatrix_1.10.0          
 [49] ggraph_2.2.1                  polyclip_1.10-7              
 [51] GenomeInfoDbData_1.2.11       quadprog_1.5-8               
 [53] ExperimentHub_2.10.0          SparseArray_1.2.4            
 [55] RBGL_1.78.0                   interactiveDisplayBase_1.40.0
 [57] doParallel_1.0.17             xtable_1.8-4                 
 [59] evaluate_1.0.4                S4Arrays_1.2.1               
 [61] BiocFileCache_2.10.2          preprocessCore_1.64.0        
 [63] hms_1.1.3                     irlba_2.3.5.1                
 [65] colorspace_2.1-1              filelock_1.0.3               
 [67] NLP_0.3-2                     reticulate_1.43.0            
 [69] magrittr_2.0.3                readr_2.1.5                  
 [71] later_1.4.2                   viridis_0.6.5                
 [73] ggtree_3.10.1                 lattice_0.22-5               
 [75] genefilter_1.84.0             XML_3.99-0.18                
 [77] shadowtext_0.1.5              Hmisc_5.2-3                  
 [79] pillar_1.11.0                 nlme_3.1-164                 
 [81] compiler_4.3.3                beachmat_2.18.1              
 [83] RSpectra_0.16-2               stringi_1.8.7                
 [85] GenomicAlignments_1.38.2      plyr_1.8.9                   
 [87] crayon_1.5.3                  abind_1.4-8                  
 [89] BiocIO_1.12.0                 gridGraphics_0.5-1           
 [91] graphlayouts_1.2.2            bit_4.6.0                    
 [93] fastmatch_1.1-6               codetools_0.2-19             
 [95] openssl_2.3.3                 crosstalk_1.2.1              
 [97] slam_0.1-55                   bslib_0.9.0                  
 [99] GetoptLong_1.0.5              biovizBase_1.50.0            
[101] tm_0.7-16                     multtest_2.58.0              
[103] mime_0.13                     splines_4.3.3                
[105] Rcpp_1.1.0                    dbplyr_2.5.0                 
[107] sparseMatrixStats_1.14.0      HDO.db_0.99.1                
[109] interp_1.1-6                  knitr_1.50                   
[111] blob_1.2.4                    clue_0.3-66                  
[113] BiocVersion_3.18.1            AnnotationFilter_1.26.0      
[115] fs_1.6.6                      checkmate_2.3.2              
[117] DelayedMatrixStats_1.24.0     ggsignif_0.6.4               
[119] ggplotify_0.1.2               tibble_3.3.0                 
[121] Matrix_1.6-5                  statmod_1.5.0                
[123] tzdb_0.5.0                    tweenr_2.0.3                 
[125] pkgconfig_2.0.3               tools_4.3.3                  
[127] cachem_1.1.0                  RSQLite_2.4.2                
[129] viridisLite_0.4.2             DBI_1.2.3                    
[131] graphite_1.48.0               fastmap_1.2.0                
[133] rmarkdown_2.29                Rsamtools_2.18.0             
[135] sass_0.4.10                   broom_1.0.9                  
[137] AnnotationHub_3.10.1          patchwork_1.3.1              
[139] BiocManager_1.30.26           VariantAnnotation_1.48.1     
[141] graph_1.80.0                  carData_3.0-5                
[143] scrime_1.3.5                  rpart_4.1.23                 
[145] farver_2.1.2                  tidygraph_1.3.1              
[147] scatterpie_0.2.5              yaml_2.3.10                  
[149] latticeExtra_0.6-30           foreign_0.8-86               
[151] rtracklayer_1.62.0            illuminaio_0.44.0            
[153] cli_3.6.5                     purrr_1.1.0                  
[155] siggenes_1.76.0               GEOquery_2.70.0              
[157] lifecycle_1.0.4               askpass_1.2.1                
[159] backports_1.5.0               annotate_1.80.0              
[161] gtable_0.3.6                  rjson_0.2.23                 
[163] ape_5.8-1                     jsonlite_2.0.0               
[165] bitops_1.0-9                  minfiData_0.48.0             
[167] bit64_4.6.0-1                 yulab.utils_0.2.0            
[169] base64_2.0.2                  ReactomePA_1.46.0            
[171] RcppParallel_5.1.10           jquerylib_0.1.4              
[173] GOSemSim_2.28.1               dqrng_0.4.1                  
[175] R.utils_2.13.0                lazyeval_0.2.2               
[177] shiny_1.11.1                  htmltools_0.5.8.1            
[179] enrichplot_1.22.0             rappdirs_0.3.3               
[181] ensembldb_2.26.0              glue_1.8.0                   
[183] RCurl_1.98-1.17               treeio_1.26.0                
[185] mclust_6.1.1                  BSgenome_1.70.2              
[187] jpeg_0.1-11                   igraph_2.1.4                 
[189] R6_2.6.1                      tidyr_1.3.1                  
[191] labeling_0.4.3                cluster_2.1.6                
[193] rngtools_1.5.2                Rhdf5lib_1.24.2              
[195] beanplot_1.3.1                aplot_0.2.8                  
[197] bsseq_1.38.0                  DelayedArray_0.28.0          
[199] tidyselect_1.2.1              ProtGenerics_1.34.0          
[201] htmlTable_2.4.3               ggforce_0.5.0                
[203] xml2_1.3.8                    car_3.1-3                    
[205] rsvd_1.0.5                    S7_0.2.0                     
[207] nor1mix_1.3-3                 ComplexHeatmap_2.18.0        
[209] htmlwidgets_1.6.4             fgsea_1.35.6                 
[211] biomaRt_2.58.2                rlang_1.1.6                  
[213] ggnewscale_0.5.2