1  Data Exploration - Kidney Cancer

1.1 On this page

Biological insights and take-home messages are summarized at the bottom of the page, in Section 1.4 (Lessons Learnt).

  • Here we focus on samples from kidney cancer patients and on their overall relationships.
  • We then explore the association between each kidney cancer subtype and the major clinical covariates (e.g., sex, disease status, cancer stage, clinical outcome).

1.2 Selection of Kidney samples

For the next sections, we will focus on kidney cancer. The choice is dictated by two factors. First, the abundance of samples provides a large playground for testing different biomarker discovery strategies with higher statistical power. Second, molecular subtype gene signatures already exist, and we can use them as a benchmark and try to improve upon them.

Let’s select and explore the samples available for kidney tissue.

1.2.1 General Overview of the Dataset

Since we want to focus on discovering new biomarkers for the three kidney cancer types, we will remove the biopsies from normal solid tissue. The three types are kidney chromophobe (KICH), kidney renal clear cell carcinoma (KIRC) and kidney renal papillary cell carcinoma (KIRP). Then we test whether there is a strong imbalance in the sex and age of the subjects across the three kidney cancer cohorts, and whether we should balance the cohorts.

1.2.2 Cohort filtering

First, we retain only “Primary Tumor” biopsies, removing “Solid Tissue Normal” and “Additional - New Primary” samples. Then we look at the distribution of age, sex and ethnicity in the three cancer cohorts.
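The filtering step can be sketched in Python as shown below. This is a minimal sketch that assumes each sample is a dict with hypothetical `project` and `sample_type` keys; the actual column names in the TCGA/GDC metadata may differ. `TCGA-KICH`, `TCGA-KIRC` and `TCGA-KIRP` are the GDC project identifiers of the three kidney cohorts.

```python
from collections import Counter

# GDC project identifiers of the three TCGA kidney cancer cohorts.
KIDNEY_PROJECTS = {"TCGA-KICH", "TCGA-KIRC", "TCGA-KIRP"}


def keep_primary_tumors(samples):
    """Keep kidney samples annotated as 'Primary Tumor'.

    Samples typed 'Solid Tissue Normal' or 'Additional - New Primary',
    and samples from non-kidney projects, are discarded.
    The 'project' and 'sample_type' keys are hypothetical field names.
    """
    return [
        s for s in samples
        if s["project"] in KIDNEY_PROJECTS and s["sample_type"] == "Primary Tumor"
    ]


def cohort_sizes(samples):
    """Count the retained samples per cancer cohort."""
    return Counter(s["project"] for s in samples)


# Toy records for illustration only (not TCGA data).
toy = [
    {"project": "TCGA-KIRC", "sample_type": "Primary Tumor"},
    {"project": "TCGA-KIRC", "sample_type": "Solid Tissue Normal"},
    {"project": "TCGA-KIRP", "sample_type": "Additional - New Primary"},
    {"project": "TCGA-KICH", "sample_type": "Primary Tumor"},
]
print(cohort_sizes(keep_primary_tumors(toy)))
```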

Figure 1: Kidney Cancer: demographic overview.

We can immediately see that the kidney cancer cohorts share the sampling bias towards patients of white ancestry that Section 3.1 found across all cancer cohorts. In terms of sex, males represent from 60% (KICH) to 74% (KIRP) of the cohorts. However, the histograms split by cancer type show that the age distributions of the two sexes are similar across the kidney cancer types.

Generating cohorts balanced for sex and race would cost too many samples. We therefore keep the samples as they are and take age, sex and race into account during the downstream modeling phases.
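One common way to account for these covariates is to include them as columns of the design matrix of a linear model. In R, `model.matrix()` builds such a matrix from a formula like `~ type + age + sex + race` for use with limma/edgeR. The minimal sketch below builds an equivalent matrix by hand with treatment (dummy) coding. It mirrors R's default contrasts, where the first alphabetically sorted level is the reference. The patient keys and values are hypothetical and for illustration only.

```python
def design_matrix(rows, numeric, categorical):
    """Build a treatment-coded design matrix.

    rows: list of dicts, one per patient (hypothetical keys).
    numeric: covariates used as-is (e.g. age).
    categorical: covariates expanded into 0/1 indicator columns; the first
        level in sorted order is the reference and gets no column.
    Returns (column_names, matrix), with an intercept column first.
    """
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    names = ["(Intercept)"] + list(numeric)
    for c in categorical:
        names += [f"{c}{lvl}" for lvl in levels[c][1:]]
    matrix = []
    for r in rows:
        row = [1.0] + [float(r[c]) for c in numeric]
        for c in categorical:
            row += [1.0 if r[c] == lvl else 0.0 for lvl in levels[c][1:]]
        matrix.append(row)
    return names, matrix


# Toy patients (not TCGA data).
patients = [
    {"type": "KIRC", "age": 61, "sex": "male", "race": "white"},
    {"type": "KIRP", "age": 55, "sex": "female", "race": "black or african american"},
    {"type": "KICH", "age": 47, "sex": "male", "race": "white"},
]
names, X = design_matrix(patients, ["age"], ["type", "sex", "race"])
print(names)
for row in X:
    print(row)
```

With this coding, the cancer-type coefficients are estimated while holding age, sex and race fixed. This is the practical alternative to discarding samples to balance the cohorts.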

1.3 Contingency tables

As we saw previously, TCGA provides rich clinical metadata for each patient. We can run some simple statistical tests (chi-squared and Fisher's exact tests) to see whether any clinical covariate is significantly over- or underrepresented in any of the three cancer types.

For each observed clinical covariate, we test whether its distribution differs significantly across the three kidney cancer types. The null hypothesis is that the distribution of the covariate is the same in all three cancer types.
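Each test compares the observed counts of a cancer type × covariate table with the counts expected under the null hypothesis, E_ij = row_i × col_j / N. The statistic sums (O − E)²/E over all cells and has (r − 1)(c − 1) degrees of freedom. The minimal sketch below implements this Pearson chi-squared test with the standard library only. It computes the p-value through the regularized incomplete gamma function, using the classic series and continued-fraction expansions. Unlike R's `chisq.test()`, which applies Yates's continuity correction to 2×2 tables by default, it applies no correction.

```python
import math


def _lower_gamma_series(a, x):
    """Regularized lower incomplete gamma P(a, x), by series expansion."""
    term = total = 1.0 / a
    ap = a
    for _ in range(10_000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a, x):
    """Regularized upper incomplete gamma Q(a, x), by continued fraction (Lentz)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(stat, df):
    """P(X >= stat) for a chi-squared variable with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cf(a, x)


def chi2_independence(table):
    """Pearson chi-squared test of independence on an r x c table of counts.

    All row and column totals must be positive. Returns
    (statistic, degrees_of_freedom, p_value), without continuity correction.
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = sum(
        (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(len(rows))
        for j in range(len(cols))
    )
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)


# Toy 3 x 2 table (cancer type x covariate level), not TCGA data.
print(chi2_independence([[12, 8], [30, 45], [25, 20]]))
```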

1.3.1 Sex

There are 596 males and 291 females in the three kidney cancer cohorts after filtering. Despite this overall imbalance, the proportion of males does not differ significantly across the cancer types.

Figure 2: Contingency table, Sex.
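Why can a roughly 2:1 male-to-female ratio coexist with a non-significant test? The expected counts under the null hypothesis are derived from the table margins, so an imbalance shared by all cohorts is already "expected". The minimal sketch below uses toy patient labels, not the TCGA counts. It cross-tabulates (cancer type, sex) pairs and computes the expected counts.

```python
from collections import Counter


def crosstab(pairs):
    """Cross-tabulate (cancer_type, category) pairs into a list-of-lists table."""
    counts = Counter(pairs)
    types = sorted({t for t, _ in pairs})
    cats = sorted({c for _, c in pairs})
    return types, cats, [[counts[(t, c)] for c in cats] for t in types]


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


# Toy cohorts with the same 2:1 male:female ratio in every cancer type.
pairs = (
    [("KICH", "male")] * 4 + [("KICH", "female")] * 2
    + [("KIRC", "male")] * 20 + [("KIRC", "female")] * 10
    + [("KIRP", "male")] * 10 + [("KIRP", "female")] * 5
)
types, cats, observed = crosstab(pairs)
expected = expected_counts(observed)
print(cats, observed, expected)
```

In this toy example, observed and expected counts coincide, so the chi-squared statistic is zero even though males outnumber females two to one. A significant result requires the male fraction itself to differ between cancer types.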

1.3.2 Disease status

After filtering, the three kidney cancer cohorts contain 563 subjects who are now tumor free and 190 subjects who still have a tumor. KIRP appears to have significantly more patients who are now tumor free, while KIRC has significantly more patients who still have a tumor. The small sample size of the KICH cohort may skew these results, so this outcome should be taken with a pinch of salt.

Figure 3: Contingency table, Disease Status.
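With a cohort as small as KICH, some expected counts can drop below 5, the usual rule of thumb for trusting the chi-squared approximation. In that case Fisher's exact test is preferred. The minimal sketch below runs the two-sided test on a 2×2 table, for example one cancer type versus the other two, tumor free versus with tumor. Conditioning on the margins, the top-left count is hypergeometric, and the p-value sums the probabilities of all tables no more likely than the observed one. This is the convention R's `fisher.test()` uses for 2×2 tables. The r×c generalization needed for the full 3×2 table is not shown.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a table [[a, b], [c, d]].

    Given fixed margins, the top-left count k follows a hypergeometric
    distribution; the p-value sums P(k) over all k with P(k) <= P(observed).
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(k):
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = pmf(a)
    # A small relative tolerance keeps tables tied with the observed one.
    p = sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Toy table: [[cohort tumor free, cohort with tumor], [others tumor free, others with tumor]].
print(fisher_exact_2x2([[3, 1], [1, 3]]))
```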

1.3.3 Tumor Stages

The patients' tumors were staged as follows: 458 Stage I, 105 Stage II, 189 Stage III and 105 Stage IV.

This staging system is less granular than the TNM staging system (which we look at below), and it describes the tumor as follows:

  • Stage 0 = Abnormal cells are present but have not spread to nearby tissue. Also called carcinoma in situ, or CIS. CIS is not cancer, but it may become cancer.
  • Stage I, Stage II, and Stage III = Cancer is present. The higher the number, the larger the cancer tumor and the more it has spread into nearby tissues.
  • Stage IV = The cancer has spread to distant parts of the body.

The higher the number, the more advanced the cancer is.

KIRC patients seem to have more Stage IV cancers, while KIRP patients have more Stage I cancers. This may reflect a sampling bias in the cohorts, or it may indicate that KIRC is more aggressive and progresses faster than KIRP.

Figure 4: Contingency table, Tumor Stage.
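Statements such as "KIRC patients have more Stage IV cancers" refer to individual cells of the table, not to the global test. A common way to locate those cells is to inspect residuals. Pearson residuals, (O − E)/√E, are what vcd's `mosaic()` uses for residual-based shading; we assume, without confirmation in the text, that the figures were shaded this way. Adjusted (standardized) residuals are approximately standard normal under the null hypothesis, so |value| > 2 roughly flags an over- or underrepresented cell. R's `chisq.test()` returns these two quantities as `residuals` and `stdres`. A minimal sketch follows.

```python
import math


def residuals(table):
    """Return (pearson, adjusted) residuals for an r x c table of counts.

    pearson[i][j]  = (O - E) / sqrt(E)
    adjusted[i][j] = (O - E) / sqrt(E * (1 - row_i / N) * (1 - col_j / N))
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    pearson, adjusted = [], []
    for i, row in enumerate(table):
        p_row, a_row = [], []
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            p_row.append((obs - expected) / math.sqrt(expected))
            variance = expected * (1 - rows[i] / n) * (1 - cols[j] / n)
            a_row.append((obs - expected) / math.sqrt(variance))
        pearson.append(p_row)
        adjusted.append(a_row)
    return pearson, adjusted


def flag_cells(adjusted, threshold=2.0):
    """Mark cells as +1 (overrepresented), -1 (underrepresented) or 0."""
    return [
        [1 if r > threshold else -1 if r < -threshold else 0 for r in row]
        for row in adjusted
    ]


# Toy cancer type x stage table, not TCGA data.
pearson, adjusted = residuals([[40, 10], [20, 30]])
print(flag_cells(adjusted))
```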

1.3.4 Tumor Histology

Histological evaluation of the tumor cells is essential for accurate prognostication and for tailoring therapy, both surgical and adjuvant.

The histological grade has five categories:

  • G1 = Well differentiated
  • G2 = Moderately differentiated
  • G3 = Poorly differentiated (high grade)
  • G4 = Undifferentiated, anaplastic (high grade)
  • GX = Grade cannot be assessed

Unfortunately, within the TCGA kidney cohorts, only KIRC biopsies were graded histologically; no histological grading was available for KICH and KIRP tumors.

For the KIRC tumors, 14 biopsies were graded G1, 229 G2, 204 G3 and 76 G4, while 5 biopsies could not be graded (GX).

Figure 5: Contingency table, Tumor Histology.

1.3.5 TNM, T: Main Tumor Pathology

Let’s now look at the TNM staging system. For kidney tumors, its T category, with several stages and substages, describes the presence and size of the primary tumor and its invasion of adjacent tissues:

  • TX = Primary tumor cannot be assessed.
  • T0 = No evidence of primary tumor.
  • T1 = Tumor ≤7 cm in greatest dimension, limited to the kidney (T1a: ≤4 cm; T1b: >4 cm but ≤7 cm).
  • T2 = Tumor >7 cm in greatest dimension, limited to the kidney (T2a: >7 cm but ≤10 cm; T2b: >10 cm).
  • T3 = Tumor extends into major veins or perinephric tissues, but not into the ipsilateral adrenal gland and not beyond Gerota’s fascia (T3a: renal vein or perirenal/renal sinus fat; T3b: vena cava below the diaphragm; T3c: vena cava above the diaphragm or its wall).
  • T4 = Tumor invades beyond Gerota’s fascia (including contiguous extension into the ipsilateral adrenal gland).

The presence of 12 distinct T substages and the relatively small size of the cohorts make the outcome of this comparison difficult to judge. Tentatively, KIRP seems to be enriched in lower stages (T1a) and depleted in higher stages (T2, T3a, T3b) compared with KIRC.

Figure 6: Contingency table, TNM Main Tumor.
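One way to mitigate the sparsity caused by 12 substages is to collapse them into the main T categories before testing (T1a and T1b into T1, and so on). This trades resolution for larger cell counts. The minimal sketch below assumes the stage labels are strings such as 'T1a', 'T3b' or 'TX'; the exact encoding in the TCGA metadata may differ.

```python
import re
from collections import Counter


def collapse_t_stage(label):
    """Map a T substage (e.g. 'T1a', 'T3b') to its main category ('T1', 'T3').

    'TX' and 'T0' are returned unchanged; unrecognized labels map to None.
    """
    match = re.fullmatch(r"(T[0-4X])[a-c]?", label.strip())
    return match.group(1) if match else None


def collapse_counts(labels):
    """Count patients per main T category after collapsing substages."""
    return Counter(collapse_t_stage(lbl) for lbl in labels)


# Toy labels, not TCGA data.
print(collapse_counts(["T1", "T1a", "T1b", "T3a", "T3b", "TX"]))
```

The collapsed counts can then be tested with the same contingency-table tests used for the other covariates.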

1.3.6 TNM, N: Lymph Node Metastasis

The TNM staging system also provides a score for the regional lymph nodes:

  • NX = Regional lymph nodes cannot be assessed.
  • N0 = No regional lymph node metastasis.
  • N1 = Metastasis in regional lymph node(s); in AJCC editions up to the 6th, metastasis in a single regional lymph node.
  • N2 = Metastasis in more than one regional lymph node (a category used only in AJCC editions up to the 6th).

In the kidney cohorts, the lymph nodes of 506 patients could not be scored (NX), 328 patients had an N0 score, 43 N1 and 7 N2. The lymph nodes of most patients with KIRP tumors could not be scored. Compared with patients with KIRC tumors, KIRP patients were depleted in N0 and enriched in N1.

Figure 7: Contingency table, TNM Lymph Nodes.
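Because most patients are NX, the N comparison depends heavily on whether the non-assessable category is kept in the denominator. The minimal sketch below recomputes the N distribution with and without NX, using the pooled counts reported above. The text does not give the per-cohort split, so this only illustrates the effect on the pooled counts.

```python
# Pooled lymph-node (N) counts reported in the text above.
n_counts = {"NX": 506, "N0": 328, "N1": 43, "N2": 7}


def shares(counts, drop=()):
    """Share of each category after removing the categories listed in `drop`."""
    kept = {k: v for k, v in counts.items() if k not in drop}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}


with_nx = shares(n_counts)
assessed_only = shares(n_counts, drop=("NX",))
for k in ("N0", "N1", "N2"):
    print(f"{k}: {with_nx[k]:.1%} of all patients, {assessed_only[k]:.1%} of assessed patients")
```

N0 accounts for a much larger share of assessed patients than of all patients. Statements about N0 depletion or N1 enrichment should therefore make clear which denominator they use.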

1.3.7 Prognosis Outcome

Of the 887 patients with kidney tumors, 659 were alive at the last follow-up, while 228 had died. KIRC patients (48.7% dead) appear to have a worse prognosis than KIRP patients (17.9% dead), in line with the previous comparisons. However, before drawing a formal conclusion, we should validate this observation in at least one independent cohort.

Figure 8: Contingency table, Vital Status.
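The mortality figures are point estimates, and a confidence interval makes their uncertainty explicit. The minimal sketch below uses the Wilson score interval, one standard choice among several; the text does not say which interval, if any, was computed. It applies the interval to the overall counts reported above, 228 deaths among 887 patients.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score confidence interval for a binomial proportion (95% by default)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(228, 887)
print(f"Overall mortality: {228 / 887:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Applying the same function to each cohort's death counts shows how much the KIRC and KIRP intervals overlap; the small KICH cohort will get a much wider interval. A formal comparison would use a chi-squared or Fisher's exact test on the table of vital status by cohort.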

1.4 Lessons Learnt

So far, we have learnt:

  • The contingency tables reveal broad clinical trends across the three cancer types:
    • KIRC patients seem to have a higher chance of being tumor free than KIRP patients.
    • Despite that, KIRC patients seem to be enriched in Stage IV tumors and depleted in Stage I tumors compared with KIRP patients.
    • Likewise, KIRP patients have more T1a tumors and fewer T2, T3a and T3b tumors than KIRC patients.
    • Unfortunately, KIRC patients seem to have had a worse outcome than KIRP patients.
  • The clinical outcome for KIRC patients seems to be more polarized. Either the patients were tumor free at the end of the study, or they had a more advanced tumor and a worse outcome than KIRP patients. This overview suggests that developing biomarkers for the early detection of KIRC could improve the clinical outcome for these patients.

1.5 Session Information

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 24.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8    LC_NUMERIC=C            LC_TIME=C              
 [4] LC_COLLATE=en_US.UTF-8  LC_MONETARY=C           LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C              LC_NAME=C               LC_ADDRESS=C           
[10] LC_TELEPHONE=C          LC_MEASUREMENT=C        LC_IDENTIFICATION=C    

time zone: Europe/Brussels
tzcode source: system (glibc)

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] vcd_1.4-13         umap_0.2.10.0      stringr_1.5.1      scales_1.4.0      
 [5] RColorBrewer_1.1-3 matrixStats_1.5.0  gridExtra_2.3      ggplot2_3.5.2     
 [9] forcats_1.0.0      edgeR_4.0.16       limma_3.58.1       DT_0.33           
[13] dplyr_1.1.4        data.table_1.17.8  cowplot_1.2.0     

loaded via a namespace (and not attached):
 [1] sass_0.4.10         tidyr_1.3.1         generics_0.1.4     
 [4] stringi_1.8.7       lattice_0.22-5      digest_0.6.37      
 [7] magrittr_2.0.3      evaluate_1.0.4      fastmap_1.2.0      
[10] jsonlite_2.0.0      Matrix_1.6-5        RSpectra_0.16-2    
[13] purrr_1.1.0         crosstalk_1.2.1     jquerylib_0.1.4    
[16] cli_3.6.5           rlang_1.1.6         cachem_1.1.0       
[19] withr_3.0.2         yaml_2.3.10         tools_4.3.3        
[22] colorspace_2.1-1    locfit_1.5-9.12     BiocGenerics_0.48.1
[25] reticulate_1.43.0   vctrs_0.6.5         R6_2.6.1           
[28] png_0.1-8           zoo_1.8-14          lifecycle_1.0.4    
[31] htmlwidgets_1.6.4   MASS_7.3-60.0.1     pkgconfig_2.0.3    
[34] bslib_0.9.0         pillar_1.11.0       gtable_0.3.6       
[37] glue_1.8.0          Rcpp_1.1.0          statmod_1.5.0      
[40] lmtest_0.9-40       xfun_0.52           tibble_3.3.0       
[43] tidyselect_1.2.1    knitr_1.50          dichromat_2.0-0.1  
[46] farver_2.1.2        htmltools_0.5.8.1   labeling_0.4.3     
[49] rmarkdown_2.29      compiler_4.3.3      askpass_1.2.1      
[52] openssl_2.3.3