Gene mutation data for Cancer Genomics
The cBioPortal for Cancer Genomics is a resource for interactive exploration of multidimensional cancer genomics data sets. The portal supports and stores non-synonymous mutations, DNA copy-number data, mRNA and microRNA expression data, protein-level and phosphoprotein level data (RPPA or mass spectrometry based), DNA methylation data, and de-identified clinical data.
There are multiple ways to access the cBioPortal API from R.
One of the recommended R packages for accessing cBioPortal data is the cBioPortalData package.
Bioconductor cBioPortalData package
The cBioPortalData R package accesses cancer datasets from the cBio Cancer Genomics Portal. The package provides cBioPortal datasets as MultiAssayExperiment objects in Bioconductor.
Thanks to waldronlab/cBioPortalData
According to the Bioconductor MultiAssayExperiment package, MultiAssayExperiment harmonizes the management of data from multiple experimental assays performed on an overlapping set of specimens.
To install this package in R (version >= 4.3.0), use the BiocManager package:
BiocManager::install("cBioPortalData")
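The installation step can also be written defensively, so that nothing is downloaded when the packages are already present. The following is a minimal sketch; the helper names `missing_packages` and `install_cbioportaldata` are hypothetical and not part of BiocManager or cBioPortalData.

```r
# Return the subset of `pkgs` whose namespaces cannot be loaded
# from the current library paths.
missing_packages <- function(pkgs) {
    available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!available]
}

# Install BiocManager from CRAN first (if needed), then cBioPortalData
# from Bioconductor (if needed). Returns whatever is still missing.
install_cbioportaldata <- function() {
    if (length(missing_packages("BiocManager")) > 0) {
        install.packages("BiocManager")
    }
    if (length(missing_packages("cBioPortalData")) > 0) {
        BiocManager::install("cBioPortalData")
    }
    invisible(missing_packages(c("BiocManager", "cBioPortalData")))
}

# Usage (downloads packages, so it is left commented out here):
# install_cbioportaldata()
```

Checking with `requireNamespace()` avoids attaching a package just to find out whether it is installed.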
Load the package:
library(cBioPortalData)
Get information about the cBioPortal API:
the API object created below gives access to the datasets of all studies that are available and currently building as MultiAssayExperiment representations.
cbio <- cBioPortal()
cbio
service: cBioPortal
tags(); use cbioportal$<tab completion>:
# A tibble: 65 x 3
tag operation summary
<chr> <chr> <chr>
1 Cancer Types getAllCancerTypesUsingGET Get al~
2 Cancer Types getCancerTypeUsingGET Get a ~
3 Clinical Attributes fetchClinicalAttributesUsingPOST Fetch ~
4 Clinical Attributes getAllClinicalAttributesInStudyUsin~ Get al~
5 Clinical Attributes getAllClinicalAttributesUsingGET Get al~
6 Clinical Attributes getClinicalAttributeInStudyUsingGET Get sp~
7 Clinical Data fetchAllClinicalDataInStudyUsingPOST Fetch ~
8 Clinical Data fetchClinicalDataUsingPOST Fetch ~
9 Clinical Data getAllClinicalDataInStudyUsingGET Get al~
10 Clinical Data getAllClinicalDataOfPatientInStudyU~ Get al~
# i 55 more rows
# i Use `print(n = ...)` to see more rows
tag values:
Cancer Types, Clinical Attributes, Clinical Data, Copy
Number Segments, Discrete Copy Number Alterations, Gene
Panel Data, Gene Panels, Generic Assay Data, Generic
Assays, Genes, Info, Molecular Data, Molecular Profiles,
Mutations, Patients, Sample Lists, Samples, Server
running status, Studies, Treatments
schemas():
AlleleSpecificCopyNumber, AlterationFilter,
AndedPatientTreatmentFilters,
AndedSampleTreatmentFilters, CancerStudy
# ... with 58 more elements
Retrieve the studies available in cbio. The result is a table of full information about all API studies, including the study ID; studies with permission=TRUE are represented.
study <- getStudies(cbio)
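Once the study table is available, it can be searched for the study ID needed in later steps. The sketch below uses a small toy data frame in place of the real result so that it runs offline; the column names `studyId` and `name`, the example rows, and the helper `find_studies` are assumptions made for illustration.

```r
# Toy stand-in for the table returned by getStudies(cbio). Only the two
# columns used below are included; the rows are illustrative, not portal output.
studies_example <- data.frame(
    studyId = c("skcm_tcga", "acc_tcga", "skcm_broad"),
    name = c("Melanoma example (TCGA)", "Adrenocortical example (TCGA)",
             "Melanoma example (Broad)"),
    stringsAsFactors = FALSE
)

# Keep the rows whose study ID matches `pattern` (case-insensitive).
find_studies <- function(studies, pattern) {
    hits <- grepl(pattern, studies$studyId, ignore.case = TRUE)
    studies[hits, c("studyId", "name"), drop = FALSE]
}

tcga_hits <- find_studies(studies_example, "tcga")
tcga_hits
```

With the real table, the same call would be `find_studies(study, "skcm")`, assuming those two columns are present.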
Choose a particular cancer study by its TCGA study ID (GDC portal). The sampleLists function returns the sample lists of the selected study from cbio (the SKCM-TCGA study, skcm_tcga, is the example here).
The returned table contains a sampleListId column alongside the study ID and description.
sample <- sampleLists(cbio, studyId = "skcm_tcga")
colnames(sample)
[1] "category" "name" "description" "sampleListId"
[5] "studyId"
table(sample$category)
# all_cases_in_study
# 1
# all_cases_with_cna_data
# 1
# all_cases_with_methylation_data
# 2
# all_cases_with_mrna_rnaseq_data
# 1
# all_cases_with_mutation_and_cna_and_mrna_data
# 1
# all_cases_with_mutation_and_cna_data
# 1
# all_cases_with_mutation_data
# 1
# all_cases_with_rppa_data
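The category column can be used to pick the sampleListId for a later download. The following sketch works on a toy table with the column names printed above; the pairing of categories with IDs in the toy rows and the helper `sample_list_for` are assumptions for illustration.

```r
# Toy stand-in for the sampleLists() result, using the column names shown
# by colnames(sample). The category-to-ID pairing here is illustrative.
sample_lists_example <- data.frame(
    category = c("all_cases_in_study",
                 "all_cases_with_mutation_and_cna_and_mrna_data"),
    sampleListId = c("skcm_tcga_all", "skcm_tcga_3way_complete"),
    studyId = "skcm_tcga",
    stringsAsFactors = FALSE
)

# Return the sampleListId value(s) for one category, failing loudly
# when the category is absent.
sample_list_for <- function(sample_lists, category) {
    ids <- sample_lists$sampleListId[sample_lists$category == category]
    if (length(ids) == 0) {
        stop("no sample list with category: ", category)
    }
    ids
}

complete_id <- sample_list_for(
    sample_lists_example,
    "all_cases_with_mutation_and_cna_and_mrna_data"
)
```

A category can map to more than one sample list (the methylation category above has two), so the helper returns a vector rather than a single value.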
The cBioPortalData function allows users to download sections of the data for molecular profile and gene panel combinations within a study.
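The molecular profile and gene panel identifiers for such a download can be looked up from the API first. The sketch below wraps the package's molecularProfiles and genePanels functions in a hypothetical helper, `lookup_ids`; the column names `molecularProfileId` and `genePanelId` are assumed from the tables those functions return, and calling the helper requires network access.

```r
# Collect the identifiers that a cBioPortalData() call can refer to.
# `api` is the object created by cBioPortal(); `study_id` is a study ID
# such as "skcm_tcga".
lookup_ids <- function(api, study_id) {
    profiles <- cBioPortalData::molecularProfiles(api, studyId = study_id)
    panels <- cBioPortalData::genePanels(api)
    list(
        molecularProfileIds = profiles[["molecularProfileId"]],
        genePanelIds = panels[["genePanelId"]]
    )
}

# Usage (queries the cBioPortal server, so it is left commented out here):
# ids <- lookup_ids(cbio, "skcm_tcga")
# "skcm_tcga_mutations" %in% ids$molecularProfileIds
```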
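# Request both molecular profiles that appear in the printed result.
SKCM <- cBioPortalData(api = cbio, studyId = "skcm_tcga", by = "hugoGeneSymbol",
    molecularProfileIds = c("skcm_tcga_mutations", "skcm_tcga_rna_seq_v2_mrna"),
    sampleListId = "skcm_tcga_3way_complete",
    genePanelId = "IMPACT341")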
SKCM
#> A MultiAssayExperiment object of 2 listed
#> experiments with user-defined names and respective classes.
#> Containing an ExperimentList class object of length 2:
#> [1] skcm_tcga_mutations: RangedSummarizedExperiment with 3798 rows and 283 columns
#> [2] skcm_tcga_rna_seq_v2_mrna: SummarizedExperiment with 341 rows and 287 columns
#> Functionality:
#> experiments() - obtain the ExperimentList instance
#> colData() - the primary/phenotype DataFrame
#> sampleMap() - the sample coordination DataFrame
#> `$`, `[`, `[[` - extract colData columns, subset, or experiment
#> *Format() - convert into a long or wide DataFrame
#> assays() - convert ExperimentList to a SimpleList of matrices
#> exportClass() - save data to flat files
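The accessors listed under Functionality can be tried on a small object without downloading anything. The following is a minimal sketch that builds a toy MultiAssayExperiment; the matrix values, sample names, and the `stage` annotation are made up for illustration and are not SKCM data.

```r
library(MultiAssayExperiment)

# Toy expression matrix: 2 genes x 3 samples (illustrative values only).
expr <- matrix(
    c(5.1, 0.3, 4.8, 0.9, 6.2, 0.1),
    nrow = 2,
    dimnames = list(c("BRAF", "NRAS"), c("s1", "s2", "s3"))
)

# Per-sample annotations. Row names match the assay column names so that
# the sample map can be derived automatically.
pheno <- S4Vectors::DataFrame(
    stage = c("I", "II", "III"),
    row.names = c("s1", "s2", "s3")
)

toy <- MultiAssayExperiment(
    experiments = list(expression = expr),
    colData = pheno
)

names(experiments(toy))   # names of the contained experiments
colData(toy)$stage        # one phenotype column
sampleMap(toy)            # links assay columns to colData rows
toy[["expression"]]       # extract a single experiment
```

The same calls apply to the downloaded object, for example `SKCM[["skcm_tcga_mutations"]]` to extract the mutation experiment.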