
How to manipulate an existing clustering

Elizabeth Purdom edited this page Aug 22, 2016 · 1 revision

In what follows, a 'clustering' is an assignment of samples to clusters.

Changing properties of the clusters

If you just want to change the labels or colors of a particular cluster (as opposed to changing the clustering, i.e. the cluster assignments), you do not have to make a new clustering. You can change the values in the clusterLegend slot of the object using the clusterLegend function.

In the following, I create a small toy clusterExperiment object:

library(clusterExperiment)
mat <- matrix(data=rnorm(20*15), ncol=15)
mat[1,1]<- -1 #force a negative value
colnames(mat)<-paste("Sample",1:ncol(mat))
rownames(mat)<-paste("Gene",1:nrow(mat))
numLabels <- as.character(gl(5, 3))
numLabels[c(1:2)]<- c("-1","-2") #make sure some not assigned
numLabels<-factor(numLabels)
labMat<-cbind(as.numeric(as.character(numLabels)),as.numeric(as.character(numLabels)))
colnames(labMat)<-c("Cluster1","Cluster2")
cc <- clusterExperiment(mat, labMat, transformation = function(x){x})

I can look at the clusterLegend and see what the labels currently are, as well as the internal index for each cluster, and the default color assigned to the cluster:

clusterLegend(cc)

Here I change the names of the '-1' and '-2' clusters in the second clustering:

clusterLegend(cc)[[2]][1:2,"name"]<-c("unassigned","missing")
clusterLegend(cc)
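To make the indexing above concrete, here is a minimal base R sketch that mimics the legend as a list holding one character matrix per clustering. The column layout (clusterIds, color, name), the colors, and the name legendList are assumptions made for illustration; compare with the output of clusterLegend(cc) for the real layout.

```r
# Mock legend: one character matrix per clustering.
# The columns (clusterIds, color, name) are an assumed layout, not the package's API.
ids  <- c("-1", "-2", "1", "2", "3", "4", "5")
cols <- c("white", "grey", "blue", "green", "orange", "purple", "brown")
makeLegend <- function(ids, cols) cbind(clusterIds = ids, color = cols, name = ids)
legendList <- list(Cluster1 = makeLegend(ids, cols),
                   Cluster2 = makeLegend(ids, cols))

# Rename the first two rows of the second clustering only
legendList[[2]][1:2, "name"] <- c("unassigned", "missing")

legendList[[2]][, "name"]        # names changed in the second clustering
legendList[[1]][, "name"]        # first clustering is untouched
legendList[[2]][, "clusterIds"]  # ids, and hence the assignments, are unchanged
```

Only the display name changes; the ids that tie each row to the cluster assignments stay the same, which is why no new clustering is needed.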

Changing the actual clustering

If I instead want to monkey around with the cluster assignments of the second clustering in my clusterExperiment object and save the result as a new clustering, I can pull out the assignments, modify them, and add them back with addClusters:

x<-clusterMatrix(cc)[,2]
x[3:4]<-c(8,8)
cc<-addClusters(cc,x)
cc
clusterLabels(cc)
clusterTypes(cc)
clusterLegend(cc)

Notice that the internal indexing of the clusters does not match the numbers in x. This is because the internal indices are required to be consecutive integers in clusterExperiment, while the cluster ids given in x are kept as the cluster names. In this case '8' was assigned the internal index '1', and otherwise the internal indices closely match x. But the rearrangement could be more extensive than that, depending on what the algorithm does to get consecutive integers. This is why it doesn't make sense to make a new clustering merely to change the names or colors (unless you really want to have two versions of the same clustering hanging around because you are switching back and forth frequently).
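The renumbering can be pictured with a small base R sketch. It is only one possible scheme (positive ids renumbered in order of first appearance, negative ids left alone), and toConsecutive is a hypothetical helper, not a clusterExperiment function; the package's actual algorithm may order things differently.

```r
# Assignments as in the example: -1 and -2 mark unassigned samples
x <- c(-1, -2, 8, 8, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5)

# Hypothetical helper: map positive ids to 1, 2, ... in order of first
# appearance and keep the original ids as the cluster names.
toConsecutive <- function(ids) {
  positive <- ids > 0
  original <- unique(ids[positive])
  internal <- ids
  internal[positive] <- match(ids[positive], original)
  list(internal = internal,
       legend = cbind(clusterIds = as.character(seq_along(original)),
                      name = as.character(original)))
}

res <- toConsecutive(x)
res$internal  # consecutive internal indices; negatives unchanged
res$legend    # each internal index paired with the original id as its name
```

Under this assumed scheme the id 8 lands on internal index 1 while 2 through 5 keep their numbers, which is the kind of partial reshuffling described above.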

Then I can fix up the properties of the new clustering, which will help with plotting and with keeping track of it.

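clusterLabels(cc)[3]<-"My new cluster" #label of the third (new) clustering
clusterLegend(cc)[[3]][3,"name"]<-"my new cluster" #name of the cluster in row 3 of its legend
clusterLegend(cc)[[3]][3,"color"]<-"red" #color of that same cluster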