Cluster analysis in R: determine the optimal number of clusters
Being a newbie in R, I'm not very sure how to choose the best number of clusters to do a k-means analysis. After plotting a subset of below data, how many clusters will be appropriate? How can I perform cluster dendro analysis?
n = 1000
kk = 10
x1 = runif(kk)
y1 = runif(kk)
z1 = runif(kk)
x4 = sample(x1,length(x1))
y4 = sample(y1,length(y1))
randObs <- function()
{
ix = sample( 1:length(x4), 1 )
iy = sample( 1:length(y4), 1 )
rx = rnorm( 1, x4[ix], runif(1)/8 )
ry = rnorm( 1, y4[ix], runif(1)/8 )
return( c(rx,ry) )
}
x = c()
y = c()
for ( k in 1:n )
{
rPair = randObs()
x = c( x, rPair[1] )
y = c( y, rPair[2] )
}
z <- rnorm(n)
d <- data.frame( x, y, z )