Determining the Optimal Number of Clusters in Stream Data Using Real-Time Fuzzy c-Means
Clustering is a challenging task in data mining, especially for massive, continuously arriving data such as stream data, because it requires both a correct partition of the data and an accurate determination of the number of clusters. Finding the optimal number of clusters for stream data is essential for obtaining sensible results in cluster analysis. However, many clustering algorithms, such as k-means, k-means++, and fuzzy c-means, provide no general prior knowledge about the appropriate number of clusters against which a clustering result can be checked. We propose a novel algorithm, real-time fuzzy c-means, that determines the optimal number of clusters for stream data and uses a genetic algorithm to optimize the solution. The most appropriate number of clusters is selected using cluster validity indices available in the fuzzy clustering literature. We analyze the complexity of the data using a combination of compactness and separation to validate the clusters. We compare several validity indices, and our results come closer to the optimal number of clusters on both small and large data sets.
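To illustrate how a validity index built from compactness and separation can select the number of clusters, the following is a minimal sketch and not the proposed real-time algorithm. It assumes a plain batch fuzzy c-means, omits the genetic-algorithm step, and uses the Xie-Beni index as one example of such an index. The function names (`fuzzy_c_means`, `xie_beni`, `best_cluster_count`), the fuzzifier value m = 2, and the farthest-point initialization are illustrative assumptions.

```python
import numpy as np


def _sq_dists(X, centers):
    # Squared Euclidean distance from every point to every center, shape (n, c).
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def _memberships(X, centers, m):
    # Standard fuzzy c-means membership update; each row sums to 1.
    d2 = np.fmax(_sq_dists(X, centers), 1e-12)  # guard against zero distance
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6):
    """Batch fuzzy c-means (illustrative). Returns (centers, U), U of shape (n, c)."""
    X = np.asarray(X, dtype=float)
    # Assumption: deterministic farthest-point initialization, chosen for simplicity.
    centers = [X[0]]
    for _ in range(1, c):
        nearest = _sq_dists(X, np.array(centers)).min(axis=1)
        centers.append(X[np.argmax(nearest)])
    centers = np.array(centers)
    for _ in range(n_iter):
        Um = _memberships(X, centers, m) ** m
        new_centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(X, centers, m)


def xie_beni(X, centers, U, m=2.0):
    """Xie-Beni index: compactness / (n * separation). Lower is better."""
    X = np.asarray(X, dtype=float)
    compactness = ((U ** m) * _sq_dists(X, centers)).sum()
    center_d2 = _sq_dists(centers, centers)
    np.fill_diagonal(center_d2, np.inf)  # ignore each center's distance to itself
    separation = center_d2.min()
    return compactness / (X.shape[0] * separation)


def best_cluster_count(X, c_values, m=2.0):
    """Run fuzzy c-means for each candidate c and keep the lowest index value."""
    scores = {}
    for c in c_values:
        centers, U = fuzzy_c_means(X, c, m=m)
        scores[c] = xie_beni(X, centers, U, m=m)
    return min(scores, key=scores.get), scores
```

Calling `best_cluster_count(X, range(2, 7))` on an array `X` of shape (n, d) returns the candidate with the smallest index value together with all scores. In a streaming setting, the same scan would have to be applied to a window or summary of the data, which this sketch does not attempt.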