A baseline method for clustering.
We are going to use these 2 datasets from this project:
- The user sets the critical distance manually:
    dc = <number>
- Compute the distance matrix using Euclidean distance. Note that you can use another distance function, such as Manhattan distance.
    import numpy as np
    N = x.shape[0]  # x: NumPy array with one point per row
    M = np.zeros((N, N))
    for i in range(N):
        for j in range(i+1, N):
            M[j, i] = M[i, j] = (sum((x[i, :]-x[j, :])**2))**0.5
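The double loop above can also be written without explicit loops. The following is a minimal sketch, assuming the points are stored in a NumPy array with one point per row; the function name `distance_matrix` and its `metric` parameter are hypothetical and only serve this illustration. It also shows how the Manhattan distance mentioned above could be swapped in.

```python
import numpy as np

def distance_matrix(x, metric="euclidean"):
    """Return the symmetric N x N matrix of pairwise distances between rows of x."""
    x = np.asarray(x, dtype=float)
    # diff[i, j, :] = x[i, :] - x[j, :], built by broadcasting
    diff = x[:, None, :] - x[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    raise ValueError(f"unknown metric: {metric}")

# Three made-up points on a straight line, for illustration only
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
M = distance_matrix(points)
```

Broadcasting builds an N x N x d intermediate array, so this version trades memory for speed; for large N the loop version may be the safer choice.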
- Find the 'embryo': the two closest points
- Find their indexes: the pair with the minimum distance among all pairs
- Use 3 lists:
- list 1: points added earlier, together with their neighbors
- list 2: fresh points added in the last step, whose neighbors still need to be found
- list 3: the new neighbors of the points in list 2
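Finding the 'embryo' can be sketched as a search for the smallest off-diagonal entry of the distance matrix. This is only an illustration: the helper name `find_embryo` is hypothetical, and the small matrix is made-up data, not one of the project's datasets.

```python
import numpy as np

def find_embryo(M):
    """Return indexes (i, j), i < j, of the two closest points in distance matrix M."""
    D = np.array(M, dtype=float)  # copy, so the caller's matrix is left untouched
    np.fill_diagonal(D, np.inf)   # a point must not be paired with itself
    i, j = np.unravel_index(np.argmin(D), D.shape)
    return (int(min(i, j)), int(max(i, j)))

# Made-up 3 x 3 distance matrix: points 0 and 2 are the closest pair
M = np.array([[0.0, 5.0, 1.0],
              [5.0, 0.0, 4.0],
              [1.0, 4.0, 0.0]])
embryo = find_embryo(M)
```

If several pairs share the same minimum distance, `np.argmin` returns the first one in row-major order.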
Now, how the lists work:
1st step:
- list 1 is empty
- list 2 holds the 'embryo'
- list 3 receives the new neighbors
2nd step:
- the points of list 2 move to list 1
- list 3 becomes list 2
- list 3 is emptied
And so on, until no new neighbors are found.
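The list rotation described above could look roughly like the sketch below. It is not the project's actual code, and it rests on assumptions the text does not spell out: a point counts as a neighbor when its distance is at most `dc`; once a cluster stops growing, a new 'embryo' is searched among the still unassigned points; and leftover points with no partner within `dc` each form their own cluster. The function name `grow_clusters` is hypothetical.

```python
import numpy as np

def grow_clusters(M, dc):
    """Label N points by growing clusters from 'embryos' with three lists.

    M is an N x N distance matrix, dc the critical distance.
    Returns a list of N integer cluster labels.
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    labels = [-1] * n            # -1 means "not assigned yet"
    cluster = 0
    while True:
        free = [p for p in range(n) if labels[p] == -1]
        if not free:
            break
        # Embryo: the closest pair among the still unassigned points.
        best = None
        for pos, a in enumerate(free):
            for b in free[pos + 1:]:
                if best is None or M[a, b] < M[best[0], best[1]]:
                    best = (a, b)
        if best is None or M[best[0], best[1]] > dc:
            # Assumption: leftovers without a partner within dc are singletons.
            for p in free:
                labels[p] = cluster
                cluster += 1
            break
        list1 = []               # older points, kept only to mirror the description
        list2 = list(best)       # fresh points whose neighbors are still unknown
        for p in list2:
            labels[p] = cluster
        while list2:
            list3 = []           # new neighbors of the points in list 2
            for p in list2:
                for q in range(n):
                    if labels[q] == -1 and M[p, q] <= dc:
                        labels[q] = cluster
                        list3.append(q)
            list1.extend(list2)  # the points of list 2 move to list 1
            list2 = list3        # list 3 becomes list 2; a fresh list 3 is made next pass
        cluster += 1
    return labels

# Made-up 1-D points, for illustration only
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 30.0])
M = np.abs(x[:, None] - x[None, :])
labels = grow_clusters(M, dc=1.5)
```

On this toy data, a small `dc` keeps the three groups apart, while a `dc` larger than the widest gap merges everything into one cluster, which matches how the number of clusters shrinks as `dc` grows.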
Dataset 1:
- if dc = 5 we have 3 clusters
- if dc = 6 we have 2 clusters
- if dc = 7 we have 1 cluster
Dataset 2:
- if dc = 1.5 we have 4 clusters
- if dc = 2 we have 3 clusters
- if dc = 5 we have 2 clusters
You can also find other ready-made tools for analysis, such as QGIS.
You can use Python with a data science distribution: Anaconda or Miniconda. Another option is Portable Python. You can also use any IDE for Python.
Free