K-Means clustering is an unsupervised machine learning algorithm that partitions n instances into k clusters by similarity. Because it is unsupervised, the training instances do not have labels. To illustrate how the algorithm works, I will use a very simple unlabeled dataset with three features.
Instance | x1 | x2 | x3 |
---|---|---|---|
1 | 12 | 40 | 30 |
2 | 16 | 35 | 29 |
3 | 14 | 37 | 25 |
4 | 18 | 36 | 21 |
5 | 12 | 40 | 31 |
For the sake of simplicity, I will divide the instances into only two clusters. To partition the dataset into two clusters, you need two instances to serve as the initial centroids. Let us take instance 1 as the centroid of cluster-0 and instance 2 as the centroid of cluster-1.
cluster-0={1}
It means that instance 1 is in cluster-0.
cluster-1={2}
It means that instance 2 is in cluster-1.
Now let us see which cluster instance 3 will belong to. To decide, you need to calculate the distance from instance 1 (the cluster-0 centroid) to instance 3, and from instance 2 (the cluster-1 centroid) to instance 3.
The Euclidean distances are:
d(1, 3) = sqrt((12-14)^2 + (40-37)^2 + (30-25)^2) = sqrt(38)
d(2, 3) = sqrt((16-14)^2 + (35-37)^2 + (29-25)^2) = sqrt(24)
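These hand calculations are easy to verify in code. Below is a minimal sketch of a Euclidean distance function (the `euclidean` helper is my own, not a library function), applied to instances 1, 2, and 3 from the table:

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared feature differences.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclidean([12, 40, 30], [14, 37, 25]))  # d(1, 3) = sqrt(38) ≈ 6.164
print(euclidean([16, 35, 29], [14, 37, 25]))  # d(2, 3) = sqrt(24) ≈ 4.899
```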
Instance 3 is closer to instance 2 than to instance 1, so it joins cluster-1, and the clusters become:
cluster-0={1}
cluster-1={2,3}
Furthermore, a new centroid is calculated for cluster-1; call it c1:
c1 = ((16+14)/2, (35+37)/2, (29+25)/2) = (15, 36, 27)
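The centroid update is simply the feature-wise mean of the cluster's members, which can be sketched in plain Python (the `centroid` helper is my own):

```python
def centroid(points):
    # Average each feature across all points in the cluster.
    return tuple(sum(c) / len(points) for c in zip(*points))

print(centroid([[16, 35, 29], [14, 37, 25]]))  # (15.0, 36.0, 27.0)
```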
To decide which cluster instance 4 will belong to, you need to calculate the distance from instance 1 (the cluster-0 centroid) to instance 4, and from c1 (the cluster-1 centroid) to instance 4.
d(1, 4) = sqrt((12-18)^2 + (40-36)^2 + (30-21)^2) = sqrt(133)
d(c1, 4) = sqrt((15-18)^2 + (36-36)^2 + (27-21)^2) = sqrt(45)
Instance 4 is closer to c1 than to instance 1, so it joins cluster-1, and the clusters become:
cluster-0={1}
cluster-1={2,3,4}
Again, a new centroid is calculated for cluster-1:
c1 = ((16+14+18)/3, (35+37+36)/3, (29+25+21)/3) = (16, 36, 25)
To decide which cluster instance 5 will belong to, you need to calculate the distance from instance 1 (the cluster-0 centroid) to instance 5, and from c1 (the cluster-1 centroid) to instance 5.
d(1, 5) = sqrt((12-12)^2 + (40-40)^2 + (30-31)^2) = sqrt(1)
d(c1, 5) = sqrt((16-12)^2 + (36-40)^2 + (25-31)^2) = sqrt(68)
Instance 5 is closer to instance 1 than to c1, so it joins cluster-0, and the clusters become:
cluster-0={1, 5}
cluster-1={2,3,4}
Now the centroid for cluster-0 is calculated:
((12+12)/2, (40+40)/2, (30+31)/2) = (12, 40, 30.5)
Finally, we have two clusters:
cluster-0={1, 5}
cluster-1={2,3,4}
And their centers are (12, 40, 30.5) and (16, 36, 25), respectively.
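The walkthrough above can be replicated with a short script. This is a minimal sketch of the sequential, one-pass assignment used in this example, where each new instance is assigned immediately and the receiving cluster's centroid is recomputed (standard K-Means instead reassigns all instances on every iteration until convergence); the helper names here are my own.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared feature differences.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def centroid(points):
    # Feature-wise mean of the cluster members.
    return tuple(sum(c) / len(points) for c in zip(*points))

X = [[12, 40, 30], [16, 35, 29], [14, 37, 25], [18, 36, 21], [12, 40, 31]]

# Seed cluster-0 with instance 1 and cluster-1 with instance 2 (0-based indices).
clusters = {0: [0], 1: [1]}
centroids = {0: tuple(X[0]), 1: tuple(X[1])}

# Assign each remaining instance to the nearest centroid,
# then recompute the centroid of the cluster that received it.
for i in range(2, len(X)):
    nearest = min(centroids, key=lambda c: euclidean(centroids[c], X[i]))
    clusters[nearest].append(i)
    centroids[nearest] = centroid([X[j] for j in clusters[nearest]])

for c in clusters:
    # Print 1-based instance numbers to match the walkthrough.
    print(f"cluster-{c}:", [j + 1 for j in clusters[c]], "center:", centroids[c])
```

Running this prints cluster-0 = {1, 5} with center (12, 40, 30.5) and cluster-1 = {2, 3, 4} with center (16, 36, 25), matching the hand calculations.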
Python code using scikit-learn
```python
from sklearn.cluster import KMeans

# The five instances from the table above.
X = [[12, 40, 30], [16, 35, 29], [14, 37, 25], [18, 36, 21], [12, 40, 31]]

# Two clusters, matching the walkthrough; a fixed random_state makes
# the randomized centroid initialization reproducible.
kmc = KMeans(n_clusters=2, random_state=0)
kmc.fit(X)

print(kmc.labels_)           # cluster label of each instance
print(kmc.cluster_centers_)  # the two cluster centers
```
After running the above code, you will get the cluster label of each instance and the two cluster centers as output.
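Note that scikit-learn numbers its clusters arbitrarily, so the printed labels may be 0/1-swapped relative to the walkthrough above. Once fitted, the model can also assign a new, unseen instance to the nearest learned center via `predict` (the sample point below is made up for illustration):

```python
# Assign a new instance to the nearest of the two learned centers.
print(kmc.predict([[13, 39, 29]]))
```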