Differences

This shows you the differences between two versions of the page.

en:iot-reloaded:k_means [2024/12/02 21:19] – [Silhouette Score] ktokarzen:iot-reloaded:k_means [2024/12/10 21:36] (current) pczekalski
Line 1: Line 1:
 ===== K-Means ===== ===== K-Means =====
  
-The first method discussed here is one of the most commonly used – K-means. K-means clustering is a method that splits the initial set of points (objects) into groups, using distance measure, which represents a distance from the given point of the group to the group's centre representing a group's prototype centroid. The result of the clustering is N points grouped into K clusters, where each point has assigned a cluster index, which means that the distance from the point of the cluster centroid is closer than the distance to any other centroids of other clusters. Distance measure employs Euclidian distance, which requires scaled or normalised data to avoid the dominance of a single dimension over others. +The first method discussed here is one of the most commonly used – K-means. K-means clustering is a method that splits the initial set of points (objects) into groups, using distance measure, representing a distance from the given point of the group to the group's centre representing a group's prototypecentroid. The result of the clustering is N points grouped into K clusters, where each point has assigned a cluster index, which means that the distance from the point of the cluster centroid is closer than the distance to any other centroids of other clusters. Distance measure employs Euclidian distance, which requires scaled or normalised data to avoid the dominance of a single dimension over others. 
 The algorithm steps schematically are represented in the following figure {{ref>K-means_steps}}: The algorithm steps schematically are represented in the following figure {{ref>K-means_steps}}:
  
 <figure K-means_steps> <figure K-means_steps>
-{{ :en:iot-reloaded:clustering_1.png?600 | |  K-means steps}} +{{ :en:iot-reloaded:clustering_1.png?600 | K-means Steps}} 
-<caption> K-means steps </caption>+<caption> K-means Steps </caption>
 </figure> </figure>
  
 In the figure: In the figure:
-  * **STEP 1:** Initial data setwhere points do not belong to any of the clusters.+  * **STEP 1:** Initial data set where points do not belong to any of the clusters.
   * **STEP 2:** Cluster initial centres are selected randomly.   * **STEP 2:** Cluster initial centres are selected randomly.
-  * **STEP 3:** For each point, the closest cluster centre is selected, which is the point marker.+  * **STEP 3:** For each point, the closest cluster centre, which is the point marker, is selected.
   * **STEP 4:** Cluster mark is assigned to each point.   * **STEP 4:** Cluster mark is assigned to each point.
-  * **STEP 5:** The initial cluster centre is being refined to minimise the average distance to the cluster centre from each cluster point. As a result, cluster centres might not be physical points any more; instead, they become imaginary.+  * **STEP 5:** The initial cluster centre is being refined to minimise the average distance to it from each cluster point. As a result, cluster centres might no longer be physical points; instead, they become imaginary.
   * **STEP 6:** Cluster marks of the points are updated.   * **STEP 6:** Cluster marks of the points are updated.
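
The six steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation behind the figure: the function name ''kmeans'', its parameters (''max_iter'', ''seed'') and the sample points are assumptions made for this example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    # STEP 2: pick k distinct data points at random as the initial centres.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # STEPS 3-4 (STEP 6 on later passes): mark each point with its closest centre.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no mark changed, so the centres would not move either
        labels = new_labels
        # STEP 5: move each centre to the mean of the points marked with it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# STEP 1: hypothetical data set with two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = kmeans(points, k=2)
print(labels)     # cluster mark of each point
print(centroids)  # refined centres; generally not points of the data set
```

The loop stops as soon as a pass leaves every cluster mark unchanged, which is one common stopping rule; a limit on the number of iterations (''max_iter'') guards against long runs on larger data sets.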
  
Line 20: Line 20:
  
 <figure Euclidian distance> <figure Euclidian distance>
-{{ {{ :en:iot-reloaded:clustereq1.png?300 |  Euclidian distance}} +{{ {{ :en:iot-reloaded:clustereq1.png?300 |  Euclidian Distance}} 
-<caption> Euclidian distance </caption>+<caption> Euclidian Distance </caption>
 </figure> </figure>
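
As a hedged illustration of why Euclidian distance needs scaled or normalised data, the sketch below compares distances before and after min-max scaling. The helper names (''euclidean'', ''min_max_scale'') and the sensor readings are hypothetical and chosen only for this example; min-max scaling is one of several possible normalisation schemes.

```python
import math

def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # A constant column has zero span; use 1.0 to avoid division by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / sp for v, lo, sp in zip(r, lows, spans)) for r in rows]

# Hypothetical readings: (temperature in degrees C, pressure in Pa).
readings = [(20.0, 100000.0), (30.0, 100100.0), (21.0, 101000.0)]

# Raw data: the pressure column has far larger numbers and dominates the distance.
raw_01 = euclidean(readings[0], readings[1])
raw_02 = euclidean(readings[0], readings[2])

# Scaled data: both dimensions contribute on the same footing.
scaled = min_max_scale(readings)
scaled_01 = euclidean(scaled[0], scaled[1])
scaled_02 = euclidean(scaled[0], scaled[2])

print(raw_01, raw_02)
print(scaled_01, scaled_02)
```

On the raw values, the second reading looks much closer to the first than the third one does, purely because pressure is expressed in large numbers; after scaling, the two distances are the same.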
  
Line 35: Line 35:
  
 <figure K-means_example_1> <figure K-means_example_1>
-{{ {{ :en:iot-reloaded:Clustering_2.png?900 |  K-means example}} +{{ {{ :en:iot-reloaded:Clustering_2.png?900 |  K-means Example with Two Clusters}} 
-<caption> K-means example with two clusters </caption>+<caption> K-means Example with Two Clusters </caption>
 </figure> </figure>
  
Line 43: Line 43:
  
 <figure K-means_example_2> <figure K-means_example_2>
-{{ {{ :en:iot-reloaded:Clustering_3.png?900 |  K-means example}} +{{ {{ :en:iot-reloaded:Clustering_3.png?900 | K-means Example with Three Clusters}} 
-<caption> K-means example with three clusters </caption>+<caption> K-means Example with Three Clusters </caption>
 </figure> </figure>
  
Line 74: Line 74:
  
 <figure Elbow_example> <figure Elbow_example>
-{{ {{ :en:iot-reloaded:Clustering_4.png?600 |  Elbow example}} +{{ {{ :en:iot-reloaded:Clustering_4.png?600 | Elbow Example on Two Synthetic Data Sets}} 
-<caption> Elbow example on two synthetic data sets </caption>+<caption> Elbow Example on Two Synthetic Data Sets </caption>
 </figure> </figure>
  
Line 91: Line 91:
  
 <figure Silhouette score > <figure Silhouette score >
-{{ {{ :en:iot-reloaded:ClusterEq2.png?300 |  Silhouette score}} +{{ {{ :en:iot-reloaded:ClusterEq2.png?300 | Silhouette Score}} 
-<caption> Silhouette score </caption>+<caption> Silhouette Score </caption>
 </figure> </figure>
  
Line 107: Line 107:
     * Computing the SC for different values of NC, typically starting from NC=2 up to a reasonable maximum value (e.g., 10 or 20).     * Computing the SC for different values of NC, typically starting from NC=2 up to a reasonable maximum value (e.g., 10 or 20).
     * Plotting the SC values on the y-axis and the number of clusters NC on the x-axis.     * Plotting the SC values on the y-axis and the number of clusters NC on the x-axis.
-  - Observe the plot:+  - **Observe the plot:**
     * As the number of clusters NC increases, the SC shows different score values, which may or may not gradually decrease, as in the case of the "elbow" method.      * As the number of clusters NC increases, the SC shows different score values, which may or may not gradually decrease, as in the case of the "elbow" method. 
     * The main goal is to observe the maximum SC value and the corresponding NC value.     * The main goal is to observe the maximum SC value and the corresponding NC value.
Line 121: Line 121:
  
 <figure Silhouette_example> <figure Silhouette_example>
-{{ {{ :en:iot-reloaded:Clustering_5.png?400 |  Elbow example}} +{{ {{ :en:iot-reloaded:Clustering_5.png?400 | Silhouette Example on a Synthetic Data Set}} 
-<caption> Silhouette example on a synthetic data set </caption>+<caption> Silhouette Example on a Synthetic Data Set </caption>
 </figure> </figure>
  
 The user should look for the highest score, which in this case is for the 3-cluster option.  The user should look for the highest score, which in this case is for the 3-cluster option. 
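
The procedure of computing the SC for several values of NC and picking the maximum can be sketched as follows. This is an illustration under stated assumptions: the score uses the commonly cited definition s = (b - a) / max(a, b), which is assumed to match the formula in the figure; single-point clusters are scored 0 by convention; the centres are initialised with a deterministic farthest-point rule instead of random selection so that the run is reproducible. All function names and the synthetic data are hypothetical.

```python
import math

def farthest_point_init(points, k):
    """Deterministic initial centres (an assumption, replacing random selection)."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    return centres

def kmeans_labels(points, k, max_iter=100):
    """Assign points to the closest centre and refine centres until marks stop changing."""
    centres = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette_score(points, labels):
    """Mean of s = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the score needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            continue  # single-point cluster contributes s = 0
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical synthetic data: three compact groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]

scores = {nc: silhouette_score(points, kmeans_labels(points, nc)) for nc in range(2, 6)}
best_nc = max(scores, key=scores.get)
for nc, sc in scores.items():
    print(nc, round(sc, 3))
print("best NC:", best_nc)
```

Plotting ''scores'' with NC on the x-axis and SC on the y-axis would give the kind of curve described above; here the dictionary is simply scanned for its maximum.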
en/iot-reloaded/k_means.1733174345.txt.gz · Last modified: 2024/12/02 21:19 by ktokarz
CC Attribution-Share Alike 4.0 International