This shows you the differences between two versions of the page.
| Both sides previous revision / Previous revision / Next revision | Previous revision | ||
| en:iot-reloaded:k_means [2024/12/02 21:19] – [Silhouette Score] ktokarz | en:iot-reloaded:k_means [2024/12/10 21:36] (current) – pczekalski | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ===== K-Means ===== | ===== K-Means ===== | ||
| - | The first method discussed here is one of the most commonly used – K-means. K-means clustering is a method that splits the initial set of points (objects) into groups, using distance measure, | + | The first method discussed here is one of the most commonly used – K-means. K-means clustering is a method that splits the initial set of points (objects) into groups, using distance measure, |
| The algorithm steps are represented schematically in the following figure {{ref> | The algorithm steps are represented schematically in the following figure {{ref> | ||
| <figure K-means_steps> | <figure K-means_steps> | ||
| - | {{ : | + | {{ : |
| - | < | + | < |
| </ | </ | ||
| In the figure: | In the figure: | ||
| - | * **STEP 1: | + | * **STEP 1: |
| * **STEP 2: | * **STEP 2: | ||
| - | * **STEP 3:** For each point, the closest cluster centre | + | * **STEP 3:** For each point, the closest cluster centre, which is the point marker, is selected. |
| * **STEP 4: | * **STEP 4: | ||
| - | * **STEP 5:** The initial cluster centre is being refined to minimise the average distance to the cluster centre | + | * **STEP 5:** The initial cluster centre is being refined to minimise the average distance to it from each cluster point. As a result, cluster centres might no longer |
| * **STEP 6: | * **STEP 6: | ||
| Line 20: | Line 20: | ||
| <figure Euclidian distance> | <figure Euclidian distance> | ||
| - | {{ {{ : | + | {{ {{ : |
| - | < | + | < |
| </ | </ | ||
| Line 35: | Line 35: | ||
| <figure K-means_example_1> | <figure K-means_example_1> | ||
| - | {{ {{ : | + | {{ {{ : |
| - | < | + | < |
| </ | </ | ||
| Line 43: | Line 43: | ||
| <figure K-means_example_2> | <figure K-means_example_2> | ||
| - | {{ {{ : | + | {{ {{ : |
| - | < | + | < |
| </ | </ | ||
| Line 74: | Line 74: | ||
| <figure Elbow_example> | <figure Elbow_example> | ||
| - | {{ {{ : | + | {{ {{ : |
| - | < | + | < |
| </ | </ | ||
| Line 91: | Line 91: | ||
| <figure Silhouette score > | <figure Silhouette score > | ||
| - | {{ {{ : | + | {{ {{ : |
| - | < | + | < |
| </ | </ | ||
| Line 107: | Line 107: | ||
| * Computing the SC for different values of NC, typically starting from NC=2 up to a reasonable maximum value (e.g., 10 or 20). | * Computing the SC for different values of NC, typically starting from NC=2 up to a reasonable maximum value (e.g., 10 or 20). | ||
| * Plotting the SC values on the y-axis and the number of clusters NC on the x-axis. | * Plotting the SC values on the y-axis and the number of clusters NC on the x-axis. | ||
| - | - Observe the plot: | + | - **Observe the plot:** |
| * As the number of clusters NC increases, the SC shows different score values, which may or may not gradually decrease, as in the case of the " | * As the number of clusters NC increases, the SC shows different score values, which may or may not gradually decrease, as in the case of the " | ||
| * The main goal is to observe the maximum SC value and the corresponding NC value. | * The main goal is to observe the maximum SC value and the corresponding NC value. | ||
| Line 121: | Line 121: | ||
| <figure Silhouette_example> | <figure Silhouette_example> | ||
| - | {{ {{ : | + | {{ {{ : |
| - | < | + | < |
| </ | </ | ||
| The user should look for the highest score, which in this case is for the 3-cluster option. | The user should look for the highest score, which in this case is for the 3-cluster option. | ||
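
The selection procedure described above (compute the SC for a range of NC values, then take the NC with the highest SC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the page's own implementation: the toy data set of three small 2-D groups, the helper names (`farthest_first_centres`, `k_means`, `silhouette_score`), the NC range of 2 to 6, and the deterministic "farthest-first" choice of initial centres are all hypothetical choices made to keep the example small and repeatable. The distance measure is Euclidean, and a point that ends up alone in its cluster is given a silhouette value of 0, which is a common convention.

```python
import math


def farthest_first_centres(points, k):
    """Deterministic initial centres (an assumption of this sketch):
    start at the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen centre."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    return centres


def k_means(points, k, max_iter=100):
    """Plain K-means: assign each point to its closest centre, then move
    each centre to the mean of its points, until assignments stop changing."""
    centres = farthest_first_centres(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed its cluster
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster became empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette value over all points (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: silhouette value taken as 0
        # a: mean distance to the other points of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (hypothetical): three tight groups of five points each.
GROUP_CENTRES = [(0, 0), (10, 0), (5, 9)]
OFFSETS = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
points = [(cx + dx, cy + dy) for cx, cy in GROUP_CENTRES for dx, dy in OFFSETS]

# Compute the SC for NC = 2..6 and pick the NC with the maximum SC.
scores = {nc: silhouette_score(points, k_means(points, nc)) for nc in range(2, 7)}
best_nc = max(scores, key=scores.get)

for nc, sc in scores.items():
    print(f"NC={nc}: SC={sc:.3f}")
print("NC with the highest SC:", best_nc)
```

On this toy data the highest SC is obtained for NC=3, matching the three groups the data was built from. On real data, random initialisation with several restarts is the more usual choice, and the SC curve should be read together with domain knowledge rather than taken as the only criterion.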