DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) employs density measures to mark points in high-density regions and those in low-density regions – the noise. Because of this natural behaviour of the algorithm, it is particularly useful in signal processing and similar application domains. [1].

One of the essential concepts is the point's p neighbourhood, which is the set of points reachable within the user-defined distance eps (epsilon):

  Point's Neighbourhood

where:

The algorithm treats different points differently depending on density and neighbouring points distribution around the point – its neighbourhood:

  1. Core Points:
    • A point is a core point if it has at least MinPts neighbours within a distance eps, where MinPts and eps are user-defined parameters, i.e. N(p) ≥ MinPts.
  2. Directly Density-Reachable points:
    • A point is directly density-reachable from a core point if it lies within the distance eps of the core point.
  3. Border Points:
    • A border point is not a core point but within the eps distance of a core point. Border points are part of a cluster but do not have enough neighbours to be core points.
  4. Noise points:
    • Points that are not core and are not reachable from any core point are considered noise or outliers.
  DBSCAN Concepts
Figure 1: DBSCAN Concepts

DBSCAN is excellent for discovering clusters in data with noise, especially when clusters are not circular or spherical.

Some application examples (figures 2 and 3):

  DBSCAN Example
Figure 2: DBSCAN Example: Eps = 1.0, 13 clusters and 96 noise points
  DBSCAN Example
Figure 3: DBSCAN Example: Eps = 1.5, 3 clusters and 8 noise points

A typical application in signal processing (figure 4):

  DBSCAN Example
Figure 4: DBSCAN Example: Eps = 0.2, 3 Clusters and 84 Noise Points

Selecting eps and MinPts values

Usually, MinPts is selected using some prior knowledge of the data and its internal structure. If it is done, the following steps might be applied:

  Selecting MinPts
Figure 5: Selecting MinPts

The red horizontal line shows a possible eps value, around 0,045.


[1] Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei (1996). Simoudis, Evangelos; Han, Jiawei; Fayyad, Usama M. (eds.). A density-based algorithm for discovering clusters in large spatial databases with noise (PDF). Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226–231. CiteSeerX 10.1.1.121.9220. ISBN 1-57735-004-9.