Many scientific problems involve data that is embedded in a space with periodic boundary conditions. For instance, this can be related to an inherent cyclic or rotational symmetry in the data or a spatially extended periodicity. When analyzing such data, well-tailored methods are needed to obtain efficient approaches that obey the periodic boundary conditions of the problem. In this work, we present a method for applying a clustering algorithm to data embedded in a periodic domain based on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, a widely used unsupervised machine learning method that identifies clusters in data. The proposed method internally leverages the conventional DBSCAN algorithm for domains with open boundaries, and as such, it remains compatible with all optimized implementations for neighborhood searches in open domains. In this way, it retains the same optimized runtime complexity of O (N log N). We demonstrate the workings of the proposed method using synthetic data in one, two and three dimensions and also apply it to a real-world example involving the clustering of bubbles in a turbulent flow. The proposed approach is implemented in a ready-to-use Python package that is publicly available.
DBSCAN in Domains with Periodic Boundary Conditions
Gabbana, AlessandroUltimo
2025
Abstract
Many scientific problems involve data that is embedded in a space with periodic boundary conditions. For instance, this can be related to an inherent cyclic or rotational symmetry in the data or a spatially extended periodicity. When analyzing such data, well-tailored methods are needed to obtain efficient approaches that obey the periodic boundary conditions of the problem. In this work, we present a method for applying a clustering algorithm to data embedded in a periodic domain based on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, a widely used unsupervised machine learning method that identifies clusters in data. The proposed method internally leverages the conventional DBSCAN algorithm for domains with open boundaries, and as such, it remains compatible with all optimized implementations for neighborhood searches in open domains. In this way, it retains the same optimized runtime complexity of O (N log N). We demonstrate the workings of the proposed method using synthetic data in one, two and three dimensions and also apply it to a real-world example involving the clustering of bubbles in a turbulent flow. The proposed approach is implemented in a ready-to-use Python package that is publicly available.| File | Dimensione | Formato | |
|---|---|---|---|
|
6895d80c7f6a9.pdf
accesso aperto
Descrizione: versione editoriale
Tipologia:
Full text (versione editoriale)
Licenza:
Creative commons
Dimensione
2.44 MB
Formato
Adobe PDF
|
2.44 MB | Adobe PDF | Visualizza/Apri |
I documenti in SFERA sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


