DBSCAN in Domains with Periodic Boundary Conditions

Gabbana, Alessandro (last author)
2025

Abstract

Many scientific problems involve data that is embedded in a space with periodic boundary conditions. For instance, this can be related to an inherent cyclic or rotational symmetry in the data or a spatially extended periodicity. When analyzing such data, well-tailored methods are needed to obtain efficient approaches that obey the periodic boundary conditions of the problem. In this work, we present a method for applying a clustering algorithm to data embedded in a periodic domain based on the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, a widely used unsupervised machine learning method that identifies clusters in data. The proposed method internally leverages the conventional DBSCAN algorithm for domains with open boundaries, and as such, it remains compatible with all optimized implementations for neighborhood searches in open domains. In this way, it retains the same optimized runtime complexity of O(N log N). We demonstrate the workings of the proposed method using synthetic data in one, two, and three dimensions and also apply it to a real-world example involving the clustering of bubbles in a turbulent flow. The proposed approach is implemented in a ready-to-use Python package that is publicly available.
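
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one common way to reuse an open-boundary DBSCAN in a periodic box. It is not the authors' implementation, nor the API of their package. The assumptions are: the domain is an axis-aligned box `[0, box[k])` on each axis, `eps` is smaller than half of every box length, points near a boundary are copied as periodic images into an `eps`-wide band outside the box, and clusters found on both sides are merged afterwards. The names `dbscan_open` and `dbscan_periodic` are hypothetical, and the brute-force neighbor search (O(N^2)) is used only to keep the example self-contained.

```python
from itertools import product
from math import dist


def dbscan_open(points, eps, min_samples):
    """Brute-force DBSCAN with open boundaries: returns (labels, is_core); -1 is noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= eps] for i in range(n)]
    is_core = [len(nb) >= min_samples for nb in nbrs]  # a point counts itself
    labels, cluster = [-1] * n, 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i], stack = cluster, [i]
        while stack:  # grow the cluster outward from core points only
            for q in nbrs[stack.pop()]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels, is_core


def dbscan_periodic(points, eps, min_samples, box):
    """Hypothetical helper: periodic clustering via image padding plus label merging."""
    n, dim = len(points), len(box)
    padded, source = [tuple(p) for p in points], list(range(n))
    for i, p in enumerate(points):
        for shift in product((-1, 0, 1), repeat=dim):
            if not any(shift):
                continue  # zero shift is the original point itself
            img = tuple(p[k] + shift[k] * box[k] for k in range(dim))
            # keep only images that fall inside the eps-wide band around the box
            if all(-eps <= img[k] <= box[k] + eps for k in range(dim)):
                padded.append(img)
                source.append(i)
    labels, is_core = dbscan_open(padded, eps, min_samples)

    parent = list(range(max(labels, default=-1) + 1))  # union-find over cluster ids

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    out = labels[:n]
    for j in range(n, len(padded)):
        i = source[j]
        if labels[j] == -1:
            continue
        if out[i] == -1:
            out[i] = labels[j]  # border point reached only across the boundary
        elif is_core[i]:
            parent[find(labels[j])] = find(out[i])  # one cluster seen on both sides
    roots = {}
    return [-1 if c == -1 else roots.setdefault(find(c), len(roots)) for c in out]


# 1D example: four points straddle the boundary of a box of length 10
pts = [(0.1,), (0.3,), (9.7,), (9.9,), (5.0,), (5.2,), (5.4,), (2.5,)]
print(dbscan_open(pts, 0.5, 3)[0])              # the boundary group is seen as noise
print(dbscan_periodic(pts, 0.5, 3, (10.0,)))    # the boundary group forms one cluster
```

Clusters are merged only through original points that are core points, since DBSCAN does not join two clusters that merely share a border point. In practice the brute-force `dbscan_open` would be replaced by an optimized open-boundary implementation (for example `sklearn.cluster.DBSCAN`, which also exposes core samples), which is what makes the padding idea attractive.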
2025
De Wit, Xander M.; Gabbana, Alessandro
Files in this item:
6895d80c7f6a9.pdf (open access)
Description: publisher's version
Type: Full text (publisher's version)
License: Creative Commons
Size: 2.44 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2597279