
Management of Safety-Critical AI Services in the Compute Continuum

Colombi, Lorenzo (first author); Tortonesi, Mauro (last author)
2025

Abstract

The increasing complexity and criticality of AI-driven services across the compute continuum, spanning from edge devices to cloud datacenters, necessitate resilient, intelligent, and explainable management strategies. This work addresses the challenges of deploying and orchestrating safety-critical AI services in dynamic and resource-constrained environments, such as Industry 5.0 and Human Assistance and Disaster Recovery (HADR). We present a suite of complementary solutions, including a cloud-native MLOps platform tailored for SMEs, a semi-supervised federated learning framework (FedEdge-Learn), and novel semantic communication mechanisms that optimize data transmission using LLM-driven embeddings. Furthermore, we introduce an intent-based Zero-touch Service Management (ZSM) architecture that leverages neuro-symbolic AI and collaborative intelligence to automate orchestration, model fine-tuning, and policy reasoning across federated Kubernetes clusters. These efforts pave the way for trustworthy, adaptive, and efficient AI service lifecycle management in environments characterized by disconnections, privacy constraints, and operational unpredictability. Future work will extend the neuro-symbolic approach to additional tasks, including dynamic node selection and optimized placement across the compute continuum, with the goal of improving resilience and interpretability in distributed, resource-constrained environments while addressing challenges such as intermittent connectivity and evolving operational conditions.
2025
ISBN: 978-3-903176-75-1
ISBN: 979-8-3315-9089-5
Keywords: Zero-touch Service Management, MLOps, Industry 5.0
Type: Post-print
License: NOT PUBLIC - Private/restricted access
Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2613503