Management of Safety-Critical AI Services in the Compute Continuum
Colombi, Lorenzo (first author); Tortonesi, Mauro (last author)
2025
Abstract
The increasing complexity and criticality of AI-driven services across the compute continuum, spanning from edge devices to cloud datacenters, necessitate resilient, intelligent, and explainable management strategies. This work addresses the challenges of deploying and orchestrating safety-critical AI services in dynamic and resource-constrained environments, such as Industry 5.0 and Human Assistance and Disaster Recovery (HADR). We present a suite of complementary solutions, including a cloud-native MLOps platform tailored for SMEs, a semi-supervised federated learning framework (FedEdge-Learn), and novel semantic communication mechanisms that optimize data transmission using LLM-driven embeddings. Furthermore, we introduce an intent-based Zero-touch Service Management (ZSM) architecture that leverages neuro-symbolic AI and collaborative intelligence to automate orchestration, model fine-tuning, and policy reasoning across federated Kubernetes clusters. These efforts pave the way for trustworthy, adaptive, and efficient AI service lifecycle management in environments characterized by disconnections, privacy constraints, and operational unpredictability. Future work focuses on extending the neuro-symbolic approach to support additional tasks, including dynamic node selection and optimized placement across the compute continuum. The goal is to improve resilience and interpretability in distributed, resource-constrained environments like Industry 5.0 and HADR, while addressing challenges such as intermittent connectivity and evolving operational conditions.

| File | Type | License | Size | Format |
|---|---|---|---|---|
| 1571190268.pdf (archive managers only) | Post-print | Not public (private/restricted access) | 362.56 kB | Adobe PDF |
Documents in SFERA are protected by copyright, and all rights are reserved unless otherwise indicated.


