Adversarial AI attack detection: a novel approach using explainable AI and deception mechanisms
Keywords:
Adversarial AI detection, adversarial training, deception mechanisms, explainable AI
Abstract
Detecting adversarial AI attacks has become a critical issue as AI systems become integral across industries, from healthcare to finance and transportation. Adversarial attacks exploit weaknesses in machine learning and deep learning models, and they have the potential to cause serious disruptions and severe threats to the integrity of AI operations. In this light, the discussion focuses on developing robust mechanisms for detecting adversarial inputs in real time, so that AI systems remain resilient against such sophisticated threats. Although existing adversarial AI defenses (input sanitization, anomaly detection, and adversarial training) provide important foundations, most approaches struggle to generalize across attack types or to operate in real time. This work introduces novelty by extending detection capabilities with explainable AI (XAI) and deception mechanisms. Adversarial activity is detected through adversarial training combined with honeypots and digital twins, while XAI keeps the detection process transparent. As honeypots and digital twins decoy attackers, observing attacker behavior further strengthens the detection methods. The results so far show substantial improvements in the detection of adversarial attacks in high-risk AI applications, demonstrate the efficacy of honeypots in capturing malicious behavior, and confirm the value of XAI for the interpretability and reliability of the detection process. These techniques enhance the robustness of AI systems against adversarial threats. The presented research contributes practical tools for cybersecurity professionals and AI practitioners defending against such attacks, offering new insights into AI for cybersecurity.
The novelty of the paper lies in the integration of adversarial training, XAI, and deception techniques, which offers a combined, interpretable, and effective method for detecting adversarial AI attacks across industry sectors.
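To make concrete the kind of input the proposed detector must recognize, the sketch below generates an adversarial example with the Fast Gradient Sign Method (FGSM), a standard attack used in adversarial training. The logistic-regression model, the parameter values, and the `fgsm_perturb` helper are illustrative assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """FGSM perturbation against a logistic classifier p = sigmoid(w.x + b).

    For binary cross-entropy loss, the gradient with respect to the
    input x is (p - y) * w, so the adversarial example is
        x_adv = x + eps * sign((p - y) * w).
    (Hypothetical helper for illustration only.)
    """
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))  # model's predicted probability
    grad = (p - y) * w                             # dLoss/dx for BCE loss
    return x + eps * np.sign(grad)

# Toy model and a point it correctly classifies as class 1
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])                 # score w.x + b = 1.5 > 0 -> class 1

x_adv = fgsm_perturb(x, w, b, y=1.0, eps=1.0)
# The small signed perturbation flips the score:
# w.x_adv + b = -1.5 < 0, so the model now predicts class 0.
```

A detector built on adversarial training would include such perturbed points (with their true labels) in the training set, so the model learns to classify or flag them rather than be fooled.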
License
Copyright (c) 2025 Maria NICULAE, Georgios KALPAKTSOGLOU, Anastasia TSIOTA, Giorgio BERNARDINETTI, Zacharenia LEKKA, Nikolaos Sachpelidis BROZOS, Panagiotis Radoglou GRAMMATIKIS, Ignacio LACALLE, Dionysios XENAKIS, Christos XENAKIS, Athanasia SABAZIOTI, Aristeidis FARAO, Mari-Anais SACHIAN, Vlad STANESCU, George SUCIU, Stylianos KARAGIANNIS

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.