Abstract: Security Information and Event Management (SIEM) systems aggregate log data from heterogeneous sources to detect coordinated attacks. Traditional rule-based correlation engines struggle to classify multi-step web application attacks because they examine each event without reference to the behavioural history of the originating host.
We present Smart-SIEM, an AI module for the open-source Wazuh SIEM platform. It makes two contributions: (1) a per-source-IP behavioural context vector that encodes HTTP response-status distributions, peak rule activation counts, and MITRE ATT&CK technique frequencies over the N most recent prior events from the same IP; and (2) a two-stage hybrid cascade that uses LightGBM for binary attack detection and XGBoost for six-class attack categorisation.
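The abstract does not say how the context vector is computed, so the following is only a minimal sketch under stated assumptions. Events are flat dicts with hypothetical keys `src_ip`, `status`, `rule_id`, and `mitre_ids`; the window size, status classes, and technique list are illustrative; and "peak rule activation count" is read as the largest number of times any single rule fired within the window. The sketch also shows cascade routing, where only events that the binary stage flags as attacks reach the six-class stage. The `detect` and `categorise` callables stand in for the fitted LightGBM and XGBoost models' `predict` methods (plus any label decoding), and the `"benign"` label is hypothetical.

```python
from collections import Counter, defaultdict, deque
from typing import Callable, Deque, Dict, List, Sequence

# Hypothetical feature vocabularies; the real module's lists may differ.
STATUS_CLASSES = ["2xx", "3xx", "4xx", "5xx"]
TECHNIQUES = ["T1110", "T1190", "T1078"]  # example MITRE ATT&CK technique IDs


class ContextTracker:
    """Keeps the N most recent prior events per source IP (N is an assumption)."""

    def __init__(self, n: int = 20):
        self.history: Dict[str, Deque[dict]] = defaultdict(lambda: deque(maxlen=n))

    def context_vector(self, event: dict) -> List[float]:
        past = self.history[event["src_ip"]]  # prior events from this IP only
        total = len(past) or 1  # avoid division by zero for a first-seen IP
        statuses = Counter(f"{e['status'] // 100}xx" for e in past)
        rules = Counter(e["rule_id"] for e in past)
        techniques = Counter(t for e in past for t in e.get("mitre_ids", []))
        vector = [statuses[c] / total for c in STATUS_CLASSES]  # status distribution
        vector.append(float(max(rules.values(), default=0)))  # peak firings of one rule
        vector += [techniques[t] / total for t in TECHNIQUES]  # technique frequencies
        past.append(event)  # the current event becomes history for the next one
        return vector


def cascade_predict(
    rows: Sequence[List[float]],
    detect: Callable[[List[List[float]]], Sequence[int]],
    categorise: Callable[[List[List[float]]], Sequence[str]],
) -> List[str]:
    """Stage 1 (binary) filters rows; only rows flagged 1 are sent to Stage 2."""
    labels = ["benign"] * len(rows)
    flags = detect(list(rows))
    attack_idx = [i for i, flag in enumerate(flags) if flag == 1]
    if attack_idx:
        categories = categorise([rows[i] for i in attack_idx])
        for i, category in zip(attack_idx, categories):
            labels[i] = category
    return labels
```

Computing the vector before appending the current event keeps the features strictly historical, so an event's own status code and rule never leak into its context. The bounded `deque` evicts the oldest event automatically once N prior events are stored.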
Evaluated on 46,454 purpose-built Wazuh security events, context features improve every gradient boosting algorithm tested: macro F1 rises from ~0.705 to 0.947-0.967 in Stage 1 and reaches 0.876-0.914 in Stage 2, average gains of +0.254 and +0.324 respectively. The hybrid cascade achieves F1 of 0.967 (binary) and 0.914 (six-class). Wazuh's native rule engine detects 0% of Brute Force and Broken Authentication events, whereas the AI module detects 100% and 98.3% respectively. A self-adaptive retraining mechanism recovers from concept drift: F1 drops from 0.905 to 0.465 when previously unseen attack types emerge and recovers to 0.814 after retraining on the combined corpus.
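The abstract describes the retraining mechanism only as "retraining on the combined corpus". The sketch below assumes one plausible trigger policy: score the deployed model on a newly labelled batch with macro F1 and, if the score falls below a hypothetical threshold of 0.8, refit on the old and new data together. `model` is any estimator with scikit-learn-style `fit` and `predict` methods, and `macro_f1` is written out so the reported metric (the unweighted mean of per-class F1) is explicit.

```python
from typing import Hashable, Protocol, Sequence, Tuple


class Model(Protocol):
    """Any estimator with scikit-learn-style fit/predict (assumption)."""

    def fit(self, X, y): ...

    def predict(self, X): ...


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    classes = set(y_true) | set(y_pred)
    if not classes:
        return 0.0
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Denominator is positive because c appears in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def maybe_retrain(
    model: Model, X_old, y_old, X_new, y_new, threshold: float = 0.8
) -> Tuple[bool, float]:
    """Refit on old + new data if macro F1 on the new labelled batch is below threshold."""
    score = macro_f1(list(y_new), list(model.predict(X_new)))
    if score >= threshold:
        return False, score
    model.fit(list(X_old) + list(X_new), list(y_old) + list(y_new))
    return True, score
```

Refitting on the combined corpus, rather than on the new batch alone, is intended to keep the previously learned attack classes represented in the training data while adding the new ones.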
| Comments: | 38 pages, 13 figures, 13 tables |
| Subjects: | Cryptography and Security (cs.CR); Machine Learning (cs.LG) |
| ACM classes: | C.2.0; K.6.5 |
| Cite as: | arXiv:2605.13337 [cs.CR] |
| | (or arXiv:2605.13337v1 [cs.CR] for this version) |
| | https://doi.org/10.48550/arXiv.2605.13337 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Badr Alboushy
[v1] Wed, 13 May 2026 10:54:36 UTC (1,177 KB)
