Abstract: Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while keeping inference efficient at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at a span graph size of N=32, the maximum in the Alibaba traces after preprocessing. Ablation studies further demonstrate the effectiveness of each component, especially under bursty traffic.
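For readers unfamiliar with the evaluation target, the following is a minimal sketch of how a p95 tail latency per time window and the MAPE of a forecast against it might be computed. It is not taken from the paper: the nearest-rank percentile definition, the function names `p95` and `mape`, and the sample windows and forecast values are all illustrative assumptions.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (assumed definition)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # 1-based nearest rank: ceil(0.95 * n), computed in integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical end-to-end latency samples (ms) for one API, grouped by time window.
windows = [
    list(range(1, 101)),    # samples 1..100 ms
    list(range(101, 201)),  # samples 101..200 ms
]
actual_p95 = [p95(w) for w in windows]

# Hypothetical multi-step forecast of the p95 for the same two windows.
forecast_p95 = [100.0, 190.0]
error_pct = mape(actual_p95, forecast_p95)
print(actual_p95, round(error_pct, 2))
```

Other percentile conventions (for example, linear interpolation between ranks) would give slightly different p95 values on small windows, so the definition should be fixed before comparing models.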
| Comments: | 12 pages, 5 figures, 4 tables, conference |
| Subjects: | Machine Learning (cs.LG); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2604.26422 [cs.LG] (or arXiv:2604.26422v1 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2604.26422 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Yongliang Ding
[v1] Wed, 29 Apr 2026 08:32:14 UTC (304 KB)
