STLGT: A Scalable Trace-Based Linear Graph Transformer for Tail Latency Prediction in Microservices

Abstract: Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while keeping inference efficient at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces, STLGT improves forecasting accuracy over PERT-GNN by 8.5% MAPE on average and achieves up to 12x faster CPU inference at span graph size N=32, the maximum size after preprocessing the Alibaba traces. Ablation studies further demonstrate the effectiveness of each component, especially under bursty traffic.
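
For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of how a p95 tail-latency target and a MAPE score are commonly computed. It is not taken from the paper: the nearest-rank percentile definition, the per-window grouping, the function names `p95` and `mape`, and all latency values are assumptions made for illustration only.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical end-to-end latencies (ms) of one API in two time windows.
windows = [
    list(range(1, 101)),   # 100 requests taking 1..100 ms
    [10] * 19 + [500],     # 20 requests, one of them slow
]

# p95 target per window, and made-up forecasts for the same windows.
targets = [p95(w) for w in windows]
forecasts = [90, 12]

score = mape(targets, forecasts)
print(targets, round(score, 2))
```

Under the nearest-rank definition assumed here, the single slow request in the second window falls above the 95th percentile, so that window's target stays at 10 ms. Other percentile conventions (for example, linear interpolation) would give a different value.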

Comments:	12 pages, 5 figures, 4 tables, conference
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2604.26422 [cs.LG]
	(or arXiv:2604.26422v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2604.26422 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yongliang Ding
[v1] Wed, 29 Apr 2026 08:32:14 UTC (304 KB)