Tracking the Truth: Object-Centric Spatio-Temporal Monitoring for Video Large Language Models — AI News