Diagnosing Training Inference Mismatch in LLM Reinforcement Learning — AI News