From Text to Voice: A Reproducible and Verifiable Framework for Evaluating Tool Calling LLM Agents — AI News