Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation — AI News