When is Warmstarting Effective for Scaling Language Models? — AI News