GAGPO: Generalized Advantage Grouped Policy Optimization — AI News