Main Results

PAPO consistently outperforms both GRPO and DAPO on all benchmarks, with the largest improvements on vision-dependent tasks. These results indicate that the approach mitigates the perception bottleneck in multimodal reasoning without requiring additional computational resources or external models. PAPO can therefore serve as a drop-in replacement for either GRPO or DAPO.