RL for Sokoban - PPO and GRPO

Agentic Planning for Long-Horizon Tasks

Andnet DeBoer

Northwestern University

Reinforcement Learning, Agentic Reasoning