This benchmark framework evaluates whether LLM agents can learn and adapt in complex stateful environments where actions modify persistent state, entities have cross-references, and workflows span ...