Robotouille

A challenging benchmark for testing LLM agent planning capabilities!
