A challenging benchmark for testing LLM agent planning capabilities!