Researcher Collab

About

Areas of Interest

RL, ML, Action model learning, PDDL

Integrating Reinforcement Learning, Action Model Learning, and Numeric Planning for Tackling Complex Tasks

Automated Planning algorithms require a model of the domain that specifies the preconditions and effects of each action. Obtaining such a domain model is notoriously hard. Algorithms for learning domain models exist, yet it remains unclear whether learning a domain model and planning is an effective approach for numeric planning environments, i.e., where states include discrete and numeric state variables. In this work, we explore the benefits of learning a numeric domain model and compare it with alternative model-free solutions. As a case study, we use two tasks in Minecraft, a popular sandbox game that has been used as an AI challenge. First, we consider an offline learning setting, where a set of expert trajectories is available to learn from. This is the standard setting for learning domain models. We used the Numeric Safe Action Model Learning (NSAM) algorithm to learn a numeric domain model and solve new problems with the learned domain model and a numeric planner. We call this model-based solution NSAM_(+p), and compare it to several model-free Imitation Learning (IL) and Offline Reinforcement Learning (RL) algorithms. Empirical results show that some IL algorithms can learn to solve simple tasks faster, while NSAM_(+p) allows solving tasks that require long-term planning and enables generalizing to solve problems in larger environments. Then, we consider an online learning setting, where learning is done by moving an agent in the environment. For this setting, we introduce RAMP. In RAMP, observations collected during the agent's execution are used to simultaneously train an RL policy and learn a planning domain action model. This forms a positive feedback loop between the RL policy and the learned domain model. We demonstrate experimentally the benefits of using RAMP, showing that it finds more efficient plans and solves more problems than several RL baselines.

Authors: Yarin Benyamin, Argaman Mordoch, Shahaf S. Shperberg, Roni Stern

Toward PDDL Planning Copilot

Large Language Models (LLMs) are increasingly being used as autonomous agents capable of performing complicated tasks. However, they lack the ability to perform reliable long-horizon planning on their own. This paper bridges this gap by introducing the Planning Copilot, a chatbot that integrates multiple planning tools and allows users to invoke them through instructions in natural language. The Planning Copilot leverages the Model Context Protocol (MCP), a recently developed standard for connecting LLMs with external tools and systems. This approach allows using any LLM that supports MCP without domain-specific fine-tuning. Our Planning Copilot supports common planning tasks such as checking the syntax of planning problems, selecting an appropriate planner, calling it, validating the plan it generates, and simulating the plan's execution. We empirically evaluate the ability of our Planning Copilot to perform these tasks using three open-source LLMs. The results show that the Planning Copilot substantially outperforms using the same LLMs without the planning tools. We also conducted a limited qualitative comparison of our tool against Chat GPT-5, a very recent commercial LLM. Our results show that our Planning Copilot significantly outperforms GPT-5 despite relying on a much smaller LLM. This suggests that dedicated planning tools may be an effective way to enable LLMs to perform planning tasks.

Authors: Yarin Benyamin, Argaman Mordoch, Shahaf S. Shperberg, Roni Stern

Crafting a Pogo Stick in Minecraft with Heuristic Search (Extended Abstract)

Minecraft is a widely popular video game renowned for its intricate environment. The game's open-ended design allows the creation of unique tasks and challenges for agents, providing a broad spectrum for researchers to experiment with different AI techniques and applications. Indeed, various Minecraft tasks have been posed as AI challenges. Most AI research on Minecraft has focused on either applying Reinforcement Learning (RL) to solve the problem, learning an action model for planning, or modeling the problem for a domain-independent planner. In this work, we focus on the combinatorial search aspect of solving the Craft Wooden Pogo task within the Polycraft World AI Lab (PAL) Minecraft environment. PAL is an interface to Minecraft that provides an API for AI agents to interact with Minecraft's environment and send commands to the main character. PAL supports symbolic observations of the current state, making it ideal for planning algorithms, which require a symbolic model of the environment for problem-solving. Other Minecraft research frameworks, such as MineRL, provide a visual, pixel-based representation of the game.

Authors: Yarin Benyamin, Argaman Mordoch, Shahaf Shperberg, Wiktor Piotrowski, Roni Stern