Curated repos, tools, and frameworks shaping the developer ecosystem.
Live data from GitHub.
by OpenPipe
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen3.6, GPT-OSS, Llama, and more!
W&B Training (Serverless RL) is the first publicly available service for flexibly training models with reinforcement learning. It manages your training and inference infrastructure automatically, letting you focus on defining your data, environment and reward functionโleading to faster feedback cycles, lower costs, and far less DevOps.
โจ Key Benefits:
# Before: Hours of GPU setup and infra management
# RuntimeError: CUDA error: out of memory ๐ข
# After: Serverless RL with instant feedback
from art.serverless.backend import ServerlessBackend
model = art.TrainableModel(
project="voice-agent",
name="agent-001",
base_model="Qwen/Qwen3.6-27B"
)
backend = ServerlessBackend(
api_key="your_wandb_api_key"
)
model.register(backend)
# Edit and iterate in minutes, not hours!
ART is an open-source RL framework that improves agent reliability by allowing LLMs to learn from experience. ART provides an ergonomic harness for integrating GRPO into any python application. For a quick hands-on introduction, run one of the notebooks below. When you're ready to learn more, check out the docs.
| Agent Task | Example Notebook | Description | Comparative Performance |
|---|---|---|---|
| ARTโขE [Serverless] | ๐๏ธ Train agent | Qwen 3.6 27B learns to search emails using RULER | |
| 2048 [Serverless] | ๐๏ธ Train agent | Qwen 3.6 27B learns to play 2048 | |
| ARTโขE LangGraph | ๐๏ธ Train agent | Qwen 2.5 7B learns to search emails using LangGraph | [Link coming soon] |
| MCPโขRL | ๐๏ธ Train agent | Qwen 2.5 3B masters the NWS MCP server | [Link coming soon] |
| Temporal Clue | ๐๏ธ Train agent | Qwen 2.5 7B learns to solve Temporal Clue | [Link coming soon] |
| Tic Tac Toe | ๐๏ธ Train agent | Qwen 2.5 3B learns to play Tic Tac Toe | |
| Codenames | ๐๏ธ Train agent | Qwen 2.5 3B learns to play Codenames | benchmarks |
| AutoRL [RULER] | ๐๏ธ Train agent | Train Qwen 2.5 7B to master any task | [Link coming soon] |
| Distillation (SFT) | ๐๏ธ Train model | Distill text-to-SQL from Qwen 3 235B to Qwen 3.6 27B | [Link coming soon] |
| Summarizer (SFT + RL) | ๐๏ธ Train model | Train a document summarizer with SFT warmup then RL | [Link coming soon] |
| SFT from a dataset | ๐๏ธ Train model | Fine-tune Qwen 3.6 27B on text-to-SQL from a dataset | [Link coming soon] |
Explore our latest research and updates on building SOTA agents.
ART agents can be trained from any client machine that runs python. To add to an existing project, run this command:
pip install openpipe-art
Curious about how to use ART for a real-world task? Check out the ARTโขE Agent blog post, where we detail how we trained Qwen 2.5 14B to beat o3 at email retrieval!

ART's functionality is divided into a client and a server. The OpenAI-compatible client is responsible for interfacing between ART and your codebase. Using the client, you can pass messages and get completions from your LLM as it improves. The server runs independently on any machine with a GPU. It abstracts away the complexity of the inference and training portions of the RL loop while allowing for some custom configuration. An outline of the training loop is shown below:
Inference
system, user, and assistant message is stored in a Trajectory.reward to its Trajectory, indicating the performance of the LLM.Training
This training loop runs until a specified number of inference and training iterations have completed.
ART should work with most vLLM/HuggingFace-transformers compatible causal language models, or at least the ones supported by Unsloth. Gemma 3 does not appear to be supported for the time being. If any other model isn't working for you, please let us know on Discord or open an issue on GitHub!
ART is in active development, and contributions are most welcome! Please see the CONTRIBUTING.md file for more information.
@misc{hilton2025art,
author = {Brad Hilton and Kyle Corbitt and David Corbitt and Saumya Gandhi and Angky William and Bohdan Kovalevskyi and Andie Jones},
title = {ART: Agent Reinforcement Trainer},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/openpipe/art}}
}
This repository's source code is available under the Apache-2.0 License.
ART stands on the shoulders of giants. While we owe many of the ideas and early experiments that led to ART's development to the open source RL community at large, we're especially grateful to the authors of the following projects:
Finally, thank you to our partners who've helped us test ART in the wild! We're excited to see what you all build with it.
Stable Diffusion web UI