Agent Reinforcement Trainer

Train multi-step agents for real-world tasks using GRPO.

🚀 W&B Training: Serverless RL

W&B Training (Serverless RL) is the first publicly available service for flexibly training models with reinforcement learning. It manages your training and inference infrastructure automatically, so you can focus on defining your data, environment, and reward function. The result is faster feedback cycles, lower costs, and far less DevOps.
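To make the role of the reward function concrete, here is a minimal sketch of the group-relative advantage step that GRPO is built around: several rollouts of the same task are scored, and each reward is normalized against its own group. This is an illustration only, not ART's implementation. The function name `group_relative_advantages`, the `eps` term, and the sample rewards are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same task.

    Hypothetical helper for illustration; eps avoids division by zero
    when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; using it here is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four rollouts of the same task
rewards = [0.0, 1.0, 0.5, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above their group's mean get a positive advantage and are reinforced, while those below the mean get a negative one. Only the relative ordering within a group matters, so the reward function does not need a calibrated absolute scale.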

✨ Key Benefits:

40% lower cost - Multiplexing on a shared production-grade inference cluster
28% faster training - Scale to 2000+ concurrent requests across many GPUs
Zero infra headaches - Fully managed infrastructure that stays healthy
Instant deployment - Every checkpoint instantly available via W&B Inference

# Before: Hours of GPU setup and infra management
# RuntimeError: CUDA error: out of memory 😢

# After: Serverless RL with instant feedback
import art
from art.serverless.backend import ServerlessBackend

# Describe the model to train: W&B project, run name, and base checkpoint
model = art.TrainableModel(
    project="voice-agent",
    name="agent-001",
    base_model="Qwen/Qwen3.6-27B",
)

# The serverless backend handles training and inference infrastructure
backend = ServerlessBackend(
    api_key="your_wandb_api_key"
)
model.register(backend)
# Edit and iterate in minutes, not hours!

Agent Task	Example Notebook	Description	Comparative Performance
ART•E [Serverless]	🏋️ Train agent	Qwen 3.6 27B learns to search emails using RULER	benchmarks
2048 [Serverless]	🏋️ Train agent	Qwen 3.6 27B learns to play 2048	benchmarks
ART•E LangGraph	🏋️ Train agent	Qwen 2.5 7B learns to search emails using LangGraph	[Link coming soon]
MCP•RL	🏋️ Train agent	Qwen 2.5 3B masters the NWS MCP server	[Link coming soon]
Temporal Clue	🏋️ Train agent	Qwen 2.5 7B learns to solve Temporal Clue	[Link coming soon]
Tic Tac Toe	🏋️ Train agent	Qwen 2.5 3B learns to play Tic Tac Toe	benchmarks
Codenames	🏋️ Train agent	Qwen 2.5 3B learns to play Codenames	benchmarks
AutoRL [RULER]	🏋️ Train agent	Train Qwen 2.5 7B to master any task	[Link coming soon]
Distillation (SFT)	🏋️ Train model	Distill text-to-SQL from Qwen 3 235B to Qwen 3.6 27B	[Link coming soon]
Summarizer (SFT + RL)	🏋️ Train model	Train a document summarizer with SFT warmup then RL	[Link coming soon]
SFT from a dataset	🏋️ Train model	Fine-tune Qwen 3.6 27B on text-to-SQL from a dataset	[Link coming soon]
