About this project

/watch

Give Claude the ability to watch any video.

Claude Code:

/plugin marketplace add bradautomates/claude-video
/plugin install watch@claude-video

claude.ai (web): download watch.skill and drop it into Settings → Capabilities → Skills.

Codex / generic skills:

git clone https://github.com/bradautomates/claude-video.git ~/.codex/skills/watch

Zero config to start — yt-dlp and ffmpeg install on first run via brew on macOS (Linux/Windows print exact commands). Captions cover most public videos for free. Whisper API key is only needed when a video has no captions.

Claude can read a webpage, run a script, browse a repo. What it can't do, out of the box, is watch a video. You paste a YouTube link and it has to either guess from the title or pull a transcript that's missing 90% of what's on screen.

With Claude Video /watch you can paste a URL or a local path, ask a question, and Claude downloads the video, extracts frames at an auto-scaled rate, pulls a timestamped transcript (free captions when available, Whisper API as fallback), and Reads every frame as an image. By the time it answers, it has seen the video and heard the audio.

/watch https://youtu.be/dQw4w9WgXcQ what happens at the 30 second mark?

Why this exists

I built this because I'm constantly using video to keep up with content. If I see a YouTube video that's blowing up, I want to know how the creator structured the hook — what's on screen in the first 3 seconds, what they said, why it worked. That used to mean watching it myself with a notepad. Now I just paste the URL and ask.

The other half is summarization. Most YouTube videos don't deserve 20 minutes of my attention. I hand the URL to Claude, it pulls the transcript, and tells me what actually happened. If the visual matters, frames come along too. If it's a podcast or a talking head, transcript is enough.

Claude is great at reading and synthesizing — but until now, video was the one input I couldn't hand it. Pasting a YouTube link got you nothing useful. closes that gap.

Duration	Default frame budget	What you get
≤30 s	~30 frames	Dense — basically every key moment
30 s - 1 min	~40 frames	Still dense
1 - 3 min	~60 frames	Comfortable
3 - 10 min	~80 frames	Sparse but workable
> 10 min	100 frames	"Sparse scan" warning — re-run focused

Surface	Install
Claude Code	`/plugin marketplace add bradautomates/claude-video` then `/plugin install watch@claude-video`
claude.ai (web)	Download `watch.skill` → Settings → Capabilities → Skills → `+`
Codex	`git clone https://github.com/bradautomates/claude-video.git ~/.codex/skills/watch`
Manual / dev	`git clone https://github.com/bradautomates/claude-video.git ~/.claude/skills/watch`

Capability	What you need	Cost
Download + native captions	`yt-dlp` + `ffmpeg`	Free
Whisper fallback (preferred)	Groq API key — `whisper-large-v3`	Cheap, fast
Whisper fallback (alt)	OpenAI API key — `whisper-1`	Standard pricing
Disable Whisper entirely	`--no-whisper`	Free, frames-only when no captions

Open Sources

claude-video

About this project

/watch

Why this exists

Related Projects

hermes-agent

yt-dlp

What people actually use it for

How it works

Frame budget — why it matters

Install

Claude Code

claude.ai (web)

Codex

Manual (developer)

First run

Bring your own keys

Usage

Limits

Structure

Develop

Open source

stable-diffusion-webui

Open Sources

We read 100+ sources so you don't have to.

claude-video

About this project

/watch

Why this exists

Related Projects

hermes-agent

yt-dlp

What people actually use it for

How it works

Frame budget — why it matters

Install

Claude Code

claude.ai (web)

Codex

Manual (developer)

First run

Bring your own keys

Usage

Limits

Structure

Develop

Open source

stable-diffusion-webui