MCP SERVER · WINDOWS CAPTURE · FFMPEG

Give your agent eyes on the screen.

screencast-mcp records the Windows screen, samples footage into frames an agent can actually look at, and cuts the result with ffmpeg — from a quick trim to a platform-ready export. It speaks MCP over stdio, so it plugs into any MCP client.

$ npm install -g @tmhs/screencast-mcp

Needs ffmpeg and ffprobe on PATH (or FFMPEG_PATH / FFPROBE_PATH). Capture uses gdigrab, which is Windows-only; the watch and edit tools run anywhere ffmpeg runs. Add it to an MCP client with:

{
  "mcpServers": {
    "screencast": {
      "command": "npx",
      "args": ["-y", "@tmhs/screencast-mcp"]
    }
  }
}
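
For orientation, the binary lookup mentioned above (FFMPEG_PATH / FFPROBE_PATH or PATH) could look roughly like this in Python. It is a sketch, not the server's code: `resolve_tool` is a hypothetical helper, and giving the environment variable priority over PATH is an assumption.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_tool(name: str, env_var: str,
                 env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a path for `name`, or None if it cannot be found.

    Assumption: an explicit override in `env_var` wins over PATH.
    """
    env = os.environ if env is None else env
    override = env.get(env_var)
    if override:
        return override
    # Fall back to a normal PATH search.
    return shutil.which(name, path=env.get("PATH"))
```

Calling `resolve_tool("ffmpeg", "FFMPEG_PATH")` and `resolve_tool("ffprobe", "FFPROBE_PATH")` before starting the server is a quick way to confirm both binaries are reachable.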

00:02 CHAPTER 1 / CAPTURE

Point it at anything on screen.

Record the whole desktop, one monitor, a window by title, or an exact pixel region — with optional system audio through a loopback device. Recordings run as background sessions you stop by id; screenshots are one call. Every capture is an explicit tool call: nothing records on its own.

Tool	What it does
`start_recording`	Start a background recording. The target can be the full desktop, a monitor, a window by title, or a region. Optional fps, quality preset, and system audio.
`stop_recording`	Stop a session by id with a graceful quit so the file is finalized, not truncated.
`screenshot`	Capture a single PNG of a target.
`list_sessions`	List active and finished recording sessions.
`get_session`	Inspect a single session by id.
`list_audio_devices`	List DirectShow audio devices and flag a likely system-audio loopback device for start_recording.
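
A capture ultimately comes down to an ffmpeg gdigrab invocation. The sketch below builds one in Python for the desktop, a window title, or a pixel region. `gdigrab_args` and its parameters are hypothetical and say nothing about the real arguments of `start_recording`; the ffmpeg flags themselves are standard gdigrab options.

```python
from typing import List, Optional, Tuple


def gdigrab_args(
    output: str,
    window_title: Optional[str] = None,
    region: Optional[Tuple[int, int, int, int]] = None,  # (x, y, width, height)
    fps: int = 30,
) -> List[str]:
    """Build an ffmpeg argv for a gdigrab capture (Windows only)."""
    if window_title and region:
        raise ValueError("pick either a window title or a region, not both")
    args = ["ffmpeg", "-f", "gdigrab", "-framerate", str(fps)]
    if region is not None:
        x, y, w, h = region
        args += ["-offset_x", str(x), "-offset_y", str(y),
                 "-video_size", f"{w}x{h}"]
    # gdigrab takes "desktop" or "title=<window title>" as its input name.
    source = f"title={window_title}" if window_title else "desktop"
    return args + ["-i", source, output]
```

The returned list can be handed to `subprocess.Popen` on a Windows machine with ffmpeg installed.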

00:14 CHAPTER 2 / WATCH

Video a model can read.

A video file is opaque to a language model. sample_frames turns footage into PNGs an agent can actually view — at a fixed rate for a full pass, or at exact timestamps to check one moment.

Tool	What it does
`sample_frames`	Extract frames at a fixed fps or at explicit timestamps so the agent can view what happened.
`get_media_info`	ffprobe wrapper: duration, resolution, fps, codecs, format, size.
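
The two sampling modes map onto two familiar ffmpeg patterns. This Python sketch shows one plausible way to build each command; the function names and the output naming pattern are assumptions, not the tool's actual interface.

```python
from typing import List


def frames_at_rate(video: str, fps: float,
                   out_pattern: str = "frame_%04d.png") -> List[str]:
    """One pass over the whole file, keeping `fps` frames per second."""
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", out_pattern]


def frame_at_time(video: str, seconds: float, out_file: str) -> List[str]:
    """Seek to one timestamp and write a single PNG."""
    # -ss before -i seeks the input; -frames:v 1 stops after one frame.
    return ["ffmpeg", "-ss", f"{seconds:.3f}", "-i", video,
            "-frames:v", "1", out_file]
```

A low rate such as one frame per second keeps a full pass cheap; the timestamp form suits checking a single moment.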

00:26 CHAPTER 3 / EDIT

Small cuts, clean errors.

Composable ffmpeg edits with validated inputs — a bad rectangle, an odd dimension, or a wrong duration is rejected with a message that says how to fix it, not an encoder stack trace. Tools that write a file refuse to replace an existing file at a caller-supplied output path unless overwrite: true is passed; auto-generated default paths are always unique.
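
To make the idea of "a message that says how to fix it" concrete, here is a minimal Python sketch of crop validation. `validate_crop` is hypothetical, the error wording is invented for illustration, and the even-dimension rule is an assumption based on common H.264 encodes using 4:2:0 chroma subsampling.

```python
def validate_crop(x: int, y: int, w: int, h: int,
                  frame_w: int, frame_h: int) -> None:
    """Raise ValueError with a fix-it hint if the rectangle is unusable."""
    if w <= 0 or h <= 0:
        raise ValueError(f"crop size must be positive, got {w}x{h}")
    if x < 0 or y < 0 or x + w > frame_w or y + h > frame_h:
        raise ValueError(
            f"crop {w}x{h} at ({x},{y}) extends outside the "
            f"{frame_w}x{frame_h} frame; keep x + width <= {frame_w} "
            f"and y + height <= {frame_h}"
        )
    if w % 2 or h % 2:
        # Assumption: the encoder needs even dimensions (4:2:0 chroma).
        raise ValueError(
            f"crop size {w}x{h} must be even; try {w - w % 2}x{h - h % 2}"
        )
```

Checking inputs before ffmpeg runs is what lets the error name the offending value and a workable alternative.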

Tool	What it does
`trim`	Cut a sub-clip by start + end or duration. Uses stream copy (no re-encode), so cuts snap to keyframes rather than exact frames.
`clip`	Extract one or more frame-accurate sub-segments to separate files.
`concat`	Join two or more videos into one.
`convert`	Convert between mp4, gif, and webm.
`crop`	Crop to a pixel rectangle; an off-frame rectangle is rejected.
`scale`	Resize to a width and/or height, keeping aspect when one side is given.
`speed`	Change playback speed by a factor; the audio tempo is adjusted to match when an audio track is present.
`overlay`	Composite a logo, watermark, or picture-in-picture, optionally scaled and time-limited.
`compress`	Re-encode smaller with a CRF ladder and an optional width cap.
`extract_audio`	Write the audio track to its own file (mp3, aac, wav, or copy — copy picks a container that fits the source codec).
`redact_region`	Cover declared rectangles (solid box, blur, or pixelate) to hide on-screen secrets. Declared regions only, not automatic detection.
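
As an example of what sits behind a one-line tool like `speed`, the Python sketch below derives ffmpeg filter strings for a speed factor. It assumes the common setpts/atempo approach and conservatively keeps each atempo stage within 0.5 to 2.0; whether screencast-mcp does exactly this is not stated here, and the function names are hypothetical.

```python
from typing import List, Tuple


def atempo_chain(factor: float) -> List[float]:
    """Split a speed factor into atempo steps that each lie in [0.5, 2.0]."""
    if factor <= 0:
        raise ValueError("speed factor must be positive")
    steps: List[float] = []
    while factor > 2.0:
        steps.append(2.0)
        factor /= 2.0
    while factor < 0.5:
        steps.append(0.5)
        factor /= 0.5
    steps.append(factor)
    return steps


def speed_filters(factor: float) -> Tuple[str, str]:
    """Return (video_filter, audio_filter) strings for ffmpeg."""
    video = f"setpts=PTS/{factor}"  # shrink timestamps to play faster
    audio = ",".join(f"atempo={step}" for step in atempo_chain(factor))
    return video, audio
```

For a factor of 4.0 this yields `setpts=PTS/4.0` for video and `atempo=2.0,atempo=2.0` for audio, since the chained tempo steps multiply together.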

00:41 CHAPTER 4 / PRODUCE

From raw capture to something you’d publish.

Assembly, titles, music, and platform-shaped exports. Mixed inputs are normalized to a common resolution, fps, and audio rate before they are combined, so heterogeneous clips compose cleanly.
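
Normalization of this kind is typically a per-input filter chain. The Python sketch below builds one such chain per stream; the 1920x1080, 30 fps, and 48 kHz targets are placeholder assumptions, and the helper names are hypothetical rather than taken from the project.

```python
def normalize_video(index: int, width: int = 1920, height: int = 1080,
                    fps: int = 30) -> str:
    """Filter chain that conforms input `index` to a shared size and fps."""
    return (
        f"[{index}:v]"
        # Fit inside the target box without distorting the picture...
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        # ...then center it on a canvas of exactly the target size.
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,"
        f"setsar=1,fps={fps}"
        f"[v{index}]"
    )


def normalize_audio(index: int, rate: int = 48000) -> str:
    """Resample input `index` to a shared audio rate."""
    return f"[{index}:a]aresample={rate}[a{index}]"
```

Joining the chains for every input with `;` gives a `-filter_complex` graph whose labeled outputs can then feed a concat or xfade step.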

Tool	What it does
`assemble_highlights`	Stitch two or more clips into one with hard cuts or an xfade transition between each.
`xfade_transition`	Crossfade two videos into one with an xfade transition. Inputs are auto-normalized first.
`title_card`	Generate a standalone title card with centered text on a solid background. Uses a bundled font.
`music_bed`	Lay a music track under a video: looped/trimmed, faded, leveled, and mixed with any existing audio.
`reframe`	Re-aspect to 16:9, 9:16, 1:1, or 4:5 with pad (letterbox) or crop (fill).
`export_preset`	Encode a platform-ready file (youtube, instagram_reel, tiktok, x, square) at the right aspect, fps, and bitrate.
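
The pad versus crop choice in `reframe` is mostly arithmetic. This Python sketch computes the resulting frame size for either mode at the source's native scale; it only illustrates the geometry, and `reframe_size` is a hypothetical helper (the real tool may also rescale to a fixed output size).

```python
from typing import Tuple


def _even(n: int) -> int:
    """Round down to an even integer, which most encoders require."""
    return n // 2 * 2


def reframe_size(src_w: int, src_h: int, aspect_w: int, aspect_h: int,
                 mode: str) -> Tuple[int, int]:
    """Frame size after padding (letterbox) or cropping (fill)."""
    if mode not in ("pad", "crop"):
        raise ValueError("mode must be 'pad' or 'crop'")
    src_is_wider = src_w * aspect_h > src_h * aspect_w
    # pad keeps every source pixel and grows the short side;
    # crop keeps the short side and trims the long one.
    keep_width = src_is_wider if mode == "pad" else not src_is_wider
    if keep_width:
        return src_w, _even(src_w * aspect_h // aspect_w)
    return _even(src_h * aspect_w // aspect_h), src_h
```

For a 1920x1080 source going to 9:16, padding gives 1920x3412 and cropping gives 606x1080, which shows how much picture each mode keeps.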

00:52 APPENDIX / WINDOWS NOTES

Know your capture surface.

Monitor targets crop the virtual desktop to a display’s real pixel bounds, so monitor:1 grabs the second display at its true offset; monitor:0 is primary.
Window capture matches a case-insensitive exact title first, then falls back to a substring match; with several matches the topmost window wins.
Fullscreen-exclusive apps can produce black frames under gdigrab; use borderless-windowed mode.
System audio capture (start_recording with audio.source = system) needs a virtual-audio loopback device, since gdigrab is video-only and Windows exposes no loopback device to DirectShow by default. Use list_audio_devices to find one. Microphone capture is not supported.
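
The window-matching rule above (exact title first, then substring, topmost wins) can be sketched in a few lines of Python. `pick_window` is hypothetical, and it assumes the caller already has the open window titles ordered topmost first.

```python
from typing import List, Optional


def pick_window(query: str, titles_topmost_first: List[str]) -> Optional[str]:
    """Return the best-matching title, or None when nothing matches."""
    q = query.casefold()
    # Pass 1: case-insensitive exact match.
    for title in titles_topmost_first:
        if title.casefold() == q:
            return title
    # Pass 2: substring match; the first hit is the topmost window.
    for title in titles_topmost_first:
        if q in title.casefold():
            return title
    return None
```

With the titles "Untitled - Notepad" and "Notepad", the query "notepad" picks the exact match "Notepad", while "note" falls through to the substring pass and picks whichever window is on top.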

00:55 APPENDIX / SAFETY

The screen is sensitive. Treat it that way.

Screen capture can record anything on screen, including secrets. Capture is always explicit (a tool call, never automatic), output stays on the local filesystem, and this public repo ignores captured media so test recordings cannot be committed by accident. Review frames before sharing a file, and use redact_region before anything leaves the machine.
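
For a sense of what a solid-box redaction amounts to, the Python sketch below builds an ffmpeg `drawbox` filter chain from declared rectangles. It is an illustration under assumptions: `solid_box_filter` is a hypothetical helper, and it covers only the solid-box style, not blur or pixelate.

```python
from typing import Iterable, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


def solid_box_filter(regions: Iterable[Rect], color: str = "black") -> str:
    """Chain one drawbox per region; t=fill paints the box solid."""
    boxes = [
        f"drawbox=x={x}:y={y}:w={w}:h={h}:color={color}:t=fill"
        for x, y, w, h in regions
    ]
    if not boxes:
        raise ValueError("declare at least one region to redact")
    return ",".join(boxes)
```

The result is passed to ffmpeg as a `-vf` value. Because the boxes are burned into re-encoded frames, the covered pixels are gone from the output file.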
