MCP server TypeScript Windows-first ffmpeg
A Model Context Protocol server that lets an agent record the screen, take screenshots, "watch" footage by sampling frames into viewable images, and run a small set of ffmpeg edits. It speaks MCP over stdio.
`ffmpeg` and `ffprobe` must be installed and on `PATH` (or set via `FFMPEG_PATH` / `FFPROBE_PATH`). Screen capture uses `gdigrab`, which is Windows-only; the watch and edit tools run anywhere ffmpeg runs.
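The lookup order described above (environment override first, then `PATH`) can be sketched roughly as follows. This is an illustrative helper, not the package's actual code: `resolveBinary` and the `Env` type are hypothetical names, and the environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical helper: shows one way the FFMPEG_PATH / FFPROBE_PATH
// override could be honored. Not taken from the package source.
type Env = Record<string, string | undefined>;

function resolveBinary(name: "ffmpeg" | "ffprobe", env: Env): string {
  const override = env[name === "ffmpeg" ? "FFMPEG_PATH" : "FFPROBE_PATH"];
  // A non-empty override wins; otherwise return the bare name and let
  // the operating system resolve it through PATH when it is spawned.
  return override !== undefined && override.trim() !== "" ? override : name;
}
```

In a Node process this would typically be called as `resolveBinary("ffmpeg", process.env)`.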
npm install -g @tmhs/screencast-mcp
{
"mcpServers": {
"screencast": {
"command": "npx",
"args": ["-y", "@tmhs/screencast-mcp"]
}
}
}
| Tool | What it does |
|---|---|
| `start_recording` | Start a background recording. Target = full, a monitor, a window by title, or a region. Optional fps and quality preset. |
| `stop_recording` | Stop a session by id with a graceful quit so the file is finalized, not truncated. |
| `list_sessions` | List active and finished recording sessions. |
| `get_session` | Inspect a single session by id. |
| `screenshot` | Capture a single PNG of a target. |
| `sample_frames` | Extract frames at a fixed fps or at explicit timestamps so the agent can view what happened. |
| `get_media_info` | ffprobe wrapper: duration, resolution, fps, codecs, format, size. |
| `trim` | Cut a sub-clip by start + end or duration. |
| `concat` | Join two or more videos into one. |
| `convert` | Convert between mp4, gif, and webm. |
| `crop` | Crop to a pixel rectangle; an off-frame rectangle is rejected. |
| `scale` | Resize to a width and/or height, keeping aspect when one side is given. |
| `speed` | Change playback speed by a factor; audio is retempo'd when present. |
| `overlay` | Composite a logo, watermark, or picture-in-picture, optionally scaled and time-limited. |
| `compress` | Re-encode smaller with a CRF ladder and an optional width cap. |
| `extract_audio` | Write the audio track to its own file (mp3, aac, wav, or copy). |
| `clip` | Extract one or more frame-accurate sub-segments to separate files. |
| `redact_region` | Cover declared rectangles (solid box, blur, or pixelate) to hide on-screen secrets. Declared regions only, not automatic detection. |
| `list_audio_devices` | List DirectShow audio devices and flag a likely system-audio loopback device for `start_recording`. |
| `xfade_transition` | Crossfade two videos into one with an xfade transition. Inputs are auto-normalized first. |
| `assemble_highlights` | Stitch two or more clips into one with hard cuts or an xfade transition between each. |
| `title_card` | Generate a standalone title card with centered text on a solid background. Uses a bundled font. |
| `music_bed` | Lay a music track under a video: looped/trimmed, faded, leveled, and mixed with any existing audio. |
| `reframe` | Re-aspect to 16:9, 9:16, 1:1, or 4:5 with pad (letterbox) or crop (fill). |
| `export_preset` | Encode a platform-ready file (youtube, instagram_reel, tiktok, x, square) at the right aspect, fps, and bitrate. |
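
Because the server speaks MCP over stdio, a client invokes any tool in the table with a JSON-RPC `tools/call` request written as one line of JSON. The sketch below builds such a request for `trim`. The envelope follows the MCP specification, but the argument names (`input`, `start`, `duration`) and the `buildToolCall` helper are assumptions made for illustration; the real argument schema is whatever the server reports via `tools/list`.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0 envelope).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and arguments in the envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// ASSUMPTION: these argument names are illustrative, not the documented schema.
const request = buildToolCall(1, "trim", { input: "demo.mp4", start: 5, duration: 10 });

// The stdio transport sends each message as a single newline-terminated line.
const line = JSON.stringify(request) + "\n";
```

In practice an MCP client library handles this framing; the sketch only shows what travels over the pipe.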
`monitor:1` grabs the second display at its true offset; `monitor:0` is primary.

System audio capture (`start_recording` with `audio.source = system`) needs a virtual-audio loopback device, since `gdigrab` is video-only and Windows has no native loopback. Use `list_audio_devices` to find one. Microphone capture is not supported.
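
To make the monitor offset and loopback notes concrete, here is a rough sketch of how an ffmpeg argument list for such a recording might be assembled. The `gdigrab` options (`-framerate`, `-offset_x`, `-offset_y`, `-video_size`) and the `dshow` `audio=<name>` input are real ffmpeg features, but `MonitorBounds`, `gdigrabArgs`, and the device name are hypothetical, and this is not necessarily how the server builds its own command line.

```typescript
// Hypothetical description of a monitor's position on the virtual desktop.
interface MonitorBounds {
  x: number; // left edge; can be negative for displays left of the primary
  y: number; // top edge
  width: number;
  height: number;
}

// Builds ffmpeg arguments for capturing one monitor with gdigrab and,
// optionally, a DirectShow loopback device for system audio.
function gdigrabArgs(
  bounds: MonitorBounds,
  fps: number,
  output: string,
  audioDevice?: string,
): string[] {
  const args = [
    "-f", "gdigrab",
    "-framerate", String(fps),
    "-offset_x", String(bounds.x),
    "-offset_y", String(bounds.y),
    "-video_size", `${bounds.width}x${bounds.height}`,
    "-i", "desktop",
  ];
  if (audioDevice !== undefined) {
    // gdigrab is video-only, so audio comes from a second dshow input.
    args.push("-f", "dshow", "-i", `audio=${audioDevice}`);
  }
  args.push(output);
  return args;
}

// ASSUMPTION: a 1920x1080 second display placed to the right of the primary.
const secondMonitor: MonitorBounds = { x: 1920, y: 0, width: 1920, height: 1080 };
const recordArgs = gdigrabArgs(secondMonitor, 30, "capture.mp4", "Example Loopback Device");
```

The resulting array could be passed to `child_process.spawn` together with the resolved ffmpeg binary.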