Podcast-Agent

Podcast-Agent turns YouTube podcasts and long-form videos into structured reports for faster understanding and analysis.

Overview

Podcast-Agent is useful when you want to:

Quickly understand what a podcast or long-form video is about.
Ask a focused question about a video and collect relevant evidence.
Produce reports in multiple formats, including editable Markdown, PDF, and Xiaohongshu-style image outputs.
Turn video content into viewpoints, summaries, and reports.
Save intermediate artifacts for review, debugging, or downstream analysis.

Input support currently centers on YouTube videos.

Key Features

Generate a structured report from a podcast or long-form video.
Quickly understand the core content without watching the full episode.
Organize and interpret the key viewpoints discussed in the podcast.
Jump from important report moments directly back to the original video.

Agentic Workflow

Podcast-Agent is designed as an inspectable agentic pipeline rather than a single opaque prompt. Each stage writes durable artifacts, so a long video can be reviewed, debugged, resumed, and rendered into multiple report forms.

The run starts by saving input.json, resolving the YouTube source into source.json, and detecting report intent from the question. That intent controls the report language and whether the output should be brief, default, or detailed.
Ingestion uses yt-dlp to fetch normalized metadata and ranked subtitle tracks. If subtitles are unavailable, the pipeline can download the audio and fall back to an audio transcriber. It then writes both transcript.vtt and transcript.txt.
Evidence extraction parses the VTT transcript into timed subtitle segments and groups them into chunks: chapter-based chunks when the metadata includes chapters, overlapping ten-minute windows otherwise. It then asks the model to keep the evidence most relevant to the user's question.
The outline stage turns evidence, metadata, source URL, chapters, and the question into a viewpoint breakdown. Viewpoint generation then selects the highest-importance viewpoints and writes detailed sub-theses, often in parallel, while preserving links back to supporting evidence indexes.
Summary generation condenses the selected viewpoints into an introduction, core conclusions, viewpoint order, and takeaway. The same artifact set then renders to Markdown, HTML, PDF, and Xiaohongshu image output, so every format shares one analytical core.

Demo PDFs

Put PDF files in demo-pdf/, then add their filenames to pdfFiles in this page.
