AI Agent That Summarizes and Tags Videos 8x Faster
Built a workflow agent for SWARM Community that summarizes, quotes, and tags YouTube event recordings, automating 87.5% of the process and saving ~17.5 hours/month.
ROLE
AI Product designer
Full-Stack Developer
SERVICE
Automation
DURATION
1 weeks | 2025
AI Agent
Internal Tool

The Challenge
A small Startup with a BIG problem
At SWARM Community, we host numerous online events with technical professionals from diverse fields. These recordings are uploaded to our YouTube channel, but due to time constraints and limited resources, most videos are not summarized, tagged, or indexed. This makes it hard for community members, and even our team to find relevant content after events end.

Watching the recording of the event after it ended and writing descriptions and extracting important quotes and figuring out tags is a time consuming task.
4 hours per video
So many videos uploaded on SWARM's YouTube channel without being tagged and having a description.
Low discoverability
Low Views
Customers asking about the previous events videos and can't find them on Youtube even with searching by the related keywords to the videos content.
Not accessible
The Opportunity & Goals
How might we make hours of valuable educational YouTube content from expert events actually usable and accessible without adding manual labor?
As the AI Experience and Innovation Manager, I identified an opportunity to leverage AI to automate this process and reach the following goals:
For the team
Save time on manual video tagging and summarization
For the business
Increase discoverability of video content on YouTube
Monetizing the YouTube channel
For our users
Support accessibility and knowledge-sharing across our customers and our global community
My Process Overview
I owned design, backend, frontend, and every prompt in between
I worked solo to build and deploy a fully functioning summarization tool in just one week.
Step 2: Tested multiple AI models to evaluate OUTPUT quality
Step 1: SYSTEM FLOW and backend pipeline
step 3: Designed UI in Figma and Prototyped in Claude
step 4: Coded and integrated Python backend to Claude API
step 5: connected endpoints, tested with live data
NGL, I've faced so many backend errors, API problems and low quality summaries and tags during the process that I was getting disappointed on getting it done. There is still more to be done to get better quality.
ChatGPT
(GPT 4o)
Anthropic
(Claude Sonnet 4)
DeepSeek
V3


Figma Design
Claude Prototype
step 6: iterated the prompt to get high quality result


Accepting a YouTube video URL —> Downloading the audio using yt-dlp —> Transcribing the audio to text using OpenAI’s Whisper —> Sending the transcript to Anthropic’s Claude 4 Sonnet model API to:
1. Summarize the content in 1 sentence description and 4 key bullet points
2. Extract 3–5 insightful quotes
3. Generate 5–7 topic tags

Final Product
A one-click tool that outputs quotes, tags, and summaries
Result & Impact
Saved 17+ hours per month
Cut per-video processing time from 4 hours → 30 minutes
Saved ~17 hours/month of manual effort of summarizing and tagging 5 videos per month
Reduced human error and improved consistency
Enabled the customers to rediscover and reuse content previously buried
Key Takeaways
small tools can make BIG IMPACT
Clarity over complexity: A focused use case meant I could build fast and ship faster
Prompting = UX: Structuring AI outputs is part of the user experience, not just a dev task
Design + Dev made me faster: Owning the full stack meant I could problem-solve immediately
This wasn’t just about building with Claude, it was about designing usefulness in a real-world, content-heavy workflow. The payoff came in hours saved, not features added.
What's Next?
Improve quality and expand it to a personalized learning experience
Next thing I would do is to improve the quality of the summaries and quotes, as I’ve noticed issues like misspelled speaker names and unclear or inaccurate quotes.
After that I want to expand it to a conversational AI tool that transforms educational video content into personalized learning experiences, adapting to individual preferences, styles, and accessibility needs.