How I Turned a 3-Hour Meeting Recording Into Development-Ready User Stories — Without Taking a Single Note

Last week I had a nearly 3-hour requirements gathering meeting for a fraud module in a card management system. The kind of meeting where people discuss BINs, providers, business rules, international restrictions, and someone says "take a screenshot of that" every 5 minutes.

I didn't take a single note. I didn't open Notion, Jira, or a Google Doc. When the meeting ended, I had two heavy video files and zero documentation. Three hours later I had a Word document with 31 structured User Stories, acceptance criteria, endpoints, assigned modules, and defined priorities.

Here's exactly what I did.

The real problem: nobody documents meetings well

Let's be honest. In requirements meetings, one of two things happens: someone takes incomplete notes that nobody understands later, or the meeting gets recorded and the video rots in a Drive folder that nobody ever opens again.

I had two videos: one at 285 MB and another at 190 MB. Almost 3 hours of content where critical system decisions were discussed. The challenge was turning that into something a development team could consume directly.

Step 1: Extract the audio with ffmpeg

The first thing to understand is that no AI needs the video. It only needs the audio. And the audio weighs a fraction of what the video does.

ffmpeg -i video1.mp4 -vn -q:a 2 audio1.mp3

Here -vn drops the video stream and -q:a 2 picks a high-quality variable bitrate. From 285 MB down to roughly 30 MB. If you need it even smaller (for example, for the Whisper API, which has a 25 MB file limit):

ffmpeg -i video1.mp4 -vn -ar 16000 -ac 1 -b:a 32k audio1.mp3

Mono, 16kHz, 32kbps. Good enough for transcription and weighs almost nothing.

Step 2: Trim what matters

My audio was 2 hours and 47 minutes, but the first hour was context I already knew. I only needed from the 1-hour mark onward:

ffmpeg -i audio1.mp3 -ss 01:00:00 -acodec copy audio1_trimmed.mp3

Instant. -acodec copy doesn't re-encode, it just cuts.

If you need a specific range:

ffmpeg -i audio1.mp3 -ss 01:00:00 -to 02:00:00 -acodec copy audio1_trimmed.mp3

This step is key. Don't feed 3 hours of audio to an AI if you only need 40 minutes. You'll spend more tokens, more time, and you'll get worse results because the AI will try to process irrelevant conversations.

Step 3: Transcribe with Whisper

Here I had to make a decision. My options were:

  • Whisper API (OpenAI): Fast, cheap (~$0.006/min), but has a 25 MB file limit and requires account credits.
  • Whisper local: Realistically needs an NVIDIA GPU with CUDA; running large-v3 on CPU is painfully slow. My laptop has Intel Iris, so that was out.
  • Google Colab (free): Temporary T4 GPU, no cost, runs Whisper large-v3 without issues.

I tried the API first. My audio was 61 MB, so I had to split it:

ffmpeg -i audio1_trimmed.mp3 -f segment -segment_time 600 -c copy part_%03d.mp3

This generates 10-minute files. Then a simple Python script loops through them, transcribes each one, and concatenates the results.
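The looping script the text mentions could look like this. A minimal sketch, assuming the openai Python SDK (v1+) with OPENAI_API_KEY set in the environment, and the part_%03d.mp3 naming produced by the ffmpeg command above:

```python
from pathlib import Path


def ordered_parts(directory: str) -> list:
    # ffmpeg's %03d numbering means a lexicographic sort is chronological
    return sorted(Path(directory).glob("part_*.mp3"))


def transcribe_parts(directory: str, language: str = "es") -> str:
    # Requires the openai package (v1+) and OPENAI_API_KEY in the environment
    from openai import OpenAI
    client = OpenAI()
    texts = []
    for part in ordered_parts(directory):
        with part.open("rb") as audio:
            result = client.audio.transcriptions.create(
                model="whisper-1", file=audio, language=language
            )
            texts.append(result.text)
    return "\n".join(texts)


if __name__ == "__main__":
    Path("transcript.txt").write_text(transcribe_parts("."), encoding="utf-8")
```

Each chunk stays under the 25 MB limit, and concatenating in filename order reconstructs the full meeting.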

I ended up going with Google Colab because I didn't have credits loaded on OpenAI and didn't want to wait. Three code cells and done:

!pip install openai-whisper
import whisper

model = whisper.load_model("large-v3")  # downloads the weights on first run
result = model.transcribe("audio1_trimmed.mp3", language="es")

with open("transcript.txt", "w") as f:
    f.write(result["text"])  # save it; Colab sessions are temporary

The large-v3 model for Spanish is excellent. Not perfect — it makes mistakes with technical jargon and proper names — but good enough to extract requirements.
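One thing that makes that review easier: openai-whisper returns not just result["text"] but a "segments" list with start and end times. A small helper (my own addition, not part of the original flow) can turn that into timestamped lines, which helps when you need to jump back to the video to check a doubtful passage:

```python
def format_segments(result: dict) -> str:
    # result is the dict returned by whisper's model.transcribe();
    # each segment carries "start"/"end" in seconds and the decoded "text"
    lines = []
    for seg in result.get("segments", []):
        minutes, seconds = divmod(int(seg["start"]), 60)
        hours, minutes = divmod(minutes, 60)
        lines.append(f"[{hours:02d}:{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")
    return "\n".join(lines)
```

Feed format_segments(result) to a file instead of the raw text and every line tells you where in the recording it came from.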

Step 4: From transcription to User Stories with AI

This is where it gets interesting. I fed the transcription to Claude along with screenshots of the reference system (I2C) and asked it to extract the User Stories.

The important part: I didn't just say "make me User Stories." I gave it full context:

  1. The transcription as plain text.
  2. 9 screenshots of the admin portal showing the Fraud tab with all its fields and options.
  3. The project's CLAUDE.md file with architecture conventions, naming standards, existing modules, and code patterns.

That third point made a huge difference. Without the CLAUDE.md, the User Stories would have come out generic. With it, they came out aligned to the actual project: endpoints in kebab-case, permissions in module.resource.action format, Actions and DTOs following team conventions, references to existing modules (Fraud, FraudRules, Notification, AttributeDataSet).

The result was a document with 31 User Stories organized into 11 sections, each with ID, title, story, acceptance criteria, endpoints, assigned module, and priority.

Step 5: From User Stories to spec for Claude Code

User Stories in a Word doc are useless to Claude Code. I needed a plain markdown file that lives in the repo and that Claude Code can read directly.

I generated a docs/fraud-rules-spec.md with everything: table structures, enums, seeder rules, implementation details per US (which Action, which DTO, which FormRequest, which Controller), endpoints, conditional validations, and a suggested implementation order.

I placed it in docs/ at the project root and added a reference in CLAUDE.md:

## Fraud Module Spec
See `docs/fraud-rules-spec.md` for detailed fraud module requirements.

The strategy for using it with Claude Code isn't "implement everything." It's going US by US:

> Read docs/fraud-rules-spec.md and implement FR-001: CRUD for global rules 
> in the FraudRules module. Include migration, seeder, Action, DTO, 
> FormRequest, Controller, and unit tests.

Sequential, controlled, verifiable.

What worked and what didn't

Worked well:

  • ffmpeg for extracting, trimming, and splitting audio. It's the Swiss army knife every developer should know.
  • Whisper large-v3 for Spanish. Quality is very good for technical meetings.
  • Giving the AI full context (transcription + screenshots + project conventions). More context = better results.
  • Generating the spec as markdown for the repo, not as a Word doc.

Didn't work or required adjustment:

  • Transcription isn't perfect. Whisper confuses proper names, technical jargon, and sometimes merges sentences from different speakers. You need to review it before passing it to the AI.
  • Parts of the meeting where someone says "this" or "here" while pointing at a screen are useless without the screenshots. The screenshots were essential to complete the context.
  • Trying to do it all in a single prompt doesn't work. The best approach is: first the transcription, then the analysis, then the document, then the spec.

The complete flow

Video (285 MB)
  → ffmpeg: extract audio (~30 MB)
    → ffmpeg: trim relevant segment
      → ffmpeg: split into 10-min chunks (if using API)
        → Whisper: text transcription
          → Claude: transcription + screenshots + CLAUDE.md
            → User Stories (docx)
              → Technical spec (markdown in the repo)
                → Claude Code: implementation per US

From a video nobody was ever going to watch again, to code in production. Without taking a single note in the meeting.

The takeaway

The combination of tools that already exist — ffmpeg, Whisper, Claude, Claude Code — lets you automate the most tedious part of the development cycle: turning human conversations into technical specifications.

It's not perfect. You need to review the transcription, you need to provide visual context with screenshots, you need to have your conventions documented. But the time savings are massive. What used to take days of post-meeting documentation now takes hours.

And most importantly: the result is better. Because the AI processes the entire conversation, not just what someone managed to jot down.


I'm Rubén Rangel, a Senior Full-Stack Developer with 17+ years of experience building production web applications. Currently at Ansira, I work on a SaaS marketing automation platform serving 100+ brands. I integrate AI into my workflow to build faster, smarter, and with higher quality. Open to full-time opportunities and contracts.