YouTube videos contain enormous amounts of useful information — tutorials, lectures, interviews, conference talks — but watching a two-hour video to extract ten minutes of relevant content is a poor use of time. With a local LLM and the YouTube transcript API, you can summarise any YouTube video in seconds, extract key points, generate Q&A pairs, or convert lecture content into study notes, entirely on your own machine with no API costs and no data leaving your computer.
How It Works
YouTube automatically generates transcripts for most videos. The youtube-transcript-api Python library fetches these transcripts without any API key or authentication. You pass the transcript text to a local LLM with a summarisation prompt, and get back a structured summary in seconds. The whole pipeline is four lines of meaningful code.
Installation
pip install youtube-transcript-api ollama
The examples in this article use the classic YouTubeTranscriptApi.get_transcript static method. Releases of youtube-transcript-api from 1.0 onwards moved to an instance-based fetch interface, so if get_transcript is unavailable in your installed version, check the library's changelog for the equivalent call.
Basic Summarisation
from youtube_transcript_api import YouTubeTranscriptApi
import ollama

def get_transcript(video_id: str) -> str:
    transcript = YouTubeTranscriptApi.get_transcript(video_id)
    return ' '.join(entry['text'] for entry in transcript)

def summarise(video_id: str, model: str = 'llama3.2') -> str:
    transcript = get_transcript(video_id)
    # Truncate to fit the context window if needed
    words = transcript.split()
    if len(words) > 6000:
        transcript = ' '.join(words[:6000]) + '... [truncated]'
    response = ollama.chat(
        model=model,
        messages=[{
            'role': 'user',
            'content': f'Summarise this YouTube video transcript in 5 bullet points:\n\n{transcript}'
        }]
    )
    return response['message']['content']

# Extract video ID from URL or use directly
video_id = 'dQw4w9WgXcQ'  # or parse from a full URL
print(summarise(video_id))
Extracting the Video ID from a URL
from urllib.parse import urlparse, parse_qs
import re

def extract_video_id(url: str) -> str:
    """Extract a YouTube video ID from any common YouTube URL format."""
    # youtu.be/VIDEO_ID
    if 'youtu.be/' in url:
        return url.split('youtu.be/')[1].split('?')[0]
    # youtube.com/watch?v=VIDEO_ID
    parsed = urlparse(url)
    if parsed.hostname in ('www.youtube.com', 'youtube.com', 'm.youtube.com'):
        video_id = parse_qs(parsed.query).get('v', [None])[0]
        if video_id:  # fall through (rather than return None) when v= is missing
            return video_id
    # Already a bare video ID
    if re.match(r'^[a-zA-Z0-9_-]{11}$', url):
        return url
    raise ValueError(f'Cannot extract video ID from: {url}')
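For completeness, the same logic can be collapsed into a single regular expression that also covers /embed/ and /shorts/ URLs. This is a sketch: the pattern handles the common URL shapes but is not exhaustive, and the name extract_video_id_regex is illustrative rather than part of the code above.

```python
import re

_ID_PATTERN = re.compile(
    r'(?:youtu\.be/|youtube\.com/(?:watch\?v=|embed/|shorts/))([a-zA-Z0-9_-]{11})'
)

def extract_video_id_regex(url: str) -> str:
    """Pull an 11-character YouTube video ID out of a URL, or accept a bare ID."""
    match = _ID_PATTERN.search(url)
    if match:
        return match.group(1)
    if re.fullmatch(r'[a-zA-Z0-9_-]{11}', url):
        return url
    raise ValueError(f'Cannot extract video ID from: {url}')
```

The explicit branch-per-format version above is easier to extend and debug; the regex version is more compact when you only need the common cases.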
Different Summary Formats
PROMPTS = {
    'bullets': 'Summarise this transcript in 5 clear bullet points. Each point should be a complete, actionable insight:',
    'tldr': 'Write a 2-sentence TL;DR for this transcript:',
    'detailed': 'Write a structured summary with: 1) Main topic, 2) Key arguments, 3) Important examples, 4) Conclusions:',
    'study_notes': 'Convert this transcript into study notes with clear headings, key concepts, and important facts:',
    'qa': 'Generate 5 quiz questions and answers based on this transcript:',
    'action_items': 'Extract all action items, recommendations, and things to do or try mentioned in this transcript:',
}
def summarise_as(video_id: str, format_key: str = 'bullets', model: str = 'llama3.2') -> str:
    transcript = get_transcript(video_id)
    words = transcript.split()
    if len(words) > 6000:
        transcript = ' '.join(words[:6000])
    prompt = PROMPTS[format_key]
    response = ollama.chat(
        model=model,
        messages=[{'role': 'user', 'content': f'{prompt}\n\n{transcript}'}]
    )
    return response['message']['content']

# Usage
print(summarise_as('VIDEO_ID', 'study_notes'))
print(summarise_as('VIDEO_ID', 'qa'))
Handling Long Videos
A one-hour video transcript can contain 8,000–15,000 words, which exceeds the context window of most 7B models at default settings. Two approaches handle this: chunking (split the transcript, summarise each chunk, then summarise the summaries) or a model with a large context window (Mistral Nemo 12B or Llama 3.1 8B with num_ctx set to 32768).
def summarise_long_video(video_id: str, model: str = 'llama3.2',
                         chunk_size: int = 4000) -> str:
    transcript = get_transcript(video_id)
    words = transcript.split()
    if len(words) <= chunk_size:
        # Short enough to summarise directly
        return summarise(video_id, model)
    # Split into chunks and summarise each
    chunks = [' '.join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    chunk_summaries = []
    for i, chunk in enumerate(chunks):
        print(f'Summarising chunk {i + 1}/{len(chunks)}...')
        r = ollama.chat(model=model, messages=[{
            'role': 'user',
            'content': f'Summarise this section of a video transcript in 3 bullet points:\n\n{chunk}'
        }])
        chunk_summaries.append(r['message']['content'])
    # Combine the chunk summaries into a final summary
    combined = '\n\n'.join(chunk_summaries)
    final = ollama.chat(model=model, messages=[{
        'role': 'user',
        'content': f'Here are summaries of different sections of a video. Write a cohesive overall summary in 5 bullet points:\n\n{combined}'
    }])
    return final['message']['content']
Command-Line Tool
#!/usr/bin/env python3
"""Usage: python yt_summarise.py URL [--format FORMAT] [--model MODEL]

Relies on extract_video_id, get_transcript, and PROMPTS defined earlier.
"""
import sys
import argparse
from youtube_transcript_api import NoTranscriptFound, TranscriptsDisabled
import ollama

def main():
    parser = argparse.ArgumentParser(description='Summarise a YouTube video locally')
    parser.add_argument('url', help='YouTube URL or video ID')
    parser.add_argument('--format', choices=['bullets', 'tldr', 'detailed', 'study_notes', 'qa', 'action_items'],
                        default='bullets')
    parser.add_argument('--model', default='llama3.2')
    parser.add_argument('--max-words', type=int, default=6000)
    args = parser.parse_args()
    video_id = extract_video_id(args.url)
    try:
        transcript = get_transcript(video_id)
    except (NoTranscriptFound, TranscriptsDisabled):
        print(f'No transcript available for {video_id}')
        sys.exit(1)
    words = transcript.split()
    print(f'Transcript: {len(words)} words', file=sys.stderr)
    if len(words) > args.max_words:
        transcript = ' '.join(words[:args.max_words])
        print(f'Truncated to {args.max_words} words', file=sys.stderr)
    prompt = PROMPTS[args.format]
    response = ollama.chat(
        model=args.model,
        messages=[{'role': 'user', 'content': f'{prompt}\n\n{transcript}'}],
        stream=True
    )
    for chunk in response:
        print(chunk['message']['content'], end='', flush=True)
    print()

if __name__ == '__main__':
    main()
# Examples
python yt_summarise.py https://youtube.com/watch?v=VIDEO_ID
python yt_summarise.py VIDEO_ID --format study_notes
python yt_summarise.py VIDEO_ID --format qa --model mistral-nemo
Limitations and Workarounds
The YouTube Transcript API only works for videos that have auto-generated or manually added captions. Videos without captions return a NoTranscriptFound error. For videos without captions, the alternative is to download the audio with yt-dlp and transcribe it locally with Whisper — adding a transcription step before the summarisation. For non-English videos, specify the language in the transcript request: YouTubeTranscriptApi.get_transcript(video_id, languages=['fr', 'de']). For very fast-paced technical content, auto-generated captions sometimes contain errors (especially with technical terms and proper nouns) — these propagate into the summary, so treat summaries of dense technical videos as a starting point for review rather than a replacement for watching.
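The yt-dlp + Whisper fallback mentioned above can be sketched as follows. This assumes yt-dlp and the openai-whisper package are installed; the flag set shown is one common way to extract audio, and the function names are illustrative, not from a library.

```python
import subprocess

def build_download_cmd(url: str, out_path: str = 'audio.mp3') -> list[str]:
    """Build the yt-dlp command that extracts a video's audio track as MP3."""
    return ['yt-dlp', '-x', '--audio-format', 'mp3', '-o', out_path, url]

def transcribe_without_captions(url: str, out_path: str = 'audio.mp3') -> str:
    """Fallback for videos with no captions: download audio, transcribe locally."""
    subprocess.run(build_download_cmd(url, out_path), check=True)
    import whisper  # openai-whisper; imported lazily since it is a heavy dependency
    model = whisper.load_model('base')
    result = model.transcribe(out_path)
    return result['text']
```

The returned text can then be fed straight into the summarise function in place of a fetched transcript. Expect the download-plus-transcription step to dominate total runtime for this path.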
Choosing the Right Model
For most video summarisation tasks, Llama 3.2 3B (the default llama3.2 tag in Ollama) produces good results and is fast enough that a 5-minute video summarises in under 10 seconds on a mid-range GPU. For long academic lectures or dense technical content where nuance matters, Mistral Nemo 12B with a 32K context window can process more of the transcript at once and produces more thorough summaries. Qwen2.5 7B is a strong alternative — its instruction following is particularly reliable for structured output formats like study notes and Q&A pairs. Use the smallest model that produces summaries you find useful; the quality differences for simple summarisation are smaller than they appear on benchmarks.
Why Summarise Locally Instead of Using a Cloud Service?
Several browser extensions and web services offer YouTube video summarisation, but they all share a common limitation: they send the transcript to a cloud API (typically OpenAI or Anthropic) for processing. This means the content of every video you summarise passes through a third-party server. For most public YouTube content this is not a concern, but for private or unlisted videos shared within an organisation, conference recordings under NDA, client interview footage, or any content you consider sensitive, sending transcripts to a cloud service is a data handling decision that deserves consideration. Local summarisation with Ollama eliminates this entirely — the transcript is processed on your own machine and nothing is transmitted to any external service.
The practical advantages beyond privacy are cost and speed. Cloud summarisation APIs charge per token, and a 30-minute lecture can contain 30,000–50,000 input tokens — meaningful cost at GPT-4 class pricing. Local inference has zero ongoing cost after the initial hardware investment. Speed depends on your hardware, but on a modern GPU a 30-minute lecture transcript typically processes in 15–30 seconds, which is fast enough for interactive use. Batching multiple videos for overnight processing is even more practical — queue up 50 conference talks and wake up to summaries for all of them.
Building a Batch Summarisation Pipeline
For processing multiple videos efficiently, a batch pipeline is more practical than running individual commands. This version processes a list of YouTube URLs from a file and saves summaries to a structured output:
import json
import time
from pathlib import Path
from youtube_transcript_api import NoTranscriptFound
import ollama

# Relies on extract_video_id, get_transcript, and PROMPTS defined earlier.

def batch_summarise(url_file: str, output_dir: str = 'summaries',
                    model: str = 'llama3.2', format_key: str = 'bullets'):
    Path(output_dir).mkdir(exist_ok=True)
    urls = Path(url_file).read_text().strip().splitlines()
    results = []
    for i, url in enumerate(urls):
        url = url.strip()
        if not url or url.startswith('#'):
            continue
        print(f'[{i + 1}/{len(urls)}] Processing: {url}')
        try:
            video_id = extract_video_id(url)
            transcript = get_transcript(video_id)
            words = transcript.split()
            if len(words) > 6000:
                transcript = ' '.join(words[:6000])
            prompt = PROMPTS[format_key]
            r = ollama.chat(model=model,
                            messages=[{'role': 'user', 'content': f'{prompt}\n\n{transcript}'}])
            summary = r['message']['content']
            result = {'url': url, 'video_id': video_id,
                      'word_count': len(words), 'summary': summary}
            # Save the individual summary
            out_file = Path(output_dir) / f'{video_id}.txt'
            out_file.write_text(f'URL: {url}\n\n{summary}')
            results.append(result)
            print(f'  Saved to {out_file}')
        except NoTranscriptFound:
            print('  No transcript available, skipping')
            results.append({'url': url, 'error': 'No transcript'})
        except Exception as e:
            print(f'  Error: {e}')
            results.append({'url': url, 'error': str(e)})
        time.sleep(0.5)  # Small delay between requests
    # Save the combined results
    (Path(output_dir) / 'results.json').write_text(json.dumps(results, indent=2))
    print(f'\nProcessed {len(results)} videos. Results in {output_dir}/')

# Usage: create urls.txt with one YouTube URL per line
batch_summarise('urls.txt', format_key='study_notes')
Integrating with Note-Taking Apps
The most useful extension of the YouTube summariser is piping output directly into your note-taking system. If you use Obsidian, you can write summaries directly to your vault as Markdown files. If you use Notion, you can use their API to create pages. For simpler setups, writing to a dated Markdown file and opening it in your editor of choice is sufficient:
from datetime import datetime
from pathlib import Path

def summarise_to_file(url: str, output_dir: str = '~/notes/videos',
                      model: str = 'llama3.2') -> str:
    video_id = extract_video_id(url)
    transcript = get_transcript(video_id)
    words = transcript.split()
    if len(words) > 6000:
        transcript = ' '.join(words[:6000])
    # Generate both a summary and key questions
    summary = ollama.chat(model=model, messages=[{
        'role': 'user',
        'content': f'Summarise in 5 bullet points:\n\n{transcript}'
    }])['message']['content']
    questions = ollama.chat(model=model, messages=[{
        'role': 'user',
        'content': f'Generate 3 follow-up questions to research after watching this video:\n\n{transcript}'
    }])['message']['content']
    # Write to a Markdown file
    output_path = Path(output_dir).expanduser()
    output_path.mkdir(parents=True, exist_ok=True)
    date_str = datetime.now().strftime('%Y-%m-%d')
    file_path = output_path / f'{date_str}-{video_id}.md'
    content = '# Video Summary\n\n'
    content += f'**URL:** {url}\n'
    content += f'**Date:** {date_str}\n\n'
    content += f'## Summary\n\n{summary}\n\n'
    content += f'## Follow-up Questions\n\n{questions}\n'
    file_path.write_text(content)
    print(f'Saved to {file_path}')
    return str(file_path)

summarise_to_file('https://youtube.com/watch?v=VIDEO_ID')
Practical Tips for Better Summaries
A few prompt adjustments consistently improve summary quality. First, tell the model the video type at the start of the prompt — "This is a technical tutorial about X" or "This is a conference talk on Y" gives the model useful context for prioritising which content to include. Second, for dense technical content, ask for the summary to include specific details rather than just high-level themes: "Include specific techniques, tools, or commands mentioned" produces more actionable notes than a generic summarisation prompt. Third, if the transcript contains timestamps or speaker labels, keep them in — they help the model understand the video's structure and produce better-organised summaries. Fourth, for interview content, ask the model to attribute key points to speakers when multiple people are talking — this preserves the context of who said what, which matters for quotes and attributions in research notes.
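The first two tips can be folded into the prompt programmatically rather than retyped each time. A minimal sketch — build_prompt is an illustrative helper, not from the original code, and the video_type strings are whatever description you supply:

```python
def build_prompt(base_prompt: str, transcript: str,
                 video_type: str = '', want_specifics: bool = False) -> str:
    """Prepend video-type context and a specificity instruction to a summarisation prompt."""
    parts = []
    if video_type:
        # Tip 1: tell the model what kind of video this is
        parts.append(f'This is {video_type}.')
    parts.append(base_prompt)
    if want_specifics:
        # Tip 2: ask for concrete details, not just themes
        parts.append('Include specific techniques, tools, or commands mentioned.')
    return ' '.join(parts) + f'\n\n{transcript}'
```

Used with the PROMPTS dict from earlier, a call might look like build_prompt(PROMPTS['bullets'], transcript, video_type='a technical tutorial about Docker', want_specifics=True).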
Using a Larger Context Model for Complete Transcripts
The chunk-based approach for long videos works well but introduces a trade-off: summaries of the individual chunks lose the overall arc and the connections across sections of the content. If a speaker builds to a conclusion over the full two hours, chunk summarisation may miss that narrative thread. For the best results on long-form content, use a model with a genuinely large context window configured at full capacity. Mistral Nemo 12B with num_ctx set to 32768 can process roughly 24,000 words of transcript in one pass — enough for a 30–40 minute video. Llama 3.1 8B with the same num_ctx handles similar lengths. For multi-hour content, the chunk approach remains necessary, but for most conference talks, lectures, and interviews under an hour, a 32K context model produces dramatically better summaries than chunking.
import ollama

def summarise_with_large_context(video_id: str,
                                 model: str = 'mistral-nemo',
                                 num_ctx: int = 32768) -> str:
    transcript = get_transcript(video_id)
    words = transcript.split()
    print(f'Transcript: {len(words)} words')
    response = ollama.chat(
        model=model,
        messages=[{
            'role': 'user',
            'content': f'Summarise this video transcript thoroughly. Include main arguments, key examples, and actionable takeaways:\n\n{transcript}'
        }],
        options={'num_ctx': num_ctx}  # Raise the context window for this request
    )
    return response['message']['content']
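A rough way to decide between the single-pass and chunked paths is a token estimate before calling the model. The helper and the 1.33 tokens-per-word ratio below are illustrative — the ratio is a rule of thumb for English text, not an exact figure:

```python
def fits_in_context(word_count: int, num_ctx: int = 32768,
                    tokens_per_word: float = 1.33,
                    reserve: int = 1024) -> bool:
    """Estimate whether a transcript fits in num_ctx tokens,
    reserving some room for the prompt and the model's reply."""
    estimated_tokens = int(word_count * tokens_per_word)
    return estimated_tokens + reserve <= num_ctx
```

With these defaults the cutoff lands a little under 24,000 words, roughly in line with the figure above; transcripts over that threshold fall back to the chunked summariser.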
Scheduled Summarisation with a Watchlist
A useful extension is a watchlist workflow: maintain a text file of YouTube URLs you intend to watch, run the summariser overnight, and review the summaries the next morning. Summaries that seem genuinely important get added to your watch queue for full viewing; the rest get filed as notes and you move on. This gives you AI-assisted triage of video content — a significant time saving for anyone who subscribes to many channels or regularly receives video links from colleagues. The script takes five minutes to set up and, for anyone who consumes a lot of video for professional development, eliminates hours of low-value watching per week. Combined with Obsidian integration, every summarised video becomes a searchable note in your personal knowledge base, available years later when you vaguely remember a technique from some video but cannot recall the source.
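One way to schedule the overnight run is a cron entry. This assumes the batch pipeline from earlier is saved as batch_summarise.py and the watchlist lives at ~/watchlist/urls.txt — both paths are illustrative:

```shell
# Run the watchlist summariser every night at 2 a.m. (add with `crontab -e`).
# Ollama must be running as a service for the script to reach it.
0 2 * * * cd ~/watchlist && python batch_summarise.py urls.txt >> summarise.log 2>&1
```

Logging stdout and stderr to a file matters here: when a video in the list has no transcript, the skip message ends up in summarise.log rather than disappearing silently.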
Getting Started
The fastest path to a working YouTube summariser is two commands and ten lines of code: pip install youtube-transcript-api ollama, paste the basic summarise function from this article, and call it with a video ID. For most educational content — programming tutorials, conference talks, technical lectures — the default bullet-point format with Llama 3.2 produces summaries that capture 80–90% of the genuinely useful content in under 30 seconds. Start there, then add the format options, long-video handling, and output integration as your workflow evolves. The core workflow is simple enough that you can build a personalised version with exactly the features you need in an afternoon.
Comparing Models for Video Summarisation
Not all models summarise equally well. Instruction-tuned models that were trained on diverse text formats perform better than base models or models fine-tuned narrowly for code or chat. For YouTube summarisation specifically, Llama 3.2 3B and Qwen2.5 7B both perform well because they are strong general-purpose instruction models — they follow structured prompts reliably and produce well-organised output without much prompt engineering. Qwen2.5 tends to produce slightly more detailed bullet points and adheres more strictly to the requested number of items, which matters when you want exactly five bullets rather than a variable-length list. Llama 3.2 tends to write more naturally flowing prose when you ask for paragraph-format summaries rather than bullets.
For the study notes and Q&A formats, a model with strong instruction following is more important than raw parameter count. A 7B model that reliably produces well-structured output beats a 12B model that ignores formatting instructions half the time. Test your chosen format against a familiar video where you know the content — if the summary accurately captures the key points you would have written manually, the model is working well for your summarisation use case. If it consistently misses important content or includes irrelevant details, try adjusting the prompt to specify what to include rather than switching to a larger model, which is often the faster fix.
The YouTube summariser is one of those local AI tools where the value is immediately obvious on first use — watching a 45-minute tutorial video summarised accurately in 20 seconds makes the case for local LLMs more concretely than any benchmark or feature comparison. For anyone who consumes significant amounts of online video content, this workflow is one of the most practical and immediately time-saving applications available.
Privacy and Terms Considerations
A few things worth knowing before deploying this widely. The YouTube Transcript API fetches publicly available captions — it does not bypass access controls, so private or members-only videos that you cannot access in a browser will not work. Auto-generated captions are available for most public videos uploaded after 2012. For older videos without auto-captions, you will need manual captions (when available) or the Whisper transcription fallback. Using this tool for personal productivity with public content is unambiguously fine. Using it to bulk-archive or redistribute video content would raise different questions beyond the technical scope of this article. The summarisation itself is transformative and analytical in nature — generating a brief summary of a video for personal reference is clearly within normal fair use of publicly accessible content.
The barrier to entry is genuinely low: two pip installs, a running Ollama instance, and a video URL. The return on that investment — hours of video content distilled to minutes of reading, entirely on your own machine — is one of the most tangible demonstrations of what practical local AI looks like in 2026. Start with one video you already know well — seeing an accurate summary of familiar content is the fastest way to calibrate your confidence in the tool before relying on it for new material. At that point, it becomes a tool you reach for automatically rather than one you have to remember exists.