Video Duplicate Detection

Gallery can automatically detect duplicate videos even when they have been re-encoded, resized, or converted to a different format. It uses the same CLIP-based AI that powers image duplicate detection, extended to work with video content by sampling and averaging multiple frames.

How It Works

When a video is processed for Smart Search, Gallery extracts multiple frames and encodes each one with CLIP to produce an embedding vector. These per-frame embeddings are averaged into a single representative vector that captures the visual content of the entire video. Two videos with the same visual content produce nearly identical vectors — regardless of codec, resolution, or bitrate.

Frame Sampling Strategy

The number of frames extracted depends on the video duration:

Duration	Frames Extracted
Invalid / 0	1 frame at the start (t=0)
Under 2 seconds	1 frame at the midpoint
2+ seconds	8 frames evenly spaced (5% to 95%)

This adaptive sampling ensures that short clips and videos whose duration is missing or reported as 0 are still processed, while longer videos are sampled across nearly their whole timeline.
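The table above can be sketched as a small function. This is an illustration only: the name frameTimestamps and the constants are hypothetical, and it assumes that "evenly spaced (5% to 95%)" includes both endpoints and that negative or non-finite durations count as invalid.

```typescript
// Hypothetical constants mirroring the sampling table above.
const FRAME_COUNT = 8;
const START_FRACTION = 0.05;
const END_FRACTION = 0.95;
const SHORT_VIDEO_SECONDS = 2;

// Returns the timestamps (in seconds) at which frames would be extracted.
function frameTimestamps(durationSeconds: number): number[] {
  // Invalid or zero duration: fall back to a single frame at t=0.
  if (!Number.isFinite(durationSeconds) || durationSeconds <= 0) {
    return [0];
  }
  // Very short clip: a single frame at the midpoint.
  if (durationSeconds < SHORT_VIDEO_SECONDS) {
    return [durationSeconds / 2];
  }
  // Otherwise: 8 timestamps evenly spaced from 5% to 95% of the duration
  // (assumption: both endpoints are included).
  const start = durationSeconds * START_FRACTION;
  const end = durationSeconds * END_FRACTION;
  const step = (end - start) / (FRAME_COUNT - 1);
  return Array.from({ length: FRAME_COUNT }, (_, i) => start + i * step);
}
```

For example, frameTimestamps(0) returns [0], frameTimestamps(1) returns [0.5], and a 100-second video yields 8 timestamps starting near 5 s and ending near 95 s.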

Matching Rules

Videos only match with other videos — never with images.
The duplicate detection distance threshold is the same one used for image duplicates, configurable in Administration > Machine Learning Settings.
Byte-identical uploads by the same user are already blocked by checksum deduplication. Video duplicate detection catches re-encoded and resized copies whose bytes differ completely but whose visual content is the same.
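The rules above amount to a type check followed by a distance comparison. The sketch below shows that logic in memory; the names EmbeddedAsset and isDuplicateCandidate are hypothetical, and the inclusive comparison (<=) is an assumption. In Gallery itself the comparison is performed by a vector similarity search in the repository layer, not by application code like this.

```typescript
type AssetType = "IMAGE" | "VIDEO";

// Hypothetical shape: one averaged (video) or single (image) embedding per asset.
interface EmbeddedAsset {
  id: string;
  type: AssetType;
  embedding: number[];
}

// Cosine distance = 1 - cosine similarity; 0 means the vectors point the same way.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Embeddings must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for a zero vector");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Videos only match videos, images only match images, and the distance
// must not exceed the configured threshold (default 0.01).
function isDuplicateCandidate(a: EmbeddedAsset, b: EmbeddedAsset, maxDistance = 0.01): boolean {
  if (a.id === b.id) return false;
  if (a.type !== b.type) return false;
  return cosineDistance(a.embedding, b.embedding) <= maxDistance;
}
```

Because cosine distance ignores vector length, only the direction of the averaged embedding matters for the comparison.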

Configuration

Video duplicate detection requires two features to be enabled in Administration > Machine Learning Settings:

Smart Search (CLIP) — must be enabled so that video frames can be encoded into CLIP embeddings. This is enabled by default.
Duplicate Detection — must be enabled so that embeddings are compared to find matches. This is also enabled by default.

If both are already on, video duplicate detection works automatically with no additional setup.

Settings

Setting	Default	Description
Machine Learning > Enabled	On	Master toggle for all ML features. Turning this off disables Smart Search, duplicates, and more.
Smart Search > Enabled	On	Enables CLIP encoding for both images and videos.
Duplicate Detection > Enabled	On	Enables duplicate grouping based on CLIP embedding similarity.
Duplicate Detection > Max Distance	0.01	Maximum cosine distance between two embeddings to consider them duplicates. Lower = stricter.
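How the toggles in the table depend on each other can be sketched as a single predicate. The interface and field names below are hypothetical stand-ins for the admin settings (only duplicateDetection is a name taken from this page); the defaults mirror the table.

```typescript
// Hypothetical settings shape; field names are illustrative, not Gallery's actual config schema.
interface MachineLearningSettings {
  enabled: boolean; // Machine Learning > Enabled
  smartSearch: { enabled: boolean }; // Smart Search > Enabled
  duplicateDetection: { enabled: boolean; maxDistance: number };
}

// Defaults as listed in the settings table.
const defaultSettings: MachineLearningSettings = {
  enabled: true,
  smartSearch: { enabled: true },
  duplicateDetection: { enabled: true, maxDistance: 0.01 },
};

// Video duplicate detection needs the master toggle, CLIP encoding,
// and duplicate detection to all be on.
function videoDuplicateDetectionActive(settings: MachineLearningSettings): boolean {
  return settings.enabled && settings.smartSearch.enabled && settings.duplicateDetection.enabled;
}
```

With the defaults the predicate is true; turning off any one of the three toggles makes it false.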

Disabling

There is no video-only toggle: the Duplicate Detection setting applies to both images and videos. To disable all duplicate detection, turn off Duplicate Detection > Enabled. Disabling Smart Search does not give you image-only duplicate detection, because it stops CLIP encoding for images as well as videos, so newly processed assets of either type get no embedding to compare.

Using Video Duplicates

Reviewing Duplicates

Go to Utilities > Duplicates.
Video duplicates appear in the same list as image duplicates.
Review them side by side — file size, resolution, codec, and other metadata are shown.
Choose to keep the highest quality version and trash the rest, or stack them together.

Re-scanning Existing Videos

To detect duplicates in videos uploaded before this feature was available:

Go to Administration > Jobs.
Run the Smart Search job for all assets — this generates CLIP embeddings for videos that don't have them yet.
Run the Duplicate Detection job to find matches.

Technical Implementation

Encoding Pipeline

┌──────────────────────────────────────────────────────────────────────┐
│  Smart Search Job (per video asset)                                  │
│                                                                      │
│  ┌──────────┐    ┌───────────────┐    ┌───────────────┐              │
│  │ ffprobe  │───►│ Calculate     │───►│ Extract       │              │
│  │ duration │    │ timestamps    │    │ frames (JPEG) │              │
│  └──────────┘    │ (8 @ 5%-95%)  │    │ via ffmpeg    │              │
│                  └───────────────┘    └───────┬───────┘              │
│                                               │                      │
│                                    ┌──────────▼──────────┐           │
│                                    │ CLIP encode each    │           │
│                                    │ frame (sequential)  │           │
│                                    └──────────┬──────────┘           │
│                                               │                      │
│                                    ┌──────────▼──────────┐           │
│                                    │ Average embeddings  │           │
│                                    │ (element-wise mean) │           │
│                                    └──────────┬──────────┘           │
│                                               │                      │
│                                    ┌──────────▼──────────┐           │
│                                    │ Upsert into         │           │
│                                    │ smart_search table  │           │
│                                    └─────────────────────┘           │
└──────────────────────────────────────────────────────────────────────┘

Probe — ffprobe reads the video duration from the container metadata.
Timestamp calculation — 8 evenly spaced timestamps are generated from 5% to 95% of the duration (adaptive for short/invalid videos as described above).
Frame extraction — For each timestamp, ffmpeg -ss <t> -frames:v 1 extracts a single JPEG frame into a temporary directory.
CLIP encoding — Each extracted frame is sent to the machine learning service's existing /predict endpoint. Frames are encoded sequentially to avoid overloading the ML service.
Averaging — All per-frame embedding vectors are combined using element-wise mean into a single 512-dimensional vector.
Storage — The averaged vector is upserted into the existing smart_search table, the same table used for image embeddings.
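Two of the steps above, frame extraction and averaging, can be sketched as pure functions. The function names are hypothetical, and the argument order (placing -ss before -i and fixing the timestamp to three decimals) is an assumption; the page only states that ffmpeg is called with -ss <t> -frames:v 1.

```typescript
// Builds an ffmpeg argument list that extracts one frame at the given timestamp.
// Assumption: -ss is placed before -i, and the output path ends in .jpg.
function frameExtractionArgs(inputPath: string, timestampSeconds: number, outputPath: string): string[] {
  return ["-ss", timestampSeconds.toFixed(3), "-i", inputPath, "-frames:v", "1", outputPath];
}

// Element-wise mean of per-frame embeddings: one output value per dimension.
function meanEmbedding(embeddings: number[][]): number[] {
  const first = embeddings[0];
  if (first === undefined) {
    throw new Error("No embeddings to average");
  }
  const sums = new Array<number>(first.length).fill(0);
  for (const vector of embeddings) {
    if (vector.length !== first.length) {
      throw new Error("All embeddings must have the same dimension");
    }
    vector.forEach((value, i) => {
      sums[i] = (sums[i] ?? 0) + value;
    });
  }
  return sums.map((total) => total / embeddings.length);
}
```

For example, meanEmbedding([[1, 2], [3, 4]]) returns [2, 3]. The output has the same dimension as each input, so the averaged vector fits the same table column as an image embedding.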

Key Implementation Details

No schema changes — Video embeddings are stored in the same smart_search table as image embeddings. One vector per asset.
No API changes — The GET /duplicates endpoint and DuplicateResponseDto are type-agnostic and work with both image and video duplicate groups.
No frontend changes — Video duplicates appear in the existing Duplicates page. The comparison UI already renders video assets.
No ML service changes — The ML service receives individual frame images via the existing CLIP encoding endpoint.
Temp file isolation — Each job creates a unique temporary directory via mkdtemp, preventing collisions when multiple Smart Search jobs run concurrently. The directory is cleaned up in a finally block regardless of success or failure.
Graceful degradation — If some frames fail to extract (e.g., seek past end of file), the remaining successful frames are averaged. The job only fails if all frames fail or if ffprobe cannot read the video.
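The graceful-degradation rule can be sketched as a loop that skips failed frames and fails only when none succeed. The names are hypothetical, and the callback is synchronous here for brevity; in a real job each frame would be extracted and encoded with awaited calls, one after another.

```typescript
// Processes frames one at a time, keeping the embeddings of frames that succeed.
// processFrame is a hypothetical callback that extracts and encodes one frame,
// throwing if either step fails (for example, a seek past the end of the file).
function collectFrameEmbeddings(
  timestamps: number[],
  processFrame: (timestampSeconds: number) => number[],
): number[][] {
  const embeddings: number[][] = [];
  for (const timestamp of timestamps) {
    try {
      embeddings.push(processFrame(timestamp));
    } catch {
      // A single failed frame is tolerated; the remaining frames are still used.
    }
  }
  if (embeddings.length === 0) {
    throw new Error("All frames failed; no embedding can be produced");
  }
  return embeddings;
}
```

The returned embeddings would then be averaged, so a video with 6 usable frames out of 8 still gets a representative vector.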

Job Flow

Video duplicate detection reuses the existing job pipeline with no new queues:

Smart Search job — When processing a video asset, handleEncodeClip in smart-info.service.ts detects the asset type and branches into the video encoding path (probe, extract frames, encode, average, upsert).
Duplicate Detection job — The existing duplicate.service.ts queues per-asset detection jobs. duplicate.repository.ts performs vector similarity search filtered by asset.type, so videos only match videos.
No new configuration — The existing duplicateDetection admin settings (enabled toggle, distance threshold) apply to both images and videos.
