# Video Duplicate Detection
Gallery can automatically detect duplicate videos even when they have been re-encoded, resized, or converted to a different format. It uses the same CLIP-based AI that powers image duplicate detection, extended to work with video content by sampling and averaging multiple frames.
## How It Works
When a video is processed for Smart Search, Gallery extracts multiple frames and encodes each one with CLIP to produce an embedding vector. These per-frame embeddings are averaged into a single representative vector that captures the visual content of the entire video. Two videos with the same visual content produce nearly identical vectors — regardless of codec, resolution, or bitrate.
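The averaging step described above can be sketched as a simple element-wise mean over the per-frame vectors. This is an illustrative helper, not the actual service code; the function name and the plain `number[]` representation are assumptions:

```typescript
// Combine per-frame CLIP embeddings into one representative vector
// by taking the element-wise mean. Illustrative sketch only.
function meanEmbedding(frameEmbeddings: number[][]): number[] {
  if (frameEmbeddings.length === 0) {
    throw new Error('no frame embeddings to average');
  }
  const dims = frameEmbeddings[0].length;
  const mean = new Array<number>(dims).fill(0);
  for (const embedding of frameEmbeddings) {
    for (let i = 0; i < dims; i++) {
      // divide as we go so each frame contributes 1/N of its value
      mean[i] += embedding[i] / frameEmbeddings.length;
    }
  }
  return mean;
}
```

Because the mean is order-independent, two re-encodings of the same video that yield slightly different per-frame embeddings still land very close together in vector space.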
### Frame Sampling Strategy
The number of frames extracted depends on the video duration:
| Duration | Frames Extracted |
|---|---|
| Invalid / 0 | 1 frame at the start (t=0) |
| Under 2 seconds | 1 frame at the midpoint |
| 2+ seconds | 8 frames evenly spaced (5% to 95%) |
This adaptive sampling ensures short clips and corrupt videos are still processed, while longer videos get thorough coverage across their timeline.
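The rules in the table above can be sketched as a timestamp calculator. The function name and exact constants for the even spacing are illustrative, derived from the documented behavior:

```typescript
// Return the timestamps (seconds) at which frames are sampled,
// following the adaptive rules in the table above. Sketch only.
function frameTimestamps(durationSeconds: number): number[] {
  if (!Number.isFinite(durationSeconds) || durationSeconds <= 0) {
    return [0]; // invalid or zero duration: one frame at the start
  }
  if (durationSeconds < 2) {
    return [durationSeconds / 2]; // short clip: one frame at the midpoint
  }
  // 8 frames evenly spaced from 5% to 95% of the duration
  const count = 8;
  const start = 0.05 * durationSeconds;
  const end = 0.95 * durationSeconds;
  const step = (end - start) / (count - 1);
  return Array.from({ length: count }, (_, i) => start + i * step);
}
```

Avoiding the first and last 5% skips intros, fade-ins, and end cards that would otherwise skew the averaged embedding.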
### Matching Rules
- Videos only match with other videos — never with images.
- The duplicate detection distance threshold is the same one used for image duplicates, configurable in Administration > Machine Learning Settings.
- Byte-identical uploads per user are already blocked by checksum deduplication. Video duplicate detection catches re-encoded and resized copies that have completely different bytes but the same visual content.
## Configuration
Video duplicate detection requires two features to be enabled in Administration > Machine Learning Settings:
- Smart Search (CLIP) — must be enabled so that video frames can be encoded into CLIP embeddings. This is enabled by default.
- Duplicate Detection — must be enabled so that embeddings are compared to find matches. This is also enabled by default.
If both are already on, video duplicates work automatically with no additional setup.
### Settings
| Setting | Default | Description |
|---|---|---|
| Machine Learning > Enabled | On | Master toggle for all ML features. Turning this off disables Smart Search, duplicates, and more. |
| Smart Search > Enabled | On | Enables CLIP encoding for both images and videos. |
| Duplicate Detection > Enabled | On | Enables duplicate grouping based on CLIP embedding similarity. |
| Duplicate Detection > Max Distance | 0.01 | Maximum cosine distance between two embeddings to consider them duplicates. Lower = stricter. |
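How the Max Distance threshold is applied can be sketched as a cosine-distance comparison. This is a sketch under the assumption that two embeddings match when their distance is at or below the threshold; the helper names are illustrative:

```typescript
// Cosine distance: 0 for identical direction, up to 2 for opposite.
// Illustrative sketch of the duplicate-matching comparison.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Two assets are considered duplicates when their embeddings are
// within the configured Max Distance (default 0.01).
function isDuplicate(a: number[], b: number[], maxDistance = 0.01): boolean {
  return cosineDistance(a, b) <= maxDistance;
}
```

A threshold of 0.01 is very strict: it tolerates the small perturbations introduced by re-encoding while rejecting merely similar content.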
### Disabling
There is no video-only toggle: the Duplicate Detection setting applies to both images and videos. To disable all duplicate detection, turn off Duplicate Detection > Enabled. Disabling Smart Search would also stop videos from being encoded, but because duplicate detection relies on Smart Search embeddings for images too, this is not a way to keep image duplicates while skipping videos.
## Using Video Duplicates
### Reviewing Duplicates
- Go to Utilities > Duplicates.
- Video duplicates appear in the same list as image duplicates.
- Review them side by side — file size, resolution, codec, and other metadata are shown.
- Choose to keep the highest quality version and trash the rest, or stack them together.
### Re-scanning Existing Videos
To detect duplicates in videos uploaded before this feature was available:
- Go to Administration > Jobs.
- Run the Smart Search job for all assets — this generates CLIP embeddings for videos that don't have them yet.
- Run the Duplicate Detection job to find matches.
## Technical Implementation
### Encoding Pipeline
```
┌──────────────────────────────────────────────────────────────────────┐
│ Smart Search Job (per video asset)                                   │
│                                                                      │
│  ┌─────────┐    ┌───────────────┐    ┌───────────────┐               │
│  │ ffprobe │───►│ Calculate     │───►│ Extract       │               │
│  │ duration│    │ timestamps    │    │ frames (JPEG) │               │
│  └─────────┘    │ (8 @ 5%-95%)  │    │ via ffmpeg    │               │
│                 └───────────────┘    └───────┬───────┘               │
│                                              │                       │
│                                   ┌──────────▼──────────┐            │
│                                   │ CLIP encode each    │            │
│                                   │ frame (sequential)  │            │
│                                   └──────────┬──────────┘            │
│                                              │                       │
│                                   ┌──────────▼──────────┐            │
│                                   │ Average embeddings  │            │
│                                   │ (element-wise mean) │            │
│                                   └──────────┬──────────┘            │
│                                              │                       │
│                                   ┌──────────▼──────────┐            │
│                                   │ Upsert into         │            │
│                                   │ smart_search table  │            │
│                                   └─────────────────────┘            │
└──────────────────────────────────────────────────────────────────────┘
```
- **Probe** — `ffprobe` reads the video duration from the container metadata.
- **Timestamp calculation** — 8 evenly spaced timestamps are generated from 5% to 95% of the duration (adaptive for short/invalid videos as described above).
- **Frame extraction** — For each timestamp, `ffmpeg -ss <t> -frames:v 1` extracts a single JPEG frame into a temporary directory.
- **CLIP encoding** — Each extracted frame is sent to the machine learning service's existing `/predict` endpoint. Frames are encoded sequentially to avoid overloading the ML service.
- **Averaging** — All per-frame embedding vectors are combined using element-wise mean into a single 512-dimensional vector.
- **Storage** — The averaged vector is upserted into the existing `smart_search` table, the same table used for image embeddings.
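The per-timestamp extraction can be sketched as an argument builder for the `ffmpeg` invocation shown above. `ffmpegFrameArgs` is an illustrative helper, not the actual implementation; the real job would hand this list to a child-process runner:

```typescript
// Build the ffmpeg argument list for extracting one frame at a given
// timestamp. Sketch only; flag order follows the command shown above.
function ffmpegFrameArgs(videoPath: string, timestamp: number, outPath: string): string[] {
  return [
    '-ss', timestamp.toFixed(3), // seek to the sampled timestamp
    '-i', videoPath,
    '-frames:v', '1',            // extract exactly one frame
    '-y',                        // overwrite any stale temp file
    outPath,
  ];
}
```

Placing `-ss` before `-i` lets ffmpeg seek on the input rather than decode up to the timestamp, which keeps extraction fast even on long videos.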
### Key Implementation Details
- **No schema changes** — Video embeddings are stored in the same `smart_search` table as image embeddings. One vector per asset.
- **No API changes** — The `GET /duplicates` endpoint and `DuplicateResponseDto` are type-agnostic and work with both image and video duplicate groups.
- **No frontend changes** — Video duplicates appear in the existing Duplicates page. The comparison UI already renders video assets.
- **No ML service changes** — The ML service receives individual frame images via the existing CLIP encoding endpoint.
- **Temp file isolation** — Each job creates a unique temporary directory via `mkdtemp`, preventing collisions when multiple Smart Search jobs run concurrently. The directory is cleaned up in a `finally` block regardless of success or failure.
- **Graceful degradation** — If some frames fail to extract (e.g., a seek past the end of the file), the remaining successful frames are averaged. The job only fails if all frames fail or if `ffprobe` cannot read the video.
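The temp-directory lifecycle can be sketched with Node's `mkdtemp` and a `finally` block, matching the isolation and cleanup behavior described above. The wrapper name and callback shape are illustrative:

```typescript
import { mkdtemp, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Run frame-extraction work inside a unique temp directory that is
// removed whether the work succeeds or throws. Sketch only.
async function withTempDir<T>(work: (dir: string) => Promise<T>): Promise<T> {
  // mkdtemp appends random characters, so concurrent jobs never collide
  const dir = await mkdtemp(join(tmpdir(), 'video-frames-'));
  try {
    return await work(dir);
  } finally {
    // cleanup runs on both success and failure paths
    await rm(dir, { recursive: true, force: true });
  }
}
```

Returning the callback's result through the `try` keeps the wrapper composable: the caller gets the averaged embedding while the frames themselves never outlive the job.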
### Job Flow
Video duplicate detection reuses the existing job pipeline with no new queues:
- **Smart Search job** — When processing a video asset, `handleEncodeClip` in `smart-info.service.ts` detects the asset type and branches into the video encoding path (probe, extract frames, encode, average, upsert).
- **Duplicate Detection job** — The existing `duplicate.service.ts` queues per-asset detection jobs. `duplicate.repository.ts` performs vector similarity search filtered by `asset.type`, so videos only match videos.
- **No new configuration** — The existing `duplicateDetection` admin settings (enabled toggle, distance threshold) apply to both images and videos.
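The type-filtered similarity search can be sketched as a pgvector-style query. The SQL shape, column names, and the `<=>` cosine-distance operator are assumptions about the repository internals, not the actual query:

```typescript
// Illustrative sketch of the type-filtered vector search: candidates
// are restricted to the probe asset's own type, so videos only ever
// match videos. Parameters: $1 = probe embedding, $2 = asset type,
// $3 = probe asset id, $4 = max distance threshold.
const duplicateSearchSql = `
  SELECT asset.id, smart_search.embedding <=> $1 AS distance
  FROM smart_search
  JOIN asset ON asset.id = smart_search."assetId"
  WHERE asset.type = $2                          -- videos only match videos
    AND asset.id != $3                           -- exclude the probe itself
    AND smart_search.embedding <=> $1 <= $4      -- Max Distance setting
  ORDER BY distance
`;
```

Filtering on `asset.type` inside the query (rather than post-filtering results) keeps the vector index scan from wasting its candidate budget on assets that could never match.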