A powerful video search and indexing tool that combines multiple state-of-the-art AI models for comprehensive video content search:
- Visual Search: Meta's Perception Model (Core) for understanding visual content and cross-modal search
- Transcript Search: Qwen3-Embedding-8B for high-quality semantic search over spoken content (transcribed with Whisper's large-v3 model)
Watch a quick demo of the Video Indexer in action:
demo.mp4
- Text-to-video search using Meta's Perception Model
- Image-to-video search for finding similar visual content
- Frame extraction and intelligent indexing
- Cross-modal understanding between text and visual content
- Instant visual results with thumbnails
- Semantic search using Qwen3-Embedding-8B embeddings
- Exact text matching for precise queries
- Multi-language transcript support
- Automatic video transcription
- Time-aligned transcript results
- Basic Web UI
- Instant search results with visual previews
- Video playback starting at matched frames/segments
- M3U playlist generation for search results
- FAISS vector similarity search
- FP16 support for efficient memory usage
- Automatic video transcoding when needed
- Configurable model parameters
- Multi-threaded processing
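Under the hood, both search paths come down to nearest-neighbor lookup over embedding vectors, which FAISS accelerates. As a rough illustration of the idea (and of the FP16 option), here is a minimal NumPy sketch of an inner-product search — the function names and dimensions are invented for this example, not taken from the project's code:

```python
import numpy as np

def build_index(embeddings: np.ndarray, use_fp16: bool = False) -> np.ndarray:
    """L2-normalize embeddings so inner product equals cosine similarity."""
    index = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return index.astype(np.float16) if use_fp16 else index

def search(index: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k vectors most similar to the query embedding."""
    q = query / np.linalg.norm(query)
    scores = index.astype(np.float32) @ q.astype(np.float32)
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
frames = rng.standard_normal((1000, 512))   # stand-in frame embeddings
idx = build_index(frames, use_fp16=True)    # ~half the memory of float32
top = search(idx, frames[42])
print(top[0])                               # the query frame itself ranks first: 42
```

A real index swaps the brute-force matrix product for a FAISS index, but the normalize-then-inner-product structure is the same.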
The quickest way to try it out is to download VideoIndexer-mac.zip from the Releases page and run it. You may need to click through macOS security prompts to launch it even though the app is signed and notarized, but hopefully that isn't too much of a pain.
Pick the directory tree of videos you want to index, click 'Start', and let it run for a while. If you have less than 64 GB of RAM, check the 'fp16' box; accuracy should be about the same while using less memory. When indexing is done, open the local web server at http://127.0.0.1:8002 and search away!
- Clone the repository:
```bash
git clone https://github.com/yourusername/VideoIndexer-project.git
cd VideoIndexer-project
```
- Create and activate conda environment:
```bash
conda env create -f environment.yml
conda activate video-indexer
```
- Install additional dependencies:
```bash
pip install -r requirements.txt
```
- Process videos to extract frames and build the visual search index:
```bash
cd src

# Extract frames (add --fp16 to reduce memory usage)
python frame_extractor.py /path/to/your/videos --output-dir video_index_output

# Build visual search index (add --fp16 for reduced memory)
python frame_indexer.py --input-dir video_index_output
```
- Generate and index transcripts:
```bash
# Generate transcripts (add --fp16 for reduced memory)
python transcript_extractor_pywhisper.py /path/to/your/videos --output-dir video_index_output

# Build transcript search index (add --fp16 for reduced memory)
python transcript_indexer.py --input-dir video_index_output
```
- Start the search server:
```bash
cd src

# Add --fp16 flag to reduce memory usage during search
python video_search_server_transcode.py --fp16
```
- Open http://localhost:8002 in your browser
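Search results can also be exported as an M3U playlist (see the feature list above). M3U is a simple line-based format; here is a hypothetical sketch of writing one from a list of results — the result structure is invented for illustration:

```python
def write_m3u(results, path="results.m3u"):
    """Write an extended M3U playlist; each result is a (title, file_path) pair."""
    lines = ["#EXTM3U"]
    for title, file_path in results:
        lines.append(f"#EXTINF:-1,{title}")  # -1 = unknown duration
        lines.append(file_path)
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_m3u([("match 1", "/videos/a.mp4"), ("match 2", "/videos/b.mp4")])
```

Most media players (VLC, mpv, etc.) will open the resulting file directly.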
- The `--fp16` flag can be used with most components and reduces memory usage by about 50%
- For large video collections, using FP16 is recommended
- Memory usage peaks during initial indexing and is lower during search operations
- If you encounter memory issues:
  - Use the `--fp16` flag
  - Process videos in smaller batches
  - Close other memory-intensive applications
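The ~50% savings follows directly from the storage formats: float16 uses two bytes per value versus four for float32. A quick check with NumPy:

```python
import numpy as np

emb32 = np.zeros((10_000, 1024), dtype=np.float32)  # e.g. 10k embeddings
emb16 = emb32.astype(np.float16)
print(emb32.nbytes // 2**20, emb16.nbytes // 2**20)  # prints "39 19"
```

The tradeoff is reduced precision (~3 decimal digits), which matters little for similarity rankings.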
- Copy the credentials template and fill in your Apple Developer details:
```bash
cp store_credentials_template.sh store_credentials.sh
chmod +x store_credentials.sh
# Edit store_credentials.sh with your details
./store_credentials.sh
```
- Build the app:
```bash
./build_macos.sh
```
The built app will be in ./build/VideoIndexer.app
- `index_config.json`: Configure visual search model and embedding settings
- `transcript_index_config.json`: Configure transcript search settings
- Command-line arguments for frame extraction and indexing (see `--help`)
- Use the `--fp16` flag for reduced memory usage
- Adjust the frame extraction rate for a storage/accuracy tradeoff
- Configure the FAISS index type for a speed/accuracy tradeoff
- Set maximum result thresholds for faster searches
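The FAISS index-type tradeoff is between exact search (scan every vector, always correct) and approximate search (probe only the nearest clusters, as FAISS's IVF indexes do). A toy NumPy illustration of the idea — not the project's actual indexing code:

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.standard_normal((2000, 64)).astype(np.float32)
query = db[123] + 0.01 * rng.standard_normal(64).astype(np.float32)

# Exact: scan everything (slow but always right).
exact = int(np.argmin(np.linalg.norm(db - query, axis=1)))

# IVF-style: bucket vectors by nearest "centroid", then probe only a few cells.
n_cells, n_probe = 16, 4
centroids = db[rng.choice(len(db), n_cells, replace=False)]
assign = np.argmin(np.linalg.norm(db[:, None] - centroids[None], axis=2), axis=1)
cells = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
cand = np.flatnonzero(np.isin(assign, cells))  # candidate subset only
approx = int(cand[np.argmin(np.linalg.norm(db[cand] - query, axis=1))])

print(exact, len(cand))  # approximate search scans far fewer than 2000 vectors
```

More probes means higher recall but slower searches; that is the knob the FAISS index type exposes.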
This project is licensed under the Apache License - see the LICENSE file for details.
- Meta's Perception Model (Core) for visual understanding
- Qwen3-Embedding-8B for semantic text understanding
- FAISS for efficient similarity search
- FFmpeg for video processing
- whisper.cpp for multi-lingual transcription