Vector Databases for Semantic Search in Video Archives
Finding a specific shot in petabytes of video footage used to take hours of manual scrubbing. By using vector databases and vision-language models, we made our entire archive searchable by natural language.
Generating Embeddings
We sample one frame per second from each video and pass it through OpenAI's CLIP model, which turns the frame into a fixed-length vector (an embedding) that represents its visual content.
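The sampling step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names `sample_frame_indices` and `l2_normalize` are hypothetical, and the CLIP call itself is left out. It only shows which frame indices a one-second interval selects and how an embedding might be length-normalized before storage, a common preparation for cosine-similarity search.

```python
import math


def sample_frame_indices(total_frames: int, fps: float, interval_s: float = 1.0) -> list[int]:
    """Return the index of the frame nearest to each sampling instant (0 s, 1 s, 2 s, ...)."""
    if total_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    indices: list[int] = []
    k = 0
    while True:
        idx = round(k * interval_s * fps)
        if idx >= total_frames:
            break
        # Skip duplicates that can occur when the interval is shorter than one frame.
        if not indices or idx != indices[-1]:
            indices.append(idx)
        k += 1
    return indices


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


# Hypothetical 4-second clip at 25 frames per second.
indices = sample_frame_indices(total_frames=100, fps=25.0)
unit = l2_normalize([3.0, 4.0])
```

In a real pipeline each selected frame would be decoded and passed to the embedding model; the vector returned by the model would take the place of the toy `[3.0, 4.0]` input here.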
Milvus and Pinecone
Storing millions of high-dimensional vectors requires a specialized database. We evaluate the performance of vector databases that use HNSW (Hierarchical Navigable Small World) indexes to run approximate nearest-neighbor searches in milliseconds.
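Because HNSW search is approximate, one common way to evaluate an index is to compare its results against an exact brute-force search and report recall@k. The sketch below assumes that metric; the article does not state which measurements were used. The helper names `exact_top_k` and `recall_at_k` are hypothetical, and `approx_ids` stands in for whatever IDs the database under test returns. No Milvus or Pinecone API calls are shown.

```python
import heapq


def dot(a: list[float], b: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Brute-force baseline: IDs of the k vectors with the highest inner product."""
    scored = ((dot(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scored)]


def recall_at_k(approx_ids: list[int], exact_ids: list[int]) -> float:
    """Fraction of the exact top-k results that the approximate search also returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy unit-length vectors standing in for stored frame embeddings.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]]
query = [1.0, 0.0]

exact_ids = exact_top_k(query, vectors, k=2)
approx_ids = [0, 1]  # hypothetical result from an approximate index
recall = recall_at_k(approx_ids, exact_ids)
```

Brute-force search scales linearly with the archive size, so it is only practical as a ground-truth baseline on a sample of queries, not as the production search path.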
Multi-modal Search
Because text and image embeddings are aligned in the same vector space, a user can search for 'a dog running on the beach at sunset' and the system retrieves the frames whose vectors lie closest to the embedding of that text prompt.
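The retrieval step can be illustrated with a small sketch. It assumes cosine similarity as the closeness measure and uses hand-written three-dimensional vectors in place of real CLIP embeddings. `FrameRecord` and `search` are hypothetical names, and the linear scan here stands in for the vector database query.

```python
import math
from dataclasses import dataclass


@dataclass
class FrameRecord:
    video_id: str
    timestamp_s: float
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(text_embedding: list[float], frames: list[FrameRecord], top_k: int = 3) -> list[FrameRecord]:
    """Rank frames by similarity to the text embedding and return the best top_k."""
    ranked = sorted(
        frames,
        key=lambda f: cosine_similarity(text_embedding, f.embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data: in practice both sides would come from the same vision-language model.
frames = [
    FrameRecord("beach.mp4", 12.0, [0.9, 0.1, 0.0]),
    FrameRecord("city.mp4", 3.0, [0.0, 0.2, 0.9]),
    FrameRecord("beach.mp4", 13.0, [0.8, 0.3, 0.1]),
]
query_embedding = [1.0, 0.1, 0.0]  # stand-in for the embedded text prompt

hits = search(query_embedding, frames, top_k=2)
for hit in hits:
    print(hit.video_id, hit.timestamp_s)
```

Storing the video ID and timestamp alongside each vector is what lets a match be turned back into a playable position in the archive.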