by HKUDS
[KDD'2026] "VideoRAG: Chat with Your Videos"
Intelligent Video Conversations | Powered by Advanced AI | Extreme Long-Context Processing
Vimo is a desktop application that lets you chat with your videos using AI. Built on the VideoRAG framework, Vimo can understand and analyze videos of any length, from short clips to hundreds of hours of content, and answer your questions about them accurately.
See how Vimo transforms video interaction with intelligent conversations and deep understanding capabilities.
For Video Enthusiasts & Professionals:
For Researchers & Developers:
[!NOTE] We are preparing the Beta release for macOS Apple Silicon first, with Windows and Linux versions coming soon!
For detailed setup instructions:
Quick Overview:
VideoRAG introduces a novel dual-channel architecture that combines:
Our VideoRAG algorithm significantly outperforms existing methods in long-context video understanding:
We also evaluate VideoRAG's QA performance on the Video-MME long video track to better understand its gains over the backbone models (reported here because the paper's page limit left no room for it):
| Video-MME Long Video | MiniCPM-o w/o subs | MiniCPM-o w/ subs | MiniCPM-V w/o subs | MiniCPM-V w/ subs | VideoRAG |
|---|---|---|---|---|---|
| Accuracy | 52.2% | 56.3% | 51.8% | 56.3% | 60.2% |
Note: Scores may fluctuate slightly across runs because LLM generation is not fully deterministic.
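To make the gains in the table concrete, the following minimal sketch computes VideoRAG's absolute improvement, in percentage points, over each backbone configuration. The accuracies are copied from the table above; the variable names are illustrative and are not part of the VideoRAG codebase.

```python
# Accuracies (%) copied from the Video-MME long video table above.
# Variable names are illustrative, not part of the VideoRAG codebase.
backbone_accuracy = {
    "MiniCPM-o w/o subs": 52.2,
    "MiniCPM-o w/ subs": 56.3,
    "MiniCPM-V w/o subs": 51.8,
    "MiniCPM-V w/ subs": 56.3,
}
videorag_accuracy = 60.2

# Absolute gain in percentage points, rounded to one decimal place
# to avoid floating-point noise.
gains = {
    name: round(videorag_accuracy - acc, 1)
    for name, acc in backbone_accuracy.items()
}

for name, gain in gains.items():
    print(f"VideoRAG vs. {name}: +{gain:.1f} points")
```

Against the strongest backbone settings (both subtitle-enabled variants at 56.3%), the gain is 3.9 points.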
See VideoRAG-algorithm for detailed development setup including:
We created the LongerVideos benchmark to evaluate long-context video understanding:
| Video Type | #Collections | #Videos | #Queries | Total Duration |
|---|---|---|---|---|
| Lectures | 12 | 135 | 376 | ~64.3 hours |
| Documentaries | 5 | 12 | 114 | ~28.5 hours |
| Entertainment | 5 | 17 | 112 | ~41.9 hours |
| Total | 22 | 164 | 602 | ~134.6 hours |
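As a quick consistency check on the benchmark statistics, this sketch sums the per-category rows from the table above and derives the average number of queries per collection. The counts are copied from the table; the names are illustrative only.

```python
# (collections, videos, queries) per category, copied from the table above.
categories = {
    "Lectures": (12, 135, 376),
    "Documentaries": (5, 12, 114),
    "Entertainment": (5, 17, 112),
}

total_collections = sum(c for c, _, _ in categories.values())
total_videos = sum(v for _, v, _ in categories.values())
total_queries = sum(q for _, _, q in categories.values())

print(total_collections, total_videos, total_queries)

# Average number of queries per collection across the whole benchmark.
print(f"{total_queries / total_collections:.1f} queries per collection")
```

The summed counts match the Total row of the table: 22 collections, 164 videos, and 602 queries.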
For detailed evaluation instructions and reproduction scripts, see VideoRAG-algorithm/reproduce.
If you find Vimo or VideoRAG helpful in your research, please cite our paper:
@article{VideoRAG,
title={VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos},
author={Ren, Xubin and Xu, Lingrui and Xia, Long and Wang, Shuaiqiang and Yin, Dawei and Huang, Chao},
journal={arXiv preprint arXiv:2502.01549},
year={2025}
}
We welcome contributions from the community! Whether you're:
Feel free to submit issues and pull requests. Together, we're building the future of intelligent video interaction!
Vimo builds upon the incredible work of the open-source community:
Transform how you interact with videos. Start your journey with Vimo today!