license: mit
library_name: peft
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- chenjoya/videollm-online-chat-ego4d-134k
language:
- en
tags:
- llama
- llama-3
- multimodal
- llm
- video stream
- online video understanding
- video understanding
pipeline_tag: video-text-to-text
Model Card
https://showlab.github.io/videollm-online/
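Model Details
- LLM: meta-llama/Meta-Llama-3-8B-Instruct
- Vision strategy:
  - Frame encoder: google/siglip-large-patch16-384
  - Frame tokens: CLS token + 3x3 average-pooled tokens
  - Frame rate: 2 FPS for training, 2 to 10 FPS for inference
  - Frame resolution: maximum 384, zero-padded to preserve the aspect ratio
  - Video length: 10 minutes
- Training data: Ego4D Narration Stream 113K + Ego4D GoalStep Stream 21K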
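The frame settings above imply a fixed visual token budget and a simple resize-then-pad geometry. The following is a minimal sketch of that arithmetic, not the repository's preprocessing code. It assumes that the longer side of each frame is scaled to 384, that zero padding is split evenly to reach a 384x384 square, and that every frame contributes the same number of tokens. The function names `resize_and_pad` and `visual_tokens` are hypothetical.

```python
from typing import Tuple

MAX_SIDE = 384                 # from the card: maximum frame resolution
TOKENS_PER_FRAME = 1 + 3 * 3   # from the card: CLS token + 3x3 pooled tokens


def resize_and_pad(
    width: int, height: int, max_side: int = MAX_SIDE
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Return ((new_w, new_h), (left, top, right, bottom)) for one frame.

    Assumption (not confirmed by the card): the longer side is scaled to
    max_side and the zero padding is centered to form a square.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    new_w = max(1, round(width * max_side / longest))
    new_h = max(1, round(height * max_side / longest))
    pad_w, pad_h = max_side - new_w, max_side - new_h
    left, top = pad_w // 2, pad_h // 2
    return (new_w, new_h), (left, top, pad_w - left, pad_h - top)


def visual_tokens(fps: float, seconds: float,
                  per_frame: int = TOKENS_PER_FRAME) -> int:
    """Visual tokens for a clip, ignoring text and special tokens."""
    return int(fps * seconds) * per_frame


if __name__ == "__main__":
    print(resize_and_pad(1920, 1080))   # a 16:9 frame
    print(visual_tokens(2, 10 * 60))    # 10 minutes at the training rate
    print(visual_tokens(10, 10 * 60))   # 10 minutes at the top inference rate
```

Under these assumptions, a 10-minute video at 2 FPS yields 1,200 frames and 12,000 visual tokens, which shows why the per-frame token count is kept small.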
Model Sources
- Repository: https://github.com/showlab/videollm-online
- Paper: https://arxiv.org/abs/2406.11816
Usage
- First, clone the GitHub repository and follow its installation instructions:
git clone https://github.com/showlab/videollm-online
Make sure Miniconda is installed and the Python version is 3.10 or later, then run:
conda install -y pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers accelerate deepspeed peft editdistance Levenshtein tensorboard gradio moviepy submitit
pip install flash-attn --no-build-isolation
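After these commands finish, a quick check can confirm that the interpreter and packages are in place. The sketch below is an illustration only and is not part of the repository. The helper names `python_ok` and `missing_modules` are hypothetical, and the list of import names is an assumption derived from the pip and conda package names above (for example, `flash-attn` is imported as `flash_attn`).

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple

# Import names (not pip names) assumed for the packages installed above.
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio",
    "transformers", "accelerate", "deepspeed", "peft",
    "editdistance", "Levenshtein", "tensorboard",
    "gradio", "moviepy", "submitit", "flash_attn",
]


def python_ok(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be located.

    find_spec only looks the module up; it does not import it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python >= 3.10:", python_ok())
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty list means every module was found. It does not guarantee that GPU-dependent packages such as `flash_attn` will load correctly at run time.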
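The PyTorch install source also pulls in ffmpeg, but it is an old version that usually gives low-quality preprocessing. Install a recent ffmpeg build as follows: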
wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz
tar xvf ffmpeg-release-amd64-static.tar.xz
rm ffmpeg-release-amd64-static.tar.xz
mv ffmpeg-*-amd64-static ffmpeg
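To see which ffmpeg build will actually be used, the version banner can be inspected. The sketch below is a hedged illustration: `parse_ffmpeg_version` and `ffmpeg_version` are hypothetical helpers, and the path `./ffmpeg/ffmpeg` assumes the static archive places the binary at the top level of the renamed `ffmpeg` directory.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_version(banner: str) -> Optional[str]:
    """Extract the version string from the first line of `ffmpeg -version`."""
    match = re.match(r"ffmpeg version (\S+)", banner)
    return match.group(1) if match else None


def ffmpeg_version(binary: str = "ffmpeg") -> Optional[str]:
    """Return the version of the given ffmpeg binary, or None if not found."""
    path = shutil.which(binary)  # accepts a bare name or an explicit path
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_version(result.stdout)


if __name__ == "__main__":
    # Assumed location of the static build after the commands above.
    print("local build:", ffmpeg_version("./ffmpeg/ffmpeg"))
    # Whatever `ffmpeg` resolves to on PATH (possibly the older conda copy).
    print("on PATH:", ffmpeg_version("ffmpeg"))
```

If the two versions differ, the older copy on `PATH` may still be the one picked up by tools that call `ffmpeg` by name.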
To test audio in real-time streaming, also clone ChatTTS:
pip install omegaconf vocos vector_quantize_pytorch cython
git clone https://github.com/2noise/ChatTTS
mv ChatTTS demo/rendering/
- Launch the Gradio demo locally:
python -m demo.app --resume_from_checkpoint chenjoya/videollm-online-8b-v1plus
- Or launch the command-line interface locally:
python -m demo.cli --resume_from_checkpoint chenjoya/videollm-online-8b-v1plus
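For scripted launches, the two commands above can be assembled programmatically. This is a small sketch that uses only the module names and the `--resume_from_checkpoint` flag shown above. The helper `demo_command` and the mode names `"gradio"` and `"cli"` are hypothetical, and the command is assumed to be run from the root of the cloned repository.

```python
import sys
from typing import List

CHECKPOINT = "chenjoya/videollm-online-8b-v1plus"
DEMO_MODULES = {"gradio": "demo.app", "cli": "demo.cli"}


def demo_command(mode: str, checkpoint: str = CHECKPOINT) -> List[str]:
    """Build the argument list for one of the two demo entry points."""
    if mode not in DEMO_MODULES:
        raise ValueError(f"mode must be one of {sorted(DEMO_MODULES)}")
    return [
        sys.executable, "-m", DEMO_MODULES[mode],
        "--resume_from_checkpoint", checkpoint,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(..., check=True) from the repository root to launch.
    print(" ".join(demo_command("cli")))
```

Using `sys.executable` keeps the launch inside the same conda environment as the calling script.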
Citation
@inproceedings{videollm-online,
author = {Joya Chen and Zhaoyang Lv and Shiwei Wu and Kevin Qinghong Lin and Chenan Song and Difei Gao and Jia-Wei Liu and Ziteng Gao and Dongxing Mao and Mike Zheng Shou},
title = {VideoLLM-online: Online Video Large Language Model for Streaming Video},
booktitle = {CVPR},
year = {2024},
}