🚀 V-JEPA 2
V-JEPA 2 is a frontier video understanding model developed by the FAIR team at Meta. It extends the pretraining objective of VJEPA and, by scaling up data and model size, achieves state-of-the-art video understanding. The code has been released in this repository.
🚀 Quick Start
V-JEPA 2 is a powerful video understanding model that can be used for tasks such as video classification and retrieval, and it can also serve as a video encoder for vision-language models (VLMs).
✨ Key Features
- Extends the VJEPA pretraining objective to deliver state-of-the-art video understanding.
- Handles both video and image inputs.
- Supports tasks such as video classification and retrieval, and can serve as a video encoder for VLMs (a classification sketch follows this list).
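As a minimal sketch of the classification use case (an illustration, not the official probing recipe), a lightweight classifier head could be trained on frozen, pooled features from get_vision_features (shown in the usage examples below). The probe architecture, hidden size, and class count here are assumptions:
import torch
import torch.nn as nn

# Hypothetical linear probe over frozen V-JEPA 2 patch features (illustrative only).
class LinearProbe(nn.Module):
    def __init__(self, hidden_dim=1024, num_classes=400):  # assumed sizes
        super().__init__()
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, patch_features):
        # patch_features: (batch, num_patches, hidden_dim), e.g. from model.get_vision_features(...)
        pooled = patch_features.mean(dim=1)  # mean-pool over patch tokens
        return self.head(pooled)

probe = LinearProbe()
dummy_features = torch.randn(1, 8, 1024)  # placeholder features for a shape check
print(probe(dummy_features).shape)  # torch.Size([1, 400])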
📦 Installation
To run the V-JEPA 2 model, make sure the latest version of the transformers library is installed:
pip install -U git+https://github.com/huggingface/transformers
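The video example below also uses torchcodec to decode frames; if it is not already installed, it can be added from PyPI (this extra step is an assumption on top of the original instructions):
pip install torchcodec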
💻 Usage Examples
Basic Usage
Load the model and processor
from transformers import AutoVideoProcessor, AutoModel
hf_repo = "facebook/vjepa2-vitl-fpc64-256"
model = AutoModel.from_pretrained(hf_repo)
processor = AutoVideoProcessor.from_pretrained(hf_repo)
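Optionally (this step is not part of the original snippet), the loaded encoder can be moved to a GPU before computing features:
import torch

# Optional: use a CUDA device if one is available; otherwise stay on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)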
Load a video
import torch
from torchcodec.decoders import VideoDecoder
import numpy as np
video_url = "https://huggingface.co/datasets/nateraw/kinetics-mini/resolve/main/val/archery/-Qz25rXdMjE_000014_000024.mp4"
vr = VideoDecoder(video_url)
frame_idx = np.arange(0, 64)  # sample 64 frames, matching the fpc64 (64 frames per clip) checkpoint
video = vr.get_frames_at(indices=frame_idx).data
video = processor(video, return_tensors="pt").to(model.device)
with torch.no_grad():
    video_embeddings = model.get_vision_features(**video)
print(video_embeddings.shape)
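Continuing from the snippet above: get_vision_features returns patch-level features, so if a single clip-level vector is needed (for example, for retrieval), one simple option is to mean-pool over the token dimension. This is an illustrative choice, not an official recipe:
# Collapse the patch tokens into one embedding per clip (illustrative pooling choice).
clip_embedding = video_embeddings.mean(dim=1)
print(clip_embedding.shape)  # (batch_size, hidden_dim)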
Load an image
from transformers.image_utils import load_image
image = load_image("https://huggingface.co/datasets/merve/coco/resolve/main/val2017/000000000285.jpg")
pixel_values = processor(image, return_tensors="pt").to(model.device)["pixel_values_videos"]
pixel_values = pixel_values.repeat(1, 16, 1, 1, 1)  # repeat the single frame to form a 16-frame clip
with torch.no_grad():
    image_embeddings = model.get_vision_features(pixel_values)
print(image_embeddings.shape)
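Continuing from the two snippets above, a retrieval-style comparison between the video and image features could be sketched as follows (illustrative only, not an official retrieval pipeline):
import torch.nn.functional as F

# Mean-pool patch tokens for both inputs, then compare them with cosine similarity (illustrative).
video_vec = video_embeddings.mean(dim=1)
image_vec = image_embeddings.mean(dim=1)
similarity = F.cosine_similarity(video_vec, image_vec, dim=-1)
print(similarity)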
For more code examples, please refer to the V-JEPA 2 documentation.
📄 License
This project is released under the MIT License.
📚 Citation
@techreport{assran2025vjepa2,
title={V-JEPA~2: Self-Supervised Video Models Enable Understanding, Prediction and Planning},
author={Assran, Mahmoud and Bardes, Adrien and Fan, David and Garrido, Quentin and Howes, Russell and
Komeili, Mojtaba and Muckley, Matthew and Rizvi, Ammar and Roberts, Claire and Sinha, Koustuv and Zholus, Artem and
Arnaud, Sergio and Gejji, Abha and Martin, Ada and Robert Hogan, Francois and Dugas, Daniel and
Bojanowski, Piotr and Khalidov, Vasil and Labatut, Patrick and Massa, Francisco and Szafraniec, Marc and
Krishnakumar, Kapil and Li, Yong and Ma, Xiaodong and Chandar, Sarath and Meier, Franziska and LeCun, Yann and
Rabbat, Michael and Ballas, Nicolas},
institution={FAIR at Meta},
year={2025}
}