🚀 Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
This project focuses on fine-tuning vision-language-action (VLA) models to optimize inference speed and improve task success rates. This repository contains the OpenVLA-OFT checkpoint fine-tuned for LIBERO-Spatial, as described in the paper Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success. OpenVLA-OFT substantially improves on the base OpenVLA model by applying an optimized fine-tuning recipe.
Project page: https://openvla-oft.github.io/
Code repository: https://github.com/openvla-oft/openvla-oft
Other OpenVLA-OFT checkpoints: https://huggingface.co/moojink?search_models=oft
🚀 Quick Start
This example shows how to load a pretrained OpenVLA-OFT checkpoint and generate an action chunk. Make sure you have set up the conda environment following the instructions in the GitHub README.
import pickle
from experiments.robot.libero.run_libero_eval import GenerateConfig
from experiments.robot.openvla_utils import get_action_head, get_processor, get_proprio_projector, get_vla, get_vla_action
from prismatic.vla.constants import NUM_ACTIONS_CHUNK, PROPRIO_DIM
cfg = GenerateConfig(
    pretrained_checkpoint="moojink/openvla-7b-oft-finetuned-libero-spatial",
    use_l1_regression=True,              # continuous action head trained with L1 regression
    use_diffusion=False,                 # diffusion action head not used for this checkpoint
    use_film=False,                      # FiLM language conditioning not used for this checkpoint
    num_images_in_input=2,               # two camera views (third-person + wrist)
    use_proprio=True,                    # include robot proprioceptive state in the input
    load_in_8bit=False,
    load_in_4bit=False,
    center_crop=True,
    num_open_loop_steps=NUM_ACTIONS_CHUNK,   # execute the full predicted chunk before re-querying
    unnorm_key="libero_spatial_no_noops",    # dataset statistics key for un-normalizing actions
)
# Load the model components: VLA backbone, input processor, action head, and proprio projector
vla = get_vla(cfg)
processor = get_processor(cfg)
action_head = get_action_head(cfg, llm_dim=vla.llm_dim)
proprio_projector = get_proprio_projector(cfg, llm_dim=vla.llm_dim, proprio_dim=PROPRIO_DIM)
# Load a sample LIBERO-Spatial observation (camera images, proprio state, task description)
with open("experiments/robot/libero/sample_libero_spatial_observation.pkl", "rb") as file:
    observation = pickle.load(file)
# Query the model for a chunk of actions
actions = get_vla_action(cfg, vla, processor, observation, observation["task_description"], action_head, proprio_projector)
print("Generated action chunk:")
for act in actions:
    print(act)
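In practice, the returned chunk of NUM_ACTIONS_CHUNK actions is typically executed open-loop before the model is queried again. Below is a minimal rollout sketch of that pattern; the `env` object, its `get_observation()` helper, and `MAX_STEPS` are hypothetical placeholders for your own environment wrapper, not part of this repo's API.

# Hypothetical closed-loop rollout sketch: execute each predicted chunk open-loop,
# then re-query the policy with a fresh observation. `env`, `env.get_observation()`,
# and `env.step()` are placeholders for your own environment wrapper.
MAX_STEPS = 300

observation = env.get_observation()  # assumed to return the observation dict format used above
for _ in range(MAX_STEPS // NUM_ACTIONS_CHUNK):
    actions = get_vla_action(
        cfg, vla, processor, observation, observation["task_description"],
        action_head, proprio_projector,
    )
    # Execute the whole chunk before asking the model for new actions
    for act in actions[:cfg.num_open_loop_steps]:
        env.step(act)  # assumed LIBERO-style step interface taking a low-level action
    observation = env.get_observation()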
📄 License
This project is released under the MIT License.
📚 Citation
If you use this project in your research, please cite the following paper:
@article{kim2025fine,
  title={Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success},
  author={Kim, Moo Jin and Finn, Chelsea and Liang, Percy},
  journal={arXiv preprint arXiv:2502.19645},
  year={2025}
}