🚀 Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
This repository contains the OpenVLA-OFT checkpoint for LIBERO-Object described in the paper Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success. OpenVLA-OFT improves substantially over the base OpenVLA model by adopting an optimized fine-tuning recipe.
Project page: https://openvla-oft.github.io/
Code repository: https://github.com/openvla-oft/openvla-oft
Other OpenVLA-OFT checkpoints: https://huggingface.co/moojink?search_models=oft
🚀 Quick Start
This example shows how to generate an action chunk with a pretrained OpenVLA-OFT checkpoint. Make sure you have set up the conda environment as described in the GitHub README.
Basic Usage
```python
import pickle

from experiments.robot.libero.run_libero_eval import GenerateConfig
from experiments.robot.openvla_utils import get_action_head, get_processor, get_proprio_projector, get_vla, get_vla_action
from prismatic.vla.constants import NUM_ACTIONS_CHUNK, PROPRIO_DIM

# Instantiate the evaluation config used to load the policy
cfg = GenerateConfig(
    pretrained_checkpoint="moojink/openvla-7b-oft-finetuned-libero-spatial",
    use_l1_regression=True,
    use_diffusion=False,
    use_film=False,
    num_images_in_input=2,
    use_proprio=True,
    load_in_8bit=False,
    load_in_4bit=False,
    center_crop=True,
    num_open_loop_steps=NUM_ACTIONS_CHUNK,
    unnorm_key="libero_spatial_no_noops",
)

# Load the OpenVLA-OFT policy and its input processor
vla = get_vla(cfg)
processor = get_processor(cfg)

# Load the continuous action head and the proprioception projector
action_head = get_action_head(cfg, llm_dim=vla.llm_dim)
proprio_projector = get_proprio_projector(cfg, llm_dim=vla.llm_dim, proprio_dim=PROPRIO_DIM)

# Load a sample LIBERO-Spatial observation (images, proprioceptive state, and task description)
with open("experiments/robot/libero/sample_libero_spatial_observation.pkl", "rb") as file:
    observation = pickle.load(file)

# Generate a chunk of robot actions for this observation
actions = get_vla_action(cfg, vla, processor, observation, observation["task_description"], action_head, proprio_projector)
print("Generated action chunk:")
for act in actions:
    print(act)
```
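The chunk returned by `get_vla_action` is a short sequence of consecutive low-level actions that is meant to be executed open-loop before the policy is queried again (its length is `NUM_ACTIONS_CHUNK`, matching `num_open_loop_steps` in the config above). Below is a minimal, hypothetical rollout sketch that continues from the snippet above; the `env` object and its `reset`/`step` interface are illustrative assumptions, not part of this repository's example, and any observation you pass to `get_vla_action` must use the same format as the sample pickle.

```python
import numpy as np

# Hypothetical open-loop rollout sketch (continues from the snippet above).
# `env` stands in for your own simulator or robot interface; its reset()/step()
# methods are assumptions here, and the observations it returns must match the
# format of the sample pickle expected by get_vla_action.
def run_episode(env, task_description, max_steps=300):
    observation = env.reset()
    for _ in range(0, max_steps, NUM_ACTIONS_CHUNK):
        # Query the policy once, then execute the whole action chunk open-loop.
        actions = get_vla_action(
            cfg, vla, processor, observation, task_description,
            action_head, proprio_projector,
        )
        for act in actions:
            observation, reward, done, info = env.step(np.asarray(act))
            if done:
                return True
    return False
```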
📄 License
This project is released under the MIT License.
📚 Citation
```bibtex
@article{kim2025fine,
  title={Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success},
  author={Kim, Moo Jin and Finn, Chelsea and Liang, Percy},
  journal={arXiv preprint arXiv:2502.19645},
  year={2025}
}
```
| Attribute | Details |
| --- | --- |
| Pipeline tag | Robotics |
| Library name | transformers |
| License | MIT |