SynthPose is a 2D human pose estimation model based on VitPose Base. Fine-tuned on synthetic data, it predicts 52 anatomical keypoints.
Downloads: 931
Released: 1/8/2025
Model Overview
The model uses the VitPose Base architecture and is fine-tuned on synthetic data to predict 52 anatomical markers, including the COCO keypoints, making it suitable for motion capture and biomechanical analysis.
Model Features
Dense keypoint prediction
Predicts 52 anatomical keypoints, covering the standard COCO keypoints plus additional markers for biomechanical analysis
Fine-tuning on synthetic data
The pretrained model is fine-tuned on synthetic data, improving generalization to real-world scenes
Two-stage pipeline
Follows the standard detect-then-estimate workflow (detect people first, then predict keypoints) and can be paired with a variety of person detectors
Model Capabilities
2D human pose estimation
Multi-keypoint detection
Motion capture analysis
Biomechanical marker prediction
Use Cases
Sports analysis
Motion capture systems
Provides dense keypoint predictions for motion capture setups, supporting biomechanical analysis
Accurately predicts the positions of 52 anatomical markers
Athletic training analysis
Used for athlete movement analysis and training guidance
Supports measurement of joint angles and body posture from the predicted keypoints
Medical rehabilitation
Gait analysis
Used for patient gait assessment and monitoring of rehabilitation progress
Tracks the motion trajectories of lower-limb keypoints (a frame-by-frame tracking sketch appears at the end of the Usage section)
library_name: transformers
license: apache-2.0
base_model:
- usyd-community/vitpose-base-simple
pipeline_tag: keypoint-detection
SynthPose (Transformers 🤗 VitPose Base variant)
The SynthPose model was proposed by Yoni Gozlan, Antoine Falisse, Scott Uhlrich, Anthony Gatti, Michael Black, and Akshay Chaudhari in the paper OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics.
This model was contributed by Yoni Gozlan.
Intended use cases
This model uses VitPose Base as its backbone. SynthPose is a new approach that fine-tunes a pretrained 2D human pose model on synthetic data so that it can predict an arbitrarily denser set of keypoints for accurate kinematic analysis. See OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics for more details. This particular variant was fine-tuned on a set of keypoints commonly found in motion capture setups, and it also includes the COCO keypoints.
The model predicts the following 52 markers:
{
    0: "Nose",
    1: "Left eye",
    2: "Right eye",
    3: "Left ear",
    4: "Right ear",
    5: "Left shoulder",
    6: "Right shoulder",
    7: "Left elbow",
    8: "Right elbow",
    9: "Left wrist",
    10: "Right wrist",
    11: "Left hip",
    12: "Right hip",
    13: "Left knee",
    14: "Right knee",
    15: "Left ankle",
    16: "Right ankle",
    17: "Sternum",
    18: "Right shoulder marker",
    19: "Left shoulder marker",
    20: "Right lateral elbow",
    21: "Left lateral elbow",
    22: "Right medial elbow",
    23: "Left medial elbow",
    24: "Right lateral wrist",
    25: "Left lateral wrist",
    26: "Right medial wrist",
    27: "Left medial wrist",
    28: "Right ASIS (anterior superior iliac spine)",
    29: "Left ASIS (anterior superior iliac spine)",
    30: "Right PSIS (posterior superior iliac spine)",
    31: "Left PSIS (posterior superior iliac spine)",
    32: "Right knee marker",
    33: "Left knee marker",
    34: "Right medial knee",
    35: "Left medial knee",
    36: "Right ankle marker",
    37: "Left ankle marker",
    38: "Right medial ankle",
    39: "Left medial ankle",
    40: "Right 5th metatarsal",
    41: "Left 5th metatarsal",
    42: "Right toe",
    43: "Left toe",
    44: "Right big toe",
    45: "Left big toe",
    46: "Left calcaneus (heel)",
    47: "Right calcaneus (heel)",
    48: "C7 (7th cervical vertebra)",
    49: "L2 (2nd lumbar vertebra)",
    50: "T11 (11th thoracic vertebra)",
    51: "T6 (6th thoracic vertebra)",
}
The first 17 keypoints are the standard COCO keypoints; the remaining 35 are anatomical markers.
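If you need to handle the two groups separately, the split can be expressed directly with index ranges. The sketch below assumes a per-person keypoint array of shape (52, 2), matching the output of the Usage section further down; the placeholder array is illustrative only.

# Minimal sketch: split a (52, 2) keypoint array into COCO and anatomical subsets.
import numpy as np

COCO_IDX = list(range(0, 17))         # keypoints 0-16: standard COCO keypoints
ANATOMICAL_IDX = list(range(17, 52))  # keypoints 17-51: mocap-style anatomical markers

keypoints = np.zeros((52, 2))         # placeholder; use a real prediction in practice
coco_points = keypoints[COCO_IDX]              # shape (17, 2)
anatomical_points = keypoints[ANATOMICAL_IDX]  # shape (35, 2)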
Usage
Image inference
Here is how to load the model and run inference on an image:
import torch
import requests
import numpy as np
from PIL import Image
from transformers import (
    AutoProcessor,
    RTDetrForObjectDetection,
    VitPoseForPoseEstimation,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
url = "http://farm4.staticflickr.com/3300/3416216247_f9c6dfc939_z.jpg"
image = Image.open(requests.get(url, stream=True).raw)
# ------------------------------------------------------------------------
# Stage 1. Detect people in the image
# ------------------------------------------------------------------------
# You can use any detector of your choice here
person_image_processor = AutoProcessor.from_pretrained("PekingU/rtdetr_r50vd_coco_o365")
person_model = RTDetrForObjectDetection.from_pretrained("PekingU/rtdetr_r50vd_coco_o365", device_map=device)
inputs = person_image_processor(images=image, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = person_model(**inputs)
results = person_image_processor.post_process_object_detection(
    outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.3
)
result = results[0]  # take the results for the first image
# The "person" label corresponds to index 0 in the COCO dataset
person_boxes = result["boxes"][result["labels"] == 0]
person_boxes = person_boxes.cpu().numpy()
# Convert boxes from VOC (x1, y1, x2, y2) to COCO (x1, y1, w, h) format
person_boxes[:, 2] = person_boxes[:, 2] - person_boxes[:, 0]
person_boxes[:, 3] = person_boxes[:, 3] - person_boxes[:, 1]
# ------------------------------------------------------------------------
# Stage 2. Predict keypoints for each detected person
# ------------------------------------------------------------------------
image_processor = AutoProcessor.from_pretrained("yonigozlan/synthpose-vitpose-base-hf")
model = VitPoseForPoseEstimation.from_pretrained("yonigozlan/synthpose-vitpose-base-hf", device_map=device)
inputs = image_processor(image, boxes=[person_boxes], return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
pose_results = image_processor.post_process_pose_estimation(outputs, boxes=[person_boxes])
image_pose_result = pose_results[0]  # results for the first image
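As a sketch of the kind of kinematic measurement mentioned in the use cases, the snippet below computes a rough 2D right-knee angle for each detected person from the hip, knee, and ankle keypoints in image_pose_result. The index constants follow the marker list above; a real biomechanical analysis would use calibrated 3D data rather than raw 2D pixel coordinates.

# Minimal sketch: 2D right-knee angle (hip-knee-ankle) per detected person.
# Uses image_pose_result produced by the code above; indices follow the marker list.
R_HIP, R_KNEE, R_ANKLE = 12, 14, 16

for person in image_pose_result:
    kpts = person["keypoints"].cpu().numpy()
    hip, knee, ankle = kpts[R_HIP], kpts[R_KNEE], kpts[R_ANKLE]
    v1 = hip - knee
    v2 = ankle - knee
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    angle_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    print(f"right knee angle: {angle_deg:.1f} degrees")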
Visualization with the supervision library
import supervision as sv
xy = torch.stack([pose_result['keypoints'] for pose_result in image_pose_result]).cpu().numpy()
scores = torch.stack([pose_result['scores'] for pose_result in image_pose_result]).cpu().numpy()
key_points = sv.KeyPoints(
    xy=xy, confidence=scores
)
vertex_annotator = sv.VertexAnnotator(
    color=sv.Color.PINK,
    radius=2
)
annotated_frame = vertex_annotator.annotate(
    scene=image.copy(),
    key_points=key_points
)
annotated_frame
Advanced manual visualization
import math
import cv2
def draw_points(image, keypoints, scores, pose_keypoint_color, keypoint_score_threshold, radius, show_keypoint_weight):
    if pose_keypoint_color is not None:
        assert len(pose_keypoint_color) == len(keypoints)
    for kid, (kpt, kpt_score) in enumerate(zip(keypoints, scores)):
        x_coord, y_coord = int(kpt[0]), int(kpt[1])
        if kpt_score > keypoint_score_threshold:
            color = tuple(int(c) for c in pose_keypoint_color[kid])
            if show_keypoint_weight:
                cv2.circle(image, (int(x_coord), int(y_coord)), radius, color, -1)
                transparency = max(0, min(1, kpt_score))
                cv2.addWeighted(image, transparency, image, 1 - transparency, 0, dst=image)
            else:
                cv2.circle(image, (int(x_coord), int(y_coord)), radius, color, -1)
def draw_links(image, keypoints, scores, keypoint_edges, link_colors, keypoint_score_threshold, thickness, show_keypoint_weight, stick_width=2):
    height, width, _ = image.shape
    if keypoint_edges is not None and link_colors is not None:
        assert len(link_colors) == len(keypoint_edges)
        for sk_id, sk in enumerate(keypoint_edges):
            x1, y1, score1 = (int(keypoints[sk[0], 0]), int(keypoints[sk[0], 1]), scores[sk[0]])
            x2, y2, score2 = (int(keypoints[sk[1], 0]), int(keypoints[sk[1], 1]), scores[sk[1]])
            if (
                x1 > 0
                and x1 < width
                and y1 > 0
                and y1 < height
                and x2 > 0
                and x2 < width
                and y2 > 0
                and y2 < height
                and score1 > keypoint_score_threshold
                and score2 > keypoint_score_threshold
            ):
                color = tuple(int(c) for c in link_colors[sk_id])
                if show_keypoint_weight:
                    X = (x1, x2)
                    Y = (y1, y2)
                    mean_x = np.mean(X)
                    mean_y = np.mean(Y)
                    length = ((Y[0] - Y[1]) ** 2 + (X[0] - X[1]) ** 2) ** 0.5
                    angle = math.degrees(math.atan2(Y[0] - Y[1], X[0] - X[1]))
                    polygon = cv2.ellipse2Poly(
                        (int(mean_x), int(mean_y)), (int(length / 2), int(stick_width)), int(angle), 0, 360, 1
                    )
                    cv2.fillConvexPoly(image, polygon, color)
                    transparency = max(0, min(1, 0.5 * (keypoints[sk[0], 2] + keypoints[sk[1], 2])))
                    cv2.addWeighted(image, transparency, image, 1 - transparency, 0, dst=image)
                else:
                    cv2.line(image, (x1, y1), (x2, y2), color, thickness=thickness)
# Note: keypoint edges and the color palette are dataset-specific
keypoint_edges = model.config.edges
palette = np.array(
    [
        [255, 128, 0],
        [255, 153, 51],
        [255, 178, 102],
        [230, 230, 0],
        [255, 153, 255],
        [153, 204, 255],
        [255, 102, 255],
        [255, 51, 255],
        [102, 178, 255],
        [51, 153, 255],
        [255, 153, 153],
        [255, 102, 102],
        [255, 51, 51],
        [153, 255, 153],
        [102, 255, 102],
        [51, 255, 51],
        [0, 255, 0],
        [0, 0, 255],
        [255, 0, 0],
        [255, 255, 255],
    ]
)
link_colors = palette[[0, 0, 0, 0, 7, 7, 7, 9, 9, 9, 9, 9, 16, 16, 16, 16, 16, 16, 16]]
keypoint_colors = palette[[16, 16, 16, 16, 16, 9, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0]+[4]*(52-17)]
numpy_image = np.array(image)
for pose_result in image_pose_result:
    scores = np.array(pose_result["scores"])
    keypoints = np.array(pose_result["keypoints"])

    # Draw each keypoint on the image
    draw_points(numpy_image, keypoints, scores, keypoint_colors, keypoint_score_threshold=0.3, radius=2, show_keypoint_weight=False)

    # Draw the links between keypoints
    draw_links(numpy_image, keypoints, scores, keypoint_edges, link_colors, keypoint_score_threshold=0.3, thickness=1, show_keypoint_weight=False)
pose_image = Image.fromarray(numpy_image)
pose_image
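The gait-analysis use case mentioned earlier boils down to running the same two-stage pipeline frame by frame and collecting lower-limb keypoints over time. The sketch below reuses person_model, person_image_processor, image_processor, and model as loaded in the Usage section; the video filename and the single-person assumption are illustrative only.

# Minimal sketch: track left/right ankle keypoints (indices 15 and 16) across a video.
def estimate_keypoints(frame_image):
    # Stage 1: detect people in the frame
    det_inputs = person_image_processor(images=frame_image, return_tensors="pt").to(device)
    with torch.no_grad():
        det_outputs = person_model(**det_inputs)
    det_results = person_image_processor.post_process_object_detection(
        det_outputs, target_sizes=torch.tensor([(frame_image.height, frame_image.width)]), threshold=0.3
    )[0]
    boxes = det_results["boxes"][det_results["labels"] == 0].cpu().numpy()
    if len(boxes) == 0:
        return None
    boxes[:, 2] -= boxes[:, 0]  # VOC (x1, y1, x2, y2) -> COCO (x1, y1, w, h)
    boxes[:, 3] -= boxes[:, 1]
    # Stage 2: predict keypoints for the detected people
    pose_inputs = image_processor(frame_image, boxes=[boxes], return_tensors="pt").to(device)
    with torch.no_grad():
        pose_outputs = model(**pose_inputs)
    return image_processor.post_process_pose_estimation(pose_outputs, boxes=[boxes])[0]

LEFT_ANKLE, RIGHT_ANKLE = 15, 16
left_traj, right_traj = [], []

cap = cv2.VideoCapture("gait_clip.mp4")  # illustrative input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    result = estimate_keypoints(frame_image)
    if result is None:
        continue
    kpts = result[0]["keypoints"].cpu().numpy()  # assume the first detected person is the subject
    left_traj.append(kpts[LEFT_ANKLE])
    right_traj.append(kpts[RIGHT_ANKLE])
cap.release()

left_traj = np.asarray(left_traj)    # shape (num_frames, 2): x, y per frame
right_traj = np.asarray(right_traj)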
Alternatives

Superpoint
magic-leap-community · Other license · Pose estimation · Transformers · 59.12k · 13
SuperPoint is a self-supervised, fully convolutional network for interest point detection and description.

Vitpose Base Simple
usyd-community · Apache-2.0 · Pose estimation · Transformers · English · 51.40k · 20
ViTPose is a Vision-Transformer-based human pose estimation model that reaches 81.1 AP on the MS COCO keypoint test set, with a simple architecture, scalable model size, and flexible training.

Vitpose Plus Small
usyd-community · Apache-2.0 · Pose estimation · Transformers · 30.02k · 2
ViTPose++ is a Vision-Transformer-based human pose estimation model that achieves 81.1 AP on the MS COCO keypoint detection benchmark.

Vitpose Plus Base
usyd-community · Apache-2.0 · Pose estimation · Transformers · English · 22.26k · 10
ViTPose is a Vision-Transformer-based human pose estimation model whose simple design achieves 81.1 AP on the MS COCO keypoint detection benchmark.

Superglue Outdoor
magic-leap-community · Other license · Pose estimation · Transformers · 18.39k · 2
SuperGlue is a graph-neural-network-based feature matching model that matches interest points between images, suitable for image matching and pose estimation tasks.

Vitpose Plus Huge
usyd-community · Apache-2.0 · Pose estimation · Transformers · 14.49k · 6
ViTPose++ is a Vision-Transformer-based foundation model for human pose estimation that achieves 81.1 AP on the MS COCO keypoint test set.

Img2pose
py-feat · Pose estimation · Safetensors · 4,440 · 0
img2pose is a Faster R-CNN-based model that predicts the six-degrees-of-freedom (6DoF) pose of every face in a photo and can project 3D faces onto the 2D plane.

Vitpose Plus Large
usyd-community · Apache-2.0 · Pose estimation · Transformers · 1,731 · 1
ViTPose++ is a Vision-Transformer-based foundation model for human pose estimation that achieves 81.1 AP on the MS COCO keypoint test set.

Synthpose Vitpose Huge Hf
stanfordmimi · Apache-2.0 · Pose estimation · Transformers · 1,320 · 1
SynthPose is a keypoint detection model with a ViTPose Huge backbone, fine-tuned on synthetic data to predict 52 human keypoints for kinematic analysis.

Sapiens Pose 1b Torchscript
facebook · Pose estimation · English · 1,245 · 7
Sapiens is a vision transformer pretrained on 300 million 1024x1024 human images, designed for high-accuracy pose estimation.