An image captioning model fine-tuned from microsoft/Florence-2-base, focused on improving caption quality and formatting
Downloads: 23
Released: 2/4/2025
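Model overview
This image-to-text model is fine-tuned from the Florence-2-base architecture and is optimized for the quality and formatting of generated image captions. It was trained with the <CAPTION> task prompt and is intended for producing detailed, accurate image captions.
Model features
High-quality image captions
Generates detailed, accurate image captions that improve on the base model
Format optimization
Specifically optimized for the format and structure of captions
Task prompt support
Supports the <CAPTION> task prompt; can be extended to other prompt types
Model capabilities
Image caption generation
Detailed scene analysis
Object recognition and description
Use cases
Content generation
Automatic image annotation
Generates detailed descriptive text for images
Produces more accurate and more detailed captions than the base model
Accessibility
Image content descriptions for visually impaired users
Provides more complete scene understanding
Media processing
Media content analysis
Automatically analyzes image content and generates captions
Can be used for content classification and retrieval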
Datasets:
- PJMixers-Images/Castollux-Dataset
Language:
- English
Base model:
- microsoft/Florence-2-base
Task type: image-to-text
Library: transformers
New version: PJMixers-Images/Florence-2-base-Castollux-v0.5
Florence-2-base-Castollux-v0.4
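A model fine-tuned from microsoft/Florence-2-base, intended to improve the quality and formatting of image captions.
It uses the <CAPTION> task prompt. Training with <DETAILED_CAPTION> or <MORE_DETAILED_CAPTION> instead of <CAPTION> did not appear to make a significant difference in quality.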
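A minimal inference sketch with the <CAPTION> prompt, following the usage pattern published for microsoft/Florence-2-base. The repo id below is an assumption inferred from the model name and the namespace of the v0.5 link; `caption_image`, `max_new_tokens=512`, and `num_beams=3` are illustrative choices, not settings taken from this model card.

```python
TASK_PROMPT = "<CAPTION>"  # the task prompt this model was trained with

# Assumption: repo id inferred from the model name and the namespace of the
# v0.5 link; adjust it if the checkpoint lives elsewhere.
REPO_ID = "PJMixers-Images/Florence-2-base-Castollux-v0.4"


def caption_image(image_path, repo_id=REPO_ID, max_new_tokens=512):
    """Return a caption for one image (hypothetical helper)."""
    # Imported lazily so this module loads without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Florence-2 checkpoints have typically shipped custom modeling code,
    # which is why trust_remote_code is enabled here.
    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    model = model.to(device).eval()
    processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=TASK_PROMPT, images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=max_new_tokens,  # illustrative value
            num_beams=3,                    # illustrative value
        )

    # Drop special tokens and return the plain caption text.
    text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return text.strip()
```

Calling `caption_image("photo.jpg")` (a hypothetical local file) would download the checkpoint on first use and return a single caption string.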
In v0.1 and v0.2, only captions longer than 1000 Florence-2 tokens were filtered out. From v0.3 onward, any caption longer than 512 T5 tokens is filtered out.
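The length filter can be sketched as follows. This is not the author's filtering code: `filter_by_token_count` and the whitespace counter are hypothetical stand-ins, and the model card does not say which T5 tokenizer checkpoint was used, so the token counter is left pluggable.

```python
from typing import Callable, Iterable, List, Tuple

MAX_CAPTION_TOKENS = 512  # limit stated for v0.3 and later


def whitespace_token_count(text: str) -> int:
    """Stand-in counter; swap in a real tokenizer's length for actual use."""
    return len(text.split())


def filter_by_token_count(
    pairs: Iterable[Tuple[str, str]],
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = MAX_CAPTION_TOKENS,
) -> List[Tuple[str, str]]:
    """Keep (image_path, caption) pairs whose caption has at most max_tokens tokens."""
    return [(path, caption) for path, caption in pairs if count_tokens(caption) <= max_tokens]
```

To count T5 tokens instead, `count_tokens` could be something like `lambda text: len(tokenizer(text).input_ids)`, where `tokenizer` is loaded with `transformers.AutoTokenizer` from whichever T5 checkpoint is appropriate.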
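Model evaluation
The images below were not cherry-picked; they are the first batch of images from the randomly selected evaluation split.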
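Source image | Ground-truth caption | Original model prediction | Fine-tuned model prediction |
---|---|---|---|
![]() | A two-tone wall is visible on the left, orange on top and blue on the bottom, with a thin horizontal line separating the two colors. A small square window with vertical metal bars is set into the orange wall. A light orange pipe runs vertically along the orange wall, near the color boundary. To the right of this wall is a textured light gray wall. A gray pipe runs vertically along this wall. A red octagonal stop sign with white "STOP" lettering is mounted on a gray pole in front of the gray wall. Further right, there is a blue shuttered window on the gray wall. A decorative dark blue railing is mounted on the wall below the window. | A stop sign stands next to a building. | On the left side of the image is a red octagonal stop sign with the white text "STOP", attached to a light beige stucco wall. To the right of the stop sign is a closed blue door with peeling paint and metal bars, and to its right is a black wrought-iron balcony with decorative elements. The wall is textured, with visible cracks and imperfections. A white pipe runs vertically between the orange and blue wall sections. A small rectangular window with dark metal bars is visible within the orange section. |
![]() | A brown and white dog with floppy ears and an open mouth stands in a dense patch of green plants with small white flowers. The dog is in the lower center of the image, facing slightly left. The green plants cover the foreground and middle ground, forming a thick green carpet. Behind the dog and the lower plants, a steeper slope covered in similar green plants occupies the upper part of the image. A narrow, dark dirt path or stream crosses the middle ground, separating the lower plants from the higher slope. A fallen log lies along the right edge of this path. The leaves on the higher slope appear slightly smaller and denser than those in the foreground. Some thin bare branches are visible on the right side of the higher slope. | A dog sitting in the middle of a lush green field. | A brown and white dog with floppy ears is in the center of the image, facing slightly left, with its tongue out. The dog is surrounded by green plants with small white flowers. Behind and to the right of the dog is a green wall covered in dense green foliage. The ground in the foreground appears to be a dirt path. The lighting is soft and natural. |
![]() | A very tall waterfall cascades down the center of a dark gray rocky cliff into a dark body of water at the bottom. The cliff is textured and partly covered in snow. There are sparse trees on top of the cliff and on the cliff face. The dark water reflects the cliff and the waterfall. The sky is gray. | A large waterfall in the middle of a body of water. | A tall waterfall cascades down the center of a rocky cliff face; the water is dark blue and reflects the sky above. The cliff face consists of gray and brown rock with visible cracks and crevices. On top of the cliff is a dense forest of dark green trees. The sky is overcast and gray. The water in the foreground is dark blue. |
![]() | A brightly colored fish-shaped ride, with a white head, blue eyes, an open orange mouth, and blue, orange, yellow, and white scales, is in the center of the image, mounted on a black base with black handles. Behind the fish ride is a cream-colored decorative architectural element with an arched opening. To the left of the fish ride are several metal pots and pans, including a large tipped-over blue enamel pot with its red handle facing the viewer, with other smaller pots stacked behind it. To the right of the fish ride are several black buckets stacked together. The fish ride and the buckets sit on a red platform. In the background is a weathered blue wooden wall with a white-framed window, and to the left of the wall is an old wooden door with decorative carvings. A white label with visible text is attached to the cream-colored architectural element. | A large fish statue sitting on top of a table. | In the center of the image is a large metal fish sculpture with orange, white, and blue scales, facing slightly right. The fish has black eyes and a yellow beak. It is mounted on a black metal stand. To the left of the fish is a red bucket with a blue rim and a red handle. Further left is a white bucket. In the foreground is a stack of black plastic buckets. Behind the fish and the buckets are several other metal buckets and containers. The background shows a building with green wooden siding and a multi-pane white window. A wooden door with a metal railing above it is visible on the left. A small white sign with black text is attached to the side of the building. |
![]() | A young girl with brown hair and brown eyes smiles and looks directly at the viewer, standing with her hands on her hips. She wears a light blue headband with a flower on the right side of her head. She is wearing a light blue sleeveless top and a long multicolored skirt in pale yellow, pink, purple, and blue, which appears to be made of tulle or a similar sheer fabric. She stands barefoot against a plain white background. | A little girl wearing a dress with a flower in her hair. | A young girl with fair skin and brown eyes smiles and looks directly at the viewer. She wears a light blue headband with a large blue flower in the center, and a white sleeveless dress with thin straps and a rainbow-colored tulle skirt. Her hands are on her hips and she stands barefoot. The background is a plain white wall. |
![]() | A woman with curly brown hair and fair skin sits in a yellow armchair. She is wearing a red sequined dress with thin straps and black high heels. Her legs are crossed at the knee and her right hand rests on the arm of the chair. She looks directly at the viewer with brown eyes and wears makeup, including eyeliner and lipstick. To the left of the armchair is a small round white table holding an orange vase with a bouquet of pink, purple, and yellow flowers. Behind the woman and the table are white vertical blinds and blue curtains. The wall behind the blinds and curtains appears to be light gray or blue. The floor is partly visible and light in color. The bottom of a green bed or sofa is visible in the foreground. The lighting resembles a studio setup, focused on the woman and her surroundings. | A woman in a red dress sitting on a yellow chair. | A woman with long curly brown hair sits in a yellow armchair. She is wearing a red sleeveless dress with thin straps. Her legs are crossed and she wears black high heels. The armchair has a curved back and arms. To the left of the armchair is a small round white table. On the table is a clear glass vase filled with pink and white flowers. Behind the vase is a window with white vertical blinds, with a dark blue curtain on the right. The background is a light blue wall. |
![]() | Below a clear blue sky with a few wisps of white cloud in the upper left is clear, turquoise seawater with visible ripples and wave patterns. A small wave with white foam breaks on the white sandy beach in the foreground. The water is very transparent and the sandy bottom is visible. The horizon is straight, separating sea and sky. | A beach with clear blue water and white sand. | Most of the frame is a turquoise body of water, with gentle waves lapping at the sandy beach in the foreground. The water is a vivid green and the waves are white and foamy. The beach is light beige. The horizon between the water and the beach is visible in the distance. Above is a clear blue sky dotted with a few wisps of white cloud. |
![]() | A landscape view shows a waterfall cascading down a rocky cliff lit by bright sunlight, with a rainbow visible at the base of the waterfall and two small figures standing near the base. The cliff face is mainly brown and gray rock, partly covered in green vegetation. Behind the sunlit cliff face are darker, shadowed mountains with visible rock striations and some sparse vegetation. In the distant background on the left is the distinctive shape of Half Dome, beneath a cloudy sky. The foreground is a dense dark green forest that contrasts with the bright waterfall and cliff face. The sky above is cloudy, mostly dark gray clouds with some brighter areas showing through. | A large waterfall in the middle of a mountain range. | A waterfall cascades down a rocky cliff face on the right side of the image; the water is yellow and white. The cliff face consists of gray and brown rock, with green vegetation on the top and sides. The waterfall is in the center of the cliff face, surrounded by dense dark green forest. Large mountains covered in dark green conifers are visible in the background. The sky above is cloudy, with dark gray and white clouds. The overall scene is dramatic, focused on the waterfall and the forest below. |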
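Training settings
Trained with Florence-2ner on roughly 17K images, using the following configuration:
{
"model_name": "microsoft/Florence-2-base",
"task_prompt": "<CAPTION>",
"dataset_path": "./0000_Datasets/Gemini-512lim",
"wandb_project_name": "Florence-2-base",
"run_name": "Florence-2-base-Castollux-v0.4-run7",
"epochs": 2,
"optimizer": "CAME",
"learning_rate": 5e-6,
"lr_scheduler": "REX",
"gradient_checkpointing": true,
"freeze_vision": false,
"freeze_language": false,
"freeze_other": false,
"train_batch_size": 8,
"eval_batch_size": 8,
"gradient_accumulation_steps": 8,
"clip_grad_norm": 1,
"weight_decay": 1e-2,
"save_total_limit": 3,
"save_steps": 10,
"eval_steps": 10,
"warmup_steps": 10,
"eval_split_ratio": 0.05,
"seed": 42,
"filtering_processes": 128,
"attn_implementation": "sdpa"
}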
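The configuration above implies an effective batch size and an approximate number of optimizer steps, which can be worked out with a little arithmetic. The sketch below is a rough estimate only: it treats "roughly 17K" as exactly 17,000 images, assumes the evaluation split is taken from that total, and assumes a final partial batch still counts as a step. `rex_factor` gives the REX learning-rate multiplier in its commonly quoted form and ignores the 10 warmup steps; how Florence-2ner combines warmup with REX is not stated here.

```python
import math

NUM_IMAGES = 17_000      # "roughly 17K images" (approximation)
EVAL_SPLIT = 0.05        # evaluation split ratio
TRAIN_BATCH_SIZE = 8     # per-step batch size
GRAD_ACCUM_STEPS = 8     # gradient accumulation steps
EPOCHS = 2


def training_budget(num_images=NUM_IMAGES, eval_split=EVAL_SPLIT,
                    batch_size=TRAIN_BATCH_SIZE, grad_accum=GRAD_ACCUM_STEPS,
                    epochs=EPOCHS):
    """Estimate split sizes, effective batch size, and optimizer steps."""
    eval_images = round(num_images * eval_split)
    train_images = num_images - eval_images
    effective_batch_size = batch_size * grad_accum
    # Assumption: the last, partially filled batch still triggers a step.
    steps_per_epoch = math.ceil(train_images / effective_batch_size)
    return {
        "eval_images": eval_images,
        "train_images": train_images,
        "effective_batch_size": effective_batch_size,
        "steps_per_epoch": steps_per_epoch,
        "total_optimizer_steps": steps_per_epoch * epochs,
    }


def rex_factor(step, total_steps):
    """REX multiplier on the base learning rate: (1 - z) / (0.5 + 0.5 * (1 - z))."""
    progress = step / total_steps
    return (1 - progress) / (0.5 + 0.5 * (1 - progress))


if __name__ == "__main__":
    budget = training_budget()
    print(budget)
    total = budget["total_optimizer_steps"]
    # Learning rate halfway through training, starting from 5e-6.
    print(5e-6 * rex_factor(total // 2, total))
```

Under these assumptions each optimizer step sees 64 images, so with saving and evaluation every 10 steps a checkpoint is written about every 640 training images.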