thumbnail: https://user-images.githubusercontent.com/54370274/243292723-fa703668-a931-41e1-8bcf-19c72203980b.png
tags:
🐣 Please follow me for new updates https://twitter.com/camenduru
🔥 Please join our Discord server https://discord.gg/k5BwmmvJJU

Potat 1️⃣
The first open-source 1024x576 text-to-video model 🥳
https://huggingface.co/vdo/potat1-5000/tree/main
https://huggingface.co/vdo/potat1-10000/tree/main
https://huggingface.co/vdo/potat1-10000-base-text-encoder/tree/main
https://huggingface.co/vdo/potat1-15000/tree/main
https://huggingface.co/vdo/potat1-20000/tree/main
https://huggingface.co/vdo/potat1-25000/tree/main
https://huggingface.co/vdo/potat1-30000/tree/main
https://huggingface.co/vdo/potat1-35000/tree/main
https://huggingface.co/vdo/potat1-40000/tree/main
https://huggingface.co/vdo/potat1-45000/tree/main
https://huggingface.co/vdo/potat1-50000/tree/main
https://huggingface.co/vdo/potat1-50000-base-text-encoder/tree/main = https://huggingface.co/camenduru/potat1 (you are here)
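The checkpoint links above appear to follow a regular naming pattern: one repository every 5000 steps, plus a "base-text-encoder" variant at 10000 and 50000 steps. As a minimal sketch, assuming that pattern (it is inferred from the list, not documented by the author), the same URLs can be generated in Python. The helper names `checkpoint_repo_id`, `checkpoint_url`, and `all_checkpoint_urls` are hypothetical and exist only for illustration.

```python
# Sketch only: the naming pattern is inferred from the link list above,
# not taken from any official API or documentation.
HF_BASE = "https://huggingface.co"
ORG = "vdo"

# Steps at which a checkpoint is listed (5000, 10000, ..., 50000).
STEPS = range(5000, 50001, 5000)
# Steps that also have a "-base-text-encoder" variant in the list.
BASE_TEXT_ENCODER_STEPS = (10000, 50000)


def checkpoint_repo_id(step: int, base_text_encoder: bool = False) -> str:
    """Build a repo id such as 'vdo/potat1-10000' (hypothetical helper)."""
    suffix = "-base-text-encoder" if base_text_encoder else ""
    return f"{ORG}/potat1-{step}{suffix}"


def checkpoint_url(step: int, base_text_encoder: bool = False) -> str:
    """Build the 'tree/main' URL for a checkpoint repo (hypothetical helper)."""
    return f"{HF_BASE}/{checkpoint_repo_id(step, base_text_encoder)}/tree/main"


def all_checkpoint_urls() -> list:
    """Return the URLs in the same order as the list above."""
    urls = []
    for step in STEPS:
        urls.append(checkpoint_url(step))
        if step in BASE_TEXT_ENCODER_STEPS:
            urls.append(checkpoint_url(step, base_text_encoder=True))
    return urls


if __name__ == "__main__":
    for url in all_checkpoint_urls():
        print(url)
```

Keeping the step list and the variant steps as separate constants makes it easy to adjust the sketch if further checkpoints are published.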
Info
Prototype model
Trained with: https://lambdalabs.com ❤ 1xA100 (40GB)
Training data: 2197 video clips, 68388 annotated frames (using salesforce/blip2-opt-6.7b-coco)
Training steps: 10000
Dataset & Config
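For a rough sense of scale, the clip and frame counts listed above can be turned into an average number of annotated frames per clip. This is only back-of-the-envelope arithmetic on the two published figures. The variable names are illustrative, and the real per-clip distribution is not stated here.

```python
# Figures taken from the training info above.
NUM_CLIPS = 2197
NUM_ANNOTATED_FRAMES = 68388

# Mean annotated frames per clip (the actual distribution is unknown).
avg_frames_per_clip = NUM_ANNOTATED_FRAMES / NUM_CLIPS

print(f"Average annotated frames per clip: {avg_frames_per_clip:.1f}")
```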
https://huggingface.co/camenduru/potat1_dataset/tree/main
Fine-tuning tools
https://github.com/Breakthrough/PySceneDetect
https://github.com/ExponentialML/Video-BLIP2-Preprocessor
https://github.com/ExponentialML/Text-To-Video-Finetuning
https://github.com/camenduru/Text-To-Video-Finetuning-colab
Base model
https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis
https://www.modelscope.cn/models/damo/text-to-video-synthesis
Special thanks to damo-vilab ❤ ExponentialML ❤ kabachuha ❤ @DiffusersLib ❤ @LambdaAPI ❤ @cerspense ❤ @CiaraRowles1 ❤ @p1atdev_art ❤
Thanks to Orellius ❤ (important bug report)
Please try it 🐣
https://github.com/camenduru/text-to-video-synthesis-colab
Potat 2️⃣ is in the works ♨