---
language:
- en
license: apache-2.0
tags:
- solidity
- web3
- code generation
- smart contracts
widget:
- text: "pragma solidity ^0.5.7;\n// Context: ParentA | Functions: helloA helloB | Constants: constantA \ncontract HelloWorld is ParentA {"
---
# A code-generation T5 model for Solidity (Web3 smart contracts)

- See https://github.com/hululuzhu/solidity-t5 for more context.
## How to use this trained model

- A hello-world usage example is shown below. Note that the input text must include:
  - a Solidity version declaration, e.g. `pragma solidity ^0.5.7`
  - parent class/library context, e.g. the public functions and constants of `ParentA`
  - the contract/library/interface declaration header, e.g. `HelloWorld`, ending with `{`
```python
# !pip install transformers -q

from transformers import AutoTokenizer, T5ForConditionalGeneration

DEVICE = 'cuda'  # fall back to 'cpu' if no CUDA device is available

tokenizer = AutoTokenizer.from_pretrained("hululuzhu/solidity-t5")
model = T5ForConditionalGeneration.from_pretrained("hululuzhu/solidity-t5").to(DEVICE)

text = """pragma solidity ^0.5.7;
// Context: ParentA | Functions: helloA helloB | Constants: constantA
contract HelloWorld is ParentA {"""
input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.to(DEVICE)

# With num_beams > 1 and sampling disabled (the default), this is plain beam
# search and top_p/top_k have no effect; tune the decoding params as needed.
generated_ids = model.generate(input_ids, max_length=256, num_beams=5, top_p=0.95, top_k=50)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

# Expected output (truncated):
"""
string public constant name = "Hello World";
...
uint256 public constant override returns (uint256) {
return initialSupply;
}
function initialSupply() public view returns (uint256) {
...
"""
```
## Background

- Base T5 code model: https://huggingface.co/Salesforce/codet5-large
- Source data: https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts
- Processing steps: cleaning, contract-level segmentation, then splitting each contract into an input part and an output part (see the sketch at the end of this section)
- Processed input example:
```solidity
pragma solidity 0.5.7;
// Context: PauserRole | Functions: isPauser addPauser renouncePauser | Constants:
contract Pausable is PauserRole {
```
- Processed output example (note: the flat indentation is intentional, to reduce token count):
```solidity
event Paused(address account);
event Unpaused(address account);
bool private _pausableActive;
bool private _paused;
constructor () internal {
_paused = false;
}
function paused() public view returns (bool) {
return _paused;
}
modifier whenNotPaused() {
require(!_paused);
_;
}
modifier whenPaused() {
require(_paused);
_;
}
function pause() public onlyPauser whenNotPaused whenPausableActive {
_paused = true;
emit Paused(msg.sender);
}
function unpause() public onlyPauser whenPaused whenPausableActive {
_paused = false;
emit Unpaused(msg.sender);
}
function _setPausableActive(bool _active) internal {
_pausableActive = _active;
}
modifier whenPausableActive() {
require(_pausableActive);
_;
}
}
```
- Training code: see the end-to-end notebook in the code directory of the repo.
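The exact preprocessing lives in the end-to-end notebook above. As a rough, illustrative sketch of the contract-level input/output split, assuming the input is everything up to and including the declaration header and the output is the contract body (the function name and regex here are hypothetical, not from the repo):

```python
import re

def split_contract(source: str):
    """Split one flattened contract into a (model_input, model_output) pair.

    Illustrative sketch only: the real pipeline in the repo's notebook also
    cleans the source and injects the `// Context: ...` line built from
    parent-class functions and constants.
    """
    # Locate the declaration header, e.g. "contract Pausable is PauserRole {"
    match = re.search(r"^(?:contract|library|interface)\b[^{]*\{",
                      source, re.MULTILINE)
    if match is None:
        return None
    model_input = source[:match.end()]       # pragma line(s) + header
    model_output = source[match.end():].strip()  # contract body
    return model_input, model_output
```

Applied to the `Pausable` source above, `model_input` would be the pragma plus the contract header, and `model_output` the flat body shown in the output example.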
## Future TODO

- The model is significantly undertrained due to a limited GPU budget; a full training run would need roughly $100 of Colab compute.
- The current input/output scheme is limiting. A decoder-only GPT-2-style architecture may be tried for comparison in the future, though CodeT5 retains significant code-oriented optimization advantages.
- More classifiers (T5- or BERT-style architectures) are needed to detect potential defects.