---
language:
- en
license: apache-2.0
tags:
- solidity
- web3
- code generation
- smart contracts
widget:
- text: "pragma solidity ^0.5.7;\n// Context: ParentA | Functions: helloA helloB | Constants: constantA \ncontract HelloWorld is ParentA {"
---
# A code-generation T5 model for Solidity (Web3 smart contracts)

- See https://github.com/hululuzhu/solidity-t5 for more context.
## How to use this trained model

- A hello-world usage example is shown below. Note that the input text must include:
  - a Solidity version declaration, e.g. `pragma solidity ^0.5.7`
  - parent class/library context, e.g. the public functions and constants of `ParentA`
  - the contract/library/interface declaration header, e.g. `HelloWorld`, ending with `{`
```python
# !pip install transformers -q

from transformers import AutoTokenizer, T5ForConditionalGeneration

DEVICE = 'cuda'  # fall back to 'cpu' if no CUDA device is available

tokenizer = AutoTokenizer.from_pretrained("hululuzhu/solidity-t5")
model = T5ForConditionalGeneration.from_pretrained("hululuzhu/solidity-t5").to(DEVICE)

text = """pragma solidity ^0.5.7;
// Context: ParentA | Functions: helloA helloB | Constants: constantA
contract HelloWorld is ParentA {"""
input_ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.to(DEVICE)

# With num_beams > 1 and sampling disabled (the default), this is plain beam
# search and top_p/top_k have no effect; tune the decoding params as needed.
generated_ids = model.generate(input_ids, max_length=256, num_beams=5, top_p=0.95, top_k=50)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))

# Expected output (truncated):
"""
string public constant name = "Hello World";
...
uint256 public constant override returns (uint256) {
return initialSupply;
}
function initialSupply() public view returns (uint256) {
...
"""
```
## Background

- Base T5 code model: https://huggingface.co/Salesforce/codet5-large
- Source data: https://huggingface.co/datasets/mwritescode/slither-audited-smart-contracts
- Processing steps: cleaning, contract-level segmentation, then splitting each contract into an input part and an output part (see the sketch at the end of this section)
- Processed input example:
```solidity
pragma solidity 0.5.7;
// Context: PauserRole | Functions: isPauser addPauser renouncePauser | Constants:
contract Pausable is PauserRole {
```
- Processed output example (note: the flat indentation is intentional, to reduce token count):
```solidity
event Paused(address account);
event Unpaused(address account);
bool private _pausableActive;
bool private _paused;
constructor () internal {
_paused = false;
}
function paused() public view returns (bool) {
return _paused;
}
modifier whenNotPaused() {
require(!_paused);
_;
}
modifier whenPaused() {
require(_paused);
_;
}
function pause() public onlyPauser whenNotPaused whenPausableActive {
_paused = true;
emit Paused(msg.sender);
}
function unpause() public onlyPauser whenPaused whenPausableActive {
_paused = false;
emit Unpaused(msg.sender);
}
function _setPausableActive(bool _active) internal {
_pausableActive = _active;
}
modifier whenPausableActive() {
require(_pausableActive);
_;
}
}
```
- Training code: see the end-to-end notebook in the code directory of the repo.
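The exact preprocessing lives in the end-to-end notebook above. As a rough, illustrative sketch of the contract-level input/output split, assuming the input is everything up to and including the declaration header and the output is the contract body (the function name and regex here are hypothetical, not from the repo):

```python
import re

def split_contract(source: str):
    """Split one flattened contract into a (model_input, model_output) pair.

    Illustrative sketch only: the real pipeline in the repo's notebook also
    cleans the source and injects the `// Context: ...` line built from
    parent-class functions and constants.
    """
    # Locate the declaration header, e.g. "contract Pausable is PauserRole {"
    match = re.search(r"^(?:contract|library|interface)\b[^{]*\{",
                      source, re.MULTILINE)
    if match is None:
        return None
    model_input = source[:match.end()]       # pragma line(s) + header
    model_output = source[match.end():].strip()  # contract body
    return model_input, model_output
```

Applied to the `Pausable` source above, `model_input` would be the pragma plus the contract header, and `model_output` the flat body shown in the output example.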
## Future TODO

- The model is significantly undertrained due to a limited GPU budget; a full training run would need roughly $100 of Colab compute.
- The current input/output scheme is limiting. A decoder-only GPT-2-style architecture may be tried for comparison in the future, though CodeT5 retains significant code-oriented optimization advantages.
- More classifiers (T5- or BERT-style architectures) are needed to detect potential defects.