Solidity LLM (Open-Sourced)
Solidity LLM is a specialized Large Language Model (LLM) developed by ChainGPT, fine-tuned to generate, understand, and analyze Solidity smart contracts. Designed explicitly for the decentralized development ecosystem, Solidity LLM delivers strong results, outperforming much larger models in syntax accuracy (~83% compilation success) and gas optimization (~72% efficiency), while remaining competitive in adherence to established standards (~65% OpenZeppelin compliance). By using Solidity LLM, developers achieve faster development cycles, reduced debugging time, and substantial cost savings.

Model Information
Developer: ChainGPT
License: MIT License
Base Model: Salesforce/codegen-2B-multi
Key Technical Details
Model Type: Causal Language Model (Code Generation)
Tokenizer: GPT2Tokenizer
Parameters: 2 Billion
Transformer Layers: 32
Context Length: 2048 tokens
Data Type: bfloat16
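A quick back-of-the-envelope check, using only the figures listed above (~2 billion parameters stored in bfloat16, i.e. 2 bytes each), shows why the weights alone fit comfortably on a single modern GPU. This is a rough sketch; activations, the KV cache, and framework overhead add to the total in practice.

```python
# Rough weight-memory estimate from the parameter count and dtype above.
# Activations, KV cache, and framework overhead are NOT included.

def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB (bfloat16 = 2 bytes/param)."""
    return num_params * bytes_per_param / (1024 ** 3)

if __name__ == "__main__":
    params = 2_000_000_000  # ~2B parameters
    print(f"~{weight_memory_gb(params):.1f} GiB for bfloat16 weights")
```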
Demo & Deployment
Performance Benchmark
Solidity LLM was benchmarked against leading LLMs (GPT-4.5 Preview, GPT-4o mini, Qwen 2.5-Coder-7B, DeepSeek-Coder-7B). Key metrics included:

| Metric | Solidity LLM | GPT-4.5 | GPT-4o mini | Qwen | DeepSeek |
| --- | --- | --- | --- | --- | --- |
| Compilation Success Rate | 83% | 50% | 30% | 20% | 15% |
| OpenZeppelin Compliance | 65% | 75% | 70% | 50% | 40% |
| Gas Efficiency | 72% | 68% | 70% | 60% | 55% |
| Security Posture | 58% | 70% | 65% | 55% | 50% |
| Line-of-Code Efficiency | 70% | 68% | 69% | 60% | 58% |
These results show Solidity LLM leading on compilation success, gas efficiency, and line-of-code efficiency, while remaining competitive on OpenZeppelin compliance and security posture despite being far smaller than the frontier models it was compared against.
Use Cases
Direct Use
Smart contract development assistance
Solidity educational resources
Documentation and template creation
Downstream Applications
Integrated Development Environments (IDEs)
Autonomous blockchain agents
Out-of-Scope Uses
General-purpose coding (other languages)
Legal auditing or formal verification without human oversight
Production deployment without manual review
Risks, Biases, and Limitations
Possible biases from training datasets
Occasional hallucinations or logically incorrect outputs
Caution required in financial or high-stakes scenarios
Recommendation: Always conduct manual code reviews and thorough testing before deploying generated code.
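As a first line of defense before any human review, a lightweight structural check can catch obviously malformed output early. The helper below is a hypothetical sketch (not part of the model or ChainGPT tooling); it only flags gross problems and is no substitute for compiling with solc, testing, and a security audit.

```python
# Hypothetical pre-review sanity check for generated Solidity source.
# Catches only gross structural problems; NOT a substitute for
# compilation, testing, or a human security review.

def basic_solidity_sanity_check(source: str) -> list[str]:
    issues = []
    if "pragma solidity" not in source:
        issues.append("missing 'pragma solidity' version directive")
    if not any(kw in source for kw in ("contract ", "library ", "interface ")):
        issues.append("no contract/library/interface declaration found")
    if source.count("{") != source.count("}"):
        issues.append("unbalanced braces")
    return issues

snippet = """
pragma solidity ^0.8.0;
contract Token {
    mapping(address => uint256) public balances;
}
"""
print(basic_solidity_sanity_check(snippet))  # empty list: no obvious issues
```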
Getting Started
Requirements
```bash
pip install transformers==4.51.3 torch==2.7.0 accelerate==1.6.0
```
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Chain-GPT/Solidity-LLM"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).to("cuda")

prompt = "Write a Solidity function to transfer tokens."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate Solidity code
outputs = model.generate(**inputs, max_new_tokens=1400, pad_token_id=tokenizer.eos_token_id)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
Streaming Mode (Direct Code Generation)
```python
import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model = AutoModelForCausalLM.from_pretrained(
    "Chain-GPT/Solidity-LLM",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained("Chain-GPT/Solidity-LLM")

prompt = "Develop a Solidity Contract for a lottery requiring 1 ETH for registration with a 10 ETH reward."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Run generation in a background thread; tokens stream in as they are produced
Thread(target=model.generate, kwargs={
    "input_ids": inputs["input_ids"],
    "max_new_tokens": 1800,
    "temperature": 0.7,
    "do_sample": True,
    "streamer": streamer,
}).start()

for chunk in streamer:
    print(chunk, end="", flush=True)
```
Training Details
Compute Resources: cluster of 4 × 80 GB GPUs
Training Duration: ~1,095 hours (~1.5 months)
Pre-training: 1 billion tokens of raw Solidity data
Fine-tuning dataset:
- Solidity version ≥ 0.5
- 200–4,000 tokens per contract
- 650,000 curated, deduplicated instructions
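The length filtering and deduplication described above can be sketched roughly as follows. This is an illustrative reconstruction, not ChainGPT's actual pipeline: it uses a crude whitespace token count in place of the real tokenizer and an exact content hash for deduplication.

```python
# Illustrative sketch of length filtering + exact deduplication.
# The real ChainGPT pipeline and its tokenizer are not published here;
# len(src.split()) is a crude stand-in for a real token count.
import hashlib

def curate(contracts, min_tokens=200, max_tokens=4000):
    seen = set()
    kept = []
    for src in contracts:
        n_tokens = len(src.split())
        if not (min_tokens <= n_tokens <= max_tokens):
            continue  # outside the 200-4,000 token window
        digest = hashlib.sha256(src.encode()).hexdigest()
        if digest in seen:
            continue  # exact duplicate already kept
        seen.add(digest)
        kept.append(src)
    return kept
```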
Future Roadmap
| Priority | Milestone | Target |
| --- | --- | --- |
| High | Enhanced Solidity & OpenZeppelin support | Q3 2025 |
| Medium | In-line code editing tools | Q4 2025 |
| Medium | Expanded compatibility (e.g., Rust for Solana) | Q1 2026 |
| Low | Increased context capacity | Q2 2026 |
Community & Support
HuggingFace: https://huggingface.co/Chain-GPT/Solidity-LLM
Conclusion
Solidity LLM by ChainGPT empowers Web3 developers with a reliable, high-performance model purpose-built for Solidity smart contract generation, combining robust technical performance with tangible business impact.