Solidity LLM (Open-Sourced)

Solidity LLM is a specialized Large Language Model (LLM) developed by ChainGPT, fine-tuned to efficiently generate, understand, and analyze Solidity smart contracts. Designed explicitly for the decentralized development ecosystem, Solidity LLM outperforms much larger models in syntax accuracy (~83% compilation success) and gas optimization (~72% efficiency), while remaining competitive in adherence to established standards (~65% OpenZeppelin compliance). By using Solidity LLM, developers achieve faster development cycles, reduced debugging time, and substantial cost savings.


Model Information

  • Developer: ChainGPT

  • License: MIT License

  • Base Model: Salesforce/codegen-2B-multi

Key Technical Details

  • Model Type: Causal Language Model (Code Generation)

  • Tokenizer: GPT2Tokenizer

  • Parameters: 2 Billion

  • Transformer Layers: 32

  • Context Length: 2048 tokens

  • Data Type: bfloat16
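Because the 2048-token context window is shared between the prompt and the generated continuation, `max_new_tokens` should be clamped to whatever room the prompt leaves. A minimal illustrative helper (not part of the model's API):

```python
# The 2048-token context is shared by the prompt and the completion,
# so the generation budget shrinks as the prompt grows.
CONTEXT_LENGTH = 2048

def remaining_budget(prompt_tokens: int, requested_new_tokens: int,
                     context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens can actually be generated."""
    room = max(context_length - prompt_tokens, 0)
    return min(requested_new_tokens, room)
```

For example, a 400-token prompt leaves room for at most 1648 generated tokens, regardless of the requested amount.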

Demo & Deployment


Performance Benchmark

Solidity LLM was benchmarked against leading LLMs (GPT-4.5 Preview, GPT-4o mini, Qwen 2.5-Coder-7B, DeepSeek-Coder-7B). Key metrics included:

| Metric | Solidity LLM | GPT-4.5 | GPT-4o mini | Qwen | DeepSeek |
| --- | --- | --- | --- | --- | --- |
| Compilation Success Rate | 83% | 50% | 30% | 20% | 15% |
| OpenZeppelin Compliance | 65% | 75% | 70% | 50% | 40% |
| Gas Efficiency | 72% | 68% | 70% | 60% | 55% |
| Security Posture | 58% | 70% | 65% | 55% | 50% |
| Line-of-Code Efficiency | 70% | 68% | 69% | 60% | 58% |

These benchmarks reflect Solidity LLM’s exceptional efficiency, accuracy, and cost-effectiveness.
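Compilation success rate is the fraction of generated contracts that compile without errors. A hedged sketch of how such a metric could be tallied (the boolean `results` list is a hypothetical harness output, not the actual benchmark code):

```python
def compilation_success_rate(results: list[bool]) -> float:
    """Fraction of generated contracts that compiled.

    `results` is assumed to hold one True/False entry per generated
    contract, as produced by some compilation harness.
    """
    return sum(results) / len(results) if results else 0.0
```

With 83 of 100 generated contracts compiling, this yields 0.83, matching the headline figure above.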


Use Cases

Direct Use

  • Smart contract development assistance

  • Solidity educational resources

  • Documentation and template creation

Downstream Applications

  • Integrated Development Environments (IDEs)

  • Autonomous blockchain agents

Out-of-Scope Uses

  • General-purpose coding (other languages)

  • Legal auditing or formal verification without human oversight

  • Production deployment without manual review


Risks, Biases, and Limitations

  • Possible biases from training datasets

  • Occasional hallucinations or logically incorrect outputs

  • Caution required in financial or high-stakes scenarios

Recommendation: Always conduct manual code reviews and thorough testing before deploying generated code.
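As a first automated gate before human review, a few naive textual checks can flag obviously risky output. This is an illustrative sketch built on regex heuristics (assumptions, not the project's tooling), and no substitute for dedicated analyzers or a professional audit:

```python
import re

def quick_flags(source: str) -> list[str]:
    """Naive pre-review checks for generated Solidity source (heuristics only)."""
    flags = []
    if "tx.origin" in source:
        flags.append("uses tx.origin for auth (spoofable via intermediary contracts)")
    if not re.search(r"pragma\s+solidity", source):
        flags.append("missing pragma solidity version directive")
    if re.search(r"\.call\{value:", source) and "nonReentrant" not in source:
        flags.append("low-level value call without an obvious reentrancy guard")
    return flags
```

An empty list means only that these particular patterns were absent, not that the contract is safe.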


Getting Started

Requirements

pip install transformers==4.51.3 torch==2.7.0 accelerate==1.6.0

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Chain-GPT/Solidity-LLM"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).to("cuda")  # requires a CUDA-capable GPU

prompt = "Write a Solidity function to transfer tokens."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate Solidity code
outputs = model.generate(**inputs, max_new_tokens=1400, pad_token_id=tokenizer.eos_token_id)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)
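The decoded text may include the echoed prompt and surrounding prose as well as the contract itself. A small post-processing step (an assumption about the output shape, not a guarantee) can isolate the Solidity source starting at the pragma directive:

```python
import re

def extract_solidity(generated: str) -> str:
    """Return text from the first `pragma solidity` directive onward,
    or the input unchanged if no pragma is found."""
    match = re.search(r"pragma\s+solidity", generated)
    return generated[match.start():] if match else generated
```

The extracted text can then be fed directly to a compiler or review pipeline.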

Streaming Mode (Direct Code Generation)

import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model = AutoModelForCausalLM.from_pretrained(
    "Chain-GPT/Solidity-LLM",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained("Chain-GPT/Solidity-LLM")

prompt = "Develop a Solidity Contract for a lottery requiring 1 ETH for registration with a 10 ETH reward."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Run generation in a background thread so tokens can be consumed as they arrive
Thread(target=model.generate, kwargs={
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "max_new_tokens": 1800,
    "temperature": 0.7,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "streamer": streamer
}).start()

for chunk in streamer:
    print(chunk, end="", flush=True)

Training Details

  • Compute Resources: 80 GB GPU cluster (4 GPUs)

  • Training Duration: ~1095 hours (1.5 months)

  • Pre-training: 1 billion tokens of raw Solidity data

  • Fine-tuning dataset:

    • Solidity version ≥ 0.5

    • 200-4000 tokens per contract

    • 650,000 curated, deduplicated instructions
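Deduplicating an instruction set of this size can be sketched with content hashing after light normalization. The whitespace-collapsing normalization here is an assumption for illustration, not a description of the actual data pipeline:

```python
import hashlib

def dedupe(instructions: list[str]) -> list[str]:
    """Keep the first occurrence of each instruction, comparing
    whitespace-normalized SHA-256 digests."""
    seen, unique = set(), []
    for text in instructions:
        digest = hashlib.sha256(" ".join(text.split()).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique
```

Hashing normalized text keeps memory bounded to one digest per unique instruction rather than storing full contract bodies.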


Future Roadmap

| Priority | Feature | Timeline |
| --- | --- | --- |
| High | Enhanced Solidity & OpenZeppelin support | Q3 2025 |
| Medium | In-line code editing tools | Q4 2025 |
| Medium | Expanded compatibility (e.g., Rust for Solana) | Q1 2026 |
| Low | Increased context capacity | Q2 2026 |


Community & Support


Conclusion

Solidity LLM by ChainGPT empowers Web3 developers with a reliable, high-performance model explicitly crafted for Solidity smart contract generation, combining robust technical performance with tangible business impact.
