Solidity LLM (Open-Sourced)

Solidity LLM is a specialized Large Language Model (LLM) developed by ChainGPT, fine-tuned to efficiently generate, understand, and analyze Solidity smart contracts. Designed explicitly for the decentralized development ecosystem, Solidity LLM outperforms much larger models in syntax accuracy (~83% compilation success) and gas optimization (~72% efficiency), while remaining competitive in adherence to established standards (~65% OpenZeppelin compliance). By using Solidity LLM, developers achieve faster development cycles, reduced debugging time, and substantial cost savings.


Model Information

  • Developer: ChainGPT

  • License: MIT License

  • Base Model: Salesforce/codegen-2B-multi

Key Technical Details

  • Model Type: Causal Language Model (Code Generation)

  • Tokenizer: GPT2Tokenizer

  • Parameters: 2 Billion

  • Transformer Layers: 32

  • Context Length: 2048 tokens

  • Data Type: bfloat16
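Because the 2048-token context window is shared between the prompt and the generated continuation, `max_new_tokens` should be clamped to whatever room the prompt leaves. A minimal illustrative helper (not part of the model's API):

```python
# The 2048-token context is shared by the prompt and the completion,
# so the generation budget shrinks as the prompt grows.
CONTEXT_LENGTH = 2048

def remaining_budget(prompt_tokens: int, requested_new_tokens: int,
                     context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens can actually be generated."""
    room = max(context_length - prompt_tokens, 0)
    return min(requested_new_tokens, room)
```

For example, a 400-token prompt leaves room for at most 1648 generated tokens, regardless of the requested amount.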

Demo & Deployment


Performance Benchmark

Solidity LLM was benchmarked against leading LLMs (GPT-4.5 Preview, GPT-4o mini, Qwen 2.5-Coder-7B, DeepSeek-Coder-7B). Key metrics included:

| Metric | Solidity LLM | GPT-4.5 | GPT-4o mini | Qwen | DeepSeek |
| --- | --- | --- | --- | --- | --- |
| Compilation Success Rate | 83% | 50% | 30% | 20% | 15% |
| OpenZeppelin Compliance | 65% | 75% | 70% | 50% | 40% |
| Gas Efficiency | 72% | 68% | 70% | 60% | 55% |
| Security Posture | 58% | 70% | 65% | 55% | 50% |
| Line-of-Code Efficiency | 70% | 68% | 69% | 60% | 58% |

These benchmarks reflect Solidity LLM’s exceptional efficiency, accuracy, and cost-effectiveness.
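Compilation success rate is the fraction of generated contracts that compile without errors. A hedged sketch of how such a metric could be tallied (the boolean `results` list is a hypothetical harness output, not the actual benchmark code):

```python
def compilation_success_rate(results: list[bool]) -> float:
    """Fraction of generated contracts that compiled.

    `results` is assumed to hold one True/False entry per generated
    contract, as produced by some compilation harness.
    """
    return sum(results) / len(results) if results else 0.0
```

With 83 of 100 generated contracts compiling, this yields 0.83, matching the headline figure above.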


Use Cases

Direct Use

  • Smart contract development assistance

  • Solidity educational resources

  • Documentation and template creation

Downstream Applications

  • Integrated Development Environments (IDEs)

  • Autonomous blockchain agents

Out-of-Scope Uses

  • General-purpose coding (other languages)

  • Legal auditing or formal verification without human oversight

  • Production deployment without manual review


Risks, Biases, and Limitations

  • Possible biases from training datasets

  • Occasional hallucinations or logically incorrect outputs

  • Caution required in financial or high-stakes scenarios

Recommendation: Always conduct manual code reviews and thorough testing before deploying generated code.
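As a first automated gate before human review, a few naive textual checks can flag obviously risky output. This is an illustrative sketch built on regex heuristics (assumptions, not the project's tooling), and no substitute for dedicated analyzers or a professional audit:

```python
import re

def quick_flags(source: str) -> list[str]:
    """Naive pre-review checks for generated Solidity source (heuristics only)."""
    flags = []
    if "tx.origin" in source:
        flags.append("uses tx.origin for auth (spoofable via intermediary contracts)")
    if not re.search(r"pragma\s+solidity", source):
        flags.append("missing pragma solidity version directive")
    if re.search(r"\.call\{value:", source) and "nonReentrant" not in source:
        flags.append("low-level value call without an obvious reentrancy guard")
    return flags
```

An empty list means only that these particular patterns were absent, not that the contract is safe.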


Getting Started

Requirements

pip install transformers==4.51.3 torch==2.7.0 accelerate==1.6.0

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Chain-GPT/Solidity-LLM"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).to("cuda")  # requires a CUDA-capable GPU

prompt = "Write a Solidity function to transfer tokens."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate Solidity code
outputs = model.generate(**inputs, max_new_tokens=1400, pad_token_id=tokenizer.eos_token_id)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)
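The decoded text may include the echoed prompt and surrounding prose as well as the contract itself. A small post-processing step (an assumption about the output shape, not a guarantee) can isolate the Solidity source starting at the pragma directive:

```python
import re

def extract_solidity(generated: str) -> str:
    """Return text from the first `pragma solidity` directive onward,
    or the input unchanged if no pragma is found."""
    match = re.search(r"pragma\s+solidity", generated)
    return generated[match.start():] if match else generated
```

The extracted text can then be fed directly to a compiler or review pipeline.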

Streaming Mode (Direct Code Generation)

import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model = AutoModelForCausalLM.from_pretrained(
    "Chain-GPT/Solidity-LLM",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained("Chain-GPT/Solidity-LLM")

prompt = "Develop a Solidity Contract for a lottery requiring 1 ETH for registration with a 10 ETH reward."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Run generation in a background thread so tokens can be consumed as they arrive
Thread(target=model.generate, kwargs={
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "max_new_tokens": 1800,
    "temperature": 0.7,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "streamer": streamer
}).start()

for chunk in streamer:
    print(chunk, end="", flush=True)

Training Details

  • Compute Resources: 80 GB GPU cluster (4 GPUs)

  • Training Duration: ~1095 hours (1.5 months)

  • Pre-training: 1 billion tokens of raw Solidity data

  • Fine-tuning dataset:

    • Solidity version ≥ 0.5

    • 200-4000 tokens per contract

    • 650,000 curated, deduplicated instructions
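Deduplicating an instruction set of this size can be sketched with content hashing after light normalization. The whitespace-collapsing normalization here is an assumption for illustration, not a description of the actual data pipeline:

```python
import hashlib

def dedupe(instructions: list[str]) -> list[str]:
    """Keep the first occurrence of each instruction, comparing
    whitespace-normalized SHA-256 digests."""
    seen, unique = set(), []
    for text in instructions:
        digest = hashlib.sha256(" ".join(text.split()).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique
```

Hashing normalized text keeps memory bounded to one digest per unique instruction rather than storing full contract bodies.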


Future Roadmap

| Priority | Feature | Timeline |
| --- | --- | --- |
| High | Enhanced Solidity & OpenZeppelin support | Q3 2025 |
| Medium | In-line code editing tools | Q4 2025 |
| Medium | Expanded compatibility (e.g., Rust for Solana) | Q1 2026 |
| Low | Increased context capacity | Q2 2026 |


Community & Support


Conclusion

Solidity LLM by ChainGPT empowers Web3 developers with a reliable, high-performance model explicitly crafted for Solidity smart contract generation, combining robust technical performance with tangible business impact.
