# Solidity LLM (Open-Sourced)

Solidity LLM is a specialized Large Language Model (LLM) developed by ChainGPT, fine-tuned to generate, understand, and analyze Solidity smart contracts efficiently. Designed explicitly for the decentralized development ecosystem, Solidity LLM significantly outperforms much larger models in syntax accuracy (\~83% compilation success) and gas optimization (\~72% efficiency), while remaining competitive in adherence to established standards (\~65% OpenZeppelin compliance). By using Solidity LLM, developers achieve faster development cycles, reduced debugging time, and substantial cost savings.

<figure><img src="https://2865549669-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F02IMVe3hN17zPTDRhn1f%2Fuploads%2F70zMqmKjWllSRPPwCs8v%2Ftelegram-cloud-photo-size-4-5769218759293258800-y.jpg?alt=media&#x26;token=6d946c2a-7945-468d-8f40-768834b81f93" alt=""><figcaption></figcaption></figure>

***

### Model Information

* Developer: [ChainGPT](https://chaingpt.org)
* License: MIT License
* Base Model: Salesforce/codegen-2B-multi

#### Key Technical Details

* Model Type: Causal Language Model (Code Generation)
* Tokenizer: GPT2Tokenizer
* Parameters: 2 Billion
* Transformer Layers: 32
* Context Length: 2048 tokens
* Data Type: bfloat16
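As a back-of-envelope illustration of these specs, the bfloat16 weights alone occupy roughly 4 GB of memory (2 billion parameters × 2 bytes each), before counting activations or the KV cache:

```python
# Rough memory estimate for the model weights in bfloat16 (weights only;
# activations and the KV cache add more on top of this).
params = 2_000_000_000   # 2 billion parameters
bytes_per_param = 2      # bfloat16 stores each parameter in 2 bytes
weights_gb = params * bytes_per_param / 1024**3
print(f"~{weights_gb:.1f} GiB for the weights alone")  # ~3.7 GiB
```

In practice this means a single 16 GB or 24 GB GPU comfortably hosts the model for inference.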

#### Demo & Deployment

* [Hugging Face Model](https://huggingface.co/Chain-GPT/Solidity-LLM)
* [Interactive Demo](https://huggingface.co/spaces/Chain-GPT/ChainGPT-Solidity-LLM)

***

### Performance Benchmark

Solidity LLM was benchmarked against leading LLMs (GPT-4.5 Preview, GPT-4o mini, Qwen 2.5-Coder-7B, DeepSeek-Coder-7B). Key metrics included:

<figure><img src="https://2865549669-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F02IMVe3hN17zPTDRhn1f%2Fuploads%2FKWCVDGrvkmGlnXVxCGZD%2Fimage.png?alt=media&#x26;token=aeb4ca39-b25e-4bad-8ec2-aabf67fa000f" alt=""><figcaption></figcaption></figure>

| **Metric**               | **Solidity LLM** | **GPT-4.5** | **GPT-4o mini** | **Qwen 2.5** | **DeepSeek** |
| ------------------------ | ---------------- | ----------- | --------------- | -------- | ------------- |
| Compilation Success Rate | 83%              | 50%         | 30%             | 20%      | 15%           |
| OpenZeppelin Compliance  | 65%              | 75%         | 70%             | 50%      | 40%           |
| Gas Efficiency           | 72%              | 68%         | 70%             | 60%      | 55%           |
| Security Posture         | 58%              | 70%         | 65%             | 55%      | 50%           |
| Line-of-Code Efficiency  | 70%              | 68%         | 69%             | 60%      | 58%           |

*These benchmarks highlight Solidity LLM’s strong compilation accuracy and gas efficiency relative to far larger general-purpose models.*
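For illustration, the five metrics can be collapsed into a single composite score. The equal weighting below is a hypothetical choice for this sketch, not part of the published benchmark; values are transcribed from the table above.

```python
# Benchmark percentages per model, in table order:
# compilation, compliance, gas efficiency, security, LoC efficiency.
metrics = {
    "Solidity LLM": [83, 65, 72, 58, 70],
    "GPT-4.5":      [50, 75, 68, 70, 68],
    "GPT-4o mini":  [30, 70, 70, 65, 69],
    "Qwen 2.5":     [20, 50, 60, 55, 60],
    "DeepSeek":     [15, 40, 55, 50, 58],
}

# Hypothetical equally-weighted composite score per model.
composite = {name: sum(vals) / len(vals) for name, vals in metrics.items()}

for name, score in sorted(composite.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

Under this (admittedly simplistic) weighting, Solidity LLM still ranks first overall, driven mainly by its compilation success rate.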

***

### Use Cases

#### Direct Use

* Smart contract development assistance
* Solidity educational resources
* Documentation and template creation

#### Downstream Applications

* Integrated Development Environments (IDEs)
* Autonomous blockchain agents

#### Out-of-Scope Uses

* General-purpose coding (other languages)
* Legal auditing or formal verification without human oversight
* Production deployment without manual review

***

### Risks, Biases, and Limitations

* Possible biases from training datasets
* Occasional hallucinations or logically incorrect outputs
* Caution required in financial or high-stakes scenarios

Recommendation: Always conduct manual code reviews and thorough testing before deploying generated code.
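As a minimal first gate before human review, generated source can be run through a few mechanical checks. The helper below is a hypothetical sketch (the function name and checks are illustrative); it does not replace compilation, testing, or a manual audit.

```python
import re

def basic_sanity_check(source: str) -> list:
    """Hypothetical pre-review checks for generated Solidity source.
    Returns a list of issue descriptions (empty list = no issues found)."""
    issues = []
    # Every standalone Solidity file should declare a compiler version.
    if not re.search(r"^\s*pragma\s+solidity\b", source, re.MULTILINE):
        issues.append("missing 'pragma solidity' directive")
    # A quick structural check: brace counts should match.
    if source.count("{") != source.count("}"):
        issues.append("unbalanced braces")
    # Flag dangerous constructs for extra scrutiny during review.
    if "selfdestruct" in source:
        issues.append("contains selfdestruct; review carefully")
    return issues

snippet = """pragma solidity ^0.8.0;
contract Counter {
    uint256 public count;
    function increment() external { count += 1; }
}
"""
print(basic_sanity_check(snippet))  # [] -> passes the basic checks
```

Checks like these catch only gross defects; compiling with `solc` and writing unit tests remain essential before any deployment.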

***

### Getting Started

#### Requirements

```bash
pip install transformers==4.51.3 torch==2.7.0 accelerate==1.6.0
```

#### Basic Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Chain-GPT/Solidity-LLM"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).to(device)

prompt = "Write a Solidity function to transfer tokens."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate Solidity code
outputs = model.generate(**inputs, max_new_tokens=1400, pad_token_id=tokenizer.eos_token_id)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)
```
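Because the decoded output includes the original prompt, it is often convenient to trim everything before the first `pragma solidity` directive. The helper below is a hypothetical post-processing step; the model's actual output format may vary.

```python
def extract_contract(generated_text: str) -> str:
    """Return generated_text from the first 'pragma solidity' onward.
    Falls back to the full text if no pragma directive is found."""
    marker = "pragma solidity"
    idx = generated_text.find(marker)
    return generated_text[idx:] if idx != -1 else generated_text

sample = (
    "Write a Solidity function to transfer tokens.\n"
    "pragma solidity ^0.8.0;\ncontract Token {}"
)
print(extract_contract(sample))
```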

#### Streaming Mode (Direct Code Generation)

```python
import torch
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model = AutoModelForCausalLM.from_pretrained(
    "Chain-GPT/Solidity-LLM",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained("Chain-GPT/Solidity-LLM")

prompt = "Develop a Solidity contract for a lottery requiring 1 ETH for registration with a 10 ETH reward."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Run generation in a background thread; the streamer yields text as it is produced.
Thread(target=model.generate, kwargs={
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
    "max_new_tokens": 1800,
    "temperature": 0.7,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "streamer": streamer
}).start()

for chunk in streamer:
    print(chunk, end="", flush=True)
```

***

### Training Details

* Compute Resources: cluster of 4 × 80 GB GPUs
* Training Duration: \~1095 hours (\~1.5 months)
* Pre-training: 1 billion tokens of raw Solidity data
* Fine-tuning dataset:
  * Solidity version ≥ 0.5
  * 200–4000 tokens per contract
  * 650,000 curated, deduplicated instructions
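The 200–4000 token filter on fine-tuning contracts can be sketched as below. Whitespace splitting is used here as a crude stand-in for the real GPT2 tokenizer, so the counts are only approximate; the function name and thresholds-as-defaults are illustrative.

```python
def within_token_budget(contract_src: str, lo: int = 200, hi: int = 4000) -> bool:
    """Rough filter mirroring the 200-4000 token range used for fine-tuning.
    Whitespace tokens approximate (and undercount) real subword tokens."""
    n = len(contract_src.split())
    return lo <= n <= hi

tiny = "pragma solidity ^0.8.0; contract A {}"
print(within_token_budget(tiny))  # False: far below the 200-token floor
```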

***

### Future Roadmap

<table><thead><tr><th width="105.046875">Priority</th><th width="372.4921875">Feature</th><th>Timeline</th></tr></thead><tbody><tr><td>High</td><td>Enhanced Solidity &#x26; OpenZeppelin support</td><td>Q3 2025</td></tr><tr><td>Medium</td><td>In-line code editing tools</td><td>Q4 2025</td></tr><tr><td>Medium</td><td>Expanded compatibility (e.g., Rust for Solana)</td><td>Q1 2026</td></tr><tr><td>Low</td><td>Increased context capacity</td><td>Q2 2026</td></tr></tbody></table>

***

### Community & Support

* [Hugging Face Model](https://huggingface.co/Chain-GPT/Solidity-LLM)
* [Discord Community](https://discord.gg/chaingpt)

***

### Conclusion

Solidity LLM by ChainGPT gives Web3 developers a reliable, high-performance model purpose-built for Solidity smart contract generation, combining strong technical performance with tangible business impact.
