Instructions to use M-Alkassem/qwen2.5-coder-3b-final-merged with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use M-Alkassem/qwen2.5-coder-3b-final-merged with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="M-Alkassem/qwen2.5-coder-3b-final-merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("M-Alkassem/qwen2.5-coder-3b-final-merged")
model = AutoModelForCausalLM.from_pretrained("M-Alkassem/qwen2.5-coder-3b-final-merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use M-Alkassem/qwen2.5-coder-3b-final-merged with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "M-Alkassem/qwen2.5-coder-3b-final-merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "M-Alkassem/qwen2.5-coder-3b-final-merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
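The same OpenAI-compatible request can also be sent from Python. The sketch below uses only the standard library. The helper names `build_chat_payload` and `chat` are illustrative (not part of vLLM), and the default base URL assumes the server started above is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "M-Alkassem/qwen2.5-coder-3b-final-merged"


def build_chat_payload(prompt, model=MODEL_ID):
    """Build the JSON body for the /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, base_url="http://localhost:8000"):
    """POST the prompt to a running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(chat("What is the capital of France?"))` should print the model's reply. Passing `base_url="http://localhost:30000"` targets a server on a different port, such as the SGLang setup.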

Use Docker

docker model run hf.co/M-Alkassem/qwen2.5-coder-3b-final-merged

SGLang

How to use M-Alkassem/qwen2.5-coder-3b-final-merged with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "M-Alkassem/qwen2.5-coder-3b-final-merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "M-Alkassem/qwen2.5-coder-3b-final-merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "M-Alkassem/qwen2.5-coder-3b-final-merged" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "M-Alkassem/qwen2.5-coder-3b-final-merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use M-Alkassem/qwen2.5-coder-3b-final-merged with Docker Model Runner:
```
docker model run hf.co/M-Alkassem/qwen2.5-coder-3b-final-merged
```

qwen2.5-coder-3b-final-merged

This repository contains the final standalone merged model for the project.

It was created by merging:

base model: Qwen/Qwen2.5-Coder-3B-Instruct
final adapter: M-Alkassem/qwen2.5-coder-3b-agent-v1
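
A merge of this kind is commonly done with the PEFT library. The following is a minimal sketch, assuming the adapter is a LoRA adapter that `peft` can load. It is not the author's actual merge script, and the function name `merge_adapter` and the output directory are hypothetical.

```python
def merge_adapter(base_id, adapter_id, output_dir):
    """Fold a LoRA adapter into its base model and save a standalone copy."""
    # Imported lazily so the function can be defined without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    # Attach the adapter weights on top of the base model.
    model = PeftModel.from_pretrained(base, adapter_id)
    # merge_and_unload() folds the LoRA deltas into the base weights and
    # returns a plain transformers model without PEFT wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir)
    # Save the tokenizer alongside so the output directory loads on its own.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    return merged
```

Under these assumptions, a call would look like `merge_adapter("Qwen/Qwen2.5-Coder-3B-Instruct", "M-Alkassem/qwen2.5-coder-3b-agent-v1", "merged-model")`.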

What This Model Is

This is the final merged result of a two-stage, low-resource adaptation pipeline run on Google Colab with a T4 GPU.

Project stages:

coding-focused fine-tuning
agent-oriented continued fine-tuning
final merge into one standalone model

The agent adapter was trained by continuing from the coding adapter, so the merged model carries what was learned across both fine-tuning stages.

Training Background

Stage 1: Coding Fine-Tune

Dataset:

bigcode/self-oss-instruct-sc2-exec-filter-50k

Setup:

sampled rows before filtering: 4000
rows used after filtering: 3993
max sequence length: 1024
training steps: 250
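
The card does not state which filter was applied, so the sketch below only illustrates the general sample-then-filter pattern. The function name, the toy rows, and the placeholder predicate are all assumptions made for illustration.

```python
import random


def sample_then_filter(rows, sample_size, keep, seed=0):
    """Draw a random sample, then keep only the rows accepted by `keep`."""
    rng = random.Random(seed)
    sampled = rng.sample(rows, min(sample_size, len(rows)))
    kept = [row for row in sampled if keep(row)]
    return sampled, kept


# Toy data: the predicate (non-empty response) is a placeholder, not the real filter.
rows = [{"response": "x"}] * 9 + [{"response": ""}]
sampled, kept = sample_then_filter(
    rows, sample_size=10, keep=lambda row: bool(row["response"])
)
print(len(sampled), len(kept))  # 10 9

# Retention reported in this card: 3993 of 4000 sampled rows.
print(f"{3993 / 4000:.3%}")  # 99.825%
```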

Result:

final training loss: about 0.6130

Stage 2: Agent-Oriented Continued Fine-Tune

Dataset:

ernie-research/MEnvData-SWE-Trajectory

Setup:

sampled rows: 700
max sequence length: 1024
training steps: 150

Result:

final training loss: about 1.2940

Evaluation Notes

In the direct-answer (plain prompting) benchmark, the original base model scored highest overall.

The main value of this final merged model is different:

it is the final standalone artifact of the project
it is more aligned to constrained tool-using workflows
it performed best when used as the reasoning core of a lightweight coding agent

The benchmark summary image above shows the plain prompting comparison:

Base model overall mean: 3.97
Coding adapter overall mean: 2.97
Agent adapter overall mean: 1.77
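
The gap to the base model follows directly from these means. A short sketch of the arithmetic is shown below. The scoring scale is not stated in this card, so the differences are in the same unspecified units as the means.

```python
# Overall means from the plain prompting comparison above.
means = {"base": 3.97, "coding adapter": 2.97, "agent adapter": 1.77}

# Difference between the base model's mean and each model's mean.
gaps = {name: round(means["base"] - score, 2) for name, score in means.items()}
print(gaps)  # {'base': 0.0, 'coding adapter': 1.0, 'agent adapter': 2.2}
```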

The agent workflow image shows the documented agent_v2 result where the model:

ran failing tests
identified a bug
rewrote code
reran tests
stopped after success
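
The steps above can be sketched as a small test-driven repair loop. This is a minimal illustration, not the project's agent_v2 code. `RunResult`, `repair_loop`, and the two fake callables are hypothetical names; a real agent would replace `propose_fix` with a call to the model.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    passed: bool
    log: str


def repair_loop(code, run_tests, propose_fix, max_rounds=3):
    """Run tests, ask for a rewrite on failure, and stop once tests pass."""
    history = []
    for round_index in range(max_rounds + 1):
        result = run_tests(code)
        history.append(result)
        if result.passed:
            return code, history  # stop after success
        if round_index == max_rounds:
            break  # budget exhausted; do not request another rewrite
        # Hand the failing log to the fixer (the model, in a real agent).
        code = propose_fix(code, result.log)
    return code, history


# Stand-ins for a real test runner and a real model call.
def fake_run_tests(code):
    ok = "a + b" in code
    return RunResult(ok, "ok" if ok else "assert add(1, 2) == 3 failed")


def fake_propose_fix(code, log):
    return code.replace("a - b", "a + b")


fixed, history = repair_loop(
    "def add(a, b): return a - b", fake_run_tests, fake_propose_fix
)
print(fixed)  # def add(a, b): return a + b
print([r.passed for r in history])  # [False, True]
```

Capping the number of rounds keeps the loop from running indefinitely when the model cannot find a fix.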

How To Load

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "M-Alkassem/qwen2.5-coder-3b-final-merged"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)

model.eval()
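
Once loaded, the model can be queried through its chat template. The helper below is a minimal sketch that mirrors the Transformers snippet earlier in this card. `generate_reply` is an illustrative name, and the default `max_new_tokens` value is an arbitrary choice.

```python
def generate_reply(model, tokenizer, prompt, max_new_tokens=128):
    """Format a single user turn with the chat template and decode the reply."""
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```

With the `model` and `tokenizer` loaded above, `print(generate_reply(model, tokenizer, "Write a function that reverses a string."))` prints the generated answer.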

Related Repositories

coding adapter: M-Alkassem/qwen2.5-coder-3b-unsloth-lora
agent adapter: M-Alkassem/qwen2.5-coder-3b-agent-v1

References

Base model: https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct
Coding adapter: https://huggingface.co/M-Alkassem/qwen2.5-coder-3b-unsloth-lora
Agent adapter: https://huggingface.co/M-Alkassem/qwen2.5-coder-3b-agent-v1
Coding dataset: https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k
Agent dataset: https://huggingface.co/datasets/ernie-research/MEnvData-SWE-Trajectory

Citation

If you use this model, please cite:

@article{hui2024qwen2p5coder,
  title={Qwen2.5-Coder Technical Report},
  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jing and Liu, Dayiheng and Zhang, Liqun and Liu, Tianyang and Zhang, Jiawei and Yu, Bo and Lu, Kaican and others},
  journal={arXiv preprint arXiv:2409.12186},
  year={2024}
}