Git Commit Message Generator

Fine-tuned Qwen-0.5B model for generating professional Git commit messages from code diffs.

Model Description

This model was fine-tuned using LoRA (Low-Rank Adaptation) on the CommitPackFT dataset to generate concise, professional commit messages from git diffs.

Base Model: Qwen-0.5B
Fine-tuning Method: LoRA (r=16, alpha=32)
Training Data: 55K filtered commits from CommitPackFT
Languages: Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more
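With LoRA, each adapted weight matrix W (shape d_out x d_in) stays frozen. The model instead learns a low-rank update B·A, where A is r x d_in and B is d_out x r. In standard LoRA the update is scaled by alpha / r, which is 32 / 16 = 2 for this model. The short sketch below counts the trainable parameters LoRA adds to one matrix. The 896 x 896 projection size in the example is an illustrative assumption and is not taken from this card.

```python
def lora_scaling(r: int, alpha: int) -> float:
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one d_out x d_in matrix: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen base matrix W."""
    return d_in * d_out


if __name__ == "__main__":
    r, alpha = 16, 32   # values from this model card
    d_in = d_out = 896  # illustrative projection size (assumption)
    lora = lora_param_count(d_in, d_out, r)
    full = full_param_count(d_in, d_out)
    print(f"scaling = {lora_scaling(r, alpha)}")
    print(f"LoRA params: {lora:,} vs full matrix: {full:,} ({lora / full:.2%})")
```

Because the adapter grows linearly with r while W grows with d_in x d_out, the trainable fraction stays small for any realistic projection size.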

Intended Use

Generate commit messages for staged changes in a Git repository.

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "rajtiwariee/auto-commit"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Prepare your diff
diff = """
Diff:
File: src/auth.py
Language: Python

Old content:
def login(username, password):
    user = get_user(username)
    if user.password == password:
        return True
    return False

New content:
def login(username, password):
    user = get_user(username)
    if user and user.password == password:
        return True
    return False
"""

# Generate commit message
prompt = f"Write a git commit message:\n\n{diff}\n\nCommit message:\n"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=False,  # Deterministic
        pad_token_id=tokenizer.eos_token_id,
    )

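# Decode only the newly generated tokens, skipping the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
message = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(message.strip())
# Example output: "Check for user existence before accessing password"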
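The Quick Start hard-codes its input. To run the model on real staged changes, the same File / Language / Old content / New content layout can be assembled from Git. The sketch below is an assumption about how to do this and is not the companion CLI's implementation: it reads the committed version of each staged file with git show HEAD:<path> and the staged (index) version with git show :<path>. The extension-to-language table EXTENSION_LANGUAGES and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical extension-to-language mapping; extend as needed.
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".java": "Java",
    ".cpp": "C++",
    ".go": "Go",
    ".rs": "Rust",
}


def format_change(path: str, old: str, new: str) -> str:
    """Lay out one file change like the diff string in the Quick Start."""
    language = EXTENSION_LANGUAGES.get(Path(path).suffix, "Unknown")
    return (
        f"Diff:\nFile: {path}\nLanguage: {language}\n\n"
        f"Old content:\n{old.rstrip(chr(10))}\n\n"
        f"New content:\n{new.rstrip(chr(10))}\n"
    )


def build_prompt(diff: str) -> str:
    """Wrap a formatted change in the prompt template used in the Quick Start."""
    return f"Write a git commit message:\n\n{diff}\n\nCommit message:\n"


def _git_show(spec: str) -> str:
    """Return a blob's contents, or "" if it does not exist (e.g. a newly added file)."""
    result = subprocess.run(["git", "show", spec], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


def staged_changes() -> list[str]:
    """Format every staged file: HEAD version as old content, index version as new content."""
    names = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return [format_change(p, _git_show(f"HEAD:{p}"), _git_show(f":{p}")) for p in names]
```

Inside a repository with staged changes, build_prompt(staged_changes()[0]) yields a prompt for the first staged file that can be tokenized exactly as in the Quick Start. This sketch produces one prompt per file; how multi-file commits should be combined is not specified here.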

CLI Tool

For everyday use, a companion CLI tool is available in the project's GitHub repository. Clone the repository and install it from the repository root:

pip install -e .
commit-gen generate --commit

Training Details

Training Data

  • Dataset: CommitPackFT (filtered subset)
  • Training samples: 55,730
  • Validation samples: 6,966
  • Test samples: 6,967
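The three splits total 69,663 commits, which is consistent with an 80/10/10 train/validation/test split. The card does not say how the split was made. The sketch below shows one assumed way to reproduce the same sizes with a seeded shuffle; split_dataset and the seed value are hypothetical.

```python
import random


def split_sizes(total: int, train_frac: float = 0.8) -> tuple[int, int, int]:
    """Train gets floor(total * train_frac); the remainder is halved into validation and test."""
    n_train = int(total * train_frac)
    rest = total - n_train
    n_val = rest // 2
    return n_train, n_val, rest - n_val


def split_dataset(items, train_frac: float = 0.8, seed: int = 42):
    """Shuffle a copy of items with a fixed seed and cut it into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, n_val, _ = split_sizes(len(shuffled), train_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


if __name__ == "__main__":
    print(split_sizes(55_730 + 6_966 + 6_967))
```

Applied to 69,663 items, split_sizes gives 55,730 / 6,966 / 6,967, matching the counts listed above.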

Training Procedure

  • Epochs: 3
  • Batch Size: 4 per step (8 gradient accumulation steps, effective batch size 32)
  • Learning Rate: 5e-5
  • Optimizer: AdamW
  • LoRA Config:
    • r: 16
    • alpha: 32
    • dropout: 0.05
    • target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
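The listed hyperparameters imply 8 gradient accumulation steps (32 / 4). Assuming a single T4, one optimizer update per effective batch, and a partial final batch counted as one step, training runs about 1,742 optimizer steps per epoch. The sketch below computes these numbers and collects the LoRA settings as keyword arguments named after PEFT's LoraConfig parameters. The task_type value is an assumption, and the dict is kept library-free so the snippet runs without PEFT installed.

```python
import math

# Hyperparameters from the model card
TRAIN_SAMPLES = 55_730
PER_STEP_BATCH = 4
EFFECTIVE_BATCH = 32
EPOCHS = 3

# LoRA settings from the model card, keyed by peft.LoraConfig parameter names.
# task_type is an assumption (causal language modelling).
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}


def accumulation_steps(per_step: int, effective: int) -> int:
    """Micro-batches accumulated before each optimizer update."""
    if effective % per_step:
        raise ValueError("effective batch must be a multiple of the per-step batch")
    return effective // per_step


def optimizer_steps(samples: int, effective: int, epochs: int) -> int:
    """Total optimizer updates, counting a partial final batch as one step."""
    return math.ceil(samples / effective) * epochs


if __name__ == "__main__":
    print("accumulation steps:", accumulation_steps(PER_STEP_BATCH, EFFECTIVE_BATCH))
    print("optimizer steps:", optimizer_steps(TRAIN_SAMPLES, EFFECTIVE_BATCH, EPOCHS))
    # With PEFT installed: from peft import LoraConfig; config = LoraConfig(**LORA_KWARGS)
```

The exact step count reported by a given trainer can differ by one per epoch depending on how it rounds the last accumulation window.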

Hardware

  • GPU: NVIDIA Tesla T4 (16GB)
  • Precision: Mixed Precision (FP32 weights + FP16 compute)
  • Training Time: ~7.5 hours

Evaluation Results

  • BLEU Score: 0.0244
  • ROUGE-1: 0.1968
  • ROUGE-2: 0.0420
  • ROUGE-L: 0.1816
  • Exact Match Rate: 0.00%
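ROUGE-1 measures unigram overlap between a generated message and its reference, while exact match only counts identical strings. That is why exact match can sit at 0% even though the ROUGE scores are non-zero. The simplified sketch below uses lowercase whitespace tokenization without stemming, so it will not reproduce the reported numbers exactly; the card does not say which metric implementation was used, and the reference message in the example is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings (simplified ROUGE-1)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Clipped overlap: each shared word counts at most min(count in cand, count in ref)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def exact_match_rate(candidates: list[str], references: list[str]) -> float:
    """Fraction of predictions identical to their reference after stripping whitespace."""
    matches = sum(c.strip() == r.strip() for c, r in zip(candidates, references))
    return matches / len(references) if references else 0.0


if __name__ == "__main__":
    generated = "Check for user existence before accessing password"
    reference = "Fix crash when user does not exist"  # hypothetical reference message
    print(f"ROUGE-1 F1: {rouge1_f1(generated, reference):.3f}")
    print(f"Exact match: {exact_match_rate([generated], [reference]):.0%}")
```

In this example the two messages share only the word "user", so ROUGE-1 is low and exact match is zero even though both describe a plausible fix.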

Limitations

  • The model is trained primarily on English commit messages
  • Best suited for code changes in common programming languages
  • May not handle very large diffs well (>384 tokens)
  • Generated messages should be reviewed before committing
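Because diffs longer than about 384 tokens may be handled poorly, it can help to check the input length before generating. The sketch below accepts any token-counting function, so it runs without the model; with the real tokenizer one could pass lambda s: len(tokenizer(s)["input_ids"]). The 384 threshold comes from this card, while splitting an oversized diff into line-based chunks (chunk_lines) is a hypothetical mitigation, not something the model or CLI is known to do.

```python
from typing import Callable

MAX_DIFF_TOKENS = 384  # threshold quoted in the Limitations section


def fits_budget(text: str, count_tokens: Callable[[str], int],
                limit: int = MAX_DIFF_TOKENS) -> bool:
    """True if text is within the token budget."""
    return count_tokens(text) <= limit


def chunk_lines(text: str, count_tokens: Callable[[str], int],
                limit: int = MAX_DIFF_TOKENS) -> list[str]:
    """Greedily group whole lines into chunks that each stay within the budget.

    A single line longer than the budget becomes its own (oversized) chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        candidate = "\n".join(current + [line])
        if current and count_tokens(candidate) > limit:
            chunks.append("\n".join(current))
            current = [line]
        else:
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    def whitespace_count(s: str) -> int:
        # Stand-in for len(tokenizer(s)["input_ids"]) with the real tokenizer
        return len(s.split())

    big_diff = "\n".join(f"line {i} changed" for i in range(300))
    print(fits_budget(big_diff, whitespace_count))
    print([whitespace_count(c) for c in chunk_lines(big_diff, whitespace_count)])
```

Chunking keeps each request within the budget, but how to merge several per-chunk messages into one commit message is left open.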

Ethical Considerations

This model is intended to assist developers in writing commit messages, not replace human judgment. Users should:

  • Review generated messages for accuracy
  • Ensure messages accurately describe the changes
  • Follow their team's commit message conventions

Citation

@misc{git-commit-generator,
  author = {Raj Tiwari},
  title = {Git Commit Message Generator},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/rajtiwariee/auto-commit}},
}

License

MIT License
