# Granite 3.1 8B – Basic Fantasy RPG (OSFT v2)

A domain-expert language model fine-tuned on Basic Fantasy Role-Playing Game source material using OSFT (Orthogonal Subspace Fine-Tuning), a continual learning method that preserves the base model's general knowledge while injecting new domain expertise.

This is the v2 release, trained on significantly improved synthetic data that closed 68% of v1's performance gap against the base model.

## Model Details

| Field | Value |
|---|---|
| Base Model | ibm-granite/granite-3.1-8b-instruct |
| Fine-tuning Method | OSFT (Orthogonal Subspace Fine-Tuning) |
| Parameters | 8B |
| Precision | bfloat16 |
| License | Apache 2.0 |
| Domain | Basic Fantasy RPG rules, monsters, equipment, spells, and gameplay |
| Source Code | github.com/RobbieJ/frank |

## Evaluation Results

Evaluated on 210 questions across 16 BFRPG categories, scored 0-10 by an automated LLM judge (qwen3-14b) against reference answers.
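The scoring setup can be sketched as follows. The prompt wording is my own illustration (the exact judge prompt used with qwen3-14b is not published here); the aggregation matches the summary statistics reported below.

```python
import statistics

def build_judge_prompt(question, reference, candidate):
    # Hypothetical judge prompt; the actual wording is not published in this card.
    return (
        "Score the candidate answer from 0 to 10 against the reference.\n"
        f"Question: {question}\n"
        f"Reference: {reference}\n"
        f"Candidate: {candidate}\n"
        "Respond with a single integer."
    )

def summarize(scores):
    # Collapse per-question judge scores into the summary metrics
    # reported below (mean, median, standard deviation).
    return {
        "mean": round(statistics.mean(scores), 2),
        "median": statistics.median(scores),
        "stdev": round(statistics.stdev(scores), 2),
    }
```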

### Overall Performance

| Metric | OSFT v2 Fine-tuned | Base | Delta |
|---|---|---|---|
| Mean Score | 6.98 | 7.12 | -0.14 |
| Median Score | 10.0 | 10.0 | +0.0 |
| Std Deviation | 3.65 | 3.71 | |

Head-to-head: OSFT v2 wins 23, base wins 34, ties 153. Win rate (excluding ties): 40.4%.
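The win rate follows directly from the head-to-head counts:

```python
def win_rate_excluding_ties(wins, losses):
    # Share of decided (non-tied) head-to-head questions won by the
    # fine-tuned model: 23 / (23 + 34) here.
    return round(100 * wins / (wins + losses), 1)

# 23 wins + 34 losses + 153 ties = 210 evaluated questions
print(win_rate_excluding_ties(23, 34))  # 40.4
```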

### Performance Across Training Iterations

| Iteration | Method | Training Data | Judge | FT Score | Base Score | Delta |
|---|---|---|---|---|---|---|
| v1 | LoRA SFT | 7,600 (low-quality) | granite-3-2-8b | 4.12 | 5.54 | -1.41 |
| v2 | OSFT | 7,600 (low-quality) | granite-3-2-8b | 4.01 | 5.52 | -1.51 |
| v3 (this) | OSFT | 5,717 (high-quality) | qwen3-14b | 6.98 | 7.12 | -0.14 |

### Results by Category

| Category | Questions | OSFT v2 | Base | Delta | OSFT Wins | Base Wins | Ties |
|---|---|---|---|---|---|---|---|
| Core Monsters | 10 | 9.1 | 7.5 | +1.6 | 2 | 0 | 8 |
| Weapons | 15 | 7.7 | 6.7 | +1.0 | 3 | 1 | 11 |
| Rules | 8 | 8.6 | 7.9 | +0.7 | 2 | 1 | 5 |
| Armor | 10 | 8.1 | 7.7 | +0.4 | 1 | 0 | 9 |
| Combat | 12 | 5.8 | 5.4 | +0.4 | 3 | 3 | 6 |
| Thief Abilities | 8 | 9.3 | 9.2 | +0.0 | 1 | 1 | 6 |
| Field Guide | 30 | 8.2 | 8.2 | -0.0 | 3 | 2 | 25 |
| Spells | 14 | 5.8 | 5.8 | +0.0 | 2 | 1 | 11 |
| Gear | 15 | 6.8 | 6.7 | +0.0 | 1 | 2 | 12 |
| Beginner's Guide | 12 | 4.0 | 4.3 | -0.3 | 2 | 2 | 8 |
| Character Creation | 12 | 5.4 | 5.8 | -0.4 | 1 | 1 | 10 |
| Animals & Vehicles | 15 | 7.9 | 8.4 | -0.5 | 1 | 1 | 13 |
| Classes | 14 | 5.7 | 6.5 | -0.8 | 0 | 6 | 8 |
| Movement | 10 | 7.0 | 8.3 | -1.3 | 1 | 1 | 8 |
| Races | 10 | 5.1 | 7.2 | -2.1 | 0 | 4 | 6 |
| Monster Index | 15 | 5.8 | 8.3 | -2.5 | 0 | 8 | 7 |

## Key Improvement: Training Data Quality

The biggest change in v2 was not the algorithm but the training data:

| Metric | v1 Data | v2 Data |
|---|---|---|
| Mean response length | 192 chars | 386 chars |
| Has numeric content | 22.1% | 71.2% |
| Source coverage | 200/476 chunks | 466/466 chunks |
| Teacher model | granite-3-2-8b (8B) | qwen3-14b (14B) |
| Meta-reference artifacts | 12.2% | <1.3% |
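The length and numeric-content figures are straightforward to compute over a set of responses; a minimal sketch, where the exact counting rules are my own reading of the table:

```python
import re

def data_quality_metrics(responses):
    # Mean response length in characters, and the share of responses that
    # contain at least one digit ("has numeric content"). Both definitions
    # are assumptions; the project's exact rules may differ.
    mean_len = sum(len(r) for r in responses) / len(responses)
    numeric_pct = 100 * sum(1 for r in responses if re.search(r"\d", r)) / len(responses)
    return round(mean_len, 1), round(numeric_pct, 1)
```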

## Why OSFT?

Traditional fine-tuning (SFT/LoRA) on domain data caused significant knowledge degradation: the base model scored 5.54 versus the LoRA fine-tuned model's 4.12 (granite judge). OSFT addresses this by:

  1. Computing SVD of each weight matrix to identify critical dimensions
  2. Freezing the most critical 85% of weight dimensions
  3. Only updating the least critical 15% (unfreeze_rank_ratio=0.15)

By construction, updates leave the frozen directions untouched, making it far less likely that new domain knowledge overwrites existing capabilities.
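The three steps above can be sketched as a gradient projection. This is a conceptual illustration (projecting onto the least-critical left-singular subspace of each weight matrix), not the Training Hub implementation, and `osft_project_update` is a hypothetical helper name:

```python
import numpy as np

def osft_project_update(weight, grad, unfreeze_rank_ratio=0.15):
    # Freeze the top (1 - ratio) fraction of singular directions by removing
    # the gradient component that lies in their span, so the update only
    # moves the least-critical directions. Conceptual sketch only.
    u, s, _ = np.linalg.svd(weight, full_matrices=False)
    k = int(len(s) * (1 - unfreeze_rank_ratio))  # number of frozen directions
    u_top = u[:, :k]                             # critical left-singular vectors
    return grad - u_top @ (u_top.T @ grad)       # orthogonal to frozen subspace
```

The returned update has zero component along the frozen singular vectors, which is the sense in which the existing directions are protected.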

## Training Details

### Data

  • Source: 5 Basic Fantasy RPG PDFs (Core Rules r142, Beginner's Essentials r18, Field Guide Omnibus r4, Monster Index r7, Equipment Emporium r33)
  • Pipeline: PDF → Markdown (Docling) → Semantic chunking (466 chunks) → Q&A generation (qwen3-14b teacher) → Quality filtering → Merged dataset
  • Training examples: 5,717 instruction-response pairs (3,678 v2 high-quality + 2,039 best of v1)
  • Coverage: All 466 document chunks, 16 categories
  • Format: ChatML with system prompt, user question, and assistant answer
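The quality-filtering step can be illustrated with a toy filter for meta-reference artifacts (answers that point back at the source text instead of answering in their own voice); the patterns below are my own heuristics, not the project's actual rules:

```python
import re

# Illustrative patterns for "meta-reference artifacts". The project's real
# filter criteria are not published; these are assumptions for demonstration.
META_REFERENCE = re.compile(
    r"according to the (text|document|passage)|as (stated|described) in the",
    re.IGNORECASE,
)

def filter_pairs(pairs):
    # Keep only Q&A pairs whose answer stands on its own.
    return [p for p in pairs if not META_REFERENCE.search(p["answer"])]
```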

### Hyperparameters

| Parameter | Value |
|---|---|
| Algorithm | OSFT |
| Unfreeze rank ratio | 0.15 |
| Epochs | 3 |
| Learning rate | 5e-6 |
| LR scheduler | Cosine |
| Effective batch size | 16 |
| Max tokens per GPU | 256 |
| Max sequence length | 1024 |
| Train dtype | float32 |
| Save dtype | bfloat16 |
| Total training steps | 1,074 |
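The total step count is consistent with the data size and effective batch size, assuming a partial final batch counts as a step:

```python
import math

examples, batch, epochs = 5717, 16, 3
steps_per_epoch = math.ceil(examples / batch)  # 358, matching the loss milestones
total_steps = steps_per_epoch * epochs         # 1,074
```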

### Training Loss

| Milestone | Loss |
|---|---|
| Step 1 | 1.97 |
| End of Epoch 1 (step 358) | ~1.05 |
| End of Epoch 2 (step 716) | ~0.90 |
| End of Epoch 3 (step 1074) | 0.85 |

### Hardware

  • GPU: NVIDIA DGX Spark (GB10 Grace Blackwell Superchip)
  • Memory: 128 GB unified CPU+GPU
  • Peak memory usage: 78.4 GB
  • Training throughput: ~75 tokens/sec
  • Total training time: ~10 hours (including ~35 min SVD initialization)

### Toolchain

  • Training Hub v0.4.0 (OSFT implementation)
  • Docling (PDF to markdown conversion)
  • Custom SDG script with qwen3-14b teacher model
  • PyTorch 2.9.0+cu130

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RobbieJ/granite-3.1-8b-bfrpg-osft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are an expert on Basic Fantasy Role-Playing Game (BFRPG). You provide accurate, helpful answers about BFRPG rules, character creation, combat, spells, monsters, equipment, and gameplay."},
    {"role": "user", "content": "What is the Armor Class and cost of plate mail in BFRPG?"},
]

# add_generation_prompt=True appends the assistant turn header so the model
# continues with an answer rather than another user turn.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

## Limitations

  • Remaining gap: The fine-tuned model scores 0.14 points below the base model overall, and categories like Monster Index (-2.5) and Races (-2.1) show persistent gaps.
  • Response length: Generates shorter responses than the base model, which may affect perceived completeness.
  • Numeric precision: Some specific values (prices, percentages) may be plausible but incorrect; the model learns patterns better than exact data points.
  • BFRPG-specific: Tuned for Basic Fantasy RPG only. May not generalize to other RPG systems.
  • Inherited limitations: All limitations of the base Granite 3.1 8B model apply.

## Intended Use

  • BFRPG game masters looking for rules lookup assistance
  • Players needing character creation, spell, or equipment reference help
  • RPG content creators working with BFRPG material
  • Research into domain-specific fine-tuning with knowledge preservation (OSFT)
  • Educational reference for LLM fine-tuning pipelines

## Ethical Considerations

This model is fine-tuned on open-source RPG content (Basic Fantasy RPG is released under the Open Game License). It generates fictional game content and should not be used for real-world decision-making.
