<details><summary>See axolotl config</summary>

axolotl version: `0.13.0.dev0`

```yaml
base_model: HuggingFaceTB/SmolLM3-3B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

datasets:
  - path: woodman231/monster-sanctuary-conversations
    type: chat_template

chat_template: chatml

dataset_prepared_path: /workspace/data/axolotl_prepared_data
val_set_size: 0.1
output_dir: /workspace/data/outputs

special_tokens:
  bos_token: <|im_start|>
  eos_token: <|im_end|>
  pad_token: <|im_end|>

# FULL PRECISION FOR QUALITY
load_in_4bit: false  # No quantization on H100

adapter: lora  # Regular LoRA
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

# LARGE BATCH SETTINGS
gradient_accumulation_steps: 1  # No accumulation needed
micro_batch_size: 32  # Max batch size (can try 24-32)

# LEARNING RATE
learning_rate: 0.00004  # Slightly higher LR for full precision

# FULL TRAINING
num_epochs: 8  # Full 8 epochs
# max_steps: not set  # Train to completion
eval_steps: 100  # Evaluate every 100 steps

# H100 OPTIMIZATIONS
flash_attention: true  # H100 has flash attention support
bf16: true  # BFloat16 for H100
gradient_checkpointing: true  # Trade compute for memory (optional)

# OUTPUT
hub_model_id: woodman231/SmolLM3-3B-monster-sanctuary-lora
save_steps: 100  # Save every 100 steps
save_total_limit: 3  # Keep only 3 checkpoints
```

</details><br>
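A config like this is typically launched with `axolotl train config.yml` (or `python -m axolotl.cli.train config.yml` on older releases). The `special_tokens` block pins the ChatML markers, so one quick sanity check is to preview how a conversation renders under the base tokenizer's chat template. A minimal sketch; the sample turns are illustrative, and the exact output depends on the template shipped with HuggingFaceTB/SmolLM3-3B:

```python
from transformers import AutoTokenizer

# Preview how a conversation renders with the base tokenizer's chat template.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

messages = [  # illustrative turns, not taken from the dataset
    {"role": "user", "content": "Where can I find a Catzerker?"},
    {"role": "assistant", "content": "Check the early Blue Caves areas."},
]

# Under ChatML, each turn should be wrapped in <|im_start|>role ... <|im_end|>.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```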
# SmolLM3-3B-monster-sanctuary-lora
This model is a fine-tuned version of [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) on the [woodman231/monster-sanctuary-conversations](https://huggingface.co/datasets/woodman231/monster-sanctuary-conversations) dataset. It achieves the following results on the evaluation set (an inference sketch follows the list):
- Loss: 0.0002
- Max active memory: 26.77 GiB
- Max allocated memory: 26.77 GiB
- Reserved device memory: 37.58 GiB
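Since the repository stores LoRA adapter weights rather than a full model, one way to try it is through PEFT's `AutoPeftModelForCausalLM`, which pulls the base model and applies the adapter on top. A minimal sketch, assuming the tokenizer was pushed alongside the adapter (otherwise load it from the base model); the prompt and generation settings are illustrative:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo = "woodman231/SmolLM3-3B-monster-sanctuary-lora"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoPeftModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

# Illustrative question in the dataset's Monster Sanctuary domain.
messages = [{"role": "user", "content": "How do I shift a monster to Light?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```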
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a scheduler sketch follows the list):
- learning_rate: 4e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 13384
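For reference, this schedule (linear warmup for 100 steps, then cosine decay across the 13,384 training steps) can be reproduced with transformers' scheduler helper. A minimal sketch; the dummy parameter and optimizer exist only to drive the scheduler:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Dummy optimizer just to instantiate the scheduler with the card's settings.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([param], lr=4e-5, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=13384
)

for step in range(13384):
    optimizer.step()
    scheduler.step()
    if step in (0, 99, 6692, 13383):
        # LR ramps to 4e-5 by step 100, then decays toward 0 on a cosine curve.
        print(step, scheduler.get_last_lr()[0])
```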
### Training results
| Training Loss | Epoch | Step | Validation Loss | Max Active (GiB) | Max Allocated (GiB) | Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 2.9715 | 26.28 | 26.28 | 26.43 |
| 0.8052 | 0.0598 | 100 | 0.7336 | 26.77 | 26.77 | 37.52 |
| 0.4467 | 0.1195 | 200 | 0.3855 | 26.77 | 26.77 | 37.58 |
| 0.2207 | 0.1793 | 300 | 0.2723 | 26.77 | 26.77 | 37.58 |
| 0.233 | 0.2391 | 400 | 0.2125 | 26.77 | 26.77 | 37.58 |
| 0.157 | 0.2989 | 500 | 0.1825 | 26.77 | 26.77 | 37.58 |
| 0.1464 | 0.3586 | 600 | 0.1664 | 26.77 | 26.77 | 37.58 |
| 0.1278 | 0.4184 | 700 | 0.1466 | 26.77 | 26.77 | 37.58 |
| 0.1467 | 0.4782 | 800 | 0.1251 | 26.77 | 26.77 | 37.58 |
| 0.1148 | 0.5380 | 900 | 0.1141 | 26.77 | 26.77 | 37.58 |
| 0.1086 | 0.5977 | 1000 | 0.1078 | 26.77 | 26.77 | 37.58 |
| 0.1137 | 0.6575 | 1100 | 0.0998 | 26.77 | 26.77 | 37.58 |
| 0.0975 | 0.7173 | 1200 | 0.0896 | 26.77 | 26.77 | 37.67 |
| 0.092 | 0.7770 | 1300 | 0.0857 | 26.77 | 26.77 | 37.58 |
| 0.0677 | 0.8368 | 1400 | 0.0838 | 26.77 | 26.77 | 37.58 |
| 0.0704 | 0.8966 | 1500 | 0.0749 | 26.77 | 26.77 | 37.58 |
| 0.0914 | 0.9564 | 1600 | 0.0683 | 26.77 | 26.77 | 37.58 |
| 0.0641 | 1.0161 | 1700 | 0.0605 | 26.77 | 26.77 | 37.58 |
| 0.0556 | 1.0759 | 1800 | 0.0569 | 26.77 | 26.77 | 37.67 |
| 0.0495 | 1.1357 | 1900 | 0.0546 | 26.77 | 26.77 | 37.5 |
| 0.0402 | 1.1955 | 2000 | 0.0507 | 26.77 | 26.77 | 37.58 |
| 0.0442 | 1.2552 | 2100 | 0.0472 | 26.77 | 26.77 | 37.58 |
| 0.0355 | 1.3150 | 2200 | 0.0446 | 26.77 | 26.77 | 37.58 |
| 0.0352 | 1.3748 | 2300 | 0.0393 | 26.77 | 26.77 | 37.58 |
| 0.0288 | 1.4345 | 2400 | 0.0373 | 26.77 | 26.77 | 37.48 |
| 0.036 | 1.4943 | 2500 | 0.0341 | 26.77 | 26.77 | 37.58 |
| 0.0299 | 1.5541 | 2600 | 0.0299 | 26.77 | 26.77 | 37.58 |
| 0.0282 | 1.6139 | 2700 | 0.0279 | 26.77 | 26.77 | 37.58 |
| 0.0212 | 1.6736 | 2800 | 0.0248 | 26.77 | 26.77 | 37.58 |
| 0.0252 | 1.7334 | 2900 | 0.0231 | 26.77 | 26.77 | 37.58 |
| 0.0234 | 1.7932 | 3000 | 0.0201 | 26.77 | 26.77 | 37.5 |
| 0.0175 | 1.8530 | 3100 | 0.0188 | 26.77 | 26.77 | 37.5 |
| 0.0154 | 1.9127 | 3200 | 0.0154 | 26.77 | 26.77 | 37.58 |
| 0.0166 | 1.9725 | 3300 | 0.0141 | 26.77 | 26.77 | 37.58 |
| 0.0123 | 2.0323 | 3400 | 0.0135 | 26.77 | 26.77 | 37.5 |
| 0.0112 | 2.0921 | 3500 | 0.0130 | 26.77 | 26.77 | 37.58 |
| 0.0119 | 2.1518 | 3600 | 0.0119 | 26.77 | 26.77 | 37.58 |
| 0.0079 | 2.2116 | 3700 | 0.0118 | 26.77 | 26.77 | 37.58 |
| 0.0105 | 2.2714 | 3800 | 0.0113 | 26.77 | 26.77 | 37.58 |
| 0.0072 | 2.3311 | 3900 | 0.0087 | 26.77 | 26.77 | 37.67 |
| 0.0067 | 2.3909 | 4000 | 0.0080 | 26.77 | 26.77 | 37.58 |
| 0.0081 | 2.4507 | 4100 | 0.0090 | 26.77 | 26.77 | 37.58 |
| 0.0058 | 2.5105 | 4200 | 0.0088 | 26.77 | 26.77 | 37.58 |
| 0.008 | 2.5702 | 4300 | 0.0075 | 26.77 | 26.77 | 37.58 |
| 0.006 | 2.6300 | 4400 | 0.0068 | 26.77 | 26.77 | 37.65 |
| 0.0063 | 2.6898 | 4500 | 0.0076 | 26.77 | 26.77 | 37.58 |
| 0.0066 | 2.7496 | 4600 | 0.0074 | 26.77 | 26.77 | 37.5 |
| 0.0059 | 2.8093 | 4700 | 0.0066 | 26.77 | 26.77 | 37.58 |
| 0.0044 | 2.8691 | 4800 | 0.0050 | 26.77 | 26.77 | 37.5 |
| 0.0061 | 2.9289 | 4900 | 0.0070 | 26.77 | 26.77 | 37.58 |
| 0.0043 | 2.9886 | 5000 | 0.0057 | 26.77 | 26.77 | 37.58 |
| 0.0027 | 3.0484 | 5100 | 0.0050 | 26.77 | 26.77 | 37.5 |
| 0.003 | 3.1082 | 5200 | 0.0048 | 26.77 | 26.77 | 37.58 |
| 0.0056 | 3.1680 | 5300 | 0.0039 | 26.77 | 26.77 | 37.58 |
| 0.0026 | 3.2277 | 5400 | 0.0044 | 26.77 | 26.77 | 37.48 |
| 0.0022 | 3.2875 | 5500 | 0.0038 | 26.77 | 26.77 | 37.58 |
| 0.0064 | 3.3473 | 5600 | 0.0033 | 26.77 | 26.77 | 37.58 |
| 0.004 | 3.4071 | 5700 | 0.0036 | 26.77 | 26.77 | 37.58 |
| 0.0038 | 3.4668 | 5800 | 0.0029 | 26.77 | 26.77 | 37.58 |
| 0.0017 | 3.5266 | 5900 | 0.0038 | 26.77 | 26.77 | 37.58 |
| 0.0016 | 3.5864 | 6000 | 0.0028 | 26.77 | 26.77 | 37.58 |
| 0.0018 | 3.6461 | 6100 | 0.0027 | 26.77 | 26.77 | 37.58 |
| 0.0037 | 3.7059 | 6200 | 0.0041 | 26.77 | 26.77 | 37.58 |
| 0.0016 | 3.7657 | 6300 | 0.0029 | 26.77 | 26.77 | 37.58 |
| 0.0012 | 3.8255 | 6400 | 0.0022 | 26.77 | 26.77 | 37.58 |
| 0.0027 | 3.8852 | 6500 | 0.0023 | 26.77 | 26.77 | 37.58 |
| 0.0009 | 3.9450 | 6600 | 0.0021 | 26.77 | 26.77 | 37.48 |
| 0.001 | 4.0048 | 6700 | 0.0029 | 26.77 | 26.77 | 37.58 |
| 0.0012 | 4.0646 | 6800 | 0.0015 | 26.77 | 26.77 | 37.58 |
| 0.0019 | 4.1243 | 6900 | 0.0016 | 26.77 | 26.77 | 37.58 |
| 0.0009 | 4.1841 | 7000 | 0.0022 | 26.77 | 26.77 | 37.58 |
| 0.0008 | 4.2439 | 7100 | 0.0016 | 26.77 | 26.77 | 37.58 |
| 0.0026 | 4.3036 | 7200 | 0.0015 | 26.77 | 26.77 | 37.5 |
| 0.0018 | 4.3634 | 7300 | 0.0012 | 26.77 | 26.77 | 37.58 |
| 0.0009 | 4.4232 | 7400 | 0.0015 | 26.77 | 26.77 | 37.67 |
| 0.0004 | 4.4830 | 7500 | 0.0018 | 26.77 | 26.77 | 37.58 |
| 0.001 | 4.5427 | 7600 | 0.0012 | 26.77 | 26.77 | 37.58 |
| 0.0026 | 4.6025 | 7700 | 0.0011 | 26.77 | 26.77 | 37.58 |
| 0.0005 | 4.6623 | 7800 | 0.0014 | 26.77 | 26.77 | 37.58 |
| 0.0007 | 4.7221 | 7900 | 0.0010 | 26.77 | 26.77 | 37.58 |
| 0.0011 | 4.7818 | 8000 | 0.0011 | 26.77 | 26.77 | 37.67 |
| 0.0004 | 4.8416 | 8100 | 0.0007 | 26.77 | 26.77 | 37.58 |
| 0.0003 | 4.9014 | 8200 | 0.0007 | 26.77 | 26.77 | 37.58 |
| 0.0009 | 4.9611 | 8300 | 0.0009 | 26.77 | 26.77 | 37.58 |
| 0.0004 | 5.0209 | 8400 | 0.0010 | 26.77 | 26.77 | 37.58 |
| 0.0009 | 5.0807 | 8500 | 0.0009 | 26.77 | 26.77 | 37.58 |
| 0.0006 | 5.1405 | 8600 | 0.0006 | 26.77 | 26.77 | 37.58 |
| 0.0003 | 5.2002 | 8700 | 0.0005 | 26.77 | 26.77 | 37.58 |
| 0.0002 | 5.2600 | 8800 | 0.0005 | 26.77 | 26.77 | 37.58 |
| 0.0002 | 5.3198 | 8900 | 0.0004 | 26.77 | 26.77 | 37.58 |
| 0.0005 | 5.3796 | 9000 | 0.0006 | 26.77 | 26.77 | 37.67 |
| 0.0003 | 5.4393 | 9100 | 0.0005 | 26.77 | 26.77 | 37.58 |
| 0.0006 | 5.4991 | 9200 | 0.0015 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 5.5589 | 9300 | 0.0005 | 26.77 | 26.77 | 37.48 |
| 0.0002 | 5.6186 | 9400 | 0.0004 | 26.77 | 26.77 | 37.58 |
| 0.0005 | 5.6784 | 9500 | 0.0004 | 26.77 | 26.77 | 37.58 |
| 0.0006 | 5.7382 | 9600 | 0.0003 | 26.77 | 26.77 | 37.58 |
| 0.0002 | 5.7980 | 9700 | 0.0003 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 5.8577 | 9800 | 0.0003 | 26.77 | 26.77 | 37.58 |
| 0.0004 | 5.9175 | 9900 | 0.0003 | 26.77 | 26.77 | 37.58 |
| 0.0003 | 5.9773 | 10000 | 0.0003 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.0371 | 10100 | 0.0003 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.0968 | 10200 | 0.0003 | 26.77 | 26.77 | 37.58 |
| 0.0004 | 6.1566 | 10300 | 0.0003 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.2164 | 10400 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.2762 | 10500 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.3359 | 10600 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.3957 | 10700 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0003 | 6.4555 | 10800 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.5152 | 10900 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.5750 | 11000 | 0.0002 | 26.77 | 26.77 | 37.5 |
| 0.0001 | 6.6348 | 11100 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.6946 | 11200 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.7543 | 11300 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.8141 | 11400 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.8739 | 11500 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.9337 | 11600 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 6.9934 | 11700 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.0532 | 11800 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.1130 | 11900 | 0.0002 | 26.77 | 26.77 | 37.48 |
| 0.0 | 7.1727 | 12000 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.2325 | 12100 | 0.0002 | 26.77 | 26.77 | 37.48 |
| 0.0001 | 7.2923 | 12200 | 0.0002 | 26.77 | 26.77 | 37.5 |
| 0.0001 | 7.3521 | 12300 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.4118 | 12400 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.4716 | 12500 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.5314 | 12600 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.5912 | 12700 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.6509 | 12800 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0 | 7.7107 | 12900 | 0.0002 | 26.77 | 26.77 | 37.5 |
| 0.0001 | 7.7705 | 13000 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.8302 | 13100 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.8900 | 13200 | 0.0002 | 26.77 | 26.77 | 37.58 |
| 0.0001 | 7.9498 | 13300 | 0.0002 | 26.77 | 26.77 | 37.58 |
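The checkpoints above are LoRA adapter weights; for standalone deployment, the adapter can optionally be folded into the base model with PEFT's `merge_and_unload`. A minimal sketch (the output directory is illustrative):

```python
import torch
from peft import AutoPeftModelForCausalLM

# Load base model + adapter, then merge the LoRA deltas into the base weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    "woodman231/SmolLM3-3B-monster-sanctuary-lora",
    torch_dtype=torch.bfloat16,  # matches the bf16 training setting
)
merged = model.merge_and_unload()
merged.save_pretrained("smollm3-monster-sanctuary-merged")  # illustrative path
```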
### Framework versions
- PEFT 0.18.0
- Transformers 4.57.1
- PyTorch 2.8.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1