🧠 unsloth_training_checkpoints (Qwen3-4B fine-tuned on Brainrot)
This is a fine-tuned version of unsloth/Qwen3-4B-unsloth-bnb-4bit, trained using TRL (Transformer Reinforcement Learning) and Unsloth AI.
The model was fine-tuned on the Brainrot-xK-large dataset — a chaotic, Gen Z internet-flavored dataset full of memes, slang, sass, and semi-ironic wisdom.
It responds in a style that blends internet humor, casual language, and a touch of TikTok energy. Great for applications that need a little more vibe than traditional models.
🗣️ Example Outputs
🔹 Prompt: “summarize the story of one piece”
Response:

```
<think>
</think>
omg, u literally need to read the whole thing, but here's a summary: the story is about a pirate named luffy, who wants to become the world's greatest pirate. he is a big fan of the "grand line" and is determined to find the legendary treasure. along the way, he meets a bunch of cool crew members and they have to go through a buncha challenges, but the main goal is to get the big treasure and become the "king of the pirates"!<|im_end|>
```
🔹 Prompt: “best programming language”
Response:

```
<think>
</think>
lol ik u r a noob, but r u tryin 2 get better at coding?<|im_end|>
```
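The raw outputs above end with a literal `<|im_end|>` token because Qwen models use a ChatML-style turn format, and the empty `<think>` block is the model's (skipped) reasoning section. As a rough illustration of that format (the helper below is illustrative, not part of this repo; in practice you would call `tokenizer.apply_chat_template()` from transformers):

```python
# Illustrative sketch of the ChatML-style prompt format Qwen models use.
# This helper is hypothetical; transformers' apply_chat_template() does
# this for you from the tokenizer's bundled template.

def format_chatml(messages):
    """Render a list of {role, content} dicts into ChatML text."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Cue the assistant to produce the next turn.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "user", "content": "summarize the story of one piece"},
])
print(prompt)
```

The model then generates until it emits its own `<|im_end|>`, which is why that token shows up when the examples are printed without decoding `skip_special_tokens=True`.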
🧠 Training Details
The model was fine-tuned in two phases of supervised fine-tuning (SFT) using Unsloth's optimized training approach:
Phase 1
- Epochs: 45
- Batch size: 2 (with gradient accumulation)
- Learning rate: 2e-4
- Optimizer: adamw_8bit
- Scheduler: Linear
- Seed: 3407
- Precision: Full (fp16/bf16 disabled)
Phase 2
- Epochs: 6 (continuation with same settings)
No evaluation dataset was used during fine-tuning. The dataset field used was "text".
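The setup above can be sketched with Unsloth and TRL's `SFTTrainer` roughly as follows. This is a minimal sketch under stated assumptions, not the exact training script: the dataset path, LoRA rank, and gradient-accumulation steps are illustrative placeholders, while the hyperparameters mirror the Phase 1 values listed above.

```python
# Sketch of the Phase 1 run, assuming the standard Unsloth + TRL APIs.
# Dataset path, LoRA rank, and accumulation steps are illustrative.
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-4B-unsloth-bnb-4bit",
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16)  # rank is illustrative

dataset = load_dataset("json", data_files="brainrot.jsonl", split="train")  # placeholder

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",      # the "text" field noted above
        num_train_epochs=45,            # Phase 1; Phase 2 continues for 6 more
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # illustrative; accumulation was used
        learning_rate=2e-4,
        optim="adamw_8bit",
        lr_scheduler_type="linear",
        seed=3407,
        fp16=False,
        bf16=False,                     # full precision, as noted above
    ),
)
trainer.train()
```

Phase 2 would rerun `trainer.train()` from the saved checkpoint with `num_train_epochs=6` and otherwise identical settings.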
Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```