Moonlit-SummaryStories-45M

Moonlit-SummaryStories-45M is a 45M-parameter TinyStories model specialized for summary-to-story generation. It starts from the pretrained razor5050/TinyStories-45M checkpoint and then receives supervised fine-tuning (SFT) so that it turns a short summary prompt into a complete TinyStories-style story.

What this model does

Input format:

Summary: A little fox is afraid of the dark until a glowing jar helps him find his way home.
Story:

The model continues the text after "Story:" with a complete short story.
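Below is a minimal sketch of how to build this prompt and pull the story back out of a decoded generation. The helper names `build_prompt` and `extract_story` are hypothetical. The exact layout, with a newline between the summary line and `Story:`, is assumed from the example above and is not taken from the training code.

```python
def build_prompt(summary: str) -> str:
    """Wrap a summary in the Summary/Story layout (assumed from the example above)."""
    return f"Summary: {summary.strip()}\nStory:"


def extract_story(decoded: str) -> str:
    """Return the text after the first "Story:" marker, or the whole text if absent."""
    _, sep, story = decoded.partition("Story:")
    return story.strip() if sep else decoded.strip()


prompt = build_prompt(
    "A little fox is afraid of the dark until a glowing jar helps him find his way home."
)
print(prompt)
```

Splitting on the first "Story:" marker is deliberately forgiving. A SentencePiece round trip may not reproduce the prompt's whitespace byte for byte, so stripping the prompt by length could cut into the story.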

Model details

  • Architecture: LLaMA-style decoder-only transformer
  • Parameters: 45.46M
  • Hidden size: 512
  • Layers: 13
  • Attention heads: 8
  • KV heads (GQA): 4
  • Intermediate size: 1344
  • Vocabulary size: 16384
  • Context length: 512
  • Tokenizer: SentencePiece unigram
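As a sanity check, the 45.46M figure can be reproduced from the configuration above. This sketch assumes a standard LLaMA layout with no bias terms, weight-only RMSNorm, and a head dimension of 512 / 8 = 64. It also assumes tied input/output embeddings. The card does not state that tying is used; it is inferred here only because the untied count (about 53.85M) would not match.

```python
def llama_param_count(vocab=16384, hidden=512, layers=13, heads=8,
                      kv_heads=4, intermediate=1344, tied=True):
    """Count parameters of a bias-free LLaMA-style decoder (assumed layout)."""
    head_dim = hidden // heads                          # 64
    kv_dim = kv_heads * head_dim                        # 256: GQA shrinks k/v projections
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim    # q and o full size, k and v reduced
    mlp = 3 * hidden * intermediate                     # gate, up and down projections
    norms = 2 * hidden                                  # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = vocab * hidden                              # token embedding matrix
    lm_head = 0 if tied else vocab * hidden             # shared with embed when tied
    final_norm = hidden
    return layers * per_layer + embed + lm_head + final_norm


print(llama_param_count())            # 45463040
print(llama_param_count(tied=False))  # 53851648
```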

Training recipe

Pretraining base

  • Base model: razor5050/TinyStories-45M
  • Original pretraining dataset: roneneldan/TinyStories
  • Original pretraining epochs: 3

Fine-tuning task

  • Dataset source: roneneldan/TinyStoriesInstruct
  • Fine-tuning format: Summary: ... Story: → full story
  • Loss masking: prompt tokens masked; loss computed only on story tokens
  • No-truncation policy: only samples whose prompt plus story fit within 512 tokens were kept
  • Usable SFT examples: 1,702,072
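The masking and filtering rules above can be sketched for a single example as follows. The sketch assumes the prompt and story are tokenized separately, and the function name is hypothetical. `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, which Hugging Face causal LMs use to skip masked label positions. The card does not say whether an end-of-sequence token was appended to the story, so none is added here.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
MAX_LEN = 512        # model context length


def build_sft_example(prompt_ids, story_ids, max_len=MAX_LEN):
    """Return (input_ids, labels), or None if the pair would need truncation."""
    input_ids = list(prompt_ids) + list(story_ids)
    if len(input_ids) > max_len:
        return None  # no-truncation rule: drop the whole sample
    # Prompt positions get IGNORE_INDEX so only story tokens contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(story_ids)
    return input_ids, labels
```

Labels stay aligned with the inputs here. Hugging Face causal LM heads shift labels by one position internally when computing the loss.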

Exact usable dataset size under the 512-token no-truncation rule

  • Train: 1,685,116
  • Validation: 16,956
  • Total: 1,702,072
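The split counts are internally consistent, and the validation share works out to roughly 1%. The card does not say how the split was drawn.

```python
train, val = 1_685_116, 16_956
total = train + val
print(total)                 # 1702072
print(f"{val / total:.2%}")  # 1.00%
```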

Fine-tuning hyperparameters

  • Epochs: 1
  • Effective batch size: 64
  • Micro-batch size: 8
  • Learning rate: 8e-5
  • Scheduler: cosine decay
  • Precision: FP16
  • Max sequence length: 512
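With a micro-batch of 8 and an effective batch of 64 on the single GPU listed under Hardware, each optimizer update would accumulate 8 micro-batches. The sketch below estimates the number of updates for one epoch and evaluates a cosine schedule. It assumes no warmup, a final learning rate of 0, and that the last partial batch is kept. None of these three choices is specified in this card.

```python
import math

TRAIN_EXAMPLES = 1_685_116
EFFECTIVE_BATCH = 64
MICRO_BATCH = 8
PEAK_LR = 8e-5

accum_steps = EFFECTIVE_BATCH // MICRO_BATCH               # micro-batches per update (1 GPU)
total_steps = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)  # 1 epoch, last partial batch kept


def cosine_lr(step, total=total_steps, peak=PEAK_LR, min_lr=0.0):
    """Cosine decay from peak to min_lr over `total` steps, no warmup (assumed)."""
    progress = min(step, total) / total
    return min_lr + 0.5 * (peak - min_lr) * (1.0 + math.cos(math.pi * progress))


print(accum_steps, total_steps)  # 8 26330
print(cosine_lr(0), cosine_lr(total_steps // 2), cosine_lr(total_steps))
```

With zero warmup steps, this has the same shape as `transformers.get_cosine_schedule_with_warmup` at its default `num_cycles=0.5`. That library schedule may or may not be what was actually used.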

Evaluation

  • Validation loss: 1.2387
  • Perplexity: 3.4511 (exponential of the validation loss)
  • Example generations: see evaluation/40_prompts.json
  • Evaluation report: see evaluation/report.md
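The reported perplexity is the exponential of the mean validation cross-entropy loss, which can be checked directly:

```python
import math

val_loss = 1.238696612096066
perplexity = math.exp(val_loss)
print(round(perplexity, 4))  # 3.4511
```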

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "razor5050/Moonlit-SummaryStories-45M"
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# The prompt follows the training format: a "Summary:" line, then "Story:".
prompt = "Summary: A shy rabbit learns to sing with the help of fireflies.\nStory:"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a story; prompt plus new tokens must fit in the 512-token context.
outputs = model.generate(
    **inputs,
    max_new_tokens=220,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    top_k=50,
)

# outputs[0] holds the prompt tokens followed by the generated story.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Files in this repo

  • Root: final fine-tuned model
  • checkpoints/base_pretrain_final/: pretrained base checkpoint used for fine-tuning
  • checkpoints/sft/: intermediate SFT checkpoints and the final SFT export
  • evaluation/: metrics, prompt generations, and report

Hardware

  • Training GPU: NVIDIA RTX 3060 12GB
  • Intended deployment class: small creative story model

Notes

This model is optimized for TinyStories-style English story generation from a short summary prompt. Because the 512-token context window is shared between the prompt and the generated text, every prompt token reduces the number of tokens left for the story.
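A simple way to respect this limit is to cap `max_new_tokens` by whatever room the prompt leaves. `generation_budget` is a hypothetical helper, and its default of 220 new tokens mirrors the Usage example.

```python
CONTEXT_LEN = 512  # total tokens shared by prompt and output


def generation_budget(prompt_len: int, requested: int = 220, context: int = CONTEXT_LEN) -> int:
    """Largest max_new_tokens that keeps prompt + output within the context window."""
    return max(0, min(requested, context - prompt_len))


print(generation_budget(30))   # 220: short prompt, full request fits
print(generation_budget(400))  # 112: long prompt leaves less room
```

With Transformers, `prompt_len` would typically come from `inputs["input_ids"].shape[1]` after tokenizing the prompt.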


Generated on 2026-05-23 11:04:08
