---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B
language:
- ko
- en
tags:
- reasoning
- LLMs
- Korean
---


# KULLM-R

Introducing KULLM-R, a large language model specialized for high-level reasoning queries in Korean, with a particular focus on complex mathematical problems. The model is designed to produce both correct reasoning paths and correct answers for such queries, offering better reasoning efficiency and stronger transfer to Korean than general-purpose reasoning models. A reinforcement learning strategy is employed for efficient reasoning-path exploration and Korean-specific generation.


## Model Details

- **Model Name**: KULLM-R
- **Developer**: Seungyoon Lee, Minhyuk Kim, Dongjun Kim, Gyuho Shim and Chanjun Park, supported by [NLP&AI Lab in Korea University](https://nlp.korea.ac.kr/)
- **Languages**: Korean, English
- **Objective**: Producing efficient and interpretable reasoning paths and answers for high-level Korean reasoning queries
- **Training Framework**: verl, PyTorch, Transformers
- **Parameter Size**: 8B


### Model Description

KULLM-R is built on Qwen3-8B and is distinguished from standard reasoning LLMs by its reinforcement learning-based reasoning-path exploration and its strong proficiency in Korean. It is trained to generate efficient reasoning paths for both English and Korean problems and produces well-structured, readable answers in Korean, delivering strong interpretability and a good user experience for Korean speakers.

### Key Features

- **Reasoning-Efficiency-Aware Reinforcement Learning**: Introduces RL techniques that consider both reasoning-path efficiency and answer correctness, reducing unnecessary steps while maintaining answer quality.
- **Reasoning Path Pruning**: Specializes in high-difficulty reasoning problems by pruning ineffective paths, emphasizing transparency and readability in the generated answers.
- **High Readability in Korean**: Enhances both logical reasoning and natural Korean expression in answers.
- **Adaptive Length Penalty**: Adaptive penalties tune the reasoning process to each question's complexity and difficulty, ensuring efficient solutions across math problems of varying difficulty (sketched below).
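The exact reward formulation is not published in this card; the following is a minimal sketch of what a difficulty-scaled length penalty could look like, where `base_budget`, `max_budget`, and the weight `lam` are illustrative assumptions rather than the model's actual hyperparameters:

```python
# Hypothetical sketch of an adaptive length penalty, not the released
# training code: correct answers earn reward 1, and generations that
# overshoot a difficulty-dependent token budget are penalized.

def adaptive_length_reward(correct: bool, num_tokens: int, difficulty: float,
                           base_budget: int = 1024, max_budget: int = 8192,
                           lam: float = 0.5) -> float:
    """`difficulty` is assumed to lie in [0, 1]; harder questions get a larger budget."""
    budget = base_budget + difficulty * (max_budget - base_budget)
    overshoot = max(0.0, (num_tokens - budget) / budget)  # relative excess length
    return (1.0 if correct else 0.0) - lam * overshoot
```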


## Data & Training Process

- **Data Sources**: ko-limo (only 817 rows)
- **Training Strategy**: Uses a difficulty-aware adaptive reward system, implementing reinforcement learning with a dynamic length penalty for optimal performance.
- **Iteration**: The model is trained repeatedly on high-difficulty examples to optimize reasoning-path generation; a sketch of one way such difficulty could be estimated follows this list.
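How difficulty is measured is likewise not specified here; one common choice in rollout-based RL pipelines is to estimate it from the empirical pass rate of sampled solutions, as in this sketch (the `sample_answers`/`check_answer` interfaces and `n_rollouts` are hypothetical):

```python
# Hypothetical difficulty estimate, not the released training code:
# sample several rollouts per question and treat the failure rate as
# difficulty, which could then drive the length budget sketched above.
from typing import Callable, List

def estimate_difficulty(question: str,
                        sample_answers: Callable[[str, int], List[str]],
                        check_answer: Callable[[str], bool],
                        n_rollouts: int = 8) -> float:
    """Returns difficulty in [0, 1]: 0 = always solved, 1 = never solved."""
    answers = sample_answers(question, n_rollouts)
    pass_rate = sum(check_answer(a) for a in answers) / n_rollouts
    return 1.0 - pass_rate
```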


## Quickstart

The code for Qwen3 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.

With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3'
```
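To confirm your environment meets this requirement before loading the model, you can check the installed version (`packaging` ships as a dependency of `transformers`):

```python
import transformers
from packaging import version

# Qwen3 support requires transformers >= 4.51.0
assert version.parse(transformers.__version__) >= version.parse("4.51.0"), (
    f"transformers {transformers.__version__} is too old; run `pip install -U transformers`"
)
```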

The following code snippet illustrates how to use the model to generate content from given inputs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nobrand/KULLM-R"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
system_prompt = "You are a helpful assistant.\nPlease reason step by step, and put your final answer within \\boxed{}"  # the recommended system prompt
# The question asks: "Find the number of natural numbers from 1 to 1008 that are coprime to 1008."
# (the expected answer is Euler's totient phi(1008) = 288)
user_prompt = "1부터 1008까지의 자연수 중 1008과 서로소인 자연수의 개수를 구하시오."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parse the thinking content
try:
    # find the last occurrence of token 151668 (</think>) via reversed index
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)

```

> [!NOTE]
> As recommended for Qwen3, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default settings in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions.
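If your serving stack does not read `generation_config.json`, the same settings can be passed to `generate` explicitly (reusing `model_inputs` from the snippet above):

```python
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,   # sampling, not greedy decoding
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    min_p=0.0,
)
```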


## Evaluation

- When evaluated on HRM-8K, KULLM-R shows superior reasoning efficiency, shorter reasoning traces, higher readability in Korean, and better explanation quality than models of similar scale.

| Task      | Score | Think Step Length | Korean Response Ratio (%) |
|-----------|:-----:|:-----------------:|:-------------------------:|
| GSM8K     | 91.9  |  896              | 94.47                     |
| KSM       | 70.9  | 7979              | 80.6                      |
| MATH      | 95.1  | 2668              | 96.12                     |
| OMNI Math | 61.9  | 7987              | 73.91                     |

<img src="KULLM_R_result.png" width="1000"/>

## Intended Use

- Solving complex Korean mathematical and logical reasoning problems
- Improved explainability for Korean logical reasoning
- Tutoring and educational support in reasoning fields


## Citation
```
@misc{KULLM-R2025,
  title   = {KULLM-R: Korea University Large Language Model for Reasoning},
  author  = {Korea University NLP\&AI Lab},
  year    = {2025},
}
```