castellina committed
Commit 124e842 · verified · 1 Parent(s): ab0ecb2

Update README.md

Files changed (1)
  1. README.md +25 -4
README.md CHANGED
@@ -11,6 +11,7 @@ base_model:
 
  `polyglot-ko-1b-txt2sql` is a text-generation model fine-tuned to convert Korean natural-language questions into SQL queries.
  It is based on [`EleutherAI/polyglot-ko-1.3b`](https://huggingface.co/EleutherAI/polyglot-ko-1.3b) and was fine-tuned with LoRA as a lightweight fine-tuning method.
+ This is the first model the author built, as a hands-on exercise while trying fine-tuning for the first time, so its performance is not guaranteed.
 
  ---
 
@@ -23,13 +24,33 @@ base_model:
 
  ---
 
- ## Training data
+ ## Training dataset
 
  The model was fine-tuned on natural-language question/query pairs designed for the Korean text-to-SQL task.
- The data was built from the following two sources:
-
  - Part of the [shangrilar/ko_text2sql](https://huggingface.co/datasets/shangrilar/ko_text2sql) dataset
- - Synthetic Korean SQL pairs generated through inference with an OpenAI-based LLM (GPT)
+
+ - Preprocessing: prompts are built in a DDL-Question-SQL structure
+ - Size: about 25,000 examples, each consisting of DDL + natural-language question + reference SQL
+
+ ---
+
+ ## Evaluation results
+ - Evaluation method: GPT-4.1-nano was asked to compare gen_sql with gt_sql and judge the result
+ - Evaluation criterion: a yes/no judgment on whether the results are identical (JSON response: {"resolve_yn": "yes"})
+ - Results:
+ - **Base model accuracy**: 68%
+ - **Fine-tuned model accuracy**: 19%
+
+ ---
+
+ ## Problems
+ - The baseline model often failed to produce a SQL query in gen_sql, instead repeating the question or emitting meaningless text.
+ - The fine-tuned model imitated the form of SQL, but generated logically incorrect queries, for example ones containing nonexistent column or table names.
+
+ - The evaluation model (GPT-4.1-nano) often wrongly judged incorrect queries from the baseline model as "resolve_yn": "yes".
+ - For example, there were cases where gen_sql was evaluated as resolve_yn = yes even though it did not follow SQL syntax at all.
+ - There were also cases where a query was classified as resolve_yn = yes even though its column or table names did not exist or the query was otherwise wrong.
+ - The evaluator (the GPT model) failed to properly judge syntactic validity or whether the table structure was reflected, and tended to decide on the basis of simple textual similarity.
 
  ---
 
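
The problems listed in the diff suggest that some judge errors could be caught before the LLM is consulted at all. The following is a minimal sketch, not the author's evaluation code. It assumes the DDL and queries are compatible with SQLite's dialect (the dataset's actual dialect may differ), and the names `compiles_against_schema`, `gate_verdict`, and `accuracy` are hypothetical. It builds empty tables from the DDL in an in-memory database and uses `EXPLAIN` to compile `gen_sql` without running it, so output that is not SQL, or that references missing tables or columns, is scored "no" regardless of the judge's verdict.

```python
import sqlite3


def compiles_against_schema(ddl: str, sql: str) -> bool:
    """Return True if SQLite can compile `sql` against the tables in `ddl`.

    Assumes SQLite-compatible DDL. Invalid DDL raises sqlite3.Error so the
    caller can tell "check not applicable" apart from "query rejected".
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)  # create the (empty) tables
        try:
            # EXPLAIN compiles the statement but does not execute it.
            conn.execute("EXPLAIN " + sql)
        except (sqlite3.Error, sqlite3.Warning):
            # Syntax errors, unknown tables/columns, multiple statements.
            return False
        return True
    finally:
        conn.close()


def gate_verdict(ddl: str, gen_sql: str, judge_verdict: str) -> str:
    """Override the judge with "no" when gen_sql cannot compile."""
    if not compiles_against_schema(ddl, gen_sql):
        return "no"
    return judge_verdict


def accuracy(verdicts: list[str]) -> float:
    """Share of "yes" verdicts, e.g. values taken from resolve_yn."""
    return sum(v == "yes" for v in verdicts) / len(verdicts)


# Hypothetical schema and model outputs, for illustration only.
ddl = "CREATE TABLE users (id INTEGER, name TEXT);"
outputs = [
    "SELECT name FROM users WHERE id = 1",  # compiles
    "SELECT age FROM users",                # unknown column
    "Which users live in Seoul?",           # not SQL at all
]
# Pretend the judge answered "yes" for every output.
gated = [gate_verdict(ddl, sql, "yes") for sql in outputs]
print(gated, accuracy(gated))
```

A gate like this only rejects queries that cannot compile. A query that compiles can still return the wrong result, so the judge, or execution against real data, is still needed for the remaining cases.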