marcuscedricridia committed
Commit bf0a86e · verified · 1 Parent(s): d772d79

Update README.md

Files changed (1)
  1. README.md +17 -17
README.md CHANGED
@@ -27,20 +27,20 @@ This version remains in non‑thinking mode—built for consistent and bias‑aw
27
  > - Do Sample: True
28
  > - Max New Tokens: 4096
29
 
30
- | Category | Winner | Reason |
31
- |-----------------------|--------|----------------------------------------------------------------------|
32
- | CS (RAM vs. ROM) | KTO | KTO is clearer, more structured, and avoids inaccuracies like BASE’s claim about excessive RAM. |
33
- | ENGINEERING (Water Filtration) | KTO | KTO provides a practical, scientifically grounded system; BASE is confusing and impractical. |
34
- | MATH (Mean, Median, Mode) | KTO | KTO’s structured, concise explanation outperforms BASE’s wordy but accurate response. |
35
- | SCIENCE (Osmosis vs. Diffusion) | KTO | KTO is more detailed and accurate despite a minor error; BASE oversimplifies and has vague examples. |
36
- | WRITING (Lost Dog Story) | BASE | BASE focuses on the dog and partially meets the prompt; KTO is off-topic and incoherent. |
37
- | CODING (Vowel Counting) | BASE | BASE’s program is more robust (handles uppercase/lowercase) and includes test cases; KTO misses uppercase vowels. |
38
- | MATH SOLVING (Train Speed) | KTO | Both are accurate, but KTO is more concise, delivering the result with less verbosity. |
39
- | COMMON SENSE LOGIC (Ice Melting) | KTO | KTO accurately describes melting; BASE’s sublimation claim is incorrect. |
40
- | SOFT REASONING (Dog Barking) | BASE | BASE provides a clearer affirmation despite flaws; KTO overcomplicates and undermines the premise. |
41
- | RIDDLE (Keys and Locks) | Neither | Both fail to identify the correct answer (piano) and provide irrelevant explanations. |
42
- | GENERAL CHAT (Hobby) | BASE | BASE’s detailed, engaging piano description outperforms KTO’s brief, shallow list. |
43
- | REWRITING (Formal Sentence) | KTO | KTO’s rewrite is concise and equally formal; BASE is wordy with unnecessary alternatives. |
44
- | SUMMARIZATION (Tortoise and Hare) | KTO | KTO is accurate and concise; BASE has factual errors (e.g., ten-day race). |
45
- | INSTRUCTION FOLLOWING (Vegetable Soup) | KTO | KTO adheres closely to the prompt with clear, healthy steps; BASE misinterprets and lacks clarity. |
46
- | **Overall** | **KTO** | KTO wins 9 categories vs. BASE’s 4, showing greater accuracy, clarity, and adherence to prompts. |
 
+ | Category | Prompt | Winner | Reason |
+ |-----------------------|----------------------------------------------------------------------|--------|----------------------------------------------------------------------|
+ | CS (RAM vs. ROM) | | KTO | KTO is clearer, more structured, and avoids inaccuracies like BASE’s claim about excessive RAM. |
+ | ENGINEERING (Water Filtration) | | KTO | KTO provides a practical, scientifically grounded system; BASE is confusing and impractical. |
+ | MATH (Mean, Median, Mode) | | KTO | KTO’s structured, concise explanation outperforms BASE’s wordy but accurate response. |
+ | SCIENCE (Osmosis vs. Diffusion) | | KTO | KTO is more detailed and accurate despite a minor error; BASE oversimplifies and has vague examples. |
+ | WRITING (Lost Dog Story) | Write a short story about a lost dog finding its way home. | BASE | BASE focuses on the dog and partially meets the prompt; KTO is off-topic and incoherent. |
+ | CODING (Vowel Counting) | Create a simple program that counts the number of vowels in a sentence. | BASE | BASE’s program is more robust (handles uppercase/lowercase) and includes test cases; KTO misses uppercase vowels. |
+ | MATH SOLVING (Train Speed) | If a train travels 60 miles in 1.5 hours, what is its average speed? | KTO | Both are accurate, but KTO is more concise, delivering the result with less verbosity. |
+ | COMMON SENSE LOGIC (Ice Melting) | If you leave ice outside on a hot day, what happens to it? | KTO | KTO accurately describes melting; BASE’s sublimation claim is incorrect. |
+ | SOFT REASONING (Dog Barking) | If all dogs bark and Rex is a dog, does Rex bark? Why? | BASE | BASE provides a clearer affirmation despite flaws; KTO overcomplicates and undermines the premise. |
+ | RIDDLE (Keys and Locks) | What has keys but can’t open locks? | Neither | Both fail to identify the correct answer (piano) and provide irrelevant explanations. |
+ | GENERAL CHAT (Hobby) | Tell me about a hobby you enjoy. | BASE | BASE’s detailed, engaging piano description outperforms KTO’s brief, shallow list. |
+ | REWRITING (Formal Sentence) | Make this sentence more formal: “Can you fix the problem soon?” | KTO | KTO’s rewrite is concise and equally formal; BASE is wordy with unnecessary alternatives. |
+ | SUMMARIZATION (Tortoise and Hare) | Summarize the story of “The Tortoise and Hare” in two sentences. | KTO | KTO is accurate and concise; BASE has factual errors (e.g., ten-day race). |
+ | INSTRUCTION FOLLOWING (Vegetable Soup) | Explain how to prepare a simple vegetable soup that meets the following conditions: Use at least 3 different vegetables. The cooking time must not exceed 30 minutes. Include steps to make the soup both flavorful and healthy. Mention any kitchen tools needed. Provide alternatives if a vegetable is not available. Include tips to serve the soup nicely. | KTO | KTO adheres closely to the prompt with clear, healthy steps; BASE misinterprets and lacks clarity. |
+ | **Overall** | | **KTO** | KTO wins 9 categories vs. BASE’s 4, showing greater accuracy, clarity, and adherence to prompts. |