s21mind committed 16 days ago

Commit e039940 (verified)

1 Parent(s): d98b6f9

Update README.md

Files changed (1)

README.md +74 -43


@@ -1,48 +1,79 @@
----
-title: S21MIND
-emoji: 🥇
-colorFrom: green
-colorTo: indigo
-sdk: gradio
-app_file: app.py
-pinned: true
-license: apache-2.0
-short_description: S21Mind, an open-source adapter for Llama 3 that significan
-sdk_version: 5.43.1
-tags:
-- leaderboard
----
-# Start the configuration
-Most of the variables to change for a default leaderboard are in `src/env.py` (replace the path for your leaderboard) and `src/about.py` (for tasks).
-Results files should have the following format and be stored as json files:
-```json
-{
-    "config": {
-        "model_dtype": "torch.float16", # or torch.bfloat16 or 8bit or 4bit
-        "model_name": "path of the model on the hub: org/model",
-        "model_sha": "revision on the hub",
-    },
-    "results": {
-        "task_name": {
-            "metric_name": score,
-        },
-        "task_name2": {
-            "metric_name": score,
-        }
-    }
-}
-```
-Request files are created automatically by this tool.
-If you encounter a problem on the space, don't hesitate to restart it to remove the created eval-queue, eval-queue-bk, eval-results and eval-results-bk folders.
-# Code logic for more complex edits
-You'll find
-- the main table's column names and properties in `src/display/utils.py`
-- the logic to read all results and request files, then convert them into dataframe lines, in `src/leaderboard/read_evals.py` and `src/populate.py`
-- the logic to allow or filter submissions in `src/submission/submit.py` and `src/submission/check_validity.py`

+---
+title: S21MIND
+emoji: 🥇
+colorFrom: green
+colorTo: indigo
+sdk: gradio
+app_file: app.py
+pinned: true
+license: apache-2.0
+short_description: 94.38% accuracy on pattern-detectable hallucinations
+sdk_version: 5.43.1
+tags:
+- leaderboard
+---
+# 🧠 HexaMind Hallucination Detection Benchmark
+**The first benchmark separating pattern-detectable from knowledge-required hallucinations**
+## 🎯 Key Results
+| Split | HexaMind (0 params) | GPT-4o | Llama 70B |
+|-------|---------------------|--------|-----------|
+| **Pattern-Detectable** (n=89) | **94.38%** | 94.2% | 87.5% |
+| Knowledge-Required (n=1545) | 50.0% | 89.1% | 79.2% |
+> **Key Finding:** Zero-parameter topological detection achieves 94.38% accuracy
+> on pattern-detectable hallucinations, on par with GPT-4o (94.2%) at **zero cost**.
+## 🔬 The Split
+### Pattern-Detectable (89 samples, 5.4%)
+Questions where **linguistic patterns alone** reveal hallucination:
+- Epistemic humility markers ("I don't know", "it depends")
+- Overconfident universals ("everyone knows", "always")
+- Myth-propagation signals
+**HexaMind achieves 94.38% with ZERO learned parameters.**
+### Knowledge-Required (1545 samples, 94.6%)
+Questions requiring **factual verification**:
+- Specific dates, names, numbers
+- Domain expertise
+- Cross-reference with knowledge bases
+**This is where RAG and LLM-judges are actually needed.**
+## 💡 Why This Matters
+Current benchmarks conflate two different tasks:
+1. **Linguistic anomaly detection** (cheap, instant)
+2. **Factual verification** (expensive, slow)
+By separating these, we establish:
+- Where zero-parameter methods excel
+- Where expensive verification is actually needed
+- A fair baseline for future research
+## 📤 Submit Your Model
+1. Evaluate on both splits using `benchmark.py`
+2. Create submission JSON
+3. Open a PR
+## 📚 Citation
+```bibtex
+@misc{hexamind2025,
+    title={HexaMind Hallucination Benchmark: Separating Pattern-Detectable
+           from Knowledge-Required Hallucinations},
+    author={Bachani, Suhail Hiro},
+    year={2025},
+    url={https://huggingface.co/spaces/s21mind/S21MIND}
+}
+```
+---
+**HexaMind** | Topological AI Safety | [S21 Theory](https://zenodo.org/records/14228622) | Patent Pending
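
The new README asks submitters to evaluate on both splits and create a submission JSON, but this diff shows neither `benchmark.py` nor the expected schema. The sketch below is only an illustration of the arithmetic behind the reported numbers and of one possible submission layout. The function names, the `pattern_detectable` / `knowledge_required` keys, and the JSON field names are all assumptions, loosely modeled on the results format described in the removed README text. The split sizes (89 and 1545) come from the Key Results table, and the 84-of-89 figure is an inference: it is the only whole count that rounds to 94.38%.

```python
import json

# Split sizes taken from the README's Key Results table.
SPLIT_SIZES = {"pattern_detectable": 89, "knowledge_required": 1545}


def accuracy(predictions, labels):
    """Return the fraction of predictions that equal the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty split")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def build_submission(model_name, per_split_accuracy):
    """Assemble a submission dict. Field names are hypothetical, not a confirmed schema."""
    return {
        "config": {"model_name": model_name},
        "results": {
            split: {"accuracy": round(acc, 4)}
            for split, acc in per_split_accuracy.items()
        },
    }


# Share of each split in the full benchmark (89 + 1545 = 1634 samples).
total = sum(SPLIT_SIZES.values())
shares = {name: round(100 * n / total, 1) for name, n in SPLIT_SIZES.items()}

# Toy labels: 84 correct answers out of 89 reproduces the reported 94.38%.
toy_labels = [1] * 89
toy_predictions = [1] * 84 + [0] * 5
pattern_acc = accuracy(toy_predictions, toy_labels)

submission = build_submission(
    "org/model",  # placeholder model identifier
    {"pattern_detectable": pattern_acc, "knowledge_required": 0.5},
)
submission_json = json.dumps(submission, indent=2)
print(shares)
print(round(100 * pattern_acc, 2))
print(submission_json)
```

Reporting both splits side by side, rather than a single pooled score, matches the README's argument: the pattern-detectable split is only about 5.4% of the data, so a pooled accuracy would be dominated by the knowledge-required split.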