ArpitSinghGautam committed on
Commit 60505a0 · verified · 1 Parent(s): e6d552f

Update README.md

Files changed (1)
  1. README.md +72 -10
README.md CHANGED
@@ -1,10 +1,72 @@
- ---
- title: README
- emoji: 👀
- colorFrom: indigo
- colorTo: blue
- sdk: static
- pinned: false
- ---
-
- Edit this `README.md` markdown file to author your organization card.
+ # CogniSQL: Lightweight Reinforced Reasoning for Efficient SQL Generation
+
+ ## Overview
+
+ Welcome to CogniSQL! This organization hosts research datasets and resources for advancing Text-to-SQL generation through reinforcement learning. Our work focuses on building efficient, execution-aligned SQL generation systems that stay accurate on complex database queries.
+
+ ## Research Focus
+
+ CogniSQL develops approaches for translating natural language into SQL (Text-to-SQL), built on:
+
+ - **Reinforcement Learning (RL) Frameworks**: Lightweight reward signals based on execution correctness and format-tag compliance
+ - **Efficient Training**: State-of-the-art performance with a 7B-parameter backbone, compared against models of up to 236B parameters
+ - **Execution-Aligned Generation**: Direct optimization for producing correct, executable SQL without intermediate supervision
+ - **Interpretable Reasoning**: Multi-path reasoning traces for better understanding of model behavior
+
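The reward named in the Research Focus list (execution correctness plus format-tag compliance) could look roughly like the sketch below. It is an illustration only, not the CogniSQL implementation: the `<reasoning>`/`<answer>` tag names, the 0.25/0.75 weights, the SQLite backend, and the helper names `run_query` and `reward` are all assumptions made for this example.

```python
import re
import sqlite3

# Hypothetical tag layout: the README only says "format-tag compliance",
# so the <reasoning>/<answer> names are assumptions for this sketch.
TAG_PATTERN = re.compile(
    r"\s*<reasoning>.*?</reasoning>\s*<answer>(.*?)</answer>\s*",
    re.DOTALL,
)


def run_query(conn, sql):
    """Return the result rows in a canonical order, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return None
    # Sort by repr so that row order does not affect the comparison.
    return sorted(rows, key=repr)


def reward(completion, gold_sql, conn, format_weight=0.25, exec_weight=0.75):
    """Score a completion: partial credit for format, the rest for execution match.

    The weights are illustrative values, not taken from the paper.
    """
    match = TAG_PATTERN.fullmatch(completion)
    if match is None:
        return 0.0  # malformed output earns nothing

    predicted = run_query(conn, match.group(1).strip())
    expected = run_query(conn, gold_sql)

    score = format_weight
    if predicted is not None and predicted == expected:
        score += exec_weight
    return score
```

In this sketch a completion is compared with the gold query by the rows each returns, not by SQL text, so two differently written but equivalent queries earn the same reward. A real training setup would also need to run model-generated SQL against a read-only or throwaway database.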
+ ## Key Achievements
+
+ - **State-of-the-Art Results**: Outperforms SFT CodeS-7B, DeepSeek-Coder 236B, and Mistral 123B on the BIRD benchmark
+ - **Efficient Training**: Trained on just 4 NVIDIA A100 GPUs (40GB VRAM each)
+ - **Resource-Constrained Deployment**: Enables practical Text-to-SQL systems for real-world applications
+ - **Open Research**: Two curated datasets released for community research
+
+ ## Datasets
+
+ This organization maintains two high-quality datasets:
+
+ 1. **Reasoning_Traces**: 5,024 reasoning traces with varying context lengths for interpretable SQL generation
+ 2. **Positive_Sample_Corpus**: 36,356 weakly supervised queries, each annotated with six semantically diverse reasoning paths
+
+ Both datasets are designed to support research in efficient and interpretable Text-to-SQL modeling.
+
+ ## Citation
+
+ If you use our datasets or research, please cite the following paper:
+
+ ```bibtex
+ @article{gajjar2025cognisql,
+ title={CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation},
+ author={Gajjar, Kushal and Sikchi, Harshit and Gautam, Arpit Singh and Hammons, Marc and Jha, Saurabh},
+ journal={arXiv preprint arXiv:2507.06013},
+ year={2025},
+ url={https://arxiv.org/abs/2507.06013}
+ }
+ ```
+
+ **arXiv**: [2507.06013](https://arxiv.org/abs/2507.06013)
+
+ ## Research Team
+
+ - **Kushal Gajjar**
+ - **Harshit Sikchi**
+ - **Arpit Singh Gautam**
+ - **Marc Hammons**
+ - **Saurabh Jha**
+
+ ## Applications
+
+ Our work enables:
+ - Database query systems that understand natural language
+ - Efficient SQL generation in resource-constrained environments
+ - Interpretable AI systems with transparent reasoning traces
+ - Production-grade Text-to-SQL pipelines
+
+ ## License
+
+ Please refer to individual dataset cards for specific licensing information.
+
+ ## Related Links
+
+ - [Paper on arXiv](https://arxiv.org/abs/2507.06013)
+ - [Reasoning Traces Dataset](https://huggingface.co/datasets/CogniSQL/Reasoning_Traces)
+ - [Positive Sample Corpus Dataset](https://huggingface.co/datasets/CogniSQL/Positive_Sample_Corpus)
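
The two dataset links above point at Hugging Face dataset repositories, so they can most likely be pulled with the `datasets` library. The following is a minimal sketch under that assumption: the repository IDs are taken from the URLs in the README, while the helper names `repo_id` and `load` are hypothetical. The README does not state split names, configurations, or column names, so the sketch does not guess them; inspect the returned object or the dataset cards for those.

```python
ORG = "CogniSQL"
DATASET_NAMES = ("Reasoning_Traces", "Positive_Sample_Corpus")


def repo_id(name):
    """Build the Hub repository ID for one of the two datasets named in the README."""
    if name not in DATASET_NAMES:
        raise ValueError(f"unknown dataset {name!r}; expected one of {DATASET_NAMES}")
    return f"{ORG}/{name}"


def load(name):
    """Download a dataset from the Hub (needs `pip install datasets` and network access)."""
    # Imported lazily so repo_id() stays usable without the library installed.
    from datasets import load_dataset

    # No split or config is passed because the README does not name any.
    return load_dataset(repo_id(name))
```

Calling `load("Reasoning_Traces")` and printing the result should list whatever splits and columns the repository actually provides, which is the safest way to discover the schema before writing training code against it.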