ArpitSinghGautam committed 10 days ago · Commit 60505a0 (verified) · Parent: e6d552f · "Update README.md" · 1 file changed: README.md (+72 -10)

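The commit replaces the placeholder Space card (YAML front matter `title: README`, `emoji: 👀`, `colorFrom: indigo`, `colorTo: blue`, `sdk: static`, `pinned: false`, followed by the template line "Edit this `README.md` markdown file to author your organization card.") with the README below.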

# CogniSQL: Lightweight Reinforced Reasoning for Efficient SQL Generation

## Overview

Welcome to CogniSQL! This organization hosts research datasets and resources for advancing Text-to-SQL generation through reinforcement learning. Our work focuses on efficient, execution-aligned SQL generation systems that stay accurate on complex database queries without relying on very large models.

## Research Focus

CogniSQL develops approaches for translating natural language into SQL (Text-to-SQL) built on:

- **Reinforcement Learning (RL) Framework**: Lightweight reward signals based on execution correctness and compliance with the required format tags
- **Model Efficiency**: State-of-the-art performance from a 7B-parameter backbone, compared with far larger models such as DeepSeek-Coder 236B
- **Execution-Aligned Generation**: Direct optimization for correct, executable SQL without intermediate supervision
- **Interpretable Reasoning**: Multi-path reasoning traces that make model behavior easier to understand
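
The two reward signals named in the first bullet above (execution correctness and format-tag compliance) can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the CogniSQL training code. Several details are assumptions made for the example: the `<think>`/`<answer>` tag names, the 0/1 reward values, the 0.1 format weight, the set-based comparison of result rows, and the SQLite backend.

```python
import re
import sqlite3
from typing import Optional

# Assumed output layout: free-form reasoning in <think>...</think>,
# followed by the final query in <answer>...</answer>.
FORMAT_RE = re.compile(
    r"^\s*<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def extract_sql(completion: str) -> Optional[str]:
    """Return the SQL inside <answer>, or None if the layout is wrong."""
    match = FORMAT_RE.match(completion)
    return match.group(1).strip() if match else None


def execution_reward(
    conn: sqlite3.Connection, predicted_sql: Optional[str], gold_sql: str
) -> float:
    """1.0 if the predicted query returns the same rows as the gold query.

    Rows are compared as sets, so row order and duplicates are ignored
    (an assumption; order-sensitive questions would need a stricter check).
    """
    if not predicted_sql:
        return 0.0
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return 0.0  # SQL that fails to run earns no reward
    gold_rows = conn.execute(gold_sql).fetchall()
    return 1.0 if set(predicted_rows) == set(gold_rows) else 0.0


def total_reward(
    conn: sqlite3.Connection,
    completion: str,
    gold_sql: str,
    format_weight: float = 0.1,  # hypothetical weighting
) -> float:
    """Execution correctness plus a small bonus for format-tag compliance."""
    sql = extract_sql(completion)
    return execution_reward(conn, sql, gold_sql) + format_weight * format_reward(completion)
```

In a real training loop, the predicted SQL should run against a read-only copy of the database with a timeout. Model-generated queries can be slow, and they can also modify data.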

## Key Achievements

- **State-of-the-Art Results**: Outperforms supervised fine-tuned (SFT) CodeS-7B, DeepSeek-Coder 236B, and Mistral 123B on the BIRD benchmark
- **Efficient Training**: Trained on just 4 NVIDIA A100 GPUs (40 GB VRAM each)
- **Resource-Constrained Deployment**: Enables practical Text-to-SQL systems where compute is limited
- **Open Research**: Two curated datasets released for community research

## Datasets

This organization maintains two curated datasets:

1. **Reasoning_Traces**: 5,024 reasoning traces with varying context lengths for interpretable SQL generation
2. **Positive_Sample_Corpus**: 36,356 weakly supervised queries, each annotated with six semantically diverse reasoning paths

Both datasets are designed to support research in efficient and interpretable Text-to-SQL modeling.
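
Both datasets are hosted on the Hugging Face Hub under the repository IDs in the Related Links section, so they should be readable with the `datasets` library. The sketch below is hypothetical. Its helper names `REPO_IDS`, `hub_url`, and `peek` are illustrative. It assumes a `train` split and makes no assumptions about column names. The dataset cards remain the authority on splits, columns, and licensing.

```python
import itertools
from typing import Dict, List

# Hub repository IDs, taken from the dataset links in this README.
REPO_IDS: Dict[str, str] = {
    "Reasoning_Traces": "CogniSQL/Reasoning_Traces",
    "Positive_Sample_Corpus": "CogniSQL/Positive_Sample_Corpus",
}


def hub_url(name: str) -> str:
    """Dataset page URL for one of the two CogniSQL datasets."""
    return f"https://huggingface.co/datasets/{REPO_IDS[name]}"


def peek(name: str, n: int = 3, split: str = "train") -> List[dict]:
    """Stream the first n records without downloading the whole dataset.

    Needs `pip install datasets` and network access. The "train" split is an
    assumption; check the dataset card for the real splits and columns.
    """
    # Imported here so the helpers above work even without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(REPO_IDS[name], split=split, streaming=True)
    return list(itertools.islice(stream, n))
```

For example, `peek("Positive_Sample_Corpus", n=1)[0].keys()` lists the columns of the first record. This is a quick way to inspect the schema before writing any processing code.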

## Citation

If you use our datasets or research, please cite the following paper:

```bibtex
@article{gajjar2025cognisql,
  title={{CogniSQL-R1-Zero}: Lightweight Reinforced Reasoning for Efficient {SQL} Generation},
  author={Gajjar, Kushal and Sikchi, Harshit and Gautam, Arpit Singh and Hammons, Marc and Jha, Saurabh},
  journal={arXiv preprint arXiv:2507.06013},
  year={2025},
  url={https://arxiv.org/abs/2507.06013}
}
```

**arXiv**: [2507.06013](https://arxiv.org/abs/2507.06013)

## Research Team

- **Kushal Gajjar**
- **Harshit Sikchi**
- **Arpit Singh Gautam**
- **Marc Hammons**
- **Saurabh Jha**

## Applications

Our work enables:

- Database query systems that understand natural language
- Efficient SQL generation in resource-constrained environments
- Interpretable AI systems with transparent reasoning traces
- Production-grade Text-to-SQL pipelines

## License

Please refer to the individual dataset cards for specific licensing information.

## Related Links

- [Paper on arXiv](https://arxiv.org/abs/2507.06013)
- [Reasoning Traces Dataset](https://huggingface.co/datasets/CogniSQL/Reasoning_Traces)
- [Positive Sample Corpus Dataset](https://huggingface.co/datasets/CogniSQL/Positive_Sample_Corpus)