---
language:
- en
tags:
- text-detoxification
- text2text-generation
- detoxification
- content-moderation
- toxicity-reduction
- llama
- gguf
- minibase
license: apache-2.0
datasets:
- paradetox
metrics:
- toxicity-reduction
- semantic-similarity
- fluency
- latency
model-index:
- name: Detoxify-Small
  results:
  - task:
      type: text-detoxification
      name: Toxicity Reduction
    dataset:
      type: paradetox
      name: ParaDetox
      config: toxic-neutral
      split: test
    metrics:
    - type: toxicity-reduction
      value: 0.032
      name: Average Toxicity Reduction
    - type: semantic-similarity
      value: 0.471
      name: Semantic to Expected
    - type: fluency
      value: 0.919
      name: Text Fluency
    - type: latency
      value: 66.4
      name: Average Latency (ms)
---

# Detoxify-Small 🤗

<div align="center">

**A compact (~138 MB) and efficient text detoxification model that removes toxicity while preserving meaning.**

[Hugging Face](https://huggingface.co/) • [License](LICENSE) • [Join our Discord](https://discord.com/invite/BrJn4D2Guh)

*Built by [Minibase](https://minibase.ai) - Train and deploy small AI models from your browser.*
*Browse all of the models and datasets available on the [Minibase Marketplace](https://minibase.ai/wiki/Special:Marketplace).*

</div>

## 📋 Model Summary

**Minibase-Detoxify-Small** is a compact language model fine-tuned specifically for text detoxification. It takes toxic or inappropriate text as input and generates a cleaned, non-toxic version while preserving the original meaning and intent as much as possible.

### Key Features
- ⚡ **Fast Inference**: ~66 ms average response time
- 🎯 **High Fluency**: 91.9% well-formed output text
- 🧹 **Effective Detoxification**: cuts average toxicity scores by more than half (0.051 → 0.020 on ParaDetox)
- 💾 **Compact Size**: only 138 MB (GGUF-quantized)
- 🔒 **Privacy-First**: runs locally; no data is sent to external servers

## 🚀 Quick Start

### Local Inference (Recommended)

1. **Install llama.cpp** (if not already installed):
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
```

2. **Download and run the model**:
```bash
# Download model files
wget https://huggingface.co/minibase/detoxify-small/resolve/main/model.gguf
wget https://huggingface.co/minibase/detoxify-small/resolve/main/run_server.sh

# Make executable and run
chmod +x run_server.sh
./run_server.sh
```

3. **Make API calls**:
```python
import requests

# Detoxify text
response = requests.post("http://127.0.0.1:8000/completion", json={
    "prompt": "Instruction: Rewrite the provided text to remove the toxicity.\n\nInput: This is fucking terrible!\n\nResponse: ",
    "max_tokens": 200,
    "temperature": 0.7
})

result = response.json()
print(result["content"])  # "This is really terrible!"
```

### Python Client

```python
from detoxify_inference import DetoxifyClient

# Initialize client
client = DetoxifyClient()

# Detoxify text
toxic_text = "This product is fucking amazing, no bullshit!"
clean_text = client.detoxify_text(toxic_text)

print(clean_text)  # "This product is really amazing, no kidding!"
```

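If the bundled `detoxify_inference` module is not on your path, a minimal stand-in that talks to the same local server can be written with the standard library alone. This is a sketch, not the shipped client: the class and method names mirror the example above, and the `{"content": ...}` response shape is taken from the earlier API example.

```python
import json
import urllib.request

PROMPT_TEMPLATE = (
    "Instruction: Rewrite the provided text to remove the toxicity.\n\n"
    "Input: {text}\n\nResponse: "
)


class DetoxifyClient:
    """Minimal stand-in client wrapping the local llama.cpp server."""

    def __init__(self, base_url: str = "http://127.0.0.1:8000"):
        self.base_url = base_url

    @staticmethod
    def build_prompt(text: str) -> str:
        # Same instruction format used throughout this card.
        return PROMPT_TEMPLATE.format(text=text)

    def detoxify_text(self, text: str,
                      max_tokens: int = 200,
                      temperature: float = 0.7) -> str:
        payload = json.dumps({
            "prompt": self.build_prompt(text),
            "max_tokens": max_tokens,
            "temperature": temperature,
        }).encode("utf-8")
        req = urllib.request.Request(
            f"{self.base_url}/completion",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["content"].strip()
```

The only non-standard dependency this removes is `requests`; behavior should otherwise match the examples above.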
## 📊 Benchmarks & Performance

### ParaDetox Dataset Results (1,008 samples)

| Metric | Score | Description |
|--------|-------|-------------|
| **Toxicity Reduction** | 0.051 → 0.020 | Average toxicity score cut by more than 50% |
| **Semantic to Expected** | 0.471 (47.1%) | Similarity to human expert rewrites |
| **Semantic to Original** | 0.625 (62.5%) | How much original meaning is preserved |
| **Fluency** | 0.919 (91.9%) | Quality of generated text structure |
| **Latency** | 66.4 ms | Average response time |
| **Throughput** | ~15 req/sec | Estimated requests per second |

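The headline toxicity numbers can be sanity-checked with two lines of arithmetic: a drop from 0.051 to 0.020 is an absolute reduction of about 0.031 and a relative reduction of roughly 61%, consistent with the "more than 50%" claim.

```python
original_toxicity = 0.051  # average toxicity before detoxification
final_toxicity = 0.020     # average toxicity after detoxification

absolute_reduction = original_toxicity - final_toxicity
relative_reduction = absolute_reduction / original_toxicity

print(f"absolute: {absolute_reduction:.3f}, relative: {relative_reduction:.1%}")
# absolute: 0.031, relative: 60.8%
```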
### Dataset Breakdown

#### General Toxic Content (1,000 samples)
- **Semantic Preservation**: 62.7%
- **Fluency**: 91.9%

### Comparison with Baselines

| Model | Semantic Similarity | Toxicity Reduction | Fluency |
|-------|---------------------|--------------------|---------|
| **Detoxify-Small** | **0.471** | **0.032** | **0.919** |
| BART-base (ParaDetox) | 0.750 | ~0.15 | ~0.85 |
| Human Performance | 0.850 | ~0.25 | ~0.95 |

## 🏗️ Technical Details

### Model Architecture
- **Architecture**: LlamaForCausalLM
- **Parameters**: 49,152 (extremely compact)
- **Context Window**: 1,024 tokens
- **Quantization**: GGUF (4-bit)
- **File Size**: 138 MB
- **Memory Requirements**: 8 GB RAM minimum, 16 GB recommended

### Training Details
- **Base Model**: Custom-trained Llama architecture
- **Fine-tuning Dataset**: Curated toxic-neutral parallel pairs
- **Training Objective**: Instruction-following for detoxification
- **Optimization**: Quantized for edge deployment

### System Requirements
- **OS**: Linux, macOS, Windows
- **RAM**: 8 GB minimum, 16 GB recommended
- **Storage**: 200 MB free space
- **Dependencies**: llama.cpp, Python 3.7+

## 📝 Usage Examples

### Basic Detoxification
```python
# Input: "This is fucking awesome!"
# Output: "This is really awesome!"

# Input: "You stupid idiot, get out of my way!"
# Output: "You silly person, please move aside!"
```

### API Integration
```python
import requests

def detoxify_text(text: str) -> str:
    """Detoxify text using the local Detoxify-Small server."""
    prompt = f"Instruction: Rewrite the provided text to remove the toxicity.\n\nInput: {text}\n\nResponse: "

    response = requests.post("http://127.0.0.1:8000/completion", json={
        "prompt": prompt,
        "max_tokens": 200,
        "temperature": 0.7
    })

    return response.json()["content"]

# Usage
toxic_comment = "This product sucks donkey balls!"
clean_comment = detoxify_text(toxic_comment)
print(clean_comment)  # "This product is not very good!"
```

### Batch Processing
```python
import asyncio
import aiohttp

async def detoxify_batch(texts: list) -> list:
    """Process multiple texts concurrently."""
    async with aiohttp.ClientSession() as session:
        tasks = []
        for text in texts:
            prompt = f"Instruction: Rewrite the provided text to remove the toxicity.\n\nInput: {text}\n\nResponse: "
            payload = {
                "prompt": prompt,
                "max_tokens": 200,
                "temperature": 0.7
            }
            tasks.append(session.post("http://127.0.0.1:8000/completion", json=payload))

        responses = await asyncio.gather(*tasks)
        return [await resp.json() for resp in responses]

# Process multiple comments
comments = [
    "This is fucking brilliant!",
    "You stupid moron!",
    "What the hell is wrong with you?"
]

clean_comments = asyncio.run(detoxify_batch(comments))
```

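A llama.cpp server handles requests with limited parallelism unless configured otherwise, so firing a very large batch all at once mostly just queues work. One simple mitigation is to submit the batch in bounded chunks; the helper below is plain Python and assumes nothing about the server.

```python
from typing import Iterator, List

def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Example: process a large list of comments 4 at a time,
# reusing detoxify_batch() from the snippet above:
#
# for batch in chunked(comments, 4):
#     results = asyncio.run(detoxify_batch(batch))
```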
## 🔧 Advanced Configuration

### Server Configuration
```bash
# GPU acceleration (on macOS, Metal is used automatically when llama.cpp is built with it)
llama-server \
  -m model.gguf \
  --host 127.0.0.1 \
  --port 8000 \
  --n-gpu-layers 35

# CPU-only (lower memory usage)
llama-server \
  -m model.gguf \
  --host 127.0.0.1 \
  --port 8000 \
  --n-gpu-layers 0 \
  --threads 8

# Custom context window
llama-server \
  -m model.gguf \
  --ctx-size 2048 \
  --host 127.0.0.1 \
  --port 8000
```

### Temperature Settings
- **Low (0.1-0.3)**: Conservative detoxification, minimal changes
- **Medium (0.4-0.7)**: Balanced approach (recommended)
- **High (0.8-1.0)**: Creative detoxification, more aggressive changes

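The bands above can be captured in a small helper so callers choose a setting by intent rather than by remembering the numbers. The band names and the representative values below simply mirror the list; they are not part of the model's API.

```python
def detox_temperature(style: str) -> float:
    """Map an intent to a temperature within the recommended bands above."""
    bands = {
        "conservative": 0.2,  # minimal changes (0.1-0.3)
        "balanced": 0.7,      # recommended default (0.4-0.7)
        "creative": 0.9,      # more aggressive rewrites (0.8-1.0)
    }
    try:
        return bands[style]
    except KeyError:
        raise ValueError(f"unknown style {style!r}; choose from {sorted(bands)}")
```

This value can be passed straight into the `temperature` field of the API payloads shown earlier.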
## 🚨 Limitations & Biases

### Current Limitations
- **Vocabulary Scope**: Trained primarily on English toxic content
- **Context Awareness**: May not detect sarcasm or cultural context
- **Length Constraints**: Limited to the 1,024-token context window
- **Domain Specificity**: Optimized for general web content

### Potential Biases
- **Cultural Context**: May not handle culture-specific expressions
- **Dialect Variations**: Limited exposure to regional dialects
- **Emerging Slang**: May not recognize the newest internet slang

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

### Development Setup
```bash
# Clone the repository
git clone https://github.com/minibase-ai/detoxify-small
cd detoxify-small

# Install dependencies
pip install -r requirements.txt

# Run tests
python -m pytest tests/
```

## 📚 Citation

If you use Detoxify-Small in your research, please cite:

```bibtex
@misc{detoxify-small-2025,
  title={Detoxify-Small: A Compact Text Detoxification Model},
  author={Minibase AI Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/minibase/detoxify-small}
}
```

## 📞 Contact & Community

- **Website**: [minibase.ai](https://minibase.ai)
- **Discord Community**: [Join our Discord](https://discord.com/invite/BrJn4D2Guh)
- **Bug Reports & Feature Requests**: [Reach us on Discord](https://discord.com/invite/BrJn4D2Guh)
- **Email**: hello@minibase.ai

### Support
- 📖 **Documentation**: [help.minibase.ai](https://help.minibase.ai)
- 💬 **Community Forum**: [Join our Discord Community](https://discord.com/invite/BrJn4D2Guh)

## 📄 License

This model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## 🙏 Acknowledgments

- **ParaDetox Dataset**: Used for benchmarking and evaluation
- **llama.cpp**: For efficient local inference
- **Hugging Face**: For model hosting and community
- **Our amazing community**: For feedback and contributions

---

<div align="center">

**Built with ❤️ by the Minibase team**

*Making AI more accessible for everyone*

[📖 Minibase Help Center](https://help.minibase.ai) • [💬 Join our Discord](https://discord.com/invite/BrJn4D2Guh)

</div>