DeBERTa-v3 Smishing & Spam Detector — v0.2.2

A high-precision, robust SMS/MMS spam detector fine-tuned for real-world deployment.

Trained on approximately 150,000 English messages (3:1 benign-to-spam class ratio).
This release fixes critical failure modes observed in v0.1 through architectural and training improvements.


What Is New in v0.2.2

v0.2.2 is the trained, production-ready release of the v0.2 architecture. It incorporates a systematic error analysis of 624 misclassified samples and targeted improvements that address:

  • Overconfident false positives on legitimate promotional messages
  • Missed spam due to obfuscation, truncation, or feature poverty

Compared with v0.1, this release achieves significant improvements in precision while maintaining high recall.


Performance Summary

Test Results on 38,331 Hold-Out Samples

Best model checkpoint: Epoch 8 (saved at model_output_v2.2/checkpoint_epoch8.pt)
Optimal classification threshold (determined via validation): 0.720

At Optimized Threshold (Recommended for Deployment)

Class          Precision  Recall  F1-Score
Benign             0.98     0.99      0.98
Spam/Smishing      0.92     0.90      0.91

Overall accuracy: 0.97

Overall Metrics @ Optimized Threshold (0.72)

Metric     Value
F1         0.9096
Precision  0.9170
Recall     0.9023
AUC-ROC    0.9883

Confusion Matrix

                  Predicted Benign  Predicted Spam
Actual Benign               32,131             468
Actual Spam                    560           5,172
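As a sanity check, the headline metrics at the 0.72 threshold can be reproduced directly from this confusion matrix (spam is the positive class):

```python
# Counts from the confusion matrix above (spam = positive class)
tn, fp, fn, tp = 32131, 468, 560, 5172

precision = tp / (tp + fp)                      # 5172 / 5640
recall    = tp / (tp + fn)                      # 5172 / 5732
f1        = 2 * precision * recall / (precision + recall)
accuracy  = (tp + tn) / (tp + tn + fp + fn)

print(round(precision, 4))  # → 0.917
print(round(recall, 4))     # → 0.9023
print(round(f1, 4))         # → 0.9096
print(round(accuracy, 4))   # → 0.9732
```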

At Default Threshold (0.500) — Reference Only

Class          Precision  Recall  F1-Score
Benign             0.99     0.96      0.97
Spam/Smishing      0.79     0.95      0.86

Overall Metrics @ Threshold = 0.50 (Default)

Metric     Value
F1         0.8619
Precision  0.7901
Recall     0.9480
AUC-ROC    0.9883

Training and Validation Metrics

Epoch  Train Loss  Val F1 (Optimal)  Threshold  AUC-ROC
1          0.2453            0.8781      0.735   0.9823
2          0.2204            0.8942      0.710   0.9857
3          0.2183            0.8920      0.675   0.9846
4          0.2180            0.8991      0.675   0.9863
5          0.2156            0.8976      0.725   0.9844
6          0.2159            0.9037      0.695   0.9870
7          0.2158            0.9016      0.675   0.9864
8          0.2147            0.9075      0.720   0.9883

  • Total training time: 166 minutes
  • Hardware: 4 x NVIDIA RTX 3090 GPUs
  • Gradient checkpointing enabled (memory-efficient training)
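The per-epoch threshold optimization reported above can be sketched as a simple F1 sweep over validation probabilities. The exact search grid used in training is not documented; a 0.005-step grid is assumed here, and the synthetic data is only for demonstration:

```python
import numpy as np
from sklearn.metrics import f1_score

def find_optimal_threshold(val_probs, val_labels, grid=np.arange(0.05, 0.96, 0.005)):
    """Return the decision threshold that maximizes F1 on a validation set."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        f1 = f1_score(val_labels, (val_probs >= t).astype(int), zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1

# Toy demonstration with synthetic, well-separated P(spam) scores:
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 1000)
probs = np.clip(0.7 * labels + rng.normal(0.15, 0.2, 1000), 0.0, 1.0)
threshold, best_f1 = find_optimal_threshold(probs, labels)
```

Re-running this search after every epoch (on model probabilities rather than synthetic ones) yields the per-epoch "Threshold" column in the table above.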

Architecture Highlights

Model Structure

Input SMS/MMS Text
   → DeBERTa-v3-base Encoder (89,065,216 parameters)
      ├─ [CLS] token embedding (768 dimensions)  
      └─ Attention-weighted pooling over all tokens (768 dimensions)
   → 23 Engineered Features  
      → Linear(23 → 128) + LayerNorm + GELU + Dropout (1,554,930 parameters)  
         ↓
Combined representation: [CLS] ∥ Attention-Pooled ∥ Feature Embedding = 1,664 dimensions  
   → Bottleneck projection (1,664 → 256) + Residual Block  
   → Final classification head (256 → 2)
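The 1,664-dimensional combined representation follows directly from the three concatenated vectors in the diagram:

```python
# The three concatenated vectors in the diagram above:
H = 768          # DeBERTa-v3-base hidden size ([CLS] and attention-pooled outputs)
feat_dim = 128   # engineered-feature embedding from Linear(23 -> 128)

combined_dim = H + H + feat_dim
print(combined_dim)  # → 1664
```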

Key Improvements Over v0.1

Component            v0.1                                v0.2.2
Pooling strategy     [CLS] only                          Dual pooling: [CLS] + learned attention pooling
Engineered features  15 basic features                   23 features (8 new, targeting evasion patterns)
Feature projection   Linear(15 → 64), no normalization   Linear(23 → 128) + LayerNorm + GELU
Loss function        Cross-Entropy Loss                  Focal Loss (γ=2) + label smoothing (ε=0.05)
Threshold selection  Fixed at 0.6993                     Per-epoch optimization on validation set; final: 0.720
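The loss change can be sketched as follows. This is a minimal focal loss (γ=2) on label-smoothed targets (ε=0.05); the training code's exact formulation is not published, so treat this as an illustrative variant that applies the focal modulation per class:

```python
import torch
import torch.nn.functional as F

def focal_loss_with_smoothing(logits, targets, gamma=2.0, eps=0.05):
    """Focal loss on label-smoothed targets.

    logits  : (batch, C) raw model outputs
    targets : (batch,) integer class labels
    """
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Smooth the one-hot targets: 1-eps on the true class, eps/(C-1) elsewhere
    smooth = torch.full_like(log_probs, eps / (num_classes - 1))
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - eps)
    # Focal modulation: down-weight easy (high-probability) terms by (1-p)^gamma
    focal_weight = (1.0 - probs) ** gamma
    loss = -(smooth * focal_weight * log_probs).sum(dim=-1)
    return loss.mean()
```

The focal term keeps abundant, easy benign messages from dominating the gradient, while label smoothing tempers the overconfident false positives seen in v0.1.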

Engineered Features

Original 15 Features (v0.1)

  1. char_count
  2. word_count
  3. avg_word_length
  4. uppercase_ratio
  5. digit_ratio
  6. special_char_ratio
  7. exclamation_count
  8. question_mark_count
  9. has_url
  10. url_count
  11. has_shortened_url
  12. has_phone_number
  13. has_email
  14. has_currency
  15. urgency_score

8 New Features (v0.2)

  1. unicode_ratio — Detects Unicode substitution (e.g., "Vérífy yøur àccount")
  2. char_entropy — Measures character distribution randomness (low entropy = template spam)
  3. suspicious_spacing — Counts spaced-out patterns like "m e s s a g e"
  4. leet_ratio — Detects leetspeak substitutions (e.g., "l0g1n", "@dDr355")
  5. max_digit_run — Longest consecutive digit sequence (useful for OTP detection)
  6. repeated_char_ratio — Ratio of consecutive repeated characters (e.g., "URGENT!!!")
  7. vocab_richness — Unique words / total words (low = template spam)
  8. has_obfuscated_url — Regex-based detection of broken URLs (e.g., "httpscluesjdko", spaced domains)

These new features specifically target the 283 false negatives observed in v0.1, especially short, feature-poor, or obfuscated messages.
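As an illustration, `char_entropy` is the Shannon entropy of the message's character distribution (matching the inference code in the Usage Example); repetitive template spam scores lower than natural text:

```python
import math
from collections import Counter

def char_entropy(text):
    """Shannon entropy (bits) of the character distribution of `text`."""
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text.lower())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(char_entropy("ab"))  # → 1.0  (two equally likely characters)
# A repetitive template scores lower than a natural sentence:
print(char_entropy("FREE FREE FREE FREE") < char_entropy("Hey, are you free for lunch tomorrow?"))  # → True
```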


Usage Example

Installation

pip install torch transformers scikit-learn joblib sentencepiece huggingface_hub

Inference Code

import re
import math
import json
import torch
import numpy as np
from collections import Counter
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download
import joblib

# --- Feature extraction functions (must match training) ---

URGENCY_WORDS = {
    "urgent", "immediately", "expires", "verify", "confirm", "suspended",
    "locked", "alert", "action required", "limited time", "click here",
    "act now", "final notice", "winner", "prize", "claim", "free",
    "blocked", "deactivated", "unusual activity"
}

URL_PATTERN = re.compile(r'(https?://|www\.)\S+|\w+\.(com|net|org|io|co|uk)', re.I)
SHORTENED_DOMAINS = {"bit.ly","tinyurl.com","goo.gl","t.co","ow.ly","smsg.io","rb.gy"}
PHONE_PATTERN = re.compile(r'(\+?\d[\d\s\-().]{7,}\d)')
EMAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+\.[a-z]{2,}', re.I)
CURRENCY_PATTERN = re.compile(r'[$£€₹¥]|(usd|gbp|eur|inr)', re.I)

LEET_MAP = str.maketrans("013457@!", "oieastai")
OBFUSCATED_URL = re.compile(
    r"(https?(?:clue|[a-z]{4,}[a-z0-9]{2,})\b)"
    r"|(?:h\s*t\s*t\s*p)"
    r"|(?:www\s*\.\s*\w)"
    r"|(?:\w+\s*\.\s*(?:com|net|org|xyz|info|co)\b)", re.I)
SPACED_WORD = re.compile(r"\b(?:\w\s){3,}\w\b")


def extract_features(text):
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    chars = list(text)
    n = len(chars)

    original = [
        len(text),                                                      # char_count
        len(words),                                                     # word_count
        sum(len(w) for w in words) / max(len(words), 1),                # avg_word_length
        sum(1 for c in letters if c.isupper()) / max(len(letters), 1),  # uppercase_ratio
        sum(1 for c in text if c.isdigit()) / max(len(text), 1),        # digit_ratio
        sum(1 for c in text if not c.isalnum() and not c.isspace()) / max(len(text), 1),  # special_char_ratio
        text.count('!'),                                                # exclamation_count
        text.count('?'),                                                # question_mark_count
        int(bool(URL_PATTERN.search(text))),                            # has_url
        len(URL_PATTERN.findall(text)),                                 # url_count
        int(any(d in text.lower() for d in SHORTENED_DOMAINS)),         # has_shortened_url
        int(any(len(re.sub(r'\D', '', m)) >= 7 for m in PHONE_PATTERN.findall(text))),  # has_phone_number
        int(bool(EMAIL_PATTERN.search(text))),                          # has_email
        int(bool(CURRENCY_PATTERN.search(text))),                       # has_currency
        sum(1 for w in URGENCY_WORDS if w in text.lower()),             # urgency_score
    ]

    counts = Counter(text.lower())
    entropy = -sum((c/n) * math.log2(c/n) for c in counts.values() if c > 0) if n > 0 else 0.0
    translated = text.translate(LEET_MAP)
    leet_changes = sum(1 for a, b in zip(text, translated) if a != b)
    # Longest run of consecutive digits (max_digit_run)
    max_drun, cur = 0, 0
    for c in chars:
        if c.isdigit():
            cur += 1
            max_drun = max(max_drun, cur)
        else:
            cur = 0

    # Characters that repeat their immediate predecessor (repeated_char_ratio numerator)
    repeats = sum(1 for i in range(1, n) if chars[i] == chars[i-1]) if n > 1 else 0

    new_features = [
        sum(1 for c in chars if ord(c) > 127) / max(n, 1),         # unicode_ratio
        entropy,                                                     # char_entropy
        len(SPACED_WORD.findall(text)),                              # suspicious_spacing
        leet_changes / max(n, 1),                                    # leet_ratio
        max_drun,                                                    # max_digit_run
        repeats / max(n - 1, 1) if n > 1 else 0.0,                   # repeated_char_ratio
        len(set(w.lower() for w in words)) / max(len(words), 1),     # vocab_richness
        int(bool(OBFUSCATED_URL.search(text))),                      # has_obfuscated_url
    ]

    return original + new_features


# --- Model loading and inference ---

model_id = "notd5a/deberta-v3-malicious-sms-mms-detector"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
scaler = joblib.load(hf_hub_download(model_id, "scaler.pkl"))

with open(hf_hub_download(model_id, "threshold.json")) as f:
    THRESHOLD = json.load(f)["threshold"]


class AttentionPooling(torch.nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.attention = torch.nn.Sequential(
            torch.nn.Linear(hidden_size, hidden_size),
            torch.nn.Tanh(),
            torch.nn.Linear(hidden_size, 1, bias=False),
        )

    def forward(self, hidden_states, attention_mask):
        scores = self.attention(hidden_states).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (hidden_states * weights).sum(dim=1)


class DeBERTaWithFeaturesV2(torch.nn.Module):
    def __init__(self, model_name, num_extra_features=23, num_labels=2, dropout=0.1):
        super().__init__()
        self.deberta = AutoModel.from_pretrained(model_name)
        H = self.deberta.config.hidden_size
        self.attn_pool = AttentionPooling(H)
        feat_dim = 128
        self.feature_proj = torch.nn.Sequential(
            torch.nn.Linear(num_extra_features, feat_dim),
            torch.nn.LayerNorm(feat_dim), 
            torch.nn.GELU(), 
            torch.nn.Dropout(dropout),
        )
        combined_dim = 2 * H + feat_dim
        bottleneck = 256
        self.fc1 = torch.nn.Linear(combined_dim, bottleneck)
        self.ln1 = torch.nn.LayerNorm(bottleneck)
        self.residual_block = torch.nn.Sequential(
            torch.nn.Linear(bottleneck, bottleneck), 
            torch.nn.LayerNorm(bottleneck),
            torch.nn.GELU(), 
            torch.nn.Dropout(dropout),
            torch.nn.Linear(bottleneck, bottleneck), 
            torch.nn.LayerNorm(bottleneck),
        )
        self.dropout = torch.nn.Dropout(dropout)
        self.output_head = torch.nn.Linear(bottleneck, num_labels)

    def forward(self, input_ids, attention_mask, extra_features):
        out = self.deberta(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state
        cls_emb = hidden[:, 0, :]
        attn_emb = self.attn_pool(hidden, attention_mask)
        feat = self.feature_proj(extra_features)
        combined = torch.cat([cls_emb, attn_emb, feat], dim=1)
        x = torch.nn.functional.gelu(self.ln1(self.fc1(combined)))
        x = x + self.residual_block(x)
        return self.output_head(self.dropout(x))


# Instantiate the architecture, then load the fine-tuned weights on top
model = DeBERTaWithFeaturesV2(model_id)
state_dict = torch.load(hf_hub_download(model_id, "pytorch_model.pt"), map_location=device)
model.load_state_dict(state_dict)
model.to(device).eval()


def predict(texts):
    if isinstance(texts, str):
        texts = [texts]
    
    enc = tokenizer(
        texts, 
        max_length=256, 
        padding="max_length", 
        truncation=True, 
        return_tensors="pt"
    )
    
    raw_feats = np.array([extract_features(t) for t in texts], dtype=np.float32)
    scaled_feats = torch.tensor(scaler.transform(raw_feats), dtype=torch.float32).to(device)
    
    with torch.no_grad():
        logits = model(
            enc["input_ids"].to(device),
            enc["attention_mask"].to(device),
            scaled_feats
        )
        probs = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()
    
    return [
        {
            "text": t,
            "prob_spam": round(float(p), 4),
            "label": int(p >= THRESHOLD),
            "prediction": "spam" if p >= THRESHOLD else "benign"
        }
        for t, p in zip(texts, probs)
    ]


# --- Example usage ---
results = predict([
    "Your account has been suspended. Verify immediately: http://bit.ly/abc123",
    "Hey, are you free for lunch tomorrow?",
    "Y ou've got mail: new messa ge w7",
])

for r in results:
    print(r)

Files Included

File              Description
pytorch_model.pt  Full model weights (DeBERTaWithFeaturesV2)
tokenizer/        Saved DeBERTa-v3-base tokenizer
scaler.pkl        StandardScaler for the 23 engineered features (fitted during training)
threshold.json    Optimized classification threshold (value: 0.720)
config.json       DeBERTa base configuration

Limitations

  • Language: English-only; non-English messages may be misclassified.
  • Message length: Maximum sequence length is 256 tokens; longer messages are truncated.
  • Promotional boundary: Legitimate marketing messages with urgency cues (e.g., "30% OFF!") remain challenging.
  • Evasion tactics: Novel obfuscation techniques not present in training data may reduce performance over time.
  • No metadata: The model operates on text only — sender reputation, short codes, or carrier signals are not used.

License

CC BY-NC 4.0
Free for research and non-commercial use. Commercial use requires explicit permission.

Contact: ahmadabushawar21@gmail.com


Model version: v0.2.2
Release date: 2026
Training time: 166 minutes on 4×RTX3090
