# DeBERTa-v3 Smishing & Spam Detector — v0.2.2
A high-precision, robust SMS/MMS spam detector fine-tuned for real-world deployment.
Trained on approximately 150,000 English messages with a 3:1 benign-to-spam class ratio.
Fixes critical failure modes from v0.1 through architectural and training innovations.
## What Is New in v0.2.2

v0.2.2 is the trained, production-ready release of the v0.2 architecture. It incorporates a systematic error analysis of 624 misclassified samples and targeted improvements that address:
- Overconfident false positives on legitimate promotional messages
- Missed spam due to obfuscation, truncation, or feature poverty
Compared with v0.1, this release achieves significant improvements in precision while maintaining high recall.
## Performance Summary

### Test Results on 38,331 Hold-Out Samples
- Best model checkpoint: epoch 8 (saved at `model_output_v2.2/checkpoint_epoch8.pt`)
- Optimal classification threshold (determined on the validation set): 0.720
### At the Optimized Threshold (Recommended for Deployment)
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Benign | 0.98 | 0.99 | 0.98 |
| Spam/Smishing | 0.92 | 0.90 | 0.91 |
| Accuracy (overall) | — | — | 0.97 |
#### Overall Metrics @ Optimized Threshold (0.72)
| Metric | Value |
|---|---|
| F1 | 0.9096 |
| Precision | 0.9170 |
| Recall | 0.9023 |
| AUC-ROC | 0.9883 |
#### Confusion Matrix

| | Predicted Benign | Predicted Spam |
|---|---|---|
| Actual Benign | 32,131 | 468 |
| Actual Spam | 560 | 5,172 |
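The overall metrics reported above can be re-derived directly from this confusion matrix; a quick sanity check in plain Python:

```python
# Re-derive the @0.72 metrics from the reported confusion matrix.
tn, fp = 32131, 468   # actual benign row
fn, tp = 560, 5172    # actual spam row

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} acc={accuracy:.4f}")
# precision=0.9170 recall=0.9023 f1=0.9096 acc=0.9732
```

The values match the metric tables (spam precision 0.9170, recall 0.9023, F1 0.9096, overall accuracy 0.97).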
### At the Default Threshold (0.500) — Reference Only
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Benign | 0.99 | 0.96 | 0.97 |
| Spam/Smishing | 0.79 | 0.95 | 0.86 |
#### Overall Metrics @ Threshold = 0.50 (Default)
| Metric | Value |
|---|---|
| F1 | 0.8619 |
| Precision | 0.7901 |
| Recall | 0.9480 |
| AUC-ROC | 0.9883 |
## Training and Validation Metrics
| Epoch | Train Loss | Val F1 (Optimal) | Threshold | AUC-ROC |
|---|---|---|---|---|
| 1 | 0.2453 | 0.8781 | 0.735 | 0.9823 |
| 2 | 0.2204 | 0.8942 | 0.710 | 0.9857 |
| 3 | 0.2183 | 0.8920 | 0.675 | 0.9846 |
| 4 | 0.2180 | 0.8991 | 0.675 | 0.9863 |
| 5 | 0.2156 | 0.8976 | 0.725 | 0.9844 |
| 6 | 0.2159 | 0.9037 | 0.695 | 0.9870 |
| 7 | 0.2158 | 0.9016 | 0.675 | 0.9864 |
| 8 | 0.2147 | 0.9075 | 0.720 | 0.9883 |
- Total training time: 166 minutes
- Hardware: 4 x NVIDIA RTX 3090 GPUs
- Gradient checkpointing enabled (memory-efficient training)
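The per-epoch "Threshold" column is obtained by sweeping candidate cutoffs over validation-set probabilities and keeping the F1-maximizing one. A minimal sketch of that selection step (the grid spacing and function name are illustrative, not the training script's exact code):

```python
def best_threshold(probs, labels, grid=None):
    """Return the (threshold, F1) pair that maximizes spam-class F1.

    probs: predicted spam probabilities; labels: 1 = spam, 0 = benign.
    """
    if grid is None:
        grid = [i / 200 for i in range(10, 192)]  # 0.05 .. 0.955 in 0.005 steps
    best_t, best_f1 = 0.5, 0.0
    for t in grid:
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        if tp == 0:
            continue
        prec, rec = tp / (tp + fp), tp / (tp + fn)
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Running this per epoch explains why the reported threshold drifts between 0.675 and 0.735 across the table.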
## Architecture Highlights

### Model Structure

```text
Input SMS/MMS Text
  → DeBERTa-v3-base Encoder (89,065,216 parameters)
      ├─ [CLS] token embedding (768 dimensions)
      └─ Attention-weighted pooling over all tokens (768 dimensions)
  → 23 Engineered Features
      → Linear(23 → 128) + LayerNorm + GELU + Dropout (1,554,930 parameters)
  ↓
Combined representation: [CLS] ∥ Attention-Pooled ∥ Feature Embedding = 1,664 dimensions
  → Bottleneck projection (1,664 → 256) + Residual Block
  → Final classification head (256 → 2)
```
### Key Improvements Over v0.1
| Component | v0.1 | v0.2.2 |
|---|---|---|
| Pooling strategy | [CLS] only | Dual pooling: [CLS] + learned attention pooling |
| Engineered features | 15 basic features | 23 features (8 new, targeting evasion patterns) |
| Feature projection | Linear(15 → 64), no normalization | Linear(23 → 128) + LayerNorm + GELU |
| Loss function | Cross-Entropy Loss | Focal Loss (γ=2) + label smoothing (ε=0.05) |
| Threshold selection | Fixed at 0.6993 | Per-epoch optimization on validation set; final: 0.720 |
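The focal loss with γ=2 plus label smoothing ε=0.05 listed above can be sketched as follows (a minimal PyTorch version for illustration; the training script's exact implementation may differ):

```python
import torch
import torch.nn.functional as F

def focal_loss_with_smoothing(logits, targets, gamma=2.0, eps=0.05, num_classes=2):
    """Focal loss (Lin et al., 2017) applied to label-smoothed targets.

    gamma=2 down-weights easy examples; eps=0.05 softens the one-hot targets.
    """
    log_probs = F.log_softmax(logits, dim=-1)   # (B, C)
    probs = log_probs.exp()
    # Smoothed targets: 1 - eps on the true class, eps/(C-1) elsewhere.
    smooth = torch.full_like(log_probs, eps / (num_classes - 1))
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - eps)
    # Focal modulation (1 - p)^gamma shrinks the loss on confident predictions.
    loss = -(smooth * (1.0 - probs) ** gamma * log_probs).sum(dim=-1)
    return loss.mean()
```

With `gamma=0` and `eps=0` this reduces to ordinary cross-entropy, which is a convenient correctness check.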
## Engineered Features

### Original 15 Features (v0.1)
- char_count
- word_count
- avg_word_length
- uppercase_ratio
- digit_ratio
- special_char_ratio
- exclamation_count
- question_mark_count
- has_url
- url_count
- has_shortened_url
- has_phone_number
- has_email
- has_currency
- urgency_score
### New 8 Features (v0.2)
- unicode_ratio — Detects Unicode substitution (e.g., "Vérífy yøur àccount")
- char_entropy — Measures character distribution randomness (low entropy = template spam)
- suspicious_spacing — Counts spaced-out patterns like "m e s s a g e"
- leet_ratio — Detects leetspeak substitutions (e.g., "l0g1n", "@dDr355")
- max_digit_run — Longest consecutive digit sequence (useful for OTP detection)
- repeated_char_ratio — Ratio of consecutive repeated characters (e.g., "URGENT!!!")
- vocab_richness — Unique words / total words (low = template spam)
- has_obfuscated_url — Regex-based detection of broken URLs (e.g., "httpscluesjdko", spaced domains)
These new features specifically target the 283 false negatives observed in v0.1, especially short, feature-poor, or obfuscated messages.
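Two of the new signals are easy to demonstrate in isolation. The helpers below are standalone re-implementations for illustration; the `extract_features` function in the Usage Example section is the authoritative version:

```python
import math
from collections import Counter

def char_entropy(text):
    """Shannon entropy (bits) of the character distribution; low = templated."""
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text.lower())
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def unicode_ratio(text):
    """Fraction of non-ASCII characters, a Unicode-substitution signal."""
    return sum(1 for c in text if ord(c) > 127) / max(len(text), 1)

print(char_entropy("aaaa"))                  # → 0.0 (one repeated character)
print(char_entropy("ab"))                    # → 1.0 (two equally likely characters)
print(unicode_ratio("Vérífy yøur àccount"))  # ≈ 0.2105 (4 of 19 chars are non-ASCII)
```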
## Usage Example

### Installation

```bash
pip install torch transformers scikit-learn joblib sentencepiece huggingface_hub
```

### Inference Code
```python
import re
import math
import json
import torch
import numpy as np
from collections import Counter
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download
import joblib

# --- Feature extraction functions (must match training) ---
URGENCY_WORDS = {
    "urgent", "immediately", "expires", "verify", "confirm", "suspended",
    "locked", "alert", "action required", "limited time", "click here",
    "act now", "final notice", "winner", "prize", "claim", "free",
    "blocked", "deactivated", "unusual activity"
}
URL_PATTERN = re.compile(r'(https?://|www\.)\S+|\w+\.(com|net|org|io|co|uk)', re.I)
SHORTENED_DOMAINS = {"bit.ly", "tinyurl.com", "goo.gl", "t.co", "ow.ly", "smsg.io", "rb.gy"}
PHONE_PATTERN = re.compile(r'(\+?\d[\d\s\-().]{7,}\d)')
EMAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+\.[a-z]{2,}', re.I)
CURRENCY_PATTERN = re.compile(r'[$£€₹¥]|(usd|gbp|eur|inr)', re.I)
LEET_MAP = str.maketrans("013457@!", "oieastai")
OBFUSCATED_URL = re.compile(
    r"(https?(?:clue|[a-z]{4,}[a-z0-9]{2,})\b)"
    r"|(?:h\s*t\s*t\s*p)"
    r"|(?:www\s*\.\s*\w)"
    r"|(?:\w+\s*\.\s*(?:com|net|org|xyz|info|co)\b)", re.I)
SPACED_WORD = re.compile(r"\b(?:\w\s){3,}\w\b")
```
```python
def extract_features(text):
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    chars = list(text)
    n = len(chars)
    original = [
        len(text),                                                # char_count
        len(words),                                               # word_count
        sum(len(w) for w in words) / max(len(words), 1),          # avg_word_length
        sum(1 for c in letters if c.isupper()) / max(len(letters), 1),  # uppercase_ratio
        sum(1 for c in text if c.isdigit()) / max(len(text), 1),  # digit_ratio
        sum(1 for c in text if not c.isalnum() and not c.isspace()) / max(len(text), 1),  # special_char_ratio
        text.count('!'),                                          # exclamation_count
        text.count('?'),                                          # question_mark_count
        int(bool(URL_PATTERN.search(text))),                      # has_url
        len(URL_PATTERN.findall(text)),                           # url_count
        int(any(d in text.lower() for d in SHORTENED_DOMAINS)),   # has_shortened_url
        int(bool([m for m in PHONE_PATTERN.findall(text) if len(re.sub(r'\D', '', m)) >= 7])),  # has_phone_number
        int(bool(EMAIL_PATTERN.search(text))),                    # has_email
        int(bool(CURRENCY_PATTERN.search(text))),                 # has_currency
        sum(1 for w in URGENCY_WORDS if w in text.lower()),       # urgency_score
    ]
    counts = Counter(text.lower())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0) if n > 0 else 0.0
    translated = text.translate(LEET_MAP)
    leet_changes = sum(1 for a, b in zip(text, translated) if a != b)
    max_drun, cur = 0, 0
    for c in chars:
        if c.isdigit():
            cur += 1
            max_drun = max(max_drun, cur)
        else:
            cur = 0
    repeats = sum(1 for i in range(1, n) if chars[i] == chars[i - 1]) if n > 1 else 0
    new_features = [
        sum(1 for c in chars if ord(c) > 127) / max(n, 1),        # unicode_ratio
        entropy,                                                  # char_entropy
        len(SPACED_WORD.findall(text)),                           # suspicious_spacing
        leet_changes / max(n, 1),                                 # leet_ratio
        max_drun,                                                 # max_digit_run
        repeats / max(n - 1, 1) if n > 1 else 0.0,                # repeated_char_ratio
        len(set(w.lower() for w in words)) / max(len(words), 1),  # vocab_richness
        int(bool(OBFUSCATED_URL.search(text))),                   # has_obfuscated_url
    ]
    return original + new_features
```
```python
# --- Model loading and inference ---
model_id = "notd5a/deberta-v3-malicious-sms-mms-detector"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
scaler = joblib.load(hf_hub_download(model_id, "scaler.pkl"))
with open(hf_hub_download(model_id, "threshold.json")) as f:
    THRESHOLD = json.load(f)["threshold"]

class AttentionPooling(torch.nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.attention = torch.nn.Sequential(
            torch.nn.Linear(hidden_size, hidden_size),
            torch.nn.Tanh(),
            torch.nn.Linear(hidden_size, 1, bias=False),
        )

    def forward(self, hidden_states, attention_mask):
        scores = self.attention(hidden_states).squeeze(-1)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (hidden_states * weights).sum(dim=1)

class DeBERTaWithFeaturesV2(torch.nn.Module):
    def __init__(self, model_name, num_extra_features=23, num_labels=2, dropout=0.1):
        super().__init__()
        self.deberta = AutoModel.from_pretrained(model_name)
        H = self.deberta.config.hidden_size
        self.attn_pool = AttentionPooling(H)
        feat_dim = 128
        self.feature_proj = torch.nn.Sequential(
            torch.nn.Linear(num_extra_features, feat_dim),
            torch.nn.LayerNorm(feat_dim),
            torch.nn.GELU(),
            torch.nn.Dropout(dropout),
        )
        combined_dim = 2 * H + feat_dim
        bottleneck = 256
        self.fc1 = torch.nn.Linear(combined_dim, bottleneck)
        self.ln1 = torch.nn.LayerNorm(bottleneck)
        self.residual_block = torch.nn.Sequential(
            torch.nn.Linear(bottleneck, bottleneck),
            torch.nn.LayerNorm(bottleneck),
            torch.nn.GELU(),
            torch.nn.Dropout(dropout),
            torch.nn.Linear(bottleneck, bottleneck),
            torch.nn.LayerNorm(bottleneck),
        )
        self.dropout = torch.nn.Dropout(dropout)
        self.output_head = torch.nn.Linear(bottleneck, num_labels)

    def forward(self, input_ids, attention_mask, extra_features):
        out = self.deberta(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state
        cls_emb = hidden[:, 0, :]                          # [CLS] embedding
        attn_emb = self.attn_pool(hidden, attention_mask)  # attention-pooled embedding
        feat = self.feature_proj(extra_features)
        combined = torch.cat([cls_emb, attn_emb, feat], dim=1)
        x = torch.nn.functional.gelu(self.ln1(self.fc1(combined)))
        x = x + self.residual_block(x)
        return self.output_head(self.dropout(x))

model = DeBERTaWithFeaturesV2(model_id)
state_dict = torch.load(hf_hub_download(model_id, "pytorch_model.pt"), map_location=device)
model.load_state_dict(state_dict)
model.to(device).eval()
```
```python
def predict(texts):
    if isinstance(texts, str):
        texts = [texts]
    enc = tokenizer(
        texts,
        max_length=256,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    raw_feats = np.array([extract_features(t) for t in texts], dtype=np.float32)
    scaled_feats = torch.tensor(scaler.transform(raw_feats), dtype=torch.float32).to(device)
    with torch.no_grad():
        logits = model(
            enc["input_ids"].to(device),
            enc["attention_mask"].to(device),
            scaled_feats,
        )
    probs = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()
    return [
        {
            "text": t,
            "prob_spam": round(float(p), 4),
            "label": int(p >= THRESHOLD),
            "prediction": "spam" if p >= THRESHOLD else "benign",
        }
        for t, p in zip(texts, probs)
    ]

# --- Example usage ---
results = predict([
    "Your account has been suspended. Verify immediately: http://bit.ly/abc123",
    "Hey, are you free for lunch tomorrow?",
    "Y ou've got mail: new messa ge w7",
])
for r in results:
    print(r)
```
## Files Included

| File | Description |
|---|---|
| `pytorch_model.pt` | Full model weights (`DeBERTaWithFeaturesV2`) |
| `tokenizer/` | Saved DeBERTa-v3-base tokenizer |
| `scaler.pkl` | `StandardScaler` for the 23 engineered features (fitted during training) |
| `threshold.json` | Optimized classification threshold (value: 0.720) |
| `config.json` | DeBERTa base configuration |
## Limitations
- Language: English-only; non-English messages may be misclassified.
- Message length: Maximum sequence length is 256 tokens; longer messages are truncated.
- Promotional boundary: Legitimate marketing messages with urgency cues (e.g., "30% OFF!") remain challenging.
- Evasion tactics: Novel obfuscation techniques not present in training data may reduce performance over time.
- No metadata: The model operates on text only — sender reputation, short codes, or carrier signals are not used.
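For the truncation limitation, one possible mitigation (not part of the released model) is to score overlapping word windows of a long message and take the maximum spam probability. A sketch with a pluggable scorer, where `score_fn` stands in for a probability-returning wrapper around `predict` and the window sizes are arbitrary choices:

```python
def windowed_max_prob(text, score_fn, window_words=180, stride=90):
    """Score overlapping word windows of a long message and return the
    highest spam probability any window receives.

    score_fn: callable mapping a text chunk to a spam probability in [0, 1].
    """
    words = text.split()
    if len(words) <= window_words:
        return score_fn(text)  # short enough: score as-is
    probs = []
    for start in range(0, len(words), stride):
        chunk = " ".join(words[start:start + window_words])
        probs.append(score_fn(chunk))
        if start + window_words >= len(words):
            break  # this window already reached the end of the message
    return max(probs)
```

Max-pooling over windows is conservative: a single spammy window flags the whole message, which matches the high-recall posture of the default threshold.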
## License

CC BY-NC 4.0. Free for research and non-commercial use; commercial use requires explicit permission.

- Contact: ahmadabushawar21@gmail.com
- Model version: v0.2.2
- Release date: 2026
- Training time: 166 minutes on 4×RTX 3090
## Model Tree

Base model: microsoft/deberta-v3-base