CEFR Classifier

A text classification model that predicts CEFR (Common European Framework of Reference for Languages) levels (A1-C2) for English texts.

Fine-tuned from microsoft/deberta-v3-large.

Model Performance

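Parallel Corpus Dataset

[Figure: confusion matrix on the parallel corpus dataset (confusion_matrix_parallel)]

Instruction Dataset

[Figure: confusion matrix on the instruction dataset (confusion_matrix_instruction)]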

Quick Start

Simple Usage (Recommended)

from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="dksysd/cefr-classifier")

# Classify a text
text = "This is a sample sentence to classify."
result = classifier(text)

print(result)
# [{'label': 'A1', 'score': 0.535}]

Get All Class Probabilities

# top_k=None returns scores for every class
# (it replaces the deprecated return_all_scores=True)
classifier = pipeline(
    "text-classification",
    model="dksysd/cefr-classifier",
    top_k=None
)

result = classifier(text)[0]

for item in result:
    print(f"{item['label']}: {item['score']:.4f}")
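
The per-class scores can also be collapsed into a single number. The sketch below is one possible way to do that and is not part of the model's API: it assumes the scores arrive as a list of `{'label', 'score'}` dicts like the ones printed above, and it maps A1..C2 to 1..6, which is an arbitrary convention chosen here for illustration. The names `LEVEL_INDEX` and `expected_level` are hypothetical, and the example scores are made up.

```python
# Hypothetical helper: probability-weighted average CEFR level.
# Assumption: A1..C2 are mapped to 1..6. This numeric scale is an
# illustrative convention, not something the model defines.
LEVEL_INDEX = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}


def expected_level(scores):
    """Return the score-weighted mean level for one text.

    scores: list of {'label': str, 'score': float} dicts.
    """
    total = sum(item["score"] for item in scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    weighted = sum(LEVEL_INDEX[item["label"]] * item["score"] for item in scores)
    return weighted / total


# Made-up scores for illustration only (not real model output)
example_scores = [
    {"label": "A1", "score": 0.1},
    {"label": "A2", "score": 0.2},
    {"label": "B1", "score": 0.4},
    {"label": "B2", "score": 0.2},
    {"label": "C1", "score": 0.1},
    {"label": "C2", "score": 0.0},
]

print(f"{expected_level(example_scores):.2f}")
# 3.00
```

A value near 3 would sit around B1 on this made-up scale. Whether averaging ordinal levels is meaningful depends on the application.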

Batch Processing

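# This snippet expects the default top-1 classifier (one dict per text),
# not the all-scores variant from the previous section
classifier = pipeline("text-classification", model="dksysd/cefr-classifier")

texts = [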
    "The cat sat on the mat.",
    "Quantum entanglement represents a fundamental phenomenon in physics.",
    "I like pizza."
]

results = classifier(texts)

for text, result in zip(texts, results):
    print(f"{text} -> {result['label']} ({result['score']:.3f})")
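
Batch results are often post-processed, for example to bucket texts by predicted level. The following is a minimal sketch under stated assumptions: each result is a single `{'label', 'score'}` dict as in the loop above, the helper name `group_by_level` is hypothetical, and the 0.5 confidence threshold is an arbitrary choice rather than a recommended value. The sample results are made up.

```python
from collections import defaultdict


def group_by_level(texts, results, min_score=0.5):
    """Group texts by predicted label.

    Predictions scoring below min_score (an arbitrary, illustrative
    threshold) are collected under the key 'uncertain' instead.
    """
    groups = defaultdict(list)
    for text, result in zip(texts, results):
        key = result["label"] if result["score"] >= min_score else "uncertain"
        groups[key].append(text)
    return dict(groups)


# Made-up predictions for illustration only (not real model output)
sample_texts = ["I like pizza.", "The cat sat on the mat.", "Some harder sentence."]
sample_results = [
    {"label": "A1", "score": 0.91},
    {"label": "A1", "score": 0.42},
    {"label": "C1", "score": 0.77},
]

for level, members in group_by_level(sample_texts, sample_results).items():
    print(level, members)
```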

Advanced Usage

Manual Loading with PyTorch

For more control over the inference process:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "dksysd/cefr-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Setup device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Label mapping
id2label = {0: 'A1', 1: 'A2', 2: 'B1', 3: 'B2', 4: 'C1', 5: 'C2'}

# Inference
text = "Your text here"
inputs = tokenizer(text, padding="max_length", truncation=True, 
                   max_length=1024, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    pred_idx = torch.argmax(probs).item()

print(f"Predicted: {id2label[pred_idx]} (confidence: {probs[pred_idx]:.4f})")
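
The last step of the snippet above turns raw logits into probabilities with softmax and then picks the largest one. For readers who want to see that arithmetic spelled out, here is a dependency-free sketch. It reuses the same label mapping, and the function names `softmax` and `predict_label` as well as the logit values are made up for illustration.

```python
import math

# Same label mapping as in the manual-loading snippet
ID2LABEL = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1", 5: "C2"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for illustration only (not real model output)
made_up_logits = [0.2, 1.5, 3.1, 0.7, -0.4, -1.2]
label, confidence = predict_label(made_up_logits)
print(f"Predicted: {label} (confidence: {confidence:.4f})")
```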

CEFR Levels

  • A1: Beginner
  • A2: Elementary
  • B1: Intermediate
  • B2: Upper Intermediate
  • C1: Advanced
  • C2: Proficient
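
Because the levels are ordered, a predicted label can be compared against a target level, for example to check whether a text is likely suitable for a given learner. The sketch below encodes the list above as a dictionary. `CEFR_DESCRIPTIONS`, `describe`, and `is_at_most` are hypothetical names and not part of the model or the transformers library.

```python
# The level names from the list above, in ascending order of difficulty
CEFR_DESCRIPTIONS = {
    "A1": "Beginner",
    "A2": "Elementary",
    "B1": "Intermediate",
    "B2": "Upper Intermediate",
    "C1": "Advanced",
    "C2": "Proficient",
}
CEFR_ORDER = list(CEFR_DESCRIPTIONS)  # dicts preserve insertion order


def describe(label):
    """Return a readable string such as 'B2 (Upper Intermediate)'."""
    return f"{label} ({CEFR_DESCRIPTIONS[label]})"


def is_at_most(label, ceiling):
    """True if `label` is no harder than `ceiling`."""
    return CEFR_ORDER.index(label) <= CEFR_ORDER.index(ceiling)


print(describe("B2"))
print(is_at_most("A2", "B1"))
```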

License

This model is released under the CC-BY-NC-SA-4.0 license.
