---

license: cc-by-4.0
language: ti
widget:
- text: "<text-to-classify>"
datasets:
- fgaim/tigrinya-abusive-language-detection
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: tiroberta-tiald-all-tasks
  results:
  - task:
      name: Text Classification
      type: text-classification
    metrics:
    - name: Abusiveness Accuracy
      type: accuracy
      value: 0.8611111111111112
    - name: F1
      type: f1
      value: 0.8611109396431353
    - name: Precision
      type: precision
      value: 0.8611128943846637
    - name: Recall
      type: recall
      value: 0.8611111111111112
---

# TiRoBERTa Fine-tuned for Multi-task Abusiveness, Sentiment, and Topic Classification

This model is a fine-tuned version of [TiRoBERTa](https://huggingface.co/fgaim/tiroberta-base) on the [TiALD](https://huggingface.co/datasets/fgaim/tigrinya-abusive-language-detection) dataset.

**Tigrinya Abusive Language Detection (TiALD) Dataset** is a large-scale, multi-task benchmark dataset for abusive language detection in the Tigrinya language. It consists of **13,717 YouTube comments** annotated for **abusiveness**, **sentiment**, and **topic** tasks. The dataset includes comments written in both the **Ge’ez script** and prevalent non-standard Latin **transliterations** to mirror real-world usage.

> ⚠️ The dataset contains explicit, obscene, and potentially hateful language. It should be used for research purposes only. ⚠️

This work accompanies the paper ["A Multi-Task Benchmark for Abusive Language Detection in Low-Resource Settings"](https://arxiv.org/abs/2505.12116).

## Model Usage

```python
from transformers import pipeline

# Load the fine-tuned multi-task classifier; `top_k=11` returns scores for
# every label in the joint label space (abusiveness, sentiment, and topic classes).
tiald_multitask = pipeline("text-classification", model="fgaim/tiroberta-tiald-all-tasks", top_k=11)
tiald_multitask("<text-to-classify>")
```
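
If you need the raw scores or custom post-processing, the checkpoint can also be loaded directly. This is a minimal sketch assuming the standard `transformers` sequence-classification head that the pipeline above relies on:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "fgaim/tiroberta-tiald-all-tasks"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Tokenize, run a forward pass, and rank every label by probability.
inputs = tokenizer("<text-to-classify>", return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1).squeeze(0)

for label_id in probs.argsort(descending=True):
    print(model.config.id2label[label_id.item()], f"{probs[label_id].item():.4f}")
```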

### Performance Metrics

This model achieves the following results on the TiALD test set:

```json
{
  "abusiveness_metrics": {
    "accuracy": 0.8611111111111112,
    "macro_f1": 0.8611109396431353,
    "macro_recall": 0.8611111111111112,
    "macro_precision": 0.8611128943846637,
    "weighted_f1": 0.8611109396431355,
    "weighted_recall": 0.8611111111111112,
    "weighted_precision": 0.8611128943846637
  },
  "topic_metrics": {
    "accuracy": 0.6155555555555555,
    "macro_f1": 0.5491185274678864,
    "macro_recall": 0.5143416011263588,
    "macro_precision": 0.7341640739780486,
    "weighted_f1": 0.5944096153417657,
    "weighted_recall": 0.6155555555555555,
    "weighted_precision": 0.6870800624645906
  },
  "sentiment_metrics": {
    "accuracy": 0.6533333333333333,
    "macro_f1": 0.5340845253007789,
    "macro_recall": 0.5410170159158625,
    "macro_precision": 0.534652401599494,
    "weighted_f1": 0.6620101614004723,
    "weighted_recall": 0.6533333333333333,
    "weighted_precision": 0.6750245466592532
  }
}
```
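
The macro and weighted scores above correspond to the usual averaging schemes (equal weight per class vs. weight by class support). A minimal sketch of recomputing them with `scikit-learn`, using hypothetical gold and predicted labels for illustration:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical gold and predicted abusiveness labels, for illustration only.
y_true = ["Abusive", "Not Abusive", "Abusive", "Not Abusive"]
y_pred = ["Abusive", "Not Abusive", "Not Abusive", "Not Abusive"]

accuracy = accuracy_score(y_true, y_pred)
# average="macro" weights every class equally; average="weighted" weights by support.
p_macro, r_macro, f1_macro, _ = precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0)
p_w, r_w, f1_w, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted", zero_division=0)
print(f"accuracy={accuracy:.4f}  macro_f1={f1_macro:.4f}  weighted_f1={f1_w:.4f}")
```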

## Training Hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-05
- train_batch_size: 8
- optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- lr_scheduler_type: linear
- num_epochs: 7.0
- seed: 42
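
For reference, these settings map onto `transformers.TrainingArguments` roughly as follows; the `output_dir` is a placeholder, and the exact training script is not part of this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tiroberta-tiald-all-tasks",  # placeholder output path
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    num_train_epochs=7.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```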

## Intended Usage

The TiALD dataset and the accompanying models are designed to support:

- Research in abusive language detection in low-resource languages
- Context-aware abuse, sentiment, and topic modeling
- Multi-task and transfer learning with digraphic scripts
- Evaluation of multilingual and fine-tuned language models

Researchers and developers should avoid using this dataset or the models trained on it for direct moderation or enforcement tasks without human oversight.

## Ethical Considerations

- **Sensitive content**: Contains toxic and offensive language. Use for research purposes only.
- **Cultural sensitivity**: Abuse is context-dependent; annotations were made by native speakers to account for cultural nuance.
- **Bias mitigation**: Data sampling and annotation were carefully designed to minimize reinforcement of stereotypes.
- **Privacy**: All the source content for the dataset is publicly available on YouTube.
- **Respect for expression**: The dataset should not be used for automated censorship without human review.

This research received IRB approval (Ref: KH2022-133) and followed ethical data collection and annotation practices, including informed consent of annotators.

## Citation

If you use this model or the `TiALD` dataset in your work, please cite:

```bibtex
@misc{gaim-etal-2025-tiald-benchmark,
  title         = {A Multi-Task Benchmark for Abusive Language Detection in Low-Resource Settings},
  author        = {Fitsum Gaim and Hoyun Song and Huije Lee and Changgeon Ko and Eui Jun Hwang and Jong C. Park},
  year          = {2025},
  eprint        = {2505.12116},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CL},
  url           = {https://arxiv.org/abs/2505.12116}
}
```

## License

This model and the underlying TiALD dataset are released under the [Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).