ljvmiranda921/tl_calamancy_trf
Token Classification
Model collection for calamanCy (https://github.com/ljvmiranda921/calamanCy). See each model or dataset card for more information.
Note Transformer-based pipeline using mDeBERTa-v3 (base)
Note Latest large-sized pipeline based on Tagalog fastText vectors (714k unique vectors, 300 dimensions, Size: 1.4 GB)
Note Latest medium-sized pipeline based on floret (200k unique vectors, 200 dimensions, Size: 400 MB)
Note LEGACY: Transformer-based pipeline using RoBERTa-Tagalog
Note LEGACY: Large-sized pipeline based on fastText (714k unique vectors, 300 dimensions, Size: 455 MB)
Note LEGACY: Medium-sized pipeline based on floret (50k unique vectors, 200 dimensions, Size: 77 MB)
Note Gold-standard Tagalog NER dataset. Cohen's kappa = 0.81
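For readers unfamiliar with the agreement figure quoted above, here is a minimal sketch of how Cohen's kappa is computed for two annotators. The label sequences are hypothetical toy data, not taken from the dataset, and the note does not say how the reported 0.81 was measured (for example, per token or per span), so this only illustrates the formula.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal frequencies, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical token-level labels from two annotators (illustration only).
annotator_1 = ["PER", "O", "ORG", "O", "LOC", "O", "PER", "O", "O", "ORG"]
annotator_2 = ["PER", "O", "ORG", "O", "O", "O", "PER", "O", "LOC", "ORG"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this toy example the annotators agree on 8 of 10 items, but kappa is lower than 0.8 because part of that agreement would be expected by chance given how often each label is used.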