| language: ur | |
| thumbnail: https://raw.githubusercontent.com/urduhack/urduhack/master/docs/_static/urduhack.png | |
| tags: | |
| - roberta-urdu-small | |
| - urdu | |
| - transformers | |
| license: mit | |
| ## roberta-urdu-small | |
| [](https://github.com/urduhack/urduhack/blob/master/LICENSE) | |
| ### Overview | |
| **Language model:** roberta-urdu-small | |
| **Model size:** 125M | |
| **Language:** Urdu | |
| **Training data:** News data from urdu news resources in Pakistan | |
| ### About roberta-urdu-small | |
| roberta-urdu-small is a language model for urdu language. | |
| ``` | |
| from transformers import pipeline | |
| fill_mask = pipeline("fill-mask", model="urduhack/roberta-urdu-small", tokenizer="urduhack/roberta-urdu-small") | |
| ``` | |
| ## Training procedure | |
| roberta-urdu-small was trained on urdu news corpus. Training data was normalized using normalization module from | |
| urduhack to eliminate characters from other languages like arabic. | |
| ### About Urduhack | |
| Urduhack is a Natural Language Processing (NLP) library for urdu language. | |
| Github: https://github.com/urduhack/urduhack | |