NepBERTa: A Nepali Language Model

This project developed a large-scale pre-trained Nepali language model to support NLP applications in Nepali, including text classification, named entity recognition, and other language understanding tasks.
Nepali NLP has historically suffered from limited high-quality language resources, restricting the development of robust NLP applications. Most existing language models are trained on English or other high-resource languages, leading to poor performance on Nepali text tasks. There is a need for a large, monolingual Nepali model that can understand language structure, context, and semantics to enable better NLP tools for Nepali users.
The project pre-trained a large-scale Nepali language model, NepBERTa, on extensive monolingual Nepali corpora. It provides a foundation for a wide range of Nepali NLP tasks, such as text classification, named entity recognition, and part-of-speech tagging, enabling downstream applications to perform reliably on Nepali text.
NepBERTa was successfully developed and evaluated on multiple Nepali NLP benchmarks, demonstrating strong performance compared to existing models. Researchers and developers can fine-tune the model for a variety of Nepali language applications, improving the accessibility and effectiveness of NLP solutions in Nepali.
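As a rough illustration of how a BERT-style model like NepBERTa is typically used for masked-language-modeling, the sketch below builds a small randomly initialized BERT via the Hugging Face transformers library and runs a forward pass. The tiny configuration values here are hypothetical stand-ins, not NepBERTa's actual architecture; in practice one would load the released checkpoint and tokenizer instead of constructing a fresh model.

```python
import torch
from transformers import BertConfig, BertForMaskedLM

# Hypothetical small configuration used only for illustration; the real
# NepBERTa checkpoint would be loaded with from_pretrained(...) instead.
config = BertConfig(
    vocab_size=1000,        # placeholder vocabulary size
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)
model = BertForMaskedLM(config)
model.eval()

# Dummy token IDs standing in for a tokenized Nepali sentence.
input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    outputs = model(input_ids=input_ids)

# The model predicts a score over the vocabulary at every position,
# which is how masked tokens are filled in during pre-training.
print(outputs.logits.shape)  # torch.Size([1, 8, 1000])
```

Fine-tuning for downstream tasks such as text classification or named entity recognition follows the same pattern, swapping in a task-specific head (e.g. `BertForSequenceClassification` or `BertForTokenClassification`).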
Official Release: View