Authors: Pranjal Mahendra Bhosale, Revati Machindra Wahul
Abstract: Cyberbullying is a growing problem across online platforms, and it particularly affects teenagers and young adults. Conventional machine learning algorithms often fail to identify subtle or context-dependent abusive language. Recent advances in Natural Language Processing (NLP), notably the transformer model BERT, have shown strong potential for text classification. However, the computational demands of the full-sized BERT model make it impractical for real-time or mobile applications. This research proposes a fast, lightweight cyberbullying detection system built on compact BERT variants such as DistilBERT and TinyBERT, together with CNN and LSTM models. The compact BERT variants retain much of the original BERT model's language understanding while using far fewer parameters and less computation. The models are fine-tuned on labeled cyberbullying datasets, with particular emphasis on handling class imbalance through methods such as Focal Loss. With this approach, the models achieve performance metrics comparable to those of full-sized BERT models.
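The abstract names Focal Loss as the remedy for class imbalance but does not give its exact form or hyperparameters. The sketch below assumes the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), written in plain Python for clarity. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are illustrative assumptions, not values reported by the authors. The factor (1 - p_t)^gamma shrinks the loss on confidently correct (easy) examples, so training focuses more on hard and minority-class posts.

```python
import math
from typing import Sequence


def binary_focal_loss(probs: Sequence[float], labels: Sequence[int],
                      gamma: float = 2.0, alpha: float = 0.25,
                      eps: float = 1e-7) -> float:
    """Mean binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    Hypothetical helper (illustrative defaults, not the paper's settings).
    probs:  predicted probability of the positive class (e.g. "bullying").
    labels: 1 for the positive class, 0 otherwise.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("need at least one example")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        # (1 - p_t)**gamma down-weights easy, confidently correct examples
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    easy = binary_focal_loss([0.95], [1])  # confident and correct
    hard = binary_focal_loss([0.30], [1])  # uncertain and wrong-leaning
    print(f"easy={easy:.5f} hard={hard:.5f}")
```

With `gamma=0` the modulating factor equals 1 and the loss reduces to alpha-weighted binary cross-entropy, which makes the effect of `gamma` easy to isolate when tuning.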
DOI: