Abstract:
The modern world gives people a great deal of freedom, and that freedom is frequently misused
to harass others. The internet has become an essential part of daily life, with almost 4.9 billion
active internet users and 4.66 billion active social media users. Because people can easily reach
each other and freely share their thoughts, many of them abuse, harass, or threaten others on
social media. Despite the large number of Bangla speakers and the correspondingly high risk of
cyberbullying, very few studies have tried to identify bullying messages or comments in Bangla
(Bengali). Artificial intelligence has advanced considerably in recent years, and this study
builds an ensemble of deep learning models to identify bullying comments in cyberspace so that
they can be removed and the rate of cyberbullying reduced.
A Kaggle dataset of 44,001 Bangla comments was used to train and test the models. Before
training and testing, we applied several data pre-processing steps, including data cleaning,
stop-word removal, and tokenization; BERT tokenization was used to tokenize the texts. We
developed an ensemble model based on GRU, LSTM, and CNN, which achieved 97.4% accuracy, and
used Explainable AI (XAI) to understand how the model arrives at its predictions. The results
of the individual models were compared with those of the ensemble to assess how much the
ensemble improves on them. The resulting model can be deployed to help reduce cyberbullying.
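
The abstract does not state how the GRU, LSTM, and CNN outputs are combined. As a minimal sketch, assuming soft voting (averaging the class probabilities of the three models and taking the most probable class), the combination step might look as follows. The function name `soft_vote`, the two-class setup (non-bullying, bullying), and all probability values are hypothetical and serve only as an illustration; they are not taken from the study.

```python
import numpy as np

def soft_vote(prob_list):
    """Average class probabilities across models and pick the top class.

    prob_list: one array per model, each of shape (n_comments, n_classes).
    Soft voting is an assumption here; the study may combine models differently.
    """
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_list])
    mean_probs = probs.mean(axis=0)       # shape: (n_comments, n_classes)
    labels = mean_probs.argmax(axis=1)    # index of the most probable class
    return labels, mean_probs

# Made-up probabilities for three comments; columns are
# [non-bullying, bullying] (a hypothetical two-class setup).
gru_probs = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]
lstm_probs = [[0.2, 0.8], [0.9, 0.1], [0.6, 0.4]]
cnn_probs = [[0.3, 0.7], [0.7, 0.3], [0.8, 0.2]]

labels, mean_probs = soft_vote([gru_probs, lstm_probs, cnn_probs])
print(labels)
print(mean_probs)
```

In the third made-up comment the GRU alone would predict bullying, while the averaged probabilities do not, which illustrates how an ensemble can differ from any single model.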