Abstract: The spread of hate speech has become increasingly harmful in digital environments, particularly on social networking platforms. Although recent advances in automatic hate speech detection are promising, a key challenge remains: distinguishing genuine hate speech from reclaimed language. Accurate labeling is difficult because reclaimed expressions are nuanced and context-dependent. In this paper, we present a simple, interpretable approach to this distinction, developed for the MultiPride Shared Task. Our method encodes texts as dense semantic embeddings, filters likely label noise with Cleanlab using a logistic regression classifier, and then trains a multilayer perceptron (MLP) for final classification. The system is designed to run with limited computational resources while maintaining strong performance. We evaluate it with precision, recall, and F1-score, including macro-averaged metrics. Experimental results show robust performance despite extreme class imbalance in the dataset. The findings also suggest that larger embedding models and more advanced preprocessing could yield further gains while preserving interpretability.
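The three-stage pipeline summarized in the abstract (embeddings, noise filtering with logistic regression, then an MLP) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The abstract does not name the embedding model, so synthetic features from scikit-learn's `make_classification` stand in for sentence embeddings. The filter `flag_label_issues` is a hypothetical, simplified confident-learning rule written out by hand in place of Cleanlab; with the library installed, `cleanlab.filter.find_label_issues(labels, pred_probs)` is the usual entry point. The helper `train_filtered_mlp` and all hyperparameters (5 folds, one hidden layer of 128 units, class weights of 0.9/0.1) are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPClassifier


def flag_label_issues(labels, pred_probs):
    """Hypothetical, simplified confident-learning filter.

    Stand-in for cleanlab.filter.find_label_issues; returns a boolean mask
    that is True where the given label is likely wrong.
    """
    n_classes = pred_probs.shape[1]
    # Per-class threshold: mean predicted probability of class k
    # among examples whose given label is k.
    thresholds = np.array(
        [pred_probs[labels == k, k].mean() for k in range(n_classes)]
    )
    above = pred_probs >= thresholds  # which classes clear their threshold
    # Among classes that clear their threshold, pick the most probable one.
    masked = np.where(above, pred_probs, -np.inf)
    confident_label = masked.argmax(axis=1)
    has_confident = above.any(axis=1)
    # Flag examples whose confident label disagrees with the given label.
    return has_confident & (confident_label != labels)


def train_filtered_mlp(X, y, seed=0):
    """Hypothetical helper: filter noisy labels, then fit an MLP on the rest."""
    # Out-of-fold probabilities: each example is scored by a model
    # that did not see it during training.
    pred_probs = cross_val_predict(
        LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
    )
    issues = flag_label_issues(y, pred_probs)
    keep = ~issues
    mlp = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=seed)
    mlp.fit(X[keep], y[keep])
    return mlp, issues


if __name__ == "__main__":
    # Synthetic stand-in for embeddings: ~10% positives plus some flipped labels.
    X, y = make_classification(
        n_samples=2000, n_features=64, n_informative=16,
        weights=[0.9, 0.1], flip_y=0.05, random_state=0,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    mlp, issues = train_filtered_mlp(X_tr, y_tr)
    print(f"flagged {issues.sum()} of {len(y_tr)} training labels")
    # The report includes per-class and macro-averaged precision, recall and F1.
    print(classification_report(y_te, mlp.predict(X_te), digits=3))
```

Using out-of-fold probabilities matters here. A logistic regression scored on its own training data tends to fit the noisy labels, which hides exactly the examples the filter is meant to catch. The macro-averaged row of the report weights the rare positive class equally with the majority class, which is why macro metrics are informative under heavy class imbalance.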
| Comments: | 9 pages, 2 figures, Published in EVALITA 2026, CEUR Workshop Proceedings Vol. 4195 |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2606.01298 [cs.CL] |
| | (or arXiv:2606.01298v1 [cs.CL] for this version) |
| | https://doi.org/10.48550/arXiv.2606.01298 (arXiv-issued DOI via DataCite, pending registration) |
| Journal reference: | CEUR Workshop Proceedings, Vol. 4195, 2026 |
Submission history
From: Hadi Bayrami Asl Tekanlou
[v1] Sun, 31 May 2026 15:38:58 UTC (145 KB)
