Conference Paper

Exploring State-of-the-Art Models in Arabic NLP: Insights into Multi-Label Text Classification

2024

Authors: Salem A., Medhat W., Gamaleldin N.

This study addresses multi-label text classification in Arabic, focusing on movie genre categorization from plot summaries. Although more than 400 million people speak Arabic, progress in Arabic natural language processing (NLP) lags behind that of other languages because of data scarcity and data quality problems. To narrow this gap, the research makes three key contributions: a thorough review of prior work on Arabic multi-label text classification; a newly curated dataset of Egyptian movies annotated with 22 genre labels; and a framework for optimizing large language models (LLMs) on this dataset. The methodology fine-tunes multiple BERT models and evaluates and compares their performance against existing datasets, offering a reproducible path for future research. © 2024 IEEE.
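
As a rough illustration of the multi-label setting described in the abstract, the sketch below shows how a movie's genres could be encoded as a multi-hot vector and how per-label scores could be turned back into a set of genres. It is not the authors' framework: the genre names, the function names, and the 0.5 threshold are all assumptions made for illustration, and the logits are hand-written stand-ins for the output of a fine-tuned model.

```python
import math

# Hypothetical subset of genre labels for illustration only;
# the dataset described in the abstract has 22 labels.
GENRES = ["drama", "comedy", "romance", "action"]


def to_multi_hot(genres, label_set=GENRES):
    """Encode a movie's genres as a 0/1 vector with one slot per label."""
    return [1 if label in genres else 0 for label in label_set]


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, label_set=GENRES, threshold=0.5):
    """Score each label independently and keep those at or above the threshold.

    Unlike single-label classification (softmax, pick one), each label gets
    its own sigmoid, so a movie can receive zero, one, or several genres.
    The threshold value is an assumption, not taken from the paper.
    """
    return [
        label
        for label, logit in zip(label_set, logits)
        if sigmoid(logit) >= threshold
    ]


if __name__ == "__main__":
    # Target vector for a movie tagged as both comedy and romance.
    print(to_multi_hot(["comedy", "romance"]))
    # Hand-written logits standing in for a model's output.
    print(predict_labels([2.0, -1.0, 0.3, -3.0]))
```

In practice the threshold is usually tuned on a validation set, since a fixed 0.5 can under-predict rare labels.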
