Named entity recognition of persons' names in Arabic tweets
The rise in Arabic usage on social media platforms, notably Twitter, has led to growing interest in building Arabic Natural Language Processing (NLP) applications capable of handling informal colloquial Arabic, the form of Arabic most commonly used in social media. The unique characteristics of the Arabic language make the extraction of Arabic named entities a challenging task, and the nature of tweets adds new dimensions to it. The majority of previous research on Arabic named entity recognition (NER) focused on extracting entities from the formal language, namely Modern Standard Arabic (MSA). However, the unstructured nature of the colloquial language used in tweets degrades the performance of NER systems developed for formal MSA text. In this paper, we focus on the task of recognizing Arabic persons' names. Specifically, we introduce an approach to extract Arabic persons' names from tweets without employing any morphological analysis or language-dependent features. The proposed approach combines a rule-based model with a statistical one. It uses unsupervised learning of patterns and clustered dictionaries as constraints to identify a person's name and resolve its ambiguity. Our approach outperforms the best reported result in the literature on the same test set by an increase of 19.6% in the F-score.
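To make the idea of dictionaries used as constraints concrete, the following is a minimal illustrative sketch, not the implementation described in this paper. It shows only a dictionary lookup constrained by a preceding context word; it does not include the statistical component or the unsupervised learning of patterns. The dictionaries, the trigger words, and the function name `extract_person_names` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy resources (assumptions for illustration only).
FIRST_NAMES = {"محمد", "أحمد", "سارة"}
FAMILY_NAMES = {"العلي", "الحربي"}
TRIGGERS = {"الدكتور", "الشيخ", "الأستاذ"}  # honorifics that often precede a name


def extract_person_names(tokens):
    """Return name candidates found by dictionary lookup plus a context constraint."""
    names = []
    i = 0
    while i < len(tokens):
        # Context constraint: is the previous token a trigger word?
        triggered = i > 0 and tokens[i - 1] in TRIGGERS
        starts_name = tokens[i] in FIRST_NAMES or (
            triggered and tokens[i] in FAMILY_NAMES
        )
        if starts_name:
            # Extend the candidate over consecutive dictionary matches.
            j = i + 1
            while j < len(tokens) and (
                tokens[j] in FIRST_NAMES or tokens[j] in FAMILY_NAMES
            ):
                j += 1
            # Ambiguity handling: a single-token match is accepted only
            # when a trigger word precedes it.
            if j - i > 1 or triggered:
                names.append(" ".join(tokens[i:j]))
                i = j
                continue
        i += 1
    return names


with_trigger = extract_person_names("قال الدكتور أحمد العلي اليوم".split())
ambiguous = extract_person_names("أخبار سارة اليوم".split())
```

In this sketch, the first sentence yields one two-token candidate because a trigger word precedes consecutive dictionary matches. In the second, the word "سارة" is in the toy first-name dictionary but is rejected, since it appears alone and without a trigger; this mimics, in a very simplified way, how context can resolve the ambiguity of a word that is both a name and an ordinary word.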