
Towards Arabic Image Captioning: A Transformer-Based Approach
The automatic generation of textual descriptions of images, known as image captioning, is important in many applications, including accessibility for the visually impaired, social media enhancement, automatic image description for search engines, and assistive technology for education. While extensive research has been conducted in English, work on Arabic remains limited because of the language's complexity. Arabic is one of the world's most widely spoken languages, with around 420 million native speakers, and is the official language of 22 nations in the Middle East and North Africa. Image captioning in Arabic will foster inclusion, improve communication, and drive technological progress. This study therefore develops an Arabic image caption generator using the Flickr30k dataset. The proposed model comprises an encoder, a decoder, and a translation transformer that together generate descriptive Arabic captions for input images. The encoder uses ResNet101, a deep convolutional neural network, to extract rich visual features from the input images. The decoder consists of an attention block, which lets the model focus on different parts of the image, and an LSTM that generates the captions. Finally, the generated captions are translated using an English-to-Arabic dialect translation transformer. The model achieved a BLEU-4 score of 0.1253 for the English-generated captions and medium user satisfaction for the Arabic captions.
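The abstract does not include a reference implementation, so the following is a minimal illustrative PyTorch sketch of the described pipeline: a ResNet101 encoder that yields region features, an additive attention block, and an LSTM decoder that attends over those regions at each step. All hyperparameters (embed_dim, hidden_dim, attn_dim) and module names are assumptions for illustration, not the authors' settings.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class Encoder(nn.Module):
    """ResNet101 backbone that returns a grid of image-region features."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
        # Drop the average pool and classifier; keep the spatial feature map.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, images):                           # (B, 3, H, W)
        feats = self.backbone(images)                    # (B, 2048, H/32, W/32)
        b, c, h, w = feats.shape
        return feats.view(b, c, h * w).permute(0, 2, 1)  # (B, regions, 2048)

class Attention(nn.Module):
    """Additive attention: scores each image region against the LSTM state."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (B, regions, feat_dim); hidden: (B, hidden_dim)
        scores = self.score(torch.tanh(
            self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(scores, dim=1)             # region weights
        return (alpha * feats).sum(dim=1)                # (B, feat_dim)

class Decoder(nn.Module):
    """LSTM decoder that consumes one token and one attended context per step."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512,
                 feat_dim=2048, attn_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attend = Attention(feat_dim, hidden_dim, attn_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def step(self, feats, token, h, c):
        # h and c start as zeros (or a projection of the mean image feature).
        context = self.attend(feats, h)
        h, c = self.lstm(torch.cat([self.embed(token), context], dim=1), (h, c))
        return self.out(h), h, c                         # logits over the vocabulary
```

At inference time, `step` would be invoked token by token, greedily or with beam search, feeding the predicted word back in until an end-of-sequence symbol is emitted.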
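The specific translation transformer is not identified in the abstract. As a hypothetical stand-in, the publicly available OPUS-MT English-to-Arabic checkpoint (Helsinki-NLP/opus-mt-en-ar) illustrates how the generated English captions could be translated in the final stage:

```python
from transformers import MarianMTModel, MarianTokenizer

# Illustrative public checkpoint; the paper's actual translation model may differ.
model_name = "Helsinki-NLP/opus-mt-en-ar"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

caption = "a man in a red shirt is riding a bicycle down the street"
batch = tokenizer([caption], return_tensors="pt", padding=True)
generated = model.generate(**batch)
arabic = tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
print(arabic)
```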
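BLEU-4, the reported metric, averages 1- to 4-gram precision of each generated caption against its reference captions, with a brevity penalty. A minimal sketch of computing it with NLTK follows; the toy captions are illustrative, not from the dataset.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each hypothesis is scored against all reference captions for its image.
references = [[
    "a man rides a bicycle down a city street".split(),
    "a cyclist in a red shirt travels along the road".split(),
]]
hypotheses = ["a man in a red shirt rides a bicycle".split()]

# Uniform weights over 1- to 4-gram precisions; smoothing keeps short
# captions with a missing n-gram order from collapsing to zero.
score = corpus_bleu(references, hypotheses,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {score:.4f}")
```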