

Diabetic Retinopathy Detection: A PySpark-Driven Approach with VGG 16 Feature Extraction and MLP Classification
This study experimentally evaluated advanced techniques for the early detection of diabetic retinopathy (DR), a complication of diabetes, from retinal scans. The goal was to support effective disease prediction and management through fast and accurate medical diagnosis. The proposed DR diagnosis tool was developed in three stages: feature extraction, feature reduction, and image classification. The VGG16 architecture was used for feature extraction to produce discriminative features for classification. Apache Spark, a distributed computing framework, was used to handle large datasets and to optimize the multilayer perceptron (MLP) classifier through hyperparameter tuning and cross-validation; Spark also enabled more efficient resource utilization and faster training. In the final epoch, the MLP model achieved an accuracy of 97%. The study also highlighted the value of distributed platforms for analyzing large volumes of real-time diabetes data to support data-driven decision-making. © 2024 IEEE.