Conference Paper

Apache Spark Powered: Enhancing Network Intrusion Detection System Using Random Forest

By
Mamdouh H.
Tarek M.
Radwan A.
Saeed A.
Abdeen A.
Ashraf M.
Fouad K.M.
Abdelbaky I.

The increasing sophistication of cyber attacks necessitates effective intrusion detection systems. We propose a novel intrusion detection method integrating deep learning with big data management using Apache Spark. Leveraging the comprehensive CSE-CIC-IDS2018 dataset, we apply extensive data preprocessing, including handling missing and unreliable values and removing duplicates and redundant columns. In addition, a Random Forest based feature importance approach is used to prioritize the most impactful features. Furthermore, stratified k-fold cross-validation is used for model selection on the class-imbalanced dataset. Using only the top 34 features, our weighted Random Forest classifier achieves a weighted average F1-score of 0.999 and a test inference time of 0.673 seconds, outperforming previous studies without sampling techniques. The proposed architecture offers a scalable and accurate solution for intrusion detection in cloud architectures, demonstrating the effectiveness of combining deep learning and big data technologies for cybersecurity. © 2024 IEEE.
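
The abstract names three techniques for handling class imbalance without resampling: class weighting, stratified k-fold splitting, and a weighted average F1-score. The sketch below illustrates each in plain Python so that it stays self-contained; it is not the authors' Spark implementation. The "balanced" weighting formula, the round-robin fold assignment, and the toy labels are assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: the common "balanced" heuristic; the paper's exact
    weighting scheme is not stated in the abstract.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def stratified_folds(labels, k):
    """Assign sample indices to k folds, round-robin within each class,
    so every fold keeps roughly the overall class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_c - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n_c
    return total / len(y_true)


# Hypothetical toy labels: 8 benign flows, 4 attack flows.
labels = ["benign"] * 8 + ["attack"] * 4
weights = balanced_class_weights(labels)
folds = stratified_folds(labels, k=4)

# A degenerate classifier that always predicts the majority class.
majority_pred = ["benign"] * len(labels)
majority_score = weighted_f1(labels, majority_pred)
```

In this toy setup the minority class receives twice the weight of the majority class, and each of the four folds contains two benign samples and one attack sample. The majority-only predictor scores well below 1.0 on weighted F1 because it earns no credit on the attack class.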