

Apache Spark Powered: Enhancing Network Intrusion Detection System Using Random Forest
The increasing sophistication of cyber attacks necessitates effective intrusion detection systems. We propose a novel intrusion detection method that integrates deep learning with big data management using Apache Spark. Using the comprehensive CSE-CIC-IDS2018 dataset, we apply extensive data preprocessing, including the handling of missing and unreliable values, duplicate records, and redundant columns. In addition, a Random Forest based feature importance approach is used to prioritize the most impactful features. Furthermore, stratified k-fold cross-validation is used for model selection on the class-imbalanced dataset. Our weighted Random Forest classifier achieves a weighted average F1-score of 0.999 and a test inference time of 0.673 seconds using only the top 34 features, outperforming previous studies without sampling techniques. The proposed architecture offers a scalable and accurate solution for intrusion detection in cloud architectures, demonstrating the effectiveness of combining deep learning and big data technologies for cybersecurity. © 2024 IEEE.
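
As an illustration of three ideas named in the abstract (class weighting, stratified k-fold splitting, and the weighted average F1-score), the following is a minimal pure-Python sketch. It is not the implementation used in this work: the inverse-frequency weighting formula, the round-robin fold assignment, the function names, and the toy labels "benign" and "attack" are all assumptions made for illustration, and Apache Spark is deliberately left out so that the sketch stays self-contained.

```python
from collections import Counter, defaultdict


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: the abstract does not state the weighting scheme; this is
    one common "balanced" heuristic, chosen only for illustration.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def stratified_folds(labels, k):
    """Assign sample indices to k folds, round-robin within each class,
    so every fold keeps roughly the original class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_c - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n_c
    return total / len(y_true)


# Hypothetical, heavily imbalanced toy labels (not taken from CSE-CIC-IDS2018).
labels = ["benign"] * 8 + ["attack"] * 2
weights = class_weights(labels)
folds = stratified_folds(labels, 2)
```

In this toy setting the minority class receives the larger weight, and each fold contains both classes in the original 4:1 ratio. A classifier that always predicts the majority class still obtains a weighted average F1-score of about 0.71 here, which suggests why the metric is best read together with per-class results on an imbalanced dataset.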