Predicting All Star player in the National Basketball Association using Random Forest
The National Basketball Association (NBA) All Star Game is a demonstration game played between selected Western and Eastern Conference players. Selection for the NBA All Star Game depends purely on votes: fans and coaches vote for players and decide who makes the All Star roster. A player who continues to receive enough votes in subsequent years will play in more All Star Games. Because selection is based on voting, it is subjective, and there are no selection criteria that remove human bias and opinion. Analyzing historical sports league data can provide insight into the factors that lead to winning games and titles. This study aims to classify NBA players as regular or All Star players and to identify the most important characteristics that make a player an All Star. To accomplish this, performance per minute of play and performance per average of a player's total minutes were analyzed using Random Forest, as supported in Apache Spark's scalable machine learning library, to identify which variables best predict the regular and All Star categories. The study uses the NBA men's basketball dataset, which is publicly available from Open Source Sports and covers the period from 1937 to 2011. Random Forest predicts All Star players with an accuracy of 92.5% when studying performance per average of a player's total minutes, and with an accuracy of 92.48% when studying performance per minute of play. The results identify the important features, which contribute significantly to a player's scoring and performance index rating. In this study, the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology is implemented to address the data mining problem in a consistent and professional way.
CRISP-DM presents a hierarchical and iterative process model and provides an extendable framework with a generic-to-specific approach, starting from six phases that are further detailed by generic and then specialized tasks. © 2017 IEEE.
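The per-minute view of player performance described above can be illustrated with a minimal sketch. It is not the study's code: the field names (points, rebounds, assists), the `SeasonLine` record, and both helper functions are hypothetical, and accuracy is assumed to mean the usual fraction of correctly classified players. The sketch only shows how season totals might be scaled by minutes played before being passed to a classifier such as Random Forest.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class SeasonLine:
    """One player-season of totals. Field names are hypothetical."""
    player: str
    minutes: float
    points: float
    rebounds: float
    assists: float
    all_star: bool  # label: True for All Star, False for regular


def per_minute_features(line: SeasonLine) -> Dict[str, float]:
    """Scale counting statistics by minutes played.

    Players with no recorded minutes get zero rates instead of a
    division-by-zero error (an assumption of this sketch).
    """
    if line.minutes <= 0:
        return {"pts_per_min": 0.0, "reb_per_min": 0.0, "ast_per_min": 0.0}
    return {
        "pts_per_min": line.points / line.minutes,
        "reb_per_min": line.rebounds / line.minutes,
        "ast_per_min": line.assists / line.minutes,
    }


def accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    line = SeasonLine("Player A", 2000.0, 1500.0, 500.0, 400.0, True)
    print(per_minute_features(line))
    print(accuracy([True, False, True, False], [True, False, False, False]))
```

Scaling by minutes keeps players with very different playing time comparable, which is presumably the reason the study examines rate-based performance rather than raw totals.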