Performance of Random Oversampling, Random Undersampling, and SMOTE-NC Methods in Handling Imbalanced Class in Classification Models

Keywords: classification, imbalanced class, random oversampling, random undersampling, SMOTE-NC

Authors

  • Andika Putri Ratnasari, Universitas Negeri Yogyakarta, Faculty of Mathematics and Natural Sciences, Colombo Road, Yogyakarta, Indonesia
Vol. 12 No. 04 (2024)
Mathematics and Statistics
April 29, 2024

A common challenge in classification modeling is class imbalance in the data. If a model is trained on imbalanced classes without correction, it is likely to perform poorly when predicting new data. Several approaches exist to address this imbalance, including random oversampling, random undersampling, and the Synthetic Minority Over-sampling Technique for Nominal and Continuous features (SMOTE-NC); each uses a different mechanism to balance the class distribution in the dataset. A comparison of classification performance on imbalanced data handled by these three methods has not been carried out in previous research. This study therefore evaluates three classification models (Gradient Boosting, Random Forest, and Extremely Randomized Trees) on imbalanced class data balanced with each method. The results show that random undersampling yields the best performance for two of the classification models (Random Forest and Gradient Boosting).
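To illustrate the two random resampling methods compared in the study, the following is a minimal NumPy sketch (an assumption for illustration, not the authors' code or data): random oversampling duplicates minority-class rows until the classes match, while random undersampling discards majority-class rows at random. SMOTE-NC is more involved, since it synthesizes new minority samples while treating nominal columns specially; an implementation is available as `SMOTENC` in the imbalanced-learn library.

```python
# Illustrative sketch only: toy data and helper names are assumptions,
# not taken from the paper.
import numpy as np

def random_oversample(X, y, rng):
    """Duplicate minority-class rows (with replacement) until all classes
    have as many rows as the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        extra = rng.choice(c_idx, size=n_max - len(c_idx), replace=True)
        idx.append(np.concatenate([c_idx, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def random_undersample(X, y, rng):
    """Randomly drop rows so every class keeps only as many rows as the
    smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.where(rng.random(100) < 0.8, 0, 1)  # roughly 4:1 class imbalance

X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
print("oversampled counts:", np.bincount(y_over))
print("undersampled counts:", np.bincount(y_under))
```

After either call, the class counts are equal; the balanced `(X, y)` pair would then be fed to a classifier such as Random Forest.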