Evaluating Student Performance Prediction Using Machine Learning Models

Document Type: Review Article

Authors

1 Department of Information Technology Management, Faculty of Management Technology and Information Systems, Port Said University

2 Department of Information Technology Management, Faculty of Management Technology and Information Systems, Port Said University, Port Said

Abstract

Machine learning plays a crucial role in addressing various challenges in data science. One widely used application is the prediction of outcomes from large educational datasets. This study examines a dataset of 4,424 students with 20 features. Several regression models, including Linear Regression (LR), XGBoost, Support Vector Regression (SVR), Random Forest (RF), and a Stacking Regressor, were developed and compared for predicting students’ GPA on a 0–4 scale. Additionally, classification models such as LR, RF, XGBoost, and Support Vector Machine (SVM) were implemented to categorize students into Dropout, Enrolled, or Graduate groups. Model performance was assessed with evaluation metrics including accuracy, specificity, precision, recall, and F1 score. Furthermore, clustering was performed after dimensionality reduction, using Principal Component Analysis (PCA) on the numerical features and Uniform Manifold Approximation and Projection (UMAP) on the high-dimensional categorical data. Based on the Silhouette Score and the Davies-Bouldin (DB) Index, students were segmented into three clusters, yielding a silhouette score of 0.35. These clusters were analyzed through visualizations of exam score distributions and feature averages. The proposed system demonstrates strong predictive capability, with the most effective model achieving minimal Mean Squared Error (MSE) and high accuracy.
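
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and computes per-class precision, recall, specificity, and F1 in a one-vs-rest fashion, plus overall accuracy and MSE. The function names and the toy labels and GPA values are hypothetical and chosen for illustration; they are not the study's code or data, and the abstract does not state how the authors averaged metrics across the three classes.

```python
from collections import Counter

# Class names taken from the abstract; everything else here is illustrative.
LABELS = ["Dropout", "Enrolled", "Graduate"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """One-vs-rest precision, recall, specificity and F1 for each class."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    total = len(y_true)
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        tn = total - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
        }
    return report


def mean_squared_error(y_true, y_pred):
    """Average squared difference, as used for the GPA regression task."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy predictions (not taken from the study's dataset).
y_true = ["Dropout", "Dropout", "Enrolled", "Graduate", "Graduate", "Graduate"]
y_pred = ["Dropout", "Enrolled", "Enrolled", "Graduate", "Graduate", "Dropout"]

if __name__ == "__main__":
    print("accuracy:", round(accuracy(y_true, y_pred), 3))
    for label, scores in per_class_metrics(y_true, y_pred).items():
        print(label, {k: round(v, 3) for k, v in scores.items()})
    # Hypothetical GPA values on the 0-4 scale.
    print("MSE:", round(mean_squared_error([3.0, 2.5, 4.0], [3.5, 2.5, 3.0]), 3))
```

Specificity is computed per class here because, in a three-class setting, it is only defined once each class is treated as the positive class against the other two.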

Keywords

Main Subjects