Predicting student performance and identifying learning behaviors using decision trees and K-means clustering
Md. Mahadhi Hasan, Md Nakibul Islam, Md Ikramul Haque Nirjon, Md Sharif Uddin, Md. Muntasir Mamun, Zaheed Alam Munna, Al Mahmud Rumman
Abstract
The expansion of higher education in Bangladesh has exposed the lack of a robust mechanism for measuring student performance and learning behavior. This study aims to predict students’ performance and identify distinct learning behaviors in the Bangladeshi higher education context by applying decision trees and K-means clustering. The validity and reliability of the results are supported by 10-fold cross-validation for the decision tree model and Silhouette score assessment for the K-means clustering model, which assess predictive accuracy and cluster separation, respectively. The study uses a dataset of 1,200 student records and examines factors such as attendance (91.22%), exam results (mean 83.54%), completed assignments (mean 80.54%), and age (mean 23.47). It is framed by learning analytics theory, which emphasizes applying data to improve the understanding and effectiveness of learning processes. The decision tree model performed strongly, with precision, recall, and F1-score all at 0.99, indicating high predictive power. K-means clustering grouped the students into three distinct groups: active learners, passive learners, and at-risk students. The findings support the adoption of data mining methods in higher education and underline the importance of identifying at-risk students early. The research contributes to the field of learning analytics and further demonstrates the applicability of data mining methods to predicting academic performance and improving educational outcomes in developing contexts.
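As a rough illustration of the evaluation procedure described in the abstract, the following Python sketch runs 10-fold cross-validation on a decision tree and computes a Silhouette score for a three-cluster K-means model. It assumes scikit-learn and NumPy, which the abstract does not name. The data are randomly generated stand-ins for the four listed features, and the pass/fail label rule, the tree depth, and the feature scaling step are hypothetical choices made only for this example, not the study's dataset or configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1200  # same number of records as reported in the abstract

# Synthetic stand-ins for the features named in the abstract (NOT the study's data).
attendance = rng.uniform(50, 100, n)
exam = rng.uniform(40, 100, n)
assignments = rng.uniform(40, 100, n)
age = rng.integers(18, 30, n).astype(float)
X = np.column_stack([attendance, exam, assignments, age])

# Hypothetical target: 1 if the mean of exam and assignment scores is at least 60.
y = ((exam + assignments) / 2 >= 60).astype(int)

# Decision tree scored with 10-fold cross-validation (F1 per fold).
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
f1_scores = cross_val_score(tree, X, y, cv=10, scoring="f1")

# K-means with three clusters, assessed with the Silhouette score.
# Scaling is an assumption here so that no single feature dominates the distances.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
sil = silhouette_score(X_scaled, labels)

print(f"mean F1 over 10 folds: {f1_scores.mean():.3f}")
print(f"silhouette score (k=3): {sil:.3f}")
```

Because the synthetic features are uniformly distributed, the Silhouette score from this sketch says nothing about the cluster separation obtained in the study; it only shows where each metric comes from.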