Machine Learning for Predictive Analytics on e-Learning Platforms

Unlike traditional brick-and-mortar classrooms, typical e-learning environments, where learning occurs in virtual classrooms without walls, often involve large spatial and temporal gaps between teachers and learners. This is one of the major challenges faced by today's e-learning platforms, particularly massive open online course (MOOC) platforms such as Coursera and edX, which have students from all over the world.

A research team led by Prof. Dit-Yan Yeung, with collaborators from the Massachusetts Institute of Technology (MIT), has been developing advanced learning analytics tools that use machine learning to improve online education. The project is funded by the Innovation and Technology Commission of the Hong Kong Government under the HKUST-MIT Research Alliance Consortium, and the team's approach is centered on personalization. The research mission is to significantly improve the learning effectiveness of e-learning platforms through personalization, without having to scale down the massive size or the high learner-to-teacher ratio of the courses.

Their personalized e-learning tools monitor learners' clickstream events across all online activities, including video lecture viewing, discussion forum participation, assignment submission, and quiz and exam performance. From user profiles that are built and continuously updated, each learner's performance is assessed and predicted against a wide range of performance indicators using ensemble learning and temporal learning models, including recurrent neural networks. The goal is to identify underperforming learners and their areas of underperformance, and to help them improve through two other components of the system: the study plan generator and the AI tutor.
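To make the idea of profile building concrete, here is a minimal Python sketch of how clickstream events might be grouped into per-learner profiles and used to flag learners who may need help. It is an illustration only, not the team's system: the event format, the event type names, and the 0.6 threshold are all assumptions, and a simple average-based rule stands in for the ensemble and recurrent models described above.

```python
from collections import defaultdict

# Hypothetical event records: (learner_id, event_type, value).
# The event types and values are illustrative assumptions.
EVENTS = [
    ("alice", "video_view", 1.0),
    ("alice", "quiz_score", 0.9),
    ("alice", "forum_post", 1.0),
    ("bob", "video_view", 1.0),
    ("bob", "quiz_score", 0.4),
    ("bob", "quiz_score", 0.5),
]


def build_profiles(events):
    """Group event values by learner and then by event type."""
    profiles = defaultdict(lambda: defaultdict(list))
    for learner, event_type, value in events:
        profiles[learner][event_type].append(value)
    return profiles


def mean_quiz_score(profile):
    """Return the learner's average quiz score, or None if no quiz was taken."""
    scores = profile.get("quiz_score", [])
    return sum(scores) / len(scores) if scores else None


def flag_underperformers(profiles, threshold=0.6):
    """Return a sorted list of learners whose mean quiz score is below threshold.

    The threshold rule is a stand-in for a trained predictive model.
    """
    flagged = []
    for learner, profile in profiles.items():
        score = mean_quiz_score(profile)
        if score is not None and score < threshold:
            flagged.append(learner)
    return sorted(flagged)


if __name__ == "__main__":
    print(flag_underperformers(build_profiles(EVENTS)))
```

In a real deployment the profile would be updated as new events arrive, and the flagging step would use a model's predictions rather than a fixed cutoff.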

The team has developed several predictive analytics tools using machine learning techniques for temporal sequence data. These tools predict the grade distribution at the end of a course, analyze the effect of each feature of the machine learning model on the predicted performance, identify learners whose predicted performance drops significantly within a short period of time and zoom in on each of them to understand why, and recommend additional learning activities for improving specific performance indicators.
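As one illustration of the drop-detection idea, the following Python sketch scans a learner's sequence of predicted performance scores and reports where the score falls sharply within a short window. The function name, the window length, and the minimum drop are hypothetical choices made for this example; the article does not describe how the team's tool defines a significant drop.

```python
def find_sharp_drops(predictions, window=2, min_drop=0.2):
    """Find sharp falls in a sequence of predicted performance scores.

    For each time step, compare its score with the highest score among the
    preceding `window` steps. Return (peak_index, current_index) pairs where
    the fall is at least `min_drop`. Both parameters are illustrative.
    """
    drops = []
    for end in range(1, len(predictions)):
        start = max(0, end - window)
        # Index of the highest prediction in the look-back window.
        peak_idx = max(range(start, end), key=lambda i: predictions[i])
        if predictions[peak_idx] - predictions[end] >= min_drop:
            drops.append((peak_idx, end))
    return drops


if __name__ == "__main__":
    # Hypothetical weekly predicted scores for one learner.
    weekly = [0.82, 0.80, 0.79, 0.55, 0.50, 0.52]
    for peak, current in find_sharp_drops(weekly):
        print(f"Drop from week {peak} to week {current}")
```

Each reported pair marks a span of activity worth inspecting to understand what changed for that learner.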

Using similar machine learning techniques for predictive analytics, Prof. Yeung and two of his research team members recently participated in an international educational data mining competition, the 2017 ASSISTments Data Mining Competition. The dataset provided by the competition contains a large amount of data on several hundred students, ranging from detailed interaction data collected on the popular ASSISTments learning platform when they were middle school students to career choice data collected after they graduated from university. The goal was to use the ASSISTments interaction data to predict which students would, a few years later, enter STEM (Science, Technology, Engineering, and Mathematics) graduate school programs and later STEM careers. Of the 74 teams of researchers that participated in the competition, Prof. Yeung's team was awarded first place. They developed a recurrent neural network model augmented with novel regularizers, which address some problems encountered by existing knowledge tracing models, to learn semantically meaningful features related to students' knowledge states for different mathematical skills. They then used these learned features to augment the static features extracted directly from the ASSISTments interaction data, boosting the STEM prediction accuracy. More information about this event can be found in a separate news article.
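The feature augmentation step can be pictured with a short Python sketch. It assumes a knowledge tracing model has already produced one knowledge-state vector per time step for a student, summarizes that sequence by averaging, and appends the result to the student's static features. The averaging step, the function names, and the sample numbers are assumptions made for illustration; the article does not say how the winning team summarized or combined its features.

```python
def mean_pool(knowledge_states):
    """Average per-step knowledge-state vectors into one fixed-length vector.

    `knowledge_states` is a list of equal-length lists, one per time step,
    assumed to come from a separately trained knowledge tracing model.
    """
    if not knowledge_states:
        raise ValueError("need at least one time step")
    n_steps = len(knowledge_states)
    n_skills = len(knowledge_states[0])
    return [
        sum(step[k] for step in knowledge_states) / n_steps
        for k in range(n_skills)
    ]


def augment(static_features, knowledge_states):
    """Concatenate static features with the pooled learned features."""
    return list(static_features) + mean_pool(knowledge_states)


if __name__ == "__main__":
    # Hypothetical values: two static features and three time steps of
    # estimated mastery for two skills.
    static = [12.0, 0.75]
    states = [[0.2, 0.5], [0.4, 0.7], [0.6, 0.9]]
    print(augment(static, states))
```

The combined vector could then be passed to any downstream classifier for the STEM prediction task.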