Removing Learning Barriers in Self-paced Online STEM Education

Self-paced online learning provides great flexibility for learning, yet its very nature introduces some inherent learning barriers. This review paper suggests corresponding strategies to address these barriers in order to create a more supportive self-paced online learning environment. These strategies include a) increasing students' self-awareness of learning, b) identifying struggling or wheel-spinning students, and c) facilitating mastery learning.


Introduction
Self-paced online learning (SPOL) has become an important educational model in higher education, where students can learn anytime, anywhere, and follow their own learning schedules. Such flexibility makes SPOL a vital self-directed educational paradigm, often adopted in adult learning, MOOCs, and lifelong learning. In SPOL, students typically use asynchronous and independent learning approaches, which often demand strong self-directed and self-regulated learning skills. However, because of the online distance and self-paced schedules, SPOL often lacks immediate feedback, communication, and collaboration among students and instructors. Thus, while enjoying the flexibility of SPOL, students may encounter some inherent learning barriers if the learning environment is not carefully designed.
From students' perspective, SPOL usually demands stronger self-directed and self-regulated learning skills than synchronous or face-to-face educational paradigms. However, not every student has an adequate level of such skills (Kinshuk, 2016). For example, some students often wonder how well they have mastered the knowledge, where their learning weaknesses are, when they should ask for help, and whether they are ready for the next topic or examination.
From instructors' perspective, because of limited communication with their students, the individualized learning pace and schedule make it significantly harder for instructors to follow each student's learning progress. Thus, instructors usually have no clue whether a student is stuck on a topic unless the student reaches out for help. Hence, proactive academic intervention from instructors is often absent (Yan, 2020).
From the course design perspective, because of the limited feedback from students, it is difficult to gauge the effectiveness of the learning materials. Although course evaluations are usually embedded to collect students' feedback about a course, their response rates are often relatively low in SPOL, and the results may not pinpoint the design gaps.
These learning barriers often lead students to struggle, perform poorly, or even fail a course. For example, according to a report from Athabasca University (Athabasca.ca, 2020), while most courses delivered at Athabasca University through SPOL have relatively high pass rates, some courses (including those in the STEM disciplines: Science, Technology, Engineering, and Mathematics) still have pass rates as low as 50-60%. Therefore, to improve learning success, educators need to find solutions that remove these barriers in SPOL. Thus, this review paper answers two main questions: a) What strategies can be considered to alleviate these learning barriers? b) What learning design solution has the potential to implement these strategies?
As different disciplines in different educational contexts need different learning approaches and pedagogies, this review paper focuses on Science, Technology, Engineering, and Mathematics (STEM) disciplines delivered through SPOL. STEM disciplines share certain common pedagogical needs, such as STEM-related conceptual development, scientific inquiry, engineering design, and problem-solving (Kennedy & Odell, 2014).
In this review paper, we first identified some strategies for alleviating the learning barriers in SPOL. Then, we proposed a possible solution from the learning design perspective, namely designing adaptive practicing in SPOL courses, to implement these strategies. Finally, we reviewed models and techniques that can be used for adaptive practicing and concluded with a suggestion for a reinforcement learning-based adaptive practicing model.

Some Strategies for Removing the Learning Barriers in SPOL
Educators have suggested many measures to improve learning success from different perspectives, such as providing more administrative, technical, and advising support; increasing communication, motivation, and counselling services; or designing more engaging and effective instructional materials. In this review paper, we mainly investigated strategies from the course design point of view, with the goal of creating a more supportive SPOL learning environment. Focusing on the STEM disciplines and their pedagogical needs (conceptual development, scientific inquiry, and problem-solving), we recognize three critical strategies for removing learning barriers.

Increasing Learning Awareness to Improve Students' Self-directed Skills
Learning awareness refers to how well students know their learning progress and their proficiency level in each knowledge component (KC). In this paper, KCs refer to cognitive units at different levels in a course, such as a concept, a skill, or a topic. With a better picture of their knowledge proficiency (e.g., learning strengths and weaknesses), students can more effectively direct or regulate their learning, including seeking help and managing time. As pointed out by Pelánek (2017), students' meta-cognitive abilities, discussions with instructors, and self-regulated learning can all be enhanced with increased awareness of their knowledge levels.

Identifying Academically Struggling Students to Enhance Proactive Intervention
Due to the difficulty of conceptual development and problem-solving (or a steep learning curve), some students might struggle while learning a course. If the system alerts instructors to such struggling students, they can provide academic intervention proactively before those students become frustrated or give up. One type of struggle is wheel-spinning (or unproductive persistence). Wheel-spinning means that students continuously make great efforts or invest considerable time in learning a concept or skill but without success (Beck & Gong, 2013). Wheel-spinning can be caused by the steep learning curve of a knowledge component, ineffective instructional materials, or students' low prior knowledge or learning approach. Because such students are striving for success (i.e., they are highly motivated to learn), it is imperative for instructors to identify these wheel-spinning students and provide them with proactive academic intervention.

Facilitating Mastery Learning to Improve Success Rates
Given the need for conceptual and skill development in STEM disciplines, mastery learning should be promoted (Bloom, 1973; Guskey, 2010). For example, if students in a course are provided with practicing opportunities on knowledge components geared towards mastery learning, they can substantially improve their performance. Otherwise, some students might barely meet the minimum requirements for passing a course, if not fail it altogether. Thus, facilitating mastery learning is crucial for improving pass rates and grades.

The Theoretical Background for Implementing these Strategies
To implement these strategies, we examined what a practical solution entails by identifying the key learning process-related questions that the solution needs to answer. First, we analyzed a typical learning process in SPOL.
According to knowledge space theory (Doignon & Falmagne, 1999), a field of knowledge is represented by a finite set of KCs. Each KC corresponds to some items (questions, problems, or tasks) for students to answer or solve. Prerequisite relationships usually exist among these knowledge components. A student's knowledge state is the subset of items that this student can solve. The possible knowledge states, restricted by the prerequisite relationships among KCs, form the knowledge space of a field or a course.
Like knowledge states, learning outcomes refer to the competencies students are expected to achieve after completing a topic or course. The revised Bloom's taxonomy is a widely used framework that defines learning outcomes in three major domains: the cognitive, the affective, and the psychomotor (Krathwohl, 2002). In the cognitive domain, learning outcomes are related to acquiring factual, conceptual, procedural, and meta-cognitive knowledge at six levels of cognitive processes. These six levels consist of three lower-order thinking skills (remember, understand, apply) and three higher-order thinking skills (analyze, evaluate, and create). The higher-order thinking skills usually depend on mastery of the lower-order thinking skills (Krathwohl, 2002).
In SPOL, the learning outcomes usually drive course design and guide the learning process. Thus, this paper combines knowledge space theory with the revised Bloom's taxonomy to illustrate the typical learning process. In Figure 1, the knowledge state represents the sum of learning outcomes a student has achieved at a certain point during a topic or course. Because of the prerequisite relationships among knowledge components and the hierarchical relationships among the six levels of cognitive skills, learning outcomes and their corresponding exercise items (questions or tasks) are usually interdependent. As shown in Figure 1, some learning outcomes (LO) are independent (e.g., LO-A, B, C, and D), while others have prerequisite relationships (e.g., LO-E depends on LO-B, and LO-F depends on LO-C and LO-D). Also, exercise items are created to achieve each LO, possibly with different difficulty levels. Based on mastery learning theory and the zone of proximal development (ZPD) theory (Vygotsky, 1997), this research categorizes the proficiency levels that a student achieves as mastered, ZPD (effective practicing), ZPD (wheel-spinning), and non-mastered. When students are in the ZPD, learning happens if they are engaged with slightly challenging exercises while referring to hints, feedback, or remedial materials for each exercise item. In this situation, students are learning effectively, and no human intervention is needed. However, in some cases, students could be experiencing wheel-spinning while learning in the ZPD. For example, as shown in Figure 1, the student is wheel-spinning at LO-E. Although the student has mastered the prerequisite knowledge components in LO-B, they cannot achieve LO-E even after practicing with all available exercise items in the EC-e category. Therefore, at that point, a human instructor should provide academic intervention.
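The prerequisite structure described above can be made concrete with a small sketch. The code is purely illustrative (the paper specifies no implementation); the LO names mirror Figure 1, and the `unlocked` helper is a hypothetical function we introduce:

```python
# Illustrative sketch: learning outcomes (LOs), their prerequisite
# relationships (as in Figure 1), and a gating rule saying an LO is
# ready to practice only once all its prerequisites are mastered.

PREREQS = {
    "LO-A": set(), "LO-B": set(), "LO-C": set(), "LO-D": set(),
    "LO-E": {"LO-B"},           # LO-E depends on LO-B
    "LO-F": {"LO-C", "LO-D"},   # LO-F depends on LO-C and LO-D
}

def unlocked(lo, mastered):
    """True if every prerequisite of `lo` is in the mastered set."""
    return PREREQS[lo] <= mastered

mastered = {"LO-B"}
print(unlocked("LO-E", mastered))  # True
print(unlocked("LO-F", mastered))  # False
```

A student's knowledge state is then simply the set of mastered LOs, and the knowledge space is the set of all such states consistent with the prerequisite map.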
Based on the above analysis, we expect that a practical solution should be able to answer the following questions in order to implement the strategies for removing those learning barriers in SPOL. As these questions arise during the learning process, they are named Learning Process Questions (LP-Q) in this paper for easier reference:

- LP-Q1: What is a student's current knowledge proficiency level?
- LP-Q2: Which exercise should be selected next for effective learning?
- LP-Q3: Which students are academically struggling (e.g., wheel-spinning)?
- LP-Q4: Which learning materials are ineffective?
- LP-Q5: How should exercises be sequenced to promote mastery learning?

Formative Assessment
This paper proposes that systematically designing formative assessment in a SPOL course can answer the five LP-Q questions. As a low-stake evaluation approach, formative assessment is recognized as a fundamental process for learning (Menéndez et al., 2019). Formative assessment has great potential to enhance students' performance by providing knowledge estimation and learning feedback to students (Kingston & Nash, 2011). This indicates that formative assessment could answer LP-Q1 (about detecting students' knowledge proficiency level). Different types of formative assessment can be embedded in online courses, such as self-testing, practicing, reflection, and surveys. For STEM disciplines, two types of formative assessment are often used: self-diagnosis and self-practicing. Both types support learning by providing instructional feedback to students. Self-diagnosis usually provides summary feedback at the end of the assessment (e.g., pointing out all the learning weaknesses). In contrast, self-practicing provides learning hints, immediate and detailed feedback, and remediation along with each item or step (e.g., why the answer is wrong). Thus, self-diagnosis is mainly used to estimate a student's knowledge proficiency, while self-practicing improves a student's knowledge proficiency level during the assessment. Which type of formative assessment is used depends on the purpose, design conditions, and context. For example, self-diagnosis can be used for pre-tests at the beginning of the course, while self-practicing can be used for learning a topic or preparing for an examination.

Potential of Adaptive Practicing
Currently, one-size-fits-all formative assessment is often adopted because it is easy to implement. It delivers a set of pre-determined items to all students. However, because of differences in background knowledge, one student might find some questions too easy while another might find them too difficult. Therefore, with fixed question items, time can be wasted, and students can get bored or frustrated. To address this problem, adaptive assessment can be adopted.
The adaptive formative assessment uses specific intelligence techniques to tailor assessment items to individual students' knowledge or skill level by choosing a subsequent item based on a student's responses to previous items (Weiss & Kingsbury, 1984).Therefore, the most crucial feature of adaptive assessment is to detect a student's knowledge level, especially the learning weakness (Yan, 2020).By this, the adaptive assessment should answer the question LP-Q1 (about detecting students' knowledge proficiency level).Suppose an adaptive assessment mechanism is used in self-diagnosis.In that case, it can choose items with high discrimination values for individual students, thus, reducing the assessment time or improving the accuracy of knowledge estimation with the same number of items (Sorrel et al., 2020).If an adaptive assessment mechanism is used in self-practicing (we call it adaptive practicing in this review paper), adaptive practicing should not only estimate a student's knowledge level efficiently but also trigger more exercises that target a student's weaknesses (Yan et al., 2021).Thus, adaptive practicing can promote mastery learning more effectively (Beck & Gong, 2013) if the adaptive instructional policy is based on the principles of mastery learning (Pelánek, 2017).Therefore, adaptive practicing could also answer LP-Q1 (about detecting students' knowledge proficiency level).In addition, adaptive practicing has the potential to answer LP-Q2 (about selecting the next effective exercise) and LP-Q5 (about sequencing exercises for mastery learning) if appropriately designed.
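The core adaptive loop can be sketched as follows. This is a hedged illustration of one common heuristic (pick the unanswered item whose difficulty is closest to the current ability estimate), not the specific mechanism of any system cited above; all names and values are assumptions:

```python
# Sketch of adaptive item selection: after each response the system
# updates an ability estimate (update step not shown) and then picks
# the unanswered item whose difficulty best matches that estimate.

def next_item(ability, item_difficulties, answered):
    candidates = {i: d for i, d in item_difficulties.items() if i not in answered}
    # A difficulty close to the ability estimate is maximally informative.
    return min(candidates, key=lambda i: abs(candidates[i] - ability))

items = {"q1": -1.0, "q2": 0.0, "q3": 1.0, "q4": 2.0}
print(next_item(0.4, items, answered={"q2"}))  # q3 (difficulty 1.0 is closest)
```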
During the mastery learning process with adaptive practicing, it is also possible to detect wheel-spinning. Insights obtained from knowledge tracing can help identify students who need specific treatment or intervention (Pelánek, 2017). For example, if the system notices that a student has worked on the maximum number of exercises for a learning outcome but still cannot succeed, the student is most likely wheel-spinning and needs help. Therefore, adaptive practicing should be able to answer LP-Q3 (about identifying struggling students).
Wheel-spinning can be caused by individual students' factors (such as low prior knowledge or limited learning ability) or by design factors (such as the steep learning curve of a knowledge component or ineffective instructional materials). This paper assumes that if wheel-spinning happens to only a few individuals learning a knowledge component, it is most likely due to student factors. But if wheel-spinning occurs for most students on the same knowledge component, it is probably due to ineffective instructional materials or other design factors. As suggested by Pelánek (2017), a knowledge tracing function embedded in assessment can help identify problematic items, learning materials, or knowledge structures. Therefore, adaptive practicing with a knowledge tracing function can answer LP-Q4 (about detecting ineffective learning materials).
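This attribution heuristic can be sketched in a few lines. The 50% threshold below is our own assumption for illustration; the paper only distinguishes "a few" from "most" students:

```python
# Hypothetical sketch of the attribution heuristic: wheel-spinning by a
# few students suggests student factors; wheel-spinning by most students
# on the same KC suggests design factors (threshold assumed at 50%).

def diagnose_wheel_spinning(n_spinning, n_learners, design_threshold=0.5):
    rate = n_spinning / n_learners
    return "design factors" if rate >= design_threshold else "student factors"

print(diagnose_wheel_spinning(2, 40))   # student factors
print(diagnose_wheel_spinning(30, 40))  # design factors
```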
To promote mastery learning, there has been an emerging interest in applying certain adaptive learning techniques to sequence learning content or tasks. For example, He-Yueya and Singla (2021) used reinforcement learning to sequence quiz questions. If such sequential decision-making technologies are adopted in adaptive practicing, LP-Q5 (about sequencing exercises for mastery learning) can be answered. Figure 2 illustrates that different exercise sequences have different effects on the learning process in terms of practicing effectiveness. When learning a concept or a skill, practicing can be an effective approach, but different exercise sequences can end up with different learning processes (e.g., LP-1, 2, and 3 in Figure 2). If a selected exercise is too easy or too hard for the student, time is wasted, and the student could feel bored or frustrated (e.g., LP-3). Therefore, the adaptive practicing model should be able to select exercises that effectively facilitate mastery learning. In other words, selecting an exercise sequence that consists of the minimum number of exercises needed to reach mastery is the primary goal of an adaptive practicing model.

Figure 2
Exercise Question Sequencing Optimization
Note. Different exercise selections and sequences affect the mastery learning process.

Therefore, embedding adaptive practicing in courses can accurately estimate students' knowledge states (to answer LP-Q1), select the effective subsequent exercise (to answer LP-Q2), help detect wheel-spinning students (to answer LP-Q3), identify ineffective instructional materials (to answer LP-Q4), and promote mastery learning with the minimum number of exercises (to answer LP-Q5). Given its potential, we argue that systematically designing and embedding adaptive practicing in courses can be a possible solution for removing learning barriers in self-paced STEM courses.

Feature Requirements for the Adaptive Practicing Model
To meet the above functional needs, this literature review identifies some key features that an adaptive practicing model should provide:

Knowledge Tracing
This model should quickly estimate a student's proficiency level, or trace their knowledge level, based on the student's responses to previous exercise items. However, since adaptive practicing is a low-stake evaluation, its demand on the accuracy of knowledge tracing is not as strict as for high-stake summative assessment. Also, the primary goal of adaptive practicing is for students to gain learning while checking hints, feedback, and remediation recommendations. Thus, students' knowledge and skill levels will change during the practice process. Therefore, first and foremost, tracing the changing knowledge level should be the fundamental function of this adaptive practicing model.

Mastery Learning Promotion
To promote mastery learning, the adaptive practicing model should provide students with exercise items that are slightly challenging, meaning the model should always choose the next exercise within a student's ZPD for a KC until the KC is mastered. At the same time, criteria for detecting wheel-spinning should be built into this model.
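These two requirements can be sketched together. The ZPD band and the attempt cap below are illustrative assumptions; the paper prescribes no numeric criteria:

```python
# Sketch: serve items whose estimated success probability falls in an
# assumed ZPD band (slightly challenging), and flag wheel-spinning when
# the attempt cap is reached without mastery.

ZPD_BAND = (0.6, 0.85)   # assumed target success-probability range
MAX_ATTEMPTS = 10        # assumed cap per learning outcome

def in_zpd(p_success):
    low, high = ZPD_BAND
    return low <= p_success <= high

def wheel_spinning(attempts, mastered):
    return attempts >= MAX_ATTEMPTS and not mastered

print(in_zpd(0.7))                # True: slightly challenging
print(wheel_spinning(10, False))  # True: cap reached without mastery
```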

Exercise Sequencing
According to knowledge space theory and Bloom's cognitive process framework, prerequisite relationships usually exist among learning outcomes in a course or topic. Also, because contents are interrelated, a student's knowledge state may be a complicated function of the history of activities done so far for different KCs (in this case, the exercises completed so far) (Doroudi et al., 2019). Therefore, this model should be able to cope with such a complex and uncertain practicing environment to determine the most effective exercise sequence.

Online Machine Learning Approach
Students in SPOL follow different learning paces and learning approaches. In addition, due to students' different prior knowledge and learning abilities, applying a pre-trained model to a new student is often inappropriate. Therefore, an online machine learning approach should be considered, in which data arriving in sequential order are used to update predictions, as opposed to training on an entire data set at once.
To find an appropriate adaptive practicing model that can provide the above features, this paper has reviewed the models and techniques developed for adaptive assessment, as described in the following section.

A Review of Models and Techniques Used for Adaptive Assessment
Among the adaptive assessment models and techniques developed so far, some focus on knowledge diagnosis, and others focus on learning promotion. In either case, knowledge tracing is an essential function of adaptive assessment.
As pointed out by Liu et al. (2021):

Given the sequence of students' learning interactions in online learning systems, knowledge tracing aims to monitor students' changing knowledge states during the learning process and accurately predict their performance on future exercises; this information can be further applied to individualize students' learning schemes in order to maximize their learning efficiency. (p. 3)

The majority of knowledge tracing models can be grouped into several categories, including logistic models, probability models, deep learning-based models, cognitive diagnosis models, and reinforcement learning-based models.

Logistic Models
Based on the logistic function, these models use a mathematical function of student parameters and KC parameters to represent the probability of correctly answering the next question. For example, the item response theory model (Hambleton et al., 1991) and performance factor analysis (Pavlik Jr et al., 2009) were developed in the context of computerized adaptive testing. However, such models are typically based on the assumption that a student's knowledge level is constant, which contradicts the knowledge change that occurs during practicing. Therefore, these models can hardly be used for adaptive practicing.
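For concreteness, the logistic form underlying these models (here the standard two-parameter IRT variant) looks like this; note that the student ability theta is treated as fixed, which is exactly the limitation described above:

```python
import math

# Two-parameter logistic (2PL) IRT: probability of a correct answer
# given ability (theta), item difficulty (b), and discrimination (a).
def p_correct(theta, b, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

print(p_correct(theta=0.0, b=0.0))        # 0.5: ability equals difficulty
print(p_correct(theta=2.0, b=0.0) > 0.8)  # True: easy item for this student
```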
One exception among the logistic models is the Elo rating system (Elo, 1978), which tracks changing knowledge during the assessment. It can accommodate guessing behaviour, individual item difficulties, correlated skills, and partial-credit modelling of answers in the presence of hints, response times, etc. However, the Elo rating system is mainly applied in domains with simple structures (such as declarative knowledge, vocabulary learning, and simple procedural knowledge). Moreover, it needs enough historical data to fit its parameters first (Pelánek, 2017). Therefore, the Elo rating model does not meet the complex needs of the STEM disciplines and the SPOL context of this study.
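An Elo-style update, in contrast to static logistic models, moves the ability estimate after every answer, which is why it can track changing knowledge. The sketch below uses the standard logistic expectation; the sensitivity constant K=0.4 is an assumed value:

```python
import math

# Elo-style update for education: student ability and item difficulty
# shift in opposite directions by the prediction error.
def elo_update(ability, difficulty, correct, K=0.4):
    expected = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    error = (1.0 if correct else 0.0) - expected
    return ability + K * error, difficulty - K * error

ability, difficulty = elo_update(0.0, 0.0, correct=True)
print(ability > 0 and difficulty < 0)  # True: a correct answer raises ability
```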

Probability Models
Probability models treat the learning process as a Markov process, where students' latent knowledge state can be estimated from their observed learning performance (assessment results) (Corbett & Anderson, 1995). The classic and widely used model in this category is Bayesian knowledge tracing (BKT) (Corbett & Anderson, 1995), which uses the Bayesian networks technique and assumes that the learning process is a two-state (learned or unlearned) hidden Markov model (HMM). This model uses four interpretable parameters to estimate a student's knowledge state: the initial learning probability P(L0), learning transition probability P(T), guessing probability P(G), and slipping probability P(S). Although BKT can be used to trace knowledge change, it requires a large amount of prior student data. Additionally, the standard BKT model assumes that KCs are independent of each other, with one set of parameters per KC. Some extended BKT models, including dynamic Bayesian knowledge tracing (Kaser et al., 2017), can model the prerequisite hierarchies and relationships within KCs. But all BKT models require historical data to train a model first and usually do not consider differences between students (e.g., learning ability and prior knowledge). Thus, such models can hardly meet the feature requirements of the adaptive practicing model.
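The BKT update itself is compact: a Bayesian posterior over the "learned" state given the response, followed by the learning transition. The parameter values below are illustrative:

```python
# Standard Bayesian knowledge tracing update for one KC.
def bkt_update(p_learned, correct, p_transit=0.2, p_guess=0.2, p_slip=0.1):
    if correct:
        num = p_learned * (1 - p_slip)
        den = num + (1 - p_learned) * p_guess
    else:
        num = p_learned * p_slip
        den = num + (1 - p_learned) * (1 - p_guess)
    posterior = num / den                       # P(learned | response)
    return posterior + (1 - posterior) * p_transit  # learning transition

p = 0.3                        # P(L0): initial probability of mastery
p = bkt_update(p, correct=True)
print(p > 0.3)                 # True: a correct answer raises the estimate
```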

Deep Learning-based Models
In recent years, research on deep learning-based knowledge tracing models has demonstrated a powerful ability to deal with the complex learning process and has achieved good performance. One typical example is deep knowledge tracing (DKT) (Piech et al., 2015), which utilizes recurrent neural networks or long short-term memory networks to provide a high-dimensional and continuous representation of students' knowledge states. Some variants of DKT can handle the relationships among KCs and other side information. However, deep learning-based models are poorly interpretable due to their end-to-end learning characteristics, which limits their applicability given the crucial significance of interpretability in education. Like BKT, DKT also needs considerable training data to fit the model first. Moreover, although DKT can be used to schedule exercises, it cannot be used to optimize the efficacy of exercises (Bassen et al., 2020). Therefore, deep learning-based models are mainly developed to estimate knowledge levels rather than to promote mastery learning, and DKT models are not appropriate for adaptive practicing.

Cognitive Diagnostic Models
Cognitive diagnostic models (CDMs) have been developed to detect mastery and non-mastery of knowledge or skills based on a Q-matrix (a map of the relationships between assessment items and skills) (de la Torre, 2009). Compared to unidimensional item response models, CDMs can provide a more detailed evaluation, or more granular evidence, of the strengths and weaknesses of students. As such, these models focus on task mastery and guide how students invest effort in competency rather than grade achievement (Shepard et al., 2018). CDMs include a broad family of models, such as DINA, DINO, and GenMa. However, all these models require large data samples to estimate skills effectively. Although CDMs can provide more detailed insights into a student's knowledge state (e.g., where the learning gaps are), they lack the ability to sequence exercises to maximize learning. Therefore, it would be hard to use CDMs in adaptive practicing.
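The core of the DINA model fits in a few lines; the slip and guess values below are assumed for illustration:

```python
# DINA sketch: a Q-matrix maps items to required skills; a student
# answers correctly with probability (1 - slip) only when mastering
# every required skill, and otherwise only by guessing.
def dina_p_correct(skills_mastered, item_skills, slip=0.1, guess=0.2):
    has_all_skills = item_skills <= skills_mastered
    return 1 - slip if has_all_skills else guess

q_matrix = {"item1": {"add"}, "item2": {"add", "multiply"}}
student = {"add"}
print(dina_p_correct(student, q_matrix["item1"]))  # 0.9: skills mastered
print(dina_p_correct(student, q_matrix["item2"]))  # 0.2: can only guess
```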

Reinforcement Learning-based Models
Reinforcement learning (RL) is a computational framework for modelling and automating goal-directed learning and sequential decision-making (Sutton & Barto, 2018). Given the centrality of sequencing learning content or tasks, there has been an emerging interest in applying RL to improve students' performance. For example, He-Yueya and Singla (2021) investigated how an RL-based policy can be used for quizzing students to infer their knowledge state. Bassen et al. (2020) developed a reinforcement scheduling model to maximize learning gains while reducing the time spent on educational activities. As one branch of the RL family, multi-armed bandit algorithms (Berry & Fristedt, 1985) are also starting to attract researchers' attention in the educational world (Lin, 2020). For example, Clement et al. (2015) used multi-armed bandits in an intelligent tutoring system. Therefore, RL-based models hold great potential for adaptive practicing.
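To give a flavour of the bandit idea, here is a UCB1 selection rule framed as choosing among exercise categories. The category names and learning-gain numbers are hypothetical, and this is a generic sketch rather than the specific algorithm of any work cited above:

```python
import math

# UCB1 multi-armed bandit: pick the arm (exercise category) with the
# best estimated learning gain plus an exploration bonus; unexplored
# arms are tried first.
def ucb1_choice(mean_gain, pulls, total_pulls, c=1.4):
    def score(arm):
        if pulls[arm] == 0:
            return float("inf")   # try every arm at least once
        bonus = c * math.sqrt(math.log(total_pulls) / pulls[arm])
        return mean_gain[arm] + bonus
    return max(mean_gain, key=score)

gains = {"EC-a": 0.3, "EC-b": 0.6, "EC-c": 0.0}
print(ucb1_choice(gains, {"EC-a": 5, "EC-b": 5, "EC-c": 0}, 10))  # EC-c first
print(ucb1_choice(gains, {"EC-a": 5, "EC-b": 5, "EC-c": 5}, 15))  # EC-b: best gain
```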

Towards Reinforcement Learning-based Adaptive Practicing Design
After examining these adaptive assessment models, we believe most can be used for adaptive self-diagnosis, which aims to detect a student's static knowledge proficiency level. However, for adaptive practicing, we argue that RL-based models would be the best option, as they can potentially meet all the feature requirements identified in this paper. Reinforcement learning uses a rewarding mechanism to sequence actions and optimize outcomes in an uncertain and changing environment, without pre-populating a model with historical data (Kaelbling et al., 1996).
First, the adaptive practicing system in this study operates in an uncertain and changing environment. The prerequisite and interdependent relationships among KCs make learning a complicated function of previous practicing history. For example, completing exercises successfully for a particular KC could mean that a student has mastered both the current KC and its prerequisite KCs. Additionally, a student's knowledge state across many KCs can change continuously as they complete each exercise item. Moreover, because of students' different learning approaches and prior knowledge, the number and selection of exercises a student needs to master a KC vary. RL-based models can handle such uncertain and changing environments better than other types of models.
Secondly, because of students' different learning backgrounds and learning paces in SPOL, a pre-trained model would hardly work for adaptive practicing in this educational paradigm. Reinforcement learning is an online machine learning technology that does not need historical data to fit a model first. Unlike other models, which typically require historical student data and model training, an RL-based model can make decisions and learn from their outcomes continuously and in real time (Bassen et al., 2020).
Finally, RL aims to use a rewarding mechanism to sequence actions and optimize outcomes. In our case, RL-based models can be used to sequence exercises to promote mastery learning with the minimum number of exercises. Sequential decision-making is an essential advantage of RL-based models over other types of models when they are used for adaptive practicing.
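One simple way to encode this objective is through reward shaping: a small per-exercise cost plus a mastery bonus, so shorter successful sequences earn more total reward. The numeric values below are our assumptions, not from the paper:

```python
# Reward shaping for "mastery with the fewest exercises": each exercise
# costs a little; reaching mastery pays a bonus, so efficient sequences
# accumulate higher total reward.
STEP_COST = -1.0
MASTERY_BONUS = 10.0

def episode_reward(n_exercises, mastered):
    return n_exercises * STEP_COST + (MASTERY_BONUS if mastered else 0.0)

print(episode_reward(4, True))   # 6.0: mastery after 4 exercises
print(episode_reward(8, True))   # 2.0: same mastery, less efficient
```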
We argue that a reinforcement learning-based model is the best candidate for adaptive practicing. Following the reinforcement scheduling approach of Bassen et al. (2020), which leverages reinforcement learning to adaptively select assignments and requires no pre-existing course data or skill labels, we can design an RL model-based adaptive practicing system.

A Typical Use Case Scenario of Adaptive Practicing
Adaptive practicing can be embedded in a course for mastering a topic, preparing for examinations, or passing the whole course. A typical use of adaptive practicing would be preparing students for the mid-term or final examination. As a high-stakes assessment, the examination evaluates a student's knowledge proficiency and motivates students to review and synthesize what they have learned in the previous topics. However, if a student performs poorly on such a high-stakes assessment, it could result in a lower grade or even failing the course. Therefore, students need to get ready for the examination, and this is where adaptive practicing can play a role.
Since an examination usually spans multiple learning topics with many KCs, it would be ideal for students to know which parts still need more effort and how to invest their time across different KCs prior to taking the exam. Through adaptive practicing, students can not only efficiently identify their learning weaknesses or gaps but also effectively fill those gaps. Furthermore, with the alert function of adaptive practicing, a student can self-determine whether they are experiencing wheel-spinning.

Conclusion
This review paper identified the inherent learning barriers in self-paced online learning, such as the high demand for self-directed learning skills, low learning awareness, lack of proactive academic support, lack of student feedback, and poor academic performance. Focusing on STEM disciplines delivered in SPOL, we suggested three essential strategies for alleviating those barriers: increasing students' self-awareness of learning, identifying struggling or wheel-spinning students, and facilitating mastery learning. We argued that systematically designing adaptive practicing in STEM courses could be an effective solution for implementing these strategies. This paper then depicted a typical SPOL learning process according to knowledge space theory and Bloom's taxonomy of learning. From there, a few key questions to be answered by the adaptive practicing system were determined, and based on these questions, the feature requirements for the adaptive practicing model were identified. After reviewing the models and techniques used for adaptive assessment, we argue that a reinforcement learning-based model would be the best option for adaptive practicing. We hope that this preliminary review can help educators understand the challenges of SPOL and the potential of adaptive practicing design to address these challenges in STEM disciplines.

Future Work
At this point, we have advocated several strategies for addressing the learning barriers in SPOL through adaptive practicing, which we argue is a promising, albeit partial, solution. However, using a reinforcement learning model for adaptive practicing still needs more research. As Clement et al. (2015) pointed out, most RL models only consider the correctness of answers for knowledge tracing. However, other aspects of information could be valuable, such as response time, clue clicks, learner feedback, etc. This is especially true in the case of adaptive practicing, because answer correctness alone does not usually tell whether an exercise is effective or within a student's ZPD.
Based on the potential of the RL-based model for adaptive practicing and the identified research gap, we outline some future work and the methodology for further research, mainly for our own study: (1) stage 1: designing an RL-based adaptive practicing model that embeds an algorithm considering not only answer correctness but also some vital side information; (2) stage 2: creating a prototype adaptive practicing system and simulating the model; (3) stage 3: conducting a comparison experiment with some real-world courses; and (4) stage 4: validating our hypothesis about the effectiveness of the solution and model through learning data and a survey.

Figure 1
A Typical Learning Process