Publications
2024
- Momentary measures of emotions during technology-enhanced learning prospectively predict standardized test scores in two large samples
  D’Mello, Sidney K., Moulder, Robert G., and Jensen, Emily
  Learning and Instruction, Apr 2024
- Large Language Models Enable Automated Formative Feedback in Human-Robot Interaction Tasks
  Jensen, Emily, Sankaranarayanan, Sriram, and Hayes, Bradley
  In Human-Large Language Model Interaction workshop at the 2024 ACM/IEEE International Conference on Human-Robot Interaction, Mar 2024
Representing knowledge and assessing someone’s ability in an HRI task is difficult due to complex objectives and high variability in human performance. In previous work, we began to address this question by breaking down HRI tasks into objective primitives that can be combined sequentially and concurrently (e.g., maintain slow speed and reach waypoints). We then showed that signal temporal logic specifications, paired with a robustness metric, are a useful tool for assessing performance along each primitive. These formal methods allow designers to precisely represent ideal trajectories. This formulation admits explainability, as one can identify and elaborate upon specific objectives that learners did not accomplish. We claim that LLMs can be paired with formal analysis methods to provide accessible, relevant feedback for HRI tasks. While logic specifications are useful for defining and assessing a task, these representations are not easily interpreted by non-experts. Luckily, LLMs are adept at generating easy-to-understand text that explains difficult concepts. By integrating task assessment outcomes and other contextual information into an LLM prompt, we can effectively synthesize a useful set of recommendations for the learner to improve their performance.
- HRI Curriculum for a Liberal Arts Education
  Wilson, Jason R., and Jensen, Emily
  In Designing an Intro to HRI Course Workshop at the 2024 ACM/IEEE International Conference on Human-Robot Interaction, Mar 2024
In this paper, we discuss the opportunities and challenges of teaching a human-robot interaction course at an undergraduate liberal arts college. We provide a sample syllabus adapted from a previous version of a course.
- A comparison of head movement classification methods
  Callahan-Flintoft, Chloe, Jensen, Emily, Naeem, Jasim, Nonte, Michael W., Madison, Anna M., and Ries, Anthony J.
  Sensors, Feb 2024
To understand human behavior, it is essential to study it in the context of natural movement in immersive, three-dimensional environments. Virtual reality (VR), with head-mounted displays, offers an unprecedented compromise between ecological validity and experimental control. However, such technological advancements mean that new data streams will become more widely available, and therefore, a need arises to standardize methodologies by which these streams are analyzed. One such data stream is that of head position and rotation tracking, now made easily available from head-mounted systems. The current study presents five candidate algorithms of varying complexity for classifying head movements. Each algorithm is compared against human rater classifications and graded based on the overall agreement as well as biases in metrics such as movement onset/offset time and movement amplitude. Finally, we conclude this article by offering recommendations for the best practices and considerations for VR researchers looking to incorporate head movement analysis in their future studies.
- Temporal Behavior Trees: Robustness and Segmentation
  Schirmer, Sebastian, Singh, Jasdeep, Jensen, Emily, Dauer, Johann C., Finkbeiner, Bernd, and Sankaranarayanan, Sriram
  In Proceedings of the 2024 ACM International Conference on Hybrid Systems: Computation and Control, May 2024
This paper presents temporal behavior trees (TBT), a specification formalism inspired by behavior trees that are commonly used to program robotic applications. We then introduce the concept of trace segmentation, wherein given a TBT specification and a trace, we split the trace optimally into sub-traces that are associated with various portions of the TBT specification. Segmentation of a trace then serves to explain precisely how a trace satisfies or violates a specification, and which portions of a specification are actually violated. We introduce the syntax and semantics of TBT and compare their expressiveness in relation to temporal logic. Next, we define robustness semantics for TBT specifications with respect to a trace. Rather than a Boolean interpretation, the robustness provides a real-valued numerical outcome that quantifies how close or far away a trace is from satisfying or violating a TBT specification. We show that computing the robustness of a trace also segments it into sub-traces. Finally, we provide efficient approximations for computing robustness and segmentation for long traces with guarantees on the result. We demonstrate how segmentations are useful through applications such as understanding how novice users pilot an aerial vehicle through a sequence of waypoints in desktop experiments and the offline monitoring of an automated lander for a drone on a ship. Our case studies demonstrate how TBT specification and segmentation can be used to understand and interpret complex behaviors of humans and automation in cyber-physical systems.
2023
- More Than a Number: A Multi-dimensional Framework For Automatically Assessing Human Teleoperation Skill
  Jensen, Emily, Hayes, Bradley, and Sankaranarayanan, Sriram
  In Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, Mar 2023
We present a framework for the formal evaluation of human teleoperator skill level in a systematic fashion, aiming to quantify how skillful a particular operator is for a well-defined task. Our proposed framework has two parts. First, the tasks used to evaluate skill levels are decomposed into a series of domain-specific primitives, each with a formal specification using signal temporal logic. Second, skill levels are automatically evaluated along multiple dimensions rather than as a single number. These dimensions include robustness, efficiency, resilience, and readiness for each primitive task. We provide an initial evaluation for the task of taking off, hovering, and landing in a drone simulator. This preliminary evaluation shows the value of a multi-dimensional evaluation of human operator performance.
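As a rough illustration of the robustness dimension mentioned in the abstract above, the sketch below scores two task primitives with the standard quantitative semantics of signal temporal logic: "always" takes the worst-case margin, "eventually" takes the best-case margin, and a conjunction takes the minimum. The function names, the trace values, and the thresholds are hypothetical and are not taken from the paper; this is a minimal sketch, not the authors' implementation.

```python
from typing import Sequence


def always_at_most(signal: Sequence[float], limit: float) -> float:
    """Robustness of 'always (signal <= limit)': the worst-case margin."""
    return min(limit - value for value in signal)


def eventually_near(signal: Sequence[float], target: float, tol: float) -> float:
    """Robustness of 'eventually (|signal - target| <= tol)': the best-case margin."""
    return max(tol - abs(value - target) for value in signal)


def conjunction(*scores: float) -> float:
    """Robustness of a conjunction is the minimum of its parts."""
    return min(scores)


# Hypothetical trace of a short take-off (values chosen for illustration only).
speed = [0.5, 1.25, 0.75, 0.25]    # m/s
altitude = [0.0, 1.5, 2.0, 2.25]   # m

slow_speed = always_at_most(speed, limit=1.5)
reach_hover = eventually_near(altitude, target=2.0, tol=0.5)
overall = conjunction(slow_speed, reach_hover)
print(slow_speed, reach_hover, overall)
```

A positive score means the primitive was satisfied with that much margin; a negative score means it was violated by that amount, which is what makes a per-primitive breakdown possible.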
2022
- Emotional Learning Analytics
  D’Mello, Sidney K., and Jensen, Emily
  Mar 2022
This chapter discusses the ubiquity and importance of emotion to learning. It argues substantial progress can be made by coupling discovery-oriented, data-driven, analytic methods of learning analytics and educational data mining with theoretical advances and methodologies from the affective and learning sciences. Core, emerging, and future themes of research at the intersection of these areas are discussed.
- A novel video recommendation system for algebra: An effectiveness evaluation study
  Leite, Walter L., Roy, Samrat, Chakraborty, Nilanjana, Michailidis, George, Huggins-Manley, A. Corinne, D’Mello, Sidney K., Faradonbeh, Mohammad Kazem Shirani, Jensen, Emily, Kuang, Huan, and Jing, Zeyuan
  In LAK22: 12th International Learning Analytics and Knowledge Conference (LAK22), Mar 2022
This study presents a novel video recommendation system for an algebra virtual learning environment (VLE) that leverages ideas and methods from engagement measurement, item response theory, and reinforcement learning. Following Vygotsky’s Zone of Proximal Development (ZPD) theory, but considering low-affect and high-affect students separately, we developed a system of five categories of video recommendations: 1) Watch new video; 2) Review current topic video with a new tutor; 3) Review segment of current video with current tutor; 4) Review segment of current video with a new tutor; 5) Watch next video in curriculum sequence. The category of recommendation was determined by student scores on a quiz and a sensor-free engagement detection model. New video recommendations (i.e., category 1) were selected based on a novel reinforcement learning algorithm that takes input from an item response theory model. The recommendation system was evaluated in a large field experiment, both before and after school closures due to the COVID-19 pandemic. The results show evidence of effectiveness of the video recommendation algorithm during the period of normal school operations, but the effect disappears after school closures. Implications for teacher orchestration of technology for normal classroom use and periods of school closure are discussed.
- Mathematical Models of Human Drivers Using Artificial Risk Fields
  Jensen, Emily, Luster, Maya, Yoon, Hansol, Pitts, Brandon, and Sankaranarayanan, Sriram
  In Intelligent Transportation Systems Conference, Oct 2022
In this paper, we use the concept of artificial risk fields to predict how human operators control a vehicle in response to upcoming road situations. A risk field assigns a non-negative risk measure to the state of the system in order to model how close that state is to violating a safety property, such as hitting an obstacle or exiting the road. Using risk fields, we construct a stochastic model of the operator that maps from states to likely actions. We demonstrate our approach on a driving task wherein human subjects are asked to drive a car inside a realistic driving simulator while avoiding obstacles placed on the road. We show that the most likely risk field given the driving data is obtained by solving a convex optimization problem. Next, we apply the inferred risk fields to generate distinct driving behaviors while comparing predicted trajectories against ground truth measurements. We observe that the risk fields are excellent at predicting future trajectory distributions, with high prediction accuracy for prediction horizons of up to twenty seconds. At the same time, we observe some challenges, such as the inability to account for how drivers choose to accelerate/decelerate based on the road conditions.
- Using Artificial Potential Fields To Model Driver Situational Awareness
  Jensen, Emily, Luster, Maya, Pitts, Brandon, and Sankaranarayanan, Sriram
  In 4th IFAC Workshop on Cyber-Physical Human-Systems, Dec 2022
Recently, the use of artificial potential fields, known as risk fields, has been proposed for modeling human driver decision making. Such potential fields map from vehicle states and control inputs to a numerical risk measure such that the probability of choosing a control decreases as the risk associated increases. In this paper, we show that such a model can be used in a natural manner to also capture aspects of the driver’s situational awareness, assuming that the risk fields govern their underlying behavior. We demonstrate our ideas on a specific obstacle avoidance scenario wherein obstacles to be avoided are placed in front of a driver at predictable intervals. Using data collected in a pilot experiment involving six different drivers using a high-fidelity driving simulator, we demonstrate the ability of our approach to capture the likelihood that the driver has perceived/reacted to the obstacle. Our approach works for scenarios when the driver collides with the obstacle as well as scenarios involving successful collision avoidance.
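The abstract above states that the probability of choosing a control decreases as its associated risk increases. One common way to realize such a relationship is an exponential (Boltzmann-style) choice model; the sketch below assumes that form purely for illustration. The exponential shape, the `beta` parameter, and the example risk values are assumptions and are not confirmed by the abstracts.

```python
import math
from typing import List, Sequence


def control_probabilities(risks: Sequence[float], beta: float = 1.0) -> List[float]:
    """Map per-control risk values to choice probabilities.

    Assumed model: p(u) is proportional to exp(-beta * risk(u)), so a
    higher risk always yields a lower probability. `beta` (hypothetical)
    controls how strongly the operator avoids risky controls.
    """
    lowest = min(risks)  # subtracting the minimum keeps the exponentials stable
    weights = [math.exp(-beta * (risk - lowest)) for risk in risks]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical risks for three candidate steering commands.
probabilities = control_probabilities([0.0, 1.0, 3.0])
print(probabilities)
```

With this form, equal risks give equal probabilities, and increasing `beta` concentrates probability on the lowest-risk control.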
2021
- What You Do Predicts How You Do: Prospectively Modeling Student Quiz Performance Using Activity Features in an Online Learning Environment
  Jensen, Emily, Umada, Tetsumichi, Hunkins, Nicholas C., D’Mello, Sidney K., Hutt, Stephen, and Huggins-Manley, A. Corinne
  In LAK21: 11th International Learning Analytics and Knowledge Conference (LAK21), Dec 2021
Students using online learning environments need to effectively self-regulate their learning. However, with an absence of teacher-provided structure, students often resort to less effective, passive learning strategies versus constructive ones. We consider the potential benefits of interventions that promote retrieval practice – retrieving learned content from memory – which is an effective strategy for learning and retention. The goal is to nudge students towards completing short, formative quizzes when they are likely to succeed on those assessments. Towards this goal, we developed a machine-learning model using data from 32,685 students who used an online mathematics platform over an entire school year to prospectively predict scores on three-item assessments (N=210,020) from interaction patterns up to 9 minutes before the assessment as well as Item Response Theory (IRT) estimates of student ability and quiz difficulty. These models achieved a student-independent correlation of 0.55 between predicted and actual scores on the assessments and outperformed IRT-only predictions (r=0.34). Model performance was largely independent of the length of the analyzed window preceding a quiz. We discuss potential for future applications of the models to trigger dynamic interventions that aim to encourage students to engage with formative assessments rather than more passive learning strategies.
- A Deep Transfer Learning Approach to Modeling Teacher Discourse in the Classroom
  Jensen, Emily, Pugh, Samuel L., and D’Mello, Sidney K.
  In LAK21: 11th International Learning Analytics and Knowledge Conference (LAK21), Dec 2021
Teachers, like everyone else, need objective, reliable feedback in order to improve their effectiveness. However, developing a system for automated teacher feedback entails many decisions regarding data collection procedures, automated analysis, and presentation of feedback for reflection. We address the latter two questions by comparing two different machine learning approaches to automatically model seven features of teacher discourse (e.g., use of questions, elaborated evaluations). We compared a traditional open-vocabulary approach using n-grams and Random Forest classifiers with a state-of-the-art deep transfer learning approach for natural language processing (BERT). We found a tradeoff between data quantity and accuracy, where deep models had an advantage on larger datasets, but not for smaller datasets, particularly for variables with low incidence rates. We also compared the models based on the level of feedback granularity: utterance-level (e.g., whether an utterance is a question or a statement), class session-level proportions by averaging across utterances (e.g., question incidence score of 48%), and session-level ordinal feedback based on pre-determined thresholds (e.g., question asking score is medium [vs. low or high]) and found that BERT generally provided more accurate feedback at all levels of granularity. Thus, BERT appears to be the most viable approach to providing automatic feedback on teacher discourse provided there is sufficient data to fine-tune the model.
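The three feedback granularities described in the abstract above can be illustrated with a small sketch: utterance-level binary labels are averaged into a session-level proportion, which is then binned into an ordinal label using fixed thresholds. The threshold values (`low_cut`, `high_cut`) and the example labels are hypothetical; the abstract does not report the thresholds actually used.

```python
from typing import Sequence


def session_proportion(utterance_labels: Sequence[int]) -> float:
    """Average utterance-level labels (1 = question, 0 = statement)."""
    return sum(utterance_labels) / len(utterance_labels)


def ordinal_feedback(proportion: float,
                     low_cut: float = 0.2,
                     high_cut: float = 0.5) -> str:
    """Bin a session-level proportion into low / medium / high.

    The cut points are placeholder values chosen for illustration.
    """
    if proportion < low_cut:
        return "low"
    if proportion < high_cut:
        return "medium"
    return "high"


# Hypothetical utterance-level predictions for one class session.
labels = [1, 0, 0, 0]
proportion = session_proportion(labels)
print(proportion, ordinal_feedback(proportion))
```

Because the ordinal label depends only on which side of a threshold the proportion falls, small utterance-level errors can cancel out at the session level, or flip the label when the proportion sits near a cut point.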
2020
- Toward Automated Feedback on Teacher Discourse to Enhance Teacher Learning
  Jensen, Emily, Dale, Meghan, Donnelly, Patrick J., Stone, Cathlyn, Kelly, Sean, Godley, Amanda, and D’Mello, Sidney K.
  In 2020 CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2020), Dec 2020
Like anyone, teachers need feedback to improve. Due to the high cost of human classroom observation, teachers receive infrequent feedback which is often more focused on evaluating performance than on improving practice. To address this critical barrier to teacher learning, we aim to provide teachers with detailed and actionable automated feedback. Towards this end, we developed an approach that enables teachers to easily record high-quality audio from their classes. Using this approach, teachers recorded 142 classroom sessions, of which 127 (89%) were usable. Next, we used speech recognition and machine learning to develop teacher-generalizable computer-scored estimates of key dimensions of teacher discourse. We found that automated models were moderately accurate when compared to human coders and that speech recognition errors did not influence performance. We conclude that authentic teacher discourse can be recorded and analyzed for automatic feedback. Our next step is to incorporate the automatic models into an interactive visualization tool that will provide teachers with objective feedback on the quality of their discourse.
2019
- Generalizability of Sensor-Free Affect Detection Models in a Longitudinal Dataset of Tens of Thousands of Students
  Jensen, Emily, Hutt, Stephen, and D’Mello, Sidney K.
  In The 12th International Conference on Educational Data Mining, Dec 2019
Recent work in predictive modeling has called for increased scrutiny of how models generalize between different populations within the training data. Using interaction data from 69,174 students who used an online mathematics platform over an entire school year, we trained a sensor-free affect detection model and studied its generalizability to clusters of students based on typical platform use and demographic features. We show that models trained on one group perform similarly well when tested on the other groups, although there was a small advantage obtained by training individual subpopulation models compared to a general (all-population) model. Lastly, we perform a series of simulations to show how generalizability is affected by sample size. These results agree with our initial analysis that individual subpopulation models yield a small advantage over all-population models. Additionally, we show that training sizes smaller than 1,500 yield unstable models which make generalizability difficult to interpret. We discuss applications of this work in the context of developing large-scale affect detection models for diverse populations.
2018
- Classification of Rail Switch Data Using Machine Learning Techniques
  Bryan, Kaylen J, Solomon, Mitchell, Jensen, Emily, Coley, Christina, Rajan, Kailas, Tian, Charlie, Mijatovic, Nenad, Kiss, James M, Lamoureux, Benjamin, Dersin, Pierre, Smith, Anthony O, and Peter, Adrian M
  In Proceedings of the 2018 Joint Rail Conference, Dec 2018
Rail switches are critical infrastructure components of a railroad network that must maintain high levels of reliable operation. Given the vast number and variety of switches that can exist across a rail network, there is an immediate need for robust automated methods of detecting switch degradations and failures without expensive add-on equipment. In this work, we explore two recent machine learning frameworks for classifying various switch degradation indicators: (1) a featureless recurrent neural network called a Long Short-Term Memory (LSTM) architecture, and (2) the Deep Wavelet Scattering Transform (DWST), which produces features that are locally time invariant and stable to time-warping deformations. We describe both methods as they apply to rail switch monitoring and demonstrate their feasibility on a dataset captured under service conditions by Alstom Corporation. For multiple categories of degradation types, the baseline models consistently achieve near-perfect accuracies and are competitive with the manual analysis conducted by human switch-maintenance experts.