Publications | Emily Jensen

2024

Momentary measures of emotions during technology-enhanced learning prospectively predict standardized test scores in two large samples

D’Mello, Sidney K., Moulder, Robert G., and Jensen, Emily

Learning and Instruction Apr 2024

Large Language Models Enable Automated Formative Feedback in Human-Robot Interaction Tasks

Jensen, Emily, Sankaranarayanan, Sriram, and Hayes, Bradley

In Human-Large Language Model Interaction workshop at the 2024 ACM/IEEE International Conference on Human-Robot Interaction Mar 2024

Representing knowledge and assessing someone’s ability in an HRI task is difficult due to complex objectives and high variability in human performance. In previous work, we began to address this challenge by breaking down HRI tasks into objective primitives that can be combined sequentially and concurrently (e.g., maintain slow speed and reach waypoints). We then showed that signal temporal logic specifications, paired with a robustness metric, are a useful tool for assessing performance along each primitive. These formal methods allow designers to precisely represent ideal trajectories. This formulation admits explainability, as one can identify and elaborate upon specific objectives that learners did not accomplish. We claim that LLMs can be paired with formal analysis methods to provide accessible, relevant feedback for HRI tasks. While logic specifications are useful for defining and assessing a task, these representations are not easily interpreted by non-experts. Fortunately, LLMs are adept at generating easy-to-understand text that explains difficult concepts. By integrating task assessment outcomes and other contextual information into an LLM prompt, we can effectively synthesize a useful set of recommendations for the learner to improve their performance.
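
As a rough illustration of how a robustness metric can score individual primitives, the Python sketch below applies the standard discrete-time quantitative semantics of signal temporal logic to the two example primitives: "maintain slow speed" as an always-formula (worst-case margin over the trace) and "reach waypoints" as a conjunction of eventually-formulas (best-case margin per waypoint). The function names, speed limit, tolerance, and sample trace are hypothetical, and this is not the implementation used in the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def rob_always_below(signal: Sequence[float], limit: float) -> float:
    """Robustness of G(x < limit): the smallest margin limit - x over the trace.

    Positive: the bound held at every sample. Negative: it was violated,
    and the magnitude is the size of the worst violation.
    """
    return min(limit - x for x in signal)


def rob_eventually_near(path: Sequence[Point], target: Point, tol: float) -> float:
    """Robustness of F(dist(p, target) < tol): the largest margin over the trace."""
    return max(tol - math.dist(p, target) for p in path)


def rob_reach_all(path: Sequence[Point], waypoints: Sequence[Point], tol: float) -> float:
    """Conjunction of 'eventually reach w' over all waypoints (min of the parts).

    Visit order is ignored here; an ordered sequence would need a richer formula.
    """
    return min(rob_eventually_near(path, w, tol) for w in waypoints)


if __name__ == "__main__":
    # Hypothetical trace: speed samples and (x, y) positions at the same times.
    speeds = [0.8, 1.1, 1.4, 0.9]
    path = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, 1.0)]
    slow = rob_always_below(speeds, limit=1.5)
    reach = rob_reach_all(path, [(2.0, 1.0), (3.0, 0.0)], tol=0.5)
    # The overall task is a conjunction of primitives, so its robustness is the min.
    print(f"slow speed: {slow:.2f}, reach waypoints: {reach:.2f}, task: {min(slow, reach):.2f}")
```

Here the speed primitive is satisfied with a small margin (0.10) while the waypoint primitive fails (-0.50, because (3.0, 0.0) is never approached within the tolerance). Per-primitive scores like these are what make it possible to tell a learner which specific objective was missed.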

HRI Curriculum for a Liberal Arts Education

Wilson, Jason R., and Jensen, Emily

In Designing an Intro to HRI Course Workshop at the 2024 ACM/IEEE International Conference on Human-Robot Interaction Mar 2024

In this paper, we discuss the opportunities and challenges of teaching a human-robot interaction course at an undergraduate liberal arts college. We provide a sample syllabus adapted from a previous version of a course.

A comparison of head movement classification methods

Callahan-Flintoft, Chloe, Jensen, Emily, Naeem, Jasim, Nonte, Michael W., Madison, Anna M., and Ries, Anthony J.

Sensors Feb 2024

To understand human behavior, it is essential to study it in the context of natural movement in immersive, three-dimensional environments. Virtual reality (VR), with head-mounted displays, offers an unprecedented compromise between ecological validity and experimental control. However, such technological advancements mean that new data streams will become more widely available, creating a need to standardize the methodologies by which these streams are analyzed. One such data stream is head position and rotation tracking, now easily available from head-mounted systems. The current study presents five candidate algorithms of varying complexity for classifying head movements. Each algorithm is compared against human rater classifications and graded on overall agreement as well as on biases in metrics such as movement onset/offset time and movement amplitude. Finally, we offer recommendations on best practices and considerations for VR researchers looking to incorporate head movement analysis in their future studies.

Temporal Behavior Trees: Robustness and Segmentation

Schirmer, Sebastian, Singh, Jasdeep, Jensen, Emily, Dauer, Johann C., Finkbeiner, Bernd, and Sankaranarayanan, Sriram

In Proceedings of the 2024 ACM International Conference on Hybrid Systems: Computation and Control May 2024

This paper presents temporal behavior trees (TBT), a specification formalism inspired by behavior trees that are commonly used to program robotic applications. We then introduce the concept of trace segmentation, wherein, given a TBT specification and a trace, we split the trace optimally into sub-traces that are associated with various portions of the TBT specification. Segmentation of a trace then serves to explain precisely how a trace satisfies or violates a specification, and which portions of a specification are actually violated. We introduce the syntax and semantics of TBT and compare its expressiveness to that of temporal logic. Next, we define robustness semantics for TBT specifications with respect to a trace. Rather than a Boolean interpretation, the robustness provides a real-valued numerical outcome that quantifies how close or far away a trace is from satisfying or violating a TBT specification. We show that computing the robustness of a trace also segments it into sub-traces. Finally, we provide efficient approximations for computing robustness and segmentation for long traces with guarantees on the result. We demonstrate how segmentations are useful through applications such as understanding how novice users pilot an aerial vehicle through a sequence of waypoints in desktop experiments and the offline monitoring of an automated lander for a drone on a ship. Our case studies demonstrate how TBT specifications and segmentation can be used to understand and interpret complex behaviors of humans and automation in cyber-physical systems.
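
The idea that computing robustness also segments a trace can be illustrated with a simple sequential operator. The Python sketch below assumes a "first, then second" semantics in which the trace is split at some index k, each part is scored by its own sub-specification, and the robustness is the best split's minimum score; the maximizing k is the segmentation. This brute-force search over every split point is an assumption for illustration only, not the TBT semantics or the efficient approximation from the paper, and the helper names and 1-D trace are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Robustness = Callable[[Sequence[float]], float]


def eventually_near(target: float, tol: float) -> Robustness:
    """Robustness of 'eventually |x - target| < tol' on a 1-D trace."""
    return lambda trace: max(tol - abs(x - target) for x in trace)


def sequence_robustness(
    trace: Sequence[float], first: Robustness, second: Robustness
) -> Tuple[float, Optional[int]]:
    """Score 'first, then second' by trying every split of the trace.

    Returns (robustness, k): trace[:k] is the segment assigned to `first` and
    trace[k:] the segment assigned to `second`; both segments are non-empty.
    Returns (-inf, None) if the trace has fewer than two samples.
    """
    best, best_k = -math.inf, None
    for k in range(1, len(trace)):
        score = min(first(trace[:k]), second(trace[k:]))
        if score > best:  # strict '>' keeps the earliest split among ties
            best, best_k = score, k
    return best, best_k


if __name__ == "__main__":
    # Hypothetical 1-D position trace: fly out to 3.0, then return to 0.0.
    trace: List[float] = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
    rob, k = sequence_robustness(trace, eventually_near(3.0, 0.5), eventually_near(0.0, 0.5))
    print(rob, trace[:k], trace[k:])
```

Note the tie-breaking: every split with k from 4 to 6 achieves a robustness of 0.5 on this trace, so a practical implementation needs an explicit rule for choosing among equally robust segmentations.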

2023

More Than a Number: A Multi-dimensional Framework For Automatically Assessing Human Teleoperation Skill

Jensen, Emily, Hayes, Bradley, and Sankaranarayanan, Sriram

In Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction Mar 2023

We present a framework for the formal evaluation of human teleoperator skill level in a systematic fashion, aiming to quantify how skillful a particular operator is for a well-defined task. Our proposed framework has two parts. First, the tasks used to evaluate skill levels are decomposed into a series of domain-specific primitives, each with a formal specification using signal temporal logic. Second, skill levels are automatically evaluated along multiple dimensions rather than as a single number. These dimensions include robustness, efficiency, resilience, and readiness for each primitive task. We provide an initial evaluation for the task of taking off, hovering, and landing in a drone simulator. This preliminary evaluation shows the value of a multi-dimensional evaluation of human operator performance.

2022

Emotional Learning Analytics

D’Mello, Sidney K., and Jensen, Emily

Mar 2022

This chapter discusses the ubiquity and importance of emotion to learning. It argues that substantial progress can be made by coupling the discovery-oriented, data-driven, analytic methods of learning analytics and educational data mining with theoretical advances and methodologies from the affective and learning sciences. Core, emerging, and future themes of research at the intersection of these areas are discussed.

A novel video recommendation system for algebra: An effectiveness evaluation study

Leite, Walter L., Roy, Samrat, Chakraborty, Nilanjana, Michailidis, George, Huggins-Manley, A. Corinne, D’Mello, Sidney K., Faradonbeh, Mohammad Kazem Shirani, Jensen, Emily, Kuang, Huan, and Jing, Zeyuan

In LAK22: 12th International Learning Analytics and Knowledge Conference (LAK22) Mar 2022

This study presents a novel video recommendation system for an algebra virtual learning environment (VLE) that leverages ideas and methods from engagement measurement, item response theory, and reinforcement learning. Following Vygotsky’s Zone of Proximal Development (ZPD) theory, but considering low-affect and high-affect students separately, we developed a system of five categories of video recommendations: 1) Watch new video; 2) Review current topic video with a new tutor; 3) Review segment of current video with current tutor; 4) Review segment of current video with a new tutor; 5) Watch next video in curriculum sequence. The category of recommendation was determined by student scores on a quiz and a sensor-free engagement detection model. New video recommendations (i.e., category 1) were selected based on a novel reinforcement learning algorithm that takes input from an item response theory model. The recommendation system was evaluated in a large field experiment, both before and after school closures due to the COVID-19 pandemic. The results show evidence that the video recommendation algorithm was effective during the period of normal school operations, but the effect disappeared after school closures. Implications for teacher orchestration of technology for normal classroom use and periods of school closure are discussed.

Mathematical Models of Human Drivers Using Artificial Risk Fields

Jensen, Emily, Luster, Maya, Yoon, Hansol, Pitts, Brandon, and Sankaranarayanan, Sriram

In Intelligent Transportation Systems Conference Oct 2022

In this paper, we use the concept of artificial risk fields to predict how human operators control a vehicle in response to upcoming road situations. A risk field assigns a non-negative risk measure to the state of the system in order to model how close that state is to violating a safety property, such as hitting an obstacle or exiting the road. Using risk fields, we construct a stochastic model of the operator that maps from states to likely actions. We demonstrate our approach on a driving task wherein human subjects are asked to drive a car inside a realistic driving simulator while avoiding obstacles placed on the road. We show that the most likely risk field given the driving data is obtained by solving a convex optimization problem. Next, we apply the inferred risk fields to generate distinct driving behaviors while comparing predicted trajectories against ground truth measurements. We observe that the risk fields are excellent at predicting future trajectory distributions, with high accuracy for prediction horizons of up to twenty seconds. At the same time, we observe some challenges, such as the inability to account for how drivers choose to accelerate or decelerate based on road conditions.

Using Artificial Potential Fields To Model Driver Situational Awareness

Jensen, Emily, Luster, Maya, Pitts, Brandon, and Sankaranarayanan, Sriram

In 4th IFAC Workshop on Cyber-Physical Human-Systems Dec 2022

Recently, artificial potential fields, known as risk fields, have been proposed for modeling human driver decision making. Such potential fields map vehicle states and control inputs to a numerical risk measure such that the probability of choosing a control decreases as the associated risk increases. In this paper, we show that such a model can be used in a natural manner to also capture aspects of the driver’s situational awareness, assuming that the risk fields govern their underlying behavior. We demonstrate our ideas on a specific obstacle avoidance scenario wherein obstacles to be avoided are placed in front of a driver at predictable intervals. Using data collected in a pilot experiment involving six different drivers in a high-fidelity driving simulator, we demonstrate that our approach can capture the likelihood that the driver has perceived or reacted to the obstacle. Our approach works both for scenarios in which the driver collides with the obstacle and for scenarios involving successful collision avoidance.
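
The property described above, where higher risk makes a control less likely, can be sketched with a Boltzmann (softmax) choice model, P(u) proportional to exp(-beta * R(u)). This is a common choice for such models but may not match the paper's exact formulation. The Python sketch below assumes a single fixed vehicle state, five discretized steering inputs, two hypothetical risk functions (one that ignores the obstacle and one that accounts for it), and a 50% prior; it then applies Bayes' rule to estimate how likely it is that the driver perceived the obstacle, given the steering input they actually chose.

```python
import math
from typing import Callable, Dict, Sequence

Risk = Callable[[float], float]


def control_distribution(risk: Risk, controls: Sequence[float], beta: float = 1.0) -> Dict[float, float]:
    """Boltzmann choice model: P(u) is proportional to exp(-beta * risk(u))."""
    scores = [-beta * risk(u) for u in controls]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {u: w / total for u, w in zip(controls, weights)}


def lane_risk(u: float) -> float:
    """Hypothetical risk that ignores the obstacle: penalize steering away from the lane."""
    return u * u


def obstacle_risk(u: float) -> float:
    """Hypothetical risk with an obstacle straight ahead: staying near u = 0 is costly."""
    return u * u + 4.0 * math.exp(-4.0 * u * u)


def prob_perceived(u_obs: float, controls: Sequence[float], prior: float = 0.5, beta: float = 1.0) -> float:
    """Posterior probability that the observed control came from the obstacle-aware model."""
    p_aware = control_distribution(obstacle_risk, controls, beta)[u_obs]
    p_unaware = control_distribution(lane_risk, controls, beta)[u_obs]
    return prior * p_aware / (prior * p_aware + (1.0 - prior) * p_unaware)


if __name__ == "__main__":
    steering = [-1.0, -0.5, 0.0, 0.5, 1.0]
    for u in (0.0, 1.0):
        print(f"observed steering {u:+.1f}: P(perceived) = {prob_perceived(u, steering):.2f}")
```

Under these assumptions, a hard swerve (u = +1.0) gives a perceived probability of about 0.74, while holding straight (u = 0.0) gives about 0.05. The temperature beta controls how sharply the modeled driver favors low-risk controls.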

2021

What You Do Predicts How You Do: Prospectively Modeling Student Quiz Performance Using Activity Features in an Online Learning Environment

Jensen, Emily, Umada, Tetsumichi, Hunkins, Nicholas C., D’Mello, Sidney K., Hutt, Stephen, and Huggins-Manley, A. Corinne

In LAK21: 11th International Learning Analytics and Knowledge Conference (LAK21) Dec 2021

Students using online learning environments need to effectively self-regulate their learning. However, in the absence of teacher-provided structure, students often resort to less effective, passive learning strategies rather than constructive ones. We consider the potential benefits of interventions that promote retrieval practice (retrieving learned content from memory), which is an effective strategy for learning and retention. The goal is to nudge students towards completing short, formative quizzes when they are likely to succeed on those assessments. Towards this goal, we used data from 32,685 students who used an online mathematics platform over an entire school year to develop a machine-learning model that prospectively predicts scores on three-item assessments (N = 210,020) from interaction patterns up to 9 minutes before the assessment, as well as from Item Response Theory (IRT) estimates of student ability and quiz difficulty. These models achieved a student-independent correlation of 0.55 between predicted and actual scores on the assessments and outperformed IRT-only predictions (r = 0.34). Model performance was largely independent of the length of the analyzed window preceding a quiz. We discuss the potential for future applications of the models to trigger dynamic interventions that aim to encourage students to engage with formative assessments rather than more passive learning strategies.

A Deep Transfer Learning Approach to Modeling Teacher Discourse in the Classroom

Jensen, Emily, Pugh, Samuel L., and D’Mello, Sidney K.

In LAK21: 11th International Learning Analytics and Knowledge Conference (LAK21) Dec 2021

Teachers, like everyone else, need objective, reliable feedback in order to improve their effectiveness. However, developing a system for automated teacher feedback entails many decisions regarding data collection procedures, automated analysis, and presentation of feedback for reflection. We address the latter two decisions by comparing two different machine learning approaches to automatically model seven features of teacher discourse (e.g., use of questions, elaborated evaluations). We compared a traditional open-vocabulary approach using n-grams and Random Forest classifiers with a state-of-the-art deep transfer learning approach for natural language processing (BERT). We found a tradeoff between data quantity and accuracy: deep models had an advantage on larger datasets but not on smaller ones, particularly for variables with low incidence rates. We also compared the models at three levels of feedback granularity: utterance-level (e.g., whether an utterance is a question or a statement), class session-level proportions obtained by averaging across utterances (e.g., a question incidence score of 48%), and session-level ordinal feedback based on predetermined thresholds (e.g., question asking score is medium [vs. low or high]). BERT generally provided more accurate feedback at all levels of granularity. Thus, BERT appears to be the most viable approach to providing automatic feedback on teacher discourse, provided there is sufficient data to fine-tune the model.

2020

Toward Automated Feedback on Teacher Discourse to Enhance Teacher Learning

Jensen, Emily, Dale, Meghan, Donnelly, Patrick J., Stone, Cathlyn, Kelly, Sean, Godley, Amanda, and D’Mello, Sidney K.

In 2020 CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2020) Dec 2020

Like anyone, teachers need feedback to improve. Due to the high cost of human classroom observation, teachers receive infrequent feedback that is often focused more on evaluating performance than on improving practice. To address this critical barrier to teacher learning, we aim to provide teachers with detailed and actionable automated feedback. Towards this end, we developed an approach that enables teachers to easily record high-quality audio from their classes. Using this approach, teachers recorded 142 classroom sessions, of which 127 (89%) were usable. Next, we used speech recognition and machine learning to develop teacher-generalizable computer-scored estimates of key dimensions of teacher discourse. We found that automated models were moderately accurate when compared to human coders and that speech recognition errors did not influence performance. We conclude that authentic teacher discourse can be recorded and analyzed for automatic feedback. Our next step is to incorporate the automatic models into an interactive visualization tool that will provide teachers with objective feedback on the quality of their discourse.

2019

Generalizability of Sensor-Free Affect Detection Models in a Longitudinal Dataset of Tens of Thousands of Students

Jensen, Emily, Hutt, Stephen, and D’Mello, Sidney K.

In The 12th International Conference on Educational Data Mining Dec 2019

Recent work in predictive modeling has called for increased scrutiny of how models generalize between different populations within the training data. Using interaction data from 69,174 students who used an online mathematics platform over an entire school year, we trained a sensor-free affect detection model and studied its generalizability to clusters of students based on typical platform use and demographic features. We show that models trained on one group perform similarly well when tested on the other groups, although there was a small advantage obtained by training individual subpopulation models compared to a general (all-population) model. Lastly, we perform a series of simulations to show how generalizability is affected by sample size. These results agree with our initial analysis that individual subpopulation models yield a small advantage over all-population models. Additionally, we show that training sizes smaller than 1,500 yield unstable models which make generalizability difficult to interpret. We discuss applications of this work in the context of developing large-scale affect detection models for diverse populations.

2018

Classification of Rail Switch Data Using Machine Learning Techniques

Bryan, Kaylen J, Solomon, Mitchell, Jensen, Emily, Coley, Christina, Rajan, Kailas, Tian, Charlie, Mijatovic, Nenad, Kiss, James M, Lamoureux, Benjamin, Dersin, Pierre, Smith, Anthony O, and Peter, Adrian M

In Proceedings of the 2018 Joint Rail Conference Dec 2018

Rail switches are critical infrastructure components of a railroad network that must maintain high levels of reliable operation. Given the vast number and variety of switches that can exist across a rail network, there is an immediate need for robust automated methods of detecting switch degradations and failures without expensive add-on equipment. In this work, we explore two recent machine learning frameworks for classifying various switch degradation indicators: (1) a featureless recurrent neural network, the Long Short-Term Memory (LSTM) architecture, and (2) the Deep Wavelet Scattering Transform (DWST), which produces features that are locally time-invariant and stable to time-warping deformations. We describe both methods as they apply to rail switch monitoring and demonstrate their feasibility on a dataset captured under service conditions by Alstom Corporation. For multiple categories of degradation types, the baseline models consistently achieve near-perfect accuracies and are competitive with the manual analysis conducted by human switch-maintenance experts.