Principal Investigator: Diane Litman
CoPrincipal Investigator(s): Amanda Godley, Diane Litman
Organization: University of Pittsburgh
Abstract:
Writing and revising are essential parts of learning, yet many college students graduate without
demonstrating improvement in, or mastery of, academic writing. This project explores the feasibility
of improving students’ academic writing through a revision environment that integrates natural
language processing methods, best practices in data visualization and user interfaces, and
current pedagogical theories. The environment will support and encourage students to develop
self-regulation skills that are necessary for writing and revising, including goal-setting, selection
of writing strategies, and self-monitoring of progress. As a learning technology, the environment
can be applied on a large scale, thereby improving the writing of diverse student populations,
including English learners. Additionally, the project’s multidisciplinary training of graduate
students is focused on increasing diversity in cyberlearning research and development.
Three stages of investigation are planned. First, to analyze data on students’ revision behaviors,
a series of experiments is conducted to study interactions between students
and variations of the revision writing environment. Second, the collected data forms the gold
standard for developing an end-to-end system that automatically extracts revisions between
student drafts and identifies the goal for each revision. Multiple extraction algorithms are
considered, including phrasal alignment based on semantic similarity metrics and deep learning
approaches. To identify the goal of a revision, a supervised classifier is trained from the gold
standard. A diverse set of features and the representations of the identified goals (e.g.,
granularity, scope) are explored. In addition to the “extract-then-classify” pipeline, a
joint sequence labeling model is developed as an alternative. The sequence labels are used to
recognize revision goals, and the sequences are mutated to generate possible corrections of
sentence alignments for revision extraction. The writing environment is refined iteratively,
with frequent user studies informing the interface prototypes. Third, a complete
end-to-end system that integrates the most successful component models is deployed in
college-level writing classes. Student progress is tracked across multiple assignments.
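
The revision-extraction step described above can be illustrated with a minimal sketch. It is not the project's system: word overlap from Python's difflib stands in for the semantic similarity metrics named in the abstract, the alignment is a simple greedy one, and the function names, the 0.5 threshold, and the add/delete/modify labels are all assumptions made for illustration. Goal classification of each extracted revision is not shown.

```python
import re
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for a semantic metric."""
    tokens_a = re.findall(r"\w+", a.lower())
    tokens_b = re.findall(r"\w+", b.lower())
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


def extract_revisions(old_draft, new_draft, threshold=0.5):
    """Greedily align sentences of two drafts and list the differences.

    Returns (operation, old_sentence, new_sentence) tuples, where operation
    is "modify", "delete", or "add". The threshold is an illustrative value.
    """
    revisions = []
    used = set()  # indices of new-draft sentences that are already aligned
    for old_sent in old_draft:
        best_j, best_score = None, 0.0
        for j, new_sent in enumerate(new_draft):
            if j in used:
                continue
            score = similarity(old_sent, new_sent)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None or best_score < threshold:
            # No sufficiently similar sentence in the new draft.
            revisions.append(("delete", old_sent, None))
            continue
        used.add(best_j)
        if old_sent != new_draft[best_j]:
            revisions.append(("modify", old_sent, new_draft[best_j]))
    # New-draft sentences left unaligned are treated as additions.
    for j, new_sent in enumerate(new_draft):
        if j not in used:
            revisions.append(("add", None, new_sent))
    return revisions


old = [
    "Writing is hard.",
    "Students rarely revise.",
    "This sentence is removed.",
]
new = [
    "Writing is hard.",
    "Students rarely revise their drafts.",
    "A new claim is added here.",
]
for revision in extract_revisions(old, new):
    print(revision)
```

In an extract-then-classify pipeline of the kind the abstract describes, each extracted tuple would then be passed to a classifier trained on the annotated gold standard to predict its revision goal.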