TFlex technical report: June 2007 to August 2007

Principal Investigator: Johanna D. Moore

Co-principal Investigator: Myroslava O. Dzikovska

Grant number: N00014-08-1-0048

Grant title: A Shared Resource for Robust Semantic Interpretation for Both Linguists and Non-linguists -- Edinburgh participation

Scientific and technical objectives

The overarching objective of the TFlex project is to make language technology easier to use across a range of educational applications, including tutorial dialogue and learning research.

The core objective of the Edinburgh team is to develop lexical tools for fast, robust, deep parsing of natural language in order to support educational applications. Specifically, we address the following two issues:

  • Depth: accessible language processing tools that produce "deep", detailed analyses of student input. If such tools are easily available, they can benefit different educational and training applications:
    • tutorial dialogue systems require detailed analysis of student language in order to give feedback adapted to the learning situation
    • the accuracy of both computer-supported collaborative learning systems and annotation tools for educational researchers can be improved by incorporating additional linguistic features
  • Coverage: tools and resources to ensure that a large variety of common words used in educational settings can be understood by a system, and that new words can be added quickly and easily.

Approach

The approach pursued by the Edinburgh team involves developing core language technology based on our depth and coverage goals, which can then be integrated with the user interfaces for educational researchers developed by the CMU team. We are building on the results of the previous joint effort between Edinburgh and CMU, as part of which we developed tools to harvest verb lexical entries from the VerbNet and FrameNet resources. We are approaching our current goals as follows:

  • We are combining two existing types of resources: deep parsers, which provide depth of analysis but lack coverage, and semantic lexicons, which have wider coverage but are not integrated with deep parsing systems.
  • Specifically, we are harvesting a wide range of verb entries from the FrameNet semantically annotated corpus, before filtering out spurious information, and then integrating the newly extracted entries with a deep parser used in tutorial dialogue systems (the TRIPS parser).
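
The sketch below gives a rough picture of this harvest-and-filter step in Python. The annotation fields, function names, and the example record are illustrative assumptions for exposition, not the actual harvesting tool or the FrameNet file format.

    # Illustrative sketch only: record fields and function names are assumptions,
    # not the actual harvesting tool or FrameNet API.
    from collections import defaultdict

    def is_canonical(inst):
        """Keep only 'canonical' realisations; drop passives, relatives, etc."""
        return not (inst.get("passive") or inst.get("relative_clause"))

    def harvest(instances):
        """Group annotated verb instances into candidate entries keyed by (lemma, frame)."""
        entries = defaultdict(set)
        for inst in instances:
            if inst["pos"] != "V" or not is_canonical(inst):
                continue
            subcat = tuple((fe["gf"], fe["pt"]) for fe in inst["fes"])
            entries[(inst["lemma"], inst["frame"])].add(subcat)
        return dict(entries)

    # One hypothetical annotated instance of "supply" in the Supply frame.
    instances = [{
        "lemma": "supply", "pos": "V", "frame": "Supply", "passive": False,
        "fes": [{"name": "Supplier", "gf": "Ext", "pt": "NP"},
                {"name": "Theme", "gf": "Obj", "pt": "NP"},
                {"name": "Recipient", "gf": "Dep", "pt": "PP[to]"}],
    }]
    print(harvest(instances))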

Concise accomplishments

  • We harvested 2,700 verb entries from the FrameNet corpus for potential inclusion in the lexicon of the parser used in the Beetle2 tutorial dialogue system. These include 1,200 words not previously defined in the lexicon at all, whose addition would expand system coverage by 44%. In addition, many entries for verbs already defined in the system lexicon describe word usages not previously handled by the system. An investigation of how to detect and merge such entries is in progress.
  • We developed automated procedures for improving the quality of the harvested verb entries, ensuring that the resulting subcategorization frames encode only "canonical" usages and do not encode regular alternations such as passives and relative clauses.

Expanded accomplishments

Currently, tutorial dialogue systems allow only very limited forms of student input, and all data annotation is done entirely by hand. Deep parsing can facilitate the detailed analysis necessary for assessing student input in tutorial dialogue systems and collaborative problem-solving environments. In addition, linguistic features can improve classification accuracy. However, deep parsers are often difficult to extend to new domains because of their limited lexical coverage, and existing wide-coverage lexical resources (used in information extraction and question answering) are not in a format suitable for deep parsers.

The Edinburgh team developed tools to improve the lexical coverage of deep parsers.

First of all, we developed a method to automatically harvest verb entries from FrameNet, a widely used and easily available semantically annotated corpus. In this way, we extracted 2,600 verb entries and encoded them in a theory-neutral format suitable for importing into any deep, lexicalized parser. Around 1,200 of these entries are not contained in the TRIPS lexicon, so integrating them will expand TRIPS system coverage by 44%.
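
As a rough illustration, a theory-neutral verb entry of this kind might be represented along the following lines; the field names and the example values are assumptions for exposition, not the format actually produced by the harvesting tool.

    # Minimal sketch of a theory-neutral verb entry; field names are illustrative
    # assumptions, not the actual harvested format.
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class VerbEntry:
        lemma: str                                  # e.g. "supply"
        frame: str                                  # FrameNet frame, e.g. "Supply"
        # One subcategorization frame: (grammatical role, phrase type, frame element)
        subcat: List[Tuple[str, str, str]] = field(default_factory=list)

    entry = VerbEntry(
        lemma="supply",
        frame="Supply",
        subcat=[("subject", "NP", "Supplier"),
                ("object", "NP", "Theme"),
                ("complement", "PP[to]", "Recipient")],
    )
    print(entry)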

Next, we inspected many of the harvested entries by hand and identified classes of spurious subcategorization frames that are unsuitable for inclusion in a deep parsing verb lexicon. These involved the following linguistic phenomena:

  • passive
  • modifiers
  • control
  • imperatives
  • relative clauses

We developed and evaluated procedures for automatically eliminating these frames from the harvested lexical entries. Results are reported in McConville and Dzikovska (2007).
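
A minimal sketch of how such elimination procedures might be organised is given below; the features it tests on each annotated instance (voice, mood, and so on) are assumptions for illustration, not the actual features or rules used in our procedures.

    # Hypothetical rule-based filters of the kind described above; the feature
    # names attached to each annotated instance are assumptions for illustration.
    SPURIOUS_TESTS = {
        "passive":       lambda inst: inst.get("voice") == "passive",
        "relative":      lambda inst: inst.get("in_relative_clause", False),
        "imperative":    lambda inst: inst.get("mood") == "imperative",
        "control":       lambda inst: inst.get("subject_is_controlled", False),
        "modifier_only": lambda inst: all(fe["gf"] == "Dep" for fe in inst["fes"]),
    }

    def is_canonical(inst):
        """An instance is kept only if none of the spuriousness tests fire."""
        return not any(test(inst) for test in SPURIOUS_TESTS.values())

    instances = [
        {"voice": "active", "mood": "indicative",
         "fes": [{"gf": "Ext"}, {"gf": "Obj"}]},          # canonical transitive use
        {"voice": "passive", "mood": "indicative",
         "fes": [{"gf": "Ext"}, {"gf": "Dep"}]},          # passive: filtered out
    ]
    print([is_canonical(i) for i in instances])           # [True, False]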

Next, we initiated development of a tool to align the syntactic and semantic information in the harvested lexicon with that assumed by the TRIPS parser.
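
The sketch below suggests what such an alignment step might look like; the phrase-type names and TRIPS-style categories shown are illustrative assumptions, not the actual TRIPS inventory or the tool's mapping table.

    # Rough sketch of an alignment table mapping FrameNet-style phrase types onto
    # TRIPS-style argument slots; the mappings are illustrative assumptions only.
    PT_TO_TRIPS = {
        "NP":     "np",
        "PP[to]": ("pp", "to"),
        "PP[of]": ("pp", "of"),
        "Sfin":   "cp",        # finite clausal complement
        "VPing":  "vp-ing",    # gerundive complement
    }

    def align_subcat(subcat):
        """Translate a harvested subcat frame into TRIPS-style slots, flagging
        any phrase type the table does not yet cover."""
        aligned, unmapped = [], []
        for role, phrase_type, frame_element in subcat:
            if phrase_type in PT_TO_TRIPS:
                aligned.append((role, PT_TO_TRIPS[phrase_type], frame_element))
            else:
                unmapped.append(phrase_type)
        return aligned, unmapped

    print(align_subcat([("subject", "NP", "Supplier"),
                        ("object", "NP", "Theme"),
                        ("complement", "PP[to]", "Recipient")]))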

Workplan

This effort has now been superseded by ONR award N00014-08-1-0179, for which the appropriate project report has been written.

Major problems/issues

Technology transfer

The results of the TRIPS parser improvements will directly benefit another ONR-funded project, "BEETLE: The Role of Student Input and Tutor Adaptation in Learning from Tutoring". We expect that improved lexicon coverage will result in more accurate language interpretation, and we will evaluate the results of our project on the human-computer interaction data collected in the BEETLE2 project.

Foreign collaborations and supported foreign nationals

Publications

  • Mark McConville and Myroslava O. Dzikovska (2007). Extracting a verb lexicon for deep parsing from FrameNet. In Proceedings of the ACL Workshop on Deep Linguistic Processing, Prague.

-- MarkMcConville - 14 Aug 2008

