
TFlex technical report: June 2007 to August 2007

Scientific and technical objectives

The overall goal of the TFlex project was to make language technology easier to use in tutorial dialogue and learning research. Specifically, we aimed to provide:

  • tools for fast and robust deep parsing of natural language, supporting tutorial dialogue and computer-assisted learning
  • lexical resources to be used in parsing, and tools to extend their coverage to new domains
  • tools that make language technology accessible to domain and educational experts


One of the main bottlenecks in designing tutorial dialogue systems is building semantic representations of utterance content. Our approach to solving this problem is to develop tools and resources for deep parsing combined with semantic interpretation and role labelling. These include a fast parser with a grammar for syntactic and semantic interpretation (Objective 1), algorithms for extracting lexical entries from wide-coverage lexicons, and tools to support linguists in resolving inconsistencies and improving lexicon precision (Objective 2).

We are working on extending the coverage of a parsing lexicon by combining information from lexical semantic resources developed in the computational linguistics community, in particular FrameNet. Because the existing lexicons were not designed with parsing as their primary application, and because they are based on different linguistic theories, the information in them is sometimes incomplete or inconsistent.

Combining information from such different sources results in a loss of precision. To improve precision, the automatically generated entries can be checked manually by experienced linguists. Our approach is to develop tools that simplify this checking process by drawing on information from different lexicons and corpora.
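As a rough illustration of how automatically generated entries might be queued for manual checking, the sketch below compares two toy resources and flags verbs whose recorded role sets disagree. The data layout, the function name flag_for_review and the example entries are assumptions made for this illustration; they are not the project's actual tools or formats.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy layout: each resource maps a verb lemma to the set of
# argument roles it records for that verb.
RoleSets = Dict[str, FrozenSet[str]]


def flag_for_review(source_a: RoleSets, source_b: RoleSets) -> List[str]:
    """Return lemmas present in both resources whose role sets disagree."""
    shared = source_a.keys() & source_b.keys()
    return sorted(lemma for lemma in shared if source_a[lemma] != source_b[lemma])


# Invented example entries, for illustration only.
resource_a: RoleSets = {
    "give": frozenset({"Donor", "Theme", "Recipient"}),
    "run": frozenset({"Self_mover"}),
}
resource_b: RoleSets = {
    "give": frozenset({"Donor", "Theme"}),
    "run": frozenset({"Self_mover"}),
    "eat": frozenset({"Ingestor", "Ingestibles"}),
}

# Only lemmas found in both resources are compared; "eat" is skipped.
needs_review = flag_for_review(resource_a, resource_b)
print(needs_review)
```

A linguist would then inspect only the flagged lemmas instead of the whole merged lexicon.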

Concise accomplishments

  • developing lexical resources for deep parsing

Expanded accomplishments

Developing Lexical Resources for Deep Parsing

Currently, tutorial dialogue systems allow only very limited forms of student input, and all data annotation is done entirely by hand. Deep parsing can facilitate the detailed analysis necessary for assessing student input in tutorial dialogue systems and collaborative problem-solving environments. In addition, linguistic features can improve classification accuracy. However, deep parsers are often difficult to extend to new domains because of their limited lexical coverage. Existing wide-coverage lexical resources (used in information extraction and question answering) are not in a format suitable for deep parsers.

The Edinburgh team developed tools to improve coverage of deep parsers:

  1. Developed a set of methods to extract verb lexical entries from FrameNet, a widely used and publicly available corpus (McConville and Dzikovska, 2007):
    • Extracted a lexicon with 2,600 verb senses from the FrameNet corpus, with the goal of further expanding the TRIPS lexicon (pending funding)
  2. Developed a set of tools to check extracted entries and merge them efficiently with the existing lexicon
  3. Developed a framework to connect lexical resources to different parsers
    • All lexical entries are extracted into a framework-independent representation
    • The representation can then be mapped to lexical entries for different parsers. Currently, we use only a mapping to the TRIPS parser. However, the approach should allow a mapping to a different parser with a different grammar, for example if text rather than dialogue needs to be parsed
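The idea of a framework-independent representation with per-parser mappings can be sketched as follows. This is a minimal illustration under stated assumptions: the LexicalEntry class, its fields, and both output formats are invented for this example and do not reflect the actual TFlex representation or the TRIPS lexicon format.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical framework-independent entry for one verb sense."""
    lemma: str
    frame: str  # semantic frame or type label
    # Each argument pairs a syntactic slot with a semantic role.
    arguments: Tuple[Tuple[str, str], ...]


def to_parser_a(entry: LexicalEntry) -> str:
    """Render the entry in an invented s-expression style format."""
    args = " ".join(f"({slot} {role})" for slot, role in entry.arguments)
    return f"({entry.lemma} :frame {entry.frame} :args ({args}))"


def to_parser_b(entry: LexicalEntry) -> Dict[str, object]:
    """Render the same entry as an invented attribute-value structure."""
    return {
        "word": entry.lemma,
        "sem": entry.frame,
        "subcat": [slot for slot, _ in entry.arguments],
        "roles": [role for _, role in entry.arguments],
    }


# Illustrative entry; the slot names are assumptions for this sketch.
entry = LexicalEntry(
    lemma="give",
    frame="Giving",
    arguments=(("subject", "Donor"), ("object", "Theme"), ("to-pp", "Recipient")),
)

print(to_parser_a(entry))
print(to_parser_b(entry))
```

Because each mapping reads only the shared entry, supporting another parser means adding one more mapping function and leaves the extraction step unchanged.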


The work below is planned as part of the extension proposal funded by ONR, Grant N00014-08-1-0179:

  1. Integrate the features available from our lexicons and tools for deep parsing into tutorial dialogue tools for dialogue researchers (TagHelper and TuTalk) being developed at CMU as part of the connected grant number N000140510043.
    • Integrate the TRIPS/TFlex parser with the TagHelper tool, and evaluate the impact of deeper linguistic features on classification accuracy.
    • Extend the TuTalk tools to take advantage of the more detailed analyses output by the parser.
    • Add mappings from the FrameNet and TRIPS lexicons to a robust parser for text, to support analyzing essays.
    • Develop tools for quickly adding domain-specific vocabulary when a new domain needs to be analyzed.

  2. Integrate the algorithms developed as part of this grant into the BEETLE tutorial dialogue system (developed at the University of Edinburgh, supported by ONR grant N000149910165)
    • Extend the algorithms to extract lexical entries for nouns and adjectives
    • Merge the lexicon extracted from FrameNet with the TRIPS lexicon for use in the Beetle system
    • Evaluate improvements in parsing and interpretation quality (in the Beetle system) when using the extended lexicon

Major problems/issues

Technology transfer

Foreign collaborations and supported foreign nationals


  • Mark McConville and Myrosia O. Dzikovska (2007). Extracting a verb lexicon for deep parsing from FrameNet. Proceedings of the ACL Workshop on Deep Linguistic Processing, Prague.

-- MarkMcConville - 14 Aug 2008

Topic revision: r1 - 14 Aug 2008 - 11:30:07 - MarkMcConville