
TFlex technical report: June 2007 to May 2008

Scientific and technical objectives

[Summarize the current project objectives noting if they deviate from those listed in the original proposal (200 words).]

The overarching objective of the TFlex project is to make language technology easier to use across a range of educational applications, including tutorial dialogue and learning research.

The core objective of the Edinburgh team is to develop lexical tools for fast, robust, deep parsing of natural language, in order to support educational applications. Specifically, we address the following three issues:

  • Depth: accessible language processing tools that produce "deep", detailed analyses of student input. If such tools are easily available, they can benefit different educational and training applications:
    • tutorial dialogue systems require detailed analysis of student language in order to give feedback adapted to the learning situation
    • the accuracy of both computer-supported collaborative learning systems and annotation tools for educational researchers can be improved by incorporating additional linguistic features
  • Coverage: tools and resources to ensure that a large variety of common words used in educational settings can be understood by a system, and that new words can be added quickly and easily
  • Robustness: tools to understand incomplete, ungrammatical or fragmentary utterances common in tutorial dialogue

[200 words]


[Summarize the current project approach noting if/how the current approach deviates from the original proposal (200 words).]

The approach pursued by the Edinburgh team involves developing core language technology based on our depth, coverage and robustness goals, which can then be integrated with the user interfaces for educational researchers developed by the CMU team. We are building on the results of the previous joint effort between Edinburgh and CMU, as part of which we developed tools to harvest verb lexical entries from the VerbNet and FrameNet resources. We are approaching our current goals as follows:

  • Depth and coverage. We are combining two existing types of resources: deep parsers, which provide depth of analysis but lack coverage, and semantic lexicons, which have wider coverage but are not integrated with deep parsing systems. We are developing new methods for utilizing the information available from FrameNet to improve the quality of extracted entries, and to extend coverage to nouns and adjectives in addition to verbs. We are integrating the newly extracted entries with a deep parser used in tutorial dialogue systems (the TRIPS parser).
  • Robustness. We are investigating the information available from dependency parsers, which are more robust than deep linguistic parsers, and ways to utilize it if a deep parser does not produce a sentence analysis.

[194 words]

Concise accomplishments

[Briefly summarize accomplishments from the current reporting period and briefly note the significance of data/results.]

  • We harvested 2,700 verb entries from the FrameNet corpus for potential inclusion in the lexicon of the parser used in the Beetle2 tutorial dialogue system. This includes 1,200 words that have not previously been defined in the lexicon at all, and thus would expand system coverage by 44%. Additionally, many entries for verbs already defined in the system lexicon describe word usages not previously handled by the system. An investigation of how to detect and merge in such entries is in progress.
  • We investigated the usability of five parsing schemes with a view to their inclusion in a tutorial dialogue system. The investigation showed that the features needed for interpretation in tutorial dialogue are available from different parsers; however, no single system provides all the necessary features. We plan to investigate the best options during the next funding year.
  • To prepare for integration with CMU tools, we parsed 100 doctor/patient dialogues supplied by CMU and sent the results to the CMU team for further analysis.

[163 words]

Expanded accomplishments

[Describe in greater detail the progress achieved during the current reporting period and include the significance of data/results. You are encouraged to include graphs, charts, and photos. No word limit.]


One of the key components in language-enhanced educational applications is a language interpretation module. In an automated tutorial dialogue system, this module is responsible for analysing what the student says so as to allow the domain reasoner to plan an appropriate response. In computer-supported collaborative learning, the language interpreter is responsible for analyzing student essays and interactions between the students to determine the appropriate time for tutor intervention.

A language interpretation module needs to be:

  • wide-coverage -- able to handle all the words and sentence structures that occur in educational domains

  • robust -- able to deal with fragmentary, misspelled or ungrammatical input, or with words and idioms the system has not seen before

  • deep -- able to output representations which can interface straightforwardly with a symbolic reasoning system, or serve as informative features for a machine learning classifier.

The two key components of the interpretation system that determine its coverage, depth and robustness are the lexicon and the grammar. The lexicon constitutes a list of all the words (and idioms) known to the system. The grammar determines which sequences of words are grammatical English sentences, and how they can be interpreted (e.g. "a bulb is in a path" is a grammatical English sentence, whereas "a bulb path is in" isn't).
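The division of labour between lexicon and grammar, and the two failure modes discussed below, can be made concrete with a toy sketch. The lexicon, tag patterns, and function here are invented for illustration only; they are not components of any actual parsing system:

```python
# Toy lexicon: every word the system "knows", with its part of speech.
LEXICON = {"a": "Det", "the": "Det", "bulb": "N", "path": "N",
           "battery": "N", "is": "V", "in": "P", "to": "P"}

# Toy grammar, flattened to licensed tag sequences for a sentence:
# S -> NP VP, NP -> Det N, VP -> V PP, PP -> P NP
GRAMMAR_PATTERNS = [["Det", "N", "V", "P", "Det", "N"]]

def is_grammatical(sentence):
    """Return True only if every word is in the lexicon (coverage)
    and the tag sequence is licensed by the grammar (grammaticality)."""
    try:
        tags = [LEXICON[w] for w in sentence.lower().split()]
    except KeyError:  # failure mode 1: word missing from the lexicon
        return False
    return tags in GRAMMAR_PATTERNS  # failure mode 2: ungrammatical order

print(is_grammatical("a bulb is in a path"))   # True
print(is_grammatical("a bulb path is in"))     # False
```

Real grammars license structures recursively rather than enumerating tag sequences, but the same two failure points apply.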

Language interpretation in a system may fail for several reasons:

  • lack of coverage -- the appropriate word is not defined in the lexicon

  • lack of robustness -- the grammar does not recognize the sentence as grammatical English.

Our focus has been to address those issues in two ways:

  • to compensate for lack of coverage in deep lexicons by extracting additional entries from semantic lexicons not previously used with deep parsers

  • to investigate the extent to which dependency parsers, which provide greater robustness, but less depth, can be used in the context of language interpretation for educational applications.

Extending coverage of deep lexicons

During the previous phase of the TFlex project, we harvested a lexicon of 2,700 verbs from the FrameNet semantically annotated corpus. The main issue to be addressed in this process is that the FrameNet annotation was not designed with deep parsing in mind. As a result, the lexical entries extracted from the corpus tend to contain a great deal of redundant information. These redundancies cause problems with both efficiency and accuracy in language interpretation. First, the efficiency of the parser decreases, since it has to process additional unnecessary information in each entry. Second, the interpretation becomes less accurate. A lexicon entry for a single word usually has multiple subcategorizations, which correspond to different usage alternations, such as "the terminals are connected" versus "the bulb is connected to the battery". Multiple redundant subcategorizations may cause the parser to choose an incorrect interpretation among the many available options.

Having identified redundancy as the main issue, our goal is to discard the redundant word subcategorizations automatically, while not discarding valid subcategorizations necessary for interpretation.

As a first step in this process, we conducted a more detailed evaluation of the lexicon extraction method developed in the previous phase of the project. It focused on the "complement/modifier" distinction, a well known requirement of deep parsing. This dichotomy is not explicitly marked in the FrameNet annotations, and we developed an approach to approximate it from the "core argument" feature in the FrameNet ontology.
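The pruning idea can be sketched as follows. The entry structure, role names, and coreness flags below are hypothetical illustrations, not the actual FrameNet data format or our tool's implementation:

```python
# Hypothetical harvested entry: each subcategorization frame lists
# its arguments together with a coreness flag from the annotation.
entry = {
    "lemma": "connect",
    "frames": [
        {"args": [("Connector", "core"), ("Items", "core")]},
        {"args": [("Connector", "core"), ("Items", "core"),
                  ("Time", "non-core")]},  # modifier-like, redundant
    ],
}

def prune_non_core(entry):
    """Keep only core arguments (approximating complements), and drop
    frames that become duplicates once non-core arguments are removed."""
    seen, pruned = set(), []
    for frame in entry["frames"]:
        core = tuple(role for role, status in frame["args"]
                     if status == "core")
        if core not in seen:
            seen.add(core)
            pruned.append({"args": list(core)})
    return {"lemma": entry["lemma"], "frames": pruned}

print(prune_non_core(entry))
# Both frames collapse to the single core pattern (Connector, Items).
```

In this toy example the two subcategorizations merge into one, which is the source of the lexicon compression discussed below.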

We discovered that by discarding the arguments marked as "non-core", we were able to compress the lexicon by 45%, resulting in a significant reduction of spurious usages. The compression accuracy was 85%; specifically, 13% of discarded usages were correct (false positives), while 9% of the retained usages should have been discarded (false negatives). While the accuracy is not perfect, we concluded that it is sufficient for our needs at present, because many of the discarded usages can be obtained from other sources (e.g. the VerbNet lexicon), and at present a low false negative rate is more important. The results of this evaluation were published as McConville and Dzikovska (2008b).

We then used the ontological information distributed with the FrameNet corpus to further compress our harvested lexicon. FrameNet uses a knowledge representation geared towards question answering and information retrieval tasks, but it is not ideally suited for deep parsing. Each concept in the lexicon is associated with a large number of semantic roles describing participants in a given action: for example, the concept underlying the verb "speak" specifies roles for "Speaker" and "Addressee". Across the verb senses annotated in FrameNet as a whole, there is a total of 440 distinct role labels. In contrast, lexicons for deep parsing generally utilize simpler representations, with a vocabulary of 10-50 role labels. The large number of distinct role labels in FrameNet results in many redundant entries in the harvested lexicon. The FrameNet ontology contains features, the so-called role inheritance links and coreness sets, which are intended to simplify the representation for use in other applications. We investigated the use of these features to reduce the number of semantic role labels. We discovered that these features had not been annotated consistently across the corpus as a whole, but in the cases where they had been annotated, we were able to reduce the lexicon size by 39%, with an estimated 90% accuracy. The results of this part of the project were published in McConville and Dzikovska (2008a).
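The role-generalization step can be illustrated with a minimal sketch. The inheritance links below are invented examples chosen for readability; the actual FrameNet inheritance relations and label inventory differ:

```python
# Hypothetical fragment of role-inheritance links: a frame-specific
# role label maps to a more general parent label.
ROLE_PARENT = {
    "Speaker": "Agent",
    "Addressee": "Recipient",
    "Cook": "Agent",
    "Food": "Theme",
}

def generalize(role):
    """Follow inheritance links up to the most general ancestor label."""
    while role in ROLE_PARENT:
        role = ROLE_PARENT[role]
    return role

roles = ["Speaker", "Addressee", "Cook", "Food", "Agent"]
print(sorted({generalize(r) for r in roles}))
# Five frame-specific labels reduce to three general ones.
```

Collapsing frame-specific labels into a smaller shared vocabulary in this way is what merges otherwise redundant entries in the harvested lexicon.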

After these operations, we have a lexicon of 2,700 verb entries, each with an average of 3.4 subcategorizations, and we are using it to extend the TRIPS parsing lexicon. The TRIPS grammar and lexicon constitute a deep, robust parsing system originally developed at the University of Rochester (partly funded by ONR). It has been used in a variety of applications, including intelligent assistants, the Command Post of the Future system (DARPA funded), and two tutorial dialogue systems.

The lexicon/grammar underlying the TRIPS system has been developed painstakingly by hand over a number of years. One advantage of this mode of grammar development is that the resultant system has very high accuracy: if the parser returns a particular analysis for an input sentence, one can be reasonably sure that this analysis is not spurious. On the other hand, the major disadvantage of constructing lexicons and grammars by hand is coverage: at the start of the TFlex project, the TRIPS lexicon contained just 400 verbs, less than one tenth of the total number in common use in contemporary English.

The main thrust of the Edinburgh TFlex effort has been to semi-automatically harvest verbs from existing lexical resources (first VerbNet and then FrameNet), in order to deliver an order-of-magnitude improvement in the coverage of the TRIPS lexicon, whilst maintaining both the high levels of accuracy and the "deepness" of the semantic analyses. In this we have had considerable success, adding around 2,000 brand new verbs from VerbNet. The extraction method we developed for FrameNet produced an additional 1,500 new verb entries, plus a number of entries for existing verbs that describe usages not previously defined in the lexicon. We are currently in the process of merging these entries.

We have completed the first step of merging, by constructing a mapping between the syntactic information available from FrameNet and that used in TRIPS (e.g. whether a given verb can take a clause as a complement, as "means" does in "This means that the path is closed"). The second part will involve aligning the TRIPS and FrameNet semantic ontologies. We developed a method to do this during the previous TFlex effort, which resulted in a tool for mapping between two semantic lexicons based on class representations (see Figure XXX). This stage requires manual intervention from trained linguists, since existing mapping methods result in XX accuracy (Crabbe et al., 2006). We have extended the tool to display additional information that can facilitate mappings, such as example sentences (which were not available from the lexicon we used previously). We are now starting the process of manual alignment, after which the TRIPS lexicon extension will be complete.
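One plausible way to assist such manual alignment is to rank candidate target classes automatically and leave the final decision to a linguist. The sketch below illustrates this idea with a lemma-overlap heuristic; the class names, member sets, and scoring method are illustrative assumptions, not the actual mapping tool's method:

```python
# Hypothetical fragment of a target ontology: class name -> member lemmas.
TRIPS_CLASSES = {
    "ATTACH": {"connect", "attach", "join", "fasten"},
    "MOVE": {"move", "travel", "go"},
}

def candidate_classes(frame_lemmas):
    """Rank target classes by how many lemmas they share with the
    source frame; classes with no overlap are omitted entirely."""
    scores = {cls: len(frame_lemmas & members)
              for cls, members in TRIPS_CLASSES.items()}
    return sorted((c for c in scores if scores[c] > 0),
                  key=lambda c: -scores[c])

# Lemmas of a hypothetical source frame to be aligned:
frame = {"connect", "join", "link"}
print(candidate_classes(frame))   # -> ['ATTACH']
```

A linguist would then confirm or reject the top-ranked candidate, using the example sentences displayed by the tool.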

Our previous research concentrated on adding new entries for verbs. We have now extended our methods to cover other entries, specifically adjectives and nouns. We have extracted an initial lexicon with XXX entries (only YYY are currently defined in TRIPS), and are planning to evaluate whether methods for reducing redundancy developed for verb entries will perform sufficiently well for noun and adjective entries, after which the mapping process can be straightforwardly applied to further extend the lexicon.

Improving robustness in interpretation

The second thrust of our research involves investigating the extent to which one recent development in language technology, dependency parsers, can be used to improve the robustness of a language interpretation module. Dependency parsers aim to output a syntactic representation which supports reasoning tasks, and they are now extensively used in information retrieval and biomedical text mining. They provide a higher level of robustness than deep parsers, at the expense of some depth of processing. We are investigating the potential of these "off-the-shelf", wide-coverage dependency parsers for educational applications, for example for interpreting student input to tutorial dialogue systems, or in essay grading (which involves longer texts, for which deep parsers developed for dialogue are not well suited).

Our work to date has involved analysing the output of various dependency parsers and comparing it with the output of the TRIPS parser, assessing how successful each is at interpreting student input and connecting to a symbolic reasoning system. We started by drafting a list of desiderata for a system of 'deep' grammatical relations, which come as close to the level of semantic structure as possible while remaining straightforwardly computable by a parser. We then used this proposal as the basis of our participation in the shared task at the Cross-Domain and Cross-Framework Parser Evaluation workshop at COLING. The aim of this task was to compare and contrast different systems of grammatical dependency annotation with respect to various practical tasks, in our case tutorial dialogue. We concluded that no single system provided all of our desiderata, but every feature we desired was covered by at least one of the competing systems. The results of this investigation were published as McConville and Dzikovska (2008c).

This conclusion opens future new avenues of research, in particular in combining output from different systems to collect the range of features required for interpretation. To prepare for that, we created a gold-standard evaluation corpus for dependency parsers based on our deep grammatical dependency system and the dialogues between an automatic tutorial system and human learners collected as part of the BEETLE project. Next, we conducted a preliminary evaluation of one of the available off-the-shelf dependency parsers based on how good it was at parsing this evaluation corpus, with initial results showing around 65% recall.
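A recall figure of this kind is obtained by comparing the parser's output dependencies against the gold-standard annotation. A minimal sketch, using invented head/relation/dependent triples rather than our actual corpus data:

```python
# Gold-standard dependency triples for a sentence (hypothetical):
gold = {("is", "nsubj", "bulb"), ("is", "prep", "in"),
        ("in", "pobj", "path"), ("bulb", "det", "a")}

# Triples produced by the parser under evaluation (hypothetical):
parsed = {("is", "nsubj", "bulb"), ("in", "pobj", "path"),
          ("bulb", "det", "a")}

# Recall = fraction of gold dependencies the parser recovered.
recall = len(gold & parsed) / len(gold)
print(f"recall = {recall:.0%}")   # -> recall = 75%
```

Corpus-level recall is the same computation aggregated over all sentences in the evaluation corpus.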

While current dependency parser performance is not perfect, harnessing this technology would open new possibilities for improving computer-based training due to the enhanced robustness it offers. We therefore plan to investigate the best approaches to improving the accuracy of dependency parsing in tutorial dialogue, and to integrating the results into a tutorial dialogue system, as the next step during the 2008-2009 funding year.

Integration with CMU team

One aim of the joint work we are undertaking with the CMU team is to investigate to what extent the 'deep' output representation of the TRIPS parser can improve the kinds of classification technologies they are working on. As a preliminary step in this direction, we parsed one of the dialogues collected by the CMU team using TRIPS, and sent them a brief report listing the features inherent in the TRIPS output which we believe will be of use to them.


Plans for the next reporting period

  • Further developing the methods for extending lexicon coverage
    • Evaluating the methods for removing redundant noun and adjective subcategorizations from the lexicon entries we extracted from FrameNet, and merging the harvested entries into the TRIPS lexicon and ontology used with the Beetle2 tutorial dialogue system
    • Developing tools for harvesting domain-specific terminology and integrating them into the lexicon. Any dialogue in an educational setting requires the use of domain-specific terminology. For many domains such as thermodynamics or calculus the appropriate terms are not likely to be found in a general-purpose lexicon. We will develop tools to support adding domain terminology to the lexicon based either on domain-specific dictionaries (if available), or on discovering domain terms in corpora. This will support porting the applications to new domains.
  • Integration with a tutorial dialogue system
    • Aligning the TRIPS and FrameNet ontologies, as discussed in expanded accomplishments. We will merge the verbal lexicon harvested from FrameNet during the ongoing effort with the TRIPS lexicon and ontology, using our merging tool. We will test the usability of the tool, and improve it as necessary. We will then evaluate the efficiency of this approach, based on its precision and recall, and the number of usable entries added.
    • Evaluating the extended lexicon with human-computer interaction data. We expect that the result of this effort will be an improved interpreter for the Beetle2 tutorial system. To verify that, we will compare the performance of the original TRIPS parser to the TRIPS parser with the improved lexicon, when used with human-human tutoring data collected as part of Beetle2 project.
  • Improving system robustness via dependency parsers, as discussed in the expanded accomplishments section
    • Developing a method to combine dependency parser output with the lexical entries we extracted, to obtain semantic representations. We will develop a set of mappings between our lexical entries and at least one other wide-coverage parser, and pilot-test the parser with our lexicons on Edinburgh and CMU data sets.
    • Improving accuracy of dependency parsing in our tutorial dialogue domain by combining output of different parsers

[345 words]

[Describe the objectives you intend to achieve and the approaches that will be taken during the next reporting period. Detail any changes from the original proposed work plan (500 words).]

Major problems/issues

The funds were received at the end of February 2008 (backdated to January 15th, 2008). This resulted in a 4-month delay relative to the start dates and deadlines specified in the original proposal (which planned a start date of October 1, 2007).

Technology transfer

The results of TRIPS parser improvement will directly benefit another ONR-funded project, "BEETLE: The Role of Student Input and Tutor Adaptation in Learning from Tutoring". We expect that improved lexicon coverage would result in more accurate language interpretation, and we will evaluate the results of our project with the human-computer interaction data collected in the BEETLE2 project.

Foreign collaborations and supported foreign nationals


Publications

  • Mark McConville and Myrosia O. Dzikovska (2008a). Using Inheritance and Coreness Sets to Improve a Wide-Coverage Verb Lexicon Harvested from FrameNet. Proceedings of the Second Linguistic Annotation Workshop (LAW'08), Marrakech, Morocco.
  • Mark McConville and Myrosia O. Dzikovska (2008b). Evaluating Complement-Modifier Distinctions in a Semantically Annotated Corpus. Proceedings of the Sixth Conference on Language Resources and Evaluation (LREC'08).
  • Mark McConville and Myrosia O. Dzikovska (2008c). 'Deep' Grammatical Relations for Semantic Interpretation. Proceedings of the Cross-Domain and Cross-Framework Parser Evaluation Workshop at the 22nd International Conference on Computational Linguistics (COLING'08).

-- MarkMcConville - 15 Jul 2008

Topic revision: r16 - 08 Aug 2008 - 17:11:13 - MyrosiaDzikovska