Lin (1995)

Dekang Lin (1995). "A dependency-based method for evaluating broad-coverage parsers". Proceedings of IJCAI'95.

  • a method for evaluating parsers based on counting word-word dependencies with respect to a gold standard (rather than phrase boundaries)
  • three desiderata for a parser evaluation scheme:
    • ignoring inconsequential differences (e.g. whether an adverb is attached as a VP or an S modifier)
    • selective evaluation (e.g. how well the parser handles conjunction)
    • facilitating diagnosis of incorrect parses (telling the parser developer what errors were made)
  • assumes dependency trees as defined in Mel'čuk's book -- rooted, connected, and with a unique parent for every non-root word
  • presents an algorithm for converting phrase structure representations into dependency trees
  • error count -- the number of words that are assigned a different head in the parser output than in the gold standard (well defined because every non-root word has exactly one head). This is a Hamming distance between the two representations -- the number of dependency relationships that must be altered to convert the parser output into the gold standard.
  • a dependency-based parser evaluation method is "more relevant to semantic interpretation".
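
The conversion and the error count above can be sketched together in Python. This is a minimal sketch under stated assumptions, not Lin's actual algorithm: trees are nested (label, children) tuples whose leaves carry word positions, head selection uses a small hypothetical head table (HEAD_TABLE), and both trees are assumed to share the same tokenisation. The head word of every non-head child is made a dependent of the phrase's head word, and error_count then compares head assignments word by word.

```python
from typing import Dict, List, Optional, Tuple, Union

# A phrase-structure node is (label, children); a leaf is (POS tag, word position).
Node = Tuple[str, Union[int, list]]

# Hypothetical head table (not Lin's actual rules): for each phrase label,
# the child labels that may serve as its head, in order of preference.
HEAD_TABLE: Dict[str, List[str]] = {
    "S": ["VP"],
    "VP": ["V"],
    "NP": ["N"],
    "PP": ["P"],
}


def convert(node: Node, heads: Dict[int, Optional[int]]) -> int:
    """Return the head word of `node`, recording head links for its non-head children."""
    label, body = node
    if isinstance(body, int):  # leaf: the word is its own head
        return body
    child_heads = [convert(child, heads) for child in body]
    labels = [child[0] for child in body]
    head_pos = 0  # fall back to the leftmost child if no rule matches
    for preferred in HEAD_TABLE.get(label, []):
        if preferred in labels:
            head_pos = labels.index(preferred)
            break
    for i, h in enumerate(child_heads):
        if i != head_pos:
            # the head word of a non-head child depends on the phrase's head word
            heads[h] = child_heads[head_pos]
    return child_heads[head_pos]


def to_dependencies(tree: Node) -> Dict[int, Optional[int]]:
    """Map every word position to its head position (None for the root)."""
    heads: Dict[int, Optional[int]] = {}
    root = convert(tree, heads)
    heads[root] = None
    return heads


def error_count(parse: Dict[int, Optional[int]], gold: Dict[int, Optional[int]]) -> int:
    """Number of words whose head in the parse differs from the gold standard."""
    return sum(1 for word, head in gold.items() if parse.get(word) != head)


# "saw man with telescope" (positions 1-4): the gold tree attaches the PP
# to the verb, the parse attaches it to the noun.
GOLD_TREE = ("VP", [("V", 1), ("NP", [("N", 2)]),
                    ("PP", [("P", 3), ("NP", [("N", 4)])])])
PARSE_TREE = ("VP", [("V", 1), ("NP", [("N", 2),
                     ("PP", [("P", 3), ("NP", [("N", 4)])])])])

gold = to_dependencies(GOLD_TREE)
parse = to_dependencies(PARSE_TREE)
print(error_count(parse, gold))  # 1: only "with" gets a different head
```

A PP-attachment error changes the head of exactly one word, so it costs exactly one error, whereas a phrase-boundary measure would penalise every bracket the misattachment disturbs.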

Lin (1998a)

Dekang Lin (1998). "Dependency-based evaluation of MINIPAR". LREC'98 Workshop on the Evaluation of Parsing Systems.

  • Minipar is a descendant of Principar (a hand-crafted GB parser), adopting some ideas from the Minimalist Program (bare phrase structure, economy principles)
  • precision and recall over dependencies are preferable to the error count, because of:
    • tokenisation differences (e.g. hyphens)
    • selective evaluation
    • the parser output may be fragmented
  • we evaluated Minipar on the SUSANNE corpus (a 64-text, 7,103-sentence, cross-domain subset of the Brown Corpus, annotated with parse trees and functional information). Minipar is hand-crafted rather than statistically trained on a corpus; attachment ambiguities are resolved structurally, by 'minimal attachment' and 'right association'.
  • precision = 89%, recall = 79%, speed = 320 words per second (Pentium II 300 MHz, 128 MB memory)
  • we did selective evaluation of particular dependency types: subject, complement, PP attachment, relative clause, and conjunction
  • we also did selective evaluation of particular words, e.g. attachment accuracy for individual prepositions.
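
The precision/recall computation and the two kinds of selective evaluation described above might look like the following minimal sketch. The Dep record, its relation labels and the keep filter are hypothetical illustrations, not Minipar's output format. A parsed dependency is treated as correct when the gold standard attaches the same word to the same head, the filter is applied to each side using that side's own labels, and both analyses are assumed to use the same word positions (so the tokenisation problem is sidestepped rather than solved).

```python
from typing import Callable, Iterable, NamedTuple, Optional, Tuple


class Dep(NamedTuple):
    dependent: int  # position of the dependent word
    head: int       # position of its head word
    relation: str   # hypothetical relation label, used only for selection
    word: str       # surface form of the dependent, used only for selection


def precision_recall(
    parsed: Iterable[Dep],
    gold: Iterable[Dep],
    keep: Callable[[Dep], bool] = lambda d: True,
) -> Tuple[Optional[float], Optional[float]]:
    """Dependency precision and recall, restricted to dependencies accepted by `keep`.

    Words the parser leaves unattached (fragmented output) contribute nothing
    to the parsed set, so they lower recall but not precision.
    """
    p = {(d.dependent, d.head) for d in parsed if keep(d)}
    g = {(d.dependent, d.head) for d in gold if keep(d)}
    correct = len(p & g)
    precision = correct / len(p) if p else None  # None: nothing selected
    recall = correct / len(g) if g else None
    return precision, recall


# "John saw the man with a telescope" (positions 1-7).
gold_deps = [
    Dep(1, 2, "subj", "John"), Dep(3, 4, "det", "the"), Dep(4, 2, "comp", "man"),
    Dep(5, 2, "pp", "with"), Dep(6, 7, "det", "a"), Dep(7, 5, "comp", "telescope"),
]
# A fragmented parse: determiners left unattached, PP attached to "man".
parsed_deps = [
    Dep(1, 2, "subj", "John"), Dep(4, 2, "comp", "man"),
    Dep(5, 4, "pp", "with"), Dep(7, 5, "comp", "telescope"),
]

print(precision_recall(parsed_deps, gold_deps))                                # (0.75, 0.5)
print(precision_recall(parsed_deps, gold_deps, lambda d: d.relation == "pp"))  # (0.0, 0.0)
print(precision_recall(parsed_deps, gold_deps, lambda d: d.word == "with"))    # (0.0, 0.0)
```

The second call restricts scoring to one dependency type and the third to one preposition, mirroring the two selective evaluations in the notes.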

Lin (1998b)

Dekang Lin (1998). "A dependency-based method for evaluating broad-coverage parsers". Natural Language Engineering.

  • a journal-length version of Lin (1995)
  • discussion of Carroll and Charniak's (1992) definition of dependency grammars as headed CFGs (i.e. every rule has the form X' -> ...X...). One criticism is that this definition does not allow the following analysis to be derived:
What do you like?
(like,you)
(like,what)
(do,like)
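But this IS possible in categorial grammar. In the derivation below, numeric indices identify words (1 = what, 2 = do, 3 = you, 4 = like), letters are variables that get bound to word indices as categories combine, and each pair (h,d) records a dependency of word d on head h: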
what        do                      you like
----------- ----------------------- --- ----------
Sa/(Sa/NP1) S2/NPc/(Sd\NPe/NPc)/NPe NP3 S4\NPf/NPg
            (2,d)                       (4,f)
                                        (4,g)
            -------------------------->
            S2/NPc/(Sd\NP3/NPc)
            (2,d)
            ------------------------------------->
            S2/NPg
            (2,4)
            (4,3)
            (4,g)
------------------------------------------------->
                 S2
                (2,4)
                (4,3)
                (4,1)
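
One way to see why a headed CFG cannot derive this analysis: every constituent of a headed CFG is a contiguous string containing its head, so the dependency trees it induces are projective, i.e. every word lying between a head and one of its dependents is itself (transitively) dominated by that head. In the analysis above, the arc from like (4) to what (1) spans do (2), the root, which like does not dominate. The sketch below checks this for a tree given as a map from word positions to head positions; the function names are illustrative, not taken from Lin's paper.

```python
from typing import Dict, List, Optional, Tuple

Heads = Dict[int, Optional[int]]  # word position -> head position (None for the root)


def dominates(heads: Heads, h: int, w: int) -> bool:
    """True if w is h itself or a transitive dependent of h (assumes an acyclic tree)."""
    current: Optional[int] = w
    while current is not None:
        if current == h:
            return True
        current = heads[current]
    return False


def non_projective_arcs(heads: Heads) -> List[Tuple[int, int]]:
    """(head, dependent) arcs that span some word the head does not dominate."""
    arcs = []
    for dep, head in heads.items():
        if head is None:
            continue
        lo, hi = sorted((dep, head))
        if any(not dominates(heads, head, w) for w in range(lo + 1, hi)):
            arcs.append((head, dep))
    return arcs


# "What do you like?": 1 = what, 2 = do, 3 = you, 4 = like; "do" is the root.
WORDS = {1: "what", 2: "do", 3: "you", 4: "like"}
analysis: Heads = {1: 4, 2: None, 3: 4, 4: 2}

print([(WORDS[h], WORDS[d]) for h, d in non_projective_arcs(analysis)])
# [('like', 'what')]
```

Only the (like,what) dependency is flagged, which is exactly the one that the categorial derivation obtains by letting "what" combine with the whole "do you like" before its NP argument is discharged.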

-- MarkMcConville - 25 Aug 2008

Topic revision: r1 - 25 Aug 2008 - 10:39:55 - MarkMcConville
 
This site is powered by the TWiki collaboration platform. Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.