
Unsupervised Adaptation

Data

  • PTB Gold.
  • BIO unlabelled.

Requirements

  • Script to randomly add a domain label to each dependency label: add_domains(G,X)
  • Script to fix the domain labels on dependency labels: fix_domains(G,X)
  • Script to filter out system dependencies that are incorrect with respect to a gold standard: filter
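
A minimal sketch of the first two scripts, under stated assumptions: labels are plain strings, a domain is attached as a suffix after a "|" separator, and fix_domains keeps a domain the corpus is allowed to carry and otherwise draws one at random. The separator, the signatures and the rng parameter are illustrative choices, not the actual scripts.

```python
import random

# Assumption: a domain-tagged dependency label is written "LABEL|D",
# where D is a single letter such as G (general), N (news) or B (bio).
SEP = "|"


def add_domains(labels, domains, rng=random):
    """Append a randomly chosen domain from `domains` to every label."""
    return [f"{label}{SEP}{rng.choice(domains)}" for label in labels]


def fix_domains(labels, domains, rng=random):
    """Keep allowed domain suffixes; replace or add all others at random."""
    fixed = []
    for label in labels:
        base, sep, dom = label.rpartition(SEP)
        if sep and dom in domains:
            fixed.append(label)  # system domain label is allowed: keep it
        else:
            base = base if sep else label  # label had no domain suffix at all
            fixed.append(f"{base}{SEP}{rng.choice(domains)}")
    return fixed
```

For example, fix_domains(["nsubj|G", "dobj|B"], "GN") leaves the first label alone and re-draws the domain of the second from G and N.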

Initial Procedure

  1. PTB -> add_domains(G,N) -> PTB-D
  2. PTB-D -> train parser
  3. BIO -> run parser -> BIO-D-noisy
  4. BIO-D-noisy -> fix_domains(G,B) -> BIO-D

This gives us the following data:

  • PTB-D: gold dependencies with randomly assigned domain labels.
  • BIO-D: system dependencies with a mix of system and randomly assigned domain labels.

Iterative Phase

  1. PTB-D + BIO-D -> train parser
  2. PTB + BIO -> run parser -> PTB-D-noisy + BIO-D-noisy
  3. PTB-D-noisy -> fix_domains(G, N) -> PTB-D
  4. PTB-D + PTB-Gold -> filter -> PTB-D
  5. BIO-D-noisy -> fix_domains(G, B) -> BIO-D
  6. If not converged: goto step 1.
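
The loop above can be sketched as follows. The parser and the scripts are passed in as callables because their real interfaces are not given here. All parameter names, the list-based corpus representation, the iteration cap and the convergence test (no change in either data set) are assumptions made for illustration.

```python
def iterative_phase(ptb_gold, bio, ptb_d, bio_d, train, parse,
                    fix_domains, filter_gold, max_iters=10):
    """Repeat steps 1-6 until PTB-D and BIO-D stop changing or max_iters is hit."""
    iterations = 0
    for iterations in range(1, max_iters + 1):
        model = train(ptb_d + bio_d)                    # step 1
        ptb_noisy = parse(model, ptb_gold)              # step 2
        bio_noisy = parse(model, bio)
        new_ptb_d = fix_domains(ptb_noisy, "GN")        # step 3
        new_ptb_d = filter_gold(new_ptb_d, ptb_gold)    # step 4
        new_bio_d = fix_domains(bio_noisy, "GB")        # step 5
        # Step 6: assumed convergence test, nothing changed since last round.
        converged = new_ptb_d == ptb_d and new_bio_d == bio_d
        ptb_d, bio_d = new_ptb_d, new_bio_d
        if converged:
            break
    return ptb_d, bio_d, iterations
```

If fix_domains draws domains at random, exact equality may never be reached, so the max_iters cap (or a looser test, such as the fraction of changed labels) is what ends the loop in practice.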

Alternatively

  1. Define a loss function l_d1(gold-structure, guessedgold, guess) for the domain d1, for which we have gold data. It penalizes out-of-domain labels and wrong structure.
  2. Define a loss function l_d2(guessedgold, guess) for the domain d2, for which we have no gold data. It penalizes only out-of-domain labels.
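
One possible reading of the two losses, as a sketch only. It assumes each token is a (head, label, domain) triple, that guessedgold is the guess with its out-of-domain labels already corrected, and that both penalties are simple per-token counts. None of these choices is fixed by the notes above.

```python
def domain_errors(guessed_gold, guess):
    """Count tokens whose domain in `guess` differs from `guessed_gold`."""
    return sum(g_dom != s_dom
               for (_, _, g_dom), (_, _, s_dom) in zip(guessed_gold, guess))


def structure_errors(gold_structure, guess):
    """Count tokens whose (head, label) in `guess` differs from the gold tree."""
    return sum((g_head, g_label) != (s_head, s_label)
               for (g_head, g_label), (s_head, s_label, _)
               in zip(gold_structure, guess))


def l_d1(gold_structure, guessed_gold, guess):
    """Domain with gold data: penalize wrong structure and out-of-domain labels."""
    return structure_errors(gold_structure, guess) + domain_errors(guessed_gold, guess)


def l_d2(guessed_gold, guess):
    """Domain without gold data: penalize out-of-domain labels only."""
    return domain_errors(guessed_gold, guess)
```

Under this reading l_d1 is l_d2 plus a structural term, so the two domains are scored on the same scale.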

Preliminary Results

                     General     News      Bio
PTB                   112175   112223        0
Bio TB                 35682        0    11568
Initial               147857   112223    11568
Iteration 1 noise     207354    63264     1030
Iteration 1 fixed     209201    58799     3648
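
A quick consistency check on the table. It assumes each cell counts dependency labels tagged with that domain, which the page does not state explicitly. The Initial row is the sum of the two corpus rows, and the total across domains stays the same after iteration 1.

```python
# Counts copied from the table above, as (General, News, Bio).
counts = {
    "PTB": (112175, 112223, 0),
    "Bio TB": (35682, 0, 11568),
    "Initial": (147857, 112223, 11568),
    "Iteration 1 noise": (207354, 63264, 1030),
    "Iteration 1 fixed": (209201, 58799, 3648),
}

# Initial should be the column-wise sum of the two corpora.
initial_from_parts = tuple(p + b for p, b in zip(counts["PTB"], counts["Bio TB"]))

# Row totals over the three domains.
totals = {row: sum(values) for row, values in counts.items()}

print(initial_from_parts == counts["Initial"])  # True
print(totals["Initial"], totals["Iteration 1 noise"], totals["Iteration 1 fixed"])
# 271648 271648 271648
```

The constant total suggests that iteration 1 only moves labels between domains; most of the movement is from News to General.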

Filter output for iteration 1:

Parses fixed: 2708
Domains added: 2708
Domains changed: 5244
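
A hedged sketch of how a filter step could produce the first two counters. It assumes that a system dependency whose head or label disagrees with gold is replaced by the gold dependency and given a freshly drawn domain, which would make "Parses fixed" and "Domains added" equal by construction. The function name, the (head, "LABEL|D") representation and the per-dependency counting are assumptions; the "Domains changed" counter is not modelled.

```python
import random


def filter_against_gold(system, gold, domains, rng=random):
    """system: list of (head, "LABEL|D"); gold: list of (head, "LABEL")."""
    fixed = added = 0
    out = []
    for (s_head, s_label), (g_head, g_label) in zip(system, gold):
        base = s_label.split("|")[0]
        if (s_head, base) == (g_head, g_label):
            out.append((s_head, s_label))  # correct dependency: keep as is
        else:
            # Wrong head or label: fall back to gold and draw a new domain.
            out.append((g_head, f"{g_label}|{rng.choice(domains)}"))
            fixed += 1
            added += 1
    return out, {"Parses fixed": fixed, "Domains added": added}
```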
Topic revision: r3 - 19 Feb 2007 - 15:12:52 - JamesClarke
 