Machine Translation

Here goes everything we consider important for doing MT with The Beast.

Atomic Types

With The Beast you need a fixed set of possible values for the attributes of objects/rows. These are internally mapped to integers for efficient processing and storage. Let's call these sets Atomic Types for now.
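As a rough illustration of the idea, an atomic type can be thought of as a fixed list of symbols plus a lookup table from symbol to integer. The sketch below is plain Python, not The Beast's internal code; the class name `AtomicType` and its methods are hypothetical and exist only for this example.

```python
class AtomicType:
    """A fixed set of symbols, each mapped to a small integer (illustrative only)."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)
        if len(set(self.values)) != len(self.values):
            raise ValueError(f"duplicate value in type {name}")
        # Integer code of a symbol is its index in the declaration order.
        self.index = {v: i for i, v in enumerate(self.values)}

    def encode(self, value):
        # Raises KeyError for anything outside the fixed set.
        return self.index[value]

    def decode(self, code):
        return self.values[code]


source_word = AtomicType("SourceWord", ["Ich", "mag", "das", "boot"])
codes = [source_word.encode(w) for w in ["Ich", "mag", "das", "boot"]]
```

Because the set is fixed up front, any word that was not declared is rejected at encoding time rather than silently added.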

For Phrase-Based MT it would make sense to have the following types (written in TheBeast format):

CREATE TYPE SourceWord (Ich, mag, das, boot);
CREATE TYPE TargetWord (I, like, the, boat);
CREATE TYPE ID (0,1,2,3,4,5,6...);
CREATE TYPE Position (0,1,2,3,4);
CREATE TYPE TranslationPhrase ("I like", "the boat", "I like the", "the", ...);

Note: Soon it will be possible to define integer types as a range, e.g. CREATE TYPE ID (1..10).


I would propose the following schema for our variables. We would need a table to store a source sentence. A table of tokens with indices might be a good idea:

   position Position,
   word SourceWord);    
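To make the token table concrete, here is a small Python sketch of the rows it would hold for the example sentence. The names `SourceToken` and `tokenize` are hypothetical, and whitespace splitting is just an assumption for the example.

```python
from typing import NamedTuple


class SourceToken(NamedTuple):
    position: int  # index of the token in the source sentence
    word: str      # a value of the SourceWord type


def tokenize(sentence):
    # One row per whitespace-separated token, numbered from 0.
    return [SourceToken(i, w) for i, w in enumerate(sentence.split())]


rows = tokenize("Ich mag das boot")
```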

Then we need a table for possible translation phrases (plus which phrase in the source each one belongs to):

   id ID,
   words TranslationPhrase,
   begin Position,
   end Position);
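The following Python sketch shows what rows of this table might look like and how begin/end tie a candidate back to the source. It assumes that `end` is exclusive, which is only one possible reading of why Position runs from 0 to 4 for a four-word sentence; the page does not say. `PhraseCandidate` and `source_span` are hypothetical names.

```python
from typing import NamedTuple


class PhraseCandidate(NamedTuple):
    id: int
    words: str  # a value of the TranslationPhrase type
    begin: int  # first source position covered
    end: int    # ASSUMPTION: one past the last source position covered


source = ["Ich", "mag", "das", "boot"]

candidates = [
    PhraseCandidate(0, "I like", 0, 2),
    PhraseCandidate(1, "the boat", 2, 4),
    PhraseCandidate(2, "I like the", 0, 3),
    PhraseCandidate(3, "the", 2, 3),
]


def source_span(candidate):
    # The source words this candidate translates.
    return source[candidate.begin:candidate.end]
```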

Finally we need a table that represents our results: pairs of (target) phrases that reflect what we translate and in what order. Each entry in this table says that in the translation the first phrase follows the second phrase of the pair.

   first ID,
   second ID);
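One way to read a translation off this table, sketched in Python: since each pair says that `first` follows `second`, the pairs can be chained into a single left-to-right order of phrase ids. The function `linearize` is hypothetical, and it assumes the pairs form exactly one chain, which the schema itself does not enforce.

```python
def linearize(pairs):
    """Order phrase ids from (first, second) pairs, where first follows second."""
    # Map each phrase to the phrase that comes right after it.
    successor = {second: first for first, second in pairs}
    followers = set(successor.values())
    # The chain starts at the one phrase that follows nothing.
    starts = [p for p in successor if p not in followers]
    if len(starts) != 1:
        raise ValueError("pairs do not form a single chain")
    order = [starts[0]]
    seen = {starts[0]}
    while order[-1] in successor:
        nxt = successor[order[-1]]
        if nxt in seen:
            raise ValueError("pairs contain a cycle")
        seen.add(nxt)
        order.append(nxt)
    if len(order) != len(pairs) + 1:
        raise ValueError("pairs do not form a single chain")
    return order


phrases = {0: "I like", 1: "the boat"}
order = linearize([(1, 0)])  # phrase 1 follows phrase 0
translation = " ".join(phrases[i] for i in order)
```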

I think this is all we need in terms of the data schema for now.

-- SebastianRiedel - 26 Oct 2006

Topic revision: r2 - 27 Oct 2006 - 19:44:18 - SebastianRiedel