Shavian and disambiguation
I mentioned earlier an idea I had for automatic part-of-speech disambiguation based only on the part of speech of the preceding word. I also mentioned that I believe this would be a workable way to disambiguate the pronunciation of most homographs (spellings such as "record" or "read" whose pronunciation depends on how they are used).
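A minimal sketch of the idea, assuming a POS-tagged training corpus is available. The toy corpus, the universal-style tag names and the `<s>` start marker are made up for illustration. It counts how often each tag follows each previous tag, then, for a word with several possible tags, picks the one seen most often after the previous word's tag.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

START = "<s>"  # pseudo-tag standing in for "no previous word"

def train_transitions(
    tagged_sentences: Iterable[Sequence[Tuple[str, str]]],
) -> Dict[str, Counter]:
    """Count how often each tag follows each previous tag."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = START
        for _word, tag in sentence:
            counts[prev][tag] += 1
            prev = tag
    return counts

def choose_tag(prev_tag: str, candidates: List[str], counts: Dict[str, Counter]) -> str:
    """Pick the candidate tag seen most often after prev_tag.

    max() keeps the first of several equal maxima, so ties and unseen
    contexts fall back to the order in which candidates are listed."""
    following = counts.get(prev_tag, Counter())
    return max(candidates, key=lambda tag: following[tag])

# Toy, made-up training data.
corpus = [
    [("I", "PRON"), ("read", "VERB"), ("the", "DET"), ("record", "NOUN")],
    [("they", "PRON"), ("record", "VERB"), ("music", "NOUN")],
]
counts = train_transitions(corpus)
print(choose_tag("DET", ["VERB", "NOUN"], counts))   # NOUN
print(choose_tag("PRON", ["NOUN", "VERB"], counts))  # VERB
```

Only one tag of context is kept, which is what makes the rule cheap enough to precompute into a lookup table.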
I would therefore like to create a distributable database that maps the conventional spellings of English words either to (part of speech, phonemic representation) pairs or, for ambiguous spellings, to a mapping from sets of parts of speech to such pairs. The part of speech of the previous word would determine which pair is chosen.
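One way such a database could look, as a Python sketch. It assumes the "sets of parts of speech" are possible tags of the previous word, uses IPA as a stand-in for whatever phonemic notation is chosen, and invents a fallback (the first listed reading) for contexts that no set covers. The entries and the helper names `look_up` and `transcribe` are illustrative only.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

Entry = Tuple[str, str]                  # (part of speech, phonemic form)
Ambiguous = Dict[FrozenSet[str], Entry]  # previous-word tags -> entry

# Illustrative entries only; IPA stands in for the real phonemic notation.
DATABASE: Dict[str, Union[Entry, Ambiguous]] = {
    "the": ("DET", "ðə"),
    "we": ("PRON", "wiː"),
    "record": {
        frozenset({"DET", "ADJ"}): ("NOUN", "ˈrɛkɔːd"),
        frozenset({"PRON", "AUX"}): ("VERB", "rɪˈkɔːd"),
    },
}

def look_up(word: str, prev_pos: str) -> Entry:
    """Return the (POS, pronunciation) pair for word after a word tagged prev_pos."""
    entry = DATABASE[word.lower()]
    if isinstance(entry, tuple):
        return entry  # unambiguous spelling
    for prev_tags, choice in entry.items():
        if prev_pos in prev_tags:
            return choice
    # Hypothetical fallback when no set matches: the first listed reading.
    return next(iter(entry.values()))

def transcribe(words: List[str]) -> List[str]:
    """Work left to right, feeding each chosen POS into the next lookup."""
    prev_pos = "<s>"  # pseudo-tag for the start of a sentence
    phonemes = []
    for word in words:
        pos, phon = look_up(word, prev_pos)
        phonemes.append(phon)
        prev_pos = pos
    return phonemes

print(transcribe(["the", "record"]))
print(transcribe(["we", "record"]))
```

The first call picks the noun reading of "record" and the second the verb reading. Because each lookup returns a part of speech as well as a pronunciation, the chosen tag can drive the next word's lookup without a separate tagger.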
Sources of data would be:
- the Shavian wiki, where possible (licence is cc-by)
- cmudict where the Shavian wiki has no entry (licence is BSD-like)
- the Brown tagger for the parts of speech (licence is MIT)
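A sketch of combining the first two sources, assuming cmudict's usual one-entry-per-line format (`WORD  PH1 PH2 ...`, alternate pronunciations suffixed `(1)`, `(2)` and so on, `;;;` comment lines in older releases and ` #` trailing comments in newer ones). The Shavian wiki data is assumed to be already loaded into a dictionary, and converting both sources into one phonemic notation is assumed to happen elsewhere; `parse_cmudict` and `merge_sources` are hypothetical names.

```python
import re
from typing import Dict, Iterable, List, Tuple

VARIANT = re.compile(r"\(\d+\)$")  # alternate-pronunciation suffix, e.g. "record(2)"

def parse_cmudict(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each lower-cased word to its list of ARPAbet pronunciations."""
    prons: Dict[str, List[str]] = {}
    for line in lines:
        if line.startswith(";;;"):
            continue  # comment line (older releases)
        line = line.split(" #", 1)[0].strip()  # trailing comment (newer releases)
        if not line:
            continue
        word, *phones = line.split()
        word = VARIANT.sub("", word).lower()
        prons.setdefault(word, []).append(" ".join(phones))
    return prons

def merge_sources(
    shavian_wiki: Dict[str, List[str]], cmudict: Dict[str, List[str]]
) -> Dict[str, Tuple[str, List[str]]]:
    """Prefer Shavian wiki entries, fall back to cmudict, and record the source."""
    merged = {w: ("cmudict", p) for w, p in cmudict.items()}
    # Later entries overwrite earlier ones, so the wiki wins wherever it has the word.
    merged.update({w.lower(): ("shavian-wiki", p) for w, p in shavian_wiki.items()})
    return merged

lines = [
    ";;; comment",
    "RECORD  R EH1 K ER0 D",
    "RECORD(1)  R IH0 K AO1 R D",
    "tomato T AH0 M EY1 T OW2 # us",
]
merged = merge_sources({"tomato": ["T AH0 M AA1 T OW2"]}, parse_cmudict(lines))
print(merged["record"][0], merged["tomato"][0])  # cmudict shavian-wiki
```

Keeping the source alongside each entry makes it easy to attribute the cc-by material when the database is distributed.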
Still to be decided:
- what this database would be called
- how to evaluate it, perhaps by comparing pronunciation accuracy when:
  - assuming every word is a noun
  - assuming every word the Shavian wiki lists as ambiguous is a noun, and using the Brown tagger for the rest
  - using the previous-word POS method outlined above
  - using the Brown tagger throughout
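A sketch of how such a comparison might be scored, assuming a hand-checked gold standard of (word, pronunciation) sentences. Only homograph tokens are counted, since unambiguous spellings are right under every method. The lexicon, gold data and previous-tag rules are hypothetical toy values; the two methods that need the Brown tagger would plug in as further functions with the same signature, and are left out because they depend on how the tagger is called.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy lexicon: (spelling, POS) -> ARPAbet pronunciation.
LEXICON: Dict[Tuple[str, str], str] = {
    ("we", "PRON"): "W IY1",
    ("the", "DET"): "DH AH0",
    ("record", "NOUN"): "R EH1 K ER0 D",
    ("record", "VERB"): "R IH0 K AO1 R D",
}
SPELLINGS = Counter(word for word, _pos in LEXICON)
HOMOGRAPHS = {word for word, n in SPELLINGS.items() if n > 1}

Tagger = Callable[[Sequence[str]], List[str]]

def pronounce(word: str, tag: str) -> str:
    """Pronunciation for (word, tag), or the word's first listed reading."""
    readings = {pos: p for (w, pos), p in LEXICON.items() if w == word}
    return readings.get(tag, next(iter(readings.values())))

def all_nouns(words: Sequence[str]) -> List[str]:
    """Baseline: tag every word as a noun."""
    return ["NOUN"] * len(words)

KNOWN = {"we": "PRON", "the": "DET"}           # hypothetical unambiguous tags
LIKELY_NEXT = {"PRON": "VERB", "DET": "NOUN"}  # hypothetical previous-tag rules

def previous_pos(words: Sequence[str]) -> List[str]:
    """Tag ambiguous words from the previous word's tag alone."""
    tags: List[str] = []
    prev = "<s>"
    for word in words:
        tag = KNOWN.get(word) or LIKELY_NEXT.get(prev, "NOUN")
        tags.append(tag)
        prev = tag
    return tags

def accuracy(gold: List[List[Tuple[str, str]]], tagger: Tagger) -> float:
    """Share of homograph tokens whose chosen pronunciation matches the gold one."""
    right = total = 0
    for sentence in gold:
        tags = tagger([word for word, _ in sentence])
        for (word, gold_pron), tag in zip(sentence, tags):
            if word in HOMOGRAPHS:
                total += 1
                right += pronounce(word, tag) == gold_pron
    return right / total if total else 0.0

GOLD = [
    [("we", "W IY1"), ("record", "R IH0 K AO1 R D")],
    [("the", "DH AH0"), ("record", "R EH1 K ER0 D")],
]
for name, tagger in [("all nouns", all_nouns), ("previous POS", previous_pos)]:
    print(name, accuracy(GOLD, tagger))  # all nouns 0.5, previous POS 1.0
```

Scoring pronunciations rather than tags keeps the comparison fair: a method may pick the "wrong" tag for an ambiguous word and still be right if both tags share a pronunciation.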