Sometimes I hear people saying that they believe morality to be designed by God, and so they can't understand how atheists and agnostics can have an understanding of morality. This is not an argument I can easily get my head around. I mean, if we talk about languages for a moment, there's still no consensus on how humans as a whole started to speak. But it's still pretty obvious that individual humans learn language as they grow up from the people around them, that language exists by consensus, and that there are certain necessary features for language to be language. I don't see Esperantists going around telling everyone that they can't understand how we can speak English if we don't know who started Proto-Indo-European.

ETA:  Then again, if the Esperantists did do that, I probably wouldn't understand too well anyway.

After she had had trouble spelling a word, Rio suddenly said:

"Great Vowel Shift? Huh. I don't see anything great about it. It just seems like a nuisance."

(Further conversations showed that she was aware that "great" in this sense means important, and was making a conscious pun.)

I won't be doing this any time in the near future, because I'm rather too busy, but meanwhile it's interesting to think about:

I would like to make a wiki where you could set up a basic Latin lexicon. There would be three parts: firstly, you would list each lexeme on its own page, like this:


You could populate this automatically from Lewis and Short (which is in the public domain, and Perseus has it available for download).

Secondly, you would represent the morphology on another page as a list of rules:

{{rule|%ārum|noun genitive plural|%a|noun nominative singular}}

"If you see a word ending ...ārum, and the same word with ...a is in the dictionary as a nominative singular, then assume it's the genitive plural of that same word." (Perhaps the syntax would be different; I'm thinking aloud here.)
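To make the rule idea a bit more concrete, here is a minimal Python sketch of how one such rule might be parsed and applied. Everything in it is an assumption for illustration: the names `Rule`, `parse_rule` and `analyse` are made up, the one-word lexicon is invented rather than taken from Lewis and Short, and "%" is simply read as "the rest of the word".

```python
from typing import Dict, List, NamedTuple, Tuple


class Rule(NamedTuple):
    ending: str         # ending of the inflected form, e.g. "ārum"
    analysis: str       # what the inflected form is, e.g. "noun genitive plural"
    base_ending: str    # ending of the dictionary form, e.g. "a"
    base_analysis: str  # how the dictionary form must be listed


def parse_rule(text: str) -> Rule:
    """Parse a line such as {{rule|%ārum|...|%a|...}} (hypothetical syntax)."""
    fields = text.strip().strip("{}").split("|")
    _, ending, analysis, base_ending, base_analysis = fields
    # "%" stands for the unchanged part of the word, so only the endings are kept.
    return Rule(ending.lstrip("%"), analysis, base_ending.lstrip("%"), base_analysis)


# Hypothetical lexicon: headword -> how it is listed in the dictionary.
LEXICON: Dict[str, str] = {"puella": "noun nominative singular"}

RULES: List[Rule] = [
    parse_rule("{{rule|%ārum|noun genitive plural|%a|noun nominative singular}}"),
]


def analyse(word: str) -> List[Tuple[str, str]]:
    """Return (headword, analysis) pairs for every rule that matches the word."""
    results = []
    for rule in RULES:
        if not word.endswith(rule.ending):
            continue
        stem = word[: len(word) - len(rule.ending)]
        headword = stem + rule.base_ending
        # Only accept the analysis if the base form really is in the lexicon.
        if LEXICON.get(headword) == rule.base_analysis:
            results.append((headword, rule.analysis))
    return results


print(analyse("puellārum"))
```

A word like "rosārum" would come back with no analyses here, simply because "rosa" isn't in the toy lexicon; the dictionary check is what keeps the rule from firing on any old word that happens to end in ...ārum.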

Thirdly, and this would be the especially fun part, I would make it transfer all this content automatically to indexed form in a database. Then I'd make a front screen where you could paste some Latin text such as this, and have some JavaScript (à la ckeditor) which automatically highlighted the parts of speech and cases for you, and wrote in a gloss. Then if you were writing Latin, it would show you whether you'd said what you meant to say, and if you were reading Latin, it would give you a useful visual aid to checking that your understanding was correct.

It would also be possible to use this as a filter on sites such as the Latin Wikipedia, rather as the BBC does for Welsh.

Anyway, probably not going to do that for a few months, but it would be an interesting experiment.

One of the difficulties inherent in automated transliteration is that of homonyms: words which are pronounced differently but spelt the same in the Latin alphabet.
  • I live near a live wire.
  • I like to read. I read a book yesterday. I will read one tomorrow.
  • I advocate happiness to the advocate.
  • He does love to play with does.
  • Please lead me to the box of lead.
  • It used to be used for that. Now it is used for this.
  • I never knew a number number.
On the Shavian wiki we've solved this problem manually, but it's a bit of a pain. With things like the Shavian Firefox extension, it's just been necessary so far to pick one randomly.

The other day I was on a plane and got to thinking. In most of these cases, the two words have different parts of speech: for example, does (he does) is a verb, does (more than one doe) is a noun. What if we could do part-of-speech tagging? But that's rather a complicated field. How simple can we make a part-of-speech tagger?

So I took the lexicon from the Shavian wiki and added the default part-of-speech tags from the Brown tagger to each word. Then rather than marking the homonyms with the part of speech they represented, I marked them with the part of speech which should precede them. That is, rather than having "does" choose between "N" and "V", it chooses between "D/J/I" (for N) and "N/P/V" (for V).
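Roughly, the lookup could be sketched like this in Python. This is not the actual extension code: the tiny tag table, the reading labels, the name `choose_readings` and the fall-back to the first listed reading are all assumptions made for the sake of the example.

```python
# Hypothetical default tag for each word (one coarse letter per word).
DEFAULT_TAG = {
    "he": "P", "does": "V", "love": "V",
    "to": "I", "play": "V", "with": "I",
}

# For each ambiguous spelling: (tags that may precede it, reading to choose).
# The labels stand in for the two different transliterations.
HOMOGRAPHS = {
    "does": [
        (frozenset("DJI"), "does (noun)"),
        (frozenset("NPV"), "does (verb)"),
    ],
}


def choose_readings(words):
    """Pick a reading for each ambiguous word from the tag of the word before it."""
    chosen = []
    prev_tag = None  # no preceding word at the start of the text
    for word in words:
        key = word.lower()
        options = HOMOGRAPHS.get(key)
        if options is None:
            chosen.append(word)
        else:
            # Assumed fall-back: the first listed reading, used when the
            # preceding tag is unknown or matches none of the options.
            reading = options[0][1]
            for preceding_tags, candidate in options:
                if prev_tag in preceding_tags:
                    reading = candidate
                    break
            chosen.append(reading)
        prev_tag = DEFAULT_TAG.get(key)
    return chosen


print(choose_readings("He does love to play with does".split()))
```

The point of keying on the preceding tag is that no real tagging pass is needed: each word only has to carry one default tag, and only the ambiguous words need the extra table.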

This works surprisingly well (it even handles the "read" case as well as can be expected), and I think it's simple and effective enough to use whenever I get around to updating the Firefox extension, and possibly elsewhere. It handles all the most common cases; other than occasional misclassifications, the only major failing I can find is that it cannot distinguish nouns from adjectives. There are only a few cases where an adjective and a noun are known to the system to have a different pronunciation:
  • agape
  • arithmetic
  • content
  • invalid
  • minute
  • number
  • pasty
I think in all these cases there's one option which is far more common than the other, and so we can get away with choosing that one always.  ("Content" is probably the least unbalanced; I think the noun form still has the edge.)


