Nargery: How the triolet builder works
Apr. 2nd, 2011 01:53 pm
In case you're interested, here's how the triolet builder works.
We start by taking a set of words. Then we remove all the words which don't alternate between stressed and unstressed syllables. Next, we assign each of them a stress number whose absolute value is the number of syllables, and which is positive if the word begins on a stressed syllable and negative if it begins on an unstressed syllable. The stress numbers -1 and 0 are not allowed. Then we also assign each one a rhyme number, such that words with the same rhyme number rhyme. Here is the lexicon with stress numbers and rhyme numbers.
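As a rough illustration of the numbering, here is a minimal Python sketch. It assumes each word's stress pattern is already available as a string of "1" (stressed) and "0" (unstressed), one character per syllable, and that rhyming words share some precomputed key; both of those inputs, and the function names, are hypothetical. It also assumes every one-syllable word gets +1, which is one reading of the rule that -1 is not allowed.

```python
from typing import Dict, Optional


def stress_number(pattern: str) -> Optional[int]:
    """Return the stress number for a pattern such as "101", or None if
    the word should be removed from the lexicon."""
    if not pattern or any(c not in "01" for c in pattern):
        return None
    # Reject words with two adjacent syllables of the same kind.
    if any(a == b for a, b in zip(pattern, pattern[1:])):
        return None
    syllables = len(pattern)
    if syllables == 1:
        # Assumption: all monosyllables are +1, since -1 is not allowed.
        return 1
    return syllables if pattern[0] == "1" else -syllables


def assign_rhyme_numbers(rhyme_keys: Dict[str, str]) -> Dict[str, int]:
    """Give words with the same (hypothetical) rhyme key the same number."""
    numbers: Dict[str, int] = {}
    result: Dict[str, int] = {}
    for word, key in rhyme_keys.items():
        if key not in numbers:
            numbers[key] = len(numbers) + 1
        result[word] = numbers[key]
    return result
```

With these definitions, a pattern of "10" gives +2, "01" gives -2, and "110" is rejected because it does not alternate.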
Next, we assign each word its most likely part of speech, using the tags from the Brown corpus. If the word doesn't appear in the corpus, we throw it out. We use only the first character of the tag, so for example "NNS" is represented as "N". The part of speech gets concatenated with the stress number to form a "tag". For example, "N+3" is a three-syllable noun which begins on a stressed syllable, such as "cauliflower".
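The tagging step could look something like the following Python sketch. It assumes the Brown tag counts for each word have already been gathered into a dictionary of counters; that data structure and the name make_tag are illustrative, not taken from the actual builder.

```python
from collections import Counter
from typing import Dict, Optional


def make_tag(word: str, stress: int, tag_counts: Dict[str, Counter]) -> Optional[str]:
    """Combine the first letter of the word's most frequent Brown tag with
    its signed stress number. Returns None for words with no counts."""
    counts = tag_counts.get(word)
    if not counts:
        return None  # the word is thrown out
    brown_tag, _ = counts.most_common(1)[0]
    # "+d" forces an explicit sign, so 3 becomes "+3" and -2 stays "-2".
    return f"{brown_tag[0]}{stress:+d}"


# Invented counts, for illustration only.
example_counts = {
    "rivers": Counter({"NNS": 5, "VBZ": 1}),
    "between": Counter({"IN": 9}),
}
print(make_tag("rivers", 2, example_counts))    # N+2
print(make_tag("between", -2, example_counts))  # I-2
```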
Then we make up a battery of phrases, each consisting of terminals (such as "if"), and nonterminals (such as "N" for noun). A script is then run over the battery to create all possible tags for the nonterminals which would fit into iambic tetrameter.
For example, this line, where R is a pronoun, A is an article, J is an adjective and N is a noun,
R are A J N
produces these arrangements. Note that R and A are constrained to a single syllable (+1), and that A+1 is special-cased so that it cannot fall on a stressed syllable. Note also that we provide for both masculine (here, 8-syllable) and feminine (9-syllable) lines.
R+1 are A+1 J+1 N-4
R+1 are A+1 J+1 N-5
R+1 are A+1 J+2 N+3
R+1 are A+1 J+2 N+4
R+1 are A+1 J+3 N-2
R+1 are A+1 J+3 N-3
R+1 are A+1 J+4 N+1
R+1 are A+1 J+4 N+2
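For the curious, here is a small Python sketch that reproduces the eight arrangements above. It is not the actual script. It assumes terminals are lowercase and one syllable long, nonterminals are single uppercase letters, a +1 word may sit on any syllable unless special-cased, and the last word must cover the final stressed syllable (without that last assumption, "J+5 N+1" would also come out). All names in it are made up for the example.

```python
from typing import Dict, List

LINE_LENGTHS = (8, 9)        # masculine and feminine iambic tetrameter
LAST_STRESSED_POSITION = 7   # zero-based index of the fourth beat
MAX_SYLLABLES: Dict[str, int] = {"R": 1, "A": 1}
UNSTRESSED_ONLY = {"A"}      # A+1 may not fall on a stressed syllable


def is_stressed(position: int) -> bool:
    """Iambic lines stress every second syllable: positions 1, 3, 5, 7."""
    return position % 2 == 1


def arrangements(phrase: str) -> List[str]:
    tokens = phrase.split()
    results: List[str] = []

    def walk(index: int, position: int, built: List[str]) -> None:
        if index == len(tokens):
            if position in LINE_LENGTHS:
                results.append(" ".join(built))
            return
        token = tokens[index]
        # Assumption: the final word must contain the last stressed syllable.
        if index == len(tokens) - 1 and position > LAST_STRESSED_POSITION:
            return
        if not token.isupper():
            # Terminal, assumed to be a monosyllable that fits anywhere.
            walk(index + 1, position + 1, built + [token])
            return
        longest = min(MAX_SYLLABLES.get(token, 9), 9 - position)
        for syllables in range(1, longest + 1):
            if syllables == 1:
                if token in UNSTRESSED_ONLY and is_stressed(position):
                    continue
                stress = 1
            else:
                # Longer words must start on a matching syllable.
                stress = syllables if is_stressed(position) else -syllables
            walk(index + 1, position + syllables, built + [f"{token}{stress:+d}"])

    walk(0, 0, [])
    return results


for line in arrangements("R are A J N"):
    print(line)
```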
At runtime, we pick a random word from the lexicon, and select a list of words which rhyme with that word. Then for each word in the list, we note its tag, and find an arrangement that ends with that tag. We then proceed by matching nonterminals as appropriate. If after all this we find we don't have enough entries, we throw it all out and start again.
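The runtime step might be sketched in Python roughly as follows. The lexicon layout (word mapped to a tag and a rhyme number), the function names, the retry limit and the toy data are all assumptions made for the example; the real back end may differ in every one of them.

```python
import random
import re
from typing import Dict, List, Optional, Tuple

Lexicon = Dict[str, Tuple[str, int]]  # word -> (tag, rhyme number)
TAG = re.compile(r"[A-Z][+-]\d+")


def rhyming_candidates(lexicon: Lexicon, arrangements: List[str],
                       rng: random.Random, needed: int,
                       max_attempts: int = 100) -> Optional[List[Tuple[str, str]]]:
    """Pick a random word, pair each of its rhymes with an arrangement
    ending in that rhyme's tag, and retry if too few pairs turn up."""
    words = sorted(lexicon)
    for _ in range(max_attempts):
        rhyme = lexicon[rng.choice(words)][1]
        found = []
        for word in words:
            tag, number = lexicon[word]
            fits = [a for a in arrangements if a.split()[-1] == tag]
            if number == rhyme and fits:
                found.append((word, rng.choice(fits)))
        if len(found) >= needed:
            return found
    return None  # gave up; the post simply starts again


def fill(arrangement: str, ending: str, lexicon: Lexicon,
         rng: random.Random) -> Optional[str]:
    """Replace every tag except the last with a random matching word,
    then put the rhyme word at the end of the line."""
    by_tag: Dict[str, List[str]] = {}
    for word, (tag, _) in sorted(lexicon.items()):
        by_tag.setdefault(tag, []).append(word)
    out = []
    for slot in arrangement.split()[:-1]:
        if TAG.fullmatch(slot):
            if slot not in by_tag:
                return None
            out.append(rng.choice(by_tag[slot]))
        else:
            out.append(slot)  # terminal, copied through unchanged
    return " ".join(out + [ending])


# Toy lexicon; the tags and rhyme numbers are invented for illustration.
toy = {"they": ("R+1", 1), "the": ("A+1", 2), "complicated": ("J+4", 3),
       "cat": ("N+1", 7), "hat": ("N+1", 7)}
rng = random.Random(0)
pairs = rhyming_candidates(toy, ["R+1 are A+1 J+4 N+1"], rng, needed=2)
for word, arrangement in pairs or []:
    print(fill(arrangement, word, toy, rng))
```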
The back end is only polled twice, for the first and second lines. The rhymes for each are stored at the time. This is why all other lines will eventually loop around (as noticed by Sally and Simon).
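One way to picture the looping is the Python sketch below. It is a guess at the front end's behaviour, not its real code: it assumes the two stored lists hold finished lines for each rhyme, uses the standard triolet layout ABaAabAB (lines 1, 4 and 7 are the same, as are lines 2 and 8), and cycles through the stored lines for the remaining slots. The function name is hypothetical.

```python
from itertools import cycle
from typing import List


def assemble_triolet(a_lines: List[str], b_lines: List[str]) -> List[str]:
    """Lay out eight lines as ABaAabAB from two stored lists of rhyming lines."""
    if not a_lines or not b_lines:
        raise ValueError("need at least one line for each rhyme")
    refrain_a, refrain_b = a_lines[0], b_lines[0]
    # cycle() wraps around when the stored lines run out, so a short list
    # eventually repeats itself.
    spare_a = cycle(a_lines[1:] or a_lines)
    spare_b = cycle(b_lines[1:] or b_lines)
    return [
        refrain_a, refrain_b,
        next(spare_a), refrain_a,
        next(spare_a), next(spare_b),
        refrain_a, refrain_b,
    ]
```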
All the lines we produce are logged and kept for a week. The front end can tell the back end to turn a triolet into a permanent post, but it can only do so for lines which exist in the log.
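The check against the log could be as simple as this Python sketch. The class and method names are invented, and it keeps everything in memory for brevity, whereas the real log presumably lives somewhere more durable.

```python
import time
from typing import Dict, Iterable, Optional

WEEK_SECONDS = 7 * 24 * 60 * 60


class LineLog:
    """Remember generated lines for a week and vet requests to publish."""

    def __init__(self) -> None:
        self._seen: Dict[str, float] = {}  # line -> time it was produced

    def record(self, line: str, now: Optional[float] = None) -> None:
        self._seen[line] = time.time() if now is None else now

    def prune(self, now: Optional[float] = None) -> None:
        """Forget every line that is more than a week old."""
        current = time.time() if now is None else now
        self._seen = {line: t for line, t in self._seen.items()
                      if current - t <= WEEK_SECONDS}

    def can_publish(self, triolet: Iterable[str], now: Optional[float] = None) -> bool:
        """Allow a permanent post only if every line is still in the log."""
        self.prune(now)
        return all(line in self._seen for line in triolet)
```

The point of the check is that the front end cannot ask the back end to publish lines the back end never produced.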