Posted: Wed Sep 21 2011 14:34
OK, I understand what you mean, but as far as I can see, the results of this "distinguishing entries by meaning" are not available in your dataset, although they ARE available on the website. Where did you store the information on how to group the entries the way you did?
Roman wrote:
I should explain how the entries are constructed. Tolkien's translations often vary, so one has to decide whether they belong to the same word or not. For example, we find:
paran 'naked, bare' PE/17:86
paran 'bald, bare' PE/17:171
paran 'smooth, shaven' RC/433
I think these three are reasonably close, so they get slammed together into one entry:
paran 'naked, bare, bald, smooth, shaven' PE/17:86,171, RC/433.
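As a rough sketch of that merging step (not the actual code behind the dataset; the function name and the tuple layout are made up for illustration), the glosses can be combined in order of first appearance while the references are simply collected:

```python
def merge_attestations(attestations):
    """Merge (form, glosses, reference) tuples already judged to be one word.

    Hypothetical helper: the decision that the glosses are "reasonably
    close" is made by a human beforehand, not by this function.
    """
    form = attestations[0][0]
    glosses, refs = [], []
    for att_form, att_glosses, ref in attestations:
        if att_form != form:
            raise ValueError(f"cannot merge {att_form!r} into {form!r}")
        for gloss in att_glosses:
            if gloss not in glosses:  # keep first occurrence, drop repeats
                glosses.append(gloss)
        refs.append(ref)
    return form, glosses, refs


paran = [
    ("paran", ["naked", "bare"], "PE/17:86"),
    ("paran", ["bald", "bare"], "PE/17:171"),
    ("paran", ["smooth", "shaven"], "RC/433"),
]
merged = merge_attestations(paran)
```

Here `merged` holds the form, the combined gloss list and the three references, which corresponds to the single paran entry above.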
But then we also encounter something like this:
ogol 'gloom(y)' PE18:88
ogol 'bad, evil, wrong' PE17:170, VT/48:32
ogol untranslated PE/17:149
'Gloomy' and 'evil' are not quite the same thing, and the untranslated gloss could be either, so there should be two entries:
ogol 'gloom(y)' PE18:88, PE/17:149
ogol 'bad, evil, wrong' PE17:170, VT/48:32, PE/17:149
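To illustrate the split, here is a minimal sketch assuming each attestation carries a manually assigned sense label (a hypothetical field; `None` marks an untranslated occurrence). Untranslated references are attached to every sense group, since they could belong to either:

```python
def group_by_sense(attestations):
    """Group (glosses, reference, sense) tuples into one entry per sense.

    The sense label is an assumption for this sketch: it stands for the
    editor's manual judgment. A sense of None means "untranslated".
    """
    groups = {}
    untranslated_refs = []
    for glosses, ref, sense in attestations:
        if sense is None:
            untranslated_refs.append(ref)
            continue
        entry = groups.setdefault(sense, {"glosses": [], "refs": []})
        for gloss in glosses:
            if gloss not in entry["glosses"]:
                entry["glosses"].append(gloss)
        entry["refs"].append(ref)
    # An untranslated occurrence could be any of the senses, so list it under all.
    for entry in groups.values():
        entry["refs"].extend(untranslated_refs)
    return groups


ogol = [
    (["gloom(y)"], "PE18:88", "gloomy"),
    (["bad", "evil", "wrong"], "PE17:170", "evil"),
    (["bad", "evil", "wrong"], "VT/48:32", "evil"),
    ([], "PE/17:149", None),
]
entries = group_by_sense(ogol)
```

With this input, `entries` contains two groups, and PE/17:149 shows up at the end of the reference list of both.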
This is all about external homophones so far. The procedure runs into problems when there are internal homophones appearing on the same page:
pann (*pand) 'courtyard' Ety/380
pann 'wide' Ety/380
These can be distinguished by the alternate form (indicating a different etymology). This may also fail, however:
lorn 'asleep' VT45:29
lorn 'quiet water, anchorage, haven, harbour' VT45:29
Here I had to alter the reference manually by including the etymology:
lorn 'asleep' VT45:29, LOR-
lorn 'quiet water, anchorage, haven, harbour' VT45:29, LUR-
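One way to picture why the manual fix is needed: if each record is identified by its form, its reference and, where present, an alternate form or root, then the two lorn records collide until the root is added. This is only a sketch under that assumption; the field names `form`, `ref`, `alt` and `root` are invented here and are not taken from the dataset.

```python
from collections import Counter


def entry_key(record):
    """Build a key from form, reference and an optional discriminator.

    The discriminator is the alternate form if there is one, otherwise
    the etymological root, otherwise None.
    """
    discriminator = record.get("alt") or record.get("root")
    return (record["form"], record["ref"], discriminator)


def find_collisions(records):
    """Return the keys shared by more than one record."""
    counts = Counter(entry_key(r) for r in records)
    return [key for key, n in counts.items() if n > 1]


pann = [
    {"form": "pann", "ref": "Ety/380", "alt": "*pand"},
    {"form": "pann", "ref": "Ety/380"},
]
lorn_raw = [
    {"form": "lorn", "ref": "VT45:29"},
    {"form": "lorn", "ref": "VT45:29"},
]
lorn_fixed = [
    {"form": "lorn", "ref": "VT45:29", "root": "LOR-"},
    {"form": "lorn", "ref": "VT45:29", "root": "LUR-"},
]
```

Calling `find_collisions` on `pann` and on `lorn_fixed` gives an empty list, while `lorn_raw` reports one shared key, which is the case that had to be resolved by hand.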