Roman wrote: The dôl/dol matter is intentional - there seems to have been a change of conception regarding the etymology of the word. The former comes from ndolo and has the plural duil, the latter from *ndoll- (hence the alternate doll) and probably has the plural *dyl.
What do you propose? To keep those as two separate entries, or rather as one, but with an alternative translation added?
Revision of the word list
- Lúthien Meliel
- Posts: 79
- Joined: Thu Jun 26 2008 21:58
Lúthien Meliel wrote: What do you propose? To keep those as two separate entries, or rather as one, but with an alternative translation added?
Separate entries. The alternate words are different (mostly earlier) forms within the internal history of Sindarin. Different forms within the external history are always separate entries, as are differently derived forms.
Pluralization o > ui seems to work only with nouns formerly ending in -o without a consonant cluster, as thono > thôn, thuin 'pine' (PE17:81) or *tholo > thôl, thuil 'helm' (Q. kastolo) (PE17:188). The altered dol(l) probably cannot have the plural *duil anymore, but rather *dyl(l), as tol(l), tyll 'island'. So merging the entries wouldn't work; or would be very, very misleading.
- Lúthien Meliel
- Posts: 79
- Joined: Thu Jun 26 2008 21:58
Hm, this is giving me more of a headache than I hoped - especially those collation issues in MySQL are very annoying. Never have that at work, where we just put everything in utf8. Maybe Oracle is a bit smarter with these things, too.
Another thing that came up is this: I wanted to make a list of all the words that do not have an English translation yet, i.e. the PE17 words, etc.
So I thought I'd run a comparison between your (Aran's) dataset and Didier's. That works well enough, but just out of curiosity I also did the reverse: find the words that do occur somewhere in Didier's set, but not in your set.
I put the results in an XLS file. It's a bit complicated, because I wanted to make sure not to miss anything. Therefore I matched every Sindarin word (regular, alternative, and plural forms) against both the primary entry column from Didier's set and the "derived forms" column. On the second sheet, there is a single-column compilation of those words - the first sheet has multiple rows for many words because I included all the data that could be interesting.
Eryniel suggested to me that these words could be the Noldorin entries - that your list contains only Sindarin forms. Is that indeed so?
Or are there also deprecated entries in there?
If you could have a look at it, that'd be great.
I uploaded the excel sheet here: didier_words_not_found_in_aran_list.xls
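The reverse comparison described above amounts to an anti-join: collect every form on Didier's side and keep those that have no counterpart among Aran's forms. Here is a minimal sketch using Python's sqlite3 module. The table and column names (ARAN_FORM, DIDIER, FORM, ENTRY, DERIVED) are assumptions, and the sample rows are made up for illustration from words that appear in this thread, not taken from either real dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ARAN_FORM (FORM TEXT NOT NULL);             -- hypothetical: regular, alt and plural forms
CREATE TABLE DIDIER (ENTRY TEXT NOT NULL, DERIVED TEXT); -- hypothetical: primary entry + one derived form
""")
# Illustrative sample rows only.
conn.executemany("INSERT INTO ARAN_FORM VALUES (?)",
                 [("adan",), ("edain",), ("camland",), ("haun",), ("ped-",)])
conn.executemany("INSERT INTO DIDIER VALUES (?, ?)",
                 [("adan", "edain"), ("camlann", None), ("hawn", None), ("pedo", None)])

# Every word on Didier's side (either column) that has no match on Aran's side.
missing = [word for (word,) in conn.execute("""
    SELECT word FROM (
        SELECT ENTRY AS word FROM DIDIER
        UNION
        SELECT DERIVED FROM DIDIER WHERE DERIVED IS NOT NULL
    )
    WHERE word NOT IN (SELECT FORM FROM ARAN_FORM)
    ORDER BY word
""")]
print(missing)
```

Swapping the two sides of the query gives the original direction of the comparison (Aran's forms that are absent from Didier's set).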
Lúthien Meliel wrote: Eryniel suggested to me that these words could be the Noldorin entries - that your list contains only Sindarin forms. Is that indeed so?
What? Noo... There are several reasons for mismatches:
- overregularization in the Hiswelóke lists; I have often gone back to Tolkien's own spelling and glosses
- inflected verb forms like imperatives are not listed in the German wordlists
- some verbs are given by Tolkien only via their infinitives rather than roots, so we cannot be sure which class they belong to
- some words seem to be Ilkorin rather than Noldorin/Sindarin
- I've added a new regularization X/NT affecting engui and cannui
- some words are really missing in the German wordlists
I've added some comments to the left column (http://www.sindarin.de/didier_words_not ... n_list.xls). There are too many words to elaborate on each one of them, unless you have specific questions:
adlann aclod
adlanna- atlanna-
adlant atlant
aglonn aglon (aglond)
an- error, see na (Ety/374)
andrann anrand
anno inflected
aphad- aphada-
apharch missing
athan athar
avorn born (SD/129-31)
braig **brêg, but should be breig, braig
caenen interpolated, could be added
caenui reconstructed?
camlann camland
canthui cannui (normalized)
caraes interpolated, could be added
caro inflected
carth interpolated, could be added
círbann cirban
cuinar inflected
dambeth too uncertain
daro inflected
dem Ilkorin
díheno inflected
draf- drava-
drego inflected
edledhia- egledhia-
edledhron egledhron
edlothiad too uncertain, reconstructed *edlothia-
edro inflected
eglerio inflected
egol just not necessary
enchui engui (normalized)
erchammon erchamon
erchammui erchamui
erin inflected
fing missing
gaw- gawa-
gerin inflected
gladh- gladha-
godref way too uncertain
gohena- missing
govad- govan-
ha Noldorin pronoun
haf- hav-
hâl missing
hawn haun
he Noldorin pronoun
hebin inflected
hent missing
ho Noldorin pronoun
hwind *hwinn (chwind)
idhor idher
idhrinn idhrin (dhrind)
iôl missing
lachenn lachend
lacho inflected
lasto inflected
laws laus
lefnar lefnor
linnathon inflected
linnod missing
linnon inflected
luithiad uncertain
medli megli
medlin meglin
menniath missing
minno inflected
mistad mistrad + note
na- way, way too uncertain
naglath missing
nagol missing
nallon inflected
narch missing
nawb naub
neled missing
neleg missing
nerthui missing
nornwaith missing
nothlir missing
nuin inflected
nÿw nyf
othlonn othlon (othlond)
othrad ostrad
othronn othrond
pedo inflected
penneth missing
pihen pichen
revia- renia-
rhosg rosc
sen hen
send too uncertain
suilannad missing
suith sûth
tadeg deleted by Tolkien
talagan talagand
talu dalu (dalw)
tangada- tangad-
ten den
then part of a word
thórod Ilkorin
tiro inflected
tolo inflected
toniel wrongly interpolated
tuilinn tuilin (tuilind)
uin inflected
ulunn ulun (ulund)
Thanks for bringing this up. How should we proceed? Should I check it all (including the second column) and update the lists once more for you to download? I'm sure you already have a script to break them up into normalized lists which you can run again.
- Lúthien Meliel
- Posts: 79
- Joined: Thu Jun 26 2008 21:58
I changed the datamodel a little bit because I made a mistake: the "plural" and "alt" markers were on the TRANSLATION table, so that they would have been properties of a combination of a Sindarin and a German entry -> a 'translation' row.
This wasn't right, because plural and alternative forms don't belong to a translation but to a Sindarin entry.
Now, the ENTRY is recursive: every entry can have a parent entry, and thus plurals and alternative forms belong to the main Sindarin entry. The order of these forms is maintained by an index marker, and every row can have a 'reconstructed' marker set.
This is the new model:
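As a rough illustration of the recursive design described above, here is a minimal sketch in Python with sqlite3. It is not the actual schema: the column names (ID, PARENT_ID, GLOSS, PLURAL, ALT, RECONSTRUCTED, ENTRY_ID, LANG, MEANING) are assumptions, and the RECONSTRUCTED values in the sample rows are placeholders. Only the word forms themselves are taken from this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ENTRY (
    ID            INTEGER PRIMARY KEY,
    PARENT_ID     INTEGER REFERENCES ENTRY(ID), -- NULL for a main entry
    GLOSS         TEXT NOT NULL,                -- the Sindarin form
    PLURAL        INTEGER,                      -- index marker: 1, 2, ... among the parent's plurals
    ALT           INTEGER,                      -- index marker: 1, 2, ... among the parent's alt forms
    RECONSTRUCTED INTEGER NOT NULL DEFAULT 0    -- 1 = reconstructed (starred) form
);
-- A translation now only links a main entry to a meaning; it carries no plural/alt markers.
CREATE TABLE TRANSLATION (
    ENTRY_ID INTEGER NOT NULL REFERENCES ENTRY(ID),
    LANG     TEXT NOT NULL,
    MEANING  TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO ENTRY VALUES (?, ?, ?, ?, ?, ?)", [
    (1, None, "adab",   None, None, 0),
    (2, 1,    "edaib",  1,    None, 0),  # first plural of adab
    (3, 1,    "edeb",   2,    None, 0),  # second plural of adab
    (4, None, "aclod",  None, None, 0),
    (5, 4,    "atlaud", None, 1,    0),  # alternative form of aclod
])

# List every subordinate form together with its parent and its role.
children = conn.execute("""
    SELECT p.GLOSS, c.GLOSS,
           CASE WHEN c.PLURAL IS NOT NULL THEN 'pl.' ELSE 'alt.' END
    FROM ENTRY c JOIN ENTRY p ON c.PARENT_ID = p.ID
    ORDER BY p.ID, c.PLURAL, c.ALT
""").fetchall()
print(children)
```

Because plurals and alternative forms are ordinary ENTRY rows, each of them can carry its own 'reconstructed' marker independently of the main entry.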
- Lúthien Meliel
- Posts: 79
- Joined: Thu Jun 26 2008 21:58
hi,
thanks for the comments! That's a great help.
Roman wrote: Thanks for bringing this up. How should we proceed? Should I check it all (including the second column) and update the lists once more for you to download? I'm sure you already have a script to break them up into normalized lists which you can run again.
We can always add or edit entries afterwards. Maybe the best thing to do is that I now finish the whole dataset with the data as I have them at the moment as soon as possible, give that back to you to look at, and meanwhile start on rebuilding the application based on the datamodel as it is now.
The functioning of the application does not depend on whether or not the data are complete, after all! As long as I have a working model I can move on.
Maybe it is easier for you to compare things if I would give you both the current dataset (based on your list) and Didier's set, in, for instance, a SQLite datafile or a MySQL dump? Just tell me what format works best.
If you could then cross-check those two sets so that we'd have that done by the time there is a working application ready for others to test, that would be great.
I don't know how long that will take me. It's definitely not a very complex thing, but there are a few thorny issues, such as collation.
It shouldn't take longer than a few weeks at most if things don't get too hectic at work.
EDIT - I'm not sure how familiar you are with SQL? I'm asking because the model with Didier's data is rather convoluted (see the diagram on the previous page, and that's not even all I think). I could also create a large view that essentially merges that whole thing into one large table, not unlike the Hisweloke HTML pages on Didier's website. I made something like that already for that first application, so that's not a big deal to write out.
That'd make it a lot easier to compare the Hisweloke set to your set, maybe.
PS - Eryniel is also looking at this list of non-matching entries.
Lúthien Meliel wrote: I changed the datamodel a little bit because I made a mistake: the "plural" and "alt" markers were on the TRANSLATION table, so that they would have been properties of a combination of a Sindarin and a German entry -> a 'translation' row. This wasn't right, because plural and alternative forms don't belong to a translation but to a Sindarin entry.
That irritated me, I admit. And isn't just about everything dependent on the Sindarin entry - scmut, pronunciation, url, comment, ref and type?
Lúthien Meliel wrote: Now, the ENTRY is recursive: every entry can have a parent entry, and thus plurals and alternative forms belong to the main Sindarin entry. The order of these forms is maintained by an index marker, and every row can have a 'reconstructed' marker set.
That sounds... intriguing. I certainly had my fun fiddling with the php script to sort the alternative words into the main list correctly in all the combinations of attested and unattested words... I hope recursive entries are easy to handle...
Something I should mention here: I limited the number of alternative forms to one word only for simplicity, but there can actually be up to three of them for a Sindarin entry.
Lúthien Meliel wrote: We can always add or edit entries afterwards. Maybe the best thing to do is that I now finish the whole dataset with the data as I have them at the moment as soon as possible, give that back to you to look at, and meanwhile start on rebuilding the application based on the datamodel as it is now. The functioning of the application does not depend on whether or not the data are complete, after all! As long as I have a working model I can move on.
Division of labour - I like that.
Lúthien Meliel wrote: Maybe it is easier for you to compare things if I would give you both the current dataset (based on your list) and Didier's set, in, for instance, a SQLite datafile or a MySQL dump? Just tell me what format works best. [...] EDIT - I'm not sure how familiar you are with SQL? I'm asking because the model with Didier's data is rather convoluted (see the diagram on the previous page, and that's not even all I think).
I didn't even know what SQL was before beginning my work on the wordlists around June.
If it's just for comparing and checking the mismatches, then what you gave is enough. But if you have a definitive structure of the tables you'll use in the application, then send them to me.
- Lúthien Meliel
- Posts: 79
- Joined: Thu Jun 26 2008 21:58
to answer your question about self-referencing the ENTRY table, here's an example how it works. If you execute this query:
select es.GLOSS sindarin, 'pl.', ep1.GLOSS, ep2.GLOSS, ep3.GLOSS, ep4.GLOSS,
' alt.:', ea1.GLOSS, ea2.GLOSS
from ENTRY es
left join ENTRY ep1
on es.ID = ep1.PARENT_ID and ep1.PLURAL = 1
left join ENTRY ep2
on es.ID = ep2.PARENT_ID and ep2.PLURAL = 2
left join ENTRY ep3
on es.ID = ep3.PARENT_ID and ep3.PLURAL = 3
left join ENTRY ep4
on es.ID = ep4.PARENT_ID and ep4.PLURAL = 4
left join ENTRY ea1
on es.ID = ea1.PARENT_ID and ea1.ALT = 1
left join ENTRY ea2
on es.ID = ea2.PARENT_ID and ea2.ALT = 2
where es.id < 9999
you'll get this result:
a
ab-
ablad
abonnen pl. ebennin ebœnnin
ach
achad
achar
achar-
achared
acharn
achas
aclod alt. atlaud
ad-
ada
adab pl. edaib edeb
adan pl. edain
adanadar pl. edenedair
<etc>
Here I edited out the NULL values and the empty 'pl' and 'alt' tags. Of course that will be done by the software eventually. It's also a little crude because the columns are hardcoded here; the program will check how many plurals and/or alt entries there are and fetch them as necessary. But it's just to demonstrate the idea.
full list is here: http://parendili.org/doc/example.txt
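The non-hardcoded version mentioned above could look roughly like this: fetch all subordinate rows in one pass and group them per parent, so any number of plurals or alt forms is handled without adding joins. This is only a sketch in Python with sqlite3. The column names follow the query above, but the table definition, the tiny sample and the helper name dictionary_lines are assumptions.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ENTRY (ID INTEGER PRIMARY KEY, PARENT_ID INTEGER, "
             "GLOSS TEXT, PLURAL INTEGER, ALT INTEGER)")
conn.executemany("INSERT INTO ENTRY VALUES (?, ?, ?, ?, ?)", [
    (1, None, "abonnen", None, None),
    (2, 1,    "ebennin", 1,    None),
    (3, 1,    "ebœnnin", 2,    None),
    (4, None, "aclod",   None, None),
    (5, 4,    "atlaud",  None, 1),
    (6, None, "ach",     None, None),
])

def dictionary_lines(conn):
    """Build one display line per main entry, with however many forms it has."""
    plurals, alts = defaultdict(list), defaultdict(list)
    children = conn.execute(
        "SELECT PARENT_ID, GLOSS, PLURAL, ALT FROM ENTRY "
        "WHERE PARENT_ID IS NOT NULL ORDER BY PLURAL, ALT")
    for parent_id, gloss, plural, alt in children:
        if plural is not None:
            plurals[parent_id].append(gloss)   # arrives ordered by the PLURAL index
        elif alt is not None:
            alts[parent_id].append(gloss)      # arrives ordered by the ALT index
    lines = []
    for entry_id, gloss in conn.execute(
            "SELECT ID, GLOSS FROM ENTRY WHERE PARENT_ID IS NULL ORDER BY GLOSS"):
        line = gloss
        if plurals[entry_id]:
            line += " pl. " + " ".join(plurals[entry_id])
        if alts[entry_id]:
            line += " alt. " + " ".join(alts[entry_id])
        lines.append(line)
    return lines

for line in dictionary_lines(conn):
    print(line)
```

The empty 'pl.' and 'alt.' tags never appear here, because a tag is only written when the corresponding list is non-empty.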
What about sorting the alternative words into the main list? For example, you should be able to search for atlaud or agr, find them in the list and be referred to aclod and agor.
Btw, I have just found out that Sindarin has an ISO 639 code assigned to it, which is sjn ("sin" and "snd" were already taken), see this list. So if it appears in the dictionary, one should use sjn beside eng, ger/deu, fra. I've also changed it on the site.
- Lúthien Meliel
- Posts: 79
- Joined: Thu Jun 26 2008 21:58
Roman wrote: What about sorting the alternative words into the main list? For example, you should be able to search for atlaud or agr, find them in the list and be referred to aclod and agor.
Sure, that's no problem at all. Because the structural information is contained within parent_id -> id relationships, you can represent it in any way that you want. It's all in the sort of query that is used, how it is presented, sorted, etc.
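One possible query for such a cross-referenced list, sketched in Python with sqlite3. The table layout mirrors the ENTRY columns used earlier in the thread, but the definition itself, the sample rows and the "see ..." wording are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ENTRY (ID INTEGER PRIMARY KEY, PARENT_ID INTEGER, "
             "GLOSS TEXT, PLURAL INTEGER, ALT INTEGER)")
conn.executemany("INSERT INTO ENTRY VALUES (?, ?, ?, ?, ?)", [
    (1, None, "aclod",  None, None),
    (2, 1,    "atlaud", None, 1),    # alt form of aclod
    (3, None, "agor",   None, None),
    (4, 3,    "agr",    None, 1),    # alt form of agor
])

# Main entries and alt forms in one alphabetical list; an alt form points to its parent.
rows = conn.execute("""
    SELECT c.GLOSS, p.GLOSS
    FROM ENTRY c LEFT JOIN ENTRY p ON c.PARENT_ID = p.ID
    WHERE c.PLURAL IS NULL
    ORDER BY c.GLOSS
""").fetchall()

index = [gloss if parent is None else f"{gloss} see {parent}" for gloss, parent in rows]
for line in index:
    print(line)
```

The same self-join answers a search: looking up atlaud returns a row whose parent column is aclod, so the reader can be referred to the main entry.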
I think that Eryniel put a topic for "feature requests" on Parendili. Could you maybe write it down in there?
Roman wrote: Btw, I have just found out that Sindarin has an ISO 639 code assigned to it, which is sjn ("sin" and "snd" were already taken), see this list. So if it appears in the dictionary, one should use sjn beside eng, ger/deu, fra. I've also changed it on the site.
That's interesting! I remember from the previous Omentielva that someone was trying to get the Tengwar included in the Unicode set. I have no idea what became of that.
- Lúthien Meliel
- Posts: 79
- Joined: Thu Jun 26 2008 21:58
Hi,
I have now run into something that I am uncertain about. It's this: since I have now transferred the attributes to the Sindarin entries, I am assigning the values from your wortliste table to the Sindarin entries.
I got an unequal number of rows, which turns out to be caused by the fact that I had added only *one* instance of every Sindarin word that occurs in your list to the Entry table.
Multiple translations are modeled by the relationship via the Translation entity.
But the available German translations do not account for all occurrences of Sindarin entries in your table, it seems.
If I take, for instance, Sindarin _ad-_ - this is how it occurs in your 'wortliste' table:
Obviously, this is not a case of one Sindarin entry with more than one translation, but these are two separate entries: one reconstructed, one attested.
In the first migration, I have not taken this into account, so I need to change that.
Can you tell me how I can distinguish true separate Sindarin entries from the ones where an entry occurs more than once because there is more than one translation? In this case, it's the 'srek' attribute that I could use, but is this always the case?
Lúthien Meliel wrote: I think that Eryniel put a topic for "feature requests" on Parendili. Could you maybe write it down in there?
Well, I don't see it as an optional 'feature', it's just a basic thing the dictionary has to provide.
Lúthien Meliel wrote: That's interesting! I remember from the previous Omentielva that someone was trying to get the Tengwar included in the Unicode set. I have no idea what became of that.
The state of affairs from this year's Omentielva is apparently that he [Michael Everson] is still trying.
Lúthien Meliel wrote: Can you tell me how I can distinguish true separate Sindarin entries from the ones where an entry occurs more than once because there is more than one translation? In this case, it's the 'srek' attribute that I could use, but is this always the case?
If it's the same entry, then the alternate form and the reference should match.
I should explain how the entries are constructed. Tolkien's translations have often variations, so one has to decide whether they're the same word or not. For example, we find:
paran 'naked, bare' PE/17:86
paran 'bald, bare' PE/17:171
paran 'smooth, shaven' RC/433
I think these three are reasonably close, so they get slammed together into one entry:
paran 'naked, bare, bald, smooth, shaven' PE/17:86,171, RC/433.
But then we also encounter something like this:
ogol 'gloom(y)' PE18:88
ogol 'bad, evil, wrong' PE17:170, VT/48:32
ogol untranslated PE/17:149
'Gloomy' and 'evil' are not quite the same thing, and the untranslated gloss could be either, so there should be two entries:
ogol 'gloom(y)' PE18:88, PE/17:149
ogol 'bad, evil, wrong' PE17:170, VT/48:32, PE/17:149
This is all about external homophones so far. The procedure runs into problems when there are internal homophones appearing on the same page:
pann (*pand) 'courtyard' Ety/380
pann 'wide' Ety/380
These can be distinguished by the alternate form (indicating a different etymology). This may also fail, however:
lorn 'asleep' VT45:29
lorn 'quiet water, anchorage, haven, harbour' VT45:29
Here I had to alter the reference manually by including the etymology:
lorn 'asleep' VT45:29, LOR-
lorn 'quiet water, anchorage, haven, harbour' VT45:29, LUR-
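The rule stated above (same entry if the alternate form and the reference match) can be applied mechanically when migrating the rows. Here is a minimal Python sketch. The row layout (sindarin, alt, ref, translation) and the function name split_entries are assumptions, not the actual wortliste columns. The sample reuses the examples above, with one translation per row and the longer glosses shortened for illustration.

```python
from collections import defaultdict

# Hypothetical flat rows: (sindarin, alt, ref, translation), one translation per row.
rows = [
    ("paran", None,    "PE/17:86,171, RC/433", "naked"),
    ("paran", None,    "PE/17:86,171, RC/433", "bald"),
    ("pann",  "*pand", "Ety/380",              "courtyard"),
    ("pann",  None,    "Ety/380",              "wide"),
    ("lorn",  None,    "VT45:29, LOR-",        "asleep"),
    ("lorn",  None,    "VT45:29, LUR-",        "haven"),
]

def split_entries(rows):
    """Group rows into entries: same word + same alt form + same reference = one entry."""
    entries = defaultdict(list)
    for sindarin, alt, ref, translation in rows:
        entries[(sindarin, alt, ref)].append(translation)
    return dict(entries)

entries = split_entries(rows)
for (sindarin, alt, ref), translations in entries.items():
    print(sindarin, alt or "", ref, translations)
```

With this key, paran stays one entry with two translations, pann splits on the alternate form, and lorn splits only because the etymology was added to the reference by hand.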
_____________________________________
Btw, I have a question to all the others regarding listing inflected forms in the dictionary: How would you like to have it? I don't think it's sensible to have pedo 'speak!, say!' or cuinar 'they live' as headwords. If anything, one could make them subordinate to the entries ped- and cuina- in a separate column for attested usage.
It should be very helpful, however, to include the past tenses for all verbs - reconstructed, if needed - as they aren't trivial in Sindarin, and also indicative of a verb's etymology - which does belong in a helpful dictionary. Luckily, one can conveniently use the noun plural slot for that.
- Eryniel Elmíris
- Posts: 1533
- Joined: Tue Jun 24 2008 19:51
- Location: Ribobel / Rímdor
Hey, Lúthien asked me to look at that list as well. I have run across two things, also with regard to your answer, that I think need clarification.
1) Since according to Lúthien the script can differentiate between all sorts of things, but not capitalization, I think the entries for e.g. ardhon (region) / Ardhon (world) might be a problem (unless there is something else to differentiate them in your wordlist that I cannot see...).
2) The entries for delu/delw show 3 entries: *delu (delw) adj gefährlich ('dangerous') / delu (delw) adj hasserfüllt ('hateful') / *delu (delw) adj tödlich ('deadly') (Ety/355); adj dick ('thick') (PE/17:17). Maybe I am nitpicking, but shouldn't "dick" and "tödlich" have different entries?
As for inflected forms and past tenses, I am all for it.
Attested usage sounds good, since I would never think of actually searching for an inflected form. And for past tense I can only say: HURRAY!
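Whether ardhon and Ardhon collide depends on the collation used for the comparison. The sketch below uses Python's sqlite3 module, where text is compared byte by byte (BINARY) by default. NOCASE is used here only to imitate a case-insensitive collation such as MySQL's *_ci collations, and the one-column ENTRY table and the helper name matches are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ENTRY (GLOSS TEXT NOT NULL)")
conn.executemany("INSERT INTO ENTRY VALUES (?)", [("ardhon",), ("Ardhon",)])

def matches(collation):
    """Return the stored forms that equal 'ardhon' under the given collation."""
    sql = (f"SELECT GLOSS FROM ENTRY WHERE GLOSS COLLATE {collation} = ? "
           "ORDER BY GLOSS COLLATE BINARY")
    return [gloss for (gloss,) in conn.execute(sql, ("ardhon",))]

print(matches("BINARY"))  # case-sensitive: only the lowercase form matches
print(matches("NOCASE"))  # case-insensitive: both forms match
```

So a case-sensitive (binary) comparison on the Sindarin column would be one way to keep the two words apart, provided the rest of the application can live with it.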
- Lúthien Meliel
- Posts: 79
- Joined: Thu Jun 26 2008 21:58
Lúthien Meliel wrote: I think that Eryniel put a topic for "feature requests" on Parendili. Could you maybe write it down in there?
Roman wrote: Well, I don't see it as an optional 'feature', it's just a basic thing the dictionary has to provide.
ok, I'll rename that topic to Mandatory & Optional Requirements then
- but seriously: it's good to also mention whatever you might think is obvious. It might not be altogether obvious to me, because I don't know that much about the linguistic technicalities.
And also, I know that I tend to overlook things if I don't keep them where I can't miss them.
Of course I can put this requirement in there myself (and I will) - but just if you think of something else that I shouldn't forget, please mention it in that topic.
Re. your explanation: thanks! I'll look into that asap.