
I checked the dictionary as it currently exists, and it looks fine there.
Googling around, it seems that this was an issue with earlier SQL versions, but now choosing the right collation (the order of characters used for sorting) should take care of that.
I think I remember we had to replace umlauts and ß because SQL would somehow treat them as special characters and sort them at the end of the alphabet.
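To illustrate why the collation matters, here is a minimal sketch with Python's built-in sqlite3 module. SQLite only stands in for whatever database the dictionary actually uses, and the table/column names (wortliste, deutsch) and the GERMAN collation are hypothetical. With the default byte-wise ordering, words starting with an umlaut sort after "z", which matches the old problem; a collation that folds ä/ö/ü to their base vowels (and ß to "ss") puts them back in place.

```python
import sqlite3

# Hypothetical folding table: umlauts compare like their base vowels, ß like "ss"
_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def german_key(word: str) -> str:
    """Lower-case the word and fold umlauts/ß so they sort with their base letters."""
    return word.lower().translate(_FOLD)


def german_collation(a: str, b: str) -> int:
    """Collation callback for sqlite3: negative, zero or positive, like cmp()."""
    ka, kb = german_key(a), german_key(b)
    return (ka > kb) - (ka < kb)


def sorted_words(words, collation=None):
    """Sort words inside SQLite, optionally with a named collation."""
    con = sqlite3.connect(":memory:")
    con.create_collation("GERMAN", german_collation)
    con.execute("CREATE TABLE wortliste (deutsch TEXT)")
    con.executemany("INSERT INTO wortliste VALUES (?)", [(w,) for w in words])
    clause = f" COLLATE {collation}" if collation else ""
    rows = con.execute(f"SELECT deutsch FROM wortliste ORDER BY deutsch{clause}").fetchall()
    con.close()
    return [row[0] for row in rows]


words = ["zebra", "apfel", "ärger", "ofen", "öl"]
print(sorted_words(words))            # ['apfel', 'ofen', 'zebra', 'ärger', 'öl']
print(sorted_words(words, "GERMAN"))  # ['apfel', 'ärger', 'ofen', 'öl', 'zebra']
```

In a server database you would normally pick a built-in German or Unicode collation rather than register one yourself; the custom callback here just makes the comparison rule visible.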
Mhm, that's an issue I hadn't thought of. But the second column is only used for sorting, not for searching. The search runs over the first column, where ä=a etc. already apply implicitly, but for some reason ß=s or ß=ss does not, so you cannot type a substitute for "ß" in the search. I'll probably have to solve that in the PHP script.
Also, for anyone whose keyboard does not provide umlauts or ß, it is thus possible to type just "a", "o", "u" or "ss". Try it: look up "schräg" and "schrag" via the search function.
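Handling ß in the search could work the same way: fold the query and the stored word identically before comparing, so that typing "ss" finds "ß" just as "a" already finds "ä". This is a Python sketch standing in for the PHP script (whose code isn't shown); fold() and search() are hypothetical names, and mapping ß to "ss" rather than "s" is an assumption.

```python
# Hypothetical folding for search: case-insensitive, ä/ö/ü -> a/o/u, ß -> ss
_SEARCH_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def fold(text: str) -> str:
    """Normalize a word or query so that alternative spellings compare equal."""
    return text.lower().translate(_SEARCH_FOLD)


def search(entries: list[str], query: str) -> list[str]:
    """Return the entries whose folded form equals the folded query."""
    key = fold(query)
    return [entry for entry in entries if fold(entry) == key]


entries = ["schräg", "Straße"]
print(search(entries, "schrag"))   # ['schräg']
print(search(entries, "strasse"))  # ['Straße']
print(search(entries, "straße"))   # ['Straße']
```

On a large word list it would be cheaper to store the folded form in its own column once than to fold every row on each query.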
Good idea. Maybe you could just add a note about that below the search box?
Lúthien did mention writing a script to incorporate the existing English glosses, leaving only the newer words to be filled in again.
How are you planning the next steps? Do you want to go through the German list and put back Tolkien's English glosses?
I'll try to explain that issue in a bit more detail here:
Roman wrote: I'm not claiming to know anything about XML, but that's the same impression I got. Sometimes I do run involved searches, like all words with a certain combination of sounds in a certain source, or something like that, but nothing that can't be done with an SQL query.
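As a sketch of that kind of query: the following assumes (hypothetically) that wortliste has a sind column for the Sindarin form and an rf column holding the source reference. The rows and the source labels SRC_A/SRC_B are toy placeholders, not real data.

```python
import sqlite3


def find_words(con: sqlite3.Connection, pattern: str, source: str) -> list[str]:
    """Sindarin forms matching a LIKE pattern, restricted to one (hypothetical) source."""
    rows = con.execute(
        "SELECT sind FROM wortliste WHERE sind LIKE ? AND rf = ? ORDER BY sind",
        (pattern, source),
    ).fetchall()
    return [row[0] for row in rows]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE wortliste (sind TEXT, rf TEXT)")
# Toy rows: the source labels are placeholders, not real references
con.executemany(
    "INSERT INTO wortliste VALUES (?, ?)",
    [("baug", "SRC_A"), ("alfirin", "SRC_A"), ("dan", "SRC_A"),
     ("barad", "SRC_B"), ("dîr", "SRC_B")],
)

print(find_words(con, "%a%", "SRC_A"))   # ['alfirin', 'baug', 'dan']
print(find_words(con, "%ar%", "SRC_B"))  # ['barad']
```

The parameters are passed as placeholders (?) rather than pasted into the SQL string, which keeps user-typed search patterns from breaking the query.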
I would need a dump of those two tables. I'm not sure what database you use, but SQL is pretty much standard, so I don't think that would be a problem anyway. A set of CREATE TABLE statements to create the tables and INSERT statements for the contents would be fine.
Roman wrote: That's all the relevant information I can think of right now. Tell me what you need and I'll send it to you. How are you planning the next steps? Do you want to go through the German list and put back Tolkien's English glosses?
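If the database happens to be SQLite, Python's sqlite3 module can produce exactly this kind of dump via Connection.iterdump(); for MySQL, the mysqldump tool does the same job. The sketch below uses a toy wortliste table (its two columns and rows are placeholders) and checks that the dump can be replayed into an empty database. Note that iterdump() dumps the whole database, not just selected tables.

```python
import sqlite3


def dump_database(con: sqlite3.Connection) -> str:
    """Return CREATE TABLE and INSERT statements for everything in the database."""
    return "\n".join(con.iterdump())


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE wortliste (sind TEXT, srek INTEGER)")
# Toy rows for illustration only
src.executemany("INSERT INTO wortliste VALUES (?, ?)", [("baug", 0), ("alfirin", 0)])
src.commit()

dump = dump_database(src)
print(dump)

# Replaying the dump into a fresh database recreates the table and its rows
dst = sqlite3.connect(":memory:")
dst.executescript(dump)
print(dst.execute("SELECT * FROM wortliste ORDER BY sind").fetchall())
```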
In any case, it'd be great to have a common English/German database once again.
For Windows, there is a very nice tool called AutoHotkey. With it, you can write a personal script that outputs specified characters when you press certain keys. The script can be copied into the Startup folder so that it runs right away and you don't have to switch anything.
In Windows you can set your keyboard in the Control Panel (under "Region and Language", then "Change keyboards") to an international layout (it appears when you click "Show more"). This is what I actually use. You can then type the single quotation mark ' followed by any vowel to get an accented vowel. If you type the double quotation mark " followed by a vowel, it becomes an umlaut (or, in the case of "e", "ë", which is quite handy for Quenya texts).
The catch is that you need to press an additional <space> if you want the plain quotation mark rather than the accent.
Unfortunately, I haven't yet found out how to type "ß" this way, or whether there is a way at all, so I still have to use <Alt>0223 for that.
Sure, go ahead if you think it will improve things. Note, though, that there might be problems with homonyms: e.g. gwa- is both a verb and a prefix; *dan is reconstructed in the sense 'but' and attested in the sense 'against'; barad is a special case mutation in the sense 'damned' but not in the sense 'tower'.
If you're OK with it, I could have a look at your data and maybe normalize it a bit: it could be interesting to split some recurring data off into separate tables, for instance the "special case mutation", the "word type", the "reconstruction marker" and maybe some others. That would make it easier to maintain and would also allow other primary languages to be added.
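A minimal sketch of that normalization, assuming SQLite via Python: recurring values such as the word type move into a lookup table, and each sense gets its own row, which keeps homonyms like gwa- (verb and prefix) and dan ('but', reconstructed; 'against', attested) apart. All table and column names are hypothetical, and the gwa- rows carry no glosses because none are given here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Lookup table: each word type is stored once and referenced by id
    CREATE TABLE word_type (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- One row per sense, so homonyms such as gwa- or dan get separate rows
    CREATE TABLE entry (
        id            INTEGER PRIMARY KEY,
        sind          TEXT NOT NULL,
        gloss         TEXT,
        word_type_id  INTEGER REFERENCES word_type(id),
        reconstructed INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO word_type (id, name) VALUES (1, 'verb'), (2, 'prefix');
    INSERT INTO entry (sind, gloss, word_type_id, reconstructed) VALUES
        ('gwa-', NULL, 1, 0),
        ('gwa-', NULL, 2, 0),
        ('dan', 'but', NULL, 1),
        ('dan', 'against', NULL, 0);
    """
)

types = con.execute(
    "SELECT t.name FROM entry e JOIN word_type t ON t.id = e.word_type_id "
    "WHERE e.sind = 'gwa-' ORDER BY t.name"
).fetchall()
print(types)   # [('prefix',), ('verb',)]

senses = con.execute(
    "SELECT gloss, reconstructed FROM entry WHERE sind = 'dan' ORDER BY gloss"
).fetchall()
print(senses)  # [('against', 0), ('but', 1)]
```

Because the sense rows are keyed by their own id rather than by the Sindarin form, two entries with the same shape no longer collide.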
I've uploaded the dumps here (hope they're not too smelly).
I would need a dump of those two tables. I'm not sure what database you use, but SQL is pretty much standard, so I don't think that would be a problem anyway. A set of CREATE TABLE statements to create the tables and INSERT statements for the contents would be fine.
Sounds like a plan. But again, a warning: words from PE17-19 may have an identical shape but a different meaning, e.g. dîr 'hard, difficult', previously 'man'. On the other hand, some words that were attested before also appear in PE17-19, so the old entries are expanded by the new references.
I don't have a fixed plan yet, but I've been thinking of taking my old database and using it to retrieve the English entries via the Sindarin words common to both databases. Incidentally, that could also give us the French entries.
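The matching step could look roughly like this (Python with SQLite; table names, column names and reference values are hypothetical placeholders, and the German gloss for dîr is only illustrative). Because a form like dîr can carry a new meaning in PE17-19, the sketch does not trust a match blindly: it joins on the Sindarin form and flags rows whose reference is one of those volumes for manual review.

```python
import sqlite3

# Hypothetical: references whose words may reuse an older form with a new meaning
REVIEW_SOURCES = ("PE17", "PE18", "PE19")

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE old_english (sind TEXT, english TEXT);
    CREATE TABLE new_german  (sind TEXT, deutsch TEXT, rf TEXT);
    -- Toy rows; the rf values are placeholders, not real citations
    INSERT INTO old_english VALUES
        ('baug', 'tyrannous, cruel, oppressive'),
        ('alfirin', 'immortal, type of white flower'),
        ('dîr', 'man');
    INSERT INTO new_german VALUES
        ('baug', 'grausam, hart, unbarmherzig, bitter', 'OLD'),
        ('alfirin', 'unsterblich', 'OLD'),
        ('dîr', 'hart, schwierig', 'PE17');
    """
)

placeholders = ", ".join("?" for _ in REVIEW_SOURCES)
rows = con.execute(
    f"""
    SELECT n.sind, n.deutsch, o.english,
           n.rf IN ({placeholders}) AS needs_review
    FROM new_german n
    LEFT JOIN old_english o ON o.sind = n.sind
    ORDER BY n.sind
    """,
    REVIEW_SOURCES,
).fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN also keeps new words that have no English counterpart (english is then None), which would be the ones to enter by hand.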
This doesn't seem like a good idea to me. For the English wordlist you'll need Tolkien's exact glosses, so there is no shortcut around going through the publications and typing them in by hand.
The remaining entries would have to be entered by hand, but I think there are maybe about 700 of them, which is doable. Maybe something like retrieving German-English translations via the Google Translate API could work, followed by going over them to correct any mistakes.
Proper German spelling has to have ü, ö, ä and ß, of course. But this works fine with the newer SQL versions. I've just changed the PHP script to sort by the first column, and it all appears to be in order, so feel free to ignore the second one.
As for the German letters ü / ä / ö / ß: I think you'd best decide whether it's better to keep them; I don't know what the current correct form is in Germany. And is it really necessary to maintain both forms? From a database design point of view, storing both is redundant. If we put the original form in the database, we can always choose to display either ü / ä / ö / ß or ue / ae / oe / ss by character substitution in the program itself. We could even make it a setting so that people can choose for themselves.
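A sketch of that display-time substitution in Python (the PHP script isn't shown; display() and its ascii_spelling flag are hypothetical). The database keeps ä/ö/ü/ß, and the alternative spelling is produced only when rendering.

```python
# Hypothetical display setting: replace umlauts and ß only when rendering
_ASCII_SPELLING = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def display(word: str, ascii_spelling: bool = False) -> str:
    """Render a stored word, optionally with ae/oe/ue/ss instead of ä/ö/ü/ß."""
    return word.translate(_ASCII_SPELLING) if ascii_spelling else word


print(display("schräg"))                          # schräg
print(display("schräg", ascii_spelling=True))     # schraeg
print(display("Übergröße", ascii_spelling=True))  # Uebergroesse
```

Since the substitution is one-way (ue cannot safely be turned back into ü), storing the original form and deriving the other is the direction that loses nothing.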
As I described above, they show whether the corresponding entry was reconstructed (value 1) or not (empty or 0). "Rek" stands for "Rekonstruktion". In the case of plurals (sind2), there is often more than one form, and more than one of them may be reconstructed, so s2rek gives the number of reconstructed forms in the sind2 list; the reconstructed ones have to go first in the list.
I don't understand the meaning of the columns srek, saltrek and s2rek in table wortliste.
These columns only take the values 0, 1, 2, 3 or 9. The names suggest they refer to the columns sind, sind_alt and Sind2, although I can't figure out what -rek refers to.
It doesn't seem to be an index to distinguish otherwise identical rows for the above columns, but maybe I should check that again (though it'd be great if you could just tell me).
It's for various comments: 'untranslated word', 'isolated from X', 'etymology uncertain', 'part of speech uncertain', 'verbal root uncertain', 'only attested in lenited form', 'dialectal form', 'deleted by Tolkien' and so on.
Also, I don't know what the second table wordliste_komm is for. Judging by its contents, I'd think these are the somewhat uncertain entries you mentioned (like _lorn_), but if so, how does this table fit into the data of the first one?
Ah, I'm sorry: I overlooked your post on the previous page. I get it now.
Roman wrote: As I described above, they show whether the corresponding entry was reconstructed (value 1) or not (empty or 0). "Rek" stands for "Rekonstruktion". In the case of plurals (sind2), there is often more than one form, and more than one of them may be reconstructed, so s2rek gives the number of reconstructed forms in the sind2 list; the reconstructed ones have to go first in the list.
In the php script, the integers are correspondingly replaced with asterisks.
"9" is a typo.
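The asterisk rendering described above might look like this in Python (a sketch; the PHP script itself isn't shown). The separator between forms in sind2 is an assumption, and mark() and mark_plurals() are hypothetical names.

```python
def mark(word: str, rek) -> str:
    """Prefix a reconstructed form with '*' (rek is 1, or 0/None/'' if attested)."""
    return f"*{word}" if rek else word


def mark_plurals(sind2: str, s2rek, sep: str = ", ") -> str:
    """Mark the first s2rek forms of a plural list as reconstructed.

    Assumes the forms in sind2 are separated by `sep` (hypothetical) and that
    the reconstructed forms come first in the list, as described above.
    """
    forms = [form for form in sind2.split(sep) if form]
    count = int(s2rek or 0)  # empty or missing value counts as 0
    return sep.join(mark(form, i < count) for i, form in enumerate(forms))


print(mark("dan", 1))                        # *dan
print(mark_plurals("pl1, pl2, pl3", 2))      # *pl1, *pl2, pl3
print(mark_plurals("pl1", None))             # pl1
```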
The same goes for this one. I'll add all of these to the model and post it when I'm done.
Roman wrote: It's for various comments: 'untranslated word', 'isolated from X', 'etymology uncertain', 'part of speech uncertain', 'verbal root uncertain', 'only attested in lenited form', 'dialectal form', 'deleted by Tolkien' and so on.
It is joined with the main table in the PHP script, and the comment is added whenever sind and rf match, which unambiguously identifies the entry. I guess if you use an ID for the Sindarin entries, you'll use that instead.
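The join described above, sketched in Python with SQLite: the comment table is matched on both sind and rf. The komm column name, the rf values and the comment attached to lorn are placeholders for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE wortliste (sind TEXT, rf TEXT);
    CREATE TABLE wortliste_komm (sind TEXT, rf TEXT, komm TEXT);
    -- Toy rows: rf values and the comment text are placeholders
    INSERT INTO wortliste VALUES ('lorn', 'REF1'), ('baug', 'REF2');
    INSERT INTO wortliste_komm VALUES ('lorn', 'REF1', 'etymology uncertain');
    """
)

# LEFT JOIN keeps entries without a comment (komm is then None)
rows = con.execute(
    """
    SELECT w.sind, w.rf, k.komm
    FROM wortliste w
    LEFT JOIN wortliste_komm k ON k.sind = w.sind AND k.rf = w.rf
    ORDER BY w.sind
    """
).fetchall()
print(rows)  # [('baug', 'REF2', None), ('lorn', 'REF1', 'etymology uncertain')]
```

With a surrogate ID on the Sindarin entries, the ON clause would compare that single ID instead of the sind/rf pair.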
But aren't the IDs different? I thought it would be something like
NB: it's entirely possible to merge the Function, Ref, Type and Comment tables with the Entry table, because they have exactly the same structure. I think it's better not to, though: although it would be more compact and flexible, it would make the data practically impossible to read "manually".
Roman wrote: Is #LANGUAGE_ID really needed there?
Also, how exactly do you handle various translations of the same Sindarin word? The rule is that Tolkien's entries look like this: baug 'tyrannous, cruel, oppressive'; and the number of glosses will likely vary among the languages of translation.
"Really needed": no. You can model this in any number of ways. What I was thinking of was this (showing rows here):
LANGUAGE
1 - Sindarin
100 - Deutsch
101 - English
102 - Français
ENTRY
1000 - 1 - baug
1001 - 1 - alfirin
...
10000 - 101 - tyrannous
10001 - 101 - cruel
10002 - 101 - oppressive
10010 - 100 - grausam
10011 - 100 - hart
10012 - 100 - unbarmherzig
10013 - 100 - bitter
10020 - 101 - immortal
10021 - 101 - type of white flower
10030 - 100 - unsterblich
TRANSLATION (showing only the first two columns for now)
1000 - 10000
1000 - 10001
1000 - 10002
1000 - 10010
1000 - 10011
1000 - 10012
1000 - 10013
1001 - 10020
1001 - 10021
1001 - 10030
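To make the row listing concrete, here is a sketch that loads exactly those rows into SQLite through Python and fetches the glosses of one word in one language. The column names (language_id, text, entry_id, translation_id) are assumptions, since the listing shows only values. It also shows what the language ID is used for: a single TRANSLATION table serves every target language, and filtering on the gloss's language_id picks one, so baug can have three English and four German glosses.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE language (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Every word or gloss in any language is one row
    CREATE TABLE entry (
        id          INTEGER PRIMARY KEY,
        language_id INTEGER NOT NULL REFERENCES language(id),
        text        TEXT NOT NULL
    );
    -- Links a Sindarin entry to one gloss entry; several rows per word allowed
    CREATE TABLE translation (
        entry_id       INTEGER NOT NULL REFERENCES entry(id),
        translation_id INTEGER NOT NULL REFERENCES entry(id),
        PRIMARY KEY (entry_id, translation_id)
    );
    INSERT INTO language VALUES
        (1, 'Sindarin'), (100, 'Deutsch'), (101, 'English'), (102, 'Français');
    INSERT INTO entry VALUES
        (1000, 1, 'baug'), (1001, 1, 'alfirin'),
        (10000, 101, 'tyrannous'), (10001, 101, 'cruel'), (10002, 101, 'oppressive'),
        (10010, 100, 'grausam'), (10011, 100, 'hart'),
        (10012, 100, 'unbarmherzig'), (10013, 100, 'bitter'),
        (10020, 101, 'immortal'), (10021, 101, 'type of white flower'),
        (10030, 100, 'unsterblich');
    INSERT INTO translation VALUES
        (1000, 10000), (1000, 10001), (1000, 10002),
        (1000, 10010), (1000, 10011), (1000, 10012), (1000, 10013),
        (1001, 10020), (1001, 10021), (1001, 10030);
    """
)


def glosses(word: str, language_id: int) -> list[str]:
    """All glosses of a Sindarin word in one target language, in id order."""
    rows = con.execute(
        """
        SELECT g.text
        FROM entry s
        JOIN translation t ON t.entry_id = s.id
        JOIN entry g ON g.id = t.translation_id
        WHERE s.language_id = 1 AND s.text = ? AND g.language_id = ?
        ORDER BY g.id
        """,
        (word, language_id),
    ).fetchall()
    return [row[0] for row in rows]


print(glosses("baug", 101))     # ['tyrannous', 'cruel', 'oppressive']
print(glosses("baug", 100))     # ['grausam', 'hart', 'unbarmherzig', 'bitter']
print(glosses("alfirin", 102))  # []
```

Adding French later would only mean inserting rows with language_id 102 into entry and translation; the schema itself stays unchanged.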