Demetrius's wall messages

Demetrius, 20 August 2010, 15:44:55 UTC

Should we keep proverbs that are offensive to other nations in the database? And how should they be tagged?

We already have some Ukrainian proverbs about Russians in the database (something like “You can ward off the devil by crossing yourself, but you can’t ward off a Moskal”). ^^

Demetrius, 20 August 2010, 11:02:51 UTC

> Good point. But transliteration can't handle
> letter pair --> single letter correspondence?
Of course it can. What I mean is that nj usually corresponds to њ, but in the word injekcija there is a morpheme boundary between n and j, so it should stay as нј.

> Unless there are instances of
> "nj" that are нј and not њ,
> but this is never the case.
Инјекција!

> I disagree here. You'd need a big dictionary
> for Latin to Arabic.
In fact, you don't need a dictionary at all for Latin > Arabic, only for zh/j and ' if people omit these.

For all the other directions you do need a dictionary.

> The Latin "ng" could be transliterated as either.
Not as far as I know. The Latin script requires an apostrophe to tell these apart: ng versus n'g.

> I think there are other cases,
> as well (personally, I don't
> much like the Latin Uighur...)
But it's indeed very easy to process.

Demetrius, 20 August 2010, 10:56:14 UTC

OK, that is a good idea. I'll send my (imperfect) Uzbek transliterator on Monday.

Demetrius, 20 August 2010, 10:54:29 UTC

> So far, I've not figured out how to remove a tag from a sentence

The algorithm is:
a) add the same tag to any other sentence (temporarily)
b) copy the delete link and replace the sentence number
c) delete your temporary tag

This relies on a bug, so use it only when you’re absolutely sure it’s applicable.

Demetrius, 20 August 2010, 10:49:29 UTC

Some more thoughts about tags.

BTW, IMHO tags should be as general as possible: not “2nd Person Formal”, but “2nd Person”, “Formal”. This makes them more useful for automated processing.

Demetrius, 20 August 2010, 10:45:57 UTC

@FeuDRenais
Also, there was a proposal to write all tags lowercase:
http://tatoeba.org/eng/wall/sho...8#message_1588

Demetrius, 20 August 2010, 10:39:35 UTC

It's easy, but I'm too lazy to do this. ^^ I've started working on Uzbek, maybe I'll finish it someday...

Also, sysko has to implement transliteration caching. That would allow transliteration to be more time-consuming (dictionary lookups...).

> one-to-one letter correspondence between
> the alternative alphabets exists (e.g. Serbian).
But injekcija = инјекција, not ињекција. So you need a dictionary when transcribing from Latin...
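
The point about injekcija can be sketched in code. This is only a minimal illustration, not how Tatoeba's transliterator works: the function name `to_cyrillic_word` and the one-entry `EXCEPTIONS` table are hypothetical, and the sketch handles single lowercase words only. Digraphs are matched first, and the exception table is consulted before anything else.

```python
# Digraphs must be tried before single letters.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}

SINGLES = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}

# Hypothetical exception dictionary: words in which "nj" spans a
# morpheme boundary and must stay as two letters.
EXCEPTIONS = {"injekcija": "инјекција"}


def to_cyrillic_word(word):
    """Transliterate one Serbian Latin word (lowercased) to Cyrillic."""
    lower = word.lower()
    if lower in EXCEPTIONS:
        return EXCEPTIONS[lower]
    out = []
    i = 0
    while i < len(lower):
        pair = lower[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Characters without a mapping are passed through unchanged.
            out.append(SINGLES.get(lower[i], lower[i]))
            i += 1
    return "".join(out)
```

With this sketch, konj becomes коњ through the digraph rule, while injekcija is taken from the exception table. A real transliterator would need a much larger exception list and would have to handle inflected forms and capitalisation.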

> Uighur
Latin > Arabic is the easiest (unless people omit ' and don't differentiate j/zh ^^).
Since we have no Latin Uyghur sentences, there is no rush.

Others require a LARGE dictionary of proper names, since Arabic has no capital letters. Cyrillic requires a dictionary of Russian loanwords.

> and - I think, but Demetrius could confirm - Tatar
No, this one is tricky in both directions. The hardest part is q and ğ. Usually к = k and г = g with front vowels, and к = q and г = ğ with back ones.

But Arabic words break vowel harmony:
нигъмәт — niğmət ‘dish’ (ğ is marked as гъ), сәгать — səğət ‘clock’ (ğ is marked by changing vowel letter, the vowel quality is marked by the soft sign), сәгатем — səğətem ‘my clock’ (ğ is marked by a vowel letter w/out a soft sign)

Russian loanwords break vowel harmony in another way: they force K and G even next to back vowels.

Also, there are W and V:
В = V (вагон — vagon ‘carriage’), W (авыл — awıl ‘village’)
У = U (су — su ‘water’), W (тау = taw ‘mountain’)
Ү = Ü (күрү — kürü ‘see’), W (Мəскəү — Məskəw ‘Moscow’)

Demetrius, 18 August 2010, 15:12:25 UTC

Please excuse my suspicion. ^^

Demetrius, 18 August 2010, 15:02:01 UTC

BTW, he didn't add any genuinely Bulgarian sentences. ^_^

Demetrius, 18 August 2010, 15:01:32 UTC

Gruzilkin (http://tatoeba.org/eng/sentence...ser/Gruzilkin) has added some Bulgarian sentences marked as Russian. It's either an auto-detection failure or an intentional action (he brought the number of Bulgarian sentences to 500 this way).

Can we reassign the language of these in a batch?

Demetrius, 18 August 2010, 10:53:02 UTC

Some preventive measures?

timsa (http://tatoeba.org/eng/user/profile/timsa) has added lots of sentences that are either not translations (but may look like them to a learner), or too impolite/vulgar, or too colloquial...

I think we need some rules regarding user behaviour.

Demetrius, 16 August 2010, 13:55:49 UTC

Please do the same for non-breaking space [ ]. I sometimes use it to prevent dashes (—) from being moved to the next line; it should be treated in the same way as the ordinary space for search purposes.

There are lots of other spaces, but I’m not sure anyone has ever used these on Tatoeba: en quad [ ], em quad [ ], en space [ ], em space [ ], 3-per-em space [ ], 4-per-em space [ ], 6-per-em space [ ], figure space [ ], medium mathematical space [ ], punctuation space [ ], thin space [ ], hair space [ ], zero-width space []
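
A minimal sketch of how these characters could be folded into the ordinary space before indexing or searching. It is not Tatoeba's actual search code: the name `normalize_spaces` is hypothetical, and deleting the zero-width space (rather than turning it into a space) is an assumption.

```python
# No-break space, U+2000..U+200A (en quad through hair space),
# and medium mathematical space: the characters listed above.
SPACE_LIKE = (
    "\u00a0"
    "\u2000\u2001\u2002\u2003\u2004\u2005"
    "\u2006\u2007\u2008\u2009\u200a"
    "\u205f"
)

# Map every space-like character to an ordinary space.
_TABLE = {ord(ch): " " for ch in SPACE_LIKE}
# Assumption: the zero-width space is dropped entirely.
_TABLE[0x200B] = None


def normalize_spaces(text):
    """Return text with space variants replaced by U+0020."""
    return text.translate(_TABLE)
```

Applying the same function to both the stored sentence and the search query would make a sentence typed with a non-breaking space match a query typed with an ordinary one.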

Demetrius, 16 August 2010, 09:11:29 UTC

Actually, I’m not sure it’s the best way of sorting sentences. Shorter sentences tend to be stranger, as there is little context.

Demetrius, 16 August 2010, 09:06:48 UTC

This seems to happen with all the batch imports. The Ukrainian proverbs from Shtoota are contributed by sysko and owned by me (422069). I don't see any problem with this.

After all, such things come from collections not necessarily compiled by the people who suggested importing them, and IMO there’s nothing wrong with admins being higher in the contributors table. They contribute in other ways too.

Demetrius, 12 August 2010, 15:54:17 UTC

I suggest not putting the tag on other people's sentences, or at least writing something about it in the comments. I was very surprised to find my Russian sentence tagged 'needs native check' recently (http://tatoeba.org/sentences/show/451147).

Demetrius, 3 August 2010, 21:38:56 UTC

I don’t think the number of corrections can be equated with unreliability.

In fact, I think what we need is a rating system, which would allow users to vote for the translations they believe to be good.

By the way...
> check if it matches the former version regardless
> of punctuation and spaces (easy!)
Punctuation rules and spaces are also very important. A comma in the wrong place can change the meaning of a sentence more than a misspelled word.

Demetrius, 2 August 2010, 23:01:56 UTC

Yet another feature request. Can we add several tags at a time? For example, «familiar;said to female». It should be relatively easy to implement, it would save some time, and semicolons are very unlikely to appear in tags anyway.
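
A minimal sketch of the splitting step, assuming the tag field arrives as a single string. The name `split_tags` is hypothetical; trimming whitespace and dropping empty or repeated tags are assumptions, not part of the request.

```python
def split_tags(raw, separator=";"):
    """Split one input string into a list of distinct, trimmed tags."""
    tags = []
    for part in raw.split(separator):
        tag = part.strip()
        # Skip empty pieces (e.g. from ";;") and duplicates.
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```

For the input above this gives the two tags "familiar" and "said to female", each of which could then be added through the existing single-tag path.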

Demetrius, 1 August 2010, 22:14:17 UTC

BTW, if I'm not mistaken, Mnemosyne expects CSV fields to be Text<tab>Translation, which is different from what Tatoeba has.
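
A minimal sketch of producing that layout, assuming the sentence pairs are already available as (text, translation) tuples. The function name `to_tab_separated` is hypothetical, and how Mnemosyne treats quoting or embedded tabs is not known here, so such fields are simply rejected.

```python
def to_tab_separated(pairs):
    """Render (text, translation) pairs as Text<tab>Translation lines."""
    lines = []
    for text, translation in pairs:
        for field in (text, translation):
            # A tab or newline inside a field would break the layout.
            if "\t" in field or "\n" in field:
                raise ValueError("field contains a tab or newline: %r" % field)
        lines.append(text + "\t" + translation)
    return "\n".join(lines) + "\n" if lines else ""
```

The result can be written to a UTF-8 text file, one sentence pair per line.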

Demetrius, 31 July 2010, 21:32:12 UTC

a) This is counter-intuitive for many languages: ‘cmn’ for Chinese MaNdarin, ‘epo’ for EsPerantO, ‘kat’ for Kartuli, ‘nob’ for Norwegian Bokmål, ‘nno’ for Norwegian Nynorsk, ‘ron’ for Romanian
b) The icons will be harder to distinguish (as they’re just black-on-white text, unlike flags)
c) Latin script is not neutral. ;) At least not more neutral than flags.

Demetrius, 31 July 2010, 21:15:22 UTC

I like this initiative, but its aims seem to be different from what we need.

It’s just one symbol that is intended to mean “Select language” in different colours and sizes.

Demetrius's messages on the wall (442 in total)