Demetrius's messages on the Wall (442 total)

Demetrius, 20 August 2010, 15:44:55 UTC

Should we keep proverbs offending other nations in the database? And how should these be tagged?

We already have some Ukrainian proverbs about Russians in the database (something like “You can ward off the devil by crossing yourself, but you can’t ward off a Moskal”). ^^

Demetrius, 20 August 2010, 11:02:51 UTC

> Good point. But transliteration can't handle
> letter pair --> single letter correspondence?
Of course it can. I mean that usually nj = њ, but in the word injekcija there's a morpheme boundary, so it should be kept as нј.

> Unless there are instances of
> "nj" that are нј and not њ,
> but this is never the case.
Инјекција!
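To make the morpheme-boundary point concrete, here is a minimal sketch of digraph-based Latin-to-Cyrillic conversion with an exception list. The names `DIGRAPHS`, `LETTERS`, `EXCEPTIONS` and `to_cyrillic` are made up for illustration, the letter table covers only what the example needs, and a real exception list would have to be compiled from a dictionary.

```python
# Digraphs that normally collapse into a single Cyrillic letter.
DIGRAPHS = {"nj": "њ", "lj": "љ", "dž": "џ"}

# Only the single letters needed for this example; a real table would
# cover the whole alphabet.
LETTERS = {
    "a": "а", "c": "ц", "e": "е", "i": "и",
    "j": "ј", "k": "к", "n": "н", "o": "о",
}

# Hypothetical exception dictionary: words where "nj" spans a morpheme
# boundary and must stay as two letters.
EXCEPTIONS = {"injekcija": "инјекција"}


def to_cyrillic(word):
    """Convert one lowercase Latin word, checking the exception list first."""
    lower = word.lower()
    if lower in EXCEPTIONS:
        return EXCEPTIONS[lower]
    out = []
    i = 0
    while i < len(lower):
        pair = lower[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Letters missing from the small table pass through unchanged.
            out.append(LETTERS.get(lower[i], lower[i]))
            i += 1
    return "".join(out)
```

With this table, `to_cyrillic("konj")` gives `коњ`, while `to_cyrillic("injekcija")` is taken from the exception list and keeps `нј`.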

> I disagree here. You'd need a big dictionary
> for Latin to Arabic.
In fact, you don't need a dictionary at all for Latin>Arabic. Only for zh/j and ' if people omit these.

For all the other directions you need a dictionary.

> The Latin "ng" could be transliterated as either.
No, as far as I know. Latin requires breaking these with ': ng and n'g.
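A rough sketch of why this is easy to process: a tokenizer only has to check for the apostrophe form before the plain digraph. The function name `tokenize_ng` and the sample words are illustrative assumptions, only the ASCII apostrophe is handled, and mapping the tokens to Arabic letters is left out.

```python
def tokenize_ng(word):
    """Split a Latin Uyghur word into tokens.

    "ng" becomes one token (a single sound), while "n'g" becomes the
    two separate tokens "n" and "g".
    """
    tokens = []
    i = 0
    while i < len(word):
        if word.startswith("n'g", i):
            # The apostrophe marks a boundary: two separate letters.
            tokens.extend(["n", "g"])
            i += 3
        elif word.startswith("ng", i):
            tokens.append("ng")
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens
```

For example, `tokenize_ng("yengi")` keeps `ng` together, while `tokenize_ng("in'gliz")` yields separate `n` and `g` tokens.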

> I think there are other cases,
> as well (personally, I don't
> much like the Latin Uighur...)
But it's indeed very easy to process.

Demetrius, 20 August 2010, 10:56:14 UTC

OK, that is a good idea. I'll send my (imperfect) Uzbek transliterator on Monday.

Demetrius, 20 August 2010, 10:54:29 UTC

> So far, I've not figured out how to remove a tag from a sentence

The algorithm is:
a) add the same tag to any other sentence (temporarily)
b) copy the delete link and replace the sentence number
c) delete your temporary tag

This is a bug. Use only when you’re absolutely sure it’s applicable.

Demetrius, 20 August 2010, 10:49:29 UTC

Some more thoughts about tags.

BTW, IMHO tags should be as general as possible: not “2nd Person Formal”, but “2nd Person”, “Formal”. This makes them more useful for automated processing.
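As an illustration of the automated-processing point, here is a small sketch with made-up data: when tags are atomic, selecting sentences by any combination of properties is a plain subset test. The sentence ids, the tag sets and the helper `with_tags` are all hypothetical.

```python
# Hypothetical sentence ids mapped to sets of atomic tags.
sentences = {
    1: {"2nd Person", "Formal"},
    2: {"2nd Person", "Familiar"},
    3: {"Formal"},
}


def with_tags(tagged, required):
    """Return the sorted ids of sentences carrying every tag in `required`."""
    return sorted(sid for sid, tags in tagged.items() if required <= tags)
```

Here `with_tags(sentences, {"Formal"})` finds sentences 1 and 3. With a combined tag like “2nd Person Formal”, sentence 1 would not match a search for “Formal” without extra string parsing.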

Demetrius, 20 August 2010, 10:45:57 UTC

@FeuDRenais
Also, there was a proposal to write all tags lowercase:
http://tatoeba.org/eng/wall/sho...8#message_1588

Demetrius, 20 August 2010, 10:39:35 UTC

It's easy, but I'm too lazy to do this. ^^ I've started working on Uzbek, maybe I'll finish it someday...

Also, sysko has to do transliteration caching. It will allow making transliteration more time-intensive (dictionary searches...).
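To show what caching buys, here is a minimal in-memory sketch using Python's `functools.lru_cache`. This is not how Tatoeba does it (the site would more likely store results in its database); `slow_transliterate` is a hypothetical stand-in for an expensive dictionary-backed transliterator, and the cache size is arbitrary.

```python
from functools import lru_cache


def slow_transliterate(text):
    """Stand-in for an expensive, dictionary-backed transliterator."""
    # Placeholder transformation only; not real transliteration.
    return text.upper()


@lru_cache(maxsize=10000)
def cached_transliterate(text):
    """Run the expensive step once per distinct sentence, then reuse it."""
    return slow_transliterate(text)
```

Calling `cached_transliterate` twice with the same sentence runs `slow_transliterate` only once; `cached_transliterate.cache_info()` reports the hits and misses.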

> one-to-one letter correspondence between
> the alternative alphabets exists (e.g. Serbian).
But injekcija = инјекција, not ињекција. So you need a dictionary when transcribing from Latin...

> Uighur
Latin > Arabic is the easiest (unless people omit ' and don't differentiate j/zh ^^).
Since we have no Latin Uyghur sentences, there is no rush.

Others require a LARGE dictionary of proper names, since Arabic has no capital letters. Cyrillic requires a dictionary of Russian loanwords.

> and - I think, but Demetrius could confirm - Tatar
No, this one is tricky in both directions. The hardest part is q and ğ. Usually к = k and г = g with front vowels, and к = q and г = ğ with back ones.

But Arabic words break vowel harmony:
нигъмәт — niğmət ‘dish’ (ğ is marked as гъ), сәгать — səğət ‘clock’ (ğ is marked by changing vowel letter, the vowel quality is marked by the soft sign), сәгатем — səğətem ‘my clock’ (ğ is marked by a vowel letter w/out a soft sign)

Russian loanwords break vowel harmony in another way: they force K and G even near back vowels.

Also, there are W and V:
В = V (вагон — vagon ‘carriage’), W (авыл — awıl ‘village’)
У = U (су — su ‘water’), W (тау = taw ‘mountain’)
Ү = Ü (күрү — kürü ‘see’), W (Мəскəү — Məskəw ‘Moscow’)
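A rough sketch of the "usual" rule for к and г only, to show where it works and where it fails. The function name `velar` and the vowel sets are assumptions for illustration; я, ю, ё and the гъ/soft-sign spellings are not handled, and input is expected in lowercase.

```python
FRONT = set("әөүеиэ")  # front vowels (partial list, see note above)
BACK = set("аоуы")     # back vowels


def velar(word, i):
    """Guess the Latin letter for Cyrillic к/г at index i by vowel harmony."""
    letter = word[i]
    if letter not in "кг":
        raise ValueError("expected к or г at the given index")
    # Look at the vowels after the consonant first (nearest first),
    # then at the ones before it (nearest first).
    candidates = list(word[i + 1:]) + list(reversed(word[:i]))
    for ch in candidates:
        if ch in FRONT:
            return {"к": "k", "г": "g"}[letter]
        if ch in BACK:
            return {"к": "q", "г": "ğ"}[letter]
    # No vowel found: fall back to the front-vowel reading.
    return {"к": "k", "г": "g"}[letter]
```

The rule handles native words such as `velar("кар", 0)` (q) and `velar("күл", 0)` (k), but `velar("вагон", 2)` returns ğ even though the word is vagon. That is exactly the kind of loanword that needs a dictionary.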

Demetrius, 18 August 2010, 15:12:25 UTC

Please excuse my suspicion. ^^

Demetrius, 18 August 2010, 15:02:01 UTC

BTW, he didn't add any genuinely Bulgarian sentences. ^_^

Demetrius, 18 August 2010, 15:01:32 UTC

Gruzilkin (http://tatoeba.org/eng/sentence...ser/Gruzilkin) has added some Bulgarian sentences marked as Russian. It's either an auto-detection failure or intentional (he brought the number of Bulgarian sentences to 500 this way).

Can we reassign the language of these in a batch?

Demetrius, 18 August 2010, 10:53:02 UTC

Some preventive measures?

timsa (http://tatoeba.org/eng/user/profile/timsa) has added lots of sentences that are either not translations (but may look like them to a learner), or too impolite/vulgar, or too colloquial...

I think we need some rules regarding user behaviour.

Demetrius, 16 August 2010, 13:55:49 UTC

Please do the same for non-breaking space [ ]. I sometimes use it to prevent dashes (—) from being moved to the next line; it should be treated in the same way as the ordinary space for search purposes.

There are lots of other spaces, but I’m not sure anyone has ever used these on Tatoeba: en quad [ ], em quad [ ], en space [ ], em space [ ], 3-per-em space [ ], 4-per-em space [ ], 6-per-em space [ ], figure space [ ], medium mathematical space [ ], punctuation space [ ], thin space [ ], hair space [ ], zero-width space [​]
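A minimal sketch of such normalization for search purposes. The code points are the standard Unicode ones for the spaces named above; dropping the zero-width space instead of turning it into a visible space is my assumption, and the name `normalize_spaces` is made up.

```python
import re

# Width-bearing spaces named above: no-break space (U+00A0), the run
# from en quad to hair space (U+2000..U+200A), and medium mathematical
# space (U+205F).
SPACE_LIKE = re.compile("[\u00a0\u2000-\u200a\u205f]")
ZERO_WIDTH_SPACE = "\u200b"


def normalize_spaces(text):
    """Map exotic spaces to an ordinary space and drop zero-width spaces."""
    return SPACE_LIKE.sub(" ", text).replace(ZERO_WIDTH_SPACE, "")
```

Applying `normalize_spaces` to both the indexed sentence and the search query would make a non-breaking space before a dash match an ordinary space.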

Demetrius, 16 August 2010, 9:11:29 UTC

Actually, I’m not sure it’s the best way of sorting sentences. Shorter sentences tend to be stranger, as there is little context.

Demetrius, 16 August 2010, 9:06:48 UTC

This seems to happen with all the batch imports. The Ukrainian proverbs from Shtoota are contributed by sysko and owned by me (422069). I don't see any problem with this.

After all, such things come from collections not necessarily compiled by the people who suggested importing them, and IMO there’s nothing wrong with admins being higher in the contributors table. They contribute in other ways too.

Demetrius, 12 August 2010, 15:54:17 UTC

I suggest not putting the tag on other people's sentences, or at least writing something about it in the comments. I was very surprised recently to find my Russian sentence tagged 'needs native check' (http://tatoeba.org/sentences/show/451147).

Demetrius, 3 August 2010, 21:38:56 UTC

I don’t think the number of corrections can be equated with unreliability.

In fact, I think what we need is a rating system, which would allow users to vote for the translations they believe to be good.

By the way...
> check if it matches the former version regardless
> of punctuation and spaces (easy!)
Punctuation rules and spaces are also very important. A comma in the wrong place can change the meaning of the sentence more than a misspelled word.
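A small sketch of the problem: a comparison that ignores punctuation and spaces treats two sentences with different meanings as the same. The helper name and the English example pair are my own illustration; only ASCII punctuation is stripped here.

```python
import string


def strip_punct_and_spaces(sentence):
    """Remove ASCII punctuation and whitespace, then lowercase."""
    drop = set(string.punctuation) | set(string.whitespace)
    return "".join(ch for ch in sentence if ch not in drop).lower()


a = "Let's eat, grandma."
b = "Let's eat grandma."

# The two sentences differ in meaning, yet compare equal once
# punctuation and spaces are ignored.
same_after_stripping = strip_punct_and_spaces(a) == strip_punct_and_spaces(b)
```

So a check like this would count a meaning-changing comma fix as "no change".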

Demetrius, 2 August 2010, 23:01:56 UTC

Yet another feature request: can we add several tags at a time, e.g. «familiar;said to female»? It should be relatively easy to implement, it would save some time, and semicolons are very unlikely to appear in tags anyway.
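The parsing side of this request could be as small as the sketch below. It is only an illustration of the idea, not Tatoeba's code, and `split_tags` is a made-up name.

```python
def split_tags(raw):
    """Split a semicolon-separated string into trimmed, non-empty tags."""
    return [part.strip() for part in raw.split(";") if part.strip()]
```

For the example above, `split_tags("familiar;said to female")` gives the two tags `familiar` and `said to female`. Stray semicolons and surrounding spaces are ignored.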

Demetrius, 1 August 2010, 22:14:17 UTC

BTW, if I'm not mistaken, Mnemosyne expects CSV fields to be Text<tab>Translation, which is different from what Tatoeba has.
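A sketch of producing that two-column layout, assuming you already have (text, translation) pairs in hand. How the pairs are extracted from Tatoeba's export is not shown, the function name is hypothetical, and the Text<tab>Translation layout itself is only what I believe Mnemosyne expects.

```python
def to_tab_separated(pairs):
    """Render (text, translation) pairs as Text<tab>Translation lines."""
    lines = []
    for text, translation in pairs:
        # Tabs or newlines inside a field would break the two-column
        # layout, so collapse any whitespace run to a single space.
        clean = [" ".join(field.split()) for field in (text, translation)]
        lines.append("\t".join(clean))
    return "".join(line + "\n" for line in lines)
```

Each pair becomes one line, so the result can be written straight to a file for import.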

Demetrius, 31 July 2010, 21:32:12 UTC

a) This is counter-intuitive for many languages: ‘cmn’ for Chinese MaNdarin, ‘epo’ for EsPerantO, ‘kat’ for Georgian (Kartuli), ‘nob’ for Norwegian Bokmål, ‘nno’ for Norwegian Nynorsk, ‘ron’ for Romanian
b) The icons will be harder to distinguish (as they’re just black-on-white text, unlike flags)
c) Latin script is not neutral. ;) At least not more neutral than flags.

Demetrius, 31 July 2010, 21:15:22 UTC

I like this initiative, but its aims seem to be different from what we need.

It’s just one symbol that is intended to mean “Select language” in different colours and sizes.