Wall messages by Demetrius (442 total)

Demetrius, 20 August 2010, 15:44:55 UTC

Should we keep proverbs offending other nations in the database? And how should these be tagged?

We already have some Ukrainian proverbs about Russians in the database (sth like “You can ward off the devil crossing yourself, but you can’t ward off a Moskal”). ^^

Demetrius, 20 August 2010, 11:02:51 UTC

> Good point. But transliteration can't handle
> letter pair --> single letter correspondence?
Of course it can. I mean that usually nj = њ, but in the word injekcija there's a morpheme boundary, so it should be retained as нј.

> Unless there are instances of
> "nj" that are нј and not њ,
> but this is never the case.
Инјекција!

> I disagree here. You'd need a big dictionary
> for Latin to Arabic.
In fact, you don't need a dictionary at all for Latin>Arabic. Only for zh/j and ' if people omit these.

For all the other directions you need a dictionary.

> The Latin "ng" could be transliterated as either.
No, as far as I know. Latin requires breaking these with ': ng and n'g.
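
The role of the apostrophe can be sketched as a small tokenizer. This is only an illustration: the digraph list and the `tokenize` helper are assumptions, and the apostrophe is treated purely as a separator here.

```python
# Assumed Latin-script digraphs, tried before single letters.
DIGRAPHS = ("ng", "gh", "ch", "sh", "zh")


def tokenize(word):
    """Split a Latin-script word into letter units; an apostrophe forces a break."""
    units = []
    i = 0
    while i < len(word):
        if word[i] == "'":
            # Separator only: it keeps the neighbouring letters from merging.
            i += 1
        elif word[i:i + 2] in DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units
```

With this, "ng" comes out as one unit and "n'g" as two, so each unit can then be mapped to its Arabic-script letter without a dictionary.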

> I think there are other cases,
> as well (personally, I don't
> much like the Latin Uighur...)
But it's indeed very easy to process.

Demetrius, 20 August 2010, 10:56:14 UTC

OK, that is a good idea. I'll send my (imperfect) Uzbek transliterator on Monday.

Demetrius, 20 August 2010, 10:54:29 UTC

> So far, I've not figured out how to remove a tag from a sentence

The algorithm is:
a) add the same tag to any other sentence (temporarily)
b) copy the delete link and replace the sentence number
c) delete your temporary tag

This relies on a bug. Use it only when you’re absolutely sure it’s applicable.

Demetrius, 20 August 2010, 10:49:29 UTC

Some more thoughts about tags.

BTW, IMHO tags should be as general as possible: not “2nd Person Formal”, but “2nd Person”, “Formal”. This makes them more useful for automated processing.
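
A minimal sketch of why atomic tags are easier to process automatically. The sentence ids, the tag names and the `with_tags` helper are hypothetical and not taken from Tatoeba's code.

```python
# Hypothetical data: sentence id -> set of atomic tags.
sentence_tags = {
    1: {"2nd person", "formal"},
    2: {"2nd person", "informal"},
    3: {"formal"},
}


def with_tags(tags_by_sentence, required):
    """Return the sorted ids of sentences that carry every tag in `required`."""
    required = set(required)
    return sorted(
        sid for sid, tags in tags_by_sentence.items() if required <= tags
    )


formal_ids = with_tags(sentence_tags, ["formal"])
formal_second_person = with_tags(sentence_tags, ["2nd person", "formal"])
```

With a combined tag such as “2nd Person Formal”, the first query would need string matching instead of a plain subset test.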

Demetrius, 20 August 2010, 10:45:57 UTC

@FeuDRenais
Also, there was a proposal to write all tags lowercase:
http://tatoeba.org/eng/wall/sho...8#message_1588

Demetrius, 20 August 2010, 10:39:35 UTC

It's easy, but I'm too lazy to do this. ^^ I've started working on Uzbek, maybe I'll finish it someday...

Also, sysko has to do transliteration caching. It will allow making transliteration more time-intensive (dictionary searches...).
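
The caching idea can be sketched roughly like this. The in-memory `lru_cache` and both function names are assumptions for illustration; a real site would more likely store results in its database.

```python
from functools import lru_cache


def slow_transliterate(text):
    """Placeholder for an expensive, dictionary-backed transliterator."""
    slow_transliterate.calls += 1
    return text.upper()  # stand-in transformation, not a real transliteration


slow_transliterate.calls = 0  # counts how often the expensive step runs


@lru_cache(maxsize=10_000)
def cached_transliterate(text):
    """Run the expensive step at most once per distinct sentence text."""
    return slow_transliterate(text)
```

Repeated requests for the same sentence then hit the cache, so the cost of dictionary lookups is paid only on the first call.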

> one-to-one letter correspondence between
> the alternative alphabets exists (e.g. Serbian).
But injekcija = инјекција, not ињекција. So you need a dictionary when transcribing from Latin...
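
A rough sketch of digraph-aware Latin to Cyrillic conversion with an exception list. The `EXCEPTION_STEMS` table and the `to_cyrillic` helper are hypothetical; a real exception dictionary would be much larger, and this version lowercases its input for simplicity.

```python
# Digraphs must be matched before single letters.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}
# Hypothetical exception list: stems where "nj" spans a morpheme boundary.
EXCEPTION_STEMS = {"injekcij": "инјекциј"}


def to_cyrillic(word):
    """Convert one lowercase-normalised Latin word to Cyrillic."""
    lower = word.lower()
    for stem, cyrillic in EXCEPTION_STEMS.items():
        if lower.startswith(stem):
            # Use the stored spelling, then convert the ending normally.
            return cyrillic + to_cyrillic(lower[len(stem):])
    out = []
    i = 0
    while i < len(lower):
        pair = lower[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Unknown characters are passed through unchanged.
            out.append(SINGLE.get(lower[i], lower[i]))
            i += 1
    return "".join(out)
```

Without the exception entry, the same loop would turn injekcija into the wrong form with њ.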

> Uighur
Latin > Arabic is the easiest (unless people omit ' and don't differentiate j/zh ^^).
Since we have no Latin Uyghur sentences, there is no rush.

Others require a LARGE dictionary of proper names, since Arabic has no capital letters. Cyrillic requires a dictionary of Russian loanwords.

> and - I think, but Demetrius could confirm - Tatar
No, this one is tricky in both directions. The hardest part is q and ğ. Usually к = k and г = g w/ front vowels, к = q and г = ğ with back ones.

But Arabic words break vowel harmony:
нигъмәт — niğmət ‘dish’ (ğ is marked as гъ), сәгать — səğət ‘clock’ (ğ is marked by changing vowel letter, the vowel quality is marked by the soft sign), сәгатем — səğətem ‘my clock’ (ğ is marked by a vowel letter w/out a soft sign)

Russian loanwords break vowel harmony in another way: they force K and G even near back vowels.
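
The default rule, before any exceptions are handled, might be sketched as follows. The vowel sets and the "first vowel decides" shortcut are simplifying assumptions, and `velars` is a hypothetical helper.

```python
FRONT = set("әөүеиэ")  # assumed front-vowel letters
BACK = set("аоуы")     # assumed back-vowel letters
FRONT_MAP = {"к": "k", "г": "g"}
BACK_MAP = {"к": "q", "г": "ğ"}


def velars(word):
    """Return the Latin letter the naive rule picks for each к/г in `word`."""
    word = word.lower()
    vowels = [ch for ch in word if ch in FRONT or ch in BACK]
    # Naive assumption: the first vowel decides the harmony of the whole word.
    is_back = bool(vowels) and vowels[0] in BACK
    table = BACK_MAP if is_back else FRONT_MAP
    return [table[ch] for ch in word if ch in table]
```

This picks k for күрү as expected, but it picks ğ for the loanword вагон, where the correct letter is g. That is the kind of case that needs a dictionary.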

Also, there are W and V:
В = V (вагон — vagon ‘carriage’), W (авыл — awıl ‘village’)
У = U (су — su ‘water’), W (тау — taw ‘mountain’)
Ү = Ü (күрү — kürü ‘see’), W (Мәскәү — Məskəw ‘Moscow’)

Demetrius, 18 August 2010, 15:12:25 UTC

Please excuse my suspicion. ^^

Demetrius, 18 August 2010, 15:02:01 UTC

BTW, he didn't add any genuinely Bulgarian sentences. ^_^

Demetrius, 18 August 2010, 15:01:32 UTC

Gruzilkin (http://tatoeba.org/eng/sentence...ser/Gruzilkin) has added some Bulgarian sentences marked as Russian. It's either an auto-detection failure or an intended action (he brought the number of Bulgarian sentences to 500 this way).

Can we reassign the language of these in a batch?

Demetrius, 18 August 2010, 10:53:02 UTC

Some preventive measures?

timsa (http://tatoeba.org/eng/user/profile/timsa) has added lots of sentences that are either not translations (but may look like them to a learner), or too impolite/vulgar, or too colloquial...

I think we need some rules regarding user behaviour.

Demetrius, 16 August 2010, 13:55:49 UTC

Please do the same for non-breaking space [ ]. I sometimes use it to prevent dashes (—) from being moved to the next line; it should be treated in the same way as the ordinary space for search purposes.

There are lots of other spaces, but I’m not sure anyone has ever used these on Tatoeba: en quad [ ], em quad [ ], en space [ ], em space [ ], 3-per-em space [ ], 4-per-em space [ ], 6-per-em space [ ], figure space [ ], medium mathematical space [ ], punctuation space [ ], thin space [ ], hair space [ ], zero-width space [​]
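
One way to treat all of these like an ordinary space when indexing or searching is sketched below. Mapping the zero-width space to a regular space (rather than dropping it) is an assumption, and `normalize_spaces` is a hypothetical helper.

```python
import unicodedata

ZERO_WIDTH_SPACE = "\u200b"  # category Cf, so it needs separate handling


def normalize_spaces(text):
    """Replace Unicode space separators (category Zs) and U+200B with U+0020."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Zs" or ch == ZERO_WIDTH_SPACE:
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)
```

Applying the same function to both the stored sentence and the search query would make a non-breaking space before a dash match an ordinary one.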

Demetrius, 16 August 2010, 09:11:29 UTC

Actually, I’m not sure it’s the best way of sorting sentences. Shorter sentences tend to be stranger, as there is little context.

Demetrius, 16 August 2010, 09:06:48 UTC

This seems to happen with all the batch imports. The Ukrainian proverbs from Shtoota are contributed by sysko and owned by me (422069). I don't see any problem with this.

After all, such things come from collections not necessarily compiled by the people who have suggested importing them, and IMO there’s nothing wrong with admins being higher in the contributors table. They contribute in other ways too.

Demetrius, 12 August 2010, 15:54:17 UTC

I suggest not putting the tag on other people's sentences, or at least writing something about it in the comments. I was very surprised to find my Russian sentence tagged 'needs native check' recently (http://tatoeba.org/sentences/show/451147).

Demetrius, 3 August 2010, 21:38:56 UTC

I don’t think the number of corrections can be equated with unreliability.

In fact, I think what we need is a rating system, which would allow users to vote for the translations they believe to be good.

By the way...
> check if it matches the former version regardless
> of punctuation and spaces (easy!)
Punctuation rules and spaces are also very important. A comma in the wrong place can change the meaning of a sentence more than a misspelled word.

Demetrius, 2 August 2010, 23:01:56 UTC

Yet another feature request. Can we add several tags at a time? I.e. «familiar;said to female». It should be relatively easy to implement, but it can save some time and semicolons are very unlikely to appear in tags anyway.
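
A minimal sketch of what such input handling could look like. `parse_tags` is a hypothetical helper, not an existing Tatoeba function.

```python
def parse_tags(raw, separator=";"):
    """Split one input string into a list of distinct, trimmed tags."""
    tags = []
    for part in raw.split(separator):
        tag = part.strip()
        # Skip empty pieces (e.g. a trailing separator) and repeats.
        if tag and tag not in tags:
            tags.append(tag)
    return tags
```

For example, `parse_tags("familiar;said to female")` yields the two tags separately, and each could then go through the existing single-tag code path.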

Demetrius, 1 August 2010, 22:14:17 UTC

BTW, if I'm not mistaken, Mnemosyne expects CSV fields to be Text<tab>Translation, which is different from what Tatoeba has.
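
If that layout is right, producing it from sentence pairs could look roughly like this. The input shape (a list of text/translation pairs) and the `to_tab_separated` helper are assumptions; neither Tatoeba's export format nor Mnemosyne's importer has been checked here.

```python
import csv
import io


def to_tab_separated(pairs):
    """Serialise (text, translation) pairs as one tab-separated line each."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()
```

Note that the `csv` module quotes fields containing tabs or quote characters, which an importer may not expect, so such sentences might need extra handling.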

Demetrius, 31 July 2010, 21:32:12 UTC

a) This is counter-intuitive for many languages: ‘cmn’ for Chinese MaNdarin, ‘epo’ for EsPerantO, ‘kat’ for Kartuli, ‘nob’ for Norwegian Bokmål, ‘nno’ for Norwegian Nynorsk, ‘ron’ for Romanian
b) The icons will be harder to distinguish (as they’re just black-on-white text, unlike flags)
c) Latin script is not neutral. ;) At least not more neutral than flags.

Demetrius, 31 July 2010, 21:15:22 UTC

I like this initiative, but its aims seem to be different from what we need.

It’s just one symbol that is intended to mean “Select language” in different colours and sizes.