Demetrius's messages on the Wall (total 442)

Demetrius, August 20, 2010 at 3:44:55 PM UTC

Should we keep proverbs offending other nations in the database? And how should these be tagged?

We already have some Ukrainian proverbs about Russians in the database (something like “You can ward off the devil by crossing yourself, but you can’t ward off a Moskal”). ^^

Demetrius, August 20, 2010 at 11:02:51 AM UTC

> Good point. But transliteration can't handle
> letter pair --> single letter correspondence?
Of course it can. I mean that usually nj = њ, but in the word injekcija there is a morpheme boundary between n and j, so it should stay нј.
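
As a rough sketch of this point: a Latin-to-Cyrillic converter for Serbian can map letter pairs greedily and consult a small exception list for words such as injekcija. The tables and the names `EXCEPTION_STEMS` and `to_cyrillic` are illustrative assumptions, not Tatoeba's code. The exception list holds a single entry, and the sketch lowercases its input.

```python
# Illustrative tables; a real converter would need a much longer exception list.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}
# Hypothetical list of Latin stems where the pair straddles a morpheme boundary.
EXCEPTION_STEMS = {"injekcij": "инјекциј"}


def to_cyrillic(word: str) -> str:
    """Transliterate one Serbian word from Latin to Cyrillic (lowercased)."""
    text = word.lower()
    out = []
    i = 0
    while i < len(text):
        # 1. Dictionary lookup: exception stems win over the digraph rule.
        stem = next((s for s in EXCEPTION_STEMS if text.startswith(s, i)), None)
        if stem is not None:
            out.append(EXCEPTION_STEMS[stem])
            i += len(stem)
        # 2. Otherwise a letter pair maps to a single Cyrillic letter.
        elif text[i:i + 2] in DIGRAPHS:
            out.append(DIGRAPHS[text[i:i + 2]])
            i += 2
        # 3. Everything else is one-to-one; unknown characters pass through.
        else:
            out.append(SINGLE.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

With these tables, `to_cyrillic("konj")` uses the single letter њ, while `to_cyrillic("injekcija")` keeps н and ј apart because the stem is found in the exception list first.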

> Unless there are instances of
> "nj" that are нј and not њ,
> but this is never the case.
Инјекција!

> I disagree here. You'd need a big dictionary
> for Latin to Arabic.
In fact, you don't need a dictionary at all for Latin > Arabic, only for zh/j and ' if people omit these.

For all the other directions you do need a dictionary.

> The Latin "ng" could be transliterated as either.
Not as far as I know. The Latin orthography requires separating these with an apostrophe: ng versus n'g.
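
To illustrate why this is easy to process, here is a minimal tokenizer sketch. It assumes that the Latin orthography writes ch, gh, ng, sh and zh as letter pairs and that the apostrophe acts purely as a separator. The digraph list and the function name `split_letters` are assumptions made for illustration.

```python
# Assumed digraph inventory for Latin-script Uyghur (illustrative only).
DIGRAPHS = ("ch", "gh", "ng", "sh", "zh")


def split_letters(word: str) -> list[str]:
    """Split a Latin-script word into letters, honouring the ' separator."""
    letters = []
    i = 0
    while i < len(word):
        if word[i] == "'":
            # The apostrophe only blocks digraph formation; it is not a letter here.
            i += 1
        elif word[i:i + 2] in DIGRAPHS:
            letters.append(word[i:i + 2])
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters
```

Under these assumptions "ng" comes out as one letter and "n'g" as two, so each can be mapped to its own Arabic-script spelling without a dictionary.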

> I think there are other cases,
> as well (personally, I don't
> much like the Latin Uighur...)
But it's indeed very easy to process.

Demetrius, August 20, 2010 at 10:56:14 AM UTC

OK, that is a good idea. I'll send my (imperfect) Uzbek transliterator on Monday.

Demetrius, August 20, 2010 at 10:54:29 AM UTC

> So far, I've not figured out how to remove a tag from a sentence

The algorithm is:
a) add the same tag to any other sentence (temporarily)
b) copy the delete link and replace the sentence number
c) delete your temporary tag

This relies on a bug, so use it only when you’re absolutely sure it’s applicable.

Demetrius, August 20, 2010 at 10:49:29 AM UTC

Some more thoughts about tags.

BTW, IMHO tags should be as general as possible: not “2nd Person Formal”, but “2nd Person”, “Formal”. This makes them more useful for automated processing.
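
A small sketch of why general tags help automated processing: with atomic tags, one set-inclusion test answers both broad and narrow queries. The sentence IDs, tag data and the function name `with_all_tags` are made up for illustration.

```python
# Hypothetical sentence-to-tags index using atomic tags.
sentence_tags = {
    1: {"2nd person", "formal"},
    2: {"2nd person", "informal"},
    3: {"3rd person", "formal"},
}


def with_all_tags(index: dict[int, set[str]], wanted: set[str]) -> list[int]:
    """Return the IDs of sentences that carry every tag in `wanted`."""
    return sorted(sid for sid, tags in index.items() if wanted <= tags)
```

Here `with_all_tags(sentence_tags, {"formal"})` finds sentences 1 and 3, and adding "2nd person" to the query narrows it to sentence 1. A combined tag such as “2nd Person Formal” would need string matching to support the first query.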

Demetrius, August 20, 2010 at 10:45:57 AM UTC

@FeuDRenais
Also, there was a proposal to write all tags lowercase:
http://tatoeba.org/eng/wall/sho...8#message_1588

Demetrius, August 20, 2010 at 10:39:35 AM UTC

It's easy, but I'm too lazy to do this. ^^ I've started working on Uzbek, maybe I'll finish it someday...

Also, sysko has to do transliteration caching. It will allow making transliteration more time-intensive (dictionary searches...).
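
One way caching could look, as a sketch only: memoise the transliterator so that each distinct sentence pays for the expensive dictionary lookups once. `slow_transliterate` is a stand-in, not a real transliterator, and nothing here reflects how Tatoeba actually implements it.

```python
from functools import lru_cache

# Counter used only to show how often the expensive function really runs.
call_count = {"slow": 0}


def slow_transliterate(text: str) -> str:
    """Stand-in for an expensive, dictionary-backed transliterator."""
    call_count["slow"] += 1
    return text[::-1]  # placeholder work, not a real transliteration


@lru_cache(maxsize=10_000)
def cached_transliterate(text: str) -> str:
    """Return the cached result when the same sentence is requested again."""
    return slow_transliterate(text)
```

A server-side version would more likely store results in the database next to the sentence, but the effect is the same: repeated page views do not repeat the work.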

> one-to-one letter correspondence between
> the alternative alphabets exists (e.g. Serbian).
But injekcija = инјекција, not ињекција. So you need a dictionary when transcribing from Latin...

> Uighur
Latin > Arabic is the easiest (unless people omit ' and don't differentiate j/zh ^^).
Since we have no Latin Uyghur sentences, there is no rush.

Others require a LARGE dictionary of proper names, since Arabic has no capital letters. Cyrillic requires a dictionary of Russian loanwords.

> and - I think, but Demetrius could confirm - Tatar
No, this one is tricky in both directions. The hardest part is q and ğ. Usually к = k and г = g with front vowels, к = q and г = ğ with back ones.

But Arabic words break vowel harmony:
нигъмәт — niğmət ‘dish’ (ğ is marked as гъ), сәгать — səğət ‘clock’ (ğ is marked by changing vowel letter, the vowel quality is marked by the soft sign), сәгатем — səğətem ‘my clock’ (ğ is marked by a vowel letter w/out a soft sign)

Russian loanwords break vowel harmony in another way: they force K and G even near back vowels.

Also, there are W and V:
В = V (вагон — vagon ‘carriage’), W (авыл — awıl ‘village’)
У = U (су — su ‘water’), W (тау = taw ‘mountain’)
Ү = Ü (күрү — kürü ‘see’), W (Мəскəү — Məskəw ‘Moscow’)

Demetrius, August 18, 2010 at 3:12:25 PM UTC

Please excuse my suspicion. ^^

Demetrius, August 18, 2010 at 3:02:01 PM UTC

BTW, he didn't add any genuinely Bulgarian sentences. ^_^

Demetrius, August 18, 2010 at 3:01:32 PM UTC

Gruzilkin (http://tatoeba.org/eng/sentence...ser/Gruzilkin) has added some Bulgarian sentences marked as Russian. It's either an auto-detection failure or an intended action (he brought the number of Bulgarian sentences to 500 this way).

Can we reassign the language of these in a batch?

Demetrius, August 18, 2010 at 10:53:02 AM UTC

Some preventive measures?

timsa (http://tatoeba.org/eng/user/profile/timsa) has added lots of sentences that are either not translations (but may look like them to a learner), or too impolite/vulgar, or too colloquial...

I think we need some rules regarding user behaviour.

Demetrius, August 16, 2010 at 1:55:49 PM UTC

Please do the same for non-breaking space [ ]. I sometimes use it to prevent dashes (—) from being moved to the next line; it should be treated in the same way as the ordinary space for search purposes.

There are lots of other spaces, but I’m not sure anyone has ever used these on Tatoeba: en quad [ ], em quad [ ], en space [ ], em space [ ], 3-per-em space [ ], 4-per-em space [ ], 6-per-em space [ ], figure space [ ], medium mathematical space [ ], punctuation space [ ], thin space [ ], hair space [ ], zero-width space [​]
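
A minimal sketch of such a normalisation step, applied to both the indexed text and the query before searching. It assumes every character listed above, including the zero-width space, should become an ordinary space; the names `SPACE_LIKE` and `normalize_spaces` are hypothetical.

```python
# U+00A0 no-break space, U+205F medium mathematical space,
# and U+2000..U+200B (en quad through zero-width space).
SPACE_LIKE = "\u00a0\u205f" + "".join(chr(cp) for cp in range(0x2000, 0x200C))

# Translation table that maps each space-like character to an ordinary space.
_TO_PLAIN_SPACE = str.maketrans({ch: " " for ch in SPACE_LIKE})


def normalize_spaces(text: str) -> str:
    """Replace the Unicode space variants above with U+0020."""
    return text.translate(_TO_PLAIN_SPACE)
```

Whether the zero-width space should turn into a space or be dropped is a design choice; dropping it would only require mapping that one character to `None` in the table.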

Demetrius, August 16, 2010 at 9:11:29 AM UTC

Actually, I’m not sure it’s the best way of sorting sentences. Shorter sentences tend to be stranger, as there is little context.

Demetrius, August 16, 2010 at 9:06:48 AM UTC

This seems to happen with all the batch imports. The Ukrainian proverbs from Shtoota are contributed by sysko and owned by me (422069). I don't see any problem with this.

After all, such things come from collections not necessarily compiled by the people who have suggested importing them, and IMO there’s nothing bad about admins being higher in the contributors table. They contribute in other ways too.

Demetrius, August 12, 2010 at 3:54:17 PM UTC

I suggest not putting the tag on other people's sentences, or at least writing something about it in the comments. I was very surprised recently to find my Russian sentence tagged 'needs native check' (http://tatoeba.org/sentences/show/451147).

Demetrius, August 3, 2010 at 9:38:56 PM UTC

I don’t think the number of corrections can be equated with unreliability.

In fact, I think what we need is a rating system, which would allow users to vote for the translations they believe to be good.

By the way...
> check if it matches the former version regardless
> of punctuation and spaces (easy!)
Punctuation rules and spaces are also very important. A comma in the wrong place can change the meaning of a sentence more than a misspelled word.
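
To make the concern concrete, here is a sketch of the quoted check (comparison regardless of punctuation and spaces). The function names are hypothetical, and "punctuation" is taken to mean any character in a Unicode P* category.

```python
import unicodedata


def strip_punct_and_spaces(text: str) -> str:
    """Drop whitespace and every character in a Unicode punctuation category."""
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def same_ignoring_punctuation(old: str, new: str) -> bool:
    """True when two versions differ only in punctuation and spacing."""
    return strip_punct_and_spaces(old) == strip_punct_and_spaces(new)
```

Such a check reports "Let's eat, grandma." and "Let's eat grandma." as the same sentence, which is exactly the kind of meaning-changing edit it would hide.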

Demetrius, August 2, 2010 at 11:01:56 PM UTC

Yet another feature request. Can we add several tags at a time? E.g. «familiar;said to female». It should be relatively easy to implement, it can save some time, and semicolons are very unlikely to appear in tags anyway.
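
A sketch of the parsing side, assuming the tag field arrives as one string; the function name `parse_tags` is hypothetical.

```python
def parse_tags(raw: str) -> list[str]:
    """Split a semicolon-separated tag field, dropping empty entries."""
    return [tag.strip() for tag in raw.split(";") if tag.strip()]
```

`parse_tags("familiar;said to female")` yields the two tags separately, and stray spaces or a trailing semicolon are tolerated.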

Demetrius, August 1, 2010 at 10:14:17 PM UTC

BTW, if I'm not mistaken, Mnemosyne expects CSV fields to be Text<tab>Translation, which is different from what Tatoeba has.
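
If that is right, a conversion step could look like the sketch below. It assumes the sentence pairs have already been extracted from Tatoeba's export as (text, translation) tuples, since the exact export layout is not described here; the function name `to_mnemosyne_lines` is hypothetical.

```python
def to_mnemosyne_lines(pairs: list[tuple[str, str]]) -> str:
    """Render (text, translation) pairs as Text<tab>Translation lines."""
    lines = []
    for text, translation in pairs:
        # Tabs or newlines inside a field would break the two-column layout,
        # so every run of whitespace is collapsed to a single space.
        fields = [" ".join(field.split()) for field in (text, translation)]
        lines.append("\t".join(fields))
    return "".join(line + "\n" for line in lines)
```

Note that collapsing whitespace also replaces non-breaking spaces with ordinary ones, which may or may not be wanted for flashcards.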

Demetrius, July 31, 2010 at 9:32:12 PM UTC

a) This is counter-intuitive for many languages: ‘cmn’ for Chinese MaNdarin, ‘epo’ for EsPerantO, ‘kat’ for Kartuli, ‘nob’ for Norwegian Bokmål, ‘nno’ for Norwegian Nynorsk, ‘ron’ for Romanian
b) The icons will be harder to distinguish (as they’re just black-on-white text, unlike flags)
c) Latin script is not neutral. ;) At least not more neutral than flags.

Demetrius, July 31, 2010 at 9:15:22 PM UTC

I like this initiative, but its aims seem to be different from what we need.

It’s just one symbol that is intended to mean “Select language” in different colours and sizes.