Lepotdeterre, 5 May 2015 21:59:53 UTC

Dear members,

Does Tatoeba have an automatic duplicate-checker, such that it can directly prevent users from posting a sentence which already exists in the corpus? If not, could that be implemented? I think it would be very useful.

P.S. How do I search this wall? I don't want to post things that have already been discussed, but I don't see any way to check previous topics without reading through all 3686 of them, which I'm not too keen on doing.

Ooneykcall, 5 May 2015 23:16:04 UTC

There isn't a duplicate checker, but there is a duplicate merging script that runs occasionally.
This is actually more useful, I think, when you consider, e.g., the scenario where a new member adds common sentences and translates them into an underrepresented language that they hadn't been translated into before. If this member couldn't do so and was instead forced to use the search function every time to find the already existing common sentence s/he wishes to add and translate that one, their willingness would be depleted much faster. This way, there is no need to bother, and the script will quietly take care of it later.

Lepotdeterre, 6 May 2015 09:22:47 UTC

So I shouldn't worry about posting anything twice? Even if I post the same sentence a hundred times, the program will fix it for me? If so, how often does that program run? Also, does it work for Macedonian? I think that I've inadvertently duplicated several Macedonian sentences, while translating distinct English sentences which don't have separate equivalents in Macedonian, e.g. because we don't distinguish between "beautiful" and "pretty" or between "should" and "ought".

Guybrush88, 6 May 2015 09:30:40 UTC (edited 6 May 2015 09:45:33 UTC)

The script works for any language. Generally it's run once a month. Even if you add the same sentence a million times, this script will merge all the sentences so you'll have just one when it's done. You don't have to worry if you translate different sentences and they happen to have the same translation. With this script, you'll have just one sentence, linked to all the sentences you translated the same way.
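For illustration only, here is a minimal Python sketch of the merging behaviour described above. It is not Tatoeba's actual script: the data layout (a dict of sentences and a set of link pairs), the function name `merge_exact_duplicates`, the rule that the lowest id wins, and the example sentences are all assumptions.

```python
from collections import defaultdict


def merge_exact_duplicates(sentences, links):
    """Merge sentences that share the same language and exactly the same text.

    sentences: dict mapping sentence id -> (language code, text)
    links: set of (id_a, id_b) pairs marking translations
    Returns (merged_sentences, merged_links).
    """
    # Group sentence ids by (language, exact text).
    groups = defaultdict(list)
    for sid, (lang, text) in sentences.items():
        groups[(lang, text)].append(sid)

    # Assumption: the lowest id in each group is the one that survives.
    keep = {}
    for ids in groups.values():
        winner = min(ids)
        for sid in ids:
            keep[sid] = winner

    merged_sentences = {sid: sentences[sid] for sid in set(keep.values())}

    # Re-point every link at the surviving sentences.
    merged_links = set()
    for a, b in links:
        a, b = keep[a], keep[b]
        if a != b:
            merged_links.add((min(a, b), max(a, b)))
    return merged_sentences, merged_links


# Two distinct English sentences that were translated into the same
# Macedonian sentence twice (illustrative data, hypothetical ids).
sentences = {
    1: ("eng", "She is beautiful."),
    2: ("eng", "She is pretty."),
    3: ("mkd", "Таа е убава."),
    4: ("mkd", "Таа е убава."),
}
links = {(1, 3), (2, 4)}
merged_sentences, merged_links = merge_exact_duplicates(sentences, links)
```

After the merge, sentence 4 is gone and sentence 3 is linked to both English sentences.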

AlanF_US, 6 May 2015 14:57:52 UTC

The script will merge sentences only if they are exactly the same. If you were adding what you thought was the same sentence a million times (scary thought!), chances are that the punctuation, if not the wording, would vary in some of those sentences. Then there would be no link between those sentences unless someone created one. To anticipate the next question: Is it possible to eliminate punctuation from consideration when merging sentences? Theoretically, yes, but implementing that change is not trivial. And of course, merging sentences with different wording is not feasible. For this reason, if you're going to be adding a lot of simple sentences that have a high probability of already existing, I think it's a good idea to search for the key words first. Naturally, a search takes time, but so does adding a new sentence.

There is an issue ticket for offering search for Wall messages: https://github.com/Tatoeba/tatoeba2/issues/38 , but it has not been addressed yet.

Guybrush88, 6 May 2015 16:37:34 UTC

A more urgent change before deduplicating is, imho, to treat different apostrophes and different spaces before punctuation (especially in French) as the same. There are sentences that are exactly the same (same wording, same punctuation) but can't be merged because they use different apostrophes or different spaces before punctuation.

for example: #50984 and #2176373
#1850008 and #11578

and I'm sure there are many more
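As a rough illustration of this idea, the Python sketch below builds a comparison key in which apostrophe variants and spaces before punctuation no longer matter. The character lists, the name `comparison_key`, and the example sentence are assumptions, not anything the existing script does.

```python
import re

# Assumed sets of characters to treat alike; a real list would need review.
APOSTROPHES = "\u2019\u02bc\u0060\u00b4"  # right single quote, modifier apostrophe, grave, acute
SPACES = "\u00a0\u202f\u2009"  # no-break space, narrow no-break space, thin space

_KEY_TABLE = {ord(c): "'" for c in APOSTROPHES}
_KEY_TABLE.update({ord(c): " " for c in SPACES})


def comparison_key(text):
    """Return a key under which the variants compare equal.

    Only the key is normalized; the stored sentence stays as written.
    """
    key = text.translate(_KEY_TABLE)
    # Drop spaces directly before ? ! : ; so that "Quoi ?" and "Quoi?" match.
    return re.sub(r" +([?!:;])", r"\1", key)


a = "Qu\u2019est-ce que c\u2019est\u00a0?"  # curly apostrophes, no-break space
b = "Qu'est-ce que c'est ?"  # straight apostrophes, ordinary space
same = comparison_key(a) == comparison_key(b)
```

Here `same` is true, so a merge that compared keys instead of raw text would treat the two as duplicates.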

AlanF_US, 6 May 2015 18:01:10 UTC

There are at least two ways to resolve this:

(1) Agree on a standard "canonical" set of punctuation marks and spaces, and convert "noncanonical" ones to canonical ones at the time the sentences are added. (There would also need to be a pass to convert legacy sentences once this change was instituted.) Example: for English, declare that straight quotation marks are canonical and that curly quotation marks will be converted into straight ones.

(2) Rather than agreeing on canonical punctuation marks, agree on ones that will be treated equivalently by the script, and let the standard rules determine which sentence wins (for instance, in the absence of other factors, the older sentence wins).
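To make the contrast concrete, here is a small Python sketch of both approaches for English quotation marks. The mapping, the function names, and the use of the lowest id as a stand-in for "the older sentence" are assumptions for illustration.

```python
# Assumed equivalences: curly quotation marks map to straight ones.
_QUOTE_TABLE = str.maketrans({
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
})


def canonicalize(text):
    """Approach 1: rewrite the sentence itself into the canonical form."""
    return text.translate(_QUOTE_TABLE)


def merge_by_equivalence(sentences):
    """Approach 2: leave every text untouched and only compare by key.

    sentences: list of (id, text) pairs.
    Returns a dict mapping each surviving id to the ids merged into it.
    """
    winners = {}   # comparison key -> id of the oldest sentence seen
    absorbed = {}  # surviving id -> list of ids merged into it
    for sid, text in sorted(sentences):  # assumption: lower id means older
        key = text.translate(_QUOTE_TABLE)
        if key in winners:
            absorbed[winners[key]].append(sid)
        else:
            winners[key] = sid
            absorbed[sid] = []
    return absorbed


curly = "\u201cHi,\u201d she said."
straight = '"Hi," she said.'
result = merge_by_equivalence([(7, straight), (3, curly), (9, "Bye.")])
```

With approach 1, `canonicalize(curly)` returns the straight-quote text, so the curly form disappears from the corpus. With approach 2, sentence 3 wins because it is older and keeps its curly quotation marks, and sentence 7 is merged into it.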

The problem is that the discussion generally bogs down at this point. There are some people who really like the idea of canonical punctuation and some people who hate it -- and some of the people who like it can't agree with others about what should be canonical. For instance, should straight quotation marks be canonical because they're easier to type and easier to process (no need to figure out whether they're opening or closing marks)? Or should curly quotation marks be canonical because they're traditional, and are the form that appears in books? Such decisions would potentially have to be made for every language. Approach 2 cuts down on the pain of having to reach agreement as to which ones "win", but we still would need to identify sets of equivalent characters.

Undoubtedly, we will have to tackle this problem at some point, but I think it will require a concerted planning effort involving multiple people.

Ooneykcall, 6 May 2015 10:06:35 UTC

You can't possibly be telling us Macedonian hasn't got an array of adjectives describing beauty, huh? o.o

User55521, 6 May 2015 10:21:08 UTC (edited 6 May 2015 10:21:32 UTC)

Ofc it does, but every language draws borders between words differently. E.g. Russian doesn't have a distinction between 'beautiful' and 'handsome'.

Ooneykcall, 6 May 2015 10:29:24 UTC

Come on, the difference between beautiful and handsome is purely a sex difference, so that's not the same. Plus we can still make a distinction, just be a little creative and use the nouns. xD
It would be strange for a language not to distinguish between the 'noble' sort and the 'sweet' sort of beauty though.

pullnosemans, 6 May 2015 10:57:18 UTC

this last claim is funny seeing as nobility and sweetness are in themselves highly complex concepts made up of several different properties.

spontaneously, the daakaka language of vanuatu comes to my mind, which does not have our concept of "beauty" in the sense of physical beauty of humans altogether (nor "intelligence", for example). you'll always find cultures with structures different enough in certain aspects from educated, industrialized, rich cultures to lack (or exhibit) traits you never even thought of as possibly "optional" in human societies.

User55521, 6 May 2015 10:57:21 UTC

> difference between beautiful and handsome is purely a sex difference

Actually, it’s not. You can call a man beautiful too. It’s a difference between two 'kinds' of beauty.

Lepotdeterre, 6 May 2015 11:24:11 UTC (edited 6 May 2015 11:24:45 UTC)

As I see it, "handsome" implies romantic and/or sexual beauty, whereas "beautiful" is neutral. Additionally, "handsome" is preferred for men, whereas "beautiful" is preferred for women.

Anyhow, it's not as though Macedonian has only one word to describe beauty - it's just that all of the words that exist (which are fewer than in English, as with almost any lexical concept) overlap greatly, i.e. they don't really form a spectrum. We have the following:

убав - pretty, beautiful, nice (can be used to describe anything, e.g. the weather, a car, a woman or the world)
мил - pretty, nice, sweet, dear
миловиден - lovely, sweet, dear, pleasant to look at
преубав - emphatic form of убав (beautiful, wonderful, gorgeous, amazing)
красен - beautiful, gorgeous (little known word)
згоден - handsome (colloquially, also "hot", as in "sexy"; for both genders)
личен - pretty, beautiful (little known word)
*угледен - sightly, all right to look at
*строен - stately, beautiful

Ooneykcall, 6 May 2015 12:27:03 UTC

Of course they overlap, it's the natural thing. Still, there are differences, and from what you say it's clear that "beautiful" and "pretty" can be rendered differently, even if not necessarily. In Russian translations I generally use красивый/прекрасный for beautiful and милый/миловидный/хорошенький (the latter not for men) for pretty/lovely. Meanings fluctuate depending on particulars, but in any case you can surely make a distinction if need be.

tommy_san, 6 May 2015 23:13:01 UTC

Since no one seems to have told you yet,

1. Let me introduce to you our colleague Horus.
https://tatoeba.org/user/profile/Horus
He's been doing great work since January this year.

2. There has been a discussion about a mechanism that prevents users from adding a duplicate.
https://github.com/Tatoeba/tato...mment-73356216
However, no one is actually working on it, as far as I know.
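
Purely as an illustration of what such a mechanism could look like, here is a hypothetical Python sketch of a check made at the moment a sentence is added. The `Corpus` class, its `add` method, and the in-memory index are assumptions and do not reflect the design discussed on GitHub.

```python
class Corpus:
    """Hypothetical in-memory corpus that refuses to store exact duplicates."""

    def __init__(self):
        self._by_text = {}  # (language code, text) -> sentence id
        self._next_id = 1

    def add(self, lang, text):
        """Add a sentence and return (sentence_id, created).

        If the same text already exists in that language, return the
        existing id with created=False instead of storing a second copy.
        """
        key = (lang, text.strip())
        if key in self._by_text:
            return self._by_text[key], False
        sid = self._next_id
        self._next_id += 1
        self._by_text[key] = sid
        return sid, True


corpus = Corpus()
first = corpus.add("mkd", "Таа е убава.")
second = corpus.add("mkd", "Таа е убава.")  # same sentence again
other = corpus.add("eng", "She is pretty.")
```

The second call returns the id of the first sentence rather than creating a new one, so a translation could be linked to the existing sentence right away instead of waiting for the periodic merge.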