Tatoeba

CK, October 13, 2014 at 10:24:41 AM UTC, edited October 30, 2019 at 7:27:17 AM UTC

[not needed anymore- removed by CK]

gleki, October 15, 2014 at 12:48:05 PM UTC

Is there an open issue for simply detecting that a sentence is a duplicate and showing a warning in that case?

CK, November 8, 2014 at 3:11:11 PM UTC, edited October 30, 2019 at 7:27:23 AM UTC

[not needed anymore- removed by CK]

cvbge, November 9, 2014 at 4:48:13 AM UTC

It should be fairly easy to normalize sentences in some respects. For example, all white-space characters could be normalized to plain ASCII space. Maybe some languages would need specialized rules, but still. It should in fact happen before a sentence is stored in the database.
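A minimal sketch of that kind of whitespace normalization, assuming Python and assuming that "white-space character" means whatever the `re` module's `\s` matches on Unicode strings (this includes the non-breaking space). The function name is made up for illustration and is not part of Tatoeba's code.

```python
import re

def normalize_whitespace(sentence: str) -> str:
    """Replace each whitespace character with a plain ASCII space, one for one."""
    # For str patterns, \s is Unicode-aware: it matches tabs, U+00A0 (non-breaking
    # space), U+3000 (ideographic space) and other Unicode whitespace.
    return re.sub(r"\s", " ", sentence)
```

The replacement is one-to-one, so runs of spaces are not collapsed; that would be a separate rule.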

neron, November 9, 2014 at 4:41:26 PM UTC, edited November 9, 2014 at 4:46:41 PM UTC

I believe some people take great pains to use the proper spacing character, for example a non-breaking space around % in German, so forcing a plain space might be the wrong approach. However, maybe the merging script could report sentences that differ only by a single letter? Then an admin can decide what to do with them.
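The reporting idea could be sketched roughly as follows, assuming Levenshtein edit distance as the measure of "differs by a single letter". The function names and the `max_distance` parameter are hypothetical, and the all-pairs comparison is quadratic, so this only illustrates the principle and would not scale to the whole corpus as written.

```python
from itertools import combinations

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def near_duplicates(sentences, max_distance=1):
    """Return pairs of non-identical sentences within max_distance edits.

    Nothing is merged here; the pairs are only reported for a human to review.
    """
    return [
        (a, b)
        for a, b in combinations(sentences, 2)
        if a != b and edit_distance(a, b) <= max_distance
    ]
```

Because the threshold is a parameter, the report is not tied to exactly one letter.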

cvbge, November 10, 2014 at 12:05:13 AM UTC

This is an interesting point. I didn't even know about the % sign rules in other languages.

Differing by one letter is too narrow a solution. What about sentences differing by two letters?

Maybe normalization rules should be defined strictly per language and not employed until confirmed to be correct.
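One way to picture per-language rules that stay switched off until confirmed is a small registry like the one below. Everything in it is an assumption made for illustration: the `Rule` class, the `RULES` table, the language codes, and the two example rules are not taken from Tatoeba.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    apply: Callable[[str], str]
    confirmed: bool = False  # a rule is ignored until someone has verified it

# Hypothetical registry keyed by language code; the entries are examples only.
RULES = {
    "eng": [Rule("collapse_spaces", lambda s: re.sub(r" {2,}", " ", s), confirmed=True)],
    "deu": [Rule("nbsp_to_space", lambda s: s.replace("\u00a0", " "), confirmed=False)],
}

def normalize(sentence: str, lang: str) -> str:
    """Apply only the confirmed rules for this language; unknown languages pass through."""
    for rule in RULES.get(lang, []):
        if rule.confirmed:
            sentence = rule.apply(sentence)
    return sentence
```

With this layout, an unconfirmed rule can sit in the table for discussion without touching any sentence.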

neron, November 10, 2014 at 1:45:23 AM UTC

There is also always the possibility of normalizing only for the sake of comparing sentences, without carrying that normalization over into the final version of the sentence. For example, always use a plain space in comparisons, but if a sentence uses a non-breaking space, leave it (as a principle: keep the more complex one as the example that survives the deduplication process, and use the simpler one as the normalized variant). But it can get messy, that is, it could bring more problems. It is a hard job.
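A rough sketch of comparing on a normalized key while leaving the stored text alone, assuming whitespace is the only thing being normalized. The names `comparison_key` and `pick_survivors` are invented for this example, and "more complex" is interpreted here simply as "differs from its own normalized key".

```python
import re
from collections import defaultdict

def comparison_key(sentence: str) -> str:
    """Key used only for comparing; the stored sentence is never rewritten."""
    return re.sub(r"\s", " ", sentence)

def pick_survivors(sentences):
    """Group sentences by comparison key and keep one sentence per group."""
    groups = defaultdict(list)
    for s in sentences:
        groups[comparison_key(s)].append(s)
    survivors = []
    for key, members in groups.items():
        # Prefer a variant that is not identical to the plain key,
        # e.g. one that keeps a non-breaking space.
        special = [m for m in members if m != key]
        survivors.append(special[0] if special else members[0])
    return survivors
```

If several variants in a group each differ from the key in a different way, this sketch just takes the first one, which is exactly where it can get messy.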

gillux, November 10, 2014 at 4:17:33 PM UTC

That would be the best approach in my humble opinion. What kind of problems are you thinking about?

neron, November 18, 2014 at 4:33:23 PM UTC

As usual, finding a balance is the key. What if we get several duplicates, but each one has a slightly different issue? Which one is a good representative of that group of probable duplicates? Another thing: what if one starts with the wrong idea that, as a normalization prototype, we could remove all the punctuation and then compare things (we can't), causing a huge number of falsely identified duplicates... So, what can we treat as a duplicate for certain, and what can't we? If only the "space" were the only problem here, but I guess it would be a good start (and, I guess, the various sorts of hyphens, multiple consecutive signs like !!!, etc.)...
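To make the contrast concrete, here is a small sketch that puts a conservative comparison key (spaces, two hyphen variants, repeated ! or ?) next to the "remove all punctuation" idea. The choice of which characters to fold is an assumption for illustration, not an agreed rule, and both function names are invented.

```python
import re
import unicodedata

def conservative_key(s: str) -> str:
    """Fold only a few variations that are unlikely to change the meaning."""
    s = re.sub(r"\s", " ", s)              # any whitespace -> plain space
    s = re.sub(r"[\u2010\u2011]", "-", s)  # HYPHEN, NON-BREAKING HYPHEN -> "-"
    s = re.sub(r"([!?])\1+", r"\1", s)     # "!!!" -> "!", "??" -> "?"
    return s

def strip_all_punctuation(s: str) -> str:
    """The over-eager variant: drop every character in a Unicode P* category."""
    return "".join(ch for ch in s if not unicodedata.category(ch).startswith("P"))
```

With these, "You're coming?" and "You're coming." collapse to the same string under `strip_all_punctuation`, a false duplicate, while `conservative_key` keeps them apart.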

sacredceltic, November 15, 2014 at 5:43:51 PM UTC

The difference between the 2 spaces is that one is a normal space and the other is a non-breaking space.
I should be using non-breaking spaces before double punctuation marks (: ; ? !).
But as I explained earlier in a conversation on the wall with Impersonator, non-breaking spaces render as ugly squares on Tatoeba, depending on your operating system and browser.
When Tatoeba comes up with a solution for these non-breaking spaces to look as they should, I'll use them. Meanwhile, I don't want my sentences to look like some cabalistic mess.

sacredceltic, November 15, 2014 at 5:51:58 PM UTC

And to be complete, you can also see some of my sentences with non-breaking spaces, but I wasn't the one who introduced them. At some point in the history of Tatoeba, some batch procedure automatically substituted them for my standard spaces.

This substitution should be automatic, at sentence insertion, if and only if the non-breaking spaces are correctly displayed, regardless of browser and operating system. I don't think that is the case now, but I haven't checked for a while. Maybe it was improved. I haven't been informed whether a change took place.
I remember seeing these ugly squares on iOS, but I can't see them anymore on the examples you supplied.

sacredceltic, November 23, 2014 at 1:03:22 PM UTC

Now that it is possible to drag-and-drop translations across pages and sentences, without the use of a bookmarklet, I find it much easier to avoid creating duplicates.
That will obviously show in the future.

For those who are not yet accustomed to the procedure, here is how I proceed:

I use a first webpage to search for sentences in language A that I want to translate.
I then use a second webpage, in a different browser's tab, to search for sentences in language B that possibly match those in the first page.
If I find some, I drag each from the second page to the first, across the browser's tabs, down on the new chain-link-icon button above the sentence to translate, and I simply drop it on this button.
It's very comfortable this way and saves precious time.

It's not yet possible to drag from the main sentences if you own them (but you can always reverse the order by clicking on a direct translation) or from their blue buttons, but I was told this would come in a future release.

It will be bliss.