xtofu80, June 3, 2010 at 8:12:20 AM UTC

I am not sure whether this is worth discussing, but there are some sentences which are really redundant, e.g. 162883 and 83091, two rather long sentences which differ only in the subject being "my mom" vs. "my dad".
Shouldn't we remove one of such pairs and concentrate on the gist instead of wasting our efforts on translating countless variants?

blay_paul, June 3, 2010 at 9:24:01 AM UTC

> Shouldn't we remove one of such pairs and concentrate on
> the gist instead of wasting our efforts on translating
> countless variants?

There is a constant effort to remove near-duplicates. At the current rate we're probably losing a couple of dozen a week, if not more.

However, removing duplicates does not produce _new_ content. And new content is what's needed to fill out Tatoeba and make it more appealing.

xtofu80, June 3, 2010 at 10:06:29 AM UTC

Yes, you are right, producing new content is also important, though as a native German speaker I am right now mostly busy with adding German translations to the already existing Japanese-English sentence pairs. And that's when I came across these near-duplicates.
Currently I am thinking about how I could involve my Japanese language exchange partner to produce some content. At the least, I will check some sentences I found dubious with her.

So what would be the best procedure if I come across such a sentence pair? Make a comment? Add it to the "mark for deletion" list?

sysko, June 3, 2010 at 10:34:31 AM UTC

On the other hand, I'm working with another guy on a machine-learning-based automated translator, and these kinds of "near" duplicate sentences are REALLY useful.

sysko, June 3, 2010 at 10:37:24 AM UTC

In fact, as a learner I also sometimes like to find this kind of sentence where only one part changes; it's easier to see some grammar points this way (for example, in French, changing "my mom" to "my dad" could change the verbs, adjectives and so on in the sentence, and it is always interesting to see this variation on the same sentence).

Swift, June 3, 2010 at 5:23:29 PM UTC

On this point, I've chosen to add these nuances in comments. Otherwise there are just going to be way too many similar sentences.

sysko, June 3, 2010 at 10:44:48 AM UTC

Moreover, I think the problem here is not whether or not to have these countless variants (for the reasons below I would prefer to keep them), but rather "how to show contributors only 'useful' sentences".

TRANG, June 11, 2010 at 7:08:29 PM UTC

Okay, I haven't replied to this yet so I will, to make our position on "variations" of sentences clear.

Our position is: people can do whatever they like. If they want to add all the possible variations, they can. If they don't want to, they don't have to.

It doesn't hurt to have "near duplicates". It just makes Tatoeba a bit noisy. But that's our job, as engineers, to figure out how to filter and organize data so that it can be used efficiently for language learners.

Meanwhile, as sysko said, variations of sentences can be very useful for language processing, so we shouldn't delete them.

blay_paul, June 11, 2010 at 7:43:26 PM UTC

Just to clarify the clarification. Near duplicates will be removed from WWWJDIC - but not by deleting them from Tatoeba. So feel free to point out Japanese sentences and English sentences linked to Japanese sentences that are near duplicates.

xtofu80, June 19, 2010 at 1:26:52 PM UTC

Hi Paul, I saw that you always post a comment "Not for WWWJDIC" on each sentence. Shouldn't that be solved by using tags?

blay_paul, June 19, 2010 at 1:33:44 PM UTC

I could, but I started doing that before tags existed.

It also gives people a chance to notice what sentences I'm excluding and ask why (or just complain ;-).

xtofu80, June 19, 2010 at 1:53:38 PM UTC

So do you filter the sentences according to your comment, or do you mark them somewhere else AND put a comment in?
I just want to know how we should approach sentences we find should not appear there (e.g. hiragana-kanji variants of exactly the same sentence.)

blay_paul, June 19, 2010 at 2:08:01 PM UTC

In the secret sentence annotation page, where the Japanese index can be entered / edited, I put -1 in the meaning field.

No one else can see that, so the note is just to let people know what I'm doing (generally excluding near-duplicate sentences from WWWJDIC).

CK, June 12, 2010 at 3:31:00 AM UTC, edited October 25, 2019 at 8:09:42 AM UTC

[not needed anymore- removed by CK]

xtofu80, June 12, 2010 at 10:43:16 AM UTC

Hi CK,
I completely agree with your notion of near duplicates versus clutter.
I think that besides "dealing" with clutter that already exists, we should also put some effort into guidelines about creating new content.