Wall (5,913 threads)

Swift, 2010-09-02 11:21

** Related translations **

Recently I adopted a sentence and added a comment on an alternative way to word it.[1] CK then added a comment asking why I hadn't just added another translation, but seeing how I'd be interested in the community's thoughts on the subject, I decided to reply here.

I've been doing this regularly for the Icelandic sentences (and this is in fact the main reason I set up the cmt-tags[2]). From using Tatoeba, I had found it to be confusing to have several translations to choose from without any annotation as to in which situations these could be used. Rather than just add all the translations I could find, I decided to add only the most fitting one and then describe the alternatives in the comments.

I don't always describe the alternatives terribly well, but try to mention whether words are formal or slang, or whether there is a particular situation in which one would use them.[3] Once we have more qualified links, I'm hoping these could be turned into descriptive relationships between sentences so that multiple related sentences (that are essentially the same) can be collapsed in the search results.

Then again, maybe that should just be a feature of the search.

Either way, I've long been wondering whether I should just add all the variants and make comments with links (in which case it would be super cool if #<number> would add a link[4]).

[1] http://tatoeba.org/eng/sentences/show/15806
[2] http://martin.swift.is/tatoeba/tags.html#meta
[3] http://tatoeba.org/eng/tags/sho...t_on_situation
[4] http://tatoeba.org/eng/wall/show_message/720
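
A minimal sketch of the "#<number> would add a link" idea, assuming the sentence URL pattern seen in the links above. The function name and the regular expression are illustrative, not anything Tatoeba is known to use, and the sketch expects comment text that has not yet been HTML-escaped.

```python
import re

# Assumed URL pattern, copied from the sentence links quoted in this thread.
SENTENCE_URL = "http://tatoeba.org/eng/sentences/show/{id}"

# '#' followed by digits, but not when preceded by a word character or '&'
# (so entities such as &#39; are left alone).
SENTENCE_REF = re.compile(r"(?<![\w&])#(\d+)\b")


def linkify_sentence_refs(comment: str) -> str:
    """Replace each #<number> in a comment with an HTML link to that sentence."""

    def to_link(match: re.Match) -> str:
        sentence_id = match.group(1)
        url = SENTENCE_URL.format(id=sentence_id)
        return f'<a href="{url}">#{sentence_id}</a>'

    return SENTENCE_REF.sub(to_link, comment)


if __name__ == "__main__":
    print(linkify_sentence_refs("See #15806 for the alternative wording."))
```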

sacredceltic, 2010-09-02 19:35

There were already several recommendations from Trang to create all the variants as full sentences and I support this.
Who are we to decree what is the main sentence and what is the variant, according to whose definition, and which variant should be used in what context ?
We might have information on usage that we may want to deliver through tags (slang /...) but other people may have different or additional views on the usage of the same sentences and add their own tags accordingly. We don't know what we don't know, do we?
In the future, functionalities enabling us to filter certain tags will enable users to consult the corpus along their own requirements: Slang only, no slang, no sexist slang...
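
A rough sketch of the kind of tag filtering described here ("slang only, no slang, no sexist slang"). The Sentence class, the tag names and the function are assumptions made for illustration; they do not reflect Tatoeba's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    # Hypothetical minimal record: the sentence text plus its user-supplied tags.
    text: str
    tags: set[str] = field(default_factory=set)


def filter_by_tags(sentences, require=frozenset(), exclude=frozenset()):
    """Keep sentences that carry every required tag and none of the excluded ones."""
    require, exclude = set(require), set(exclude)
    return [
        s for s in sentences
        if require <= s.tags and not (exclude & s.tags)
    ]


if __name__ == "__main__":
    corpus = [
        Sentence("How do you do?", {"polite"}),
        Sentence("What's up?", {"slang"}),
        Sentence("(an offensive example)", {"slang", "sexist"}),
    ]
    # "No sexist slang": slang is required, sexist is excluded.
    for s in filter_by_tags(corpus, require={"slang"}, exclude={"sexist"}):
        print(s.text)
```

Because each reader passes their own require/exclude sets, nobody has to agree on a single "main" sentence; the corpus stays complete and the filtering happens at query time.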

Swift, 2010-09-03 12:13

sacredceltic, if you think that I've been choosing "main sentences" as "the accurate one" then you've seriously misunderstood what I meant by "most fitting".

No-one mentioned anything about "decrees" but you'll find a good number of people here giving their opinions to the best of their abilities. Many seem quite well aware of the limits of those abilities.

We are, of course, already "decreeing" what are archaic[1] and unnatural[2] sentences, and which are rude[3] or polite[4]. I should hope that none of us assumes these to be any more objective than comments on usage.

[1] http://tatoeba.org/eng/tags/sho...th_tag/archaic
[2] http://tatoeba.org/eng/tags/sho..._tag/unnatural
[3] http://tatoeba.org/eng/tags/sho..._with_tag/rude
[4] http://tatoeba.org/eng/tags/sho...ith_tag/polite

xtofu80, 2010-09-03 12:36

I am not sure whether it makes sense to produce too many translation variants. Instead of having ten translation variants for a single sentence, contributors should rather translate ten different sentences. By browsing through different sentences, language learners will then see that there are several ways to translate a sentence, e.g. they would encounter sentences which use "must" and sentences which use "have to". Creating too many translation variants for one sentence is a) boring and b) overwhelming, since it produces a long list of translations.
The reason why I do not like it is that when I search for a certain word, say "umbrella", instead of getting ten nice different sentences in which umbrella is used, I get a thousand translation variants where "must" is replaced by "has to", which has no bearing on the usage of the word umbrella.

An overuse of translation variants will make tatoeba an unwieldy monster, and more problematic, it will be rather useless to browse through.

Another problem is that all those translation variants will again be translated into other languages, creating an exponential number of sentence variants. This all is rather useless in my humble opinion.

sacredceltic, 2010-09-03 13:01

I disagree with you, xtofu80. Tatoeba must reflect the richness of languages and I don't see volume as an issue other than merely technical and transitory, if you know Moore's law.
Besides, I don't see a difference between a variant and a different translation. At some point, someone or something will link them if their meanings are close and one single translation among thousands of languages happens to be identical... Given the number of languages, the probability that 2 close sentences get linked is very high, transforming 2 formerly distinct sentences and their translations into variants of each other...

xtofu80, 2010-09-04 00:16

Well, Moore's law is about computer capacity, but not about human capacity. If you are looking for a nice example sentence, but instead of getting 10 nice and distinct ones, you get 500 variants of the same sentence, you don't want to look on page 117 to find the next uniquely distinct sentence.

sacredceltic, 2010-09-04 09:12

I think you're exaggerating the issue. There won't be 500 variants of the same sentence because they don't exist. In most cases you have 2 or 3 and in extreme cases you'll get a dozen. Not the end of the world...

Swift, 2010-09-04 10:23

Exaggeration or not, the point still stands.

I'm not sure how others use Tatoeba, but I often find near-duplicates seriously diluting the usefulness of Tatoeba. I frequently search between Japanese and English, the two languages with the greatest number of sentences, meaning that the effect is the most pronounced there. It's understandable that others, searching between languages with an order of magnitude fewer sentences would not notice this problem of scale.

sacredceltic, 2010-09-04 10:33

Despite all your reasons, I will not refrain from producing variants. I'm sorry, I still can't see a valid reason to do so.
Your problem is a problem of querying and presentation methods. It should not affect content in any way. What you propose is no less than censorship and is not acceptable.

Swift, 2010-09-05 05:12

Sorry, who's asking you to refrain from anything? Actually, let's leave it. This isn't going anywhere fast...

sacredceltic, 2010-09-05 09:26

You mean not in your direction? Indeed, that is the very essence of debates, you know...

FeuDRenais, 2010-09-04 21:31

(Strangely enough,) I agree with sc here. In my personal opinion (and based on a strong hunch), I don't think any website can expect to gain widespread success without being able to efficiently regulate bad user behavior - even when such behavior is widespread.

In other words, setting user guidelines and saying things like "you should not produce too many variants", "you should not translate into languages other than the ones you're native-level in" etc., will work when the community is still small, like it is now (though still with limited success, as is being witnessed now). As it grows, it'll be naturally harder for the moderators to keep good track of everything and the quality of the personal interaction will be diluted, inevitably.

As sacredceltic says, it is mainly a matter of querying and presentation methods. There needs to be a robust system in place to handle all the inefficiency of the users, because the users will always be inefficient. Unfortunately, this just means more work for the programmers and less for us normal users...

blay_paul, 2010-09-04 10:31

When (if) we get a filter system, you should be able to filter out Japanese sentences that are excluded from WWWJDIC. There are 255 records explicitly removed from those being used for WWWJDIC, many because they are near-duplicates.

Demetrius, 2010-09-03 13:09

> but the system will be organised by
> adding 2 variants of wording as equivalent.
There will be a way to tell that some tag is an alias of another one.

> Instead of having ten translation
> variants for a single sentence,
> contributers should rather translate
> ten different sentences
This does prevent users from using Tatoeba as a phrasebook. ^^

Language learners usually can change gender or number in a verb ending. But sometimes people are willing to say just one phrase, and to be understood correctly.

Of course, we shouldn’t add dozens of translations for every single sentence. (In fact, we won’t be able to ^^) But concerning the sentences that are the easiest and are often said or spoken, such translation may turn out to be helpful.

Swift, 2010-09-03 14:50

This was my concern, precisely. I do think that there is worth in all possible translations, but as you pointed out, the current system tends to be overwhelmed.

I think that a more detailed relational database will make things easier, but until we get there we'll do well not to pack the corpus too tightly -- just tightly enough so that we can see the stress points and spot the problems that arise as the corpus grows ;-).

sacredceltic, 2010-09-03 12:52

I don't understand your logic at all, since your former point supported comments on usage rather than tags, and the examples you point to show the exact opposite.
By the way how exactly do you decree that a sentence is "archaïc"? When it has not been used in the last minute? Because "what a clever student/man/teacher/woman...you are!" is a phrase I hear all the time and no later than yesterday in the mouth of a British woman in her 30s...your logic is beyond me, really!

Swift, 2010-09-03 13:13

Sorry, what are we talking about now? I pointed out that you seemed to have misunderstood my original post. If you misunderstand the premise, the logic will indeed not make much sense.

I don't know how to explain this to you so how about we just leave it at that?

> By the way how exactly do you decree that a sentence is "archaïc"?

Personally? I don't.

Those who do /probably/ do so the way others decree a sentence to be "rude" (see link above). You might want to ask them.

sacredceltic, 2010-09-03 13:41

Well I do use tags and for a purpose! The examples tagged 'rude' by me and others are so tagged because they are, indeed, considered rude by a vast majority of the population. Applicable tags, as translations, should not be taken for granted and should be the object of debates and moderation. So if a tag is inappropriate, such as this "archaïc" on "what a clever student you are!", it should be removed.

Demetrius, 2010-09-03 09:15

I totally agree.

Demetrius, 2010-09-02 17:57

BTW, I believe 'you have to' and 'you must' is not an alternative wording, since the sense does change... At least according to what I know.

FeuDRenais, 2010-09-02 20:30

Personally, the two are close enough for me. "You should" is where things change...

Swift, 2010-09-03 11:33

I've been wondering about this, and I agree with FeuDRenais: I can't see a difference in meaning. Both imply an obligation.
There is certainly a difference in nuance ('must' /can/ be more formal than 'have to'). Thoughts on this would certainly be useful in the comments to that sentence.

Demetrius, 2010-09-02 17:47

I believe this is not a good solution, because:
a) it is not suitable for use outside of Tatoeba, because comments are not exportable,
b) it's not suitable for beginners, since they may not be proficient enough to understand comments,
c) it's not suitable for automatic processing, because comments are free-form.

While the first problem can be solved by adding an 'annotation' field (it was suggested by FeuDRenais earlier, and I believe it would be useful for learners... and for playing games ^^), the others still remain.

Of course, this approach also has its downsides, as illustrated by http://tatoeba.org/eng/sentences/show/475445 (It[masculine/feminine/neuter] is (his|hers), isn't it?): having too many sentences may be hard to read, but still I believe it's a better solution.

Swift, 2010-09-02 20:26

The sentence you linked to illustrates precisely what I was hoping to avoid. Tagging sentences with genders, numbers and cases is useful, but difficult with complex sentences.

I hadn't really thought of a) and c), thanks. I've mainly been thinking how best to present the information here on Tatoeba -- partly in anticipation of annotation fields and qualified links that might solve the exportability and processing issues.

Reason b) doesn't really worry me as I've left the elementary stuff simple (largely of the form "gender/number: „word“"). My intent was actually to make it easier for readers to use the sentences, as it might in fact be harder for beginners were the different genders not distinguished.

At the same time, I guess genders is stuff that can be looked up pretty easily and links to the other gender versions with the "cmt link to related" tag[1] would be pretty easy to convert to qualified links. Hmmm...

[1] http://tatoeba.org/eng/tags/sho...ink_to_related

Demetrius, 2010-09-03 07:05

I believe that the problem is that tags can be viewed only after clicking on the sentence page.

I think we need some way to present tags. I.e., it can be done by icons: if a sentence has a female face with an open mouth, it has ‘female’ or ‘said by female’ tags. If a sentence has a male face with an ear icon, it is ‘said to male’, etc.

This will help learners to find the sentence they need somewhat more easily. However, this would still be worse than annotations... :o Actually, I’m not sure what the best way to handle this is.

Swift, 2010-09-03 11:28

It is certainly /a/ problem. Another is that the search feature would benefit from knowing the relationship between sentences and thus simplify the search results.

There are two fundamental problems with finding the right sentence: Clutter in the search results and finding the most fitting sentence for a given context.

I see two solutions to mitigate the clutter problem: Group related or near identical sentences together, and a "basket" to collect sentences that one might be interested in (while browsing several pages worth of results).

For the situational problem, I think extra information in the search results view is useful but it'd quickly become overcrowded. An annotation link that would pop out a little frame might be useful.

It'd be interesting to see various solutions to this problem once the API is established.
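
One way the "group related or near-identical sentences together" idea could be prototyped is sketched below. It is only an illustration under stated assumptions: similarity is measured with Python's difflib on word tokens, each sentence is compared against the first member of a group, and the 0.7 threshold is an arbitrary starting point rather than a tuned value.

```python
import re
from difflib import SequenceMatcher


def tokens(sentence: str) -> list[str]:
    """Lower-cased word tokens, ignoring punctuation."""
    return re.findall(r"\w+", sentence.lower())


def group_near_duplicates(results, threshold=0.7):
    """Greedily collapse search results into groups of near-identical sentences.

    Each sentence joins the first group whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new group.
    """
    groups = []
    for sentence in results:
        for group in groups:
            ratio = SequenceMatcher(None, tokens(group[0]), tokens(sentence)).ratio()
            if ratio >= threshold:
                group.append(sentence)
                break
        else:
            groups.append([sentence])
    return groups


if __name__ == "__main__":
    hits = [
        "You must take an umbrella.",
        "You have to take an umbrella.",
        "I lost my umbrella yesterday.",
    ]
    for group in group_near_duplicates(hits):
        print(group[0], f"(+{len(group) - 1} similar)")
```

A search view could then show one line per group with a "similar" counter, which keeps every variant in the corpus while cutting the clutter in the results.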

boracasli, 2010-09-02 18:52

But I don't know the Launchpad threads.
What OS do you and Trang use?
If you or Trang use Linux, which distribution?

boracasli, 2010-09-02 18:37

What should I not do in Launchpad?
I'm using Google Translate in translating Ubuntu.
But I hate Launchpad.
Mark Shuttleworth may know Afrikaans because he was born in Welkom, an Afrikaans-speaking region.
I can translate Ubuntu to any language with Google Translate?

sysko, 2010-09-02 18:40

http://ubuntuforums.org/ for questions about ubuntu:)

boracasli, 2010-09-02 18:43

But I don't know the Launchpad threads.
What OS do you and Trang use?

boracasli, 2010-09-02 18:44

If you use Linux, what distribution?

boracasli, 2010-09-02 18:35

http://tatoeba.org/eng/sentences/show/483463
Please delete this Icelandic sentence, but please do not delete the English and Turkish ones.

carlos1516, 2010-09-02 13:57

I hope to be useful, and that this project will be useful to me.
Thank you for helping us understand each other better as human beings.

Pharamp, 2010-09-02 16:11

Welcoooome, Carlos! :D :D

boracasli, 2010-09-02 14:44

When will Lithuanian, Thai, Quechua, Croatian, Bosnian and Azerbaijani be added?

sysko, 2010-09-02 14:53

For Quechua and Azerbaijani, they will be added when we have enough sentences which DO NOT come from copyrighted sources. For the others, they will be added once I have fixed something in the database. Moreover, in the future, can you avoid adding sentences in languages you do not speak? Tatoeba's goal is to offer natural sentences, added by natives or by people whom natives would consider native. The goal is not to have millions of sentences in hundreds of languages if this means "machine translation" quality, because then Tatoeba will be less useful than Google Translate or other automatic translation websites.

boracasli, 2010-09-02 18:30

When will the new languages be added?
In the last two weeks you have not added any language.

sysko, 2010-09-02 18:32

It seems that neither Trang nor I get paid to do this, which means we're doing it in our free time. If we are busy in our private lives, or fixing bugs, then we have less time to implement things. Nothing more to add to this.

boracasli, 2010-09-02 18:37

What should I not do in Launchpad?
I'm using Google Translate in translating Ubuntu.
But I hate Launchpad.
Mark Shuttleworth may know Afrikaans because he was born in Welkom, an Afrikaans-speaking region.
I can translate Ubuntu to any language with Google Translate?

Swift, 2010-09-02 20:48

> I can translate Ubuntu to any language with Google Translate?

No, don't translate anything with Google Translate! Don't trust a machine to translate human languages any more than you'd trust a person to translate a file from one binary format to another.

Google Translate can be a very useful crutch, but it's just that and nothing more.

Pharamp, 2010-09-02 20:50

No, you must not.
Google Translator isn't a human, and it's extremely bad to use it everywhere, here on Tatoeba, or in Launchpad.
But as you're a native Turkish speaker, you can use your own skills to translate Ubuntu into Turkish.

Pharamp, 2010-09-02 20:53

As I told you some days ago, you should wait, as everyone is doing now.
Languages will be available as soon as possible, and that doesn't mean a particular day, month or year.
Just be patient please.

Swift, 2010-09-01 18:51

** Problem playing sound files **

When I click on the icon to play sound files on sentences that have them, my browser complains about missing plugins. The onclick event on the anchor tag is set to "return false", disabling the link, but I couldn't see how it triggered the plugin.

However it's done, is this a feature to bypass the browser querying the user for a program to handle the file? Any chance I can convince our dear devs that it's a bug?

Sound file example: http://tatoeba.org/eng/sentences/show/333392

CK, 2010-09-02 02:18 (edited 2019-10-26 04:03)

[not needed anymore- removed by CK]

Swift, 2010-09-02 02:43

Actually, I had a quick chat with sysko on IRC. Tatoeba creates an object which my browser doesn't know how to handle. The audio file link works fine for me, but the link is disabled with the onclick event.

Sysko furthermore said that they were going to move to using the audio tag. I threw together an example implementation that can be found here:
http://martin.swift.is/tatoeba/audio.html
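
For readers unfamiliar with the audio tag, here is a hedged sketch of the sort of markup such a move might produce. It is not the example behind the link above and not Tatoeba's code; the helper name and the sample URL are hypothetical. The point is that the plain download link stays inside the element as fallback content, so a browser without audio support still shows a working link instead of asking for a plugin.

```python
from html import escape


def audio_player_html(audio_url: str) -> str:
    """Build an HTML5 <audio> element with a plain download link as fallback.

    Browsers that understand <audio> render their native player; older
    browsers ignore the tag and show the inner link instead.
    """
    url = escape(audio_url, quote=True)
    return (
        f'<audio controls preload="none" src="{url}">'
        f'<a href="{url}">Download the recording</a>'
        "</audio>"
    )


if __name__ == "__main__":
    # Hypothetical file location, used only for demonstration.
    print(audio_player_html("http://example.org/audio/eng/333392.mp3"))
```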

sysko, 2010-09-02 09:46

The forbidden error is due to a problem which occurred yesterday, soon after I left home for dinner (Murphy's law, I hate you).

aandrusiak, 2010-09-01 22:27

Why aren't Tatoeba phrases indexed by Google? Can't find them by googling.

CK, 2010-09-02 02:07 (edited 2019-10-26 04:04)

[not needed anymore- removed by CK]

aandrusiak, 2010-09-02 08:54

Cool!!
thx

blay_paul, 2010-09-01 22:36

Actually they are, but obviously they aren't all indexed instantly.

saeb, 2010-09-01 22:48

I think they are, but a lot of the sentences here are very generic, so searching for them using Google will return millions of results... and Tatoeba most likely won't be on the first few hundred pages

but you can set up the right parameters for google to find pages on tatoeba, for example
http://www.google.com/#hl=en&sa...3f5060d8dfcee5
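
Since the link above is truncated, here is a small sketch of how such a restricted query could be built. It assumes Google's `site:` operator and its `/search?q=` URL form; the function name and the sample phrase are made up for illustration.

```python
from urllib.parse import urlencode


def tatoeba_google_query(phrase: str, site: str = "tatoeba.org") -> str:
    """Return a Google search URL for an exact phrase, restricted to one site."""
    # Quoting the phrase asks for an exact match; site: limits the domain.
    query = f'site:{site} "{phrase}"'
    return "http://www.google.com/search?" + urlencode({"q": query, "hl": "en"})


if __name__ == "__main__":
    print(tatoeba_google_query("take an umbrella"))
```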

aandrusiak, 2010-09-01 11:02

There are so many doubled phrases! Would there be a way to fix it?

blay_paul, 2010-09-01 11:32

There are a lot of near-duplicates. Especially in the English / Japanese sentences. However duplicates are only removed when they are exact duplicates. This is a policy decision based on the usefulness for language analysis tools, among other reasons.

In the long term it may be possible to tag near duplicates and optionally filter the data to not show them.

sysko, 2010-09-01 11:33

In a specific language? Normally we have a script that we run from time to time to merge duplicates, since for resource reasons we can't do a real-time check. But I don't think there are more than a hundred duplicates among the 500 000 sentences.
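
The actual merge script is not shown in this thread, so the following is only a guess at its general shape. It assumes sentences are stored as id -> (language, text), that translation links are pairs of ids, and that the lowest id survives a merge; all of these are illustrative assumptions, not Tatoeba's schema.

```python
def merge_exact_duplicates(sentences, links):
    """Merge sentences with identical (language, text) and repoint their links.

    sentences: dict mapping id -> (lang, text)
    links:     set of (id_a, id_b) translation pairs
    Returns (merged_sentences, merged_links); the lowest id of each
    duplicate set is kept.
    """
    canonical = {}  # (lang, text) -> id that survives
    remap = {}      # every id -> surviving id
    for sid in sorted(sentences):
        remap[sid] = canonical.setdefault(sentences[sid], sid)

    merged_sentences = {
        sid: value for sid, value in sentences.items() if remap[sid] == sid
    }

    merged_links = set()
    for a, b in links:
        a, b = remap[a], remap[b]
        if a != b:  # drop links that collapsed onto a single sentence
            merged_links.add((min(a, b), max(a, b)))
    return merged_sentences, merged_links


if __name__ == "__main__":
    sentences = {
        1: ("eng", "Hello."),
        2: ("fra", "Bonjour."),
        3: ("eng", "Hello."),  # exact duplicate of 1
        4: ("deu", "Hallo."),
    }
    links = {(1, 2), (3, 4)}
    print(merge_exact_duplicates(sentences, links))
```

Only exact matches are merged here, in line with the policy blay_paul describes; near-duplicates are left untouched.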

adjusting, 2010-09-01 05:08

Someone might want to look into the mess that doanhuong made.