Swift, September 2, 2010 at 11:21:11 AM UTC

** Related translations **

Recently I adopted a sentence and added a comment on an alternative way to word it.[1] CK then added a comment asking why I hadn't just added another translation, but seeing how I'd be interested in the community's thoughts on the subject, I decided to reply here.

I've been doing this regularly for the Icelandic sentences (and this is in fact the main reason I set up the cmt-tags[2]). From using Tatoeba, I had found it confusing to have several translations to choose from without any annotation as to the situations in which each could be used. Rather than just add all the translations I could find, I decided to add only the most fitting one and then describe the alternatives in the comments.

I don't always describe the alternatives terribly well, but try to mention whether words are formal or slang, or whether there is a particular situation in which one would use it.[3] Once we have more qualified links, I'm hoping these could be turned into descriptive relationships between sentences so that multiple related sentences (that are essentially the same) can be collapsed in the search results.

Then again, maybe that should just be a feature of the search.

Either way, I've long been wondering whether I should just add all the variants and make comments with links (in which case it would be super cool if #<number> would add a link[4]).

[1] http://tatoeba.org/eng/sentences/show/15806
[2] http://martin.swift.is/tatoeba/tags.html#meta
[3] http://tatoeba.org/eng/tags/sho...t_on_situation
[4] http://tatoeba.org/eng/wall/show_message/720
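
As a rough illustration of the #<number> idea above, here is a minimal Python sketch that turns such references in a comment into links. The names `link_sentence_refs`, `SENTENCE_URL` and `SENTENCE_REF` and the HTML output are assumptions made for illustration; only the URL pattern is taken from the sentence link in [1], and none of this reflects how Tatoeba actually renders comments.

```python
import re

# URL pattern copied from the sentence link in [1]; the rest is hypothetical.
SENTENCE_URL = "http://tatoeba.org/eng/sentences/show/{id}"

# A "#" followed by digits, not glued to a preceding word character.
SENTENCE_REF = re.compile(r"(?<!\w)#(\d+)\b")


def link_sentence_refs(comment: str) -> str:
    """Replace every #<number> in a comment with an HTML link to that sentence."""

    def to_link(match: re.Match) -> str:
        sentence_id = match.group(1)
        url = SENTENCE_URL.format(id=sentence_id)
        return f'<a href="{url}">#{sentence_id}</a>'

    return SENTENCE_REF.sub(to_link, comment)


print(link_sentence_refs("A more formal variant is #15806."))
```

Text without such references passes through unchanged, so the function could be applied to every comment before display.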

Demetrius, September 2, 2010 at 5:47:12 PM UTC

I believe this is not a good solution, because:
a) it is not suitable for use outside of Tatoeba, because comments are not exportable,
b) it's not suitable for beginners, since they may not be proficient enough to understand comments,
c) it's not suitable for automatic processing, because comments are free-form.

While the first problem can be solved by adding an 'annotation' field (it was suggested by FeuDRenais earlier, and I believe it would be useful for learners... and for playing games ^^), the others still remain.

Of course, this approach also has its downsides, as illustrated by http://tatoeba.org/eng/sentences/show/475445 (It[masculine/feminine/neuter] is (his|hers), isn't it?): having too many sentences may be hard to read, but still I believe it's a better solution.

Swift, September 2, 2010 at 8:26:46 PM UTC

The sentence you linked to illustrates precisely what I was hoping to avoid. Tagging sentences with genders, numbers and cases is useful, but difficult with complex sentences.

I hadn't really thought of a) and c), thanks. I've mainly been thinking how best to present the information here on Tatoeba -- partly in anticipation of annotation fields and qualified links that might solve the exportability and processing issues.

Reason b) doesn't really worry me as I've left the elementary stuff simple (largely in the form "gender/number: „word“"). My intent was actually to make it easier for readers to use the sentences, as it might in fact be harder for beginners were the different genders not distinguished.

At the same time, I guess gender is stuff that can be looked up pretty easily, and links to the other gender versions with the "cmt link to related" tag[1] would be pretty easy to convert to qualified links. Hmmm...

[1] http://tatoeba.org/eng/tags/sho...ink_to_related

Demetrius, September 3, 2010 at 7:05:34 AM UTC

I believe that the problem is that tags can be viewed only after clicking on the sentence page.

I think we need some way to present tags. E.g., it can be done with icons: if a sentence has a female face with an open mouth, it has ‘female’ or ‘said by female’ tags. If a sentence has a male face with an ear icon, it is ‘said to male’, etc.

This will help learners to find the sentence they need somewhat more easily. However, this would still be worse than annotations... :o Actually, I’m not sure what the best way to handle this is.

Swift, September 3, 2010 at 11:28:01 AM UTC

It is certainly /a/ problem. Another is that the search feature would benefit from knowing the relationship between sentences and could thus simplify the search results.

There are two fundamental problems with finding the right sentence: Clutter in the search results and finding the most fitting sentence for a given context.

I see two solutions to mitigate the clutter problem: Group related or near identical sentences together, and a "basket" to collect sentences that one might be interested in (while browsing several pages worth of results).

For the situational problem, I think extra information in the search results view is useful but it'd quickly become overcrowded. An annotation link that would pop out a little frame might be useful.

It'd be interesting to see various solutions to this problem once the API is established.
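
To make the grouping idea above a bit more concrete, here is a minimal Python sketch of collapsing search results into groups of linked variants. It assumes that "variant of" links between sentence IDs already exist and arrive as plain ID pairs; the function name `group_variants`, the data shapes and the sentence IDs are all hypothetical, and no such API exists yet.

```python
from collections import defaultdict


def group_variants(result_ids, variant_links):
    """Collapse search results into groups of sentences linked as variants.

    result_ids: sentence IDs in the order the search returned them.
    variant_links: (id_a, id_b) pairs marking two sentences as variants.
    """
    parent = {sid: sid for sid in result_ids}

    def find(sid):
        # Follow parent pointers up to the representative of the group.
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]  # path halving
            sid = parent[sid]
        return sid

    for a, b in variant_links:
        # Links pointing outside the current result set are ignored.
        if a in parent and b in parent:
            parent[find(b)] = find(a)

    groups = defaultdict(list)
    for sid in result_ids:
        groups[find(sid)].append(sid)
    # Groups come out in the order their first member appeared in the results.
    return list(groups.values())


# Made-up IDs: 101 and 103 are linked variants, 102 and 104 stand alone.
print(group_variants([101, 102, 103, 104], [(101, 103)]))
# → [[101, 103], [102], [104]]
```

A results page could then show one line per group with the other members folded behind it, which addresses the clutter without removing any sentence from the corpus.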

Demetrius, September 2, 2010 at 5:57:04 PM UTC

BTW, I believe 'you have to' and 'you must' are not alternative wordings, since the sense does change... At least according to what I know.

FeuDRenais, September 2, 2010 at 8:30:21 PM UTC

Personally, the two are close enough for me. "You should" is where things change...

Swift, September 3, 2010 at 11:33:17 AM UTC

I've been wondering about this, and I agree with FeuDRenais: I can't see a difference in meaning. Both imply an obligation.
There is certainly a difference in nuance ('must' /can/ be more formal than 'have to'). Thoughts on this would certainly be useful in the comments to that sentence.

sacredceltic, September 2, 2010 at 7:35:14 PM UTC

There were already several recommendations from Trang to create all the variants as full sentences and I support this.
Who are we to decree what is the main sentence and what is the variant, according to whose definition, and which variant should be used in what context?
We might have information on usage that we may want to deliver through tags (slang /...) but other people may have different or additional views on the usage of the same sentences and add their own tags accordingly. We don't know what we don't know, do we?
In the future, functionalities enabling us to filter certain tags will enable users to consult the corpus according to their own requirements: Slang only, no slang, no sexist slang...
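
The kind of tag filtering described above could look roughly like the following Python sketch. It assumes each sentence carries a set of tags; the function name `filter_by_tags`, the data layout and the sample sentences are hypothetical and only meant to illustrate the idea.

```python
def filter_by_tags(sentences, require=(), exclude=()):
    """Return the texts whose tags include all of `require` and none of `exclude`.

    sentences: iterable of (text, tags) pairs, where tags is any iterable of strings.
    """
    require, exclude = set(require), set(exclude)
    kept = []
    for text, tags in sentences:
        tags = set(tags)
        # Keep the sentence only if every required tag is present
        # and no excluded tag is.
        if require <= tags and not (exclude & tags):
            kept.append(text)
    return kept


# Made-up corpus for illustration.
corpus = [
    ("Good morning.", {"polite"}),
    ("What's up, dude?", {"slang"}),
    ("Shut your trap.", {"slang", "rude"}),
]

print(filter_by_tags(corpus, require={"slang"}))                    # slang only
print(filter_by_tags(corpus, exclude={"slang"}))                    # no slang
print(filter_by_tags(corpus, require={"slang"}, exclude={"rude"}))  # no rude slang
```

Because the filter only reads tags, different contributors can keep adding their own tags to the same sentences and each reader can pick the combination that suits them.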

Demetrius, September 3, 2010 at 9:15:35 AM UTC

I totally agree.

Swift, September 3, 2010 at 12:13:53 PM UTC

sacredceltic, if you think that I've been choosing "main sentences" as "the accurate one" then you've seriously misunderstood what I meant by "most fitting".

No-one mentioned anything about "decrees" but you'll find a good number of people here giving their opinions to the best of their abilities. Many seem quite well aware of the limits of those abilities.

We are, of course, already "decreeing" what are archaic[1] and unnatural[2] sentences, and which are rude[3] or polite[4]. I should hope that none of us assumes these to be any more objective than comments on usage.

[1] http://tatoeba.org/eng/tags/sho...th_tag/archaic
[2] http://tatoeba.org/eng/tags/sho..._tag/unnatural
[3] http://tatoeba.org/eng/tags/sho..._with_tag/rude
[4] http://tatoeba.org/eng/tags/sho...ith_tag/polite

xtofu80, September 3, 2010 at 12:36:06 PM UTC

I am not sure whether it makes sense to produce too many translation variants. Instead of having ten translation variants for a single sentence, contributors should rather translate ten different sentences. By browsing through different sentences, language learners will then see that there are several ways to translate a sentence, e.g. they would encounter sentences which use "must" and sentences which use "have to". Creating too many translation variants for one sentence is a) boring and b) overwhelming, as it produces a long list of translations.
The reason why I do not like it is that when I search for a certain word, say e.g. "umbrella", instead of getting ten nice different sentences in which umbrella is used, I get a thousand translation variants, where "must" is replaced by "has to", which has no bearing on the usage of the word umbrella.

An overuse of translation variants will make Tatoeba an unwieldy monster and, more problematically, it will be rather useless to browse through.

Another problem is that all those translation variants will again be translated into other languages, creating an exponential number of sentence variants. All of this is rather useless, in my humble opinion.

sacredceltic, September 3, 2010 at 1:01:35 PM UTC

I disagree with you, xtofu80. Tatoeba must reflect the richness of languages and I don't see volume as an issue other than merely technical and transitory, if you know Moore's law.
Besides, I don't see a difference between a variant and a different translation. At some point, someone or something will link them if their meanings are close and one single translation among thousands of languages will happen to be identical... Given the number of languages, the probability that 2 close sentences are linked is very high, transforming 2 formerly distinct sentences and their translations into variants of each other...

xtofu80, September 4, 2010 at 12:16:51 AM UTC

Well, Moore's law is about computer capacity, but not about human capacity. If you are looking for a nice example sentence, but instead of getting 10 nice and distinct ones, you get 500 variants of the same sentence, you don't want to look on page 117 to find the next uniquely distinct sentence.

sacredceltic, September 4, 2010 at 9:12:31 AM UTC

I think you're exaggerating the issue. There won't be 500 variants of the same sentence because they don't exist. In most cases you have 2 or 3 and in extreme cases you'll get a dozen. Not the end of the world...

Swift, September 4, 2010 at 10:23:05 AM UTC

Exaggeration or not, the point still stands.

I'm not sure how others use Tatoeba, but I often find near-duplicates seriously diluting its usefulness. I frequently search between Japanese and English, the two languages with the greatest number of sentences, meaning that the effect is the most pronounced there. It's understandable that others, searching between languages with an order of magnitude fewer sentences, would not notice this problem of scale.

blay_paul, September 4, 2010 at 10:31:12 AM UTC

When (if) we get a filter system, you should be able to filter out Japanese sentences that are excluded from WWWJDIC. There are 255 records explicitly removed from those being used for WWWJDIC, many because they are near-duplicates.

sacredceltic, September 4, 2010 at 10:33:04 AM UTC

For all your reasons, I will not refrain from producing variants. I'm sorry, I still can't see a valid reason to do so.
Your problem is a problem of querying and presentation methods. It should not affect content in any way. What you propose is no less than censorship and is not acceptable.

FeuDRenais, September 4, 2010 at 9:31:09 PM UTC

(Strangely enough,) I agree with sc here. In my personal opinion (and based on a strong hunch), I don't think any website can expect to gain widespread success without being able to efficiently regulate bad user behavior - even when such behavior is widespread.

In other words, setting user guidelines and saying things like "you should not produce too many variants", "you should not translate into languages other than the ones you're native-level in" etc., will work when the community is still small, like it is now (though still with limited success, as is being witnessed now). As it grows, it'll be naturally harder for the moderators to keep good track of everything and the quality of the personal interaction will be diluted, inevitably.

As sacredceltic says, it is mainly a matter of querying and presentation methods. There needs to be a robust system in place to handle all the inefficiency of the users, because the users will always be inefficient. Unfortunately, this just means more work for the programmers and less for us normal users...

Swift, September 5, 2010 at 5:12:29 AM UTC

Sorry, who's asking you to refrain from anything? Actually, let's leave it. This isn't going anywhere fast...

sacredceltic, September 5, 2010 at 9:26:36 AM UTC

You mean not in your direction? Indeed, that is the very essence of debates, you know...

Demetrius, September 3, 2010 at 1:09:16 PM UTC

> but the system will be organised by
> adding 2 variants of wording as equivalent.
There will be a way to tell that some tag is an alias of another one.

> Instead of having ten translation
> variants for a single sentence,
> contributors should rather translate
> ten different sentences
This does prevent users from using Tatoeba as a phrasebook. ^^

Language learners usually can change gender or number in a verb ending. But sometimes people want to say just one phrase, and to be understood correctly.

Of course, we shouldn’t add dozens of translations for every single sentence. (In fact, we won’t be able to ^^) But for the sentences that are the easiest and are often said or spoken, such translations may turn out to be helpful.

Swift, September 3, 2010 at 2:50:44 PM UTC

This was my concern, precisely. I do think that there is worth in all possible translations, but as you pointed out, the current system tends to be overwhelmed.

I think that a more detailed relational database will make things easier, but until we get there we'll do well not to pack the corpus too tightly -- just tightly enough so that we can see the stress points and spot the problems that arise as the corpus grows ;-).

sacredceltic, September 3, 2010 at 12:52:09 PM UTC

I don't understand your logic at all, since your former point supported comments on usage rather than tags, and the examples you point to show the exact opposite.
By the way, how exactly do you decree that a sentence is "archaic"? When it has not been used in the last minute? Because "what a clever student/man/teacher/woman... you are!" is a phrase I hear all the time, as recently as yesterday from the mouth of a British woman in her 30s... Your logic is beyond me, really!

Swift, September 3, 2010 at 1:13:43 PM UTC

Sorry, what are we talking about now? I pointed out that you seemed to have misunderstood my original post. If you misunderstand the premise, the logic will indeed not make much sense.

I don't know how to explain this to you so how about we just leave it at that?

> By the way, how exactly do you decree that a sentence is "archaic"?

Personally? I don't.

Those who do /probably/ do so the way others decree a sentence to be "rude" (see link above). You might want to ask them.

sacredceltic, September 3, 2010 at 1:41:40 PM UTC

Well, I do use tags, and for a purpose! The examples tagged 'rude' by me and others are so tagged because they are, indeed, considered rude by a vast majority of the population. Applicable tags, like translations, should not be taken for granted and should be subject to debate and moderation. So if a tag is inappropriate, such as this "archaic" on "what a clever student you are!", it should be removed.