Wall (465 threads)

  1. Happy Easter!
    Feliz Páscoa!
    ¡Feliz Pascua!
    Frohe Ostern!
    Joyeuses Pâques!
    Buona Pasqua!
    Kαλό Πάσχα!
    Vrolijk Pasen!
    Счастливой Пасхи!
    Mutlu Paskalyalar.
    Честит Великден!
    Sretan Uskrs!
    • CK
    • 5 hour(s) ago - edited 5 hour(s) ago
    ** Duplicate Stats (April 19, 2014) **

    Total Number of Duplicates
    32,808 exact duplicates

    Top 10 Languages with Duplicates
    5,325 = epo
    4,608 = tur
    4,397 = rus
    4,099 = ita
    3,214 = eng
    1,843 = fra
    1,811 = ber
    1,746 = deu
    1,317 = spa
    1,193 = por

    Top 10 Members Owning (the higher-numbered) Duplicates
    4,411 = Guybrush88
    3,693 = duran
    2,964 = nimfeo
    2,243 = Amastan
    1,767 = marafon
    906 = GrizaLeono
    640 = odexed
    599 = alexmarcelo
    566 = Pfirsichbaeumchen
    493 = learnaspossible
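
    As a rough illustration of how figures like these could be produced, here is a minimal sketch. It assumes that sentences are available as (id, language, owner, text) rows and that, within a group of identical sentences in the same language, the lowest-numbered one counts as the original. The sample rows and the function name `duplicate_stats` are made up for the example; this is not the script that generated the stats above.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (sentence_id, language_code, owner, text).
SENTENCES = [
    (1, "eng", "alice", "How are you?"),
    (2, "eng", "bob", "How are you?"),
    (3, "ita", "carla", "Come stai?"),
    (4, "ita", "carla", "Come stai?"),
    (5, "ita", "dario", "Come stai?"),
    (6, "fra", "emile", "Bonjour."),
]


def duplicate_stats(sentences):
    """Count exact duplicates per language and per owner.

    Within each (language, text) group the lowest-numbered sentence is
    treated as the original; every higher-numbered one is a duplicate.
    """
    groups = defaultdict(list)
    for sentence_id, lang, owner, text in sentences:
        groups[(lang, text)].append((sentence_id, owner))

    by_language = Counter()
    by_owner = Counter()
    for (lang, _text), members in groups.items():
        members.sort()  # lowest id first
        for _sentence_id, owner in members[1:]:
            by_language[lang] += 1
            by_owner[owner] += 1
    return sum(by_language.values()), by_language, by_owner


total, by_language, by_owner = duplicate_stats(SENTENCES)
print(f"{total:,} exact duplicates")
for lang, count in by_language.most_common(10):
    print(f"{count:,} = {lang}")
```

    Grouping on the exact text means that sentences differing only in spacing or punctuation are not counted, which matches the "exact duplicates" wording above.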
    • Thanks for the insightful stats.
    • Do you have such statistics for every user?
    • I gained notoriety :D
  2. Ok, so here I am with another question.

    Suppose I am translating the sentence "How are you?" (let it be sentence E1) into Bengali. There would be three possible translations:
    B1 - আপনি কেমন আছেন? (Polite)
    B2 - তুমি কেমন আছো? (Familiar)
    B3 - তুই কেমন আছিস? (Very Familiar)

    Now, suppose E1 has also been translated to Spanish, where the possible translations would be (if I am not mistaken):
    S1 - ¿Cómo está usted?
    S2 - ¿Cómo estás (tú)?

    So, B1, B2 and B3 as well as S1 and S2 are directly linked to E1.

    Now, suppose someone wants to see the translation of S1 in Bengali. Here is where the problem arises: all the above-mentioned Bengali sentences, i.e. B1, B2 and B3, would be shown as translations, whereas only B1 would be the proper translation of S1.

    So my question is, how could this problem be solved? Since I know a bit of Spanish, I could link S1 to B1, but what about other languages which have different pronouns for different levels of politeness?

    P.S.- The example above has been taken from the "How to be a good contributor" post on the Tatoeba blog.

    --- Tanay
    • If the sentences exist and have already been linked correctly, B1, B2, and B3 will all be shown as linked to S1, but only B1 will be shown as directly linked to S1 (with a green arrow), while the other two will be shown as indirectly linked to S1 (with a gray arrow). We tell users that they should only trust direct links.
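
      To make the direct/indirect distinction concrete, here is a small sketch that uses the labels from the question. It assumes links are stored as undirected pairs and that "indirect" means exactly two hops away; the names `DIRECT_LINKS`, `build_graph` and `translations` are invented for the example and do not describe Tatoeba's actual code.

```python
# Hypothetical direct links between the sentence labels used above.
DIRECT_LINKS = [
    ("E1", "B1"), ("E1", "B2"), ("E1", "B3"),
    ("E1", "S1"), ("E1", "S2"),
    ("S1", "B1"),  # added by someone who knows both Spanish and Bengali
]


def build_graph(links):
    """Turn a list of undirected links into an adjacency mapping."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def translations(graph, sentence):
    """Return (direct, indirect) neighbours of a sentence.

    Direct neighbours are one link away; indirect ones are reachable
    only through a direct neighbour (a translation of a translation).
    """
    direct = set(graph.get(sentence, set()))
    indirect = set()
    for neighbour in direct:
        indirect |= graph.get(neighbour, set())
    indirect -= direct | {sentence}
    return direct, indirect


graph = build_graph(DIRECT_LINKS)
direct, indirect = translations(graph, "S1")
print(sorted(direct))    # ['B1', 'E1']
print(sorted(indirect))  # ['B2', 'B3', 'S2']
```

      B2 and B3 still show up for S1, but only in the indirect set, which is the part that should not be trusted.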

      • But I need to link S1 to B1; only then will there be a direct link, right? But what about, say, Italian, which I think has two forms of "you"? I don't know Italian, so I won't be able to link B1 to any of the Italian sentences. So an Italian sentence in the formal form, linked directly to E1, would show B1, B2 and B3 as its translations, albeit indirect ones.
        • Actually, Italian has three forms of "you": singular informal, singular polite, and plural. There is also a special plural polite form, which is rarely used now, as far as I know.

          Yes, it's a kind of problem. For example, Russian and Italian both have masculine and feminine forms of adjectives, but English doesn't. So at first, when I saw an Italian feminine form shown as an indirect translation of a Russian masculine form (both linked to the same English sentence), I considered it a "mistake". It takes some time to understand how it works. :)
          • Thanks. That's what I was talking about: you need to know both Italian and Russian to even know that there is a "mistake".
    • Something that needs to be made clearer is that, by policy, only directly linked translations are to be treated as 'official' translations. Indirectly linked translations should be trusted about as much as machine translations, possibly even less. Therefore, if B1, B2 and B3 show up, none of these is to be considered a translation of S1 or S2 until a user fluent in both languages constructs links between them.

      As for your question about Italian, it must also be kept in mind that it is sound policy to only translate sentences or link them if you are absolutely sure of their meaning. In your hypothetical case, it may be best to leave the sentences be until a user fluent in Bengali and Italian can link sentences as appropriate.
      • ধন্যবাদ। (Thank you.)
  3. Áldott Húsvéti Ünnepeket kívánok mindenkinek! (I wish everyone a blessed Easter!)
  4. Is it difficult to implement auto-search of duplicates before adding a new sentence?
    Stack Overflow has a feature like this: before your question is posted, existing questions are searched for similar content, and any matches are displayed to the user.

    On Tatoeba, a list of duplicates could be shown to the user, with the option to link an existing sentence instead of adding the new one (s)he has just written.
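
    The idea could be sketched roughly as follows. This is only an in-memory stand-in: the `Corpus` class, its methods and the sample sentences are hypothetical, and it reuses an existing sentence automatically, where a real interface would presumably show the match and ask first.

```python
class Corpus:
    """Tiny in-memory stand-in for a sentence store (hypothetical)."""

    def __init__(self):
        self._next_id = 1
        self._by_key = {}   # (lang, text) -> sentence id
        self.links = set()  # each link is a frozenset of two sentence ids

    def find_exact(self, lang, text):
        """Return the id of an identical sentence, or None."""
        return self._by_key.get((lang, text.strip()))

    def add(self, lang, text):
        """Store a new sentence and return its id."""
        sentence_id = self._next_id
        self._next_id += 1
        self._by_key[(lang, text.strip())] = sentence_id
        return sentence_id

    def add_translation(self, source_id, lang, text):
        """Link to an existing identical sentence instead of duplicating it.

        Returns (target_id, reused), where reused is True if no new
        sentence had to be created.
        """
        existing = self.find_exact(lang, text)
        target_id = existing if existing is not None else self.add(lang, text)
        self.links.add(frozenset((source_id, target_id)))
        return target_id, existing is not None


corpus = Corpus()
english = corpus.add("eng", "How are you?")
spanish = corpus.add("spa", "¿Cómo estás?")
first, reused_first = corpus.add_translation(english, "ita", "Come stai?")
second, reused_second = corpus.add_translation(spanish, "ita", "Come stai?")
print(first == second, reused_first, reused_second)
```

    The second call finds the Italian sentence that already exists and only adds a link, so no duplicate is created.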

    Sorry if this question has been asked here before; I don't know how to search the Wall.
    • There is no real search for the Wall, and quite a lot of Wall messages disappeared in the past due to a server crash.

      Auto-search has been suggested several times before, but a proper auto-search (one that finds not just exact duplicates but also near-duplicates) is likely to require better hardware.
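
      For a sense of what near-duplicate matching involves, here is a minimal sketch using Python's standard difflib module. The function name `near_duplicates`, the sample sentences and the 0.85 cutoff are all assumptions chosen for illustration; nothing here reflects how Tatoeba would actually implement it.

```python
import difflib

# Hypothetical sentences already in the corpus.
EXISTING = ["How are you?", "How old are you?", "Where are you?", "I am fine."]


def near_duplicates(candidate, existing, cutoff=0.85):
    """Return up to five existing sentences that closely resemble candidate.

    difflib scores each pair with a similarity ratio between 0 and 1;
    only sentences scoring at least `cutoff` are returned, best first.
    """
    return difflib.get_close_matches(candidate, existing, n=5, cutoff=cutoff)


matches = near_duplicates("How are you ?", EXISTING)
print(matches)
```

      This compares the new sentence against every existing one, so the cost grows with the size of the corpus, which fits the concern about hardware.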

      Also, Tatoeba does not have enough programmers. If you are a programmer and want to help implement this feature, please contact AlanF_US (see also http://tatoeba.org/eng/wall/sho...#message_18575 and http://tatoeba.org/eng/wall/sho...#message_18631 ).
    • duplicates will eventually get merged, and typing a sentence for search or for insert takes the same time anyway...
  5. Is it just me, or is the "auto-detect" function not working?
      • CK
      • 3 day(s) ago
      It hasn't been working correctly for a while.
      This happens from time to time.

      When "auto detect" isn't working, if you manually choose your language on the next contribution, then a cookie gets set and you won't have to manually set it again.
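
      The workaround can be pictured with a short sketch. The cookie name `contribute_lang`, the function `choose_language` and the order of the checks are assumptions made for the example, not a description of the site's code.

```python
COOKIE_NAME = "contribute_lang"  # hypothetical cookie name


def choose_language(manual_choice, cookies, detect):
    """Pick the language for a new contribution.

    A manual choice wins and is remembered in the cookie; otherwise a
    previously stored choice is reused; auto-detection is the last resort.
    """
    if manual_choice:
        cookies[COOKIE_NAME] = manual_choice
        return manual_choice
    if COOKIE_NAME in cookies:
        return cookies[COOKIE_NAME]
    return detect()


cookies = {}
print(choose_language("ben", cookies, lambda: None))  # manual choice, stored
print(choose_language(None, cookies, lambda: None))   # reused from the cookie
```

      After one manual choice, later contributions reuse the stored language even while detection returns nothing.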
    • tanay
    • 6 day(s) ago - edited 6 day(s) ago
    Hello Tatoebans, how is everyone doing?
    I have a question regarding translating. I don't know whether a similar question has been answered before, so here I go.

    I am currently translating sentences to Bengali. I am a bit confused about sentences containing different grammatical persons because of the fact that Bengali verbs are inflected for person and honour.

    So, while translating a sentence like this:
    I think he will be glad to see you. (http://tatoeba.org/eng/sentences/show/71169)
    the equivalent Bengali translations would be:
    আমার মনে হয় সে তোমাকে দেখে খুশি হবে।
    আমার মনে হয় তিনি তোমাকে দেখে খুশি হবেন।
    আমার মনে হয় ও তোমাকে দেখে খুশি হবে।
    আমার মনে হয় উনি তোমাকে দেখে খুশি হবেন।

    This gets even more complicated for some other sentences.

    So my question is: do I add all the possible translations (which could be up to 6 sentences in Bengali)?

    Thank you in advance.

    --- Tanay
    • Yes, you can add as many versions of a translation as you want.
    • Absolutely! You should add as many translations as you can. I've seen some English sentences with a dozen (or two) translations in Italian or French.
    • yes, you can add every translation you want if you see a sentence that can be said in different ways in another language. i'm doing this with italian. every time i see a sentence that can be said in different ways in italian, i add all the possible versions i come up with
    • Thanks a lot for your suggestions. :)
    • OK, so here is a more complicated translation. This English sentence (http://tatoeba.org/eng/sentences/show/15832) could be translated in 8 different ways in Bengali, each differing from the others by only one or two words! I have added all the translations, so please have a look and tell me whether that is a good thing to do. I am still confused about what to do. :(
      • That is the right thing to do.
        Tatoeba's graph representation is useful for modelling subtle relations both between and within languages.

        For example, you could also link to each other all the Bengali sentences you might use in the same context (formality, gender, number, etc.), and do the same across more languages.

        The rule of thumb is: "Would this be the right thing to say in this precise context, with this meaning?" By that measure, word-for-word correspondence matters much less.
      • yes, that's the best thing to do, in my opinion. i find it quite useful to add the possible ways to say a single sentence, if it might change according to the context
      • Thanks again for your replies.
      • I absolutely know that sheepish feeling when you add several translations and wonder if maybe you're doing something wrong. But don't worry, you can link a million translations if the translations themselves are correct.
  6. Has Tatoeba been affected by the Heartbleed bug, and should we change our passwords?

    This sentence I added: http://tatoeba.org/ita/sentences/show/3173766
    was manually deleted by another CM just because it was an exact duplicate of an existing sentence. This should stop, since there is already a script that does this automatically, even if it isn't run very often. Please stop manually deleting exact duplicates: it is already an automated job, and the fact that the script is run infrequently is no reason to tell other users not to add certain sentences.
    • Why do you care whether another user deletes it or an automatic script deletes it? What's the difference to you?
      • The difference between manual and automatic duplicate merging is that an automatic procedure, if written correctly, can preserve the comments and history of all instances of the sentence. It can also preserve audio, if it exists, for one of the sentences. (If and when we allow multiple audio files for a single sentence in the future, it will preserve all the audio.)
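
        As a rough sketch of what such a merge could look like (not the code being written for Tatoeba), assume each sentence carries its links, comments, history and audio, and that the lowest-numbered sentence in a group of duplicates is the one kept. The `Sentence` class and `merge_duplicates` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    """Hypothetical sentence record with the data a merge should preserve."""
    sentence_id: int
    lang: str
    text: str
    links: set = field(default_factory=set)      # ids of linked sentences
    comments: list = field(default_factory=list)
    history: list = field(default_factory=list)
    audio: list = field(default_factory=list)


def merge_duplicates(duplicates):
    """Merge exact duplicates into the lowest-numbered sentence.

    Links, comments and history from every copy are carried over. Only one
    audio file per sentence is assumed, so the first one found is kept.
    Returns the surviving sentence and the ids that can be removed.
    """
    ordered = sorted(duplicates, key=lambda s: s.sentence_id)
    keeper, rest = ordered[0], ordered[1:]
    removed_ids = {s.sentence_id for s in rest}
    for dup in rest:
        keeper.links |= dup.links
        keeper.comments.extend(dup.comments)
        keeper.history.extend(dup.history)
        if not keeper.audio and dup.audio:
            keeper.audio = dup.audio[:1]
    # Drop links that point at the merged copies or at the keeper itself.
    keeper.links -= removed_ids | {keeper.sentence_id}
    return keeper, removed_ids


a = Sentence(10, "ita", "Come stai?", links={1}, comments=["first"])
b = Sentence(42, "ita", "Come stai?", links={2, 10},
             comments=["typo?"], audio=["b.mp3"])
keeper, removed = merge_duplicates([b, a])
print(keeper.sentence_id, sorted(keeper.links), keeper.comments, keeper.audio)
```

        A real merge would also have to re-point links on other sentences that refer to the removed ids; that step is left out here. Manual deletion skips all of this and simply discards whatever was attached to the deleted copy.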

        The problem is that we don't have a good automatic duplicate merging mechanism right now. The script that was used in the past was run infrequently (perhaps only several times a year). In part, this was because it was a very slow procedure with many restrictions on when it could be run. It also lost audio links. We're in the process of writing new code. I consider this our second highest near-term priority, after restoring access to the wiki and our help/FAQ pages.

        In the meantime, there are two "camps": those who believe that contributors should look for existing sentences before adding possible duplicates, and those who believe that this is too burdensome and thus should be a task left to the software, whenever it is ready. I'm not going to comment on the arguments made on both sides, other than to say they both have some validity in my eyes.

        However, there is much that can be done to maintain peace in the time before the duplicate merging code is written. It's a good idea for everyone to look for possible duplicates (especially when contributing sentences consisting of common words). On the other hand, I don't see much use in corpus maintainers deleting duplicate sentences. When we have such a high number of duplicate sentences (~25,000 for English is the figure I've seen), any manual deletion will have no real impact. Nor does it seem to prevent people from contributing duplicate sentences. It does, however, succeed in annoying people and leading to arguments, both of which are counterproductive.

        Thus I urge everyone to be patient until the duplicate merging code is complete and in use, and to refrain both from contributing duplicate sentences and from deleting the ones that exist.
      • >Why do you care whether another user deletes it or an automatic script deletes it? What's the difference to you?

        The difference is that if native speakers spend their time searching for exact duplicates just to delete them, they won't have time to find mistakes in sentences and post corrections in comments, which, in my opinion, is more urgent than manually removing duplicates.
        • I agree with you, but every user has the right to decide what he/she prefers to do. If someone doesn't like posting corrections in comments, we can't force him, can we?

          I also agree that the quality management system on Tatoeba.org can be improved. I think every sentence, regardless of whether it was created by a native or a non-native speaker, should be checked, corrected if necessary, and tagged OK. I also think it would be better if we could see who tagged it. Then, if we trust that user, we can trust all the sentences he/she tagged, regardless of who created them.

          I also suggest creating a list of users who are willing to check sentences, so that users who are not sure about the quality of their contributions can ask them.
      • CK
      • 4 day(s) ago - edited 4 day(s) ago
      If it's an exact duplicate and there are no sentences yet linked to it, I think it's a good idea to delete it.

      If left there, then it potentially wastes other members' time re-translating something that has already been translated.

      My policy, unless TRANG tells me not to do it, is to delete such sentences.
      If a duplicate has been translated, I don't delete duplicates.
      • but then, we could even miss new translations by doing that. plus, users might actually learn something with those sentences, so it's far from being a waste of time if they translate them, even if they produce duplicates
          • CK
          • 4 day(s) ago - edited 4 day(s) ago
          So, should I re-contribute all my contributions hoping for new translations?

          I don't think so.
          • well, i don't think it's such a problem if users sometimes add some duplicates by chance, so i don't see the need to rush to delete them because the deduplication script isn't used that often. they'll eventually be merged, so if any user decides to add new translations to them, they'll be added to just one sentence after the merging
              • CK
              • 4 day(s) ago
              I don't "rush" to delete such sentences, but when I see them, I do it as a service to the Tatoeba community.

              I don't plan to continue this conversation.
              I think most people can understand my position on this.

    • Congratulations to all the new advanced contributors!
        • CK
        • 5 day(s) ago - edited 5 day(s) ago
        For those of you who may not know, ...

        This change in status isn't really a "promotion" as a reward for good service.

        This status change means ...

        1. The member can now add tags.
        2. The member can add links between existing sentences.

        I think any member who has contributed a lot responsibly, and feels that either of these is something they would like to do to help the corpus, should request this status.
        (Perhaps, "a lot" would be over 1,000 sentences.)

    • Great news.
      They are all important contributors.
    • Hey, Alex, good to see you again!

      Just to be clear, these are requests for status changes, not status changes themselves. Contributors should write private messages to Alex if they have any comments regarding these candidates.