Wall (5310 threads)

JeanM
9 days ago
Are there any guidelines regarding the translation of given names? I found some sentences where e.g. Luke in English was translated to Luc in French. It seemed a bit odd to me.
Ricardo14
9 days ago
I don't think so. Mary is translated into Maria in German.
JeanM
9 days ago
The only negative side I can see is that, when speaking, I wouldn't normally translate given names depending on the language I'm speaking. So, for instance, I wouldn't refer to you as Richard in English. And to use your example, the one German Maria I know still goes by "Maria" in English-speaking countries.

The positive sides are that it's a good way of introducing variety in the sentences, and of teaching names in other languages. I can also see names being translated in the context of example sentences given in language textbooks, and I have also read some translations of novels which also translated names.
Ricardo14
9 days ago
> I wouldn't normally translate given names depending on the language I'm speaking.

Yes, agreed. I just mentioned what people who work on the German corpus do.
JeanM
9 days ago
Ah I see, then it was probably one of their sentences which I saw. Is the idea that the corpus maintainer for a given language sets the guidelines for that language?
Ricardo14
9 days ago
Actually not. There are debates to define which guidelines we're going to follow, like

- adding space for question and exclamation marks in French
- which "symbols" we're going to use to create dialogue

>> https://en.wiki.tatoeba.org/articles/show/main

How to Write Dialogues

>> https://en.wiki.tatoeba.org/art...riting-dialogs

Thanuir
8 days ago
I sometimes use the foreign names as is, and sometimes translate them - Tom as Tomi or Tommi, usually.

One can think in terms of translating the original sentence, in which case the name usually would not be translated (though it might be modified a bit to make it pronounceable in the new language; when speaking of Bjørn in Finnish, we often say Bjørni, which is much easier to say).

Or one can think in terms of what would be the equivalent sentence in the other language. In this case, using a native name is better, if the original name is native to the sentence that one is translating.

Native names are also much easier to pronounce, should someone want to add audio.

I believe the thing to do here is to use either the names as is, or to replace them with a native name, and in any case not worry about it. The contributions will be useful to Tatoeba in both cases.
ecorralest101
10 days ago
Hello,

Sentence #7694625 please change flag to French.
AlanF_US
9 days ago
Someone did.
ecorralest101
10 days ago
Hello,

I was working on Transifex and I noticed that some English language names, such as Guadeloupean Creole and Low German, don't show the option to be translated into Spanish.

Thanks.
ecorralest101
2019-03-17 21:56
Hello dears,

I want to know if there are any Bambara speakers here on Tatoeba that might translate into this language.
TRANG
26 days ago
You can check from "Community" > "Languages of members" who has added Bambara in their profile:

https://tatoeba.org/eng/users/for_language/bam

Apparently there is only one person, and they have not contributed anything. But you could try to contact them via private message, they might respond to your call.
ecorralest101
10 days ago
Thanks for ur message.
sacredceltic
11 days ago
*** making direct / indirect translations visually distinct ***

Most new users still confuse direct and indirect translations, despite the different arrow colors (blue or gray).
This causes serious misunderstandings about how the site works and generates unwelcome, sometimes aggressive, comments...
I propose an improvement to make the indirection more visible:
In addition to the arrow colors, couldn't we INDENT the display of indirect translations?
AlanF_US
11 days ago
For those who can read English but not French, sacredceltic is saying that most new users are still confusing direct and indirect translations, despite the different colors of arrows [my note: and text] that are used for each. He suggests, in addition, indenting the indirect translations.

I think that his suggestion would help as long as there were a visual element to separate the block of indirect translations from the block of direct translations. Otherwise, it would look as though there were some special relationship between the indirect translations and the last direct translation in the list (that is, the one at the bottom).

It's worth noting that if you go to the settings, the final one, under "Experimental options", is "Display sentences with the new design. Note that you will not have all the features from the old design." If you select this, you will see the captions "Translations" and "Translations of translations" in separate subsections. This is what users who are not logged in see, but it is not the default setting for new contributors.

This leads me to bring up several things that should be improved.
(1) If we think it's useful for new contributors to see a visual distinction between direct and indirect translations, we should do that by default for them.
(2) The differences between the "new" and "old" design are not documented anywhere, so they can only be determined by experimentation. This is at best a waste of time for users, and undoubtedly scares them away from trying the "new" design.
(3) "New" and "old" are both vague and inaccurate, given that this feature has been around for several years.
(4) Features should not be sitting indefinitely in the "Experimental" section. The first three items in that section seem pretty straightforward and should be well-tested by now, so I suggest moving them into the main section. As for the "new" vs. "old" design, we should figure out what we really want to do: Fix the missing functionality in the new design? Incorporate the new elements into the design that everyone sees? Offer a choice between these options permanently? Some combination of these?
AlanF_US
11 days ago
A specific question for @TRANG and/or @gillux: Can you list which features are not supported by the "new" design, or is this something that needs to be figured out by experimentation? Please let me know so that I can add documentation for this.
TRANG
11 days ago
The new design only shows the sentence and translations. Every button on the old design is not yet implemented in the new design. More specifically:

- adding a translation
- editing the sentence
- marking as favorite
- adding to a list
- linking to another sentence
- deleting the sentence
- adding a rating
- listening to audio
gillux
14 days ago - 14 days ago
I’ve been playing around with our default search ranking algorithm. I insist on the "default" part because that’s what the vast majority of visitors use. I also focus on searches that do not use double quotes or any special trick. Just plain words. Again because that’s what the vast majority of visitors use.

Our current way of ranking results is pretty basic: it searches for sentences that include all the words (possibly stemmed) and sorts them by the total number of words in the sentence.

A problem with this approach is that the order of the words is ignored. The top result of searching for "you go there" is "There you go!" because it’s a shorter sentence than "You may go there."

Ignoring word order is especially catastrophic on languages without word boundaries, like Chinese, because the searched characters are randomly reordered into something totally unrelated. For example, the results for "可不可" in Chinese are cluttered by irrelevant "不可something". Same for kana words in Japanese.

In order to address this problem, I tentatively tweaked the default ranking algorithm on https://dev.tatoeba.org/ into something that prioritizes, in the following order:

1. sentences that contain an exact match (as if searching for ="you go there")
2. sentences having the "longest common subsequence" (LCS, [1])
3. sentences having the fewest words

[1] https://docs.manticoresearch.co...anking-factors
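
The three priorities above can be sketched as a sort key. This is only an illustration, not the code running on dev.tatoeba.org: the function names are made up, tokenization is a plain lowercase word split, and the LCS factor is approximated here as the longest run of consecutive query words that appears in the same order in the sentence, which is an assumption about how the engine computes it.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words (no stemming)."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(words, query):
    """True if the query words appear consecutively, in order, in words."""
    n = len(query)
    return any(words[i:i + n] == query for i in range(len(words) - n + 1))


def longest_common_run(words, query):
    """Length of the longest run of consecutive words shared with the query."""
    best = 0
    prev = [0] * (len(query) + 1)  # run lengths ending at the previous word
    for w in words:
        cur = [0] * (len(query) + 1)
        for j, q in enumerate(query, start=1):
            if w == q:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def rank(sentences, query_text):
    """Sort by: exact phrase first, then longest common run, then fewest words."""
    query = tokenize(query_text)

    def key(sentence):
        words = tokenize(sentence)
        return (
            not contains_phrase(words, query),   # False sorts before True
            -longest_common_run(words, query),   # longer runs first
            len(words),                          # shorter sentences first
        )

    return sorted(sentences, key=key)
```

Calling `rank(["There you go!", "You may go there.", "Can you go there tomorrow?"], "you go there")` puts the sentence containing the exact phrase first even though it is the longest of the three. The sketch does not filter out sentences that lack some of the query words, which the real search does before ranking.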

However, I don’t know if this new ranking suits everyone out there. What do you think?

You can compare the search results on https://tatoeba.org/ (old ranking) and https://dev.tatoeba.org/ (new ranking). You can run a search on tatoeba.org, and then add "dev." in the URL bar and press alt+return to open a new tab.
AlanF_US
13 days ago - 13 days ago
I do prefer a ranking that favors exact matches over stemmed matches. Longest common subsequence also sounds good. But sentences having the least number of words are often not the ones I want to see most. I prefer slightly longer ones that give me more context. For that reason, I always choose random ordering. It doesn't always put the sentences that I want at the very top, but at least I have a good chance of finding them without having to go through pages and pages of very short sentences. Also, providing a mix of sentences that is random with regard to sentence length lets people see more diversity. I think that's a good thing.
CK
13 days ago - 13 days ago
Maybe you wouldn't want this for the default search, but I wonder if it would be possible to add a "minimum word" option to the advanced search. This might prove useful. For example, members could still opt to sort by length, but start by showing sentences that are over a certain length up to 1,000 results.

More involved, perhaps, but an additional idea might be to have a "maximum length" option, too. This would allow members to have search results displayed randomly between 5 and 12 words in length, for example.
AlanF_US
13 days ago
Yes, I can imagine that a "minimum length" and a "maximum length" option would be useful. However, the nice thing about favoring sentences that meet a certain criterion rather than limiting them to that criterion is that if there are not enough sentences that meet it, you will automatically see the other ones without having to remove the criterion and do another search. I imagine that if I set a minimum and/or maximum length, I would often eliminate some of the fallback sentences I'd like to see, and then I'd have to do a follow-up search.

Indeed, I wouldn't want the default search to be optimized for my particular needs, which would be something like "favor sentences from five to ten words in length". However, I worry that optimizing it for choosing the shortest sentences would pessimize it for people like me, whereas leaving out a criterion of length would allow people to see a variety of sentences, short and long.
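
The difference between favoring a length range and limiting results to it can be sketched as follows. The function names and the default range of five to ten words are hypothetical, chosen only to mirror the example above; nothing here reflects an existing Tatoeba search option.

```python
def length_penalty(sentence, lo=5, hi=10):
    """How many words the sentence falls outside the preferred range (0 if inside)."""
    n = len(sentence.split())
    if n < lo:
        return lo - n
    if n > hi:
        return n - hi
    return 0


def favor_length(sentences, lo=5, hi=10):
    """Keep every sentence, but list those in the preferred range first."""
    return sorted(sentences, key=lambda s: length_penalty(s, lo, hi))


def limit_length(sentences, lo=5, hi=10):
    """Drop every sentence outside the range; there are no fallback results."""
    return [s for s in sentences if length_penalty(s, lo, hi) == 0]
```

With `favor_length`, sentences outside the range still appear after the preferred ones, so a sparse result set does not require a second search; with `limit_length`, they are gone.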
Thanuir
12 days ago
If I could choose and there were no computational costs:

1. An exact match of the sentence.
2. Sentences with exact match of the query as part of it.
3. Sentences with all the exact words, but possibly in different order or with other words between them.
4. Sentences with all the words, but with stemming and the order might be different etc.
5. Sentences with all but one of the words (with stemming and could be in any order).
6. Sentences with all but two of the words (with stemming and could be in any order).
7. And so on.
8. Sentences with even a single searched word (with stemming).

Random order within the categories. (Some of the categories could be sorted into even finer subcategories, but probably not worth it.)

For example search: haluan kalastaa tänään
1. Haluan kalastaa tänään.
2. Minä haluan kalastaa tänään, niin kuin eilenkin.
3. Tänään minä haluan kalastaa. Haluan kuitenkin kalastaa tänään.
4. Haluankin kalastaa tänään. Haluatteko te tänään tai huomenna kalastamaan?
5. Haluatteko elokuviin tänään?
6-8. Karhut kalastavat lohia.

The idea would be to first have the precise phrase and then to have increasingly distantly related phrases, which hopefully would still give some understanding of the involved words.
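
A rough sketch of these tiers, ignoring computational cost as stated above. The names are made up for illustration, and `crude_stem` (keeping the first four letters) is only a stand-in for a real stemmer so that the Finnish example works; it is not how the search engine stems words.

```python
import random
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: keep the first four letters."""
    return word[:4]


def contains_phrase(words, query):
    """True if the query words appear consecutively, in order, in words."""
    n = len(query)
    return any(words[i:i + n] == query for i in range(len(words) - n + 1))


def tier(sentence, query_text, stem=crude_stem):
    """Return the category number from the list above (lower is better)."""
    words, query = tokenize(sentence), tokenize(query_text)
    if words == query:
        return 1                      # exact match of the sentence
    if contains_phrase(words, query):
        return 2                      # exact query as part of the sentence
    if all(q in words for q in query):
        return 3                      # all exact words, any order
    stems = {stem(w) for w in words}
    missing = sum(1 for q in query if stem(q) not in stems)
    return 4 + missing                # stemmed match with 0, 1, 2... words missing


def rank(sentences, query_text, rng=random):
    """Order by tier, randomly within a tier; drop sentences matching no word."""
    query_size = len(tokenize(query_text))
    scored = [(tier(s, query_text), s) for s in sentences]
    kept = [(t, s) for t, s in scored if t < 4 + query_size]
    rng.shuffle(kept)                 # random order, then a stable sort by tier
    return [s for t, s in sorted(kept, key=lambda pair: pair[0])]
```

For the query "haluan kalastaa tänään", the example sentences above land in tiers 1 to 6 in the order listed, with "Karhut kalastavat lohia." in tier 6 because two of the three words are missing.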
gillux
14 days ago
We upgraded our search engine to the latest version of Manticore. Manticore is a fork of Sphinx. You shouldn’t notice anything new because the search functionality remains the same. It just improves performance a little bit and paves the way for future improvements.

That said, while we were at it, we added stemming support for four additional languages:

• Danish
• Hungarian
• Romanian
• Norwegian (Bokmål)

Have a look at this page if you wonder what stemming is about: https://en.wiki.tatoeba.org/art...h#more-details
sabretou
14 days ago
It might be placebo, but I feel the search is faster now! Thanks!
TRANG
12 days ago
Additionally gillux implemented some changes to make the selection of the random sentence faster, which I just deployed.

All of this should lead to the website being faster in general. We're not yet reaching lightning speed but we're on a good track, I think :)
seveleu_dubrovnik
12 days ago - 12 days ago
Thanks for accepting me as a UI translator!
TRANG
12 days ago
Thanks for joining us :)
CK
13 days ago - 13 days ago
** We now have 40,620 Spanish-English Audio Pairs **

Using this link, you can get a random selection of 1,000 of them.

https://tatoeba.org/sentences/s...andom&from=spa

Here are the remaining 9 of the "Top 10."

17,026 German-English Audio Pairs
https://tatoeba.org/sentences/s...andom&from=deu

12,509 Kabyle-English Audio Pairs
https://tatoeba.org/sentences/s...andom&from=kab

9,520 Portuguese-English Audio Pairs
https://tatoeba.org/sentences/s...andom&from=por

6,573 French-English Audio Pairs
https://tatoeba.org/sentences/s...andom&from=fra

4,449 Berber-English Audio Pairs
https://tatoeba.org/sentences/s...andom&from=ber

3,810 Finnish-English Audio Pairs
https://tatoeba.org/sentences/s...andom&from=fin

3,460 Esperanto-English Audio Pairs
https://tatoeba.org/sentences/s...andom&from=epo

2,062 Russian-English Audio Pairs
https://tatoeba.org/sentences/s...andom&from=rus

Seael
13 days ago
Great! I contributed some Spanish audios long ago. I'd like to have some free time to go get a microphone and contribute some more! Hopefully some time soon.
sacredceltic
14 days ago
Hello.

I already reported here some time ago that automatic language identification seemed to be working less well than before.

Just now, I entered the sentence "La stupidité n'est pas une excuse" and it was identified as Italian.
That seems outlandish.
As I recall, the identification system set up by sysko relied on a statistical analysis of 3- or 4-letter sections of each language's sentences, which made it possible to compute a probability score.
I took part in evaluating that system, and it was very effective, at least for the languages for which Tatoeba had enough sentences to serve as a statistical basis.
Yet if you break my sentence down into 3- or 4-letter sections, nearly all of them are highly improbable in Italian:
"La stupidità non è una scusa"
So I'm afraid the system has been replaced by a much less effective one...
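
The kind of statistical scoring described above can be sketched with character trigrams. This is a toy illustration and not sysko's implementation: the function names, the add-one smoothing, and the tiny training samples are all assumptions made up for the example, and a real profile would be built from thousands of sentences per language.

```python
import math
from collections import Counter


def trigrams(text):
    """Split lowercased, space-padded text into overlapping 3-letter sections."""
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """Count trigram frequencies per language from example sentences."""
    return {
        lang: Counter(g for sentence in sentences for g in trigrams(sentence))
        for lang, sentences in samples.items()
    }


def detect(text, profiles):
    """Return the language whose profile gives the text the highest log-probability."""
    vocab = len(set().union(*profiles.values()))  # distinct trigrams, all languages

    def score(counts):
        total = sum(counts.values())
        # Add-one smoothing so unseen trigrams get a small, nonzero probability.
        return sum(
            math.log((counts[g] + 1) / (total + vocab)) for g in trigrams(text)
        )

    return max(profiles, key=lambda lang: score(profiles[lang]))


# Made-up toy training data, far too small for real use.
PROFILES = train({
    "fra": ["La vie est belle.", "Ce n'est pas une bonne idée.",
            "Je ne sais pas.", "C'est une excuse."],
    "ita": ["La vita è bella.", "Non è una buona idea.",
            "Non lo so.", "È una scusa."],
})
```

With these toy profiles, `detect("La stupidité n'est pas une excuse", PROFILES)` picks "fra" because sections such as "est", "pas" and "une" occur in the French samples but not in the Italian ones.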
gillux
14 days ago
I know it's frustrating, but rest assured that we haven't forgotten this problem; it's tracked on GitHub [1]. Thanks for reminding us, though!

Language detection still uses the same algorithm by sysko, so the problem must come from the database it relies on. I tried updating it, but that didn't fix the problem. I'll investigate soon and will let you know if I need your help with testing.

[1] https://github.com/Tatoeba/tatoeba2/issues/1731
sacredceltic
14 days ago
I'd be delighted to take part in any testing on this.
I find this topic fascinating, and I wonder whether other services on the Internet have this need. I think it's an emerging need. We should also consider using AI, "deep learning", in the same way that systems are taught to recognize faces.
I think it would be worth asking Google for assistance with this. I'm convinced they would help, because they must regard the Tatoeba project as interesting and promising, and so look on it favorably.
In the meantime, though, I found the approach sysko took very interesting. After all, probabilities are the basis of what "deep learning" relies on to say whether or not a photo shows a face.
Knowledge is rarely algorithmic. It is cumulative, rather.