
Wall (5266 threads)

CK
10 hours ago - 8 hours ago
We now have over 550,000 sentences with audio files.

https://tatoeba.org/eng/audio/index

Here are some comparisons with the top four languages with audio.


397,583 English
= more than the number of French sentences (7th ranked language)

64,456 Spanish
= more than the number of Chinese (Mandarin) sentences (20th ranked language)

21,187 Kabyle
= more than the number of Persian sentences (33rd ranked language)

18,962 German
= more than the number of Romanian sentences (34th ranked language)
shekitten
3 days ago
I'm not sure if this has been mentioned before, but would it be possible for sentences to provide multiple audio files in different regional accents?

For example, one audio file in Peninsular Spanish, another in Argentine Spanish, perhaps another in Mexican Spanish...

Or one in U.S. English and another in British English, and another in Australian English.

In cases where the sentence is natural in multiple dialects (which happens quite often), I think this would be beneficial to learners, even if it would obviously depend on the willingness of people with different regional accents to contribute audio.
Thanuir
2 days ago
If someone is active on Forvo, then it might also be fruitful to try developing cooperation with them. The website mostly contains pronunciations of words, but also some sentences. There are often several pronunciations and the region of the speaker is displayed.
CK
5 days ago - 5 days ago
Create a Dashboard of Customized Links for Tatoeba.org

http://goo.gl/RzP8hV

It's been updated to include a few new items.

If you are a regular user of this, you may need to force a reload to get the new version of the external .js file.

For people who haven't yet tried this, ...

1. This will load faster than tatoeba.org, since it is cached on your own computer and doesn't require a connection to the database.

- If you just need to get to the search or need to get some specific links, you can save time.

2. This is set up to give you several options for ways to find sentences to translate into your own language.

3. Once you have chosen the language you want to translate from and your native language, you can bookmark the resulting page, and access your bookmark. This means you don't have to set it up every time.

4. Members who like to translate from several languages can easily set up and bookmark several versions of this. The external .js file that gets cached on your computer is the same one.

Ricardo14
5 days ago
Thanks for that. However, it displays "Not secure" - http://prntscr.com/mz1exk . Is that a big problem?
CK
4 days ago
That just means it's http: and not https:
Since you're not submitting any data to that website, it doesn't matter.
There are a lot of websites using http: and not https.

CK
2 days ago
Preset Searches for Study on Tatoeba.org

http://bit.ly/searchesforstudy

I made this as a modified advanced search with a few preset options.
Perhaps some members may find this useful.

1. show only sentences by members who claim to be native speakers.
2. random sort

You can easily change a few other options.
CK
2 days ago
** New Spanish Voice **

terrafuego
https://tatoeba.org/eng/sentenc.../show/8796/und

If you, too, would like to add audio files in your native language, please read http://bit.ly/shtooka
Aiji
3 days ago
[BUG]

- Add two sentences, A and B.
- Add A to a list L.
- While A is being added (the waiting symbol is spinning), click on the "Add to a list" button for B => It looks like B was added to the list L (because L disappears from the dropdown), while the waiting symbol on A spins forever.
- Actually, A is correctly added to L (checked by opening L), and B can no longer be added except from the sentence page of B.

Expected behavior:
- Wherever I click during the addition of A to L, the dropdown menu of A, and its waiting symbol, should be the ones affected when the operation is over, not the one with the current focus.
sharptoothed
4 days ago
** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/
Guybrush88
4 days ago
thanks :)
Ricardo14
3 days ago
Thank you so much!
TRANG
6 days ago
**Translation of the Terms of Use (Conditions Générales d'Utilisation, CGU)**

We have updated our Terms of Use, and for now they are available only in French. If you understand French well enough and can help us translate them into other languages, I invite you to join us on Transifex:

https://www.transifex.com/tatoe...-terms-of-use/

This is the platform we use to manage translations of the website's interface. If you are not sure how to use it, feel free to ask your questions here on the Wall or to send me a private message.

Otherwise, you can refer to our wiki[1] or to the Transifex documentation[2].

Thank you!

---

[1] https://en.wiki.tatoeba.org/art...ce-translation
[2] https://docs.transifex.com/
JeanM
6 days ago
I tried to join the team for English, but I can't select "English".
TRANG
5 days ago
I think someone first had to accept your request to join the group. Can you try again now?
JeanM
4 days ago
It works, thanks!
ecorralest101
4 days ago
Hello dears,

I want to know if there are any Bambara speakers here on Tatoeba who might translate into this language.
CK
5 days ago
The 178 Languages with Registered Native Speakers

http://tatoeba.byethost3.com/19...-speakers.html
danepo
22 days ago - 21 days ago
Mozilla has publicly released its Common Voice speech dataset
https://www.heise.de/newsticker...i-4323042.html

https://voice.mozilla.org/en/datasets
shekitten
22 days ago
To everyone: Wouldn't names other than "Tom" be very useful for that project? It seems like users who make those sorts of sentences are unjustly hounded about it.
Ricardo14
20 days ago
I agree. Tom has been used too much. I know it has been adopted to avoid repetition, and it is not against Tatoeba rules, but I truly believe there are other ways to achieve such a thing.
shekitten
20 days ago
I'm not sure Tom has been used too much, but it would be nice, and useful to projects that use our data, if we had other names, especially non-Western ones. OsoHombre has used a lot of Arab names in his sentences, and ended up being told over and over to use the name "Tom", even after he had already given his reasons for not doing so. I think using other names couldn't possibly hurt and is actually very helpful to some use cases, like Mozilla Common Voice.
AlanF_US
20 days ago
I believe there was only one person in favor of restricting sentences to this small number of names, and Trang explicitly said that this was not a Tatoeba policy. However, that person is a very prolific contributor, and is not about to start adding sentences with other names. Basically, this comes down to the following: If you want to see sentences with proper nouns other than "Tom", "Mary", "Boston", and "October", add them yourself -- but don't be surprised if you continue to see many more sentences that only use those nouns.
PaulP
19 days ago
When I started learning Romanian, I used the Anki app based on the Tatoeba corpus. I remember that I got a series of sentences similar to "Tom is sleeping", "Mary is sleeping", "Fadil is sleeping", etc. It is very annoying to translate them all. In my opinion, it is quite OK to use names other than Tom, but before adding a sentence, please make sure that you don't create near-duplicates.
shekitten
19 days ago
Is there a standard program that generates these decks? If so, it would likely be possible to change this program in a way that doesn't add near-duplicates to the deck.
Aiji
19 days ago
As shekitten suggested, and as frustrating as it can be, the problem lies in the tool, not in the source.

Having near-duplicate sentences could be very useful to a tool trying to extract information from sentences, as a training set, for example.

I think it is harder for the maker of the second tool to create near-duplicates out of nothing than for the maker of the first to filter out near-duplicates that already exist.

As always, most use-case issues lie in poorly designed tools and not in the source (surely the source could be improved, but you get the point, I think).
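
As a rough illustration of how a deck-building tool could skip near-duplicates on its own side, here is a minimal Python sketch. It assumes that two sentences count as near-duplicates when they differ only in a name taken from a known list. The name list and the function names (`pattern_key`, `drop_near_duplicates`) are hypothetical and are not part of any existing Tatoeba or Anki tool.

```python
import re

# Hypothetical list of placeholder names; a real tool would need a larger one.
KNOWN_NAMES = {"Tom", "Mary", "John", "Alice", "Sami", "Layla", "Fadil", "Salima"}

# One pattern that matches any known name as a whole word.
_NAME_RE = re.compile(r"\b(" + "|".join(sorted(map(re.escape, KNOWN_NAMES))) + r")\b")


def pattern_key(sentence: str) -> str:
    """Replace every known name with a placeholder so that sentences
    differing only by name map to the same key."""
    return _NAME_RE.sub("<NAME>", sentence)


def drop_near_duplicates(sentences):
    """Keep the first sentence seen for each pattern, in input order."""
    seen = set()
    kept = []
    for sentence in sentences:
        key = pattern_key(sentence)
        if key not in seen:
            seen.add(key)
            kept.append(sentence)
    return kept


deck = drop_near_duplicates([
    "Tom is sleeping.",
    "Mary is sleeping.",
    "Fadil is sleeping.",
    "Tom is eating.",
])
print(deck)  # ['Tom is sleeping.', 'Tom is eating.']
```

The source corpus stays untouched with this approach; only the deck loses the repeated pattern. Languages that decline names would need more than a plain word match.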
Impersonator
19 days ago
> If you want to see sentences with proper
> nouns other than "Tom", "Mary", "Boston",
> and "October", add them yourself

This doesn't really work due to the native speaker policy. We're discouraged from adding sentences in languages that are likely to be translated, so the impact we can have is very limited. Native English speakers have the unfair advantage of forcing their names (and not just names) on others.

Tatoeba badly needs decolonisation.
Thanuir
18 days ago
I agree about decolonization.

Some constructive things to do:

1. Add sentences with varied names of people and places.
2. Add other sentences characteristic of less present cultures - food, clothes, politics, idioms, nature, etc.
3. When you see such sentences in other languages, translate them.
4. When you see someone discouraging others from adding sentences with varied names, for example, reply with a list of reasons why adding varied names is good and desirable.

Tatoeba is a volunteer effort, so the best way of creating change is to simply do it. It is a pity that certain cultures dominate the website, but I do not see a constructive way of preventing that. The Tom, Mary and Boston sentences do add value to the website, so they should not be discouraged, either.
shekitten
18 days ago - 18 days ago
I think there are constructive ways of preventing that. It's like something George Orwell said on a different topic (political language): "A man may take to drink because he feels himself to be a failure, and then fail all the more completely because he drinks."

The main way colonial attitudes perpetuate themselves is passive. I don't need to make any personal effort to continue cultural genocide against the Seneca; it's enough that I do nothing about the fact that the English language and English cultural attitudes are the supreme currency on their land. I don't need to steal the land of Indigenous Americans personally; it's enough that my great-grandparents came here and settled on their land after it was already stolen, and that I do nothing to remedy the material harm that has resulted from this theft.

Something we could do, apart from what you're suggesting, is to make it a violation of policy to discourage people from adding non-Western names, places, etc. Such discouragements are assertions of long-existing hierarchies, no matter how politely they are phrased and regardless of whether the user is conscious of this. I am sure there are other things we could do, but if we begin from the point that "nothing constructive can be done," it will be a self-fulfilling prophecy.
Thanuir
18 days ago - 18 days ago
My comment was about Tatoeba in particular, in case that was not clear.

I do not know enough about how policy on Tatoeba is settled to comment on that. I would not oppose your suggestion, if it was up to me.

Below is a comment, slightly edited, that I left on a sentence when the contributor was discouraged from adding a sentence with a non-standard name. Maybe you will find it useful when you see this kind of counterproductive behaviour:

...

1. I would not discourage people from contributing, even if they contribute sentences that fit a pattern. They will probably also contribute other things.

2. Writing names in different scripts is non-trivial and language-dependent. Consider the famous mathematician Tikhonov, Tihonov, Tychonoff, etc., and the philosopher Plato or Platon.

3. Declension of names is non-trivial in some languages. For example, in Finnish, Tomi-Tomin-Tomilla, Johannes-Johanneksen-Johanneksella. As such, having different names adds actual linguistic content. This tends to be especially true of foreign names.

4. It is most natural to write sentences that use names of the culture in question.

5. If there were default names, which culture would they be from? I would prefer Väinö and Aino, personally, as they are good and traditional names with ties to Finnish mythology. I am sure everyone else would have different favourites.

6. Different names suggest different genders in different languages. Kari is male in Finland and female in Norway, for example, by default.

7. It takes a lot of effort to police patterns. One can use the same effort to add new sentences and to translate sentences instead. Furthermore, this would be something one would have to teach most contributors one by one. Adding such requirements for contributing is not a good idea.

So: Several such sentences are not needed, but they also do not cause harm. It would take work and be highly impolite to police them. Unifying names would, in general, lead to loss of linguistically relevant content.
shekitten
18 days ago
Thanks, those are useful points. And especially #3 is a thing that English speakers can easily miss, and it really shows how useful and even necessary it is to give sentences with multiple names. If the primary common language used on Tatoeba were Russian instead of English, our default proper nouns would probably reflect the language's different declensions. If it were Turkish, our default proper nouns would probably reflect the different types of vowel harmony and consonant changes.
soliloquist
19 days ago
> OsoHombre has used a lot of Arab names in his sentences

Actually, the way OsoHombre builds up his corpus isn't much different from CK's. He has his own standard names, too.

Sami <-> Tom
Layla <-> Mary
Fadil <-> John
Salima <-> Alice
Cairo <-> Boston

I think users adding original sentences in large numbers tend to adopt this wildcard policy one way or another. It has its advantages. The question is, if this policy is useful for users individually, will extending it to all original sentences in a language bring more good than harm, or vice versa?

Btw, there's a phone-number search site generating thousands of spam pages by using the patterns of Tom sentences here with different names for SEO purposes. It's interesting to see how many derivations can be done just from a single pattern.

https://www.google.com/search?q...w=1277&bih=538
JeanM
7 days ago
I have seen the "do not insert annotations into sentences" policy here: https://en.wiki.tatoeba.org/art...into-sentences

While I am aware of that, I wonder if adding annotations *on top* of sentences (i.e. separately) would be a partial solution here. Below every sentence there could be an extra field, perhaps only displayed to advanced contributors, that allows marking proper nouns, and maybe even other things such as dates (in the simplest form, picture something like the "highlighter" feature of PDF annotation software). This would not have any of the drawbacks listed on the page linked above, as the annotation would be a completely optional separate field that's hidden by default.

The advantages would be that downstream users of the data (e.g. Memrise-style study deck apps, or translation software) could then attempt to replace proper nouns to add some variety to the data. This is obviously not as trivial as I make it sound, as one would have to contend with phenomena such as inflection, but it's certainly a starting point – and inflection could be dealt with downstream, or partially handled by more sophisticated annotation schemata (which could be used to mark gender, declension, etc.).
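
To make the idea a bit more concrete, here is a minimal Python sketch of what a separate annotation field and a downstream substitution could look like. Everything in it is an assumption made for illustration: the `Span` structure, the `proper_noun` label and the `replace_spans` helper do not exist on Tatoeba. The annotations are stored as character offsets next to the unmodified sentence text, spans are assumed not to overlap, and inflection is ignored entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # index of the first character of the marked text
    end: int     # index one past the last character of the marked text
    label: str   # hypothetical label, e.g. "proper_noun"


def replace_spans(text, spans, substitutes):
    """Return a copy of text in which each span whose marked text has an
    entry in substitutes is replaced. The stored sentence is not modified."""
    result = text
    # Work from right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        original = text[span.start:span.end]
        if original in substitutes:
            result = result[:span.start] + substitutes[original] + result[span.end:]
    return result


sentence = "Tom went to Boston."
annotations = [Span(0, 3, "proper_noun"), Span(12, 18, "proper_noun")]
print(replace_spans(sentence, annotations, {"Tom": "Aino", "Boston": "Cairo"}))
# Aino went to Cairo.
```

Because the annotation lives beside the sentence instead of inside it, a tool that ignores the extra field sees exactly the same text as today.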
Thanuir
6 days ago
Currently there are sentence-specific tags, but nothing word-specific, as far as I know.