Wall (5310 threads)

Thanuir
26 days ago
What to do with red sentences?

There are some sentences which are, as far as I know, completely valid, but they are in red and cannot be translated. (E.g. https://tatoeba.org/fin/sentences/show/3313641 , https://tatoeba.org/fin/sentences/show/415526 )

The sentences can still be linked. Adopting them does not change the status. I have not tested if editing them changes the red status.

What is up with these sentences and what, if anything, should one do with them? (I am guessing they are from frozen contributors, but I do not know if that is the sole criterion.)
TRANG
26 days ago
Currently, the only solution is to contact an admin so they "re-approve" those sentences.

It would definitely be useful (and it has been considered) to remove the red warning when the sentence gets adopted by a trusted user, but this has yet to be implemented.
Thanuir
24 days ago
A personal message informed me that some of the red sentences might be copyright violations.

1. If they are copyright violations, they should be removed, not marked red.

I am not sure they can be; if there exists a French teachers' organization, they might have good ideas about where the boundaries of copyright violations lie in these types of cases. I would be very surprised if a collection of standard and common phrases were a copyright violation, but I know only a little, and that only about Finnish law, not French law.

2. If they are of utterly awful quality, they could be removed en masse.

3. If they are salvageable or reasonable, I see no benefit to the red status. Maybe I am missing it. Automatic tagging with @needs native check , maybe, instead?
TRANG
23 days ago
If they were clear copyright violations, we would of course remove them. For instance if you're gonna copy-paste every sentence from the Harry Potter books, that would be an issue and we wouldn't keep the sentences.

In our case, the red sentences are in an unclear status. Many of them aren't really out of the ordinary and sound like something that anyone could come up with. Some of them are actually only a problem because they are not CC BY compatible (for instance, copied from CC BY-SA content) and would actually be fine once we implement this: https://github.com/Tatoeba/tatoeba2/issues/1659

As for other sentences that are not subject to copyright issues, most of the time when we mark them as red, they were not reviewed one by one. They were just the result of us figuring out that one user has a lot of problematic sentences, and no one has time to check each sentence one by one. So we mark them all red to reduce the risk that they get translated, because fixing a sentence that has been translated is often a mess. It's better to fix them first, then remove the red mark.
CK
23 days ago - 23 days ago
Even if both sentences of each bilingual pair are just common, ordinary, everyday sentences, it would be considered a copyright violation to copy-and-paste a series of those pairs from a copyrighted source.
Thanuir
23 days ago
That would not be obvious in the Finnish legal system. The original source would have to be unique enough to be considered a "teos", i.e. demonstrate unique artistic vision and be something another person would not have come up with.

This would be unlikely to be true of a short list of common greetings and their translations. A more unique translation would certainly qualify.

But the French copyright law is probably different.
TRANG
22 days ago
It is true that lists can be copyrighted, but it's not as straightforward when you take into account fair use. You'd have to consider how much of the list has been copied, and of course whether the list itself had some originality or not.

For instance let's say I have copied 20 basic sentences from a very large dataset of thousands and thousands of sentences. It would be impossible for the copyright owner of the dataset to claim a copyright infringement.

If anything, we should be able to keep very basic sentences within Tatoeba, even if they have been copied, as long as they are not presented within Tatoeba in a list that looks nearly identical to the source.

For instance, if I'm a new user and I've copied 20 basic sentences from a blog post, and those are my only sentences in Tatoeba, then there could be copyright infringement due to the sentences being listed on my sentences page. But if I unadopt all these sentences, and they become blended into the corpus, then there is almost no more risk of copyright infringement because I have effectively removed the list.

On the other hand let's assume someone wants to create a list of basic sentences in Tatoeba, and they came upon an interesting blog where the author compiled a list of good sentences to learn for beginners. They decide to re-create this list in Tatoeba. Coincidentally, all the sentences on that blog post already exist in Tatoeba. Well, the list that the user created in Tatoeba can be considered copyright infringement despite the fact that the sentences existed already. But we would just need to delete the list to solve the copyright issue, we wouldn't have to delete the sentences.

My point is that when dealing with copyright on lists, we have to look at the places where Tatoeba lists sentences ("My sentences" page, lists, tags, favorites). This is where the list copyright infringement can make sense and it is the list that we would need to delete, not the items in the list. For the items, we have to look at each of them individually and evaluate if on their own they have any originality or not. And if not, it is fine to keep them.
Thanuir
23 days ago
I can, with some confidence, say that "Vi ses!" is okay in Swedish.
TRANG
22 days ago
I'll remove the red mark on the sentences you mentioned.

I've also created an issue on this topic: https://github.com/Tatoeba/tatoeba2/issues/1847

Feel free to suggest other solutions if you have any better ideas.
Lyte3000
23 days ago
Is it just me, or does https://tatoeba.org/eng/tools/search_hanzi_kanji not work at all? Searching for 一日 and 一二三四五六七八九十口 just doesn’t work.
CK
24 days ago
** Trivia **

28% (just over 11,750) of our 42,000 registered members have contributed 1 or more sentences.

I wanted to know, so I checked, and then thought perhaps other members might like to know, too.
deniko
24 days ago
That's a surprisingly low figure. If you had asked me, before posting your stat, what percentage of members had contributed at least one sentence, I would have said 70-80%.

Of course I understand there are a lot of people who use Tatoeba without contributing sentences, that's perfectly normal, but why register a user if you're not planning to contribute?

For the sake of favorites? To leave a comment? Bots?
TRANG
23 days ago
This type of distribution is actually a common thing and is an example of the Pareto principle in action (or the 80-20 rule).

Quoting from Wikipedia[1]: "Pareto noticed that approximately 80% of Italy's land was owned by 20% of the population."

This phenomenon happens in various fields and in the case of Tatoeba this would translate into "80% of the sentences are contributed by 20% of the users".

I never invested time into figuring out what stops users from contributing (that would be very interesting research to do, though), but my guess is that, aside from spammers and bots, it's one of these reasons:
- website is not intuitive enough and people can't figure out how it works
- despite some initial good motivation, people ended up not being inspired enough to create sentences or not confident enough to translate
- or perhaps they tried to contribute but coincidentally, the sentences they wanted to add were already added, or they couldn't find anything easy enough to translate because all the easier sentences were already translated into their language
- procrastination: they register today then tell themselves they'll contribute tomorrow, but in the end never find the time to
- just curious: they just want to see what's available to them when registered
- by reflex: so many websites ask for registration nowadays, I wouldn't be surprised if some people just registered without really thinking
- to "reserve" their username: sometimes you stumble upon a website that you find kind of nice and you want to make sure your username is not taken by someone else (I happen to have done that myself on some websites when I was younger)

And there are probably hundreds of other reasons that my imagination cannot comprehend.

---

[1] https://en.wikipedia.org/wiki/Pareto_principle
soliloquist
24 days ago
Multiple times, I've seen new members say hi on the wall and never show up again. It's really an interesting phenomenon. Perhaps providing some guidance after registration (e.g. a multilingual and interactive welcome page with useful links about the languages they set on their profiles, or redirecting them to a tutorial video page) may increase contributions.
CK
24 days ago
** Native Speakers Not Listed as Native Speakers on Tatoeba.org **

http://tatoeba.byethost3.com/st...-speakers.html


This is a list of native speakers whose sentences will not show using the "Owned by a self-identified native" option in the advanced search.
sharptoothed
26 days ago
* Tatoeba Top 30 Languages Interactive Graphs*

Tatoeba Top 30 Languages Interactive Graphs have been updated:
https://tatoeba.j-langtools.com/igraph/
https://tatoeba.j-langtools.com/igraph/share.html
Ricardo14
25 days ago
Thank you!!
Guybrush88
25 days ago
thanks
Aiji
2019-03-19 14:08
[BUG]

- Add two sentences, A and B.
- Add A to a list L.
- While A is being added (the waiting symbol is spinning), click on the "Add to a list" button => It looks like B was added to the list L (because L disappears from the dropdown), while the waiting symbol on A spins forever.
- Actually, A is correctly added to L (checked by opening L), and B cannot be added anymore unless going to the sentence page of B.

Expected behavior:
- Wherever I click during the addition of A to L, the dropdown menu of A, and its waiting symbol, should be the ones affected when the operation is over, not the one with the current focus.
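
For anyone trying to picture the cause, here is a rough sketch of what I suspect is happening. It is only a model in Python, not Tatoeba's actual code: `SentenceMenu`, `Page`, `add_buggy` and `add_fixed` are all hypothetical names. The idea is that the response handler reads a shared "currently focused" reference when the request finishes, instead of keeping a reference to the menu that started the request.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SentenceMenu:
    """Hypothetical stand-in for one sentence's "Add to a list" dropdown."""
    sentence: str
    available_lists: list = field(default_factory=lambda: ["L"])
    spinning: bool = False


class Page:
    """Hypothetical page state; `focused` is whichever menu was clicked last."""

    def __init__(self):
        self.focused = None
        self.saved = []  # (sentence, list) pairs stored "server-side"

    async def add_buggy(self, menu, list_name):
        self.focused = menu
        menu.spinning = True
        await asyncio.sleep(0.01)  # simulated network round trip
        self.saved.append((menu.sentence, list_name))
        # Suspected bug: the result is applied to whatever has focus *now*.
        target = self.focused
        target.available_lists.remove(list_name)
        target.spinning = False

    async def add_fixed(self, menu, list_name):
        self.focused = menu
        menu.spinning = True
        await asyncio.sleep(0.01)  # simulated network round trip
        self.saved.append((menu.sentence, list_name))
        # Expected behavior: update the menu that started the request.
        menu.available_lists.remove(list_name)
        menu.spinning = False


async def demo(method_name):
    page = Page()
    a, b = SentenceMenu("A"), SentenceMenu("B")
    task = asyncio.create_task(getattr(page, method_name)(a, "L"))
    await asyncio.sleep(0)  # let the request for A start
    page.focused = b        # user clicks B's "Add to a list" button
    await task
    return page, a, b
```

Running `asyncio.run(demo("add_buggy"))` reproduces the symptoms in this model: A is saved to L, L disappears from B's dropdown, and A's spinner never stops. With `add_fixed`, A's own dropdown and spinner are the ones updated.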
CK
29 days ago - 29 days ago
We now have over 550,000 sentences with audio files.

https://tatoeba.org/eng/audio/index

Here are some comparisons with the top four languages with audio.


397,583 English
= more than the number of French sentences (7th ranked language)

64,456 Spanish
= more than the number of Chinese (Mandarin) sentences (20th ranked language)

21,187 Kabyle
= more than the number of Persian sentences (33rd ranked language)

18,962 German
= more than the number of Romanian sentences (34th ranked language)
shekitten
2019-03-19 13:05
I'm not sure if this has been mentioned before, but would it be possible for sentences to provide multiple audio files in different regional accents?

For example, one audio file in Peninsular Spanish, another in Argentine Spanish, perhaps another in Mexican Spanish...

Or one in U.S. English and another in British English, and another in Australian English.

In cases where the sentence is natural in multiple dialects (which happens quite often), I think this would be beneficial to learners, even if it would obviously depend on the willingness of people with different regional accents to contribute audio.
Thanuir
2019-03-20 08:03
If someone is active on Forvo, then it might also be fruitful to try developing cooperation with them. The website mostly contains pronunciations of words, but also some sentences. There are often several pronunciations and the region of the speaker is displayed.
CK
2019-03-17 03:38 - 2019-03-17 06:37
Create a Dashboard of Customized Links for Tatoeba.org

http://goo.gl/RzP8hV

It's been updated to include a few new items.

If you are a regular user of this, you may need to force a reload to get the new version of the external .js file.

For people who haven't yet tried this, ...

1. This will load faster than tatoeba.org, since it will cache on your own computer and doesn't require a connection to the database.

- If you just need to get to the search or need to get some specific links, you can save time.

2. This is set up to give you several options for ways to find sentences to translate into your own language.

3. Once you have chosen the language you want to translate from and your native language, you can bookmark the resulting page, and access your bookmark. This means you don't have to set it up every time.

4. Members who like to translate from several languages can easily set up and bookmark several versions of this. The external .js file that gets cached on your computer is the same one.

Ricardo14
2019-03-17 13:13
Thanks for that. However it displays "Not secure" - http://prntscr.com/mz1exk . Is that a big problem?
CK
2019-03-18 00:15
That just means it's http: and not https:
Since you're not submitting any data to that website, it doesn't matter.
There are a lot of websites using http: and not https.

CK
2019-03-20 03:12
Preset Searches for Study on Tatoeba.org

http://bit.ly/searchesforstudy

I made this as a modified advanced search with a few preset options.
Perhaps some members may find this useful.

1. show only sentences by members who claim to be native speakers.
2. random sort

You can easily change a few other options.
CK
2019-03-20 00:35
** New Spanish Voice **

terrafuego
https://tatoeba.org/eng/sentenc.../show/8796/und

If you, too, would like to add audio files in your native language, please read http://bit.ly/shtooka