CK's messages on the Wall (total 878)

CK
yesterday
The tags are older and the ratings are newer. That's why I think it was TRANG's intention to eventually use the ratings to help people know which sentences could be trusted. At about the same time, she put in the option of seeing which sentences were "owned" by native speakers.
CK
yesterday - yesterday
I think that perhaps TRANG was thinking about using the rating system to help with quality control, replacing the OK tag, but I'm not sure.


* The OK tag

** Disadvantages
--One needs to be an advanced contributor to add the tag.
--Only one person can add the tag.

** Advantages
--In advanced search, you can limit searches to only those sentences tagged OK.
--You can browse all sentences in a given language tagged OK. For example, the almost 6,000 French sentences tagged OK. https://tatoeba.org/eng/tags/sh...with_tag/7/fra


* The OK Rating (currently called "collection(s)")

** Disadvantages
--Since the ratings are called collections, non-native speakers add OK ratings, too, and those ratings may be less trustworthy.
--There is no way to easily list all sentences in a given language that are rated OK.
--No way to limit searches to rated sentences.
--Not many members are using this yet. However, I have given OK ratings to over 700,000 English sentences. https://tatoeba.org/eng/collections/of/CK/ok


** Advantages
--With the exported data, it's easy to find the sentences with the most OK ratings. If several native speakers have rated a sentence OK, it's likely good.
--You can also see which sentences have a "not OK". This way, if there are 2 OK ratings and 5 Not OK ratings, you may suspect that the sentence isn't really OK.
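
As a rough illustration of the two advantages above, here is a minimal Python sketch that tallies OK and Not OK ratings per sentence and flags sentences with more Not OK than OK ratings. The row layout (username, sentence id, rating, with 1 meaning OK and -1 meaning Not OK) and the sample usernames and ids are assumptions made up for this example; check the actual exported file before relying on it.

```python
from collections import Counter

# Hypothetical rows: (username, sentence_id, rating), where 1 = OK and -1 = not OK.
# The real export's column layout is an assumption here; verify it before use.
SAMPLE_ROWS = [
    ("alice", 101, 1),
    ("bob", 101, 1),
    ("carol", 101, -1),
    ("alice", 202, 1),
    ("bob", 202, -1),
    ("carol", 202, -1),
    ("dave", 202, -1),
]


def tally_ratings(rows):
    """Return {sentence_id: (ok_count, not_ok_count)}."""
    ok = Counter()
    not_ok = Counter()
    for _username, sentence_id, rating in rows:
        if rating > 0:
            ok[sentence_id] += 1
        elif rating < 0:
            not_ok[sentence_id] += 1
    ids = set(ok) | set(not_ok)
    return {sid: (ok[sid], not_ok[sid]) for sid in ids}


def suspicious(tallies):
    """Sentence ids with more 'not OK' than 'OK' ratings, in ascending order."""
    return sorted(sid for sid, (good, bad) in tallies.items() if bad > good)


if __name__ == "__main__":
    tallies = tally_ratings(SAMPLE_ROWS)
    # List sentences with the most OK ratings first.
    for sid in sorted(tallies, key=lambda s: tallies[s][0], reverse=True):
        print(sid, tallies[sid])
    print("suspect:", suspicious(tallies))
```

To weigh only native-speaker ratings, the rows could be filtered by username before tallying, assuming you have a list of native speakers for the language.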
CK
2 days ago
Perhaps the best solution, if you are going to use a bot to generate or harvest sentences, would be to have the bot create a text file that you proofread and then ask TRANG to import the sentences for you. This would solve both the quality problem and the wrong language flag problem.
CK
5 days ago
** New German Voice **

Driini has contributed 150 audio files.
https://tatoeba.org/eng/sentenc.../show/8680/und

If you, too, would like to help by contributing audio files in your native language, please read http://bit.ly/shtooka.
CK
8 days ago
** Stats - 2019-01-05 - Username & Number of Audio by CK **

http://tatoeba.ueuo.com/stats-190105-audio.html

See how many English sentences by various members have audio files by CK.
CK
8 days ago
** Stats - 2019-01-05 - English Sentences on List 907 Translated by Native Speakers **

http://tatoeba.ueuo.com/stats-2019-01-05.html

This shows the percentage of English sentences on List 907 translated into each language that has identified native speakers working here.
CK
9 days ago - 4 days ago
** Tab-delimited Bilingual Sentence Pairs **

http://www.manythings.org/anki/

Updated: 2019-01-06

These files include all the proofread English sentences on List 907 that have links to sentences owned by native speakers of the other languages.

Updated: 2019-01-12 (again)
CK
25 days ago
Tatoeba.org Native Speakers with Native Language Sentences

http://bit.ly/2CvfgQ9

5193 = Native Speaker Usernames with Native Speaker Sentences
126 = The Number of Languages with Identified Native Speaker Contributions
5740203 = The Number of Sentences These Members Own in Their Native Languages

December 22, 2018
CK
28 days ago - 27 days ago
** Sentences with audio that haven't yet been translated into any language **

Spanish (currently 5,398)
https://tatoeba.org/eng/sentenc...nd&sort=random

English (currently 6,846)
https://tatoeba.org/eng/sentenc...nd&sort=random



Perhaps you would have fun translating some of these.



Here are the same searches, showing the NEWEST sentences first.

Spanish
https://tatoeba.org/eng/sentenc...d&sort=created

English
https://tatoeba.org/eng/sentenc...d&sort=created


Here are the same searches, showing the SHORTEST sentences first.

Spanish
https://tatoeba.org/eng/sentenc...und&sort=words

English
https://tatoeba.org/eng/sentenc...und&sort=words
CK
29 days ago
Remove the quotes around the wildcard.

https://tatoeba.org/eng/sentenc...rom=eng&to=tur
CK
2018-12-15 08:19
If you think that all native speaker contributions can be trusted, then here is a table that will give a minimum score for each language.

http://tatoeba.ueuo.com/stats18...agenative.html
CK
2018-12-14 10:17
We now have over 518,000 audio files, up a little over 18,000 since November 12, 2018.

https://tatoeba.org/eng/audio/index

You can find each member's audio list with this search.
The lists with the most-recent changes are at the top.
https://tatoeba.org/eng/sentenc...direction:desc

See the last wall post about audio additions.
https://tatoeba.org/eng/wall/show_message/30710
CK
2018-12-10 23:29 - 2018-12-10 23:29
> From my point of view, a good sentence should be indistinguishable whether it is original or a translation.

I agree with this.

If the purpose of the Tatoeba Corpus is, as I believe, to provide people with sentences worth studying to learn a language, then this is something we should all strive for.

My suggested guidelines are as follows.

* Only translate into your native language.

* Only translate things you are 100% sure of.
** If you aren't sure, just skip the sentence.

* Only create natural-sounding sentences.
** Remember that people studying your language will study your sentences.
** If you know what a sentence means but can't make a natural-sounding translation of it, just skip it.

* If you think the sentence is strange, don't translate it.
CK
2018-12-10 01:01
Also, 5. Sounds awkward, but still acceptable. ....

If by "still acceptable" you mean to include language that is obviously incorrect but still communicates the speaker's intention, I would not consider these "good" for the Tatoeba Corpus.

In real life, these are totally acceptable when communicating with friends and maybe in many other situations, but I don't think many of us would think they are appropriate for anyone who wants to use them to study a language.
CK
2018-12-08 01:24 - 2018-12-08 06:38
1. If you assume that all native-speaker sentences are correct and natural-sounding, then based on 2018-11-24 data, it would be at least 78%.

78% (5611756/7139827)
Based on this http://tatoeba.byethost3.com/st...018-11-24.html

However, not all native-speaker sentences are good, since some have typos or are unnatural-sounding, word-for-word translations.

The number would be higher, though, since some non-native sentences are good, and we have a number of sentences in Esperanto, Latin and other constructed and dead languages.

If you want to get an idea of the number of sentences in each language by native speakers, see http://bit.ly/nativespeakers
(Last updated October 13, 2018)


2. For English, I would say that the quality is at least 61%.

61% (702027/1149697)
Based on https://tatoeba.org/eng/sentenc...s/show/907/und

However, the actual score would be higher.
I don't add near-duplicate sentences to this list.
For example, I added these.
* Tom was desperate for attention.
* That's not going to fly with Tom.
But, I didn't add these.
* Mary was desperate for attention.
* That's not going to fly with Mary.

I don't add old-fashioned and archaic sentences to List 907.
I try not to add sentences that are potentially offensive and not appropriate for all ages and cultures.
And, I don't read very, very long, multi-sentence contributions.


3. 13% (917620/7183699) of our sentences had good ratings in the 2018-12-08 exported data.

Note that some members rate sentences in non-native languages, so perhaps this data can't be trusted too much. Part of this problem, perhaps, is that the rating system is called "collections."
See the number of ratings and which members have rated sentences "good."
http://tatoeba.ueuo.com/2018-12...od-ratings.txt
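
The three percentages in this post can be rechecked from the counts given above. This is only a small Python sketch of the arithmetic; the labels and the helper name `percent` are made up for the example, and the counts are copied from the post.

```python
def percent(part, total):
    """Return part/total as a percentage."""
    return 100.0 * part / total


# (numerator, denominator) pairs copied from the post; labels are illustrative.
FIGURES = {
    "1. native-speaker sentences (2018-11-24 data)": (5611756, 7139827),
    "2. English sentences on List 907": (702027, 1149697),
    "3. sentences with good ratings (2018-12-08 data)": (917620, 7183699),
}

if __name__ == "__main__":
    for label, (part, total) in FIGURES.items():
        print(f"{label}: {percent(part, total):.1f}%")
```

Note that the first two figures in the post are rounded down, which fits the "at least" wording, while the third is rounded to the nearest whole percent.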
CK
2018-12-06 00:47
Daily Contribution Stats All On One Page

http://tatoeba.ueuo.com/timeline/

See how many sentences were contributed each day since the beginning of the project.
CK
2018-12-03 01:47
Here are the same links, showing the sentence text and clickable links.

http://study.aitech.ac.jp/replayed/

I split the HTML files into 400 lines each.
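
For anyone who wants to do a similar split, here is a minimal Python sketch of breaking a long list of lines into pieces of 400 lines each. It is not the script actually used for these pages; the function name `split_into_chunks` and the sample data are assumptions for illustration.

```python
def split_into_chunks(lines, size=400):
    """Split a list of lines into consecutive chunks of at most `size` lines."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    return [lines[i:i + size] for i in range(0, len(lines), size)]


if __name__ == "__main__":
    # Hypothetical input: 1,000 placeholder link lines.
    sample = [f"link {n}" for n in range(1, 1001)]
    chunks = split_into_chunks(sample)
    for number, chunk in enumerate(chunks, start=1):
        print(f"page {number}: {len(chunk)} lines")
```

Each chunk could then be written to its own HTML file with whatever header and footer the pages need.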

To quickly check a lot of these, on a Mac, hold down shift+command as you click the links to open in a new tab, and then command-w to close that tab and return to the page with the links. There are similar keyboard shortcuts for other operating systems, which you probably already know for the system you are using.
CK
2018-11-30 08:32
** Create a Dashboard of Customized Links for Tatoeba.org **

Version 1
http://goo.gl/RzP8hV
This version only shows (indirect) translations in the language of the translator.

Version 2
http://goo.gl/D71HfQ
This version shows all translations.

Version 1 is likely faster, but for those of you who know several languages, Version 2 might be more fun.

We have had a number of new members who have joined since the last time I posted these links.
CK
2018-11-29 00:48
Here are the same stats, but with a column showing sentence numbers.
I wanted to be able to easily see which sentences had been in the database the longest.
The higher-numbered sentences are the newer sentences.

http://tatoeba.byethost3.com/st...018-11-24.html

You can sort on any column in the table.


CK
2018-11-24 07:33
** Stats - 2018-11-24 - Native Speaker Sentence Counts **

http://tatoeba.byethost3.com/st...018-11-24.html