Wall (5201 threads)

TRANG
yesterday
**Code migration of Tatoeba**

In the past couple of months we've been working on upgrading the framework on which Tatoeba is built (CakePHP) to a newer version. We're starting to have something tangible so I deployed the new Tatoeba version on the dev website: https://dev.tatoeba.org/

I would like to invite everyone to test and report things that do not work, as this will help me prioritize what to fix. Please focus on testing features that would impact you the most. You can report issues here on the Wall or in the Tatoeba chatroom at https://chat.tatoeba.org.

Do not create issues on GitHub. At this stage, many things still don't work. I'm keeping track of what's left to do in a GitHub project:
https://github.com/Tatoeba/tatoeba2/projects/2
If something doesn't work and you see it in the "To do" column, there is no need to report it.

I am aiming to deploy the new Tatoeba on the main website around mid-January. This leaves us about 4 weeks to test and fix as many things as possible. Some features will remain broken, but that's okay: as long as the new Tatoeba is functional overall, the rest can be fixed afterwards.

Thank you for your help!
AlanF_US
yesterday
At the moment, any search I do for a word fails. The form of the error display differs depending on whether I'm logged in and whether it's an ordinary or advanced search. This is the error I get when I do an advanced search for a word:

"Call to a member function getSearchableLists() on boolean
Error"
Rockaround
26 minutes ago
Do we need to create a new account? Is there one for testing purposes? I tried mine in case the user database was the same, but it does not seem to be working.
sharptoothed
yesterday
** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/
Ricardo14
yesterday
Thank you! :)
Guybrush88
yesterday
thanks
wolfgangth
yesterday
Import a CSV file with translations into Tatoeba?


There are a lot of sentences that differ only in a first name (e.g. Tom, Jim, Sami). So I swapped these names to check whether the Tatoeba database has a translation of the same sentence with another first name. (I do this in a local database that contains only the English and German sentences.)

Example:
The sentence "Tom hasn't come home yet." has no German translation, but there is a sentence "Jim hasn't come home yet." with the translation
"Jakob ist noch nicht nach Hause gekommen."
I took this translation and replaced "Jakob" with "Tom", which gives:
"Tom hasn't come home yet." = "Tom ist noch nicht nach Hause gekommen."

With this method I created about 880 new (Eng-Ger) translations. These are stored in a CSV file (as UTF-8). Finally, I manually checked all these sentences in the CSV file, so they are ready to import. (It is possible that some of the sentences have already been translated in the meantime.)

How can this CSV file be imported into Tatoeba, and who can do it? As a normal contributor I can't do this myself, but I read that there is a way. It would be a lot of work to do this manually.
My user id is 73763.
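
The name-swap method described in this post could be sketched roughly as below. This is only an illustration, not the script that was actually used: the function names `swap_name` and `propose_translations`, the in-memory list of pairs, and the name lists are all assumptions made for the example. Every proposed pair would still need the manual check mentioned above.

```python
import re

def swap_name(sentence, old, new):
    """Replace whole-word occurrences of `old` with `new`."""
    return re.sub(rf"\b{re.escape(old)}\b", new, sentence)

def propose_translations(untranslated, pairs, eng_names, deu_names):
    """Propose (English, German) pairs by swapping first names.

    untranslated: English sentences with no German translation yet
    pairs: existing (English, German) translation pairs
    eng_names / deu_names: first names to try (hypothetical lists)
    """
    proposals = []
    for sentence in untranslated:
        for target in eng_names:
            if not re.search(rf"\b{re.escape(target)}\b", sentence):
                continue  # this name does not occur in the sentence
            for eng, deu in pairs:
                for e_name in eng_names:
                    # Keep only pairs whose English side equals the
                    # untranslated sentence once the name is swapped.
                    if e_name == target or swap_name(eng, e_name, target) != sentence:
                        continue
                    for d_name in deu_names:
                        if re.search(rf"\b{re.escape(d_name)}\b", deu):
                            proposals.append((sentence, swap_name(deu, d_name, target)))
    return proposals

# The example from the post.
proposals = propose_translations(
    ["Tom hasn't come home yet."],
    [("Jim hasn't come home yet.", "Jakob ist noch nicht nach Hause gekommen.")],
    eng_names=["Tom", "Jim", "Sami"],
    deu_names=["Jakob", "Tom", "Jim"],
)
for eng, deu in proposals:
    print(f"{eng} = {deu}")
```

Matching on whole words keeps a name like "Tom" from being replaced inside a longer word. Sentences where the name is inflected or appears more than once would need extra care.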
onurussia
3 days ago
Is there a problem? I cannot search :(
TRANG
3 days ago
Yes there is some issue with the search engine. I disabled it for now. It's unclear what caused the crash and I'm not sure when we can restore the feature. Looking into it.
TRANG
2 days ago
The search is back! Sorry for the inconvenience.

The website crashed because too many sentences were updated within too short a period of time. This overloaded our search engine, and as a result the random sentence selection (which relies on the search engine) was taking much more time than it should. Since the random sentence is a feature of the homepage, which is the most visited page, everything went downhill.

We should be good now.
Ricardo14
2 days ago
Good news! Thanks a lot, Trang!
Hybrid
yesterday
Thank you.
Ricardo14
3 days ago
Is anyone else experiencing problems on Tatoeba? I mean, I'm not able to post sentences, it takes forever to load the home page and also the "Random sentence" is not there. Is this an ongoing upgrade?
Hybrid
3 days ago - 3 days ago
+1 I'm getting automatically logged out of Tatoeba very quickly. I've been experiencing this problem for the past few months, but today Tatoeba is almost unusable.
belkacem77
3 days ago
There is an update announced on GitHub.
TRANG
10 days ago
Would anyone be interested in running an analysis on the quality of Tatoeba's corpus? The question to answer would be: what percentage of our sentences can be safely considered good/correct?

You could go for an empirical approach, for instance by looking through a random sample of 1000 sentences in your native language, reviewing them one by one and counting how many of them have a mistake (and of course posting a comment when you find mistakes). Say you find 50 sentences that have mistakes: that would suggest that roughly 95% of the sentences in that language are good and 5% are bad.
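
To get a feel for how far such a sample can be trusted, the arithmetic can be sketched as follows. This is a minimal sketch that assumes the 1000 sentences are a truly random sample and uses the normal approximation to the binomial. The function name `estimate_good_share` is made up for this example.

```python
import math

def estimate_good_share(sample_size, bad_count, z=1.96):
    """Return (estimate, low, high) for the share of good sentences.

    Uses the normal approximation to the binomial; z=1.96 gives
    a roughly 95% confidence interval.
    """
    p_good = 1 - bad_count / sample_size
    margin = z * math.sqrt(p_good * (1 - p_good) / sample_size)
    return p_good, max(0.0, p_good - margin), min(1.0, p_good + margin)

# The example from the post: 50 sentences with mistakes out of 1000.
estimate, low, high = estimate_good_share(1000, 50)
print(f"{estimate:.1%} good (roughly {low:.1%} to {high:.1%})")
```

With 50 mistakes in 1000 sentences, the margin comes out at a bit over one percentage point either way, which is precise enough to tell 90% from 95% or 99%.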

Or you could work out your own criteria for detecting that a sentence is good (if it's tagged "OK", if it's rated as "OK", if it has been commented on and corrected, if it has audio, etc.). Write a script to count the sentences that match your criteria, run it against the data from the Downloads page, and get some numbers out of it.
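
A criteria-based count might look something like this. It is a sketch only: loading the exported files is left out because their layout is not described here, and the function name `share_meeting_criteria`, the signal names, and the toy ids are all invented for illustration.

```python
def share_meeting_criteria(sentence_ids, signals, min_signals=1):
    """Share of sentences backed by at least `min_signals` quality signals.

    sentence_ids: iterable of all sentence ids in one language
    signals: dict mapping a signal name (e.g. "tagged_ok") to the set
             of sentence ids that have that signal
    """
    ids = list(sentence_ids)
    if not ids:
        return 0.0
    good = sum(
        1 for sid in ids
        if sum(sid in members for members in signals.values()) >= min_signals
    )
    return good / len(ids)

# Toy data, invented for illustration only.
signals = {
    "tagged_ok": {1, 2},
    "rated_ok": {2, 3},
    "has_audio": {2},
}
print(share_meeting_criteria(range(1, 6), signals))                 # 3 of 5 ids have a signal
print(share_meeting_criteria(range(1, 6), signals, min_signals=2))  # only id 2 has two or more
```

Raising `min_signals` makes the criterion stricter, so comparing the results for different thresholds gives a rough lower and upper picture rather than a single number.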

These are just some ideas. You can come up with any other approach that makes sense to you.

The goal is really not to get precise measurements but to get a sense of how much of our corpus is actually good: is it 80%, is it 90%, is it 99%? It would be quite interesting to compare between languages as well. Then we could perhaps try to extract some patterns on what helps and what doesn't help in making a good quality corpus.

Either way, I think it's a question a lot of us would like to know the answer to, so I'm throwing out the idea and I hope it'll catch the interest of some of you :)
Ricardo14
10 days ago
I'm interested.
My approach would be to tag as "OK" the sentences in the languages I speak and ask some other members to do the same.
Then we could compare how many of them were tagged (are good sentences) and how many weren't (*maybe* are not good sentences).
CK
10 days ago - 10 days ago
1. If you assume that all native-speaker sentences are correct and natural-sounding, then based on 2018-11-24 data, it would be at least 78%.

78% (5611756/7139827)
Based on this http://tatoeba.byethost3.com/st...018-11-24.html

However, not all native-speaker sentences are good, since some have typos, or are non-natural-sounding, word-for-word translations.

The number would be higher, though, since some non-native sentences are good, and we have a number of sentences in Esperanto, Latin and other constructed and dead languages.

If you want to get an idea of the number of sentences in each language by native speakers, see http://bit.ly/nativespeakers
(Last updated October 13, 2018)


2. For English, I would say that the quality is at least 61%.

61% (702027/1149697)
Based on https://tatoeba.org/eng/sentenc...s/show/907/und

However, the actual score would be higher.
I don't add near-duplicate sentences to this list.
For example, I added these.
* Tom was desperate for attention.
* That's not going to fly with Tom.
But, I didn't add these.
* Mary was desperate for attention.
* That's not going to fly with Mary.

I don't add old-fashioned and archaic sentences to List 907.
I try not to add sentences that are potentially offensive and not appropriate for all ages and cultures.
And, I don't read very, very long, multi-sentence contributions.


3. 13% (917620/7183699) of our sentences had good ratings in the 2018-12-08 exported data.

Note that some members rate sentences in non-native languages, so perhaps this data can't be trusted too much. Part of this problem, perhaps, is that the rating system is called "collections."
See the number of ratings and which members have rated sentences "good."
http://tatoeba.ueuo.com/2018-12...od-ratings.txt
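
For anyone who wants to recompute the three ratios in this post, a short sketch follows. The counts are copied from the post; the labels and the helper name `percentage` are just for this example.

```python
# (part, whole) pairs copied from the post above.
counts = {
    "native-speaker sentences (2018-11-24)": (5611756, 7139827),
    "List 907 vs. all English sentences": (702027, 1149697),
    "sentences with good ratings (2018-12-08)": (917620, 7183699),
}

def percentage(part, whole):
    """Return part/whole as a percentage."""
    return 100 * part / whole

for label, (part, whole) in counts.items():
    print(f"{label}: {percentage(part, whole):.1f}%")
```

To one decimal place, these come out at about 78.6%, 61.1% and 12.8%, in line with the whole-number figures quoted.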
belkacem77
9 days ago
For Kabyle, I suggest @Amazigh_Bedar, and he can suggest more names to help.
Bedar is a linguist (Kabyle and other Berber languages).
deniko
9 days ago - 9 days ago
I'd say that whoever checks the sentences for their languages shouldn't be an active contributor, or at least not one of the top contributors for their language.

For example, for Ukrainian I would be mostly checking my own sentences. Which is not that bad, it's proofreading, but it always makes more sense to have a fresh look, and it's more efficient.

Also, I think there could be a lot of categories of "good" and "bad" sentences.

For example, depending on the context, I can classify as good either some subset of the list below, or all of them:

1. Sounds like something a native speaker would say, no spelling mistakes, no punctuation mistakes.

2. Sounds like something a native speaker would say, no spelling mistakes, some problems with punctuation (missing comma, missing full stop at the end, unnecessary comma, etc.)

3. Sounds like something a native speaker would say, contains a minor typo. (Tom has an elephannt.)

4. Sounds like something a native speaker would say, contains a spelling mistake typical of native speakers ("I know more then you.", "I would of done it.", "Your my friend.")

5. Sounds awkward, but still acceptable.

6. Sounds like something a native speaker of a different dialect of my language would say.
Aiji
9 days ago
I'm one of those who do not believe in "good" sentences. That is a very bad word, impractical to use.

That being said, following Tatoeba's rules, why would you classify as "good" any of 2, 3, or 4?
In 2, if the punctuation is really erroneous, that is, not simply a matter of taste but a real mistake in the sentence, it needs to be corrected.
Similarly, 3 and 4 need to be corrected.

I think I see your point, but from the point of view of the experiment TRANG was talking about, they would count as "bad" sentences when extrapolating to the whole corpus (of course, that is not strictly possible, but I think you see what I mean).

As for proofreading, I agree with you. Proofreading is best done with a fresh pair of eyes.
CK
8 days ago
Also, 5. Sounds awkward, but still acceptable. ....

If by "still acceptable", you mean to include obviously incorrect language use, but still able to communicate the speaker's intention, I would consider these not "good" for the Tatoeba Corpus.

In real life, these are totally acceptable when communicating with friends and maybe in many other situations, but I don't think many of us would think they are appropriate for anyone who wants to use them to study a language.
deniko
8 days ago
> If by "still acceptable", you mean to include obviously incorrect language use

I meant "awkward, but something a native speaker might write, and not correct themselves after re-reading it".

It's really difficult for me to come up with English sentences like this because I'm nowhere near native level, but in Ukrainian I've been coming across a lot of examples of such phrases. Basically, "bad style" or something like that.
deniko
8 days ago - 8 days ago
> That being said, following Tatoeba's rules, why would you classify as "good" any of 2, 3, or 4?

Yes, they need to be corrected, but they are still good according to my low standards. My standard is: if it can pass for a sentence written by a sober native speaker who had re-read it twice - it's good enough.

So I just wanted to know what the standards for "good" are. Something that needs correction? Then any single fault makes it "bad". Which is fine by me, "good" and "bad" are too broad of terms.

Basically, I would like to distinguish between something like "How much time?" (intended meaning - "What time is it?") and a genuine typo or even an error that a lot of native speakers make.

But if the idea is to put both "How much time?" and "I'm taller then him." into the same basket because they both need correction, I'd be fine with that as well.
soliloquist
9 days ago - 9 days ago
I have finished checking 1000 random Turkish sentences. The results are as follows.

- Sentences in good condition (no errors & natural-sounding) : 764 (76.4%)

- Sentences owned by non-native speakers: 3 (0.3%)

- Sentences with spelling or punctuation errors (accent letters are not taken into account) : 43 (4.3%)

- Sentences with other grammatical errors: 18 (1.8%)

- Sentences with translation errors (mistranslated words, tense errors, pronoun errors etc. ) : 37 (3.7%)

- Partly unnatural sentences (excessive pronoun usage, improvable word choices etc. - sentences that don't sound smooth) : 121 (12.1%)

- Completely unnatural sentences (literal translations, very strange word choices/orders etc. - sentences that a native speaker would never say) : 61 (6.1%)

These add up to more than 1000 (100%) in total because some sentences fall into more than one category.
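
A quick tally of the category counts above shows how much overlap there is. This is only a sketch: the dictionary keys are shortened labels introduced here, and the counts are copied from the list above.

```python
SAMPLE_SIZE = 1000

# Counts copied from the list above; keys are shortened labels.
categories = {
    "good condition": 764,
    "owned by non-native speakers": 3,
    "spelling or punctuation errors": 43,
    "other grammatical errors": 18,
    "translation errors": 37,
    "partly unnatural": 121,
    "completely unnatural": 61,
}

total_labels = sum(categories.values())
surplus = total_labels - SAMPLE_SIZE  # labels beyond one per sentence

print(f"category assignments: {total_labels}")
print(f"surplus over sample size: {surplus}")
```

The total is 1047, so there are 47 more category assignments than sentences, which is consistent with some sentences being counted in two or more categories.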
deniko
8 days ago - 8 days ago
You've done a great job, and I liked how you created a lot of categories instead of just saying "good" or "not good".

> Sentences with translation errors (mistranslated words, tense errors, pronoun errors etc. ) : 37 (3.7%)

If we started checking this within the task defined by Trang, it would be impossible to complete.

Imagine a native speaker of English trying to determine which sentences are "good", and also check whether all 100 translations into 50 languages of each sentence are correct.

I think most Turkish sentences are linked to English ones, but what if you come across something linked to, say, Ukrainian or Marathi? Would you check the translation as well?

From my point of view, and according to my low standards, I'd say 94% of Turkish sentences are good sentences. This is the only category of "bad" sentences, IMHO:

> Completely unnatural sentences (literal translations, very strange word choices/orders etc. - sentences that a native speaker would never say) : 61 (6.1%)
soliloquist
8 days ago
Thank you for your remarks, deniko.

> I think most of Turkish sentences are linked to English ones, but what if you come across something linked to say Ukrainian or Marathi, would you check the translation as well?

Yes, most of them are directly linked to English and most of the rest are indirectly linked to English, so I didn't have much difficulty checking them. Sometimes I consulted Google Translate and Glosbe for sentences in languages other than English to firm up my decisions. If I encountered an isolated unusual pair like Marathi-Turkish I would probably skip it.


> Imagine a native speaker of English trying to determine which sentences are "good", and also check whether all 100 translations into 50 languages of each sentence is correct.

I think English is a special case. Many of the English sentences are original. I don't think evaluating the quality of a corpus with translations linked to it later would be very fair, but I accept that's a logical dilemma. A cooperative and multilingual work is needed here. As Goethe said, let everyone sweep in front of his own door and the whole world will be clean.


> From my point of view, and according to my low standards, I'd say 94% of Turkish sentences are good sentences. This is the only category of "bad" sentences, IMHO:
> Completely unnatural sentences (literal translations, very strange word choices/orders etc. - sentences that a native speaker would never say) : 61 (6.1%)

I beg to differ. Some of those 'partly unnatural' sentences might be acceptable for language books to show/stress some aspects of the language, but from a purely native standpoint, they're not much different from the 'completely unnatural' sentences. In novel translation, for instance, such sentences would irritate readers. From my point of view, a good sentence should be indistinguishable whether it is original or a translation.
CK
7 days ago - 7 days ago
> From my point of view, a good sentence should be indistinguishable whether it is original or a translation.

I agree with this.

If the purpose of the Tatoeba Corpus is, as I believe, to provide people with sentences worth studying to learn a language, then this is something we should all strive for.

My suggested guidelines are as follows.

* Only translate into your native language.

* Only translate things you are 100% sure of.
** If you aren't sure, just skip the sentence.

* Only create natural-sounding sentences.
** Remember that people studying your language will study your sentences.
** Even if you know what it means, but can't make a natural-sounding sentence for the translation, just skip it.

* If you think the sentence is strange, don't translate it.
deniko
7 days ago
> From my point of view, a good sentence should be indistinguishable whether it is original or a translation.

Of course it is. I don't think we have ever argued about that.

However, even without translation native speakers quite often produce awkwardly sounding sentences that they can either correct themselves, or leave as is. Not everybody is able to speak and write clearly and smoothly. So I thought you meant your "Partly unnatural sentences" were in that category.

If they clearly sounded like something a native speaker would never say, then it's just an unnatural sentence, in my opinion.
soliloquist
7 days ago
> However, even without translation native speakers quite often produce awkwardly sounding sentences that they can either correct themselves, or leave as is.
> So I thought you meant your "Partly unnatural sentences" were in that category.

The difference may seem unclear, but I'd rather call them 'non-standard'. Other than that, we're on the same page, I think.
Aiji
6 days ago
That's an interesting discussion, but we could go on forever and ever ^^
We can split "awkward" sentences into correct awkward, academic awkward, regional awkward, etc. With Tatoeba's current system, they probably should be tagged, but not everybody thinks about it, or can. Ultimately, tags are there to give meta-information and help the person who would not understand why such a sentence is correct.

A simple example: The French « C'est quelle heure ? » is (clearly) grammatically incorrect. However, this is the natural way to ask the time in some regions... Hence, even "natural-sounding" sentences are up for debate. Etc. :)
AlanF_US
6 days ago - 6 days ago
> The goal is really not to get precise measurements but to get a sense of how much of our corpus is actually good: is it 80%, is it 90%, is it 99%? It would be quite interesting to compare between languages as well. Then we could perhaps try to extract some patterns on what helps and what doesn't help in making a good quality corpus.

I would say the following:
(1) the goal of extracting patterns on what helps and what doesn't is more important than trying to come up with a single number
(2) coming up with a number doesn't get us much closer to figuring out what helps and what doesn't
(3) if you have X people doing the counting, you'll have X systems for doing it

Having said that, I would like to toss into the mix my opinions about what makes a sentence valuable to me.

First of all, I want to point people to the wiki page "How To Write Good Sentences" ( https://en.wiki.tatoeba.org/art...ood-sentences# ), which captures my feeling that a good sentence meets the following criteria:
- clear
- self-contained
- likely
- standard dialect
- natural
- unlikely to offend

For my purposes, a really useful sentence meets these criteria and goes even further:
- contains words like "surprised" that are in the second or third tier of frequency, meaning that they're important but neither among the very first words learned (like "water") nor among the rare and quirky words that even native speakers are unlikely to use (like "whortleberry")
- demonstrates the meaning of the rarest/most advanced words in the sentence
- has some interesting characteristic that makes it easy to remember
- is not irritating

My model of an ideal sentence is (1) one that would be found in a good dictionary whose entries contain sample sentences OR (2) one that would be found in a good textbook, like the kind from which I learned languages in the 1980s and 1990s, before the online era but after the age during which the focus was on drilling or lecturing the student. (I've seen textbooks from the 1930s and 1940s with series of sentences like "The first month is January. The second month is February." or "Jane is a diligent student. She always does her homework." I think language instruction improved greatly after that time.) I see "textbook sentences" criticized here from time to time as being unnatural, but I believe that the critics are thinking of textbooks like the kind published in Asia by non-native speakers to teach English. In my experience, many language textbooks are written by people who are good at coming up with fresh, novel, natural sentences that demonstrate language as it is actually used.

Sentences of the form "I am [adjective]. You are [same adjective]. He is [same adjective]." are not particularly helpful to me. They're not interesting, and they don't demonstrate what the adjective means. Tatoeba is different from other crowd-sourced multilingual dictionaries in that it can provide sentences that give people a sense of the flavor of the words they contain. Sentences that match a simple template do not fulfill the site's potential. However, I realize that it's hard work to come up with sentences of the particularly useful kind I'm describing, and I myself spend far more time translating sentences that other people have written than coming up with new ones.
soliloquist
6 days ago
The following are numbers of comments per 1000 sentences for the top 10 languages.

- English: 67
- Russian: 82
- Turkish: 24
- Italian: 14
- Esperanto: 180
- German: 177
- French: 143
- Portuguese: 60
- Spanish: 137
- Hungarian: 88

I know this is far from a precise and reliable basis for judgement: comments that report wrong-flag errors or give annotations, and other factors such as a low number of active speakers, make it less useful. Still, it might give a rough idea of how actively and effectively sentences in these languages are checked and maintained.
CK
3 days ago
If you think that all native speaker contributions can be trusted, then here is a table that will give a minimum score for each language.

http://tatoeba.ueuo.com/stats18...agenative.html
CK
4 days ago
We now have over 518,000 audio files, up a little over 18,000 since November 12, 2018.

https://tatoeba.org/eng/audio/index

You can find each member's audio list with this search.
The lists with the most-recent changes are at the top.
https://tatoeba.org/eng/sentenc...direction:desc

See the last wall post about audio additions.
https://tatoeba.org/eng/wall/show_message/30710
CK
8 days ago
** Note to Spanish-English Advanced Contributors **

I suspect that many of these Spanish sentences with audio can be linked to indirectly-linked English sentences. If you have time, please help by linking those that match in meaning.

https://tatoeba.org/eng/sentenc.../show/6685/und
Ramin88
10 days ago
Hello friends!
I am Ramin, and I am looking for a native English speaker.
soliloquist
9 days ago
Hi, Ramin.

Tatoeba doesn't work like HiNative or Lang-8. However, you can contribute by translating sentences into Persian or by creating original Persian sentences.

Here are the English sentences that are not translated into Persian.

https://tatoeba.org/eng/sentenc...o=&sort=random

I see you set Turkey as your country on your profile. If you're living in Turkey, you can also try translating Turkish sentences into Persian. For neighboring countries, we have very few linked Turkish-Persian sentences here.

https://tatoeba.org/eng/sentenc...io=&sort=words
Amastan
9 days ago
Tamazight/Berber language resources

Dictionnaire français-tamazight de génie électrique
Mohand Mahrazi (University of Bejaia, Algeria)

Google books:
http://bit.ly/2Qa9dbe

French-Tamazight Dictionary of Electrical Engineering

This book will be very helpful in the translation of many scientific sentences into Tamazight. This is one of the best reference dictionaries for technical terms.