Wall (4132 threads)

thorn0
8 minutes ago
The Ukrainian flag used on Tatoeba is wrong. The upper stripe has to be darker. You can find the correct colors on Wikipedia or on the site of the Ukrainian parliament.

https://en.wikipedia.org/wiki/Ukraine
http://rada.gov.ua

Where should I report such issues?
CK
7 days ago
** Improving the Quality of the Tatoeba Corpus **

In a Wall post by sacredceltic (https://tatoeba.org/eng/wall/index#message_25321), he said this:
"Most of these sentences are unadopted because they're simply un-adoptable."

This is true for English and Japanese, too, but more so for English, since I've gone through the Tanaka Corpus English sentences and have adopted all the ones I felt were worthwhile. The same hasn't yet been done by anyone for the Japanese, so there are likely good sentences that could be adopted. One problem with the Japanese sentences from the Tanaka Corpus is that many of them are OK in the sense that they are written in the way that we often see translations of English written in textbooks in Japan. However, they aren't really what we would naturally say or write to convey the same idea. This makes them difficult to adopt.

Additionally, I frequently look through the newly-unadopted English sentences and adopt the good ones, so the percentage of good, unadopted English sentences is very low, I think.

As far as poor quality goes, it's not only the unadopted sentences that are the problem.

At least for English and Japanese, we have many non-native speaker members adding sentences that are as bad as or worse than the unadopted sentences. Additionally, we sometimes have native speakers who add stilted word-for-word or unnatural-sounding translations.

Here is a list of members who have contributed in multiple languages for those who are interested.

http://justpaste.it/qyqx

We could likely drastically improve the quality of incoming sentences if current members would limit themselves to adding sentences in their own native language or dominant language, and also encourage new members joining the project to do the same.

Do you have any other suggestions on how we can improve the quality of the Tatoeba Corpus?
pullnosemans
7 days ago - edited 7 days ago
an important wall post, I think. let's see if we can actually get something going.

I think the main problem is that tatoeba right now has no clear identity as to where it wants to go (this is only my impression, no implication on trang's vision or anything like that). I have the impression that the project has no real means to actually ensure any quality, its openness is as much a curse right now as it is a blessing.

maybe through a clearer division of roles among contributors things could become more stable. my ideas for features that would reduce the openness of the site, but might nonetheless be beneficial in the long run are:

- increasing the number of corpus maintainers and encouraging them to take more radical action if they think sentences need to be changed or deleted
- forming bodies of trusted contributors who are known to be able to create good sentences and translations and emphasizing their importance to the project
- introducing a feature where contributors can be labeled responsible for creating translations from one specific language into one or maybe several specific language(s)
- creating a forum open only to key members (advanced/trusted contributors, corpus maintainers, etc.) with features for discussion and creating an overview over stuff that needs to be done (evaluating sentences, etc.)

the effects of these features could be increased by limiting the contributing rights of unknown members (e.g. introducing a feature where their sentences have to be evaluated by a corpus maintainer or trusted contributor before being displayed for everyone to see) and advertising on other sites for people to contribute to tatoeba (e.g., for japanese speakers to take on the tanaka corpus and change all the sentences into good, natural japanese).

the second point leads to the question of money. is tatoeba right now creating any money that could be used for ads, or even small monetary rewards for contributors?

all these ideas aim at increasing the motivation of competent people to invest time in making tatoeba something more stable. I don't know the details about how e.g. wikipedia manages the cleaning up of their articles, but I just generally feel that this site could be much more than it already is, but is right now in a sort of identity crisis as it has become big enough for its ambitions to grow higher.
pullnosemans
7 days ago - edited 7 days ago
one more thing about tatoeba that right now paralyzes the possibility for change is the fact that many bad sentences have links to good ones in so many different languages that it is impossible for a single person to change them while verifying that the new, better sentence still is a fitting translation for all the sentences it is linked to. therefore, no bad sentence like this can be changed without either having all linked sentences verified and if necessary changed by various contributors (which would be bullshit) or losing links to any language that the respective contributor does not know well.
sacredceltic
7 days ago - edited 7 days ago
In my view, the main issue is to avoid deterring new quality contributors by exposing the bad quality of some of the sentences that are shown on the service's main page, because this is the first contact with the service, and it should be immaculate.

So the random sentence that is shown on Tatoeba's main screen should only be an owned sentence, and possibly owned by a self-proclaimed native (although I'm afraid this would entice more contributors to lie about their native language...), or an OK-tagged sentence (tagged OK by a self-proclaimed native).

I would not change the retrieval rules that apply to searches, so even unowned sentences would be retrieved, so they can be adopted and improved.

Once quality contributors have joined, they progressively understand the issues and work toward a better quality rather than be deterred by the corpus' current state.
But they must first be made to join.
Hybrid
7 days ago
"the random sentence that is shown on Tatoeba's main screen should only be an owned sentence"

I agree.
CK
6 days ago - edited 6 days ago
Personally I would like to see the random sentence taken off the main page.

Being random, you never know what's going to be there. If the first sentence a new visitor sees is something he or she finds highly offensive or is obviously incorrect, he or she will likely never come back to this website. We have had various contributors over the years who seem to delight in contributing sentences that are potentially offensive to a lot of people. Also, we have a number of members who treat this project as a playground and contribute sentences in languages they are not very good at.

Instead, I think members should be encouraged to use the advanced search function to find random sentences in a foreign language they know that are not yet translated into their own native language.

If you don't know how to do that, you can use this tool to help you create a page that you can bookmark for daily use.

Create a Customized Links Page for Tatoeba.org
http://bit.ly/tatoebastart
Hybrid
7 days ago - edited 7 days ago
"Do you have any other suggestions on how we can improve the quality of the Tatoeba Corpus?"

There's also the rating system that can be activated in the settings. Maybe this could be used to improve the quality of sentences in the future.

Edit: Also I think that we shouldn't be allowed to rate our own sentences.

It might also be a good idea to only allow native speakers, or very proficient non-native speakers, to rate sentences.
sacredceltic
6 days ago
Well, I'm deeply opposed to this rating. Over time, you will realise that the so-called "wisdom of the crowd" is nothing else than dumbness of the crowd.
If you want to convince yourself, visit http://www.urbandictionary.com/, which is absolutely full of crappy definitions coined by people thinking they're smart and funny.

A language is not the result of democracy. A language is LEARNT.
Otherwise, schools and teachers would serve no purpose.
CK
6 days ago - edited 6 days ago
...
sacredceltic
6 days ago
hey, my opinions are not subject to voting, either...
sacredceltic
6 days ago
hey, I was joking. It's not because I don't use emoticons to evidence it that I don't crack jokes...
wells
6 days ago
I generally only use the rating system to warn users that there's something odd or unnatural about a sentence when I don't see a way to improve it that is concrete enough to leave a comment. Not that it acts much as a warning if it's hidden by default.

I'm not sure why you think its use will somehow develop into an undesirable consensus. Can you elaborate on that?
CK
6 days ago
I find that adding sentences to a "my collection" of things I think sound wrong is strange, so I only "collect" sentences I think are good.
sacredceltic
6 days ago
>I'm not sure why you think its use will somehow develop into an undesirable consensus. Can you elaborate on that?

The proof of the undesirable consensus is in the pudding.

If you watch urban dictionary, for instance, you'll see that whenever a "smart" contributor coins a silly definition for a word, it's immediately voted up by many others, who think they're funny, and that is the way crappy definitions end up having a high rank.

But there are also countless examples on the Internet (and also in the media, alas) of wrong syntaxes or spellings that are progressively becoming dominant since uneducated users or non-natives come to outnumber educated natives, and their belief that what they write is correct is reinforced by what they see on the Internet.

Famous French soccer players, for instance, who are notoriously illiterate, draw thousands of fans on their twitter accounts, ready to use and republish any ineptitude they write.

I know what you will retort : that's how languages evolve (or so people believe...), but it isn't true, otherwise schools and teachers would not even know what spelling or syntax to instruct.

Internet has changed it all, because now, illiterate people publish the most.

I know English may evolve a lot under this pressure, especially since English has few rules. But that is not the same in other languages to which far more rules apply, and where mistakes are more obvious in their regard.

In France and the UK, countries I know well (but it must be the same elsewhere, as far as I know), the way the language is used, relative to education, actually serves to screen people socially. Mistakes in language use are spotted immediately by upper-class people (those that are to grant jobs...)
So you may argue that all language uses are equal as long as they're popular, but that is just not real.
A parallel example in French is the following : A majority of French people mispronounce « Les haricots ». At first, you may say : "So what ? Then their wrong pronunciation is the correct one". But once you've said that, you haven't done much to help the "mispronouncers" get a job, because this mispronunciation actually works as a social/educational marker for educated French people. It tells them immediately what the educational level of the person saying it is...

A voting system would actually strengthen people in their mistakes, to their own detriment, creating havoc in language rules that would end up being impossible to teach.

PS: what would you say about voting for mathematical results ? Let's make a test.

2+2 = ?

a) 4
b) 2
c) 0
d) 5

I vote for c)
al_ex_an_der
6 days ago - edited 6 days ago
2 + 2 = 0 ? I'm not quite sure.
Two plus two is many, I guess. And many may be not exactly zero. ;-)
But besides that, you are right; the crowd isn't always right.
Unfortunately, by far not always. :(
sacredceltic
6 days ago
"many" is not an option. You'll have to set your own concurrent poll, to make things worse...
sacredceltic
6 days ago
for me, anything + anything = 0, because I'm a nihilist. I strongly believe all things emerge from nothingness and return to it. My theory is actually backed by most astro-physicists, nowadays.
c) option should definitely win.
wells
6 days ago
Well, that was a wall of text. I frankly don't see much relation between what I asked about ("how come an OK / Not OK system will lead to bad sentences") and what you replied. But I'll reply to a bunch of irrelevant things first.

> But there are also countless examples on Internet (and also in the media, alas) of
> wrong syntaxes or spellings that are progressively becoming dominant

I'm not quite sure what you are talking about. I don't know French much but I mostly see 'quel' / 'quelle' changed to 'kel', kind of like the English 'your' becomes 'ur'. Nobody in their right mind would consider those to be correct. Well, nobody past primary school age.

There was someone earlier today commenting on French sentences with colloquial contractions, asking them to be changed to use "proper" language, by the way.

> I know what you will retort : that's how languages evolve (or so people believe...)

No, I was not going to retort that. I've seen that argument get used and think it's stupid.

> In France and the UK, countries I know well (but it must be the same elsewhere, as
> far as I know), the way the language is used, relative to education, actually serves
> to screen people socially. Mistakes in language use are spotted immediately by
> upper-class people (those that are to grant jobs...)

Same here. People who never learnt the most basic compound word rules are numerous. Not that I would advocate using the rules to learn compound words -- you seem to either subconsciously know them or you don't. You can probably practise with the rules to get your subconscious up to speed, or so I think. The part about employment I find odd though -- there are plenty of employers (white, blue, and pink collars) who don't themselves know the rules, so how would they judge the applicant based on that?

> A majority of French people mispronounce « Les haricots »

"Aspirated h", or so Wiktionary tells me. Pronounced [le a.ʁi.ko], and mispronunciation would then be [lez a.ʁi.ko] I guess?

Now, finally the meat of the question.

> If you watch urban dictionary, for instance, you'll see that whenever a "smart"
> contributor coins a silly definition for a word, it's immediately voted up by many others,
> who think they're funny, and that is the way crappy definitions end up having a high rank.

No, I see thousands upon thousands of words with a single definition without a single person ever having voted on it. Usually someone coining a pair of words with the intent to insult their friend.

I do not think it applies to Tatoeba at all. Either a sentence is good or it is not good. There is no popularity contest between sentences. Or are you going to argue that « J'ai eu un mal de tête. » is somehow competing with « J'ai eu mal au crâne. » ? That one of them is wittier than the other?

There are about 300 000 French sentences on Tatoeba currently. You'd need an audience of at least three million French speakers all rating sentences to cover a significant part of the corpus, not a single one commenting that they think something about a sentence is wrong.

I'm much more concerned about many sentences turning out somewhat unnatural by virtue of being translated with the idea that each word of the original must be represented in the target sentence. "Word by word", if you will. It happens all the time here in my experience, especially on more complex sentences. I think learners should be steered clear of these sentences whenever possible. (Yet somehow the professional translators manage to produce fine sentences one after another. I applaud them -- translating is a tough job.)
sacredceltic
6 days ago
>I do not think it applies to Tatoeba at all. Either a sentence is good or it is not good.

Good, so I misunderstood your initial comment and we agree on that. But don't go believing that we're the majority here. Far from it. I have had countless debates with contributors here, including many students in linguistics, who advocate freedom to write as they wish and democracy as a way to assess the validity of their sentences, the "Facebook"-like ruling of languages as against Academies that they all consider to be bunches of useless old nutters (age being an OK criterion to disparage people, if I understand them well...)
pullnosemans
5 days ago - edited 5 days ago
"If you watch urban dictionary, for instance, you'll see that whenever a "smart" contributor coins a silly definition for a word, it's immediately voted up by many others, who think they're funny, and that is the way crappy definitions end up having a high rank."

that's unfortunate. why exactly are you so convinced that tatoeba will show the same result, especially seeing as tatoeba is not a site with a humorous component like urban dictionary is?


"But there are also countless examples on Internet (and also in the media, alas) of wrong syntaxes or spellings that are progressively becoming dominant since uneducated users or non-natives come to outnumber educated natives, and their belief that what they write is correct is reinforced by what they see on Internet."

so you are saying language use should generally be dictated by a small elite upper class?


"I know what you will retort : that's how languages evolve (or so people believe...), but it isn't true, otherwise schools and teachers would not even know what spelling or syntax to instruct."

I don't think I really understand this conclusion. what relevance do teachers play for a phenomenon such as language, which is acquired naturally by most humans?


"I know English may evolve a lot under this pressure, especially since English has few rules. But that is not the same in other languages to which far more rules apply, and where mistakes are more obvious in their regard."

have you ever counted the "rules" in english? do you have a statistic with an average of rules per language on the globe?
edit - added later: and in the framework of which grammatical theory are you making this quite bold claim, which even many reputed professors of comparative linguistics would never claim to have a deep enough understanding of the way language works in our brains to be able to make?


"A parallel example in French is the following : A majority of French people mispronounce « Les haricots ». At first, you may say : "So what ? Then their wrong pronunciation is the correct one"."

no, no, nobody says that, you are the one speaking about there being "the" correct one.


"But once you've said that, you haven't helped much the "mispronouncers" getting a job, because this mispronunciation actually works as a social/educational marker for French educated people. It tells them immediately what is the educational level of the person saying it..."

I don't know if this applies to all "educated people", or only to those who take delight in regarding their idiolectal variety of their native language as the only one with a right to exist.


"A voting system would actually strengthen people in their mistakes, to their own detriment, creating havoc in language rules that would end up being impossible to teach."

I would say languages are per se impossible to teach. they can only be learned, through careful observation. all teachers ever could do for me at least was give me access to material (including their own native output) for me to work with.
sacredceltic
5 days ago
>so you are saying language use should generally be dictated by a small elite upper class?

I'm not saying it should. I'm saying it actually is. Upper classes will always define ways of differentiating themselves from the rest of the people. This is their very way to exist. And language is the prime differentiator. If you don't realise that, you don't know your own society.

>what relevance do teachers play for a phenomenon such as language, which is acquired naturally by most humans?

A pity. Teachers are supposed to teach language to your children. Maybe you escaped school. I agree that their relevance is more and more debatable at Internet age. However, they're still key to language education for most, and language education is the very base of education in general. Developed countries spend an awful lot of money on teachers. I hope they actually serve some kind of purpose...
I, for one (along with millions of other French children), learnt spelling and grammar at school.

>have you ever counted the "rules" in english? do you have a statistic with an average of rules per language on the globe?

No, and I don't need it. English has no rules for vowel pronunciation, for example, whereas Dutch, German, French or Spanish do. That's fewer rules.

>you are the one speaking about there being "the" correct one.

No, society does. The elite defines how things should be pronounced (usually, with good logical grounds...), not me. I'm just immersed in the society and I merely observe the phenomenon. But, yes I do think phonetic rules are handy, because they enable people to pronounce words they've never seen or heard before (which is not possible in English, but is in Spanish, Dutch, French...) and I agree to their imposition by the elite, because once you know the rules, disrespect of them sounds ugly.

A language is a protocol. It's subject to informal acceptance. It works exactly like politeness : Some people enter a restaurant smiling and saying Good day, others don't. You can't prevent those present from judging which way is more appropriate and civil and from acting accordingly.

>I would say languages are per se impossible to teach.

Then all these useless teachers should be made redundant and the building of schools should be stopped at once. In France, this is the state's number one spending. Every taxpayer would subsequently pay 30% less tax.
pullnosemans
4 days ago - edited 4 days ago
ah, I see. so the central misunderstanding here is that you were actually mostly speaking of written language, when I thought you were talking about language in general. under this premise I can understand much better why you are saying what you are saying, because written language is largely a human construct and thus indeed primarily taught, not acquired naturally.
in this case, I can also agree that the system of english orthography is among the most erratic and inconsistent alphabet-based systems in the world, if not the most erratic (I assume this is what you meant when talking about english having fewer rules than many other languages). however, I still don't see how this would lead to english writing being more prone to language change.

and so you suggest that since we want tatoeba to serve an educational purpose and linguistic prescription by influential people is simply real, the site should have an orientation toward high-prestige conservative language varieties. I guess that's a relatable opinion, even if I ideologically oppose language prescription on an a priori basis.

glad we worked this out. I ask you to be more explicit about your focus on written language when making statements about the character or profile of languages, or talking about language education. I think it would make your point of view easier to understand and help create dialogue, especially for those whose perspective differs from yours.
sacredceltic
4 days ago
>however, I still don't see how this would lead to english writing being more prone to language change.

Prescription frames and slows changes. When there are no prescriptions, changes occur more frequently and erratically.

>and so you suggest that since we want tatoeba to serve an educational purpose and linguistic prescription by influential people is simply real, the site should have an orientation toward high-prestige conservative language varieties. I guess that's a relatable opinion, even if I ideologically oppose language prescription on an a priori basis.

No I don't. I think you misread me.
I'm just saying languages are not democratic. Sentences are either correct or not, by the standards of a given period of time and space, and these standards happen to be more applied (and enforced) by upper-class people who tend to have a longer education, enabling them to better master these standards.

This works the same way in large developed societies and small primitive tribes.

If, for instance, you were an ethno-linguist and you wanted to learn and conserve a recently discovered language from a tribe in Central Papua, would you rather learn it from the young of this tribe, asking them to vote for word meanings and correct syntax, although these young people started speaking pidgin in the last 20 years, through contact with other tribes, because they find it cool to go global, or would you rather try to learn it from old tribe members who know the myths, and tell their stories to children ?

So if voting is no good for ethno-linguists, as a tool to comprehend a language, why should it be good for us ?

And by the way, my purpose on Tatoeba is not educational, but rather conservatorial (which is not contradictory with an educational use). I coin sentences from all registers of society and I store, on Tatoeba, sentences I heard or read and that are sometimes nowhere else on Internet.

I realised - funnily through a Google search that retrieved Tatoeba sentences - that my native language was mis-represented on Internet and this came as a shock to me and that is why I've been so involved.

This misrepresentation is caused, not only by the distortion brought by the fact that more uneducated or non-native people write on Internet than educated native ones, but also because only certain things are written (and sometimes only certain things are written specifically on Internet) while others are not, for instance, sentences from the intimate or childhood register, or local turns of phrases that people deem good enough to say but not to write, and that interests me much.

Internet is the global scene on which everybody wants to act, talented or not, and it sums up to a representation that is neither very natural nor very rich or diverse, let alone beautiful.
Impersonator
12 hours ago
> If you want to convince yourself, visit the
> http://www.urbandictionary.com/ which
> is absolutely full of crappy definitions coined
> by people thinking they're smart and funny.

I find urbandictionary.com extremely helpful when dealing with English slang expressions.

If urbandictionary.com is supposed to be an example of how 'wisdom of the crowd' doesn't work for linguistic data, it's not a convincing one.
sacredceltic
12 hours ago - edited 11 hours ago
Sure, here is a typical example http://www.urbandictionary.com/...hp?term=snatch

You'll notice that the correct definition is only the 3rd, with only 1419 votes up, when the 1st got 4066.
That means that 4066 idiots took the pain to do this.
It's a clear proof that so-called "crowd-wisdom" is just crowd-idiocy. And it's massive.
I can't see why Tatoeba would be different, since the same people use both services...
cueyayotl
11 hours ago - edited 11 hours ago
>> Sure, here is a typical example http://www.urbandictionary.com/...hp?term=snatch

It refers you to another word ('cunt') and gives you an example sentence (which should give one the idea of what the word means already). When you click on the word 'cunt' it sends you to its page, and after seeing that the definition 'woman' doesn't make sense in the original example sentence, we have the definition: "A synonym for a woman's genitalia, vagina, pussy, etc." Now we have learned the definition of the word, as we wanted.

'Wisdom of the crowd' works fine here. Language is not an exact science, nor does it necessarily follow logic; a sentence being correct in a language cannot be compared to "1+1=2" or "1+1≠2".
sacredceltic
10 hours ago
I don't know what you're talking about with your "logic".
The topic here is "crowd wisdom" to be applied to Tatoeba sentences.
I just provided a good example of crowd idiocy, in an Internet linguistic service, parallel to what Tatoeba does.
cueyayotl
5 hours ago
-- I don't know what you're talking about with your "logic".

>> PS: what would you say about voting for mathematical results ? Let's make a test.

>> 2+2 = ?

>> a) 4
>> b) 2
>> c) 0
>> d) 5

>> I vote for c)

-- I just provided a good example of crowd idiocy

>> It [Urbandictionary] refers you to another word ('cunt') and gives you an example sentence (which should give one the idea of what the word means already). When you click on the word 'cunt' it sends you to its page, and after seeing that the definition 'woman' doesn't make sense in the original example sentence, we have the definition: "A synonym for a woman's genitalia, vagina, pussy, etc." Now we have learned the definition of the word, as we wanted.

Maybe there are examples of crowd idiocy on the site, but there just aren't any in this particular entry. Could you please provide another example? Thank you in advance.
sacredceltic
4 hours ago
Crowd idiocy is when 4066 people vote up the definition that is not the most complete and accurate. Only 1419 voted up the complete and accurate definition, which ranks only 3rd.

That proves brilliantly that the public, on linguistic Internet sites, is not wise, and prefers to crack bad jokes and ridicule the service that is supposed to be rendered.

Hence my conclusion : crowd wisdom is crowd idiocy. QED

Only to you doesn't it look obvious, bizarrely... Logic, you were saying ??
wells
3 hours ago - edited 3 hours ago
May I point out that the first definition had a four-year head start?

And that the place is called *Urban* dictionary, which aims to define slang and colloquialisms? Those senses that are not found in ordinary dictionaries.

And that none of the verb senses are slang, and thus not really at home on the site? "to snatch" has been in English for about a thousand years, if not more.

I think you are reading too much into this.
sacredceltic
2 hours ago - edited 2 hours ago
>"to snatch" has been in English for about a thousand years, if not more.

English is not that old...

>I think you are reading too much into this.

Precisely not.

First, "Urban" doesn't equate to colloquialisms in my dictionary.
It means "from the city", as its Latin root urbs points to.
I don't think "snatch" means more "vagina" as a first thought, in cities than in the countryside. Or maybe you think rednecks are ignorant ?

The fact remains, nevertheless, that 4000+ people took the pain, on purpose, to vote up a definition in a dictionary "from the city" that is provocative and very narrow-minded.

There is no reason to think that people would proceed differently on Tatoeba. I know millions of teenagers who would laugh their brains out voting up the most ridiculous sentences, and that would result in a ridiculous exhibition of counter-example sentences, precisely the contrary to what Tatoeba is trying to promote.

Not everybody is satisfied that the definition of "snatch" comes up with "cunt" as the only definition in an online popular dictionary on Internet.
But Okay, you might find it funny (if you're under 25...)
But do you realise that search engines actually take it for granted ?

Well, Tatoeba is the same. When a sentence is wrong, it shows up in Google searches anyway, and people subsequently take it as is.
And Tatoeba is very well indexed by search-engines, since Tatoeba has a serious and consistent purpose that makes it much favoured by these engines (search engines are optimised to downrate inconsistent websites, that are usually redirecting to unwelcome advertising...)

Yesterday, people used to learn languages from books that were edited (i.e., corrected). Today, they learn them from the Internet, where anything is and will be written, without any correction (just read Wikipedia or Twitter if in doubt...)
I'm not saying we can prevent the Internet from being full of crap; that is already hopeless. But at least we can make sure that the most favoured results from smart search engines are not only crap. Otherwise, I can't tell you at what rate our languages are going to degrade.
TRANG
4 days ago
The problem of quality is a long-standing problem, and many things have already been suggested and discussed in the past. I'll try to cover the various points that have been mentioned in this thread.


1. Unadopted sentences

Unadopted sentences are a bit difficult to deal with. They are usually not bad to the point that it's clear they should be deleted, otherwise they would simply be deleted, but they are often bad enough that nobody feels like taking care of them.

Removing them from the corpus is an option, but doesn't sound like the best option.
I am not against the idea though. I had discussed, a while ago, the case of the unadopted Japanese sentences and asked what if we simply delete them? The answer I got was basically that they are not harmful to the point that they should be deleted. Therefore in the case of Japanese, we will keep them.
But if, for another language, the community considers that the unadopted sentences are basically useless and should all be deleted, I would not necessarily reject the idea.

That being said, I agree that we should make them less accessible to average users. We already do this by displaying unadopted sentences after owned sentences in the search results.
But we should also, as was suggested, not display them in the random sentences.
We should probably also prevent users from translating unadopted sentences, since there are many owned sentences that are waiting to be translated.


2. Sentences contributed by non-native members

The idea of limiting contributors to contribute only in their native language cannot be applied to all languages in my opinion. It cannot be applied to all contributors either.

There are languages for which we have almost no chance of finding native contributors, but for which we have passionate non-native contributors who want to do what they can to document these languages. I don't see a good reason to discourage that. Sure, the sentences may suffer in terms of quality, but does quality matter so much in this case?

There are users who are capable of creating correct sentences in foreign languages, even though they are not native speakers of the language. There are users who have connections with native speakers, and have the possibility to ask these native speakers to correct their sentences before they submit them into Tatoeba. As long as a contributor is careful enough, I wouldn't stop them from contributing in a foreign language.

Also being a native speaker does not ensure good quality. Some native speakers can contribute pretty bad sentences. And even good contributors can make mistakes when adding new sentences or translating sentences.

So I don't think we could make it a general rule, to limit members to contribute only in their native language. This is something that should rather be decided case by case, for each user.

One thing we might need is to set up some better, more scalable, mechanism to temporarily ban contributors from adding sentences in a certain language because the average quality of their contributions in that language is below standard.

At the moment the only thing we have is a functionality that is available only to admins: admins can set the "level" of a user to "-1". As a result, the user is not able to add or translate any sentences anymore. They can only edit their current sentences.
The fact that only admins can access this functionality, and that we don't have enough admins, and that there's no way to customize the restriction to specific languages, makes this functionality suboptimal.


3. Corpus maintainers and trusted users

Pullnosemans suggested that a possible way to improve quality is to increase the number of corpus maintainers and form bodies of trusted contributors. I agree that the number of corpus maintainers and trusted contributors has (or at least should have) a positive influence on the quality of the corpus. I'm all up for having more corpus maintainers and advanced contributors. But how do we achieve this? And how do we make sure that we are promoting the right people?

At the moment we lack people who want to or can invest time and effort into building a stronger community of contributors. We probably also lack people who are motivated to take new responsibilities, or who have the right mindset, knowledge and skills to take these responsibilities.
We can't just accept anybody for the role of corpus maintainer, and we can't force contributors to be corpus maintainers either. There are people who want to become corpus maintainers but are not ready for it. There are people who would be great corpus maintainers but don't want to take the responsibilities.

Maybe there are certain things that we're doing wrong and we could fix, to create a more engaged community. But I can't see this happening if we don't have very dedicated and active community admins.

On a side note, I want to clarify that corpus maintainers actually do have the possibility to take radical actions if they think a sentence needs to be changed or deleted. Our guideline for corpus maintainers is that they first post a comment, then wait two weeks in order to give the sentence's owner, as well as other members, time to react to the comment. If there is no reaction after these two weeks, then the corpus maintainer can freely change the sentence or delete it.
This is only a guideline though. Corpus maintainers can skip the two week delay, and delete or edit a sentence right away, if they consider that an action needs to be done urgently.


4. Money

When it comes to quality, I'm not convinced money can do a lot for the project. I don't reject the idea of rewarding competent contributors, or of having people paid to maintain the corpus as a full-time or part-time job. But there are several problems.

Obviously there is the problem of funding: where do we find the money? Can we even gather enough funding to reach the level of quality that we're dreaming of? How much would we have to invest? How much would you like to be paid, to correct sentences in the corpus?

Now even with all the money in the world, how would we evaluate that someone is actually competent? How do we evaluate that someone is doing a good job at improving the corpus' quality? How do we know that we're not just wasting money?

There will also be problems of prioritization: what would justify investing more money in one language than in another?

We do have some funds, from donations, that we could use for something else than paying the server. But nothing substantial. We can of course try to run a donation campaign, if we have a clear plan of what to do with the money that we would raise. But right now I don't see any clear plan.


5. Sentence ratings/collections

The sentence ratings/collections feature (I'll call it ratings for the rest of this message) is, for me, an important feature for improving quality, and it has been implemented with the problem of quality in mind. This is not about implementing a voting system. This is about providing an infrastructure for contributors to evaluate the quality of the sentences in the corpus, and for users/learners to have a better indicator to decide whether they can rely on a sentence or not.

One feature that was suggested years ago was to have some sort of secondary corpus. The secondary corpus would basically be a space where users would be more "free" to add poor quality sentences. This would be the place where sentences from new users would be stored, as well as sentences from users who are contributing in their non-native languages. And if, after verification, a sentence from the secondary corpus is considered to be actually good, it can be moved to the main corpus.

For me, the current Tatoeba corpus is that "secondary corpus". We don't have the "main corpus" yet but we should find a way to build it. For that, we would need to define criteria that would help us decide what is worthy to go into the main corpus.
For instance, we could start with this criterion: all the sentences from corpus maintainers that are written in the native language of the corpus maintainer are worthy of being in the main corpus. We could extend the criterion to advanced contributors.
But then what about the rest of the contributors? And what about sentences from advanced contributors or corpus maintainers that have mistakes?

This is where the ratings system comes into play. It should provide a more accessible and standard way for contributors to mark sentences as they explore the corpus, to express their opinion about the quality/correctness/reliability (whatever you want to call it) of the sentences that they see. These opinions would serve as an extra criterion to decide whether or not a sentence can go into the "main corpus".

The rating system is currently still in the experimental phase. And unfortunately, it probably will remain in an experimental phase for another year or so because I personally won't have a lot of time to dedicate to it and there are other topics that I consider higher priority. But who knows, maybe this year we will have new developers joining the team who will be motivated and inspired to work on this problem.


So that was a long rant... There is a lot more to say on the topic but I won't have time for it. I hope nonetheless that it gives everyone a better vision of where Tatoeba stands when it comes to quality.
sacredceltic
4 days ago
I am still totally opposed to your pseudo-democratic system for judging the quality of sentences.
Being a corpus administrator or a corpus maintainer is no guarantee of quality, and neither is the fact that several contributors, who may well be ignorant, approve of a sentence.

A sentence is either correct or incorrect. That doesn't depend on anyone's opinion. It's simply a fact.
tommy_san
3 days ago
> I had discussed, a while ago, the case of the unadopted Japanese sentences and asked what if we simply delete them? The answer I got was basically that they are not harmful to the point that they should be deleted. Therefore in the case of Japanese, we will keep them.

There are actually some (though not many) sentences that I find plainly wrong or clearly unnatural, and thus harmful. I rate them "not OK" and "unsure" (even though I'm not unsure about anything) respectively to warn other users. However, most people don't see my ratings, so these sentences keep getting translated, especially often by new members.

If the community thinks it's better for me to delete these sentences, I can do so. In that case, you'd need to excuse me for accidentally deleting sentences that are correct in some variety of Japanese I'm not familiar with, or even ones that are correct in standard Japanese that include a word or phrase I don't know. You'd also need to excuse me for deleting sentences that could be turned into good example sentences with some changes. I don't have the time or ability to improve them and make sure the new sentences match all the translations, and there are many sentences that, in my opinion, wouldn't make good standalone example sentences anyway.

By the way, I think it's really important for us to tell new members what to translate and what not to translate. Since we have both good and bad sentences, they should translate only when they're sure it's a good sentence. If they cannot judge the quality of sentences themselves (which is the case for many non-native speakers), it's better to choose sentences owned or tagged/marked OK by a self-identified native speaker. Whenever I notice, I tell this to members who translate bad sentences, but it's something every contributor should keep in mind.

I also wonder if we could develop a set of good sentences that contributors of every language could consider translating. The set would surely include sentences like "Hello" and "Thank you", but it doesn't have to be phrasebook-like. It could include any sentence that you find good and real and that makes good sense out of context (such as "You made the mistake on purpose, didn't you?" and "Does this dress make me look fat?"). It's not that all contributors should translate from this set, but if they don't have a particular preference, it might be better for them (especially contributors of languages with few sentences) to translate sentences from such a set than to simply translate recent or random sentences, which are often not very good.
pullnosemans
3 days ago - edited 3 days ago
"There are actually some (though not many) sentences that I find plainly wrong or clearly unnatural, and thus harmful. I rate them "not OK" and "unsure" (even though I'm not unsure about anything) respectively to warn other users. However, most people don't see my ratings, so these sentences keep getting translated, especially often by new members.
If the community thinks it's better for me to delete these sentences, I can do so. In that case, you'd need to excuse me for accidentally deleting sentences that are correct in some variety of Japanese I'm not familiar with, or even ones that are correct in standard Japanese that include a word or phrase I don't know. You'd also need to excuse me for deleting sentences that could be turned into good example sentences with some changes. I don't have the time or ability to improve them and make sure the new sentences match all the translations, and there are many sentences that, in my opinion, wouldn't make good standalone example sentences anyway."
(tommy_san)


as a non-native speaker both of japanese and english, I can say that I would absolutely be in favour of your doing so. I have the impression that tatoeba lacks japanese speaking members who are reliable and willing to take action, and that this is the main reason why the japanese corpus is still so corrupted, so I think it would absolutely be worth paying the price of having some potentially salvageable sentences be lost if we can get a start on the cleaning up of the japanese corpus for it.

and if you ask me, any english tanaka sentence not yet adopted can be deleted right with the japanese ones. "The outcries of the angels go unheard by ordinary human ears." may be a sentence you could potentially come across in poetry or lord of the ring style fiction, but without context on a site such as this, it just sounds silly to me.

I generally think we need to become bolder in cleaning up this page, even if that means that we lose some material that could maybe eventually at some point in time be useful. this goes for any language with a great number of bad sentences right now.

I'd rather have a change now, and then be able to build up from that with revised concepts.
wells
3 days ago
> "The outcries of the angels go unheard by ordinary human ears."

Not a great example though. There was a Tanaka contributor or three who input short phrases he/she had written in his/her notebook, along with their translations. None of them were full sentences, or were even conjugated to form a sentence. 天使の叫び → "Angel's cry out" (#125025, #278968) was one of those pairs. Someone here tried to salvage the Japanese to make a full sentence, which eventually led to the English translation that you quoted.

I'd prefer that the Japanese, if odd, be fixed or new translations supplied, but I understand that our most prolific contributor is rather antagonistic towards the idea. I don't blame the contributor -- it's a lot of sentences and a lot of work. We'd need a hundred active contributors working on the sentences to make a dent.

Anyway, anyone proposing the deletion of old unadopted Japanese sentences would first have to ask JimBreen well before enacting a plan of that sort. His dictionary is dependent on the indexing stored on Tatoeba and if you delete a sentence, the indexing goes as well, if I'm not mistaken.

Also, any deletion of a well-known language such as English will likely disperse the other translations based on it. For example, if you delete #63882, the Italian, Hebrew and Macedonian sentences will have no links and will no longer be grouped together.
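To make the point about lost links concrete, here is a minimal sketch, assuming translation links are simply undirected pairs of sentence IDs. Only #63882 comes from the example above; the IDs 1, 2 and 3 are hypothetical stand-ins for the Italian, Hebrew and Macedonian sentences, and the function name is made up for illustration.

```python
from collections import defaultdict

def translation_groups(links, deleted=frozenset()):
    """Group sentence IDs that are still connected after some are deleted."""
    graph = defaultdict(set)
    for a, b in links:
        if a in deleted or b in deleted:
            continue  # a link disappears together with either of its sentences
        graph[a].add(b)
        graph[b].add(a)
    # Keep every surviving sentence, even one that lost all its links.
    nodes = {n for pair in links for n in pair if n not in deleted}
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            n = stack.pop()
            if n in group:
                continue
            group.add(n)
            stack.extend(graph[n] - group)
        seen |= group
        result.append(group)
    return result

# Hypothetical IDs 1, 2, 3 are linked only through the English sentence #63882.
links = [(63882, 1), (63882, 2), (63882, 3)]
before = translation_groups(links)
after = translation_groups(links, deleted={63882})
```

With these assumed links, the four sentences form one group before the deletion and three unlinked sentences afterwards, which is the dispersal described above.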
pullnosemans
3 days ago
everything you say is true, and the problem of losing links might be the biggest issue to take care of if we want significant change, but as I said, I think we should suck it up and deal with these issues as best as we can rather than being like, "yeah, there's too many problems, we can't really do anything right now, let's talk about it again next year".
CK
2 days ago - edited 2 days ago
> I also wonder if we could develop a set of good sentences that contributors of every language could consider translating.

This could be done right now with lists. A collaborative list would be possible, but that opens up the possibility of accidental or malicious additions and deletions.

Instead, I would suggest that each member interested in this idea should create a list and add sentences to it as they find such sentences.

Here is an example of a list that I already have that is very similar to what I mean.
https://tatoeba.org/eng/sentences_lists/show/4003

Hopefully, we will eventually have the possibility to search lists. (See https://github.com/Tatoeba/tatoeba2/issues/767). We could then search for sentences on these lists that are not yet translated into our own native languages. If we are also allowed to search more than one list at a time, we could choose which members' lists we felt we could trust, and then do our searches on more than one list at a time.
pullnosemans
3 days ago - edited 3 days ago
"I had discussed, a while ago, the case of the unadopted Japanese sentences and asked what if we simply delete them? The answer I got was basically that they are not harmful to the point that they should be deleted. Therefore in the case of Japanese, we will keep them.
But if, for another language, the community considers that the unadopted sentences are basically useless and should all be deleted, I would not necessarily reject the idea."
(TRANG)

as for this, see my reply to tommy_san's post above. I generally think it's better to lose some not-completely-downright-useless material than to stay inactive and get no improvement to the current problematic situation in the corpora of some languages (english, french, japanese, korean come to my mind).


"So I don't think we could make it a general rule, to limit members to contribute only in their native language. This is something that should rather be decided case by case, for each user."
(TRANG)

I agree. I also think this could well be combined with the creation of a more clearly defined body of trusted/responsible users.


"I'm all up for having more corpus maintainers and advanced contributors. But how do we achieve this? And how do we make sure that we are promoting the right people?
At the moment we lack people who want to or can invest time and effort into building a stronger community of contributors. We probably also lack people who are motivated to take new responsibilities, or who have the right mindset, knowledge and skills to take these responsibilities.
[...] There are people who want to become corpus maintainers but are not ready for it. There are people who would be great corpus maintainers but don't want to take the responsibilities.
Maybe there are certain things that we're doing wrong and we could fix, to create a more engaged community. But I can't see this happening if we don't have very dedicated and active community admins."
(TRANG)

we could have a one-week long poll that is advertised on the front page where users can state that they are willing to become part of an engaged, clearly defined community of people who work to systematically improve this site, by clear guidelines and clearly divided responsibilities. anyone who wants to be a part of it and doesn't seem untrustworthy picks or is assigned a certain job (maybe more important jobs for people already known for their good work on the project), and their activity is watched by the other members of the team. if anyone performs badly or there are problems, it is discussed with them in a group, and if no consensus is found, they can lose their responsibility. this doesn't mean, however, that they cannot regain it or get another one. I think the best way to verify that you're "promoting" the right people is to have a certain fluidity in the system, and have everyone be aware that they are collaborating in a team.
I, for one, would be very happy to be a corpus maintainer for german, since improving the sentences already existing interests me more than creating new ones. I once asked for this, but was told that german already had enough maintainers, which was fine with me (though I thought, "can you have too many maintainers?").


"4. Money"
(TRANG)

for now, I would say that recruiting new contributors via ads (possibly in combination with the poll for creating a dedicated core of contributors as I described above) would be more beneficial than paying people for contributing. this site does run on an open source concept, after all. everything else can be pondered over when there actually are immediate perspectives for getting a significant amount of funding.
odexed
yesterday - edited yesterday
Some sentences tagged 'OK' but unadopted: http://tatoeba.org/eng/sentence...amp;sort=words

If you are a native speaker, you can adopt them. It's also easy to filter them by language, for example,
English - http://tatoeba.org/eng/sentence...amp;sort=words

Spanish - http://tatoeba.org/eng/sentence...amp;sort=words

French - http://tatoeba.org/eng/sentence...amp;sort=words

Japanese - http://tatoeba.org/eng/sentence...amp;sort=words
Ricardo14
yesterday
Thanks, odexed!
Pfirsichbaeumchen
3 days ago - edited 3 days ago
►New Corpus Maintainer and Advanced Contributor Candidates
►Neue Korpuspfleger- und fortgeschrittene Mitarbeiterkandidaten
►Novaj kandidatoj, kiuj volas iĝi frazara bontenanto aŭ progresinta kontribuanto

Raggione:
https://tatoeba.org/user/profile/raggione

Wezel:
https://tatoeba.org/user/profile/Wezel

[ENG] John (raggione) asked to be a corpus maintainer for German some time ago. He has been a very active and helpful member for a long time and would help make necessary corrections to German sentences of contributors who no longer respond.

Wezel would like to become an advanced contributor. His native language is Russian, and he has also been around for nearly a year already. As an advanced contributor, he would help with linking and tagging Russian sentences.

As always, please feel free to share your feedback using the link below.


[DEU] John (raggione) hat vor einiger Zeit schon gefragt, ob er zum Korpuspfleger für das Deutsche ernannt werden könnte. Er ist seit langem schon ein sehr aktiver und hilfsbereiter Mitarbeiter und würde dabei helfen, deutsche Sätze, deren Autoren nicht mehr reagieren, mit notwendigen Korrekturen zu versehen.

Wezel möchte fortgeschrittener Mitarbeiter werden. Seine Muttersprache ist das Russische, und er ist auch schon fast ein Jahr lang mit dabei. Als fortgeschrittener Mitarbeiter würde er bei der Verknüpfung und Etikettierung russischer Sätze mitwirken.

Zögert auch diesmal nicht, uns mit Hilfe der untenstehenden Verknüpfung eure Rückmeldungen zukommen zu lassen.


[EPO] Jam antaŭ kelka tempo John (raggione) demandis, ĉu li povas iĝi frazara bontenanto por la germana lingvo. Jam delonge li estas tre aktiva kaj helpema kontribuanto. Li helpus atribui necesajn korektojn al germanaj frazoj de aŭtoroj ne plu reagantaj.

Wezel volas iĝi progresinta kontribuanto. Lia gepatra lingvo estas la rusa, kaj ankaŭ li ĉeestas jam preskaŭ dum unu jaro. Kiel progresinta kontribuanto, li helpus ligi rusajn frazojn kaj aldoni etikedojn al ili.

Ankaŭ ĉi-foje ne hezitu sendi al ni mesaĝon pri via opinio per la suba ligilo.

[1] http://tatoeba.org/private_mess...sichbaeumchen.
wallebot
3 days ago
Stackexchange.com

Tatoeba doesn't have a forum, and maybe somebody would like to know of a good one.

Stackexchange.com is a high-quality forum for many subjects, languages included.
It doesn't have as many languages as Tatoeba, but it can help.
For example:
http://spanish.stackexchange.com/
http://portuguese.stackexchange.com/
http://french.stackexchange.com/
And others.
I hope it helps.
sacredceltic
4 days ago
*** Incorrect word breaks in Tatoeba sentences ***

For the past few days, I've noticed that words are being split in odd places in the editor and when Tatoeba sentences are displayed. Until now, Tatoeba respected the integrity of words, which appeared (« apparaissaient ») on a single line (when they weren't too long).
What's going on?

Example above: in my comment editor, the word « apparaissaient » is suddenly split after "ss"...
gillux
4 days ago
Indeed, it's very odd, and your problem gave me a hard time. But I found it. In your message, you only write with non-breaking spaces (Unicode character 202f), so what you type is treated as one long word. Only the hyphen in your « ci-dessus » allows a line break on my side.
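For anyone who wants to check their own text for this, here is a minimal sketch in Python. The set of characters checked is an assumption: it covers U+202F as found above plus two other common no-break spaces, and the function name is made up for illustration.

```python
import unicodedata

# Assumed set of no-break space characters to look for.
NO_BREAK_SPACES = {"\u00a0", "\u202f", "\u2007"}

def find_no_break_spaces(text):
    """Return (position, Unicode name) for each no-break space in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in NO_BREAK_SPACES
    ]

# A string typed with U+202F instead of ordinary spaces.
sample = "Que\u202fse\u202fpasse-t-il"
hits = find_no_break_spaces(sample)
```

If `hits` is non-empty, the text contains spaces at which a browser will not wrap the line, which matches the behaviour described above.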
sacredceltic
4 days ago
Really strange, because I'm typing with the same keyboard as before; I haven't changed anything. How come Tatoeba now treats it as non-breaking?
sacredceltic
3 days ago
OK, I may have a lead, but I don't understand why the problem suddenly appeared.
On Pharamp's advice, I used the Mac OSX application Ukelele to customize my keyboard, which lets me write in several languages, including Esperanto, with a single keyboard layout. I thought that was great.
I'm currently using version 10.11.1 of OSX, El Capitan (which is way better than the previous ones; my Mac has a new lease of life...), and when I migrated my Mac to this version, I didn't notice any change of behaviour in Tatoeba.
So it seems that later "invisible" updates changed the behaviour of the custom keyboard.
When I switch to the standard French keyboard, the problem seems to be solved. But not with my custom keyboard, which used to work perfectly.
I don't think I'm the only contributor who uses Ukelele, so the problem may well recur for others.
The mysteries of Apple updates seem more and more unfathomable.
In the meantime, I can no longer write in Esperanto. Sniff.
sacredceltic
3 days ago
OK, in the end I restarted my machine and my custom keyboard seems to work again. Something (a website...) must have modified my keyboard in memory. It's weird.
The problem is that I've created lots of sentences with non-breaking spaces instead of normal ones.
Any idea what I can do to change them in bulk?

On this subject, there is still the problem, in French sentences, of duplicate sentences with a non-breaking-space version and a normal-space version before colons or before and after quotation marks.
There are also duplicates (and not only in French) with typewriter apostrophes (') and typographic apostrophes (’), or even this one (′)

This should be normalized. Ideally at the source, when the input is inserted.
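As a rough illustration of what such normalization could look like, here is a minimal sketch in Python. It is not how Tatoeba works; the function names are hypothetical, and the choice to map every variant to a plain space or a plain apostrophe is an assumption made only to detect duplicates (whether the stored sentence itself should keep a non-breaking space is a separate typographic decision).

```python
import re

# Characters mentioned in this thread: no-break spaces and apostrophe variants.
SPACE_VARIANTS = "\u00a0\u202f"
APOSTROPHE_VARIANTS = "\u2019\u2032"

_TABLE = {ord(c): " " for c in SPACE_VARIANTS}
_TABLE.update({ord(c): "'" for c in APOSTROPHE_VARIANTS})

def duplicate_key(sentence):
    """Key under which typographic variants of a sentence compare equal."""
    text = sentence.translate(_TABLE)
    # Collapse runs of spaces so doubled spaces do not hide a duplicate.
    return re.sub(r" +", " ", text).strip()

def find_variant_duplicates(sentences):
    """Group sentences that differ only by space or apostrophe variants."""
    by_key = {}
    for s in sentences:
        by_key.setdefault(duplicate_key(s), []).append(s)
    return [group for group in by_key.values() if len(group) > 1]

examples = ["L\u2019ami\u00a0: oui", "L'ami : oui", "Non"]
duplicates = find_variant_duplicates(examples)
```

The same key function could in principle be applied when a sentence is submitted, so that variants are caught at the source rather than cleaned up afterwards.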
sacredceltic
4 days ago
*** List of French words missing from Tatoeba ***

Some time ago, sysko extracted the French words, which he must have taken from a free dictionary, that were not represented in Tatoeba's sentences. I'd like to bring that list up to date.

Has anyone already done this, or could anyone suggest a method for doing it?

Thanks
gillux
3 days ago
I tried and came up with this: http://downloads.tatoeba.org/not_in_tatoeba/

It's a hack, but it's usable. I based it on hunspell's French dictionary to generate the « all » list. Since it contains all the feminine/masculine and masculine/plural forms, conjugations, etc., it is very long. So I tried to order and filter it using word frequency lists that I found on the net. I thus created three lists that are subsets of « all »:

• 10k_lexique, combined with the lexique.org list and limited to the first ten thousand words. It is by far the best.
• 10k_opensubtitles, combined with a frequency list based on opensubtitles.org subtitles. It yields a compilation of words straight out of American TV series. It's fairly mediocre, but I laughed a lot reading it, so I'm leaving it.
• wortschatz, combined with the Wortschatz frequency list. This list seems to have flaws and it is short, so the result is limited but usable.

Each list is in plain text and HTML format. The content is the same. In the HTML version, a link lets you search for the word on Tatoeba. That lets you see whether words from the same family are already present in the corpus.
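The general approach described above can be sketched roughly as follows. This is not the script that produced those lists; the function name, the tokenization (plain runs of letters) and the way the limit is applied to the result are all assumptions made for illustration.

```python
import re

def missing_words(dictionary_words, sentences, frequency_ranked, limit=10000):
    """Dictionary words absent from the sentences, most frequent first."""
    # Collect every word form that occurs in the corpus (naive tokenization).
    seen = set()
    for sentence in sentences:
        seen.update(re.findall(r"[^\W\d_]+", sentence.lower()))
    # Words known to the dictionary but never used in a sentence.
    missing = {w.lower() for w in dictionary_words} - seen
    # Order by the frequency list, dropping words it does not contain.
    ranked = [w.lower() for w in frequency_ranked if w.lower() in missing]
    return ranked[:limit]

# Tiny made-up inputs standing in for the dictionary, corpus and frequency list.
result = missing_words(
    dictionary_words=["chat", "chien", "zèbre"],
    sentences=["Le chat dort."],
    frequency_ranked=["chien", "chat", "zèbre"],
)
```

Because only words present in the frequency list survive the ranking step, a short or flawed frequency list gives a short result, which is consistent with the remark about the Wortschatz list.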
sacredceltic
3 days ago
Thanks, that's nice, especially having the links to Tatoeba searches.
You can see there's an enormous amount of work to do, and it proves that Tatoeba's sentences sorely lack variety in terms of vocabulary.
wallebot
4 days ago
Mistakes in automatic language detection.
When I add a sentence, the flag assigned does not correspond to Spanish.
It doesn't bother me, but these examples may be useful for improving the language-recognition system.

When I added these sentences, the automatically assigned flag was wrong; it didn't recognize Spanish. I corrected it by hand.
Not Interlingua: http://tatoeba.org/spa/sentences/show/4877679
Not Hungarian: http://tatoeba.org/spa/sentences/show/4877646
Not Kotava: http://tatoeba.org/spa/sentences/show/4877652

I don't know whether the system learns when the automatically assigned flag is changed by the user.
Does it?

Thanks for your attention.
wallebot
4 days ago
Not German: http://tatoeba.org/spa/sentences/show/4877685
I think there is a problem with recognition. I suppose it's temporary.
sacredceltic
4 days ago
The system, alas, seems not to learn anymore...
sharptoothed
4 days ago
** Sentences & Translations Stats **

These stats have been updated.

http://tatoeba.j-langtools.com/transtop/

Various graphs have been updated, too:

http://tatoeba.j-langtools.com/graphs.html
CK
4 days ago - edited 4 days ago
As always, thanks for doing this.

Just for fun, I created a bar graph for the daily contributions for the last 4 months.

http://prntscr.com/9xm3kg

We had 5 days last month with over 3,000 contributions. It's been about 3 months since we've had days with over 3,000 contributions per day.
CK
4 days ago - edited 4 days ago
I wonder if other members feel that all these Wall posts with just a URL seem like spam.
I'd suggest that you delete them.
Or at least consolidate them all into one Wall post and explain what they are all about and why they are being posted.
pullnosemans
4 days ago
thanks, agreed.
al_ex_an_der
4 days ago
What they are all about and why they are being posted?
That's a really good question. Solve the riddle, Viktor.