Wall (6,756 threads)
Tips
Before asking a question, make sure to read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.

To continue the discussion from these threads:
https://tatoeba.org/en/wall/sho...#message_39333
https://tatoeba.org/en/wall/sho...#message_39361
https://tatoeba.org/en/wall/sho...#message_39380
https://tatoeba.org/en/wall/sho...#message_39395
Humans need variety and context in order to understand and learn language. A single word may have several meanings. It may be used with some words but not others. It may have a level of formality that makes it only suitable for certain situations. If these nuances are to be induced from a collection of sentences, the collection must be sufficiently diverse. In addition, the human mind needs constant stimulation. Duplicate or near-duplicate sentences create boredom, and a bored mind refuses to learn.
The set of sentences "They learned some Greek while they were on vacation", "Rima and Skura learned valuable life lessons the hard way", and "Our mother, we learned later, had been saving money in a separate account" demonstrates different kinds of things that can be learned (in addition to teaching additional vocabulary). By contrast, the set of sentences "<X> learned Berber", "<Y> learned Berber", and "<Z> learned Berber" tells you nothing about Berber other than that it can be learned, teaches you nothing about learning other than that it is a process that can be applied to Berber, and gives you no vivid picture that might help your mind retain the words or even simply brighten your day. If X, Y, and Z have different gender, number, and person, and the sentences are translated into a language where this affects the form of the equivalent of the word "learned", this set of sentences can show you how to conjugate the verb "learn" in the past tense, but it won't tell you anything else. It won't even tell you as much about verb conjugation as a verb chart will. A set of sentences in random order can't give you visual clues, such as placement on the page, that bring out the inherent patterns. And with no starting point, there's no suitable place to explain important points such as where the patterns hold and where they change or break down.
Sixteen years ago, there was no shortage of dictionaries, or verb charts, or descriptions of other aspects of grammar, either printed or in electronic form. But it was hard to find searchable collections of sample sentences that would demonstrate the usage of words and phrases. Trang founded what became Tatoeba in order to address her own frustration with that shortage. She opened her new project up to accept contributions from virtually anyone, with no barrier to entry other than the ability to work on an electronic device over the Internet using a free and anonymous account, very few restrictions other than a set of gradually evolving guidelines that were minimally enforced, and an interface that enabled correction of errors after they had been introduced instead of a moderated pipeline that would require sentences to be approved before going live. The website also put itself in the world of social media by providing the ability to leave comments on sentences and on a general "wall", and provided statistics on the number of contributions by language and by individual contributor.
This structure was both a blessing and a curse. It contained the potential for a wide variety of people around the world to collaborate in harmony to contribute an almost infinitely diverse set of utterances. It also contained the potential for individuals to antagonize each other while mass-producing colorless sentences of minimal value that were less valuable at Tatoeba than they would have been elsewhere.
Tatoeba is somewhere between these extremes of paradise and hell. There are many worthwhile corners of variety in the corpus for people who know how to look for it and are willing to expend the effort to extricate the jewels from the trash. The problem is that large-scale contribution of near-duplicate sentences makes this harder and harder to do. Since those sentences have similar content and are added to Tatoeba at the same time, they'll all show up in the same place in your results, and will crowd out everything else. I'm not saying that adding ten "<> learned some Berber" sentences is a problem, or that someone needs to panic about adding a sentence that might be similar to one that someone else has already added. But adding a hundred "<> learned some Berber" and a hundred "<> will learn some Berber" sentences, or going on a rampage of adding a "Ziri" copy of every "Tom" sentence you can find, is another matter.
Furthermore, the elimination of variety tends to propagate itself via translation, or via copycat or retaliatory behavior. In an unintentionally ironic post below (https://tatoeba.org/en/wall/sho...essage_39426), CK filled a screen with a near-duplicate series of near-duplicate sentence pairs (35 of them!) in an apparent attempt to bolster his assertion that eliminating variety in the use of proper nouns is a way to avoid lack of variety in other aspects of language. I think the preposterousness of this idea is self-evident: it's like claiming that preventing someone from learning history is going to make them a better math student. But it's especially worth pointing out that many of those duplicate sentences were added in reaction to the ubiquity of the name "Tom" throughout the corpus. So CK, or more broadly speaking, the general elimination of one aspect of variety, had a large hand in creating the problem in the first place.
It would be nice to introduce some mechanized way of mitigating the problems introduced by mechanized creation of sentences (including using one's mind like a machine, whether or not a computer is involved). But as I have already described, I don't think this can be done easily, quickly, or in a way that would approach any meaningful consensus. Instead, I want to explain to the people who are tempted to add near-duplicate sentences to Tatoeba on an industrial scale that they are probably better off doing the same thing elsewhere, and that there are easy ways to do it that they might not have thought of.
Let's say you want to collect sentences and audio containing Kabyle placenames for a GPS, or that you want to cover all the conjugated forms of Berber verbs that you can think of. It's not as though you have thousands of contributors in unison acting on a top-security project that requires millisecond response time. Is it not simply possible to write the sentences in a spreadsheet on your computer and put the audio files you record into a folder? Or in Google Docs or Google Drive? Or set up your own simple website? (A simple search for "easy free ways to set up a database on the web" brings up 924 million results.) They'll be easier to retrieve from such a place anyway. Why involve Tatoeba at all?
Choosing the best site to help you prepare a special project takes a little bit of imagination and a little bit of effort. So does contributing sentences with enough variety to be worth something to others at Tatoeba. Don't let that stop you.

Has an end user ever complained about near duplicates?
Has anyone ever said, "I'm not going to use Tatoeba's corpus because of the number of near duplicates"?
It seems like a huge drain on time and resources to care about them, when there are more serious problems. Near duplicates are not a problem that affects anyone; hate speech is. The worst thing that can result from near duplicates is frustration or boredom; the worst thing that can result from hate speech is genocide.

Other sites that use the sentences from here have all organized them in some way.
Here they aren't organized. Among random sentences, or while translating a list of 20 sentences, you quickly get bored if you find nothing but Lvl 1 sentences, or identical ones with just a different character.
Either a few things here should be easier to organize, perhaps with more precise search criteria, or we should agree on whether we need this much data rather than information.
Maybe it doesn't affect everyone, but it affects more people than I first thought.
And what you consider hate speech doesn't affect everyone either, though it probably affects some people, even many. Still, this is not the wall message about hate speech.

> Here they aren't organized. Among random sentences, or while translating a list of 20 sentences, you quickly get bored if you find nothing but Lvl 1 sentences, or identical ones with just a different character.
Have you tried using Tatominer?

Yes, I tried it a few times, but the problem was that the system did not recognize the words correctly. The Hungarian language uses a ton of word endings.

There are lemmatizers for Hungarian, but maybe they’re too slow.

Yes, quite sure there have been "end users" put off by the usefulness of the content considering the size. If you have nothing to add, that's fine but please give up on attempting to hijack the topic, at least.

> Yes, quite sure there have been "end users" put off by the usefulness of the content considering the size.
Can you name a single one?
> give up on attempting to hijack the topic
The relative unimportance of this issue compared to more serious ones, ones that actually keep people away from this site, is completely relevant. We spend more time talking about avoiding near duplicates than anything else. Why aren't we fixing more serious problems?

"The relative unimportance of this issue compared to more serious ones, ones that actually keep people away from this site, is completely relevant."
Every issue has its own wall post.
If it has no post, just start one, but do not try to derail a different issue conversation.
Thank you.
The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

> Can you name a single one?
MaxDailene.
Now can we move on to actually addressing the problem?

We've been addressing near duplicates for over a decade. I don't think they're going anywhere.
Clearly, a lot of people - not just me, who actually uses the names Tom and Mary most of the time - see much bigger and more serious problems than near-duplicates. Like the fact that the names Tom and Mary do not reflect their culture. And seeing this, they ignore the demand to eliminate near duplicates, which they never voted on. I support them doing this. It's unjust that people are pressured to use the names Tom and Mary, which is an extremely political and polarizing demand.
And when @AlanF_US ignores people who ask legitimate questions, what else should anyone do besides ignore his demands?

It's a bit bizarre that you yourself say that a community solution of this problem is willingly sabotaged by some people, including you, shortly after announcing that the problem won't be solved anyway. Well, maybe then do your share of solving the problem and we'll all be happily able to forget it for good?
Also, I don't see it anywhere written that the elimination of near duplicates must mean that all sentences must be with the names "Tom" and "Mary"; I rather read it the opposite way - don't add yet another "Tom" or "Mary" variant just to finish this weird collection.
And finally, unlike the topic of near duplicates - which is a meaningful topic regarding the evaluation of a linguistic corpus - bringing "extremely political and polarizing" narratives is just not a meaningful, constructive topic for Tatoeba. This is quite an important difference between the "demands" - one is approachable for everyone simply by using universal reasoning, the other requires a political lens, hence starting off by arbitrary, not-so-well-agreed-upon division of a community that really just could work on common principles.

It feels sad but thank you for taking the time to write about these issues. Call me a "technical solutionist", but I believe there are technical ways to at least mitigate the problem, in addition to convincing members to change their behavior. But as you said, these are costly so they need to be carefully thought through before implementing.
Just to make sure we are talking about the same thing: Basically, you cannot get useful results when searching because all the results look similar, so you need to scroll over and over in order to find useful sentences.
I saw interesting ideas here:
https://github.com/Tatoeba/tatoeba2/issues/2816
I'd like to suggest the following approaches, in order of estimated cost of development (from cheapest to most expensive):
- add a "number of words" search filter to easily exclude short sentences
- expose per-language (and per-user?) statistics about sentence "diversity", similar to what can be seen in the Github issue
- introduce a new smarter ranking algorithm that favors sentences based on their uniqueness, length or other criteria, using GDEX as an inspiration (https://www.sketchengine.eu/guide/gdex/). This however brings new political problems of how to decide on the criteria.
- cluster search results so that similar sentences are grouped, only display one sentence of each group, but allow clicking on a group to see all the hidden sentences
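
To make the first and last ideas on that list a bit more concrete, here is a rough sketch in Python. It is not how Tatoeba's search works; the function names are made up for illustration, and "near-duplicate" is assumed to mean "identical except for one token", which is only one possible definition.

```python
from collections import defaultdict

PUNCT = ".,!?\"'"


def tokenize(sentence):
    """Lowercase, split on whitespace, and drop surrounding punctuation."""
    tokens = (t.strip(PUNCT).lower() for t in sentence.split())
    return [t for t in tokens if t]


def at_least_n_words(sentences, n):
    """Hypothetical "number of words" filter: keep sentences with >= n tokens."""
    return [s for s in sentences if len(tokenize(s)) >= n]


def cluster_near_duplicates(sentences):
    """Group sentences that are identical except for a single token.

    Assumption: two sentences are near-duplicates when replacing one token
    with a wildcard makes them equal. Groups are merged transitively.
    """
    parent = list(range(len(sentences)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # The smallest index stays the root, so clusters keep input order.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}
    for idx, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        for pos in range(len(tokens)):
            key = tuple(tokens[:pos]) + ("<>",) + tuple(tokens[pos + 1:])
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    clusters = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        clusters[find(idx)].append(sentence)
    return list(clusters.values())
```

A search page could then show only the first sentence of each cluster and expand the rest on click. The transitive merging is a design choice: a chain of one-word changes ends up in a single group, which may or may not be what users want.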

@gillux Thank you for the update on these technical solutions. I hope we will find time to implement some of them in the future.
I would like to add to your list of features the possibility to "mute" a contributor in the settings: https://github.com/Tatoeba/tatoeba2/issues/2008

@gillux Now that we've had a chance to hold a comprehensive discussion of behavior, I'm happy to have us discuss technical solutions. Any of the ones that you and @lbdx mentioned would be helpful.
Until we are able to implement these technical approaches, people can use random search when looking up words or when finding candidate sentences to translate. One additional technical solution would be to make random search the default, rather than "relevance" (which favors shorter sentences and tends to show near-duplicates in clusters). I can imagine some resistance to this idea, since some users probably prefer seeing the shorter sentences even if they lack context, but it's something we can discuss.

I believe this is a search issue rather than a contribution issue.
Here's an example.
The word "content" has the following word senses (from Wiktionary):
A. satisfied
B. that which is contained
C. subject matter
D. the amount of material contained
E. mathematics: space contained by a polytope
As a thought experiment, suppose that the corpus contains 100 example sentences of each word sense. If the search results displayed all the A sentences before all the B sentences, and all the B sentences before all the C sentences, etc., then there's going to be a perceived lack of diversity, regardless of how diverse the corpus really is.

To be able to filter the results according to the senses of the searched word would indeed be a very nice feature. However, I sincerely doubt that we will be able to implement it for more than 400 languages anytime soon.

> If the search results displayed all the A sentences before all the B sentences, and all the B sentences before all the C sentences, etc., then there's going to be a perceived lack of diversity, regardless of how diverse the corpus really is.
Diversity in terms of word sense is a valid thing to consider, but it's the kind of criterion that someone would apply only if they were doing an analysis that required some thought. I was talking about the lack of diversity that jumps out at someone instantly because their search results are all identical except for variation in a single respect (such as <placename> in sentences of the form "I drove to <placename>").
Also, sentences with sense A, B, C, etc. will only end up clustered together in the search results if they share a characteristic with the criterion being used for the sort (for instance, the user is searching by sentence creation date, and sentences with sense A all happened to be added before sentences with sense B). And sentences with sense A will only crowd sentences with sense B off the page if there are lots of sentences with sense A. For these reasons, I think it's unlikely that there are many pockets of "hidden sense diversity" that can be found in the tail end of search results but are not visible closer to the beginning.
It's true that there may be more diversity in the corpus than is visible on a single page of search results. However, we can't expect all users to either scroll for an indefinitely long period of time or use sorting criteria to increase the variety at the top of the search results.

> I want to explain to the people who are tempted to add near-duplicate sentences to Tatoeba on an industrial scale that they are probably better off doing the same thing elsewhere
@AlanF_US Thank you for taking the time to talk to the few Tatoebans involved. It looks like you've managed to convince them to reduce the volume of their contributions (at least temporarily). This is really good news.
Some of the larger contributors may be wondering if the sentences they have added recently are diverse enough. You can check the table at the bottom of https://colab.research.google.c...P8?usp=sharing that gives a lexical diversity score to the current month's contributions. The measure used is called MTLD. It is recognized as reliable because it does not vary with the length of the texts analyzed. The higher the MTLD score, the more diverse the sentences added by a contributor. If your score is below 25, your sentences probably contain a high proportion of near-duplicates.
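
For anyone curious what the score measures, here is a minimal sketch of MTLD as it is usually described. I am assuming the conventional type-token-ratio threshold of 0.72 and a naive whitespace tokenizer; the notebook's own tokenization and settings may differ, and the function names are only illustrative.

```python
def _mtld_one_direction(tokens, threshold=0.72):
    """Count how many "factors" the token stream breaks into.

    A factor ends each time the running type-token ratio (TTR) falls to the
    threshold or below; the leftover tokens contribute a partial factor.
    """
    factors = 0.0
    types = set()
    count = 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            factors += 1
            types.clear()
            count = 0
    if count > 0:
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    # With no repetition at all there are no factors: treat as unbounded.
    return len(tokens) / factors if factors > 0 else float("inf")


def mtld(tokens, threshold=0.72):
    """Average of the forward and backward passes over the tokens."""
    tokens = list(tokens)
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(tokens[::-1], threshold)
    return (forward + backward) / 2


def mtld_of_sentences(sentences):
    """Assumption: lowercase everything and split on whitespace."""
    return mtld(" ".join(sentences).lower().split())
```

The score is roughly the average number of words you can read before the vocabulary starts repeating heavily, which is why a run of "<> learned Berber" sentences scores low.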

> It looks like you've managed to convince them to reduce the volume of their contributions (at least temporarily). This is really good news
I'm happy to hear that.
The idea of displaying the MTLD for a contributor is interesting. Would it be feasible to give someone the ability to do one of the following?
(1) see the figures for the top X contributors, where X is some arbitrary number other than the one you're already using
(2) type in a username to see the figure for that contributor alone

For now, the online tatowatch notebook is read-only but those who would like to edit and run it can copy it to their Google drive or download it to their own machine.
The idea of this project is to help the moderation team to keep an eye on new contributions. If any of you have feature suggestions or want to participate in this project, please feel free to contact me in private message.

ŚŚŚšŚ ŚŚŚŚšŚ ŚŚąŚŚšŚŚȘ, ŚŚ ŚŚ Ś ŚŚŚšŚŚ§ 4 ŚŚ©Ś€ŚŚŚ ŚŚȘŚšŚŚŚ ŚŚŚ Ś©Ś ŚŚ کڧ ŚŚŚ©ŚȘŚŚ© ŚŚąŚŚšŚŚȘ, ŚŚ ŚĄŚŚŚŚȘ Ś©Ś ŚŚŚĄŚš ŚŚŚ Ś ŚŚąŚŚŚ§Ś ŚŚŚŚ©Ś ŚŚȘ ŚŚ€Ś ŚŚȘ ŚŚŚšŚŚ©ŚŚȘ ŚŚŚąŚšŚŚȘ ŚŚ Ś ŚŚ ŚŚŚŚą ŚŚŚ ŚŚȘŚšŚŚ ŚŚȘ 4 ŚŚŚŚŚąŚŚȘ ŚŚ ŚŚȘŚšŚŚȘ.
ŚŚ©ŚŚ ŚŚ ŚŚŚŚąŚ ŚŚŚš ŚŚŚŚŚ ŚŚȘŚȘ Ś©Ś ŚŚȘŚŁ ŚŚ ŚĄŚŚŚš ŚŚȘ Ś ŚŚ©Ś ŚŚȘŚšŚŚŚ ŚŚąŚȘ ŚąŚȘŚ.
Thanks!

I can help. What are the sentences?

The first:
Japanese Indices
Link to the source in the code:
https://github.com/Tatoeba/tato...loads.ctp#L428
The second:
Contains the equivalent of the "B lines" in the Tanaka Corpus file distributed by Jim Breen. See <1>this page</1> for the format. Each entry is associated with a pair of Japanese/English sentences. {sentenceId} refers to the id of the Japanese sentence. {meaningId} refers to the id of the English sentence.
Link to the source in the code:
https://github.com/Tatoeba/tato...loads.ctp#L438
ŚŚ© ŚąŚŚ Ś©ŚȘŚ ŚŚŚšŚŚŚŚȘ ŚąŚ ŚŚĄŚŚšŚŚ Ś§ŚŠŚȘ ŚŚŚŚŚŚŚ ŚŚŚ ŚŚ Ś ŚŚŚŚ©ŚŚ ŚŚĄŚȘŚŚš ŚŚȘŚ, ŚŚąŚŚš ŚŚŚ ŚŚ© Ś©ŚŚŚȘ Ś©Ś ŚŚŚŚŚ Ś©Ś€ŚŚȘ ŚŚŚŚ ŚŚŚŚŚŚšŚŚŚȘ Ś©ŚŚ Ś ŚŚ ŚŚŠŚŚŚ ŚŚŚŠŚŚ Ś©ŚŚ ŚĄŚŚŚŚŚŚ ŚŚŚŚ€Ś ŚŚȘŚšŚŚŚ Ś©ŚŚŚ ŚŚąŚŚšŚŚȘ (ŚŚ© ŚÖŸ57 Ś©Ś€ŚŚȘ ŚŚŚŚ, ŚŚŚŚ ŚŚŚ€ŚŚą ŚŚŚšŚ ŚŚŚ€Ś§ŚĄ Ś©Ś ŚŚ€ŚšŚŚŚ§Ś).
Thanks :)

Japanese N-grams
日本語 N-gram Searches of the Tatoeba Corpus - Top 4,000
http://tatoeba.ueuo.com/jpn_ngrams-1.html
The following is another set of longer N-grams that are "sentence endings."
日本語 ・ N-grams ending with a sentence boundary
http://tatoeba.ueuo.com/jpn_7-g...e_endings.html
These are pages I put together in 2016 based on research done by Susumu Yata.
The "advanced search" links will return whatever is currently in the tatoeba.org database.

We now have over 200 Swedish sentences with audio.
The list, showing only English translations
https://tatoeba.org/en/sentence...how/171175/eng
The list, showing all translations
https://tatoeba.org/en/sentence...how/171175/und
ull joined our project on February 8, 2023.

> 11,111,107 sentences
16:20 UTC
February 12, 2023
Soon, we'll hit 11,111,111.
An English-speaking Japanese dog might say the following.
ワン、 ワン、
ワン、 ワン、 ワン、
ワン、 ワン、 ワン。

That's funny.

đ€

Is there a list of sentences by difficulty? Let's say I wanted to order the 500 easiest sentences in Korean, how would I do this?

The Tatoeba collection in any language is not fundamentally sorted. You can search through it with a number of criteria (length, creation date), but none of them pertain to difficulty. Assuming a Tatoeba member could come up with a workable measure of difficulty, they could choose to add tags to particular sentences indicating their perceived difficulty, or they could add a list ("500 easiest sentences in Korean") and assign sentences to it. But they couldn't impose a sorting order, such as easiest to hardest, on a set of sentences.
However, Tatoeba makes its sentences freely available, and I know of at least one site, clozemaster.com, that takes these sentences and groups them into categories by frequency of the least common word within a sentence ("100 Most Common", "500 Most Common", and so on). The rareness of the words in a sentence is not the only measure of its difficulty, but it's probably the easiest measure to calculate in that regard.
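
As a rough illustration of that "least common word" idea, here is a short Python sketch. It is not Clozemaster's actual method: I am assuming word frequencies are counted from the sentence collection itself (a real tool would more likely use an external frequency list), and the function names are invented for the example.

```python
from collections import Counter

PUNCT = ".,!?\"'"


def tokenize(sentence):
    """Lowercase, split on whitespace, and drop surrounding punctuation."""
    tokens = (t.strip(PUNCT).lower() for t in sentence.split())
    return [t for t in tokens if t]


def rank_by_rarest_word(sentences):
    """Sort sentences so those made only of common words come first.

    Each word gets a frequency rank (1 = most common). A sentence is scored
    by the rank of its rarest word, so one rare word makes it "hard".
    """
    freq = Counter(tok for s in sentences for tok in tokenize(s))
    rank = {word: r for r, (word, _) in enumerate(freq.most_common(), start=1)}

    def rarest_word_rank(sentence):
        return max((rank[t] for t in tokenize(sentence)), default=0)

    return sorted(sentences, key=rarest_word_rank)
```

Taking the first 500 items of the result would give one possible "500 easiest" list. Note that whitespace splitting is a poor fit for Korean, where particles attach to words, so a morphological tokenizer would be needed in practice.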

There are two ways to grade difficulty: by school-grade reading level or by language exam level. For example in Korean, if you passed TOPIK level 3, should you be expected to know this sentence or not?
On another note, there's too many lists. I can't possibly see if someone did create a list I'm interested in, there's thousands of them

Clozemaster sorted the sentences in one way using word frequency lists.
You can do the same by writing some code or asking someone to do the work for you.
In this corpus you can't really sort things in the way that best suits your criteria.

For English sentences sorted by vocabulary levels, you can try these lists.
CK's OGTE-Level Lists
http://goo.gl/BnPz6h
Created in 2017.

Reading between the lines:
CK probably used the Online Graded Text Editor (OGTE), a tool that assigns English text a level based on the vocabulary it contains, in order to classify English sentences and assign them to lists that he produced on Tatoeba. Naturally, this doesn't suit your needs directly, since you want Korean text, not English. However, you could presumably look for sentences within these lists that contain Korean translations, assuming that the classification of vocabulary difficulty for the Korean sentences will match those for the English sentences. Alternatively, if you can find such a classification tool for Korean, know how to program, and have a lot of time and motivation, you can do the same kind of thing for Korean that CK did for English. That's a big "if".
As for looking through the lists on Tatoeba, it's true that there are a lot, but you can search through the titles of the lists. I did a search for "TOPIK" and for "Korean", but didn't find anything useful.
If you want to use Tatoeba-sourced sentences, it looks like Clozemaster is probably your best alternative.

> CK probably used the Online Graded Text Editor (OGTE), a tool that assigns English text a level based on the vocabulary
That's right, as the 2nd sentence on that page says, ...
These lists were created based only on vocabulary, using er-central.com/ogte, so some of the sentences will use grammar and idioms that are above the level of the lists.

Since I already had a list of Korean sentences sorted by word frequency, I uploaded the first 500: https://tatoeba.org/en/sentence...&direction=asc
Each sentence contains at least one word that doesn't appear in any of the 499 other sentences, so they're definitely not the 500 easiest ones (which likely contain a lot of repetition) but maybe you'll find the list useful nonetheless.

On Anki there is a deck containing sentences taken from Tatoeba (without attribution) and sorted: https://ankiweb.net/shared/info/241481292
The same person has made several similar decks and doesn't mention Tatoeba, but the huge number of Tom and Mary sentences gives the source away. https://frequencylists.blogspot.com/

I also believe she is using machine translation. I queried a few short Finnish ones with native Finns, and the English translations were bad.

Maybe. However, I have found a few typos in Danish sentences through a deck like that.
I suspect that's due more to the fact that Tatoeba isn't always high quality, especially for older sentences and smaller languages. But maybe there are machine translations in there too; I don't know about that.

> I also believe she is using machine translation. I queried a few short Finnish ones with native Finns, and the English translations were bad.
Are these bad sentences also present in the Tatoeba database? If that's the case, then is anyone doing anything to remove or fix them?

https://tatoeba.org/es/tags/sho...th_tag/561/fin - Finnish sentences that should be edited. Some have been on the list for a long time. Maybe I should request editing rights for them myself, because nobody else seems to be fixing them.
I don't know whether the sentences mentioned are marked with the tag, though.

A list with over 2,000 Japanese sentences that don't have kanji
https://tatoeba.org/en/sentences_lists/show/170911
All sentences on this list are owned by native Japanese speakers.

Maybe Kanjis are out of fashion nowadays with younger natives…

With respect, I beg to differ, @sacredceltic. It is true that after World War II, the Japanese Ministry of Education curtailed the use of kanji to some degree, but a Japanese high school graduate will have learnt the 2,136 jōyō kanji and will generally use these appropriately.
https://en.wikipedia.org/wiki/List_of_jōyō_kanji
I do not speak Japanese, but I picked one sentence at random from @CK’s list (https://tatoeba.org/en/sentence...s/show/170911) and entered it into Google Translate:
#11011895 — posted by @small_snow
バカじゃないの？
Google translated it as « Êtes-vous stupide ? » Reversing the direction of translation (that is, going from French to Japanese) reproduced the original sentence, written « バカじゃないの？ » or, in romaji, « Bakajanaino? »
While the majority of sentences in Japanese will likely include one or more kanji, I think that @CK may just have intended to produce an interesting list of sentences that correctly use no kanji *because no kanji are needed* to express the words constituting those sentences, currently numbering 2,351.
Rather than somehow labelling these 2,351 sentences as being informal or less literary Japanese, I think the intent may just have been to help beginning students of Japanese, who will have mastered the hiragana syllabary but who have not yet learnt many of the jōyō kanji, a process which takes many years in a typical Japanese person's education.
Kind regards,
Erik (Objectivesea)

> ... I think that @CK may just have intended to produce an interesting list of sentences that correctly use no kanji *because no kanji are needed* to express the words constituting those sentences ...
Yes, that's right.
You can see details here with lists that also introduce one new kanji per list.
https://tatoeba.org/en/user/profile/CJ

Is it only for me that Tatoeba has been out of order so often in the last two weeks?

No, I've had the same problem.

From what I've heard it's to prevent a server overload during periods of high activity. Until recently that almost never happened to me though.

I've hardly been able to load a page here for the past three hours.

No, it's happening with me too. I'm glad to know it's not just me. I was wondering if it was just me.

An overeager crawler was making a bunch of expensive requests that kept overloading the server. I've now changed our configuration to block that crawler; hopefully that will improve the situation.
If you keep getting the "Tatoeba is currently unavailable." message, let us know.

Thanks, Yorwba!

Thanks!

Thank you.

Thank you so much!

Thank you very much!