Wall (6,677 threads)

lbdx, November 6, 2022 at 9:56:27 AM UTC

** Zirification of the Tatoeba Corpus **

Since the beginning of this year, almost 20,000 original English sentences containing the first name "Ziri" have entered the Tatoeba Corpus. So far, fewer than eight percent of these sentences have at least one translation from another user. That is less than a third of the rate for their "Tom" counterparts.

Given the extent of the "zirification" phenomenon, could the few contributors adding and translating these sentences tell us more about their goals? How do other members of the Tatoeba community feel about this trend?

maaster, November 6, 2022 at 10:17:58 AM UTC (edited November 6, 2022 at 10:36:40 AM UTC)

As for me, I don't like Sami, I don't like Tom, and I don't like Ziri either, and of course I don't like the sentences with them.
Short sentences could be added with personal pronouns; longer sentences could be added with different names as well. There would be less chance of having the same sentences with different names, and it wouldn't be as boring that way.
However, Tom already has a cult following. (But that doesn't necessarily mean the name Tom should occur in 60-80% of the corpus.)

Sometimes I translate sentences with Tom or Ziri if I find them interesting, but I still don't like them.

And I'm not for double standards. There are also too many sentences with Tom.
New contributors might think it's a joke or some kind of irony.

We shouldn't add millions more sentences like "T is bigger than M", "T is younger than his elder brother", "T is an astronaut". These sentences don't really add anything to language learners' knowledge. I can only repeat myself.
And everything goes on as usual.

I can see that many users intentionally avoid sentences with those characters.

sundown, November 6, 2022 at 10:56:10 AM UTC

> Short sentences could be added with personal pronouns; longer sentences could be added with different names as well.

Often when I add a sentence with a personal pronoun – usually a translation – the same sentence, more or less, will invariably appear a bit later with the pronoun substituted with "Tom".

morbrorper, November 7, 2022 at 7:41:53 AM UTC

My problem with sentences with plain "he" is that they will too often be translated into Spanish and Italian using their respective redundant "él" and "lui" pronouns that would probably not be used in an original sentence. Using names works against that translation effect.

hecko, November 7, 2022 at 12:31:26 PM UTC

that's the translator's fault in my opinion, and it should be brought to their attention and corrected when possible

Shishir, November 6, 2022 at 11:29:45 AM UTC (edited November 6, 2022 at 11:30:19 AM UTC)

I don't like always having the same name over and over again, and it doesn't matter whether the name is Ziri, Ali, Layla, Mary, Sami or Tom, so I tend to avoid them unless they're sentences in a language I don't know well enough to be choosy, or the sentence is particularly interesting.

But given that they will be creating near duplicates anyway, I would welcome it if at least the last letter of the new name was a different one from the previous widespread names (in some languages first names are also declined and the declension depends on the last letter). Also, the name Ziri is kind of confusing because, not being such a famous name, it's not so obvious it is a male name.

Aiji, November 6, 2022 at 12:33:28 PM UTC

I stopped translating sentences with names a long time ago, and consequently translate far fewer English sentences than I used to, since I quickly find the search results boring or repetitive (I often find myself scrolling past a whole page of results, although I could translate the sentences I see).
Whether it's Tom or Ziri or Whatever, I think that the added value of many (most?) of these sentences is near zero. Many of them were not added first, they're often added massively, and in a conscious manner. However, "freedom of whatever" will always be opposed to any sensible argument, for example, helping the community before helping yourself (never one's fault). So I let them be, ignore them, and hope they'll rest silently in a dark corner of Tatoeba.

I'd like to encourage translators to set their search parameters to ignore the names overly used in the corpus (Tom, Mary, Ziri, etc. or their version in the language they translate). They will hopefully realize that many sentences already exist without names, and maybe find their time translating more enjoyable than before.

I'd like to encourage people who only translate the original sentences they added themselves to have a look at the contributions of others.

sundown, November 8, 2022 at 7:31:19 AM UTC

> I'd like to encourage translators to set their search parameters to ignore the names overly used in the corpus (Tom, Mary,

I translate mainly German sentences here. If I followed your advice, the number of sentences I'd have to translate would be significantly curtailed, because the most prolific authors of German sentences here are enthusiastic users of those names.

Yorwba, November 9, 2022 at 7:50:10 PM UTC

It varies a lot between users. Here's a list of the top 20 users by number of German sentences without English translations together with the percentage of Tom-and-Mary-style sentences:

Esperantostern: 48176 (3%)
al_ex_an_der: 34612 (4%)
Pfirsichbaeumchen: 27569 (32%)
felix63: 9343 (23%)
list: 9253 (16%)
maaster: 8871 (11%)
manese: 5533 (12%)
Yorwba: 5268 (3%)
MUIRIEL: 5191 (2%)
Zaghawa: 4987 (7%)
raggione: 4623 (13%)
Sudajaengi: 4063 (5%)
brauchinet: 3760 (6%)
freddy1: 3308 (10%)
xtofu80: 3187 (2%)
Tamy: 3150 (5%)
Adelpa: 2446 (10%)
Manfredo: 2318 (3%)
mraz: 1510 (14%)
PeterR: 1443 (2%)

Maybe focusing on sentences by specific users is more satisfying than searching with excluded keywords.

Pfirsichbaeumchen, November 10, 2022 at 8:33:52 AM UTC (edited November 10, 2022 at 8:34:38 AM UTC)

Which way to the pillory, please? 😄

mramosch, November 18, 2022 at 10:03:43 PM UTC

And the winner is… ;-)

sundown, November 10, 2022 at 8:39:19 PM UTC

@Yorwba

It's interesting how perception differs from reality; I would've bet that some of those percentages were higher than they are. So, thanks for that correction.

Ergulis, November 6, 2022 at 3:41:48 PM UTC

I believe that sentences with "Ziri" are created for the author's own personal reasons, and I respect that. However, I don't translate them and never will, as I have no personal relation to them. On the other hand, Tom is a "legend" for me here. Even a colleague of mine is Tom, so I often draw inspiration from real life. As for the imaginary sentences, they can be amusing too.
There will surely come a time when sentences about Tom become rare, once people run out of new ideas.
Some people might find the stories about Tom interesting for learning and see them as a thematic unit.

shekitten, November 7, 2022 at 8:43:19 PM UTC (edited November 7, 2022 at 9:37:05 PM UTC)

> However, I don't translate them and never will, as I have no personal relation to them. On the other hand, Tom is a "legend" for me here. Even a colleague of mine is Tom, so I often draw inspiration from real life.

Everyone can't have names that reflect the culture they personally live in.* Imagine, though, if they only translated sentences with the ones that did. We'd end up with a bunch of isolated communities barely connected, and it would defeat the purpose of the site. On the other hand, making everyone use the name "Tom" seems unfair and culturally hegemonic.

Quite a lot of the Ziri sentences are not near-duplicates, also. Some are, some aren't.

* EDIT: Well, I suppose everyone can have names reflecting the culture they personally live in; it's simply a matter of adding the sentences.

Polgar1, November 8, 2022 at 11:13:30 AM UTC

The thing is, not everyone cares about reflecting their own culture on a linguistic corpus, and I think this is something that should also be respected. There aren't many projects nowadays that don't force a narrative upon you and don't start off with some progressivism-centered code of conduct shoved onto you.

I, for one, couldn't care for the life of me if there are Attila, Gyula, Emese sentences on Tatoeba or not. I'm gonna translate sentences that I comprehend and for me, the actually relevant issue is the painful lack of Romanian (or even Polish) content of any form. I want THIS issue to be represented, if anything, not obsess over the "name sentences".

shekitten, November 8, 2022 at 3:31:49 PM UTC (edited November 8, 2022 at 3:33:34 PM UTC)

I didn't try to force you to include your culture's names in sentences. No one did. This sort of bizarre far-right rant complaining about having "progressivism forced on me" in response to a simple statement of fact ("It is possible to have everyone's cultures represented") defines the sort of reactionary movement that it is toxic to accommodate, because you are not acting rationally based on anything other than your feeling of having progressivism forced on you, reinforced by ruthless and greedy politicians.

Polgar1, November 8, 2022 at 4:10:40 PM UTC

I think if anything is bizarre here, it's that you already make this out to be a rant, toxic reactionary movement and whatnot. Are you sure _you_ aren't acting upon your own prejudices and self-defense mechanism when you hear the word "progressive" with negative connotations?

I stated a fact, namely that Tatoeba is inclusive in a sense that doesn't force a (typically progressivist) narrative onto you. This means that you are free (everyone is) to respect if someone uses their own personal inspiration for translating Tom sentences in particular while feeling no similar attachment to Ziri sentences. There is no reason to talk about "cultural hegemony" and making anyone use Tom sentences.

Also, again, in order to talk about progressivism and reactionary movements, we seem to miss the fact that *there is just not enough content for many languages to begin with* for this to matter at all. The fact that Tatoeba allows us to work on a common goal regardless of our world views should empower us to work on the common goals, not to pick on each other's motives.

shekitten, November 8, 2022 at 4:23:29 PM UTC (edited November 8, 2022 at 4:24:27 PM UTC)

Cultural hegemony has nothing to do with "progressivism," as evidenced by the number of right-wing regimes throughout the world who use opposition to it as an excuse for treating minorities and women poorly. Iran, for example, or Russia, or Israel.

Cultural hegemony is, however, a descriptive term for what currently exists on Tatoeba: mostly sentences with Christian names, English-centered (reflecting the state of the world partly), with English and Christian-named sentences most likely to be translated. Part of this reflects the state of the world at the moment, but the state of the world does not reflect the ideal state of the corpus.

Polgar1, November 8, 2022 at 4:33:04 PM UTC

Judging an open linguistic corpus with broad goals based on perceived "cultural hegemony" is, however, a progressivist *narrative*. One that you are *allowed* to represent here but not allowed to impose. The "ideal" state, if there even is one (this is not a descriptive term anymore, mind you), is to have loads of high-quality sentences (both literary and colloquial) in all languages. Any well-written or well-translated sentence adds to Tatoeba; moreover, that's the most important contribution anybody can make here. Whether it's with "Tom" or with "Ziri" is really a nuance compared to that.

shekitten, November 8, 2022 at 4:45:24 PM UTC (edited November 8, 2022 at 4:50:08 PM UTC)

But again, the fake victim narrative: nobody is imposing anything on anyone. This is how I know you're a reactionary despite your denials. You believe that "progressives" are forcing things on you that they never forced on you, because this narrative dominates your whole existence.

Polgar1, November 8, 2022 at 6:29:16 PM UTC

This is neither "fake" nor "victim": I'm just giving you hints that you should respect people's preferences for reasons they add sentences for "Tom" and not for "Ziri", instead of giving lectures about "cultural hegemony" that is *not Tatoeba's concern*. If you can do that, it's all fine. I give you the benefit of the doubt despite switching to this diversive rhetorics of reactionaries.

shekitten, November 14, 2022 at 5:58:59 PM UTC (edited November 14, 2022 at 5:59:58 PM UTC)

Tatoeba is for language learners, and people learning languages need to learn about at least one other culture, which includes different names among other things. If you have a problem with that, you are the one giving lectures that aren't Tatoeba's concern.

Also, don't assume the names "Tom" and "Mary" to be politically or religiously neutral, because they aren't. They express a political and religious view, and I am simply providing a contrary view.

Cabo, November 17, 2022 at 10:42:08 PM UTC

"Tatoeba is for language learners, and people learning languages need to learn about at least one other culture, which includes different names among other things."
Names are like wildcard characters.
Their usage is OK in moderation I think.
Couldn't care less what is the name if the sentence gives me the information I'm looking for.
Everybody who wants to write a name will write a name, and anybody who asks someone to add some Polish or other names may well get those names.

You don't have to tell people what they need. People will find what they need.
And I don't need hundreds of thousands of short sentences with the same name, so I don't translate them. Others who want to will, and those who don't want to but do it anyway are only doing themselves harm...

Still, using a different name like Ziri and adding thousands of simple sentences (may) possibly creates duplicates.

DJ_Saidez, November 17, 2022 at 11:46:37 PM UTC

So, as you mentioned, the issue is that only simple sentences are being added, which are easy to translate but also easy to duplicate.
But basically each contributor is free to do as they wish, within the rules of Tatoeba.

Cabo, November 18, 2022 at 9:03:32 AM UTC

"Still, using a different name like Ziri and adding thousands of simple sentences (may) possibly creates duplicates."
Minds are alike.
When each newbie writes a sentence with different names, we will have many duplicates.
Hello, Tom. Hello, Mary. Hello, Greg...

The main thing I want to say is missed again.
"Couldn't care less what is the name if the sentence gives me the information I'm looking for."

https://tatoeba.org/en/wall/sho...#message_39161
"Many of the members don't know that all what needs the corpus is information, not data."
The problem is, adding more names and writing simple sentences with them will create duplicates and by that it will create data, not information.

sacredceltic, November 25, 2022 at 1:20:53 PM UTC (edited November 25, 2022 at 8:58:36 PM UTC)

My twopence:

1) Not all names are equal. To me, “Ziri” is not immediately identifiable as a person’s first name. I have never met anybody by that name, and I had no idea it was a first name before. To me, it sounded more like a brand name for some food, or for fast fashion, such as “Oxo” or “Zara”.
That’s where reading an unknown foreign first name becomes very difficult, because you don’t know what you’re reading about, and it makes interpretation completely random.

What’s the meaning of “Oxo just opened a new outlet”?!? It depends on the nature of “Oxo”… Person? River? Brand?
There are more than 6000 languages in the world, each having hundreds of first names. One can’t expect anybody to identify them all as such, and probably hundreds of them are already commercial brand names…

So it’s highly preferable to use known first names relevant to the language we’re translating into, in order to make them more identifiable. I’m sorry for those who have the rarest first names and who will subsequently feel discriminated against (because everybody wants to feel somehow discriminated against, nowadays…), but Tatoeba is a sentence collection, not a name collection.

2) Yes, not all language learners are willing to learn the associated cultures, but language and culture are one. The culture seeps through the language, even if you try hard to ignore it…

And yes, some languages decline names, and that doesn’t work well with foreign names, in general…

Polgar1, November 23, 2022 at 11:15:26 PM UTC

Even the statement that somebody learning a language is somebody learning a culture, is debatable. One can learn a language for business reasons; out of linguistic interest, as an intermediary language; and the list could probably go on, it's just a matter of perspective.

And anyway, if somebody translates "Tom" because they can relate to that name, or because it's sort of a running Tatoeba meme, or because it's short to type and they are used to it, *that's just fine*. It's not politically charged in the way your sentiment is.

Once again - you are free to provide a view in the form of adding content. I don't think political expressions should trump usefulness for learning - but what I'm absolutely sure of is that Tatoeba is not a place where influencing others' decisions on a predominantly political basis is the way we do business with each other. And I'd like it to stay that way. I think this still leaves a lot of space for expression for you, and if you are dissatisfied with this perspective, I'm sure you can find communities where everything is subject to politics.

PS: since the email notifications don't get updated, please mind what you write before submitting; it might make a highly different impression from the eventual message.

Cangarejo, November 6, 2022 at 5:34:20 PM UTC

Here's a random name generator.

https://www.behindthename.com/random/

hecko, November 14, 2022 at 4:00:11 PM UTC

this list of international names may also be useful, though in practice it seems most of the names are european https://mixedname.com/top_names

one may counteract that by picking a specific non-european language from the main site https://mixedname.com/

shekitten, November 14, 2022 at 5:56:09 PM UTC

I think ideally people add names from their own culture, if they want to, and others don't just ignore those sentences but translate them where possible.

I think there's nothing wrong with Tom and Mary sentences per se, but obviously those names present a narrow view of the world and don't do much to teach the cultures behind many languages - hence OsoHombre and Amastan adding their own, and me sometimes doing the same, and others. I think maaster has also added some Hungarian names before.

hecko, November 14, 2022 at 5:58:59 PM UTC

,,hm
so far i've been avoiding names in my sentences but i might just start doing that too

Pfirsichbaeumchen, November 15, 2022 at 12:43:10 PM UTC

You could use Polish names in your sentences. They're underrepresented. :)

Thanuir, November 15, 2022 at 9:36:35 AM UTC

I have also used Finnish names.

DJ_Saidez, November 6, 2022 at 8:02:10 PM UTC

I think it has to do with the view of seeing Tatoeba sentences as data and nothing other than that, and looking at it from a computational linguist's perspective, and wanting to get data as efficiency as possible.

In that case, something that could be useful to prevent the near-duplicates that CK wants to avoid (the reasoning behind his wildcards), if we can find a convenient way to execute it, is to have some software that lets you select proper nouns (names, city names), so that for generic grammar structures for beginners, you can link translations that might be attached to near-duplicates, and you can focus more on learning the grammar structure.

Unless the information pertains to that specific person or place only, for example
"Lionel Messi won the 2021 Ballon d'Or, but I think Lewandowski deserved to win that year".
It wouldn't make sense to have a generic Tom Jackson winning that specific award for that specific year, but rather is a good way to know how to express that knowledge in a given language.

Besides, generic textbook sentences are not completely what we want anyways. Breaking away from this habit and trying to contribute more unique, complex, interesting sentences will greatly reduce the need for avoiding near-duplicates.

Cabo, November 6, 2022 at 9:19:05 PM UTC (edited November 6, 2022 at 9:19:23 PM UTC)

THIS >> "I think it has to do with the view of seeing Tatoeba sentences as data and nothing other than that, and looking at it from a computational linguist's perspective, and wanting to get data as efficiency as possible."

I already said in another message somewhere on the Wall to a person that he cannot differentiate between data and information (while he claimed to be a programmer).
And I see this happening all of the time.
Many of the members don't know that all what needs the corpus is information, not data.

xorgy, November 7, 2022 at 2:00:12 AM UTC (edited November 7, 2022 at 2:03:00 AM UTC)

Proper nouns are not interchangeable, and they are bound to more information than “there's a person or place called this”. Tom is an unambiguously masculine name, for example (so in many languages it cannot be substituted freely for a feminine one), and in other sentences you might see the proper noun ‘God’, which is often proverbial, part of a set phrase, or sometimes has special grammar.

From the computational linguistics perspective, particular proper nouns have more meaning than simply referring to some place or someone. Tatoeba is rarely, if ever, used on its own; and I imagine the lack of richness in proper nouns makes it even less adequate to this end.

P.S. I only opened the Wall to look for information about the Tom phenomenon; lo and behold it is an active topic.

sundown, November 8, 2022 at 7:22:00 AM UTC (edited November 8, 2022 at 7:22:59 AM UTC)

> the near-duplicates that CK wants to avoid

He seems to me to have been one of the most active producers of so-called near-duplicates here.

https://tatoeba.org/en/wall/sho...#message_39154

Thanuir, November 7, 2022 at 7:56:55 AM UTC (edited November 7, 2022 at 8:46:29 AM UTC)

Using names every now and then is better than not using names at all, or using them constantly.

Varied use of names is better than one-sided use of names.

So it is better to have many different names in use than only two (Tom and Ziri in this example), but using two names is better than using only one (Tom in this example).

It is also good if the names come from different cultures, and if there is diversity with respect to gender as well.

AlanF_US, November 7, 2022 at 2:43:24 PM UTC (edited November 7, 2022 at 7:18:25 PM UTC)

The Tatoeba Project is effectively two things: a collection of sentences, and a social experiment regarding what happens when people are invited to contribute sentences with almost no restrictions. The result of the experiment is that the collection ends up looking like the sentences contributed by the people who are the most determined to add the greatest number of sentences. This "game" can be "won" by relying on the addition of simple, repetitive sentence content (including a lack of variety regarding personal names) and automation. It actually should be no surprise that whoever has the traits of personality/philosophy that would lead them to do this in the first place would resist any pleas to change in the absence of any incentive to do so (for instance, to avoid having their accounts suspended or their sentences removed, something that would be extremely inconsistent with the way Trang has run the project so far).

What can the rest of the community do about the relative lack of variety of the corpus as a whole? Not much given its scale. Some statistics regarding the occurrences of sentences with certain proper names in our English corpus:

Masha: 10
Ivan: 27
James: 107
Thomas: 207
Jim: 297
John: 9,744
Ziri: 33,049
Mary: 158,505
Tom: 443,233

Total number of English sentences: 1,656,118 as of the last time sharptoothed counted them

I include "James" and "Jim" (its most common nickname) because "James" is the most common birth name in the US (source: [1]), more than twice as common as "Thomas". I include "Masha" and "Ivan" because they are names whose form changes according to case in Slavic languages, something that learners of these languages need to learn (as Shishir pointed out).

"Tom" occurs in more than one out of four English sentences. That's mind-blowing. "Ziri" occurs in about 2% of English sentences. I can't imagine any concerted attempt to add a greater variety of proper names having much of an effect on those ratios, especially if the Ziri-spammers and Tom-spammers keep adding sentences at the same rate as they have been.
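As a quick check of the ratios quoted above, here is a minimal Python sketch that recomputes each name's share of the English corpus from the counts listed in this post. The variable and function names are only illustrative, and the counts are simply the ones quoted here, not fresh data.

```python
# Counts as quoted in the post above (English corpus only).
name_counts = {
    "Masha": 10, "Ivan": 27, "James": 107, "Thomas": 207, "Jim": 297,
    "John": 9_744, "Ziri": 33_049, "Mary": 158_505, "Tom": 443_233,
}
TOTAL_ENGLISH = 1_656_118  # total English sentences, as quoted above


def share(count, total=TOTAL_ENGLISH):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Print the names from most to least frequent with their share.
for name, count in sorted(name_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>6}: {count:>7,} ({share(count):.2f}%)")
```

Since one sentence can contain several of these names, the shares should not be added together.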

However, I could imagine a deliberate attempt to add sentences with certain kinds of names (for instance, Slavic ones) having an impact on the ABSOLUTE number of sentences with these kinds of names. But to do that, you'd need to appeal to specific people to contribute such sentences and other specific people to translate them, and to make them useful, you'd need to keep track of them somehow, such as in a list.

> How do other members of the Tatoeba community feel about this trend [of "over-Ziri-fication" and "over-Tom-ification"]?

I've resigned myself to the fact that the corpus as a whole will have various traits that I'm not crazy about (including over-Ziri-fication and over-Tom-ification), but will continue to try to improve what I can.

[1] https://www.ssa.gov/oact/babyna...s/century.html

xorgy, November 7, 2022 at 9:47:37 PM UTC

According to SSA, Oliver and Olivia have been the most popular boy and girl names for social security card applicant births since 2020 in the state of Wyoming.

sacredceltic, November 25, 2022 at 9:38:19 PM UTC

The problem is one of big-numbers seeking. What if “but-name duplicate” sentences were not counted toward their authors’ totals? That would probably be some deterrent to those authors…
Easy to code…
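One rough way the "discounting" idea could be sketched in Python, assuming a hand-picked list of overused names and a plain list of (author, text) pairs: none of this reflects how Tatoeba actually counts contributions. `COMMON_NAMES`, `name_skeleton` and `discounted_counts` are hypothetical names made up for this illustration.

```python
import re
from collections import defaultdict

# Hypothetical list of overused names; a real list would be much longer.
COMMON_NAMES = {"Tom", "Mary", "Ziri", "Sami", "Layla", "John"}
NAME_RE = re.compile(r"\b(" + "|".join(sorted(COMMON_NAMES)) + r")\b")


def name_skeleton(sentence):
    """Replace each listed name with a placeholder, so sentences that
    differ only in those names collapse to the same key."""
    return NAME_RE.sub("<NAME>", sentence)


def discounted_counts(sentences):
    """sentences: iterable of (author, text) pairs in order of addition.
    Only the first sentence seen for each skeleton earns its author credit."""
    seen = set()
    credit = defaultdict(int)
    for author, text in sentences:
        key = name_skeleton(text)
        if key not in seen:
            seen.add(key)
            credit[author] += 1
    return dict(credit)
```

This is deliberately crude: it treats "Tom loves Mary" and "Mary loves Tom" as the same sentence, and it ignores pronoun variants and languages that decline names, so a real implementation would need more care.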

Polgar1, November 8, 2022 at 11:20:33 AM UTC

I think this is another instance of bikeshedding - one could probably make a top 10 list of Tatoeba issues / areas for improvement and it wouldn't include the name sentences. They are easy to add and easy to ignore; I don't think there is any value in investigating it too much.

Moreover, for now, my personal proposal would be to simply stop bringing it up and thereby making it seem more relevant than it actually is. Give more time to things like:
- the format of the site
- the "Tatominer" project
- the fate of long quotes / convoluted sentences
- how to increase presence for language content outside of like the top 10 languages on Tatoeba
- copyright/licensing, what can be added
and I'm sure everybody can think of similar topics that take precedence over the whole Tom vs Ziri (vs pronouns) polemic.

klzlueylx klzlueylx 22 days ago November 10, 2022 at 1:15:48 PM UTC link Permalink

I thought the guide encouraged us to use culture-based names rather than simply using what's in the original sentence?
(Are there any famous Ziris, anyway?)

hecko hecko 22 days ago November 10, 2022 at 10:37:09 PM UTC link Permalink

i'm not aware of any rules that would allow using different names in translations than in the originals, nor of anyone doing so

Cangarejo Cangarejo 21 days ago, edited 21 days ago November 12, 2022 at 12:09:55 AM UTC, edited November 12, 2022 at 12:11:33 AM UTC link Permalink

Out of curiosity, I tried substituting the names Tom and Mary with random common male and female names in CK's untranslated sentences and this is what I got. It's a .txt file.

https://we.tl/t-BUw9LKm3Pk

My script may still have bugs.
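
A minimal sketch of this kind of substitution, for readers who want to try it themselves. This is not the script mentioned above: the name pools and the function name `substitute_names` are made up for illustration, and it only handles English, where names are not inflected.

```python
import random
import re

# Hypothetical name pools; any lists of common names would do.
MALE_NAMES = ["James", "Oliver", "Ahmed", "Pedro"]
FEMALE_NAMES = ["Olivia", "Emma", "Fatima", "Yuki"]

# \b keeps "Tom" from matching inside words such as "Tomorrow" or "Tommy".
TOM = re.compile(r"\bTom\b")
MARY = re.compile(r"\bMary\b")


def substitute_names(sentence, rng=random):
    """Replace every "Tom" with one random male name and every "Mary" with one random female name."""
    male = rng.choice(MALE_NAMES)
    female = rng.choice(FEMALE_NAMES)
    sentence = TOM.sub(male, sentence)
    return MARY.sub(female, sentence)


if __name__ == "__main__":
    print(substitute_names("Tom and Mary adopted a child."))
```

One name is drawn per sentence rather than per occurrence, so a sentence that mentions Tom twice still refers to a single person after the substitution.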

CK CK 21 days ago, edited 21 days ago November 12, 2022 at 12:57:43 AM UTC, edited November 12, 2022 at 12:58:43 AM UTC link Permalink

> ... substituting the names Tom and Mary with random common male and female names ...

You can also try some of these demos that I made in 2017 to see names being randomly substituted.

http://aitstudy.com/sub/

Each time you load the pages, you'll get another random selection.

Yorwba Yorwba 20 days ago November 12, 2022 at 7:54:59 PM UTC link Permalink

FYI: The demos are broken on non-Apple platforms because you don't define jtext if mac = false.

Since you already have the ability to automatically replace wildcard names, maybe you could do that *before* adding sentences to Tatoeba so that everyone else doesn't have to deal with this.

The number of commented-out lines in http://aitstudy.com/sub/vars.js as well as notes like

// don't use languages that require "AN English book" "AN English teacher"
// This will mess up with "French restaurant" etc.

tell me that you should be well aware that getting this right even for a single language-pair is no trivial task. And Tatoeba has quite a few more languages than just English and Japanese...

Personally, I just filter out all sentences where a wildcard match is detected instead of trying to fix them somehow.
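
A rough sketch of what such a filter might look like, assuming the sentences are plain strings and the wildcard names are known in advance. The name set and the function name `without_wildcards` are placeholders, not anything taken from an existing tool.

```python
import re

# Assumed wildcard names; extend the set to match whatever placeholders you want to skip.
WILDCARD_NAMES = {"Tom", "Mary", "Ziri", "Sami"}

# One pattern for all names; \b avoids false hits such as "Tomorrow".
WILDCARD_RE = re.compile(
    r"\b(?:" + "|".join(sorted(map(re.escape, WILDCARD_NAMES))) + r")\b"
)


def without_wildcards(sentences):
    """Yield only the sentences that contain none of the wildcard names."""
    for sentence in sentences:
        if not WILDCARD_RE.search(sentence):
            yield sentence
```

Whole-word matching like this only catches uninflected forms, so it is probably adequate for English but would miss declined names in other languages.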

hecko hecko 20 days ago, edited 20 days ago November 12, 2022 at 8:17:05 PM UTC, edited November 12, 2022 at 8:17:45 PM UTC link Permalink

> Since you already have the ability to automatically replace wildcard names, maybe you could do that *before* adding sentences to Tatoeba so that everyone else doesn't have to deal with this.

that would defeat the goal of wildcards, which is that if everyone were to use them then near-duplicates would be decreased
which is a laudable goal given that more near-duplicates lead to a more disconnected network, but yeah the end result is eh
personally i think a better way would be to have the sentence adding page search for similar sentences before submission, but that could require an overhaul of the search system and would probably increase load massively

> And Tatoeba has quite a few more languages than just English and Japanese...

i've heard the example of russian not declining `Mary` at all, meaning it can't be automatically replaced by names that do decline without using some sort of natural language processing library

Yorwba Yorwba 19 days ago November 13, 2022 at 12:07:46 PM UTC link Permalink

> that would defeat the goal of wildcards, which is that if everyone were to use them then near-duplicates would be decreased

I don't think that ever worked.

The probability of randomly creating a near-duplicate is very low to begin with: The most common length of an English sentence on Tatoeba is six words. (There are more than 260,000 six-word sentences.) Even with a very limited vocabulary, six words leave enough flexibility to write potentially billions of different sentences. If you randomly pick one of those billions of sentences, the odds of landing near one of the 260,000 existing six-word sentences are already quite low, and the odds of differing only in wildcards are lower still.

But when you use the same wildcards every time, that reduces variation and makes near-duplicates more likely: The most common three words to start a six-word sentence are "Tom and Mary"; there are 1855 such sentences. Once you've chosen to start a six-word sentence with "Tom and Mary", there are only three words left with which to distinguish this "Tom and Mary" sentence from all the "Tom and Mary" sentences that already exist.
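
The argument can be made concrete with a back-of-the-envelope calculation. The vocabulary size of 50 words is an assumption chosen purely for illustration, and counting raw word sequences overstates the number of grammatical sentences, so the resulting shares are only indicative of the difference in scale. The two corpus counts are the ones quoted in this post.

```python
# Figures quoted in this post.
EXISTING_SIX_WORD = 260_000      # six-word English sentences on Tatoeba
EXISTING_TOM_AND_MARY = 1_855    # of those, the ones starting with "Tom and Mary"

# Assumption for illustration only: a tiny 50-word vocabulary.
VOCAB = 50

free_choice = VOCAB ** 6         # all six word slots chosen freely
fixed_prefix = VOCAB ** 3        # first three slots fixed to "Tom and Mary"

# Share of each space that existing sentences already occupy.
share_free = EXISTING_SIX_WORD / free_choice
share_prefix = EXISTING_TOM_AND_MARY / fixed_prefix

print(f"{free_choice:,} sequences, {share_free:.4%} already taken")
print(f"{fixed_prefix:,} sequences, {share_prefix:.2%} already taken")
```

Under these assumptions, fixing the first three words shrinks the space from about 15.6 billion sequences to 125,000, and the chance of hitting an existing sentence exactly rises from roughly 0.002% to about 1.5%.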

Still, I think that sentences like
#8508429 Tom and Mary adopted a child.
#8509338 Tom and Mary adopted a kid.
were likely intentionally created to (ironically) show variation. I think it would've been better if the "Tom and Mary" part were different as well.

> personally i think a better way would be to have the sentence adding page search for similar sentences before submission

I think the point where someone has already entered a sentence into the submission form is a bit too late to try and encourage diversity. It might discourage the creation of many similar sentences a bit, but I'd prefer to encourage the creation of many different sentences instead, if possible.

I think Tatominer https://tatominer.netlify.app is a good step in that direction, since when adding a sentence with a word that only appears in the corpus a few times or not at all, you can be pretty sure that it won't be similar to existing sentences. (Unless you always just put the word into the same generic template, of course.)

maaster maaster 21 days ago November 12, 2022 at 3:41:12 AM UTC link Permalink

The subject in a sentence can be 963 billion different things.
I just can't understand why in the world some people almost always take a person, or a person's name, as the subject of their sentences.

ZegPhig ZegPhig 21 days ago November 12, 2022 at 9:01:52 AM UTC link Permalink

It's just easier than creating sentences that aren't about people, because in real life we also more often use sentences with or about other people. We live in society and very often speak about other people.

maaster maaster 20 days ago November 12, 2022 at 12:54:42 PM UTC link Permalink

We speak more about other topics together, e.g. work, sports, politics, cars, cooking, diseases, sex..., I think.

ZegPhig ZegPhig 20 days ago, edited 20 days ago November 12, 2022 at 3:13:11 PM UTC, edited November 12, 2022 at 3:17:15 PM UTC link Permalink

Yes, we speak about these topics, but we do it with other people and often about other people: "What do you think?", "How do you cook this?", "Why do you like that bad movie?", "Tom Hanks is a stupid politician.", etc. Even in books, the text very often describes the actions of characters.
Anyway, I agree with you. I also think Tatoeba needs more sentences about different topics.

morbrorper morbrorper 20 days ago November 13, 2022 at 8:55:50 AM UTC link Permalink

I for one would appreciate more variety without using either names or pronouns. How about using "the man", "the woman", "the child", "the doctor", "the butcher", "the seamstress", etc? That would make for more interesting sentences, IMO.

maaster maaster 19 days ago November 13, 2022 at 10:40:57 AM UTC link Permalink

There are sentences with "the woman" ("the man"); I find them a bit weird.

Shishir Shishir 19 days ago November 13, 2022 at 10:46:25 AM UTC link Permalink

+1
it's not that common to use that on its own, I would add some descriptive attribute like "the man wearing green glasses" or "the woman sitting next to the door".

maaster maaster 7 days ago, edited 7 days ago November 25, 2022 at 7:23:59 PM UTC, edited November 25, 2022 at 8:06:55 PM UTC link Permalink

No matter how long we discuss the matter, the result will be the same: many users aren't happy about too many name sentences (i.e. Tom/Ziri sentences). This is why I have avoided translating from English in recent years.
(I sometimes wonder whether it is just trolling.)

Cangarejo Cangarejo 6 days ago, edited 5 days ago November 26, 2022 at 10:09:08 PM UTC, edited November 27, 2022 at 11:26:51 AM UTC link Permalink

Some time ago, I wrote a script that tries to select English sentences more randomly than Tatoeba does. The result was this text file.

https://easyupload.io/1m5jr3

I’ve shared this before, but I’m sharing it again.

hecko hecko 6 days ago November 27, 2022 at 1:34:08 AM UTC link Permalink

ooh this seems *very* useful, even despite the many extremely-rare words

i see it's a subset rather than a shuffle; would you be open to one of us uploading it as a tatoeba list? i can do it with the script i use for my cc0 list, or i can send said script to you if you have python

Cangarejo Cangarejo 5 days ago, edited 5 days ago November 27, 2022 at 11:26:16 AM UTC, edited November 27, 2022 at 1:28:02 PM UTC link Permalink

You can do whatever you want with the list. I imagine it would take a lot of time to upload it, so I’m not interested in doing that myself. If you want, you can also regenerate the list by using the following script.
https://github.com/CangarejoAsu...n/balance-s.py

Just use the following command.

python balance-s.py 1 eng_sentences.csv balanced.txt

This script only works for languages that don’t string words together and that use few suffixes though, so probably only English and Vietnamese.

CK CK 5 days ago November 28, 2022 at 1:37:31 AM UTC link Permalink

Hecko, try this.

http://study.aitech.ac.jp/cangarejo_random/

* Limited to English that did not yet have Toki Pona.

* And then, limited to English that had audio.

* And then, limited to English with Polish translations.

This is an example of what you can create offline for yourself, so creating a list on tatoeba.org wouldn't be necessary.

hecko hecko 4 days ago November 28, 2022 at 1:13:35 PM UTC link Permalink

hmm yeah that's fair enough
the idea behind making a list was that other people could use it too, but i suppose i should first *ask* what people want to translate and then adjust accordingly
and i somehow didn't think of generating a html page for convenience, thanks for that idea

hecko hecko 4 days ago November 28, 2022 at 11:22:10 PM UTC link Permalink

update on that: turns out that just filtering by length (10-20 words) and removing names makes a set of sentences that i find very appealing, perhaps even more so than cangarejo's approach (though the latter is probably better for diversity of vocabulary)
though i'm probably gonna make a tatoeba list out of (a sample of) that because it's honestly more convenient
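
A sketch of a length-and-names filter along these lines, not the actual script used here. It assumes each input line is tab-separated as sentence id, language code and text; the name set and the function name `pick_sentences` are hypothetical.

```python
import re

# Assumed placeholder names to exclude; adjust to taste.
NAMES = {"Tom", "Mary", "Ziri", "Sami"}
NAME_RE = re.compile(r"\b(?:" + "|".join(sorted(NAMES)) + r")\b")


def pick_sentences(lines, min_words=10, max_words=20):
    """Yield (id, text) for name-free sentences whose word count is within the bounds.

    Each line is assumed to be tab-separated: id, language code, text.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip malformed lines
        sentence_id, text = parts[0], parts[2]
        n_words = len(text.split())  # whitespace-separated tokens
        if min_words <= n_words <= max_words and not NAME_RE.search(text):
            yield sentence_id, text
```

Because the function accepts any iterable of lines, it can be fed an open file object directly, and the result can be sampled afterwards with `random.sample` to keep a list to a manageable size.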

Cangarejo Cangarejo 4 days ago, edited 3 days ago November 28, 2022 at 11:52:17 PM UTC, edited November 29, 2022 at 9:26:59 PM UTC link Permalink

Here’s a list that’s somewhat lighter on vocab.

https://easyupload.io/k1sail

Can I please have a copy of that script for uploading lists?

sharptoothed sharptoothed 5 days ago November 27, 2022 at 10:42:09 AM UTC link Permalink

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

small_snow small_snow 5 days ago November 27, 2022 at 10:58:08 PM UTC link Permalink

Thank you very much!

[#4718216] Спасибо большое. (*sharptoothed)

sharptoothed sharptoothed 5 days ago November 28, 2022 at 8:27:02 AM UTC link Permalink

You're welcome ^^

CK CK 7 days ago, edited 6 days ago November 26, 2022 at 8:56:47 AM UTC, edited November 26, 2022 at 10:12:18 AM UTC link Permalink

🍎 For Those Studying Japanese

http://study.aitech.ac.jp/recent-jpn/

This should use the text-to-speech (TTS) Japanese voice you have set up for your computer, if you have done so. When you click the "next" button, you will hear the sentence being spoken as the next sentence is being loaded.

This does not work on an iPhone, at least not on mine.
It worked in the 9 browsers I tested it with on a Macintosh.
I haven't yet checked it on Windows or Linux.

These were the 10,000 most-recently added Japanese sentences in the 2022-11-26 exported data that were owned by 3 of our prolific native-speaker contributors, bunbuku, small_snow, and KK_kaku_.

This starts with the most-recent sentence in the 2022-11-26 exported data and goes in reverse numerical order.

hecko hecko 8 days ago, edited 8 days ago November 24, 2022 at 9:29:35 PM UTC, edited November 24, 2022 at 9:40:20 PM UTC link Permalink

currently trying to reconcile my principle of "the more sentences can be linked, the better" with "when translating an ungendered sentence to a language that genders it, go with the feminine"

never thought i'd have an ideological dilemma about sentences

sacredceltic sacredceltic 7 days ago November 25, 2022 at 11:12:26 AM UTC link Permalink

> "when translating an ungendered sentence to a language that genders it, go with the feminine"

Where is this “principle” from?!

hecko hecko 7 days ago November 25, 2022 at 1:17:13 PM UTC link Permalink

from my brain, i should've clarified that this is what *i* have decided to do and not anything prescribed to me

sacredceltic sacredceltic 7 days ago November 25, 2022 at 3:08:20 PM UTC link Permalink

Strange principle. And what do you do when languages have more than two genders?

hecko hecko 7 days ago November 25, 2022 at 3:24:00 PM UTC link Permalink

i'm full of strange principles :)

as i'm only barely a polyglot i don't speak any languages where that would be a problem (polish has a neuter gender but using it for people is very rare)
though i guess when translating toki pona sentences into english i do have more than 2 choices, but i rarely translate *from* toki pona, and when i do i usually include multiple variations anyway

Thanuir Thanuir 7 days ago November 26, 2022 at 8:23:50 AM UTC link Permalink

The principle is quite good; at least the last time I checked, there were more male names and personal pronouns than their female counterparts, not to mention gender-neutral ones.

tatoebashrek tatoebashrek 14 days ago November 18, 2022 at 10:19:25 PM UTC link Permalink

FAQ should be updated to say there is in fact now an API.

On the page https://en.wiki.tatoeba.org/articles/show/faq is the section "Does Tatoeba provide an API?" with the answer "No, it does not (yet)."

In fact, it does, as described on https://en.wiki.tatoeba.org/articles/show/api

The FAQ should be updated to say "Yes, it does!" and link to this other page.

CK CK 14 days ago November 19, 2022 at 8:24:15 AM UTC link Permalink

I suspect that it still doesn't actually work.

https://github.com/Tatoeba/tato...ent-1133129204

hecko hecko 13 days ago November 19, 2022 at 12:44:43 PM UTC link Permalink

the api itself works, it's just that it can't be accessed by javascript from other websites
so it's still useful for e.g. python scripts or mobile apps, just not web apps (that don't have their own server)

TRANG TRANG 12 days ago November 20, 2022 at 5:08:54 PM UTC link Permalink

I updated the FAQ :)

Polgar1 Polgar1 9 days ago November 23, 2022 at 11:20:15 PM UTC link Permalink

Oh that's amazing news!

I remember that I wanted to use the API (even appeared on video haha) but lately I've been (hyper)focused on the Raku programming language... now, I might be able to get the two together and make a Raku module for the Tatoeba API. ;)

hecko hecko 14 days ago November 18, 2022 at 8:26:53 PM UTC link Permalink

is it just me or are collaborative lists kinda broken? e.g. here https://tatoeba.org/en/sentences_lists/show/169834 everything is a green link

Cabo Cabo 14 days ago November 18, 2022 at 9:07:31 PM UTC link Permalink

Someone changed the color in css.

Guybrush88 Guybrush88 14 days ago November 18, 2022 at 9:57:14 PM UTC link Permalink

I experience the same bug. I opened a ticket on the bug tracker: https://github.com/Tatoeba/tatoeba2/issues/3016

sh5555 sh5555 14 days ago November 18, 2022 at 10:55:23 AM UTC link Permalink

Tatoeba is interesting!

DJ_Saidez DJ_Saidez 14 days ago November 18, 2022 at 2:11:27 PM UTC link Permalink

It is, isn't it! I'm glad you like it! ^^

Welcome to Tatoeba!

small_snow small_snow 14 days ago November 18, 2022 at 8:13:58 PM UTC link Permalink

Nice to meet you. Welcome to Tatoeba!
I find it so interesting too that I end up doing it before I know it.
Let's get hooked on it together! 😊

small_snow small_snow 14 days ago, edited 14 days ago November 18, 2022 at 8:54:45 PM UTC, edited November 18, 2022 at 10:39:21 PM UTC link Permalink

Dear all of Tatoeba!
Here is a Japanese newcomer, sh5555. sh5555 (she or he) says, "Tatoeba is interesting!" I'm happy to hear that, so I wanted to share it with all of you.
Thank you. :)

hecko hecko October 24, 2022, edited October 24, 2022 October 24, 2022 at 4:06:01 PM UTC, edited October 24, 2022 at 4:08:27 PM UTC link Permalink

I found a public-domain English textbook with ~700 example sentences: https://www.gutenberg.org/ebooks/48702

Some of them are old-timey, and some others are surely near-duplicates of existing sentences, but they might still be useful, especially since they were designed specifically for language education.

There's also the Mozilla Common Voice list, but CK is already handling adding sentences from it: https://github.com/common-voice...-collector.txt

ZegPhig ZegPhig October 24, 2022, edited October 24, 2022 October 24, 2022 at 4:23:06 PM UTC, edited October 24, 2022 at 4:24:21 PM UTC link Permalink

All books on Project Gutenberg are public domain. They don't publish anything else. The problem is that we don't have a bot for importing sentences from other sites, and PG books have too many old-timey sentences. I know some Russian and English websites that use CC-BY and are written in modern language, so I think it would be more useful to use them instead of public-domain sources. Public domain is important too, but new words from modern languages are more important (in my opinion).

hecko hecko October 24, 2022, edited October 24, 2022 October 24, 2022 at 4:41:43 PM UTC, edited October 24, 2022 at 4:45:39 PM UTC link Permalink

I've suggested a CC-BY website once before, but the license version didn't match the one we use, and as CK argues:

> Though some may not agree, I think we shouldn't be using data that requires attribution, since our data is used by others who will unlikely pass on the attribution, since there is no easy way to do that. One of the values of the Tatoeba Corpus is that it may be used freely by others to develop apps or for other uses by just attributing tatoeba.org.
https://github.com/Tatoeba/tato...ent-1195388425

That being said, you've reminded me of this list of modern uncopyrighted blogs, which I have handpicked a few sentences from before: https://project-awesome.org/joh...opyright#blogs

Additionally, I'm mildly considering using AI to try and filter out archaic sentences from Project Gutenberg books. This wouldn't solve the lack of new vocabulary, but I think timeless sentences would be useful too.

ZegPhig ZegPhig October 24, 2022 October 24, 2022 at 6:48:33 PM UTC link Permalink

I agree with you. Timeless sentences will be useful. Thank you for the CC0 list!

About licenses that require attribution: it's strange. For example, say I'm the author of a text and I use CC-BY for it. Somebody creates a new story based on my original work; he should state that his work is based on my text, and he decides to use CC-BY as well. If a third person wants to create an animation based on the second work, should he credit both sources (the first and second work) or only the second work? I always thought that CC-BY allows a chain of sources, so that I can credit only the source that was used for my work, because other people can visit it and find a link to the first text there.

I thought, and still think, that crediting sources can work like this:
3. Work based on the second work (source: 2) -> 2. Work based on the original work (source: 1) -> 1. Original work

CK thinks that crediting sources works like this:
3. Work based on the second work (sources: 1, 2) -> 2. Work based on the original work (source: 1) -> 1. Original work

Did I understand correctly?

Yorwba Yorwba October 24, 2022 October 24, 2022 at 6:30:59 PM UTC link Permalink

Speaking of public domain books, Shuzo Fujimoto's heirs recently released five of his origami books (in Japanese) into the public domain: https://origami.kosmulski.org/b...-public-domain

Maybe someone learning Japanese would like to mine them for example sentences and learn something about origami in the process.

hecko hecko 15 days ago November 17, 2022 at 7:50:12 PM UTC link Permalink

Another set, this one with over a thousand sentences: https://cofl.github.io/conlang/...-analysis.html

It was published in 1922, as cited here: http://archives.conlang.info/ka...haulqhuen.html

If anyone decides to mass-add either this set or the textbook sentences, I ask that you apply the CC0 license to them. (I already have sentences from the textbook roughly extracted, but I'm not sure when or if I'll get around to adding them.)

16 days ago November 16, 2022 at 12:49:58 PM UTC link Permalink

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

sharptoothed sharptoothed 19 days ago November 13, 2022 at 4:21:38 PM UTC link Permalink

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/