Tatoeba
Wall (6,337 threads)

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.

sharptoothed, July 25, 2021 at 7:19:35 PM UTC

* Tatoeba As A Graph *

Tatoeba internals represented as undirected graphs.

https://tatoeba.j-langtools.com/tgraph/

DJ_Saidez, July 25, 2021 at 11:09:43 PM UTC

This is cool

AlanF_US, July 22, 2021 at 4:09:23 PM UTC

I said a few days ago that I wanted to respond to the thread about near-duplicates ( https://tatoeba.org/en/wall/sho...#message_37321 ), but I haven't had a chance to do it until now. Trang brought up many of the things that I would want to, but there are a few more things that I want to say.

(1) Diversity of sentences benefits everyone, beginners as well as advanced learners. Even simple sentences that are designed to demonstrate variation across a single dimension (such as substitution of a pronoun) are more valuable when they contain variety in multiple respects. We are humans, not machines, and we learn best when we're not bored.

(2) Even if an overabundance of near-duplicate sentences had a positive or neutral effect on beginning learners (which, as I say, I don't believe) but disadvantaged advanced learners, then we would need to take it seriously. This is true not only for what one might call ethical reasons (not wanting to frustrate advanced learners), but for practical ones as well (we want the contributions that advanced learners make; we want to serve the needs of a broad community so that people don't feel they need to leave Tatoeba once they achieve a certain level).

(3) Even if an overabundance of near-duplicate sentences had a positive or neutral effect on speakers of language X (which again, I don't believe in general) but disadvantaged English speakers, then we would need to take it seriously. Otherwise, we'd be simply substituting "X-centric" for "English-centric".

(4) There are many kinds of sets of near-duplicate sentences, and they exist on a continuum where the extent to which they reduce the quality of the corpus increases with:
(a) the number of sentences in a particular set
(b) the simplicity of the grammatical transformations required to produce one from another (so "drop-in" substitutions that leave the rest of the sentence untouched, like replacing "everyone" with "everybody", are less valuable than those that involve changing gender or tense)
(c) the lack of word-choice transformations required to produce one from another (where, for instance, changing a noun would also require changing the verb that is associated with it, as in "damage a building" -> "injure a person")
I believe Trang's point is that we should consider how to avoid adding the near-duplicates that are towards the wrong end of the scale of usefulness.

(5) By allowing us to write sentences virtually without restriction, Tatoeba already provides us a huge degree of self-expression. Ultimately, Trang is talking about how we can write sentences so that they achieve the most good. I feel like it's reasonable to ask ourselves that question, rather than reflexively jumping to a defense of what we've always done.

lbdx, July 24, 2021 at 8:17:24 PM UTC (edited July 24, 2021 at 8:38:09 PM UTC)

I am also one of those who think that similar sentences generally bring more noise than value to Tatoeba's corpus.

In order to reduce the proportion of similar sentences, it might be useful to ask contributors for confirmation when they add an unlinked sentence whose originality is below a certain threshold.
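
A minimal sketch of what such a check might look like, in Python. Everything here is an assumption for illustration: the `originality` score (one minus the best character-level similarity from the standard library's `difflib`), the `needs_confirmation` helper, and the 0.2 threshold are all hypothetical and not part of Tatoeba's code.

```python
from difflib import SequenceMatcher


def originality(candidate, corpus):
    """Return 1 minus the highest similarity to any existing sentence."""
    if not corpus:
        return 1.0
    best = max(
        SequenceMatcher(None, candidate.lower(), existing.lower(), autojunk=False).ratio()
        for existing in corpus
    )
    return 1.0 - best


def needs_confirmation(candidate, corpus, threshold=0.2):
    """True when the new sentence is too close to one already in the corpus."""
    return originality(candidate, corpus) < threshold


# Hypothetical two-sentence corpus; the candidate differs only in punctuation.
corpus = ["Tom likes apples!", "The weather is nice today."]
print(needs_confirmation("Tom likes apples.", corpus))  # True
```

Comparing against every existing sentence would be far too slow on the real corpus, so an actual feature would presumably need a search index to shortlist candidates first.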

AlanF_US, July 25, 2021 at 1:28:16 AM UTC

> it might be useful to ask contributors for confirmation when they add an unlinked sentence whose originality is below a certain threshold.

That would be nice, but I'm sure it's beyond what we can do any time soon, given our limited number of developers and long backlog of requests. I think Trang's comment applies here as well:

"The reality about Tatoeba today is that it doesn't provide a full-fledged set of features for people to sort out what they possibly don't need. For all I know, it could take another ten years till we get there and during this time we cannot operate as if the necessary features were going to be rolled out tomorrow.

That being said, if someone wants to work on a technical solution to help users filter out near-duplicates, I have to remind that Tatoeba is an open source project and we're always more than happy to receive pull requests :)"

In the absence of such a feature, it comes down to contributors making an effort to write sentences that are less likely to be near-duplicates.

lbdx, July 25, 2021 at 8:54:24 AM UTC (edited July 25, 2021 at 10:38:44 AM UTC)

> In the absence of such a feature, it comes down to contributors making an effort to write sentences that are less likely to be near-duplicates.

On the "Add sentences" page, we can read "Avoid using the same words, names, topics, or patterns over and over again." Yet some very large contributors have completely ignored this recommendation for years and continue to flood the corpus with low quality sentences.

I think it is now urgent to recognize that guidelines alone will not be sufficient and that it is necessary to introduce friction if we want to curb this phenomenon.

It is true that the feature I proposed in my previous post is difficult to implement in the short term. A simpler and more radical solution would be to cap the number of unlinked sentences a contributor can add in a given time period. Similarly, it might be useful to cap the total number of unlinked sentences that a user is allowed to own.

Yorwba, July 25, 2021 at 11:51:13 AM UTC

I don't think a cap on the number of unlinked sentences is going to work, because many near-duplicates get added by linking them to existing sentences.

E.g. #252252 was adopted by CC https://tatoeba.org/en/user/profile/CC , which helpfully states in the description that it's an account created specifically for "Sentences that I have either adopted or written that have a version with contractions." Then the version with contraction was added as a new sentence #10189287 and all sentences linked to #252252 were also linked to #10189287 , thus creating a near-duplicate with many links.

A cap on unlinked sentences wouldn't prevent this, and even a cap on all sentences would be easy to circumvent with multi-accounting.

I think this kind of behavior is caused by strict adherence to the rule that correct sentences shouldn't be changed ( https://en.wiki.tatoeba.org/art...are-correct%2E ), so CK just adds his preferred variant to a whole lot of sentences. If we change the rule to allow changes to sentences you own or adopt, provided they don't change the meaning, we might get fewer of these kinds of near-duplicates.

lbdx, July 25, 2021 at 12:45:04 PM UTC (edited July 25, 2021 at 12:46:00 PM UTC)

> many near-duplicates get added by linking them to existing sentences

We could also cap the number of links per language that a contributor can add to a sentence. This would avoid situations like #8558069 .

> even a cap on all sentences would be easy to circumvent with multi-accounting

The goal is to set clear and sufficiently deterrent limits. Those who still choose to circumvent the rules will have to work harder.

ddnktr, July 24, 2021 at 3:17:03 PM UTC

Does the list of "Vocabulary that needs sentences" (https://tatoeba.org/en/vocabula...sentences/eng) update as new sentences are added for those phrases? I like this page a lot and decided to use it to add English sentences a while ago, and I noticed that even when a given vocabulary request already has 9 sentences, the number doesn't change (and the entry doesn't disappear) after I add new sentences.

I noticed while looking around that other requests seem "frozen" based on the time they were added. "Spot on" is on page 24 of the English vocabulary requests (https://tatoeba.org/en/vocabula...s/eng?page=24) with "1 sentence," but when you click on the link to show the existing sentences, there are actually 24 sentences (https://tatoeba.org/en/sentence...&unapproved=). This is similar for a lot of other entries ("spouse" on page 25 doesn't have 1 sentence as is written, but 63). Maybe it would be more useful if the list periodically updated to reflect the number of existing sentences in the corpus. I don't know if someone has brought this up before.

lbdx, July 24, 2021 at 5:14:39 PM UTC

I think this bug has already been reported: https://github.com/Tatoeba/tatoeba2/issues/2239

ddnktr, July 24, 2021 at 6:00:22 PM UTC

I wasn't aware. Thanks!

sharptoothed, July 24, 2021 at 10:40:40 AM UTC

** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

lbdx, July 24, 2021 at 8:31:18 AM UTC

** Tatominer **

Thanks to Yorwba, Walentinio, marafon, Objectivesea, AlanF_US, Polgar1, cojiluc, shekitten, ddnktr, small_snow, maaster, Rafik, glavsaltulo, iiujik, Jeigmz, Shishir, aldar, megamanenm and giuliopaci for their 112 contributions that helped move the project forward this week.

Check out the most searched words that lack sentences or translations in your language at https://tatominer.netlify.app.

Elin, July 18, 2021 at 2:17:09 PM UTC

Hello, I am new here and I have a question: how does one report incorrect sentences? I can see no flag option.

I left a comment containing the correct translation a week ago, but no-one has responded or corrected the sentence.

Thank you for the help: I am looking at the wiki as well but it is not obvious where to find this information.

AlanF_US, July 18, 2021 at 3:57:49 PM UTC

Welcome to Tatoeba, Elin!

You did the right thing by adding a comment. If you stay around for a while, you will gain the ability to add tags (such as "@check" and "@change") that are periodically checked.

Unfortunately, if you look at this page:

https://tatoeba.org/en/stats/native_speakers

you'll see that Welsh has no admins, corpus maintainers, or advanced contributors, and in fact it has only three contributors in total. This makes you all the more valuable to us :) but it explains why no one has responded to you yet.

Admins and corpus maintainers can modify or delete sentences from languages that they don't know as long as they are given clear, reliable information. So now that we know that you're adding comments, we can periodically look at them.

Elin, July 18, 2021 at 6:10:09 PM UTC

Thank you Alan - a really helpful reply.

I have been learning Welsh (as an adult) for 6 years and am happy to help with sentences at the beginner's (Mynediad/Sylfaen) end of the spectrum. I learn De Cymraeg (South Walian) but at the Uwch 2 level I have just completed, more and more North Walian is being introduced - which will necessitate more pseudo-duplicate sentences!!

maaster, July 20, 2021 at 5:55:57 AM UTC

You need to be at least an advanced contributor to add tags.
If you write a comment in the comment field, you can use the @ sign, e.g. @AlanF_US, and in that case Alan gets a message.

Elin, July 20, 2021 at 6:21:59 PM UTC

Thank you :o)

mraz, July 20, 2021 at 6:10:12 AM UTC

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

mraz, July 20, 2021 at 1:30:15 PM UTC (edited July 20, 2021 at 2:22:40 PM UTC)

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

DostKaplan, July 18, 2021 at 8:18:17 PM UTC

What's the difference between sentences that have a blue arrow versus those that have a gray one? Example:

https://tatoeba.org/en/sentence...t+I+%22&to=tur

brauchinet, July 18, 2021 at 8:26:42 PM UTC

blue: direct translation
grey: indirect (translation of a translation)
You can exclude indirect translations by setting "Link" from "any" to "direct".
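
The distinction can be sketched in a few lines of Python. This is only an illustration, assuming translation links form an undirected graph stored as a mapping from sentence ID to linked IDs; the `links` data and both function names are hypothetical, not Tatoeba's actual code.

```python
def direct_translations(links, sentence_id):
    """Sentences linked directly to sentence_id (blue arrow)."""
    return set(links.get(sentence_id, ()))


def indirect_translations(links, sentence_id):
    """Translations of translations (grey arrow), excluding direct ones."""
    direct = direct_translations(links, sentence_id)
    indirect = set()
    for neighbour in direct:
        indirect.update(links.get(neighbour, ()))
    return indirect - direct - {sentence_id}


# Hypothetical link data: 1 <-> 2 <-> 3, so 3 is only indirectly linked to 1.
links = {1: {2}, 2: {1, 3}, 3: {2}}
print(direct_translations(links, 1))    # {2}
print(indirect_translations(links, 1))  # {3}
```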

lbdx, July 17, 2021 at 7:37:17 AM UTC

** Tatominer **

Thanks to Yorwba, AlanF_US, ddnktr, Pfirsichbaeumchen, marafon, shekitten, Wezel, small_snow, danepo, CK, Walentinio, wolfgangth, Ergulis, soweli_Elepanto, Rovo, Olegg, cojiluc and Ooneykcall for their 118 contributions that helped move the project forward this week.

Check out the most searched words that lack sentences or translations in your language at https://tatominer.netlify.app.

TRANG, July 17, 2021 at 2:41:48 PM UTC

I'm thinking about adding a link to Tatominer on the "Add sentences" page. What do you think?

AlanF_US, July 17, 2021 at 4:11:08 PM UTC

That sounds like a great idea to me.

Pfirsichbaeumchen, July 18, 2021 at 1:35:37 AM UTC (edited July 18, 2021 at 1:39:19 AM UTC)

A great idea. It might help people come up with the new sentences that are being sought. Would it be possible to generate a random set of, say, ten searched expressions on the "Add sentences" page?

lbdx, July 18, 2021 at 11:49:45 AM UTC

As Tatominer's "Add translations" pages are the most popular, it might also be useful to add a link from Tatoeba's "Translate sentences" page.

Kazuki278, July 6, 2021 at 12:49:52 PM UTC (edited July 6, 2021 at 12:51:27 PM UTC)

Hi guys, I wanna ask something.
So... Malay has two writing scripts, Latin (Rumi) and Arabic (Jawi), and I would like to contribute in both scripts.
But did I do it right? Should I just post both scripts in the same language section, like this?
https://tatoeba.org/en/sentences/show/10152828
Example:
Saya suka makan nasi (I like to eat rice)
ساي سوک ماكن ناسي

Yorwba, July 6, 2021 at 9:08:58 PM UTC

Welcome to the project! The only Malay words I know are "nasi lemak" (because of the movie by Namewee), but I think you did it right.

On Tatoeba, every sentence can have an unlimited number of translations in the same language, to show different word choices, different meanings of an ambiguous sentence and of course also different ways to write the same sentence.

In some places, the Jawi script might end up looking weird because the layout is currently configured to treat Malay as written from left-to-right, but I'll make the necessary configuration changes to fix that with the next update.

If there's some kind of standard method to turn Jawi into Rumi and vice versa, we could also add a feature to the website to make this conversion automatic. You can see what it would look like on this Uzbek sentence: #8824337 (It might however take a long time to implement. There was a request to add automatic transliteration for Yiddish, but nobody has found the time to work on it yet: https://github.com/Tatoeba/tatoeba2/issues/2651 )

HAGNi, July 6, 2021 at 11:44:23 PM UTC

Not from Malaysia, but from Indonesia. AFAIK, having an automatic transliterator for Jawi-Rumi would be kinda hard, especially since there are many exceptions to the orthography (preserved Arabic spelling, different spelling rules within the two writing systems, etc.), unlike Uzbek, which has an almost one-to-one correspondence between the Latin and Cyrillic alphabets. This makes me wonder why languages like Serbian don't also have automatic conversion.

As for multiple translations written in multiple scripts in the same language, that's what I have been doing too for Javanese for quite some time, like this: https://tatoeba.org/en/sentences/show/10131755

Kazuki278, July 7, 2021 at 6:40:35 AM UTC

Ok thx, good to know I'm on the right track :D

For the transliteration tho, like HAGNi said, it's a bit challenging too. I don't think automatic transliteration would be a great idea for Jawi.
And for RTL, it would be great if we could somehow mark it as RTL.

Some devs have made transliterators from Latin to Jawi, but afaik they use something like a dictionary: if a word isn't in the database, the transliteration returns an error.

Jawigram (Telegram Bot): https://t.me/jawigram_bot

Rumi2Jawi: http://rumi-to-jawi.appspot.com
-Github: https://github.com/mohdzamrimurah/rumi-to-jawi
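
To illustrate the dictionary-style approach mentioned above, here is a minimal Python sketch. It is not the code of either tool linked here; the function name is made up, and the tiny word list is an assumption taken from the example sentence earlier in this thread.

```python
# Hypothetical word list, copied from the "Saya suka makan nasi" example above.
RUMI_TO_JAWI = {
    "saya": "ساي",
    "suka": "سوک",
    "makan": "ماكن",
    "nasi": "ناسي",
}


def rumi_to_jawi(sentence):
    """Transliterate word by word; fail if any word is missing from the list."""
    words = sentence.lower().split()
    missing = [word for word in words if word not in RUMI_TO_JAWI]
    if missing:
        raise KeyError("no Jawi entry for: " + ", ".join(missing))
    return " ".join(RUMI_TO_JAWI[word] for word in words)


print(rumi_to_jawi("Saya suka makan nasi"))
```

A word outside the list, such as "roti" here, raises an error instead of producing a guess, which matches the limitation described above. Punctuation is not handled in this sketch.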

Yorwba, July 7, 2021 at 8:27:01 PM UTC

Thank you for the information. The GitHub repo you linked would be a great starting point for adding this functionality to Tatoeba, but, yeah, it would be a lot of work. Let's forget about it for now. :)

soliloquist, July 7, 2021 at 10:12:52 PM UTC

> And for RTL, maybe if we can somehow mark it as RTL would be great.

The writing direction of the language needs to be changed from LTR to auto. I had made a similar request for Ottoman Turkish. Sentences in both scripts display correctly now. #9674971

Kazuki278, July 8, 2021 at 6:06:24 AM UTC

Ouh, this is interesting...

TRANG, July 17, 2021 at 3:50:51 PM UTC

RTL is now supported for Malay and Malay (Vernacular). You may want to review your sentences, as some of them have the punctuation in the wrong place :)

https://tatoeba.org/en/sentence...&direction=asc

Kazuki278, July 17, 2021 at 4:34:16 PM UTC

Woohoo!! Thank you!! 🤩