Wall (6,616 threads)

tommy_san (February 25, 2013 at 7:10:58 AM UTC)

I've been browsing through the unowned Japanese sentences these days.

When a sentence is clearly wrong, I correct it. (ex. http://tatoeba.org/eng/sentences/show/87733)
When it doesn't match the translation, I add new translations and/or fix the links. (ex. http://tatoeba.org/eng/sentences/show/200178)

The problem is, there are so many sentences that are not incorrect but that I don't like.

Here's an illustrative example: http://tatoeba.org/eng/sentences/show/149347
There's actually no mistake in the original sentence, but we hardly ever say it that way. "人間が持っているのと同じ感情" is a literal translation of "the same feelings that people have". I changed it to "人間と同じ感情", which is how we usually say it.

Here’s another example: http://tatoeba.org/eng/sentences/show/171699
Again, I think "泳ぎたくはないですか" is grammatically correct (at least I can’t explain what’s wrong with it), but it sounds somewhat weird. "泳ぎたくないのですか" sounds much more natural.

I wonder if this kind of change is permissible. Personally I’d like to do away with all those clumsy Japanese sentences inherited from the Tanaka Corpus. I always feel very sad when I google and find them swaggering about on many websites. I can imagine many learners reading and even memorizing these sentences without knowing they’re actually not good Japanese at all.

And yet I know I’m breaking one of the Tatoeba rules, the one that prohibits us from changing anything that isn't wrong, for, as I said earlier, these sentences are not really wrong; they’re just inelegant.

How do you think we should deal with them?

tommy_san (February 26, 2013 at 4:06:01 PM UTC)

Some of you may have thought I was exaggerating when I wrote that only 20% of the orphan Japanese sentences are worth adopting. Now let me show you how they look to me.

I looked at the first ten orphan sentences that the site randomly showed me.

Among the ten, one was a very good sentence, so I adopted it.
http://tatoeba.org/eng/sentences/show/203334

Three of them were not very bad, but different from what I'd write.
http://tatoeba.org/eng/sentences/show/2261063
http://tatoeba.org/eng/sentences/show/267961
http://tatoeba.org/eng/sentences/show/47391

Four of them looked bad. At least, I would bitterly oppose it if someone were going to use these sentences in a Japanese textbook, for example. The last one is an especially good example of bad Japanese.
http://tatoeba.org/eng/sentences/show/2075864
http://tatoeba.org/eng/sentences/show/295733
http://tatoeba.org/eng/sentences/show/1554289
http://tatoeba.org/eng/sentences/show/25040

(I wrote comments in English for the seven sentences above. You can read them on the pages of the original Japanese sentences.)

Two of them I simply ignored. They don't sound natural and each of them is only linked to an orphan English sentence. I decided they're not worth working on.
http://tatoeba.org/eng/sentences/show/113480
http://tatoeba.org/eng/sentences/show/99776

These are only a few examples, but I hope I have given you some idea of what kind of sentences you have been shamelessly displaying for such a long time. (I really want to hear what other Japanese speakers think about these sentences.)

You might think, "These sentences were written by Japanese students; their English may be wrong, but their Japanese can't be." This assumption is, however, totally mistaken. Since English and Japanese are very different languages, it's very difficult for most Japanese to make natural-sounding Japanese sentences equivalent to English ones. So when they're told to translate English into Japanese, they produce sentences far removed from everyday speech. I believe these awkward Japanese sentences are heard in almost every English class in Japan. These sentences are understandable enough for us and are sometimes very helpful to explain how English works. But they're never the kind of Japanese that any serious learner should read. They often give the wrong impression that Japanese is closer to European languages than it really is. (I suppose this German sentence http://tatoeba.org/eng/sentences/show/331880 is similar to the case I'm talking about.)

Of course I myself once was (and sometimes still am) one of them. I have no intention of insisting I'm flawless. Some people must think some of my sentences sound unnatural. Some must be able to think up better translations. But I'm trying really hard to avoid that kind of clumsiness, and I believe that I, and bunbuku-san and some other Japanese here, can think up much better sentences than most of the orphan ones.

sacredceltic (February 27, 2013 at 10:36:17 PM UTC)

>These sentences are understandable enough for us and are sometimes very helpful to explain how English works. But they're never the kind of Japanese that any serious learner should read. They often give the wrong impression that Japanese is closer to European languages than it really is.

I fully sympathise, since this is exactly what is happening to many French sentences here, and French is much, much closer to English than Japanese is. So if even Japanese ends up being so distorted, I'll let you imagine what they do with French...

tommy_san (February 28, 2013 at 7:27:13 AM UTC)

I sometimes find English and French (and German) sentences that look so similar that I wonder if they're really natural expressions in each language. Often, though, I feel reassured to find that they belong to contributors like you whom I believe I can trust. Literal translations do sometimes actually work among these languages. But that makes it even more difficult for us to tell good translations from bad ones.

If there's a system like I suggested in another thread,
http://tatoeba.org/eng/wall/sho...#message_15740
I'd also love to ask native speakers, "Is this really natural?" Many of them must be fine, so then you can just say "OK".

sacredceltic (February 28, 2013 at 8:52:53 AM UTC)

Actually, some sentences are "OK"-tagged by natives, so if a sentence is authored by one native and OK-tagged by another, that is some kind of reassurance. (You can see who tagged a sentence by hovering your mouse over the tag.)
However, very few native contributors bother to "OK"-tag sentences authored by others, alas...

marcelostockle (February 26, 2013 at 4:20:10 PM UTC)

@tommy_san:
As for your comments on sentences: regardless of whether those sentences are unnatural or not, I'm tagging them with "comment on sentence" to indicate that they should be read with these considerations in mind.

Ianagisacos (February 27, 2013 at 3:22:06 PM UTC)

>sharptoothed wrote:
I'm sure that everyone here who, somehow or other, is concerned about the Japanese part of Tatoeba would do everything within his/her power to help.

I'm a native Japanese speaker and, like Tommy-san, I'm concerned about the state of the Japanese sentences in Tatoeba.

I strongly support his estimate that fewer than half of the orphan Japanese sentences are worth adopting.
I'll also try evaluating the "ten random orphans" he presented in message_15743.

In my opinion:

One of them has no problem.
http://tatoeba.org/eng/sentences/show/203334

Five of them are unnatural in most cases, but in rare cases they might be natural.
http://tatoeba.org/eng/sentences/show/2261063
http://tatoeba.org/eng/sentences/show/267961
http://tatoeba.org/eng/sentences/show/47391
http://tatoeba.org/eng/sentences/show/295733
http://tatoeba.org/eng/sentences/show/99776
(Tommy-san rated 295733 and 99776 as bad)

Four of them are bad.
http://tatoeba.org/eng/sentences/show/2075864
http://tatoeba.org/eng/sentences/show/1554289
http://tatoeba.org/eng/sentences/show/25040
http://tatoeba.org/eng/sentences/show/113480

If only one in ten random orphans were unnatural, we might feel like grabbing every unnatural orphan and trying to correct it. But in practice there are far more unnatural sentences than we can deal with!
At present, when I correct unnatural sentences or add new ones, I often get anxious.
That is because there are almost no credible systems or guidelines for dealing with that vast number of orphans, and each contributor works in their own way.
I think some other contributors feel uneasy, too.
If so, I suppose that the absence of public institutions (i.e. clear guidelines, a public workspace, efficient tools) is discouraging Japanese speakers from pioneering the frontier of orphan sentences.

I hope it soon becomes easier for shy Japanese speakers to contribute to Tatoeba.

sharptoothed (February 27, 2013 at 7:43:30 PM UTC)

> It is because there are almost no credible systems or
> guidelines to deal with that vast number of orphans, and
> each contributor works in their own way.

Actually, the answers to many questions a contributor may have are already in the documents in the Help section (http://tatoeba.org/help). Unfortunately, those documents were never translated into Japanese, and this is a real problem. Additionally, as Tommy-san noticed, the Japanese localization of the user interface leaves much to be desired, to put it mildly, and this is a problem, too. So, putting it all together, we can outline the following vital issues concerning the Japanese part of Tatoeba:
- a catastrophic proportion of low-quality sentences among the unadopted ones;
- no documentation in Japanese;
- poor Japanese localization of the user interface.
Apparently, the issues above give rise to one more issue: the small number of active Japanese members. And this issue affects the Tatoeba Project as a whole, I have to say, and creates a vicious circle: we need more Japanese members to improve Tatoeba, but we have to improve Tatoeba to attract more Japanese members. Fortunately, and I have not the slightest doubt about it, there is every chance of breaking this circle, and I hope you, Tommy-san, bunbuku-san and other Japanese members can do a lot about this. With the help of the whole Tatoeba community, of course.

sacredceltic (February 27, 2013 at 7:51:12 PM UTC)

Isn't shyness precisely part of the problem?
You Japanese natives must take control of your own corpus; otherwise, self-proclaimed experts control it, and you see the result...

Rome wasn't built in a day, and neither was Japan. Someone must take the bull by the horns, NOW!

And be ready to confront people! We're not here to bow and scrape, but to WORK.

archer_root (February 26, 2013 at 11:43:13 PM UTC)

archer_root wrote (19 hours ago):
Is there a system for weighting orphaned sentences? If after an active adoption cycle, orphaned sentences could be demoted into invisibility, then contributors of all levels could gradually adopt and improve some of these orphaned sentences, while other sentences drift closer over time to total deletion. That's how language works in our own world, no?

--
sharptoothed wrote:
any idea is doomed to fail unless it has supporters. I'm sure that everyone here who, somehow or other, is concerned about the Japanese part of Tatoeba would do everything within his/her power to help. And I hope more native Japanese speakers will join us, or all our efforts will be wasted.
--

I agree with this sentiment. I'd be glad to help, with what resources I possess.

sacredceltic (February 27, 2013 at 11:59:28 AM UTC)

>And I hope more native Japanese speakers will join us, or all our efforts will be wasted.

They haven't joined in several years; why would they now, when these deterring sentences are still prevalent?

tommy_san (February 27, 2013 at 12:47:50 PM UTC)

Another thing that's driving them away is the horrible translation of the user interface, as you have mentioned elsewhere. I believe we must deal with it, though it's not an easy job.

sacredceltic (February 27, 2013 at 1:05:48 PM UTC)

You may change it yourself in LaunchPad. Open an account there if you haven't got one yet and go to the Tatoeba project.
Once you have finished with your updates, send a message to sysko so he can republish the new Japanese UI version.
Whatever can be done to make Tatoeba more attractive to Japanese contributors is most urgent...
Thank you in advance.

tommy_san (February 27, 2013 at 1:52:32 PM UTC)

I've started working on it.
It seems to me that some (or many) of the phrases there are no longer used in the current pages. Is that the case?

Maybe you too should update the French version, especially the names of newly added languages.

sacredceltic (February 27, 2013 at 2:27:52 PM UTC)

>Maybe you too should update the French version, especially the names of newly added languages.

I wanted to, but I don't seem to be able to edit these new language names.
In order for UI elements to be translated, they must first be published as translatable by sysko, and then the translation must be published by him, so that might require a bit of patience.

tommy_san (February 26, 2013 at 4:09:21 PM UTC)

By the way, some of you must be annoyed because I put links to the pages of English sentences. There you can't tell which Japanese sentence is unowned and which one is mine until you visit the individual pages. That's exactly what annoys me the most. Owned and unowned sentences look exactly the same. This makes Tatoeba too inconvenient, almost useless, for learners of Japanese, I suppose.

I agree that users must be able to choose whether they want to see the orphan sentences or not, and that these must be invisible by default. That would also surely keep us from losing potentially competent Japanese contributors.

Personally I'm opposed to deleting them completely. They can sometimes be useful to us. If Tatoeba is going to have more than one database soon, then you can just put them all into the lowest-ranked database. They will do you no harm as long as you don't see them.

Concerning the "unnatural" tag, I have to say I'm very reluctant about it. Judging whether a sentence is acceptable or not is hard and takes a lot of time. I'm sure some people must think that there's nothing wrong with some of the four sentences that I called "bad" above. If I had time for such distinctions, I'd rather spend it making good sentences.

sharptoothed (February 26, 2013 at 6:30:15 PM UTC)

Tommy, I second most of your points; they are quite reasonable and clear. I think your examples were convincing enough to leave no doubt that the time for change is ripe. I just want to add a small remark: any idea is doomed to fail unless it has supporters. I'm sure that everyone here who, somehow or other, is concerned about the Japanese part of Tatoeba would do everything within his/her power to help. And I hope more native Japanese speakers will join us, or all our efforts will be wasted.

sharptoothed (February 25, 2013 at 10:33:40 AM UTC)

Personally, I think that sentences on Tatoeba should be as natural as possible. First and foremost, this concerns newly added sentences, not translations. That said, I understand that in many cases a sentence taken out of context may sound unnatural, weird, ambiguous, etc., so I think we should encourage contributors to avoid adding such sentences or, at least, to provide usage examples whenever necessary.
As for translations, I think that a good translation should meet at least the following criteria: it should be grammatical and natural, and it should have the same meaning and/or the same effect. Word-for-word translations often fail these criteria and, thus, should be avoided unless we want to illustrate some peculiar property of the original sentence. In that case a "good" translation should be added along with the literal one, and the latter should be marked with an appropriate tag or tags (e.g., "literal translation", "unnatural", etc.).
The biggest problem is what to consider "natural". Sometimes even native speakers have polar points of view on the same sentence. This means that it's highly desirable to raise a discussion whenever we bump into a doubtful sentence. At least, corrections shouldn't be made without notice and explanation.
Thus, I think that:
- ungrammatical sentences should be corrected unconditionally;
- unnatural sentences should be corrected unless a context or usage example that makes them sound natural can be found. "Unnaturalness" should be explained by a proficient/native speaker.
- if the sentence sounds unnatural due to literal translation made on some purpose, the reason should be explained and the natural translation added along with the literal one.

sacredceltic (February 26, 2013 at 7:56:12 AM UTC)

>Sometimes even native speakers have polar points of view on the same sentence.

Yes, and that's normal, but in these cases at least one of those who defend the sentence's validity adopts it to give it some credibility (provided he's not a self-proclaimed native; we know a few provocateurs...).
But here, we're facing hundreds of thousands of sentences that not even a fake native would adopt over a period of 7 years!!!
It's just a waste of time, and of talents who are completely deterred from using the service.

sharptoothed (February 26, 2013 at 10:11:19 AM UTC)

> But here, we're facing hundreds of thousands of
> sentences that not even a fake native would adopt over a
> period of 7 years !!!

Yes, this is a real problem. Every now and then I bump into sentences (I'm talking about sentences in my native language) that do have owners but still aren't worth the bytes on the hard drive they're stored on. Many of the members who wrote them have been inactive for a long time, and some others just tend to furiously defend their precious children no matter how ugly they are. Of course, the scale of this problem is far smaller than that of the one we have with the Japanese and English parts of Tatoeba, but it grows as time goes by, I'm afraid. The potential number of members who produce "bad" sentences is always bigger than the number of members ready to spend their time correcting others' mistakes.
I don't really think that deleting orphan sentences will improve the situation in the long term. I'd rather say we need structural and conceptual changes in the Tatoeba engine and ideology. Splitting Tatoeba into two sections, trusted and "dirty", seems to be effective enough, though I don't know if this idea will be implemented any time soon. Another, much simpler solution is to implement a mechanism to filter (hide) unadopted sentences and sentences tagged with certain tags, so that people who seek quality materials will have an instrument to sort the wheat from the chaff.

sacredceltic (February 26, 2013 at 10:52:30 AM UTC)

>Another, much simpler solution is to implement a mechanism to filter (hide) unadopted sentences and sentences tagged with certain tags, so that people who seek quality materials will have an instrument to sort the wheat from the chaff.

I beg to differ, because the problem affects new contributors, who are subsequently kept away from the service! By definition, newcomers don't know how to activate filtering options.
When one surfs a new website for the 1st time, one tends to judge the site quality in a matter of minutes.
Statistically, Japanese natives have a 90% chance of being deterred, in two ways:
1) everything they find about Tatoeba on search engines is in English
2) even if they understand what it is about in English, their chance of falling on laughable sentences in their language is so high...

This has been too long like this and Tatoeba has missed too many opportunities to acquire new talented native Japanese. In the 3 years I've been here, I've seen only 2 who are still contributing now.
For a service with a Japanese name, it's a real shame...
Let's change this NOW and erase these deterring unadopted sentences. They can always be restored the day we know where to store them...

sharptoothed (February 26, 2013 at 12:29:18 PM UTC)

> By definition, newcomers don't know how to activate filtering options.

Quite true, so unadopted sentences could be hidden by default, I guess. Anyone interested in correcting them could then make them visible by changing the personal settings in their profile. The problem is that there's no such thing as "personal settings", though... Well, maybe just hiding unadopted sentences from regular members and non-members (read: search engines) will be efficient enough for the time being.
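The filtering idea in the last few posts (hide unadopted sentences and sentences carrying certain tags by default, with an opt-in for members who want to correct them) can be sketched as follows. This is only an illustration under stated assumptions: the `Sentence` record, its `owner` and `tags` fields, the `HIDDEN_TAGS` set and the `show_unadopted` switch are all hypothetical names, not Tatoeba's actual code or data model.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, Iterable, List, Optional, Set


@dataclass
class Sentence:
    """Hypothetical sentence record; owner is None when nobody has adopted it."""
    id: int
    text: str
    owner: Optional[str] = None
    tags: Set[str] = field(default_factory=set)


# Illustrative set of tags that mark a sentence as doubtful (an assumption).
HIDDEN_TAGS: AbstractSet[str] = frozenset({"unnatural"})


def visible_sentences(
    sentences: Iterable[Sentence],
    show_unadopted: bool = False,
    hidden_tags: AbstractSet[str] = HIDDEN_TAGS,
) -> List[Sentence]:
    """Return the sentences to display.

    By default (regular members, visitors, search engines) unadopted
    sentences and sentences carrying a hidden tag are filtered out.
    Members who want to correct them opt in with show_unadopted=True.
    """
    if show_unadopted:
        return list(sentences)
    return [
        s for s in sentences
        if s.owner is not None and not (s.tags & hidden_tags)
    ]


if __name__ == "__main__":
    corpus = [
        Sentence(1, "adopted, untagged", owner="tommy_san"),
        Sentence(2, "orphan"),
        Sentence(3, "adopted but tagged", owner="someone", tags={"unnatural"}),
    ]
    print([s.id for s in visible_sentences(corpus)])                       # [1]
    print([s.id for s in visible_sentences(corpus, show_unadopted=True)])  # [1, 2, 3]
```

Making the hidden view the default, rather than an option newcomers must find, follows sacredceltic's point that newcomers don't know how to activate filtering options.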

tommy_san (February 25, 2013 at 4:31:59 PM UTC)

Let me narrow down the point: how to deal with "orphan" unnatural sentences.

There are 124,005 orphan sentences in Japanese now. By my rough estimate, about 20% of them are worth adopting, while the others are somewhat unnatural or unattractive: many of them are written in the style of bad translators. This means that about half of the Japanese sentences in Tatoeba are unsatisfactory from the point of view of a native speaker. Can you imagine how funny this site looks to us? I showed this site to some of my friends, and all of them laughed a lot at the weird Japanese sentences here and there. I'm pretty sure that the Japanese sentences have the worst quality of all the languages in Tatoeba. This is why I was trying to modify the bad sentences rather than leave them alone and add new ones.

Since there are so many of them, Dima's suggestion of commenting on each of them simply sounds unrealistic. We could do that, but then it would take hundreds of years to settle all of them. And now I'm beginning to doubt whether we need to take care of those orphan sentences in the first place. Having corrected a few sentences, I am fully convinced that it's far easier to make a good sentence myself than to try to make a bad one better. What do we keep them all for, then?

I'm not insisting on deleting orphan sentences, though. We Japanese contributors can easily increase the number of reliable sentences by adopting the good sentences that have remained unowned. What I really wish is for the rest to be invisible to most users unless they want to see them. I'm pretty sure that almost no users want to see lots of sentences they can't be sure they can trust. If this is done, then we can add new sentences without worrying about the existing orphan ones, Tatoeba would look much more sensible, and it would be a much better place, especially for those who are interested in Japanese and English.

marcelostockle (February 25, 2013 at 6:00:18 PM UTC)

Wow.
Certainly, this isn't a discussion we've had before (I think).
Now it's about time we had it.

Meanwhile, I just want to say that I don't think it's worth having 200,000 sentences or a million sentences if they don't faithfully reflect the language in question, or if only about 20% do.

Anyway, we should also consult sysko: maybe the new tatoeba can prove itself helpful in dealing with this. (tatoeba.org/jpn/wall/show_message/15345#message_15345)

sharptoothed (February 25, 2013 at 6:25:56 PM UTC)

> There are 124,005 orphan sentences in Japanese now. In my
> rough estimation, about 20% of them are worth adopting,
> while the others are somewhat unnatural or unattractive

Oh, my! Looks pretty impressive. When I was writing my considerations I couldn't even imagine how deep the abyss is. In such a situation, your proposal of hiding orphan sentences looks quite reasonable, but, in effect, it raises the question of splitting the Tatoeba database into two separate sections: trusted and working. This idea was proposed by alexmarcelo some time ago (http://tatoeba.org/rus/wall/sho...message_15321), it is definitely worth implementing, and I hope it will be done eventually. But until then we have to try to make the best of what we have now.
So, what can be done? I think that:
- unnatural sentences could be tagged "unnatural". It's better than just leaving them unadopted, since it indicates that a native speaker has looked at them.
- if there are typical mistakes and it's possible to classify them, it would be sufficient to describe each type once and then just edit unnatural sentences, leaving a short reference to the precedent in the comments (I don't know if this idea is viable, though);
- minor mistakes and typos could be corrected without notice;
- good sentences should be adopted.
And, last but not least, if you feel that another, more natural, elegant, etc. version of the sentence could be added - don't think twice, do add it. :-)

sacredceltic (February 25, 2013 at 7:35:36 PM UTC)

I said it and repeated it from the beginning. It's the same in English.
Unadopted sentences are not owned because they're suspect. They should be deleted. Otherwise, they discredit the service.

pierre_m (February 26, 2013 at 7:54:38 AM UTC)

As a new user on here, this topic has me concerned. The only reason I found this website was because I was trying to find good resources for learning Japanese. I was planning to use the sentences on here to learn how to properly use Japanese words.

I'll be honest, my first reaction on reading this was to keep my distance. I know I will have a hard enough time trying to learn Japanese when I am learning from a good source.

I looked up sentences that have the tag Unnatural, just to get an idea of what sentences have that tag. Currently there are only 55 sentences that have that tag, which makes me think it's being underutilized.

The first sentence under that tag is this one in English. http://tatoeba.org/eng/sentences/show/463455 Looking at this sentence first makes me wonder if it was added as part of the Tanaka Corpus as well, especially since its only direct translation is into Japanese and both of those sentences are orphans. This sentence is also tagged OK.

My thoughts on this are that there should be a little more intelligence in the code behind tags. In my mind, OK and Unnatural are mutually exclusive, and when someone attempts to add both tags to a sentence, the system should prevent it. The offending tag should be deleted first.

Also, tags that imply the sentence is less than perfect (unnatural, archaic, non-standard, ...) should be a bit more obvious. Give sentences tagged with it a yellow or red background. Make those tags red in color instead of yellow. Make those sentences weighted so that they always appear at the end of any list. Sentences that are orphaned should probably get similar treatment. Perhaps a yellow background for orphaned, and a red one for the assigned tags.

I know that the above ideas require a change to the code, and cannot be managed by policy only.

My gut feeling though is if the sentences feel as bad as the one I linked to, and they aren't either deleted or obviously marked, people like me won't get what we are looking for. And we probably won't know it until after the damage is done. I'm just happy I saw this topic.
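pierre_m's tag proposals (reject contradictory tags such as OK and Unnatural, push flagged sentences to the end of lists, and colour them) could look roughly like the sketch below. Everything here is an assumption for illustration: the `CONFLICTING_TAGS` table, the `WARNING_TAGS` set (taken from the examples in the post), the rank-to-colour mapping, and all function and class names are hypothetical and are not part of Tatoeba's codebase.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical table of mutually exclusive tags (tags assumed stored in lowercase).
CONFLICTING_TAGS: Dict[str, Set[str]] = {
    "ok": {"unnatural"},
    "unnatural": {"ok"},
}

# Tags implying a sentence is less than perfect, from the examples in the post.
WARNING_TAGS: Set[str] = {"unnatural", "archaic", "non-standard"}

# Display rank -> suggested background colour (None = default background).
BACKGROUND: Dict[int, Optional[str]] = {0: None, 1: "yellow", 2: "red"}


class TagConflictError(ValueError):
    """Raised when a new tag contradicts one the sentence already has."""


@dataclass
class Sentence:
    id: int
    owner: Optional[str] = None      # None = orphan (unadopted)
    tags: Set[str] = field(default_factory=set)

    def add_tag(self, tag: str) -> None:
        tag = tag.lower()
        clash = CONFLICTING_TAGS.get(tag, set()) & self.tags
        if clash:
            # The offending tag has to be removed before this one is accepted.
            raise TagConflictError(f"remove {sorted(clash)} before adding {tag!r}")
        self.tags.add(tag)


def display_rank(s: Sentence) -> int:
    """0 = normal, 1 = orphan (yellow), 2 = warning tag (red)."""
    if s.tags & WARNING_TAGS:
        return 2
    if s.owner is None:
        return 1
    return 0


def order_for_display(sentences: List[Sentence]) -> List[Sentence]:
    # sorted() is stable, so sentences with the same rank keep their order.
    return sorted(sentences, key=display_rank)


if __name__ == "__main__":
    s = Sentence(1, tags={"ok"})
    try:
        s.add_tag("Unnatural")
    except TagConflictError as err:
        print(err)  # remove ['ok'] before adding 'unnatural'

    batch = [Sentence(1), Sentence(2, "a", {"unnatural"}), Sentence(3, "b")]
    print([(x.id, BACKGROUND[display_rank(x)]) for x in order_for_display(batch)])
    # [(3, None), (1, 'yellow'), (2, 'red')]
```

Raising an error instead of silently replacing the old tag matches the post's suggestion that the offending tag be deleted first, so the contradiction stays visible to whoever is tagging.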

archer_root (February 26, 2013 at 4:15:27 AM UTC)

sharptoothed wrote:
Sometimes even native speakers have polar points of view on the same sentence.
--

Pfirsichbaeumchen wrote:
…sentences that one native thinks are just fine while the other doesn't, should not be changed even if the other gets the chance to do it. Instead, he should add variant sentences that he is comfortable with, and everyone is satisfied. That is obviously not the case here.
--

After using the Tatoeba Project corpus for three years, I've joined as a contributor. What initially drew my interest to Tatoeba? The reliably vast number of blatant errors. And also the comments I would read about how awful the example sentences were, especially the romaji. While I didn't agree with the tone of these comments, I did acknowledge that these errors matter a great deal to new learners who are using apps or websites that draw on the Tatoeba Project. I kept the notion of joining the project in mind and waited for a goal / project ***for myself as a learner*** to surface. It has surfaced, and now I'm here, initially using Japanese proverbs to see what sort of errors the MeCab analyzer makes. I've read your comments that the furigana engine will be improved. I'm not the sort of person to suspend my inquiry while I await results from the unknown.

I essentially agree with what tommy_san has written. Here's my paraphrasing:
i) the errors are staggeringly numerous
ii) the errors are embarrassingly laughable
iii) orphaned sentences should be demoted (made invisible / less accessible / more concave)

I'm new here and I'm slowly reading through articles and conversations. Thank you to everyone who has drawn my attention to pertinent topics.

Is there a system for weighting orphaned sentences? If after an active adoption cycle, orphaned sentences could be demoted into invisibility, then contributors of all levels could gradually adopt and improve some of these orphaned sentences, while other sentences drift closer over time to total deletion. That's how language works in our own world, no?
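archer_root's "adoption cycle" idea (orphans stay visible for a while, are then demoted into invisibility, and finally drift towards deletion) can be sketched as a simple time-based classification. The two durations below are invented placeholders, since the thread proposes the mechanism, not the numbers; `cycle_start` is assumed to be when a sentence enters the cycle (for existing orphans, presumably the date such a policy starts rather than the date they were added). All names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

# Hypothetical durations: the thread proposes the idea, not these numbers.
ADOPTION_WINDOW = timedelta(days=90)
DELETION_AFTER = timedelta(days=365)


class Visibility(Enum):
    VISIBLE = "visible"
    HIDDEN = "hidden"                        # demoted, shown only on request
    DELETION_CANDIDATE = "deletion candidate"


def orphan_visibility(owner: Optional[str], cycle_start: datetime,
                      now: datetime) -> Visibility:
    """Classify a sentence by how long it has gone unadopted.

    cycle_start is when the sentence entered the adoption cycle (an
    assumption: for old orphans, the date the policy starts).
    """
    if owner is not None:
        return Visibility.VISIBLE            # adopted sentences are never demoted
    waited = now - cycle_start
    if waited < ADOPTION_WINDOW:
        return Visibility.VISIBLE            # still open for adoption
    if waited < DELETION_AFTER:
        return Visibility.HIDDEN
    return Visibility.DELETION_CANDIDATE


if __name__ == "__main__":
    start = datetime(2013, 3, 1, tzinfo=timezone.utc)
    for days in (30, 200, 400):
        print(days, orphan_visibility(None, start, start + timedelta(days=days)).value)
    # 30 visible
    # 200 hidden
    # 400 deletion candidate
```

Marking sentences as deletion candidates rather than deleting them outright leaves room for tommy_san's objection to complete deletion: a later policy could move them to a low-ranked database instead.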

sacredceltic (February 26, 2013 at 7:11:21 AM UTC)

These laughable translations have been here since the beginning, i.e. for 7 years. And the reason is that a couple of ayatollahs have sacralised the Tanaka corpus as if it were the golden calf, although it was the mere work of students...
It's high time adopted sentences were deleted. Good sentences will be created again anyway.

Pfirsichbaeumchen (February 26, 2013 at 7:19:54 AM UTC)

Do you mean 'unadopted sentences'?

sacredceltic (February 26, 2013 at 7:25:36 AM UTC)

Yes...

Pfirsichbaeumchen (February 25, 2013 at 9:12:41 PM UTC)

I think there is no point in keeping sentences in the corpus until the end of time that are technically, i.e. in some grammatical sense, correct, but awkward, unnatural and no good as example sentences at all. By 'unnatural' I do not mean 'non-colloquial', 'literary' or 'old-fashioned' (I do have an appreciation for such sentences), but 'not breathing the spirit of the language in any way'.

Judging what is natural and what is not, what is acceptable and what is not, is not always easy, but you seem very competent, and I would trust you with it. Sentences laughed at by native speakers certainly need correction. If you see to that, you have my support.

I think what Trang was more concerned about when she set up those rules was that sentences that one native thinks are just fine while the other doesn't, should not be changed even if the other gets the chance to do it. Instead, he should add variant sentences that he is comfortable with, and everyone is satisfied. That is obviously not the case here.

Undoubtedly, grammar and spelling mistakes are the easiest to correct, but correct grammar alone does not make a correct sentence. If several natives agree that a sentence is unnatural, and none disagrees, they should have the right to change it. If you are not certain or need confirmation, you could ask at least one other Japanese contributor, and then, if he or she consents, 'do it', as it were.

To really be on the safe side, you could suggest the change in a comment, tag the sentence with '@change' and then after two weeks, if no one has disagreed by then, change it in good conscience.

Please ask Sysko to make you an advanced contributor. ☺

The idea Alex Marcelo had with that 'playground', if I understood him correctly, was not to put there all the unowned sentences that no native feels comfortable adopting, but to give non-natives a place where they can contribute their own sentences, which would then be transferred to the 'trusted area' of Tatoeba once they have been checked. That may very well never be implemented, although it's a good idea.

sharptoothed (February 25, 2013 at 9:24:50 PM UTC)

> Please ask Sysko to make you an advanced contributor. ☺

This is absolutely necessary, I have to say. :-) Additionally, we need an active native Japanese corpus maintainer badly, too. :-)

marcelostockle (February 25, 2013 at 7:54:06 AM UTC)

I'm glad to see you are doing that.
In my opinion, as a native, you have every right, and you should be encouraged to do it, especially since they have no owner (apparently).
But whenever I've corrected a few on my own, I've been confronted with who-knows-what self-imposed rules, status quo, immaculate sentences conception or whatever, finally leading to "you don't correct sentences!".

But let's just wait to see what others have to say about it.

In any language, you can find sentences that were written by non-native users or in haste. Maybe they aren't wrong, but they are not examples of proper language use nonetheless. In my opinion, these have to undergo correction eventually, by all means.

liori (February 25, 2013 at 4:21:45 PM UTC)

Whatever is decided, you can always at least leave a comment saying that you feel the sentence is unnatural. This, while currently not very visible, will already add precious information to the corpus.

Pfirsichbaeumchen (February 25, 2013 at 9:21:51 PM UTC)

By the way, I think that all unowned sentences should be considered non-native sentences and thus be corrected by a native and moved to the 'trusted area' of Tatoeba by putting their names on them upon correction.

CK (July 22, 2022 at 4:25:20 AM UTC, edited July 22, 2022 at 5:25:05 AM UTC)

🍎 Dashboard of Useful Links for Translators

I moved this to an https URL for those who were uncomfortable using this from the previous http URL.

Here are a few examples of how to use it.

German to English
https://domore.web.fc2.com/dash/?f=deu&t=eng

French to Kabyle
https://domore.web.fc2.com/dash/?f=fra&t=kab

Just add the 3-letter codes.
f = the "from" language; t = the "to" language


Optionally, you can also add &u=YOUR_USERNAME for links related to your username.

For example, English to Japanese, set for "KK_kaku_" would look like this.

https://domore.web.fc2.com/dash...jpn&u=KK_kaku_

../dash/?f=eng&t=jpn&u=KK_kaku_
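For anyone generating these links in bulk, the query string can be assembled programmatically. This is a minimal sketch that assumes the dashboard only needs the `f`, `t`, and optional `u` parameters CK lists; `dashboard_url` is a hypothetical helper, not part of the dashboard itself.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL of CK's dashboard, as given in the post above.
DASHBOARD = "https://domore.web.fc2.com/dash/"


def dashboard_url(from_lang: str, to_lang: str, username: Optional[str] = None) -> str:
    """Build a dashboard link from 3-letter language codes.

    f = the "from" language, t = the "to" language,
    u = optional username for links related to that member.
    """
    params = {"f": from_lang, "t": to_lang}
    if username:
        params["u"] = username
    # urlencode joins the pairs with "&" and percent-escapes unsafe characters.
    return DASHBOARD + "?" + urlencode(params)


print(dashboard_url("deu", "eng"))              # German to English
print(dashboard_url("eng", "jpn", "KK_kaku_"))  # English to Japanese, for KK_kaku_
```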

terrywallwork (July 21, 2022 at 7:35:52 AM UTC)

Weblio started including links to this Japanese/English Subtitle Corpus of sentences by Stanford. Some sort of open data set. Looks interesting:

Stanford Japanese-English Subtitle Corpus

https://nlp.stanford.edu/projects/jesc/

It looks like it has been around for a while, so if everyone already knows about it, apologies.

CK (July 21, 2022 at 8:18:19 AM UTC, edited July 21, 2022 at 8:20:32 AM UTC)

Here are 10 consecutive "pairs" from the "test" file, for those of you who want to quickly get a taste of this without needing to download everything. I just jumped to a random line in the file and grabbed ten items, and then formatted the data so it's a little easier to compare.

i'll never forgive you for that!
ぜったいに ゆるさないんだから!

so, that's it then?
他に話したいことは?

burn a little?
痛みますか? はい

especially for the adults working at mcsweeney's
こんなこととは知らずに契約して

i've got wildling blood in my veins.
俺の血筋には野人の血が流れている

we don't have much time!
時間がない!

but i also am in the fueling business.
ただ私は燃料供給ビジネスもやっている

i could. stay over tonight, if to make you feel any better.
一晩一緒にいてもいいよ それがよければ

i think you're enjoying this.
私はこれを楽しんでいると思います。

and cut!
はい カット

hecko (July 21, 2022 at 8:34:31 AM UTC)

it's weird that they're putting a cc(-by-sa) license on it when it's made up entirely of copyrighted movie subtitles they don't own

granted copyright isn't a problem for most people, but here's something that might be:

> A pair of human evaluators (both native Japanese and proficient English speakers) randomly sampled 1000 phrase pairs. On average, 75% of these pairs were perfectly aligned, 13% partially aligned, and 12% misaligned.

computers might not mind that too much but i don't think many people would like having every 8th card in their anki deck be garbage
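The "every 8th card" figure follows from the quoted 12% misalignment rate. If the 13% of partially aligned pairs are counted as well, the share of imperfect cards roughly doubles, to about one in four:

```latex
\[
\frac{1}{0.12} \approx 8.3
\qquad\text{(misaligned: roughly 1 card in 8)}
\]
\[
\frac{1}{0.13 + 0.12} = \frac{1}{0.25} = 4
\qquad\text{(partially aligned or misaligned: 1 card in 4)}
\]
```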

CK (July 21, 2022 at 8:51:34 AM UTC, edited July 21, 2022 at 9:12:23 AM UTC)

I did a case-insensitive comparison of the English in their "raw" file with last week's exported data from the Tatoeba Corpus.

There were 42,014 case-insensitive matches out of the 2,379,684 unique English lines in the "raw" file.

Note that their "raw" file has a lot of non-sentences, such as the following:

! country bumpkin!
: no, no, no, no!
,ferris?
,a new species, lived.
, but it also
that no one will ever see?
zut alors! i'm okay!
zoo transfer!
only fitness.

There were also over 20,000 items that had no spaces (single words, words run together, etc.), like the following:

curious,smart,annoying.
fourfourbravothreezero.
howaboutaftermycartoon?
okay,multitasking,boss.
surroundedbyaswarmof...
threeafoursixnineromeo.
wearebothstudentsofwar.
objection.argumentative.
threenewworldrecordsset.
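A comparison like the one above might look roughly like this in Python. It's a sketch under assumptions: both inputs are taken to be lists of English lines already read from the JESC "raw" file and the Tatoeba export, and the trimming plus `casefold()` normalization is a guess, since the exact method wasn't described. Note that casefolding before deduplicating can make the "unique lines" count slightly lower than a case-sensitive count. `compare_english` is a hypothetical name.

```python
def compare_english(raw_lines, tatoeba_sentences):
    """Return (unique raw lines, case-insensitive matches, lines with no spaces)."""
    # Normalize: drop surrounding whitespace and ignore case.
    tatoeba = {s.strip().casefold() for s in tatoeba_sentences if s.strip()}
    unique_raw = {line.strip().casefold() for line in raw_lines if line.strip()}

    matches = unique_raw & tatoeba
    # Items with no spaces at all: single words or run-together text.
    no_spaces = [line for line in unique_raw if " " not in line]
    return len(unique_raw), len(matches), len(no_spaces)


raw = [
    "We don't have much time!",
    "we don't have much time!",
    "zoo transfer!",
    "fourfourbravothreezero.",
]
tatoeba = ["We don't have much time!", "Zoo transfer is today."]
print(compare_english(raw, tatoeba))  # (3, 1, 1)
```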

CK (July 17, 2022 at 4:46:14 AM UTC)

🍎 Stats - Members That Use the Rating System

http://tatoeba.ueuo.com/stats-r...022-07-16.html

There are 299 usernames on this list.

Cabo (July 17, 2022 at 3:01:48 PM UTC, edited July 17, 2022 at 3:05:07 PM UTC)

I think most people use the rating because they find incorrect sentences and mark them. Later, when a sentence is corrected, they may check on it again and mark it as correct. They don't mark each and every one of them, because they're written by non-natives.
Big respect for those who marked more than a thousand sentences; that takes a long time.

AlanF_US (July 17, 2022 at 3:45:45 PM UTC)

Tags are better than ratings for marking incorrect sentences because:

(1) they're searchable via the Tatoeba user interface
(2) corpus maintainers search them regularly
(3) the tags can be deleted when or after the sentences are fixed
(4) they allow you to specify a little more about what's wrong with the sentence, such as incorrect punctuation (though this is often unnecessary since it's obvious)

Ratings serve a different purpose. They're more about telling you people's opinions on the correctness of a sentence that you're already looking at. They can be used to say that the sentences are OK even if there was some doubt about them (for instance, if they were written by a non-native who is not sure about their correctness). Or they can be used to say that the sentences are questionable, if not necessarily incorrect.

lbdx (July 17, 2022 at 5:07:37 PM UTC, edited July 17, 2022 at 5:09:54 PM UTC)

@AlanF_US thanks for starting the tags vs. reviews debate. I'll try to make the case for reviews.


Reviews are better than tags for marking incorrect sentences because:

(1) this feature is dedicated solely to sentence rating
(2) this feature is available (as opt-in) to all contributors
(3) it's a very quick and easy way to give your opinion on a sentence
(4) you can track the reviews you've given in "My reviews"
(5) reviews are automatically marked as "outdated" when the sentence is changed
(6) we can sum up the ratings given by different users on a sentence
(7) sentences with a majority of "not OK" reviews are now searchable in the "Rated as 'not OK' 🔴" list at https://tatoeba.org/en/sentences_lists/show/170380


Tags serve a different purpose. They're more about classifying sentences by category and can be used to indicate the next step (e.g. @delete).

If we enable the "reviews" feature by default, and if we update the list of flagged sentences every 15 minutes (instead of once a week), I think we would have a fairly robust error tracking system.

AlanF_US (July 17, 2022 at 5:17:45 PM UTC, edited July 17, 2022 at 5:27:16 PM UTC)

"Available to all contributors" and "quick and easy" are a double-edged sword. I'm seeing a lot of sentences that are fine, so it's not clear why they were marked as "not OK" in the first place, and the rating doesn't contain any information about that. One possibility is that people are hitting the button accidentally, and then perhaps don't know how to undo it. Another is that the ratings are given by people who don't know the language well.

I also frequently encounter sentences I don't like very much, for one reason or another, even if they're grammatically OK. It feels wrong to me to mark them as OK just to take them off the list.

Also, we're depending on you to produce and update this list. But if you don't mind, I don't have any serious objections.

lbdx (July 17, 2022 at 7:12:42 PM UTC)

> we're depending on you to produce and update this list.

For the moment, I generate the list on my laptop from the weekly exports and I upload it with a script. It's only a prototype.

If corpus maintainers give positive feedback in the next few weeks, I guess we can consider implementing the list generation on the server. It would be under a more official account and could be updated more often. However, I will stop the experiment at some point if there is too little interest.

CK (July 19, 2022 at 1:50:48 PM UTC, edited July 19, 2022 at 1:58:29 PM UTC)

> we're depending on you to produce and update this list.

A better solution would be to make it so that we could limit searches based on ratings, the same way we can limit searches to a certain tag or to a certain list. Perhaps it is too difficult to have this data indexed for use with the search engine.

That way, things could be automated.

It might then be possible to perform advanced searches like the following.

* must have at least 3 OK ratings (the number could be decided by the person making the query).

* must have at least one OK rating and no Not OK ratings.

* must have more OK ratings than Not OK ratings.

* must have a Not OK rating. (This would be good for corpus maintainers.)

* must have at least 1 (at least 2, at least 3, etc.) OK ratings by native speakers.

* etc.
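Each of the proposed filters boils down to a simple predicate over a sentence's ratings. This is only a sketch of the idea: Tatoeba's search engine does not offer these options (that is the proposal), and the `Rating` record with a +1/-1 `value` and an `is_native` flag is a hypothetical data model, not Tatoeba's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    value: int       # +1 for "OK", -1 for "not OK" (hypothetical encoding)
    is_native: bool  # whether the rater is a native speaker of the sentence's language


def at_least_n_ok(ratings, n):
    """At least n OK ratings (n chosen by the person making the query)."""
    return sum(1 for r in ratings if r.value > 0) >= n


def has_not_ok(ratings):
    """At least one Not OK rating (useful for corpus maintainers)."""
    return any(r.value < 0 for r in ratings)


def ok_and_no_not_ok(ratings):
    """At least one OK rating and no Not OK ratings."""
    return any(r.value > 0 for r in ratings) and not has_not_ok(ratings)


def more_ok_than_not_ok(ratings):
    ok = sum(1 for r in ratings if r.value > 0)
    not_ok = sum(1 for r in ratings if r.value < 0)
    return ok > not_ok


def at_least_n_native_ok(ratings, n):
    """At least n OK ratings given by native speakers."""
    return sum(1 for r in ratings if r.value > 0 and r.is_native) >= n


# Example: filter a hypothetical mapping of sentence ID -> ratings.
sentences = {
    101: [Rating(1, True), Rating(1, False)],
    102: [Rating(1, False), Rating(-1, True)],
    103: [],
}
print([sid for sid, rs in sentences.items() if has_not_ok(rs)])        # [102]
print([sid for sid, rs in sentences.items() if ok_and_no_not_ok(rs)])  # [101]
```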

lbdx (July 19, 2022 at 3:04:36 PM UTC)

@CK as you made a similar suggestion 2 years ago, here is a link to a comment by AlanF_US that I still find personally insightful :)
https://github.com/Tatoeba/tato...ment-610377215

Cabo (July 19, 2022 at 3:30:30 PM UTC)

He reads his own sentences and doesn't find the mistakes. Sometimes I reread my sentences and don't find the mistakes, but others do.
I don't care about OK marks that much. If I see a comment or two, then I know why they are there.

lbdx (July 19, 2022 at 3:47:18 PM UTC)

@Cabo we agree, everyone makes mistakes and that's OK. It is mainly the first paragraph that I find interesting. This sentence is a good summary I think:

> But I actually think it's a really bad idea, a further step toward the idea of "Distrust every sentence unless CK marks it as OK."

Cabo (July 19, 2022 at 4:08:49 PM UTC)

I don't think we distrust those, or if we do, it's simply sad.
But I believe those who get the data from us can discard the sentences without any reviews. The sentences marked by CK all have audio files, and I can understand why someone would be interested in just those sentences.

AlanF_US (July 19, 2022 at 10:37:31 PM UTC, edited July 19, 2022 at 10:59:31 PM UTC)

Sentences with audio are also, for the most part, sentences that CK chose. If they're in English, he generally either recorded the audio himself or found it via another source. (There are some English sentences recorded by other Tatoeba members, but a relatively tiny number.) And if they're not in English, he generally selected them for the person recording the audio, usually because they were translations of English sentences from a list that he also compiled.

I also understand why someone would be interested in just the sentences with audio. But they also belong, for the most part, to the single-gatekeeper model, so they are not very diverse. In any case, the Tatoeba user interface already allows people to search for sentences with audio. I wouldn't want to take that away. But while it's true that offline developers can filter out all the unrated sentences from downloaded files, I don't want to make the ability to search only for rated sentences part of the Tatoeba user interface because people who used it wouldn't understand what they were missing.

CK (July 20, 2022 at 12:15:17 AM UTC)

I was just suggesting an alternative to the approach lbdx suggested.

=== begin quote

(7) sentences with a majority of "not OK" reviews are now searchable in the "Rated as 'not OK' 🔴" list at https://tatoeba.org/en/sentences_lists/show/170380

[snip] ... if we update the list of flagged sentences every 15 minute (instead of once a week)...

=== end quote

In my opinion, rather than creating and updating a list, and then selecting that list in the advanced search to find such sentences, it would be better to have the search engine refer to the ratings directly and give people various query options. The search engine's indexes are updated automatically quite often.

People using the search engine could ignore this option. There are likely other options that members ignore.

Shishir (July 20, 2022 at 1:14:23 PM UTC)

I like the idea. Being able to hide the ratings made by non-native speakers or by specific users would be quite useful for me, because most of the wrongly rated Spanish sentences were rated by inactive non-native speakers, and it's always the same people ^^' The rest (asking for a specific number of "OK"s or "not OK"s) is not so relevant, given that not many people use the rating system and most sentences have one rating at most. Another option that would also be useful for me would be to show the name of the user and the rating on the search page.

By the way, and just for the record, I'm interested in this feature for correction purposes; I wouldn't search for only rated sentences if I wanted to translate stuff.

TRANG (July 16, 2022 at 10:52:37 AM UTC)

Dear Tatoebans,

A new chapter of my life is starting and I will have to take some time off from Tatoeba.

I am going to become a mom :)

Because of that, I have been quite unable to do much on Tatoeba in the past few weeks, in great part due to the constant fatigue that comes with the first months of pregnancy. And I expect the few months before and after giving birth to be even more demanding of my time and energy...

While I don't think I will be going completely radio silent, I certainly won't be very present and active on Tatoeba. So before we get to that point, I think now is a good occasion to try and find people to take over my responsibilities in Tatoeba.

This includes:
1) Adding new languages to Tatoeba
2) Managing the translators team in Transifex
3) Merging pull requests in GitHub
4) Deploying new changes on production
5) Investigating and fixing issues of Tatoeba being slow or not accessible
6) Guiding/assisting/mentoring people who are interested in contributing to the development of Tatoeba
7) Establishing new rules and guidelines when conflicts arise within the community

I hope I can do a live stream where I can explain all these points in a bit more detail. Perhaps in a month, when my fatigue eases a little (I've heard it gets better in the second trimester of pregnancy).

But in the meantime, if you already know you would like to help with any of that, please let me know! You can either just reply to this thread, send me a private message, or contact me by email (trang@tatoeba.org).

Thank you!

small_snow (July 16, 2022 at 12:56:36 PM UTC)

Congratulations! 🍀

lbdx (July 16, 2022 at 1:26:56 PM UTC)

I hope you enjoy this next phase of your life. Congratulations on the big news!

sabretou (July 16, 2022 at 5:06:53 PM UTC)

Congratulations!

Shishir (July 16, 2022 at 7:48:22 PM UTC)

Congratulations :)

LanguageExpert (July 16, 2022 at 7:55:33 PM UTC, edited July 16, 2022 at 7:55:43 PM UTC)

Congratulations! :)

bunbuku (July 16, 2022 at 11:08:53 PM UTC)

Congratulations! I hope you have a happy and healthy pregnancy. :)

Ergulis (July 17, 2022 at 6:36:26 AM UTC)

Congratulations.

DJ_Saidez (July 17, 2022 at 6:38:49 AM UTC)

Congratulations! ^^

mraz (July 17, 2022 at 8:26:36 AM UTC, edited July 17, 2022 at 8:28:24 AM UTC)

Jó egészséget kívánok! (I wish you good health!)

#3610179

TRANG (July 17, 2022 at 9:46:31 AM UTC)

Thank you everyone ❤️

AlanF_US (July 17, 2022 at 2:38:35 PM UTC)

That's wonderful news, Trang!

There are a few items from the list that I can consider taking on, perhaps with some help. Maybe you could start an e-mail thread with the people who are in a position to do some of these. I know most of them, but you have the most complete knowledge.

Cabo (July 17, 2022 at 3:03:02 PM UTC, edited July 17, 2022 at 3:03:14 PM UTC)

I wish you and the little one good health and freedom from worries and stress!

ZegPhig (July 18, 2022 at 11:45:14 PM UTC)

Congratulations!

GlossaMatik (July 19, 2022 at 8:54:06 AM UTC)

Congratulations!
And I also wanna say thanks for creating Tatoeba :)

DJ_Saidez (July 17, 2022 at 6:11:33 AM UTC, edited July 17, 2022 at 6:24:13 AM UTC)

I want to hear your guys' opinion:

The U.S. is the country with the 2nd largest number of Spanish speakers in the world, behind Mexico.
https://spanishlanguagedomains....s-500-million/

Spanish might not be an official language, but the U.S. federal government has NO official language, although English is certainly a lingua franca.

Several articles speak of a U.S. Spanish dialect, which varies depending on the region, but is generally meant to be neutral to accommodate the different Latinos that immigrate here and intermingle with each other.
Ex.: https://terratranslations.com/w...anish-variant/

The news broadcast of Telemundo (a major Spanish-language TV channel in the U.S.) is generally considered to be in U.S. Spanish. Univision (a similar channel) also often broadcasts material that's either in U.S. Spanish or in a mix of both languages (Spanglish).

Aside from colloquialisms or other regional terms, there's not much that separates it from other dialects. It's still very obviously Spanish.

Finally, Spanish is also the foreign language most studied in the U.S., and many of the students aren't likely to leave the country, so for the language to be most useful and relevant to them, they should learn the kind that's spoken around them. I don't mean speaking it lazily like a gringo, just imitating the neutral accent already found here.



Do you think it'd be useful to have this dialect represented here, in terms of both sentences and audio? Especially now that we have the capacity to accommodate several audio files per sentence, and can show different ways of saying the same thing?

Since I've now lived here for more than half of my life, my accent leans more towards U.S. Spanish than Mexican Spanish, and I have met many others like me where I live. Several of us are still quite proficient; we just don't have the exact same voice as those you might hear from other countries.

I want to add audio in Spanish that represents me, us and this dialect. So can we make this dialect, for Tatoeba's intents and purposes, valid here too?

Thanuir (July 17, 2022 at 6:57:24 AM UTC)

As far as I know, sentences written and spoken in dialect are completely fine.
For written dialectal sentences, adding a relevant tag or a comment is polite so that others know what is going on with the sentence.

DJ_Saidez (July 17, 2022 at 7:07:36 AM UTC)

And I'm hoping that soon we can add descriptions/tags to audio files as well, to further educate about the type of audio it is.
https://github.com/Tatoeba/tatoeba2/issues/2958

AlanF_US (July 17, 2022 at 2:29:32 PM UTC)

Can you please give us some sample sentence pairs written in U.S. Spanish and Mexican Spanish? It would also be nice if you could share some audio pairs with one of our native Spanish speakers.

DJ_Saidez (July 19, 2022 at 6:19:58 AM UTC)

Give me a little bit of time to gather it :) I'm busy with end-of-semester stuff

Hidden message (July 18, 2022 at 9:04:36 AM UTC)

The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.

lbdx (July 14, 2022 at 10:34:38 AM UTC, edited July 14, 2022 at 7:12:32 PM UTC)

Rated as 'not OK' 🔴

I just launched a new public list that brings together all the sentences that have more 'not OK' reviews than 'OK' reviews: https://tatoeba.org/en/sentences_lists/show/170380

If some of you find this list helpful to correct mistakes in the corpus, I'll try to update it every Saturday.

Note that reviews that are outdated or made by the owner of the sentence are not taken into account. The most recently reviewed sentences appear at the top of the list.
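The selection rule described above (count only reviews that are current and not by the sentence's owner, then keep sentences with more "not OK" than "OK" reviews, newest first) could look roughly like this. The `Review` fields, the +1/-1 encoding, and the usernames are hypothetical illustrations; the actual script, which runs on the weekly exports, wasn't shared.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Review:
    sentence_id: int
    reviewer: str
    value: int             # +1 = "OK", -1 = "not OK" (hypothetical encoding)
    outdated: bool         # True if the sentence changed after this review
    reviewed_at: datetime


def rated_not_ok(reviews, owners):
    """IDs of sentences with more "not OK" than "OK" reviews, newest first.

    owners maps sentence ID -> username of the sentence's owner.
    """
    score = {}
    latest = {}
    for r in reviews:
        # Outdated reviews and reviews by the sentence's owner are not counted.
        if r.outdated or r.reviewer == owners.get(r.sentence_id):
            continue
        score[r.sentence_id] = score.get(r.sentence_id, 0) + r.value
        if r.sentence_id not in latest or r.reviewed_at > latest[r.sentence_id]:
            latest[r.sentence_id] = r.reviewed_at
    # With +1/-1 values, "more not OK than OK" means a negative sum.
    flagged = [sid for sid, s in score.items() if s < 0]
    # Most recently reviewed sentences first.
    return sorted(flagged, key=lambda sid: latest[sid], reverse=True)


owners = {1: "alice", 2: "bob", 3: "carol", 4: "dave"}
reviews = [
    Review(1, "x", -1, False, datetime(2022, 7, 10)),
    Review(1, "y", 1, True, datetime(2022, 7, 1)),       # outdated: ignored
    Review(2, "bob", -1, False, datetime(2022, 7, 12)),  # owner's own review: ignored
    Review(3, "x", -1, False, datetime(2022, 7, 13)),
    Review(4, "x", -1, False, datetime(2022, 7, 9)),
    Review(4, "y", 1, False, datetime(2022, 7, 14)),     # cancels the "not OK" above
]
print(rated_not_ok(reviews, owners))  # [3, 1]
```

Adding a non-owner "OK" for each "not OK" brings the sum back to zero, which is why that is enough to keep a sentence off the list.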

Shishir (July 14, 2022 at 8:05:32 PM UTC, edited July 14, 2022 at 8:12:49 PM UTC)

I just checked a few samples of the sentences that landed there, and there are three kinds: sentences that are wrong, sentences that are correct but don't (completely) match their translation, and sentences that are correct but have been misjudged by some non-native speaker. So my question is:
Would it be possible to manually remove sentences from the list because, despite having been rated as "not OK", they are actually OK?

By the way, how did this sentence land in the list if the outdated ratings are not taken into account? https://tatoeba.org/es/sentences/show/853351

lbdx (July 14, 2022 at 8:18:31 PM UTC)

Thanks for the feedback!

To remove a sentence from the list without modifying it (and prevent it from being put back in the next update), someone who is not the owner must add an "OK" (+1) review to compensate for each "not OK" (-1) review.


> how did this sentence land in the list if the outdated ratings are not taken into account?

Maybe it's due to Horus, I'll have a look.

lbdx (July 15, 2022 at 2:52:41 PM UTC)

It seems that sentences inherit "not OK" reviews from the duplicates that Horus merges with them. To compensate, I now count an extra "OK" review for each merged sentence. 60 sentences have been removed from the list as a result of this change.
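The fix for reviews inherited through Horus merges amounts to adding one "OK" per merged duplicate to each sentence's net score. A sketch under the same assumption of +1/-1 review values; `merged_counts` (how many duplicates were merged into each sentence) and the sentence IDs in the example are hypothetical.

```python
def adjusted_scores(net_scores, merged_counts):
    """Add one compensating "OK" (+1) per duplicate merged into a sentence.

    net_scores:    sentence ID -> sum of counted review values (+1 / -1)
    merged_counts: sentence ID -> number of duplicates merged into it
    """
    return {sid: score + merged_counts.get(sid, 0) for sid, score in net_scores.items()}


def still_flagged(net_scores, merged_counts):
    """IDs that still have more "not OK" than "OK" after compensation."""
    adjusted = adjusted_scores(net_scores, merged_counts)
    return sorted(sid for sid, score in adjusted.items() if score < 0)


# Hypothetical IDs and scores, for illustration only.
net = {10: -1, 20: -2, 30: -1}
merged = {10: 1, 20: 1}
print(still_flagged(net, merged))  # [20, 30]
```

This compensates for every merge, whether or not the duplicate actually carried a "not OK" review, so it is a deliberately coarse correction.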

Shishir (July 17, 2022 at 6:22:55 PM UTC)

I just found this sentence

https://tatoeba.org/es/sentences/show/1281578

It has one OK and one "not OK" rating, so I thought it wouldn't appear in any list, but it's in the "not sure" list... Is it supposed to work like that? The sentence itself is fine; it was rated as not OK because of a wrong translation that has already been unlinked.

DJ_Saidez (July 17, 2022 at 6:56:23 PM UTC)

Probably because, under the algorithm, the OK and the not OK cancel each other out to a "maybe".

lbdx (July 17, 2022 at 7:14:07 PM UTC)

That's exactly it. But maybe it would be less confusing to take into account only the latest review?
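The example shows how a net-score rule behaves compared with the "latest review only" alternative suggested here. In the sketch below, the three-way mapping (positive means OK, zero means not sure, negative means not OK) is inferred from this exchange rather than taken from Tatoeba's code, and the order of the two reviews in the example is assumed.

```python
def by_net_score(values):
    """Classify from the sum of all counted reviews (+1 = OK, -1 = not OK)."""
    total = sum(values)
    if total > 0:
        return "OK"
    if total < 0:
        return "not OK"
    return "not sure"


def by_latest_review(values):
    """Classify from the most recent review only (values in chronological order)."""
    if not values:
        return "not sure"
    return "OK" if values[-1] > 0 else "not OK"


# One "not OK" followed by one later "OK" (order assumed for illustration).
history = [-1, +1]
print(by_net_score(history))      # not sure
print(by_latest_review(history))  # OK
```

The net-score rule treats the two opinions as a tie, while the latest-review rule assumes the newer opinion supersedes the older one, which fits cases where the problem (here, a wrong translation) has since been fixed.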

Shishir (July 17, 2022 at 7:41:04 PM UTC, edited July 17, 2022 at 7:41:24 PM UTC)

Oh, well, that's not how I understood what you told me here:

>To remove a sentence from the list without modifying it (and prevent it from being put back in the next update), someone who is not the owner must add an "OK" (+1) review to compensate for each "not OK" (-1) review.

I thought it wasn't supposed to appear in any list. Otherwise, either the sentence is still to be taken with caution, or it appears as OK with as many OK as "not OK" ratings, which would be misleading.

lbdx (July 16, 2022 at 5:19:49 AM UTC)

> Would it be possible to manually remove sentences from the list

I switched the list to collaborative mode. Please don't forget to review the sentences as "OK" before removing them so that they don't come back automatically at the next update.

Selena777 (July 16, 2022 at 11:24:17 AM UTC)

How can I select a language before reviewing the list? There are many languages there.

hecko (July 16, 2022 at 11:28:27 AM UTC, edited July 16, 2022 at 11:28:46 AM UTC)

open `Advanced search` at the top of the website, select the list (you can find it quickly by opening the dropdown and starting to type `Rated [...]`), set your language and whatever else, then search

Selena777 (July 16, 2022 at 5:21:59 PM UTC)

Thanks!

cojiluc (July 3, 2022 at 6:36:13 PM UTC)

Some months ago, I asked about the possibility of doing an advanced search using partial tag names:
https://tatoeba.org/en/wall/sho...#message_37786

My main motivation was the ability to perform an advanced search within quotes, as almost all of them start with "by".

As Ricardo had mentioned in that post, Guybrush has opened a ticket on GitHub:
https://github.com/Tatoeba/tatoeba2/issues/2866

But this feature may take a long time to implement. For quotes alone, however, there is a simple workaround:

add the tag "quote" to all sentences that have a tag starting with "by ".
Sentences with such tags are quotes anyway.

That way, an advanced search using the tag "quote" would cover all quotations.

Could this workaround be implemented (perhaps by an admin with the necessary privileges)?

Yorwba, July 8, 2022 at 5:06:12 PM UTC

> add the tag "quote" to all sentences that have a tag starting with "by ".
> Sentences with such tags are quotes anyway.

Done. (Well, except for "by " tags with the name of a Tatoeba contributor, because I didn't feel that tagging those sentences with "quote" would be appropriate.)

There are now 47209 sentences tagged "quote": https://tatoeba.org/en/tags/sho...s_with_tag/211
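The bulk re-tagging described above (add "quote" to every sentence that has a "by " tag, except where the name after "by " is a Tatoeba contributor) can be sketched as follows. This is a hypothetical Python sketch over an in-memory mapping of sentence IDs to tag sets; the function name `add_quote_tag`, the data layout, and the sample tags are assumptions for illustration, not how Tatoeba stores tags or how the change was actually made.

```python
def add_quote_tag(tags_by_sentence: dict[int, set[str]], contributors: set[str]) -> int:
    """Add "quote" to every sentence with a "by <author>" tag whose author
    is not a Tatoeba contributor. Returns how many sentences were newly tagged.
    Hypothetical data layout: sentence ID -> set of tag names."""
    added = 0
    for tags in tags_by_sentence.values():
        # any() finishes scanning the tags before the set is modified below.
        has_author_tag = any(
            tag.startswith("by ") and tag[len("by "):] not in contributors
            for tag in tags
        )
        if has_author_tag and "quote" not in tags:
            tags.add("quote")
            added += 1
    return added


sentences = {
    1: {"by Mark Twain"},
    2: {"by CK"},                        # contributor name: left alone
    3: {"idiom"},                        # no "by " tag
    4: {"by Albert Einstein", "quote"},  # already tagged, not counted
}
print(add_quote_tag(sentences, contributors={"CK"}))  # 1
```

Running it a second time returns 0, so the operation is safe to repeat after new "by " tags appear.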

CK, July 8, 2022 at 9:01:31 PM UTC

You'll notice that a lot of these are not what most people would traditionally consider quotes.

For example, look at the sentences owned by Hybrid that are now tagged "quote".

https://tatoeba.org/en/sentence...no&user=Hybrid

Yorwba, July 9, 2022 at 9:42:21 AM UTC

Some of those are quotes from articles by VOA, the NOAA, or other organizations, which I could easily filter out based on the tag, but others come from collections by various authors, where that would be more difficult.

If the "quote" tag should be reserved for pithy quotes, I could re-tag the other sentences as "quoted from external source" or something.

CK, July 10, 2022 at 10:53:39 PM UTC, edited July 10, 2022 at 10:58:16 PM UTC

I think not adding the "quote" tag to all of these would have been a better idea. I wonder if there is any way to undo this, assuming TRANG and other members think it would be a good idea to do so.

We used to be able to browse sentences tagged "quote" and be able to see things we'd more likely think of as quotes.

You can download all the English sentences that are currently tagged "quote" to see how many there are that may be inappropriately tagged.

currently-tagged-as-quote-english.zip
https://gofile.io/d/B9e2GT

I sorted the sentences from longest to shortest.

You could browse all of these using the tatoeba.org interface, but it would be a lot slower.

cojiluc, July 11, 2022 at 5:18:11 PM UTC, edited July 11, 2022 at 5:22:59 PM UTC

CK, there are relatively few false positives (sentences with a "by" tag that are not quotes). I fully support what Yorwba has done and thank them for their efforts. Believe me, the ability to search within quote-like sentences has many advantages. Many users have asked which sentences are the most interesting to translate, and I believe the majority of quote-like sentences are very interesting to translate. Of course, anyone can remove the "quote" tag if a sentence is not really a quote. Given that advanced search by partial tag name has not yet been implemented, I think what Yorwba has done is very helpful. Let's keep it.

Yorwba, July 14, 2022 at 7:54:37 PM UTC

Nonetheless, I've untagged 4531 sentences where the author mentioned in the "by" tag wasn't an identifiable person, but an organization. Hopefully that reduces the share of false positives somewhat.

CK, July 14, 2022 at 8:03:24 PM UTC

I suspect that also untagging many of the very long items would reduce the false positives.

Just start browsing with the longest items shown first, and I think you'll see what I mean.

https://tatoeba.org/en/sentence...roved=no&user=

I don't think most people think of these types of things when they want to study quotes.

CK, July 11, 2022 at 1:47:20 PM UTC, edited July 24, 2022 at 11:51:51 AM UTC

[Deleted, since lbdx didn't like the post and seemed to think it wasn't an appropriate post for the Wall.]

I moved this to the bottom of my profile for now, for those who may be interested.

Go to https://tatoeba.org/en/user/profile/CK

lbdx, July 11, 2022 at 2:00:52 PM UTC

@CK

Once again, you highlight the sentences you selected for your audio recordings. Please leave some space for other English contributors once in a while 😀

deniko, July 11, 2022 at 4:56:45 PM UTC

I think he selected sentences with audio recordings, not just his own audio recordings? The fact that most of them belong to CK is kind of irrelevant; other English contributors can contribute by adding more audio, which we would all appreciate.

Personally I have always preferred working with sentences with audio recordings because they're more reliable. If they were voiced by a native speaker, the probability of a mistake is very low (although sometimes the recording doesn't match the sentence, but it happens very rarely, and I listen to a ton of sentences every week).

lbdx, July 11, 2022 at 5:52:06 PM UTC

I recently shared a list that allows users to browse more dependable sentences without using the biased "audio" and "native speaker" filters. In the first link he just shared, CK started from this list and only kept the sentences with audios. This had the effect of eliminating most of the sentences that were not his own. I don't think this is fair play.

I thank CK for all the work he does with the audios, and he is free to add the audios he wants. But you should be aware that he covers his sentences much more than those of the other important native English contributors (85% of his own sentences vs. maximum 30% of the others). This amplifies the imbalance of the English corpus towards simple sentences with little variety.