I've seen people who change the names that appear in correct sentences in order to "standardize" them and use "Tom" or "Mary". I want to publicly complain about that. There are many languages in which it doesn't matter whether we're talking about Tom or Peter: the rest of the sentence remains the same. But there are others in which something might change depending on the name, like Turkish, where you say "Tom'UN bir kitabı var" (Tom has a book) but "Peter'İN bir kitabı var" (Peter has a book). So having different names with different endings can be really useful.
You are probably talking about what CK and I did today. It happened while we were trying to refine the sentences I added in my earliest days on Tatoeba. I never meant to go so far as to standardize all the proper nouns. I'm sure I won't do it again. I'm sorry for causing some disorder in the corpus.
And thank you for the tip on Turkish. I should have borne in mind that what seems to be a mere "near duplicate" in one language might not be so in another. It may be a good idea to gather this kind of information somewhere so that we can prevent such over-standardization from happening again.
Actually it all started because of this:
http://tatoeba.org/spa/sentence...25319#comments
(Had it just been because of you, I'd have posted a comment on your sentence instead of writing a wall post :) But I'm glad I raised awareness of why names shouldn't be changed like this ^^)
There's no mass standardization effort going on. I did this for one unowned sentence, whose original was:
"According to sysko, Trang is just a predatory guru who promises gain."
which is only mildly funny, at best, for people who know that sysko and Trang are administrators of the site who have worked together closely and respect each other very much. I figured that people who didn't know the in-joke would take the statement at face value, and I thought it was a bad idea to set a precedent for people to mention Tatoeba members by name, especially in a negative light. I interpreted the little smiley in the comment "why did you change the names? :S" as a sign that the reason I changed the names was clear. Perhaps it wasn't, so perhaps the smiley meant something completely different. In any case, I unlinked the translations and left comments on them to indicate the English sentence to which they were originally linked so that they could be changed and relinked.
In any case, this was an unowned sentence, and the site encourages us to adopt and modify unowned sentences that we feel need to be changed. But I've come to the conclusion that, in reality, modifying an unowned sentence comes with so much invisible baggage that doing so seems generally not to be worthwhile, even if the sentence is problematic.
The rules, as far as I know, say not to modify sentences if they're correct, and changing just the names that appear in it looked quite strange to me. That's why I wrote that ":S"; it just meant confusion.
> I thought it was a bad idea to set a precedent for people to mention Tatoeba members by name
http://tatoeba.org/spa/sentence...rom=und&to=und
That didn't set any precedent since it hadn't been translated, Scott changed both sentences to turn them into sentences, but no-one else translated them; but the "precedent" had been set earlier. And I think we should ask Trang whether she finds it offensive or not, if she does, I'd still wish the name used wasn't Mary, but I won't ask any further question about this subject.
By the way, I'm curious: what is the "invisible baggage" you're talking about?
Ah, I guess I misread the emoticon. Good thing "Emoticon-ish" is not one of the Tatoeba languages I translate! :)
By "invisible baggage", I mean that if I adopt a sentence, I am bound to displease someone, no matter what I do. If I leave it as is, someone is going to ask me to change it, since my adopting it is a "vote of confidence" for what was probably a problematic sentence written by a non-native speaker. If I change it, someone is going to ask me even more insistently why I touched it in the first place, since in their opinion it wasn't wrong. Or they'll ask me to change it in a different way.
If the sentence has the misfortune of having translations associated with it, the problems multiply. If I change the English, but I feel that the other translations are valid and don't need to be touched, I'll get a complaint. If I change the English and take the time-consuming, cautious approach of unlinking the translations and adding comments to them pointing to the sentence to which they were originally linked and indicating that someone may want to modify and relink them, I'll offend someone else.
Often the feedback takes the form of "Well, couldn't you ask someone else for an opinion before you unlink/modify/do nothing/implement what I want you to do?" That seems reasonable on the face of it, but in reality, it means waiting for an unknown period of time only to get an answer that may be inconclusive or conflict with someone else's. Multiply that wait by multiple sentences and the whole process grinds to a halt. Imagine how long it would take me to get a response from Trang and sysko about whether they felt insulted by a sentence, considering that it took half a week just to get a response to the far more urgent message: "Hey, the website is so slow that it's unusable -- please fix it!"
Several people have told me that "the rules" say that I shouldn't modify sentences, even ones that I've adopted, unless they're "wrong" (where "wrong" is left undefined). I'm curious which of them have looked up what's written in the site documentation. Here is what the Help page ( http://tatoeba.org/eng/help ) actually says, in its totality:
"When you add a sentence, this sentence 'belongs' to you - only you can edit it. However, most of the sentences in Tatoeba come from a Japanese-English corpus called Tanaka Corpus. These sentences do not have any owner because they have been collected outside of Tatoeba.
If you see a mistake in an 'orphan' sentence, you will not be able correct it because you are not the owner. This is why there is an 'adopt' option (). Once you adopt a sentence, you become its owner and therefore can edit it.
Adopting a sentence is also part the 'quality process'. You can find more information about it here: http://blog.tatoeba.org/2009/01...n-system.html"
Once again: "You become its owner and therefore can edit it." Period. Naturally, this is not an invitation to change a sentence arbitrarily or maliciously, but it does suggest that the admins trust the judgment of whichever contributors adopt the sentences, in the same way they trust them in contributing their own sentences.
I can see how a cautious approach to changing existing sentences could be suitable for a corpus where the sentences and translations were known to be of high quality when they went in, but it's a bad match for ours.
All the same, although I feel justified in adopting and correcting unowned sentences according to my judgment as a native speaker, I'm not going to do it anymore because the costs are too steep. I'm going to focus again on translation. (I'm reminded of the lyrics from the song "Garden Party" by the singer Rick Nelson: "But it's all right now/I've learned my lesson well/You see you can't please ev'ryone so/You got to please yourself.")
>Several people have told me that "the rules" say that I shouldn't modify sentences, even ones that I've adopted, unless they're "wrong" (where "wrong" is left undefined). I'm curious which of them have looked up what's written in the site documentation.
Well, curiously, although you are a "corpus maintainer", whatever that means nowadays, you ignore the fundamental rules. So here they are http://en.wiki.tatoeba.org/arti...ow/quick-start
Rule number 2 unequivocally states that you shouldn't change sentences that are correct.
The fact that you're the owner of the sentence is irrelevant. Anyway, as you seem to ignore, you cannot change a sentence unless you are its owner. And when you do, as a CM enables you to, you must absolutely respect the author and warn them.
So you're being very disingenuous when you write : "If I change the English and take the time-consuming, cautious approach of unlinking the translations and adding comments to them" because you change sentences all the time without warning their authors, their owners or their commenters .
Here is an unfortunate example where 3 different people, including you, have altered an initially correct sentence in 3 different ways, correct and incorrect, while never taking the pain to warn the others or anybody else : http://tatoeba.org/fre/sentence...36357#comments
The reason you behave like this is that you believe that being a CM and an English native gives you every right on the English language on Tatoeba.
But you're just native of your little bubble somewhere in the vast English language realm that you are just ignorant of.
This behaviour of yours is one more proof, if it was necessary, that you don't understand Tatoeba, don't respect the Corpus and its contributors and that you subsequently shouldn't be a CM.
If we were to let you do what you wish, you would probably reduce the diversity of English on Tatoeba to the 5% that is spoken around your village and which is the only one you know...
You're actually noxious to the Corpus.
I personally think that the problems are,
a) correctness is not boolean: there is a continuum ranging from ‘absolutely correct’ to ‘absolutely incorrect’, and most sentences lie in between, but in Tatoeba a sentence has to be either ‘correct’ (i.e. present in Tatoeba) or ‘incorrect’ (and edited);
b) the correctness varies greatly in different subsets of a language, and different people have different ideas about what subset of a language should be present in Tatoeba.
Exactly. And when you add non-native speakers to the mix, you end up with even more opinions.
I see your point, but this is the way I'd act in that situation. If the sentence is linked to a Japanese translation, I'd ask a native for their meaning (believe it or not, it rarely passes more than a few hours -one day at most- before you get an answer about what it means or whether it matches the English sentence ... or you can send a private message to a native speaker who's active). If the sentence doesn't sound natural, I would not adopt it nor tag it as OK, because, as people told you, it's a vote of confidence in something you wouldn't actually say; and I would make sure the sentence doesn't match BEFORE unlinking. That's how I act when adopting sentences written in Spanish and I've never faced any trouble or complaint, and if I know / think they're said in some other Spanish speaking country but not mine, either I leave them alone or I ask someone from that country to have a look at it.
Why do I insist on this point? Because I think it's fundamental for Tatoeba to have a trustworthy English corpus. Too many people want to learn English, and a corpus in which 1 out of 10 sentences might not be trustworthy is quite useless to them.
First, I'd like to thank you for bringing this topic to our attention. We do have declensions for names in Latin as well, and sometimes it's important to add several possibilities for each case; however, I agree that some kind of standardization would help us avoid near duplicates. I believe this could be solved by the use of metadata, which is a very interesting approach.
You're not wrong when you say AlanF_US could have been more careful, though. Just because we adopted a sentence, it doesn't mean we can do whatever we want with it, especially when they are linked to other sentences.
There seems to be a contradiction in our documentation about what should be done with abandoned sentences, so I think TRANG is the right person to add a period to this thread. We don't want to cause our users any extra work, so it would be good to know exactly what to do with these entries.
> There seems to be a contradiction in our documentation about what should be done with abandoned sentences, so I think TRANG is the right person to add a period to this thread.
It would be also nice if she'd give us a comment about what we discussed here.
http://tatoeba.org/jpn/sentences/show/2592643
I welcome any clarification of policies. But if you are attributing a lack of caution to me, a desire to "do whatever I want" with sentences, then (ironically enough) I think you ought to reread my note much more carefully. In fact, it was an abundance of caution that led me to change the names in that sentence (and write a comment to clarify that I had done so) in an attempt to promote a civil atmosphere. The irony is, of course, that what erupted was, in part, the exact opposite. Actually, that's just the first of many ironies in this discussion, but I'll save that for a separate comment.
Okay, count the ironies in this situation:
I decide to stop adopting and correcting sentences. It's at just that point that I find one of those sentences the subject of a thread on the Wall. I explain that I modified the sentence in an attempt to promote a respectful atmosphere, but that I've decided that adopting sentences is not something I'm going to do anymore. I then find myself the subject of a personal attack (flagrantly violating the "don't be disrespectful" rule) that accuses me of becoming a CM precisely in order to modify other people's sentences. He concludes his diatribe by calling *me* noxious. The admin then raps me on the knuckles for a lack of caution and for "doing whatever I want" with sentences. I'm then sucked into having to spend time defending myself, exactly the reason I abandoned the sentence-adoption process in the first place.
Part of the source of the problem is that, while there are plenty of people to police the rules (written, unwritten, and imagined) about editing unowned content, no one, least of all the admins, shows the slightest interest in enforcing the rules about respect, in making this a fun place to work. Touch an unowned sentence to remove what you consider somewhat libelous, and you unleash a firestorm. But flagrantly flout the "no public discussion of candidates" or Trang's "don't be disrespectful" rules or their corollaries (don't impute bad motives to someone else without cause, cast blame, criticize in public when you can do it in private, make unsubstantiated accusations) at every turn, or take someone's private communications and broadcast them on the Wall, or flout the opinion of multiple native speakers on a sentence and declare that native speakers don't really know their own language... and you are begged on bended knee not to leave the site, after which you announce, in a fit of pique, that in fact you will leave, only to reappear in order to cast more vitriol, without comment.
In fact, the repercussions of failing to enforce the courtesy rules are far more severe. I can't count the number of people who have left the site because of their frustration over the bickering that this person unleashes and that the admins don't see fit to even address. That is a tremendous, tremendous loss for the site. It's even a loss for the person himself, who potentially could profit far more from positive attention for his contributions and valid points than he does from the negative cycle in which he provokes people into attacking him, then attempts to bully them into submission, and ends by complaining about being the target of abuse.
If Trang or sysko does visit this thread, I would appreciate not only a clarification of the policy regarding unowned sentences but also some comment on what the admins feel they can do to make this community a more pleasant place.
>he provokes people
YOU provoke people by ignoring the rules although you are a CM. Denouncing the ignorance of rules is neither bickering nor poisoning the atmosphere, so stop your ad hominem attacks.
Actually, not enough people denounce this ignorance which is not only noxious but absolutely LETHAL to the corpus.
Rule number 2 - which you ignored and overruled so far - is rule number 2 and has been there from the start - at least since I've been here.
Rules are there to be RESPECTED.
I and other contributors chose to contribute (massively in my case) under the guidelines and the SECURITY of these rules.
Should these rules be disregarded now, that would constitute a breach of trust toward all the past contributors and they would be justified in demanding the restitution of their hard work.
You can't change the rules during the game ! And the game has been on for years now.
You arrive here, ignore the rules and want to change them all.
No way or give me back my sentences, because I surrendered them under different conditions !
I fully support this standardization of names. Tatoeba is now full of similar sentences. And we need, if not to decrease their number, then at least to try to make a system out of them.
We all know that the number of sentences in any human language is infinite. Are we gonna try to achieve this?
Otherwise I'll start adding:
A loves A
A loves B
B loves A
B loves B
A loves AA...
;)
Actually I'm really grateful to Tatoeba for having so many similar sentences, it's helped me a lot to learn Turkish, and as I previously said, if we just have Tom and Mary, in the case of Turkish, I wouldn't know the ending to indicate possession when talking about proper nouns (because they depend on the ending of that noun), or to indicate destination, or when it's a direct object. So if we have Tom loves Mary, I already know it'd be "Tom Mary'yi seviyor", but if we have "Tom loves Lucia" we'd have "Tom Lucia'yı seviyor", and "Tom loves Rocio" "Tom Rocio'yu seviyor", "Tom loves Jess", "Tom Jess'i seviyor" see the difference? And I think Basque works the same way, and these languages have the same right to get a good database of examples as English, French or German, where you just say "A loves B" without a real change in the structure.
So, as A and B are not real names, I don't see the point in adding them, but if you said Allison and Brad and Ana or Anne or whatever other name, I'd most gladly accept it.
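To make the Turkish point concrete, here is a rough Ruby sketch of how the ending follows from the name. It assumes the standard four-way vowel harmony (the suffix vowel is picked by the last vowel of the name) and a buffer consonant after a final vowel. The function names are made up for illustration, and the rule works on spelling only, so it does not cover names pronounced differently from how they are written (which is why the thread has "Mary'yi" while a spelling-based rule would not produce it).

```ruby
# Suffix vowel chosen by the last vowel of the name (four-way vowel harmony).
HARMONY = {
  'a' => 'ı', 'ı' => 'ı',
  'e' => 'i', 'i' => 'i',
  'o' => 'u', 'u' => 'u',
  'ö' => 'ü', 'ü' => 'ü'
}.freeze

# Returns the harmonized suffix vowel; raises KeyError if the name has no vowel.
def suffix_vowel(name)
  last_vowel = name.downcase.chars.reverse.find { |c| HARMONY.key?(c) }
  HARMONY.fetch(last_vowel)
end

def ends_in_vowel?(name)
  HARMONY.key?(name[-1].downcase)
end

# Genitive (possessor): 'in family, with buffer "n" after a final vowel.
def genitive(name)
  buffer = ends_in_vowel?(name) ? 'n' : ''
  "#{name}'#{buffer}#{suffix_vowel(name)}n"
end

# Accusative (direct object): 'i family, with buffer "y" after a final vowel.
def accusative(name)
  buffer = ends_in_vowel?(name) ? 'y' : ''
  "#{name}'#{buffer}#{suffix_vowel(name)}"
end

puts genitive('Tom')       # Tom'un
puts genitive('Peter')     # Peter'in
puts accusative('Lucia')   # Lucia'yı
puts accusative('Rocio')   # Rocio'yu
puts accusative('Jess')    # Jess'i
```

Four names already give four different accusative endings, which is exactly the information lost when every sentence uses only Tom and Mary.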
>English, French or German, where you just say "A loves B"
[fra] J'aime l'espagnol / j'aime le turc / j'aime la grammaire...
As I had suggested earlier, we should develop a way to de-duplicate near-duplicates when they don't have an added value in the language. Or rather, a way not to create them.
At the same time, we could ensure that names are exemplary names in each language (which Tom, Mary or anything else aren't, except for English speakers who don't get the issue you just explained with Turkish, although it is prevalent in many languages: French, Russian, German, just to name a few).
So there should be a way, in each language, to create example sentences with each significant form, and no more. Then, a smart "injecting-program" would randomly fill the sentences with the names for each form, in order to maintain names diversity.
For example, in French, we have the following cases :
female names starting with a vowel
male names starting with a vowel
female names starting with an h
male names starting with an h
female names starting with another consonant
male names starting with another consonant
so in the sentence :
« Je suis amoureuse du beau Paul » it doesn't make a difference if the name is Pierre, Jacques or Rémy.
But it does if the name is Hubert, Alain, Marie or Henriette
=> « Je suis amoureuse du bel Hubert / du bel Alain / de la belle Marie »
So in theory, Tatoeba should have no more than 6 near-duplicates of this sentence in French, while the smart injecting program would randomly change each name to display from preset lists.
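The six cases above can be sketched in a few lines of Ruby. This is only an illustration: the function names and the gender argument are assumptions, the vowel list is deliberately short, and every initial h is treated as mute, as in the « bel Hubert » example (an aspirated h would need its own list).

```ruby
# Deliberately short vowel list; "y" is left out because it is ambiguous.
FRENCH_VOWELS = %w[a e i o u é è ê].freeze

# Classifies a first name into one of the six cases listed above.
# gender is :male or :female and has to be supplied by the caller.
def french_name_case(name, gender)
  first = name[0].downcase
  initial = if first == 'h'
              :h
            elsif FRENCH_VOWELS.include?(first)
              :vowel
            else
              :consonant
            end
  :"#{gender}_#{initial}"
end

# Builds the example sentence with the adjective form the case requires.
def amoureuse_de(name, gender)
  case french_name_case(name, gender)
  when :male_consonant
    "Je suis amoureuse du beau #{name}"
  when :male_vowel, :male_h
    "Je suis amoureuse du bel #{name}"
  else
    "Je suis amoureuse de la belle #{name}"
  end
end

puts amoureuse_de('Paul', :male)     # Je suis amoureuse du beau Paul
puts amoureuse_de('Hubert', :male)   # Je suis amoureuse du bel Hubert
puts amoureuse_de('Marie', :female)  # Je suis amoureuse de la belle Marie
```

For this particular sentence several cases collapse into the same surface form, so six is an upper bound on the number of near-duplicates, not the number actually needed.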
So if I were to create the new sentence « Je suis amoureuse du beau Pierre » when « Je suis amoureuse du beau Paul » already exists, the system shouldn't create it, but rather merely display the existing one and add « Pierre » to the list of French names for its case, if it isn't already there (which should rarely be needed, because we can prefill these lists with all current French first names). We could do that for each language.
That would be good for the Corpus because we would avoid clutter while emphasizing diversity of forms.
That would also benefit the students because when they search, they would retrieve only one example of each case next to each other, instead of having to dig in hundreds of silly names that don't make a difference.
That would be more culturally equitable, because every firstname of every language/culture would be equally represented.
I don't think this is that difficult to build.
It only requires a procedure that detects whether a sentence has near duplicates, based on a list of first names in each language, and rejects the insertion if a near duplicate is already there.
On the other hand, when a sentence that contains a first name (identified from the relevant language list) is displayed, a random name matching the same case would be shown in its place.
And that's all !
If Tatoeba was written in Ruby, I would already have written the procedure.
# Schema (pseudo-migration):
#   firstnames: firstname:string, case:string, language:string
#   sentences:  sentence:string, language:string
# Sketch only: assumes Rails 5+ (throw :abort) and a Firstname model.

class Sentence < ActiveRecord::Base
  before_save :near_duplicate_check

  # Blocks the save when the same sentence already exists with another
  # first name of the same case substituted in.
  def near_duplicate_check
    words = sentence.split(' ')
    words.uniq.each do |word|
      f = Firstname.find_by(firstname: word, language: language)
      next unless f # not a known first name
      Firstname.where(case: f.case, language: language).each do |other|
        near_duplicate = words.map { |w| w == word ? other.firstname : w }.join(' ')
        # Ignore the record being saved so that updates do not block themselves.
        if Sentence.where(sentence: near_duplicate, language: language).where.not(id: id).exists?
          errors.add(:sentence, "near duplicate of: #{near_duplicate}")
          throw :abort # one match is enough
        end
      end
    end
  end

  # Returns the sentence with each known first name replaced by a random
  # first name of the same case.
  def pick_random_firstname_to_be_displayed
    sentence.split(' ').map do |word|
      f = Firstname.find_by(firstname: word, language: language)
      next word unless f
      # RANDOM() is PostgreSQL/SQLite syntax; MySQL uses RAND().
      Firstname.where(case: f.case, language: language)
               .order(Arel.sql('RANDOM()')).first.firstname
    end.join(' ')
  end
end
Since new first names don't pop up every day for a given language, it should be possible to optimize the index on language+firstname, so that the initial sentence analysis is not slowed down too much by the procedure.
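On the speed question, one possible variation (not the procedure above, just a sketch under my own assumptions) is to store a canonical key per sentence, in which every known first name is replaced by a placeholder for its case. Two sentences are then near duplicates exactly when their keys match, so the check becomes a single indexed lookup instead of a loop over every name of the same case. The name list, the case labels and the Corpus class below are hypothetical, and plain Ruby hashes stand in for the database tables.

```ruby
require 'set'

# Hypothetical name list: [language, firstname] => case label.
FIRSTNAME_CASES = {
  %w[fra Paul]   => 'male_consonant',
  %w[fra Pierre] => 'male_consonant',
  %w[fra Alain]  => 'male_vowel',
  %w[fra Marie]  => 'female_consonant'
}.freeze

# Replaces every known first name with a placeholder naming its case.
def canonical_key(sentence, language)
  sentence.split(' ').map do |word|
    name_case = FIRSTNAME_CASES[[language, word]]
    name_case ? "<#{name_case}>" : word
  end.join(' ')
end

# In-memory stand-in for the sentences table with an index on the key.
class Corpus
  def initialize
    @keys = Set.new
  end

  # Returns true if the sentence was stored, false if a near duplicate exists.
  def add(sentence, language)
    # Set#add? returns nil when the element is already present.
    !@keys.add?([language, canonical_key(sentence, language)]).nil?
  end
end

corpus = Corpus.new
puts corpus.add('Je suis amoureuse du beau Paul', 'fra')    # true
puts corpus.add('Je suis amoureuse du beau Pierre', 'fra')  # false
puts corpus.add('Je suis amoureuse du bel Alain', 'fra')    # true
```

Like the procedure above, this splits on spaces only, so a name followed directly by punctuation would not be recognized without extra tokenization.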