As I have time to deal with this these days, let's do some tag and list cleaning.
So please discuss here:
which tags you want to keep,
which tags you want deleted,
which tags you want moved to a list.
State them clearly so that we can discuss them afterwards.
Thanks
If one reads the questions above, one might suppose everything will be quite simple, but, I beg your pardon, I have some doubts about this. Views on this matter differ widely, and I ask myself whether it is really necessary to begin with a controversial, content-centered discussion, and whether it might above all create conflicts rather than bring a generally acceptable solution.
I'm convinced it is possible (at least in the beginning) to gain a great deal with a quite pragmatic approach. If we consider that all authors will in future be indicated in another way, the list of tags will be cut at least in half, or maybe even reduced to a third. Among the remaining tags we will see that, again, a huge majority appear only once or twice. If we suppose that frequency may be an indicator of importance and usefulness, fix a threshold of maybe 10, and agree that only tags appearing 10 or more times should remain, we would get a list with probably less than 10 percent of the former entries. (Of course I didn't count them, but tried to give a reasonable rough estimate based on random sampling and visual impression.) Reaching consensus on such a procedure will certainly not be difficult, because a discussion about subjective criteria and interests would not be necessary at all. Thus we can already make a huge step forward, and the starting position for all further considerations will be significantly more favorable.
Another point: I guess I'm not the only one who has created tags unintentionally by mistyping, or who in the past created tags which he or she now considers no longer necessary. So I would very much like to support the cleaning to get rid of this stuff.
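To make the frequency idea concrete, here is a minimal sketch of how such a cut could be computed. It is only an illustration: the (sentence id, tag) pair format, the `prune_tags` name and the "by " prefix for author tags are assumptions, not how Tatoeba actually stores its tags.

```python
from collections import Counter

def prune_tags(tag_assignments, min_count=10, author_prefix="by "):
    """Return {tag: count} for tags used at least min_count times.

    tag_assignments is a hypothetical list of (sentence_id, tag) pairs.
    Author tags (assumed to start with author_prefix) are left out,
    since authors would be indicated in another way.
    """
    counts = Counter(
        tag
        for _sentence_id, tag in tag_assignments
        if not tag.startswith(author_prefix)
    )
    return {tag: n for tag, n in counts.items() if n >= min_count}

# Made-up data: one tag used 12 times, one author tag, one tag used once.
assignments = [(i, "proverb") for i in range(12)]
assignments += [(1, "by Mark Twain"), (2, "biology")]
kept = prune_tags(assignments)
```

With the made-up data above, only "proverb" survives. The threshold of 10 is the number suggested in the post and would of course be up for discussion.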
Do you really need all your Esperanto verb tags? Does the tag "-as -as -is -is" serve any useful purpose?
These tags are obvious from the content of the sentence. Tags should be used to group things that aren't obvious, like authors of quotes. If we could get rid of these, it would significantly clean up the list of tags.
Sorry if I'm repeating myself a bit here, but I think that those tags are potentially useful. I would just move them into a "Language specific tag" category where they would be less visible.
I think that we shouldn't treat reducing the number of tags as an objective to pursue at all costs, since that removes a lot of information. Many websites using tags have immense lists of tags.
Yes, maybe we just need to categorize them...
Dear colleagues, I would like to explain to you why I insist on the importance of being able to give basic indications of sentence structures, verb tenses and word counts, and to discuss how this can be managed and restricted in such a way that nobody should see it as a disadvantage.
At the moment I lack the time to go into details, but I count on you, and, last but not least, again and again I admire the amount of energy and knowledge you all invest in developing and maintaining the Tatoeba corpus, a tool that can also offer great opportunities for didactic and analytical purposes. We may judge their importance differently, as we may look at the use of this corpus from different angles, but I hope we can agree that it would be a contradiction not to support the use of these possibilities.
I don't have a problem with language specific tags per se, but I do have a problem with ones that simply describe the text of the sentences and have the potential for nearly unbounded growth (i.e. these Esperanto verb tags).
And the word counts... why not letters? Syllables?
I don't think they're terribly useful. I would prefer if we removed them for now, and possibly re-add that information when we get metadata (but only if users can provide compelling use cases). Hopefully, it would be automated in most cases. (It is easy for a computer to count syllables in any language with a phonemic orthography and counting words in a language with spaces is trivial. However, it would get tricky when things like numbers and foreign names are added.)
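For what it's worth, here is a rough sketch of what the automated word count could look like for a language written with spaces. The `count_words` helper and its tokenizing rule are my own assumptions, and it deliberately dodges the tricky cases: a number or a foreign name simply counts as one word.

```python
import re

# A "word" here is a run of letters or digits, optionally joined by
# apostrophes or hyphens (so "don't" and "well-known" count once each).
WORD_PATTERN = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")

def count_words(sentence):
    """Count space-delimited words, ignoring punctuation."""
    return len(WORD_PATTERN.findall(sentence))

examples = {
    "I ate a sandwich yesterday.": count_words("I ate a sandwich yesterday."),
    "Don't panic!": count_words("Don't panic!"),
}
```

Whether "1,000" or a hyphenated name should be one word or several is exactly the kind of decision that would need a compelling use case first.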
OK, well I don't know Esperanto so I assumed they were reasonable.
I agree. Obvious info such as number of words and tenses should not be there.
I think that most tags are OK. What I think should be done is a better classification of tags, e.g.
1) author tag
2) date tag
3) tags specific to Language (e.g. "verb ending in -er")
I would divide these tags into four groups:
1. Essential tags
- tags starting with [@], indicating that the sentence needs to be checked, corrected, or removed:
@check
@change
@change flag
@needs native check
@delete
2. Regionalisms
- tags that indicate slang or regionalisms (since we have only one flag for each language). We have many similar tags, like [Brazil] and [not used in Brazil]. What does the first tag mean? Are we saying that the sentences are about the country itself, Brazil, or that they contain expressions used only in Brazil? That's why we need a word before the country/region to indicate that it's all about usage, for example: [used only in Brazil], [regionalism: Brazil], etc. This category is VERY IMPORTANT, because some languages may sound very different according to the region. Many Spanish students wouldn't understand voseo (the use of the second person singular pronoun "vos" in many countries instead of "tú"). We really need the tag [voseo].
3. Authors
- tags starting with "by".
4. Useless tags
- 80% of the tags. I'll just take some random examples:
[biology]
[cooking]
[to 2nd sing/pl formal]
[It means "people can easily be tamed".]
[Chinese wisdom]
[pater nostrum]
[racist]
[contrepèterie]
The first tag, for example, could be a personal list. Tags like [pater nostrum] could be simply deleted and others like [It means "people can easily be tamed".] could be added as a comment.
I mostly agree with this classification. There are a few tags that are not regionalisms but add similar, useful information (e.g. 成语). Also I think tags like "said by male" can be useful to beginners.
I agree with classifying tags into groups. I however don't agree with your list of useless tags.
[biology], [cooking]: Those are examples of what the sentence is about. It's similar to the [medical] or [food] tag that you would find in a dictionary. I believe those tags could potentially be useful for someone wanting to use Tatoeba's data, e.g. finding all the sentences related to one subject.
Actually, since it's hard to predict how exactly Tatoeba's data will be used, I think that having more tags is a good thing since it makes the data "richer".
[It means "people can easily be tamed".] I agree that this belongs in the comments.
[Chinese wisdom]
[pater nostrum]
[contrepèterie]
I also view these as useful. "Pater Nostrum" shows where the sentence is from. This should probably be changed to "From Pater Nostrum". It shows that it's not, say, casual speech. "Contrepèterie" shows that this sentence contains a particular French figure of speech (a spoonerism in English). A list could replace them, but I feel that lists really are "outdated technology". For anyone wanting to make sense of data, tags will be more useful.
What I'd propose is finding a way of organizing the tags into categories so that it's a bit less confusing.
Well, when I say "useless tags" I mean that they can be replaced by a list or a comment... the only problem with comments is that one wouldn't be able to filter the sentences... But if we really want to reduce the list of tags, we will have to abandon some of them.
One unimportant thing that has bugged me forever...
"@Needs Native Check" should be "@needs native check" like all the other tags.
[not needed anymore- removed by CK]
I pretty much agree with everything CK said.
I'd also like to point out that tags such as "subjunctive mood", "present tense", "past tense", "finite", "non-finite" etc. are very useful for the purpose of linguistic corpus analysis. Please don't delete these kinds of tags!
About moral tags @CK
Your personal definition of what is appropriate for your children is your business, and I forbid you to impose it on mine!
These definitions have absolutely no grounds that can be shared across the whole of humanity.
Besides, they're obscurantist: why can't children from all over the world know that "the penis is one of the reproductive organs"? What's wrong with penises? What's wrong with organs? What's wrong with reproduction?
Your sentences encouraging the culture of snacking and guns are far more unsafe in a world whose population is increasingly obese and keeps producing mass murderers! Think!
I have the same feeling.
The same with [delete], which should probably be [@delete].
Looking at the debates above, as a first objective we could focus on two things:
* near-duplicate tags (due either to misspelling or to the use of a synonym)
* troll tags (tags that have been used only to demonstrate an absurdity by creating another absurdity)
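As a possible starting point for the first item, here is a small sketch that lists tag pairs whose spelling is nearly identical. The function names and the 0.85 threshold are arbitrary assumptions, it only catches misspellings and case differences (not synonyms), and every reported pair would still need a human look.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(tag):
    """Lower-case a tag and collapse repeated whitespace."""
    return " ".join(tag.casefold().split())

def near_duplicates(tags, threshold=0.85):
    """Return pairs of distinct tags whose normalized spellings are similar."""
    pairs = []
    for a, b in combinations(sorted(set(tags)), 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs

tags = ["@needs native check", "@Needs Native Check",
        "biology", "delete", "@delete"]
pairs = near_duplicates(tags)
```

On this sample it reports the two spellings of "@needs native check" and the "delete" / "@delete" pair. Comparing every tag with every other is slow for a very long tag list, but it should be fine for a one-off cleanup.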
In any case, I would find it proper for every tag that is removed to also be the subject of a comment, unless, of course, it is a general removal, like that of the "XXX" tags, but then it is important to give notice in advance...
.-.-.
I have just seen that it was me, and I feel no shame about it.
French spell checkers, as I have just verified here in Chrome, accept "e-mail" as well as "courriel", so your statement is a lie, in absolute terms...
"courriel" is indeed of Quebec origin, but it is now in common use in France.
The new rules on tags have no retroactive effect, and previously there was no rule other than the one giving advanced contributors the right to tag sentences, as has been amply done for mine, among others. This tagging was done either "lightly", as can be seen here http://tatoeba.org/epo/sentences/show/18524 or with a program, as here http://tatoeba.org/epo/sentences/show/244067
Besides, I have even tagged some of my own sentences as lies myself, as here http://tatoeba.org/epo/sentences/show/8140 or here http://tatoeba.org/epo/sentences/show/1271749
However, some lies are blatant http://tatoeba.org/epo/sentences/show/841680
http://tatoeba.org/epo/sentences/show/1390132
Personally, I am completely in favor of the new rule stated by sysko, which requires tags to be justified.
On the other hand, I am against "objectivation" as defined by Vortarulo, because in my view everything is subjective, starting with our existence, and so if we followed that path, no tag would be justified...
Nihilism up your sleeve.
Nihilism is as subjective as anything else.
.-.-.
> I just think it is a useless tag. Who cares about whether Tatoeba example sentences happen to be true or false? It says "example sentences" on the title page, not "wisdoms to guide people through their lives".
It may not be useful for you or even for anyone using the Tatoeba web interface. But we all seem to forget the goal of Tatoeba is to produce a high quality corpus of sentences that can be used for a variety of purposes. Some of these purposes may require filtering out sentences that are false, which becomes a lot easier when you have a tag for it.
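To illustrate the point, here is a minimal sketch of that kind of filtering. The sentence records, the `without_tag` helper and the tag name "false" are made up for the example and are not Tatoeba's actual export format.

```python
def without_tag(sentences, unwanted_tag):
    """Return only the sentences that do not carry unwanted_tag."""
    return [s for s in sentences if unwanted_tag not in s["tags"]]

# Hypothetical records: each sentence has an id, a text and a set of tags.
corpus = [
    {"id": 1, "text": "2 + 2 = 5", "tags": {"false"}},
    {"id": 2, "text": "I ate a sandwich yesterday.", "tags": set()},
]
clean = without_tag(corpus, "false")
```

Without a tag, the same filtering would mean reading the comments on every sentence, which does not scale.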
> Some of these purposes may require filtering out sentences that are false, which becomes a lot easier when you have a tag for it.
+1
.-.-.
What do you mean?
.-.-.
Yeah, I meant to say something about this.
I think the tag should be restricted to sentences with relatively unambiguous references. Tom, Mary, you, and I could be anybody.
As for time, the tag should reflect the truth at the present time. So we can remove and re-add it as necessary.
.-.-.
Well I think most "false" statements will not change too often.
For the ones that do, it's not super critical that we catch them immediately. (Sentences that were at one point true are probably the false sentences that are least harmful in a corpus where you are trying to filter out all false sentences.)
>It says "example sentences" on the title page, not "wisdoms to guide people through their lives".
Yes, and most newspapers and magazines say they deliver news when most of it is propaganda and marketing...
No public expression is neutral.
>In this specific case, by the way, if attributed to xeklat, the proposition was not a lie, but an accurate description of the behaviour of the software of the phone that xeklat happens to use
But you didn't limit the consideration in your sentence as you do now in this comment.
Your sentence is thus misleading, to say the least, and I want to be able to state it.
I change the tag to "misleading"...
.-.-.
If it's "observative", then you must define the field of the observation...
.-.-.
People tend to construct context out of emptiness.
Nature abhors a vacuum.
.-.-.
Yes, so it is this lack of precision that is "misleading"...
Not being a French speaker, I can't follow the complete discussion, but in my eyes it seems at least unnecessary to have a tag named "lie". If someone wants to protest against a sentence, he can do this by commenting on and discussing the sentence. If no compromise is possible with the "owner" of the sentence, there is still the possibility of a discussion on the wall.
@xeklat
I don't think that anyone wanted to offend you (or at least I hope not). In general I don't see a connection between the content of a sentence and the opinion of its author or owner, just as I don't see such a connection when I read a novel, for instance.
[not needed anymore- removed by CK]
Perhaps instead of "untrue", it could be "false".
I agree that "untrue" "not true" "false" or "not not false" :) would be better than "Lie" since they're more neutral.
I think that it's better to add a comment when adding the Lie tag because it sends a notification and it provides an explanation as to why the tag was added. And I think that the lie tag should only be used when something is objectively false, not for subjective judgments. The controversial tag should be used in these cases.
"I prefer being poor to being rich."
"The finest wines are those from France."
These two should be "controversial", it's a subjective judgment.
"Children should drink milk every day." How is that a lie?
"L'âme, toujours errante, reste éternelle, ici et maintenant." That's a religious belief. I don't think it should be tagged or we will end up with endless
yes, then let's tag ALL sentences with "controversial", because they all are...
>"Children should drink milk every day." How is that a lie?
http://www.youtube.com/watch?v=DTo5TulJLU8
They're not all controversial... Especially not this one: http://tatoeba.org/eng/sentences/show/901201
See here for a better summary of the controversy:
http://en.wikipedia.org/wiki/Milk#Controversy
So tag it controversial if you want.
everything is controversial...
... even controversy?
if we are to replace every tag with "controversial", since every sentence is, we should as well dump the tag functionality altogether...
I don't think the tag "lie" should be used at all. If someone disagrees with a statement, they can say so in the comments.
But what about a sentence like "2+2=5" or "Canada is the largest country of South America"? I think that it's nice to have a warning that it's not true.
.-.-.
Quite right.
I still think this should be dealt with in the comments. Adding a tag such as "lie" or "incorrect" is not very transparent. In the comments, you can explain why you think something is wrong or why you disagree with something.
I think a comment should always be added, but tags make it easy for someone using the Tatoeba corpus to filter out the sentences with lies, if that is what they desire.
I think the "Lie" tag is useful, but like CK, I think it should be renamed. (I'd suggest "false" or "not true").
I don't think the tag is necessary for sentences like "Chuck Norris is a platypus." or "2 + 2 = 5". However, there is nothing wrong with using it in these cases. I wouldn't encourage it though.
The tag is most useful for things that sound like they could be facts, but aren't. If I said:
"According to an article in a recent issue of Scientific American, there is a 96% chance that there is intelligent life within 1,000 light-years of earth."
it seems possible that some people might interpret it as fact even though I just made it up. They might tell their friends or even just waste five minutes looking it up to see if it's true. This could be prevented with a simple tag.
The tag should NOT be used for things like "I ate a sandwich yesterday." even if it is indeed false for the author. If you really feel the need to let everyone know that you didn't eat a sandwich, do it in a comment.
So, here are my criteria for using this tag:
1) It must be objective - nothing that is reasonably disputable
2) It must not be about the author
3) It must not be an inside joke (sentences about Tatoeba members probably count)
4) It should not be obviously false
.-.-.
> Anyway, even for all those uses you all describe, it seems obvious to me that what you want is a "false" or "not true" tag, not a "lie" tag. Not because of offensiveness, but simply because not everything that is false is a lie.
Yes, I do think it should be renamed.
"The tag should NOT be used for things like "I ate a sandwich yesterday." even if it is indeed false for the author. If you really feel the need to let everyone know that you didn't eat a sandwich, do it in a comment."
Yes, sentences are example sentences. They're not about the author himself. So the sentence "My computer is not working" doesn't concern your computer. It concerns a hypothetical computer owned by a hypothetical person. So no "lie" tag there.
+1
>We must trust women as much as we can trust the weather.<
Reading this sentence I asked myself: Where is the dislike button? But maybe you can recommend a suitable tag.
http://tatoeba.org/deu/sentences/show/892442
sexist?
misogynistic. It's already on the French sentence.