I've been doing a bit of thinking about tags. So far, I've not figured out how to remove a tag from a sentence and browsing the tags page,
http://tatoeba.org/eng/tags/view_all
there seems to have been quite a proliferation since they were introduced. Now that we have a bit of experience with these, it might be a good idea to figure out where we want to take them and how to structure them for that purpose.
I grabbed the list of tags and sorted them out a bit at
http://martin.swift.is/tatoeba/tags.html
A few thoughts:
1) There seem to be a number of empty and duplicate tags. Can Trang or Sysko easily delete these?
2) The @-rule seems to be a good one but some of the tags should be merged (sentences with the native-check tag should probably be orphaned so that a sufficiently proficient speaker can adopt them).
3) The biggest group of tags (the way I sorted them) is the quotations. Seeing how a sentence can just as well be about a person as a quote by that person, I think the "by-" and "from-" prefixes are useful for both clarification and sorting.
4) Tag language and script. I'm as much of an l10n/i18n freak as the next person, but seeing how that will be sorted out in the near(?) future (there was a post about this at some point recently, wasn't there?) I think we should translate/transliterate the non-English ones and stick with that until we can translate tags. We could possibly include the extended Latin character set but even that isn't necessary for the current purpose.
5) Given the sheer number of tags, it might be nice to have these split up along some lines. How that is done is going to be pretty subjective, but I'm sure we can hash something out. Is there a way to add a field on the tags in the database to categorise them?
6) We might want to consider tossing out some of the tags. 'Fruit' may be fine, but 'apples' just seems a bit useless given the search feature.
Finally, I think the tags can be a fantastic tool, but so far they seem a bit disorganised and unwieldy to be used on a mass scale. Some statistics on how many sentences are tagged and how many sentences each tag has would be very interesting to gauge how people are using them.
> So far, I've not figured out how to remove a tag from a sentence
First step - become a moderator.
> So far, I've not figured out how to remove a tag from a
> sentence
If you are not a moderator, you can only remove tags that you have added. You can't remove other people's tags.
> Now that we have a bit of experience with these, it might
> be a good idea to figure out where we want to take them
> and how to structure them for that purpose.
Thanks for taking the time ^^ CK had also tried to sort the tags: http://a4esl.com/temporary/tatoeba/
(cf. "Some of the Tags That I've Found")
> 1) There seem to be a number of empty and duplicate tags.
> Can Trang or Sysko easily delete these?
Yes.
> 4) Tag language and script. [...] I think we should
> translate/transliterate the non-English ones and stick
> with that until we can translate tags.
It's not always easy to translate, and sometimes a non-English tag may be more appropriate/accurate. For instance if you tag a French sentence with "preterite", it can be ambiguous whether "preterite" refers to the "passé simple" or to the "imparfait". So if you're going to put a tag that refers to a tense, it's probably better to use the name in the language of the sentence. That's one case I can think of but there are probably other cases...
I still agree though, that we should use English tags by default, until we can translate tags.
> 5) Is there a way to add a field on the tags in the
> database to categorise them?
At the moment, no.
> 6) We might want to consider tossing out some of the
> tags. 'Fruit' may be fine, but 'apples' just seems a bit
> useless given the search feature.
I agree.
> Finally, I think the tags can be a fantastic tool, but so
> far they seem a bit disorganised and unwieldy to be used
> on a mass scale.
I agree. We do have more plans for tags, but they're not a priority yet...
> Some statistics on how many sentences are tagged and how
> many sentences each tag has would be very interesting to
> gauge how people are using them.
Actually if you want to do some stats, you can use this file (it's not part of the official downloads files yet):
http://tatoeba.org/files/downloads/tags.csv
The fields are: sentence_id, tag_name
It's exported every week.
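For anyone who wants to try, here is a minimal Python sketch of the statistics asked for above: how many sentences carry at least one tag, and how many sentences each tag has. It assumes one sentence_id/tag_name pair per line, separated by a tab; the actual delimiter of tags.csv isn't stated here, so check the file and pass delimiter="," if that's what it uses. The name tag_stats is just a hypothetical helper, not part of any Tatoeba tool.

```python
import csv
import sys
from collections import Counter, defaultdict


def tag_stats(lines, delimiter="\t"):
    """Return (sentences per tag as a Counter, number of distinct tagged sentences).

    ASSUMPTION: each line is "sentence_id<delimiter>tag_name"; the real
    delimiter of the tags.csv export is not confirmed.
    """
    sentences_per_tag = defaultdict(set)
    tagged_sentences = set()
    # QUOTE_NONE keeps any quote characters inside tag names literally.
    for row in csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        sentence_id, tag_name = row[0].strip(), row[1].strip()
        # Sets mean a repeated (sentence, tag) row is only counted once.
        sentences_per_tag[tag_name].add(sentence_id)
        tagged_sentences.add(sentence_id)
    counts = Counter({tag: len(ids) for tag, ids in sentences_per_tag.items()})
    return counts, len(tagged_sentences)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tag_stats.py tags.csv
    with open(sys.argv[1], encoding="utf-8", newline="") as f:
        counts, tagged = tag_stats(f)
    print(f"{tagged} sentences carry at least one tag; {len(counts)} distinct tags")
    for tag, n in counts.most_common():
        print(f"{n:6d}  {tag}")
```

Run it as "python tag_stats.py tags.csv" (the script name is only an example). Listing tags by most_common() also makes the rarely used, possibly redundant tags easy to spot at the bottom of the output.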
> If you are not a moderator, you can only remove tags that you have added. You can't remove other people's tags.
Fair enough. :-)
> Thanks for taking the time ^^ CK had also tried to sort the tags: http://a4esl.com/temporary/tatoeba/
> (cf. "Some of the Tags That I've Found")
Interesting.
>> 1) There seem to be a number of empty and duplicate tags.
>> Can Trang or Sysko easily delete these?
>
> Yes.
OK, then let's start by deleting the ones in:
<http://martin.swift.is/tatoeba/tags.html#empty>
and then look at merging duplicates.
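A rough first pass at spotting duplicates could group tag names that differ only in case, hyphens, underscores or spacing. The normalisation rule below is my own assumption, not how Tatoeba compares tags, and every group it reports would still need a human to decide whether the tags really mean the same thing.

```python
import re
from collections import defaultdict


def normalize(tag):
    # Hypothetical rule: case-fold, then treat runs of spaces, '-' and '_' alike.
    t = tag.casefold().strip()
    return re.sub(r"[\s_-]+", " ", t)


def duplicate_groups(tags):
    """Map each normalised form to the distinct tag names that share it,
    keeping only forms shared by two or more names (likely duplicates)."""
    groups = defaultdict(list)
    for tag in set(tags):
        groups[normalize(tag)].append(tag)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

For example, duplicate_groups(["Fruit", "fruit", "apples"]) would report "Fruit" and "fruit" as one group and leave "apples" out.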
Tatoeba, by the way, doesn't seem to show any difference between an empty and non-existent tag.
>> 4) Tag language and script. [...] I think we should
>> translate/transliterate the non-English ones and stick
>> with that until we can translate tags.
>
> It's not always easy to translate, and sometimes a non-English tag may be more appropriate/accurate.
I reckon the best solution is to use whatever term would be used in the English translation once we're able to translate tags.
>> 6) We might want to consider tossing out some of the
>> tags. 'Fruit' may be fine, but 'apples' just seems a bit
>> useless given the search feature.
>
> I agree.
OK, we can have a look at this once we've gotten rid of some of the empty and duplicate categories.
>> Some statistics on how many sentences are tagged and how
>> many sentences each tag has would be very interesting to
>> gauge how people are using them.
>
> Actually if you want to do some stats, you can use this file (it's not part of the official downloads files yet):
> http://tatoeba.org/files/downloads/tags.csv
Great! Thanks.
I deleted the empty tags this morning (GMT+1).
If I have time, I'll try to make the difference between a non-existent and an empty tag clearer (in fact there already is one: a non-existent tag shows an empty page, whereas an empty tag at least shows the tag's name in the title :p)
The next release (tomorrow) will show tags sorted by how many sentences use them, and will add autocompletion when adding a tag. This should make tags a bit more useful and prevent most mistyped/duplicate tags.
> I deleted the empty tags this morning (GMT+1).
Great! Thanks.
> Next release (tomorrow) will show tags sorted by number of tags
I'm actually not convinced that this is terribly useful. My impression is that one would use the tags to find a particular topic (in which case, alphabetical ordering would make the most sense) rather than just any popular one (as one might when visiting a blog; which is why things like tag-clouds are appropriate there).
Seeing how the great number of tags would benefit from categorisation, it might actually be helpful to have a semi-automated list of tags... Tatoeba is currently written in PHP, right?
Seeing how many sentences are filed under a tag would be useful, though.
> and autocompletion when adding a tag
This will be a fantastic addition!
Yep, it's just that I added the count while adding the autocompletion, so that the most frequently used tags appear first in the suggestions. My personal life is a bit busy at the moment; categorisation would certainly be great, but I haven't found the time to do it yet ^^
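Ordering suggestions by usage count, as described above, can be sketched like this: keep the tags that match the typed prefix (case-insensitively), then sort by count, highest first, and alphabetically among ties. This is a Python illustration with a hypothetical suggest() function, not Tatoeba's actual code.

```python
def suggest(prefix, tag_counts, limit=5):
    """Return up to `limit` tag names starting with `prefix`,
    most-used first, ties broken alphabetically (case-insensitive)."""
    p = prefix.casefold()
    matches = [tag for tag in tag_counts if tag.casefold().startswith(p)]
    matches.sort(key=lambda tag: (-tag_counts[tag], tag.casefold()))
    return matches[:limit]
```

Putting the most-used spelling first nudges people towards an existing tag instead of creating a near-duplicate, which is where most of the duplicate tags seem to come from.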
> (sentences with the native-check tag should probably be orphaned so that a sufficiently proficient speaker can adopt them)
I disagree on this point. There's a reason why it's "Needs Native Check" (rather than "Needs Native Parent"). The creator of the phrase should keep the phrase, and there's no more work in checking than in parenting (in fact, parenting would involve the check anyway).
@FeuDRenais
Also, there was a proposal to write all tags lowercase:
http://tatoeba.org/eng/wall/sho...8#message_1588
What's the point in parenting a sentence if you're not confident that it's correct?
There *is* slightly more work (and arguably less incentive) in leaving a note on someone's sentence and for them to respond, than for a "checker" to simply adopt the sentence and fix it themselves.
(Oh, and I don't think the tags system is tried and trusted enough to warrant an appeal to tradition.)
Leaving a comment lets a non-native owner of the sentence learn the correct variant and be confident in their translations.
There are multiple points in doing so:
1) The learning principle. A native speaker correcting it serves as a lesson. One side educates, and the other rectifies a certain mistake and probably won't repeat it. If you go for pure efficiency, then you annihilate this aspect, which, IMO, would be a pity. You'd probably want an environment where people can grow. That involves making mistakes and learning.
2) People grow attached to sentences. Absurd, but we do. They're often a reflection of a person's wit, identity, or interests. And if you're only, say, 95% sure of the translation and would like a check, seeing it orphaned and then with another person's name on it is... well, strange.
3) The native check is mostly to make sure that the grammar and naturalness of the sentence are fine, while the originator of the sentence has a responsibility towards all the translations it's linked to, and how the sentence fits in as a whole. In this sense, it *is* more work: the person adopting the sentence has to be able to guarantee that all the translations remain valid after fixing it.
Some more thought about tags.
BTW, IMHO tags should be as general as possible: not “2nd Person Formal”, but “2nd Person”, “Formal”. This makes them more useful for automated processing.
+1
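To illustrate why general tags help automated processing: if "2nd Person" and "Formal" are separate tags, finding formal second-person sentences is just an intersection, and the same data also answers "all formal sentences" without having to parse compound tag names. The mapping from sentence IDs to tag sets below is a hypothetical in-memory structure (it could be built from the tags.csv export mentioned earlier).

```python
def sentences_with_all(tagged, required):
    """tagged: hypothetical mapping of sentence_id -> set of tag names.
    Return the sorted IDs of sentences carrying every tag in `required`."""
    need = set(required)
    return sorted(sid for sid, tags in tagged.items() if need <= tags)
```

With a single "2nd Person Formal" tag, the query for just "Formal" would have to guess which compound names contain it; with general tags it's the same one-line call.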
> So far, I've not figured out how to remove a tag from a sentence
The algorithm is:
a) add the same tag to any other sentence (temporarily)
b) copy its delete link and replace the sentence number with that of the sentence you want to remove the tag from
c) delete your temporary tag
This is a bug. Use only when you’re absolutely sure it’s applicable.