Tatoeba
Swift, August 19, 2010 at 6:20:02 PM UTC

I've been doing a bit of thinking about tags. So far, I've not figured out how to remove a tag from a sentence, and judging by the tags page,
http://tatoeba.org/eng/tags/view_all
there seems to have been quite a proliferation since they were introduced. Now that we have a bit of experience with these, it might be a good idea to figure out where we want to take them and how to structure them for that purpose.

I grabbed the list of tags and sorted them out a bit at
http://martin.swift.is/tatoeba/tags.html
A few thoughts:

1) There seem to be a number of empty and duplicate tags. Can Trang or Sysko easily delete these?

2) The @-rule seems to be a good one but some of the tags should be merged (sentences with the native-check tag should probably be orphaned so that a sufficiently proficient speaker can adopt them).

3) The biggest group of tags (the way I sorted them) is the quotations. Seeing how a sentence can just as well be about a person as a quote by that person, I think the "by-" and "from-" prefixes are useful for both clarification and sorting.

4) Tag language and script. I'm as much of an l10n/i18n freak as the next person, but seeing how that will be sorted out in the near(?) future (there was a post about this at some point recently, wasn't there?) I think we should translate/transliterate the non-English ones and stick with that until we can translate tags. We could possibly include the extended Latin character set but even that isn't necessary for the current purpose.

5) Given the sheer number of tags, it might be nice to have these split up along some lines. How that is done is going to be pretty subjective, but I'm sure we can hash something out. Is there a way to add a field on the tags in the database to categorise them?

6) We might want to consider tossing out some of the tags. 'Fruit' may be fine, but 'apples' just seems a bit useless given the search feature.

Finally, I think the tags can be a fantastic tool, but so far they seem a bit disorganised and unwieldy to be used on a mass scale. Some statistics on how many sentences are tagged and how many sentences each tag has would be very interesting to gauge how people are using them.

blay_paul, August 19, 2010 at 6:37:59 PM UTC

> So far, I've not figured out how to remove a tag from a sentence

First step - become a moderator.

TRANG, August 19, 2010 at 9:13:23 PM UTC

> So far, I've not figured out how to remove a tag from a
> sentence

If you are not a moderator, you can only remove tags that you have added. You can't remove other people's tags.


> Now that we have a bit of experience with these, it might
> be a good idea to figure out where we want to take them
> and how to structure them for that purpose.

Thanks for taking the time ^^ CK had also tried to sort the tags: http://a4esl.com/temporary/tatoeba/
(cf. "Some of the Tags That I've Found")


> 1) There seem to be a number of empty and duplicate tags.
> Can Trang or Sysko easily delete these?

Yes.


> 4) Tag language and script. [...] I think we should
> translate/transliterate the non-English ones and stick
> with that until we can translate tags.

It's not always easy to translate, and sometimes a non-English tag may be more appropriate/accurate. For instance if you tag a French sentence with "preterite", it can be ambiguous whether "preterite" refers to the "passé simple" or to the "imparfait". So if you're going to put a tag that refers to a tense, it's probably better to use the name in the language of the sentence. That's one case I can think of but there are probably other cases...

I still agree though, that we should use English tags by default, until we can translate tags.


> 5) Is there a way to add a field on the tags in the
> database to categorise them?

At the moment, no.


> 6) We might want to consider tossing out some of the
> tags. 'Fruit' may be fine, but 'apples' just seems a bit
> useless given the search feature.

I agree.


> Finally, I think the tags can be a fantastic tool, but so
> far they seem a bit disorganised and unwieldy to be used
> on a mass scale.

I agree. We do have more plans for tags, but they're not a priority yet...


> Some statistics on how many sentences are tagged and how
> many sentences each tag has would be very interesting to
> gauge how people are using them.

Actually, if you want to do some stats, you can use this file (it's not part of the official download files yet):
http://tatoeba.org/files/downloads/tags.csv

The fields are: sentence_id, tag_name

It's exported every week.
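
A minimal sketch of how the two statistics asked for above could be computed from such an export. It assumes one record per line with the two fields sentence_id and tag_name; the tab delimiter, the function name `tag_stats`, and the sample rows are assumptions for illustration, not something confirmed in this thread.

```python
import csv
from collections import Counter


def tag_stats(lines, delimiter="\t"):
    """Return (number of tagged sentences, Counter of tag -> sentence count).

    `lines` is any iterable of text lines holding two fields:
    sentence_id and tag_name. The delimiter is an assumption.
    """
    seen_pairs = set()
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for record in reader:
        if len(record) < 2:
            continue  # skip blank or malformed lines
        sentence_id, tag_name = record[0].strip(), record[1].strip()
        # A set ignores repeated (sentence, tag) pairs, so each sentence
        # is counted at most once per tag.
        seen_pairs.add((sentence_id, tag_name))

    tagged_sentences = {sentence_id for sentence_id, _ in seen_pairs}
    per_tag = Counter(tag_name for _, tag_name in seen_pairs)
    return len(tagged_sentences), per_tag


# Hypothetical sample rows standing in for the real export.
sample = ["1\tfruit", "1\tproverb", "2\tfruit", "2\tfruit"]
total, per_tag = tag_stats(sample)
for tag_name, count in per_tag.most_common():
    print(tag_name, count)
```

To run it on a downloaded copy, pass an open file object instead of the sample list, for example `tag_stats(open("tags.csv", encoding="utf-8"))`, adjusting the delimiter if the file turns out to be comma-separated.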

Swift, August 20, 2010 at 3:12:47 PM UTC

> If you are not a moderator, you can only remove tags that you have added. You can't remove other people's tags.

Fair enough. :-)

> Thanks for taking the time ^^ CK had also tried to sort the tags: http://a4esl.com/temporary/tatoeba/
> (cf. "Some of the Tags That I've Found")

Interesting.

>> 1) There seem to be a number of empty and duplicate tags.
>> Can Trang or Sysko easily delete these?
>
> Yes.

OK, then let's start by deleting the ones in:
<http://martin.swift.is/tatoeba/tags.html#empty>
and then look at merging duplicates.

Tatoeba, by the way, doesn't seem to show any difference between an empty and non-existent tag.

>> 4) Tag language and script. [...] I think we should
>> translate/transliterate the non-English ones and stick
>> with that until we can translate tags.
>
> It's not always easy to translate, and sometimes a non-English tag may be more appropriate/accurate.

I reckon the best solution is to use whatever term would be used on the English translation once we're able to translate tags.

>> 6) We might want to consider tossing out some of the
>> tags. 'Fruit' may be fine, but 'apples' just seems a bit
>> useless given the search feature.
>
> I agree.

OK, we can have a look at this once we've gotten rid of some of the empty and duplicate categories.

>> Some statistics on how many sentences are tagged and how
>> many sentences each tag has would be very interesting to
>> gauge how people are using them.
>
> Actually, if you want to do some stats, you can use this file (it's not part of the official download files yet):
> http://tatoeba.org/files/downloads/tags.csv

Great! Thanks.

sysko, August 20, 2010 at 3:19:26 PM UTC

I deleted the empty tags this morning (GMT+1).
If I have time, I'll look into distinguishing a non-existent tag from an empty one (in fact there already is a difference: a non-existent tag shows an empty page, while an empty tag at least shows the tag name in the title :p).
The next release (tomorrow) will show tags sorted by how many times each tag is used, and will add autocompletion when adding a tag. This should make tags a bit more useful and avoid most mistyped or duplicate tags.

Swift, August 20, 2010 at 3:51:57 PM UTC

> I deleted the empty tags this morning (GMT+1).

Great! Thanks.

> The next release (tomorrow) will show tags sorted by how many times each tag is used

I'm actually not convinced that this is terribly useful. My impression is that one would use the tags to find a particular topic (in which case, alphabetical ordering would make the most sense) rather than just any popular one (as one might when visiting a blog; which is why things like tag-clouds are appropriate there).

Seeing how the great number of tags would benefit from categorisation, it might actually be helpful to have a semi-automated list of tags... Tatoeba is currently written in PHP, right?

Seeing how many sentences are filed under a tag would be useful, though.

> and autocompletion when adding a tag

This will be a fantastic addition!

sysko, August 20, 2010 at 3:56:03 PM UTC

Yep, it's just that I added the count while adding the autocompletion, so that the most-used tag names appear first in the suggestions. My personal life is a bit busy at the moment, so while categorisation would certainly be great, I haven't found time to do it yet ^^
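
For illustration, the "most-used tags first" ordering described above could look roughly like the sketch below. This is not Tatoeba's actual code; the function name `suggest_tags`, the case-insensitive prefix match, the tie-breaking rule, and the sample counts are all assumptions.

```python
def suggest_tags(prefix, tag_counts, limit=5):
    """Return up to `limit` tag names starting with `prefix`, most used first.

    `tag_counts` maps tag name -> number of times the tag is used.
    Matching ignores case; ties are broken alphabetically (an assumption).
    """
    needle = prefix.casefold()
    matches = [
        (name, count)
        for name, count in tag_counts.items()
        if name.casefold().startswith(needle)
    ]
    # Sort by descending count, then by name so the order is stable.
    matches.sort(key=lambda item: (-item[1], item[0].casefold()))
    return [name for name, _ in matches[:limit]]


# Hypothetical counts, for illustration only.
counts = {"fruit": 12, "French": 3, "from-Hamlet": 7, "proverb": 20}
print(suggest_tags("fr", counts))
```

Ranking by count nudges people towards the established spelling of a tag, which is what helps against mistyped and duplicate tags.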

FeuDRenais, August 19, 2010 at 11:23:43 PM UTC

> (sentences with the native-check tag should probably be orphaned so that a sufficiently proficient speaker can adopt them)

I disagree on this point. There's a reason why it's "Needs Native Check" (rather than "Needs Native Parent"). The creator of the phrase should keep the phrase, and there's no more work in checking than in parenting (in fact, parenting would involve the check anyway).

Demetrius, August 20, 2010 at 10:45:57 AM UTC

@FeuDRenais
Also, there was a proposal to write all tags in lowercase:
http://tatoeba.org/eng/wall/sho...8#message_1588
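
As a rough sketch of why a lowercase convention helps with the duplicate tags mentioned earlier in the thread: tags that differ only in case or spacing can be grouped automatically. The function name `duplicate_groups`, the normalisation rule, and the sample tags are assumptions for illustration.

```python
from collections import defaultdict


def duplicate_groups(tag_names):
    """Group tag names that are identical after normalisation.

    Normalisation here means collapsing whitespace and ignoring case;
    that rule is an assumption, not an agreed Tatoeba convention.
    """
    groups = defaultdict(set)
    for name in tag_names:
        key = " ".join(name.split()).casefold()
        groups[key].add(name)
    # Keep only keys that more than one distinct spelling maps to.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical tag names, for illustration only.
tags = ["Proverb", "proverb", "fruit", "Needs  native check", "needs native check"]
for key, spellings in duplicate_groups(tags).items():
    print(key, spellings)
```

The output is only a list of candidates; whether two spellings really mean the same thing still needs a human decision before merging.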

Swift, August 20, 2010 at 3:35:17 PM UTC

What's the point in parenting a sentence if you're not confident that it's correct?

There *is* slightly more work (and arguably less incentive) in leaving a note on someone's sentence and for them to respond, than for a "checker" to simply adopt the sentence and fix it themselves.

(Oh, and I don't think the tags system is tried and trusted enough to warrant an appeal to tradition.)

Demetrius, August 20, 2010 at 3:47:30 PM UTC

Leaving a comment lets the non-native speaker who owns the sentence learn the correct variant or be confident in their translations.

FeuDRenais, August 20, 2010 at 3:55:46 PM UTC

There are multiple points in doing so:

1) The learning principle. A native speaker correcting it serves as a lesson. One side educates, and the other rectifies a certain mistake and probably won't repeat it. If you go for pure efficiency, then you annihilate this aspect, which, IMO, would be a pity. You'd probably want an environment where people can grow. That involves making mistakes and learning.

2) People grow attached to sentences. Absurd, but we do. They're often a reflection of a person's wit, identity, or interests. And if you're only, say, 95% sure of the translation and would like a check, seeing it orphaned and then with another person's name on it is... well, strange.

3) The native check is mostly to make sure that the grammar and the naturalness of the sentence don't have any issues, while the originator of the sentence has a responsibility towards all the translations that it's linked to, and towards how the sentence fits in on the whole. In this sense, it *is* more work. The person adopting the sentence has to be able to guarantee that all the translations remain valid after the fix.

Demetrius, August 20, 2010 at 10:49:29 AM UTC

Some more thoughts about tags.

BTW, IMHO tags should be as general as possible: not “2nd Person Formal”, but “2nd Person”, “Formal”. This makes them more useful for automated processing.
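
To illustrate the point about automated processing: with general tags, a combined query is just a set intersection, and each tag stays usable on its own. A minimal sketch, assuming (sentence_id, tag_name) pairs as input; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_index(pairs):
    """Map each tag name to the set of sentence ids carrying it."""
    index = defaultdict(set)
    for sentence_id, tag_name in pairs:
        index[tag_name].add(sentence_id)
    return index


def sentences_with_all(index, tags):
    """Return the ids of sentences that carry every tag in `tags`."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


# Hypothetical data, for illustration only.
pairs = [(1, "2nd Person"), (1, "Formal"), (2, "2nd Person"), (3, "Formal")]
index = build_index(pairs)
print(sentences_with_all(index, ["2nd Person", "Formal"]))
```

With a single compound tag such as "2nd Person Formal", finding all formal sentences regardless of person would require knowing every compound tag that contains the word "Formal".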

sysko, August 20, 2010 at 10:53:08 AM UTC

+1

Demetrius, August 20, 2010 at 10:54:29 AM UTC

> So far, I've not figured out how to remove a tag from a sentence

The algorithm is:
a) add the same tag to any other sentence (temporarily)
b) copy the delete link and replace the sentence number
c) delete your temporary tag

This is a bug. Use only when you’re absolutely sure it’s applicable.