Swift, 19 August 2010, 18:20:02 UTC

I've been doing a bit of thinking about tags. So far, I've not figured out how to remove a tag from a sentence. Also, judging by the tags page,
http://tatoeba.org/eng/tags/view_all
there seems to have been quite a proliferation since they were introduced. Now that we have a bit of experience with these, it might be a good idea to figure out where we want to take them and how to structure them for that purpose.

I grabbed the list of tags and sorted them out a bit at
http://martin.swift.is/tatoeba/tags.html
A few thoughts:

1) There seem to be a number of empty and duplicate tags. Can Trang or Sysko easily delete these?

2) The @-rule seems to be a good one but some of the tags should be merged (sentences with the native-check tag should probably be orphaned so that a sufficiently proficient speaker can adopt them).

3) The biggest group of tags (the way I sorted them) is the quotations. Seeing how a sentence can just as well be about a person as a quote by that person, I think the "by-" and "from-" prefixes are useful for both clarification and sorting.

4) Tag language and script. I'm as much of an l10n/i18n freak as the next person, but seeing how that will be sorted out in the near(?) future (there was a post about this at some point recently, wasn't there?) I think we should translate/transliterate the non-English ones and stick with that until we can translate tags. We could possibly include the extended Latin character set but even that isn't necessary for the current purpose.

5) Given the sheer number of tags, it might be nice to have these split up along some lines. How that is done is going to be pretty subjective, but I'm sure we can hash something out. Is there a way to add a field on the tags in the database to categorise them?

6) We might want to consider tossing out some of the tags. 'Fruit' may be fine, but 'apples' just seems a bit useless given the search feature.

Finally, I think the tags can be a fantastic tool, but so far they seem a bit disorganised and unwieldy to be used on a mass scale. Some statistics on how many sentences are tagged and how many sentences each tag has would be very interesting to gauge how people are using them.
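
As an illustration of the prefix-based sorting described in points 2, 3 and 5, here is a minimal sketch in Python. Only the "@", "by-" and "from-" prefixes come from the post; the category names, the sample tags and the function names are assumptions made up for the example, not anything Tatoeba provides.

```python
from collections import defaultdict

# Hypothetical prefix-to-category mapping. Only the prefixes themselves are
# taken from the discussion; the category names are invented for illustration.
PREFIX_CATEGORIES = [
    ("@", "moderation"),
    ("by-", "quotation (author)"),
    ("from-", "quotation (source)"),
]


def categorise(tag: str) -> str:
    """Return a category for a tag based on its prefix."""
    lowered = tag.strip().lower()
    for prefix, category in PREFIX_CATEGORIES:
        if lowered.startswith(prefix):
            return category
    return "uncategorised"


def group_tags(tags):
    """Group tag names by category, sorting each group alphabetically."""
    groups = defaultdict(list)
    for tag in tags:
        groups[categorise(tag)].append(tag)
    return {cat: sorted(names, key=str.lower) for cat, names in groups.items()}


# Made-up sample tags.
sample = ["@needs native check", "by-Shakespeare", "from-Hamlet", "fruit", "apples"]
grouped = group_tags(sample)
for category, names in grouped.items():
    print(f"{category}: {', '.join(names)}")
```

A prefix convention like this needs no extra database field, which may be why it is attractive as a stopgap; anything without a known prefix simply lands in "uncategorised".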

blay_paul, 19 August 2010, 18:37:59 UTC

> So far, I've not figured out how to remove a tag from a sentence

First step - become a moderator.

TRANG, 19 August 2010, 21:13:23 UTC

> So far, I've not figured out how to remove a tag from a
> sentence

If you are not a moderator, you can only remove tags that you have added. You can't remove other people's tags.


> Now that we have a bit of experience with these, it might
> be a good idea to figure out where we want to take them
> and how to structure them for that purpose.

Thanks for taking the time ^^ CK had also tried to sort the tags: http://a4esl.com/temporary/tatoeba/
(cf. "Some of the Tags That I've Found")


> 1) There seem to be a number of empty and duplicate tags.
> Can Trang or Sysko easily delete these?

Yes.


> 4) Tag language and script. [...] I think we should
> translate/transliterate the non-English ones and stick
> with that until we can translate tags.

It's not always easy to translate, and sometimes a non-English tag may be more appropriate/accurate. For instance if you tag a French sentence with "preterite", it can be ambiguous whether "preterite" refers to the "passé simple" or to the "imparfait". So if you're going to put a tag that refers to a tense, it's probably better to use the name in the language of the sentence. That's one case I can think of but there are probably other cases...

I still agree though, that we should use English tags by default, until we can translate tags.


> 5) Is there a way to add a field on the tags in the
> database to categorise them?

At the moment, no.


> 6) We might want to consider tossing out some of the
> tags. 'Fruit' may be fine, but 'apples' just seems a bit
> useless given the search feature.

I agree.


> Finally, I think the tags can be a fantastic tool, but so
> far they seem a bit disorganised and unwieldy to be used
> on a mass scale.

I agree. We do have more plans for tags, but they're not a priority yet...


> Some statistics on how many sentences are tagged and how
> many sentences each tag has would be very interesting to
> gauge how people are using them.

Actually, if you want to do some stats, you can use this file (it's not part of the official download files yet):
http://tatoeba.org/files/downloads/tags.csv

The fields are: sentence_id, tag_name

It's exported every week.
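
For anyone wanting to try this, here is a minimal Python sketch of the statistics asked for above: how many sentences are tagged, and how many sentences each tag has. It assumes the two fields (sentence_id, tag_name) come in that order and are tab-separated; the delimiter is a guess, so it is a parameter. The sample rows are made up and the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def tag_statistics(lines, delimiter="\t"):
    """Return (number of distinct tagged sentences, Counter of sentences per tag).

    `lines` is any iterable of text lines with the fields sentence_id, tag_name.
    The tab delimiter is an assumption; pass another one if the file differs.
    """
    sentences_by_tag = {}
    tagged = set()
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        sentence_id, tag_name = row[0], row[1]
        tagged.add(sentence_id)
        # A set per tag means a repeated (sentence, tag) row is counted once.
        sentences_by_tag.setdefault(tag_name, set()).add(sentence_id)
    counts = Counter({tag: len(ids) for tag, ids in sentences_by_tag.items()})
    return len(tagged), counts


# Made-up sample rows standing in for the real export.
sample = io.StringIO("1\tfruit\n2\tfruit\n2\tproverb\n3\tproverb\n3\tproverb\n")
total, counts = tag_statistics(sample)
print(f"{total} tagged sentences")
for tag, n in counts.most_common():
    print(f"{n}\t{tag}")
```

To run it on a downloaded copy, pass an open file instead of the sample, for example `open("tags.csv", newline="", encoding="utf-8")`.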

Swift, 20 August 2010, 15:12:47 UTC

> If you are not a moderator, you can only remove tags that you have added. You can't remove other people's tags.

Fair enough. :-)

> Thanks for taking the time ^^ CK had also tried to sort the tags: http://a4esl.com/temporary/tatoeba/
> (cf. "Some of the Tags That I've Found")

Interesting.

>> 1) There seem to be a number of empty and duplicate tags.
>> Can Trang or Sysko easily delete these?
>
> Yes.

OK, then let's start by deleting the ones in:
<http://martin.swift.is/tatoeba/tags.html#empty>
and then look at merging duplicates.

Tatoeba, by the way, doesn't seem to show any difference between an empty and non-existent tag.

>> 4) Tag language and script. [...] I think we should
>> translate/transliterate the non-English ones and stick
>> with that until we can translate tags.
>
> It's not always easy to translate, and sometimes a non-English tag may be more appropriate/accurate.

I reckon the best solution is to use whatever term would be used in the English translation once we're able to translate tags.

>> 6) We might want to consider tossing out some of the
>> tags. 'Fruit' may be fine, but 'apples' just seems a bit
>> useless given the search feature.
>
> I agree.

OK, we can have a look at this once we've gotten rid of some of the empty and duplicate categories.

>> Some statistics on how many sentences are tagged and how
>> many sentences each tag has would be very interesting to
>> gauge how people are using them.
>
> Actually, if you want to do some stats, you can use this file (it's not part of the official download files yet):
> http://tatoeba.org/files/downloads/tags.csv

Great! Thanks.

sysko, 20 August 2010, 15:19:26 UTC

I deleted the empty tags this morning (GMT+1).
If I have time, I'll look into distinguishing a non-existent tag from an empty one (in fact there already is a difference: a non-existent tag shows an empty page, while an empty tag at least shows the tag name in the title :p).
The next release (tomorrow) will show tags sorted by number of sentences and add autocompletion when adding a tag. This should make tags a bit more useful and avoid most mistyped or duplicate tags.

Swift, 20 August 2010, 15:51:57 UTC

> I deleted the empty tags this morning (GMT+1).

Great! Thanks.

> The next release (tomorrow) will show tags sorted by number of sentences

I'm actually not convinced that this is terribly useful. My impression is that one would use the tags to find a particular topic (in which case, alphabetical ordering would make the most sense) rather than just any popular one (as one might when visiting a blog; which is why things like tag-clouds are appropriate there).

Seeing how the great number of tags would benefit from categorisation, it might actually be helpful to have a semi-automated list of tags... Tatoeba is currently written in PHP, right?

Seeing how many sentences are filed under a tag would be useful, though.

> and autocompletion when adding a tag

This will be a fantastic addition!

sysko, 20 August 2010, 15:56:03 UTC

Yep, it's just that I added the count while adding the autocompletion, so that tags with more sentences appear first in the suggestions. My personal life is a bit busy at the moment, so while categorisation would certainly be great, I haven't found time to do it yet ^^
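
The ordering described here, with more frequently used tags appearing first in the suggestions, could look roughly like the following Python sketch. This is not Tatoeba's actual implementation (the site is said to be written in PHP); the function name, the case-insensitive prefix matching and the sample counts are all assumptions for illustration.

```python
def suggest(prefix, tag_counts, limit=5):
    """Return up to `limit` tags starting with `prefix`, most used first."""
    needle = prefix.strip().lower()
    matches = [
        (tag, count)
        for tag, count in tag_counts.items()
        if tag.lower().startswith(needle)
    ]
    # Sort by descending count, then alphabetically so ties are stable.
    matches.sort(key=lambda item: (-item[1], item[0].lower()))
    return [tag for tag, _ in matches[:limit]]


# Made-up usage counts (number of sentences per tag).
tag_counts = {"fruit": 12, "French": 40, "formal": 7, "proverb": 25, "friendship": 12}
print(suggest("fr", tag_counts))  # ['French', 'friendship', 'fruit']
```

Because existing tags are offered while typing, a contributor is nudged toward a spelling that already exists, which is how autocompletion helps against mistyped and duplicate tags.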

FeuDRenais, 19 August 2010, 23:23:43 UTC

> (sentences with the native-check tag should probably be orphaned so that a sufficiently proficient speaker can adopt them)

I disagree on this point. There's a reason why it's "Needs Native Check" (rather than "Needs Native Parent"). The creator of the phrase should keep the phrase, and there's no more work in checking than in parenting (in fact, parenting would involve the check anyway).

Demetrius, 20 August 2010, 10:45:57 UTC

@FeuDRenais
Also, there was a proposal to write all tags lowercase:
http://tatoeba.org/eng/wall/sho...8#message_1588
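
A lowercase convention would also make duplicate tags easy to spot. As a rough Python sketch (the normalisation rule, the function names and the sample tags are assumptions, not part of the proposal), tags that differ only in case or spacing can be grouped like this:

```python
from collections import defaultdict


def normalise(tag: str) -> str:
    """Lowercase a tag and collapse runs of whitespace into single spaces."""
    return " ".join(tag.casefold().split())


def find_duplicates(tags):
    """Map each normalised form to its spellings, keeping only forms with more than one."""
    variants = defaultdict(set)
    for tag in tags:
        variants[normalise(tag)].add(tag)
    return {key: sorted(names) for key, names in variants.items() if len(names) > 1}


# Made-up sample tags.
sample = ["Proverb", "proverb", "2nd Person", "2nd  person", "fruit"]
duplicates = find_duplicates(sample)
for key, names in duplicates.items():
    print(f"{key}: {names}")
```

Each group in the result is a candidate for merging by a moderator; the sketch only reports the variants and does not decide which spelling should win.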

Swift, 20 August 2010, 15:35:17 UTC

What's the point in parenting a sentence if you're not confident that it's correct?

There *is* slightly more work (and arguably less incentive) in leaving a note on someone's sentence and for them to respond, than for a "checker" to simply adopt the sentence and fix it themselves.

(Oh, and I don't think the tags system is tried and trusted enough to warrant an appeal to tradition.)

Demetrius, 20 August 2010, 15:47:30 UTC

Leaving a comment lets a non-native speaker who owns the sentence learn the correct variant or be confident in their translations.

FeuDRenais, 20 August 2010, 15:55:46 UTC

There are multiple points in doing so:

1) The learning principle. A native speaker correcting it serves as a lesson. One side educates, and the other rectifies a certain mistake and probably won't repeat it. If you go for pure efficiency, then you annihilate this aspect, which, IMO, would be a pity. You'd probably want an environment where people can grow. That involves making mistakes and learning.

2) People grow attached to sentences. Absurd, but we do. They're often a reflection of a person's wit, identity, or interests. And if you're only, say, 95% sure of the translation and would like a check, seeing it orphaned and then with another person's name on it is... well, strange.

3) The native check is mostly to make sure that the grammar and the naturalness of the sentence don't have any issues, while the originator of the sentence has a responsibility towards all the translations that it's linked to, and how the sentence fits in on the whole. In this sense, it *is* more work. The person adopting the sentence has to be able to guarantee that all the translations remain valid when they fix it.

Demetrius, 20 August 2010, 10:49:29 UTC

Some more thoughts about tags.

BTW, IMHO tags should be as general as possible: not “2nd Person Formal”, but “2nd Person”, “Formal”. This makes them more useful for automated processing.
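
To illustrate why general tags help automated processing, here is a small Python sketch. With “2nd Person” and “Formal” stored separately, one subset test answers both "all formal sentences" and "all formal second-person sentences"; a combined “2nd Person Formal” tag would need string parsing for the first query. The sentence ids and the function name are made up for the example.

```python
def sentences_with(tagged, *required):
    """Return the ids of sentences that carry every tag in `required`."""
    wanted = set(required)
    # `wanted <= tags` is a subset test: all required tags must be present.
    return sorted(sid for sid, tags in tagged.items() if wanted <= tags)


# Made-up sentence ids mapped to their sets of tags.
tagged = {
    101: {"2nd Person", "Formal"},
    102: {"2nd Person", "Informal"},
    103: {"3rd Person", "Formal"},
}
print(sentences_with(tagged, "Formal"))                # [101, 103]
print(sentences_with(tagged, "2nd Person", "Formal"))  # [101]
```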

sysko, 20 August 2010, 10:53:08 UTC

+1

Demetrius, 20 August 2010, 10:54:29 UTC

> So far, I've not figured out how to remove a tag from a sentence

The algorithm is:
a) add the same tag to any other sentence (temporarily)
b) copy the delete link and replace the sentence number
c) delete your temporary tag

This is a bug. Use only when you’re absolutely sure it’s applicable.