Alexs, June 25, 2020 at 11:19:17 AM UTC

Hi everyone,

As a Kotoeba participant, I am currently thinking of a way to structure tags. The question has already been raised here https://github.com/Tatoeba/tatoeba2/issues/333, but I would like to hear more about our needs and expectations.

Tags are a highly valuable feature of Tatoeba, provided that one can easily scan through them. They are currently hard to explore because we cannot see at a glance everything that is available, hence the idea of organizing tags :)

* I was thinking of organizing tags hierarchically, so that each tag can have a parent that is itself another tag. For example, if "animals" is a tag, "cat" could be one of its children. This would let us build trees as deep as we want, but I wonder whether we need that.
--> Do we need several levels of depth or is one level enough?
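
To make the depth question concrete, here is a minimal Python sketch of the parent-pointer idea described above. The `Tag` class and the helper functions are hypothetical illustrations, not Tatoeba's actual schema, and the "kitten" tag is invented only to show a second level.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tag:
    """A tag with an optional parent tag (hypothetical model)."""
    name: str
    parent: Optional["Tag"] = None


def ancestors(tag: Tag) -> List[str]:
    """Return the names of all ancestors, nearest parent first."""
    names = []
    current = tag.parent
    while current is not None:
        names.append(current.name)
        current = current.parent
    return names


def depth(tag: Tag) -> int:
    """Depth of a tag: 0 for a root tag, 1 for its children, and so on."""
    return len(ancestors(tag))


animals = Tag("animals")
cat = Tag("cat", parent=animals)
# A third level is only possible if more than one level of depth is allowed.
kitten = Tag("kitten", parent=cat)
```

With a one-level design, `depth` would never exceed 1 and `ancestors` would return at most one name, so the choice mostly affects how much tree-walking the browsing and search code has to do.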

* As for the parent tags (super-tags), CK has already done a titanic job of classifying tags into categories (http://tatoeba.ueuo.com/display_all_tags.html), which can be summarized as follows: language variants, grammar, topics, idioms, register, meta-information (length/quality), pronunciation, source (by ...). Obviously this list does not have to be decided now, since we will be able to move tags across categories, but I think coming up with a few ideas can help answer the first question.
--> What tag categories do we need? How does it help answer the first question?

* Finally, there are duplicate tags, some because of translations, others because of different naming conventions. I do believe organizing tags is a first step toward merging duplicates.

Thank you in advance for your feedback!

Thanuir, June 25, 2020 at 2:12:17 PM UTC

I think it is more important to make it possible to declare tags as synonyms, or even as translations of each other.

The problem with building an ontology is that when Tatoeba's tags are eventually made multilingual, that has to be done for the whole ontology. It is by no means clear that an ontology, especially a multi-layered one, would translate smoothly into every language. If one wants to build some kind of classification system, better a shallow one than a deep one.

Personally, I would find the following options quite sufficient already:

1. Declare two tags to be synonyms. Both tags would remain visible on the sentences that have them, but searching for either tag would find the sentences marked with either one. This should be implemented for an arbitrary number of tags, not just two.

2. Remove a tag and redirect all of its sentences to another tag. For example animal -> animals, or the other way around. The tag declared for removal would be replaced by the better one on every sentence that has it, and whenever someone typed the removed tag on a sentence, it would be replaced by the better one.

I wrote about this topic earlier in English: https://tatoeba.org/spa/wall/sh...#message_33126
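
The two options above (synonym groups and redirects) could be sketched roughly as follows in Python. This is only an illustration under assumptions: `TagRegistry` and its methods are hypothetical names, sentences are represented by plain integer ids, and the Finnish tag "eläimet" is an invented example of a translated tag. Synonym groups are tracked with a small union-find structure so that any number of tags can be grouped.

```python
class TagRegistry:
    """Hypothetical in-memory model of tag synonyms and redirects."""

    def __init__(self):
        self._synonym_parent = {}  # union-find parent pointers between tags
        self._redirects = {}       # removed tag -> replacement tag
        self._sentences = {}       # tag -> set of sentence ids

    def _find(self, tag):
        """Return the representative tag of the synonym group of `tag`."""
        self._synonym_parent.setdefault(tag, tag)
        while self._synonym_parent[tag] != tag:
            # Path halving keeps later lookups short.
            grandparent = self._synonym_parent[self._synonym_parent[tag]]
            self._synonym_parent[tag] = grandparent
            tag = grandparent
        return tag

    def resolve(self, tag):
        """Follow redirects from a removed tag to its replacement."""
        seen = set()
        while tag in self._redirects and tag not in seen:
            seen.add(tag)
            tag = self._redirects[tag]
        return tag

    def add_tag(self, sentence_id, tag):
        """Tag a sentence; a removed tag is silently replaced."""
        tag = self.resolve(tag)
        self._sentences.setdefault(tag, set()).add(sentence_id)
        return tag

    def declare_synonyms(self, *tags):
        """Option 1: group any number of tags; each stays visible as typed."""
        if not tags:
            return
        root = self._find(self.resolve(tags[0]))
        for tag in tags[1:]:
            self._synonym_parent[self._find(self.resolve(tag))] = root

    def redirect(self, old, new):
        """Option 2: remove `old` and move all of its sentences to `new`."""
        new = self.resolve(new)
        if old == new:
            raise ValueError("a tag cannot be redirected to itself")
        self._redirects[old] = new
        moved = self._sentences.pop(old, set())
        self._sentences.setdefault(new, set()).update(moved)

    def search(self, tag):
        """Return ids of sentences carrying `tag` or any of its synonyms."""
        root = self._find(self.resolve(tag))
        found = set()
        for other, ids in self._sentences.items():
            if self._find(other) == root:
                found |= ids
        return found


registry = TagRegistry()
registry.add_tag(1, "animals")
registry.add_tag(2, "eläimet")
registry.declare_synonyms("animals", "eläimet")
registry.add_tag(3, "animal")
registry.redirect("animal", "animals")
replaced_by = registry.add_tag(4, "animal")  # stored under "animals"
```

Searching for "animals", "eläimet" or the removed "animal" then returns the same set of sentences, while no name is needed for the group itself.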

Alexs, June 28, 2020 at 3:18:10 PM UTC

Thank you for your interesting feedback! I believe your idea amounts to creating a one-layer tree without giving names to the "supertags", which indeed removes the need to translate these "supertags".

Thanuir, June 29, 2020 at 1:58:36 PM UTC

Yes.

If it turns out that people want broader tags or a hierarchy for them, I suppose something like that can be put together, but merging tags and handling them as larger groups is immediately useful and usable.

Wikipedia, for example, has an extensive tag hierarchy, whereas the Stack Exchange sites have none, so both approaches are in use. This project may be closer to a wiki than to a question-and-answer site, but building and maintaining a large tag hierarchy is perhaps still not the most essential thing the site has to offer.

maaster, June 26, 2020 at 6:35:56 PM UTC

As for me, I added the tag "colloquial spelling" (and perhaps also "colloquial").
(We two Hungarians can't agree in this matter.)

Alexs, June 28, 2020 at 3:19:53 PM UTC

Thank you for pointing that out! This shows that tags are somewhat subjective, and I guess clustering tags as Thanuir suggested would make it possible to group these two tags :)

maaster, June 29, 2020 at 6:01:49 PM UTC

It's not really about tags.
It's about whether the sentence must be changed or not.

Ricardo14, June 29, 2020 at 3:41:48 AM UTC

Thanks a lot for working on that, Alexs! Tags are really useful for both language learners and translators.

I'd like to point out the following: some tags relate to only one language (sometimes to a specific "dialect" of a language). That said, it'd be good if certain tags could be used only in a particular language.

Some examples:

English - past simple, past continuous, present continuous, phrasal verb.
Portuguese - Brazilian Portuguese, presente do indicativo, Brazilian Spelling (?).
Spanish - Mexican Spanish, Chilean Spanish, voseo.
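
One possible way to express such language-specific tags is a simple mapping from each tag to the languages it applies to. The Python sketch below is only an illustration: `TAG_LANGUAGES`, `tag_allowed` and `tags_for_language` are hypothetical names, the three-letter language codes are an assumption about how sentence languages are identified, and treating unlisted tags as unrestricted is a design choice made here for simplicity.

```python
from typing import Dict, List, Optional, Set

# Tag -> set of language codes it may be applied to.
# None means the tag is not tied to any particular language.
TAG_LANGUAGES: Dict[str, Optional[Set[str]]] = {
    "past simple": {"eng"},
    "phrasal verb": {"eng"},
    "Brazilian Portuguese": {"por"},
    "presente do indicativo": {"por"},
    "Mexican Spanish": {"spa"},
    "voseo": {"spa"},
    "animals": None,
}


def tag_allowed(tag: str, sentence_lang: str) -> bool:
    """Check whether a tag may be applied to a sentence in the given language.

    Tags missing from TAG_LANGUAGES are treated as unrestricted.
    """
    allowed = TAG_LANGUAGES.get(tag)
    return allowed is None or sentence_lang in allowed


def tags_for_language(sentence_lang: str) -> List[str]:
    """List the known tags that can be offered for a sentence language."""
    return sorted(t for t in TAG_LANGUAGES if tag_allowed(t, sentence_lang))
```

A tagging form could call `tags_for_language` to suggest only the relevant tags, so that "voseo" is offered for Spanish sentences but not for English ones.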