Alexs, June 25, 2020 at 11:19:17 AM UTC

Hi everyone,

As a Kotoeba participant, I am currently thinking of a way to structure tags. The question has already been raised here, but I would like to hear more about our needs and expectations.

Tags are a highly valuable feature of Tatoeba, provided that one can easily scan through them. They are currently hard to explore, because we cannot see at a glance everything that is available, hence the idea to organize tags :)

* I was thinking of organizing tags hierarchically, so that each tag can have a parent, which is itself another tag. For example, if "animals" is a tag, "cat" could be one of its children. This would make it possible to build trees as deep as we want, but I wonder whether we need that.
--> Do we need several levels of depth or is one level enough?

* As for the parent tags (super-tags), CK has already done a titanic job of classifying tags into categories, which can be summarized as follows: language variants, grammar, topics, idioms, register, meta-information (length/quality), pronunciation, source (by ...). Obviously this list does not have to be settled now, since we will be able to move tags across categories, but I think coming up with a few ideas can help answer the first question.
--> What tag categories do we need? How does it help answer the first question?

* Finally, there are duplicate tags, some because of translations, others because of different naming conventions. I do believe organizing tags is a first step towards merging duplicates.
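The parent/child idea in the first point can be sketched in a few lines of Python. This is a minimal illustration, assuming each tag is identified by its name and has at most one parent; the `TagTree` class and its methods are hypothetical and not part of Tatoeba's code. It shows that a single parent link already allows trees of any depth, and how ancestors, depth and descendants would be computed.

```python
from collections import defaultdict


class TagTree:
    """Hypothetical tag hierarchy in which every tag has at most one parent."""

    def __init__(self):
        self.parent = {}                  # tag -> its parent tag (absent for top-level tags)
        self.children = defaultdict(set)  # tag -> set of its direct children

    def set_parent(self, tag, parent):
        # Refuse links that would create a cycle, e.g. "animals" under "cat"
        # once "cat" is already under "animals".
        if tag == parent or tag in self.ancestors(parent):
            raise ValueError(f"{tag!r} cannot be placed under {parent!r}")
        old = self.parent.get(tag)
        if old is not None:
            self.children[old].discard(tag)  # detach the tag from its previous parent
        self.parent[tag] = parent
        self.children[parent].add(tag)

    def ancestors(self, tag):
        # Walk up the parent links, nearest parent first.
        chain = []
        while tag in self.parent:
            tag = self.parent[tag]
            chain.append(tag)
        return chain

    def depth(self, tag):
        # 0 for a top-level tag, 1 for its children, and so on.
        return len(self.ancestors(tag))

    def descendants(self, tag):
        # Every tag below `tag`, at any depth.
        found = set()
        stack = [tag]
        while stack:
            for child in self.children.get(stack.pop(), ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found


tree = TagTree()
tree.set_parent("animals", "topics")
tree.set_parent("cat", "animals")
print(tree.ancestors("cat"))               # ['animals', 'topics']
print(tree.depth("cat"))                   # 2
print(sorted(tree.descendants("topics")))  # ['animals', 'cat']
```

If one level turns out to be enough, the same structure could enforce it by having `set_parent` reject any parent that already has a parent of its own.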

Thank you in advance for your feedback!

Thanuir, June 25, 2020 at 2:12:17 PM UTC

In my opinion, it is more important to make it possible to declare tags as synonyms, or even as translations of each other.

The problem with building an ontology is that when Tatoeba's tags are eventually made multilingual, this has to be done for the whole ontology. It is by no means clear that an ontology, especially a multi-level one, would translate smoothly into all languages. If one wants to build some kind of classification system, it is better to make it shallow rather than deep.

Personally, I would already find the following options quite sufficient:

1. Declare two tags to be synonyms. Both tags would remain visible on the sentences that have them, but searching for either tag would find the sentences marked with both. This should work for an arbitrary number of tags, not just two.

2. Delete a tag and redirect all of its sentences to another tag, for example animal -> animals, or the other way round. The tag declared for deletion would then be replaced by the better one in every sentence that has it, and whenever someone added the deleted tag to a sentence, it would be replaced by the better one.
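Both proposals can be modelled without any hierarchy. The Python sketch below is a hypothetical in-memory model, not Tatoeba's implementation: sentences are plain ids mapped to sets of tag names, synonyms are kept in a union-find structure so that groups of any size can be merged, and deleted tags are kept as redirects so that later uses are rewritten automatically. The class and method names are made up for illustration.

```python
class TagRegistry:
    """Hypothetical in-memory model of synonym tags and tag redirects."""

    def __init__(self):
        self.sentence_tags = {}  # sentence id -> set of tag names
        self.group = {}          # union-find parent links between synonym tags
        self.redirects = {}      # deleted tag -> replacement tag

    # Proposal 2: delete a tag and redirect it to another one.
    def resolve(self, tag):
        # Follow redirects until reaching a tag that has not been deleted.
        seen = set()
        while tag in self.redirects and tag not in seen:
            seen.add(tag)
            tag = self.redirects[tag]
        return tag

    def redirect(self, old, new):
        new = self.resolve(new)
        if old == new:
            return  # redirecting a tag to itself changes nothing
        self.redirects[old] = new
        for tags in self.sentence_tags.values():
            if old in tags:
                tags.discard(old)
                tags.add(new)

    def add_tag(self, sentence_id, tag):
        # A deleted tag is replaced by its redirect target when someone adds it.
        self.sentence_tags.setdefault(sentence_id, set()).add(self.resolve(tag))

    # Proposal 1: declare any number of tags to be synonyms.
    def _root(self, tag):
        while self.group.get(tag, tag) != tag:
            tag = self.group[tag]
        return tag

    def declare_synonyms(self, *tags):
        roots = [self._root(self.resolve(t)) for t in tags]
        for root in roots[1:]:
            self.group[root] = roots[0]

    def search(self, tag):
        # Sentences carrying the tag or any synonym; their own tags stay unchanged.
        target = self._root(self.resolve(tag))
        return {sid for sid, tags in self.sentence_tags.items()
                if any(self._root(t) == target for t in tags)}


reg = TagRegistry()
reg.add_tag(1, "animal")
reg.add_tag(2, "animals")
reg.redirect("animal", "animals")  # sentence 1 now carries "animals"
reg.add_tag(3, "animal")           # stored as "animals"
reg.add_tag(4, "colloquial")
reg.add_tag(5, "colloquial spelling")
reg.declare_synonyms("colloquial", "colloquial spelling")
print(sorted(reg.search("animals")))     # [1, 2, 3]
print(sorted(reg.search("colloquial")))  # [4, 5]
```

Searching here scans every sentence, which is fine for a sketch; a real implementation would probably index sentences by synonym group instead.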

I wrote about this topic earlier, in English:

Alexs, June 28, 2020 at 3:18:10 PM UTC

Thank you for your interesting feedback! I believe your idea amounts to creating a one-level tree without naming the "supertags", which indeed removes the need to translate them.

Thanuir, June 29, 2020 at 1:58:36 PM UTC


If it turns out that people want broader tags or a hierarchy for them, I suppose one could be put together, but merging tags and handling them as larger groups is something immediately useful and usable.

For example, Wikipedia has an extensive category hierarchy, whereas Stack Exchange sites have none, so both approaches exist. This project may be closer to a wiki than to a Q&A site, but perhaps building and maintaining a large tag hierarchy is not, after all, the most essential thing the site has to offer.

maaster, June 26, 2020 at 6:35:56 PM UTC

As for me, I added the tag "colloquial spelling" (and perhaps also "colloquial").
(We two Hungarians can't agree in this matter.)

Alexs, June 28, 2020 at 3:19:53 PM UTC

Thank you for pointing that out! This shows that tags are somewhat subjective, and I guess clustering tags as Thanuir suggested would make it possible to group these two tags :)

maaster, June 29, 2020 at 6:01:49 PM UTC

It's not really about tags.
It's about whether the sentence must be changed or not.

Ricardo14, June 29, 2020 at 3:41:48 AM UTC

Thanks a lot for working on that, Alexs! Tags are really useful for both language learners and translators.

I'd like to point out the following: some tags are only related to one language (sometimes to a specific "dialect" of a language). That said, it'd be good if certain tags could be restricted to a particular language.

Some examples:

English - past simple, past continuous, present continuous, phrasal verb.
Portuguese - Brazilian Portuguese, presente do indicativo, Brazilian Spelling (?).
Spanish - Mexican Spanish, Chilean Spanish, voseo.
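This suggestion amounts to attaching an optional set of languages to some tags. Below is a minimal Python sketch, assuming tags are identified by name and languages by three-letter codes such as `eng`, `por` and `spa`; the `TAG_LANGUAGES` table and both functions are hypothetical, and the entries are taken from the examples above.

```python
# Hypothetical table: tags listed here may only be used in the given languages.
# Tags absent from the table stay usable in every language.
TAG_LANGUAGES = {
    "past simple": {"eng"},
    "phrasal verb": {"eng"},
    "Brazilian Portuguese": {"por"},
    "presente do indicativo": {"por"},
    "Mexican Spanish": {"spa"},
    "voseo": {"spa"},
}


def can_use(tag: str, lang: str) -> bool:
    """Return True if `tag` may be attached to a sentence in language `lang`."""
    allowed = TAG_LANGUAGES.get(tag)
    return allowed is None or lang in allowed


def tags_for_language(tags: list[str], lang: str) -> list[str]:
    """Keep only the tags that make sense for `lang`, e.g. to filter autocompletion."""
    return [t for t in tags if can_use(t, lang)]


print(can_use("phrasal verb", "por"))                                  # False
print(tags_for_language(["voseo", "phrasal verb", "animals"], "spa"))  # ['voseo', 'animals']
```

Storing a set rather than a single language leaves room for a tag that is valid in a few related languages, while tags missing from the table, such as topic tags, remain available everywhere.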