Thread #16801 - Tatoeba

AlanF_US, 31 May 2013, 23:54:55 UTC

(1) Is there a way to search (without downloading the corpus) for sentences of a given language that have a "@needs native check" tag but do not have an "OK" tag?

(2) Is there an easy way for corpus maintainers to search for sentences that have both a "@needs native check" and an "OK" tag so that they can remove both tags (assuming the sentence really is OK now)? Is this something that's generally done?

marcelostockle, 1 June 2013, 00:18:29 UTC

(2): I do it once in a while.
I load "tags.csv" into Excel, filter the OK entries and the @nnc entries separately, and import both lists of sentence IDs into MATLAB. There:
> % filterOK and filterNNC hold the sentence IDs carrying each tag
> arrayOK = false(1, 3000000);
> arrayOK(filterOK) = true;
> arrayNNC = false(1, 3000000);
> arrayNNC(filterNNC) = true;
>
> % keep only the IDs that carry both tags
> seq = 1:3000000;
> seq = seq(arrayNNC & arrayOK);

and that's it ^^
If you want, I can put a link to a results file here on the Wall tomorrow.
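
For readers without Excel or MATLAB, the same comparison can be sketched in Python. This is only an illustration under stated assumptions: it assumes "tags.csv" is tab-separated with the sentence ID in the first column and the tag name in the second, with no header row, and that the tags are spelled exactly "@needs native check" and "OK". The function names and the sample data are made up for the example.

```python
import csv
import io

NNC_TAG = "@needs native check"  # assumed exact spelling of the tag
OK_TAG = "OK"

def ids_by_tag(rows):
    """Map each tag name to the set of sentence IDs carrying it."""
    tagged = {}
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        sentence_id, tag = int(row[0]), row[1]
        tagged.setdefault(tag, set()).add(sentence_id)
    return tagged

def compare_tags(tags_file):
    """Return (IDs with both tags, IDs with NNC but no OK), each sorted."""
    # Assumption: tab-separated columns, no quoting, no header row.
    reader = csv.reader(tags_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    tagged = ids_by_tag(reader)
    nnc = tagged.get(NNC_TAG, set())
    ok = tagged.get(OK_TAG, set())
    return sorted(nnc & ok), sorted(nnc - ok)

# Tiny made-up sample standing in for the real tags.csv
sample = io.StringIO(
    "101\tOK\n"
    "101\t@needs native check\n"
    "102\t@needs native check\n"
    "103\tOK\n"
)
both, unchecked = compare_tags(sample)
print(both)       # IDs carrying both tags
print(unchecked)  # IDs still waiting for a native check
```

With the real export, the file would be opened with something like `open("tags.csv", encoding="utf-8", newline="")` and passed to `compare_tags`. Restricting the result to one language, as in question (1), would additionally need the sentence IDs of that language, which this sketch does not load.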

alexmarcelo, 1 June 2013, 00:23:34 UTC

> so that they can remove both tags
Why would we want to remove an "OK" tag?

al_ex_an_der, 1 June 2013, 00:34:17 UTC

> remove both tags

"OK" tags should never be removed.
Their most important function is to say: even if the author isn't a native speaker, the sentence has been checked by a native speaker, and therefore you can trust it as if it were owned by a native speaker.

Of course, many other sentences are tagged "OK" too, but it is for sentences owned by non-native speakers that this tag makes the biggest difference in the eyes of the user.

Of course, if a sentence has both "@needs native check" and "OK", then "@needs native check" no longer makes sense and should be removed. Usually this is done right away, when the sentence is checked and tagged "OK".

alexmarcelo, 1 June 2013, 00:35:51 UTC

AlanF_US, 2 June 2013, 21:23:34 UTC

I see. Thanks for the explanation.

marcelostockle, 2 June 2013, 07:49:11 UTC

Here's the list I talked about. There were only 22 results:

20337
23054
59378
65213
71100
145969
320988
326866
450824
1065935
1443415
1523131
1763683
2069475
2120582
2180411
2180700
2220721
2220744
2223944
2265117
2468564

You can verify that both tags are present on each sentence until a maintainer removes the respective @nnc.

AlanF_US, 2 June 2013, 21:24:35 UTC

Thanks!
