Wall (7,362 posts)
Tips
Before asking a question, please read the FAQ.
Our goal is to maintain a friendly atmosphere for civilized discussion. Please read the code of conduct.
What is the regexp to search for sentences (Turkish to English) containing " ten" (with a leading space)?
xxxxxx ten xxxxxxx ✅
xxxxxx'ten xxxxxxx 👎🏼
xxxxxxten xxxxxxxx 👎🏼
I assume you're asking about an expression accepted by Tatoeba's integrated search function, which is similar but not identical to a regular expression ("regexp").
I don't believe there is a way to write an expression that finds sentences where "ten" is a standalone word but excludes sentences containing a word that ends with "ten" preceded by an apostrophe. The reason is that tokenization and search split words not only at the word boundaries indicated by spaces, but also at punctuation marks (including apostrophes), which are then discarded. One hacky way to get what you want would be to search for "=ten" and then use the browser's search function with "Whole Words" enabled to search through the results.
To get the full functionality you're looking for, I think you'd have to download the sentences you want and then use a tool (such as a text editor's search function with regular expression search enabled) that gives you this level of control.
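As a rough illustration of the offline approach, here is a minimal Python sketch that could be run over downloaded sentences. It assumes the sentences are already loaded as plain strings; the function name `has_standalone_ten` and the sample list are made up for this example. Note that it also accepts "ten" at the very start of a sentence, which is slightly broader than requiring a leading space.

```python
import re

# Match "ten" only as a standalone word: not attached to letters or digits,
# and not preceded or followed by an apostrophe (straight or typographic).
STANDALONE_TEN = re.compile(r"(?<![\w'’])ten(?![\w'’])", re.IGNORECASE)


def has_standalone_ten(sentence: str) -> bool:
    """Return True if the sentence contains "ten" as a word of its own."""
    return STANDALONE_TEN.search(sentence) is not None


# The three patterns from the question above.
samples = [
    "xxxxxx ten xxxxxxx",
    "xxxxxx'ten xxxxxxx",
    "xxxxxxten xxxxxxxx",
]
matches = [s for s in samples if has_standalone_ten(s)]
print(matches)
```

Only the first sample is kept. The same pattern should also work in a text editor that supports lookbehind and lookahead in its regular expression search.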
Is it acceptable for a member who is not fluent in a language (but understands it to a good extent) to use Google Translate (or similar) to translate sentences into that language?
For instance, could someone with an A2 knowledge of Polish use GT to translate sentences from English to Polish, proofread them for obvious mistakes, and then put them into the corpus?
I think machine translation is acceptable when you can ensure its quality. Not to mention that a lot of native speakers write bad translations that are not even as good as machine translations.
I don't see this adding significant value to the database. I would rather recommend translating from Polish into your own native language, or into another language you know well enough to vouch for your own sentences, and linking Polish sentences you understand to sentences in other languages that you also understand.
I think it is acceptable. Machine translators (such as GT or DeepL) are pretty good nowadays. Also, it is not much of a problem if that person translates simple sentences or translates into a language that is quite similar to one they know well.
I would rather see people contributing in their own native languages.
Machine translation with AI has become much better than it used to be, so I can understand the temptation to use it.
Using AI to translate from a foreign language you know into your own native language might help you think of wordings you might not have otherwise thought of.
If you, as a native speaker, verify that it sounds natural, and if you know the source language well enough to verify that the meaning is the same, I think AI can be a useful tool.
Verifying that the meaning is the same is highly problematic. That's why I started translating sentences that have the "by Arthur Conan Doyle" tag and were submitted by a native English speaker. The best part about them is that I can look up the context (the text before and after them) by searching for them on the Internet. This is much better than many of the other short sentences on Tatoeba, which are ambiguous regarding the gender of the speaker or other things.
> I started translating sentences with the "by Arthur Conan Doyle" tag
Oh, and to make things perfect, I would like to have a CC0 license for both the English sentences and my translations. The works of Arthur Conan Doyle are in the public domain now. Is this possible?
I agree.
Contributing in a language that is not your strongest
https://en.wiki.tatoeba.org/art...ow/non-native#
Thanks for that link, @marafon. I also encourage people to consult the Rules and Guidelines ( https://en.wiki.tatoeba.org/art...ow/guidelines# ). To find it in the future, go to the bottom of any Tatoeba page and click on "Tatoeba Wiki". The first section on that page contains a link to the "Rules and Guidelines" page.
Tatoeba's mission is to serve as a source of high-quality sentences and high-quality translations. This is ensured by having them written, verified, and owned by humans who know the languages well enough to avoid introducing any mistakes, including subtle ones, not just the obvious ones mentioned by @Vortarulo.
If Tatoeba starts also acting as a consumer of sentences that do not come directly, transparently, and legally from humans, we run the very real risk of generating a cycle in which we pass these poor-quality or misappropriated sentences on to the people who get them from us.
Thanks for your answers, everyone, and especially for the link to the Guidelines, AlanF_US. I had looked for them but didn't find them. They also match my impression.
The question wasn't about me, by the way, but about a user who contributes quite a lot here, but apparently with mostly(?) GT-translated sentences. Perhaps I should point them to the Guidelines.
How do you know that the translations are from Google Translate?
> How do you know that the translations are from Google Translate?
It's not rocket science. Sometimes a translation is obviously wrong. And when the original sentence is fed into Google Translate, the bad translation precisely matches the Google Translate output.
> Sometimes a translation is obviously wrong. And when the original sentence is fed into Google Translate, the bad translation precisely matches the Google Translate output.
Yes, that's how I detect AI translations too.
And secondly, when a user adds 10 or more translations in different languages, it's hard to believe that they speak so many languages.
Just as a sidenote: I recently translated a sentence containing "Tatoeba" with DeepL and in the translation "Tatoeba" was replaced by "wiktionary".
First of all, if you see that a user is contributing a bad translation, then regardless of what you think the source of that translation is, you should be leaving a comment on the translation and tagging it for action (for instance, "@change") if you can. If you see that the user habitually contributes bad translations, and contacting the user is either not feasible or has no effect, you should send a private message or an email to a corpus maintainer, an admin, or the admin team.
There are several ways to come to the conclusion that someone is using AI. One is that the person says so outright. Others are along the lines of what @ssvb and @PaulP have mentioned, namely that a bad translation happens to match, say, Google Translate's output. While this is not proof, if the evidence is pretty strong (for instance, if there are a lot of similar incidents), it's also worth mentioning to the user and/or an administrator.
I urge people to read the Rules and Guidelines in general. I've submitted an issue to request that a link to the document be added to the footer on each Tatoeba page to make it easier to find.
I have long had this idea for Tatoeba. I understand that it may be hard to achieve, or that it might not fit Tatoeba's purpose, but I want to share it here:
I wish there were a "dictionary" function for Tatoeba. It wouldn't mean adding definitions or etymology to words; it would be a translation dictionary.
Here's how I think it should work:
When you search for "Chinese" via this dictionary function, you might get a few sentences like this:
I am Chinese. - 我是中國人。
I speak Chinese. - 我說中文。
I wish contributors could tag how the word "Chinese" is translated into the target language. For example, you could mark:
I am [Chinese]. - 我是[中國人]。
I speak [Chinese]. - 我說[中文]。
This way, we could get statistics on how the word "Chinese" is translated into the target language. For example, when you look up "Chinese", it could tell you that "Chinese" is most often translated as "中國人", followed by "中文".
I think Tatoeba is the best place to do this. Wiktionary is too complicated, and it can never provide as many sentences as Tatoeba.
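To make the idea concrete, here is a small Python sketch of how such statistics could be computed from bracket-tagged sentence pairs. This is not an existing Tatoeba feature: the bracket convention follows the examples above, the function name `translation_counts` is invented, and the third sample pair is made up purely so that the ranking shows a difference.

```python
import re
from collections import Counter

# A tagged span is any text between square brackets, e.g. "[Chinese]".
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def translation_counts(tagged_pairs, word):
    """Count how often `word` is tagged against each target-language span.

    Tags are paired by position: the first bracketed span in the source
    sentence goes with the first bracketed span in the translation, etc.
    """
    counts = Counter()
    for source, target in tagged_pairs:
        source_tags = BRACKETED.findall(source)
        target_tags = BRACKETED.findall(target)
        for src, tgt in zip(source_tags, target_tags):
            if src == word:
                counts[tgt] += 1
    return counts


pairs = [
    ("I am [Chinese].", "我是[中國人]。"),
    ("I speak [Chinese].", "我說[中文]。"),
    ("She is [Chinese].", "她是[中國人]。"),  # made-up extra pair for illustration
]
ranking = translation_counts(pairs, "Chinese").most_common()
print(ranking)
```

Pairing tags by position is the simplest possible choice; a real design would probably need explicit links between tags, since word order differs across languages.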
How can I easily view my old sentences, e.g. page 2000?
https://tatoeba.org/de/sentence...ster?page=2000
You can change the number in the URL.
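If you want to do this programmatically rather than by editing the address bar, a minimal Python sketch could look like the following. It assumes only that the page number is carried in a `page` query parameter, as in the link above; the `example.org` address and the function name `with_page` are placeholders, not real Tatoeba paths.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Return `url` with its `page` query parameter set to `page`."""
    parts = urlsplit(url)
    # Keep every other query parameter, drop any existing "page" value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder address standing in for a paginated sentence list.
example = "https://example.org/list?page=2000"
print(with_page(example, 1500))
```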
I had a discussion at #13684027 and found a problem that applies to the corpus as a whole, so I want to discuss it here.
An inactive non-native speaker wrote this sentence #12888798: "Do you want a electric gaming machine?"
I personally think that sentence is odd and not something people say, but it's completely grammatically correct. However, we cannot infer whether the author of the sentence meant "consoles" or "video games". How should we deal with it? Should we keep it as "Do you want a electric gaming machine?", change it to use "consoles" or "video games", or delete it?
If a sentence is correct, even if perhaps unlikely, it is usually allowed to stay.
If it has several possible translations, you can translate it in any one of those ways or in several, or of course leave it untranslated.
One option that comes to mind is to translate it literally as an "electric gaming device", which is probably just as odd in the target language as the original sentence is. Then both sentences would hopefully convey that the writer or speaker is not really keeping up with modern technology, or is using an unnecessarily technical, precise, and awkward term, or something similar.
The best thing to do with a problematic sentence, whether it's incorrect or just odd, is to add a comment to the sentence. This serves a number of purposes:
- it tells the author of the sentence (who may still be active) and other members of the community that there is something questionable about it and that they should consider modifying, unlinking, retranslating, or deleting it
- it serves as a starting point for other members of the community to contribute their insights, which may be valuable
- it warns people who come across the sentence at Tatoeba or other sites that point to it that there is a problem with the sentence and that they shouldn't use it as a model
- it starts the customary two-week grace period after which a corpus maintainer can edit the sentence
You can also add a rating of "unsure" or "not OK", and (if you are an advanced contributor) a tag. The three tags "@change", "@needs native check", and "@check" are checked regularly by corpus maintainers, so they help ensure that the sentence will be reviewed.
The important thing is that you don't have to decide immediately and unilaterally what to do with the sentence. You just have to open a discussion and we can see where things go from there.
As for the general question of whether odd but correct sentences should be changed, the answer is that it depends on many factors, such as whether there is a replacement that is obviously better and whether the sentence is a problematic translation of another sentence.
As a minor point, apart from not being natural, "Do you want a electric gaming machine?" is not correct. The word "a" should be "an".
I understand the idea of having a discussion before making a decision. But I want to know how exactly we should handle it after the discussion, or when no one even joins the discussion.
It's up to the corpus maintainer's judgment.
From the wiki page: https://en.wiki.tatoeba.org/art...us-maintainers
You may change a sentence after two weeks if there has been no adverse response.
Tatoeba first tells me that the sentence "N'interrompez jamais un ennemi qui est en train de faire une erreur." (a quotation from Napoléon Bonaparte) does not yet exist in its corpus, but when I try to add it, it claims that my sentence cannot be added on the grounds that it already exists (added by Voltaire, the Projet Voltaire bot).
What can be done, given that we have no access to this sentence?
The sentence is here, so that others don't have to look for it: https://tatoeba.org/eo/sentences/show/7709519
This could be a suggestion for a change:
When a sentence is entered that already exists, it is normally rejected.
But if the existing sentence is "red", it could be replaced by the new one.
That would be a simple way to "unlock" red but correct sentences.
One would have to consider whether this approach violates the rights of the sentence's blocked author. Probably not. Alternatively, one could just swap the author and document that accordingly in the logs.
Realistically speaking, in this particular situation, if you think it's really important that the corpus have this specific sentence in a version that is not marked red, your best option is to add a slightly modified version such as:
Comme Napoléon Bonaparte a dit, "N'interrompez jamais un ennemi qui est en train de faire une erreur."
Since there are many reasons a sentence can be marked red (copyright issues related to the specific sentence, copyright issues relating in general to sentences added by the same author, violations of other Tatoeba standards), and since those reasons are not tracked by the system, it's not feasible for a new sentence that exactly matches an existing red one to automatically be let through.
Or of course someone with the authority to remove the sentence's red status could do so, if they consider it worth the trouble. This sentence is clearly old enough to be out of copyright, and it is not otherwise incorrect either, unlike many of that user's sentences, if I remember correctly the reasons they were marked red.
It seems that translations can be added to the sentence. Add the translation as a new sentence, then manually add the French sentence as its translation.
E.g. https://tatoeba.org/eo/sentences/show/13844149 .