Stats

Comments posted: 12,169
Sentences owned: 20,789
Audio recordings: 0
Sentences favorited: 0
Contributions: 79,323

Settings

  • Email notifications are ENABLED.
  • Access to this profile is PUBLIC. All the information can be seen by everyone.
Cangarejo

Member since: February 13, 2022
Advanced contributor
Name: -
Country: -
Birthday: -
Homepage: -
Frequency lists of words with number of translations:
https://github.com/CangarejoAsu...n/translations
Plain frequency lists of words:
https://github.com/CangarejoAsu...ree/main/words
Frequency lists of characters:
https://github.com/CangarejoAsu...ain/characters
Selections of random sentences with a focus on vocabulary diversity:
https://github.com/CangarejoAsu...main/sentences
Sentences potentially with problems:
https://github.com/CangarejoAsu...in/problematic
Potentially duplicated sentences:
https://github.com/CangarejoAsu...e/main/similar
Experimental search app for Tatoeba:
https://github.com/CangarejoAsu...main/search.py

I use the Google query below to find public-domain sentences from VOA articles. Articles and image captions by AFP, AP, Reuters, RFA, RFE, and some other news companies are not in the public domain.
https://www.google.com/search?q...euters"+-"RFE"

I use the Google query below to find public-domain sentences from Project Gutenberg books containing given words. Most of the books there, but not all, are in the public domain.
https://www.google.com/search?q...+-filetype:txt

I use the Google queries below to find public-domain sentences from US government sites. Federal content is in the public domain, but state content is not. Neither are research papers and other third-party material. Read each site’s terms of use.
https://www.google.com/search?q...+intext:"word"
https://www.google.com/search?q...reeis.usda.gov
https://www.google.com/search?q....niehs.nih.gov
https://www.google.com/search?q...emag.state.gov
https://www.google.com/search?q...+|+site:va.gov

I have a GitHub repository with scripts for processing Tatoeba dump files.
https://github.com/CangarejoAsul/tatoeba-tools

Dump files and sentence pairs can be downloaded here:
https://tatoeba.org/downloads
http://www.manythings.org/anki/
It's also possible to download lists of sentences. Each list has a page with a download button.
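A minimal Python sketch of turning the dump files into sentence pairs. It assumes sentences.csv is tab-separated with the columns id, language code, and text, and that links.csv is tab-separated pairs of linked sentence ids; check the column order against the current files before relying on it. The function names are hypothetical and not part of any Tatoeba tool.

```python
import csv
from typing import Dict, Iterable, Iterator, Tuple

# Assumed layouts (verify against the Downloads page before relying on them):
#   sentences.csv: id <TAB> lang <TAB> text
#   links.csv:     sentence_id <TAB> translation_id


def read_sentences(lines: Iterable[str]) -> Dict[int, Tuple[str, str]]:
    """Map sentence id -> (language code, text)."""
    sentences = {}
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip malformed or empty lines
        sentences[int(row[0])] = (row[1], row[2])
    return sentences


def sentence_pairs(
    sentences: Dict[int, Tuple[str, str]],
    link_lines: Iterable[str],
    source: str,
    target: str,
) -> Iterator[Tuple[str, str]]:
    """Yield (source text, target text) for every link from source to target."""
    for row in csv.reader(link_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue
        a, b = int(row[0]), int(row[1])
        # Links pointing at ids missing from sentences.csv are skipped.
        if a in sentences and b in sentences:
            (lang_a, text_a), (lang_b, text_b) = sentences[a], sentences[b]
            if lang_a == source and lang_b == target:
                yield text_a, text_b
```

To use it on the real files, open them with `open("sentences.csv", encoding="utf-8", newline="")` and pass the file objects instead of lists of lines.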


Translation dictionaries:
https://www.infopedia.pt/dicion...gles-portugues
https://dictionary.cambridge.or...sh-portuguese/
https://www.collinsdictionary.c...sh-portuguese/
https://pt.bab.la/dicionario/ingles-portugues/

Translation corpora:
https://www.linguee.pt/ingles-portugues/
https://context.reverso.net/tra...les-portugues/
https://www.wordreference.com/enpt/
https://iknow.jp/content/

Machine translators:
https://www.deepl.com/translator
https://translate.google.com/
https://papago.naver.com/

Dictionaries:
https://www.infopedia.pt/
https://dicionario.priberam.org/
https://michaelis.uol.com.br/
https://www.aulete.com.br/site....aulete_digital
https://www.dicio.com.br/
https://www.estraviz.org/
https://www.merriam-webster.com/
https://www.oxfordlearnersdictionaries.com/
https://www.collinsdictionary.com/
https://dictionary.cambridge.org/
https://www.britannica.com/dictionary
https://www.dictionary.com/
https://en.wiktionary.org/
https://dle.rae.es/
https://www.dictionnaire-academie.fr/
https://www.lalanguefrancaise.com/dictionnaire
https://www.larousse.fr/
https://www.cnrtl.fr/definition/
https://dizionari.corriere.it/dizionario_italiano/
https://dizionari.repubblica.it/
https://www.woorden.org/
https://academia.gal/dicionario
https://www.diccionari.cat/
https://www.duden.de/woerterbuch
https://www.weblio.jp/
https://ordbokene.no/

Encyclopedias:
https://en.wikipedia.org/
https://www.britannica.com/
https://www.treccani.it/

Thesauruses:
https://www.sinonimos.com.br/
https://www.wordhippo.com/
https://www.macmillanthesaurus.com/

Etymology dictionaries:
https://www.etymonline.com/
https://www.behindthename.com/
https://www.arcanum.com/hu/onli...-szotar-F14D3/

Frequency dictionary:
https://books.google.com/ngrams/

Spelling dictionary:
https://woordenlijst.org/

Corpora:
https://www.corpusdoportugues.org/
https://www.english-corpora.org/
https://www.corpusdelespanol.org/

Frequency lists:
https://en.wiktionary.org/wiki/...requency_lists

Writing assistants:
https://www.deepl.com/write
https://languagetool.org/

Text tools:
https://unicode.scarfboy.com/
https://text-compare.com/

Random name generator:
https://www.behindthename.com/random/

Other:
https://helyesiras.mta.hu/

VOA journalists:
Ade Astuti https://www.voanews.com/author/ade-astuti/-vkq_
Art Chimes https://www.voanews.com/author/art-chimes/jyqqo
Brian Allen https://www.voanews.com/author/brian-allen/m-vmi
Bruce Alpert https://www.voanews.com/author/bruce-alpert/oy__v
Deana Mitchell https://www.voanews.com/author/...mitchell/k_t_o
Deborah Block https://www.voanews.com/author/deborah-block/k-qqp
Erika Celeste https://www.voanews.com/author/erika-celeste/pviqv
Faith Lapidus https://www.voanews.com/author/faith-lapidus/ruiqq
Faiza Elmasry https://www.voanews.com/author/faiza-elmasry/-roqp
George Putic https://www.voanews.com/author/george-putic/gpkmy
Hannah McNeish https://www.voanews.com/author/...-mcneish/v_oqp
Jeff Lunden https://www.voanews.com/author/jeff-lunden/y_qmo
Jeff Swicord https://www.voanews.com/author/jeff-swicord/gtqqy
Jessica Berman https://www.voanews.com/author/...a-berman/ubqqo
Joe DeCapua https://www.voanews.com/author/joe-decapua/pvoqm
JoEllen McBride https://www.voanews.com/author/...-mcbride/ktm_r
Kim Lewis https://www.voanews.com/author/kim-lewis/t-oqy
Lenny Ruvaga https://www.voanews.com/author/lenny-ruvaga/i_-m_
Marsha James https://www.voanews.com/author/marsha-james/y____
Mary Morningstar https://www.voanews.com/author/...ningstar/gmoqq
Mike O'Sullivan https://www.voanews.com/author/...sullivan/byqqp
Parke Brewer https://www.voanews.com/author/parke-brewer/gkqqi
Ray Kouguell https://www.voanews.com/author/ray-kouguell/igiqt
Rebecca Ward https://www.voanews.com/author/rebecca-ward/-oiqp
Refael Klein https://blogs.voanews.com/scien...author/rklein/
Richard Paul https://www.voanews.com/author/richard-paul/mbkmm
Rick Pantaleo https://www.voanews.com/author/rick-pantaleo/rkjqi
Rick Pantaleo https://blogs.voanews.com/scien...hor/rpantaleo/
Rosanne Skirble https://www.voanews.com/author/...-skirble/qgqqv
Shelley Schlender https://www.voanews.com/author/...chlender/uioqo
Steve Baragona https://www.voanews.com/author/...baragona/vbqqr
Suzanne Presto https://www.voanews.com/author/...e-presto/vkqqv
Ted Landphair https://www.voanews.com/author/ted-landphair/itoqq
Tom Banse https://www.voanews.com/author/tom-banse/mjqqr
Vidushi Sinha https://www.voanews.com/author/vidushi-sinha/quqqi


Lists of undertranslated words:
https://tatominer.netlify.app/

Contribution statistics:
https://tatolead.netlify.app/

User, language, and sentence statistics:
https://tatoeba.j-langtools.com/allstats/

Tatoeba’s Twitter page:
https://twitter.com/tatoeba_org

Tatoeba’s blog:
https://blog.tatoeba.org/

What’s new on Tatoeba:
https://github.com/Tatoeba/tato...ue+is%3Aclosed

Sentences rated “not okay”:
https://tatoeba.org/sentences_lists/show/170380

Sentences rated “unsure”:
https://tatoeba.org/sentences_lists/show/170383

Sentences tagged with “@change”:
https://tatoeba.org/sentences/s...tags=%40change

Sentences tagged with “@check”:
https://tatoeba.org/sentences/s...&tags=%40check

Sentences tagged with “@check translation”:
https://tatoeba.org/sentences/s...ck+translation

Sentences tagged with “@change flag”:
https://tatoeba.org/sentences/s...%40change+flag


How do I get started?
https://en.wiki.tatoeba.org/art...ow/quick-start

Why are some sentences in red?
https://en.wiki.tatoeba.org/art...tences-in-red?

How do I get less repetitive sentences?
https://tatoeba.org/wall/show_message/40320

What kind of sentences are allowed?
https://tatoeba.org/wall/show_message/40295

Should traditional and simplified Chinese be separated?
https://tatoeba.org/wall/show_message/39947

Can I delete one of my sentences?
https://tatoeba.org/wall/show_message/39921

Does Tatoeba have too many simple, repetitive sentences?
https://tatoeba.org/wall/show_message/39804

How do I change my email address?
https://tatoeba.org/wall/show_message/39796

How do I transfer sentences from one account to another?
https://tatoeba.org/wall/show_message/39669

How do I search for exact words?
https://tatoeba.org/wall/show_message/39648

Can I mass-tag sentences?
https://tatoeba.org/wall/show_message/39603

Can I post sentences in a language I’m not a native speaker of?
https://tatoeba.org/wall/show_message/39580

How do I unlink sentences?
https://tatoeba.org/wall/show_message/39561

Can I sort sentences by difficulty?
https://tatoeba.org/wall/show_message/39500

How do I link sentences?
https://tatoeba.org/wall/show_message/39494

Can I upload a very large quantity of sentences?
https://tatoeba.org/wall/show_message/39434

How do I add audio to sentences?
https://tatoeba.org/wall/show_message/39316

Is it possible to search for sentences with a specific length?
https://tatoeba.org/wall/show_message/39305

How do I download sentences belonging to a particular user?
https://tatoeba.org/wall/show_message/39258

Can I get a list of all the sentences that haven’t been reviewed yet?
https://tatoeba.org/wall/show_message/38996

Is Tatoeba losing users?
https://tatoeba.org/wall/show_message/38155
https://tatoeba.org/wall/show_message/39765
https://tatoeba.org/wall/show_message/39323
https://tatoeba.org/wall/show_message/38883

Can I look at the most recent translations of my sentences?
https://tatoeba.org/wall/show_message/38809

Can I get a list of all the sentences that haven’t been translated yet?
https://tatoeba.org/wall/show_message/38802

Do indirect translations need to match the original sentence?
https://tatoeba.org/wall/show_message/38664

Do other projects misuse the sentences on Tatoeba?
https://tatoeba.org/wall/show_message/38570

Should there be restrictions on which names are allowed?
https://tatoeba.org/wall/show_message/32136

What is considered a sentence?
https://blog.tatoeba.org/2010/0...f-content.html
https://en.wikipedia.org/wiki/Sentence_word
https://learnenglishwithdemi.wo...onal-deletion/

Where can I find a list of current corpus maintainers?
https://tatoeba.org/user/profile/Pfirsichbaeumchen

Has the Tatoeba corpus been poisoned?
https://www.google.com/search?q=data+poisoning

Some public-domain translations by the US Department of State:
https://www.state.gov/translations/


Lay or lie?
https://dictionary.cambridge.or...mar/lay-or-lie

Reason why or reason that?
https://dictionary.cambridge.or...grammar/reason
https://www.merriam-webster.com...ause-redundant

Segway or segue?
https://www.merriam-webster.com...monly-confused

Better than I or better than me?
https://www.merriam-webster.com...ows-it-and-why
https://www.britannica.com/dict...better-than-me

Snuck or sneaked?
https://www.merriam-webster.com...ich-is-correct

Mr or Mr.?
https://www.unr.edu/writing-spe...erican-english

Ingenuity or ingeniosity?
https://www.etymonline.com/word/ingenuity

News is or news are?
https://dictionary.cambridge.or...h-grammar/news

Didn’t used to or didn’t use to?
https://dictionary.cambridge.or...rammar/used-to

Cite, site, or sight?
https://www.merriam-webster.com...nd-sight-usage
https://www.e-education.psu.edu...ts/c3_p19.html

Nevermind or never mind?
https://www.merriam-webster.com...ermind-and-nvm

Bended or bent?
https://grammarist.com/bent-or-bended/
https://www.etymonline.com/word/bended
https://www.merriam-webster.com/dictionary/bended

Auger or augur?
https://www.merriam-webster.com...ger-difference

Comma before “such as”?
https://www.grammarly.com/blog/such-as-comma/

Punctuation inside or outside quotation marks?
https://www.hamilton.edu/academ...-of-quotations
https://www.thesaurus.com/e/gra...otation-marks/
https://owl.purdue.edu/owl/gene...ark_rules.html

Which quotation marks should I use?
https://en.wiki.tatoeba.org/art...riting-dialogs
https://en.wikipedia.org/wiki/Q...#Summary_table
https://sproget.dk/raad-og-regl...-anforselstegn

Which dashes should I use?
https://www.merriam-webster.com...ash-how-to-use
https://www.thepunctuationguide.com/em-dash.html
https://en.wikipedia.org/wiki/Dash
https://www.scribbr.com/language-rules/dashes/
https://english.stackexchange.c...-emdash-or-not

Quotation marks around nicknames?
https://style.mla.org/using-and-styling-nicknames/
https://grammarhow.com/quotation-marks-nicknames/

Spaces before punctuation?
https://de.wikipedia.org/wiki/Plenk
https://bertilow.com/pmeg/skrib...lposignoj.html
https://leconjugueur.lefigaro.f...ypographie.php
https://vitrinelinguistique.oql...x.php?id=22039

Fernando Pessoa’s orthography:
https://ilcao.com/2012/08/02/fe...ografia-m-c-v/

“Que” or “do que”?
https://ciberduvidas.iscte-iul....ue-e-que/34152
https://www.dicio.com.br/melhor...melhor-do-que/

Há anos atrás?
https://ciberduvidas.iscte-iul....nos-atras/2346
https://www.infopedia.pt/$ha-de...dez-anos-atras
https://www.dicio.com.br/ha-dez...ez-anos-atras/

Me a mim?
https://ciberduvidas.iscte-iul....im-mesmo/31346

Vos a vocês?
https://ciberduvidas.iscte-iul....vos--lhes/2834

“Muitas mais coisas” or “muito mais coisas”?
https://ciberduvidas.iscte-iul....ito-mais-1/944

“Ouve-se sons” or “ouvem-se sons”?
https://ciberduvidas.iscte-iul....se-tiros/11653

Agreement with “maioria”:
https://ciberduvidas.iscte-iul....ioria-de/33836

Capital initials for names of peoples?
https://ciberduvidas.iscte-iul....de-povos/21898
https://ciberduvidas.iscte-iul....tugueses/14765

Spaces around dashes (travessões)?
https://ciberduvidas.iscte-iul....ravessao/31251

Punctuation and quotation marks (aspas):
https://ciberduvidas.iscte-iul....as-aspas/34413
https://www.clubedoportugues.co...s-ponto-final/
https://www.tjsc.jus.br/web/ser...-uso-das-aspas

“Aspirantes a realizador” or “aspirantes a realizadores”?
https://ciberduvidas.iscte-iul....izadores/12529

Plural of CD and ONG:
https://ciberduvidas.iscte-iul....act-disc/27776
https://sualingua.com.br/plural...tem-apostrofo/
https://ciberduvidas.iscte-iul....l-de-ong/17269
https://www12.senado.leg.br/man...ao/estilos/ong

“Duas milhões de estrelas” or “dois milhões de estrelas”?
https://guiadoestudante.abril.c...a-com-milhoes/

Punctuation inside or outside quotation marks in Spanish?
https://www.fundeu.es/consulta/...to-final-6553/
https://www.lavanguardia.com/cu...-comillas.html
https://www.rae.es/dpd/comillas

Prefixes with hyphens in Spanish?
https://www.rae.es/espanol-al-d...rimer-ministro

Quoting verse in French:
https://www.btb.termiumplus.gc....6&info0=6.12.9
https://vitrinelinguistique.oql...-de-separation
https://www.merci-app.com/regle...is/ponctuation

Punctuation in Italian:
https://accademiadellacrusca.it...teggiatura/143


All my comments and sentences are in the public domain, assuming they were derived from content in the public domain.

VOA terms of use:
https://www.voanews.com/p/5338.html

Gutenberg terms of use:
https://www.gutenberg.org/policy/license.html

NASA usage guidelines:
https://www.nasa.gov/multimedia...nes/index.html

NOAA policies:
https://www.fisheries.noaa.gov/...pyright-policy

FDA website policies:
https://www.fda.gov/about-fda/a...bsite-policies

MedlinePlus license:
https://medlineplus.gov/about/using/usingcontent/

National Cancer Institute license:
https://www.cancer.gov/policies/copyright-reuse

CDC policies:
https://www.cdc.gov/other/agencymaterials.html

USGS copyrights:
https://www.usgs.gov/informatio...ts-and-credits

FBI / Department of Justice copyright status:
https://www.fbi.gov/privacy-policy
https://www.justice.gov/legalpolicies#copyright

DOL / OSHA copyright information:
https://www.dol.gov/general/aboutdol/copyright

CIA copyright notice:
https://www.cia.gov/site-policies/

House of Representatives terms of use:
https://www.house.gov/terms-of-use

National Human Genome Research Institute copyright policy:
https://www.genome.gov/about-nh...ance/Copyright

Department of Energy web policies:
https://www.energy.gov/web-policies

National Archives copyright and permissions:
https://www.archives.gov/resear...es/permissions

Department of Education copyright status notice:
https://www2.ed.gov/notices/copyright/index.html

NIH guidance on use:
https://www.nih.gov/institutes-...blic-resources

SEC policies:
https://www.sec.gov/privacy

GovInfo policies:
https://www.govinfo.gov/about/policies#copyright

White House policies:
https://www.whitehouse.gov/copyright/

National Weather Service disclaimer:
https://www.weather.gov/disclaimer

FDIC content and copyright:
https://archive.fdic.gov/Content%20and%20Copyright

Federal Reserve disclaimer:
https://www.federalreserve.gov/disclaimer.htm

Fish & Wildlife Service disclaimer:
https://www.fws.gov/disclaimer

Senate website policies:
https://www.senate.gov/general/privacy.htm

Women's Health policies:
https://www.womenshealth.gov/ab...collaborate-us

Veterans Affairs copyright policy:
https://department.va.gov/copyright-policy/

CC licenses:
https://creativecommons.org/about/cclicenses/


I do not endorse any of the articles or books from which my sentences were taken. I have not read any of these articles and books. I am in no way affiliated with any of these organizations. Let me know if you do not like any of my sentences. I might replace them.

I do not usually translate other users’ sentences, and I do not expect other users to translate mine. I am also not looking for people to proofread my sentences. Tatoeba has a huge number of sentences. If you want your sentences ever to be translated, you had better ask your friends to do it for you; otherwise, no one is ever going to even see them.

I am too busy playing with words in my little corner. I usually only translate English sentences containing words that have not been translated yet into Portuguese.

Does Tatoeba have too much negative content?

I prefer the old design of the sentence page. The new design unnecessarily buries some buttons under menus. Also, it’s not possible to change the language of a sentence, and the translations become hidden while editing a sentence. The new design is also less compact on computers, though it’s better on phones.

I like to disagree. Sorry about that. You will suffer my opinions. Let’s agree to disagree.

When a sentence is giving too much trouble, it’s best to just find a different one.

I prefer longer sentences, because there is more of a one-to-one relationship between the original sentence and its translation. Shorter sentences tend to have slightly different interpretations in different languages. Also, longer sentences from articles or books tend to be more interesting.

Am I a spammer? I should stop opening issues on GitHub.

You can safely ignore everything I say. Also, my corrections may be wrong. I am just an amateur. If you disagree with any of my suggestions, just ignore it and remove the “@change” tag. Also, do not bother asking me questions about my corrections, because I probably will not be able to answer them.

It wouldn’t bother me if Tatoeba decided one day to arbitrarily exclude certain sentences from the files exported on the Downloads page to make the files more usable in politically correct learning environments where sentences have to be flawless, as long as users still had the option to download the entire unfiltered corpus. This would be similar to what CK already does anyway. http://www.manythings.org/anki/

It’s easy to filter out sentences written by nonnative speakers, or by an arbitrary selection of users, using the data exported on the Downloads page. There’s even a related option on the Search page. Some users and administrators are okay with non-natives writing sentences. Others are against it. Personally, I hope they don’t decide to start barring users from posting sentences in languages they’re not native speakers of.
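A rough Python sketch of that filter. It assumes that user_languages.csv lists language, skill level, and username in its first three tab-separated columns, that skill level 5 marks a native speaker, and that the owner’s username is the fourth column of sentences_detailed.csv. All three are assumptions to verify against the files. The hypothetical `excluded_users` parameter covers the “arbitrary selection of users” case.

```python
import csv
from typing import AbstractSet, Iterable, Iterator, List, Set, Tuple

# Assumed layouts (verify against the files before relying on them):
#   user_languages.csv:     lang <TAB> skill level <TAB> username <TAB> details
#   sentences_detailed.csv: id <TAB> lang <TAB> text <TAB> username <TAB> ...
# Assumption: skill level "5" marks a native speaker.
NATIVE_LEVEL = "5"


def native_speakers(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return (lang, username) pairs for self-declared native speakers."""
    natives = set()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 3 and row[1] == NATIVE_LEVEL:
            natives.add((row[0], row[2]))
    return natives


def native_sentences(
    lines: Iterable[str],
    natives: AbstractSet[Tuple[str, str]],
    excluded_users: AbstractSet[str] = frozenset(),
) -> Iterator[List[str]]:
    """Yield sentence rows whose owner is a native speaker of that language."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 4:
            continue  # skip malformed lines
        lang, owner = row[1], row[3]
        if owner in excluded_users:
            continue  # arbitrary selection of users to leave out
        if (lang, owner) in natives:
            yield row
```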

I think it could be useful to be able to sort the list of all users who speak a given language by date of latest contribution. https://tatoeba.org/users/for_language/por

It would be nice if I could receive notifications the same way I receive personal messages. That way, I would only need to think about notifications while I am actually using the website. Receiving email notifications in real time is somewhat stressful.

You can’t edit your own sentences after audio has been added to them.

There are over a thousand sentences in Chinese tagged with @change.

Maybe Horus could be modified to make suggestions when it finds certain words in certain languages. It could detect incorrectly capitalized words or suggest translations for names of places.

If Tatoeba ever develops a script to automatically remove spaces around em dashes, they can use it on my English sentences (not on my Portuguese sentences, because in Portuguese spaced em dashes are more common). They can also substitute round quotes with straight quotes. In Portuguese, they should substitute hyphens with non-breaking hyphens, because of the way words are broken at the end of a line when a word already contains a hyphen. In French, they should substitute spaces with non-breaking spaces before punctuation.
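If such a script were ever written, its core could look like the hypothetical Python functions below. This is not an existing Tatoeba script; it only covers the dash, hyphen, and space rules and leaves quotation marks alone. It uses U+00A0 before French punctuation, although some French style guides prefer the narrow U+202F before ; ! and ?.

```python
import re

EM_DASH = "\u2014"
NB_HYPHEN = "\u2011"  # non-breaking hyphen
NBSP = "\u00a0"       # no-break space


def tighten_em_dashes(text: str) -> str:
    """English: remove any whitespace on either side of an em dash."""
    return re.sub(r"\s*" + EM_DASH + r"\s*", EM_DASH, text)


def protect_hyphens(text: str) -> str:
    """Portuguese: turn hyphens between two letters into non-breaking hyphens."""
    # [^\W\d_] matches a letter: a word character that is not a digit or "_".
    return re.sub(r"(?<=[^\W\d_])-(?=[^\W\d_])", NB_HYPHEN, text)


def protect_french_punctuation(text: str) -> str:
    """French: make the space before ; : ! ? » non-breaking."""
    return re.sub(r" (?=[;:!?»])", NBSP, text)
```

Hyphens between digits (as in score-like “1-2”) are left untouched by `protect_hyphens`, since only letter-hyphen-letter sequences match.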

My sentences were taken from articles and books in the public domain. Please do not translate copyrighted sentences.

You can use Wikipedia to find out what the translation for certain names is, especially names of animals or plants. Just remember to use other sources to double-check your results.

TRANG and CK ask that users not link sentences written in the same language. Personally, I thought doing so could be useful for monolinguals who want to translate expressions or sayings into the same language, or useful for linking sentences that have the same meaning, to make it easier to find indirect links that could be made direct. https://tatoeba.org/sentences/show/7205661

If you find a sentence that needs to be changed, don’t forget to tag it with “@change” in addition to leaving a comment, so that if the user never replies, a corpus maintainer can find it later and decide what to do. It’s better to write “@change” in a tag than in a comment. Also use the review feature, and maybe create private lists of objectionable sentences to show to administrators later.

Maybe the list-of-all-members page should show which users have been active in the past 24 hours instead of who made the latest 200 contributions. I’m also curious about who has been most active in the past week, month, and year. The rate of contributions keeps growing.

It would be nice if users were automatically assigned different-colored avatars when joining the website. That way, it would be easier to tell apart new users who have not yet uploaded an avatar. Something like @brauchinet’s avatar.

In theory, it should be easy to implement a feature that allows excluding multiple users from the search results, but what you can do right now is just create a list of friends and only translate sentences belonging to those users. It should also be simple to make changes to the Advanced Search page to allow only searching for sentences written by non-native speakers when looking for sentences to proofread.

Users want their free translations, so there’s going to be friction if the sentences in the search results are more likely to belong to some users than to others. The probability that someone will translate your sentences diminishes as Tatoeba grows in size.

When adding tags to a sentence, the autocompletion feature tries to find existing tags with the same prefix. It would be nice to have a similar feature on the Advanced Search page when looking for sentences with certain tags, and also on the Members page when looking for users.

You can’t edit another user’s sentences, but you can add alternate translations.

If you’re going to write a script in Python to calculate the novelty of sentences, you should probably exclude words that are not accepted by spellcheckers because those words are probably untranslatable.

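from re import findall
from spellchecker import SpellChecker  # pip install pyspellchecker
spellchecker = SpellChecker(language="en")  # `sentence` below is the text to filter
" ".join(word for word in findall(r"\w+", sentence.lower()) if not spellchecker.unknown([word]))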
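A minimal sketch of such a novelty score, under assumptions that are not in the note above: novelty is taken to be the number of distinct words in a sentence that are not yet in a set of already-covered words, and a plain `dictionary` set stands in for the spellchecker so the example runs without third-party packages. The function names are hypothetical.

```python
import re
from typing import AbstractSet, Iterable, List, Tuple


def words(sentence: str) -> List[str]:
    """Lowercased word tokens (runs of letters, digits, or underscores)."""
    return re.findall(r"\w+", sentence.lower())


def novelty(
    sentence: str,
    covered: AbstractSet[str],
    dictionary: AbstractSet[str],
) -> int:
    """Count distinct dictionary words in the sentence that are not yet covered.

    Words missing from `dictionary` (a stand-in for a spellchecker) are ignored,
    since they are probably names or otherwise untranslatable.
    """
    return len({w for w in words(sentence) if w in dictionary and w not in covered})


def rank_by_novelty(
    sentences: Iterable[str],
    covered: AbstractSet[str],
    dictionary: AbstractSet[str],
) -> List[Tuple[int, str]]:
    """Return (score, sentence) pairs, most novel first; ties keep input order."""
    scored = [(novelty(s, covered, dictionary), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```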

If you think you have found hate speech, don’t forget to tag it with “hate speech”.

Tatoeba should allow different accounts to use the same email address.

I think it could be useful to be able to sort sentences by number of translations, where translations into the same language count as a single translation.
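The counting rule can be sketched in Python, assuming sentence languages come from sentences.csv (as an id-to-language map) and links come from links.csv as (sentence id, translation id) pairs. Links to ids missing from the language map are ignored. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def translation_counts(
    langs: Dict[int, str],
    links: Iterable[Tuple[int, int]],
) -> Dict[int, int]:
    """Number of distinct languages each sentence is directly linked to."""
    targets: Dict[int, Set[str]] = defaultdict(set)
    for sentence_id, translation_id in links:
        lang = langs.get(translation_id)
        if lang is not None:  # ignore links to sentences that no longer exist
            # A set means several translations into one language count once.
            targets[sentence_id].add(lang)
    return {sid: len(found) for sid, found in targets.items()}


def sort_by_translations(
    langs: Dict[int, str],
    links: Iterable[Tuple[int, int]],
) -> List[int]:
    """Sentence ids ordered from most to fewest translated languages."""
    counts = translation_counts(langs, links)
    return sorted(langs, key=lambda sid: counts.get(sid, 0), reverse=True)
```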

Maybe the users’ reviews file, users_sentences.csv, should exclude outdated reviews.

The sentences I’ve been posting on Tatoeba aren’t messages I’m trying to get across. They’re just the first sentences I come across that meet certain criteria, such as containing yet-untranslated words, having a certain length, making sense out of context, being easy for me to translate, not being too negative or violent, not criticizing people, companies, or countries, and being in the public domain. I don’t really care about famous books or authors.

Maybe the vocabulary request page should allow sorting words by frequency according to frequency lists available on the internet. Also, it could be useful to be able to download the full list for each language and for corpus maintainers to be able to delete nonwords.

It’s easy to say a translation is bad. It’s not as easy to say how it could be better.

It would be useful if there were a character count when adding or translating sentences.

Using foreign words or untranslatable words in sentences is probably a bad idea. I think I might start avoiding sentences with names.

It seems the textbox for leaving comments on sentences is missing a checkbox for users to choose whether they want their comments to show up on the discussion page.

It’s probably a bad idea to get into arguments on the Wall or the discussion page.

I forgot the password for my previous email account, so I may not have seen your messages.

If people really dislike some of your sentences, they’re not going to want to translate any of your other sentences. You will be canceled.

Some users leave comments on sentences just to send messages to other users on the discussion page, so it’s probably pointless to correct other users’ sentences. The sentences might be messages or the comments might be the messages. It's probably more worthwhile to translate new sentences than to correct minor details in old ones. Text data are meant to be dirty. Tatoeba is more interested in translations than in corrections.

I disagree with @Idbx that contributions need to be “balanced”. Users should be allowed to contribute the way they want to and it’s up to Tatoeba to recommend lists of sentences. Tatoeba is like Twitter but grammatical and with translations. Some groups are interested in being able to communicate a set of messages, not in teaching or learning a language.

I disagree with @maaster that users should be allowed to keep sentences that are incorrect. If a native speaker tells you a sentence is wrong and how to fix it, the sentence needs to be changed.

It’s not possible to delete a review after a sentence has been deleted.

When someone changes the language of a sentence, that isn't registered in that sentence's history.

Tatoeba should include more language learning features.
https://tatoeba.org/wall/show_message/40302

Tatoeba was down October 14 and 15, 2023.
https://twitter.com/tatoeba_org...22628767371265

Tatoeba has a hierarchy of users. From top to bottom, there are “administrators”, “corpus maintainers”, “advanced contributors”, and “contributors”. “Advanced contributors” and above can add tags to sentences. “Corpus maintainers” and above can edit other users’ sentences. It’s written at the top of a user’s profile page what their current rank is.

All the sentences on Tatoeba have an open source license. You are free to make a copy of any of them and edit it. To show that a sentence was derived from another, just make it a translation of the other sentence and then unlink the sentences. The link will appear in the log.

I’ve added too many “to-days” to Tatoeba to be bothering other users about extra or missing hyphens or spaces.

Tatoeba is mostly used to build datasets of translations. The datasets can be used for learning, but usually through software like Anki or on other websites.

As long as your translations sound natural and are faithful to the original sentences, punctuation shouldn’t matter much.

The policy of writing original sentences in your native language and translating sentences from non-native languages into it has the consequence that almost everyone will be translating from English and almost no one will be translating from other languages.

Tatoeba and the organizations that use Tatoeba’s data probably want sentences to be as neutral as possible, so users don’t feel put off by the sentences. Political messages are probably inadvisable, because messages like that are never able to please all sides of an issue. And, anyway, Tatoeba is for spreading languages, not for solving disputes.

Anything you post on the internet will probably be studied by artificial intelligences.

That annoying paperclip from Microsoft Office might be my cousin.

Don’t send me corrections via private message. Post comments publicly so other people can participate and learn.

It would be useful to be able to search for only original sentences.
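Outside the search page, this can be approximated with the exported data. The sketch below assumes sentences_base.csv holds a sentence id and a base field, where 0 marks a sentence added as an original, a sentence id marks a translation of that sentence, and \N marks an unknown origin. That meaning of the field is an assumption to check on the Downloads page, and the function name is hypothetical.

```python
import csv
from typing import Iterable, Set


def original_ids(lines: Iterable[str]) -> Set[int]:
    """Ids of sentences whose base field marks them as originals."""
    originals = set()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Assumed: "0" = original, a sentence id = translation, "\N" = unknown.
        if len(row) >= 2 and row[1] == "0":
            originals.add(int(row[0]))
    return originals
```

The resulting set can then be used to keep only matching rows from sentences.csv.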

Email notifications should come with a link that allows users to quickly unsubscribe from notifications.

All sentences posted on Tatoeba have an open-source license. The license only requires that future versions use a compatible license and that all past contributors be credited, but otherwise the sentences don’t “belong” to anyone in particular. Also, Tatoeba is under no legal obligation to host any of the sentences, especially problematic ones. Users should keep personal backups. I’m not a legal expert, but that’s how most open-source licenses work. Of course, Tatoeba needs to take users’ opinions into consideration and avoid being too imposing; otherwise, users might become frustrated and leave. Also, regular folks don’t normally speak with scientific precision. And if you’re not satisfied with an existing translation, you can always just add your own alternative translation.

Tatoeba doesn’t export a full list of previous editors of a sentence. Only the current and original owners get exported, not other contributors or contributors of the sentences from which a translation was derived.

It’s possible to search for sentences that haven’t been translated yet into any language. It would be useful to be able to search for sentences that have at least one translation into any language, sentences that are not untranslated.

Sentences that don’t have any translations yet on Tatoeba aren’t necessarily bad sentences. In fact, they tend to be more authentically idiomatic. Their owners might not be fluent enough to translate them to other languages.

Flag changes should be registered in the log.

Tatoeba does some moderation of the sentences posted, but not much. It leaves that task to other websites that use the data. It tries not to impose too much on users so as not to lose them. If you see sentences you don’t like, then don’t translate them. Look for Tatoeba’s positive and constructive side.

Please don’t edit orphaned sentences unless they have a mistake. If there’s an alternative way you would express the sentence, please add a second translation. If you search for English sentences with Italian translations, you’ll find that most sentences have multiple translations. Multiple translations are allowed on Tatoeba.

The links.csv file contains links to sentences that have already been deleted by Horus.
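A minimal Python sketch for dropping such dangling links, assuming the ids listed in sentences.csv are exactly the sentences that still exist. The function name is hypothetical.

```python
from typing import AbstractSet, Iterable, Iterator, Tuple


def live_links(
    links: Iterable[Tuple[int, int]],
    existing_ids: AbstractSet[int],
) -> Iterator[Tuple[int, int]]:
    """Keep only links whose two ends are both present in sentences.csv."""
    for sentence_id, translation_id in links:
        if sentence_id in existing_ids and translation_id in existing_ids:
            yield sentence_id, translation_id
```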


0x002D - hyphen-minus, dual purpose character
0x00AB « left-pointing double angle quotation mark
0x00B0 ° degree sign
0x00B2 ² superscript two
0x00BB » right-pointing double angle quotation mark
0x00D7 × multiplication sign
0x2010 ‐ hyphen, joins words
0x2011 ‑ non-breaking hyphen, prevents word wrapping
0x2012 ‒ figure dash, used for numbers (phone numbers, sports scores)
0x2013 – en dash, used for number ranges (time, year intervals)
0x2014 — em dash, signals interruptions
0x2015 ― quotation dash, used in dialog
0x2018 ‘ high six quotation mark
0x2019 ’ high nine quotation mark
0x201A ‚ low nine quotation mark
0x201C “ high sixty-six quotation mark
0x201D ” high ninety-nine quotation mark
0x201E „ low ninety-nine quotation mark
0x2082 ₂ subscript two
0x2192 → right-pointing arrow
0x2212 − minus sign
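Python’s `unicodedata` module can print the official Unicode name of each character in the table, which helps when the informal names above (for example, “high six quotation mark” for 0x2018, officially LEFT SINGLE QUOTATION MARK) are not enough to tell similar characters apart. This sketch uses the table’s own code points, written as escapes so the look-alike characters cannot be mixed up.

```python
import unicodedata
from typing import List

# The 21 code points from the table above, in the same order.
CHARS = (
    "\u002d\u00ab\u00b0\u00b2\u00bb\u00d7"
    "\u2010\u2011\u2012\u2013\u2014\u2015"
    "\u2018\u2019\u201a\u201c\u201d\u201e"
    "\u2082\u2192\u2212"
)


def describe(chars: str) -> List[str]:
    """Return 'U+XXXX char NAME' lines using Python's Unicode database."""
    return [f"U+{ord(c):04X} {c} {unicodedata.name(c, '?')}" for c in chars]


if __name__ == "__main__":
    for line in describe(CHARS):
        print(line)
```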

Languages

No language added.