Member since
February 13, 2022
advanced contributor
All my comments and sentences are in the public domain, assuming they were derived from content in the public domain.

VOA terms of use:

Gutenberg terms of use:

NASA usage guidelines:

NOAA policies:

FDA website policies:

MedlinePlus license:

National Cancer Institute license:

CDC policies:

USGS copyrights:

FBI / Department of Justice copyright status:

DOL / OSHA copyright information:

CIA copyright notice:

House of Representatives terms of use:

National Human Genome Research Institute copyright policy:

Department of Energy web policies:

National Archives copyright and permissions:

Department of Education copyright status notice:

NIH guidance on use:

SEC policies:

GovInfo policies:

White House policies:

National Weather Service disclaimer:

FDIC content and copyright:

Federal Reserve disclaimer:

Fish & Wildlife Service disclaimer:

Senate website policies:

Women's Health policies:

Veteran Affairs copyright policy:

CC licenses:

I do not endorse any of the articles or books from which my sentences were taken. I am in no way affiliated with any of these organizations. Let me know if you do not like any of my sentences. I might replace them.

I do not usually translate other users’ sentences, and I do not expect other users to translate my own. I am also not looking for people to proofread my sentences. Tatoeba has a huge number of sentences. If you want your sentences to ever be translated, you had better ask your friends to do it for you; otherwise, no one is ever going to even see them.

I am too busy playing with words in my little corner. I usually only translate English sentences containing words that have not been translated yet into Portuguese.

Does Tatoeba have too much negative content?

I prefer the old design of the sentence page. The new design unnecessarily buries some buttons under menus. Also, it’s not possible to change the language of a sentence and the translations become hidden when editing a sentence. The new design is also less compact on computers; it’s better on phones.

I like to disagree. Sorry about that. You will suffer my opinions. Let’s agree to disagree.

When a sentence is giving too much trouble, it’s best to just find a different one.

I prefer longer sentences, because there is more of a one-to-one relationship between the original sentence and its translation. Shorter sentences tend to have slightly different interpretations in different languages. Also, longer sentences from articles or books tend to be more interesting.

Am I a spammer? I should stop opening issues on GitHub.

You can safely ignore everything I say. Also, my corrections may be wrong. I am just an amateur. If you disagree with any one of my suggestions, you should just ignore it and remove the “@change” tag. Also, do not bother asking me questions about my corrections because I probably will not be able to answer them.

It wouldn’t bother me if Tatoeba decided one day to arbitrarily exclude certain sentences from the files exported on the Downloads section to make the files more usable in politically-correct learning environments where sentences have to be flawless, as long as users still had the option to download the entire unfiltered corpus. This would be similar to what CK already does anyway.

It’s easy to filter out sentences written by nonnative speakers, or by an arbitrary selection of users, using the data exported on the Downloads page. There’s even a related option on the Search page. Some users and administrators are okay with non-natives writing sentences. Others are against it. Personally, I hope they don’t decide to start barring users from posting sentences in languages they’re not native speakers of.
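
Here is a minimal sketch of that kind of filtering in Python. It is only an illustration: I am assuming the sentences file has tab-separated columns for id, language, text, and username, that the user languages file has columns for language, skill level, and username, and that skill level 5 means native. Check the Downloads page for the actual layouts. The sample rows and function names are made up.

```python
import csv
import io

# Tiny inline samples standing in for the exported files (assumed layouts).
SENTENCES_TSV = "1\teng\tHello.\talice\n2\tpor\tOlá.\talice\n3\teng\tGood morning.\tbob\n"
USER_LANGUAGES_TSV = "eng\t5\talice\t\npor\t3\talice\t\neng\t4\tbob\t\n"

NATIVE_LEVEL = "5"  # assumption: skill level 5 marks a native speaker


def load_native_pairs(handle):
    """Return the set of (username, language) pairs marked as native."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {(row[2], row[0]) for row in reader if row[1] == NATIVE_LEVEL}


def native_sentences(handle, native_pairs):
    """Yield (id, lang, text, username) for sentences written by native speakers."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for sentence_id, lang, text, username, *_ in reader:
        if (username, lang) in native_pairs:
            yield sentence_id, lang, text, username


native_pairs = load_native_pairs(io.StringIO(USER_LANGUAGES_TSV))
kept = list(native_sentences(io.StringIO(SENTENCES_TSV), native_pairs))
print(kept)
```

With real files, the two io.StringIO objects would be replaced by open file handles. Inverting the membership test would keep only the sentences by non-native speakers instead.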

I think it could be useful to be able to sort the list of all users who speak a given language by date of latest contribution.

It would be nice if I could receive notifications the same way I receive personal messages. That way, I would only need to think about notifications while I am actually using the website. Receiving email notifications in real time is somewhat stressful.

You can’t edit your own sentences after audio has been added to them.

There are over a thousand sentences in Chinese tagged with @change.

Maybe Horus could be modified to make suggestions when it finds certain words in certain languages. It could detect incorrectly capitalized words or suggest translations for names of places.

If Tatoeba ever develops a script to automatically remove spaces around em dashes, they can use it on my English sentences (not my Portuguese sentences because in Portuguese it's more common the other way around). They can also substitute round quotes with straight quotes. In Portuguese, they should substitute hyphens with non-breaking hyphens because of the way words are broken at the end of a line when the word already contains a hyphen. In French, they should substitute spaces with non-breaking spaces before punctuation.
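
A script like that could look roughly like the sketch below. It is not anything Tatoeba actually runs; the function names are my own, and the rules are deliberately simple (for example, the French rule only handles a plain space before ; : ! ? and »).

```python
import re

NBSP = "\u00a0"       # no-break space
NB_HYPHEN = "\u2011"  # non-breaking hyphen

# Map round (curly) quotes to their straight equivalents.
STRAIGHT_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def tighten_em_dashes(text):
    """English: remove ordinary spaces on either side of an em dash."""
    return re.sub(r" *\u2014 *", "\u2014", text)


def straighten_quotes(text):
    """Replace round quotes with straight quotes."""
    return text.translate(STRAIGHT_QUOTES)


def protect_hyphens(text):
    """Portuguese: turn hyphens between word characters into non-breaking hyphens."""
    return re.sub(r"(?<=\w)-(?=\w)", NB_HYPHEN, text)


def french_spacing(text):
    """French: make the space before ; : ! ? and » a no-break space."""
    return re.sub(r" (?=[;:!?\u00bb])", NBSP, text)


print(tighten_em_dashes("Wait \u2014 what?"))
print(protect_hyphens("guarda-chuva"))
print(french_spacing("Pourquoi ?"))
```

Each function would be applied only to sentences in the language it is meant for.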

My sentences were taken from articles and books in the public domain. Please do not translate copyrighted sentences.

You can use Wikipedia to find out what the translation for certain names is, especially names of animals or plants. Just remember to use other sources to double-check your results.

TRANG and CK ask that users not link sentences written in the same language. Personally, I thought doing so could be useful for monolinguals who want to translate expressions or sayings into the same language, or useful for linking sentences that have the same meaning, to make it easier to find indirect links that could be made direct.

If you find a sentence that needs to be changed, don’t forget to tag it with “@change” in addition to leaving a comment, so that, if the user never replies to the message, a corpus maintainer can later find it and decide what to do. It’s better to write “@change” in a tag than in a comment. Also use the review feature.

Maybe the list-of-all-members page should show which users have been active the past 24 hours instead of a list of who made the latest 200 contributions. I’m also curious about who has been most active the past week, month, and year. The rate of contributions keeps growing.

It would be nice if users were automatically assigned different-colored avatars when joining the website. That way it would be easier to distinguish between new users that have not yet uploaded an avatar. Like @brauchinet’s avatar.

In theory, it should be easy to implement a feature that allows excluding multiple users from the search results, but what you can do right now is just create a list of friends and only translate sentences belonging to those users. It should also be simple to make changes to the Advanced Search page to allow only searching for sentences written by non-native speakers when looking for sentences to proofread.

Users want their free translations, so there’s going to be friction if the sentences in the search results are more likely to belong to some users and not others. The probability that someone will translate your sentences diminishes as Tatoeba grows in size.

When adding tags to a sentence, the autocompletion feature tries to find existing tags with the same prefix. It would be nice to have a similar feature on the Advanced Search page when looking for sentences with certain tags, and also on the Members page when looking for users.

You can’t edit another user’s sentences, but you can add alternate translations.

If you’re going to write a script in Python to calculate the novelty of sentences, you should probably exclude words that are not accepted by spellcheckers because those words are probably untranslatable.

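from re import findall
from spellchecker import SpellChecker  # provided by the pyspellchecker package
spellchecker = SpellChecker(language="en")
" ".join(word for word in findall(r"\w+", sentence.lower()) if not spellchecker.unknown([word]))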

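To give an idea of how the filtered words could feed into a novelty score, here is a small sketch that needs no extra packages. The definition of novelty used here (the number of distinct dictionary words that have not been translated yet) is just my assumption, and DICTIONARY and ALREADY_TRANSLATED are made-up stand-ins for a spellchecker's word list and for the words already covered by translated sentences.

```python
import re

# Hypothetical stand-ins: a real script might use a spellchecker's word list
# and the set of words already covered by translated sentences.
DICTIONARY = {"the", "cat", "sat", "on", "a", "mat", "quietly"}
ALREADY_TRANSLATED = {"the", "cat", "on", "a"}


def novelty(sentence, dictionary=DICTIONARY, translated=ALREADY_TRANSLATED):
    """Count distinct dictionary words in the sentence that are not yet translated."""
    words = set(re.findall(r"\w+", sentence.lower()))
    # Words outside the dictionary are dropped first, then the known ones.
    return len((words & dictionary) - translated)


print(novelty("The cat sat quietly on a zzyzx mat."))
```

Sentences could then be sorted by this score, highest first, to find the ones that would add the most new vocabulary.
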
If you think you have found hate speech, don’t forget to tag it with “hate speech”.

Tatoeba should allow different accounts to use the same email address.

Users who complain about the quality of content on Tatoeba just need more features to be able to sort and filter sentences in the search results.

I think it could be useful to be able to sort sentences by number of translations, where translations into the same language count as a single translation.
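
As a rough sketch of that ordering, the code below counts the distinct languages a sentence has been translated into instead of the raw number of translations. The sample data and the function name are invented for the example; in practice the language of each sentence and the links between sentences would come from the exported files.

```python
from collections import defaultdict

# Hypothetical sample data: sentence id -> language, and (source, target) links.
LANG_OF = {1: "eng", 2: "por", 3: "por", 4: "fra", 5: "eng", 6: "deu"}
LINKS = [(1, 2), (1, 3), (1, 4), (5, 6)]


def translation_language_counts(links, lang_of):
    """Map each source sentence to the number of distinct target languages."""
    languages = defaultdict(set)
    for source, target in links:
        languages[source].add(lang_of[target])
    return {sentence_id: len(langs) for sentence_id, langs in languages.items()}


counts = translation_language_counts(LINKS, LANG_OF)
# Most languages first; ties are broken by sentence id.
ranked = sorted(counts, key=lambda sentence_id: (-counts[sentence_id], sentence_id))
print(counts, ranked)
```

Sentence 1 has three translations, but two of them are in Portuguese, so it counts as two.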

Maybe the users’ reviews file, users_sentences.csv, should exclude outdated reviews.
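
Until then, outdated reviews could be filtered out after downloading. The sketch below assumes that a review is outdated when the sentence was modified after the review was last changed, or when the sentence no longer exists. The tuple layout, the date format, and the sample rows are my own assumptions, not the real layout of users_sentences.csv.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

# Hypothetical rows: (username, sentence_id, rating, review_last_modified)
REVIEWS = [
    ("alice", 10, 1, "2023-01-05 12:00:00"),
    ("bob", 10, -1, "2023-03-01 09:30:00"),
    ("alice", 11, 1, "2023-02-01 08:00:00"),
    ("bob", 12, 1, "2023-02-01 08:00:00"),
]
# Hypothetical mapping: sentence_id -> date the sentence was last modified
SENTENCE_MODIFIED = {10: "2023-02-10 10:00:00", 11: "2023-01-15 10:00:00"}


def current_reviews(reviews, sentence_modified):
    """Keep reviews made at or after the sentence's last modification."""
    kept = []
    for username, sentence_id, rating, reviewed_at in reviews:
        modified_at = sentence_modified.get(sentence_id)
        if modified_at is None:
            continue  # the sentence was deleted, so the review is treated as outdated
        if datetime.strptime(reviewed_at, DATE_FORMAT) >= datetime.strptime(modified_at, DATE_FORMAT):
            kept.append((username, sentence_id, rating, reviewed_at))
    return kept


print(current_reviews(REVIEWS, SENTENCE_MODIFIED))
```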

The sentences I’ve been posting on Tatoeba aren’t messages I’m trying to get across. They’re just the first sentences I come across that meet certain criteria, such as containing yet-untranslated words, having a certain length, making sense out of context, being easy for me to translate, not being too negative or violent, not criticizing people, companies, or countries, and being in the public domain. I don’t really care about famous books or authors.

Maybe the vocabulary request page should allow sorting words by frequency according to frequency lists available on the internet. Also, it could be useful to be able to download the full list for each language and for corpus maintainers to be able to delete nonwords.

It’s easy to say a translation is bad. It’s not as easy to say how it could be better.

It would be useful if there were a character count when adding or translating sentences.

Using foreign words or untranslatable words in sentences is probably a bad idea. I think I might start avoiding sentences with names.

It seems the textbox for leaving comments on sentences is missing a checkbox for users to choose whether they want their comments to show up on the discussion page.

It’s probably a bad idea to get into arguments on the Wall or the discussion page.

I forgot the password for my previous email account, so I may not have seen your messages.

If people really dislike some of your sentences, they’re not going to want to translate any of your other sentences. You will be canceled.

Some users leave comments on sentences just to send messages to other users on the discussion page, so it’s probably pointless to correct other users’ sentences. The sentences might be messages or the comments might be the messages. It's probably more worthwhile to translate new sentences than to correct minor details in old ones. Text data are meant to be dirty. Tatoeba is more interested in translations than in corrections.

I disagree with @Idbx that contributions need to be “balanced”. Users should be allowed to contribute the way they want to and it’s up to Tatoeba to recommend lists of sentences. Tatoeba is like Twitter but grammatical and with translations. Some groups are interested in being able to communicate a set of messages, not in teaching or learning a language.

I disagree with @maaster that users should be allowed to keep sentences that are incorrect. If a native speaker tells you a sentence is wrong and how to fix it, the sentence needs to be changed.

It’s not possible to delete a review after a sentence has been deleted.

When someone changes the language of a sentence, that isn't registered in that sentence's history.

Tatoeba should include more language learning features.

Tatoeba was down October 14 and 15, 2023.

Tatoeba has a hierarchy of users. From top to bottom, there are “administrators”, “corpus maintainers”, “advanced contributors”, and “contributors”. “Advanced contributors” and above can add tags to sentences. “Corpus maintainers” and above can edit other users’ sentences. A user’s current rank is shown at the top of their profile page.

All the sentences on Tatoeba have an open source license. You are free to make a copy of any of them and edit it. To show that a sentence was derived from another, just make it a translation of the other sentence and then unlink the sentences. The link will appear in the log.

I’ve added too many “to-days” to Tatoeba to be bothering other users about extra or missing hyphens or spaces.

Tatoeba is mostly used to build datasets of translations. The datasets can be used for learning, but on software like Anki or on other websites.

As long as your translations sound natural and are faithful to the original sentences, punctuation shouldn’t matter much.

The policy of writing original sentences in your native language and translating sentences from non-native languages has the consequence that almost everyone will be translating from English and almost no one will be translating from other languages.

Tatoeba and the organizations that use Tatoeba’s data probably want sentences to be as neutral as possible, so users don’t feel put off by the sentences. Political messages are probably inadvisable, because messages like that are never able to please all sides on an issue. And, anyway, Tatoeba is for spreading languages, not for solving disputes.

Anything you post on the internet will probably be studied by artificial intelligences.

That annoying paperclip from Microsoft Office might be my cousin.

0x002D - hyphen-minus, dual purpose character
0x00AB « left-pointing double angle quotation mark
0x00B0 ° degree sign
0x00B2 ² superscript two
0x00BB » right-pointing double angle quotation mark
0x00D7 × multiplication sign
0x2010 ‐ hyphen, joins words
0x2011 ‑ non-breaking hyphen, prevents word wrapping
0x2012 ‒ figure dash, used for numbers (phone numbers, sports scores)
0x2013 – en dash, used for number ranges (time, year intervals)
0x2014 — em dash, signals interruptions
0x2015 ― quotation dash, used for dialog
0x2018 ‘ high six quotation mark
0x2019 ’ high nine quotation mark
0x201A ‚ low nine quotation mark
0x201C “ high sixty-six quotation mark
0x201D ” high ninety-nine quotation mark
0x201E „ low ninety-nine quotation mark
0x2082 ₂ subscript two
0x2192 → right-pointing arrow
0x2212 − minus sign
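
The similar-looking dashes in the list above can be told apart with Python's standard unicodedata module, which returns the official Unicode name for a character. The official names are more formal than my descriptions; for example, the quotation dash is officially called HORIZONTAL BAR.

```python
import unicodedata

# Code points of the hyphen-like and dash-like characters from the list.
DASH_LIKE = (0x002D, 0x2010, 0x2011, 0x2012, 0x2013, 0x2014, 0x2015, 0x2212)

for code_point in DASH_LIKE:
    char = chr(code_point)
    print(f"0x{code_point:04X} {char} {unicodedata.name(char)}")
```

The same loop works on the characters of a sentence, which helps when checking which kind of dash or hyphen it actually contains.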


