Wall (6,756 threads)
Hello, it's me Nuel.
I had to change my e-mail address and create a new account.
How can I get my sentences and my user name back?
I'd also like to ask an admin if they can make me "Advanced contributor" again.
I need your help!
The matter is being taken care of.
✨ Sentences with indirect translations and no direct translations ✨
Original thread: https://tatoeba.org/fr/wall/sho...#message_39021
Excerpt from the original message:
> I’ve made lists of sentences in language A having indirect translations but no
> direct translations in language B. I think several people might find these lists
> quite useful.
> For ease of use, the names of all lists follow the same pattern:
> Indirect translations ISO1 → ISO2
After some months of experimentation, I modified the content of those lists so that
1. The lists contain only sentences of users native in language A.
2. The lists contain a block of consecutive sentences, taken from the middle of the corpus (so neither too old, nor too recent).
These small modifications allowed me to be much more efficient in "fast-linking".
The first point lets me spend very little time checking the validity of the original sentence.
The second point avoids the "This sentence, again?!" effect when facing variants that differ only in number, gender, or abbreviation (I have been, I've been, etc.). These sentences are often added at the same time (often as a result of translating), so having them displayed together in the search results makes it possible to quickly link translations to all of them, just by copy-pasting translations that already exist.
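For readers who want to build a similar list from an export of sentences and links, here is a minimal sketch of the selection rule "indirect translation but no direct translation". It is not the script used for the lists above: the function name `indirect_only`, the `lang_of` mapping, and the toy data are all made up for illustration, and it leaves out the native-speaker filter and the "block from the middle of the corpus" step.

```python
from collections import defaultdict

def indirect_only(links, lang_of, src_lang, dst_lang):
    """Return ids of src_lang sentences that reach dst_lang only through
    an intermediate sentence (two hops), never through a direct link."""
    neighbours = defaultdict(set)
    for a, b in links:
        # Links are treated as symmetric: a translates b and b translates a.
        neighbours[a].add(b)
        neighbours[b].add(a)

    result = []
    for sid in sorted(lang_of):
        if lang_of[sid] != src_lang:
            continue
        direct = neighbours[sid]
        # Skip sentences that already have a direct translation in dst_lang.
        if any(lang_of[n] == dst_lang for n in direct):
            continue
        # Collect translations of translations, excluding the sentence itself.
        two_hop = set()
        for n in direct:
            two_hop |= neighbours[n]
        two_hop.discard(sid)
        if any(lang_of[n] == dst_lang for n in two_hop - direct):
            result.append(sid)
    return result

# Hypothetical toy data: sentence id -> language code, plus translation links.
lang_of = {1: "fra", 2: "eng", 3: "jpn", 4: "fra", 5: "jpn"}
links = [(1, 2), (2, 3), (4, 5)]
print(indirect_only(links, lang_of, "fra", "jpn"))  # [1]
```

Sentence 1 qualifies because it reaches Japanese only via the English sentence 2, while sentence 4 is skipped because it is already linked directly to sentence 5.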
My personal way of using these lists is the following: https://tatoeba.org/fr/sentence...roved=no&user=
I order the results by "Last created first", and go from the last page to the first page. That way, sentences I ignored do not get in the way after I ignored them.
You can see all the available pairs of languages in my profile: https://tatoeba.org/fr/sentence...s/of_user/Aiji
If you'd like me to add a language pair, please let me know.
I hope some people will make good use of them!
🍎 New Audio Contributors
audio - rus - by retr0ra1n
audio - fra - by Adrien_FR
✹✹ Stats & Graphs ✹✹
Tatoeba Stats, Graphs & Charts have been updated:
Either there are no English sentences containing "culturally-rich", or the pseudo-regexp "*-rich" (double quotes included) is unable to extract such a text pattern.
There is one sentence that contains the phrase "culturally-rich". You can find it by searching for "culturally rich" (double quotes included):
All punctuation contained in sentences (including hyphens) is thrown away when the sentences are indexed, so there's no way to specifically find sentences that include that punctuation. Furthermore, some punctuation marks have special significance for the search engine, so including punctuation in your search terms can actually prevent you from finding what you're looking for. Punctuation marks with special significance for the search engine are described on the following page:
Even if all punctuation is thrown away before indexing, it would be nice if an escaped hyphen in the search pattern is recognized as a literal hyphen and the returned results further filtered, based on the search pattern ("*\-rich"), to return only those that DO contain a hyphen immediately before "rich".
The fact that the punctuation is thrown away when the words in a sentence are indexed means there's no longer any sign that the punctuation was there. So no search can distinguish between sentences that have punctuation and those that don't. It would be like storing all the content in uppercase when the sentence is indexed, then trying to find a particular lowercase letter. It can't be done.
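To make the two positions above concrete, here is a small sketch. It is not Tatoeba's or Manticore's actual indexing code: `index_tokens` and `post_filter` are hypothetical helpers and the two sentences are invented. The first part shows why an index built without punctuation cannot tell the hyphenated and unhyphenated forms apart. The second part shows the post-filtering idea, which only works under the assumption that the original sentence text is still available next to the index.

```python
import re

def index_tokens(sentence):
    """Lowercase and keep only runs of word characters, roughly what an
    indexer that discards punctuation would store (an assumption)."""
    return re.findall(r"\w+", sentence.lower())

# Invented example sentences.
a = "This is a culturally-rich city."
b = "This is a culturally rich city."

# Both sentences produce the same tokens, so the index alone cannot
# distinguish them.
same_in_index = index_tokens(a) == index_tokens(b)
print(same_in_index)  # True

def post_filter(candidates, pattern):
    """Re-check the original text of each search hit against a regex."""
    rx = re.compile(pattern)
    return [s for s in candidates if rx.search(s)]

# Keep only hits where a hyphen immediately precedes "rich".
hits = post_filter([a, b], r"\w-rich\b")
print(hits)  # ['This is a culturally-rich city.']
```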
This is probably a little off the point under discussion, but the form "culturally rich" and similar adverb-adjective combinations should not include a hyphen in English. On the other hand, an adjective-adjective combination like "super-rich" is correctly hyphenated.
As Alan explained, it is currently not possible to find words specifically containing hyphens. However, there is a way to tune the search engine to allow that. Anybody is welcome to open an issue on GitHub to ask for it.
We've done such tuning in the past to allow searching for question marks, because a lot of users were confused by what happens when you search for a question (https://github.com/Tatoeba/tatoeba2/pull/2399). However, because the hyphen is also a metacharacter that means "exclude sentences containing that word", we'll have to carefully check how such tuning affects the use of the hyphen as a metacharacter.
What about the use of '\' as an escape character just like in true regexp?
Yeah Manticore allows that, and apparently the query parser can also guess not to interpret the hyphen as a metacharacter in certain contexts.
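As an illustration of the escaping idea only, here is a toy query splitter. It is not Manticore's query parser and makes no claim about its real behaviour; `parse_query` is a hypothetical name, and the rule it applies (a leading unescaped hyphen excludes a word, `\-` is a literal hyphen) is an assumption for the sake of the example.

```python
def parse_query(query):
    """Split a query into (include, exclude) term lists.

    Toy rules (assumptions, not a real search engine's syntax):
    - a term starting with an unescaped '-' is excluded;
    - a backslash followed by '-' stands for a literal hyphen.
    """
    include, exclude = [], []
    for term in query.split():
        negate = term.startswith("-")
        if negate:
            term = term[1:]  # drop the exclusion marker
        term = term.replace("\\-", "-")  # unescape literal hyphens
        if not term:
            continue
        (exclude if negate else include).append(term)
    return include, exclude

print(parse_query(r"super\-rich -poor"))  # (['super-rich'], ['poor'])
print(parse_query(r"\-rich"))             # (['-rich'], [])
```

The usability question raised above remains: users would have to know that the backslash exists and when it is needed.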
My concerns are towards usability rather than feasibility.
How can I release all my sentences as CC0?
Here are instructions along with some info about limitations you may want to consider:
🍎 Tatoeba's Stats Page - 2023-03-04 Top 5, back to 2020-03-04
I've taken screenshots on March 4th for the past few years. I've merged these into one image.
When I encounter a sentence that I consider flawed, but not strictly speaking ungrammatical, should I do something (report it, comment, complain)? Here's an example: The Italian phrase "Mio figlio poteva morire," which means "My son could have died." It is not, however, in the conditional tense, as in "My son could have died if he hadn't been so quick to react." It is, rather, in the imperfect tense, as in "Because he was mortal, my son could have died," which is something no one says, ever, even though it's technically grammatical. To say a specific person is capable of dying means nothing. It's as if a very poor computer algorithm translated the sentence. I come across these from time to time, and I'm trying to understand whether sentences like this are compatible with the Tatoeba mission statement, which includes the goal of making sure the data is "of good quality." So, is a grammatical sentence that means nothing of good quality or not?
I’d say it’s best to leave a comment.
There are many reasons why a sentence can be “less than natural”.
- It could have been added by a non-native.
- Native speakers who aren’t used to translating tend to adhere too closely to the source sentence (often enough tenses in different languages are similar, but there is no one-to-one relationship).
- Native speakers (of different regions) have different ways of speaking.
- “You” are not a native speaker, and your opinion is based on what you learned in school.
In the case you mentioned, the sentence was added by a native speaker, in fact the most active Italian contributor. Chances are good that he will read your comment and even answer.
Many contributors have left though and won’t see a comment or won’t answer.
In case a sentence is undoubtedly wrong, a corpus maintainer of Italian who happens to see your comment could change it. If the situation is not so unequivocal, rules state that the decision should be left to the owner of the sentence. If there's no decision, it will stay as it is.
At least other users of Tatoeba will see your comment and can question or further analyse the sentence.
There is another way to communicate your opinion on the sentence: the rating system: ✓ ? !
It means: I (as a native speaker) think this sentence is ok / doubtful, I wouldn’t use it myself / wrong.
I agree with brauchinet's analysis.
Regarding the specific sentence you mentioned: Although I'm not an expert in Italian, I can imagine situations in which one would use the imperfect "poteva" in conjunction with "morire". How about an Italian parent having just talked about how their son was in active combat for years, then going on to say that during that time, death was always a possibility?
> How about an Italian parent having just talked about how their son was in active combat for years, then going on to say that during that time, death was always a possibility?
As an Italian speaker, I'd use this tense in this context, and also in other contexts where someone faces risks (diseases, accidents, extreme sports, and so on).
This reminds me of a sentence I once posted.
#11005383 I was told right at the beginning that running a solar cooker company and making it feasible and financially stable was impossible.
Normally, a sentence like this would end with “... would be impossible.” It’s much more common for Portuguese speakers to say “was” instead of “would be”. I imagine that’s also true for Italian speakers. It’s grammatically correct in Portuguese, but maybe not so in English.
Using "was" instead of "would be" here is not grammatically incorrect in English, as long as you think of "running a solar cooker company and making it feasible and financially stable" as a single action. Otherwise, you'd need to say "were" instead of "was" (or reword things a little: "I was told right at the beginning that running a feasible, financially stable solar cooker company was impossible"). However, I think that English speakers may be more likely to use the conditional rather than the indicative when discussing a situation like this (which was hypothetical at the time that the discussion took place).
After a lot of work I finally managed to release 39 Anki flash card frequency decks based on Tatoeba sentences. They contain not only the sentences, but also audio for each sentence (taken from Tatoeba or high-quality text-to-speech) and individual word translations. For most languages they contain 9000 cards (except where Tatoeba had fewer sentences).
Code, screenshots and deck downloads are here: https://github.com/Vuizur/tatoeba-to-anki
I recently used one of the decks to learn Czech, and it was much more effective in my opinion than for example Duolingo. Most of all I recommend those decks to language learners who know the script of their language, but still don't know enough to understand natural input.
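The post does not say how the decks are ordered, so the following is only a guess at what "frequency deck" could mean, not the method used in tatoeba-to-anki. The sketch ranks sentences so that those made up entirely of common words come first; `order_by_frequency` is a hypothetical helper, word counts are taken from the sentences themselves, and the input is invented.

```python
import re
from collections import Counter

def order_by_frequency(sentences):
    """Order sentences so that those whose rarest word is most common
    come first (one possible notion of a frequency-ordered deck)."""
    words = [re.findall(r"\w+", s.lower()) for s in sentences]
    counts = Counter(w for ws in words for w in ws)

    def rarest(ws):
        # Count of the least frequent word in the sentence (0 if empty).
        return min((counts[w] for w in ws), default=0)

    # sorted() is stable, so ties keep their original order.
    paired = sorted(zip(sentences, words), key=lambda p: -rarest(p[1]))
    return [s for s, _ in paired]

# Invented example input.
deck = order_by_frequency(["I see a cat.", "I see.", "A cat sleeps."])
print(deck)  # ['I see a cat.', 'I see.', 'A cat sleeps.']
```

"A cat sleeps." comes last because "sleeps" occurs only once in this tiny sample.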
Very helpful, thank you!
Can it also create other decks besides those with English as one side?
Yes, it supports all language combinations, but currently only English Wiktionary definitions. (I'm looking into possibilities for other languages, but it's not that easy.)
Good job! Using <details><summary> for expandable dictionary definitions is a nice solution.
However, I'm noticing a distinct lack of source attribution links on the cards. Not only are you required to provide attribution to comply with the CC BY license (CC BY-SA in the case of Wiktionary), having links would also make it easier for users of the decks to contribute back by e.g. fixing mistakes they notice while studying.
You might also want to upload shared decks to ankiweb.net to help people discover your work.
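A rough sketch of what a card back combining both suggestions might look like: an expandable definition in `<details><summary>` plus a source link for attribution. This is not the project's actual template; `card_back` and its parameters are hypothetical, and the URL is passed in by the caller rather than assumed.

```python
from html import escape

def card_back(translation, word, definition, source_url):
    """Build HTML for the back of a card: the translation, an expandable
    word definition, and an attribution link to the source sentence."""
    return (
        f"<div>{escape(translation)}</div>"
        f"<details><summary>{escape(word)}</summary>"
        f"{escape(definition)}</details>"
        f'<a href="{escape(source_url, quote=True)}">Source</a>'
    )

# Invented example values; the URL is a placeholder, not a real sentence page.
html_back = card_back("The dog is sleeping.", "pes", "dog", "https://example.org/sentence/1")
print(html_back)
```

Escaping every field keeps sentences containing characters such as `<` or `&` from breaking the card markup.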
Thanks, those are good points! I'll add attribution to the descriptions. You're right about the backlinks, it depends on how difficult they are to implement. And I'll try to do Ankiweb 👍.