Wall (6,616 threads)
Before asking a question, make sure to read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.
** Stats & Graphs **
Tatoeba Stats, Graphs & Charts have been updated:
Weird kind of spam.
This Hungarian sentence appears to have been linked to my sentence by me, but I didn't link it, because I don't even understand it:
Could it be a bug?
The logs on #10969276 show you linking three other sentences before the Hungarian one. My guess is that you clicked one time too many, possibly while sentences were being moved from the list of indirect translations to the list of direct ones, and ended up accidentally hitting the Hungarian sentence.
Is there a way of downloading all sentences in a language that don't have a translation, direct or indirect, in another given language? Example: all German sentences with no link to an English sentence?
There's no way to download them, but you can search for them using the following.
This is a template set up to show the shortest such sentences first.
If you prefer another sort, you can change that in the lower right-hand corner.
You can fine-tune it to only show German sentences by those claiming to be native speakers.
Or, even focus on German sentences by one particular username, for example Pfirsichbaeumchen.
[Added 13 hours later]
You could bookmark this "dashboard" of useful links to get several options for German to English translating.
> This is a template [...]
Thanks, but I'm aware of everything you suggested.
The reason I asked is because I find the rendering of Tatoeba's search results page painfully slow and CPU-intensive, and I'm not a fan of the limit to 1,000 results.
> You could bookmark this "dashboard" of useful links
I see that the link is off-site and not secure. What advantage does it have over Tatoeba's own search?
You can download all German sentences and all German sentences translated into English, then sort them in Excel to find those not yet translated into English.
Basically you can use this guide without the sorting out names part: https://tatoeba.org/en/wall/sho...#message_37857
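For anyone who prefers a script to a spreadsheet, the same "sort and compare" step can be sketched in Python. This is a minimal sketch, assuming the tab-separated layouts of sentences.csv (id, language, text) and links.csv (sentence id, translation id); the function name untranslated_direct is made up for this example. Like the Excel method, it only looks at direct links.

```python
import csv
from typing import Dict, List, Set, TextIO, Tuple


def untranslated_direct(sentences_file: TextIO, links_file: TextIO,
                        source: str, target: str) -> List[Tuple[int, str]]:
    """Return (id, text) for each `source` sentence with no *direct* link to a
    `target` sentence. Indirect translations are not taken into account."""
    source_text: Dict[int, str] = {}  # id -> text, source-language sentences only
    target_ids: Set[int] = set()      # ids of target-language sentences
    for row in csv.reader(sentences_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        sid, lang = int(row[0]), row[1]
        if lang == source:
            source_text[sid] = row[2]
        elif lang == target:
            target_ids.add(sid)

    translated: Set[int] = set()  # source ids with a direct target-language link
    for row in csv.reader(links_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        a, b = int(row[0]), int(row[1])
        # Check both directions, in case a link is listed only one way.
        if a in source_text and b in target_ids:
            translated.add(a)
        if b in source_text and a in target_ids:
            translated.add(b)

    return sorted((sid, text) for sid, text in source_text.items()
                  if sid not in translated)
```

Usage would look something like: open both files with `open(name, encoding="utf-8", newline="")` and call `untranslated_direct(sentences, links, "deu", "eng")`.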
Thanks, but I don't think that method would find German sentences without indirect links to English (i.e., with no links whatsoever to English).
As far as I can see, there are no indirect translations, just translations and non-translations.
If it's that important to exclude "indirect translations", maybe you can do something similar using the downloaded links, too (links.tar.bz2).
Yes, this process only excludes the sentences with "direct translations", but I think that's enough for a simple translator.
What do you need these sentences for?
> What do you need these sentences for?
... To translate.
With a downloaded list of (in my case, mostly) German sentences, some of which have indirect links to English sentences – indirect links I cannot see until I search for them on the website – I often end up translating sentences only to find that my translation either is the same as an existing indirect translation or differs from an existing indirect translation in some way trivial enough to discourage me from adding my own. I could add my own translation to many of the sentences here that already have an English translation, direct or indirect – trust me, I really could. And although I do so sometimes, I do also try to avoid this, which itself gets annoying. It would be nice to be able to download sentences that don't have direct or indirect translations, thus avoiding the situations I've described.
Note that when you add an English translation to a German sentence that is not directly-linked to the same pre-existing English sentence, a direct link is created, so doing so can benefit our project. This will happen in cases where the existing English sentence is not even a visible indirect translation, too.
I'm aware of that. I've done it enough times. Incidentally, I've probably linked hundreds (if not thousands) of your sentences to German sentences in the process.
"I often end up translating sentences only to find that my translation either is the same as an existing indirect translation or differs from an existing indirect translation in some way trivial enough to discourage me from adding my own."
The sentence you are translating from doesn't change at all just because it has an "indirect translation" that is exactly the same as yours or a bit more articulate.
I think the method I tried to show you is good enough for translating.
Don't let those "indirect translations" (they aren't even translations, at all or at least not yet) discourage you.
What's more, if you can translate those sentences differently, don't be afraid to add them.
A great many English sentences have Italian translations, and in most cases a sentence has not just one translation but 3 or 4.
Other than this, maybe you can do something using the downloaded links (links.tar.bz2).
I haven't tried it yet, so that's all I can say.
I think both @CK and @Cabo are missing @sundown's point. He's not asking how he can add sentences of possibly marginal usefulness. He's asking how he can add English translations to German sentences that have no existing direct or indirect English translations, because he thinks translating those will be the most beneficial.
One approach would be to solve a simpler problem: find all the German sentences that have no translation in *any* language. These are guaranteed not to have direct or indirect translations into English. But even solving this problem will require some tedious manual work (for instance, adding each sentence from the results of a search into a list, then downloading the list), or programming, or performing checks at run time to exclude sentences that you have already downloaded.
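For the programming route, the simpler problem is fairly cheap to solve, because it only needs the set of sentence ids that appear in any link. A minimal sketch, assuming tab-separated sentences.csv (id, language, text) and links.csv (sentence id, translation id); unlinked_sentences is a hypothetical name.

```python
import csv
from typing import List, Set, TextIO, Tuple


def unlinked_sentences(sentences_file: TextIO, links_file: TextIO,
                       source: str) -> List[Tuple[int, str]]:
    """Return (id, text) for every `source` sentence that appears in no link
    at all, i.e. has no translation in any language."""
    linked: Set[int] = set()
    for row in csv.reader(links_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        linked.update((int(row[0]), int(row[1])))

    result = []
    # Stream the sentences file; only unlinked source sentences are kept.
    for row in csv.reader(sentences_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        sid = int(row[0])
        if row[1] == source and sid not in linked:
            result.append((sid, row[2]))
    return result
```

Only the set of linked ids is held in memory, which keeps this lighter than building the whole link graph.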
One question is this: Unless your goal for producing these sentences is strictly personal, you're going to want to upload your translations, right? Since we don't have an automated, generally available way of doing this at the moment, this task, as compared to working on a list of search results that are already "live", will add back the execution time that you've eliminated by using a downloaded file. So it seems like working from search results, despite the drawbacks, may be the best way to go. It also has the advantage that sentences that you translate will not appear in subsequent searches.
Thanks @AlanF_US. That's more like what I meant.
Lately, I've been translating from a list of German sentences without a direct English translation, found using a method like Cabo's. When I then come to Tatoeba to add my translations, I've often found that the sentences I've written
1. already exist as *indirect* translations;
2. differ in some trivial way from existing indirect translations;
3. are significantly different from existing indirect translations.
I always add my translation when 3) is true, but not always when 2) is true, even when I prefer my sentence, and personally wouldn't use/say the existing sentence because of the wording, the spelling, the different preposition: whatever. In this situation, having come up with my own translation away from the site and therefore unaware of any indirect translations, 2) can be annoying. Why not just add my own translation? I sometimes do, but there has been talk here of cutting down on translations that don't add value. If it's a matter of adding a sentence I would say or write as opposed to an existing one I wouldn't, then I think adding my sentence would be valuable. What's valuable to one person is not, of course, to another.
When I look at *direct* English-German translations here (let alone indirect ones) and translate the German sentence myself, it's not uncommon that I end up with a different English sentence than the existing one. No big surprise there, of course. We all have different ways of speaking. It's partly, though in no way completely, down to the fact that I grew up in a different part of the world, speaking a different English to most contributors here. The difference is made a lot starker, however, because my English differs from that of the most active contributor here by far, who has, I believe, fifty trillion sentences to his name.
> So it seems like working from search results, despite the drawbacks, may be the best way to go.
Not when your computer freezes for minutes at a time trying to render a search results page. And anyway, I've come to like translating offline. My translations might not be any better for it, but there isn't the pressure to press the "submit translation" button prematurely. I can come back later to a half-completed translation and finish it. But I would never have gone to all this trouble if it weren't for the slowness and limit on search results of the site search.
Please also add your translations in case number two, especially if it is a matter of regional or dialectal variation. The corpus benefits from such variation, and having several different translations for a given sentence is a strength, not a flaw.
I agree that you should go ahead and add such translations. Language learners often like to know various correct ways to say the same thing. If you, as a native speaker, add a variation that is what a certain language learner has been saying, he/she can then know for certain that what he/she has been saying is correct.
Thanks @CK. It's certainly something you've been doing systematically over the years,
Thanks, @Thanuir. It's never clear to me whether anyone here knows or cares about the issues of variation in the corpus (beyond paying lip service to it). At present, the corpus is completely imbalanced, which, naturally enough, puts off speakers of other dialects from staying here very long.
It might be that variation in the corpus is a multidimensional matter.
If I were to add a new sentence with Tom and Mary, but also a new word, would it add to or reduce the variation in the corpus? What if the sentence structure is novel or very frequent?
What about translating «Tom likes Mary» -type sentences to a language with many sentences? What about translating it in a way that uses an idiom particular to that language? What about to a language with very few sentences?
The questions above are rhetorical and not meant to be answered. But the point is, I believe a great many people care about issues of variation; they just do so along the dimensions that matter to them.
In the end the ways to affect Tatoeba are:
1. Do something, maybe someone else eventually does something more with it. Like add a sentence or translate a sentence. This does not work if you want the satisfaction of seeing your work flourish and grow.
2. Establish a personal relation with another user; maybe translate stuff from each other.
3. Add sentences on a huge scale.
4. Build a tool or code the website to increase or decrease the visibility of something or alter people's workflow in a way that emphasizes some stuff.
Thanks, @Thanuir, I appreciate your comments. I agree that there are different ways to measure variety, and that people care about some aspects of variety in the corpus and feel differently about others.
> 3. Add sentences on a huge scale.
This practice by one dedicated user above all others has contributed to the problem of lack of variety.
If what you really want is to work offline from a file that you've saved, and don't mind pasting the results back into the website one by one, I suggest doing the kind of advanced search mentioned earlier in this thread, but copying and pasting the content from the beginning to the end of the sentence section of the results into a text editor (such as Notepad++). You can then use "search/replace all" functionality to get rid of lines that contain information you don't need (for instance, lines that say "[German]").
I found that with 50 results per page, it took me only 3 minutes to go through the first 10 pages of results and copy them into the text file. That's 500 sentences. Even if your computer and/or connection are slower than mine, that seems a pretty reasonable length of time to download a group of sentences that can keep you busy for quite a while.
If you use this approach, you should choose not to display any translations in your results. Otherwise, the results will be hard to read.
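The "search/replace all" cleanup could also be done with a few lines of Python instead of a text editor. The patterns below are guesses about what the pasted results look like: they drop lines consisting only of a bracketed label such as "[German]" and blank lines, and may need adjusting. clean_pasted_results is a hypothetical name.

```python
import re
from typing import Iterable, List

# Assumed patterns: a line that is only a bracketed label such as "[German]",
# and blank lines. Adjust these to match what your pasted results contain.
DEFAULT_DROP = (
    re.compile(r"^\s*\[[^\]]*\]\s*$"),
    re.compile(r"^\s*$"),
)


def clean_pasted_results(lines: Iterable[str], drop=DEFAULT_DROP) -> List[str]:
    """Keep only the lines that match none of the `drop` patterns."""
    return [line.rstrip("\n") for line in lines
            if not any(p.search(line) for p in drop)]
```

For example, with the pasted text saved in a (hypothetical) results.txt, `clean_pasted_results(open("results.txt", encoding="utf-8"))` returns just the sentence lines.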
For this particular purpose (translating sentences directly on Tatoeba, without needing to download them for your personal projects), this link should work perfectly:
If you need to have all sentences that match your criteria at once, you would probably need a customised script.
This is the same search as a template, which means you can insert a search query, or just click the "search" button to see any German sentence without any links to English.
This is set for random output, but you can easily change to another "sort," if you like.
A 1,000-sentence limit shouldn't be a problem, since as you add translations, you'll get a different set of 1,000 sentences.
Perhaps you would also like to turn on the "Owned by a self-identified native" option.
Hi. I've uploaded a script that lists all the sentences in a given language that don't have any direct or indirect translations in some other given language. It's slow and uses a lot of memory, but I'm able to run it on my cheap two-year-old phone that has 2 GB of RAM using Pydroid. You need to download the links.csv and sentences.csv files first in order to use it. I've also uploaded the result of running the script for the language pair German-English. To use it for a different language pair, just change the variables "source" and "target" at the top of the file. Let me know if you find any bugs.
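This is not Cangarejo's script, just an independent sketch of the same idea for readers curious about how such a check can work. It assumes that an "indirect" translation means a translation of a direct translation (one extra hop through any language), that links can be treated as going both ways, and that the files are tab-separated as id/language/text and id/id. untranslated is a hypothetical name. Only the source sentences' text and the set of target-language ids are kept, but the full link graph still has to fit in memory, which is consistent with the high memory use mentioned above.

```python
import csv
from collections import defaultdict
from typing import Dict, List, Set, TextIO, Tuple


def untranslated(sentences_file: TextIO, links_file: TextIO,
                 source: str, target: str) -> List[Tuple[int, str]]:
    """Return (id, text) for `source` sentences that have neither a direct nor
    an indirect (translation-of-a-translation) link to a `target` sentence."""
    source_text: Dict[int, str] = {}
    target_ids: Set[int] = set()
    for row in csv.reader(sentences_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        sid, lang = int(row[0]), row[1]
        if lang == source:
            source_text[sid] = row[2]
        elif lang == target:
            target_ids.add(sid)

    # Undirected link graph: sentence id -> ids of linked sentences.
    neighbours: Dict[int, Set[int]] = defaultdict(set)
    for row in csv.reader(links_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        a, b = int(row[0]), int(row[1])
        neighbours[a].add(b)
        neighbours[b].add(a)

    def has_target_translation(sid: int) -> bool:
        for direct in neighbours.get(sid, ()):
            if direct in target_ids:
                return True  # direct translation
            if not neighbours.get(direct, set()).isdisjoint(target_ids):
                return True  # indirect translation (two hops)
        return False

    return sorted((sid, text) for sid, text in source_text.items()
                  if not has_target_translation(sid))
```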
Thanks a lot, Cangarejo.
To elaborate, for those who are not familiar with Python or downloads:
This is a Python script. If you don't have Python on your machine, you need to download it from python.org.
To download sentences.csv and links.csv, go to the "Downloads" link at the bottom of any Tatoeba page. Then download sentences.tar.bz2 (under "Sentences") and links.tar.bz2 (under "Links"). Each of these .bz2 files needs to be unzipped twice (once to produce a .tar file and again to produce a directory containing a .csv file) with a utility such as 7-Zip.
Then you need to move all the resulting files into a single directory and finally run "python untranslated.py" (or possibly "python3 untranslated.py"). If you need more details, let us know.
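If you'd rather not use 7-Zip, Python's standard tarfile module can decompress and unpack a .tar.bz2 archive in a single step. A minimal sketch (extract_archive is a made-up helper name); the `filter` argument is only passed on Python versions whose tarfile supports extraction filters.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archive(archive: str, dest: str) -> List[str]:
    """Decompress and unpack a .tar.bz2 archive into `dest` in one step.
    Returns the names of the archive members."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Versions with extraction filters: reject unsafe member paths.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For example, `extract_archive("sentences.tar.bz2", ".")` and `extract_archive("links.tar.bz2", ".")` would leave the extracted files in the current directory, next to the script.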
I think you can also use the script with the sentences_detailed.csv file without making any changes to the code. Just change the filename.
By the way, aren't these TSV files instead of CSV files?
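Whatever the extension, the sentence files list one sentence per line as id, language and text separated by tabs. That matters when reading them with Python's csv module: sentences can contain quotation marks, so it's safer to switch quote handling off. A small sketch (iter_sentences is a hypothetical name) that reads only the first three columns, so it should work for both sentences.csv and sentences_detailed.csv.

```python
import csv
from typing import Iterator, TextIO, Tuple


def iter_sentences(f: TextIO) -> Iterator[Tuple[int, str, str]]:
    """Yield (id, lang, text) from sentences.csv or sentences_detailed.csv.
    Any columns after the third (as in the detailed file) are ignored."""
    # Tab-delimited; quote characters are treated as ordinary text.
    for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        yield int(row[0]), row[1], row[2]
```

With the default quoting, a sentence starting with `"` would be parsed as a quoted field and lose its quotation marks.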
Thanks very much, @Cangarejo. Thanks too to @AlanF_US for the help to us non-programmers.
🍎 Stats - Audio File Contributors
Get the "tatoeba.org/audio/of/USERNAME" links for over 200 voices.
Hello, good afternoon. Administrators, I'm pleased to write to you to ask whether I can add my Saharawi dialect to this wonderful site. Thanks in advance.
Here are the instructions for adding a language:
Since I don't know whether you understand English, I'll explain: the language needs to have an ISO 639-3 code (I assume that in your case we're talking about the language with the code aao https://iso639-3.sil.org/code/aao). If so, we need you to create a list with as many sentences in this language as possible and choose a flag for the language. Then, once you've done that, send an email to email@example.com with all this information.
Thank you very much, that was a great help. Best regards.
Out of curiosity, is there a way to see contributions from a user in a specific language? I wanted to go back to a sentence I released, but it was adopted and the "please adopt" tag removed, and it doesn't show up in "latest contributions".
No, sorry, this is currently not an option.
Also, if the sentence you released was created a long time ago, you might not be able to see it in the "Latest contributions" even if you could filter by language, because the latest contributions are limited to 1000 entries.
As Trang pointed out, Tatoeba currently doesn't have a feature to make this easy.
However, depending on how curious you *really* are, there is a way to find this information.
Basically, you'd have to download https://downloads.tatoeba.org/e...utions.tar.bz2 and search for "insert sentence" lines with your username and the language you wrote the sentence in.
Since you mentioned on GitHub that you've been learning about programming, you might want to try this as a practice problem.
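As a starting point for that practice problem, here is one possible sketch in Python. It doesn't assume any particular column order, since that isn't described here: it simply keeps tab-separated rows in which your username, the language code, "insert" and "sentence" each appear as a whole field. Both that matching rule and the name my_inserted_sentences are assumptions to adjust once you've looked at the actual file.

```python
import csv
from typing import List, TextIO


def my_inserted_sentences(f: TextIO, username: str, lang: str) -> List[List[str]]:
    """Return rows in which `username`, `lang`, "insert" and "sentence" each
    appear as a whole tab-separated field (column order is not assumed)."""
    wanted = {username, lang, "insert", "sentence"}
    matches = []
    for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        if wanted.issubset(row):
            matches.append(row)
    return matches
```

Matching whole fields, rather than searching each line for substrings, avoids false hits from sentence text that happens to contain your username or the word "insert".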