Wall (7,275 threads)
The content of this message goes against our rules and was therefore hidden. It is displayed only to admins and to the author of the message.
Hi everyone,
I have tried to use the sentence export page on Tatoeba, but unfortunately, I am not managing to do it correctly. Could someone please help me download the following data (preferably as a CSV file, zipped or not)?
1. All Kabyle sentences
—with their translations in French, English, and Spanish,
—and with an audio recording attached.
2. All original Kabyle sentences that:
—do not have translations,
—do not have audio recordings,
—but without duplicated sentences if possible.
Any help, guidance, or explanation on how to extract these properly would be greatly appreciated.
Thank you very much in advance!
Igider
Download:
https://tatoeba.org/en/downloads
Advanced search:
https://tatoeba.org/en/sentences/advanced_search
I think you need to download the sentences for each language as separate files, then download the connections file, and then use a graph tool to connect the sentences (for example, NetworkX for Python).
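For what it's worth, the linking step can also be sketched with plain dictionaries instead of a graph library such as NetworkX. This is only a rough sketch: it assumes the sentence files are tab-separated with the columns id, language code, text, and that the connections file is tab-separated with the columns sentence id, translation id. Please check the actual layout on the downloads page. The sample rows and the function names are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def read_sentences(handle):
    """Return {sentence_id: (lang, text)} from a tab-separated stream.

    Assumed column order: id, language code, text.
    """
    sentences = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 3:
            sentences[int(row[0])] = (row[1], row[2])
    return sentences


def read_links(handle):
    """Return {sentence_id: {translation_id, ...}} from a tab-separated stream.

    Assumed column order: sentence id, translation id.
    """
    links = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:
            links[int(row[0])].add(int(row[1]))
    return links


def translations(sentences, links, source_lang, target_langs):
    """Map each source-language sentence to its translations, grouped by language."""
    result = {}
    for sid, (lang, text) in sentences.items():
        if lang != source_lang:
            continue
        found = {}
        for tid in links.get(sid, ()):
            if tid in sentences and sentences[tid][0] in target_langs:
                tlang, ttext = sentences[tid]
                found.setdefault(tlang, []).append(ttext)
        result[sid] = (text, found)
    return result


# Made-up sample data standing in for the downloaded files.
SENTENCES = (
    "1\tkab\tkab sentence one\n"
    "2\tfra\tphrase une\n"
    "3\teng\tsentence one\n"
    "4\tkab\tkab sentence two\n"
)
LINKS = "1\t2\n2\t1\n1\t3\n3\t1\n"

sentences = read_sentences(io.StringIO(SENTENCES))
links = read_links(io.StringIO(LINKS))
pairs = translations(sentences, links, "kab", {"fra", "eng", "spa"})
```

With real files you would pass `open(path, encoding="utf-8", newline="")` instead of `io.StringIO(...)`. Sentences whose translation dictionary comes back empty (like sentence 4 in the sample) are the ones without translations in the chosen languages.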
Note first of all that Kabyle has more than 777,000 sentences. Working with a set that large is going to require scripting/programming, not only to download the sentences and select the subset you want, but also to manage those sentences on your side.
The download page lets you download the following with the click of a button:
- all sentences in language A with translations in language B
- all sentences in language A
- all sentences in language A that have audio (but not the audio itself, which can only be downloaded if the license says so, and needs to be downloaded via a URL; this is explained on the downloads page)
Another option, as the page says, is to produce a list of sentences, which can then be downloaded. However, this is impractical with the number of sentences you'd be dealing with.
Without scripting/programming knowledge, you could do three downloads, consisting of all Kabyle sentences translated into French, English, and Spanish, respectively. This would give you TSV files (tab-separated rather than comma-separated) containing the sentences and translations. (For reference, the one for French would be 17.8 MB in size and contain more than 200,000 entries.) This is not what you asked for, but it's probably the best you can do without scripting/programming. Otherwise, the help you need is most likely going to go beyond what can be provided on the Wall.
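If someone does want to take the three TSV downloads one step further, here is a minimal sketch of merging them into a single comma-separated file. It assumes each downloaded file has four tab-separated columns (Kabyle sentence id, Kabyle text, translation id, translation text); that layout is an assumption, so check a few lines of the real files first. The sample rows and function names are invented for the example.

```python
import csv
import io


def merge_pairs(streams):
    """Merge per-language pair files into one dictionary.

    streams maps a target language code to a tab-separated stream with the
    assumed columns: source id, source text, target id, target text.
    """
    merged = {}
    for lang, handle in streams.items():
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < 4:
                continue  # skip blank or malformed lines
            src_id, src_text, _tgt_id, tgt_text = row[:4]
            entry = merged.setdefault(src_id, {"text": src_text})
            entry.setdefault(lang, []).append(tgt_text)
    return merged


def write_csv(merged, langs, out):
    """Write one row per source sentence; multiple translations are joined with ' | '."""
    writer = csv.writer(out)
    writer.writerow(["id", "kab"] + list(langs))
    for src_id, entry in merged.items():
        cells = [" | ".join(entry.get(lang, [])) for lang in langs]
        writer.writerow([src_id, entry["text"]] + cells)


# Made-up sample rows standing in for the downloaded TSV files.
FRA = "1\tkab one\t10\tfra one\n2\tkab two\t11\tfra two\n"
ENG = "1\tkab one\t20\teng one\n1\tkab one\t21\teng one bis\n"

merged = merge_pairs({"fra": io.StringIO(FRA), "eng": io.StringIO(ENG)})
out = io.StringIO()
write_csv(merged, ["fra", "eng", "spa"], out)
csv_lines = out.getvalue().splitlines()
```

A sentence that appears in only one of the three files simply gets empty cells for the other languages. The result still says nothing about audio, which needs the separate "Sentences with audio" file.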
Thank you for your reply.
Since obtaining the full dataset is not practical through the current process, would you kindly provide at least the audio files related to the Kabyle sentences?
That alone would already be extremely helpful, and I would greatly appreciate your assistance with this.
Thank you again.
In the section "Sentences with audio", the Downloads page says this:
---
File description:
Contains the ids of the sentences, in all languages, for which audio is available. Other fields indicate who recorded the audio, its license and a URL to attribute the author. If the license field is empty, you may not reuse the audio outside the Tatoeba project.
Downloading audio:
A single sentence can have one or more audio, each from a different voice. To download a particular audio, use its audio id to compute the download URL. For example, to download the audio with the id 1234, the URL is https://tatoeba.org/audio/download/1234.
---
You can use a tool like wget to download files once you know the URLs.
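As a rough Python alternative to wget, the sketch below computes the download URLs from the audio ids using the URL pattern quoted above. It assumes the "Sentences with audio" file is tab-separated with the columns sentence id, audio id, username, license, attribution URL, and that an empty license may appear either as an empty field or as `\N`; both points are assumptions to verify against the real file. The `.mp3` extension, the sample rows, and the function names are likewise only illustrative.

```python
import csv
import io
import urllib.request

BASE_URL = "https://tatoeba.org/audio/download/"


def audio_url(audio_id):
    """Build the download URL for one audio id (pattern from the Downloads page)."""
    return f"{BASE_URL}{audio_id}"


def reusable_audio(handle, wanted_sentence_ids):
    """Yield (sentence_id, audio_id, url) for wanted sentences with a license.

    Assumed column order: sentence id, audio id, username, license, attribution URL.
    Rows with an empty license are skipped, since that audio may not be reused
    outside the Tatoeba project.
    """
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 4:
            continue
        sentence_id, audio_id, _username, license_name = row[:4]
        if sentence_id in wanted_sentence_ids and license_name not in ("", r"\N"):
            yield sentence_id, audio_id, audio_url(audio_id)


def download(audio_id, directory="."):
    """Fetch one audio file over the network (not called in this example)."""
    target = f"{directory}/{audio_id}.mp3"  # file extension is an assumption
    urllib.request.urlretrieve(audio_url(audio_id), target)
    return target


# Made-up sample rows standing in for the downloaded file.
AUDIO = (
    "1\t1234\talice\tCC BY 4.0\thttps://example.org/alice\n"
    "4\t5678\tbob\t\\N\t\\N\n"
    "9\t9999\tcarol\tCC0\t\n"
)

kabyle_ids = {"1", "4"}  # ids of the Kabyle sentences you are interested in
selected = list(reusable_audio(io.StringIO(AUDIO), kabyle_ids))
```

The set of Kabyle sentence ids would come from the Kabyle sentence download. When fetching many files, it is polite to pause between requests rather than hitting the server in a tight loop.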
If that information is not enough to get you started, see if you can find someone on your side who has the technical knowledge to do this.