Here you can ask general questions, such as how to use Tatoeba or how to report bugs and strange behavior, or simply socialize with the rest of the community.
Make sure you have read the FAQ before asking a question.
I'm developing a website (http://www.listeningpractice.org) that uses Tatoeba recordings to help language learners. I'm thinking about adding the option to filter the sentences by the accent of the speaker. I would have to group the sentences by speaker and then figure out the accent of each speaker. Could you please tell me whether the owner of a sentence is always also the person who recorded it?
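A minimal sketch of the grouping step, assuming the recordings are already available as (sentence id, recorder username) pairs and that each speaker's accent is tagged by hand. The pair layout, the usernames and the accent labels are all hypothetical, not taken from any Tatoeba export.

```python
from collections import defaultdict


def group_by_speaker(recordings):
    """Group sentence ids by the username of the member who recorded them.

    `recordings` is an iterable of (sentence_id, speaker) pairs (hypothetical layout).
    """
    by_speaker = defaultdict(list)
    for sentence_id, speaker in recordings:
        by_speaker[speaker].append(sentence_id)
    return dict(by_speaker)


def filter_by_accent(recordings, speaker_accents, accent):
    """Return the sorted ids of sentences whose speaker is tagged with `accent`."""
    groups = group_by_speaker(recordings)
    return sorted(
        sid
        for speaker, ids in groups.items()
        if speaker_accents.get(speaker) == accent
        for sid in ids
    )
```

For example, with recordings `[(1, "alice"), (2, "bob"), (3, "alice")]` and accents `{"alice": "US", "bob": "UK"}`, asking for `"US"` gives sentences 1 and 3. Tagging accents per speaker rather than per sentence only works if the recorder is known for every sentence, which is exactly the question above.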
I definitely think we should indicate the speakers. The downloadable list of sentences with audio should include this information. It would also be nice if we could see who contributed the audio for each sentence on the website, and browse a list of sentences recorded by a specific member. It's a shame that the current state of the site greatly discourages people from contributing audio.
# Small update
* We fixed the problem where languages were not displayed in the list for random sentences on the translate page.
* We fixed an issue where a sentence was missing from the search results even though it had been indexed previously. This happened when the sentence had recently been translated, or when its owner or correctness had changed.
# UI translations
I'd like to mention that the website interface is now 100% translated into 7 languages: Arabic, Esperanto, Finnish, French, German, Italian and Russian.
Marathi (97%), Japanese (92%) and Polish (90%) are also close to completion.
# Sentences deduplication
We're delaying the sentence deduplication once more. There are still some details we'd like to fix. Even though they are not critical and the deduplication itself is working properly (as far as we know), it's better to fix them sooner rather than later.
When everything is fixed, there will be another round of deduplication on the dev website. We will again allow a few days for everyone to check that there's indeed no major issue. Then we will finally run the script on the real website.
Thank you for your patience.
Just to elaborate on this. The original problem was that even though newly added sentences could be found by search within an hour (and now 15 minutes), they could not be found as translations. For example, let’s say you translate an existing Arabic sentence A into a Bolivian sentence B. You were then able to find B by looking up words of B, but looking up “words of A translated into Bolivian” didn’t return A.
The interesting thing is that fixing that bug brought a new feature: changes in authorship and unapproved status are now instantly visible in search results. These criteria affect the search result ranking algorithm (orphan sentences get a lower score, and unapproved sentences an even lower score). This means, for example, that orphan sentences will move up in the search results as soon as someone adopts them.
The recent improvements to the search (including this one) also lay the foundations for a more advanced search with multiple criteria such as owner, creation/modification date, tags, unapproved status, etc.
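To illustrate the ranking rule described above, here is a minimal sketch in which the text-match relevance is scaled down for orphan sentences and scaled down further for unapproved ones. The factors 0.5 and 0.25 are hypothetical; the actual weights used by Tatoeba's search are not stated here.

```python
from dataclasses import dataclass
from typing import Optional

ORPHAN_FACTOR = 0.5       # hypothetical weight for sentences without an owner
UNAPPROVED_FACTOR = 0.25  # hypothetical weight, lower than the orphan one


@dataclass
class Sentence:
    sentence_id: int
    relevance: float             # text-match score from the search engine
    owner: Optional[str] = None  # None means the sentence is an orphan
    unapproved: bool = False


def ranked_score(s: Sentence) -> float:
    """Apply the penalty for the sentence's current owner and status."""
    if s.unapproved:
        return s.relevance * UNAPPROVED_FACTOR
    if s.owner is None:
        return s.relevance * ORPHAN_FACTOR
    return s.relevance


def rank(sentences):
    """Return sentence ids, best score first."""
    return [s.sentence_id for s in sorted(sentences, key=ranked_score, reverse=True)]
```

In this sketch the score is computed from the sentence's current owner at query time, so adopting an orphan (setting `owner`) moves it up on the very next search. The real index has to be updated to get the same effect, which is what the fix made instant.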
We ran the deduplication script on the dev server last weekend. We found some other issues caused by sentences such as #613428, which has the duplicates #2505312, #2505313, #2509131, and #2509135.
The issues were fixed. The script completed yesterday. It seems to deduplicate properly.
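The grouping part of a deduplication run can be sketched as follows. This is not the actual script; it assumes that two sentences are duplicates when their language and text match exactly, and that the lowest id is kept as the main sentence (as with #613428 in the example above).

```python
from collections import defaultdict


def find_duplicates(sentences):
    """Map each main sentence id to the sorted ids of its duplicates.

    `sentences` is an iterable of (sentence_id, lang, text) tuples.
    Assumptions: exact (lang, text) match, and the lowest id is the main sentence.
    """
    groups = defaultdict(list)
    for sentence_id, lang, text in sentences:
        groups[(lang, text)].append(sentence_id)
    result = {}
    for ids in groups.values():
        if len(ids) > 1:
            ids.sort()
            result[ids[0]] = ids[1:]
    return result
```

A single main sentence can have many duplicates, so the later steps (moving links and comments, writing logs) have to handle a one-to-many merge rather than a simple pair.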
You can go and check some of the sentences that were deduplicated by looking at Horus' comments.
You can also check the export files of the dev database: http://downloads.tatoeba.org/dev/
and compare them with the ones from prod: http://tatoeba.org/eng/downloads
There are 2 problems remaining.
1) The way things are logged can be confusing.
For instance: http://dev.tatoeba.org/eng/sentences/show/613428
Question: is it a problem for anyone if things are logged this way?
We could spend more time making the logs more user-friendly, or we could just leave them as they are.
2) Comments are currently copied (instead of being moved) to the main sentence.
For instance: http://dev.tatoeba.org/eng/sentences/show/3550769.
If you go to the main sentence: http://dev.tatoeba.org/eng/sentences/show/1926402
you will see that there is a copy of the comment.
Question: do you prefer to have the comments on both the main and the duplicate sentence? Or only on the main sentence?
Thanks in advance for your feedback.
I would definitely be in favor of running the script now and making improvements to the log later, if at all. It's already possible to get the information we need now, and we would all benefit from having those duplicates merged as soon as possible.
If and when we do make changes to the logging, is it possible to report the number of the sentence on which an operation was performed as well as the operation? For instance, instead of:
linked to #123456
could we write this?
#111111 linked to #123456
Optionally, that first number could be suppressed where it's identical to the sentence where the log is being displayed. But if it's displayed all the time, that's fine, too. It doesn't take up much horizontal space.
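The proposed log format can be sketched as a small formatting function. The helper and its parameters are hypothetical; it prints the number of the sentence the operation was performed on, and drops it when it matches the sentence whose page is showing the log, as suggested above.

```python
def format_log_entry(sentence_id, action, target_id, page_sentence_id=None):
    """Render a log line such as '#111111 linked to #123456'.

    When `page_sentence_id` equals `sentence_id`, the leading number is
    redundant and is left out.
    """
    if sentence_id == page_sentence_id:
        return f"{action} #{target_id}"
    return f"#{sentence_id} {action} #{target_id}"
```

For example, `format_log_entry(111111, "linked to", 123456)` gives `#111111 linked to #123456`, while the same call with `page_sentence_id=111111` gives `linked to #123456`.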
Whether comments are copied or not doesn't matter to me, so I would say the current behavior is fine.
> was performed as well as the operation?
It's already displayed, just not clearly.
For instance, if you look at my logs:
linked to #3649128
When the logs are on the sentence's page, all the operations concern only the sentence itself. That is why the sentence number is not displayed.
If there is a place outside of the sentence's page where the sentence id is not displayed, let me know.
I'm still not happy with sentences like http://dev.tatoeba.org/fre/sentences/show/2144030 (as I wrote earlier in http://tatoeba.org/eng/wall/sho...#message_21045).
I agree with Alan and think that sentence numbers should be displayed. That would solve most problems. Besides, every addition of a sentence should be logged.
Eckhardtgabriel - Aug 27th 2010, 19:49
Why you don't take off your coat?
CK - Jan 14th 2013, 15:57
unlinked #483087 from #2143848
May 9th 2014, 22:22
[Comment to #483087]
Word order: Why don't you take off our coat?
I know this doesn't look very nice, but each sentence number has its history. You can't just pretend they were the same.
> That would solve most problems.
As I've replied to Alan, the sentence numbers are displayed on all the pages other than the sentence's page. On the sentence's page, all the operations only concern the sentence itself so the number would be repeated everywhere. You won't see something like "#2 linked to #3" on sentence #1.
Regarding the comments, the question is whether it's necessary to fix this, which means the deduplication could be delayed for an extra week, or whether it's fine to have some data that is not completely coherent and run the script earlier.
As far as I'm concerned, I don't think there is any hurry to run the script, so we could take the extra week, or even an extra month if needed. But I know that people have literally been waiting years for the deduplication, and may be fine with a script that isn't perfect.
In order to solve your problem, the simplest solution I can think of would be:
1) We don't copy the logs of the duplicates; we leave them where they are. Once the deduplication is done, we add a comment on the main sentence to indicate which duplicates were found.
For instance: "This sentence had duplicates: #123, #456, #789. The duplicates have been deleted."
2) We do copy the comments of the duplicates, but we add a short message. "Comment copied from #123 because of duplicate merge".
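The two messages proposed above could be generated with helpers like these. The function names are hypothetical, and putting the "copied from" note before the original comment text (rather than after it) is an assumption of this sketch.

```python
def duplicates_comment(duplicate_ids):
    """Comment added to the main sentence once its duplicates are deleted."""
    listed = ", ".join(f"#{i}" for i in duplicate_ids)
    return f"This sentence had duplicates: {listed}. The duplicates have been deleted."


def copied_comment(original_text, duplicate_id):
    """Copy of a duplicate's comment, prefixed with where it came from."""
    return f"Comment copied from #{duplicate_id} because of duplicate merge\n\n{original_text}"
```

Since these messages are English only and not translated with the interface, keeping them as fixed templates with just the sentence numbers filled in keeps them short and easy to recognize for non-English speakers.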
Keep in mind that the text will only be in English; it won't be translated like the rest of the interface. So we should try to make it simple enough for people who do not speak English.
I'll post some suggestions below. Feel free to suggest something else.
1) The message on the deleted duplicate.
2) The message on the remaining sentence
3) The extra info in the comments that were copied.
Sentences in: Lojban
Show translations in: English
However, this still shows Lojban sentences that do not have English translations.
I want to see a list of only the Lojban sentences that have English translations, and filter out the ones that do not. Is this possible?
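Outside the website, one way to get such a list is to filter the export files from the downloads page. This sketch assumes sentences are tab-separated rows of (id, language code, text), links are tab-separated rows of (sentence id, translation id), and Lojban and English use the codes `jbo` and `eng`; check these assumptions against the actual files. Only direct translations are counted.

```python
import csv


def with_translations_in(sentence_lines, link_lines, src="jbo", tgt="eng"):
    """Return sorted ids of `src` sentences that have a direct `tgt` translation.

    Assumed formats: sentences as "id<TAB>lang<TAB>text",
    links as "sentence_id<TAB>translation_id".
    """
    lang_of = {}
    # QUOTE_NONE so that quote characters inside sentence text are kept literally.
    for row in csv.reader(sentence_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        lang_of[int(row[0])] = row[1]
    result = set()
    for row in csv.reader(link_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        sentence_id, translation_id = int(row[0]), int(row[1])
        if lang_of.get(sentence_id) == src and lang_of.get(translation_id) == tgt:
            result.add(sentence_id)
    return sorted(result)
```

Both arguments can be open file objects or plain lists of lines, since `csv.reader` accepts any iterable of strings.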
I was told that Tatoeba now uses Transifex to translate the website. I have already signed up, but I am still not able to translate into Spanish. Should I wait until I'm given the rights?
I am specifically trying to correct this string, because in Spanish it should say Romanche and not Romaní, as it currently does. Can somebody help?