31 minutes ago
Hi guys!

I'm developing a website (http://www.listeningpractice.org) that uses Tatoeba recordings to help language learners. I'm thinking about adding the option to filter the sentences by the accent of the speaker. I would have to group the sentences by speaker and then I just have to figure out the accent of each speaker. Could you please tell me if the owner of the sentence is always the recorder as well?

5 minutes ago
Sentences are not always recorded by their owners, and I don't think there's a way right now to find out who recorded them.

I definitely think we should indicate them. The list of sentences with audio for download should include this information. It would also be nice if we could see the contributor of the audio for each sentence on the website and browse a list of sentences recorded by a specific member. It's a shame that the current state of the site greatly discourages people from contributing audio.
4 days ago
** Tatoeba update (December 17th, 2014) **


# Small update

* We fixed the problem of the languages not being displayed on the translate page[1], in the list for random sentences.
* We fixed an issue where a sentence was not part of the search results, even though it had been indexed previously. This happened when the sentence had recently been translated, or when its owner or correctness had changed.

# UI translations

I'd like to mention that we now have the website interface translated 100% in 7 languages[2]: Arabic, Esperanto, Finnish, French, German, Italian and Russian.
We also have Marathi (97%), Japanese (92%) and Polish (90%), which are not far from being completed.

# Sentences deduplication

We're delaying the sentence deduplication once more. There are still some details we'd like to fix[3]. Even though they are not critical and the deduplication itself is working properly (as far as we know), it's better to fix them sooner rather than later.

When everything is fixed, there will be another round of deduplication on the dev website. We will again leave a few days for everyone to check that there is indeed no major issue. Then we will finally run the script on the real website.

Thank you for your patience.


[1] http://tatoeba.org/eng/activiti...late_sentences
[2] https://www.transifex.com/proje...toeba_website/
[3] http://tatoeba.org/eng/wall/sho...#message_21231
14 hours ago - edited 14 hours ago
> * We fixed an issue where a sentence was not part of the search results, even though it had been indexed previously. This happened when the sentence had recently been translated, or when its owner or correctness had changed.

Just to elaborate on this: the original problem was that even though newly added sentences could be found using search within an hour (and now within 15 minutes), they could not be found as a translation. For example, let’s say you translate an existing Arabic sentence A into a Bolivian sentence B. You were then able to find B by looking up words of B, but looking up “words of A translated into Bolivian” didn’t return A.

The interesting thing is that fixing that bug brought a new feature: changes in terms of authorship and unapproved status are now instantly visible in search results. These criteria affect the search result ranking algorithm (orphan sentences get a lower score, and unapproved sentences an even lower score). It means that for example, orphan sentences will be brought up in the search results as soon as someone adopts them.
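
To make the ranking rule above concrete, here is a minimal sketch. It is not the actual Tatoeba search code: the `Sentence` class, the `rank_score` function and the two penalty values are hypothetical, chosen only so that orphan sentences score lower than owned ones and unapproved sentences score lower still.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical multipliers: the post only states the ordering
# (owned > orphan > unapproved), not the real values.
ORPHAN_PENALTY = 0.5
UNAPPROVED_PENALTY = 0.25


@dataclass
class Sentence:
    text: str
    owner: Optional[str]  # None means the sentence is an orphan
    unapproved: bool = False


def rank_score(relevance: float, sentence: Sentence) -> float:
    """Scale a text-relevance score by authorship and approval status."""
    if sentence.unapproved:
        return relevance * UNAPPROVED_PENALTY
    if sentence.owner is None:
        return relevance * ORPHAN_PENALTY
    return relevance
```

In this sketch, setting `owner` on an orphan sentence (that is, adopting it) raises its score the next time `rank_score` is called, which mirrors the "brought up as soon as someone adopts them" behavior described above.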

The recent improvements to the search (including this one) also lay the foundations for a more advanced search functionality with multiple criteria like owner, creation/modification date, tags, unapproved status, etc.
17 hours ago

How long do translations made in Transifex take to become visible on Tatoeba? I went through the FAQs on the wiki but did not see this covered.
17 hours ago
As far as I remember, you see them on the developers site http://dev.tatoeba.org/epo/home within an hour, and on the public site after every update.
15 hours ago
You can see them on the developers site within ten minutes.
5 days ago
** Sentences deduplication **

We ran the deduplication script on the dev server last weekend. We found some other issues because of sentences such as #613428, which has as duplicates #2505312, #2505313, #2509131, and #2509135.

The issues were fixed. The script completed yesterday. It seems to deduplicate properly.
You can go and check some of the sentences that were deduplicated by looking at Horus' comments[1].
You can also check the export files of the dev database: http://downloads.tatoeba.org/dev/
and compare them with the ones from the prod: http://tatoeba.org/eng/downloads

There are 2 problems remaining.

1) The way things are logged can be confusing.
For instance: http://dev.tatoeba.org/eng/sentences/show/613428

Question: is it a problem for anyone if things are logged this way?
We could try and spend more time to make the logs more user friendly, or we could just leave it this way.

2) Comments are currently copied (instead of being moved) to the main sentence.
For instance: http://dev.tatoeba.org/eng/sentences/show/3550769.
If you go to the main sentence: http://dev.tatoeba.org/eng/sentences/show/1926402
you will see that there is a copy of the comment.

Question: do you prefer to have the comments on both the main and the duplicate sentence? Or only on the main sentence?

Thanks in advance for your feedback.


[1] http://dev.tatoeba.org/eng/sent.../of_user/Horus
5 days ago
1) The logs look somewhat verbose but I'm sure I can live with it. The benefit of de-duplication greatly exceeds this inconvenience, I think. Besides, the subsequent runs of the script will generate far fewer log entries, I believe.
2) I think it's OK the way it is now.
5 days ago
I'm thrilled that we've gotten this far! Thanks to everyone who has made it happen.

I would definitely be in favor of running the script now and making improvements to the log later, if at all. It's already possible to get the information we need now, and we would all benefit from having those duplicates merged as soon as possible.

If and when we do make changes to the logging, is it possible to report the number of the sentence on which an operation was performed as well as the operation? For instance, instead of:

linked to #123456

could we write this?

#111111 linked to #123456

Optionally, that first number could be suppressed where it's identical to the sentence where the log is being displayed. But if it's displayed all the time, that's fine, too. It doesn't take up much horizontal space.
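
As a rough illustration of the format suggested above, here is a small sketch. The function `format_log_entry` and its parameters are hypothetical and are not taken from the Tatoeba code base; it only shows the "#111111 linked to #123456" form and the optional suppression of the first number.

```python
from typing import Optional


def format_log_entry(
    subject_id: int,
    operation: str,
    target_id: int,
    page_sentence_id: Optional[int] = None,
    always_show_subject: bool = True,
) -> str:
    """Format a log line such as '#111111 linked to #123456'.

    When always_show_subject is False, the leading sentence number is
    dropped if it matches the sentence whose page shows the log.
    """
    action = f"{operation} #{target_id}"
    if not always_show_subject and subject_id == page_sentence_id:
        return action
    return f"#{subject_id} {action}"
```

Calling `format_log_entry(111111, "linked to", 123456)` gives the long form, while passing `page_sentence_id=111111, always_show_subject=False` gives the short form used today.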

Whether comments are copied or not doesn't matter to me, so I would say the current behavior is fine.
5 days ago
> is it possible to report the number of the sentence on which an operation
> was performed as well as the operation?

It's already displayed, just not displayed clearly.

If you look for instance at my logs:

You have:
Sentence #3649129
linked to #3649128

When the logs are on the sentence's page, all the operations only concern the sentence itself, which is why the sentence number is not displayed.

If there is a place outside of the sentence's page where the sentence id is not displayed, let me know.
5 days ago - edited 5 days ago
I think we should take either intelligibility or exactitude. Probably the latter. Current pages are neither intelligible nor exact.

I'm still not happy with sentences like http://dev.tatoeba.org/fre/sentences/show/2144030 (as I wrote earlier in http://tatoeba.org/eng/wall/sho...#message_21045).

I agree with Alan and think that sentence numbers should be displayed. That would solve most problems. Besides, every addition of a sentence should be logged.

Eckhardtgabriel - Aug 27th 2010, 19:49
added #483087
Why you don't take off your coat?

CK - Jan 14th 2013, 15:57
unlinked #483087 from #2143848

May 9th 2014, 22:22
[Comment to #483087]
Word order: Why don't you take off our coat?

I know this doesn't look very nice, but each sentence number has its history. You can't just pretend they were the same.
5 days ago
> I agree with Alan and think that sentence numbers should be displayed.
> That would solve most problems.

As I've replied to Alan, the sentence numbers are displayed on all the pages other than the sentence's page. On the sentence's page, all the operations only concern the sentence itself so the number would be repeated everywhere. You won't see something like "#2 linked to #3" on sentence #1.

Regarding the comments, the question is whether it's necessary to fix this, which means the deduplication could be delayed for an extra week, or whether it's fine to have some data that is not completely coherent and have the script run earlier.

As far as I'm concerned, I don't think there is a hurry to run the script so we could take the extra time, or an extra month if needed. But I know that people have been waiting for the deduplication for years, literally, and may just be alright with not having the perfect script.

In order to solve your problem, the simplest solution I can think of would be:

1) We don't copy the logs of the duplicates, we leave them there. Once the deduplication is done, we add a comment on the main sentence to indicate which duplicates were found.
For instance. "This sentence had duplicates: #123, #456, #789. The duplicates have been deleted."

2) We do copy the comments of the duplicates, but we add a short message. "Comment copied from #123 because of duplicate merge".
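
The two points above could be sketched roughly as follows. This is not the real deduplication script: the `Sentence` and `Comment` classes and the `merge_duplicates` function are hypothetical, and only the two message texts come from the proposal itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class Sentence:
    id: int
    comments: List[Comment] = field(default_factory=list)
    logs: List[str] = field(default_factory=list)


def merge_duplicates(main: Sentence, duplicates: List[Sentence],
                     bot_name: str = "Horus") -> Sentence:
    """Merge duplicates into main following points 1) and 2)."""
    for dup in duplicates:
        # Point 2: copy each comment and mark where it came from.
        for comment in dup.comments:
            note = f"Comment copied from #{dup.id} because of duplicate merge"
            main.comments.append(
                Comment(comment.author, f"{comment.text}\n[{note}]")
            )
        # Point 1: the logs of the duplicate are deliberately left alone.

    # Point 1: a single summary comment on the main sentence.
    ids = ", ".join(f"#{dup.id}" for dup in duplicates)
    main.comments.append(
        Comment(
            bot_name,
            f"This sentence had duplicates: {ids}. "
            "The duplicates have been deleted.",
        )
    )
    return main
```

Note that the sketch never touches `main.logs` or `dup.logs`, so each sentence number keeps its own history.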
5 days ago
I basically agree, but is Horus going to speak only English?
5 days ago
For the moment yes. There are definitely solutions to translate Horus' comments but I see nothing easy enough that would be worth investing the time and delaying the deduplication.
4 days ago
Considering that it will be tedious to change the text of the messages added by Horus after the deduplication, I'd like to have your opinion on what should be written.
Keep in mind that the text will be only in English, it won't be translated like the rest of the interface. So we should try to make it simple enough for people who do not speak English.

I'll post some suggestions below. Feel free to suggest something else.

1) The message on the deleted duplicate.

- http://dev.tatoeba.org/eng/sent...comment-487064
- http://dev.tatoeba.org/eng/sent...comment-487062
- http://dev.tatoeba.org/eng/sent...comment-487063
- http://dev.tatoeba.org/eng/sent...comment-487060

2) The message on the remaining sentence

- http://dev.tatoeba.org/eng/sent...comment-487067

3) The extra info in the comments that were copied.

- http://dev.tatoeba.org/eng/sent...comment-434496
- http://dev.tatoeba.org/eng/sent...comment-469692
- http://dev.tatoeba.org/eng/sent...comment-485258
In the example you provide, you say you performed 4 deletions, but I can only see traces of 3 duplicates in the log... Where does the 4th one come from?
The last duplicate does not appear in the logs of the remaining sentence because it was not linked to that sentence, unlike the 3 other duplicates.
22 hours ago
Only the duplicates that were linked are logged? But then how do we know which duplicates were merged?
3 days ago
We are currently experiencing a problem that makes it impossible to unlink sentences. We will let you know as soon as the problem is fixed.
Has there been any progress yet?
No, not yet.
It's fixed now.
I know we can "Browse by language" then:
Sentences in: Lojban
Show translations in: English

However this leaves Lojban sentences that do not have English translations.
I want to see a list of Lojban sentences that have English translations,
and filter out the sentences that do not. Is this possible?
Sorry if I ask too many questions, but I am new on the site so I am not familiar with many procedures. How is it possible to add a new language to the Tatoeba lists? For instance, there are many sentences catalogued as undefined. I believe many of these belong to Lingala, so how can this language be added to the website?
You should read the FAQs on the Wiki
There is one for adding a new language here http://en.wiki.tatoeba.org/arti...nguage-request
Hello guys,

I was told that Tatoeba now uses Transifex to translate the website. I have already signed up but I am still not able to translate into Spanish. Should I wait to be granted the rights?

I am trying specifically to correct this string because in Spanish it should say Romanche and not Romaní as shown here. Can somebody help?

2 days ago
Put a share button somewhere on the page to invite other people to Tatoeba
2 days ago
Hello Gillux,

I found out there are still things to translate, but I am not allowed
to access this link,

2 days ago - edited 2 days ago
Launchpad isn't used anymore to translate Tatoeba's interface. Now we use Transifex: https://www.transifex.com/proje...toeba_website/
