gillux's messages on the Wall (595 total)

gillux, March 28, 2024, 3:47:49 PM UTC

I agree there is room for improvement in search errors.

The reason for this error is that the search engine we use, Manticore, does not allow searches that contain only negations, that is to say "all sentences except the ones that contain some word". (Actually, newer versions of Manticore do allow such searches, but not by default; I assume this is because they may consume a lot of resources.)
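As an illustration of the kind of check that could produce a clearer error message, here is a minimal sketch that detects a negation-only query before it reaches the search engine. It assumes that a term is a negation when it starts with `-` or `!`; the function names and the message text are hypothetical and are not taken from Tatoeba's code.

```python
def has_positive_term(query: str) -> bool:
    """Return True if at least one whitespace-separated term is not a negation."""
    # Assumption for this sketch: a negated term starts with "-" or "!".
    return any(not term.startswith(("-", "!")) for term in query.split())


def check_query(query: str) -> str:
    """Return a user-facing message, or an empty string if the query looks fine."""
    if not query.strip():
        # An empty query is treated here as "no text filter", not as an error.
        return ""
    if not has_positive_term(query):
        # Hypothetical wording for the error message.
        return "Please add at least one word to search for, not only excluded words."
    return ""
```

With this sketch, `check_query("-cat")` returns the message, while `check_query("dog -cat")` returns an empty string.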

gillux, November 1, 2023, 10:21:25 AM UTC

> The audio file name is based on the number of the sentence with which it was submitted. If the algorithm were to instead choose the lower number, the audio file would have to be renamed, which is tricky and could cause problems if something happened to the system at that moment.

Yes, I think that's the reason the deduplication algorithm was designed not to remove sentences with audio. But since the introduction of the "multiple audio per sentence" feature, audio files are no longer named after the sentence number (they are now named after their own, audio-specific id), so in theory we could change the sentence selection algorithm of the deduplication bot. I personally doubt it is worth the effort, but pull requests are always welcome.

gillux, September 2, 2023, 5:13:36 AM UTC

> just wanted to thank you for providing all the sentences and also thanks to the people that keep this website running.

You are welcome. ^^
Thank you for the kind words.

gillux, July 31, 2023, 1:55:19 PM UTC

There is a rule/guideline here that says:

> Do not intentionally add bad or confusing sentences in order to make a point.

https://en.wiki.tatoeba.org/art...how/guidelines

gillux, July 31, 2023, 9:51:36 AM UTC

It is not only a matter of "when", but also "how". You can look at past discussions on the topic, such as https://tatoeba.org/fr/wall/show_message/31919. There is no clear consensus on how to deal with this issue.

gillux, July 30, 2023, 11:19:43 AM UTC

Thank you for your feedback, everybody! I updated the design of the word count search filter on https://dev.tatoeba.org/. I also added some validation rules that should prevent accidentally using conflicting word count filters, such as both "At least 10" and "At most 5".

And while I was at it, I made the advanced search page responsive.

Feedback is still welcome.
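To make the idea of conflicting word count filters concrete, here is a minimal sketch of such a validation rule. It is only an illustration under stated assumptions: the function name, parameter names and error messages are hypothetical, and an empty field is represented as `None`, meaning "no limit".

```python
from typing import List, Optional


def validate_word_count(at_least: Optional[int], at_most: Optional[int]) -> List[str]:
    """Return a list of validation errors; an empty list means the filters are consistent."""
    errors = []
    # A filter left empty (None) means "no limit" and is always acceptable.
    for name, value in (("at least", at_least), ("at most", at_most)):
        if value is not None and value < 0:
            errors.append(f'"{name}" must not be negative')
    # The conflicting case: the lower bound exceeds the upper bound.
    if at_least is not None and at_most is not None and at_least > at_most:
        errors.append('"at least" must not be greater than "at most"')
    return errors
```

For example, `validate_word_count(10, 5)` reports the conflict, while `validate_word_count(2, None)` returns an empty list.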

gillux, July 30, 2023, 5:43:25 AM UTC

Thank you very much for your feedback and wise suggestions, Alan. 👍
CK also reported a word-wrapping problem on GitHub. I think I can combine both of your suggestions and kill two birds with one stone.

About the explanatory text underneath, I am afraid it would get too long and take up too much space. Maybe I could move it into a pop-up info box or something similar. After all, once the user is aware of that, there is no need to constantly display it.

gillux, July 30, 2023, 5:34:11 AM UTC

I am reporting a bug to myself %-) Words are incorrectly counted for sentences that include a question mark, an exclamation mark or another non-word character preceded by a space.

https://dev.tatoeba.org/fr/sent...rd_count_min=2

This is a side effect of allowing searches for question marks, a feature that was implemented 3 years ago: https://github.com/Tatoeba/tatoeba2/pull/2399
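The miscount described above can be reproduced with a naive whitespace split, as in the sketch below. This is not the search engine's actual tokenization: the function name is hypothetical, and the rule "a token is a word only if it contains a letter or a digit" is an assumption chosen for illustration. It also only applies to languages that separate words with spaces.

```python
def count_words(sentence: str) -> int:
    """Count whitespace-separated tokens that contain at least one letter or digit."""
    # Tokens made only of punctuation, such as a standalone "?" or "!",
    # are not counted as words.
    return sum(
        1
        for token in sentence.split()
        if any(ch.isalnum() for ch in token)
    )


naive = len("Ça va ?".split())   # the standalone "?" is counted as a word
fixed = count_words("Ça va ?")   # the standalone "?" is ignored
```

Here `naive` is 3 and `fixed` is 2, which is the difference between the buggy count and the expected one.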

gillux, July 29, 2023, 4:23:31 PM UTC

I am glad you appreciate it!

About changing the wording, I am not sure which wording is better; they look equivalent to me. I am not sure I understand the rest of your message in parentheses.

> As I see now, if I don't fill the 'at most' section, it assumes 0 and shows no sentences

If you don't fill the 'at most' field, Tatoeba assumes there is no upper limit and it shows all sentences, just like now.

> It will be an additional search function, or always be on like everything else?

It will be always on.

gillux, July 29, 2023, 11:21:55 AM UTC

I have been working on adding a word count filter to the search. If you make a search on https://dev.tatoeba.org/, you will be able to restrict results to sentences having at least n words, or at most n words. Keep in mind that dev.tatoeba.org only contains a small, old subset of sentences from tatoeba.org. If you notice anything wrong, please let me know.

The new language Mapuche is also about to be added.

In addition, for developers, I have started to implement a proper API as described here: https://github.com/Tatoeba/tatoeba2/pull/3064

I created a simple client example: https://downloads.tatoeba.org/a...v_example.html

gillux, July 3, 2023, 2:34:50 PM UTC

Issues with notification emails

Lately, some notification emails may not have been delivered properly. This is related to the issue reported by brauchinet about posting comments taking unusually long. This issue should now be fixed, but in the meantime some of you may not have been properly emailed about things going on, possibly new comments, new private messages, @ notifications and wall post replies.

Incident timeline:
- started on June 28th 2023 05h UTC
- solved on July 3rd 2023 13h31 UTC

Note that the incident did not affect all notifications in that time span, only some of them. The server logs show that 38 sentence owners potentially missed one or more comments. I will try to address these by sending them a private message. As for other kinds of notifications, including notifications to members who also commented on the sentence, private messages, wall post replies and @ notifications, please double check, as I cannot really tell who missed what.

gillux, June 15, 2023, 6:25:15 AM UTC, edited June 15, 2023, 6:31:54 AM UTC

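Welcome to Tatoeba. I am sorry you feel that there is no way to use Taiwanese Chinese here. You can add example sentences in Taiwanese Chinese under the language "Mandarin Chinese". Moreover, whether you type simplified or traditional characters, Tatoeba automatically converts what you wrote into the other script, so that everyone can read and find your Chinese sentences.

As for vocabulary such as "computer", for which several different words exist, it is fine to use the Taiwanese one. In fact, you should write example sentences in the Chinese that is most authentic and natural to you. That is what helps people who are learning Chinese. There is no separate language for Taiwanese Chinese because, although Chinese has some different expressions in different places, most of it is the same, so it is more practical to keep them together.

I hope you understand what I said, and I also hope you can add more example sentences in Taiwanese Chinese.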

gillux, May 13, 2023, 6:11:26 AM UTC

Sadly it's not possible to do this at the moment. It has been mentioned several times in the past. The issue is being tracked here: https://github.com/Tatoeba/tatoeba2/issues/1576
Code or design contributions are welcome!

gillux, April 17, 2023, 4:59:39 PM UTC

It is possible to download some sentences as a file that you can later print out. What are the 7683 sentences you need?

gillux, February 26, 2023, 11:46:08 AM UTC

Yeah, Manticore allows that, and apparently the query parser can also guess not to interpret the hyphen as a metacharacter in certain contexts.
https://manual.manticoresearch....on#blend_chars

My concerns are about usability rather than feasibility.

gillux, February 26, 2023, 10:24:13 AM UTC, edited February 27, 2023, 6:46:40 AM UTC

As Alan explained, it is not possible to find words specifically containing hyphens. However, there is a way to tune the search engine to allow that. Anybody is welcome to open an issue on GitHub to ask for that.

We've done such tuning in the past to allow searching for question marks, because a lot of users were confused by what happens when you search for a question: https://github.com/Tatoeba/tatoeba2/pull/2399. However, because the hyphen is also a metacharacter that means "exclude sentences containing that word", we'll have to carefully check how such tuning affects the use of the hyphen as a metacharacter.
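The ambiguity between the two roles of the hyphen can be illustrated with a toy query parser. This is a sketch of one possible disambiguation rule (a hyphen at the start of a term excludes, a hyphen inside a term is part of the word); it is an assumption made for illustration and does not describe how Manticore actually parses queries.

```python
from typing import List, Tuple


def parse_query(query: str) -> Tuple[List[str], List[str]]:
    """Split a query into (included terms, excluded terms)."""
    include, exclude = [], []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            # Leading hyphen: metacharacter meaning "exclude this word".
            exclude.append(token[1:])
        elif token != "-":
            # A hyphen inside the token stays part of the word.
            include.append(token)
    return include, exclude
```

Under this rule, `parse_query("well-known -cat")` keeps `well-known` as a single included term and treats `cat` as excluded.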

gillux, January 21, 2023, 5:14:42 AM UTC

It feels sad, but thank you for taking the time to write about these issues. Call me a "technical solutionist", but I believe there are technical ways to at least mitigate the problem, in addition to convincing members to change their behavior. But as you said, these are costly, so they need to be carefully thought through before being implemented.

Just to make sure we are talking about the same thing: Basically, you cannot get useful results when searching because all the results look similar, so you need to scroll over and over in order to find useful sentences.

I saw interesting ideas here:
https://github.com/Tatoeba/tatoeba2/issues/2816

I'd like to suggest the following approaches, in order of estimated cost of development (from cheapest to most expensive):
- add a "number of words" search filter to easily exclude short sentences
- expose per-language (and per-user?) statistics about sentences "diversity", similar to what can be seen in the Github issue
- introduce a new, smarter ranking algorithm that favors sentences based on their uniqueness, length or other criteria, using GDEX as an inspiration (https://www.sketchengine.eu/guide/gdex/). This however brings new political problems of how to decide on the criteria.
- cluster search results so that similar sentences are grouped, only display one sentence of each group, but allow clicking on a group to see all the hidden sentences
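As a rough illustration of the last idea in the list above, here is a minimal sketch of greedy clustering based on word overlap (Jaccard similarity). The similarity measure, the threshold value and the function names are all assumptions chosen for simplicity, not a proposal for the actual implementation.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words that two sets have in common (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_sentences(sentences: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedily group sentences; the first sentence of each group represents it."""
    clusters: List[List[str]] = []
    for sentence in sentences:
        words = set(sentence.lower().split())
        for cluster in clusters:
            representative = set(cluster[0].lower().split())
            if jaccard(words, representative) >= threshold:
                cluster.append(sentence)
                break
        else:
            # No existing group is similar enough: start a new one.
            clusters.append([sentence])
    return clusters


groups = cluster_sentences([
    "Tom likes red apples",
    "Tom likes green apples",
    "It is raining today",
])
```

In this example the two "Tom likes ... apples" sentences end up in the same group, so a results page could display only the first one and reveal the other on click.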

gillux, June 23, 2022, 7:42:56 AM UTC

Awesome! Thank you, Pinari. ♥

@CK I listened to a few sentences that have two audio recordings, and it looks like one voice has a much lower volume than the other. This must not be very practical for learners of Turkish to listen to. I think this could be fixed by applying some audio gain normalization filter. I am talking about this: https://en.wikipedia.org/wiki/Audio_normalization

Audio normalization can be done before uploading the audio, and it’s better to do it before any lossy compression, to avoid losing quality by re-encoding the mp3. In other words, it is best to record → normalize → compress to mp3.

I don’t know if that could fit in your workflow though. Do you directly upload mp3s that you get from audio contributors? What about audio from Common Voice?

Another approach is to add a replay gain value as metadata to an already-compressed audio, so that the player will apply the normalization filter at play-time: https://en.wikipedia.org/wiki/ReplayGain. The downside is that if the player does not support it, it doesn’t work.
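To illustrate the first approach (normalizing before compression), here is a minimal sketch of peak normalization on raw samples. It assumes the audio is already decoded into floating-point samples between -1.0 and 1.0; the function name and the target peak of 0.9 are arbitrary choices for this example. Note that this is peak normalization only; loudness-based normalization works differently and is not shown here.

```python
from typing import List, Sequence


def peak_normalize(samples: Sequence[float], target_peak: float = 0.9) -> List[float]:
    """Scale samples so that the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silence (or an empty clip) cannot be scaled; return it unchanged.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


quiet_voice = [0.0, 0.1, -0.2, 0.05]
louder_voice = peak_normalize(quiet_voice)
```

After this step, two recordings processed with the same `target_peak` have the same peak level, which reduces the volume gap between voices.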

gillux, June 14, 2022, 3:15:10 PM UTC

Thank you for doing this. I’m very happy to listen to all this new audio!

Now we probably need to have a better way to add recordings that were not recorded by Tatoeba contributors. In the meantime I guess you can keep creating new Tatoeba accounts to identify Common Voice users…

gillux, May 29, 2022, 1:11:10 PM UTC

May I ask how you compiled the list of sentence IDs? What’s your purpose?
Just curious about how people use Tatoeba :-)