Wall (6,959 threads)
I have been working on adding a word count filter to the search. If you make a search on https://dev.tatoeba.org/, you will be able to restrict results to sentences having at least n words, or at most n words. Keep in mind that dev.tatoeba.org only contains a small, old subset of sentences from tatoeba.org. If you notice anything wrong, please let me know.
The new language Mapuche is also about to be added.
In addition, for developers, I have started to implement a proper API as described here: https://github.com/Tatoeba/tatoeba2/pull/3064
I created a simple client example: https://downloads.tatoeba.org/a...v_example.html
To try out the word count filter on the dev site see the "advanced search"
https://dev.tatoeba.org/en/sent...dvanced_search
I really like this idea! I would use it all the time.
As I see now, if I don't fill the 'at most' section, it assumes 0 and shows no sentences. The same happens if I fill the 'at least' section with 10 and the 'at most' section with 9.
(What about changing the wording to "Length between", removing the 'at most' and 'at least' explanations, and letting either box serve as the higher or the lower number? Yes, it would need a case check, but one would be needed anyway, either to check which number is larger or to restrict the input.)
It will be an additional search function, or always be on like everything else?
I am glad you appreciate it!
About changing the wording, I am not sure which wording is better; they look equivalent to me. I am not sure I understand the rest of your message in parentheses.
> As I see now, if I don't fill the 'at most' section, it assumes 0 and shows no sentences
If you don't fill the 'at most' field, Tatoeba assumes there is no upper limit and it shows all sentences, just like now.
> It will be an additional search function, or always be on like everything else?
It will always be on.
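For readers who want to see the described behavior spelled out, here is a minimal sketch of a word count filter in which an empty field means "no limit". The function name and the whitespace-based word count are assumptions made for illustration only; this is not Tatoeba's actual implementation.

```python
from typing import Optional


def within_word_count(count: int,
                      at_least: Optional[int] = None,
                      at_most: Optional[int] = None) -> bool:
    """Return True if count satisfies both bounds; None means the field was left empty."""
    if at_least is not None and count < at_least:
        return False
    if at_most is not None and count > at_most:
        return False
    return True


sentences = ["Hello.", "How are you today?", "This one has exactly six words."]

# Naive whitespace word count, used here only to keep the example short.
long_enough = [s for s in sentences
               if within_word_count(len(s.split()), at_least=4)]
```

With only "at least 4" filled in, the one-word sentence is dropped and the other two are kept, since no upper limit applies.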
"If you don't fill the 'at most' field, Tatoeba assumes there is no upper limit and it shows all sentences, just like now."
Oh, right, maybe I clicked on it.
Then it's working well.
The previous API is working well, and is being trialled to import Tatoeba content into Lingopolo as can be seen in the recent additions here https://lingopolo.org/japanese/recent-additions
The next steps will be to start mass import of Tatoeba content using the API.
gillux, I like it a lot. It's definitely a big improvement over people depending on manually applied tags to choose the length of sentences.
I suggest using the word "used" instead of "assumed":
For languages without word boundaries, the number of characters is assumed instead.
->
For languages without word boundaries, the number of characters is used instead.
Another approach would be to treat languages with and without word boundaries on the same footing, rather than to treat the languages without word boundaries as a special case. Then you could eliminate the phrase "word(s)" after each box, and change the text below to say this:
For languages with word boundaries, the number of words is used. For other languages, the number of characters is used.
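As a rough illustration of treating both kinds of languages on the same footing, the sketch below picks the length measure from the language code. The set of codes and the function name are hypothetical, and the real filter may count differently (for example, it may handle punctuation in its own way).

```python
# Hypothetical set of language codes written without spaces between words.
NO_WORD_BOUNDARY_LANGS = {"jpn", "cmn", "tha"}


def sentence_length(text: str, lang: str) -> int:
    """Words for languages with word boundaries, characters for the others."""
    if lang in NO_WORD_BOUNDARY_LANGS:
        # Count every non-whitespace character (punctuation included).
        return sum(1 for ch in text if not ch.isspace())
    # Naive whitespace split for everything else.
    return len(text.split())


english_length = sentence_length("I like apples.", "eng")
japanese_length = sentence_length("猫が好き", "jpn")
```

Here the English sentence is measured as 3 words and the Japanese one as 4 characters.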
Thank you very much for your feedback and wise suggestions, Alan. 👍
CK also reported a word-wrapping problem on GitHub. I think I can combine both of your suggestions and kill two birds with one stone.
About the explanatory text underneath, I am afraid it would get too long and take up too much space. Maybe I could move it into a pop-up info box or something similar. After all, once the user is aware of it, there is no need to display it constantly.
It looks great now. Thanks.
I am reporting a bug to myself %-) Words are incorrectly counted for sentences including a question mark prefixed with a space, an exclamation mark or other non-word character.
https://dev.tatoeba.org/fr/sent...rd_count_min=2
This is a side effect of allowing searching question marks, a feature that was implemented 3 years ago https://github.com/Tatoeba/tatoeba2/pull/2399
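To make this class of bug concrete, here is a small sketch. It assumes the extra count comes from a detached punctuation mark being treated as a token of its own; how the search engine behind Tatoeba actually tokenizes is not shown in this thread, so both functions are illustrative only.

```python
def naive_word_count(text: str) -> int:
    """Count whitespace-separated tokens, so a detached "?" counts as a word."""
    return len(text.split())


def word_count(text: str) -> int:
    """Count only tokens that contain at least one letter or digit."""
    return sum(1 for token in text.split()
               if any(ch.isalnum() for ch in token))


french = "Ça va ?"  # French typography puts a space before the question mark
naive = naive_word_count(french)
fixed = word_count(french)
```

The naive count gives 3 for this two-word sentence, while the stricter count gives 2.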
Thank you for your feedback everybody! I updated the design of the word count search filter on https://dev.tatoeba.org/. I also added some validation rules that should prevent accidentally using conflicting word count filters such as both "At least 10" and "At most 5".
And while I was at it I made the advanced search page responsive.
Feedback still welcome.
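A validation rule of the kind mentioned above could look roughly like this. It is only a sketch: the function name and the error message are made up for illustration, and the rules actually used on the dev site may differ.

```python
from typing import List, Optional


def validate_word_count_filter(at_least: Optional[int],
                               at_most: Optional[int]) -> List[str]:
    """Return a list of error messages; an empty list means the filter is usable."""
    errors = []
    # Only a conflict when both fields are filled in and the range is empty.
    if at_least is not None and at_most is not None and at_least > at_most:
        errors.append("'At least' must not be greater than 'At most'.")
    return errors


conflicting = validate_word_count_filter(10, 5)
open_ended = validate_word_count_filter(10, None)
```

The "At least 10" plus "At most 5" combination is rejected, while leaving one field empty is accepted.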
"Thank you for your feedback everybody! I updated the design of the word count search filter."
These two sentences have more than 15 words. There are 1,773,638 English sentences as I'm typing these lines. Only about 24% of them (418,818) have 10 words or more. Only 5% of them (88,273) have 15 words or more. That is alarming! 🙂
That being said, thank you so much for setting up this new feature, Gillux. I love it.
I've been trying it out today too, and it's very handy.
Thank you, gillux!
Good day, Tatoeba community,
I recently created sentence no. 11984754, but then found that an identical sentence already exists under number 2808984.
Isn't there an automatic merge in a case like this? Should I delete my sentence again?
Thank you very much for your answer.
Theo
Take another look. 😉
Aha, there is a "Sentences deduplication bot." :-)
Good morning,
as a newcomer, I have the following question.
When I add a translation, it is shown as a first-degree (direct) translation of the source sentence I translated. For the other languages, my translation is shown as second-degree. Should I then also enter my sentence directly as a translation for the other languages I know, so that it becomes first-degree there as well? Or is that rather unimportant?
Best regards,
Theo
Hello Theo,
in principle, it is of course good to have as many first-degree translations as possible. So if you feel like it, enter the same (or a modified) sentence as a translation for another language. If the sentences are identical, the system merges them; that is, your sentence then exists only once and has several first-degree translations instead.
If you are an "advanced contributor" (you have to apply for this), it is easier. There is then a function that lets you simply "link" two sentences.
Thank you :-)
🍎 Tatoeba.org Native Speakers with Native Language Sentences
http://tatoeba.ueuo.com/stats-2023-07-29.html
Find native speakers of languages you are studying and get links to their native language sentences.
Updated: 2023-07-29
I also created a cut-down version that may work better on devices that can't handle the full version.
I cut out all the lines for contributors with fewer than 50 native-speaker contributions.
http://tatoeba.ueuo.com/stats-2023-07-29cut.html
🍎 Stats - 2023-07-29 - Native Speakers with Recent Contributions
http://tatoeba.ueuo.com/stats-2...-29recent.html
The default sort is "Language", "Number of Sentences", "Since 2023-05-20", "Username."
291 members had more than 1 native language contribution between 2023-05-20 and 2023-07-29.
I created this for those who are interested in finding native speakers who have recently been contributing.
This page was inspired by this Github request.
https://github.com/Tatoeba/tatoeba2/issues/3075
Thanks.
Cabo: https://tatoeba.org/user/profile/Cabo
[ENG] Corpus Maintainer Candidate for Hungarian
Cabo has applied. As usual, please feel free to send us a private message to let us know your opinion (click on the link at the end of this message).
[EPO] Kandidato por iĝi bontenanto de la hungara frazaro
Kandidatas Cabo. Kiel ĉiam ne hezitu sendi privatan mesaĝon por sciigi al ni vian opinion (alklaku la ligilon je la fino de la mesaĝo).
[DEU] Korpuspflegerkandidat für das Ungarische
Cabo hat sich beworben. Wie immer ist jeder eingeladen, sich hierzu in einer Privatnachricht zu äußern (auf die Verknüpfung am Ende dieser Nachricht klicken).
https://tatoeba.org/private_mes...rsichbaeumchen
This time it didn't happen.
This is an outcome too, though it is probably the first time I have ever seen it.
Thanks to all the active Hungarian members providing support for the decision.
Hi @CK
Aren't the recordings of https://tatoeba.org/en/user/profile/CVjpn29 public domain?
The Common Voice licence is "CC-0 (public domain)" but I see the message "You may not reuse the following audio recordings outside the Tatoeba project because CVjpn29 has not chosen any license for them.", e.g. https://tatoeba.org/en/sentences/show/10930328 Audio says "License: No license for offsite use"
There isn’t an option yet to set the license of audio contributions to CC0. You should probably go with what’s written on the profile page. Then again, it would be trivial to add support for CC0, so maybe there’s a reason why it hasn’t been done yet...
ok, thanks!
** August 2023 Updates **
I've just updated a few things that I built for Tatoeba:
- Tatominer https://tatominer.netlify.app
- Tatolead https://tatolead.netlify.app
- Spread by Tatoebans ✨ https://tatoeba.org/en/sentences_lists/show/170280
- Rated as 'not OK' 🔴 https://tatoeba.org/en/sentences_lists/show/170380
- Rated as 'unsure' 🟠 https://tatoeba.org/en/sentences_lists/show/170383
- Pruned English ✂️ https://tatoeba.org/en/sentences_lists/show/171182
- JMdict - Japanese 🇯🇵 https://tatoeba.org/en/sentences_lists/show/171073
- JMdict - English 🇬🇧 https://tatoeba.org/en/sentences_lists/show/171072
More information about these tools at my profile page: https://tatoeba.org/en/user/profile/lbdx
Thanks for generating those lists, lbdx. I've been using them for a while (especially Pruned English), and I've been finding them very useful. 😊
Query:
https://tatoeba.org/en/sentence...+one%22&to=tur
One result:
eng
How did you know that was my favorite one?
tur
Onun en sevdiğim film olduğunu nasıl bildin?
Once again, I see a translation that, due to linking or what not, connected two sentences that aren't accurate translations of each other. The English sentence has "my favorite one", but the Turkish translation has "my favorite film" (en sevdiğim film)!
Honestly this feature/bug sucks.
I think you should leave a comment like “@unlink #5849169 #5874169” on both sentences. Advanced contributors and above can easily unlink sentences.
OK, but when I raised a similar issue before, defenders of the "feature" thought it was fine to leave the sentences as they were:
https://tatoeba.org/en/wall/sho...3message_38960
The last time you complained, the sentences weren't linked as translations of each other. This time they are. So the situation is different.
This linked-sentences stuff seems to be something that the developers of the site care about. As a regular user, all I care about is that search results return sentences that are consistently accurate translations of each other. I don't want to see "indirect sentences" and whatnot. Those should be suppressed by DEFAULT.
If you allow people to just put names everywhere, I will just put every name that can be used in Hungary here. For that niiice and full representation.
"Avoid using the same words, names, topics, or patterns over and over again. "
my ass
Phirsh says that it's destructive that I can finally represent ALL names in the corpus.
Why are other people allowed to represent their names, but not me?
You are probably Hungaro-phobes. :,D
(you know well it's a parody, right?)
I love Hungary and the Hungarians, and you're very welcome to have as many different names in your sentences as others do. It's only fair. Just please try to be creative.
Zulejka evett.
Zosja evett.
Zorka evett.
Zorinka evett.
Zóra evett.
Zonga evett.
Zomilla evett.
Zoltána evett.
Zolna evett.
Zója evett.
Zoé evett.
Zoárda evett.
Zizi evett.
Zita evett.
Zinajda evett.
Zinaida evett.
Zina evett.
+ many more
This is not a good way to represent all existing names. The interestingness of the sentence as a whole should outweigh the interestingness of the names used in it. We may not always succeed, but it's important that we keep trying.
On the page where you can enter new sentences, it says:
"We like diversity. Unleash your creativity! Avoid using the same words, names, topics, or patterns over and over again."
This is really nothing personal, Cabo. If you catch others who don't seem to be aware of this, please feel free to politely draw their attention to it.
Yeah, we can see the creativity and interestingness in other sentences...
Politely draw their attention? :DDD
Yeah, as if that ever worked.
There is a rule/guideline here that says:
Do not intentionally add bad or confusing sentences in order to make a point.
https://en.wiki.tatoeba.org/art...how/guidelines
You know why I said last time that I was leaving? Because of fucking this.
I would delete all the duplicates with different names. There is no need for them.
And there is no need for me to write duplicates, but now I'm joining them. And I will be the best at writing duplicates.
On Tatoeba, the vandalism of a few slowly outweighs the genuine efforts of the many 😢
On Tatoeba, the vandalism of a few people is slowly outweighing the genuine efforts of the many. (lbdx)
It's a pity that it is (also) a Hungarian doing this, one who considers himself fit to be the Hungarian corpus maintainer.
In life there are written and unwritten rules. This is true of Tatoeba too.
While his servants were sleeping, his enemy came, sowed weeds among the wheat, and went away. ....Let both grow together until the harvest! At harvest time I will tell the reapers: First collect the weeds, tie them in bundles and burn them, but gather the wheat into my barn!” Mt. 13,24-
If you were doing your work as corpus maintainer, I would not have applied.
What's more, jegaevi, who is supposed to be a corpus maintainer as well, is inactive too.
The consultation is open, and Pfirsich will make the decision.
Feel free to write down your concerns: https://tatoeba.org/en/wall/sho...#message_40060
What should I do, as corpus maintainer, with the 'countless' sentences you entered yesterday, such as
Abélia
Abiáta
Abigél
Ada
Adala
Adalberta
Adalbertina
Adalind
Adaora
Adél
Adela
Adéla
Adelaida
Adelgund
Adelgunda
Adelheid
Adélia
Adelin
Adelina
Adelinda
Adeliz
Adeliza
Adema
Adeodáta
Adina
Admira
Adolfina
Adonika
Adóra
Adria
Adriána
Adrianna
Adrienn
Adrienna
Adrina
Áfonya
Áfra
Afrika
Afrodita
Afrodité
Afszana
Agapi
Agáta
Ági
Aglája
Aglent
Agnabella
Agnella
Ágnes
Agnéta
Ágosta
Ágota
Agrippína
Aida
Aina
Ainó
Aira
Aisa
Aisah
Ajándék
Ajla
Ájlá
Ájlin
Ajna
Ajnácska
Ajnó
Ajra
Ajsa
Ájszel
Ajtonka
Akaiéna
Akilina
Alamea
Alaméa
Alana
Alba
Alberta
Albertin
Albertina
Albina
Alda
Áldáska
Aldea
Álea
Aléna
Aleszja evett. (etc.)?
As a first step, I suggest DELETING them.
You noticed what was entered yesterday, but years ago you marked loads of faulty sentences as OK, and just recently I pointed out sentences by the dozen that need correcting (for the second time, because I had already written about them a year earlier). That you did not manage to notice.
IMO, as corpus maintainer you should first and foremost be fixing the @change sentences.
(It is a position that can be declined.)
It really would be good if someone fixed the @change sentences, so that we didn't have to run next door asking for favors.
This is not about whether you can come up with some personal attack on someone else.
How is the fact that someone is not doing their "job" properly (lol, we are volunteers) an answer to the fact that what you are doing amounts, in spirit and in moral terms, to an act of terrorism? Neither you nor anyone else has the right to vandalize just because something is not going the way they would like.
This is what ruins Tatoeba, not the fact that someone did not correct X sentences. It kills the good faith without which a project of this scale cannot be run with volunteers.
You can write down all the negatives and positives here as well: https://tatoeba.org/en/wall/sho...#message_40060
1. Make a list of Hungarian names.
2. Every time you add a sentence that contains a name, use the next name on the list.
This way you will gradually and naturally use all the Hungarian names without adding a huge number of sentences that are duller than usual.
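The two steps above can be sketched as a simple rotation through a name list. The three names are taken from earlier in this thread; the helper name and the English sentence templates are made up for illustration.

```python
# Step 1: a (very short) list of Hungarian names.
names = iter(["Abigél", "Ágnes", "Zita"])


def sentence_with_next_name(template: str) -> str:
    """Step 2: fill the template with the next unused name from the list."""
    return template.format(name=next(names))


first = sentence_with_next_name("{name} forgot her umbrella on the train.")
second = sentence_with_next_name("Why did {name} move to Budapest?")
```

Each new sentence is different in its own right, and the names get used up one by one along the way.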