Wall (5,929 threads)
Before asking a question, make sure to read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.
Delete this word! It is not a sentence.
It will be deleted, thank you.
Please thank me for my Turkish contributions.
Man, are you really 12 years old (based on your profile)?
Yes, I'm 13 years old.
but not other
Thank you for the Turkish translations and example sentences you have added to Tatoeba.
Boracasli, you are a strange cookie. ;=)
Hah! Too funny...
...But only thanks for thousands (1000, 2000) are written in “What’s new”, not for 1500...
I haven't followed all of the discussions around tags but I thought I would throw some of my ideas on how to improve tagging.
1) tag translation
Right now, it seems that the preferred way of doing things is to use English tags as opposed to translated tags. We thus have some tags such as "Canadian French" which is about as awkward as if we had tags named "Anglais britannique" and "Anglais américain". Unilingual tagging will prevent Tatoeba from being truly multilingual as it makes English a required language for tagging purposes.
The best solution would be, as for everything else on Tatoeba, to have tag translations. They could be handled by an interface similar to the sentences and be freely translatable by anyone.
2) special tags
Given their popularity, I think that a few tags could be given special status and be treated separately:
a) proverb, aphorism, saying, quote tags.
Proverb is one of the most popular tags; quote is another.
b) source tags. Create a special tag/input field. When something is identified as a quote, this special field would be used to enter the source. Source tags are some of the most popular tags and right now they're quite a mess. Some use "by-", some don't; some use the translated name of the author, some use the English name. As an option, you could add a second field to identify the translator. This would be useful when quoting, say, a translation of Plato.
c) date tag: Add a special field for sentences identified as a quote to allow the user to input the approximate year that the quoted work was published in. This would be useful, both for users and for administrators who want to keep track of copyright issues. I've seen some users, such as Paul, append the date as a comment and there have been some debates that might have been settled by the use of a date tag.
d) Male and Female tags: very popular
e) Certain administrative tags: Delete required, Check by native speaker required, change required. Very popular tags currently. It would be simpler to have official tags for these.
f) Length of sentence: More of a neat feature, but sentences could be automatically classified by Tatoeba as "short" "medium" "long" or "extra-long".
g) XXX, offensive, PG13 tags: There should be special tags to handle these types of sentences. Tatoeba could then automatically exclude these sentences from searches/front page, so that a new user does not stumble upon them unless he explicitly chooses to turn off the "safe Tatoeba" feature.
h) Dialect/regional language tags: There could be a special field to indicate if a sentence is British English/Marne region French, etc. This is another popular tag.
i) Formality: A mildly popular tag but the subject of some recent discussions (e.g. about barbecues). It would be logical to be able to indicate whether a sentence is Formal, Familiar, etc. Honorable/Humble tags could also be added here or perhaps as a separate field.
j) Archaic, old-fashioned, neologism, modern slang: tags to indicate how modern/archaic a sentence is. Again, a popular tag and it would not overlap with the date feature, as sentences can be old-fashioned yet written recently.
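As a rough illustration of (f) and (g), here is a minimal Python sketch. The character thresholds, the tag names in UNSAFE_TAGS, and both function names are assumptions made up for this example, not anything Tatoeba currently implements.

```python
# Hypothetical sketch: thresholds, tag names, and function names are assumptions.
UNSAFE_TAGS = {"XXX", "offensive", "PG13"}


def classify_length(sentence: str) -> str:
    """Return a length label based on the character count of the sentence."""
    n = len(sentence.strip())
    if n <= 30:
        return "short"
    if n <= 80:
        return "medium"
    if n <= 150:
        return "long"
    return "extra-long"


def visible_sentences(sentences, safe=True):
    """Filter (text, tags) pairs; hide unsafe sentences while safe mode is on."""
    return [
        text
        for text, tags in sentences
        if not (safe and UNSAFE_TAGS & set(tags))
    ]


if __name__ == "__main__":
    data = [("Hello.", ["greeting"]), ("Rude one.", ["offensive"])]
    print(classify_length("Hello."))
    print(visible_sentences(data))              # safe mode on
    print(visible_sentences(data, safe=False))  # safe mode off
```

Character counts are only a crude proxy: a 30-character Japanese sentence usually carries more than a 30-character English one, so real thresholds would probably need to be set per language.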
As for tag translations: for the moment tags are in English, because we have neither the translation system this would require nor the time to implement it. So for now, to avoid duplicate tags that differ only by language, it's better to choose English, as it's the language spoken by the majority of trusted users (the only ones who can tag so far).
Isn't the issue the very opposite?
Most trusted users are English-speaking because Tatoeba is in English?!
It's because Trang and I speak fluent French and English, and more people speak English than French. In any case, Tatoeba has always been in at least both of these languages, and we have always answered in French to people who posted in French.
But I agree this is something we should change, to broaden the base of trusted users / moderators. For this I think I will split the wall into several, maybe one dedicated per language (it's still just an idea), so that people will not feel "oppressed" by the fact that most of the messages on this wall (99%?) are in English.
"more people speak English than French" => That's the most extravagant thing I've ever read on this wall, sysko: 6 billion human beings speak neither French nor English! And obviously, with principles like these, they're not likely to come...
English: Might have as many as 1.8 billion speakers.
French: More than 200 million total both native and second language.
With 6,801,400,000 people in the world, Tatoeba leaves out more than 4,801,400,000 of them.
But we can't forget that we have trusted_users and moderators who can cover in a proficient way users who speak:
Mandarin (1,151 million)
Spanish (more than 500 million)
Russian (277 million)
Hindi (just Standard Hindi, 258 million)
Arabic (246 million)
Portuguese (240 million)
Japanese (132 million)
Dutch (25 million)
Is it enough?
1.8 billion is the number of people in the countries where English is an official language. It in no way represents the number of English speakers in these countries. India, for example, accounts for 1.2 billion of these so-called "English speakers", yet only 2 to 3% of the Indian population actually speak English. Most speak Hindi and Urdu and, more often, one of the other 400 languages of the subcontinent. So most of the time, they are trilingual without even counting English...
So this number is just extravagant.
All serious references give less than a third of this number, i.e. between 450 and 550 million speakers.
You misunderstood me.
Trang and I speak French and English, so unfortunately at first we could only offer the site to French speakers and English speakers. As a result, the first contributors/users could only be French or English speakers. And since the latter are more numerous, there were quickly more English speakers, who attracted other English speakers, and so on.
Of course, if I spoke 50 languages, we would have put Tatoeba in those 50 languages. But as you can see, Trang and I are not sectarian, and it is with great pleasure that we opened up the possibility for users who want to extend the site's accessibility to non-French and non-English speakers to translate the site itself (precisely with those 6 billion people in mind, or simply because it is always nicer to have an interface in one's native language, especially for a site that offers multilingual content).
So I admit I don't see where we have principles that would hinder the adoption of Tatoeba by people who speak neither English nor French. Apart from the fact that, by force of circumstance, Trang and I can only produce the interface in English/French at first, and have to hand the translation work over to the community.
> For this I think I will split the wall into several,
> maybe one dedicated per language (it's still just an
> idea), so that people will not feel "oppressed" by the
> fact that most of the messages on this wall (99%?) are
> in English.
Meh, I think that would just needlessly complicate things. I don't feel 'oppressed' when there's a wall thread all in French or something. And there are more comments not in English than there are in English nowadays.
you mean the interface? it's in 9 other languages^^
Yeah, sure, and so are you !
I wish^^, but we've got active contributors in at least 9 languages and I'm sure they would be happy to answer new users' questions in their respective languages...either in comments or on the wall...I know I would/already do :)
Well, I'm impressed. I agree with pretty much everything you've said - you must have read my mind.
In fact, some of them should not be tags but some other kind of meta-information (tags being just one kind).
> XXX, offensive, PG13 tags
IMHO not only PG13, but PG<anything>.
Also, I believe there should be a way to tag a sentence when adding it (or at least a checkbox ‘This sentence is offensive’); otherwise, sentences will still appear on the main page even if the filter is added, because it takes time (especially with my slow connection ^^) to tag the sentence as a rude one.
Please - set as "Esperanto"
Hi everyone, I've just noticed that a lot of (but not all) sentences labelled as Slovak are in fact not Slovak, and a couple of sentences labelled as Russian are in fact not Russian.
Some "Slovak" examples:
I know they're not Slovak because, as a native speaker of Czech, I am able to understand Slovak. I don't understand these, though. They appear to be in some Southern Slavic language, possibly Serbian/Croatian or perhaps Slovenian.
Some "Russian" examples:
These can't be Russian because, obviously, they're not in the Cyrillic script. Again they seem to be Serbian or Croatian, or perhaps Slovenian.
I wonder, what's going on here? Are these all language-detection errors?
It should be good now :)
There's just the mystery of there being 71 sentences in one and 69 in the other... I was sure I always added them in pairs. I wonder what happened.
Never mind, it's just a flag display error on the main page.
Indeed, I updated it ^^
Thank you for bringing this up on the wall. I've gotten numerous comments on my sentences because of this, and so maybe it's better to just put the explanation here for everyone to see.
Yes, there are numerous sentences under "wrong flags". This is not a mistake. This is intentional. All the sentences you have mentioned are either in Bosnian or Croatian. Because those languages are not yet available on Tatoeba, there is no correct flag to put them under. If you look at the sentence tags, you'll notice that all of the sentences you highlighted are already marked "Bosnian" or "Croatian". They will be fixed and put under the proper flags once these languages become available.
You're right. Some languages aren't listed at Tatoeba yet, despite sentences having been added in them. These are sometimes incorrectly labelled until the language can be added.
These have, however, been tagged as sentences in another language. You should see the tag to the right of the sentence. If you come across a sentence that has the wrong language and isn't tagged, you can comment on that sentence (or, in particular if you find a large batch of these, post them here as you've just done).
The system doesn't say when the tag was added, so there's a chance the tags were added after you posted this here. In that case, thanks for bringing it to the attention of the community.
I see. These sentences are work in progress. Nothing to be alarmed about, then.
Hello everyone, I am a 78-year-old retired pensioner. I became an Esperantist in 1957. I am compiling an Esperanto-Vietnamese dictionary. Besides Esperanto, I speak Vietnamese (my native language), French (very well), and English (fairly well). It would be my pleasure to answer all your questions. Thu (autumn)
We are happy you are here!
(my Esperanto is not good ^^')
[not needed anymore- removed by CK]
We will do this in the new version (yeah, it's my new "how can I make people wait" sentence :p).
To be honest, the code which manages the user profile is a bit crappy, and the time it would take is too long compared to the benefit (I'm talking about the "do not show transliteration" option).
For point 2, the new version will use a totally new architecture, which will make this kind of "only what I need" filtering really easy.
Tatoeba 'Back door' request.
I would like a url like the
ones but where you can use parameters to determine what is shown.
e.g. something like ...
* Show sentence 311760
* Show translations (trans=1)
* Do not show indirect translations (indirect=0)
* Do not show transliteration (or furigana) (translit=0)
I think this would be useful when using this site as a learners reference from third party sites (or programs).
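To make the request concrete, here is a minimal Python sketch of how a third-party program might build such a link. The base URL is a placeholder and not a real Tatoeba endpoint, and the function name is invented; only the parameter names (trans, indirect, translit) come from the list above.

```python
from urllib.parse import urlencode

# Placeholder base URL: an assumption for illustration, not a real endpoint.
BASE_URL = "https://example.org/sentences/show"


def build_sentence_url(sentence_id, trans=True, indirect=False, translit=False):
    """Build a link to one sentence with display options as 0/1 query flags."""
    params = {
        "trans": int(trans),        # show translations
        "indirect": int(indirect),  # show indirect translations
        "translit": int(translit),  # show transliteration (or furigana)
    }
    return f"{BASE_URL}/{sentence_id}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_sentence_url(311760))
```

Using plain 0/1 flags keeps the link easy to write by hand, which matters if people paste these into learning materials.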
That's typically what the API will be for :)
Hi, I'd like to share two of my ideas.
1. Export to csv is a great feature! However, I had a little problem with that csv - when I opened it in MS Excel, it was not recognized as a Unicode file, so non-Latin letters were displayed incorrectly.
After I opened and re-saved it in Notepad, Excel correctly recognized the encoding of the file.
I compared the original CSV and the re-saved one, and I noticed that the latter contained an extra three bytes at the beginning with hex codes 0xEF, 0xBB, 0xBF. I'm not very skilled in the technical stuff, but I believe this is some attribute that indicates the encoding of the file.
If you could add these three bytes by default that would be great! If not, it's also ok with me - notepad is always handy.
2. Could you please implement the notification for new personal messages? We've already discussed this with Demetrius. He told me that this hasn't been implemented because of a potential spam threat, but maybe you could make this checkbox unchecked by default, so that only those who are fully aware of what they are doing will check it?
It's really convenient to know immediately that you have a new message.
That's all for now :)
Many thanks to the Tatoeba team for such a great project!
> three bytes at the beginning with hex codes 0xEF, 0xBB, 0xBF.
This is all about the BOM. Microsoft programs love the BOM, but many Linux, Unix and Mac OS programs hate the BOM.
Technically the BOM is not required for UTF-8 (which Tatoeba uses) but it is still valid.
There are a bunch of free utilities around on the web to add/remove BOMs.
(P.S. BOM = Byte Order Mark).
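For anyone who would rather script it than re-save in Notepad, here is a minimal Python sketch that prepends those three bytes to a file. The function names and file names are made up for illustration; `codecs.BOM_UTF8` is the standard-library constant for the bytes 0xEF 0xBB 0xBF.

```python
import codecs


def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM (EF BB BF) unless the data already starts with it."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data


def add_bom_to_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, making sure the copy starts with a BOM."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(add_utf8_bom(data))


if __name__ == "__main__":
    # Hypothetical file names; point these at the exported CSV.
    add_bom_to_file("sentences.csv", "sentences_with_bom.csv")
```

Going the other way is just as easy: reading a file in Python with encoding="utf-8-sig" drops the BOM if one is present.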
> when I opened it in MS Excel, it was not recognized as a
> Unicode file
This is a known Excel bug (= Excel sucks ;-). Here is how to import a file in Excel 2007 (and later).
1. Create a new, blank Excel workbook.
2. Click 'Data'
3. Click 'From text'
4. Select file and follow instructions.
I've just tried it - and it works, thanks for another way around.
But for most of the users (including me) it's way faster and easier to click "open" after the file is downloaded, than opening Excel and opening it through the Import feature. With these three bytes that I mentioned Excel opens the file fine.
However, as I've already mentioned, I'm not a technician, maybe they work only for Windows or maybe there are other potential problems.
This site can be so useful for new learners, but it still has some gaps. I really need a site like this, and I wish its founder would make it perfect, like its name, Tatoeba. Thanks a lot, all the best.
Just about everything on this site is open source and available for use. You can take a copy for yourself and do what you want with it (within the terms of the license).
But the best way to make this site perfect is to take part in improving it. For example, by helping the Tatoeba interface support more languages.