saeb, June 3, 2010 at 5:39:10 AM UTC

I've got a lot of questions (banning me is always an option :P):

-what's Tatoeba's position on using sentences that are: political, controversial, etc...?
-how about ones that express conspiracy theories?
-can contributors mine sentences from sites like Wikipedia, wikia, etc..?
-have you considered just dumping a chunk of sentences from these sites into tatoeba?
-how about tagging vocab and grammar points in sentence comments and linking them to sites that explain them...would you recommend it? would a specific format be easier to integrate later on?

kellenparker, June 3, 2010 at 7:32:59 AM UTC

you're so banned.

Things are already political in a way. For example: It's illegal in China (where I live) to show the blue Uуɡhuɾ flag used on the site based on it representing a ѕераɾаtіѕt movement. I completely agree that it's the best choice to use it to represent the language, but any time I'm looking at sentences here I'm technically breaking some law here. So if/when they add Τіbеtаn (please add Τіbеtаn), same thing. Use the Τіbеtаn flag? I think they should, but it's still potentially a politically charged decision.

Demetrius, June 3, 2010 at 4:11:46 PM UTC

Сап Татоева ве ваппеd iп Сһiпа sооп?..

blay_paul, June 3, 2010 at 4:16:08 PM UTC

Probably, if we try hard enough.

sysko, June 3, 2010 at 9:59:38 AM UTC

For mining from other websites: you can, as long as the content is licensed under a CC-BY-compatible licence, or you've got the authorization of the original author to do so (for the latter, an email from that person saying "I, XXXX YYYY, authorize the use of my content under a CC-BY licence by Tatoeba etc." would be enough, we're adults, right?)
For Wikipedia, unfortunately, the content is licensed under CC-BY-SA, and the SA (which means you can reuse their content only if you keep it under the CC-BY-SA licence) is incompatible with CC-BY (which authorizes people to make derivative/extended works based on ours under the licence they want).

For grammar points and so on, we will add a tagging system shortly. Linking some of these tags to grammar pages explaining them can be considered, but I'd first prefer to see how we will integrate it etc. It can be a good idea though :)

For the political/controversial sentences, maybe for legal reasons we will need to add something like "people are responsible for what they post", but personally I think we mustn't forbid them, for a simple reason:
if we begin to forbid some, where will the limit be between a "non-controversial" sentence and a "controversial" one? There will always be some category of people who find a sentence controversial/illegal, and then we will have loooooong and looooong discussions about "hey, why is this sentence authorized and this one not?"
Moreover, I don't know about you, but I find these sentences the most interesting, because if they're controversial it's because they're real sentences about a given point of culture, and they have more value for someone than "I'm eating an apple". I didn't join Tatoeba to make it a database of puritan-ready sentences.
Furthermore, Tatoeba is an "example sentences" website, which means it's here to give you all the kinds of sentences you're likely to hear/read/say in a given language. Banning "controversial" ones would truncate this, and it would no longer reflect the truth, only the "politically and morally correct" (I want to be able to find sentences with "fuck" etc.). And "example sentence" means it's there to show you an example of use, not to convince you of the ideology a sentence may contain (otherwise we would have to remove all the Bible sentences, quotes from philosophy books etc.).
I hope people who come to this website, now and in the future, are adult enough to understand that, and I think things will balance themselves out: you will be able to find controversial sentences which disagree with each other, and you will have somewhat of a status quo.
I think our role is only to keep the controversial part confined to the sentences (i.e. to avoid comments like "Jesus sucks").

I will chat with trang and give you the general point of view,
"be conservative in what you send, liberal in what you accept"

(for tibetan, if you find someone able to contribute in it I will be glad to add it :) )

MY (I'd have to discuss a lot longer with Trang and the others to tell you what TATOEBA's one is) position

sysko, June 3, 2010 at 10:00:50 AM UTC

"MY (I'd have to discuss a lot longer with Trang and the others to tell you what TATOEBA's one is) position" => has to be read at the beginning (ohhhh a bug)

saeb, June 3, 2010 at 2:12:41 PM UTC

all of this is really nice but let's push the argument to its limits...now, are you guys ready to accept sentences from sources like bin laden's speeches...

blay_paul, June 3, 2010 at 2:18:02 PM UTC

Do a WWWJDIC search on XXX
http://www.csse.monash.edu.au/~...wwwjdic.cgi?10

I don't see that we should be any more afraid of political controversy than we are of that sort of thing.

I would, however, suggest that dubious content can be rejected if it does not come with accurate translations.

sysko, June 3, 2010 at 2:35:00 PM UTC

In fact, since Trang and I are French and the server is hosted in France, the content at least needs to comply with French law, so the things we cannot have on Tatoeba (as long as we are hosted in France) are:
* Holocaust denial
* incitement to violence and racial hatred
* (maybe other things; I will look into the subject. Anyway, I first need to have a serious discussion with Trang about this; we will make an official blog post about it, I think)

As for bin Laden's speeches, as long as they comply with French law about content, I have nothing special against them (as ever, for the moment it's only MY opinion), and anyway I think they're copyrighted :p (unless someone can provide me with an authorization from bin Laden ;-)
My position on this kind of sentence is
“I disapprove of what you say, but I will defend to the death your right to say it”
They're part of what one may want to know how to say, and as I said, the problem with limits is that you have to say arbitrarily "this can be done", "this can't". And as I said, these are only example sentences; we need a disclaimer for this, and we hope people will browse the sentences with that in mind. The goal of Tatoeba is not to say "all our sentences are THE truth" but only "this is a set of sentences one can say in those languages".

Also, when adding this kind of sentence, having it tagged "controversial", with a comment explaining where it comes from, in which context etc., can help.

Demetrius, June 3, 2010 at 3:23:21 PM UTC

BTW, what about quoting?

E.g. Is «Путин говорил, что террористов нужно „мочить в сортирах“» (Putin said, that it’s necessary to ‘soak’ the terrorists ‘in the john’) a possible sentence? Does it violate ©?

sysko, June 3, 2010 at 3:31:28 PM UTC

It depends on what copyright law says about quoting in the country of the author of the quote. To be honest, it's a legal question, and I don't have much knowledge of this, so we need to look into it.

saeb, June 3, 2010 at 3:24:34 PM UTC

it's all there...anti-semitism, holocaust denial, invitation to violence, and many more goodies that can get you jailed in any country...that's if the CIA didn't get a hold of you first and send you to guantanamo. so sysko...still willing to "defend to the death" :P

sysko, June 3, 2010 at 3:33:17 PM UTC

yep so please avoid sentences that can get me jailed in my country :p (French jails are worse than guantanamo, ohhhhh mr sarkozy I was only joking, noooooooooo saeb help me don't let them catch me nooooooo)

saeb, June 3, 2010 at 5:15:57 PM UTC

he he...Imagine if I added some of those distorted Qur'anic verses in Salman Rushdie's The Satanic Verses...you'll probably be assassinated even before the CIA gets you :P

quoting from wiki: "...others connected with the book have suffered violent attacks. Hitoshi Igarashi..Japanese language translator..was stabbed to death..Ettore Capriolo..Italian..translator..seriously injured in a stabbing..William Nygaard, the publisher in Norway, barely survived..Aziz Nesin..Turkish..translator, was the intended target in the events that led to the Sivas massacre"
source:http://en.wikipedia.org/wiki/Th...es_controversy

Demetrius, June 4, 2010 at 9:59:00 AM UTC

Saeb, would you please be so kind to translate the sentence 398986 into Arabic? ;)))

saeb, June 4, 2010 at 12:04:54 PM UTC

done :)

Demetrius, June 3, 2010 at 2:39:54 PM UTC

Only if they are correct Arabic. ;)

Nec Cæsar suprā grammaticōs.

saeb, June 3, 2010 at 3:15:04 PM UTC

give caesar a break..latin was hard anyway ;)
btw his speeches are not only perfectly correct arabic but are also very eloquent...[I hope I don't get jailed for this :P]

kellenparker, June 3, 2010 at 4:17:20 PM UTC

I knew a guy named Jesús. He WAS kinda a jerk, but this is hardly the forum to air those grievances.

And all the successful terrorist leaders are eloquent. Otherwise no one would listen.

Demetrius, June 3, 2010 at 3:13:01 PM UTC

Yet another question. What about orthography?

What if I were to add Pushkin’s sentences in the way he wrote it, with lots of obscure letters? ;)
«Цвѣтокъ засохшiй, безуханный, забытый въ книгѣ вижу я».
(Modern Russian: «Цветок засохший, безуханный, забытый в книге вижу я».)

We do have inconsistencies in orthographies already: British and American sentences. Also, I mark macra in Latin sentences, while Muiries does not (btw. we have i/j and v/u too :))). Now I abstain from writing Cæsar and pœna, but it’s so tempting to use these wonderful ligatures...

sysko, June 3, 2010 at 3:29:34 PM UTC

It's the same with French, with accented upper-case letters (À, which a lot of people write as A since they don't know/care how to type it on Windows); we also have this for cœur / sœur etc.

Here I'd say that as long as both ways are accepted/considered correct, you can add them as you want. I think it's a problem on my side to make my "delete duplicate" script able to handle this; for the moment the search engine already handles it.
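
A minimal sketch of how a "delete duplicate" script could treat spelling variants such as À/A, cœur/coeur or Cæsar/Caesar as the same sentence. This is not Tatoeba's actual script: the `dedup_key` and `find_duplicates` functions and the ligature table are hypothetical, and stripping every combining mark would be too aggressive for scripts where a diacritic makes a different letter (for example Russian й).

```python
from __future__ import annotations

import unicodedata

# Hypothetical ligature table: these characters have no Unicode
# decomposition, so they are expanded by hand.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}


def dedup_key(sentence: str) -> str:
    """Return a comparison key that ignores case, accents, macrons and ligatures."""
    expanded = "".join(LIGATURES.get(ch, ch) for ch in sentence)
    # NFD splits "À" into "A" plus a combining grave accent.
    decomposed = unicodedata.normalize("NFD", expanded)
    # Drop the combining marks (Unicode category "Mn").
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()


def find_duplicates(sentences: list[str]) -> list[tuple[str, str]]:
    """Pair each sentence with the earlier sentence that shares its key."""
    seen: dict[str, str] = {}
    pairs = []
    for sentence in sentences:
        key = dedup_key(sentence)
        if key in seen:
            pairs.append((seen[key], sentence))
        else:
            seen[key] = sentence
    return pairs
```

Whether such pairs should really be merged, or kept as accepted variants, is a policy question the key alone does not answer.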

Demetrius, June 3, 2010 at 3:31:39 PM UTC

BTW, auto-detection doesn't work with Latin if it has macra.

kellenparker, June 3, 2010 at 4:17:58 PM UTC

If we're worrying about orthography then saeb needs to include harakat :)

saeb, June 3, 2010 at 4:26:56 PM UTC

*debate!!* get over it kellen, the crushing majority of written arabic media (books, news, etc..) DOES NOT use harakaat...I personally find their use characteristic of children's books...haven't had them in my arabic school books since grade 5...they're just there for kids to sound out words...

kellenparker, June 3, 2010 at 4:31:13 PM UTC

yeah i know. im just messing with you.

saeb, June 3, 2010 at 4:35:59 PM UTC

ah man I hope I didn't offend you :P
btw I really want to add them...I know how invaluable they can be to arabic learners :)

kellenparker, June 3, 2010 at 4:39:13 PM UTC

no you didn't. i assumed prefacing it with *debate!!* signified not being serious.

learning is useful, but my motivation is perhaps more selfish: with harakat my transliteration script is better. i'm not cool enough to have it run through tashkeel first, though that would be badass.

saeb, June 3, 2010 at 5:35:15 PM UTC

yes indeed Mr. Parker as sharp as always :P

I think I finally figured out why japanese and chinese sentences look so cool...it's that extra line underneath...If I could just get my hands on one for arabic..it would totally be badass :P

TRANG, June 3, 2010 at 9:06:23 PM UTC

> what's Tatoeba's position on using sentences that are:
> political, controversial, etc...?
> how about ones that express conspiracy theories?

I obviously have the same opinion as sysko on this. But if the question is whether we're ready or not, we're not ^^
Which doesn't prevent you from adding such sentences.

But we will be more ready when we have at least tags and a way to filter out by default the controversial sentences. We can then let users search, browse or export them only if they have checked a little box.
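
A minimal sketch of the opt-in filter described above, assuming sentences carry a set of tags. The `Sentence` class, the `visible_sentences` function and the literal tag name `controversial` are hypothetical and only illustrate the idea; nothing here reflects how Tatoeba stores sentences.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    lang: str
    tags: set = field(default_factory=set)


def visible_sentences(sentences, include_controversial=False):
    """Hide sentences tagged "controversial" unless the user ticked the box."""
    if include_controversial:
        return list(sentences)
    return [s for s in sentences if "controversial" not in s.tags]
```

The same check could sit in front of search, browsing and export, so that the default view stays filtered everywhere.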

Anyway, I'll write a blog post about it, to make this more "official".


> how about tagging vocab and grammar points in sentence
> comments and linking them to sites that explain
> them...would you recommend it? would a specific format be
> easier to integrate later on?

You can link to other websites in the comments, but that may flood the comments a little bit too much.

I'd rather recommend creating lists for each grammar point, and adding the sentence to the corresponding list(s). Then, in the title of the list, you add the URL to the page explaining the grammar point.

I think that's the best thing you could do.

TRANG, June 3, 2010 at 10:17:17 PM UTC

Well actually concerning the grammar thing... There's some hope to have the tags done soon (not this weekend but the next one?). So it's probably better to wait for us to integrate tags :)

saeb, June 4, 2010 at 5:13:03 PM UTC

Don't push yourself to get this implemented. I never wanted to come across as a person who would nag *sighs*. It's just that sometimes I find myself 'forced' to link to grammar websites in order to convince someone to correct their sentence...so I had to ask if doing it in a certain format would make it easier for you to integrate later on...

saeb, June 4, 2010 at 5:25:18 PM UTC

missing the time when I was the only one contributing to arabic :P...I guess we moved on from being in the feudal age.

saeb, June 4, 2010 at 5:23:47 PM UTC

one more question (I'm so getting sacked for this), do you advise against linking in comments? If so..care to explain a bit?

TRANG, June 4, 2010 at 6:20:41 PM UTC

> Don't push yourself to get this implemented. I never
> wanted to come across as a person who would nag
> *sighs*.

No worries, we didn't take your question as nagging or anything :P

We had already started talking about the tags a few weeks ago, and as far as I'm concerned I was expecting to have something by the end of June. But I guess it will be earlier.

It's something we all want to have soon anyway. It has a lot of benefits while not being over-complicated to implement :)


> do you advise against linking in comments?

Do we have any reason to advise against...? I mean, okay, if you're going to link to child porn or things like that, of course we're against.

saeb, June 4, 2010 at 6:38:36 PM UTC

one more thing...can I get an official statement on Arabic contributions to be strictly 'standard arabic'...so if anyone wanted to add colloquial sentences you'll just add it as a new language...

sysko, June 4, 2010 at 7:18:26 PM UTC

Yep, strictly "standard Arabic". In fact we rely on the ISO 639 alpha-3 codes when taking this kind of decision (i.e. "do we need to add it as a separate language in Tatoeba?"), though we're open to it if ISO 639 alpha-3 makes no distinction but someone is able to explain to us why it would need to be separated.
And for Arabic there are pleeeeeeenty of ISO codes, so saeb, we will give you a mission:
it would be nice if you could tell potential Arabic contributors about this, and report to us when a new "local" Arabic is added :) That way we will add the corresponding code to our database ASAP.

saeb, June 4, 2010 at 7:34:10 PM UTC

oh c'mon sysko this is serious :D
'standard arabic' and colloquial arabic...HUGE difference (yes kellen wanna have a debate about it :P)
I could only imagine that it would be confusing to learners if Tatoeba had a mixture of standard and colloquial. besides colloquial arabic has a lot of dialects that sometimes are not mutually intelligible (an arab who doesn't know the dialect won't understand). now do you want to have that all under one arabic language? your choice sysko :P

saeb, June 4, 2010 at 7:35:39 PM UTC

I think I discussed something similar with TRANG when I asked for my native tongue to be added as a separate language on Tatoeba...

saeb, June 4, 2010 at 8:52:22 PM UTC

sorry sysko...now I get what you're saying :)...I've been doing this from the start so it's alright...and my understanding is that all these codes are because IBM (and others) were finding smarter ways to encode the arabic script (and other abjads)...the only sound that's in some dialects but not in the standard is the 'v' 'e' accented sounds AFAIK...but they're included in ara so I'm guessing it should work for any dialect....

TRANG, June 4, 2010 at 8:21:36 PM UTC

Yes we can make it official that it is standard Arabic.

Also I'm not sure I understand your reply to sysko ^^ Perhaps he wasn't clear enough, but what he meant to say was that we understand there are many variants of Arabic. We will add any "variant" of Arabic as a new language (as long as there's an ISO code for it).

Even if there is no ISO code for a dialect, we can consider adding it as a new language (and making up our own code) if it's really justified to do so.

He gave you the task of warning us in case someone is adding sentences that are not in standard Arabic, so that we can create a new language for it, if possible.

(because of course we are fluent in Arabic but we don't have much time to check everyone's sentences, you know :P)

And I just checked the various codes available:
http://www.sil.org/iso639-3/doc...ion.asp?id=ara

It appears that we would have to change the code to "arb", to make it officially standard Arabic.
We originally took "ara" which is grouping all the Arabic languages.

saeb, June 4, 2010 at 8:25:47 PM UTC

ok I don't quite get this: why would we need different encodings? I mean I'm perfectly fine with using my arabic keyboard to type in ANY dialect I know...

sysko, June 4, 2010 at 10:06:05 PM UTC

To be sure everything is clear: the ISO code I'm speaking about is only a convention (a standard, in fact) to give each language/dialect a non-ambiguous AND international AND computer-friendly code,
more or less the same thing that exists for countries (FR / GB / CN etc.).
So we're not talking about UTF-8/ASCII etc., just the way we store the language in the database (the unique 3-letter code for each language).

Hope it's clear now (but I fully understand it's not something obvious when you're not working in the backoffice of tatoeba ^^)

saeb, June 4, 2010 at 8:27:02 PM UTC

this is the funniest misunderstanding ever :D!

TRANG, June 4, 2010 at 8:43:31 PM UTC

:P

saeb, June 4, 2010 at 10:06:24 PM UTC

anyone know where I could find the char sets for all these codes? I need to compare ara and arb...

sysko, June 4, 2010 at 10:14:55 PM UTC

I don't think there is a charset difference, as these codes are purely linguistic :)

ara is a macrolanguage, which means the "set" of all these languages (so not a language by itself), and arb is the "standard" """""dialect""""" of Arabic
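
A minimal sketch of the macrolanguage idea, assuming language codes are kept as plain 3-letter strings. The table is deliberately partial (`ara` groups many more individual languages than the three listed), and the helper names `is_macrolanguage` and `macrolanguage_of` are hypothetical, not part of Tatoeba's code.

```python
# Partial, illustrative table: "ara" has many more member languages.
MACROLANGUAGES = {
    "ara": {"arb", "arz", "ary"},  # Standard, Egyptian, Moroccan Arabic
}


def is_macrolanguage(code):
    """True if the code names a set of languages rather than one language."""
    return code in MACROLANGUAGES


def macrolanguage_of(code):
    """Return the macrolanguage an individual code belongs to, or None."""
    for macro, members in MACROLANGUAGES.items():
        if code in members:
            return macro
    return None
```

Under this model a sentence would be stored under an individual code such as `arb`, while `ara` stays useful for grouping every Arabic variety in a search.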

saeb, June 4, 2010 at 10:30:37 PM UTC

thx :) [god sometimes I'm really stupid]

saeb, June 4, 2010 at 8:06:46 PM UTC

ok here's a more reasonable proposal...Tatoeba recommends adding sentences in 'standard arabic' and requires all colloquial sentences to be tagged with the name of the dialect...then if we have enough sentences in a dialect (or dialect family) add it as a separate language.

kellenparker, June 5, 2010 at 7:45:42 AM UTC

I second that.