2010-02-19 17:49
Finding contributors (to the code): Why don't you guys (& girls) share (i.e. open source) your site's code on say github? First that would make this a truly "open source" project, and secondly people could help add features. Think about it.
Some guys here seem pretty eager to get their features implemented ;)
2010-02-19 20:29
Like sysko said, we are actually open source. The reason why it's not promoted anywhere is because:

1) The code hasn't met my standards of elegance yet... Still too many parts that make me cringe when I look at them.

2) We still don't have a sound methodology and organization in our way of working, and I really don't have time to manage more people ^^'

(we're in PHP though, sorry :P)
2010-02-19 20:48
Good enough. Désolé for the PHP part, but then it's self-inflicted :)
2010-02-21 16:21
Something this thread brings up: the user interface language should be stored in a cookie, not as part of the URL. French is easy enough, but what if I click and I'm new and/or don't read Chinese? I'm stuck and have to go back, close the window, or start over at the main page. A quick and dirty fix would be to change all of the language names to something like "English - English; Francais - French; zhongwen - Chinese; nihongo - Japanese" etc. (Not suggesting they be romanized, I just don't feel like dealing with my IME.) Not trying to be Anglocentric, but most people can decipher language names in English.
2010-02-21 16:59
It is stored in the session.

Moreover, at the top of each page you have a menu to change the interface language ;-) with the name of each language written in its own way (français, English, 中文 ...). When you change it, the page reloads in the new language and the language of your session changes,
and all new pages you open will be displayed in this language.
2010-02-21 17:04
My point is, not everyone knows Chinese/Japanese. In fact the vast majority cannot read a single character. So they arrive at a page, and seeing "中文" at the top is of little or no help, nor is "日本語". It's not a problem for you, not a problem for me, but sit some random people down at the page and tell them to navigate it.
2010-02-21 17:08
ok, agree, will see to make it clearer :)
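A minimal sketch of the labelling idea from this exchange: show each language in its own script and also in English, so that a visitor who cannot read "中文" can still find their way. The language codes, the `INTERFACE_LANGUAGES` table and the function name are assumptions made for illustration, not Tatoeba's actual code.

```python
# Hypothetical table: language code -> (native name, English name).
INTERFACE_LANGUAGES = {
    "eng": ("English", "English"),
    "fra": ("Français", "French"),
    "cmn": ("中文", "Chinese"),
    "jpn": ("日本語", "Japanese"),
}


def menu_label(code):
    """Build a menu label that shows the native name plus the English name."""
    native, english = INTERFACE_LANGUAGES[code]
    if native == english:
        # Avoid the redundant "English - English".
        return native
    return f"{native} - {english}"


def menu_labels():
    """Labels for the whole language menu, in table order."""
    return [menu_label(code) for code in INTERFACE_LANGUAGES]
```

With labels like these the menu keeps the native names and still stays readable for someone who only knows the English names.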
2010-02-19 19:24
In fact the code is already available on a public svn repository (with read-only access and an AGPL licence (I'm a bit of a FOSS fanatic)). I don't think we're against code contributors, so write access can be granted to motivated contributors.

As for why it's not explicitly written anywhere on the website, hmm, I think (Trang may have more relevant reasons than me) it's because the project lacks documentation and is in a rewriting / cleaning phase, so we'd prefer to show nicely reviewed code ^^

But if you want to take a look, I can give you the repo in a private message.
2010-02-19 19:38
Wrt link to repo: depends on the language. I ain't touching PHP :p
2010-02-19 20:23
it's PHP :p
2010-02-19 20:45
Hehe, to be honest the use of PHP is more of a historical choice (I would like Python too :p)
2010-02-19 12:48
Change Request

For the 'wish list' I would like to suggest a couple of features.

1. People who don't own a sentence can click an icon (check a checkbox) when posting a comment to make it an official request to change a sentence.

2. Add an extra line to the "Your Links" section of the profile.

* View all my sentences
* View sentences with change requests
* View my favorite sentences
* Sentences with undetected language

It is too easy for sentence owners to miss comments. For example I suspect fcbond hasn't noticed the comment I made in the following link.
2010-02-19 18:20
Maybe that's a bit too complicated. At least there are several other feature requests in the pipe, that might have more impact.
As you indicated in the 2nd note already though, adding a "news feed" for requests on one's sentences is surely needed.

I would propose to change your request into something with a broadened scope: a "disown request" to disown somebody by taking over their sentence. This could also be used on inactive users. Some time without reaction *zing* you go ahead.
2010-02-19 18:28
I also was considering recommending a way of taking over neglected sentences, but I considered the 'change request' to be a less controversial idea. After all not everybody who's away for a month or two has actually given up.
2010-02-19 19:19
Maybe, more generally, we can make something "à la Twitter": when you want someone to be notified of a comment, you just write @userNickName. We could also add a "feed" section to the profile.

Anyway, in a future release we plan to review the architecture a bit, add some categories to the profile, and make your profile a more central page.

what do you think ?

PS: for disowning, I will check with Trang how she planned to handle it (we talked about that before, but I have a weak memory :( )
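A rough sketch of how the @userNickName idea could work: scan a comment for mentions and collect the nicknames to notify. The allowed nickname characters (letters, digits, underscore) and the function name are assumptions for illustration only.

```python
import re

# Assumption: nicknames are made of letters, digits and underscores.
# The lookbehind skips things like e-mail addresses ("a@b.com").
MENTION_PATTERN = re.compile(r"(?<!\w)@(\w+)")


def extract_mentions(comment_text):
    """Return the distinct nicknames mentioned, in order of first appearance."""
    mentioned = []
    for nickname in MENTION_PATTERN.findall(comment_text):
        if nickname not in mentioned:
            mentioned.append(nickname)
    return mentioned
```

Each returned nickname would then get an entry in that user's "feed" section; how the feed itself is stored is left open here.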
2010-02-19 20:21
The way I envision it is not really "disowning", it's more about finding a better owner if the current owner doesn't do his job (kind of like, if you're a bad parent, your kids are taken away).

But I still don't know what would be the best solution because we don't really have a real need for that yet. I mean, it's not very frequent that you'd want to disown someone from his/her sentences. Which means it's not an urgent feature either.
2010-02-19 20:06
Normally fcbond should have received a notification when you commented on his sentence, but the notification system was broken :'( ...about 100 notifications that should have been sent but weren't, *sigh*.

Anyway, having a link to a page that lists the comments posted on your sentences is something I should have done a long time ago... When I decided to integrate an "ownership/adoption" system in replacement of the moderation system we used to have, it was obvious that users should be able to quickly access comments on their sentences.

But I think that having a checkbox would add unnecessary complexity. People will usually read all the comments on their sentences, so there's no need to filter out specifically those that require a correction.
2010-02-19 18:13
+1 feature request:

Tag sentences for a special form. Say your language has several possible translations for a given sentence, tag these sentences for the given feature. Example: I would add one for informal you (french 'tu') and one for formal you (french 'vous').

A wiki would be nice for listing those feature requests; I think this "Wall" might get a bit too muddled.
2010-02-19 19:28
+1 for the wiki, in fact it's already on the todo list (and, as with the code, we also have a ticket system for developers)

and you're right, it would be great, as someone has already talked about tagging

2010-02-23 01:23
I was kinda thinking a phpBB install or a Google Group would be effective. I'd limit access though.
2010-02-18 16:11
A reminder, even though it's written in the terms of use, which everyone who contributes is supposed to have accepted:

it is FORBIDDEN to add sentences that come from books / dictionaries, for the simple reason that they do not belong to you, and therefore you can't release them under a CC-BY licence.

Thanks for taking care.
2010-02-18 19:07
What about books that are out of copyright?
2010-02-18 19:11
No problem if they're in the public domain, even if sometimes it's hard to say, especially for books that enter the public domain some time after their author's death, as the period changes from one country to another.
But yep, no problem for copyright-free books, for books you've written yourself, or for those whose author gives you the right to use extracts in Tatoeba :)
2010-02-18 22:36
I take it, though, that you can't use "Fair use exemption" ?

I understand that whether fair use is included in copyright law depends on the country in question. (Japan doesn't, yet, but might soon )

Whether fair use applies also depends on a number of other factors (US case )
2010-02-18 22:50
As for "fair use", I won't say an absolute "NO, we can't". For example France, even without the notion of "fair use", allows short quotations from books as long as the quotation is justified by the scientific or pedagogical information it brings to the work it is incorporated in. But we have no lawyer on the team and, to be honest, there is other work we find more important than checking whether we can safely rely on "fair use".

Moreover I think it's better, for a start, to say "no, no quotes from copyrighted books", to avoid quotes from books that cannot be considered "fair use", rather than playing with fire.

But one day, when Tatoeba is nearly feature-complete (i.e. when we no longer have dozens of feature requests to code and bugs to fix ^^), I will try to see whether it's possible or not, and then give you a clear and absolute answer.

So my answer: "for the moment we will consider that we can't".
2010-02-19 08:13
Fair enough.
2010-02-15 06:44
A->B or B->A

There doesn't seem to be an easy way to tell whether a pair of sentences are
A(Japanese) translated to B(English)
B(English) translated to A(Japanese)

I would like to make sure this feature is firmly placed in the wish list for future development.
2010-02-17 22:31
Is there a specific case where you would need this information?

One not too difficult way is to look at the creation date of each sentence (which you can see in the first entry in the logs). If A was created before B, then it must have been A->B.
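The creation-date heuristic described above can be sketched as follows. The function name and the idea of passing the two timestamps directly are assumptions for illustration; the real logs may expose dates differently. When both timestamps are equal, the heuristic cannot decide.

```python
from datetime import datetime


def translation_direction(created_a, created_b):
    """Guess which sentence is the original by comparing creation times."""
    if created_a < created_b:
        return "A->B"  # A existed first, so B was presumably translated from A
    if created_b < created_a:
        return "B->A"
    return "unknown"  # identical timestamps: no way to tell


# Example with made-up dates.
guess = translation_direction(datetime(2010, 2, 15), datetime(2010, 2, 17))
```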
2010-02-18 13:56
> Is there a specific case where you would need this information?

Not as such, but I can explain _why_ I want this information to be recorded / displayed.

One common use of the translated example sentences is to explain what the original sentence means. So it is quite normal to have, for example, obscure English translated (explained) into normal Japanese.

Imagine you have this:
A. Many a mickle makes a muckle. [Proverb]
B. 塵も積もれば山となる。

B is the Japanese equivalent (and 'translation') of A.

Some well meaning person might decide that hardly anybody knows what "Many a mickle makes a muckle." means and 'correct' it into a different English sentence.

A more common example would be this:
A. I'll make you a present of a doll.
B. あなたに人形をお贈りします。

If B. is the original and A. the translation then you could well say that the English in A is a little odd and should be changed. If A is the original and B is the translation then you could say that A demonstrates a somewhat old-fashioned phrasing and B explains it.
2010-02-19 02:52
In those cases it would be desirable to have both phrasings, with uncommon ones marked somehow, so users should be encouraged to add alternate translations rather than 'correcting' existing ones when the translation is not erroneous.

I think this could be handled by attaching more metadata to sentences. Properties like masculine/feminine, proverb, quotation, polite, slang, dated, etc. could be tagged onto a single sentence, so you could have e.g. both a polite and a colloquial translation, and mark them as such.

That would also address the problem where sentences from the Tanaka Corpus are currently marked with tags like [F] on the English sentence, even though the property belongs to the Japanese sentence. That doesn't work that well when the sentences aren't restricted to pairs.
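One possible shape for such per-sentence metadata, as a sketch only: each sentence carries its own set of tags, so a property like "formal" sits on the sentence it describes instead of on its partner in a pair. The class, field and tag names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    lang: str
    tags: set = field(default_factory=set)  # e.g. {"polite"}, {"proverb", "dated"}


def translations_with_tag(translations, tag):
    """Keep only the translations that carry the given tag."""
    return [s for s in translations if tag in s.tags]


# Illustrative data: two French translations of the same English sentence.
informal = Sentence("Comment vas-tu ?", "fra", {"informal"})
formal = Sentence("Comment allez-vous ?", "fra", {"formal"})
formal_only = translations_with_tag([informal, formal], "formal")
```

Because the tags belong to the individual sentence, this still works when a sentence has many translations rather than exactly one partner.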
2010-02-15 10:07
Which languages to add, and which not to add.

Just a short note to say that Wikipedia adopted the policy to only create Wikipedias for languages that have an ISO 639-3 Code. There might be exceptions. I think this decision helped them pretty much ease the process for new languages.
2010-02-16 14:14
Yep, we had a discussion about that before, and we don't plan to add only languages that have an ISO 639 alpha-3 code. Wikipedia has to do this because they have tons of contributors, and an encyclopedia needs much more data than a database of example sentences, so I can understand why they prefer to have a lot of articles rather than a lot of dialects or so.

But for us, I think as soon as a language is different enough not to be totally intelligible with another, we can add it as a separate language (that's the case for Shanghainese, for example: the closest ISO 639 code is for the Wu language, but Wu, of which Shanghainese is a "dialect", is divided into several other "dialects" (even though I don't really like the word "dialect") which are not mutually intelligible with Shanghainese).

And as I think Tatoeba can be used to keep a record of languages, especially endangered ones, I would find it a real pity not to add a small language only because it has no ISO 639-3 code (moreover I've heard an ISO 639-4 code will be released).
2010-02-16 10:14
How do I set the language I translate into?

If I click on "死ね" and try to add a translation, it comes up as German, while I mean it to be English, ...
2010-02-16 13:40
For the moment (but we will change it soon) you have no way to directly specify the language.
But if Tatoeba misdetects the language, you can simply click on the flag next to the offending sentence and set it to the right one. It can be done whenever you want, as long as the sentence belongs to you (like editing).
2010-02-16 13:53
Thanks. I tried clicking on the flag, I didn't realise I had to select the sentence first. I look forward to the new version where I can specify it at once.

BTW -- I seem to have added some rubbish (which I have marked with DELETE ME in the comments). Sorry.
2010-02-16 14:07
OK, no problem for the duplicate sentences, they will be automatically deleted, and an admin will delete the other one.
2010-02-16 14:15
you could have directly changed the language on one of the sentences you've already added :p
2010-02-15 06:47
Changed from ...

Another 'wish list' item. When looking at the log of recently changed items I would like to be able to see what changed items have been changed _FROM_.
2010-02-14 13:48
Procedure when replacing (not correcting) sentences.


If a sentence is pretty much completely replaced, its position in the "Sentence X is translation of Sentence Y" system may need to change.

Sentence A (Japanese) was (allegedly) translated into Sentence B (English) which was in turn translated into Sentence C (German)

The Japanese sentence is noted to be unrelated to the English one and so is completely replaced with a new Japanese sentence.
So _now_
Sentence B (English) is translated into
Sentence C (German) and also translated into
Sentence D (Japanese).

So A -> B -> C
changes to
B -> C
  -> D
2010-02-10 21:02
Here's a nice user-test for the system:

Sentence nº164914 was a Japanese sentence I wanted to edit. I edited its English translation, which went well, but because the site responded too slowly I accidentally added a new Japanese sentence (nº361150) instead of amending nº164914. I can't delete translations, so I just changed that old Japanese one to a Dutch translation of the sentence. Meanwhile, someone else added a German translation too! :)

* What happens to the indices for the former Japanese sentence?
* Should user be able to do this?

As a developer I suspect that changing languages on an existing (Japanese) sentence is bound to cause issues, but as a user it makes perfect sense to solve the issue I ran into. Your thoughts?
2010-02-11 18:51
> What happens to the indices for the former Japanese sentence?

As Paul said, it gets left behind. We don't have (yet) strong mechanisms that would help keep the database consistent.

> Should user be able to do this?

No, they shouldn't. Ideally, there should be guidelines (which I'm hoping to be able to write by the end of the month) to help users understand better how things work and how they can contribute in a way that doesn't give us (developers) more work than we already have ^^'
I wrote down some of the ideas in my comment here :

> Perhaps "nominate for deletion" could be added as explicit functionality?

Yes, in general, we could have various statuses for a sentence. Actually in the previous version of Tatoeba we used to have that, but I haven't re-implemented it. A sentence could be marked as "to delete", "checked" or "locked" (and perhaps other things, I don't remember). When a sentence was checked, it meant you could rely on it for not having mistakes. When it was locked, no one could edit it anymore.

But this is not urgent compared to other things we have to do. My priority at the moment is to make sure that people understand clearly that when they translate, they have to translate from the sentence written in big letters. I'm pretty sure that very often, people are adding translations to a Japanese sentence when they were actually translating from the English sentence.
We also have to enable people to link and unlink sentences. There are many sentences that are linked to each other without being translations of each other, and there are many sentences that could be translations of each other but are not linked to each other.

Once all of that is settled, and people understand that they have to view the corpus as a GRAPH and not a table, it will be less likely that they behave in a way that we don't want them to behave, like what you did. And perhaps "nominate for deletion" will not be *that* useful because instead of deleting, you could just edit your sentence into whatever you want and unlink it from any sentence it was linked to.
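To make the "graph, not a table" view concrete, here is a small sketch of sentences as nodes and translation links as edges, with the link and unlink operations mentioned above. It assumes links are symmetric (undirected); the class and method names are invented for illustration and are not Tatoeba's actual code.

```python
from collections import defaultdict


class SentenceGraph:
    """Sentences are nodes; a link means 'these two are translations of each other'."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, a, b):
        if a == b:
            raise ValueError("a sentence cannot be linked to itself")
        self._links[a].add(b)
        self._links[b].add(a)

    def unlink(self, a, b):
        self._links[a].discard(b)
        self._links[b].discard(a)

    def direct_translations(self, sentence_id):
        return set(self._links.get(sentence_id, set()))

    def indirect_translations(self, sentence_id):
        """Translations of translations that are not linked directly."""
        direct = self.direct_translations(sentence_id)
        indirect = set()
        for neighbour in direct:
            indirect |= self._links.get(neighbour, set())
        return indirect - direct - {sentence_id}


graph = SentenceGraph()
graph.link("A", "B")  # e.g. English <-> Japanese
graph.link("B", "C")  # e.g. Japanese <-> German
```

In this picture, fixing a wrong pairing means editing the sentence and calling `unlink`, rather than deleting anything.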
2010-02-11 19:20
Educating users is desirable of course, but opportunistic contributors will make mistakes. In the case of incidental contributions, not being able to delete an entry that should not have been created, nor nominate it for deletion is likely to frustrate the user. An alternative may be to offer a grace period for sentences you created yourself, being able to delete them within a certain period as long as they are not linked to by other new sentences.

On the topic of linking translations: is it possible to link a sentence to multiple sentences? There are many cases where the translated sentences actually do function as proper translations of each other, as well as of the sentence they are linked to.

Visualizing the graph is challenging within the confines of HTML/CSS, good luck there. Further indenting of the non-direct translations might help.
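The grace-period idea from this post could be checked roughly like this. The 24-hour window, the function name and its parameters are assumptions for illustration; the post does not specify a period.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(hours=24)  # hypothetical value, not from the thread


def can_delete(owner, requester, created_at, linked_by_newer_sentences, now):
    """Allow deletion only by the creator, within the grace period,
    and only if no newer sentence has been linked to this one."""
    return (
        requester == owner
        and not linked_by_newer_sentences
        and now - created_at <= GRACE_PERIOD
    )


created = datetime(2010, 2, 10, 21, 0)
allowed = can_delete("alice", "alice", created, False, created + timedelta(hours=1))
```

Once someone else has translated the sentence, deletion is refused, which covers the case of the German translation added in the meantime.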
2010-02-10 22:36
> What happens to the indices for the former Japanese sentence?

It gets left behind. I manually copied it to the new sentence in this case, but Trang will need to clear up things. I'm not sure if I can delete index entries.

> Should user be able to do this?

It's probably a bad idea.

Suppose you have

A(English) translates to B(Japanese) translates to C(German)

If you change the Japanese to a new, Dutch, sentence you get

A(English) translates to B(Dutch) (doesn't really) translates to C(German)

Because C is really the translation of the (vanished) Japanese sentence not the (new) Dutch sentence.

I think it would have been best to have left the Japanese duplicate and added a "Please delete me" comment.
2010-02-11 10:00
> I think it would have been best to have left the Japanese duplicate and added a "Please delete me" comment.

Agreed. (Since Tatoeba is in beta, I try to actively break things by using it from a novice user's perspective.)

Perhaps "nominate for deletion" could be added as explicit functionality? A way for users to flag a sentence as undesirable (with optional comment). In time the comment system will become hard to monitor for "delete me" type messages.