Wall (7,273 threads)
I've read the latest Tatoeba Blog, but I'm going to comment here because I think more people read it. :-)
*******
I'm not 100% convinced by the 'adoption' approach, but I think it could work with a few adjustments. Here's one idea:
I think people (not the owners) should be able to issue an official 'call for action'. That call would only be able to be closed by the person who made it, or by a 'super user' (e.g. Trang, Sysko, etc.)
How it could work:
* User sees a sentence that is linked to a sentence that it is not a good translation of. User cannot unlink it because both sentences are owned.
* User posts a comment with the 'Call for action' checkbox ticked. "Please unlink this sentence from sentence 123XXX00 as it is not a good translation."
* Owner of sentence is notified. Owner of sentence has a link to a list of all currently open 'calls for action' on his sentences. The super users (Trang, Sysko) also have a similar link that works for all users.
* If, after one week, the call for action is _not_ closed, the ownership of the sentence is revoked and the person who posted the request is notified.
* The person who posted the request can close it at any time, when he is satisfied with the explanation given by the owner or action taken.
* Super-users (Trang, Sysko, etc.) can deal with the request themselves, and can also close the action item even if the person who made it is not satisfied (person making request might not come back to Tatoeba, it might have been a trivial or frivolous request).
This would give a formal and more easily trackable way of handling corrections needed to owned sentences, given that the owner may be away or may otherwise lose track of the comments made on his sentences.
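The rules above could be modelled roughly as follows. This is only a sketch of the proposal, not anything that exists in Tatoeba: the class name `CallForAction`, its fields, and the `SUPER_USERS` set are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

GRACE_PERIOD = timedelta(days=7)   # "after one week" in the proposal
SUPER_USERS = {"Trang", "Sysko"}   # assumed set of super users


@dataclass
class CallForAction:
    """Hypothetical record for one 'call for action' on an owned sentence."""
    sentence_id: int
    requester: str
    message: str
    opened_at: datetime
    closed_by: Optional[str] = None

    def is_open(self) -> bool:
        return self.closed_by is None

    def close(self, user: str) -> None:
        # Only the requester or a super user may close the call.
        if user != self.requester and user not in SUPER_USERS:
            raise PermissionError(f"{user} may not close this call")
        self.closed_by = user

    def ownership_should_be_revoked(self, now: datetime) -> bool:
        # True once the call has stayed open for the whole grace period.
        return self.is_open() and now - self.opened_at >= GRACE_PERIOD


call = CallForAction(1, "requester", "Please unlink this sentence.",
                     opened_at=datetime(2010, 3, 1))
print(call.ownership_should_be_revoked(datetime(2010, 3, 8)))  # True
```

Note that the owner cannot close the call himself in this sketch; he can only act on it and wait for the requester or a super user to close it, which is how I read the proposal.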
> I've read the latest Tatoeba Blog, but I'm going to comment here because I think more people read it.
Actually I haven't posted about it here yet because it wasn't really official yet ;) I was still discussing it with sysko, and actually reviewed certain things, but the overall idea remains the same.
For now, we are focusing on correcting the sentences themselves, not the way they are linked. Because there are many sentences in French that could be corrected, and quickly, if only we took the time to check them in an organized way.
The whole linkage problem is of course something we will have to deal with, but it is actually in what would be the "phase 2". Not something we will work on yet... If we can already provide a "sentences.csv" that is not filled with spelling/grammar mistakes, it will be a good step forward.
When will the Tatoeba interface next be updated from the Launchpad translation data? I've noticed that some of the more obvious mistakes are still around although I corrected them in Launchpad some time ago.
Next time should be tomorrow, when we update Tatoeba for bug fixes and small changes.
But generally speaking, the interface translations can be updated any time... Since it's not entirely automated, it's not regular... You have to remind me to do it ^^'
I see the interface update has gone through. I think it looks a lot better now.
Notes on Tatoeba interface translation.
* Launchpad is no longer showing the full path to the source code. It doesn't include the website any more.
* Translation item 15
\controllers\user_controller.php:258
The original English is incorrect. It should be "Error" not "Erreur"
* Translation item 505
The original English has some minor errors.
"We really want to thanks" -> "We really want to thank"
"us a much more complete data files" -> "us much more complete data files"
"wouldn't have a so much complete IPA" -> "wouldn't have an IPA so complete."
Also, possibly "IPA table" rather than just IPA? (Not sure)
[not needed anymore- removed by CK]
So, this is the link that gets passed on from good contributor to good contributor:
http://blog.tatoeba.org/2010/02...n-tatoeba.html
I really need to put it somewhere so that more people will be inclined to read it. Anyway, as Paul explained, you should not change a sentence if it is valid.
But you have to know that only 'trusted users' can link/unlink sentences at the moment. The 'sentence_annotations' page as well can only be accessed by trusted users. This is because these features require a deeper understanding of Tatoeba and are not dummy-safe.
If you are interested in becoming a trusted user, you can just ask me :)
this is the link that gets passed on from one crazy cult member to another:
[tatoeba's bible]
I really need to put it as the site's background image.
But you have to know that only 'elite cult members' can perform rituals at the moment.
If you are interested in becoming one, just ask to be initiated :P
-------------------------------------------
btw, are slangy sentences allowed on tatoeba...u know gaming slang, 1337 speak, gang slang...
PS watch out for the lame battle b/w arabic n portuguese :P
which battle 0:-)?
Who gets more sentences. :)
Oh, Demetrius, I didn't know that you battle saeb ;).
oh, MUIRIEL, let's pretend that you don't keep on adding portuguese until it's one more than arabic xD....I'm smelling a conspiracy...Let me guess, Trang is in on this too right? You guys are so on!
You're lucky I'm full of homework in these weeks :P
But I guess Portuguese and Arabic will both lose to Dorenda.
Well, I still have over 600 sentences to go before I catch up with Portuguese, and even more for Arabic, so if you work hard, there still is a chance for you to stay ahead of Dutch. :P
Why so paranoiac, saeb ;)?
Great work, brauliobezerra =)!
I don’t battle anybody. :) I guess I have no chance of making Russian higher than Dutch as long as Dorenda’s around. :)
Hmm... It seems I can declare war on Esperanto! :)))
> btw, are slangy sentences allowed on tatoeba...u know gaming slang, 1337 speak, gang slang...
They are allowed, but if you do add these kinds of sentences, it would be useful if you put them in a list. Also we prefer to avoid sentences that are "not safe for kids" until we have a way to tag them and filter them out (so that people who may use our content for educational purposes can only go for the more "decent" type of content).
> PS watch out for the lame battle b/w arabic n portuguese :P
You guys make me laugh (in a good way :P) It's actually entertaining to watch x)
PS: I had nothing to do with it. Muiriel decided to pick on you on her own!
*Corrected typo*
This is the official method, AFAIK.
1. Do not change either the Japanese or the English (provided that both Japanese and English are valid sentences).
2. Add a new sentence, as a translation, of either the Japanese or English.*
* For practical reasons it is best to add a new English translation of the Japanese as the Japanese needs index information adding. Also it is preferable to add sentences in your native, not second, language.
3. After adding the new translation, unlink the old translation. You can do this by 'owning' either the sentence being translated (Japanese) or the incorrect old translation (English), refreshing the page, and then clicking the appropriate 'scissor icon'.
4. The 'meaning' field of the Japanese index data will need to be changed to the ID of the new sentence. You can do that from this page
http://tatoeba.org/sentence_annotations/
although it may be easier just to leave a note / PM for me or Jim to do so.
Linkage.
OK, I'm seeing a lot of cases where two sentences should be linked (or unlinked) but both are owned. Could we allow linking/unlinking between sentences even if we aren't the owners? It's slowing things down, especially as you can't count on people staying in Tatoeba.
Yes, but before I give more power to trusted users, I want to display the "latest links" somewhere (in the same way there's a page where you can see the latest sentences added/edited/deleted).
but isn't that too much power? technically trusted users can then 'disappear' a sentence, right?
They don't actually go anywhere, and can still be found by a whole bunch of methods.
Some sentences need to disappear, anyway ;-)
See
http://tatoeba.org/jpn/sentences/show/383895
It's linked to
http://tatoeba.org/jpn/sentences/show/336221
(which should be deleted)
It isn't directly linked to
http://tatoeba.org/jpn/sentences/show/383894
(but it should be)
At the moment I can't link it to the sentence it should be linked to, nor can I unlink it from the sentence it should be unlinked from.
I can see that you need it :), but generally speaking, if it gets implemented for trusted users...well it'll be possible for s.o. to unlink a sentence and cause it to be 'left behind' without the consent of its owner...which is pretty close to deleting a sentence imo.
Well, dude*, we're either trusted or we're not.
* Hopefully correct.
yet there aren't any real criteria for getting trusted or not, mate*.
* Hopefully correct.
[not needed anymore- removed by CK]
[not needed anymore- removed by CK]
Actually both those cases, kimi and boku, are notorious for being very often used by the opposite gender to that expected by textbooks.
Those [M] and [F] refer to the Japanese.
I added them years ago at someone's suggestion as it seemed like a good idea at the time 8-)}. Ideally they should be part of metadata associated with the Japanese. It wouldn't break my heart if they were removed from the English sentences entirely.
[not needed anymore- removed by CK]
I'd forgotten they were originally on the Japanese sentences. All the more reason to remove them from the English ones.
I think moving them back to the Japanese is more than a global replacement. I think it would be better to remove them totally.
> I think moving them back to the Japanese is more than
> a global replacement. I think it would be better to
> remove them totally.
I suggest holding your horses on the second part, at
least until Tatoeba has a meta data handling system up.
JPN INDICES
I've sent* in a UTF-8 file (with BOM, unfortunately) containing the updated index and meaning field information I've been working on. Could Sysko or Trang update Tatoeba from it and post here when it's done?
* To the team@tatoeba.fr address.
Yes, I would like to know too. I have a change or two to get in before Saturday's dump.
I've just read an email from Trang saying it's done. Hope it all went well (fingers metaphorically crossed - would be really crossed but that makes typing difficult).
The new Japanese readings include both the kanji and their reading below the sentence. Is this an interim solution or would people be willing to reconsider it?
My view is that it's a little redundant and it could be better to either leave the kanji out or add the kanji readings as furigana.
I wasn't able to set MeCab up on this machine but I assume it isn't too difficult to format the output to create ruby code.
Speaking of formatting the MeCab output, can one disable the "readings" on punctuation and long vowel marks (see e.g. Sentence nº126252 and nº75484) and parse numbers together (see e.g. Sentence nº115312)?
[not needed anymore- removed by CK]
There's no need to use the semantically incorrect tt tag. There is already ruby character markup available and stylesheets to render them properly in modern browsers that can't handle them by default. For general inline and block level styling there are the span and div tags.
You have a good point that the redundancy in the kanji is useful though maybe not terribly aesthetically pleasing. I suppose this may be the best solution until we have furigana implemented.
[not needed anymore- removed by CK]
I sort of figured that. What I wanted to point out was that the semantically correct equivalent of your proposal would be to use a span inline element:
東京 <span class="reading">[とうきょう]</span>
with the style definition:
.reading { whatever-property:value; }
One could actually drop the brackets:
東京 <span class="reading">とうきょう</span>
and add them in CSS with something like:
.reading:before {content: "["}
.reading:after {content: "]"}
But we can just as well wait for furigana...
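For what it's worth, turning tokenizer output into ruby markup could look something like the sketch below. It assumes the output has already been parsed into (surface, reading) pairs, with None for tokens that need no reading; the function name `to_ruby` is made up for illustration. The rp elements carry the brackets, so a browser without ruby support falls back to something like the current "kanji[reading]" display.

```python
from html import escape


def to_ruby(tokens):
    """Render (surface, reading-or-None) pairs as HTML ruby markup."""
    parts = []
    for surface, reading in tokens:
        if reading is None:
            # No annotation needed (kana, punctuation, ...).
            parts.append(escape(surface))
        else:
            parts.append(
                "<ruby>{}<rp>[</rp><rt>{}</rt><rp>]</rp></ruby>".format(
                    escape(surface), escape(reading)
                )
            )
    return "".join(parts)


print(to_ruby([("東京", "とうきょう"), ("に", None)]))
# <ruby>東京<rp>[</rp><rt>とうきょう</rt><rp>]</rp></ruby>に
```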
I like the new way too (not that I read it often.) MUCH better than the old romanization.
It's temporary. I'm aware that it is a bit redundant.
We actually have a tool that converts into furigana. It works at least on IE8, Firefox 3.5, Chrome 4.1 and Opera 10:
http://tatoeba.org/eng/tools/ro...&type=furigana
So we can display furigana but this is not in our priorities at the moment, so you will probably have to wait a couple of months before we get it done.
Good to hear that it's only temporary. I was aware of the furigana tool. Thanks for your work on this issue.
Any idea how simple it is to fix the readings for the punctuations, long vowel marks and numbers?
For the punctuations, it should not be too difficult.
For the long vowel marks, if you are talking about the romaji, we used to convert it into a hyphen. So we will leave it like it is now.
For the numbers, it depends what kind of fix you want...
Punctuations: Great.
Long vowel marks: Sentence nº75484 shows what I mean. The long vowel mark is repeated in the "reading" like the punctuation: "ー" becomes "ー[ー]".
Numbers: It would be good if "10時" got the reading "じゅうじ" rather than "1[いち] 0[ぜろ] 時[じ]" (see e.g. sentence nº115312).
Similarly, 10日 should get "とおか" and 10分 should get "じっぷん" (or "じゅっぷん" though sometimes it's "じゅうぶん" ... *sigh*).
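For the punctuation and long vowel mark cases, one simple rule would be to show a reading only when the token actually contains a kanji. The sketch below assumes the tokenizer output is available as (surface, reading) pairs and checks only the basic CJK Unified Ideographs block; the function names are made up. It does nothing for the numbers problem, which depends on how the tokenizer splits "10時" in the first place.

```python
def has_kanji(text):
    """True if any character is in the basic CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def annotate(tokens):
    """Format (surface, reading) pairs, adding [reading] only to kanji tokens."""
    out = []
    for surface, reading in tokens:
        if has_kanji(surface):
            out.append(f"{surface}[{reading}]")
        else:
            # Kana, punctuation and "ー" are left without a reading.
            out.append(surface)
    return " ".join(out)


print(annotate([("コーヒー", "コーヒー"), ("を", "を"), ("飲む", "のむ"), ("。", "。")]))
# コーヒー を 飲む[のむ] 。
```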
Re: Numbers: It would be good if "10時" got the reading "じゅうじ" rather than "1[いち] 0[ぜろ] 時[じ]" (see e.g. sentence nº115312).
The fix for that is to add to the IPADIC files used by MeCab. They look something like:
蹴込,1285,1285,5622,名詞,一般,*,*,*,*,蹴込,ケコミ,ケコミ
in their raw form. Adding 10時 ジュウジ is possible. It would be best to become familiar with the structure and weightings in those files before embarking on it.
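To make the structure of such a line a bit more visible, here is a small sketch that splits it into named fields. The English field names are my own labels for the thirteen IPADIC columns (surface form, left and right context IDs, cost, part of speech and its subcategories, conjugation type and form, base form, reading, pronunciation), not identifiers from MeCab itself.

```python
import csv
from io import StringIO

# English labels for the 13 columns of an IPADIC entry (labels are assumptions).
IPADIC_FIELDS = [
    "surface", "left_id", "right_id", "cost",
    "pos", "pos_detail_1", "pos_detail_2", "pos_detail_3",
    "conjugation_type", "conjugation_form",
    "base_form", "reading", "pronunciation",
]


def parse_ipadic_line(line):
    """Split one raw IPADIC line into a dict keyed by IPADIC_FIELDS."""
    row = next(csv.reader(StringIO(line)))
    if len(row) != len(IPADIC_FIELDS):
        raise ValueError(f"expected {len(IPADIC_FIELDS)} fields, got {len(row)}")
    entry = dict(zip(IPADIC_FIELDS, row))
    # The context IDs and the cost are integers; everything else stays text.
    for key in ("left_id", "right_id", "cost"):
        entry[key] = int(entry[key])
    return entry


entry = parse_ipadic_line("蹴込,1285,1285,5622,名詞,一般,*,*,*,*,蹴込,ケコミ,ケコミ")
print(entry["surface"], entry["cost"], entry["reading"])
# 蹴込 5622 ケコミ
```

The cost and context ID columns are the "weightings" that need care: a new entry such as 10時 would need sensible values there, and I have not tried to guess them.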
I'd like to see furigana. However, when I last checked a few years ago, support for the RUBY tag was rather limited and furigana displayed rather poorly in some browsers.
The 'kanji + readings' display doesn't really bother me, but then I rarely look at that line anyway.
Yes, it appears IE is the only browser with even limited ruby character support. The markup is, however, designed to fall back on something similar to what we currently have and the ruby characters can be implemented on modern browsers using CSS.
For more, see: http://en.wikipedia.org/wiki/Ruby_character
Quick fix idea.
This should be a relatively easy change, and it should make my (and Jim's) life easier.
1. Setting the 'meaning' field for a Japanese sentence automatically links that sentence to the English sentence identified.
2. On a standard sentence display of a Japanese sentence the link to the sentence identified in the meaning field looks different to the rest. (i.e. Red arrow instead of green, or something).
I also suggest that a meaning field entry of zero (0) could be used to identify Japanese sentences that are intentionally not to be used with WWWJDIC.
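In rough code terms the suggestion amounts to something like this. It is only a sketch: the class `Corpus`, its methods and the colour names are invented for illustration and say nothing about how Tatoeba actually stores links.

```python
NOT_FOR_WWWJDIC = 0  # proposed marker: intentionally not used with WWWJDIC


class Corpus:
    """Toy in-memory model of links and 'meaning' fields."""

    def __init__(self):
        self.links = set()   # undirected links stored as frozenset({a, b})
        self.meaning = {}    # Japanese sentence ID -> English sentence ID

    def set_meaning(self, jpn_id, eng_id):
        # Point 1: setting the meaning field also creates the link.
        self.meaning[jpn_id] = eng_id
        if eng_id != NOT_FOR_WWWJDIC:
            self.links.add(frozenset((jpn_id, eng_id)))

    def link_style(self, jpn_id, other_id):
        # Point 2: the link named in the meaning field is displayed differently.
        return "red" if self.meaning.get(jpn_id) == other_id else "green"


corpus = Corpus()
corpus.set_meaning(100001, 100000)
print(corpus.link_style(100001, 100000))  # red
```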
Duplicate removal script.
I don't know how the script works exactly, but I think it may be missing a step.
Suppose we have
100000 Hello.
100001 こんにちは。
100002 Hi.
100001 is linked to 100000
100001 has the meaning field of 100000
Now, suppose someone decides that 'Hello' and 'Hi' are close enough to not need both.
100000 Hello.
100001 こんにちは。
100002 Hi. ---> Hello.
Then suppose the script removes 100000.
100001 こんにちは。
100002 Hello.
Is 100001 still linked to 100000? It should be linked to the duplicate 100002 instead.
Does 100001 still have the meaning field of 100000? It should have the meaning field of 100002 instead.
In other words, if Sentence A is removed as a duplicate of Sentence B then all the links that pointed to Sentence A should now point to Sentence B instead.
The remove-duplicates script does the following:
it identifies all the sentences which have both the same language and the same text,
then it keeps the oldest sentence that is owned by someone (or the oldest one if none of the duplicates belongs to anyone) and relinks everything that pointed to the duplicates to this one
(comments, translations, lists, etc.),
and finally it removes the duplicates and keeps only one.
So the script will not produce any broken references to removed sentences.
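As a sanity check of that description, here is a small in-memory sketch of the same idea. It is not the real script: the data layout (a dict of sentences, a set of link pairs, a dict of meaning fields) is assumed, a lower ID is taken to mean an older sentence, and redirecting the meaning field is included because it was asked about above.

```python
from collections import defaultdict


def merge_duplicates(sentences, links, meaning):
    """sentences: {id: (lang, text, owner_or_None)}; links: set of (a, b);
    meaning: {jpn_id: eng_id}. Returns the merged (sentences, links, meaning)."""
    groups = defaultdict(list)
    for sid, (lang, text, _owner) in sentences.items():
        groups[(lang, text)].append(sid)

    redirect = {}  # removed duplicate ID -> ID of the sentence that is kept
    for ids in groups.values():
        if len(ids) < 2:
            continue
        owned = [i for i in ids if sentences[i][2] is not None]
        keeper = min(owned) if owned else min(ids)  # oldest owned, else oldest
        for i in ids:
            if i != keeper:
                redirect[i] = keeper

    def target(sid):
        return redirect.get(sid, sid)

    kept = {sid: s for sid, s in sentences.items() if sid not in redirect}
    new_links = set()
    for a, b in links:
        a, b = target(a), target(b)
        if a != b:  # drop links that would point a sentence at itself
            new_links.add((min(a, b), max(a, b)))
    new_meaning = {target(j): target(e) for j, e in meaning.items()}
    return kept, new_links, new_meaning


sentences = {
    100000: ("eng", "Hello.", None),
    100001: ("jpn", "こんにちは。", "someone"),
    100002: ("eng", "Hello.", "someone"),  # was "Hi." before being edited
}
kept, links, meaning = merge_duplicates(
    sentences, {(100000, 100001)}, {100001: 100000}
)
print(sorted(kept), links, meaning)
# [100001, 100002] {(100001, 100002)} {100001: 100002}
```

With the example from above, 100000 is unowned, so the owned duplicate 100002 is kept and both the link and the meaning field of 100001 end up pointing at 100002.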
> identify all the sentences which have both the same language and the same text
So it also merges duplicates that are not linked whatsoever?
Yep, that way even if newcomers add
I love you
and translate it,
as "I love you" already exists, the script will delete the new "I love you" and link the translation to the old "I love you" (or also remove it, if the translation already exists too)
> so the script will not produce any broken references to
> removed sentences
There are, however, some broken references being produced. It's not clear how though.
236727 あなたには姉妹がいますか。
was linked to 71123, which now no longer exists.
69566 Do you have any sisters?
does exist and was indirectly linked from 236727.
I don't know when 71123 was removed, why it was removed, or how it was removed, but something obviously went wrong somewhere. (It was one of the \N records last week - so it obviously isn't a recent deletion)
Hopefully these broken links are left over from earlier times and won't be reoccurring.
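Finding the leftovers should at least be straightforward. A minimal sketch, assuming the sentence IDs and the links have been loaded into a set and a collection of ID pairs (the function name is made up):

```python
def find_broken_links(existing_ids, links):
    """Return the links that mention a sentence ID which no longer exists."""
    return sorted(
        (a, b) for a, b in links
        if a not in existing_ids or b not in existing_ids
    )


existing = {236727, 69566}
links = [(236727, 71123)]
print(find_broken_links(existing, links))
# [(236727, 71123)]
```

This only reports the dangling pairs; deciding what each one should point to instead (69566 in the case above) still needs a human.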
ok, at least the remove-duplicates script will not produce any more broken links