Wall (6,005 threads)
Before asking a question, make sure to read the FAQ.
We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.
I am not sure whether it is a bug or a curiosity ("feature"), but when you want to search for multiple Japanese words, the spaces between them have to be typed with the English input method. Typing a space with the Japanese IME leads to different results (always no results?)
hmmm I think it's because the search engine handles only the "normal" space as a word separator. Can you give me this space (in an answer to this message, between brackets)? This way I will be able to convert it before sending the request to the search engine.
Sure. And here it is:[ ]. Lol, do you see the difference?
yep it's a full width space :)
Please do the same for non-breaking space [ ]. I sometimes use it to prevent dashes (—) from being moved to the next line; it should be treated in the same way as the ordinary space for search purposes.
There are lots of other spaces, but I’m not sure anyone has ever used these on Tatoeba: en quad [ ], em quad [ ], en space [ ], em space [ ], 3-per-em space [ ], 4-per-em space [ ], 6-per-em space [ ], figure space [ ], medium mathematical space [ ], punctuation space [ ], thin space [ ], hair space [ ], zero-width space 
ok thanks, it will be in the next release
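The conversion sysko describes could look roughly like the sketch below. It is only an illustration in Python, not the site's actual code: the name `normalize_query` is made up, and treating the zero-width space as a word separator is an assumption. The character set covers the spaces listed in this thread.

```python
import re

# Space-like code points mentioned in the thread:
#   U+00A0        no-break space
#   U+2000-U+200A en quad through hair space
#   U+200B        zero-width space (assumed here to act as a separator)
#   U+205F        medium mathematical space
#   U+3000        ideographic (full-width) space
_SPACE_LIKE = re.compile("[\u00a0\u2000-\u200b\u205f\u3000]")


def normalize_query(query: str) -> str:
    """Replace space-like characters with U+0020 and collapse runs of spaces."""
    replaced = _SPACE_LIKE.sub(" ", query)
    # split() with no argument drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return " ".join(replaced.split())
```

A query typed with the Japanese IME would then reach the search engine with ordinary spaces between the words.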
[not needed anymore- removed by CK]
Yes it's sort of a bug, and don't worry I noticed it :) I'm just waiting until I'm done importing lukaszpp's sentences, then I'll fix the logs to indicate his name. I still have one batch and I import only when there's not a lot of new sentences.
In the case of lukaszpp, he was the one who compiled the sentences so I wouldn't be comfortable having them as my contributions anyway.
Trang you can tell us you did it to stay in the top 20 :p
Ah, shoot, you caught me sysko :P
This seems to happen with all the batch imports. The Ukrainian proverbs from Shtoota are contributed by sysko and owned by me (422069). I don't see any problem with this.
After all, such things come from collections not necessarily compiled by the people who have suggested importing them, and IMO there’s nothing bad with admins being higher in the contributors table. They contribute in other ways too.
Is there any way to sort the sentences that show up based on "difficulty" or length,
so that the easier, shorter sentences show up first-- followed by increasingly more difficult/lengthy sentences?
Indeed there is no way. I actually don't even know how results are sorted... But I agree it would be much better if search results could be sorted by length (even though shorter doesn't necessarily mean easier ^^).
Anyway, sysko will be able to tell you when this feature can be expected. He is the one in charge of the search engine.
Actually, I’m not sure it’s the best way of sorting sentences. Shorter sentences tend to be stranger, as there is little context.
Yes, definitely not the best, but better than no sorting at all. Often I just want short sentences but I have to browse through many pages to find them.
Anyway, the best way of sorting sentences would have to take into consideration things like the number of people who thought the sentence was useful for their particular query, or the number of people who used the sentence as a learning material. But we don't have a way to measure these things right now.
Basically, not really, but if you ask nicely Trang or Sysko might put together a list of sentences in whatever language you choose of a certain length.
Alternatively you can go to the downloads page and use a database program to make your own lists.
Unfortunately there is not yet a way of 'uploading' batches of sentence IDs to create a Tatoeba list
@Trang/Sysko: *HINT HINT*
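For the do-it-yourself route, here is a rough Python sketch of building a shortest-first list from a downloaded export. It assumes the export is tab-separated with three columns (id, language code, text); that layout and the function name `shortest_sentences` are assumptions, so check the actual file from the downloads page first.

```python
import csv
import io


def shortest_sentences(tsv_text, lang, limit=10):
    """Return up to `limit` (id, text) pairs in `lang`, shortest text first.

    Assumes each row is: id <TAB> language code <TAB> sentence text.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # Skip rows that do not match the assumed three-column layout.
        if len(row) == 3 and row[1] == lang:
            rows.append((row[0], row[2]))
    # Length is only a crude stand-in for difficulty.
    rows.sort(key=lambda pair: len(pair[1]))
    return rows[:limit]
```

For a real file, read its contents with `open(path, encoding="utf-8").read()` and pass the result in as `tsv_text`.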
the duplicate removal script has been updated and re-run, and will be run once a week, as before.
That's nice to know.
Unlinking Japanese / English pairs.
Just a reminder that unlinking a Japanese / English pair may well have repercussions for WWWJDIC example sentences. Specifically, if a Japanese / English pair does not match in Tatoeba, unlinking them and adding new translations will _NOT_ automatically fix it in WWWJDIC.
If you unlink a Japanese / English pair, at the least, please mention this in a comment! Adding a @change tag at the same time will help track it down later if I don't see the comment right away.
Trusted users can also correct the 'meaning' field of the Japanese index data by using the annotations page.
I am currently working through 183 sentence pairs where the link is broken in Tatoeba but not in WWWJDIC. >_<
I'm working on setting up what's necessary for all of this to be fixed automatically.
Posting comments or adding @change tags won't be needed.
[not needed anymore- removed by CK]
What I notice is that many duplicates are created in good faith for 2 main reasons:
1) there are no links between the sentence being viewed and the desired translations because
a) the deduplication process has failed for some reason.
b) the sentences are not deduplicated because they are the same except for a different name or a unit (problem which I mentioned earlier and which could be solved through conventions)
2) the desired translation is not visible.
This is the case, for example, if you view a list of sentences in L1, translated into L2 and for which no translation exists into L3. This list doesn't let you see the translations from L2 into L3 when they exist, so the temptation is great to believe that they don't and to recreate them.
I wonder if this whole duplicate business could not be solved by automatically merging a duplicate sentence at the time it is entered into tatoeba.
So if I add a new sentence, the server looks it up in the database, and if it is identical to an existing entry, both are merged, which is basically a link operation to the sentence I am translating from. That should be as demanding as a simple search in the database, plus one link operation.
As a consequence, the database would at no point in time contain duplicates.
(We have to consider multiple entries though if two sentences look identical in two languages.)
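To make the proposal concrete, here is a toy Python sketch of "look up before insert". It keeps everything in memory and keys the lookup on (language, text), so identical text in two different languages stays separate. The class `SentenceStore` and its methods are invented for illustration and say nothing about how Tatoeba's database works or how fast such a lookup would be there.

```python
class SentenceStore:
    """Toy in-memory store that never holds two identical sentences."""

    def __init__(self):
        self.sentences = {}  # id -> (lang, text)
        self.links = set()   # pairs (smaller id, larger id) of linked sentences
        self._index = {}     # (lang, text) -> id, for exact-match lookup
        self._next_id = 1

    def add(self, lang, text, translation_of=None):
        """Add a sentence, or reuse the existing one if it is identical."""
        key = (lang, text)
        sid = self._index.get(key)
        if sid is None:
            # No exact match in this language: create a new sentence.
            sid = self._next_id
            self._next_id += 1
            self.sentences[sid] = key
            self._index[key] = sid
        if translation_of is not None and translation_of != sid:
            # The "merge" is just a link to the already existing sentence.
            pair = (min(sid, translation_of), max(sid, translation_of))
            self.links.add(pair)
        return sid
```

Adding "Je t'aime." twice as a translation of two different sentences yields one French sentence with two links.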
@xtofu80 But simple does not imply fast. In fact, exact sentence matching in a database of nearly 500,000 sentences is simple but not at all fast. Otherwise I would not have chosen the more complex but faster duplicate removal script ^^
@feuDRenais, yep good idea, I will add it to the todo list, but to be honest, don't expect it for a looooooong time.
@CK it could be an idea, but the question is "which sentences to add to this smaller set?" maybe basic sentences like "i love you" etc., but it will not solve the problem entirely
@sacredceltic, yep I think so. Apart from one user recently, since I've been on tatoeba the only reasons for duplicates have been the ones you give: either the existing translation was too "indirect" to be visible, or people did not search before adding (for not so common sentences, I can understand that one does not check the existence of every single sentence one adds)
so far the best solution I've found is the duplicate removal script, run once a week. It handles every case (even sentences that look identical but are not in the same language) and keeps links, tags and audio
the two problems are the following
it's not real time
it depends on the database structure and so needs to be slightly modified each time we add a new feature linking to sentences (which happens every 6 months ^^)
anyway a non-real-time solution will always have the second drawback, because you can't know if people add tags / add to a list / add to [add here whatever future feature],
and a real time solution will need to be really fast. (fast < 0.1s)
the fact is I was extremely busy this week, so I didn't find the time to readapt the duplicate removal script, but this is now my current priority.
Personally I think real time is not really that important, and once a week (or twice a week if you want) is enough for now.
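For readers wondering what a batch pass of this kind involves, here is a simplified Python sketch. It is not the real script: the function `merge_duplicates` and its data shapes are made up, and it covers only sentences and links. Tags, audio and lists would each need the same repointing step, which is the second drawback mentioned above.

```python
from collections import defaultdict


def merge_duplicates(sentences, links):
    """Merge sentences with identical (lang, text), keeping the lowest id.

    sentences: dict mapping id -> (lang, text)
    links:     set of (id_a, id_b) pairs
    Returns (kept_sentences, repointed_links).
    """
    groups = defaultdict(list)
    for sid, key in sentences.items():
        groups[key].append(sid)  # same text in another language: other group

    # Map every id to the survivor of its group.
    survivor = {}
    for ids in groups.values():
        keep = min(ids)
        for sid in ids:
            survivor[sid] = keep

    kept = {sid: sentences[sid] for sid in set(survivor.values())}

    # Repoint each link to the survivors and drop links that collapse.
    new_links = set()
    for a, b in links:
        a, b = survivor[a], survivor[b]
        if a != b:
            new_links.add((min(a, b), max(a, b)))
    return kept, new_links
```

The survivor inherits the links of every sentence merged into it, so no translation is lost in the pass.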
I think a similarity match would be really nice. Like in Google. E.g.:
Your sentence "A went to the store with B" was not found. Did you mean:
"C went to the store with B"
"A went to the hardware store with B"
"A and B went to the park"
It would at least let the searcher know what's out there. Better yet, it would be nice to have an automatic check before submitting a brand new sentence (NOT a translation). E.g.:
Your sentence "A went to the store with B" is already very similar to...
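A "did you mean" check like this can be prototyped with Python's standard `difflib` module. The sketch below is only an illustration: `suggest_similar` is a made-up name, the 0.6 cutoff is the library default rather than a tuned value, and comparing a query against every sentence this way would likely be too slow for a large database.

```python
import difflib


def suggest_similar(query, existing, limit=3, cutoff=0.6):
    """Return up to `limit` existing sentences that resemble `query`.

    Results are ordered from most to least similar; sentences scoring
    below `cutoff` (between 0 and 1) are left out.
    """
    return difflib.get_close_matches(query, existing, n=limit, cutoff=cutoff)


existing = [
    "C went to the store with B",
    "A went to the hardware store with B",
    "A and B went to the park",
    "The weather is nice today",
]
suggestions = suggest_similar("A went to the store with B", existing)
```

Running the same check before a brand new sentence is submitted would cover the second case as well.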
Thursday WWWJDIC examples update summary
16 records deleted.
10 records added.
Regarding, again, sentence quality:
I would propose a much more liberal use of the "Needs Native Check" tag (or something similar, if it already exists). I see right now that it's been mostly used by myself (and somewhat by Demetrius), but otherwise has gotten very little exposure (unfortunately, we use it for languages where there are currently no natives chez Tatoeba...)
If its use was formally encouraged for sentences a foreign translator was not, say, 95+% sure on, corrections afterwards would be much easier. Native speakers could just check all the tagged sentences in their respective languages, and go through with the checks when they had a chance.
I suggest not putting the tag on other people's sentences, or at least writing something about it in the comments. I was very surprised to find my Russian sentence tagged 'needs native check' recently (http://tatoeba.org/sentences/show/451147).
Just so you know, it wasn't me who put the tag ;-)
I have occasionally done it to other people's sentences, but I generally try to leave a "NNC-tagged" comment to indicate that I was the one who tagged it. I do think it's a powerful tool for people to use on their own sentences, though... (though only trusted users can tag)
(I think it could also be a powerful tool if you want native speakers to swarm to a specific sentence as well... Since it's very easy to miss comments, but the tag is easy to find. But yea, too liberal of a use would not be great either...)
Short answer; you don't.
Longer answer; if they are exactly identical they will be merged automatically the next time the duplicate removal script is run.
Final answer; ... but the script is currently under maintenance, so in the meantime, if you identify both sentences by number or link, a moderator can sort things out for you.
Hi everyone. I'm new here but I can already feel how addictive this project is. I just can't stop myself from translating another sentence... and another... and another...