Wall (7277 threads)

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.


FeuDRenais, June 27, 2010 at 8:03:52 PM UTC

Strange errors on 413554 and 413553... (could the mods check?)

The view became messed up once the sentences were tagged.

blay_paul, June 27, 2010 at 8:09:36 PM UTC

Happens with any tag. It looks like there's a bug in the latest (just installed) version.

http://tatoeba.org/eng/tags/sho...cit_%28Uyghur_

blay_paul, June 27, 2010 at 8:12:18 PM UTC

Note that there are also tags that were left when the sentence they were attached to was deleted.

TRANG, June 27, 2010 at 8:31:54 PM UTC

Sorry about that ^^ It's fixed.

phiz, June 27, 2010 at 6:52:06 PM UTC

http://tatoeba.org/eng/sentence...e+commence+pas

Thought I'd bring it up here. The English and Japanese examples are slightly off compared with the other translations. blay_paul said he'd fix it, but there's no sign of change yet.

blay_paul, June 27, 2010 at 7:19:47 PM UTC

Actually I changed the English so it matches the Japanese better and added two new alternative translations of the Japanese.

I don't know the other languages so I'll have to take your word about them.

blay_paul, June 27, 2010 at 1:43:15 PM UTC

*Paging Sysko*

I've decided it's a good idea to simplify the indexing a little by dropping the |1, |2, etc. notation. I basically had it for redundancy and now that the indexing has settled down it isn't as important as it was.

Could you do a global search and replace in the Japanese Index field of "|?(" with "(" ?

e.g. 為る|1(する) becomes 為る|1(する), but ちゃう|2 does not become ちゃう.

There are too many entries to change for the search feature on the Sentence Annotations page to work.

Thanks in advance, Paul

sysko, June 27, 2010 at 1:57:31 PM UTC

Done, I've replaced all |?( with (

blay_paul, June 27, 2010 at 2:14:19 PM UTC

Great! You even read my mind for what I really meant to say, instead of what I actually wrote. ;-)

e.g. 為る|1(する) becomes 為る(する), but ちゃう|2 does not become ちゃう.

(Oops)
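
For readers who want to reproduce this kind of replacement, here is a minimal sketch in Python. It assumes that the "?" in the request stands for a single digit (|1, |2, and so on) and that the index field is available as a plain string; the function name `drop_sense_number` is hypothetical and is not part of Tatoeba's code.

```python
import re

# Assumption: "?" in "|?(" means exactly one digit, as in |1( or |2(.
_NUMBER_BEFORE_READING = re.compile(r"\|\d\(")


def drop_sense_number(index_field: str) -> str:
    """Drop a |N marker only when a reading in parentheses follows it."""
    return _NUMBER_BEFORE_READING.sub("(", index_field)


if __name__ == "__main__":
    print(drop_sense_number("為る|1(する)"))  # 為る(する)
    print(drop_sense_number("ちゃう|2"))      # ちゃう|2 (unchanged, no parenthesis follows)
```

Matching the opening parenthesis as part of the pattern is what keeps entries such as ちゃう|2 untouched.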

blay_paul, June 26, 2010 at 6:51:40 PM UTC

Sentences.csv export.

As before, the sentences.csv export includes spurious \ symbols accompanying line breaks. Can you please filter out tabs and line breaks in the sentence text?

Sentence 400602 included a tab character.

4197 seems to have contained two line breaks.

I can't tell whether it's possible to remove them by editing the text as a moderator because they don't display in the first place.

There are about 59 sentences with two line breaks in them.

sysko, June 26, 2010 at 7:25:10 PM UTC

I've removed all line breaks, tabs, and multiple spaces from the sentences in the database.
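
A cleanup like this could look roughly as follows. This is only a sketch: the thread does not show the script that was actually run, the function name `clean_sentence` is hypothetical, and replacing each run of whitespace with a single space (rather than deleting it) is an assumption.

```python
import re

# Tabs, carriage returns, line feeds and plain spaces only.
# Full-width spaces and other Unicode whitespace are deliberately left alone.
_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def clean_sentence(text: str) -> str:
    """Collapse tabs, line breaks and repeated spaces into one space, then trim."""
    return _WHITESPACE_RUN.sub(" ", text).strip(" ")


if __name__ == "__main__":
    print(repr(clean_sentence("First line\n\nsecond\tpart  ")))
```

Running the same function on new sentences before they are saved would keep tabs and line breaks out of later sentences.csv exports.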

blay_paul, June 26, 2010 at 8:30:31 PM UTC

Thank you!

TRANG, June 27, 2010 at 12:57:26 PM UTC

Yes, thank you!!

Scott, June 26, 2010 at 6:04:21 PM UTC

I just want to advertise a list I created for Japanese sentences that need to be checked by a native speaker. Anyone can add sentences to it, and of course native speakers are welcome to come and check the sentences:
http://tatoeba.org/eng/sentences_lists/show/131

Former member, June 26, 2010 at 12:25:00 AM UTC

Hey gang.

Registered today and have no idea as to my purpose as yet, but will learn.

Scott, June 26, 2010 at 3:22:06 AM UTC

Welcome. If you know two languages, you can simply start translating sentences or adding your own.

xtofu80, June 26, 2010 at 11:55:30 AM UTC

Welcome to Tatoeba.
Just some remarks on Scott's general introduction:
When adding new sentences, please also consider copyright issues.
For a general introduction, you can watch the video on Drumbeat.
If you have any specific questions, feel free to ask here on the wall for general questions, or at the specific sentence if you have issues with a sentence or its translations, doubts about whether a sentence in a foreign language is correct, etc. This is a community-based project after all. Please also fill out your profile so that we know which languages you speak, where you are from, etc.

sysko, June 26, 2010 at 2:05:12 PM UTC

Welcome :)

Even if you know only one language and want to learn another, just add sentences in your language (check with the search bar first that they don't already exist :) ) and ask in the comments how to translate them into language YYY.
(btw I think we should add the video somewhere directly here)

blay_paul, June 24, 2010 at 9:54:14 AM UTC

Could we have the duplicate removal script run?

TRANG, June 26, 2010 at 12:05:49 AM UTC

Done.

I still haven't had time to take care of the [F] and [M] inline tags though...

sysko, June 26, 2010 at 2:20:56 PM UTC

The [F] and [M] tags have all been removed (and trailing spaces too).

blay_paul, June 26, 2010 at 2:22:55 PM UTC

Whooo! Thanks.

TRANG, June 27, 2010 at 12:57:47 PM UTC

You have a new fan ;)

blay_paul, June 24, 2010 at 2:43:33 PM UTC

For those of you keeping track of such things.

Since last I checked (last Saturday) WWWJDIC has had 8 new sentence pairs added and 30 removed. The low rate of sentence addition is largely because of the time required to do the indexing.

saeb, June 24, 2010 at 6:38:11 PM UTC

Just curious, could the output from this parser be used somehow to automate indexing: http://www.jdictionary.com/parser ?

blay_paul, June 24, 2010 at 6:50:19 PM UTC

Short answer, no.

Longer answer, it might be of some use but it would also take a lot of time setting it up for the best results. It's probably not worth the effort. There are things that could be done to speed things up but they require volunteer(s) with the right know-how and quite a bit of free time.

Scott, June 24, 2010 at 8:00:47 PM UTC

It's too bad that it takes so long. がんばって!

JimBreen, June 25, 2010 at 1:48:01 AM UTC

The parser saeb mentions uses MeCab (as does Tatoeba). The original indices were generated using ChaSen (very similar to MeCab), but they have been massively massaged since then. The trouble with using MeCab is that it's too fine-grained and will break up compound nouns, expressions, etc., which for indexing should be kept whole. A better tool would be WWWJDIC's text glosser. See:
http://www.csse.monash.edu.au/~...81%A7%E3%81%AB
for an example. Its output is more aligned to dictionary entries.

MUIRIEL, June 24, 2010 at 2:04:10 PM UTC

How should sentences like the following be tagged?
http://tatoeba.org/deu/sentences/show/34979

It's not a "proverb" as the part to which the tag should refer is only "land of milk and honey".

"expression"? "phrase"? "saying" ?

xtofu80, June 24, 2010 at 2:31:00 PM UTC

idiom

MUIRIEL, June 24, 2010 at 6:46:12 PM UTC

thank you, I'll use idiom =).

blay_paul, June 24, 2010 at 2:31:02 PM UTC

I don't think there is an official answer yet, but you could always tag it as "uses idiom" or "includes proverb" or something.

xtofu80, June 24, 2010 at 2:32:52 PM UTC

I generally tag such sentences with the "idiom" tag. The only problem is that the idiom itself is not marked within the sentence. But I guess most people can figure it out for themselves, given the hint that the sentence contains an idiom.

sysko, June 24, 2010 at 2:34:02 PM UTC

At least for more obscure idioms, one can specify the "pure" form of the idiom in a comment.

Demetrius, June 24, 2010 at 4:00:02 PM UTC

BTW, Russian has a subtler distinction:
* пословица is a proverb that is a full sentence
* поговорка is a proverb that is not a full sentence and is used in context
Belarusian and Ukrainian have this as well.

IMHO we should mark the distinction somehow when tagging, otherwise it'll be impossible to translate tags when they become translatable.

CK, June 24, 2010 at 3:29:57 AM UTC, edited October 26, 2019 at 3:57:55 AM UTC

[not needed - removed by CK]

Demetrius, June 24, 2010 at 9:52:08 AM UTC

IMHO a more general search with tags would be more useful.

E.g. find sentences tagged "OK" and not tagged "easy" in Chinese ("OK -easy"?).
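
To make the "OK -easy" idea concrete, here is a small Python sketch of how such a query could be interpreted. It is an illustration only: the query syntax is just the one suggested above, the function names are hypothetical, and tags containing spaces are not handled.

```python
def parse_tag_query(query: str) -> tuple[set[str], set[str]]:
    """Split a query like "OK -easy" into required and excluded tag sets."""
    required: set[str] = set()
    excluded: set[str] = set()
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            excluded.add(term[1:])  # a leading "-" means "must not have this tag"
        else:
            required.add(term)
    return required, excluded


def matches_tag_query(sentence_tags: set[str], query: str) -> bool:
    """True if the sentence has every required tag and none of the excluded ones."""
    required, excluded = parse_tag_query(query)
    return required <= sentence_tags and not (excluded & sentence_tags)


if __name__ == "__main__":
    print(matches_tag_query({"OK"}, "OK -easy"))
    print(matches_tag_query({"OK", "easy"}, "OK -easy"))
```

Restricting the result to one language (Chinese in the example) would be a separate filter applied before or after the tag check.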

sysko, June 24, 2010 at 10:23:36 AM UTC

Sure, it's planned, but it's not on my priority list.

sysko, June 24, 2010 at 10:00:02 AM UTC

Yep, in fact this has been considered for a long time because, as Scott said, this is the major criticism of the Tanaka Corpus, and so of Tatoeba too.

The major problem is "how" to do this. Now that we have tags, I can make the "ok" tag a special tag, which only trusted users who are not the sentence owner (no CK, I'm not talking about you :P) can set,

and maybe add a "show only proofread" checkbox in various places.

blay_paul, June 24, 2010 at 10:18:22 AM UTC

> which only trusted user who are not the sentence owner
> can set

Not to be picky, but any trusted user could
* Un-own one of their sentences.
* Mark it OK
* Re-own it.

What I would suggest is that two different trusted users vet it as OK (that could include the sentence owner). One marks it "Checked by [name]" then anybody who is not [name] can change that to OK.

sysko, June 24, 2010 at 10:22:07 AM UTC

I'm even more picky ("never trust your users, even trusted users" :P).
In the database we have a table that keeps a log of actions on sentences, so I will check the person who "added" it, not the current owner :) (otherwise you would not have been able to correct a Tanaka Corpus sentence and set it to OK).

sysko, June 24, 2010 at 10:41:57 AM UTC

In fact, to get your "two users needed" rule, we can consider that "owning" a sentence is the first proofread (so when you adopt a sentence you will not be able to set the OK tag, just like the person who contributed the sentence).
Then, if the sentence is owned and you are neither the current owner nor the person who added it, you will be able to set the OK tag.
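
The rule proposed in this exchange can be sketched as a single check. This is a hedged illustration in Python, not Tatoeba's implementation: the `Sentence` class, its field names and `can_set_ok` are all hypothetical, and `added_by` stands for the original contributor as recorded in the action log mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    added_by: str          # original contributor, taken from the action log
    owner: Optional[str]   # current owner, or None if the sentence is un-owned


def can_set_ok(user: str, is_trusted: bool, sentence: Sentence) -> bool:
    """Apply the proposed rule for who may set the OK tag."""
    if not is_trusted:
        return False
    if sentence.owner is None:
        # Owning counts as the first proofread, so an un-owned sentence
        # has not yet had its first check.
        return False
    # The second check must come from someone who is neither the current
    # owner nor the person who originally added the sentence.
    return user not in (sentence.owner, sentence.added_by)


if __name__ == "__main__":
    adopted = Sentence(added_by="tanaka_import", owner="alice")
    print(can_set_ok("bob", True, adopted))    # a different trusted user
    print(can_set_ok("alice", True, adopted))  # the current owner
```

Because the check uses the original contributor rather than only the current owner, the un-own, mark OK, re-own sequence described earlier no longer works for the person who added the sentence.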

Scott, June 24, 2010 at 3:42:38 AM UTC

Your idea seems to be about some system of verification or vetting. I think it's a good idea. I don't know if this is planned but it would be great to have a system where trusted users can verify sentences as correct. Eventually you would end up with a bank of sentences guaranteed to be good and this could address one of the major criticisms directed at the Tanaka corpus as being unreliable and containing unnatural sentences.

sysko, June 24, 2010 at 10:01:46 AM UTC

At least, I plan to do the following very soon:
* filter by language on a tag page (show only ok sentences in French etc.)
* have a nice page to list only sentences in XXXX not translated in YYYY
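
The second item in the list above amounts to a simple filter over sentences and their translation links. The Python sketch below shows one way to express it, assuming hypothetical in-memory structures (a mapping from sentence id to language code and a mapping from sentence id to the ids of its direct translations); the real site would do this in the database.

```python
def untranslated_into(
    languages: dict[int, str],
    translations: dict[int, set[int]],
    source_lang: str,
    target_lang: str,
) -> list[int]:
    """Ids of sentences in source_lang with no direct translation in target_lang."""
    result = []
    for sentence_id, lang in languages.items():
        if lang != source_lang:
            continue
        linked = translations.get(sentence_id, set())
        if not any(languages.get(other) == target_lang for other in linked):
            result.append(sentence_id)
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical sample data: ids and language codes are made up.
    languages = {1: "fra", 2: "eng", 3: "fra", 4: "jpn"}
    translations = {1: {2}, 2: {1}, 3: {4}, 4: {3}}
    print(untranslated_into(languages, translations, "fra", "eng"))
```

The first item, filtering a tag page by language, would be the same kind of check with the tag test added.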