Wall - Tatoeba

Wall (7277 threads)

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.

FeuDRenais, June 27, 2010 at 8:03:52 PM UTC

Strange errors on 413554 and 413553... (could the mods check?)

The view became messed up once the sentences were tagged.

blay_paul, June 27, 2010 at 8:09:36 PM UTC

Happens with any tag. It looks like there's a bug in the latest (just installed) version.

http://tatoeba.org/eng/tags/sho...cit_%28Uyghur_

blay_paul, June 27, 2010 at 8:12:18 PM UTC

Note that there are also tags that were left when the sentence they were attached to was deleted.

TRANG, June 27, 2010 at 8:31:54 PM UTC

Sorry about that ^^ It's fixed.

phiz, June 27, 2010 at 6:52:06 PM UTC

http://tatoeba.org/eng/sentence...e+commence+pas

Thought I'd bring it up here. The English and Japanese examples are slightly off compared with the other translations. blay_paul said he'd fix it, but there's no sign of change yet.

blay_paul, June 27, 2010 at 7:19:47 PM UTC

Actually I changed the English so it matches the Japanese better and added two new alternative translations of the Japanese.

I don't know the other languages so I'll have to take your word about them.

blay_paul, June 27, 2010 at 1:43:15 PM UTC

*Paging Sysko*

I've decided it's a good idea to simplify the indexing a little by dropping the |1, |2, etc. notation. I basically had it for redundancy and now that the indexing has settled down it isn't as important as it was.

Could you do a global search and replace in the Japanese Index field of "|?(" with "(" ?

e.g. 為る|1(する) becomes 為る|1(する), but ちゃう|2 does not become ちゃう.

There are too many entries to change for the search feature on the Sentence Annotations page to work.

Thanks in advance, Paul

sysko, June 27, 2010 at 1:57:31 PM UTC

Done, I've replaced all |?( with (

blay_paul, June 27, 2010 at 2:14:19 PM UTC

Great! You even read my mind for what I really meant to say, instead of what I actually wrote. ;-)

e.g. 為る|1(する) becomes 為る(する), but ちゃう|2 does not become ちゃう.

(Oops)
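
For anyone wanting to reproduce this replacement on their own copy of the index, here is a minimal sketch. It assumes the "?" in "|?(" stands for a single digit (as in |1, |2); the function name `drop_sense_number` is hypothetical, and this is not the query that was actually run on the database.

```python
import re

# Assumption: "|?(" means a pipe, one digit, then an opening parenthesis.
SENSE_BEFORE_READING = re.compile(r"\|\d\(")


def drop_sense_number(index_field: str) -> str:
    """Replace "|<digit>(" with "(" and leave any other "|<digit>" untouched."""
    return SENSE_BEFORE_READING.sub("(", index_field)


print(drop_sense_number("為る|1(する)"))
print(drop_sense_number("ちゃう|2"))
```

Because the pattern requires the opening parenthesis, an entry like ちゃう|2, which has no reading in parentheses, keeps its number.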

blay_paul, June 26, 2010 at 6:51:40 PM UTC

Sentences.csv export.

As before, the sentences.csv export includes spurious \ symbols accompanying line breaks. Could you please filter out tabs and line breaks in the sentence text?

Sentence 400602 included a tab character.

4197 seems to have contained two line breaks.

I can't tell whether it's possible to remove them by editing the text as a moderator because they don't display in the first place.

There are about 59 sentences with two line breaks in them.

sysko, June 26, 2010 at 7:25:10 PM UTC

I've removed all line breaks, tabs, and multiple spaces from the sentences in the database.
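
For people cleaning an older copy of the export themselves, the same kind of cleanup might look roughly like this. It is only a sketch: `clean_sentence_text` is a hypothetical name, and it deliberately touches only ASCII spaces, tabs, and line breaks so that other characters (for example full-width spaces in Japanese text) are left alone.

```python
import re

# Runs of ASCII spaces, tabs, carriage returns, and line feeds.
WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def clean_sentence_text(text: str) -> str:
    """Collapse each whitespace run into one space, then trim both ends."""
    collapsed = WHITESPACE_RUN.sub(" ", text)
    return collapsed.strip(" ")


print(repr(clean_sentence_text("First line\n\nsecond\tpart  here ")))
```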

blay_paul, June 26, 2010 at 8:30:31 PM UTC

Thank you!

TRANG, June 27, 2010 at 12:57:26 PM UTC

Yes, thank you!!

Scott, June 26, 2010 at 6:04:21 PM UTC

I just want to advertise a list I created for Japanese sentences that need checking by a native speaker. Anyone can add sentences to it, and of course native speakers are welcome to come and check the sentences:
http://tatoeba.org/eng/sentences_lists/show/131

Former member, June 26, 2010 at 12:25:00 AM UTC

Hey gang.

Registered to-day and have no idea as to my purpose as yet but will learn.

Scott, June 26, 2010 at 3:22:06 AM UTC

Welcome. If you know two languages, you can simply start translating sentences or adding your own.

xtofu80, June 26, 2010 at 11:55:30 AM UTC

Welcome to Tatoeba.
Just some remarks on Scott's general introduction:
When adding new sentences, please also consider copyright issues.
For a general introduction you can watch the video on drumbeat.
If you have any questions, feel free to ask: here on the Wall for general questions, or in the comments on a specific sentence if you have issues with that sentence or its translations, doubts about whether a sentence in a foreign language is correct, etc. This is a community-based project after all. Please also fill out your profile so that we know which languages you speak, where you are from, etc.

sysko, June 26, 2010 at 2:05:12 PM UTC

Welcome :)

Even if you know only one language and want to learn another, just add sentences in your language (after checking with the search bar that they don't already exist :) ) and ask in the comments how to translate them into language YYY
(btw I think we should add the video somewhere directly here)

blay_paul, June 24, 2010 at 9:54:14 AM UTC

Could we have the duplicate removal script run?

TRANG, June 26, 2010 at 12:05:49 AM UTC

Done.

I still haven't had time to take care of the [F] and [M] inline tags though...

sysko, June 26, 2010 at 2:20:56 PM UTC

The [F] and [M] tags have all been removed (along with trailing spaces).

blay_paul, June 26, 2010 at 2:22:55 PM UTC

Whooo! Thanks.

TRANG, June 27, 2010 at 12:57:47 PM UTC

You have a new fan ;)

blay_paul, June 24, 2010 at 2:43:33 PM UTC

For those of you keeping track of such things.

Since last I checked (last Saturday) WWWJDIC has had 8 new sentence pairs added and 30 removed. The low rate of sentence addition is largely because of the time required to do the indexing.

saeb, June 24, 2010 at 6:38:11 PM UTC

Just curious: could the output from this parser be used somehow to automate indexing? http://www.jdictionary.com/parser

blay_paul, June 24, 2010 at 6:50:19 PM UTC

Short answer, no.

Longer answer, it might be of some use but it would also take a lot of time setting it up for the best results. It's probably not worth the effort. There are things that could be done to speed things up but they require volunteer(s) with the right know-how and quite a bit of free time.

Scott, June 24, 2010 at 8:00:47 PM UTC

It's too bad that it takes so long. がんばって！

JimBreen, June 25, 2010 at 1:48:01 AM UTC

The parser saeb mentions uses MeCab (as does Tatoeba). The original indices were generated using ChaSen (very similar to MeCab), but they have been massively massaged since then. The trouble with using MeCab is that it's too fine-grained and will break up compound nouns, expressions, etc., which for indexing should be kept whole. A better tool would be WWWJDIC's text glosser. See:
http://www.csse.monash.edu.au/~...81%A7%E3%81%AB
for an example. Its output is more aligned to dictionary entries.

MUIRIEL, June 24, 2010 at 2:04:10 PM UTC

How should sentences like the following be tagged?
http://tatoeba.org/deu/sentences/show/34979

It's not a "proverb", as the part the tag should refer to is only "land of milk and honey".

"expression"? "phrase"? "saying" ?

xtofu80, June 24, 2010 at 2:31:00 PM UTC

idiom

MUIRIEL, June 24, 2010 at 6:46:12 PM UTC

thank you, I'll use idiom =).

blay_paul, June 24, 2010 at 2:31:02 PM UTC

I don't think there is an official answer yet, but you could always tag it as "uses idiom" or "includes proverb" or something.

xtofu80, June 24, 2010 at 2:32:52 PM UTC

I generally tag such sentences with the "idiom" tag. The only problem is that the idiom itself is not marked within the sentence. But I guess most people can figure it out for themselves, given the hint that the sentence contains an idiom.

sysko, June 24, 2010 at 2:34:02 PM UTC

At least for the more obscure idioms, one can specify the "pure" form of the idiom in a comment.

Demetrius, June 24, 2010 at 4:00:02 PM UTC

BTW, Russian has a subtler distinction:
* пословица is the proverb that is a full sentence
* поговорка is the proverb that is not a full sentence and is used in context
Belarusian and Ukrainian have this as well.

IMHO we should mark the distinction somehow when tagging, otherwise it'll be impossible to translate tags once they become translatable.

CK, June 24, 2010 at 3:29:57 AM UTC, edited October 26, 2019 at 3:57:55 AM UTC

[not needed - removed by CK]

Demetrius, June 24, 2010 at 9:52:08 AM UTC

IMHO a more general search with tags would be more useful.

E.g. find sentences tagged "OK" and not tagged "easy" in Chinese ("OK -easy"?).
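
A rough sketch of how such an "OK -easy" query could be interpreted. This is only an illustration, not Tatoeba's search code: `parse_tag_query` and `matches` are hypothetical names, and it assumes tags contain no spaces and that a leading "-" means "must not have this tag".

```python
def parse_tag_query(query: str) -> tuple[set[str], set[str]]:
    """Split a query such as "OK -easy" into required and excluded tag sets."""
    required: set[str] = set()
    excluded: set[str] = set()
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            excluded.add(term[1:])
        else:
            required.add(term)
    return required, excluded


def matches(tags: set[str], query: str) -> bool:
    """True if the sentence has every required tag and none of the excluded ones."""
    required, excluded = parse_tag_query(query)
    return required <= tags and not (excluded & tags)


print(matches({"OK"}, "OK -easy"))
print(matches({"OK", "easy"}, "OK -easy"))
```

Filtering by language (Chinese in the example above) would be a separate condition applied on top of the tag match.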

sysko, June 24, 2010 at 10:23:36 AM UTC

Sure, it's planned, but it's not on my priority list.

sysko, June 24, 2010 at 10:00:02 AM UTC

Yep, in fact this has been considered for a long time because, as Scott said, this is one of the major criticisms of the Tanaka corpus, and so of Tatoeba too.

The major problem is "how" to do this. Now that we have tags, I can make the "ok" tag a special tag which only trusted users who are not the sentence owner (no CK, I'm not talking about you :P) can set,

and maybe add a "show only proofread" checkbox in a few places.

blay_paul, June 24, 2010 at 10:18:22 AM UTC

> which only trusted user who are not the sentence owner
> can set

Not to be picky, but any trusted user could
* Un-own one of their sentences.
* Mark it OK
* Re-own it.

What I would suggest is that two different trusted users vet it as OK (that could include the sentence owner). One marks it "Checked by [name]" then anybody who is not [name] can change that to OK.

sysko, June 24, 2010 at 10:22:07 AM UTC

I'm even more picky ("never trust your users, even trusted users :P").
In the database we have a table which keeps logs of actions on sentences, so I will check the person who "added" it, not the current owner :) (otherwise you would not have been able to correct a Tanaka corpus sentence and set it to OK).

sysko, June 24, 2010 at 10:41:57 AM UTC

In fact, to get your "needs two users" rule, we can consider that owning a sentence is the first proofread (so when you adopt a sentence you will not be able to set the OK tag, just like the person who contributed the sentence).
Then, if the sentence is owned and you're neither the current owner nor the person who added it, you will be able to set the OK tag.
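
Put as code, the rule described in this post might look something like the sketch below. All names here (`Sentence`, `added_by`, `owner`, `can_set_ok`) are hypothetical and chosen only for illustration; it assumes the original contributor is read from the action log rather than from the current owner field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sentence:
    added_by: str          # assumed to come from the log of actions on sentences
    owner: Optional[str]   # None when nobody currently owns the sentence


def can_set_ok(user: str, is_trusted: bool, sentence: Sentence) -> bool:
    """Owning counts as the first proofread; a different trusted user gives the OK."""
    if not is_trusted:
        return False
    if sentence.owner is None:
        # Nobody has adopted the sentence yet, so there is no first proofread.
        return False
    return user not in (sentence.owner, sentence.added_by)


adopted = Sentence(added_by="alice", owner="bob")
print(can_set_ok("carol", True, adopted))
print(can_set_ok("bob", True, adopted))
```

Checking `added_by` as well as `owner` is what closes the un-own, mark OK, re-own loophole raised earlier in the thread.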

Scott, June 24, 2010 at 3:42:38 AM UTC

Your idea seems to be about some system of verification or vetting. I think it's a good idea. I don't know if this is planned but it would be great to have a system where trusted users can verify sentences as correct. Eventually you would end up with a bank of sentences guaranteed to be good and this could address one of the major criticisms directed at the Tanaka corpus as being unreliable and containing unnatural sentences.

sysko, June 24, 2010 at 10:01:46 AM UTC

At least, I plan to do the following things very soon:
* filter by language on a tag page (show only ok sentences in French etc.)
* have a nice page to list only sentences in XXXX not translated in YYYY
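
The second item could be approximated offline along these lines. This is a sketch under assumptions, not the planned implementation: it takes a hypothetical mapping of sentence id to language code and a set of translation links given as id pairs, and it only looks at direct translations.

```python
from collections import defaultdict


def untranslated(languages: dict[int, str],
                 links: set[tuple[int, int]],
                 source: str,
                 target: str) -> list[int]:
    """Ids of sentences in `source` that have no direct translation in `target`."""
    neighbours: dict[int, set[int]] = defaultdict(set)
    for a, b in links:
        # Treat each link as going both ways.
        neighbours[a].add(b)
        neighbours[b].add(a)
    return sorted(
        sentence_id
        for sentence_id, lang in languages.items()
        if lang == source
        and all(languages.get(other) != target for other in neighbours[sentence_id])
    )


languages = {1: "fra", 2: "eng", 3: "fra", 4: "jpn"}
links = {(1, 2), (3, 4)}
print(untranslated(languages, links, "fra", "eng"))
```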
