Wall (6,960 threads)

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.

blay_paul, June 13, 2010 at 8:42:09 AM UTC

Goodbye to [M], [F]

Now that the tag system has arrived this is a good opportunity to get rid of the [F] and [M] tags. Please leave the others in place for now. I will email lists of sentence IDs for sentences you can apply 'female' and 'male' tags to.

blay_paul, June 14, 2010 at 10:59:27 PM UTC

Any progress on the male / female tag import?

There's no rush, but it would be nice if we could have it sorted out by the Saturday update for Jim.

TRANG, June 15, 2010 at 1:00:08 PM UTC

Importing the tags should be done by Saturday.

Removing the [M] and [F] can be done by Saturday but I prefer not to rush on this.

Exporting the tags can be a bit tricky but I'll send an email to Jim to tell him what I can easily do, and see if that's fine with him.

TRANG, June 13, 2010 at 8:00:10 PM UTC

What about the [XXX] sentences? I started taking them out of the sentences and adding a real tag instead. But since we don't export tags (yet) in our download files, I figured perhaps you still need to keep the tag in the sentence?

If you don't, then you can just erase the XXX
=> http://tatoeba.org/eng/tags/sho...s_with_tag/XXX

If you do need them, then you can add back the [XXX] where I took it off.

blay_paul, June 13, 2010 at 8:35:38 PM UTC

I don't think we should remove other tags yet. I started with [M] and [F] because (a) they are automatically generated so it was easy to update them and (b) they are not that important.

In particular I don't think taking the [XXX] tags out now is a good idea because there is still no way for users to filter them out. At least seeing the [XXX] tells users that we know it's a dodgy sentence.

blay_paul, June 13, 2010 at 1:20:43 PM UTC

OK, I've sent in the female / male tag list. The next question will be how to process them for WWWJDIC. You should check with Jim for that - although you could just add the tags to the end of their entries in WWWJDIC.csv with [ ] added around them.
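
A minimal sketch of the "add the tags to the end of the entry with [ ] around them" idea, in Python. The function name and the plain-string entry are assumptions for illustration only; the real layout of WWWJDIC.csv is not described here, so this only shows the bracket-appending step.

```python
from typing import Iterable


def append_bracketed_tags(entry: str, tags: Iterable[str]) -> str:
    """Return the entry text with each tag appended as a [tag] marker.

    Hypothetical helper: it assumes the entry is a plain sentence string
    and that markers are separated from the text by a single space.
    """
    suffix = "".join(f" [{tag}]" for tag in tags)
    return entry + suffix


print(append_bracketed_tags("I love you.", ["female"]))  # I love you. [female]
```

An entry with no tags comes back unchanged, so the step can be applied to every row without a special case.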

cburgmer, June 13, 2010 at 1:11:21 PM UTC

I for one welcome our new tag system.

I hope you guys&girls do think about us who we later need to implement some logic on the data. While tags add important knowledge for humans I'm still not sure how to use this information when automatically selecting translations.

I do think though that "our" data is safe in your hands. Thanks for the good job!

TRANG, June 14, 2010 at 10:45:21 PM UTC

sysko and I were pondering on what exactly you meant by "I hope you guys&girls do think about us who we later need to implement some logic on the data".

Is there anything specific we should think about? :)

cburgmer, June 15, 2010 at 11:25:46 AM UTC

Hey, thanks for asking.

My current perspective is how to select good sentences based on a given word. So if I have the word "love" I'll probably have several English sentences that include this word. Now if English had different pronouns depending on gender I would have two versions of each sentence, e.g. "I love you (male)" and "I love you (female)". Now think of even more specific tags, like "literal" and "metaphorical" translation. Which would provide the best examples? Probably the latter one (see discussion here http://tatoeba.org/deu/sentence...334#comments). "Literal" translations really are just that, translations. They probably don't make good example sentences. A good logic would need to take care of that.

As a programmer, I would need answers here (I'm of course speaking from the perspective of integrating Tatoeba with Eclectus; sysko should know :). While tags provide good logic for humans, they probably complicate things for me (and most probably for others).

So my basic point is: don't neglect the machine-readable side in favor of the human-readable website.

Not sure if you pondered all the cases where Tatoeba data could be used. But maybe such an analysis could assist your design decisions.
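
To make the machine-readable concern concrete, here is a rough Python sketch of tag-aware example selection. Everything in it is an assumption for illustration: the `Sentence` class, the `pick_examples` function, and the idea of simply ranking sentences tagged "literal" last. It is not how Tatoeba or Eclectus actually work.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sentence:
    # Hypothetical record: the sentence text plus its free-form tags.
    text: str
    tags: set = field(default_factory=set)


def pick_examples(sentences, word, avoid=frozenset({"literal"})):
    """Return sentences containing `word`, with avoided tags ranked last."""
    word = word.lower()
    matches = [
        s for s in sentences
        if word in re.findall(r"\w+", s.text.lower())
    ]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(matches, key=lambda s: bool(s.tags & avoid))


examples = pick_examples(
    [Sentence("I love you.", {"literal"}), Sentence("Love is blind.", {"proverb"})],
    "love",
)
print([s.text for s in examples])  # ['Love is blind.', 'I love you.']
```

Even this toy version needs to know which tags matter and what they mean, which is the kind of thing a consumer of the data cannot guess from free-form tags alone.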

CK, June 10, 2010 at 2:14:49 PM UTC (edited October 25, 2019 at 8:09:48 AM UTC)

[not needed anymore- removed by CK]

TRANG, June 14, 2010 at 10:39:13 PM UTC

Until we have a "links" section, perhaps you can simply put these kinds of links in your profile.

It will certainly make them easier to find than if they get buried in the Wall.

brauliobezerra, June 14, 2010 at 5:06:13 PM UTC

Trang or sysko, the "csv" export still has leading \n and \t escaped with a '\'. For example, French sentences from 181804 to 181859. Please use some trim function :D.
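
For illustration, a trim step along these lines might look like the Python sketch below. It assumes the stray characters are either real whitespace or the literal two-character sequences `\n` and `\t` at the edges of a field; the function name is made up, and the actual export code is not shown in this thread.

```python
import re

# Runs of real whitespace, or literal backslash-n / backslash-t sequences,
# anchored at the start or the end of the field.
_EDGE_JUNK = re.compile(r"^(?:\s|\\n|\\t)+|(?:\s|\\n|\\t)+$")


def trim_field(text: str) -> str:
    """Strip leading/trailing whitespace and escaped \\n / \\t sequences."""
    return _EDGE_JUNK.sub("", text)


print(trim_field("\\n\\tBonjour."))  # Bonjour.
```

Escaped sequences in the middle of a sentence are left alone. A field that legitimately starts with a backslash followed by "n" or "t" would be clipped, so this would need a closer look before being used on real data.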

blay_paul, June 13, 2010 at 9:52:15 AM UTC

[@moderator] Should be deleted

This list can now be substituted by adding the tag "to delete" to the sentence. You should still add a comment explaining why.

Note that the moderators may not speak the same languages you do - so an additional explanation in English may speed things up. :-)

blay_paul, June 13, 2010 at 6:29:35 PM UTC

Actually I just thought of one problem - there isn't a way to search for tags yet, is there?

CK, June 14, 2010 at 11:11:45 AM UTC (edited October 25, 2019 at 8:09:16 AM UTC)

[not needed anymore- removed by CK]

blay_paul, June 14, 2010 at 11:24:42 AM UTC

Yeah, we should be a little careful applying proverb tags to sentences with [proverb]. There may be [proverb] markers on non-proverb English sentences that are translations of Japanese proverbs.

TRANG, June 13, 2010 at 12:45:30 AM UTC

Tags!

http://blog.tatoeba.org/2010/06...12th-2010.html

To quote myself:
"We count on everyone to try and help us figure out what works best. Feel free to discuss about issues related to tags on the Wall."

CK, June 13, 2010 at 2:13:41 AM UTC (edited October 25, 2019 at 8:09:29 AM UTC)

[not needed anymore- removed by CK]

TRANG, June 14, 2010 at 10:35:05 PM UTC

Concerning "checked by X people" => yes, it's something we have thought of, but for technical reasons we didn't do it. sysko may tell you more about it, but at any rate, it was not very urgent.

Concerning the pre-defined tags, we also thought of it. I forgot to talk about it in the blog post... But I'm still unsure of what to put in the list of pre-defined tags. I actually first want to let all users tag with whatever they want.

I also remember I forgot to talk about auto-completion. Well in general there are lots of things we can do with tags =)

I'm okay with using "OK" instead of "checked". Although I'm not totally sure if people will understand that it means the sentence has been proofread, but it's shorter and we can later add descriptions to tags. We can also change the name if it turns out to be too confusing.

CK, June 13, 2010 at 2:15:49 AM UTC (edited October 25, 2019 at 8:09:22 AM UTC)

[not needed anymore- removed by CK]

CK, June 12, 2010 at 4:39:35 PM UTC (edited October 25, 2019 at 8:09:35 AM UTC)

[not needed anymore- removed by CK]

TRANG, June 13, 2010 at 12:40:45 AM UTC

Well, the definition of "sentence" is still not exactly clear, to me at least. I mean, sometimes a "sentence" can be just a word... Like "Hello".

One thing is sure: you should avoid adding something that is clearly a partially formed sentence. Instead of "to be in love", you should add "He is in love" (for instance).

But in general, we are not very strict on the matter of what is accepted or not because we haven't decided yet what is a sentence.

We have only decided that a sentence has punctuation :)

The other problem is that I actually wouldn't even know what word to use instead of "sentence"...

blay_paul, June 13, 2010 at 6:24:04 AM UTC

> The other problem is that I actually wouldn't even know
> what word to use instead of "sentence"...

I think 'sentence' is a useful approximation - especially if you take off your grammatician hat. ;-)

hamid, June 9, 2010 at 11:09:46 AM UTC

I thought Tatoeba supported my native language (Farsi).
But now I know it was just a thought.

hamid, June 9, 2010 at 2:31:03 PM UTC

Thanks. So, I'll start adding some sentences. After a while, I will ask you to add my language to your list.
I think this is better, because I'm not that active and have no free time.

sysko, June 9, 2010 at 2:33:15 PM UTC

No problem. Even if there's only a dozen, it's enough; most of the time that's enough to attract more people to contribute in your language :)

hamid, June 9, 2010 at 2:47:39 PM UTC

Ok. But could you tell me how to stick a flag?

sysko, June 9, 2010 at 2:50:12 PM UTC

The flag will be added by us to the interface, at the same time as the language itself.

Pharamp, June 9, 2010 at 2:39:32 PM UTC

Hi Hamid :)
Which flag do you think should be used?
Iran, Afghanistan...?

hamid, June 9, 2010 at 2:43:26 PM UTC

Of course Iran (Farsi).
Afghanistan is Pashto.

blay_paul, June 9, 2010 at 11:31:48 AM UTC

You can add sentences now and correct the flag when it's added to the list. Probably wouldn't take long.

sysko, June 9, 2010 at 1:14:17 PM UTC

Hi hamid, the fact that your language is not in the list doesn't mean we don't want it or won't support it. We only add a language to the list once we have some sentences in it; otherwise the list would be full of hundreds of languages with 0 sentences, which would not be comfortable for users.
But if you're ready to add some sentences in your language, we will be glad to add it to the list :)

TRANG, June 13, 2010 at 12:28:34 AM UTC

Your language has been added :)

You can now set your sentences to "Persian".

Little note: for the name of the language we used "Persian" and not "Farsi", as Wikipedia says it's "the more widely used name of the language in English".

http://en.wikipedia.org/wiki/Persian_language

xtofu80, June 3, 2010 at 8:12:20 AM UTC

I am not sure whether this is worth discussing, but there are some sentences which are really redundant, e.g.
162883, 83091, two rather long sentences which only differ in the subject being "my mom" vs. "my dad".
Shouldn't we remove one of such pairs and concentrate on the gist instead of wasting our efforts on translating countless variants?
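
As a rough illustration of what "only differ in the subject" could mean for a script, here is a small Python sketch that flags sentence pairs where only a small fraction of the words change. The function names, the 25% threshold, and the sample sentences are all made up for this example (they are not sentences 162883 and 83091), and this is not the duplicate removal script used on the site.

```python
from difflib import SequenceMatcher


def changed_fraction(a: str, b: str) -> float:
    """Fraction of whitespace-separated tokens that differ between a and b."""
    ta, tb = a.split(), b.split()
    if not ta and not tb:
        return 0.0
    matcher = SequenceMatcher(None, ta, tb, autojunk=False)
    changed = sum(
        max(i2 - i1, j2 - j1)
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    )
    return changed / max(len(ta), len(tb))


def is_near_duplicate(a: str, b: str, max_fraction: float = 0.25) -> bool:
    """True if the sentences differ, but only in a small share of tokens."""
    return 0 < changed_fraction(a, b) <= max_fraction


print(is_near_duplicate(
    "My mom bought me a new bike yesterday.",
    "My dad bought me a new bike yesterday.",
))  # True
```

Exact duplicates return False here on purpose, since they are a separate case, and very short sentences rarely qualify because one changed word is already a large share of the sentence.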

CK, June 12, 2010 at 3:31:00 AM UTC (edited October 25, 2019 at 8:09:42 AM UTC)

[not needed anymore- removed by CK]

xtofu80, June 12, 2010 at 10:43:16 AM UTC

Hi CK,
I completely agree with your notion of near duplicates versus clutter.
I think that besides "dealing" with clutter that already exists, we should also put some effort into guidelines about creating new content.

TRANG, June 11, 2010 at 7:08:29 PM UTC

Okay, I haven't replied to this yet, so I will, to clarify our position on "variations" of sentences.

Our position is: people can do whatever they like. If they want to add all the possible variations, they can. If they don't want to, they don't have to.

It doesn't hurt to have "near duplicates". It just makes Tatoeba a bit noisy. But that's our job, as engineers, to figure out how to filter and organize data so that it can be used efficiently by language learners.

Meanwhile, as sysko said, variations of sentences can be very useful for language processing, so we shouldn't delete them.

blay_paul, June 11, 2010 at 7:43:26 PM UTC

Just to clarify the clarification. Near duplicates will be removed from WWWJDIC - but not by deleting them from Tatoeba. So feel free to point out Japanese sentences and English sentences linked to Japanese sentences that are near duplicates.

xtofu80, June 19, 2010 at 1:26:52 PM UTC

Hi Paul, I saw you always post a comment "Not for WWWJDIC" in each sentence. Shouldn't that be solved by using tags?

blay_paul, June 19, 2010 at 1:33:44 PM UTC

I could, but I started doing that before tags existed.

It also gives people a chance to notice what sentences I'm excluding and ask why (or just complain ;-).

xtofu80, June 19, 2010 at 1:53:38 PM UTC

So do you filter the sentences according to your comment, or do you mark them somewhere else AND put a comment in?
I just want to know how we should approach sentences we find should not appear there (e.g. hiragana-kanji variants of exactly the same sentence.)

blay_paul, June 19, 2010 at 2:08:01 PM UTC

In the secret sentence annotation page, where the Japanese index can be entered / edited, I put -1 in the meaning field.

No one else can see that so the note is just to let people know what I'm doing (generally excluding near-duplicate sentences from WWWJDIC).

sysko, June 3, 2010 at 10:34:31 AM UTC

On the other hand, I'm working with another guy on a machine-learning-based automated translator, and these kinds of "near" duplicate sentences are REALLY useful.

sysko, June 3, 2010 at 10:37:24 AM UTC

In fact, as a learner I also sometimes like to find this kind of sentence where only one part changes; it's easier to see some grammar points this way. For example, in French, replacing "my mom" with "my dad" can change the verbs and adjectives elsewhere in the sentence, and it's always interesting to see that variation on the same sentence.

Swift, June 3, 2010 at 5:23:29 PM UTC

On this point, I've chosen to add these nuances in comments. There are otherwise just going to be way too many similar sentences.

blay_paul, June 3, 2010 at 9:24:01 AM UTC

> Shouldn't we remove one of such pairs and concentrate on
> the gist instead of wasting our efforts on translating
> countless variants?

There is a constant effort to remove near-duplicates. At the current rate we're probably losing a couple of dozen a week, if not more.

However, removing duplicates does not produce _new_ content. And new content is what's needed to fill out Tatoeba and make it more appealing.

xtofu80, June 3, 2010 at 10:06:29 AM UTC

Yes, you are right, producing new content is also important, though as a native German speaker I am right now mostly busy adding German translations to the already existing Japanese-English sentence pairs. And that's when I came across these near-duplicates.
Currently I am thinking about how I could involve my Japanese language exchange partner to produce some content. At the very least, I will check some sentences I found dubious with her.

So what would be the best procedure if I come across such a sentence pair? Make a comment? Add it to the "mark for deletion" list?

sysko, June 3, 2010 at 10:44:48 AM UTC

Moreover, I think the problem here is not whether or not to have these countless variants (for the reasons below I would prefer to keep them), but rather "how to show contributors only 'useful' sentences".

blay_paul, June 10, 2010 at 1:00:04 PM UTC

Please could we have the duplicate removal script run soon? (Before Saturday, anyway)

Pharamp, June 10, 2010 at 5:32:08 PM UTC

I would also like to ask for a manual update of the Launchpad translations (in the Tatoeba > Launchpad direction) so we can translate all the new stuff :) merci^^

TRANG, June 11, 2010 at 12:55:13 AM UTC

Done.