Wall - Tatoeba

Wall (6,960 threads)

Demetrius, August 22, 2010 at 9:18:52 PM UTC

[@sysko and anyone interested]
Uzbek script-switching code is here:
http://uyghur.webatu.com/uzb/uzbek_script.zip

The Cyrillic>Latin should work fine, Latin>Cyrillic may have problems with Russian loanwords. Also, it may be slow, since it’s a simple PHP str_replace w/ 2 arrays.

A simple page for anyone to toy with the transliteration:
http://uyghur.webatu.com/uzb/
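
As a rough illustration of the two-array approach (sketched in Python, not the PHP from the archive above, which is not reproduced here): each entry of one array is replaced by the entry at the same position in the other. The table is a small hand-picked subset chosen for illustration, and the function name is made up.

```python
# Illustrative subset of an Uzbek Cyrillic -> Latin table (assumption:
# not taken from uzbek_script.zip; a complete table has more entries).
CYRILLIC = ["ш", "ч", "қ", "ҳ", "а", "б", "з", "и", "к", "о", "р", "т"]
LATIN = ["sh", "ch", "q", "h", "a", "b", "z", "i", "k", "o", "r", "t"]


def cyr_to_lat(text):
    """Replace each CYRILLIC[i] with LATIN[i], one pair after another,
    the way PHP's str_replace() behaves when given two arrays."""
    for src, dst in zip(CYRILLIC, LATIN):
        text = text.replace(src, dst)
    return text


print(cyr_to_lat("китоб"))
```

Because the pairs are applied one after another over the whole string, this is only safe when the output of one pair cannot be matched by a later pair. That holds for Cyrillic>Latin, since the two alphabets do not overlap.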

blay_paul, August 22, 2010 at 9:01:16 PM UTC

There are about 26 sentences with the 'Bosnian' tag. Is that enough for a flag?

FeuDRenais, August 22, 2010 at 9:02:30 PM UTC

Yes. Especially because there are going to be lots, lots more...

Demetrius, August 22, 2010 at 9:15:31 PM UTC

Are there any Bosnian and Croatian sentences that do NOT match the Serbian ones?.. :o

FeuDRenais, August 22, 2010 at 9:24:13 PM UTC

Nope :-)

I add all three versions together when I add Bosnian/Croatian.

FeuDRenais, August 22, 2010 at 9:27:08 PM UTC

We really need the flags before the duplicate-script removal starts eliminating Bosnian sentences as duplicates of the Croatian... ;-)

(just kidding, I differentiated all the flags)

FeuDRenais, August 22, 2010 at 9:02:56 PM UTC

(Croatian as well, please)

blay_paul, August 21, 2010 at 11:44:48 AM UTC

Quick request.

Could you add a link from the sentence annotations page of an example to the example itself?

e.g.
http://tatoeba.org/eng/sentence...ns/show/118697
should have a link to
http://tatoeba.org/eng/sentences/show/118697

FeuDRenais, August 20, 2010 at 6:12:59 PM UTC

It seems silly that this hasn't been brought up before, but why not add a note about the need for periods and capitalization? I think a lot of new users are just not aware of this point until they get commented on.

TRANG, August 20, 2010 at 9:09:54 PM UTC

We'll be processing sentences to automatically add missing periods, capital letters and fix other typography issues. Sacredceltic had actually brought this up already.

But it's true we can add a note about punctuation and capital letters. I'll append it to the red warning message below the translation form.

Demetrius, August 21, 2010 at 7:56:16 PM UTC

BTW, we should be careful when adding sentences with periods.

Sometimes there is non-standard 1337-like ‘punctuation’ such as "!!1" at the end.

I’m not sure whether we should keep such sentences though...

blay_paul, August 21, 2010 at 9:28:04 PM UTC

I think non-standard punctuation is appropriate when the rest of the example is of the same style. (This applies to the ♪'s used to end some Japanese sentences).
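
A minimal sketch of the kind of clean-up discussed in this thread, assuming the rule is simply "capitalize the first letter, add a period if nothing closes the sentence". It is not Tatoeba's actual script. The set of accepted endings and the pattern for 1337-style endings such as "!!1" are assumptions, and the function name is made up.

```python
import re

# Endings treated as already closing the sentence (assumed set; the
# musical note covers the Japanese examples mentioned above).
TERMINAL = ".!?…。！？♪"
# 1337-style endings such as "!!1" are left untouched.
LEET_ENDING = re.compile(r"[!?]+1+$")


def tidy(sentence):
    """Capitalize the first letter and append a period if the sentence
    has no closing punctuation."""
    s = sentence.strip()
    if not s:
        return s
    s = s[0].upper() + s[1:]
    if s[-1] in TERMINAL or LEET_ENDING.search(s):
        return s
    return s + "."
```

A real run would need per-language rules: as written, a sentence in a script that does not use "." would still get one appended.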

saeb, August 21, 2010 at 3:07:22 AM UTC

I've got ~3k sentences with the period at the beginning -_-' (help, plz?)

sysko, August 21, 2010 at 11:33:11 AM UTC

Given how long we've known each other, I think I can do it for you :p
Can you give me the ID of a sentence with the period in the correct position and one with it in the wrong position? In the database I'm a bit confused about how Arabic script is displayed.

saeb, August 22, 2010 at 12:36:20 PM UTC

thank you so much sysko

period at the beginning:
http://tatoeba.org/eng/sentences/show/370561

period at the end:
http://tatoeba.org/eng/sentences/show/370569
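
A possible shape for the fix, as a sketch only. It assumes the stray period really is the first character of the stored string (text is stored in reading order, whatever the display direction), which is worth checking against the two sentences above before touching ~3k rows. The function name is made up.

```python
def move_leading_period(sentence):
    """Move a single period from the start of the string to its end."""
    s = sentence.strip()
    # Leave "..." style openings and already-terminated sentences alone.
    if s.startswith(".") and not s.startswith("..") and not s.endswith("."):
        return s[1:].lstrip() + "."
    return s
```

Sentences that already end with a period are returned unchanged, so running it twice over the same data should be harmless.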

blay_paul, August 20, 2010 at 6:45:52 PM UTC

It sounds like a good idea to me.

blay_paul, August 20, 2010 at 4:14:13 PM UTC

Thursday WWWJDIC examples status update.

30 records deleted.
13 records added.

Swift, August 19, 2010 at 6:20:02 PM UTC

I've been doing a bit of thinking about tags. So far, I've not figured out how to remove a tag from a sentence and browsing the tags page,
http://tatoeba.org/eng/tags/view_all
there seems to have been quite a proliferation since they were introduced. Now that we have a bit of experience with these, it might be a good idea to figure out where we want to take them and how to structure them for that purpose.

I grabbed the list of tags and sorted them out a bit at
http://martin.swift.is/tatoeba/tags.html
A few thoughts:

1) There seem to be a number of empty and duplicate tags. Can Trang or Sysko easily delete these?

2) The @-rule seems to be a good one but some of the tags should be merged (sentences with the native-check tag should probably be orphaned so that a sufficiently proficient speaker can adopt them).

3) The biggest group of tags (the way I sorted them) is the quotations. Seeing how a sentence can just as well be about a person as a quote by that person, I think the "by-" and "from-" prefixes are useful for both clarification and sorting.

4) Tag language and script. I'm as much of an l10n/i18n freak as the next person, but seeing how that will be sorted out in the near(?) future (there was a post about this at some point recently, wasn't there?) I think we should translate/transliterate the non-English ones and stick with that until we can translate tags. We could possibly include the extended Latin character set but even that isn't necessary for the current purpose.

5) Given the sheer number of tags, it might be nice to have these split up along some lines. How that is done is going to be pretty subjective, but I'm sure we can hash something out. Is there a way to add a field on the tags in the database to categorise them?

6) We might want to consider tossing out some of the tags. 'Fruit' may be fine, but 'apples' just seems a bit useless given the search feature.

Finally, I think the tags can be a fantastic tool, but so far they seem a bit disorganised and unwieldy to be used on a mass scale. Some statistics on how many sentences are tagged and how many sentences each tag has would be very interesting to gauge how people are using them.

FeuDRenais, August 19, 2010 at 11:23:43 PM UTC

> (sentences with the native-check tag should probably be orphaned so that a sufficiently proficient speaker can adopt them)

I disagree on this point. There's a reason why it's "Needs Native Check" (rather than "Needs Native Parent"). The creator of the phrase should keep the phrase, and there's no more work in checking than in parenting (in fact, parenting would involve the check anyway).

Swift, August 20, 2010 at 3:35:17 PM UTC

What's the point in parenting a sentence if you're not confident that it's correct?

There *is* slightly more work (and arguably less incentive) in leaving a note on someone's sentence and for them to respond, than for a "checker" to simply adopt the sentence and fix it themselves.

(Oh, and I don't think the tags system is tried and trusted enough to warrant an appeal to tradition.)

Demetrius, August 20, 2010 at 3:47:30 PM UTC

Leaving a comment lets the non-native owner learn the correct variant or be confident in their translations.

FeuDRenais, August 20, 2010 at 3:55:46 PM UTC

There are multiple points in doing so:

1) The learning principle. A native speaker correcting it serves as a lesson. One side educates, and the other rectifies a certain mistake and probably won't repeat it. If you go for pure efficiency, then you annihilate this aspect, which, IMO, would be a pity. You'd probably want an environment where people can grow. That involves making mistakes and learning.

2) People grow attached to sentences. Absurd, but we do. They're often a reflection of a person's wit, identity, or interests. And if you're only, say, 95% sure of the translation and would like a check, seeing it orphaned and then with another person's name on it is... well, strange.

3) The native check is mostly to make sure that the grammar and the naturalness of the sentence don't have any issues, while the originator of the sentence has a responsibility towards all the translations that it's linked to, and how the sentence fits in on the whole. In this sense, it *is* more work. The person adopting the sentence has to be able to guarantee that all the translations remain valid when they fix it.

Demetrius, August 20, 2010 at 10:45:57 AM UTC

@FeuDRenais
Also, there was a proposal to write all tags lowercase:
http://tatoeba.org/eng/wall/sho...8#message_1588

TRANG, August 19, 2010 at 9:13:23 PM UTC

> So far, I've not figured out how to remove a tag from a
> sentence

If you are not a moderator, you can only remove tags that you have added. You can't remove other people's tags.

> Now that we have a bit of experience with these, it might
> be a good idea to figure out where we want to take them
> and how to structure them for that purpose.

Thanks for taking the time ^^ CK had also tried to sort the tags: http://a4esl.com/temporary/tatoeba/
(cf. "Some of the Tags That I've Found")

> 1) There seem to be a number of empty and duplicate tags.
> Can Trang or Sysko easily delete these?

Yes.

> 4) Tag language and script. [...] I think we should
> translate/transliterate the non-English ones and stick
> with that until we can translate tags.

It's not always easy to translate, and sometimes a non-English tag may be more appropriate/accurate. For instance if you tag a French sentence with "preterite", it can be ambiguous whether "preterite" refers to the "passé simple" or to the "imparfait". So if you're going to put a tag that refers to a tense, it's probably better to use the name in the language of the sentence. That's one case I can think of but there are probably other cases...

I still agree though, that we should use English tags by default, until we can translate tags.

> 5) Is there a way to add a field on the tags in the
> database to categorise them?

At the moment, no.

> 6) We might want to consider tossing out some of the
> tags. 'Fruit' may be fine, but 'apples' just seems a bit
> useless given the search feature.

I agree.

> Finally, I think the tags can be a fantastic tool, but so
> far they seem a bit disorganised and unwieldy to be used
> on a mass scale.

I agree. We do have more plans for tags, but they're not a priority yet...

> Some statistics on how many sentences are tagged and how
> many sentences each tag has would be very interesting to
> gauge how people are using them.

Actually if you want to do some stats, you can use this file (it's not part of the official downloads files yet):
http://tatoeba.org/files/downloads/tags.csv

The fields are: sentence_id, tag_name

It's exported every week.
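
For anyone who wants those statistics, a minimal sketch of counting from the two fields named above. The tab delimiter is an assumption (adjust it to whatever the file actually uses), and the function names are made up.

```python
import csv
from collections import Counter


def tag_stats(rows):
    """rows: iterable of (sentence_id, tag_name) pairs.
    Returns (number of distinct tagged sentences,
             Counter mapping tag name -> number of sentences)."""
    pairs = set(rows)  # ignore any repeated (sentence, tag) pair
    tagged = {sentence_id for sentence_id, _ in pairs}
    per_tag = Counter(tag_name for _, tag_name in pairs)
    return len(tagged), per_tag


def read_tags(path, delimiter="\t"):
    # Assumption: one "sentence_id<delimiter>tag_name" record per line.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) >= 2:
                yield row[0], row[1]
```

With a local copy of the file, `tag_stats(read_tags("tags.csv"))` gives both numbers, and `per_tag.most_common(20)` lists the most used tags.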

Swift, August 20, 2010 at 3:12:47 PM UTC

> If you are not a moderator, you can only remove tags that you have added. You can't remove other people's tags.

Fair enough. :-)

> Thanks for taking the time ^^ CK had also tried to sort the tags: http://a4esl.com/temporary/tatoeba/
> (cf. "Some of the Tags That I've Found")

Interesting.

>> 1) There seem to be a number of empty and duplicate tags.
>> Can Trang or Sysko easily delete these?
>
> Yes.

OK, then let's start by deleting the ones in:
<http://martin.swift.is/tatoeba/tags.html#empty>
and then look at merging duplicates.

Tatoeba, by the way, doesn't seem to show any difference between an empty and non-existent tag.

>> 4) Tag language and script. [...] I think we should
>> translate/transliterate the non-English ones and stick
>> with that until we can translate tags.
>
> It's not always easy to translate, and sometimes a non-English tag may be more appropriate/accurate.

I reckon the best solution is to use whatever term would be used on the English translation once we're able to translate tags.

>> 6) We might want to consider tossing out some of the
>> tags. 'Fruit' may be fine, but 'apples' just seems a bit
>> useless given the search feature.
>
> I agree.

OK, we can have a look at this once we've gotten rid of some of the empty and duplicate categories.

>> Some statistics on how many sentences are tagged and how
>> many sentences each tag has would be very interesting to
>> gauge how people are using them.
>
> Actually if you want to do some stats, you can use this file (it's not part of the official downloads files yet):
> http://tatoeba.org/files/downloads/tags.csv

Great! Thanks.

sysko, August 20, 2010 at 3:19:26 PM UTC

I've deleted the empty tags this morning (GMT +1h).
If I have time, I'll look at distinguishing between a non-existent tag and an empty one (in fact there is a difference: a non-existent tag shows an empty page, while an empty tag at least shows the name of the tag in the title :p ).
The next release (tomorrow) will show tags sorted by number of tags, and add autocompletion when adding a tag. This should make tags a bit more useful and avoid most mistyped / duplicate tags.

Swift, August 20, 2010 at 3:51:57 PM UTC

> I've deleted the empty tags this morning (GMT +1h)

Great! Thanks.

> Next release (tomorrow) will show tags sorted by number of tags

I'm actually not convinced that this is terribly useful. My impression is that one would use the tags to find a particular topic (in which case, alphabetical ordering would make the most sense) rather than just any popular one (as one might when visiting a blog; which is why things like tag-clouds are appropriate there).

Seeing how the great number of tags would benefit from categorisation, it might actually be helpful to have a semi-automated list of tags... Tatoeba is currently written in PHP, right?

Seeing how many sentences are filed under a tag would be useful, though.

> and autocompletion when adding a tag

This will be a fantastic addition!

sysko, August 20, 2010 at 3:56:03 PM UTC

Yep, it's just that I added the count while adding the autocompletion, so that tag names with more uses appear first in the suggestions. At the moment my personal life is a bit busy, so although categorisation would certainly be great, I haven't found time to do it yet ^^
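
The count-ordered suggestions described here could work roughly as follows. This is a sketch, not the site's code: the function name, the case-insensitive prefix match and the alphabetical tie-break are all assumptions.

```python
def suggest(prefix, tag_counts, limit=5):
    """Return up to `limit` tag names starting with `prefix`,
    most-used first, ties broken alphabetically."""
    prefix = prefix.lower()
    matches = [t for t in tag_counts if t.lower().startswith(prefix)]
    matches.sort(key=lambda t: (-tag_counts[t], t.lower()))
    return matches[:limit]
```

Offering the existing spelling of a tag before the user finishes typing is what would cut down on near-duplicates.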

Demetrius, August 20, 2010 at 10:49:29 AM UTC

Some more thoughts about tags.

BTW, IMHO tags should be as general as possible: not “2nd Person Formal”, but “2nd Person”, “Formal”. This makes them more useful for automated processing.

sysko, August 20, 2010 at 10:53:08 AM UTC

blay_paul, August 19, 2010 at 6:37:59 PM UTC

> So far, I've not figured out how to remove a tag from a sentence

First step - become a moderator.

Demetrius, August 20, 2010 at 10:54:29 AM UTC

> So far, I've not figured out how to remove a tag from a sentence

The algorithm is:
a) add the same tag to any other sentence (temporarily)
b) copy the delete link and replace the sentence number
c) delete your temporary tag

This is a bug. Use only when you’re absolutely sure it’s applicable.

Gruzilkin, August 20, 2010 at 10:24:31 AM UTC

Am I doing something wrong or is there no such function?

I want to search for English sentences that are NOT translated into Russian (so I could add translations), and I don't see how to do it.

FeuDRenais, August 20, 2010 at 10:37:48 AM UTC

Click on the English flag on the top right corner of the main page. This will give you ALL the English sentences available. Then on the right side you'll see options to narrow down the list. You want to choose the one "Show Sentences Not Directly Translated Into...", and then "Russian". Voilà.
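
To make "not directly translated" concrete, here is a sketch over made-up in-memory data. The language codes, the data layout and the function name are assumptions for illustration, not the site's schema.

```python
def not_translated_into(sentences, links, source="eng", target="rus"):
    """sentences: dict of sentence id -> language code.
    links: iterable of (id_a, id_b) direct-translation pairs.
    Returns the sorted ids of `source` sentences that have no direct
    link to any `target` sentence."""
    translated = set()
    for a, b in links:
        for src, dst in ((a, b), (b, a)):  # a link counts in both directions
            if sentences.get(src) == source and sentences.get(dst) == target:
                translated.add(src)
    return sorted(i for i, lang in sentences.items()
                  if lang == source and i not in translated)
```

A sentence that reaches Russian only through a third language still shows up in the result, because only direct links are counted.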

FeuDRenais, August 20, 2010 at 10:39:45 AM UTC

As a side note, you should probably select "Show Translations in...", then "All Languages". Sometimes, there's already a good English-to-Russian translation, but it just hasn't been linked. For efficiency reasons, it might be better to just request that someone with the power to link link the two.

FeuDRenais, August 20, 2010 at 9:59:16 AM UTC

Question Regarding Transliterations:

How much work, and what exact steps, are needed to set up a transliteration system for a specific language? I ask because we have a number of languages now that can be written in multiple alphabets. For many of these, the task seems to be a very simple one, as a one-to-one letter correspondence between the alternative alphabets exists (e.g. Serbian). For some, it is only possible to do it in one way but not the other (unless a dictionary is available), but again, the task should not be a difficult one as the majority of the current entries are inputted on the good side of the one-way (e.g. Uighur, Uzbek, and - I think, but Demetrius could confirm - Tatar).

So, what would one have to do to realize this?

Demetrius, August 20, 2010 at 10:39:35 AM UTC

It's easy, but I'm too lazy to do this. ^^ I've started working on Uzbek, maybe I'll finish it someday...

Also, sysko has to do transliteration caching. It will allow making transliteration more time-intensive (dictionary searches...).

> one-to-one letter correspondence between
> the alternative alphabets exists (e.g. Serbian).
But injekcija = инјекција, not ињекција. So you need a dictionary when transcribing from Latin...

> Uighur
Latin > Arabic is the easiest (unless people omit ' and don't differentiate j/zh ^^).
Since we have no Latin Uyghur sentences, there is no rush.

Others require a LARGE dictionary of proper names, since Arabic has no capital letters. Cyrillic requires a dictionary of Russian loanwords.

> and - I think, but Demetrius could confirm - Tatar
No, this one is tricky in both directions. The hardest part is q and ğ. Usually к = k and г = g w/ front vowels, к = q and г = ğ with back ones.

But Arabic words break vowel harmony:
нигъмәт — niğmət ‘dish’ (ğ is marked as гъ), сәгать — səğət ‘clock’ (ğ is marked by changing vowel letter, the vowel quality is marked by the soft sign), сәгатем — səğətem ‘my clock’ (ğ is marked by a vowel letter w/out a soft sign)

Russian loanwords break vowel harmony in another way: they force K and G even near back vowels.

Also, there are W and V:
В = V (вагон — vagon ‘carriage’), W (авыл — awıl ‘village’)
У = U (су — su ‘water’), W (тау = taw ‘mountain’)
Ү = Ü (күрү — kürü ‘see’), W (Мәскәү — Məskəw ‘Moscow’)
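
The basic к/г rule above can be sketched as a nearest-vowel check. This covers only the default rule; the vowel sets, the "following vowel first" order and the function name are assumptions, and it deliberately has no dictionary, so it gets loanwords wrong.

```python
# Assumed vowel classes for lowercase Tatar Cyrillic.
BACK_VOWELS = set("аоуы")
FRONT_VOWELS = set("әөүеиэ")


def velar(word, i):
    """Latin letter for the к or г at position i of a lowercase word,
    judged only by the nearest vowel (following ones checked first)."""
    front, back = {"к": ("k", "q"), "г": ("g", "ğ")}[word[i]]
    # Scan forward from position i, then backward from it.
    for ch in word[i + 1:] + word[:i][::-1]:
        if ch in BACK_VOWELS:
            return back
        if ch in FRONT_VOWELS:
            return front
    return front  # no vowel found at all
```

It gives k for күрү and ğ for сәгать, matching the examples above, but it also gives ğ for вагон, where the correct letter is g. That is exactly the loanword problem described.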

FeuDRenais, August 20, 2010 at 10:56:17 AM UTC

> But injekcija = инјекција, not ињекција. So you need a dictionary when transcribing from Latin...

Good point. But transliteration can't handle letter pair --> single letter correspondence? That would be much easier than a dictionary. Unless there are instances of "nj" that are нј and not њ, but this is never the case. Anyway, there are nearly no Latin Serbian submissions up to now, and so a one-way from the Cyrillic (if that's what it comes down to), is perfectly okay, IMO.

> Uighur

I disagree here. You'd need a big dictionary for Latin to Arabic. One particular example is n+g and ng (نگ and ڭ). The Latin "ng" could be transliterated as either. I think there are other cases, as well (personally, I don't much like the Latin Uighur...)

Arabic to Latin WITHOUT proper names is perfectly all right, in my opinion. It's not perfect, but it would still make a world of difference and people would usually be able to figure out what should be capitalized anyway.

Demetrius, August 20, 2010 at 11:02:51 AM UTC

> Good point. But transliteration can't handle
> letter pair --> single letter correspondence?
Of course it can. I mean that usually nj = њ, but in the word injekcija there's a morpheme boundary and нј should be retained.

> Unless there are instances of
> "nj" that are нј and not њ,
> but this is never the case.
Инјекција!

> I disagree here. You'd need a big dictionary
> for Latin to Arabic.
In fact, you don't need a dictionary at all for Latin>Arabic. Only for zh/j and ' if people omit these.

For all the other directions you need a dictionary.

> The Latin "ng" could be transliterated as either.
No, as far as I know. Latin requires breaking these with ': ng and n'g.

> I think there are other cases,
> as well (personally, I don't
> much like the Latin Uighur...)
But it's indeed very easy to process.
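
For the Serbian nj case discussed above, a sketch of "digraphs first, plus a small exception dictionary". The letter table is the usual Latin/Cyrillic correspondence. The exception list is hypothetical and holds only the one word from this thread, and the sketch handles lowercase input only.

```python
import re

DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
SINGLE = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "đ": "ђ",
    "e": "е", "ž": "ж", "z": "з", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р",
    "s": "с", "t": "т", "ć": "ћ", "u": "у", "f": "ф", "h": "х",
    "c": "ц", "č": "ч", "š": "ш",
}
# Hypothetical exception dictionary: words where n+j stays нј.
EXCEPTIONS = {"injekcija": "инјекција"}


def word_to_cyrillic(word):
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    out, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:  # try the two-letter match first
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(SINGLE.get(word[i], word[i]))
            i += 1
    return "".join(out)


def latin_to_cyrillic(text):
    # Convert word by word so the exception lookup sees whole words.
    return re.sub(r"\w+", lambda m: word_to_cyrillic(m.group(0)), text)
```

Whole-word lookup means inflected forms of an exception would each need their own entry, which is one reason the dictionary could grow.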

FeuDRenais, August 20, 2010 at 11:18:38 AM UTC

> Инјекција!

I should have said "this is never the case, with the exception of a few foreign words". A small dictionary could be made for those, but the loss isn't great if the transliteration doesn't handle them properly.

> No, as far as I know. Latin requires breaking these with ': ng and n'g.

In that case, it's fine.

> I disagree here. You'd need a big dictionary
> for Latin to Arabic.

I meant Arabic to Latin.

sysko, August 20, 2010 at 10:52:33 AM UTC

Here is what I think I can do:
have an automatic transliteration tool for each language;
each time a sentence is added or updated, or if its transliteration is missing, we call the tool and store the result in a dedicated table of the database;
if the transliteration already exists, we just retrieve it from the database.

And maybe add a special page for trusted users, to give them the possibility to edit the stored transliteration.
This way:
your dictionary will not need to handle eeeeevery case. As soon as it handles the most common cases we can deploy it, even if some particular sentences need a manual edit. This way we can complete the dictionary step by step and make the feature available sooner.
And when a transliteration is edited it will be flagged, so that if for some reason we update the tool and regenerate the transliterations, we will not erase the manually edited ones (as we can assume they're right).
I think this is the best thing we can do, as for a lot of languages a 100% correct transliteration tool is a dream anyway (even with long-running tools such as mecab and adso we reach, I think, 90%).
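
The store-then-flag scheme described above could be sketched like this, using an in-memory SQLite database for illustration. The table layout, the column names and every function name are assumptions, not Tatoeba's schema.

```python
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transliterations ("
        " sentence_id INTEGER PRIMARY KEY,"
        " text TEXT NOT NULL,"
        " edited INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def lookup(db, sentence_id, sentence, tool):
    """Return the stored transliteration, generating it if missing."""
    row = db.execute(
        "SELECT text FROM transliterations WHERE sentence_id = ?",
        (sentence_id,),
    ).fetchone()
    if row:
        return row[0]
    text = tool(sentence)
    db.execute(
        "INSERT INTO transliterations (sentence_id, text) VALUES (?, ?)",
        (sentence_id, text),
    )
    return text


def save_manual_edit(db, sentence_id, text):
    """Store a trusted user's correction and flag it as edited."""
    db.execute(
        "INSERT OR REPLACE INTO transliterations"
        " (sentence_id, text, edited) VALUES (?, ?, 1)",
        (sentence_id, text),
    )


def regenerate(db, sentences, tool):
    """Re-run the tool on every sentence except flagged ones."""
    for sentence_id, sentence in sentences.items():
        row = db.execute(
            "SELECT edited FROM transliterations WHERE sentence_id = ?",
            (sentence_id,),
        ).fetchone()
        if row and row[0]:
            continue  # keep the manual edit
        db.execute(
            "INSERT OR REPLACE INTO transliterations"
            " (sentence_id, text, edited) VALUES (?, ?, 0)",
            (sentence_id, tool(sentence)),
        )
```

Here `tool` stands for whatever transliterator exists for the language. The `edited` flag is what lets an improved tool be re-run over everything without undoing human corrections.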

Demetrius, August 20, 2010 at 10:56:14 AM UTC

OK, that is a good idea. I'll send my (imperfect) Uzbek transliterator on Monday.

sysko, August 20, 2010 at 10:15:10 AM UTC

Demetrius has started to work on some of them; you can check with him what he has already done, what is hard to do, etc.
For the others, just give me a letter-to-letter / word-to-word transliteration file (like origin[tab]transliterate) and I think it will not take me long to integrate it into Tatoeba.
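
A file in that origin[tab]transliterate shape could be consumed along these lines. This is a sketch under assumptions: one pair per line, longer origins preferred over shorter ones, and function names that are made up.

```python
import re


def load_table(lines):
    """Parse 'origin<TAB>transliteration' lines into a dict."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        origin, translit = line.split("\t", 1)
        if origin:
            table[origin] = translit
    return table


def transliterate(text, table):
    """Replace every table key found in text in one left-to-right
    pass, preferring longer keys over shorter ones."""
    if not table:
        return text
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)
```

Doing the replacement in a single pass means the same file format works for both single letters and whole words, and already-replaced text is never matched again.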

Swift, August 19, 2010 at 1:21:32 PM UTC

I've been seeing if I'll just grow accustomed to the new contribute setup but so far I haven't.

The random sentences had the drawback that occasionally you hit sentences you'd seen before. Getting the latest sentences has the advantage that you more easily spot multiple sentences that can be linked to your translation, but increases the times one bumps into sentences one's already seen (and not translated for whatever reason -- the exclude direct translations feature is great, by the way).

Another advantage of getting sentences in the order they were contributed in is that you translate whole batches of similar sentences, making search results more well rounded even if they cover fewer topics.

How about flipping the order on the contribute page around (or adding the option), starting with the oldest sentences? That way one can work one's way through the batch, filtering out already translated sentences and starting after the ones one has chosen not to translate.

As a bonus, these are generally orphaned sentences and often need more attention than the ones active contributors are adding.

sysko, August 19, 2010 at 1:32:41 PM UTC

You can get orphan sentences by setting "translated into none" :)
As for the "several random sentences" page, I will bring it back soon. At the moment this page is slow as hell and was really badly designed internally, but I've figured out how to make it fast, so it will soon be as before.

The page which shows all the sentences in a language shows the newest first, because this favours collaboration: if I add a sentence, it has a better chance of being translated than an old one because it will appear on the first page for a while (depending on the language's activity; not sure "a while" will be very long for Russian or Esperanto :p). But you can go to the oldest ones by clicking on "last" :p (more seriously, yep, we can add a reverse order button)

Swift, August 19, 2010 at 5:49:49 PM UTC

Not quite sure what you mean by "translated into none". By "orphaned" I mean the sentences that aren't owned by anyone.

Going to the last page is certainly a workaround, but showing the oldest first would make browsing and finding where one left off much easier (as the reference point wouldn't be moving constantly).

The collaboration argument is a very good one, but it would still be nice to be able to choose to work on the back-log. :-)

Many thanks!

Swift, August 19, 2010 at 1:09:16 PM UTC

Would anyone be terribly upset if we got rid of one of these?
http://tatoeba.org/eng/sentences/show/222359
http://tatoeba.org/eng/sentences/show/222367

blay_paul, August 19, 2010 at 1:25:04 PM UTC

Not in this case - because the only difference is whether one word is in hiragana or in kanji.
