Wall (7120 threads)

debian2007, December 22, 2010 at 1:42:48 PM UTC

Can I add parentheses in my sentences as a comment? Like:
Original: Én elmentem a vásárba fél pénzzel.
Comment with parentheses: (Én) (elmentem a vásárba) (fél pénzzel).

Little clarification: Something like (Subject) (verb of motion && destination) (instrumental). I think it can be valuable, but I do not want to waste your precious SQL database with my sentence analysis. So I would like to post comments to my already written sentences. (I saw something like this in arihato's comments, and maybe I can do the same). Is it allowed or not? oO

sysko, December 22, 2010 at 1:52:11 PM UTC

In the sentence comment, yes, but not in the sentence text itself. Because Tatoeba's data can be reused for any purpose, it is important to keep the sentences as pure as possible. One of my friends and I plan to focus on tools for sentence analysis once we have finished a first release of Tatoeba, so I think I will add a dedicated field for this kind of information, somewhere more specific than the "all-purpose" comments.

Swift, December 21, 2010 at 5:39:08 PM UTC

** Tag cleanup III **

There are a couple of translations in this next batch so I'd like you to have a quick look at these. Again, all original titles are kept for later.

See:
http://martin.swift.is/tatoeba/...l_renames.html

Most of these are just for standardising the capitalisation of tags, but there are some renames and translations towards the end of the file. Some are simple "Ad" -> "advertisement" but another renames "family" as "relatives" as the topic of the sentences isn't family, but the relatives.

My search gave me the term "preterite" as the translation for the Italian "passato remoto" but please correct me if I'm wrong.

These and other rename proposals are marked with little arrows on
http://martin.swift.is/tatoeba/tags.html

Pharamp, December 21, 2010 at 7:51:25 PM UTC

http://en.wiktionary.org/wiki/passato_remoto

Oh eventually I can quote Wiktionary. I feel free.

Swift, December 21, 2010 at 8:09:14 PM UTC

That page defines “passato remoto” as “past historic tense”. The latter term's page[1] links to the Wikipedia page for the term which redirects to the article on “preterite”[2], the discussion page of which claims that the terms are identical in at least one language.[3]

Just to be sure, is it better to use “past historic tense” than “preterite”?

[1] http://en.wiktionary.org/wiki/past_historic_tense
[2] http://en.wikipedia.org/wiki/Past_historical
[3] http://en.wikipedia.org/wiki/Ta...erger_proposal

Pharamp, December 21, 2010 at 9:24:34 PM UTC

Well, I didn't quote it to contradict you ^^

Anyway, I don't really agree with this translation. I would never call a "passé simple" (French) with the term "passato remoto", even if they are perfectly identical (for my Northern usage).
The difference between English/Italian increases a lot in usage, therefore an English preterite could be translated in two different ways in Italian, and always one of them isn't a preterite. The same thing happens for Spanish/Italian pairs. I don't really know how to resolve this :/

Swift, December 21, 2010 at 11:51:21 PM UTC

Oh, by the way, does anyone have any input on what the “passé simple” should be called? “Passé simple”, “simple past”, “preterite”, “passé défini”, definite past or something else entirely?

Swift, December 22, 2010 at 4:28:56 AM UTC

Just to avoid a misunderstanding (not the least on my side), this isn't a question of finding corresponding sentences, but the term which would be used for the original sentence.

So, if there was a sentence in French that was in what the French would call the passé simple, the tag shouldn't say which tense the English sentence would be in, but rather which tense English linguists would use to describe the French sentence. While some translations would all use the same tag (e.g. topical tags), grammatical tags would differ with translations of each other.

Once we get the translation feature, that sentence would then say “passé simple” in the French interface. The question is what the English interface should say.

Pharamp, December 22, 2010 at 11:42:43 PM UTC

I will ask my English (well she's Italian) teacher about it tomorrow. I perfectly see your point, but I imagine we need a linguist or something more expert. Ohoh, maybe my teacher is. (>Pharamp doubts it<)

sacredceltic, December 24, 2010 at 10:43:38 AM UTC

Reading foreigners talking about the classification of French tenses with English tags could be hilarious. It is just sad, alas...
This is the very reason I stopped contributing to Tatoeba: Anglophone kids deciding, in English, what French is. Already with English, they have enough problems...
Have fun!

brauliobezerra, December 22, 2010 at 10:35:08 AM UTC

I guess the English interface should say 'French "passé simple"'. If it is impossible or misleading to translate, don't translate.

Shishir, December 22, 2010 at 10:33:59 PM UTC

I agree ^^

Nero, December 22, 2010 at 12:09:21 AM UTC

Imperfect tense?

Shishir, December 22, 2010 at 3:02:13 AM UTC

No, in French there are the "passé composé" (il a marché), the imperfect (l'imparfait, il marchait) and the "passé simple" (il marcha). And as far as I know, preterite means that something happened in the past, so it doesn't work either because it can refer to any of them... In this case I'd choose to keep the French term, "passé simple", to avoid misunderstandings.

Nero, December 22, 2010 at 3:58:15 AM UTC

Preterite is a verb form found in some languages. The "passé simple" translates to "simple past" so that's what I would use.

Shishir, December 22, 2010 at 10:25:58 AM UTC

But I think that would lead to misunderstandings, because people learning French might think that this is the equivalent of the English "simple past", and this is not true.

Nero, December 23, 2010 at 3:49:34 AM UTC

Technically there is no "simple past" in English. Beginners to learning a foreign language or people who don't know much about English might have trouble with it, but simple past =/= preterite.

Shishir, December 23, 2010 at 3:34:53 PM UTC

Wow... I had always been taught that the English past tense (e.g. he broke, she looked...) was the "simple past" [1]...
Then what's the name of the English past tense? preterite?

[1] http://www2.gsu.edu/~wwwesl/egw/verbs.htm
http://www.usingenglish.com/ref...regular-verbs/

Zifre, December 23, 2010 at 3:45:11 PM UTC

Yeah, I've always heard it called "simple past" or just "past". (Of course, this is high school English in America. They don't want to scare people with big scary linguistic terms like "preterite". :P)

I've never heard anyone use the word "preterite" not in reference to tenses in foreign languages.

Here is a list of what I would call all the English tenses:

He will walk -> future
He will be walking -> future progressive
He will have walked -> future perfect
He walks -> present
He is walking -> present progressive
He has walked -> present perfect
He walked -> (simple) past
He was walking -> past progressive
He had walked -> past perfect (pluperfect)

arcticmonkey, December 23, 2010 at 4:02:51 PM UTC

Technically, there are only two tenses in English: present and past

simple, progressive, and perfect are aspects

Nero, December 24, 2010 at 2:07:56 AM UTC

My point was that the French simple past doesn't equal what is known in English as the "simple past". They were afraid of people confusing the French simple past with the English "simple past" when they're not the same thing. The actual term for it in English is the preterite.

Zifre, December 22, 2010 at 2:19:25 AM UTC

No, I believe that the imperfect tense is something else entirely.

Zifre, December 22, 2010 at 2:15:25 AM UTC

I would say preterite. At least that's what the corresponding Spanish tense (el pretérito) is usually called in English.

The corresponding English tense (e.g. "he walked") is usually just called "past", since there are no imperfect or other past tenses.

Nero, December 22, 2010 at 2:31:24 AM UTC

Well "he walked" is the preterite in English technically. German has the imperfect which equates to the English preterite. But I just found out there's a difference between imperfect and the simple past in French.

I would go with simple past or keep the French phrase, because it's kind of specific to French unless it applies to another language.

Zifre, December 22, 2010 at 7:53:02 PM UTC

I believe the French "passé simple" exists in many Romance languages (and probably other Indo-European languages as well). I know it exists in Spanish.

The only difference is that it is rarely used in French, while it is very common in Spanish.

I'm pretty sure that the English past tense (e.g. "he walked") is from the same origin as the Spanish preterite and French passé simple. English has no real imperfect tense. Obviously, the perfect tenses in all three languages are related. (e.g. "He has walked" vs. "Él ha caminado")

Shishir, December 22, 2010 at 10:33:11 PM UTC

Two comments: first, in Spanish we don't have any tense called "past simple"; we have the "indefinite praeteritum" and the "imperfect praeteritum", and they do NOT match the use of the French passé simple. In Italian I guess the passato remoto is used more or less in the same situations as the passé simple, but that's it.
And taking your own example, if you say in French "il a marché", you can translate it to English as "he has walked" and "he walked"; and to Spanish as "él ha caminado" and "él caminó". There's no true equivalent in Spanish to the passé simple. French people use the passé simple (as far as I know) only in formal contexts, and mainly in literature and newspapers, that's why I'd rather have it clearly tagged so that learners (like me) can actually see that this is not the usual tense and not simply think that this is the equivalent of Spanish "pretérito indefinido" or English "simple past". Maybe they all came from the same thing, but they are no longer used in the same contexts or situations.

stefz, January 22, 2011 at 1:30:34 PM UTC

Yes, the simple past forms (passé simple in French, pretérito indefinido in Spanish, pretérito perfeito in Portuguese) have the same origin, i.e., the Latin perfect form. As you might know, original Latin did not have compound forms, only simple forms. Later, compound forms (j'ai fait, yo he hecho, eu tenho feito) became common, but it took some time for them to "travel" from Rome, where new fashions were created, to the periphery. Therefore, in French, the passé simple is not used any more in the spoken language. In Spanish, the simple and compound forms are used in parallel, but to express different situations (although in South American Spanish, the simple form is also used in cases where the compound form is used in Spain). In Portuguese (I only know the Portuguese of Brazil well) the simple form is the most common, and you do not really have to use the compound form (it is also interesting that Portuguese uses "ter" = Latin "tenere" instead of "haver" = Latin "habere" to form the compound forms, which the other Romance languages use: French "avoir", Spanish "haber"). Also, Portuguese is the only Romance language that has preserved a simple form for the past perfect ("fizera", alongside "tinha feito"). This has been replaced by the compound forms in the other Romance languages ("avais fait", "había hecho"). I don't know enough Italian to say, but following the theory of fashions originating from Rome, I suppose that the simple form might have even less importance than in French or might have disappeared altogether.

The expressions "perfect" and "imperfect" refer, as far as I know, to the usage of the corresponding past tenses (praeteritum = passed by (= German vorübergegangen)) in Latin: "perfect" means "terminated", i.e., the praeteritum perfectum was used for actions that started and ended in the past, whereas the praeteritum imperfectum ("not terminated") for actions that started in the past and persisted till the present.

That means there exist different denominations for the tenses, some originating from their grammatical structure (passé simple), some from their usage ("perfect", "imperfect"). As the usage of the tenses differs considerably between the languages, it is very difficult, or even impossible, to find good translations. I would stick to the original expressions of the individual language.
Due to the different usage of the tenses in the different languages, it is very difficult to translate the tenses in sentences correctly, as a single sentence rarely delivers the context...

sacredceltic, January 22, 2011 at 3:34:39 PM UTC

>in French, the passé simple is not used any more in spoken language.

I beg to differ. The French passé simple is a narrative tense, along with the passé antérieur, and they are used to tell stories, irrespective of whether they're written OR SPOKEN.
As such, passé simple has ALWAYS been used to tell stories, no less now than before.
It is true that passé composé tends to replace passé simple and plus-que-parfait the passé antérieur, in the spoken narrative of uneducated people, but not everybody is uneducated or speaks a broken French. I use passé simple to tell stories, and I love it because it is so much more beautiful.
Actually, passé simple is familiar to people whose parents read or told them stories when they were children. Hence the charm that is ascribed to it, which is also the reason why it sounds unfamiliar to uneducated or poorly educated people, or people who never read stories.
As people who don't read - or are not being read to - greatly outnumber those who do, especially in the younger generation, the perception that passé simple "is disappearing" or "is outdated" is dominant, although it is plain wrong, as any educated story-teller will prove.
"Ils se marièrent et eurent beaucoup d'enfants...analphabètes." Voilà !

Zifre, December 22, 2010 at 10:58:02 PM UTC

Sorry, I should have been more clear. I was referring to the linguistic origins of the tenses, not their usage (which is obviously very different in all three languages).

Also, in English, I've always heard the Spanish tenses referred to like this:

caminó -> preterite
caminaba -> imperfect
ha caminado -> (present) perfect

Maybe it's just some oddity of my school's Spanish curriculum, but I've never heard "preterite" refer to any of the tenses except for the first one above.

I agree that it's probably best to use a special tag for the French tense since it's so rare and doesn't correspond well in usage to any tense in any other language.

Pharamp, December 22, 2010 at 11:39:40 PM UTC

Just a little annotation (not directly for you, Zifre ;) )

Standard Italian usage is exactly the same as French.
In (Deep) Southern Italy you can easily listen to old people using the "passato remoto" exactly as the Spanish "preterite"/caminó, but it's not correct or standardised at all for what I know.

Moreover, sentences here in Tatoeba are mainly written by me and Guybrush88: as we are both from the same region in the Centre-North, we can be quite sure that we still haven't any Italian sentence using the "passato remoto" like in Spanish.

All this because! I would like a special tag for Italian too :P

Swift, December 21, 2010 at 11:46:12 PM UTC

As I said: just to be sure. I wasn't sure since there was conflicting information and you didn't appear to have written the Wiktionary page. I'll change the term to “past historic tense”. Thanks for clearing this up.

brauliobezerra, December 21, 2010 at 1:34:24 PM UTC

Hello, I have started translating the contributor's guide into Portuguese. However, I'm short on time and am translating it little by little. If anyone wants to help, just let me know and I'll give you edit permission. Here is the link:

https://docs.google.com/documen...icR9267fVh-YOA

sysko, December 15, 2010 at 8:29:54 PM UTC

The pinyin, script detection and script conversion for sentences are now handled by homebrew software, which should fix all the problems with strange conversions (trash characters etc.).
As I said, it's "homebrew", so if you find an inaccurate transcription/segmentation or a bug, please report it here :)

minshirui, December 20, 2010 at 11:17:45 PM UTC

Where should one report bugs in the Cantonese romanization? For example, in this sentence (http://tatoeba.org/eng/sentences/show/676270), the characters are:
我次次聽呢首歌都會聽到喊。

Currently, this is generated:
ngo⁵ ci³ ci³ ting³ ne¹ sau² go¹ dou¹ wui⁶ ting³ dou³ ham⁶ .

However, some of the words are incorrectly romanized. It should be:
ngo⁵ ci³ ci³ ting¹ ni¹ sau² go¹ dou¹ wui⁶ ting¹ dou³ haam³.

nickyeow, December 21, 2010 at 2:36:08 PM UTC

You can give the sentence an “incorrect transcription” tag, or maybe leave a comment to point out the mistakes, too.

Actually, I’m currently in the process of proofreading all the Cantonese sentences and making a list of pronunciations of my own. After I’m finished with the list and the list is imported into the database, most of the wrong romanizations should be fixed. :-)

(By the way, the “wui⁶” in the romanization of the sentence is wrong too. It should be “wui⁵.” Also, 聽 is only pronounced as “ting¹” when it’s combined with other characters like 日, 朝 and 晚 to mean “tomorrow”. When meaning “to listen”, it’s usually pronounced “teng¹.” Therefore, the correct romanization of the sentence “我次次聽呢首歌都會聽到喊” would be: “ngo⁵ ci³ ci³ teng¹ ni¹ sau² go¹ dou¹ wui⁵ teng¹ dou³ haam³.”)

sysko, December 15, 2010 at 8:30:43 PM UTC

For those interested, I will try to clean up the code, add some docs, and release it as free software.

nickyeow, December 18, 2010 at 8:48:00 AM UTC

I'm not sure if this is the right place to ask, but as for the Cantonese romanization, is there anything I can do to help? Like compiling a list of words with the corresponding pronunciations?

sysko, December 18, 2010 at 9:20:07 AM UTC

What a coincidence: I was speaking with Demetrius the other day and he showed me a list of Cantonese words with romanization, so I'm adapting "sinoparser" (let's call the software that, unless someone has an idea for a better name) to support Cantonese with Jyutping :)

nickyeow, December 18, 2010 at 9:38:35 AM UTC

It's great to hear that the Cantonese sentences are going to have romanizations soon ;-) ! But since a lot of the Chinese characters have multiple pronunciations in Cantonese, I think we also need to make the romanizations editable.

For example, 咪,
when meaning "microphone," is pronounced "mai1"
when meaning "don't," is pronounced "mai5"
when meaning "as a result", is pronounced "mai6"

It would be difficult for computers to decide which pronunciation to use...

sysko, December 18, 2010 at 3:24:01 PM UTC

OK, I've finished it; now I just need to integrate it. It may produce less accurate results, and may often cut sentences character by character, as the data file I use for Cantonese has "only" 45,000 entries (which is low once you count the entries for every single sinogram; there are not that many "words"), while the one for Mandarin has more than 200,000. But I will try to find other "open source" dictionaries to get better word segmentation, with pronunciation generated from the current data file, and afterwards, with the help of Tatoeba, we will be able to refine this step by step.

@nickyeow, is the problem you're talking about like the one in Mandarin with 的, which can be "de" or "di", but which we can guess automatically 95% of the time because on its own it's "de", and having the entry "的确" => "dique" handles the other case (i.e. if you are able to segment the sentence correctly, then you can guess the right romanization 99% of the time)?

Or is it more like Japanese, where even if you're able to segment the sentence into "words", the pronunciation differs depending on the meaning of the words (i.e. you need to understand the sentence to guess which one to choose)?
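
To make the first case concrete, here is a minimal Python sketch of dictionary-driven segmentation. It assumes a greedy longest-match strategy and a tiny made-up lexicon; the function name `romanize` and the entries are hypothetical, and nothing here is taken from the actual sinoparser code, whose algorithm is not described in this thread.

```python
# Toy lexicon: illustrative entries only, not sinoparser's real data file.
LEXICON = {
    "的": "de",
    "的确": "dique",
    "我": "wo",
    "是": "shi",
    "你": "ni",
    "朋友": "pengyou",
}


def romanize(sentence, lexicon=LEXICON):
    """Segment by greedy longest match and return (word, reading) pairs."""
    longest = max(len(word) for word in lexicon)
    result = []
    i = 0
    while i < len(sentence):
        # Try the longest possible chunk first, then shrink it.
        for size in range(min(longest, len(sentence) - i), 0, -1):
            chunk = sentence[i:i + size]
            if chunk in lexicon:
                result.append((chunk, lexicon[chunk]))
                i += size
                break
        else:
            # Character missing from the lexicon: keep it, with no reading.
            result.append((sentence[i], None))
            i += 1
    return result


print(romanize("我的朋友"))  # 的 stands alone here, so it is read "de"
print(romanize("你的确是"))  # 的确 matches as one word, so it is read "dique"
```

With this approach, a wrong reading is fixed by adding a longer entry to the lexicon rather than by changing the code, which is the "add more data" route.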

sysko, December 18, 2010 at 3:28:53 PM UTC

If it's the first case, then it can be solved by adding more data (which means I can continue to use the same algorithm for generating the Cantonese romanization).
Otherwise my software will need a new layer: a grammar analyser added to the current lexical parser.

nickyeow, December 18, 2010 at 5:05:09 PM UTC

I think Cantonese is more similar to Mandarin in this regard, and a large part (90%, perhaps?) of the problem could be solved simply by adding more data.

For the more complicated cases, grammar analyzers might have to be used. For example, since the 咪 as I mentioned above is almost always used with 囉 when meaning "as a result," we can create a simple script that can recognize the sentence pattern "咪…囉" and make the pronunciation of 咪 "mai6" whenever such a sentence pattern is detected.
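
As a rough illustration of such a script, the Python sketch below treats 咪 as "mai6" whenever it is followed by 囉 within the same clause. The function name, the punctuation set used as clause boundaries, the fallback reading and the sample sentences are all assumptions made for this example.

```python
import re

# Fallback reading when the pattern does not fire. "mai5" ("don't") is
# picked only for illustration; a real tool would need more rules or data.
DEFAULT_MAI = "mai5"

# 咪, then any characters that are not clause punctuation, then 囉.
MAI_LO_PATTERN = re.compile(r"咪[^。！？，]*囉")


def reading_of_mai(sentence):
    """Return a guessed reading for 咪, or None if 咪 does not occur."""
    if "咪" not in sentence:
        return None
    if MAI_LO_PATTERN.search(sentence):
        return "mai6"
    return DEFAULT_MAI


print(reading_of_mai("咁咪得囉。"))  # the 咪…囉 pattern is present
print(reading_of_mai("咪走！"))      # no 囉, so the fallback is used
```

A rule like this only covers one sentence pattern, so it would sit on top of the dictionary lookup rather than replace it.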

The final particles can be a bit tricky too — some final particles have different pronunciations to indicate different moods, and the wrong pronunciation of a final particle can make a sentence sound completely ridiculous. Unfortunately, I can't think of ways to deal with them other than manually proofreading all the sentences (but of course, I'll be glad to help with that :-).

sysko, December 18, 2010 at 11:35:44 PM UTC

OK, I've added romanization for Cantonese (I finally found that crappy bug, a stupid mistake of mine in the code), but now I see that there don't seem to be many "words" beyond single-character transcriptions.
Anyway, this afternoon I checked the web, and it seems that the world is missing an "open source" list of Cantonese words (there is a free one, CantoDict, but it is free as in free beer, not free speech, and the leader of that project doesn't plan to release the data...),
so maybe we can start making such a list :)

nickyeow, December 19, 2010 at 3:27:46 AM UTC

Thanks a LOT, sysko! It must have been such a frustrating experience to track down that bug... :-/

Anyway, I've started to proofread the sentences, and while a lot of them are romanized perfectly, there is still a number of them that contain mistakes. Fortunately, most of these mistakes could be fixed by adding more data, so I'm trying to jot down all the errors I come across and make a list of pronunciations of my own.

And I have a question: what should the format of the list be, so that it can be conveniently imported into the database?

sysko, December 19, 2010 at 6:36:36 AM UTC

Thanks to you too; I think you've spent more time adding all the Cantonese sentences (in addition to all the Mandarin sentences) than I have spent coding this software :)

For the list, a .txt file with the following format would be perfect:

word[tab]jyutping
word2[tab]jyutping2

:)
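
For anyone preparing such a file, here is a small Python sketch that reads the word[tab]jyutping format into a dictionary and rejects malformed lines. The function name `load_pronunciations` is hypothetical, and the sample entries reuse readings mentioned elsewhere in this thread; how the real import into the database works is not specified here.

```python
import io


def load_pronunciations(lines):
    """Parse word[tab]jyutping lines into a {word: jyutping} dictionary."""
    entries = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2 or not parts[0] or not parts[1]:
            raise ValueError(f"line {number}: expected word[tab]jyutping")
        word, jyutping = parts
        entries[word] = jyutping
    return entries


# A file-like object stands in for the .txt file in this example.
sample = io.StringIO("喊\thaam3\n呢\tni1\n聽\tteng1\n")
print(load_pronunciations(sample))
```

Checking each line for exactly one tab catches the most likely slip, which is typing spaces instead of a tab between the word and its reading.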

nickyeow, December 19, 2010 at 11:56:17 AM UTC

Okay! I'll start working on the list now. Hopefully I'll finish proofreading all the Cantonese sentences within a month or two.

Demetrius, December 20, 2010 at 5:17:44 PM UTC

I’m glad to see it’s done. :)

BTW can you display tones in superscript, like in CantoDict? E.g. not ngo5 but ngo⁵? IMHO it’s more readable.
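
One possible way to do the conversion, sketched in Python: replace each digit that directly follows a letter with its Unicode superscript. The function name `superscript_tones` is made up for this example, and it assumes the romanization uses lowercase ASCII letters followed by a tone digit.

```python
import re

# Map ASCII digits to their Unicode superscript counterparts.
SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def superscript_tones(jyutping):
    """Turn tone digits such as the 5 in 'ngo5' into superscripts."""
    # Only digits directly after a letter count as tone numbers, so
    # free-standing numbers in the text are left untouched.
    return re.sub(
        r"(?<=[a-z])\d",
        lambda match: match.group().translate(SUPERSCRIPTS),
        jyutping,
    )


print(superscript_tones("ngo5 ci3 ci3 teng1"))
```

Since this is purely a display transformation, the stored data could stay in plain ASCII and be converted only when shown.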

sysko, December 18, 2010 at 5:28:10 PM UTC

OK, glad to hear that :)
BTW, I've finished coding it and also integrated it, but I'm facing a bug (the software starts to take all the CPU resources) which occurs only when I try to put it on the real website Oo, weird. I will try to see why tomorrow. (In fact, the problem does not come from the Cantonese itself, because I've adapted the entire code of Sinoparser to be more flexible in case we add support for other Chinese languages, such as Shanghainese.)

TRANG, December 19, 2010 at 6:50:51 PM UTC

Projects using Tatoeba

I'm gathering links here:
http://blog.tatoeba.org/2010/12...g-tatoeba.html

If you know any other, let me know :)

Swift, December 11, 2010 at 11:01:13 PM UTC

** Bug? **

When I go to http://tatoeba.org/eng/favorites/of_user/1005 and click "show user" with any username, I get a mostly blank page with the title "Errors". Blank because only the first 51 lines are received.

TRANG, December 19, 2010 at 6:47:51 PM UTC

Fixed. The form wasn't supposed to be there anymore ^^

CK, December 17, 2010 at 11:53:27 AM UTC, edited October 30, 2019 at 1:23:55 AM UTC

[not needed anymore- removed by CK]

debian2007, December 17, 2010 at 3:36:11 PM UTC

It will be a long one:
Javascript template Italian (I removed some noise from text):
A http://similar.uw.hu/ita_1.htm
B http://similar.uw.hu/ita_2.htm
C http://similar.uw.hu/ita_3.htm
D http://similar.uw.hu/ita_4.htm
E http://similar.uw.hu/ita_5.htm
F http://similar.uw.hu/ita_6.htm
G http://similar.uw.hu/ita_7.htm
H http://similar.uw.hu/ita_8.htm
I http://similar.uw.hu/ita_9.htm
J http://similar.uw.hu/ita_10.htm
K http://similar.uw.hu/ita_11.htm
L http://similar.uw.hu/ita_12.htm
M http://similar.uw.hu/ita_13.htm
N http://similar.uw.hu/ita_14.htm
O http://similar.uw.hu/ita_15.htm
P http://similar.uw.hu/ita_16.htm
Q http://similar.uw.hu/ita_17.htm
R http://similar.uw.hu/ita_18.htm
S http://similar.uw.hu/ita_19.htm
T http://similar.uw.hu/ita_20.htm
U http://similar.uw.hu/ita_21.htm
V http://similar.uw.hu/ita_22.htm
W http://similar.uw.hu/ita_23.htm
X http://similar.uw.hu/ita_24.htm
Y http://similar.uw.hu/ita_25.htm
Z http://similar.uw.hu/ita_26.htm
Javascript template Russian (Yandex search engine links; the last few characters of the alphabet were missing from my source wordlist file):
a http://similar.uw.hu/rus_1.htm
б http://similar.uw.hu/rus_2.htm
в http://similar.uw.hu/rus_3.htm
г http://similar.uw.hu/rus_4.htm
д http://similar.uw.hu/rus_5.htm
е http://similar.uw.hu/rus_6.htm
ё http://similar.uw.hu/rus_7.htm
ж http://similar.uw.hu/rus_8.htm
з http://similar.uw.hu/rus_9.htm
и http://similar.uw.hu/rus_10.htm
й http://similar.uw.hu/rus_11.htm
к http://similar.uw.hu/rus_12.htm
л http://similar.uw.hu/rus_13.htm
м http://similar.uw.hu/rus_14.htm
н http://similar.uw.hu/rus_15.htm
о http://similar.uw.hu/rus_16.htm
п http://similar.uw.hu/rus_17.htm
р http://similar.uw.hu/rus_18.htm
с http://similar.uw.hu/rus_19.htm
т http://similar.uw.hu/rus_20.htm
у http://similar.uw.hu/rus_21.htm
ф http://similar.uw.hu/rus_22.htm
х http://similar.uw.hu/rus_23.htm

If I made mistakes, I will delete this wall post and provide new links (or overwrite the current ones).

Demetrius, December 20, 2010 at 5:54:55 PM UTC

Thank you, I believe it may be useful. :)

It is perfectly OK that «ъ», «ы» and «ь» are missing: Russian phonology doesn't allow them at the beginning of a word.[1]
The «э» missing isn't a big deal since it's a rather rare letter.


But I’d like to warn against using the Russian wordlist. It is simply too big!

It contains, in particular:
· old-fashioned spellings (азбест/azbest instead of асбест/asbest ‘asbestos’)
· old-fashioned words (алкалический/alkalicheskij ‘alkaline (adj.)’ instead of щелочной/shhelochnoj)
· some names that have been obsolete for centuries like Агриппина/Agrippina or Андроник/Andronik
· two forms of patronymics for each male name: Андроникович/Andronikovich ‘son of Andronik’, Андрониковна/Andronikovna ‘daughter of Andronik’
· some words that can be used only in poetic speech, and sound very strange elsewhere (e.g. аметистово/ametistovo ‘amethystively, in an amethyst way’[2])

I doubt all Russian contributors will be able to cover this vocabulary in any reasonable period of time. I think we need something more concise.


[1] Ы/y` might occur at the beginning of some Turkish, Eskimo etc. placenames, but even there it is usually substituted by И.
[2] Yes, it is very weird. It can be used as a part of compound words, e.g. аметистово-фиолетовый/ametistovo-fioletovy`j ‘purple like an amethyst, amethyst-purple’, but in such a case it’s not a word on its own.

Demetrius, December 20, 2010 at 7:58:05 PM UTC

By the way, ритуално is a mere mistake in the wordlist.

debian2007, December 21, 2010 at 12:48:33 PM UTC

Demetrius, I can improve this list, but:
1) I used a big wordlist so that native speakers can decide which words are archaic or not used at all.
1a) If you or anyone else has suggestions about which wordlist or dictionary I should use, please tell me. The most recent monolingual dictionary (Толковый словарь) that I found is from the 1950s. All of those dictionaries are available from slovopedia. (Modern ones contain many words from "old" dictionaries.)
2) I do not like to use frequency lists, because they contain many words that are not used in colloquial speech. They even contain slang.
3) What kind of list do you want? Even ruscorpora does not contain all the necessary words. I can compile a wordlist from any documents. If I used movie subtitles, it might contain too much slang. If I used contemporary literature, it might contain too many words used only by writers.
4) There is no Russian wordnet available yet (or none as big as it should be).
5) I could use a wordlist from open source spell checkers. It seems like a good idea, but that list is huge too.
-) So I cannot provide a concise and useful wordlist from the materials I know about. Send me a PM with links if you have some useful bookmarks. Even if a source is not available in electronic form, I would still be interested in it.

kebukebu, December 18, 2010 at 3:47:52 AM UTC

Very nice idea :) Once these lists start to shrink, it might be useful to do the same thing for words that only show up below a certain threshold -- for example, in fewer than ten sentences, or something like that.
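
A minimal sketch of that threshold idea, assuming the wordlist and the sentences are available as plain Python lists (the function name and the sample data are made up for illustration). It matches exact word forms only, so inflected forms would need extra handling:

```python
import re
from collections import Counter


def words_below_threshold(wordlist, sentences, threshold=10):
    """Return the words that appear in fewer than `threshold` sentences."""
    counts = Counter()
    for sentence in sentences:
        # A set counts each word at most once per sentence.
        tokens = set(re.findall(r"\w+", sentence.lower()))
        counts.update(tokens)
    # Words that never occur have a count of 0 and are always included.
    return sorted(w for w in wordlist if counts[w.lower()] < threshold)


# Hypothetical sample data, not taken from Tatoeba.
sentences = ["The cat sleeps.", "A cat and a dog.", "The dog barks."]
wordlist = ["cat", "dog", "bird"]
rare = words_below_threshold(wordlist, sentences, threshold=2)
```

With `threshold=1` the same function would list only the words that have no sentence at all.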

Also -- translators, keep in mind that if you know the translation of a word in another language, you can search for existing sentences which you can translate in order to fill the gap. For example, if you want to add a sentence for the Russian word "зять" and you know that in English it can mean "son in law", then you can search for existing English sentences to translate:

http://tatoeba.org/eng/sentence...rom=und&to=und

At the same time, you can certainly always add new sentences :)

(By the way, CK, right now, the links for eng-2 and eng-awl still reference eng-1.html)

Swift, December 17, 2010 at 9:44:00 PM UTC

** Tag cleanup II **

We now continue with the cleanup of the quote tags. There were multiple tags that referred to the same person. I've identified these and transliterated others. There are a few still to do, but once these are done, almost a fifth of the tags will have been sorted and taken care of.

Please refer to:
http://martin.swift.is/tatoeba/tags-cleanup.html
for a list of tags that are being merged (there are also a couple of renames that don't seem to have gone through last time).

After that I'm hoping to finish sorting the tags before merging and renaming those. With that out of the way, we can start looking at categorisation schemes for the tags, to make them easier to find and use. I've been chatting with CK about that, but we'll try to have a little brainstorming session at some point in the beginning of next year.

I just want to bring that to people's attention as it may help to start thinking about this. Send your ideas (even the half-baked ones) my way if you like.

DancingHorses, December 17, 2010 at 6:42:43 AM UTC

So far, for me, searching for any word in Japanese spelled in kana will return unrelated results. It appears that each kana character is being treated as an individual word.

Swift, December 17, 2010 at 1:37:28 PM UTC

Yes, each character seems to be searched separately, but if you only get unrelated results, it's because there are no better matches.

Searching for "飛行機" will give you results for "飛行機" first, but sentences with "飛行" and "機" separately last.

Searching kana is a bit more difficult, because it will only turn up results where the sentence is written with the kana. Searching for "ひこうき" will, for example, not turn up any meaningful results, but "そんなことないよ" will (though not that exact phrase).

飛行機: http://tatoeba.org/eng/sentence...rom=jpn&to=und
ひこうき: http://tatoeba.org/eng/sentence...rom=jpn&to=und
そんなことないよ: http://tatoeba.org/eng/sentence...rom=jpn&to=und

Swift, December 17, 2010 at 3:55:22 PM UTC

Though you can, of course, search for exact phrases with quotes (e.g. "そんなことないよ").

Gyuri, December 16, 2010 at 2:21:22 PM UTC

Typo:

http://tatoeba.org/epo/sentence...ndom_sentences

Ĉu traduki frrazojn... > frazojn

Aleksej, December 17, 2010 at 1:29:23 PM UTC

I corrected that at https://translations.launchpad.net/tatoeba