Wall (7,123 topics)

debian2007 (22 December 2010, 13:42:48 UTC)

Can I add parentheses in my sentences as a comment? Like:
Original: Én elmentem a vásárba fél pénzzel.
Comment with parentheses: (Én) (elmentem a vásárba) (fél pénzzel).

A little clarification: something like (Subject) (verb of motion && destination) (instrumental). I think it can be valuable, but I do not want to clutter your precious SQL database with my sentence analysis, so I would like to post it as comments on sentences I have already written. (I saw something like this in arihato's comments, so maybe I can do the same.) Is it allowed or not? oO

sysko (22 December 2010, 13:52:11 UTC)

In the sentence comments, yes, but not in the sentence text itself. Because Tatoeba's data can be reused for any purpose, it is important to keep the sentences as pure as possible. But since a friend and I plan to focus on tools for sentence analysis once we have finished a first release of Tatoeba, I think I will add a field somewhere to hold such information in a more specific place than the "all-purpose" comments.

Swift (21 December 2010, 17:39:08 UTC)

** Tag cleanup III **

There are a couple of translations in this next batch, so I'd like you to have a quick look at them. Again, all original titles are kept for later.

See:
http://martin.swift.is/tatoeba/...l_renames.html

Most of these just standardise the capitalisation of tags, but there are some renames and translations towards the end of the file. Some are simple, like "Ad" -> "advertisement", but another renames "family" as "relatives", since those sentences are about relatives rather than about family as such.

My search gave me the term "preterite" as the translation for the Italian "passato remoto" but please correct me if I'm wrong.

These and other rename proposals are marked with little arrows on
http://martin.swift.is/tatoeba/tags.html
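The renames in this batch can be pictured as a lookup table applied after case-folding, with every original title kept next to its proposed name. A minimal Python sketch, assuming (hypothetically) that the capitalisation standard is all-lowercase; `RENAMES`, `normalise_tag` and `plan_renames` are illustrative names, not part of Tatoeba's code:

```python
# Hypothetical rename table, keyed by the lowercased original title.
# The two entries come from the examples in the post above.
RENAMES = {
    "ad": "advertisement",
    "family": "relatives",
}


def normalise_tag(title: str) -> str:
    """Return the proposed name for an existing tag title.

    Assumes (hypothetically) that the standard capitalisation is lowercase;
    an explicit rename, if any, is applied after case-folding.
    """
    key = title.strip().lower()
    return RENAMES.get(key, key)


def plan_renames(titles: list[str]) -> dict[str, list[str]]:
    """Group original titles under their proposed name.

    Keeping the list of originals makes it possible to restore them later.
    """
    plan: dict[str, list[str]] = {}
    for title in titles:
        plan.setdefault(normalise_tag(title), []).append(title)
    return plan


if __name__ == "__main__":
    print(plan_renames(["Ad", "ad", "Family", "Travel", "travel"]))
```

Tags that differ only in case end up grouped under one name, which is the "standardising" part; a blanket lowercase rule would of course need exceptions for proper nouns such as language names.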

Pharamp (21 December 2010, 19:51:25 UTC)

http://en.wiktionary.org/wiki/passato_remoto

Oh, at last I can quote Wiktionary. I feel free.

Swift (21 December 2010, 20:09:14 UTC)

That page defines “passato remoto” as “past historic tense”. The latter term's page[1] links to the Wikipedia page for that term, which redirects to the article on “preterite”[2], whose discussion page claims that the terms are identical in at least one language.[3]

Just to be sure, is it better to use “past historic tense” than “preterite”?

[1] http://en.wiktionary.org/wiki/past_historic_tense
[2] http://en.wikipedia.org/wiki/Past_historical
[3] http://en.wikipedia.org/wiki/Ta...erger_proposal

Pharamp (21 December 2010, 21:24:34 UTC)

Well, I didn't quote it to contradict you ^^

Anyway, I don't really agree with this translation. I would never call a "passé simple" (French) by the term "passato remoto", even though they are perfectly identical (in my Northern usage).
The difference between English and Italian grows a lot in actual usage, so an English preterite could be translated in two different ways in Italian, and one of the two is never a preterite. The same thing happens with Spanish/Italian pairs. I don't really know how to resolve this :/

Swift (21 December 2010, 23:51:21 UTC)

Oh, by the way, does anyone have any input on what the “passé simple” should be called? “Passé simple”, “simple past”, “preterite”, “passé défini”, definite past or something else entirely?

Swift (22 December 2010, 4:28:56 UTC)

Just to avoid a misunderstanding (not least on my side): this isn't a question of finding corresponding sentences, but of the term that would be used for the original sentence.

So, if there were a sentence in French in what the French would call the passé simple, the tag shouldn't say which tense the English translation would be in, but rather which term English linguists would use to describe the tense of the French sentence. While translations of each other would often share the same tags (e.g. topical tags), their grammatical tags would differ.

Once we get the translation feature, that sentence would then say “passé simple” in the French interface. The question is what the English interface should say.

Pharamp (22 December 2010, 23:42:43 UTC)

I will ask my English teacher (well, she's Italian) about it tomorrow. I see your point perfectly, but I imagine we need a linguist or someone more expert. Oh, maybe my teacher is one. (>Pharamp doubts it<)

sacredceltic (24 December 2010, 10:43:38 UTC)

Reading foreigners talking about the classification of French tenses with English tags could be hilarious. It is just sad, alas...
This is the very reason I stopped contributing to Tatoeba: Anglophone kids deciding, in English, what French is. They already have enough problems with English...
Have fun!

brauliobezerra (22 December 2010, 10:35:08 UTC)

I guess the English interface should say 'French "passé simple"'. If it is impossible or misleading to translate, don't translate.

Shishir (22 December 2010, 22:33:59 UTC)

I agree ^^

Nero (22 December 2010, 0:09:21 UTC)

Imperfect tense?

Shishir (22 December 2010, 3:02:13 UTC)

No, in French there are the "passé composé" (il a marché), the imperfect (l'imparfait, il marchait) and the "passé simple" (il marcha). And as far as I know, "preterite" means that something happened in the past, so it doesn't work either, because it can refer to any of them... In this case I'd keep the French term, "passé simple", to avoid misunderstandings.

Nero (22 December 2010, 3:58:15 UTC)

Preterite is a verb form found in some languages. The "passé simple" translates to "simple past" so that's what I would use.

Shishir (22 December 2010, 10:25:58 UTC)

But I think that would lead to misunderstandings, because people learning French might think that it is the equivalent of the English "simple past", which is not true.

Nero (23 December 2010, 3:49:34 UTC)

Technically there is no "simple past" in English. Beginners learning a foreign language, or people who don't know much about English, might have trouble with it, but simple past =/= preterite.

Shishir (23 December 2010, 15:34:53 UTC)

Wow... I had always been taught that the English past tense (e.g. he broke, she looked...) was the "simple past" [1]...
Then what's the name of the English past tense? preterite?

[1] http://www2.gsu.edu/~wwwesl/egw/verbs.htm
http://www.usingenglish.com/ref...regular-verbs/

Zifre (23 December 2010, 15:45:11 UTC)

Yeah, I've always heard it called "simple past" or just "past". (Of course, this is high school English in America. They don't want to scare people with big scary linguistic terms like "preterite". :P)

I've never heard anyone use the word "preterite" except in reference to tenses in foreign languages.

Here is a list of what I would call all the English tenses:

He will walk -> future
He will be walking -> future progressive
He will have walked -> future perfect
He walks -> present
He is walking -> present progressive
He has walked -> present perfect
He walked -> (simple) past
He was walking -> past progressive
He had walked -> past perfect (pluperfect)

arcticmonkey (23 December 2010, 16:02:51 UTC)

Technically, there are only two tenses in English: present and past.

Simple, progressive, and perfect are aspects.

Nero (24 December 2010, 2:07:56 UTC)

I was saying that the French simple past doesn't equal what is known in English as the "simple past". They were afraid of people confusing the French simple past with the English "simple past", since they're not the same thing. The actual term for it in English is the preterite.

Zifre (22 December 2010, 2:19:25 UTC)

No, I believe that the imperfect tense is something else entirely.

Zifre (22 December 2010, 2:15:25 UTC)

I would say preterite. At least that's what the corresponding Spanish tense (el pretérito) is usually called in English.

The corresponding English tense (e.g. "he walked") is usually just called "past", since there are no imperfect or other past tenses.

Nero (22 December 2010, 2:31:24 UTC)

Well, "he walked" is technically the preterite in English. German has the imperfect, which equates to the English preterite. But I just found out there's a difference between the imperfect and the simple past in French.

I would go with simple past or keep the French phrase, because it's kind of specific to French unless it applies to another language.

Zifre (22 December 2010, 19:53:02 UTC)

I believe the French "passé simple" exists in many Romance languages (and probably other Indo-European languages as well). I know it exists in Spanish.

The only difference is that it is rarely used in French, while it is very common in Spanish.

I'm pretty sure that the English past tense (e.g. "he walked") is from the same origin as the Spanish preterite and French passé simple. English has no real imperfect tense. Obviously, the perfect tenses in all three languages are related. (e.g. "He has walked" vs. "Él ha caminado")

Shishir (22 December 2010, 22:33:11 UTC)

Two comments: first, in Spanish we don't have any tense called "past simple"; we have the "indefinite praeteritum" and the "imperfect praeteritum", and they do NOT match the use of the French passé simple. In Italian I guess the passato remoto is used more or less in the same situations as the passé simple, but that's it.
And taking your own example, if you say in French "il a marché", you can translate it into English as "he has walked" or "he walked", and into Spanish as "él ha caminado" or "él caminó". There's no true equivalent of the passé simple in Spanish. French people use the passé simple (as far as I know) only in formal contexts, mainly in literature and newspapers; that's why I'd rather have it clearly tagged, so that learners (like me) can see that this is not the usual tense and not simply assume it is the equivalent of the Spanish "pretérito indefinido" or the English "simple past". Maybe they all came from the same thing, but they are no longer used in the same contexts or situations.

stefz (22 January 2011, 13:30:34 UTC)

Yes, the simple past forms (passé simple in French, pretérito indefinido in Spanish, pretérito perfeito in Portuguese) have the same origin, i.e. the Latin perfect form. As you might know, the original Latin did not have compound forms, only simple forms. Later, compound forms (j'ai fait, yo he hecho, eu tenho feito) became common, but it took some time for them to "travel" from Rome, where new fashions were created, to the periphery. Therefore, in French, the passé simple is not used any more in spoken language. In Spanish, the simple and compound forms are used in parallel, but to express different situations (although in South American Spanish, the simple form is also used in cases where the compound form is used in Spain). In Portuguese (I only know Brazilian Portuguese well), the simple form is the most common, and you do not really need to use the compound form (it is also interesting that Portuguese uses "ter" = Latin "tenere" instead of "haver" = Latin "habere" to form the compound forms, as the other Romance languages do: French "avoir", Spanish "haber"). Also, Portuguese is the only Romance language that has preserved a simple form for the past perfect ("fizera", alongside "tinha feito"). This has been replaced by compound forms in the other Romance languages ("avais fait", "había hecho"). I don't know enough Italian to say, but following the theory of fashions originating from Rome, I suppose the simple form might have even less importance there than in French, or might have disappeared altogether.

The expressions "perfect" and "imperfect" refer, as far as I know, to the usage of the corresponding past tenses in Latin (praeteritum = passed by, German "vorübergegangen"): "perfect" means "completed", i.e. the praeteritum perfectum was used for actions that started and ended in the past, whereas the praeteritum imperfectum ("not completed") was used for actions that started in the past and persisted till the present.

That means there are different names for the tenses, some derived from their grammatical structure (passé simple), some from their usage ("perfect", "imperfect"). As the usage of the tenses differs considerably between languages, it is very difficult, or even impossible, to find good translations. I would stick to the original terms of each language.
Because the tenses are used differently in different languages, it is also very difficult to translate the tenses in sentences correctly, as a single sentence rarely delivers the context...

sacredceltic (22 January 2011, 15:34:39 UTC)

>in French, the passé simple is not used any more in spoken language.

I beg to differ. The French passé simple is a narrative tense, along with the passé antérieur, and they are used to tell stories, regardless of whether those stories are written OR SPOKEN.
As such, the passé simple has ALWAYS been used to tell stories, no less now than before.
It is true that the passé composé tends to replace the passé simple, and the plus-que-parfait the passé antérieur, in the spoken narrative of uneducated people, but not everybody is uneducated or speaks broken French. I use the passé simple to tell stories, and I love it because it is so much more beautiful.
Actually, the passé simple is familiar to people whose parents read or told them stories when they were children. Hence the charm ascribed to it, which is also the reason why it sounds unfamiliar to uneducated or poorly educated people, or to people who never read stories.
As people who don't read (or are not read to) greatly outnumber those who do, especially in the younger generation, the perception that the passé simple "is disappearing" or "is outdated" is dominant, although it is plain wrong, as any educated storyteller will prove.
"Ils se marièrent et eurent beaucoup d'enfants... analphabètes." ("They married and had many children... illiterate ones.") Voilà !

Zifre (22 December 2010, 22:58:02 UTC)

Sorry, I should have been more clear. I was referring to the linguistic origins of the tenses, not their usage (which is obviously very different in all three languages).

Also, in English, I've always heard the Spanish tenses referred to like this:

caminó -> preterite
caminaba -> imperfect
ha caminado -> (present) perfect

Maybe it's just some oddity of my school's Spanish curriculum, but I've never heard "preterite" refer to any of the tenses except for the first one above.

I agree that it's probably best to use a special tag for the French tense since it's so rare and doesn't correspond well in usage to any tense in any other language.

Pharamp (22 December 2010, 23:39:40 UTC)

Just a little annotation (not directly for you, Zifre ;) )

Standard Italian usage is exactly the same as French.
In (Deep) Southern Italy you can easily hear old people using the "passato remoto" exactly like the Spanish "preterite"/caminó, but as far as I know it's not correct or standardised at all.

Moreover, the Italian sentences here on Tatoeba are mainly written by Guybrush88 and me: as we are both from the same region in the Centre-North, we can be fairly sure that there isn't yet any Italian sentence using the "passato remoto" as in Spanish.

All this because I would like a special tag for Italian too :P

Swift (21 December 2010, 23:46:12 UTC)

As I said, just to be sure. I wasn't sure, since there was conflicting information and you didn't appear to have written the Wiktionary page. I'll change the term to “past historic tense”. Thanks for clearing this up.

brauliobezerra (21 December 2010, 13:34:24 UTC)

Hi, I've started translating the contributor's guide into Portuguese. However, I'm short on time and am translating it very slowly. Anyone who wants to help, just let me know and I'll give you editing permission. Here's the link:

https://docs.google.com/documen...icR9267fVh-YOA

sysko (15 December 2010, 20:29:54 UTC)

The pinyin, script detection and script conversion for sentences are now handled by homebrew software, which should fix all the problems with strange conversions (trash characters, etc.).
As I said, since it's "homebrew", if you find an inaccurate transcription/segmentation or a bug, please report it here :)

minshirui (20 December 2010, 23:17:45 UTC)

Where should one report bugs in the Cantonese romanization? For example, in this sentence (http://tatoeba.org/eng/sentences/show/676270), the characters are:
我次次聽呢首歌都會聽到喊。

Currently, this is generated:
ngo⁵ ci³ ci³ ting³ ne¹ sau² go¹ dou¹ wui⁶ ting³ dou³ ham⁶ .

However, some of the words are incorrectly romanized. It should be:
ngo⁵ ci³ ci³ ting¹ ni¹ sau² go¹ dou¹ wui⁶ ting¹ dou³ haam³.

nickyeow (21 December 2010, 14:36:08 UTC)

You can give the sentence an “incorrect transcription” tag, or maybe leave a comment to point out the mistakes, too.

Actually, I’m currently in the process of proofreading all the Cantonese sentences and making a list of pronunciations of my own. After I’m finished with the list and the list is imported into the database, most of the wrong romanizations should be fixed. :-)

(By the way, the “wui⁶” in the romanization of the sentence is wrong too. It should be “wui⁵.” Also, 聽 is only pronounced as “ting¹” when it’s combined with other characters like 日, 朝 and 晚 to mean “tomorrow”. When meaning “to listen”, it’s usually pronounced “teng¹.” Therefore, the correct romanization of the sentence “我次次聽呢首歌都會聽到喊” would be: “ngo⁵ ci³ ci³ teng¹ ni¹ sau² go¹ dou¹ wui⁵ teng¹ dou³ haam³.”)

sysko (15 December 2010, 20:30:43 UTC)

For those interested, I will try to clean up the code, add some docs, and release it as free software.

nickyeow (18 December 2010, 8:48:00 UTC)

I'm not sure if this is the right place to ask, but for the Cantonese romanization, is there anything I can do to help? Like compiling a list of words with their corresponding pronunciations?

sysko (18 December 2010, 9:20:07 UTC)

What a coincidence: I was talking with Demetrius the other day and he showed me a list of Cantonese words with romanization, so I'm adapting "sinoparser" (let's call the software that, unless someone has a better name) to support Cantonese with Jyutping :)

nickyeow (18 December 2010, 9:38:35 UTC)

It's great to hear that the Cantonese sentences are going to have romanizations soon ;-) ! But since a lot of the Chinese characters have multiple pronunciations in Cantonese, I think we also need to make the romanizations editable.

For example, 咪,
when meaning "microphone," is pronounced "mai1"
when meaning "don't," is pronounced "mai5"
when meaning "as a result", is pronounced "mai6"

It would be difficult for computers to decide which pronunciation to use...

sysko (18 December 2010, 15:24:01 UTC)

OK, I've finished it; now I just need to integrate it. It may produce less accurate results, and will probably often cut sentences character by character, because the data file I use for Cantonese has "only" 45,000 entries (which is low once you count an entry for every single sinogram: there aren't that many "words"), while the one for Mandarin has more than 200,000. But I will try to find other "open source" dictionaries to get better word segmentation, with pronunciations generated from the current data file, and then, with the help of Tatoeba, we will be able to refine this step by step.

@nickyeow, about the problem you're describing: is it like 的 in Mandarin, which can be "de" or "di", but which we can guess automatically 95% of the time, because on its own it's "de", and by having the entry "的确" => "dique" we can handle this (i.e. if you are able to segment the sentence correctly, then you can guess which romanization it is 99% of the time)?

Or is it more like Japanese, where even if you're able to segment the sentence into "words", the pronunciation differs depending on the meaning of the words (i.e. you need to understand the sentence to guess which one to choose)?
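The Mandarin case described here (的 alone is "de", the entry 的确 is "dique") amounts to dictionary-based word segmentation followed by a per-word lookup. A minimal Python sketch using greedy forward maximum matching, a common segmentation technique; whether sinoparser works exactly this way is an assumption, and the tiny dictionary is hypothetical. Characters that start no longer dictionary entry fall back to single characters:

```python
# Tiny hypothetical dictionary; a real one would be loaded from a data file
# with hundreds of thousands of entries.
DICTIONARY = {
    "的": "de",
    "的确": "dique",
    "我": "wo",
    "是": "shi",
    "学生": "xuesheng",
}


def segment(sentence: str, dictionary: dict[str, str]) -> list[str]:
    """Greedy forward maximum matching.

    At each position, take the longest dictionary entry that matches;
    if none does, fall back to a single character.
    """
    max_len = max(map(len, dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A one-character candidate is always accepted (fallback).
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def romanize(sentence: str, dictionary: dict[str, str]) -> str:
    """Look up each segmented word; unknown characters are left as-is."""
    return " ".join(dictionary.get(w, w) for w in segment(sentence, dictionary))


if __name__ == "__main__":
    print(romanize("我的确是学生", DICTIONARY))  # wo dique shi xuesheng
    print(romanize("我的学生", DICTIONARY))      # wo de xuesheng
```

With a small dictionary, most of a sentence falls through to the single-character branch, which is the effect sysko expects from the 45,000-entry Cantonese file.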

sysko (18 December 2010, 15:28:53 UTC)

If it's the first case, then it can be solved by adding more data (which means I can keep using the same algorithm to generate the Cantonese romanization).
Otherwise, it will need a new layer in my software: a grammar analyser on top of the current lexical parser.

nickyeow (18 December 2010, 17:05:09 UTC)

I think Cantonese is more similar to Mandarin in this regard, and a large part (90%, perhaps?) of the problem could be solved simply by adding more data.

For the more complicated cases, grammar analyzers might have to be used. For example, since the 咪 as I mentioned above is almost always used with 囉 when meaning "as a result," we can create a simple script that can recognize the sentence pattern "咪…囉" and make the pronunciation of 咪 "mai6" whenever such a sentence pattern is detected.

The final particles can be a bit tricky too — some final particles have different pronunciations to indicate different moods, and the wrong pronunciation of a final particle can make a sentence sound completely ridiculous. Unfortunately, I can't think of ways to deal with them other than manually proofreading all the sentences (but of course, I'll be glad to help with that :-).
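The "咪…囉" idea is essentially a pattern rule layered on top of the dictionary lookup. A minimal Python sketch of such a rule; the regular expression, the choice to stop the search at sentence-final punctuation (。！？), and the `default` reading are all assumptions made for illustration:

```python
import re

# Hypothetical rule: a 咪 followed later in the same sentence by 囉
# is taken to mean "as a result" and read mai6. The lookahead stops
# at sentence-final punctuation so the rule does not cross sentences.
MAI_AS_A_RESULT = re.compile(r"咪(?=[^。！？]*囉)")


def readings_of_mai(sentence: str, default: str = "mai5") -> list[tuple[int, str]]:
    """Return (position, jyutping) for every 咪 in the sentence.

    Positions matched by the rule get mai6; all others get `default`,
    which stands in for whatever the dictionary lookup would choose.
    """
    rule_hits = {m.start() for m in MAI_AS_A_RESULT.finditer(sentence)}
    return [
        (i, "mai6" if i in rule_hits else default)
        for i, ch in enumerate(sentence)
        if ch == "咪"
    ]
```

Final particles are much harder to cover this way, since their reading depends on the intended mood rather than on a visible pattern in the text.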

sysko (18 December 2010, 23:35:44 UTC)

OK, I've added romanization for Cantonese (I finally found that crappy bug, a stupid mistake of mine in the code), but now I see that there don't actually seem to be many "words" beyond single-character transcriptions.
Anyway, this afternoon I searched the web, and it seems the world is missing an "open source" list of Cantonese words (there's CantoDict, which is free as in free beer but not as in free speech, and its project leader doesn't plan to release the data...),
so maybe we can start making such a list :)

nickyeow (19 December 2010, 3:27:46 UTC)

Thanks a LOT, sysko! It must have been such a frustrating experience to find out that bug... :-/

Anyway, I've started proofreading the sentences, and while a lot of them are romanized perfectly, a number of them still contain mistakes. Fortunately, most of these mistakes could be fixed by adding more data, so I'm noting down all the errors I come across and making my own list of pronunciations.

And I have a question: what format should the list be in, so that it can be conveniently imported into the database?

sysko (19 December 2010, 6:36:36 UTC)

Thanks to you too; I think you've spent more time adding all the Cantonese sentences (in addition to all the Mandarin sentences) than I have spent coding this software :)

For the list, a .txt file in the following format would be perfect:

word[tab]jyutping
word2[tab]jyutping2

:)
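Reading a list in this `word[tab]jyutping` format takes only a few lines. A minimal Python sketch; rejecting malformed lines and letting a later duplicate override an earlier one are assumptions, not something specified in the thread:

```python
def load_pronunciations(lines) -> dict[str, str]:
    """Parse 'word<TAB>jyutping' lines into a {word: jyutping} dict.

    Blank lines are skipped. A line without exactly one tab (or with an
    empty field) raises ValueError with its line number, so the list can
    be fixed before import. A repeated word overrides the earlier entry.
    """
    table: dict[str, str] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) != 2 or not parts[0] or not parts[1].strip():
            raise ValueError(
                f"line {lineno}: expected 'word<TAB>jyutping', got {line!r}"
            )
        word, jyutping = parts
        table[word] = jyutping.strip()
    return table
```

An open text file can be passed directly, e.g. `load_pronunciations(open("cantonese.txt", encoding="utf-8"))`, where the file name is hypothetical.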

nickyeow (19 December 2010, 11:56:17 UTC)

Okay! I'll start working on the list now. Hopefully I'll finish proofreading all the Cantonese sentences within a month or two.

Demetrius (20 December 2010, 17:17:44 UTC)

I’m glad to see it’s done. :)

BTW can you display tones in superscript, like in CantoDict? E.g. not ngo5 but ngo⁵? IMHO it’s more readable.
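Converting the tone digits for display is a simple text substitution. A minimal Python sketch, assuming tones are written as a single digit from 1 to 6 directly after the letters of each syllable, as in the romanizations quoted in this thread:

```python
import re

# ASCII digits -> Unicode superscript digits (both strings have 10 characters).
SUPERSCRIPT = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

# A Jyutping tone: one digit 1-6 right after a lowercase letter and at the
# end of a word, so numbers such as "45000" are left untouched.
TONE = re.compile(r"(?<=[a-z])([1-6])\b")


def superscript_tones(romanization: str) -> str:
    """Rewrite e.g. 'ngo5' as 'ngo⁵' for display; storage can keep digits."""
    return TONE.sub(lambda m: m.group(1).translate(SUPERSCRIPT), romanization)


if __name__ == "__main__":
    print(superscript_tones("ngo5 ci3 ci3 teng1"))  # ngo⁵ ci³ ci³ teng¹
```

Keeping plain digits in the stored data and converting only at display time means searches and imports can stay plain ASCII.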

sysko (18 December 2010, 17:28:10 UTC)

OK, glad to hear that :)
BTW, I've finished coding it and have also integrated it, but I'm facing a bug (the software starts taking all the CPU resources) which occurs only when I put it on the real website Oo, weird. I will try to find out why tomorrow (in fact the problem does not come from the Cantonese support itself, because I've adapted the entire Sinoparser code to be more flexible in case we add support for other Chinese languages, such as Shanghainese).

TRANG (19 December 2010, 18:50:51 UTC)

Projects using Tatoeba

I'm gathering links here:
http://blog.tatoeba.org/2010/12...g-tatoeba.html

If you know any other, let me know :)

Swift (11 December 2010, 23:01:13 UTC)

** Bug? **

When I go to http://tatoeba.org/eng/favorites/of_user/1005 and click "show user" with any username, I get a mostly blank page titled "Errors". It's blank because only the first 51 lines are received.

TRANG (19 December 2010, 18:47:51 UTC)

Fixed. The form wasn't supposed to be there anymore ^^

CK (17 December 2010, 11:53:27 UTC; edited 30 October 2019, 1:23:55 UTC)

[not needed anymore- removed by CK]

debian2007 (17 December 2010, 15:36:11 UTC)

It will be a long one:
JavaScript template, Italian (I removed some noise from the text):
A http://similar.uw.hu/ita_1.htm
B http://similar.uw.hu/ita_2.htm
C http://similar.uw.hu/ita_3.htm
D http://similar.uw.hu/ita_4.htm
E http://similar.uw.hu/ita_5.htm
F http://similar.uw.hu/ita_6.htm
G http://similar.uw.hu/ita_7.htm
H http://similar.uw.hu/ita_8.htm
I http://similar.uw.hu/ita_9.htm
J http://similar.uw.hu/ita_10.htm
K http://similar.uw.hu/ita_11.htm
L http://similar.uw.hu/ita_12.htm
M http://similar.uw.hu/ita_13.htm
N http://similar.uw.hu/ita_14.htm
O http://similar.uw.hu/ita_15.htm
P http://similar.uw.hu/ita_16.htm
Q http://similar.uw.hu/ita_17.htm
R http://similar.uw.hu/ita_18.htm
S http://similar.uw.hu/ita_19.htm
T http://similar.uw.hu/ita_20.htm
U http://similar.uw.hu/ita_21.htm
V http://similar.uw.hu/ita_22.htm
W http://similar.uw.hu/ita_23.htm
X http://similar.uw.hu/ita_24.htm
Y http://similar.uw.hu/ita_25.htm
Z http://similar.uw.hu/ita_26.htm
JavaScript template, Russian (Yandex search engine links; the last few letters of the alphabet were missing from my source wordlist file):
a http://similar.uw.hu/rus_1.htm
б http://similar.uw.hu/rus_2.htm
в http://similar.uw.hu/rus_3.htm
г http://similar.uw.hu/rus_4.htm
д http://similar.uw.hu/rus_5.htm
е http://similar.uw.hu/rus_6.htm
ё http://similar.uw.hu/rus_7.htm
ж http://similar.uw.hu/rus_8.htm
з http://similar.uw.hu/rus_9.htm
и http://similar.uw.hu/rus_10.htm
й http://similar.uw.hu/rus_11.htm
к http://similar.uw.hu/rus_12.htm
л http://similar.uw.hu/rus_13.htm
м http://similar.uw.hu/rus_14.htm
н http://similar.uw.hu/rus_15.htm
о http://similar.uw.hu/rus_16.htm
п http://similar.uw.hu/rus_17.htm
р http://similar.uw.hu/rus_18.htm
с http://similar.uw.hu/rus_19.htm
т http://similar.uw.hu/rus_20.htm
у http://similar.uw.hu/rus_21.htm
ф http://similar.uw.hu/rus_22.htm
х http://similar.uw.hu/rus_23.htm

If I made mistakes, I will delete this wall post and provide new links (or overwrite the current ones).
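A minimal sketch of how per-letter lists like these could be produced, assuming the goal is to list wordlist entries that no Tatoeba sentence contains yet. The function names, the naive regex tokenizer and the sample data are hypothetical; this is not the script used to build the pages above.

```python
import re
from collections import defaultdict

def tokenize(sentence):
    """Naive tokenizer (an assumption): lowercase, keep runs of letters only."""
    return set(re.findall(r"[^\W\d_]+", sentence.lower()))

def uncovered_by_initial(wordlist, sentences):
    """Group wordlist entries that appear in no sentence by their first letter."""
    seen = set()
    for sentence in sentences:
        seen.update(tokenize(sentence))
    groups = defaultdict(list)
    for word in wordlist:
        w = word.lower()
        if w and w not in seen:
            groups[w[0]].append(w)
    return dict(groups)

# Hypothetical sample data.
words = ["амбар", "алмаз", "берег", "вода"]
sentences = ["Вода холодная.", "Он пошёл на берег."]
print(uncovered_by_initial(words, sentences))  # {'а': ['амбар', 'алмаз']}
```

Exact string matching like this treats every inflected form as a different word, so for Russian a real version would probably need lemmatization before comparing against a dictionary wordlist.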

Demetrius, 20 December 2010, 17:54:55 UTC

Thank you, I believe it may be useful. :)

The absence of «ъ», «ы» and «ь» is perfectly OK: Russian phonology doesn't allow them at the beginning of a word.[1]
The missing «э» isn't a big deal, since it's a rather rare letter.
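A quick way to check which initial letters a wordlist lacks, and whether each gap is expected or points to a truncated source file. The set of letters that cannot begin a word is taken from the reply above; the function name is hypothetical.

```python
RUSSIAN_ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

# Per the reply above, these letters do not begin Russian words.
NO_WORD_INITIAL = set("ъыь")

def missing_initials(words, alphabet=RUSSIAN_ALPHABET):
    """Return (expected_gaps, suspicious_gaps) among letters no word starts with."""
    initials = {w[0].lower() for w in words if w}
    missing = [ch for ch in alphabet if ch not in initials]
    expected = [ch for ch in missing if ch in NO_WORD_INITIAL]
    suspicious = [ch for ch in missing if ch not in NO_WORD_INITIAL]
    return expected, suspicious
```

A gap in the "suspicious" list (such as the letters after «х» here) suggests the source list was cut short rather than a phonological rule.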


But I’d like to warn against using the Russian wordlist. It is simply too big!

It contains, in particular:
· old-fashioned spellings (азбест/azbest instead of асбест/asbest ‘asbestos’)
· old-fashioned words (алкалический/alkalicheskij ‘alkaline (adj.)’ instead of щелочной/shhelochnoj)
· some names that have been obsolete for centuries like Агриппина/Agrippina or Андроник/Andronik
· two forms of patronymics for each male name: Андроникович/Andronikovich ‘son of Andronik’, Андрониковна/Andronikovna ‘daughter of Andronik’
· some words that can be used only in poetic speech, and sound very strange elsewhere (e.g. аметистово/ametistovo ‘amethystively, in an amethyst way’[2])

I doubt Russian contributors will be able to cover all of this vocabulary in any reasonable period of time. I think we need something more concise.


[1] Ы/y` might occur at the beginning of some Turkish, Eskimo, etc. place names, but even there it is usually substituted by И.
[2] Yes, it is very weird. It can be used as part of compound words, e.g. аметистово-фиолетовый/ametistovo-fioletovy`j ‘purple like an amethyst, amethyst-purple’, but in that case it is not a word on its own.

Demetrius, 20 December 2010, 19:58:05 UTC

By the way, ритуално is a mere mistake in the wordlist.

debian2007, 21 December 2010, 12:48:33 UTC

Demetrius, I can improve this list, but:
1) I used a big wordlist so that native speakers can decide which words are archaic or not used at all.
1a) If you or anyone else has suggestions for which wordlist/dictionary I should use, please tell me. The most recent monolingual dictionary (Толковый словарь) that I found is from the 1950s. All those dictionaries are available from slovopedia. (Modern ones contain many words from the "old" dictionaries.)
2) I do not like to use frequency lists, because they contain many words not used in colloquial speech. They even contain slang.
3) What kind of list do you want? Even ruscorpora does not contain all the necessary words. I can compile a wordlist from any documents. If I used movie subtitles, it might contain too much slang. If I used contemporary literature, it might contain too many words used only by writers.
4) There is no Russian WordNet available yet (or not one as big as it should be).
5) I could use a wordlist from open-source spell checkers. That seems like a good idea, but that list is huge too.
-) So I cannot provide a concise and useful wordlist from the materials I know about. Send me a PM and links if you have some useful bookmarks. Even if it is not available in electronic form, I would still be interested in it.

kebukebu, 18 December 2010, 3:47:52 UTC

Very nice idea :) Once these lists start to shrink, it might be useful to do the same thing for words that only show up below a certain threshold -- for example, in fewer than ten sentences, or something like that.
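The threshold idea could be sketched as below, assuming a word "shows up" in a sentence when the naive tokenizer finds it there. The function names and the default threshold of 10 (taken from the example above) are illustrative, not an existing Tatoeba feature.

```python
import re
from collections import Counter

def tokenize(sentence):
    """Naive tokenizer (an assumption): lowercase, keep runs of letters only."""
    return set(re.findall(r"[^\W\d_]+", sentence.lower()))

def rare_words(wordlist, sentences, threshold=10):
    """Return wordlist entries that occur in fewer than `threshold` sentences."""
    counts = Counter()
    for sentence in sentences:
        # tokenize() returns a set, so each word counts at most once per sentence
        counts.update(tokenize(sentence))
    return [w for w in wordlist if counts[w.lower()] < threshold]
```

Because `Counter` returns 0 for unseen keys, words with no sentences at all are included too, so this list is a superset of the "no sentence yet" lists.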

Also, translators: keep in mind that if you know the translation of a word in another language, you can search for existing sentences which you can translate in order to fill the gap. For example, if you want to add a sentence for the Russian word "зять" and you know that in English it can mean "son-in-law", then you can search for existing English sentences to translate:

http://tatoeba.org/eng/sentence...rom=und&to=und

At the same time, you can certainly always add new sentences :)

(By the way, CK, right now, the links for eng-2 and eng-awl still reference eng-1.html)

Swift, 17 December 2010, 21:44:00 UTC

** Tag cleanup II **

We now continue with the cleanup of the quotation tags. There were multiple tags that referred to the same person; I've identified these and transliterated others. There are a few still to do, but once these are done, almost a fifth of the tags will have been sorted out.

Please refer to:
http://martin.swift.is/tatoeba/tags-cleanup.html
for a list of tags that are being merged (there are also a couple of renames that don't seem to have gone through last time).
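Merging tags that refer to the same person amounts to mapping each variant to one canonical name. A minimal sketch, assuming tags are stored as a list of names per sentence ID; the alias table and tag names below are hypothetical examples, not entries from the cleanup list.

```python
# Hypothetical alias table: variant tag name -> canonical tag name.
TAG_ALIASES = {
    "einstein": "Albert Einstein",
    "A. Einstein": "Albert Einstein",
}

def merge_tags(sentence_tags, aliases):
    """Rewrite every tag to its canonical name, dropping duplicates per sentence."""
    merged = {}
    for sentence_id, tags in sentence_tags.items():
        canonical = []
        for tag in tags:
            name = aliases.get(tag, tag)  # tags without an alias are kept unchanged
            if name not in canonical:
                canonical.append(name)
        merged[sentence_id] = canonical
    return merged

print(merge_tags({1: ["einstein", "quote"], 2: ["A. Einstein", "Albert Einstein"]}, TAG_ALIASES))
```

Renames fit the same table: a rename is simply an alias whose old name no longer appears anywhere afterwards.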

After that, I'm hoping to finish sorting the tags before merging and renaming them. With that out of the way, we can start looking at categorisation schemes for the tags, to make them easier to find and use. I've been chatting with CK about that, but we'll try to have a little brainstorming session early next year.

I just want to bring that to people's attention as it may help to start thinking about this. Send your ideas (even the half-baked ones) my way if you like.

DancingHorses, 17 December 2010, 6:42:43 UTC

So far, for me, searching for any word in Japanese spelled in kana will return unrelated results. It appears that each kana character is being treated as an individual word.

Swift, 17 December 2010, 13:37:28 UTC

Yes, each character seems to be searched separately, but if you only get unrelated results, it's because there are no better matches.

Searching for "飛行機" will give you results for "飛行機" first, and sentences containing "飛行" and "機" separately last.

Searching kana is a bit more difficult, because it will only turn up results where the sentence is written in kana. Searching for "ひこうき", for example, will not turn up any meaningful results, but "そんなことないよ" will (though not that exact phrase).

飛行機: http://tatoeba.org/eng/sentence...rom=jpn&to=und
ひこうき: http://tatoeba.org/eng/sentence...rom=jpn&to=und
そんなことないよ: http://tatoeba.org/eng/sentence...rom=jpn&to=und
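The behaviour described above can be illustrated with a toy ranking that treats each character as a separate term: exact substring matches first, then sentences sharing more of the query's characters. This is a hypothetical sketch of the effect, not the site's actual search engine.

```python
def rank_by_characters(query, sentences):
    """Exact substring matches first, then by number of shared characters."""
    chars = set(query)
    scored = []
    for s in sentences:
        exact = query in s
        overlap = len(chars & set(s))
        if exact or overlap:
            # False sorts before True, so exact matches come first;
            # among the rest, more shared characters rank higher.
            scored.append((not exact, -overlap, s))
    scored.sort()
    return [s for _, _, s in scored]
```

With this model, a kana query such as "ひこうき" reaches the kanji-spelled sentence "飛行機が好きです。" only through the stray き of 好き, which is the kind of unrelated match described above.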

Swift, 17 December 2010, 15:55:22 UTC

Though you can, of course, search for exact phrases with quotes (e.g. "そんなことないよ").

Gyuri, 16 December 2010, 14:21:22 UTC

Typo:

http://tatoeba.org/epo/sentence...ndom_sentences

Ĉu traduki frrazojn... > frazojn

Aleksej, 17 December 2010, 13:29:23 UTC

I corrected it at https://translations.launchpad.net/tatoeba