Here you can ask general questions, such as how to use Tatoeba or how to report bugs and strange behavior, or simply socialize with the rest of the community.
Make sure you have read the FAQ before asking a question.
Latest messages
- 2 hours ago, by ema_rega
- 5 hours ago, by mraz
- 6 hours ago, by Aleksandro40
- 9 hours ago, by sacredceltic
- 9 hours ago, by AlanF_US
- 10 hours ago, by CK
- 10 hours ago, by AlanF_US
- 10 hours ago, by CK
- 11 hours ago, by AlanF_US
- 13 hours ago, by TRANG
Wall (723 threads)
I don't understand your question. Can you give an example of what you are looking for?
Also, it seems that you speak Piedmontese, Italian, and Esperanto. You can indicate this on your profile page, http://tatoeba.org/eng/user/profile/ema_rega.
When you create sentences, it is best to use your strongest language. If you read a good Tatoeba sentence in a weaker language that you understand easily, you can translate it into your strongest language. Thank you!
You see the list of all tags.
Now use your browser's search function* to find a tag.
* I use Google Chrome and have to click the icon made up of three horizontal bars. For me, that icon is at the far right, next to the address bar. If I click that icon, I can choose the search function.
Now you can choose a language on the right. You will then get a list containing only sentences in that language.
If you want to search for a certain tag often, you can make things easier. Simply copy the URL of the search and paste it into your profile page (or some other place), where you can always easily find it again and rerun the same search.
creates a list of all eight-word sentences in Esperanto
creates a list of all quotations from Johann Wolfgang Goethe translated into Esperanto
- provide an internal tool for searching for a specific tag (rather than force the user to rely on using the browser's search function)
- provide a list of tags sorted alphabetically rather than by frequency
I made the same requests a while ago and wrote tickets for them (number 288 and number 35, respectively). Good to know I'm not the only one interested in this functionality. I'll take another look at them.
I thought this might be what you meant by your question.
The point is that everyone has a different native language, and knows, understands, and translates different languages.
So if an Italian finds an Esperanto sentence and translates it, a Chinese speaker can translate it as well. An English speaker translates the Italian one, and so on.
The green arrow means a direct translation; the black one means only an indirect translation.
That is why I asked a few times, "Do you know this or that language?", and even "Is it a translation of the Hungarian sentence?", because I thought you had translated a sentence in another language. Or is that not what you mean? Regards, mraz
I wanted to know this information, so I generated this data from last Saturday's export of the data.
This list includes all languages that had over 100 duplicate sentences.
4.31% of hun (2417 of 56116) are duplicates
3.09% of ita (7991 of 258233)
2.91% of rus (5803 of 199734)
2.79% of tur (5473 of 196471)
2.28% of epo (7961 of 349293)
2.20% of ber (1968 of 89604)
2.17% of tat (122 of 5620)
1.66% of fin (749 of 45003)
1.27% of lat (175 of 13792)
1.15% of nob (112 of 9737)
1.11% of srp (182 of 16460)
1.10% of fra (2629 of 239725)
0.99% of por (1496 of 151162)
0.96% of eng (4473 of 464668)
0.90% of deu (2540 of 282869)
0.88% of ina (146 of 16511)
0.87% of spa (1834 of 211857)
0.82% of bul (132 of 16018)
0.78% of dan (146 of 18657)
0.72% of pol (455 of 63087)
0.56% of mar (117 of 20956)
0.45% of heb (460 of 102880)
0.43% of nld (167 of 38906)
0.08% of jpn (136 of 179148)
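For anyone who wants to reproduce figures like these, here is a minimal sketch in Python. It assumes the export has already been parsed into (id, lang, text) rows, and that a "duplicate" is any sentence whose language and text have already been seen. Both are my assumptions and not necessarily how the numbers above were produced. The name `duplicate_stats` is hypothetical.

```python
from collections import Counter


def duplicate_stats(rows):
    """Return {lang: (duplicates, total, percent)} for (id, lang, text) rows.

    Assumption: a sentence counts as a duplicate when an earlier row
    has exactly the same language and text.
    """
    totals = Counter()
    duplicates = Counter()
    seen = set()
    for _sentence_id, lang, text in rows:
        totals[lang] += 1
        key = (lang, text)
        if key in seen:
            duplicates[lang] += 1
        else:
            seen.add(key)
    return {
        lang: (duplicates[lang], total, round(100 * duplicates[lang] / total, 2))
        for lang, total in totals.items()
    }


# Tiny made-up sample, not real Tatoeba data.
sample = [
    (1, "fra", "Je lui montre."),
    (2, "fra", "Je lui montre."),
    (3, "fra", "Bonjour."),
    (4, "eng", "I show her."),
]
stats = duplicate_stats(sample)
```

The percentage is plain arithmetic: for the hun line, `round(100 * 2417 / 56116, 2)` gives 4.31.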
ita = 234
epo = 164
rus = 115
hun = 93
deu = 46
eng = 37
tur = 21
pol = 12
spa = 11
fra = 10
49,416 = total number of duplicates in the database
1) because the same sentence is being translated simultaneously into the same target language by different people who are unaware of each other.
This is particularly a side effect of showing the latest sentences on the front page => every newcomer jumps on what is being shown and translates it...
Maybe we should not show latest contributions on the front page but on a different, dedicated page...
But this might also happen because several people translate simultaneously from the same list, and I can't see how to avoid this
2) because different sentences, in the same or different languages, end up being translated into the same target sentence.
That is unavoidable, especially since the two source sentences that are being translated are not directly linked.
When I translate "I show her" and then "I show him" into French, I have to create a duplicate, because these 2 sentences are initially unlinked, but they have the same translation in French : « Je lui montre ». I can't see how to avoid this.
I want to be able to translate what I want from any device, at any time.
One way I could see would be to be able to link my latest translations to a sentence from a list. But that would work only if the 2 sentences are close to each other in the list I'm translating...
Apparently, it's taking a lot of time to rewrite the script and thoroughly test it before running it again, or perhaps our existing programmers can't quite figure out how to write a script that can safely be used. If there are any programmers here who think they can help, there is information in the wiki on how to set up a local version of the Tatoeba website. Write firstname.lastname@example.org to offer your services after doing that.
"The interface now remembers the most recent list to which you assigned a sentence and sets that as the default when you want to assign another sentence to a list."
Has anyone found this useful? Or, on the contrary, has anyone found it impractical?
I would like to know if it will affect anyone if we revert this, that is: the first list would be the one that is always selected.
Here's a common workflow for me: as I go through Hebrew sentences, I find ones that I want to study and save them to a list. When that list reaches 100 sentences (the maximum that can be downloaded), I start a new list. I've done this five times. I want to save the lists I've created, which collect at the top of my list of lists, so the first list is guaranteed to be one that I don't want to assign sentences to at the moment.
Note that if, for instance, you had individual lists, and were in a phase where you wanted to always add sentences to a collaborative list, you would be in the same situation of always having to choose a new list.
I'd vote for having this reverted to the way it was.
Alan could also just insert an exclamation mark (!) in front of the name of the list he is currently compiling to force it to the top.
On another matter, there seem to be a lot of "collaborative lists" that aren't really collaborative. If members set these to personal "my lists," then the amount of (seldom-used) data that needs to be included in each page displaying sentences would be a lot less. Pages would likely load a bit faster. Note that this data is duplicated 10 times on pages with 10 sentences.
I need help from people who know the Tibetan language. I’m trying to make the search function work for Tibetan. To better explain my request, I’ll first explain a little bit about how the search function works.
Let’s say you have the sentence “This costs: 10$.” The search engine needs to extract the words “this”, “costs” and “10” so that people can find this sentence by searching for one of these keywords. It does this by having a list of characters that are part of words. This list includes a to z but not punctuation and currency symbols. Let’s say we include the colon in that list. Then searching for “costs” wouldn’t return that sentence; only “costs:” would.
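To make the idea concrete, here is a rough sketch in Python. It is not the actual search engine code, only an illustration of a word-character list. The names `WORD_CHARS` and `extract_keywords` are made up, and including the digits in the list is my assumption (the example implies it, since “10” is a keyword).

```python
import string

# Characters treated as parts of words (assumption: a-z plus digits).
WORD_CHARS = set(string.ascii_lowercase + string.digits)


def extract_keywords(sentence, word_chars=WORD_CHARS):
    """Split a sentence into keywords using a list of word characters."""
    keywords, current = [], []
    for ch in sentence.lower():
        if ch in word_chars:
            current.append(ch)
        elif current:
            # Any other character ends the current keyword.
            keywords.append("".join(current))
            current = []
    if current:
        keywords.append("".join(current))
    return keywords


default_result = extract_keywords("This costs: 10$.")
with_colon = extract_keywords("This costs: 10$.", WORD_CHARS | {":"})
```

With the default list, `default_result` is `['this', 'costs', '10']`. Once the colon is added, `with_colon` is `['this', 'costs:', '10']`, so a search for “costs” no longer matches.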
For languages like Tibetan it’s actually a little bit more complex since it doesn’t have word boundaries, but the idea is the same. I need someone who knows which characters are parts of real words in Tibetan to review this chart and give me the character codes (numbers like 0F0A, 0F2E…). Thank you.
However, if it will help you, the basic consonants are:
0F40 0F41 0F42 0F43 0F44
0F45 0F46 0F47 0F49
0F4A 0F4B 0F4C 0F4D 0F4E
0F4F 0F50 0F51 0F52 0F53
0F54 0F55 0F56 0F57 0F58
0F59 0F5A 0F5B 0F5C 0F5D
0F5E 0F5F 0F60 0F61 0F62
0F63 0F64 0F65 0F66 0F67
0F68 0F69 0F6A 0F6B 0F6C
And the vowels:
0F71 0F72 0F73 0F74 0F75
0F76 0F77 0F78 0F79 0F7A
0F7B 0F7C 0F7D 0F7E 0F7F
0F80 0F81 0F82 0F83 0F84
We then have the subjoined consonants:
0F90 0F91 0F92 0F93 0F94
0F95 0F96 0F97 0F99
0F9A 0F9B 0F9C 0F9D 0F9E
0F9F 0FA0 0FA1 0FA2 0FA3
0FA4 0FA5 0FA6 0FA7 0FA8
0FA9 0FAA 0FAB 0FAC 0FAD
0FAE 0FAF 0FB0 0FB1 0FB2
0FB3 0FB4 0FB5 0FB6 0FB7
0FB8 0FB9 0FBA 0FBB 0FBC
It may help to know that a regular consonant can be transformed into a subjoined consonant by adding hex 0050 (decimal 80) to its code point.
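As a quick illustration of that offset, here is a small Python sketch. The function name `to_subjoined` is made up. I have deliberately limited it to U+0F40 through U+0F69 and skipped U+0F48, because as far as I can tell U+0F48 is unassigned and the last few letters in the consonant list do not pair up with their subjoined forms in the same way. Treat that restriction as my assumption and check it against the chart.

```python
SUBJOINED_OFFSET = 0x50  # hex 0050, which is 80 in decimal


def to_subjoined(consonant):
    """Map a basic Tibetan consonant to its subjoined form by code point offset."""
    code_point = ord(consonant)
    # Assumption: the simple offset is only applied to U+0F40..U+0F69,
    # excluding the gap at U+0F48.
    if not 0x0F40 <= code_point <= 0x0F69 or code_point == 0x0F48:
        raise ValueError(f"U+{code_point:04X} is outside the range handled here")
    return chr(code_point + SUBJOINED_OFFSET)


subjoined_ka = to_subjoined("\u0f40")
subjoined_code = f"{ord(subjoined_ka):04X}"
```

Here `subjoined_code` comes out as `0F90`, the first entry in the subjoined list above.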
You may want the numbers as well:
0F20 0F21 0F22 0F23 0F24
0F25 0F26 0F27 0F28 0F29
I wonder how this happened.
Duplicates linked to the German sentence http://tatoeba.org/eng/sentences/show/1879758
mraz - Mar 26th 2014, 09:37
linked to 3127072
mraz - Mar 26th 2014, 09:39
linked to 3127079
mraz - Mar 26th 2014, 09:42
linked to 3127088
mraz - Mar 26th 2014, 09:43
linked to 3127093
mraz - Mar 26th 2014, 09:44
linked to 3127098
mraz - Mar 26th 2014, 09:45
linked to 3127103
mraz - Mar 26th 2014, 09:46
linked to 3127107
(1) Did mraz actually do what neron described, or did something else go wrong that we don't know about?
(2) If neron's scenario occurred, can we change the code to disable the "translate" button until the interface is updated?
Also, a guideline: if what neron described is what really happened, users should try to avoid this type of multiple submission of the same sentence (unless and until we change the code to prevent it).
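On (2): besides disabling the button in the browser, a check on the server side could catch repeated submissions as well. This is a different technique from the one proposed, and the sketch below is only a Python illustration, not Tatoeba's actual code. The class `TranslationStore` and its method are hypothetical.

```python
class TranslationStore:
    """Toy in-memory store that ignores repeated identical submissions."""

    def __init__(self):
        self._next_id = 1
        self._by_key = {}  # (source_id, lang, text) -> sentence id

    def add_translation(self, source_id, lang, text):
        """Return (sentence_id, created); created is False for a repeat."""
        key = (source_id, lang, text.strip())
        if key in self._by_key:
            # Same text already linked to this source: reuse it.
            return self._by_key[key], False
        sentence_id = self._next_id
        self._next_id += 1
        self._by_key[key] = sentence_id
        return sentence_id, True


store = TranslationStore()
first = store.add_translation(1879758, "hun", "example text")
second = store.add_translation(1879758, "hun", "example text")
```

The second call returns the same id with `created` set to `False`, so clicking "translate" several times before the page updates would not produce several linked copies.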
I recently worked on improving the furigana for Japanese sentences. Furigana are now displayed in hiragana instead of katakana. In addition, they are no longer attached to words that are already written in kana. (Actually, it’s not perfect: when a word contains a mix of kana and kanji, the whole word, including the kana parts, is displayed in the furigana.)
In other words, we now have (#3501384):
言い訳[いいわけ] ばっか すん な よ 。
instead of:
言い訳[イイワケ] ばっか[バッカ] すん[スン] な[ナ] よ[ヨ] 。[。]
Last but not least, the furigana should contain fewer errors than they used to. For instance, 来ない is now correctly read as こない instead of *きない. But beware, furigana are still not 100% accurate.
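For the curious, the display change can be sketched in a few lines of Python. This is not the real furigana software, which also has to work out the readings. It only shows the two display rules described above, assuming each word already comes with a katakana reading. The function names are made up.

```python
def katakana_to_hiragana(text):
    """Convert katakana to hiragana by shifting code points down by 0x60."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def has_kanji(word):
    """True if the word contains a character from the basic kanji block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in word)


def annotate(tokens):
    """Attach hiragana readings only to words that contain kanji."""
    parts = []
    for word, reading in tokens:
        if has_kanji(word):
            parts.append(f"{word}[{katakana_to_hiragana(reading)}]")
        else:
            parts.append(word)
    return " ".join(parts)


tokens = [
    ("言い訳", "イイワケ"),
    ("ばっか", "バッカ"),
    ("すん", "スン"),
    ("な", "ナ"),
    ("よ", "ヨ"),
    ("。", "。"),
]
annotated = annotate(tokens)
```

For sentence #3501384 this gives `言い訳[いいわけ] ばっか すん な よ 。`. As noted above, a mixed word such as 言い訳 still gets the reading of the whole word.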
EDIT: On a side note, I’d like to mention that deploying the updated version of our (terrible) furigana generation software on tatoeba.org was a piece of cake, thanks to the work of pallavshah, one of the GSoC students who worked on Tatoeba this summer. In other words, he saved us hours of tedious work, and we can now develop faster and more safely.
I'm looking forward to seeing perfect furigana. I guess the trickiest are words like 飼い犬（かいいぬ）, since it's probably difficult for a machine to decide whether it's 飼（か）い犬（いぬ） or 飼（かい）い犬（ぬ）. I'm willing to help you if there's anything I can do.
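Here is a small Python sketch of why a word like this is hard. It lists every way the reading can be split across the kanji while keeping the kana in the word fixed. It is only an illustration under simple assumptions (the word's kana are hiragana and appear unchanged in the reading), and the function names are made up.

```python
import re

# Runs of kanji, or runs of anything else (kana, in this example).
SEGMENT = re.compile(r"[\u4e00-\u9fff]+|[^\u4e00-\u9fff]+")


def segment(word):
    """Split a word into alternating kanji and non-kanji runs."""
    return SEGMENT.findall(word)


def is_kanji(run):
    return "\u4e00" <= run[0] <= "\u9fff"


def alignments(runs, reading):
    """Return every way to assign the reading to the kanji runs."""
    if not runs:
        return [[]] if reading == "" else []
    head, rest = runs[0], runs[1:]
    results = []
    if is_kanji(head):
        # A kanji run must take at least one kana of the reading.
        for cut in range(1, len(reading) + 1):
            for tail in alignments(rest, reading[cut:]):
                results.append([(head, reading[:cut])] + tail)
    elif reading.startswith(head):
        # Kana in the word must match the reading literally.
        for tail in alignments(rest, reading[len(head):]):
            results.append([(head, None)] + tail)
    return results


candidates = alignments(segment("飼い犬"), "かいいぬ")
```

`candidates` contains exactly two alignments: 飼（か）い犬（いぬ） and 飼（かい）い犬（ぬ）. The code cannot tell which one is right; that takes a dictionary or a human.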
It is simply impossible for computer-generated romanisation to be 100% accurate. To illustrate the problem, the final particle 呀 can be pronounced as aa1, aa3, aa4 and aa6 depending on the tone you want to convey...
It would be great if we could list all the languages that could benefit from having editable alternative scripts, so we can implement a solution that could easily be ported to other languages. We basically need to know:
- the language;
- its alternative script(s), how they derive from the main script and what they are used for (a link to the Wikipedia page should be enough);
- whether the script(s) can be computer-generated with 100% accuracy;
- if not, which tools out there can generate a partially accurate script;
I think every language would require some kind of reading aid at least partially to show how to read, say, "2014" or "Louis XIV".
You also need to keep in mind that sometimes multiple readings are possible (e.g. 何 and 明日 in Japanese). The quality of reading aids should be controlled in much the same way as the sentences, since there are some readings that are theoretically possible but unlikely.
That makes sense. Actually, I have the feeling that the way furigana are implemented is kind of wrong, because they are treated as an alternative script, just like, e.g., the romanization of Chinese. As a result, Japanese sentences are displayed twice (once with furigana and once without), which is a total waste of space and a bad way of presenting sentences. Implementing reading aids as a separate concept would both fix this bad presentation and make them available for any language.
On the other hand, although reading aids may help with pronunciation, they are nonexistent in Latin-based languages, so we wouldn’t be showing the reality of these languages by attaching reading aids. This could, e.g., trick Japanese learners into thinking that English actually has reading aids, just as Japanese uses furigana.
> You also need to keep in mind that sometimes multiple readings are possible (e.g. 何 and 明日 in Japanese). The quality of reading aids should be controlled in much the same way as the sentences, since there are some readings that are theoretically possible but unlikely.
Do you mean we should allow multiple readings? I’m not sure it’s a good idea.
Ah, do you mean something like this?
Tom wurde 2013 in Boston geboren.
That would look weird indeed. I haven't thought about a concrete way to display it. There should be a neater way.
> Do you mean we should allow multiple readings? I’m not sure it’s a good idea.
Yes. If you don't want to allow them, you need to allow multiple sentences that look the same. Otherwise people would end up arguing which reading is the best.
#3404030 Tom was born in Boston in 2013. (two thousand thirteen)
#12345678 Tom was born in Boston in 2013. (two thousand and thirteen)
#12345679 Tom was born in Boston in 2013. (twenty thirteen)
If one can come up with a neat way to display these readings for Latin-based languages (like a tooltip or something), how are we going to display multiple readings for Japanese?
What will be the order of the readings (which one we put on the top or bottom of the list)?
What reading should we use for transliterations (like romanization)? We just can’t say that one of the readings is “the main one” because people are likely not to agree.
And there may be other issues I’m not thinking about yet.
Alternatively, we could say that we only allow one reading, which is up to the owner. It doesn’t mean it’s the only way to read, but it’s how the owner would read it.
Maybe that's better. One reason is that the meaning of a sentence can change according to the readings.
But I think we should be able to edit the readings of sentences owned by others when they're wrong. And here again arises the problem "Is this (automatically generated) reading really absolutely wrong? I'd never read it this way, but maybe some native speakers do." We'll need a sensible way to handle it.