Wall (723 threads)

ema_rega
6 day(s) ago
Hello, does anyone know how to search for a specific tag among the others?
Objectivesea
5 day(s) ago
ema_rega wrote: "Hello, does anyone know how to search for a specific tag among the others?"

Hello,
I don't understand your question. Can you give an example of what you are looking for?

Also, it seems that you speak Piedmontese, Italian and Esperanto. You can indicate this on your profile page, http://tatoeba.org/eng/user/profile/ema_rega.

When you create sentences, it is best to use your strongest language. If you read a good Tatoeba sentence in a weaker language that you understand easily, you can translate it into your strongest language. Thank you!
ema_rega
5 day(s) ago
I am thinking of the tags that mark some of the sentences. Is there a tool for searching for sentences with a specific tag?
al_ex_an_der
5 day(s) ago - edited 5 day(s) ago
[Foliumi] > [Foliumi laŭ etikedo] / [Browse] > [Browse by tag]
You will see the list of all tags.

Now use the search function of your browser* to find a tag.

* I use Google Chrome and have to click the symbol made up of three horizontal bars. For me, that symbol is at the far right, next to the address field. When I click it, I can choose the search function.
al_ex_an_der
5 day(s) ago - edited 5 day(s) ago
Once you have selected the tag by clicking it, you will see a list of all sentences that have that tag.
Now you can choose a language on the right. You will then get a list that contains only sentences in that language.

If you want to search for a certain tag often, you can make this easier. Simply copy the URL of the search and paste it into your profile page (or some other place) where you can always find it again easily and rerun the same search.

Examples:
http://tatoeba.org/epo/tags/sho...h_tag/2476/epo
produces a list of all eight-word sentences in Esperanto

http://tatoeba.org/epo/tags/sho...th_tag/647/epo
produces a list of all quotations from Johann Wolfgang Goethe translated into Esperanto
ema_rega
5 day(s) ago
Yes, thank you, now I understand how to do it. So in fact there is no tool within Tatoeba for searching for a specific tag; we can only do that with an external tool, and, for example, there is no list of the tags in alphabetical order, right?
al_ex_an_der
5 day(s) ago - edited 5 day(s) ago
Ask one of the administrators. They may be able to provide you with an alphabetical list if you need it.
AlanF_US
5 day(s) ago
The thread above included the following two requests:
- provide an internal tool for searching for a specific tag (rather than force the user to rely on using the browser's search function)
- provide a list of tags sorted alphabetically rather than by frequency

I made the same requests a while ago and wrote tickets for them (number 288 and number 35, respectively). Good to know I'm not the only one interested in this functionality. I'll take another look at them.
al_ex_an_der
5 day(s) ago
This may also help you:
http://tatoeba.org/epo/sentence...ta/indifferent
You get a list showing all sentences in Esperanto that do not yet have a (direct) Italian translation.
(This works not through tags but through the "Foliumi laŭ lingvoj" (Browse by language) menu.)
Pharamp
16 hour(s) ago
Hello! I'm glad that you are contributing actively :D
ema_rega
2 hour(s) ago
Thank you, dear! :-)
Aleksandro40
1 day(s) ago
How could chain translations be avoided?
mraz
1 day(s) ago
What is a chain translation?
Aleksandro40
6 hour(s) ago
Chain translations (e.g.): from Hungarian into Esperanto, from Esperanto into Russian, from Russian into Slovak, etc. It would be good if everyone translated the original sentence into their own native language, or into a language they know well.
mraz
5 hour(s) ago
Dear Aleksandro40,

I thought that this might be what you meant by your question.

The point is that everyone has a different native language, and knows, understands and translates different languages.

So if an Italian finds a sentence in Esperanto and translates it, a Chinese speaker may translate it too. An English speaker translates the Italian one, and so on.

The green arrow means a direct translation; the black one means only an indirect translation.

That is why I asked a few times, "Do you know such-and-such a language?", and even "Is this a translation of the Hungarian sentence?", because I thought you had translated a sentence in another language. Or is that not what you mean? Regards, mraz
CK
7 day(s) ago
** Duplicate Sentences **
I wanted to know this information, so I generated this data from last Saturday's export of the data.
This list includes all languages that had over 100 duplicate sentences.

4.31% of hun (2417 of 56116) are duplicates
3.09% of ita (7991 of 258233)
2.91% of rus (5803 of 199734)
2.79% of tur (5473 of 196471)
2.28% of epo (7961 of 349293)
2.20% of ber (1968 of 89604)
2.17% of tat (122 of 5620)
1.66% of fin (749 of 45003)
1.27% of lat (175 of 13792)
1.15% of nob (112 of 9737)
1.11% of srp (182 of 16460)
1.10% of fra (2629 of 239725)
0.99% of por (1496 of 151162)
0.96% of eng (4473 of 464668)
0.90% of deu (2540 of 282869)
0.88% of ina (146 of 16511)
0.87% of spa (1834 of 211857)
0.82% of bul (132 of 16018)
0.78% of dan (146 of 18657)
0.72% of pol (455 of 63087)
0.56% of mar (117 of 20956)
0.45% of heb (460 of 102880)
0.43% of nld (167 of 38906)
0.08% of jpn (136 of 179148)
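
Figures like these could be recomputed with something along the lines of the sketch below. It is not the script that produced the list above. It assumes the export can be read as (language code, sentence text) pairs and that every copy of an identical sentence after the first counts as a duplicate; `duplicate_stats` and the sample rows are made up for illustration.

```python
from collections import Counter


def duplicate_stats(rows, min_duplicates=100):
    """Return (lang, duplicates, total, percent) tuples, highest percent first.

    rows: iterable of (lang, text) pairs (assumed input shape).
    Every copy of a sentence after the first one in the same
    language is counted as a duplicate.
    """
    totals = Counter()
    seen = Counter()
    for lang, text in rows:
        totals[lang] += 1
        seen[(lang, text)] += 1

    duplicates = Counter()
    for (lang, _text), count in seen.items():
        duplicates[lang] += count - 1  # every copy after the first

    stats = [
        (lang, dup, totals[lang], 100.0 * dup / totals[lang])
        for lang, dup in duplicates.items()
        if dup > min_duplicates  # "over 100 duplicate sentences" by default
    ]
    return sorted(stats, key=lambda item: item[3], reverse=True)


# Hypothetical sample data, not taken from the real export.
sample = [
    ("eng", "Hello."),
    ("eng", "Hello."),
    ("eng", "Goodbye."),
    ("fra", "Bonjour."),
]
for lang, dup, total, pct in duplicate_stats(sample, min_duplicates=0):
    print(f"{pct:.2f}% of {lang} ({dup} of {total})")  # 33.33% of eng (1 of 3)
```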
CK
2 day(s) ago - edited 2 day(s) ago
* The top 10 languages with exact duplicates added last week *

ita = 234
epo = 164
rus = 115
hun = 93
deu = 46
eng = 37
tur = 21
pol = 12
spa = 11
fra = 10


49,416 = total number of duplicates in the database
sacredceltic
9 hour(s) ago - edited 9 hour(s) ago
Duplicates can be introduced in different ways:

1) because the same sentence is being translated simultaneously into the same target language by different people who are unaware of it.
This is particularly a side effect of showing the latest sentences on the front page: every newcomer jumps on what is being shown and translates it...
Maybe we should not show the latest contributions on the front page but on a different, dedicated page...

But this might also happen because several people translate simultaneously from the same list, and I can't see how to avoid this.

2) because different sentences, in the same or different languages, end up being translated into the same target sentence.
That is unavoidable, especially since the two source sentences that are being translated are not directly linked.

When I translate "I show her" and then "I show him" into French, I have to create a duplicate, because these two sentences are initially unlinked, but they have the same translation in French: « Je lui montre ». I can't see how to avoid this.
All the procedures I have seen so far, based on JavaScript gimmicks, are awkward (you need to copy sentence numbers...) and inconvenient, and they don't work on all devices.
I want to be able to translate what I want from any device, at any time.

One way I could see would be to be able to link my latest translations to a sentence from a list. But that would work only if the 2 sentences are close to each other in the list I'm translating...
Pfirsichbaeumchen
6 day(s) ago
Has any progress been made on the development of a new duplicate merging script?
AlanF_US
6 day(s) ago
I just sent saeb a note to ask.
sacredceltic
2 day(s) ago
The former merging script worked wonders. Why on Earth was it ever changed?!
CK
2 day(s) ago - edited 2 day(s) ago
Unfortunately, the existing script wasn't perfect and some data was being lost when it was run.

Apparently, it's taking a lot of time to rewrite the script and thoroughly test it before running it again, or perhaps our existing programmers can't quite figure out how to write a script that can safely be used. If there are any programmers here who think they can help, there is information in the wiki on how to set up a local version of the Tatoeba website. Write to team@tatoeba.org to offer your services after doing that.
sacredceltic
1 day(s) ago
Data is lost when corpus maintainers manually merge sentences while forgetting to merge comments, tags and list attachments.
TRANG
13 hour(s) ago
A question for people who use lists. We added the following behavior[1] to the lists a couple of months ago:

"The interface now remembers the most recent list to which you assigned a sentence and sets that as the default when you want to assign another sentence to a list."

Has anyone found this useful? Or on the contrary, has anyone found this impractical?
I would like to know if it will affect anyone if we revert this, that is: the first list would be the one that is always selected.

-----

[1] http://blog.tatoeba.org/2014/08...st-2-2014.html
AlanF_US
11 hour(s) ago
I find it extremely useful, which is why I implemented it. It used to drive me crazy, especially when working on a tablet, to have to scroll through the list of lists to find the one I want.

Here's a common workflow for me: as I go through Hebrew sentences, I find ones that I want to study and save them to a list. When that list reaches 100 sentences (the maximum that can be downloaded), I start a new list. I've done this five times. I want to save the lists I've created, which collect at the top of my list of lists, so the first list is guaranteed to be one that I don't want to assign sentences to at the moment.

Note that if, for instance, you had individual lists, and were in a phase where you wanted to always add sentences to a collaborative list, you would be in the same situation of always having to choose a new list.
CK
10 hour(s) ago - edited 10 hour(s) ago
I find it hurts my workflow, since I have my lists numbered in such a way as to prioritize the ones I use more often. I add lots of items to various lists almost every day. It helps me do this faster if I can have the lists in the same order with number 1 being at the top every time I access the lists.

I'd vote for having this reverted to the way it was.
Alan could also just insert an exclamation mark (!) in front of the name of the list he is currently compiling to force it to the top.

On another matter, there seem to be a lot of "collaborative lists" that aren't really collaborative. If members set these to personal "my lists," then the amount of (seldom-used) data that needs to be included in each page displaying sentences would be a lot less. Pages would likely load a bit faster. Note that this data is duplicated 10 times on pages with 10 sentences.
AlanF_US
10 hour(s) ago
An alternative to reverting the change would be to control this behavior with a setting accessible from the options. I would vote for that (and implement it), since the renaming trick won't work for everybody. For instance, a collaborative list will never appear in first position (for anyone who has at least one individual list), regardless of its name. People who always submit to a (genuine) collaborative list will thus always have to do extra work unless they have the option of remembering the last list used.
CK
10 hour(s) ago
If you're willing to implement options, I'd like to see an option to not include collaborative list data in the pages. This would lower the amount of data most people would have to download on page views, since the collaborative lists aren't being used by most members (I assume).
AlanF_US
9 hour(s) ago
That makes sense. I'll look into it.
nava
1 day(s) ago
I must say it's been a real pleasure working with Tatoeba ever since the performance improvement/server upgrade was done! Thank you and well done :)
sabretou
1 day(s) ago
Gonna second this. I really appreciate how smooth and fast everything is now.
gillux
2 day(s) ago - edited 2 day(s) ago
Hello,

I need help from people who know the Tibetan language. I’m trying to make the search function work for Tibetan. To better explain my request, I’ll first explain a little bit about how the search function works.

Let’s say you have the sentence “This costs: 10$.” The search engine needs to extract the words “this”, “costs” and “10” so that people can find this sentence by searching for one of these keywords. It does this by having a list of characters that are part of words. This list includes a to z but not punctuation and currency symbols. Let’s say we include the colon in that list. Then searching for “costs” wouldn’t return that sentence; only “costs:” would.
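
The idea can be sketched roughly as follows. This is an illustration only, not the actual search engine configuration; the `WORD_CHARS` set and the `extract_keywords` helper are hypothetical names.

```python
import string

# Assumed word-character list: ASCII letters and digits only.
WORD_CHARS = set(string.ascii_letters + string.digits)


def extract_keywords(sentence, word_chars=WORD_CHARS):
    """Split a sentence into lowercase keywords.

    Any character outside word_chars acts as a separator.
    """
    keywords = []
    current = []
    for ch in sentence:
        if ch in word_chars:
            current.append(ch.lower())
        elif current:
            keywords.append("".join(current))
            current = []
    if current:
        keywords.append("".join(current))
    return keywords


print(extract_keywords("This costs: 10$."))
# ['this', 'costs', '10']

# Adding the colon to the list glues it onto the preceding word.
print(extract_keywords("This costs: 10$.", WORD_CHARS | {":"}))
# ['this', 'costs:', '10']
```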

For languages like Tibetan it’s actually a little bit more complex since it doesn’t have word boundaries, but the idea is the same. I need someone who knows which characters are parts of real words in Tibetan to review this chart [1] and give me the character codes (numbers like 0F0A, 0F2E…). Thank you.

[1] http://www.unicode.org/charts/PDF/U0F00.pdf
Objectivesea
2 day(s) ago - edited 2 day(s) ago
I think this might be a more difficult challenge than it seems at first. I don't know much about Tibetan, but in common with other Indic scripts like Devanagari, Bengali, etc., the basic Tibetan consonant-plus-inherent-vowel glyph will be modified with vowels other than the inherent 'a'. The printed form will also change when it is preceded or followed by another consonant to form a cluster.

However, if it will help you, the basic consonants are:
0F40 0F41 0F42 0F43 0F44
0F45 0F46 0F47 0F49
0F4A 0F4B 0F4C 0F4D 0F4E
0F4F 0F50 0F51 0F52 0F53
0F54 0F55 0F56 0F57 0F58
0F59 0F5A 0F5B 0F5C 0F5D
0F5E 0F5F 0F60 0F61 0F62
0F63 0F64 0F65 0F66 0F67
0F68 0F69 0F6A 0F6B 0F6C

And the vowels:
0F71 0F72 0F73 0F74 0F75
0F76 0F77 0F78 0F79 0F7A
0F7B 0F7C 0F7D 0F7E 0F7F
0F80 0F81 0F82 0F83 0F84

We then have the subjoined consonants:
0F90 0F91 0F92 0F93 0F94
0F95 0F96 0F97 0F99
0F9A 0F9B 0F9C 0F9D 0F9E
0F9F 0FA0 0FA1 0FA2 0FA3
0FA4 0FA5 0FA6 0FA7 0FA8
0FA9 0FAA 0FAB 0FAC 0FAD
0FAE 0FAF 0FB0 0FB1 0FB2
0FB3 0FB4 0FB5 0FB6 0FB7
0FB8 0FB9 0FBA 0FBB 0FBC

It may help to know that a regular consonant can be transformed to a subjoined consonant by adding 0x50 (hexadecimal), that is, 80 in decimal, to its code point.

You may want the numbers as well:
0F20 0F21 0F22 0F23 0F24
0F25 0F26 0F27 0F28 0F29
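
The code points listed above could be collected into a character set programmatically. The sketch below is only an illustration: the range boundaries are copied from the lists in this post, the names `code_points`, `TIBETAN_WORD_CHARS` and `to_subjoined` are hypothetical, and whether each of these characters really belongs in a word (and whether the +0x50 rule holds for every consonant) still needs checking by someone who knows Tibetan.

```python
def code_points(first, last, skip=()):
    """Return the characters from first to last inclusive, minus skip."""
    return {chr(cp) for cp in range(first, last + 1) if cp not in skip}


# Ranges as listed in the post; 0F48 and 0F98 do not appear in the lists.
CONSONANTS = code_points(0x0F40, 0x0F6C, skip={0x0F48})
VOWELS = code_points(0x0F71, 0x0F84)
SUBJOINED = code_points(0x0F90, 0x0FBC, skip={0x0F98})
DIGITS = code_points(0x0F20, 0x0F29)

TIBETAN_WORD_CHARS = CONSONANTS | VOWELS | SUBJOINED | DIGITS


def to_subjoined(consonant):
    """Apply the rule from the post: add 0x50 to a basic consonant."""
    if consonant not in CONSONANTS:
        raise ValueError("not one of the listed basic consonants")
    return chr(ord(consonant) + 0x50)


# 0F40 maps to 0F90 under this rule.
print(f"{ord(to_subjoined(chr(0x0F40))):04X}")  # 0F90
```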
mraz
6 day(s) ago
ADMIN pls # 3127132
CK
6 day(s) ago - edited 6 day(s) ago
They seem to have not all been submitted at the same time.
I wonder how this happened.

Duplicates linked to the German sentence http://tatoeba.org/eng/sentences/show/1879758

mraz - Mar 26th 2014, 09:37
linked to 3127072
mraz - Mar 26th 2014, 09:39
linked to 3127079
mraz - Mar 26th 2014, 09:42
linked to 3127088
mraz - Mar 26th 2014, 09:43
linked to 3127093
mraz - Mar 26th 2014, 09:44
linked to 3127098
mraz - Mar 26th 2014, 09:45
linked to 3127103
mraz - Mar 26th 2014, 09:46
linked to 3127107
etc.
neron
6 day(s) ago - edited 6 day(s) ago
I suppose the network connection was really bad and the interface gave no feedback that the request had already been sent (the interface looked frozen, nothing changed), the BUTTON for sending it wasn't DISABLED after the first request, and one persistent contributor was more than willing to get it over with and go on to the next sentence... It happens all the time, but we usually make only one duplicate (two clicks) and then wait.
AlanF_US
5 day(s) ago
So there are several questions here:

(1) Did mraz actually do what neron described, or did something else go wrong that we don't know about?

(2) If neron's scenario occurred, can we change the code to disable the "translate" button until the interface is updated?

Also, a guideline: if what neron described is what really happened, users should try to avoid this type of multiple submission of the same sentence (unless and until we change the code to prevent it).
gillux
6 day(s) ago - edited 6 day(s) ago
Hello!

I recently worked on improving the furigana for Japanese sentences. The furigana are now displayed in hiragana instead of katakana. In addition, they are no longer attached to words that are already in kana. (Actually, it’s not perfect: when a word contains a mix of kana and kanji, the whole word, including the kana parts, is displayed in the furigana.)

In other words, we now have (#3501384):
言い訳[いいわけ] ばっか すん な よ 。
Instead of:
言い訳[イイワケ] ばっか[バッカ] すん[スン] な[ナ] よ[ヨ] 。[。]

Last but not least, the furigana should contain fewer errors than they used to. For instance, 来ない is now correctly read as こない instead of *きない. But beware, furigana are still not 100% accurate.
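
The two display changes can be illustrated with a small sketch. This is not the furigana generation software used on tatoeba.org: `katakana_to_hiragana`, `contains_kanji` and `annotate` are hypothetical helpers, and the (word, reading) pairs are assumed to come from a tokenizer that is not shown here.

```python
def katakana_to_hiragana(text):
    """Convert katakana letters to hiragana; leave everything else alone."""
    # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana twins.
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def contains_kanji(word):
    """True if any character is in the basic CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in word)


def annotate(tokens):
    """Format (word, katakana_reading) pairs in the bracket style shown above."""
    parts = []
    for word, reading in tokens:
        if contains_kanji(word):
            # Whole-word reading, kana parts included (the known imperfection).
            parts.append(f"{word}[{katakana_to_hiragana(reading)}]")
        else:
            # Kana-only words and punctuation get no furigana.
            parts.append(word)
    return " ".join(parts)


tokens = [
    ("言い訳", "イイワケ"), ("ばっか", "バッカ"), ("すん", "スン"),
    ("な", "ナ"), ("よ", "ヨ"), ("。", "。"),
]
print(annotate(tokens))  # 言い訳[いいわけ] ばっか すん な よ 。
```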

EDIT: On a side note, I’d like to mention that deploying the updated version of our (terrible) furigana generation software on tatoeba.org was a piece of cake, thanks to the work of pallavshah, one of the GSoC students who worked on Tatoeba this summer. In other words, he saved us hours of tedious work, and we can now develop faster and more safely.
tommy_san
6 day(s) ago - edited 6 day(s) ago
Great! It really looks much better now. Thank you for your hard work, gillux and pallavshah.

I'm looking forward to seeing perfect furigana. I guess the trickiest are words like 飼い犬(かいいぬ), since it's probably difficult for a machine to decide whether it's 飼(か)い犬(いぬ) or 飼(かい)い犬(ぬ). I'm willing to help you if there's anything I can do.
CK
18 day(s) ago - edited 18 day(s) ago
Tatoeba.org Native Speakers
http://bit.ly/nativespeakers

If your username is not yet on this list, please write a private message [ http://tatoeba.org/eng/private_messages/write/CK ] and tell me what your native language or strongest language is.
nickyeow
17 day(s) ago
I think this has been discussed before, but is it possible to make the romanisation of Cantonese sentences editable?

It is simply impossible for computer-generated romanisation to be 100% accurate. To illustrate the problem, the final particle 呀 can be pronounced as aa1, aa3, aa4 and aa6 depending on the tone you want to convey...
AlanF_US
16 day(s) ago
Yes, this has been discussed. It would take me a while to find the thread, though.
Impersonator
15 day(s) ago - edited 15 day(s) ago
Here are related tickets in the issue tracker:
https://github.com/Tatoeba/tatoeba2/issues/264
https://www.assembla.com/spaces...ctivity/ticket [old ticket]
https://github.com/Tatoeba/tatoeba2/issues/87 [related problem: arabic transcription]

Happy Mid-Autumn Festival, by the way! :)
gillux
15 day(s) ago
I actually started to draft a possible implementation in that ticket: https://github.com/Tatoeba/tatoeba2/issues/77

It would be great if we could list all the languages that could benefit from having editable alternative scripts, so we can implement a solution that could easily be ported to other languages. We basically need to know:
- the language;
- its alternative script(s), how they derive from the main script and what they are used for (a link to the Wikipedia page should be enough);
- whether the script(s) can be computer-generated with 100% accuracy;
- if not, which tools out there can generate a partially accurate script.
tommy_san
7 day(s) ago
Glad to hear that!
I think every language would require some kind of reading aid at least partially to show how to read, say, "2014" or "Louis XIV".
You also need to keep in mind that sometimes multiple readings are possible (e.g. 何 and 明日 in Japanese). The quality of reading aids should be controlled in much the same way as that of the sentences, since there are some readings that are theoretically possible but unlikely.
gillux
7 day(s) ago
> I think every language would require some kind of reading aid at least partially to show how to read, say, "2014" or "Louis XIV".

That makes sense. Actually I have the feeling that the way furigana are implemented is kinda wrong because they are treated as an alternative script, just like e.g. romanization of Chinese. As a result, Japanese sentences are displayed twice (once with furigana and once without), which is a total waste of space and a bad way of presenting sentences. Implementing reading aids as a different concept would both solve this bad presentation and make them available for any language.

On the other hand, although reading aids may help with pronunciation, they are nonexistent in Latin-based languages, so we won’t be showing the reality of these languages by attaching reading aids. This could e.g. trick Japanese learners into thinking that English can actually have reading aids just like Japanese uses furigana.

> You also need to keep in mind that sometimes multiple readings are possible (e.g. 何 and 明日 in Japanese). The quality of reading aids should be controlled in much the same way as that of the sentences, since there are some readings that are theoretically possible but unlikely.
Do you mean we should allow multiple readings? I’m not sure it’s a good idea.
tommy_san
7 day(s) ago
> On the other hand, although reading aids may help with pronunciation, they are nonexistent in Latin-based languages, so we won’t be showing the reality of these languages by attaching reading aids.

Ah, do you mean something like this?

Zweitausenddreizehn
Tom wurde 2013 in Boston geboren.

That would look weird indeed. I haven't thought about a concrete way to display it. There should be a neater way.


> Do you mean we should allow multiple readings? I’m not sure it’s a good idea.

Yes. If you don't want to allow them, you need to allow multiple sentences that look the same. Otherwise people would end up arguing about which reading is the best.

#3404030 Tom was born in Boston in 2013. (two thousand thirteen)
#12345678 Tom was born in Boston in 2013. (two thousand and thirteen)
#12345679 Tom was born in Boston in 2013. (twenty thirteen)
gillux
7 day(s) ago
I got your point. I agree that we should allow multiple readings, but then things get complicated.

If one can come up with a neat way to display these readings for Latin-based languages (like a tooltip or something), how are we going to display multiple readings for Japanese?

What will be the order of the readings (which one we put on the top or bottom of the list)?

What reading should we use for transliterations (like romanization)? We just can’t say that one of the readings is “the main one” because people are likely not to agree.

And there may be other issues I’m not thinking about yet.

Alternatively, we could say that we only allow one reading, which is up to the owner. It doesn’t mean it’s the only way to read, but it’s how the owner would read it.
tommy_san
7 day(s) ago
> Alternatively, we could say that we only allow one reading, which is up to the owner. It doesn’t mean it’s the only way to read, but it’s how the owner would read it.

Maybe that's better. One reason is that the meaning of a sentence can change according to the readings.
映画館は人気(にんき/ひとけ)がなかった。
明日来るお客さんって何人(なんにん/なにじん)なの?

But I think we should be able to edit the readings of sentences owned by others when they're wrong. And here again arises the problem "Is this (automatically generated) reading really absolutely wrong? I'd never read it this way, but maybe some native speakers do." We'll need a sensible way to handle it.