Wall (3750 threads)

gillux
52 minutes ago - edited 49 minutes ago
*** The editable transcription feature is now testable on https://dev.tatoeba.org/ ***

Currently only transcriptions for Chinese and Japanese sentences may be edited.

Summary of changes
• On every sentence that may have transcriptions, you’ll see an additional icon “Show transcriptions” along with “Add to list” etc. For the moment it’s the same icon as “Edit sentence” but we’ll change it later.
• Clicking on that icon pops up transcriptions for every sentence of the group, along with a warning message about their unreviewed state.
• You can review/edit them by clicking on the transcription text. Once sent, a transcription will always appear along with the sentence, just like now.
• In your settings, there is a new option that allows you to show unreviewed transcriptions by default, without having to click on the “Show transcriptions” button. If enabled, the warning text is replaced by a simple warning icon on the right.

Any suggestions are welcome, but before posting a comment, please have a look at the current implementation status here: https://github.com/Tatoeba/tatoeba2/pull/661 There are some unsettled questions and known problems (like missing icons, which is why you see Hrkt or Latn instead).

Previous thread: https://tatoeba.org/wall/show_message/22679
Pfirsichbaeumchen
5 hours ago
Is it just me, or is Tatoeba extremely slow today?
marafon
5 hours ago
I've noticed it, too.
tatoebix
10 hours ago
Suggestion concerning furigana use. I find that after a while the eyes tire really fast from switching between the normal font and the smaller furigana. It appears - at least to me - that hiragana is a better option. So a Japanese sentence could be displayed in its original form with kanji and in hiragana form with furigana turned off, in the same font size. Further, the original kanji part and its hiragana form could be slightly highlighted for easier recognition.
gillux
10 hours ago - edited 45 minutes ago
I find kanji-less sentences even more tiring to read. I always have a hard time reading words in kana that are normally written in kanji, and figuring out where a word starts and ends.
orion17
2 days ago - edited 2 days ago
I hope a wildcard search feature can be added. It would be very useful for inflectional languages such as Arabic and agglutinative ones such as Turkish.
I always run into a problem when I search for Arabic sentences. For example, when I search for the word ذهب (to go / he went), the results contain only that exact form. There are no results like ذهبت (you went), نذهب (we go), يذهبون (they go), أذهب (I go), etc.
And don't you think it would be very useful if we could search for example sentences by a part of a word or a grammatical feature? For example, I want to find all Turkish sentences containing the suffix -iyor/-ıyor/-uyor/-üyor regardless of the verb itself...
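As a rough illustration of what the wildcard request above could mean in practice, here is a minimal Python sketch. It is not how Tatoeba's search works; the word list, the `wildcard_filter` helper and the Turkish regular expression are all assumptions made up for this example, and the suffix pattern is naive (it will also match words that merely contain "iyor").

```python
import fnmatch
import re

# Sample words taken from the post above plus one unrelated word.
arabic_words = ["ذهب", "ذهبت", "نذهب", "يذهبون", "أذهب", "كتب"]

def wildcard_filter(words, pattern):
    """Return the words matching a shell-style wildcard pattern (* and ?)."""
    return [w for w in words if fnmatch.fnmatchcase(w, pattern)]

# "*ذهب*" matches any word that contains the three letters ذهب in a row.
arabic_hits = wildcard_filter(arabic_words, "*ذهب*")

# Naive pattern for the Turkish suffix -iyor/-ıyor/-uyor/-üyor,
# optionally followed by a personal ending.
TURKISH_IYOR = re.compile(r"\w+[iıuü]yor\w*")

def has_iyor_suffix(sentence):
    """True if any word in the sentence looks like it carries the suffix."""
    return TURKISH_IYOR.search(sentence) is not None
```

With this sketch, `arabic_hits` contains every inflected form listed in the post, and `has_iyor_suffix("Kitap okuyor.")` is true regardless of the verb.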
gillux
yesterday
Thank you for pointing this out, orion17. I created a ticket for this https://github.com/Tatoeba/tatoeba2/issues/664

We’re likely to add wildcard search soon.
orion17
12 hours ago
thanks for your response :)
This will be great
gillux
14 days ago - edited 14 days ago
*** Attention to our members knowledgeable in Chinese, Shanghainese, Cantonese, Uzbek or Georgian. ***

I’m currently working on improving transcriptions in Tatoeba. Currently, transcriptions are automatically generated by a piece of software that sometimes fails to provide correct transcriptions. A few of these failures are tagged with “incorrect transcription” [1] or, for Japanese, “furigana mistake” [2].

[1] https://tatoeba.org/tags/show_s..._with_tag/1673
[2] https://tatoeba.org/tags/show_s..._with_tag/1172

I plan to address this problem in two ways:
(1) improve the software that generates transcriptions based on feedback from users;
(2) allow users to edit transcriptions so that they can fix problems themselves.

Both of these approaches have limits, depending on the accuracy of the current transcriptions, the type of errors and the language. For instance, if a given transcription error is very widespread, it’s better to fix the software rather than to fix the same error by hand on a large number of sentences.

In Japanese, I am knowledgeable enough to know that no software is capable of providing 100% accurate transcriptions, so I will use mostly approach (2), and only a little bit of (1). However, I don’t know if it’s a good idea to use (2) in other languages we provide transcription for, namely Chinese (both traditional/simplified conversion and Pinyin), Shanghainese, Cantonese, Uzbek and Georgian.

So my question is, for each of these languages:
• To what extent are the current transcriptions accurate? Try to give a percentage.
• If this percentage is lower than 100%:
   • Do you think it’s a good idea to systematically display autogenerated transcriptions (like we do now) although some are incorrect?
   • What type of errors can you see in the transcriptions? Try to categorize. If you know software development, how easily can they be detected and fixed?

Feel free to translate this post into the languages concerned.
pullnosemans
14 days ago - edited 14 days ago
very glad that you're addressing this issue; I was going to mention it on tatoeba day, but the sooner, the better. note that there is also the "wrong transliteration" tag that I've recently been using for japanese and mandarin.

to answer your questions, I would say that
* the percentage of correctly transliterated sentences without any problems should be around 80 to 90 percent. I regularly notice errors, but when I pay active attention, I see that most sentences I come across are in fact transliterated correctly. however, my skills in both languages are only mediocre, so this might be somewhat off from the actual situation.
* yes, autogenerated transcriptions with flaws are better than none. just think of our contributors manually adding the transliteration to every single japanese sentence they own. I wouldn't want to have them do that, not to mention the fact that the majority of japanese sentences on tatoeba are orphans.
* the errors mainly concern characters with multiple readings (duh). the wrong choice of transliteration is then sometimes chosen by the tool for both morphological and syntactic reasons. e.g., in mandarin compounds, the problem is apparently that the compound is saved in the tool's database as one word, but with the wrong pinyin. but often single characters (i.e. where a character doesn't have any direct context with other characters that form a word with it) are wrongly transliterated if the tool cannot correctly analyze the syntax of a sentence. I have no idea about computing of any kind, so this is where my part ends.

I generally think that giving sentence owners the possibility to manually change a sentence's transliteration would be a very good first step. we could also add some kind of marking whether a sentence's transliteration is automatically generated or hand-made.

stoked to see this being improved! I'll be happy to offer all the help I can.
gillux
13 days ago - edited 13 days ago
> however, my skills in both languages are only mediocre

Which ones?

About the Japanese furigana/transcription

> * yes, autogenerated transcriptions with flaws are better than none.
I strongly disagree. To me, there is no point in showing furigana if it’s not trustworthy. It’s no good for Japanese learners because they may think it’s trustworthy when it’s not, and because they eventually need to know the readings themselves to be able to judge whether it’s correct, which defeats the original purpose. It’s no good for Japanese speakers either because they may think Tatoeba is not a serious project. See also http://en.wiki.tatoeba.org/articles/show/furigana.

Because of this, I plan not to show furigana by default unless reviewed, and to allow showing untrustworthy transcriptions by default for every sentence like now with an option.

> just think of our contributors manually adding the transliteration to every single japanese sentence they own.

Yes, the task of manually adding transcriptions for every sentence is huge but I think it’s the only way. I plan to ease that process by using autogenerated transcriptions as a base one could review by editing the wrong parts only. Something like this:
1. Show autogenerated transcription by clicking a button http://prntscr.com/71kvcc
2. Edit and send it http://prntscr.com/71kvvy
3. It’s reviewed http://prntscr.com/71kwho

In addition, I plan to allow reviewing transcriptions of other contributors’ sentences unless the sentence owner has reviewed them.

What do you think?
pullnosemans
13 days ago - edited 13 days ago
>> however, my skills in both languages are only mediocre
> Which ones?
the two I mentioned, japanese and mandarin.

>> * yes, autogenerated transcriptions with flaws are better than none.
> I strongly disagree.
haha wow, alright.
I see your point, though. for me personally, not having furigana on most japanese sentences would be a bummer because I use tatoeba all the time to check the readings of sentences I have in anki, but I can just go back to using jisho.org for this purpose (which has issues similar to those on tatoeba, but maybe a tad less? I'm not sure). if there is a furigana mistake in a sentence, I usually remember it, so I can maneuver around those when studying in anki. but yeah, I see this is a relatively specific need, so I can't assume the majority of tatoeba users will have it as well. I won't be seeing japanese sentences on tatoeba as much as before, thus having less opportunities to notice any problems, but, oh well.

HOWEVER. I really like your idea of introducing a "show autogenerated transcription with a warning label" button. it sounds like a very good compromise between getting rid of wrong transliterations and yet maintaining the possibility to get a transcription for any sentence right away at your own risk. it would also raise the awareness that there are errors by a mile. so:

> What do you think?
I think you should go for it. try the same thing for mandarin, we'll see how it works out.
gillux
13 days ago
Thank you for your positive feedback.

> I see your point, though. for me personally, not having furigana on most japanese sentences would be a bummer because I use tatoeba all the time to check the readings of sentences

That’s why I thought about having an option to display transcriptions by default like it is now. Members would need to opt-in by going to their settings, and this would be a good chance to warn them about untrustworthy transcriptions. I will probably implement this in a later update though, it’s not essential.
Pfirsichbaeumchen
13 days ago - edited 13 days ago
The furigana section takes up a lot of space. A spontaneous suggestion would be to use a smaller font size and perhaps put the furigana in brackets behind the kanji in a similar way as is shown here: http://prntscr.com/71kvvy.

Will the romanisation be editable as well? There are some strange word separations such as "ni hiki" and "i masu" instead of "nihiki" and "imasu".
gillux
13 days ago
The furigana section takes as much space as now (see the reference #187012), modulo the romaji. This is a somewhat provisional display though, I still don’t really know what to do with the romaji. I like having it a bit hidden like now (only displayed when hovering the mouse), but it’s rather impractical.
Pfirsichbaeumchen
13 days ago - edited 12 days ago
I have the furigana turned off, but I realise it's always been that size. I just thought this would be a good opportunity to suggest using a smaller font size. ☺
gillux
12 days ago - edited 12 days ago
> Will the romanisation be editable as well? There are some strange word separations such as "ni hiki" and "i masu" instead of "nihiki" and "imasu".

You can, but not directly. I don't want to make it fully editable because I feel it would become too inconsistent, and because it is based on the furigana. Instead, if you feel like editing the romaji word separation, edit the furigana (remove or insert spaces) and the romaji will update accordingly.

I already dealt with the problem you're mentioning (verbs like "i masu" separated) in a recent update, but only for the furigana. They now display います instead of い ます, and so will the romaji once I'm done with this.
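To make the "romaji follows the furigana" idea concrete, here is a minimal Python sketch, assuming the romaji is derived word by word from a space-separated kana string. The `KANA_TO_ROMAJI` table and the `furigana_to_romaji` function are hypothetical and cover only the few kana needed for the examples in this thread; this is not Tatoeba's actual converter.

```python
# Tiny illustrative kana table; a real converter needs the full syllabary
# plus rules for long vowels, small kana, particles, and so on.
KANA_TO_ROMAJI = {
    "に": "ni", "ひ": "hi", "き": "ki",
    "い": "i", "ま": "ma", "す": "su",
}

def furigana_to_romaji(furigana):
    """Convert space-separated kana words to romaji, keeping word boundaries.

    Spaces in the input are the only source of spaces in the output, so
    editing the spacing of the furigana changes the romaji word separation.
    """
    words = furigana.split()  # splits on any whitespace, including U+3000
    return " ".join("".join(KANA_TO_ROMAJI[kana] for kana in word) for word in words)
```

Under these assumptions, removing the space in い ます changes the output from "i masu" to "imasu" without the romaji ever being edited directly.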
Impersonator
12 days ago - edited 12 days ago
I was the one who provided the initial Uzbek code. Please note this is not a transcription, but a transliteration/script conversion. It is quite correct, I've browsed through 20 pages and found only 1 mistake: https://tatoeba.org/eng/sentences/show/3837742 (obviously, WҳацАпп should be WhatsApp). My Uzbek is very limited, but the Latin script was basically created as a one-to-one mapping for Cyrillic, so most problems are with Russian loanwords (and even these are quite predictable).
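A character-by-character script conversion of the kind described above might look like the following Python sketch. It is not the code that was contributed to Tatoeba: the `CYRILLIC_TO_LATIN` table is a partial mapping written from general knowledge of the two Uzbek alphabets, `to_latin` is a hypothetical name, and context-dependent letters (such as word-initial е or ц in loanwords) are deliberately ignored.

```python
# Partial Uzbek Cyrillic -> Latin table (assumption: simplified, no context rules).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ж": "j",
    "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m", "н": "n",
    "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f",
    "х": "x", "ч": "ch", "ш": "sh", "я": "ya", "ю": "yu", "ё": "yo",
    "ў": "oʻ", "қ": "q", "ғ": "gʻ", "ҳ": "h",
}

def to_latin(text):
    """Map each Cyrillic letter to its Latin counterpart; pass everything else through."""
    out = []
    for ch in text:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                  # digits, punctuation, Latin letters
        elif ch.isupper():
            out.append(latin.capitalize())  # "Ш" -> "Sh"
        else:
            out.append(latin)
    return "".join(out)
```

Because unknown characters pass through untouched, a word already written in Latin script is left alone in this direction; mistakes like the one linked above come from text that mixes scripts or from loanwords that do not follow the one-to-one pattern.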

As a Cantonese learner, I'm using the Cantonese transcription quite a lot. It does have flaws (hard for me to estimate the percentage, will try to take several pages and count the number), but it is useful for me because I have problems memorising the tones, and I usually can spot most transcription errors. Generating a completely correct transcription is very complex programmatically, probably even more complex than for Standard Written Chinese, because SWC is more codified.
gillux
12 days ago
Thank you for your feedback about Uzbek. The conversion algorithm is indeed very simple and nearly 100% accurate. I don’t think there is any problem keeping all the Uzbek transliterations displayed, like now. I tend to use the word “transcription” for “transliteration” because they are handled very similarly on Tatoeba, but I totally understand the difference.

Your comment about Cantonese transcriptions and pullnosemans’s about Mandarin transcriptions suggest that although they are inaccurate, these transcriptions are still very useful to you, so I will definitely need to implement the “display everything by default” option before the first release.

Speaking of which, I’d like to have your opinions (especially Pfirsichbaeumchen’s since you’re admin) about how to manage transcription editing permissions. I tried to find a balance between keeping it open and preventing editing problems. There are so many transcriptions that it doesn’t make much sense to me to only allow sentence authors to submit transcriptions of their sentences.

For a given sentence in a language for which we allow transcription editing:
• If nobody submitted a transcription, anybody may submit one.
• If you’re the owner of the sentence and someone submitted a transcription, you may overwrite it with your own.
• Once the sentence owner submitted a transcription to his/her own sentence, only he/she, corpus maintainers and admins may further modify it.
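
The three rules above can be sketched as a single permission check. This is only an illustration in Python, not the project's implementation: the `Sentence` and `Transcription` classes, the role names and `may_submit_transcription` are hypothetical. One case is not covered by the rules (a non-owner wanting to replace another non-owner's transcription); the sketch denies it, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transcription:
    author: str  # user who submitted the human-made transcription

@dataclass
class Sentence:
    owner: Optional[str]                            # None for orphan sentences
    transcription: Optional[Transcription] = None   # None = only machine-generated

# Assumed role names; corpus maintainers and admins may always edit.
PRIVILEGED_ROLES = {"corpus_maintainer", "admin"}

def may_submit_transcription(user, role, sentence):
    """Return True if `user` may submit or overwrite the transcription."""
    if role in PRIVILEGED_ROLES:
        return True
    if sentence.transcription is None:
        return True   # rule 1: nobody submitted yet, anybody may
    if sentence.owner is not None and user == sentence.owner:
        return True   # rules 2 and 3: the owner may always overwrite
    return False      # assumption: other members cannot overwrite
```

For example, once the owner has submitted a transcription, `may_submit_transcription("bob", "member", sentence)` is false for any ordinary member other than the owner.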

It sounds about right for transcriptions that are well-known by natives such as in Uzbek Cyrillic/Latin or Japanese furigana, but what about Pinyin or Jyutping? To what extent are Mandarin and Cantonese speakers comfortable with these? And for Shanghainese, the current transcription is based on IPA, but I believe most learners can’t read it and most natives can’t write it… What do we do with this?
Impersonator
12 days ago - edited 12 days ago
Most Cantonese native speakers I've talked to can’t use either Jyutping or any other transcription.

In my experience, it's easiest to talk about pronunciation with native speakers by using other characters with the same pronunciation. But I'm not sure how this can be implemented in an intuitive way. Probably, have an on-hover hint for all the syllables?.. :?

The situation with Pinyin seems better. If I'm not mistaken, it is taught at school in Mainland China (but I believe Taiwan uses Zhuyin instead).
tommy_san
12 days ago - edited 12 days ago
A possible scenario:
1. A non-native speaker provides a wrong transcription for a bad sentence.
2. Native speakers don't want to correct the transcription because that might give the wrong impression that it was a good sentence, nor do they want to correct the sentence because that would prompt non-native speakers to add even more bad sentences. As a result, the wrong transcription remains uncorrected.
3. People "learn" from it, believing it to be trustworthy because the transcription isn't machine-generated.
gillux
yesterday
I added a function to reset a human-submitted transcription to its initial machine-generated state. This way, wrong transcriptions can be deleted without needing to provide a correct transcription.
tommy_san
yesterday
That doesn't sound very constructive... I wonder if there's not a better way to deal with it, but I can't think of any right now. I hope I won't have to use that function too often.
gillux
yesterday
Note that I didn’t initially add this function for that purpose. I just needed a way to “remove” a transcription, which is an essential part of the whole thing.

I don’t think there is an easy solution to this, it’s just like trying to deal with people adding incorrect sentences to Tatoeba.
Impersonator
yesterday
Will changes to the transcription be logged?
gillux
yesterday
No, that’s not planned.
Impersonator
12 days ago
By the way, about the 'Chinese' language. Standard Written Chinese can be read not just with Mandarin readings (which are auto-generated now), but also with Cantonese readings. For example, the sentence #4194512 can be read 'Deoi3 ngo5 ji4jin4 ze3 si6 zung6jiu3 dik1'. The usage of this form of the language is limited (it is used for reading written texts aloud and singing songs, but usually not for day-to-day conversation), but I would find it useful if there was an option to display 'Chinese' texts with Cantonese pronunciation, because it's the pronunciation I'm trying to learn.

Of course, this should ideally be a selectable option (probably selectable in the settings) because most Chinese learners do want to read it with Mandarin pronunciation, not with Cantonese.
nickyeow
22 hours ago
Agreed!

On a side note, it would be nice if pronunciations of compounds could be displayed with spaces in between the individual syllables. 'zung6 jiu3' looks much better than 'zung6jiu3' in my opinion.
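
As a small illustration of the spacing suggested above, the following Python sketch inserts a space after every tone digit that is directly followed by another syllable. It assumes Jyutping tones are the digits 1 to 6 and that syllables start with a lowercase letter; `space_syllables` is a hypothetical helper, not an existing Tatoeba function.

```python
import re

def space_syllables(jyutping):
    """Insert a space between glued Jyutping syllables, e.g. after each tone digit."""
    # Zero-width match: position preceded by a tone digit and followed by a letter.
    return re.sub(r"(?<=[1-6])(?=[a-z])", " ", jyutping)
```

Applied to the reading quoted earlier in the thread, `space_syllables("zung6jiu3")` gives `"zung6 jiu3"`, and already separated syllables are left unchanged.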
gillux
14 hours ago
I know absolutely nothing about Cantonese, but I thought displaying compounds glued would help reading, just like we usually display Japanese romaji with spaces between each “word”.
nickyeow
22 hours ago
Wow, thank you so much for taking on this issue!

I think the autogenerated transcriptions for Cantonese are doing okay. There are mistakes here and there, but most of them are caused by a small set of characters with multiple pronunciations. Many of these can be solved by adding more pronunciations for compounds. I'd say the percentage of sentences with completely correct transliterations is around 90% to 95%.

Some characters can be quite tricky though. For instance, the final particle 喎 is pronounced wo3 when it indicates a casual remark, wo4 when it indicates a sort of playful scolding, and wo5 when it is used to quote something undesirable. Apparently, it can also be pronounced kwaa1, waa1, and wo1, although these are extremely rare. To be fair, I only found out about these (I blush to confess) when I looked them up in a dictionary. :p

I think the review system you suggested would be the best way to solve the problem. Perhaps autogenerated transcriptions could still be displayed—they are correct most of the time after all—but it would be nice if reviewed transcriptions could be given a little green tick or something.
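
The point made above about adding more pronunciations for compounds can be illustrated with a longest-match lookup: try known compounds first, and fall back to a per-character default only when no compound matches. This Python sketch is an assumption about how such a tool could work, not a description of the software Tatoeba uses. The compound reading comes from the thread; the single-character table is a tiny illustrative sample in which 重 is given just one of its several readings.

```python
# Compound readings take priority over single-character defaults.
COMPOUNDS = {"重要": "zung6jiu3"}                       # reading quoted in the thread
SINGLE = {"重": "cung4", "要": "jiu3", "的": "dik1"}    # illustrative defaults only

def transcribe(text, max_len=4):
    """Greedy longest-match transcription: compounds first, then single characters."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate compound starting at position i.
        for n in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in COMPOUNDS:
                out.append(COMPOUNDS[chunk])
                i += n
                break
        else:
            # No compound matched: fall back to the per-character default.
            ch = text[i]
            out.append(SINGLE.get(ch, ch))
            i += 1
    return " ".join(out)
```

Here `transcribe("重要的")` yields `"zung6jiu3 dik1"`, whereas without the compound entry the character 重 would fall back to its default reading, which is the kind of error described in this thread. Particles such as 喎, whose reading depends on meaning rather than on neighbouring characters, cannot be fixed this way and would still need human review.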
Guybrush88
yesterday
Concerning #4233047, where the sentence has been automatically truncated because it's long, I wonder if it would be possible to raise the character limit. Otherwise it won't be possible to have complete translations if the original sentence is quite long.
gillux
yesterday
I think the current limit (which is by the way 1500 bytes of UTF-8) is more than enough. We can’t possibly call #4233047 *a* sentence. Tatoeba isn’t a database for texts, but for sentences.
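Since the limit is expressed in bytes of UTF-8 rather than in characters, the number of characters that fit depends on the script. The Python sketch below shows one way to check and apply such a limit; only the figure 1500 comes from the post, while `fits_limit`, `truncate_utf8` and the truncation strategy are assumptions, not Tatoeba's actual behaviour.

```python
MAX_SENTENCE_BYTES = 1500  # limit stated in the post above

def fits_limit(sentence, limit=MAX_SENTENCE_BYTES):
    """True if the sentence fits within the byte limit once encoded as UTF-8."""
    return len(sentence.encode("utf-8")) <= limit

def truncate_utf8(sentence, limit=MAX_SENTENCE_BYTES):
    """Cut the sentence to at most `limit` bytes without splitting a character."""
    raw = sentence.encode("utf-8")[:limit]
    # A multi-byte character cut in half at the end is dropped by errors="ignore".
    return raw.decode("utf-8", errors="ignore")
```

For instance, 1500 ASCII letters fit exactly, but only 500 hiragana do, because each one takes three bytes in UTF-8.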
Ooneykcall
yesterday
I'll deal with those next week, guess I'll have to cut most in two~
gillux
4 days ago - edited 51 minutes ago
EDIT: temporarily disabled to test editable transcriptions instead.

I’m adding additional criteria to the search feature. You can test this ongoing work on https://dev.tatoeba.org/

Perform a regular search, and then you’ll see additional criteria on the right: sentence owner and orphan sentences for the moment. I made orphan sentences hidden by default. This way, they are hidden from top bar searches, but can be displayed by checking the additional criterion, lowering their visibility to newcomers.

What do you think?
Guybrush88
4 days ago
I found an issue with accents. First query: https://dev.tatoeba.org/ita/sen...ita&to=und

It becomes this query when I search for my sentences corresponding to that query: https://dev.tatoeba.org/ita/sen...ser=Guybrush88

As you can see, no results are shown because the accent is changed by the query, while sentences I own are shown without specifying my username
gillux
4 days ago
Problem solved, thank you.
Guybrush88
4 days ago
thanks for the fix, gillux. everything seems to be perfectly working for me now
Ooneykcall
4 days ago
By the way, is there a way to bring the native speaker factor into the search, e.g. arrange for 'sentences in language X by native speakers' and, conversely, 'non-native speakers / undefined'?
gillux
4 days ago
Yes. That’s a good idea, I’ll definitely add this criterion. Though I’m not sure about how to organize the form since we’d have 3 exclusive filters for users: unowned, owned by a given user, owned by a native. It’s already a bit confusing because one can check “Show orphan sentences” while specifying a username (in which case the checkbox is ignored). Adding a third exclusive filter will make things worse.
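One way to think about the form problem described above is to collapse the inputs into a single, unambiguous owner filter before running the search. The Python sketch below is hypothetical: `resolve_owner_filter`, the mode names and the precedence of the native-speaker option over the orphan checkbox are assumptions. Only two behaviours come from this thread: a username makes the orphan checkbox irrelevant, and orphans are hidden by default.

```python
from typing import Optional

def resolve_owner_filter(username: Optional[str],
                         show_orphans: bool,
                         natives_only: bool = False) -> dict:
    """Collapse the form inputs into one mutually exclusive owner filter."""
    if username:
        # A specific owner wins; the orphan checkbox is ignored.
        return {"mode": "owned_by", "user": username}
    if natives_only:
        # Assumed precedence: orphans have no owner, so they cannot qualify.
        return {"mode": "owned_by_native"}
    if show_orphans:
        return {"mode": "any"}
    return {"mode": "owned"}  # default: orphan sentences hidden
```

Resolving the inputs this way keeps the three filters exclusive in the back end even if the form lets a member tick several boxes at once.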
CK
3 days ago - edited 3 days ago
1.

Would it be possible to add more than one username in a comma-delimited list, similar to how members can limit languages in their settings?

For example, here is a list of the Japanese native speakers who have contributed the most sentences.

bunbuku,mookeee,tommy_san,arnab,Banka_Meduzo,thyc244,arihato,OrangeTart,Fukuko,wakatyann630,qahwa,Ianagisacos,fouafouadougou,tomo

This would allow members to search for sentences by members that they feel can be trusted.
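
A comma-delimited username field like the one proposed above could be handled roughly as follows. This Python sketch is only an illustration: `parse_usernames`, `filter_by_owners` and the dictionary-based sentence records are hypothetical and do not reflect Tatoeba's data model.

```python
def parse_usernames(raw):
    """Split a comma-delimited string into a clean, de-duplicated list of usernames."""
    names = []
    for part in raw.split(","):
        name = part.strip()
        if name and name not in names:  # skip empty entries and repeats
            names.append(name)
    return names

def filter_by_owners(sentences, owners):
    """Keep only the sentences whose owner is in the given list."""
    allowed = set(owners)
    return [s for s in sentences if s["owner"] in allowed]

trusted = parse_usernames("bunbuku, mookeee,tommy_san,,tommy_san")
sample = [
    {"text": "sentence A", "owner": "tommy_san"},
    {"text": "sentence B", "owner": "someone_else"},
    {"text": "sentence C", "owner": None},  # orphan
]
kept = filter_by_owners(sample, trusted)
```

In this example `trusted` is `["bunbuku", "mookeee", "tommy_san"]` and only "sentence A" is kept.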


2.

Would it be possible to allow us to also limit searches to only sentences with audio?


3.

I'd suggest this change in wording.

FROM:
Orphan sentences are likely to be incorrect.
TO:
Orphan sentences are less likely to be correct.
pullnosemans
3 days ago - edited 2 days ago
I like ck's ideas #1 and #3, and I don't mind #2, either.

an automatic 'native speakers' filter would probably be cool, too, but I also very much agree with sacredceltic's caveat below; you just never know who claims to be native. having an individual list as in ck's suggestion #1 would be a good way to cope with this problem.

I don't think, however, that hiding orphans should be the default in the way that you have to check "show orphans" every single time you submit a search query. I think this would lead to a decrease in orphans being adopted and amended. let's rather have it so that you can check "show orphans" and it stays like that until you manually uncheck it again.

it's great seeing this site improving constantly!
gillux
3 days ago
I see your points about native speakers. However, I don’t think this problem should be solved by changing the search criterion, but rather by changing the way we identify native speakers in the first place. The search criterion could only be “limit to sentences by self-proclaimed natives” because that’s the only information we have in our database so far.

I don’t really like the idea of providing a comma-separated list instead of filtering by self-proclaimed natives. First, because it’s rather impractical to use as the list grows. Second, because it restricts the ability to filter by native speakers to a handful of long-time contributors who have their own idea on that matter. I’m worried about newcomers (who obviously won’t express themselves in this thread) being unable to use the search as efficiently as you guys would. That would be unfair. The current lack of native speaker identification and of a proper review mechanism to sort out “bad” sentences should be solved first, rather than worked around by that kind of “feature”. I can already see members providing ready-to-use search links in their profiles that filter by users from their list. That said, filtering by multiple users itself (regardless of the motivation) seems legit, and is easy to implement.

I agree with what you said about orphan visibility. I initially wanted to limit the visibility of orphans because they are a major problem in some languages like Japanese, where more than half of the corpus consists of orphans that are mostly wrong. But that’s another problem.
CK
2 days ago - edited 2 days ago
>Re: I don’t really like the idea of providing a comma-separated list ...

If it's difficult to program this capability, I can understand.

However, being able to search for sentences by more than one username would be useful.

For example, ...

1. People could limit searches to sentences owned by Brazilian Portuguese speakers, or Mexican Spanish speakers if they knew which members spoke which dialect.

2. People could use all the native speakers listed on http://bit.ly/nativespeakers rather than just the few that are listed using the new system on tatoeba.org. We have a lot of sentences written by native speakers that are never likely to come back and change the setting in their profiles.

3. People could choose to exclude certain self-proclaimed native speakers that they didn't trust.

4. People could choose to also include a few non-native speakers that they feel they can trust.

5. Some researchers may want to study typical English errors made by native Russian speakers, so they could browse through search results limited to English sentences written by Russian speakers.


It would probably also be a good idea to have the “limit to sentences by self-proclaimed natives” that you are suggesting.


** Added 6 hours later **

1.

Here is the number of members claiming "native speaker level" in more than one language
(https://tatoeba.org/eng/stats/users_languages)

1 member claims 4 languages at native level.
7 members claim 3 languages at native level.
54 members claim 2 languages at native level.

This is based on the exported data of May 23, 2015.

I wonder if other members are as skeptical as I am about these claims.
This is one reason I'd like the option to search with results limited to usernames of my own choosing.

If you want to see the usernames, go to http://goo.gl/K8vGKl.
There are perhaps a few on this list that I might trust as being true native speakers of two languages.


2.

I updated http://bit.ly/nativespeakers so now you can easily copy a comma-delimited set of usernames for each language.
If searching by multiple usernames is enabled, you can easily go here and copy the usernames, and then edit out members you don't trust (if there are any).

If you've been on the page recently, you may need to force a reload of the page to get the newest external JavaScript file.

tommy_san
2 days ago
I like this idea, too, but I'd hate typing lots of usernames each time because I'm sure I'd use the same sets of usernames many times. It would be nice if we could make lists of usernames that we can use anytime for search. We could also provide some default lists of self-proclaimed native speakers of each language.

> 2. People could use all the native speakers listed on http://bit.ly/nativespeakers rather than just the few that are listed using the new system on tatoeba.org. We have a lot of sentences written by native speakers that are never likely to come back and change the setting in their profiles.

How about incorporating the information on this page into the official system? Would anyone object to it?
Silja
3 days ago
+1 to all CK's suggestions.
gillux
3 days ago
> Would it be possible to allow us to also limit searches to only sentences with audio?

Yes. It won’t be testable on dev.tatoeba.org until the next update though.
sacredceltic
3 days ago
"Native speakers", by Tatoeba's definition, is anybody who self-proclaims to be such: Russians claiming to be French or Turks claiming to be British, just for the challenge... teenagers have such oversized egos, and Tatoeba often ends up being their egos' grave... and makes them so much more aggressive and bitter as a result...
Guybrush88
3 days ago - edited 3 days ago
would it also be possible to search for given words/expressions that are not translated in a given language? for example: I want to search for "once in a blue moon" (or any other expression in any other language) and I want to see all the sentences containing that expression that are not translated in Italian (or any other language). I would also find it useful if i could see all the sentences with a given expression/word that are translated in a given language. for example: i search for "apple pie" and i want to see only the sentences containing "apple pie" that have translations in Italian
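The two filters requested above ("translated into" and "not translated into" a given language) can be sketched over a small in-memory corpus. This is a Python illustration only; the `Sentence` class, the `search` function and the sample data are hypothetical and say nothing about how Tatoeba's search index is organised.

```python
from dataclasses import dataclass, field

@dataclass
class Sentence:
    text: str
    lang: str
    translation_langs: set = field(default_factory=set)  # languages it is translated into

def search(corpus, phrase, lang, translated_into=None, not_translated_into=None):
    """Find sentences in `lang` containing `phrase`, filtered by translation languages."""
    results = []
    for s in corpus:
        if s.lang != lang or phrase.lower() not in s.text.lower():
            continue
        if translated_into and translated_into not in s.translation_langs:
            continue
        if not_translated_into and not_translated_into in s.translation_langs:
            continue
        results.append(s)
    return results

corpus = [
    Sentence("He visits once in a blue moon.", "eng", {"ita", "fra"}),
    Sentence("Once in a blue moon, it snows here.", "eng", {"deu"}),
    Sentence("I love apple pie.", "eng", {"ita"}),
]
```

With this sample data, searching "once in a blue moon" with `not_translated_into="ita"` returns only the second sentence, and searching "apple pie" with `translated_into="ita"` returns the third.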
Silja
3 days ago
+1. I'd also like to have "Show translations in", "Not directly translated into" and "Not translated into" sorting options.
gillux
3 days ago
> would it also be possible to search for given words/expressions that are not translated in a given language?

Yes. I’ll implement this.

> I would also find it useful if i could see all the sentences with a given expression/word that are translated in a given language. for example: i search for "apple pie" and i want to see only the sentences containing "apple pie" that have translations in Italian

You mean https://tatoeba.org/sentences/s...eng&to=ita ?
Guybrush88
3 days ago
Silja
3 days ago
I find it pretty difficult to remember the syntax we need to use when we want to search for exact phrases, sentences beginning with a certain word etc. I basically need to go every time to the wiki article to verify what characters mean what in the search (http://en.wiki.tatoeba.org/arti...w/text-search#).

Many online-dictionaries I use have a drop-down list where you can choose what kind of search you want to make. For example, this Japanese dictionary http://dictionary.goo.ne.jp/ has options "begins with", "exact match" and "ends with" and you can specify your search with those.

I would also like to see something like that in Tatoeba. So there would be next to the search field another drop-down list with options to choose, eg.
- vague matches (eg. "live in boston" or "live") <-- this would be the default. I'm assuming the quotation marks don't do anything if you are searching with only one word, eg. the search "live" returns the same results as plain live, right?
- exact matches (eg. "=live =in =boston" or "=live") (though this wouldn't work when searching phrases in languages without spaces, I guess)
- begins with (eg. "^live in boston" or "^live")
- ends with (eg. "live in boston$" or "live$")
+ maybe something else, like "begins and ends with" (eg. "^live in boston$" or "^live$".)
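
The drop-down described above could simply translate the chosen mode into the existing query syntax, so that nobody has to remember the special characters. The Python sketch below follows the examples given in this post (`=`, `^` and `$`); `build_query` and the mode names are hypothetical, and the exact semantics of the operators should be checked against the search documentation.

```python
def build_query(phrase, mode="vague"):
    """Turn a plain phrase and a drop-down choice into a search query string."""
    if mode == "vague":
        return phrase
    if mode == "exact":
        # Prefix every word with "=" as in the example "=live =in =boston".
        return " ".join("=" + word for word in phrase.split())
    if mode == "begins":
        return "^" + phrase
    if mode == "ends":
        return phrase + "$"
    if mode == "begins_and_ends":
        return "^" + phrase + "$"
    raise ValueError("unknown search mode: " + mode)
```

For example, `build_query("live in boston", "exact")` produces `"=live =in =boston"`, matching the syntax members currently have to type by hand.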
Guybrush88
3 days ago
+1, i would find it better to have the opportunity of making exact searches instead of using "=word" each time i want to see the exact occurrences of something
tommy_san
2 days ago
These criteria seem to limit only the sentences of the "from" language, but we're sometimes rather interested in the "to" language. For example, when I want to know how to say something in French and type a Japanese phrase, I don't mind seeing orphan Japanese sentences but I don't want orphan French sentences. I wonder how we could work this out.
gillux
2 days ago
That’s a very relevant point. I’d like to be able to perform such searches too. Either that, or I’d like to be able to distinguish orphans from non-orphans directly within a list of translations. I’ll keep that in mind.
CK
2 days ago - edited 2 days ago
I'd like to see the "OK" tag remain if someone releases a sentence.

Now, if someone chooses to "unown" a sentence, the OK tag disappears, so we lose important information.

In the past, when a non-native English speaker chose to release all their English sentences, I could easily find all of their sentences that I had tagged OK and adopt those sentences.
gleki
5 days ago
Again the issue with written tone, emphasis and emoticons is raised.

Definitely, sign language may have problems with being compatible with the current state of Tatoeba.org but what about smileys, capitalizing words for emphasis, sarcasm, irony etc.?

The thread:
http://tatoeba.org/por/sentences/show/2096210

Which parts of languages are allowed to be added to the database and which are not?
pullnosemans
3 days ago
overall I think this topic is kinda meh and hard to decide on, but generally, I would say that emoticons as used in text messages etc. are a part of an established style of writing, so banning them on here would be somewhat discriminating to the people who want to use them.

problem with using things like *this* or -this- for emphasis is that there is no consistent code how to use them, so they might be interpreted differently from what you wanted to express. then again, I think people will have enough empathic intuition to figure it out in the majority of cases.
Amastan
5 days ago
Congratulations, my friends... Hungarian has 100,000 sentences!!!
bandeirante
4 days ago
Thank you!