Tips

Here you can ask general questions, such as how to use Tatoeba or how to report bugs and strange behavior, or simply socialize with the rest of the community.

Before asking a question, make sure you have read the FAQ.

Wall (3749 threads)

Pfirsichbaeumchen
3 hours ago
Is it just me, or is Tatoeba extremely slow today?
marafon
3 hours ago
I've noticed it, too.
tatoebix
8 hours ago
Suggestion concerning furigana use. I find that after a while the eyes tire really fast from switching between the normal font and the smaller furigana. It appears, at least to me, that hiragana is a better option. So a Japanese sentence could be displayed in its original form with kanji and in hiragana form, with furigana turned off, in the same font size. Further, the original kanji part and its hiragana form could be slightly highlighted for easier recognition.
gillux
8 hours ago
I find reading kanji-less sentences even more tiring. I always have a hard time reading words in kana that are normally written in kanji, and figuring out where a word starts and ends.
orion17
2 days ago - edited 2 days ago
I wish we could search sentences with a wildcard feature. It would be very useful for inflectional languages such as Arabic or agglutinative ones like Turkish.
I always run into a problem when I search for Arabic sentences. For example, when I search for the word ذهب (to go / he went), the results contain only that exact form. There are no results like ذهبت (you went), نذهب (we go), يذهبون (they go), أذهب (I go) etc.
And don't you think it would be very useful if we could search for example sentences by grammatical feature? For example, I want to search for all Turkish sentences containing the suffix -iyor/-ıyor/-uyor/-üyor regardless of the verb itself...
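To make the request concrete, here is a minimal sketch of what a wildcard search could look like. It is not how Tatoeba's search engine works; the `*` syntax, the function names and the sample sentences are all assumptions made for illustration.

```python
import re

def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a pattern in which '*' stands for any run of non-space characters.

    The pattern has to match one whole whitespace-delimited token.
    (Hypothetical syntax, chosen only for this sketch.)
    """
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"(?<!\S)" + r"\S*".join(parts) + r"(?!\S)")

def search(sentences, pattern):
    """Return the sentences that contain at least one token matching the pattern."""
    rx = wildcard_to_regex(pattern)
    return [s for s in sentences if rx.search(s)]

# Made-up sample sentences
turkish = ["Geliyorum.", "Kitap okuyor.", "Eve gitti."]
print(search(turkish, "*yor*"))
```

A pattern like `*yor*` is purely textual: it would also match any unrelated word that happens to contain "yor", so it only approximates a real search by suffix or grammatical feature.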
gillux
22 hours ago
Thank you for pointing this out, orion17. I created a ticket for this https://github.com/Tatoeba/tatoeba2/issues/664

We’re likely to add wildcard search soon.
orion17
10 hours ago
thanks for your response :)
This will be great
gillux
14 days ago - edited 14 days ago
*** Attention to our members knowledgeable in Chinese, Shanghainese, Cantonese, Uzbek or Georgian. ***

I’m currently working on improving transcriptions in Tatoeba. Currently, transcriptions are automatically generated by a piece of software that sometimes fails to provide correct transcriptions. A few of these failures are tagged with “incorrect transcription” [1] or, for Japanese, “furigana mistake” [2].

[1] https://tatoeba.org/tags/show_s..._with_tag/1673
[2] https://tatoeba.org/tags/show_s..._with_tag/1172

I plan to address this problem in two ways:
(1) improve the software that generates transcriptions based on feedback from users;
(2) allow users to edit transcriptions so that they can fix problems themselves.

Both of these approaches have limits, depending on the accuracy of the current transcriptions, the type of errors and the language. For instance, if a given transcription error is very widespread, it’s better to fix the software rather than to fix the same error by hand on a large number of sentences.

In Japanese, I am knowledgeable enough to know that no software is capable of providing 100% accurate transcriptions, so I will use mostly approach (2), and only a little bit of (1). However, I don’t know if it’s a good idea to use (2) in other languages we provide transcription for, namely Chinese (both traditional/simplified conversion and Pinyin), Shanghainese, Cantonese, Uzbek and Georgian.

So my question is, for each of these languages:
• To what extent are the current transcriptions accurate? Try to give a percentage.
• If this percentage is lower than 100%:
   • Do you think it’s a good idea to systematically display autogenerated transcriptions (like we do now) although some are incorrect?
   • What type of errors can you see in the transcriptions? Try to categorize. If you know software development, how easily can they be detected and fixed?

Feel free to translate this post into the languages concerned.
pullnosemans
14 days ago - edited 14 days ago
very glad that you're addressing this issue; I was going to mention it on tatoeba day, but the sooner, the better. note that there is also the "wrong transliteration" tag that I've recently been using for japanese and mandarin.

to answer your questions, I would say that
* the percentage of correctly transliterated sentences without any problems should be around 80 to 90 percent. I regularly notice errors, but when I pay active attention, I see that most sentences I come across are in fact transliterated correctly. however, my skills in both languages are only mediocre, so this might be somewhat off from the actual situation.
* yes, autogenerated transcriptions with flaws are better than none. just think of our contributors manually adding the transliteration to every single japanese sentence they own. I wouldn't want to have them do that, not to mention the fact that the majority of japanese sentences on tatoeba are orphans.
* the errors mainly concern characters with multiple readings (duh). the wrong choice of transliteration is then sometimes chosen by the tool for both morphological and syntactic reasons. e.g., in mandarin compounds, the problem is apparently that the compound is saved in the tool's database as one word, but with the wrong pinyin. but often single characters (i.e. where a character doesn't have any direct context with other characters that form a word with it) are wrongly transliterated if the tool cannot correctly analyze the syntax of a sentence. I have no idea about computing of any kind, so this is where my part ends.
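The compound-versus-single-character problem described in the last point can be illustrated with a small sketch. This is not the tool Tatoeba uses; it is a generic greedy longest-match lookup, and the two tiny reading tables are assumptions that stand in for a full dictionary.

```python
# Illustrative tables only; a real tool would load a full dictionary.
COMPOUNDS = {"银行": "yínháng"}                     # multi-character words
SINGLE = {"银": "yín", "行": "xíng", "不": "bù"}     # default single-character readings

def transcribe(text: str) -> str:
    """Greedy longest-match transcription: try compounds first, then single characters."""
    out = []
    i = 0
    max_len = max(len(word) for word in COMPOUNDS)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if size > 1:
                reading = COMPOUNDS.get(chunk)
            else:
                # Unknown characters are passed through unchanged.
                reading = SINGLE.get(chunk, chunk)
            if reading is not None:
                out.append(reading)
                i += size
                break
    return " ".join(out)

print(transcribe("银行"))
print(transcribe("不行"))
```

The character 行 gets a different reading depending on whether a compound entry covers it, which is why a wrong or missing compound entry produces a wrong transliteration for the whole word.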

I generally think that giving sentence owners the possibility to manually change a sentence's transliteration would be a very good first step. we could also add some kind of marking whether a sentence's transliteration is automatically generated or hand-made.

stoked to see this being improved! I'll be happy to offer all the help I can.
gillux
13 days ago - edited 13 days ago
> however, my skills in both languages are only mediocre

Which ones?

About the Japanese furigana/transcription

> * yes, autogenerated transcriptions with flaws are better than none.
I strongly disagree. To me, there is no point in showing furigana if it’s not trustworthy. It’s no good for Japanese learners because they may think it’s trustworthy when it’s not, and because they eventually need to know the readings themselves to be able to judge whether it’s correct, which defeats the original purpose. It’s no good for Japanese speakers either because they may think Tatoeba is not a serious project. See also http://en.wiki.tatoeba.org/articles/show/furigana.

Because of this, I plan not to show furigana by default unless it has been reviewed, and to add an option that shows unreviewed transcriptions by default for every sentence, like now.

> just think of our contributors manually adding the transliteration to every single japanese sentence they own.

Yes, the task of manually adding transcriptions for every sentence is huge but I think it’s the only way. I plan to ease that process by using autogenerated transcriptions as a base one could review by editing the wrong parts only. Something like this:
1. Show autogenerated transcription by clicking a button http://prntscr.com/71kvcc
2. Edit and send it http://prntscr.com/71kvvy
3. It’s reviewed http://prntscr.com/71kwho

In addition, I plan to allow reviewing transcriptions of other contributors’ sentences unless the sentence owner has reviewed them.

What do you think?
pullnosemans
13 days ago - edited 13 days ago
>> however, my skills in both languages are only mediocre
> Which ones?
the two I mentioned, japanese and mandarin.

>> * yes, autogenerated transcriptions with flaws are better than none.
> I strongly disagree.
haha wow, alright.
I see your point, though. for me personally, not having furigana on most japanese sentences would be a bummer because I use tatoeba all the time to check the readings of sentences I have in anki, but I can just go back to using jisho.org for this purpose (which has issues similar to those on tatoeba, but maybe a tad less? I'm not sure). if there is a furigana mistake in a sentence, I usually remember it, so I can maneuver around those when studying in anki. but yeah, I see this is a relatively specific need, so I can't assume the majority of tatoeba users will have it as well. I won't be seeing japanese sentences on tatoeba as much as before, thus having less opportunities to notice any problems, but, oh well.

HOWEVER. I really like your idea of introducing a "show autogenerated transcription with a warning label" button. it sounds like a very good compromise between getting rid of wrong transliterations and yet maintaining the possibility to get a transcription for any sentence right away at your own risk. it would also raise the awareness that there are errors by a mile. so:

> What do you think?
I think you should go for it. try the same thing for mandarin, we'll see how it works out.
gillux
13 days ago
Thank you for your positive feedback.

> I see your point, though. for me personally, not having furigana on most japanese sentences would be a bummer because I use tatoeba all the time to check the readings of sentences

That’s why I thought about having an option to display transcriptions by default like it is now. Members would need to opt-in by going to their settings, and this would be a good chance to warn them about untrustworthy transcriptions. I will probably implement this in a later update though, it’s not essential.
Pfirsichbaeumchen
13 days ago - edited 13 days ago
The furigana section takes up a lot of space. A spontaneous suggestion would be to use a smaller font size and perhaps put the furigana in brackets behind the kanji in a similar way as is shown here: http://prntscr.com/71kvvy.

Will the romanisation be editable as well? There are some strange word separations such as "ni hiki" and "i masu" instead of "nihiki" and "imasu".
gillux
13 days ago
The furigana section takes as much space as now (see the reference #187012), modulo the romaji. This is a somewhat provisional display though, I still don’t really know what to do with the romaji. I like having it a bit hidden like now (only displayed when hovering the mouse), but it’s rather impractical.
Pfirsichbaeumchen
13 days ago - edited 12 days ago
I have the furigana turned off, but I realise it's always been that size. I just thought this would be a good opportunity to suggest using a smaller font size. ☺
gillux
12 days ago - edited 12 days ago
> Will the romanisation be editable as well? There are some strange word separations such as "ni hiki" and "i masu" instead of "nihiki" and "imasu".

You can, but not directly. I don't want to make it fully editable because I feel it's gonna become too inconsistent, and because it is based on the furigana. Instead, if you feel like editing the romaji word separation, edit the furigana (remove or insert spaces) and the romaji will update accordingly.

I already dealt with the problem you're mentioning (verbs like "i masu" separated) in a recent update, but only for the furigana. It now displays います instead of い ます, and so will the romaji once I'm done with this.
Impersonator
12 days ago - edited 12 days ago
I was the one who provided the initial Uzbek code. Please note this is not a transcription, but a transliteration/script conversion. It is quite correct, I've browsed through 20 pages and found only 1 mistake: https://tatoeba.org/eng/sentences/show/3837742 (obviously, WҳацАпп should be WhatsApp). My Uzbek is very limited, but the Latin script was basically created as a one-to-one mapping for Cyrillic, so most problems are with Russian loanwords (and even these are quite predictable).
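A character-by-character script conversion of this kind can be sketched as follows. This is not the code contributed to Tatoeba; the table is deliberately partial (it leaves out context-dependent letters such as е) and the function name is an assumption.

```python
# Partial Uzbek Cyrillic-to-Latin table, for illustration only.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "з": "z", "и": "i",
    "й": "y", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p",
    "р": "r", "с": "s", "т": "t", "у": "u", "ф": "f", "х": "x", "ч": "ch",
    "ш": "sh", "қ": "q", "ғ": "gʻ", "ҳ": "h", "ў": "oʻ",
}

def to_latin(text: str) -> str:
    """Convert known Cyrillic letters and leave everything else untouched."""
    out = []
    for ch in text:
        latin = CYR_TO_LAT.get(ch.lower())
        if latin is None:
            out.append(ch)                   # Latin letters, digits, punctuation
        elif ch.isupper():
            out.append(latin.capitalize())   # "Ш" becomes "Sh"
        else:
            out.append(latin)
    return "".join(out)

print(to_latin("китоб"))
print(to_latin("шаҳар"))
```

Characters missing from the table pass through unchanged, so a word that is already in Latin script survives the Cyrillic-to-Latin direction as it is. Mixed-script strings like the WҳацАпп case above are the kind of input where a simple per-character table goes wrong.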

As a Cantonese learner, I'm using the Cantonese transcription quite a lot. It does have flaws (hard for me to estimate the percentage, will try to take several pages and count the number), but it is useful for me because I have problems memorising the tones, and I usually can spot most transcription errors. Generating a completely correct transcription is very complex programmatically, probably even more complex than for Standard Written Chinese, because SWC is more codified.
gillux
12 days ago
Thank you for your feedback about Uzbek. The conversion algorithm is indeed very simple and nearly 100% accurate. I don’t think there is any problem keeping all the Uzbek transliterations displayed, like now. I tend to use the word “transcription” for “transliteration” because they are handled very similarly on Tatoeba, but I totally understand the difference.

Your comment about Cantonese transcriptions and pullnosemans’s about Mandarin transcriptions suggest that, despite being inaccurate, these transcriptions are still very useful to you, so I will definitely need to implement the “display everything by default” option before the first release.

Speaking of which, I’d like to have your opinions (especially Pfirsichbaeumchen’s since you’re admin) about how to manage transcription editing permissions. I tried to find a balance between keeping it open and preventing editing problems. There are so many transcriptions that it doesn’t make much sense to me to only allow sentence authors to submit transcriptions of their sentences.

For a given sentence of a language in which we allow transcription editing:
• If nobody submitted a transcription, anybody may submit one.
• If you’re the owner of the sentence and someone submitted a transcription, you may overwrite it with your own.
• Once the sentence owner submitted a transcription to his/her own sentence, only he/she, corpus maintainers and admins may further modify it.
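The three rules above can be written down as a small check. This is only a sketch under stated assumptions: the class and field names are hypothetical, and it assumes that a transcription submitted by someone other than the owner may still be replaced by anybody, which the rules do not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transcription:
    sentence_owner: Optional[str]        # None for an orphan sentence
    submitted_by: Optional[str] = None   # None while only the autogenerated version exists

def may_edit(t: Transcription, user: str, is_privileged: bool = False) -> bool:
    """Return True if `user` may submit or modify the transcription.

    `is_privileged` stands for corpus maintainers and admins.
    """
    if is_privileged or user == t.sentence_owner:
        return True
    owner_has_submitted = (
        t.submitted_by is not None and t.submitted_by == t.sentence_owner
    )
    # Assumption: until the owner has submitted, the transcription stays open to everyone.
    return not owner_has_submitted

print(may_edit(Transcription("alice"), "bob"))
print(may_edit(Transcription("alice", submitted_by="alice"), "bob"))
```

Keeping the decision in one function makes it easy to see which case is still open, namely what happens between two non-owners.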

It sounds about right for transcriptions that are well known by natives, such as Uzbek Cyrillic/Latin or Japanese furigana, but what about Pinyin or Jyutping? To what extent are Mandarin and Cantonese speakers confident with these? And for Shanghainese, the current transcription is based on IPA, but I believe most learners can’t read it and most natives can’t write it… What do we do with this?
Impersonator
12 days ago - edited 12 days ago
Most Cantonese native speakers I've talked to can’t use either Jyutping or any other transcription.

In my experience, it's easiest to talk about pronunciation with native speakers by using other characters with the same pronunciation. But I'm not sure how this can be implemented in an intuitive way. Probably, have an on-hover hint for all the syllables?.. :?

The situation with Pinyin seems better. If I'm not mistaken, it is taught at school in Mainland China (but I believe Taiwan uses Zhuyin instead).
tommy_san
12 days ago - edited 12 days ago
A possible scenario:
1. A non-native speaker provides a wrong transcription for a bad sentence.
2. Native speakers don't want to correct the transcription because that might give the wrong impression that it were a good sentence, nor do they want to correct the sentence because that would prompt non-native speakers to add even more bad sentences. As a result, the wrong transcription remains uncorrected.
3. People "learn" from it, believing it to be trustworthy because the transcription isn't machine-generated.
gillux
yesterday
I added a function to reset a human-submitted transcription to its initial machine-generated state. This way, wrong transcriptions can be deleted without needing to provide a correct transcription.
tommy_san
yesterday
That doesn't sound very constructive... I wonder if there's not a better way to deal with it, but I can't think of any right now. I hope I won't have to use that function too often.
gillux
22 hours ago
Note that I didn’t initially add this function for that purpose. I just needed a way to “remove” a transcription, which is an essential part of the whole thing.

I don’t think there is an easy solution to this, it’s just like trying to deal with people adding incorrect sentences to Tatoeba.
Impersonator
yesterday
Will changes to the transcription be logged?
gillux
22 hours ago
No, that’s not planned.
Impersonator
12 days ago
By the way, about the 'Chinese' language. The Standard Written Chinese can be read not just with Mandarin readings (which are auto-generated now), but also with Cantonese readings. For example, the sentence #4194512 can be read 'Deoi3 ngo5 ji4jin4 ze3 si6 zung6jiu3 dik1'. The usage of this form of the language is limited (it is used for reading written texts aloud, singing songs, but usually not for day-to-day conversation), but I would find it useful if there was an option to display 'Chinese' texts with Cantonese pronunciation, because it's the pronunciation I'm trying to learn.

Of course, this should ideally be a selectable option (probably selectable in the settings) because most Chinese learners do want to read it with Mandarin pronunciation, not with Cantonese.
nickyeow
20 hours ago
Agreed!

On a side note, it would be nice if pronunciations of compounds could be displayed with spaces in between the individual syllables. 'zung6 jiu3' looks much better than 'zung6jiu3' in my opinion.
gillux
13 hours ago
I know absolutely nothing about Cantonese, but I thought displaying compounds glued would help reading, just like we usually display Japanese romaji with spaces between each “word”.
nickyeow
20 hours ago
Wow, thank you so much for taking on this issue!

I think the autogenerated transcriptions for Cantonese are doing okay. There are mistakes here and there, but most of them are caused by a small set of characters with multiple pronunciations. Many of these can be solved by adding more pronunciations for compounds. I'd say the percentage of sentences with completely correct transliterations is around 90% to 95%.

Some characters can be quite tricky though. For instance, the final particle 喎 is pronounced wo3 when it indicates a casual remark, wo4 when it indicates a sort of playful scolding, and wo5 when it is used to quote something undesirable. Apparently, it can also be pronounced kwaa1, waa1, and wo1, although these are extremely rare. To be fair, I only found out about these (I blush to confess) when I looked them up in a dictionary. :p

I think the review system you suggested would be the best way to solve the problem. Perhaps autogenerated transcriptions could still be displayed—they are correct most of the time after all—but it would be nice if reviewed transcriptions could be given a little green tick or something.
Guybrush88
yesterday
Concerning #4233047, where the sentence has been automatically truncated because it's a long sentence, I wonder if it would be possible to raise the character limit. Otherwise it won't be possible to have complete translations if the original sentence is quite long.
gillux
yesterday
I think the current limit (which is by the way 1500 bytes of UTF-8) is more than enough. We can’t possibly call #4233047 *a* sentence. Tatoeba isn’t a database for texts, but for sentences.
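Because the limit is counted in UTF-8 bytes and not in characters, the number of characters that fit depends on the script. A minimal sketch of such a check follows; the function names are assumptions, and only the 1500-byte figure comes from the post above.

```python
MAX_BYTES = 1500  # the limit quoted in the post above

def utf8_length(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))

def fits_limit(text: str) -> bool:
    """True if the text stays within the byte limit."""
    return utf8_length(text) <= MAX_BYTES

print(utf8_length("a"), utf8_length("é"), utf8_length("あ"))
print(fits_limit("a" * 1500), fits_limit("あ" * 501))
```

ASCII letters take one byte each, accented Latin letters usually two, and kana or kanji three, so the same limit allows 1500 ASCII characters but only 500 kana.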
Ooneykcall
yesterday
I'll deal with those next week, guess I'll have to cut most in two~
gillux
4 days ago - edited 4 days ago
I’m adding additional criteria to the search feature. You can test this ongoing work on https://dev.tatoeba.org/

Perform a regular search, and then you’ll see additional criteria on the right: sentence owner and orphan sentences for the moment. I made orphan sentences hidden by default. This way, they are hidden from top bar searches, but can be displayed by checking the additional criterion, lowering their visibility to newcomers.

What do you think?
Guybrush88
4 days ago
I found an issue with accents. First query: https://dev.tatoeba.org/ita/sen...ita&to=und

It becomes this query when I search for my sentences corresponding to that query: https://dev.tatoeba.org/ita/sen...ser=Guybrush88

As you can see, no results are shown because the accent is changed by the query, while sentences I own are shown without specifying my username
gillux
4 days ago
Problem solved, thank you.
Guybrush88
3 days ago
thanks for the fix, gillux. everything seems to be perfectly working for me now
Ooneykcall
3 days ago
By the way, is there a way to bring the native speaker factor into the search, e.g. arrange for 'sentences in language X by native speakers' and, conversely, 'non-native speakers / undefined'?
gillux
3 days ago
Yes. That’s a good idea, I’ll definitely add this criterion. Though I’m not sure about how to organize the form since we’d have 3 exclusive filters for users: unowned, owned by a given user, owned by a native. It’s already a bit confusing because one can check “Show orphan sentences” while specifying a username (in which case the checkbox is ignored). Adding a third exclusive filter will make things worse.
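The precedence described here (a username overrides the orphan checkbox) can be sketched as a single decision function. This is an illustration of the behaviour as described in the post, not the site's code; the function and parameter names are assumptions.

```python
from typing import Optional

def is_shown(owner: Optional[str],
             username: Optional[str] = None,
             show_orphans: bool = False) -> bool:
    """Decide whether a sentence appears in the search results.

    `owner` is None for an orphan sentence.
    """
    if username:              # a username wins; the orphan checkbox is ignored
        return owner == username
    if owner is None:         # orphan sentence, hidden unless requested
        return show_orphans
    return True

print(is_shown(None))
print(is_shown(None, username="gillux", show_orphans=True))
```

Adding "owned by a native speaker" as a third mutually exclusive case would mean one more branch here, which is the form-design problem the post raises.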
CK
3 days ago - edited 3 days ago
1.

Would it be possible to add more than one username in a comma-delimited list, similar to how members can limit languages in their settings?

For example, here is a list of the Japanese native speakers who have contributed the most sentences.

bunbuku,mookeee,tommy_san,arnab,Banka_Meduzo,thyc244,arihato,OrangeTart,Fukuko,wakatyann630,qahwa,Ianagisacos,fouafouadougou,tomo

This would allow members to search for sentences by members that they feel can be trusted.


2.

Would it be possible to allow us to also limit searches to only sentences with audio?


3.

I'd suggest this change in wording.

FROM:
Oprhan sentences are likely to be incorrect.
TO:
Orphan sentences are less likely to be correct.
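Suggestion 1 amounts to parsing a comma-delimited string into a set of usernames and keeping only the sentences owned by one of them. A minimal sketch follows; the data layout and function names are assumptions, and the sample sentences are made up.

```python
def parse_usernames(raw: str) -> set:
    """Split a comma-delimited list, ignoring extra spaces and empty items."""
    return {name.strip() for name in raw.split(",") if name.strip()}

def by_owners(sentences, raw_list: str):
    """Keep the text of sentences whose owner is in the list.

    `sentences` is a list of (text, owner) pairs (hypothetical layout).
    """
    owners = parse_usernames(raw_list)
    return [text for text, owner in sentences if owner in owners]

# Made-up sample data
sample = [("文A", "bunbuku"), ("文B", "someone_else"), ("文C", "tommy_san")]
print(by_owners(sample, "bunbuku, mookeee,tommy_san,"))
```

Using a set makes the membership test cheap even if the list of trusted usernames grows long.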
pullnosemans
3 days ago - edited 2 days ago
I like ck's ideas #1 and #3, and I don't mind #2, either.

an automatic 'native speakers' filter would probably be cool, too, but I also very much agree with sacredceltic's caveat below; you just never know who claims to be native. having an individual list as in ck's suggestion #1 would be a good way to cope with this problem.

I don't think, however, that hiding orphans should be the default in the way that you have to check "show orphans" every single time you submit a search query. I think this would lead to a decrease in orphans being adopted and amended. let's rather have it so that you can check "show orphans" and it stays like that until you manually uncheck it again.

it's great seeing this site improving constantly!
gillux
3 days ago
I see your points about native speakers. However, I don’t think this problem should be solved by changing the search criterion, but rather by changing the way we identify native speakers in the first place. The search criterion could only be “limit to sentences by self-proclaimed natives” because that’s the only information we have in our database so far.

I don’t really like the idea of providing a comma-separated list instead of filtering by self-proclaimed natives. First, because it’s rather impractical to use as the list grows. Second, because it restricts the ability to filter by native speakers to a handful of long-time contributors who have their own idea on that matter. I’m worried about newcomers (who obviously won’t express themselves in this thread) being unable to use the search as efficiently as you guys would. That would be unfair. The current lack of native speaker identification and of a proper review mechanism to sort out “bad” sentences should be solved first, rather than worked around by that kind of “feature”. I can already see members providing ready-to-use search links in their profiles that filter users from their list. That said, filtering by multiple users itself (regardless of the motivation) seems legit, and is easy to implement.

I agree with what you said about orphan visibility. I initially wanted to limit the visibility of orphans because they are a major problem in some languages like Japanese, where more than half of the corpus consists of orphans that are mostly wrong. But that’s another problem.
CK
2 days ago - edited 2 days ago
>Re: I don’t really like the idea of providing a comma-separated list ...

If it's difficult to program this capability, I can understand.

However, being able to search for sentences by more than one username would be useful.

For example, ...

1. People could limit searches to sentences owned by Brazilian Portuguese speakers, or Mexican Spanish speakers if they knew which members spoke which dialect.

2. People could use all the native speakers listed on http://bit.ly/nativespeakers rather than just the few that are listed using the new system on tatoeba.org. We have a lot of sentences written by native speakers that are never likely to come back and change the setting in their profiles.

3. People could choose to exclude certain self-proclaimed native speakers that they didn't trust.

4. People could choose to also include a few non-native speakers that they feel they can trust.

5. Some researchers may want to study typical English errors made by native Russian speakers, so they could browse through search results limited to English sentences written by Russian speakers.


It would probably also be a good idea to have the “limit to sentences by self-proclaimed natives” that you are suggesting.


** Added 6 hours later **

1.

Here is the number of members claiming "native speaker level" in more than one language
(https://tatoeba.org/eng/stats/users_languages)

1 member claims 4 languages at native level.
7 members claim 3 languages at native level.
54 members claim 2 languages at native level.

This is based on the exported data of May 23, 2015.

I wonder if other members are as skeptical as I am about these claims.
This is one reason I'd like the option to search with results limited to usernames of my own choosing.

If you want to see the usernames, go to http://goo.gl/K8vGKl.
There are perhaps a few on this list that I might trust as being true native speakers of two languages.


2.

I updated http://bit.ly/nativespeakers so now you can easily copy a comma-delimited set of usernames for each language.
If searching by multiple usernames is enabled, you can easily go here and copy the usernames, and then edit out members you don't trust (if there are any).

If you've been on the page recently, you may need to force a reload of the page to get the newest external JavaScript file.

tommy_san
2 days ago
I like this idea, too, but I'd hate typing lots of usernames each time because I'm sure I'd use the same sets of usernames many times. It would be nice if we could make lists of usernames that we can use anytime for search. We could also provide some default lists of self-proclaimed native speakers of each language.

> 2. People could use all the native speakers listed on http://bit.ly/nativespeakers rather than just the few that are listed using the new system on tatoeba.org. We have a lot of sentences written by native speakers that are never likely to come back and change the setting in their profiles.

How about incorporating the information on this page into the official system? Would anyone object to it?
Silja
3 days ago
+1 to all CK's suggestions.
gillux
3 days ago
> Would it be possible to allow us to also limit searches to only sentences with audio?

Yes. It won’t be testable on dev.tatoeba.org until the next update though.
sacredceltic
3 days ago
"Native speakers", by Tatoeba's definition, is anybody who self-proclaims to be such : Russians claiming to be French or Turkish claiming to be British, just for the challenge...teenagers have such an oversized ego and Tatoeba often ends up being their egos's grave.. and makes them so much more aggressive and bitter, as a result...
Guybrush88
3 days ago - edited 3 days ago
Would it also be possible to search for given words/expressions that are not translated into a given language? For example: I want to search for "once in a blue moon" (or any other expression in any other language) and I want to see all the sentences containing that expression that are not translated into Italian (or any other language). I would also find it useful if I could see all the sentences with a given expression/word that are translated into a given language. For example: I search for "apple pie" and I want to see only the sentences containing "apple pie" that have translations in Italian.
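The two filters asked for here can be sketched over a toy in-memory corpus. Nothing below reflects Tatoeba's database or search engine; the record layout, function name and sample data are assumptions for illustration.

```python
def search_by_translation(sentences, phrase: str, lang: str, translated: bool):
    """Return texts containing `phrase` that do (or do not) have a translation in `lang`.

    Each sentence is a dict with a 'text' string and a 'translations' set of
    language codes (hypothetical layout).
    """
    phrase = phrase.lower()
    return [
        s["text"]
        for s in sentences
        if phrase in s["text"].lower()
        and (lang in s["translations"]) == translated
    ]

# Made-up sample data
corpus = [
    {"text": "I see him once in a blue moon.", "translations": {"ita", "fra"}},
    {"text": "Once in a blue moon, it snows here.", "translations": {"deu"}},
    {"text": "I like apple pie.", "translations": {"ita"}},
]
print(search_by_translation(corpus, "once in a blue moon", "ita", translated=False))
print(search_by_translation(corpus, "apple pie", "ita", translated=True))
```

A single boolean flag covers both requests, since "translated into X" and "not translated into X" are the same test with opposite outcomes.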
Silja
3 days ago
+1. I'd also like to have "Show translations in", "Not directly translated into" and "Not translated into" sorting options.
gillux
3 days ago
> would it also be possible to search for given words/expressions that are not translated in a given language?

Yes. I’ll implement this.

> I would also find it useful if i could see all the sentences with a given expression/word that are translated in a given language. for example: i search for "apple pie" and i want to see only the sentences containing "apple pie" that have translations in Italian

You mean https://tatoeba.org/sentences/s...eng&to=ita ?
Guybrush88
3 days ago
Silja
3 days ago
I find it pretty difficult to remember the syntax we need to use when we want to search for exact phrases, sentences beginning with a certain word etc. I basically need to go every time to the wiki article to verify what characters mean what in the search (http://en.wiki.tatoeba.org/arti...w/text-search#).

Many online-dictionaries I use have a drop-down list where you can choose what kind of search you want to make. For example, this Japanese dictionary http://dictionary.goo.ne.jp/ has options "begins with", "exact match" and "ends with" and you can specify your search with those.

I would also like to see something like that in Tatoeba. So there would be next to the search field another drop-down list with options to choose, eg.
- vague matches (eg. "live in boston" or "live") <-- this would be the default. I'm assuming the quotation marks don't do anything if you are searching with only one word, eg. the search "live" returns the same results as plain live, right?
- exact matches (eg. "=live =in =boston" or "=live") (though this wouldn't work when searching phrases in languages without spaces, I guess)
- begins with (eg. "^live in boston" or "^live")
- ends with (eg. "live in boston$" or "live$")
+ maybe something else, like "begins and ends with" (eg. "^live in boston$" or "^live$".)
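The drop-down idea boils down to a function that turns a plain phrase plus a chosen mode into the operator syntax listed above. A minimal sketch follows; the mode names and the function are assumptions, and only the `=`, `^` and `$` forms are taken from the examples in this post.

```python
def build_query(text: str, mode: str = "vague") -> str:
    """Translate a phrase and a drop-down choice into the search syntax."""
    words = text.split()
    if mode == "vague":
        return text
    if mode == "exact":
        return " ".join("=" + word for word in words)
    if mode == "begins":
        return "^" + text
    if mode == "ends":
        return text + "$"
    if mode == "begins_and_ends":
        return "^" + text + "$"
    raise ValueError("unknown search mode: " + mode)

for mode in ("vague", "exact", "begins", "ends", "begins_and_ends"):
    print(mode, "->", build_query("live in boston", mode))
```

With something like this behind the form, users would never have to type or remember the operators themselves.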
Guybrush88
3 days ago
+1, i would find it better to have the opportunity of making exact searches instead of using "=word" each time i want to see the exact occurrences of something
tommy_san
2 days ago
These criteria seem to limit only the sentences of the "from" language, but we're sometimes rather interested in the "to" language. For example, when I want to know how to say something in French and type a Japanese phrase, I don't mind seeing orphan Japanese sentences but I don't want orphan French sentences. I wonder how we could work this out.
gillux
2 days ago
That’s a very relevant point. I’d like to be able to perform such searches too. Either that, or I’d like to be able to distinguish orphans from non-orphans directly within a list of translations. I’ll keep that in mind.
CK
2 days ago - edited 2 days ago
I'd like to see the "OK" tag remain if someone releases a sentence.

Now, if someone chooses to "unown" a sentence, the OK tag disappears, so we lose important information.

In the past, when a non-native English speaker chose to release all their English sentences, I could easily find all of their sentences that I had tagged OK and adopt those sentences.
gleki
5 days ago
Again the issue with written tone, emphasis and emoticons is raised.

Definitely, sign language may have problems with being compatible with the current state of Tatoeba.org but what about smileys, capitalizing words for emphasis, sarcasm, irony etc.?

The thread:
http://tatoeba.org/por/sentences/show/2096210

Which parts of languages are allowed to be added to the database and which are not?
pullnosemans
3 days ago
overall I think this topic is kinda meh and hard to decide on, but generally, I would say that emoticons as used in text messages etc. are a part of an established style of writing, so banning them on here would be somewhat discriminating to the people who want to use them.

problem with using things like *this* or -this- for emphasis is that there is no consistent code how to use them, so they might be interpreted differently from what you wanted to express. then again, I think people will have enough empathic intuition to figure it out in the majority of cases.
Amastan
4 days ago
Ad ken-henniɣ a imeddukal... tamajaṛit tla 100.000 n tefyar!!!
Gratulálok barátaim... Magyar 100.000 mondatokat van!!!
bandeirante
4 days ago
Köszönjük!
pullnosemans
8 days ago - edited 8 days ago
**to all people that have some kind of business with mandarin and/or japanese**

(seeing as CK is an english and japanese bilingual, this was originally directed to him in a pm, but then I thought it couldn't hurt to just make it public to the community and include mandarin.)

I have a question about something that has been growing more and more odd to me here on tatoeba.

if an english sentence contains a constituent with the definite article "the" (or one of the german equivalents die/den/das/etc.), a lot of japanese and mandarin sentences are translated using the demonstrative "その" in japanese or "这/那" in mandarin. I used to think it's weird, but assumed that I simply did not know the two languages well enough to be able to judge.

japanese example: http://tatoeba.org/fra/sentences/show/208196
chinese example with "这": http://tatoeba.org/fra/sentences/show/793355

however, lately I've been noticing that my chinese tandem partner uses the demonstrative "diese/r/s" (the german equivalent of "this/that") where she should be using the definite article, so I've been wondering: maybe the usage of the english or german definite article is explained to speakers of languages without articles by pointing to something, or by telling them that unlike the indefinite article (english "a", german "ein/e"), the definite article refers to something specific. if so, this could actually be the cause of all the translations of "the" by means of "その"/"这/那".


so, to finally get to the point: what do you think about translating something like

"The dog ate a carrot."

as "その犬はニンジンを食べた。"
instead of something like "犬はニンジンを食べた。"

or "那只狗吃了胡萝卜。"
instead of something like "狗吃了胡萝卜。"

Do you think these translations are okay and "その"/"这/那" can be used in japanese and chinese respectively in this way, or do you think it's actually an unfitting translation and that only german/english demonstratives ("diese/r/s", "this/that") should be translated using demonstratives in mandarin and japanese? if yes, do you think the mistranslations might be caused by a frequent misunderstanding of the nature of english/german articles by native speakers of languages without articles?

very curious to hear what you guys have to say.
sharptoothed
8 days ago
I'm not sure about the Chinese sentences, but as for the Japanese, it seems that most of the "その-sentences" come from the Tanaka Corpus (most of the English and Japanese sentences with numbers lower than ~30000 come from that corpus, actually). The Tanaka Corpus is full of literal translations and less-than-natural sentences of the kind we often see in textbooks, where they are used for educational purposes to illustrate different aspects of foreign languages. My Japanese is not good enough to judge whether all those "その-sentences" are unnatural, so let's wait for a native Japanese opinion. :-)
tommy_san
8 days ago
1. We sometimes do use その when we talk about something that has been mentioned before. Here are some examples. I guess some of them can be translated using a definite article. (Correct me if I'm wrong.)

http://www.aozora.gr.jp/cards/0...773_14560.html
私は再びそこで故郷の【匂い】を嗅ぎました。【その匂い】は私に取って依然として懐かしいものでありました。
或る時先生が例の通りさっさと海から上がって来て、いつもの場所に脱ぎ棄てた【浴衣】を着ようとすると、どうした訳か、【その浴衣】に砂がいっぱい着いていた。
兄妹三人のうちで、一番便利なのはやはり書生をしている【私】だけであった。【その私】が母のいい付け通り学校の課業を放り出して、休み前に帰って来たという事が、父には大きな満足であった。

2. Obviously その in Japanese is used far less often than a definite article in Western languages.

3. At school, we learn to "translate" English sentences into weird and clumsy Japanese. I once "translated" the English translations of some of my sentences into "Japanese" for fun.

#3052927
"Is that Tom calling again?" "Yes. He calls every evening these days. I shouldn't have given him my number."
「また電話をしているのはトムですか」「はい。彼は最近毎晩電話をします。私は彼に私の番号を教えるべきでありませんでした」

#2441780
"Tom, your dinner's getting cold." "Just a minute. I'll be right there."
「トム、あなたの夕食は冷めつつあります」「ちょっと待ってください。私はすぐそちらに行くでしょう」

When you ask Japanese people to translate something into Japanese, there's a high probability that their "translation" will look like this. At least that was the case for most of the students at Hyogo University and contributors on Tatoeba (including myself in my earlier days).

Some words, such as personal and demonstrative pronouns, are used markedly more often than in real Japanese, and there are also many other differences.

4. Most Japanese sentences here are not wrong, but there are so many sentences that can only be used in limited situations. Some of them are so stilted that you can use them only when you write, even if they look like an example of spoken language. Some of them sound so impolite or vulgar that you should use them only when you're talking with close friends. So, if you're seriously interested in learning Japanese and not yet good enough at it to tell the nuance just by reading a sentence out of context, you'd better ignore Japanese sentences on Tatoeba.
pullnosemans
7 days ago
thanks for the responses so far, you two.

tommy, I think your very first remark is actually quite interesting. definite articles in english and german are indeed used to refer to something that is already known in discourse. however, on tatoeba, we of course don't have discourse, so that might be a problem when translating them into languages without direct equivalents of the articles.
I'm right now guessing that you could say the translations with demonstratives for definite articles are acceptable if we interpret them all in this way (at least for me personally, because I use tatoeba mainly to boost vocabulary anyway).

still curious to hear more opinions.
tommy_san
6 days ago
犬, この犬, その犬 and あの犬 can all be valid translations for "the dog". Honestly speaking, I have no idea right now how to explain the difference, but one thing is sure: it makes almost no sense to discuss it using out-of-context sentences.

> I use tatoeba mainly to boost vocabulary anyway
Many Japanese sentences here sound somewhat like "Er nahm ein Foto von dem Hunde." You can learn many words from it: nehmen = take, Foto = picture, etc. but the problem is not many German speakers "nehmen" pictures or say "dem Hunde" nowadays. If you don't mind it, just go ahead. If you're more serious about learning Japanese, you may want to take a look at http://yourei.jp/. It lists tons of real Japanese sentences.
CK
6 days ago
If you need furigana to read those pages, try starting with this link.

http://trans.hiragana.jp/ruby/http://yourei.jp/
tommy_san
6 days ago
Unfortunately, it doesn't work when you search for a word, but you can type a URL directly.

http://trans.hiragana.jp/ruby/http://yourei.jp/例文
pullnosemans
6 days ago - edited 6 days ago
hey, thank you two for your help and providing me with yourei.

I'm right now curious how accurate my sense of style in japanese has grown so far, and whether or not I can roughly rely on it in my selection of japanese phrases on this site, so just a very quick test for myself: am I right assuming that "熱をお計りになりましたか。", an orphan sentence from this site, would be an example of a less-than-natural sentence? it appears pretty weird to me. do you think that staying away from sentences with syntactic structures that appear overly cumbersome to me would be a good way to filter out unnatural sentences?

also, it would be important for me to know: can I dodge stylistically outdated sentences on here by avoiding orphans, or sticking to certain contributors?
tommy_san
5 days ago
> 熱をお計りになりましたか。

I'd write 測る instead of 計る and I'd say (熱は or お熱は) instead of 熱を.
お測りになりましたか is perfectly fine (it's by no means outdated), though it might be more common to say 測られましたか.

> can I dodge stylistically outdated sentences on here by avoiding orphans, or sticking to certain contributors?

Sentences added by native speakers NOT as translations are usually good.
Sentences added by non-native speakers are often bad.
All the other sentences are sometimes good, sometimes bad.

Take a look at this thread if you haven't read it yet.
https://tatoeba.org/wall/show_message/15743
pullnosemans
5 days ago
again, thank you, especially for linking me to this very informative thread.

it seems that I'm still far from knowing idiomatic japanese well enough to be able to roughly judge whether a sentence is natural or not. I guess I will simply be staying away from orphans wherever I can now.

maybe I'll talk to my japanese tandem partner about this again. if I can get him interested in the tatoeba project, I'm sure he could contribute to making the japanese corpus more reliable.

what is the status quo in this respect anyway? are there any concrete plans for how to get rid of the huge amount of japanese orphans? if the japanese corpus is that unsafe right now, hiding the orphans like you suggested back then might not be a bad idea. maybe having the number of japanese sentences indicated as 60k instead of 180k would also lead to a greater motivation among japanese contributors to increase the number of good sentences.