
> transcription and stress marks - are extra information which is not trivial to machine-generate
Indeed, and this is true for Japanese furigana as well. So, it would be ideal if we could edit this data just like we edit sentences.

> I think it shouldn't be too nice to language learners
I think we should distinguish two types of learners: those who use Tatoeba to learn languages and to improve or extend their knowledge but avoid contributing translations they are not certain about (or avoid contributing translations at all), and those who confuse Tatoeba with sites like Lang-8 or consider Tatoeba a good place to raise their self-esteem, etc. Tatoeba can't be too friendly to the former, I believe. The more tools and additional information on sentences Tatoeba provides, the better. As for the latter, I don't really think that the lack of those tools and info would stop many of them. They still have Google Translate and the like, after all.
By the way, I'm pretty sure that all Russian language learners here, no matter how proficient they are, would be happy to have, for example, a phonetic transcription for every Russian sentence or at least to see stress marks placed on every word. :-)

5.
I don't think we need different kinds of "unsure". The fact that someone feels unsure about some sentence is negative enough by itself, I think, no matter what makes him feel so. And I don't think we should make "unsure" marks anonymous. The author deserves to know who marked his sentence that way and has a right to request explanations, I believe.
6.
Ooneykcall's rating system, as I see it, effectively introduces "OK", "not OK" and 3 levels of "unsure". As I mentioned above, I'm not really sure it is worth having several "unsure" marks.
7.
The idea of distinguishing between sentences I'd use myself and those I wouldn't seems highly subjective to me. I think it often has nothing to do with the real quality/value of a sentence but rather reflects personal preferences. That is, it may be helpful for personal use, but it may just as well be useless in the project-wide scope.
8.
I think we shouldn't take translation quality into consideration. Bad translations can be good sentences and vice versa.
9.
I like the idea of having an "I've read this, but I don't want to rate it" mark. Indeed, there are sentences I don't want to deal with. On the other hand, there are also sentences that, in my opinion, don't make good examples worth keeping in Tatoeba.
10.
I like this idea, too. I think I could use it to mark foreign sentences I'd like to translate but am not sure are good enough.

1. What kind of sentences do you mark as "unsure"?
I mark as "unsure" the sentences that seem awkward, illogical or somewhat unnatural to me as well as the sentences with questionable punctuation.
2. Have you ever thought about rating a sentence and then decided not to? If so, why?
Yes, it happens sometimes. Mostly when I want to mark a sentence as "not OK" but then realize that I'm either not so sure about that or don't have enough arguments to mark it that way. I usually mark such sentences as "unsure" or just skip them.
3. How do you rate sentences that you know are correct but which you wouldn't use yourself?
In most cases I skip them, and sometimes I use the "unsure" mark. I think it's better not to mark a sentence at all than to give it a wrong or biased mark.
4. In your real life, do you have any kind of collection that include things that you think are bad?
No, I don't have any collection of that kind. I don't think such collections are of any use to me. Bad examples are only good when corresponding good examples exist, I believe.

Thanks, Trang! Everything seems to work just fine.

Wrong sentence count in the "Browse by language" feature when both the "Not directly translated into:" and "Only sentences with audio:" filters are used and, as a consequence, broken paging.
How to reproduce:
1. Open https://tatoeba.org/eng/sentenc...nly-with-audio
Japanese sentences with audio will be found and displayed. Remember the sentence count in the result (474 at the moment).
2. Press the ">>" button and make sure that paging is working correctly. You will be taken to page 48.
3. Now select 'English' in the "Not directly translated into:" drop-down menu. The page will be reloaded and you'll notice that the sentence count remained unchanged.
4. Press ">>" and see page 48 with a blank sentence list. Currently, only 10 pages of Japanese sentences with audio not translated into English exist.
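
The arithmetic behind the blank page can be sketched roughly like this. It is only an illustration, not Tatoeba's actual code: the page size of 10 is my assumption (it is what 474 sentences and a last page of 48 suggest), and both `last_page` and the filtered count of 95 are hypothetical.

```python
import math

PER_PAGE = 10  # assumption: inferred from 474 sentences -> last page 48


def last_page(sentence_count, per_page=PER_PAGE):
    """Return the number of the last page for a given sentence count."""
    return max(1, math.ceil(sentence_count / per_page))


unfiltered_count = 474  # count reported in step 1
filtered_count = 95     # hypothetical: any value from 91 to 100 gives 10 pages

# What ">>" appears to use when the count is not refreshed after filtering.
stale_last = last_page(unfiltered_count)
# What it should use once the "Not directly translated into:" filter applies.
real_last = last_page(filtered_count)

print(stale_last, real_last)  # 48 10
```

If the count shown after step 3 is the stale one, ">>" points at page 48 while the filtered result ends at page 10, which would explain the blank list.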

I think we should have more mercy on our admins and not make them do such a boring job. :-) Instead, a robot could be developed that sends e-mails to potentially inactive members and processes their responses.

> * Contributors can now adopt sentences from deactivated accounts.
It would be nice if we could see those sentences in the "Orphan sentences" list (accessible via "Contribute >> Adopt sentences" menu item).

Horus is our deduplication bot. It finds and merges duplicate sentences in the Tatoeba database.

Looks like it.

https://tatoeba.org/rus/sentences_lists/show/992
This list looks more like a random collection of sentences to me. I think it would be better to delete it.

Indeed, it can solve most of the "non-canonical punctuation marks" problems if we come to an agreement.

** To whom it may concern **
List of Russian sentences that have duplicates or near-duplicates as of July 18, 2015.
http://j-langtools.com/tatoeba/rus-dups.html

I suspect that somehow the word processor used to write the sentence just made an error doing character substitution.

To tell the truth, I have no idea why a no-break space was used at that position in the 2nd sentence. Unlike a regular space, a no-break space doesn't allow a line break at its position, and this is the only difference. In Russian typography it's recommended to put a no-break space before a dash between two words, or after the dash if it starts the line (in dialogues, for example). It seems that the sentence was originally typed in some text editor that performs automatic character substitution (which went wrong in our case) and was then copied to Tatoeba.
> Let's have a Github issue to have Horus programmed to treat the two spaces identically.
It would be nice, but which sentence should Horus prefer during merging: the one that contains the special space or the one that doesn't? This is not a trivial question since, in addition to different spaces, there are different dashes, hyphens, quotation marks, etc. Different languages have different typographical rules and, ideally, Horus should be aware of all of them (or at least the most common ones). The plus is that a "more clever" Horus would be able to improve sentences by bringing them to standard typesetting. But it would take the time and effort of our busy developers.
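
Just to show what "treating the two spaces identically" could mean in practice, here is a minimal sketch. It is not how Horus works (I don't know its internals); `comparison_key`, `find_space_duplicates`, the sentence ids and the sample texts are all made up for illustration. Note that it only finds the candidates and deliberately leaves open which sentence to keep.

```python
from collections import defaultdict

# Assumption: an illustrative subset of space look-alikes, not a complete list.
SPACE_LIKE = {
    "\u00a0": " ",  # NO-BREAK SPACE -> regular space
    "\u202f": " ",  # NARROW NO-BREAK SPACE -> regular space
}
_TABLE = str.maketrans(SPACE_LIKE)


def comparison_key(text):
    """Return the text with space look-alikes replaced by a regular space."""
    return text.translate(_TABLE)


def find_space_duplicates(sentences):
    """Group sentence ids whose texts differ only in the kind of space used.

    `sentences` maps a (hypothetical) sentence id to its text.
    """
    groups = defaultdict(list)
    for sentence_id, text in sentences.items():
        groups[comparison_key(text)].append(sentence_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Made-up sample data: 1 and 2 differ only in the space after the dash.
sample = {
    1: "Yes \u2014 no.",
    2: "Yes \u2014\u00a0no.",
    3: "Maybe.",
}

print(find_space_duplicates(sample))  # [[1, 2]]
```

The stored texts are left untouched here; only the key used for comparison is normalized, so the choice of the preferred variant can still be made per language.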

There are different kinds of spaces. In this particular case we can find two of them: the regular space that we normally use (all spaces in #2735348) and the no-break space (the space after the dash in #4340200). It's impossible to distinguish them by eye, but they are completely different characters for computers.
Of course, we can make those sentences identical manually and let Horus merge. But this won't eliminate the problem.
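
For anyone who wants to check this themselves, a small sketch like the following makes the invisible difference visible. The helper `describe_spaces` and the sample string are made up for illustration; the sample is not the text of either sentence above.

```python
import unicodedata


def describe_spaces(text):
    """List (index, code point, Unicode name) for every space separator in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Zs"  # Zs = "Separator, space"
    ]


# Made-up sample: a regular space before the dash, a no-break space after it.
sample = "Yes \u2014\u00a0no."

for index, code_point, name in describe_spaces(sample):
    print(index, code_point, name)
# 3 U+0020 SPACE
# 5 U+00A0 NO-BREAK SPACE
```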

Oops! It seems it was replaced with a regular white space when I copy-pasted those sentences. Thanks, CK!

It seems that Horus is incapable of merging identical sentences with em-dashes. I've just found two identical sentences (I don't see any difference at least):
https://tatoeba.org/rus/sentences/show/2735348
https://tatoeba.org/rus/sentences/show/4340200

I see this:
https://dl.dropboxusercontent.c...87/2099363.png
Not bad at all to my slipshod taste. :-)

I see a much better picture:
https://dl.dropboxusercontent.c...87/4355771.png
But, yes, the problem exists.