
Wall (5,932 threads)

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.


mramosch May 27, 2020 at 8:37 AM, edited May 27, 2020 at 8:43 AM

Is there a way to force a “case-sensitive search” for a word? In German, this would considerably reduce the number of false positives when searching for a noun that differs from the verb only in its capitalization.

e.g. Das Mitbringen...

AlanF_US May 27, 2020 at 11:29 AM

There is no way to do this via our search, but you can do an ordinary case-insensitive search and use your browser (for instance Ctrl-F in Firefox) to search case-sensitively on the page.

As we have things set up, there is a single index for each language, which is case-insensitive for all languages that have capitalization. Using a case-sensitive index would require users to type capital letters when they need them, which would mean extra work. Furthermore, the fact that words are capitalized at the beginning of the sentence, irrespective of whether they are acting as a noun, could cause either false positives or false negatives.

In the case of "mitbringen/Mitbringen", I see 61 hits, none of which is capitalized. Those 61 results fit on a single page (if you have things configured so that you have 100 results per page), so a case-sensitive search is very quick.

mramosch May 27, 2020 at 11:40 AM, edited May 27, 2020 at 11:41 AM

‚mitbringen/Mitbringen’ was just the sentence that made me decide to ask for this option but I had other occasions where 1 single case was hiding in hundreds of counterparts, and that was difficult to trace as you can imagine.

I was rather thinking of something along the lines of adding a specifier (in the way ‚=‘ is used) right before the ‚word at hand‘ to request a separate case-sensitive search for this word only, but in the context of an entire phrase...

CK May 27, 2020 at 12:03 PM

If you know how to do it, the easy way would be to download the exported files and do case-sensitive searches offline.
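
For readers who want to try the offline approach, here is a minimal Python sketch. It assumes the exported sentences file is tab-separated with three columns (sentence id, language code, text) and that German uses the code `deu`; check the actual export before relying on either assumption. The function name `find_case_sensitive` and the sample rows are made up for illustration.

```python
import csv
import io
import re


def find_case_sensitive(lines, word, lang="deu"):
    """Yield (id, text) for sentences in `lang` that contain `word` with exact case."""
    # Match the word only when it is not embedded in a longer word.
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    # Assumed layout: id <TAB> language code <TAB> sentence text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip rows that do not fit the assumed layout
        sentence_id, sentence_lang, text = row
        if sentence_lang == lang and pattern.search(text):
            yield sentence_id, text


# Invented sample data standing in for the real export file.
sample = io.StringIO(
    "1\tdeu\tDas Mitbringen von Hunden ist verboten.\n"
    "2\tdeu\tKannst du etwas mitbringen?\n"
    "3\teng\tMitbringen is a German word.\n"
)
hits = list(find_case_sensitive(sample, "Mitbringen"))
```

With a real export, the `io.StringIO` object would be replaced by something like `open("sentences.csv", encoding="utf-8", newline="")`, where the file name is again an assumption. Note that a word capitalized only because it starts a sentence would still match, which is the false-positive risk mentioned earlier in this thread.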

mramosch May 27, 2020 at 12:12 PM

Thanks, but...

I wanted to do the search on the most recent data possible and I wanted to do it inline, right in the place where I am working on something...

AlanF_US May 27, 2020 at 1:42 PM

Adding a specifier won't help unless the index stores information in the way you want to search it. Since the index is stored case-insensitively, the search will find case-insensitive matches. The equals sign (=) indicates that you want to match the word as spelled rather than perform stemming on it (removing likely suffixes before doing the comparison). It ignores capitalization.
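
To illustrate why a specifier alone cannot recover case, here is a toy inverted index in Python. It is only a sketch of the general idea and not Tatoeba's actual search backend; `build_index` and the sample sentences are invented. Because every token is lowercased when the index is built, the capitalization is no longer available at query time.

```python
from collections import defaultdict


def build_index(sentences):
    """Map each lowercased token to the ids of the sentences containing it."""
    index = defaultdict(set)
    for sentence_id, text in sentences.items():
        for token in text.split():
            # Case is discarded here, at indexing time.
            index[token.strip(".,!?").lower()].add(sentence_id)
    return index


sentences = {
    1: "Das Mitbringen von Hunden ist verboten.",
    2: "Kannst du etwas mitbringen?",
}
index = build_index(sentences)

# Queries are folded the same way, so both spellings reach the same entry.
upper_hits = index["Mitbringen".lower()]
lower_hits = index["mitbringen".lower()]
```

Both lookups return the same set of sentence ids, whichever capitalization the user typed.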

mramosch May 27, 2020 at 2:09 PM

Yeah, that‘s also how I understood the ‚How to Search for Text‘ wiki.

That‘s why I said ‚similar to the = specifier‘.

That using the current indexing is no option is also understood.

I thought maybe of a second pass over the already filtered set: it would take the result of the first pass‘ indexed search and run a case-sensitive query on the remaining ‚real’ sentences. This is like what one does with the browser search function you recommended above, just done automatically, without an explicit action by the user.

And because a new specifier would not be part of the basic daily search routine and is not suited for languages without case distinctions anyway, it might not put too much additional load on the server most of the time, being used on these special occasions only.
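
The two-pass idea can be sketched in a few lines of Python. This is a hypothetical illustration, not a description of how Tatoeba's search works: the first pass here is a plain case-insensitive scan standing in for the indexed search, and `two_pass_search` is an invented name.

```python
import re


def two_pass_search(sentences, word):
    """Case-insensitive first pass, then a case-sensitive filter on its results."""
    folded = word.lower()
    # Pass 1: stand-in for the existing case-insensitive indexed search.
    candidates = [(sid, text) for sid, text in sentences if folded in text.lower()]
    # Pass 2: keep only candidates containing the word with the exact case.
    exact = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return [(sid, text) for sid, text in candidates if exact.search(text)]


sentences = [
    (1, "Das Mitbringen von Hunden ist verboten."),
    (2, "Kannst du etwas mitbringen?"),
    (3, "Ich habe nichts dabei."),
]
results = two_pass_search(sentences, "Mitbringen")
```

The second pass only touches the sentences returned by the first, so its cost grows with the number of hits rather than with the size of the corpus.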

mramosch May 24, 2020 at 11:12 AM

Clicking a ‚linked‘ entry in the log of a sentence highlights the actual sentence in the list of translations. A very hidden feature but I‘m glad I found it ;-)

When unlinking a sentence the ‚linked’ entry still remains in the log and an additional ‚unlinked‘ entry appears in the list.

Having gotten used to the highlighting feature I really miss the same behavior for the ‚unlinked‘ log entries, provided the unlinked sentence is still in the list of translations as an indirect link, of course.

And even highlighting the indirect link when pressing an outdated ‚linked‘ entry would be useful for my workflow.

Needless to say, the reverse would be excellent too: clicking on a sentence in the translation list would highlight all relevant entries in the log, or at least the last relevant ‚linked‘ or ‚unlinked‘ entry, provided the UI and automatic scrolling allow for that feature.

AlanF_US May 27, 2020 at 1:50 PM

Have you considered getting an account on GitHub and joining the Tatoeba/tatoeba2 project there? Then you could post your enhancement requests as issues. That would be the best place, since you could carry on an extended discussion there without worrying about monopolizing the Wall, or having your suggestions get lost without being considered. You'd also be able to communicate with developers more easily there. Try this link:

https://github.com/Tatoeba/tatoeba2

which can also be found at the bottom of any Tatoeba page.

mramosch May 27, 2020 at 10:38 AM, edited May 27, 2020 at 10:39 AM

Every now and then I see comments regarding some audio problems and people pinging @CK to help out, but at the same time I am told that CK has deactivated all notifications to his account and won’t see them.

Is there any official user/pseudo-user account like @audio where audio issues can be directed to without getting lost in oblivion?

AlanF_US May 27, 2020 at 11:37 AM

You can send him a private message. While you're at it, you can ask him to turn on e-mail notifications so this kind of thing doesn't happen in the future. It wouldn't be the first time he's gotten that request.

No, there is no account called "audio". If there were, CK would still have to check it, since he's the one who deals with issues that need to be resolved by working with the audio. (Other admins can resolve issues that pertain to text.)

Seeing that you are an advanced contributor and can leave a tag, it's a good idea to leave a tag like "@change", since sentences with this tag are reviewed frequently.

mramosch May 27, 2020 at 11:56 AM, edited May 27, 2020 at 11:58 AM

Because of the small number of contributions I have made so far, I am only involved in some comment discussions, and I am already drowning in e-mail notifications.

I can imagine what it means to be in CK‘s shoes, so I won‘t bother him on this one. I do it often enough for other issues... ;-)

I‘m just hinting that at least official administrative tasks should have a maintainer with some functioning channel of communication. I guess that’s what commenters are expecting from a comment section by default instead of having (for every request) to look up names of responsible persons, because they might change along the way.

CK May 27, 2020 at 11:41 AM, edited May 27, 2020 at 11:41 AM

You can tag the sentence with "@change audio" and leave a comment explaining exactly what needs to be done.

If these don't get fixed in a reasonable amount of time, send me a private message.

There are a number of items tagged @change audio that are hard to deal with, since there are no clear explanations about what needs to be done.

mramosch May 27, 2020 at 11:42 AM

But is everybody eligible for using tags?

CK May 27, 2020 at 11:56 AM

No, but every logged in member can leave a comment asking for someone to tag a sentence.

mramosch May 27, 2020 at 12:00 PM, edited May 27, 2020 at 12:07 PM

Well, if that’s not redundant and complicated then what...? ;-)

Yesterday I filed about 15 requests after listening and digging through 1000 of entries in an audio list, I don‘t wanna do these redundant steps of asking anybody else to ask some friend of a friend of a friend - over and over again...

sharptoothed May 24, 2020 at 7:50 PM

** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

Ricardo14 May 25, 2020 at 9:12 PM

Thank you :D

Guybrush88 May 26, 2020 at 7:15 AM

thanks

deniko May 21, 2020 at 1:14 PM

Not sure whether it has been discussed before - sorry if it has - but it looks like if you're using the new interface you can only change the flag to a language that is listed in your profile. As a corpus maintainer I change the flag to something that is not listed there, and I do it quite often - obviously, confirming which one it should be, if I'm unsure. Is it possible to list all the languages in the drop down menu, similar to the old interface? I guess it's a useful feature for everyone, not just for corpus maintainers.

TRANG May 21, 2020 at 2:10 PM

It has not been discussed here on the Wall, but some time ago I sent a message to the corpus maintainers group on Gitter. I don't think a lot of people received it though, because no one answered.

It was originally mentioned by Aiji and CK on GitHub:
https://github.com/Tatoeba/tatoeba2/pull/2077

My questions would be:
- Is there a reason why you don't want to list the languages in your profile?
- Can you describe the usual situations where you'll change the language?
- How often exactly does it happen?

deniko May 21, 2020 at 2:18 PM, edited May 21, 2020 at 2:19 PM

> Is there a reason why you don't want to list the languages in your profile?

For example, because I don't speak them?


> Can you describe the usual situations where you'll change the language?

For example:

#8724171

I asked the user to change the flag. If they don't change it in 6 more days (2 weeks after I noticed it's wrong), I'll confirm with Lisa or someone else that it's Japanese (which is my guess) and change it to Japanese.

> How often exactly does it happen?

Not too often, but I guess at least 1-2 times a month.

TRANG May 21, 2020 at 2:51 PM

> For example, because I don't speak them?

In this case, it would be better to let someone who speaks the language handle the change, wouldn't it?

On your side, when you can tell that a sentence has the wrong language but you're not sure what the correct language is, you can change the language to "Other language" ASAP so that it doesn't pollute the corpus for the languages that you speak.

Then you can leave a comment, where you can even ping directly some corpus maintainers: "This sentence was wrongly flagged as Ukrainian. I think it's Japanese. @small_snow @bunbuku @Pfirsichbaeumchen"

Then one of them will change it when seeing your comment if it is indeed Japanese.

deniko May 21, 2020 at 2:56 PM

> In this case, it would be better to let someone who speaks the language handle the change, wouldn't it?

I don't believe so. If the task is as simple as changing the flag, I can ask a speaker of this language to confirm the flag. If the speaker is a corpus maintainer or an admin they can fix that themselves, sure, once I bring it to their attention, but if they're not? It's not like I edit those sentences or something.

So what if it's say Slovak? I don't speak it, I can make a reasonable guess it is Slovak, I know who to confirm it with, but I don't know any corpus maintainer/admin who has it listed to ask them to change the flag.

TRANG May 21, 2020 at 5:02 PM

> It's not like I edit those sentences or something.

But I'd argue that changing the language of a sentence can be just as important as editing the sentence itself.

> So what if it's say Slovak? I don't speak it, I can make a reasonable guess
> it is Slovak, I know who to confirm it with, but I don't know any corpus
> maintainer/admin who has it listed to ask them to change the flag.

In that case would there be any issue for you to just add Slovak to your list of languages?

deniko May 21, 2020 at 5:11 PM

> But I'd argue that changing the language of a sentence can be just as important as editing the sentence itself.

Well, it's all up to you. I find it very safe to do so. I can do it anyway by adding Japanese to the list of my languages although I don't understand it at all, so I still don't see the point of making the task more complicated than it is now.

> In that case would there be any issue for you to just add Slovak to your list of languages?

Well, I'd have to list all Slavic and Romance languages there in this case... And some Germanic just because they're relatively easy to understand in their written form through the languages I already know.

I just don't want to make the list too long. I enjoy having a relatively short list of languages I'm interested in because this list is used in some places to facilitate search, etc. Besides, there has been some discussion of using this list to prioritize sentences in those languages when you see translations. That would be a sweet feature, but I don't want to prioritize languages that I'm not truly interested in.

TRANG May 21, 2020 at 9:28 PM

> I can do it anyway by adding Japanese to the list of my languages although
> I don't understand it at all, so I still don't see the point of making the task
> more complicated than it is now.

Because by principle, you shouldn't be changing the language of a sentence to Japanese if you don't know Japanese and there is someone else who knows better than you and can perform this change instead of you.

The fact that you can do it anyway is not an intended feature. It is because we don't care much that you can do it anyway, but ideally, we'd rather you don't do it.

> Well, I'd have to list all Slavic and Romance languages there in this case...

No, not necessarily all of them. You would only have to do this for languages that are lacking corpus maintainers and that you are willing to get involved in.

deniko May 22, 2020 at 8:03 AM

Thanks for the explanation Trang.

gillux May 22, 2020 at 3:48 PM

What if the sentence is written in a language that has no corpus maintainer?

TRANG May 22, 2020 at 4:17 PM

The language can always be changed to "Other language" until someone can confidently assign the correct language.

gillux May 22, 2020 at 4:53 PM, edited May 22, 2020 at 4:53 PM

In that case, who would be that "someone"? Only corpus maintainers can change someone else’s sentence flag.

TRANG May 22, 2020 at 6:00 PM

That someone will be a corpus maintainer most of the time, yes.

But that someone can be an advanced contributor if the owner of the sentence is inactive. Advanced contributors can adopt the sentence and change the language upon becoming the new owner. This case is probably very rare though.

That someone could also be a regular contributor, if the owner of the sentence is inactive and the sentence is adopted by an advanced contributor who then decides to unadopt it. This case is probably even rarer.

The most likely situation is that we find a corpus maintainer who has some knowledge in the language or who has taken the time to learn about the language and who is willing to take care of the language.

Among existing corpus maintainers, this would be typically someone like cueyayotl or Ricardo or shekitten who tend to have a broader interest in languages and could be temporary ambassadors for a language that doesn't have yet a native speaker as corpus maintainer.

Otherwise, there could be an advanced contributor who decides to step up and become corpus maintainer in order to take care of that language, seeing that it wasn't getting much love.

gillux May 22, 2020 at 9:43 PM

I get your point, but I have to disagree. The principle of restricting flag changes only to corpus maintainers having the proper profile language makes sense in theory. But now you are facing 3 corpus maintainers (so far) asking for full flag selection. So maybe that is not so practical after all.

I assume that most (if not all) sentences in need of flag correction are the result of a user mistake (or bad UI) when adding the sentence. In this context, aren’t most flag corrections more about understanding how the mistake happened, what was the original intention of that particular member, rather than knowledge in a particular language?

In your point of view, it looks like the knowledge of a language is the only way one can accurately change someone else’s sentence flag. I disagree. I’d trust any corpus maintainer to change any sentence flag because they are all responsible and trustworthy. I know they will make the necessary research, ask the right person and make the correct decision, because that’s exactly what they are here for. Denying them this ability feels almost like you don’t really trust them after all. I wouldn’t be very happy about that if I were a corpus maintainer.

mramosch May 22, 2020 at 10:28 PM, edited May 22, 2020 at 10:30 PM

I think, I was the reason that this discussion started because out of ignorance I caused the flag problem that had to be fixed manually by Denis.

I tried to direct-link a sentence by creating an already existing sentence (that only had an indirect link) for a second time in order to change the link from being indirect to being direct. I was told this is the best way of doing it if you don’t have AC permission to do linking with the official linker tool.

Because I recently removed my language information from my profile I only got the options ‚English‘ or ‚Other language‘ when trying to submit the new sentence.

I chose ‚Other language‘ because I was told that the bot would automatically detect and correct everything to eventually be left with one correct version and a direct link.

But all I achieved was the appearance of a flag with a question mark that was stuck and couldn’t be changed anymore.

Maybe this is a bug in the UI or a use case that the UI is not prepared for?

CK May 23, 2020 at 1:04 AM, edited May 23, 2020 at 1:06 AM

We have a duplicate-merging script that will merge exact matches together. If the language code (flag) is different, then the 2 sentences won't be merged because they are not an exact match.
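
The exact-match rule can be illustrated with a short Python sketch. This is not the actual merging script; it only shows the idea of grouping sentences by the pair (language code, text). The function name, the sample sentences and the codes `deu` and `und` (used here as a placeholder for an unknown flag) are assumptions for illustration.

```python
from collections import defaultdict


def find_duplicates(sentences):
    """Group sentence ids whose language code AND text are both identical."""
    groups = defaultdict(list)
    for sentence_id, lang, text in sentences:
        # The language code is part of the key, so equal text is not enough.
        groups[(lang, text)].append(sentence_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


sentences = [
    (10, "deu", "Guten Morgen."),
    (11, "deu", "Guten Morgen."),
    (12, "und", "Guten Morgen."),  # same text, different flag
]
duplicates = find_duplicates(sentences)
```

Sentences 10 and 11 end up in one group and could be merged. Sentence 12 stays separate even though its text is identical, which matches the situation described above where a sentence flagged as an unknown language was not merged with its existing copy.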

CK May 23, 2020 at 1:09 AM, edited May 23, 2020 at 1:09 AM

> In your point of view, it looks like the knowledge of a language is the only way one can accurately change someone else’s sentence flag. I disagree. I’d trust any corpus maintainer to change any sentence flag because they are all responsible and trustworthy.

I agree with Gillux.

If there are corpus maintainers that you feel are not responsible and trustworthy then they should be changed back to advanced contributors.

TRANG May 24, 2020 at 10:51 AM

I trust corpus maintainers for doing everything with their best intention but I don't think it's reasonable to expect every one of them to be equally competent for every task that corpus maintainers can perform.

For me this is about separation of responsibilities and about transparency. This is about how do we organize ourselves in the best possible way and how to make the features of Tatoeba reflect better this organization.

I know that in many cases it is possible for a corpus maintainer who has no tangible knowledge in a certain language to fix an error in a sentence in that language. This includes: setting the correct flag, fixing the punctuation and even fixing very basic mistakes. But in my opinion, if you have the chance to involve another corpus maintainer who has more experience with the language than you have, this should be your default course of action (even if what you need to fix feels obvious to you). And if that is your default course of action, then you don't really need the full list of languages by default.

It doesn't mean you are completely forbidden from making interventions in other languages. If there's a special situation, you can always add the language to your profile at any time, with a low level or unspecified level. Since these situations are supposed to be exceptions, the fact that you have to take these extra steps shouldn't be too much to ask.

I can understand that it's not practical because you may end up with unwanted languages showing up in the dropdown list when adding/translating sentences. But then it's a different problem.

Besides, if you're dealing with a language which seemingly has no corpus maintainer and, as a result, you get in touch with native speakers or/and do some research on the language, then it would be helpful for the community to know it from your profile. The next time there's maintenance to do on other sentences in that language, people could reach out to you and benefit from your past experience instead of asking assistance from a random corpus maintainer.

AlanF_US May 24, 2020 at 5:53 PM, edited May 24, 2020 at 5:54 PM

To me, identifying the language of a sentence belongs to a different category from modifying or deleting the sentence. In order to modify or delete a sentence, you need to know the language pretty well, and once you make the modification or deletion, it can become difficult or impossible to revert the change. But it requires a good deal less familiarity with a language to identify a sentence in it (or a sentence that does NOT belong to it), and in any case, an incorrect language identification is easy to switch later.

With the new UI design, changing language identification is grouped together with other operations on the sentence, so I can see why it might be tempting to require corpus maintainer privileges for all of them. But I don't think that logically or practically they fit into the same category.

TRANG May 24, 2020 at 9:04 PM

I agree with what you said but the question still remains.

With the new design, a corpus maintainer now only sees the languages in their profile when editing the language of a sentence.

Should we bring back the full list of languages (as it used to be in the old design) or can we keep it restricted to the profile languages?

AlanF_US May 24, 2020 at 11:39 PM

Sorry, I missed that question. I vote for showing the full list of languages.

Ricardo14 May 25, 2020 at 5:16 AM

+1

brauchinet May 24, 2020 at 12:01 PM

This doesn't really belong here - a situation that I often witnessed:
Somebody writes a comment such as "Flag", "Change flag", "Not French!", "French!".
A corpus maintainer comes by and changes the flag.
The owner of the sentence reads the comment and goes "Huh? What?"

A possible solution would be to keep a log for changing flags. Don't know if it's worth it.

Pandaa May 24, 2020 at 1:27 PM

+1

TRANG May 24, 2020 at 9:07 PM

> A possible solution would be to keep a log for changing flags.
> Don't know if it's worth it.

We have an old ticket for this already :)
https://github.com/Tatoeba/tatoeba2/issues/533

And yes, I think it would be worth it.

Aiji May 23, 2020 at 10:39 AM

> But I'd argue that changing the language of a sentence can be just as important as editing the sentence itself.

> The fact that you can do it anyway is not an intended feature. It is because we don't care much that you can do it anyway, but ideally, we'd rather you don't do it.

What about linking sentences, where there are two languages involved?

Whatever the official guidelines become, in any case, please don't make the maintenance work a complicated ball of unnecessary bureaucracy. Right now, a non-negligible part of corpus maintenance rests on mutual help, both for detecting and correcting potential mistakes. And we sometimes need to trust each other's judgment, even (in particular) in languages we aren't fully capable in.

A simple example, more illustrative than completely exact: There is a sentence in Hebrew that I suspect doesn't correspond to the French. I will ask Alan about it. I will ask him to help me confirm the meaning of the sentence in English because that's our best common language. I'm sure that he will not get his explanation wrong due to a lack of ability, and I'm confident that I will not get mine wrong. If Alan can't help me, he may ask somebody else to help, maybe even in Hebrew instead of English, if that's their best common language. In the end, if my suspicion is confirmed, I will unlink the Hebrew from the French, although I don't speak a bit of Hebrew. There might have been a piece of information lost in translation, but I trust that my fellow maintainers and I, to the best of our combined ability, made the right choice.

(The fact that Alan understands French somehow goes against my example, but I hope you get the idea I wanted to express).

mramosch May 23, 2020 at 11:31 AM

It just looks to me like the half-empty vs. half-full syndrome or in other words

• It is incorrect as long as it is not proven to be correct
vs.
• It is correct as long as it is not proven to be incorrect

and by ‘proven’ I mean something along the lines of e.g. ‘confirmed by a native speaker’ etc.

Perhaps the guidelines on Tatoeba should be a little more clear of which approach is preferable as modus operandi when doing clean up work. I don’t wanna have some other AC or CM constantly having to clean up the mess after me just because I assumed that I have to make my own guidelines because of missing official policies.

It seems that in my first day of linking sentences I have already stepped in some dodo and although having read all those endless threads of near-duplicates, indirect vs. direct links etc. I never really found some conclusion that sounded like a recommendation, even if just a temporary one.

Aiji’s example is a good illustration of the issue, even with only having to deal with languages that are relatively high in the food chain. But imagine less prominent languages, where there is almost no way of gaining traction without relying on second hand guesswork via already existing translations into better distributed languages.

So, is there a general recommendation for the “permissive/prohibitive goes first” problem or does every use case have to be considered carefully in its own domain?

TRANG May 24, 2020 at 11:26 AM

> So, is there a general recommendation for the “permissive/prohibitive
> goes first” problem or does every use case have to be considered
> carefully in its own domain?

I think every use case should be considered carefully. When in doubt, just ask on the Wall.

TRANG May 24, 2020 at 11:12 AM

> What about linking sentences, where there are two languages involved?

When there are two languages involved, you obviously don't have to have both languages in your profile. But you should know one of them.

If you don't know Arabic and don't know Kabyle, you shouldn't be linking Arabic and Kabyle sentences.

> Whatever the official guidelines become, in any case, please don't
> make the maintenance work a complicated ball of unnecessary
> bureaucracy.

There are no guidelines being changed here. The guideline of not making interventions in languages that you have no clue about is not a new thing, is it?

mramosch May 24, 2020 at 10:30 AM, edited May 24, 2020 at 11:40 AM

In Spanish, for example, there are clearly defined grammatical rules for when to use which past tense, yet in practice there are regional preferences that stick to one tense and avoid others almost completely (preterite (indefinido) vs. present perfect).

In German, same thing. The past tense (Imperfekt) is much more prominent in Germany. In Austria you may write ‚I bought a watch yesterday‘, but it sounds ridiculously posh when spoken aloud. Only ‚I have bought a watch yesterday‘ works in colloquial speech. And I think even Germans might, in certain contexts, prefer

• Ah, du warst gestern in der Stadt, Und? Hast du dir die neue, schicke Uhr gekauft?

instead of the clunky Imperfekt (Past Tense)

• Und? Kauftest du dir die neue, schicke Uhr?

——————————————————

How do we correctly incorporate such differences in the linking system?

Someone who argues with the grammatical rules in mind would create fewer links. If the point of reference is actual usage, then we might end up with spaghetti-linking all over the place, even though each link is correct.

What is the ‚official‘ recommendation?

Thanuir May 24, 2020 at 4:42 PM

Correspondence of grammatical structures is not essential. The meanings of the sentences should correspond to each other.

I don't know German, so I won't comment on your examples any further.

AlanF_US May 24, 2020 at 8:22 PM, edited May 24, 2020 at 11:40 PM

The official recommendation is that there is no official recommendation. :)

There is a huge amount of information that can be stored in a sentence, including register (formality). Some of it cannot be perfectly represented in a sentence in another language. As you've noted, once you realize this, you could react by going toward either of two extremes: linking almost nothing, or linking almost everything. A moderate path, relying on your intuition and on the precedents that you see around you, is best. But if in doubt, don't link.

moman May 24, 2020 at 5:33 PM

Hi all,

There appears to be a slight issue with selecting language flags for a sentence. Several times the flag has reverted to an incorrect one after I specifically selected the correct one. And when editing an existing sentence, I have to click on the flag to change it; choosing the edit button and then changing the language does not save the change.

On another note, I'm really enjoying this site. Thanks to all who manage it!

gillux May 24, 2020 at 6:01 PM

Hi moman, thank you for the kind words.
You are describing two different problems.

1. Explicitly selecting the language flag while adding a new sentence doesn’t create a sentence with the flag you selected. Are you sure you are not selecting "Auto detect" as language flag? Could you show us a sentence on which that problem happened?

2. On an existing sentence, clicking the edit button, changing the flag and then clicking OK won’t change the flag. This is a known problem. To edit the flag, just click on the flag without clicking on the edit button. This problem will go away once we switch to our new interface.

moman May 24, 2020 at 7:56 PM

Hi gillux,

I'll have to post an example here when it does it again. I can't recall with which sentences it happened. I have never purposely clicked "Auto detect."

mramosch May 14, 2020 at 8:54 PM, edited May 14, 2020 at 8:55 PM

Is there an easy way to find out how many of the 503.559 German sentences are originals and how many are translations?

I could only filter out that there are 30.635 German sentences that have no translation at all, so they must be originals, but how many originals are there in total?

And what is the distribution of the languages that the non-originals were translated from?
I guess the majority are translations from English sentences (either originals or translations themselves).

How can we retrieve those metrics? Anyone?

Aiji May 15, 2020 at 1:48 AM, edited May 15, 2020 at 1:49 AM

On Tatoeba directly, it's not possible yet.

Offline, if you're proficient in some programming language, you can probably use the exported sentences file and links file to find out.

If neither of the above is possible, you may want to wait until I add this function to Tatoeba playground, the external exploration tool I develop for fun and for others to explore the corpus in ways that aren't possible on Tatoeba (self-promotion ^^) https://github.com/agrodet/Tatoeba-playground. I plan an update this weekend and maybe I will incorporate this possibility. I have been thinking about it for quite some time :)

PS : Note that even if a sentence has no translation it might happen that it was a translation that was unlinked later on, hence not an original sentence.
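
As a rough illustration of the offline route mentioned above, here is a minimal Python sketch that counts the sentences of one language that have no link at all. It assumes tab-separated exports where the sentences file holds id, language code and text, and the links file holds pairs of sentence ids; that layout, the tiny inline sample data and the function name `unlinked_ids` are assumptions made for this example, not something confirmed in this thread. As the PS says, an unlinked sentence is not necessarily an original, so this only reproduces the "no translation at all" count.

```python
import csv
import io

# Hypothetical miniature stand-ins for the exported files (tab-separated).
# Assumed layout: sentences = id, lang, text; links = sentence id, translation id.
SENTENCES_TSV = (
    "1\tdeu\tGuten Morgen.\n"
    "2\teng\tGood morning.\n"
    "3\tdeu\tIch habe Hunger.\n"
    "4\tdeu\tEs regnet.\n"
)
LINKS_TSV = "1\t2\n2\t1\n"


def unlinked_ids(sentences_file, links_file, lang):
    """Return the ids of sentences in `lang` that appear in no link."""
    linked = set()
    for row in csv.reader(links_file, delimiter="\t"):
        linked.update(int(value) for value in row[:2])

    result = []
    # QUOTE_NONE keeps quotation marks inside sentence texts from confusing the parser.
    reader = csv.reader(sentences_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        sentence_id, sentence_lang = int(row[0]), row[1]
        if sentence_lang == lang and sentence_id not in linked:
            result.append(sentence_id)
    return result


unlinked = unlinked_ids(io.StringIO(SENTENCES_TSV), io.StringIO(LINKS_TSV), "deu")
print(len(unlinked), unlinked)
```

With real exports, the two `io.StringIO` objects would be replaced by opened files; the counting logic stays the same.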

mramosch May 21, 2020 at 8:17 PM

Hi Aiji,

which platform is this tool for - or is it web-based?

And what’s the news on the update?

TRANG May 15, 2020 at 9:23 PM

> how many originals are there in total?

There are 80471 original sentences in German at the moment.

> How can we retrieve those metrics? Anyone?

This is how original sentences were calculated when we introduced this information in Tatoeba:
https://github.com/Tatoeba/tato...ationShell.php

You could technically install Tatoeba, import the sentences and contributions (using the file we export weekly) into the local database then run that shell. But let's say it's not the easiest way to do it :)

There's an issue in GitHub which would solve your problem here if it was implemented:
https://github.com/Tatoeba/tatoeba2/issues/2159
When this is implemented, you would be able to get the number from the search.

In the meantime, if you just need some stats occasionally to get an idea, we can query the production database.

mramosch May 15, 2020 at 9:31 PM, edited May 15, 2020 at 9:32 PM

Thanks Trang!

Actually I would really like to see the sources for the translations of the remaining 420.000 sentences.

I guess around 3/4 will be translations from English sources, but the rest...?

gillux May 15, 2020 at 9:54 PM

It’s not exactly what you are asking, but you might be interested in this post: https://tatoeba.org/wall/show_message/21926

The data is 5+ years old, but it looks like it’s possible to rebuild it from source: https://github.com/tguinard/tatoeba_visualization

mramosch May 15, 2020 at 10:15 PM, edited May 21, 2020 at 8:19 PM

Woa, woa, woa...

That looks like a wallpaper from the 1970s. - Psychedelic...

Actually I was just hoping for an informative little five liner ;-)

Thanks anyways!

TRANG May 15, 2020 at 10:04 PM

I've created this issue in GitHub:
https://github.com/Tatoeba/tatoeba2/issues/2325

This might be a task for someone during Kodoeba...

Otherwise, if I got the query right, German sentences are mostly translated from English, Esperanto, French, Japanese and Russian (for the top 5). Details here:
https://gist.github.com/trang/0...2caedb4c678c46

mramosch May 15, 2020 at 10:08 PM

I see German in line 20 of the listing...

Sure you got the query right?

TRANG May 15, 2020 at 10:15 PM

It's possible for people to translate from the same language. Here are some examples:

https://tatoeba.org/eng/sentences/show/331940
https://tatoeba.org/eng/sentences/show/331942
https://tatoeba.org/eng/sentences/show/340727
https://tatoeba.org/eng/sentences/show/341099
https://tatoeba.org/eng/sentences/show/347688
https://tatoeba.org/eng/sentences/show/349821
https://tatoeba.org/eng/sentences/show/349822
https://tatoeba.org/eng/sentences/show/350135
https://tatoeba.org/eng/sentences/show/387987
https://tatoeba.org/eng/sentences/show/430928

If you go to the sentence indicated in the logs in "This sentence was initially added as a translation of sentence ...", you'll see it leads to another German sentence.

mramosch May 15, 2020 at 10:20 PM

But what’s the point here?

Translation from one language to the very same language? That’s not what I would consider a TRANS-lation.

Are you sure this is not just a linking error?

TRANG May 15, 2020 at 10:53 PM

> But what’s the point here?

https://tatoeba.org/eng/wall/sh...#message_34400

> Are you sure this is not just a linking error?

It could be an error, yes. It does happen that people create translations from the wrong language as reported here:
https://github.com/Tatoeba/tatoeba2/issues/2132

In most cases, I think it's intentional. I can't say for sure though.

mramosch May 15, 2020 at 11:44 PM

But this is pure chaos.

e.g.: https://tatoeba.org/eng/sentences/show/123970

We have a wide choice of books.
En esta tienda tenemos una gran variedad de libros.

These are two totally independent sentences, yet they appear as direct translations...

TRANG May 16, 2020 at 8:43 AM

I'm not sure exactly what you think is chaos.

If you consider that one of the sentences listed in "Translations" is not a correct translation of "当店にはいろいろな種類の本がございます。", then you can report it in a comment. Someone will unlink the sentence.

As a general rule: if B and C are correct translations of A, B and C don't have to be correct translations of each other.

AlanF_US May 16, 2020 at 3:37 PM

> We have a wide choice of books.
> En esta tienda tenemos una gran variedad de libros.

> These are two totally independent sentences, yet they appear as direct translations...

"Totally independent" is not accurate. The translation of the Spanish is "In this store, we have a wide choice of books", so as you can see, the second part of the Spanish sentence matches the full English sentence. However, it would be better to have a closer match, so I added a comment asking for an improvement, as Trang suggested.

mramosch May 16, 2020 at 5:49 PM, edited May 19, 2020 at 7:18 PM

By ‚totally independent‘ I meant that it is impossible to evaluate (direct link)

• En esta tienda tenemos una gran variedad de libros.
• Tenemos una gran variedad de libros.

as two sentences that can be equally treated - as if they were identical.

Either the source sentence has a local prepositional complement or not.

BTW: in the English translation neither of the two Spanish versions is directly linked -
and in the two Portuguese versions neither the Spanish nor the French versions are directly linked...

I find this pattern of lack of accuracy (regarding linking) all over the place!

When you look at a sentence page the most prominent property is not the single translation itself but rather the multiple lines of links - direct and indirect. So when I am new to Tatoeba I would definitely consider this fact as an integral part of the service. However, after looking up a few examples and finding out pretty quickly that the whole linking thing is incomplete as hell, I would surely ask myself how reliable this service is and very likely move on in search of something more accurate.


EDIT:

The term ‘local prepositional complement’ caused some misunderstandings in the following posts.

So to clarify beforehand: This was just meant as a hint. I am not referring per se to the grammatical concept of using prepositions to add information about the placement of things or situations. Be it a particle, a preposition or even a dedicated case that conveys the notion of placement, the point is that it has to be explicitly mentioned in the sentence itself in order to be taken into consideration for any translation. Presuming implicit context will lead to inconsistent results.

Reducing a noun phrase to a pronoun (Our firm -> we) may be eligible for some translations from certain languages but the other way round would just be guesswork.

So I should rather have said:

Either the source sentence contains an explicitly mentioned notion of placement or not.
However, the point I am trying to make is that the inconsistent usage of direct/indirect links inevitably leads to wrong assumptions or misunderstandings.

For the practical consequences in the Tatoeba corpus see further down

https://tatoeba.org/eng/wall/sh...#message_35230

Thanuir May 16, 2020 at 6:34 PM

Go ahead and start making links, then. A database built on volunteer work means that if you want something, you do it yourself.

I am currently translating Danish sentences into Finnish. While doing so, I link them to other sentences that I am sufficiently sure about. But I don't link my own translations at the same time, because then I would have to open each of them in its own tab. I did this at some point; it is very laborious.

The user interface would have to change significantly if one wanted it to show which of the displayed sentences are linked to each other, and to make adding and removing those links easy.

Aiji May 17, 2020 at 2:22 AM

> I would surely ask myself how reliable this service is and very likely move on in search of something more accurate.

Good luck with that.


> Either the source sentence has a local prepositional complement or not.

That's a very funny statement considering that the original sentence you were talking about is in Japanese, a language about which you apparently have no knowledge, and also considering that all the languages listed in your profile are Western European.
A thing that some people understand after staying a while in Tatoeba is that being so sure of oneself about how languages work is a very wrong approach. I think that's why we leave this part to linguists and other specialists.


I'll stop here and not comment on the "service" part, so that my good friends around here don't have to advise me against unnecessary aggressiveness.

mramosch May 18, 2020 at 11:56 AM

> A thing that some people understand after staying a while in Tatoeba is that being so sure of oneself about how languages work is a very wrong approach. I think that's why we leave this part to linguists and other specialists.

Another thing that some people understand after staying a while in Tatoeba is that being so sure of oneself about how to interpret other people’s profiles is a very wrong approach. I think that's why we leave this part to the profile owner: whether and why he/she/it adds a language to the list or not.

There may be some of them who won’t add languages to their profiles unless they have lived for a certain period in a part of the world where these languages are spoken natively - although they might know a thing or two about these languages.

You see, sometimes appearances are simply deceiving...

If you don’t agree I suggest you implement some policies that comments and contributions are only allowed by certified ‘linguists and other specialists’ and instead let them do the tedious conveyor belt work we ignorant dummies do ;-)

I have to quote you again - “Good luck with that!”

So when some ordinary sheep criticizes a feature that may have been implemented by you - don’t take it personally - we only see the feature and its effect on us but we don’t know YOU. I am sorry that I have rattled your cage...

And what a shame that you only consider those as friends who pat you on the shoulder but not the other ones who challenge you to sometimes have a more thorough look at things that you simply might have gotten used to over time.

Nevertheless, thanks for keeping on adding useful features to Tatoeba in the future. Much appreciated!

Aiji May 19, 2020 at 2:02 AM

> You see, sometimes appearances are simply deceiving...

Did I say something about your knowledge of languages other than the ones listed in your profile? No, because I have no way to judge that.
Did I imply that your comment was Western European-biased? Yes, because it was.
Did I say that you have no knowledge of Japanese? Yes, because if you knew "a thing or two" about Japanese, you wouldn't make the following statement:
> Either the source sentence has a local prepositional complement or not.

So, even if I agree with you that appearances are sometimes deceiving, I think that in this case, they're far from being deceiving.

Now, for the second part of your comment, which is only here to provoke and is near the zero-level of argumentation, it amused me, so I will answer.
- First of all, belittling yourself so that others will take your side only works in a schoolyard or a political debate. You're not a sheep. Nobody says you're a sheep. And nobody criticized your right to criticize.
- You're talking about things that have nothing to do with my comment, about the feature and how I was hurt, but again it doesn't work. If you have nothing to oppose to a comment, don't get lost in argumentation about unrelated topics. If you're debating alone, it's an OK strategy, but if somebody answers you, you lose credibility.
- I think one is free to consider who their friends are. You should read what I wrote again because I don't think I said that people who don't agree are not my friends, but more something like "my friends here", period. Again, you're arguing against a statement that you wrongly extrapolated from a simple one. It doesn't work.

If we were in speech class, I would conclude by saying that your argumentation is a very good approach if you're giving a speech to a mass audience, but a very risky one if you're arguing with someone, because it relies more on extrapolation than on solid ground. If you want to lead a discussion, you have to answer or counter-attack with solid, logical counter-argument(s) that destabilize the opposing side.

Nevertheless, thanks for keeping on giving feedback on your experience on Tatoeba. Much appreciated.

TRANG May 17, 2020 at 7:52 AM

> I find this pattern of lack of accuracy (regarding linking) all over
> the place!

Could you point out some examples?

The Spanish sentences you mentioned were both translated from Japanese. And based on the comments on #8764247, they are both considered valid translations of the Japanese sentence. So there's no problem with the links there.

You are perhaps misunderstanding something about the structure of the Tatoeba corpus.

mramosch May 19, 2020 at 7:55 AM, edited May 19, 2020 at 10:53 AM

I don’t derive any conclusion whatsoever from the original Japanese sentences.

You wrote above:
> As a general rule: if B and C are correct translations of A, B and C don't have to be correct translations of each other.

That is understood, no problem with that.

But if the original sentence e.g. somehow includes some notion that is remotely related to a local complement expressed with a preposition in many western languages and therefore can be translated in two different ways

Sentence #123970: 当店にはいろいろな種類の本がございます。(A)
• Tenemos una gran variedad de libros. (B)
• En esta tienda tenemos una gran variedad de libros. (C)

then this ‘special property’ - this ‘duplicity’ as it were - is rooted in the source language (A) and has to be an equally valid argument for any translation into any other language.

So with that in mind - if I now compare Group (B)
• We have a wide choice of books.
• Tenemos una gran variedad de libros.
• Nous avons un large choix de livres.
• Nós temos uma ampla variedade de livros.
etc.

and Group (C), respectively
• In this store, we have a wide variety of books.
• En esta tienda tenemos una gran variedad de libros.
etc.

- you get the point - then I would subsequently assume that they ALL would have to be directly linked to the Japanese source sentence.

But what I find instead is
• Tenemos una gran variedad de libros. (DIRECT)
• Nous avons un large choix de livres. (INDIRECT)
• We have a wide choice of books. (DIRECT)
• Nós temos uma ampla variedade de livros. (INDIRECT)

• En esta tienda tenemos una gran variedad de libros. (DIRECT)
• In this store, we have a wide variety of books. (INDIRECT)

There is consistency neither within a group of identical sentences nor with regard to the intricacies of the source language.

And it goes on in the English version of Group (B)
Sentence #280025: We have a wide choice of books.
• Nous avons un large choix de livres. (DIRECT)
• Tenemos una gran variedad de libros. (INDIRECT)
• Nós temos uma ampla variedade de livros. (DIRECT)

I can even see
• En esta tienda tenemos una gran variedad de libros. (INDIRECT)

however no trace of
• In this store, we have a wide variety of books. (INDIRECT)

whereas in the Spanish version of
Sentence #8764247: En esta tienda tenemos una gran variedad de libros.
• We have a wide choice of books. (INDIRECT)
• Tenemos una gran variedad de libros. (INDIRECT)

both indirect links are shown but in the identical English version
Sentence #8768755: In this store, we have a wide variety of books.

none of the indirect links show up!!!

———————————————————

So what should someone new to Tatoeba conclude?

Sentence #123970: 当店にはいろいろな種類の本がございます。
• En esta tienda tenemos una gran variedad de libros. (DIRECT)

This must be wrong because the INDIRECT English version ‘should’ be correct? Or vice versa? Or do

• En esta tienda tenemos una gran variedad de libros. (DIRECT)
• In this store, we have a wide variety of books. (INDIRECT)

just not correspond with each other because they are obviously handled differently with regard to their linking to the Japanese original?


If you are not highly proficient in a multitude of languages and cross-cultural domains, it’s pretty much impossible to come to a conclusion on your own, except that either something very complicated is going on under the hood that doesn’t match your understanding and expectation of how it “should” work, or the quality of the service is not sufficient! Neither conclusion is desirable!

And as I already stated above:

> When you look at a sentence page the most prominent property is not the single translation itself but rather the multiple lines of links - direct and indirect.

So the average person will evaluate the cross language facility as being the prominent feature of Tatoeba and hence expect a certain accuracy and quality of the service.

The current workflow for contributors however doesn’t allow for a decent management and maintenance of the linking system when providing a new contribution.

Essentially, with every new translation we gain quantity and even an automatic ‘free link’ but at the same time we introduce a multiple of inaccuracies for every already existing translation that is not correctly being linked to the new entry immediately, which of course reduces the overall quality of the whole corpus, provided that you don’t see the “single sentence approach” as the main service of Tatoeba but rather how the whole prominent linking system plays well together.

Thanuir May 19, 2020 at 8:34 AM

If I understand your argument correctly, you feel that Tatoeba's user interface leads one to assume that it contains all possible direct and indirect links between sentences.

That is not the case at present, and hardly anyone who adds sentences to Tatoeba assumes it to be true. There are a few reasons:

1. As mentioned earlier, and as you mention too, the user interface makes this difficult.

2. To add a link between two sentences, a person has to know both languages well enough. If nobody using the service knows, say, both Estonian and Italian, nobody will end up directly linking sentences between those two languages. In addition, not all users can link sentences easily.

3. Users disagree about what kinds of sentences should be linked to each other. For example, some link sentences such as ”Hei.” and ”Hei!” to each other; others do not.

There may be other obstacles as well.

Of these obstacles, 2 is probably unavoidable. We don't want people linking sentences if they don't understand both of them well enough.

Obstacle 3 can be lowered by discussing what kinds of sentences should be linked to each other and which should not.

Obstacle 1 could be lowered by changing the user interface.

I am fairly sure that concrete proposals and contributions of work toward removing these obstacles would be gladly received.

TRANG May 19, 2020 at 12:52 PM, edited May 19, 2020 at 3:15 PM

Okay, let's consider this set of sentences:

[JPN] 当店にはいろいろな種類の本がございます。(#123970)
[SPA] En esta tienda tenemos una gran variedad de libros. (#8764247)
[ENG] In this store, we have a wide variety of books. (#8768755)

If I understand correctly, your problem is that [ENG] is shown as an *indirect* translation of [JPN]. You think that if [SPA] is a direct translation of [JPN], then [ENG] should be a direct translation of [JPN] as well.

Assuming someone who speaks both English and Japanese agrees with that, then we have two ways to solve this inaccuracy:
1) An advanced contributor has to link [ENG] to [JPN].
2) A regular contributor has to add a translation to [JPN] that is the exact same text as [ENG].

In both cases, [ENG] will become a direct translation of [JPN].

The inaccuracy that you've seen everywhere is the result of another rule: if A is translated into B and B is translated into C, C is not necessarily a valid translation of A. A human has to confirm that A and C are equivalent.

In our set of sentences, what happened was:
- [JPN] was translated into [SPA].
- [SPA] was translated into [ENG].

By the rule I just mentioned, we cannot automatically assume that [ENG] is a translation of [JPN]. We have to wait until someone explicitly makes these two sentences translations of each other.

If you are very confident that [ENG] is a valid translation of [JPN], then go ahead and add it as a translation.

mramosch May 19, 2020 at 1:58 PM

Don’t get me wrong, I have never said that this is a task that can be automated. It’s obvious that it takes some human interaction to link these cases. That is the reason why I was questioning the available workflow.

My idea is that if (A) gets translated into (B), and another group of sentences, regardless of their origin (originals or translations), all happen to be exact translations of each other (including B, of course), then I could consider them to be (B1, B2, B3, B4 etc.), and under the rule (A)==(B)==(B1)==(B2)== etc. all of them would have to be directly linked, including to (A).

Or can you think of a situation where this rule would be ambiguous?

BTW: Is the creation of the first link when translating a sentence the only automated creation of a link or are there other rules that could cause an automated creation of a link, be it direct or indirect?

Because if there is no auto-creation besides the first one, I am wondering where all those INDIRECT inconsistencies like

• Tenemos una gran variedad de libros. (DIRECT)
• Nous avons un large choix de livres. (INDIRECT)
• We have a wide choice of books. (DIRECT)
• Nós temos uma ampla variedade de livros. (INDIRECT)

come from. We already have established the two ways for direct linking but someone must have created those INDIRECT links, too. Who or What is responsible for their existence?

Thanuir May 19, 2020 at 2:30 PM

I suspect that there are rather few collections of exactly equivalent sentences. For example, personal pronouns and articles differ from language to language, which already rules out many equivalences.


Indirect link:

If A is linked to B and B is linked to C, then A and C are linked indirectly.

There is an indirect link between two sentences if and only if they are two translations apart, but not one.

So indirect links are not so much created; rather, they arise as a consequence of direct links.
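
The definition above (two sentences are indirectly linked if and only if they are two direct links apart, but not one) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this thread, not Tatoeba's actual code; the function name `indirect_links` is made up for the example, and the labels are borrowed from the [JPN]/[SPA]/[ENG] set discussed earlier.

```python
from collections import defaultdict


def indirect_links(direct_pairs):
    """Derive indirect links from a list of (undirected) direct links."""
    neighbours = defaultdict(set)
    for a, b in direct_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    indirect = {}
    for sentence in list(neighbours):
        # Everything reachable in exactly two steps through a direct neighbour...
        two_steps = set()
        for middle in neighbours[sentence]:
            two_steps |= neighbours[middle]
        # ...minus the direct neighbours and the sentence itself.
        indirect[sentence] = two_steps - neighbours[sentence] - {sentence}
    return indirect


# [JPN] was translated into [SPA], and [SPA] was translated into [ENG].
direct = [("JPN", "SPA"), ("SPA", "ENG")]
result = indirect_links(direct)
print(result["JPN"])  # [ENG] is only an indirect translation of [JPN]
```

Nothing in this sketch decides whether [ENG] is actually a good translation of [JPN]; that still takes a human who knows both languages.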

mramosch May 19, 2020 at 3:07 PM, edited May 19, 2020 at 3:13 PM

Thanks :-)

Well, that explains a lot! So you are essentially inheriting one indirect link with every automated direct link on creation of a new translation as well as when linking two sentences manually.

1. What happens if somewhere along the chain somebody decides to unlink a sentence? Is there also some automated unlinking of indirect links going on?

2. Is there a way to find out whether
• a generated indirect link on automated creation (when adding a new translation) is in reality - as seen from a human perspective - more likely going to be useful as a direct link or an indirect one
• a generated indirect link on manual creation (when linking in post production) is in reality rather going to be useful as a direct link or an indirect one.

So all those incorrect indirect links I was referring to in my post above seem to be just wrong guesses of the automation which is essentially always applying an indirect connection. Either at creation or re-linking.

I am wondering whether the error rate would be higher or lower if a direct link were applied by default. Or do you guys just play it safe by saying “better an incorrect indirect link than an incorrect direct link”, regardless of the hit ratio?

Thanuir May 19, 2020 at 3:19 PM

1. As far as I understand, indirect links are computed from the direct ones, so if a direct link is cut, some indirect links probably disappear (but sometimes the cut link turns into an indirect one).

2. I am not aware of any automatic way of assessing the validity of links. This would probably be difficult. Some AI researcher might be able to build one.

I personally think of indirect links in two roles:

a: They are tools for creating direct links. I can conveniently turn them into direct links.
b: If there are no direct links, they still give some idea of the sentence's meaning. So they act as less reliable substitutes for direct links.

So I do not consider an indirect link between two sentences with different meanings a problem, but a necessary consequence of the definition/nature of an indirect link.

...

If direct links were applied transitively, there would be a huge number of errors. The sentences would not even need to be complicated.

For example: He swims. <-> Hän ui. <-> She is swimming.
Or: Tu manges. <-> You are eating. <-> Vous mangez.

TRANG May 19, 2020 at 3:59 PM

As a general tip: you need to visualize the sentences as a graph.

There is a short explanation in the wiki about the structure of the corpus:
https://en.wiki.tatoeba.org/art...-is-structured

mramosch May 19, 2020 at 7:25 PM

Trang, would you mind giving a short statement about the rule I mentioned a little further above:

> My idea is that if (A) gets translated to (B) and another group of sentences - regardless of their origin (originals or translations) - happen to be an exact translation of each other in the whole group (including B of course) then I could consider them being (B1, B2, B3, B4 etc.) and under the rule (A)==(B)==(B1)==(B2)== etc. all of them had to be directly linked - including to (A).

Or can you think of any situation where this rule would be ambiguous?

Thanuir May 20, 2020 at 5:41 AM

The system contains no information about which sentences are exact translations of each other.

TRANG May 20, 2020 at 9:36 AM

Your rule is conflicting with the rule I mentioned: if A is translated into B and B is translated into C, C is not necessarily a valid translation of A.

Just replace C with B1, then B2.

Here's an example with a graph visualization: https://imgur.com/a/YMwGyS7

- In case #1, your rule works, you can have (A) directly linked to (B1) and (B2).
- In case #2, your rule doesn't work. It would be wrong to directly link (A) to (B1) and (B2).

mramosch May 20, 2020 at 11:39 AM, edited May 20, 2020 at 12:03 PM

@Thanuir
@TRANG

Sorry, but I should have linked the citation rather than just copying it, because you missed the context. So here are the two examples from above with their context.

Both are meant to be executed by a HUMAN and not expected to be solved by ML.

1. ——————————————————
> Don’t get me wrong, I have never said that this is a task that can be automated. It’s obvious that it takes some human interaction to link these cases. That is the reason why I was questioning the available workflow.

> My idea is that if (A) gets translated into (B) and another group of sentences - regardless of their origin (originals or translations) - happen to be an exact translation of each other in the whole group (including B of course) then I could consider them being (B1, B2, B3, B4 etc.) and under the rule (A)==(B)==(B1)==(B2)== etc. all of them had to be directly linked - including with (A).

2. ——————————————————

But if the original sentence e.g. somehow includes some notion that is remotely related to a local complement expressed with a preposition in many western languages and therefore can be translated in two different ways

Sentence #123970: 当店にはいろいろな種類の本がございます。(A)
• Tenemos una gran variedad de libros. (B)
• En esta tienda tenemos una gran variedad de libros. (C)

then this ‘special property’ - this ‘duplicity’ as it were - is rooted in the source language (A) and has to be an equally valid argument for any translation into any other language.

So with that in mind - if I now compare Group (B)
• We have a wide choice of books.
• Tenemos una gran variedad de libros.
• Nous avons un large choix de livres.
• Nós temos uma ampla variedade de livros.
etc.

and respectively Group (C)
• In this store, we have a wide variety of books.
• En esta tienda tenemos una gran variedad de libros.
etc.

- you get the point - then I would subsequently assume that they ALL had to be directly linked to the Japanese source sentence.

————————————————————
————————————————————

Example 2 is just an extended version of example 1, because the Japanese source can be split into two translations
• Our store... -> Group (C)
• We... -> Group (B)

We know for a fact that (B) is the translation of (A).
We know for a fact that (C) is the translation of (A).

So if I, as a human, assess that
• translation (B) is an exact unambiguous translation of all the unambiguous sentences in Group (B) then all the Group (B) members should also be a direct translation of (A).
• translation (C) is an exact unambiguous translation of all the unambiguous sentences in Group (C) then all the Group (C) members should also be a direct translation of (A).

And by ‘unambiguous’ I mean that all ‘du/Sie/You/Usted’ ambiguities are taken into consideration.

Of course I have to blindly rely on the information (A)==(B) and (A)==(C) respectively being correct.

And of course (B)!=(C) [is not equal]

Do you see any problem with this approach? Or let me phrase it differently:

Can I safely draw this LOGICAL conclusion about (B) or (C) respectively, even with little or no LINGUISTIC knowledge of the SOURCE language (A)?

Thanuir May 20, 2020 at 2:24 PM

I will first try to write the question in my own words, so that you can tell whether I understood it correctly or not.

Your proposal is that if you understand sentences B and C and in your view their meaning is exactly the same, and if A is linked to B and that link is reliable, then A and C could also be directly linked.

I would be very careful with this. If there were two sentences, B and B', both exact translations of sentence C but with a difference in nuance between them, you cannot know which of them A corresponds to, or perhaps both.

For example:
C = "Hän on siellä aina." <-> B = "He is always there."
The sentence C' = "Hän on aina siellä." is not linked to those, but it would be a perfectly valid and exact translation of B.

You have some sentence A that is linked to sentence B. Is it a good translation of C? Maybe. Maybe that language also allows a freer word order than English, or has some other way of making subtle distinctions.

Nevertheless, I would say that there is no difference in meaning between sentences B and C. Sentence C' would not necessarily come to mind when I thought about it.

So I doubt whether it is humanly possible to be sure that two sentences mean exactly the same thing, or whether the concept is even meaningful.

In practice, Tatoeba has an endless amount of work for anyone, so a user can perfectly well concentrate on linking sentences in the languages they know. That work rarely runs out.

mramosch May 20, 2020 at 2:51 PM, edited May 21, 2020 at 8:38 PM

I never correlate B with C in any way. I just threw in C because it was a concrete real-life example of this Japanese sentence that could be split into two different threads, A - B and A - C respectively...

Consider this example just being about A and B.

• If all sentences of Group B are unambiguously identical in several languages (B1, B2, B3)
• and sentence B is unambiguously identical with all the sentences in Group B

given that B is a DIRECT translation of A

• Can all sentences of Group B also be safely DIRECTLY linked to A?

————————————————————

In a concrete practical example

Sentence #123970: 当店にはいろいろな種類の本がございます。(A)
• Tenemos una gran variedad de libros. (B)

Group (B)
• We have a wide choice of books. (B1)
• Nous avons un large choix de livres. (B2)
• Nós temos uma ampla variedade de livros. (B3)
etc.

————————————————————

You can ask the same question about A -> C without ever correlating B and C.

Sentence #123970: 当店にはいろいろな種類の本がございます。(A)
• En esta tienda tenemos una gran variedad de libros. (C)

Group (C)
• In this store, we have a wide variety of books.
• En esta tienda tenemos una gran variedad de libros.
etc.

———————————————————

I am LOGICALLY inferring a correlation between Group B and sentence A simply based
• on my knowledge of language B
• on my knowledge of languages of Group B (B1, B2, B3)
• on the fact that B is a direct translation of A

Is this vulnerable?

Thanuir May 20, 2020 at 2:59 PM, edited May 20, 2020 at 3:00 PM

I would not dare to do that, because even if the B sentences seemed equivalent to me, perhaps A makes a distinction or carries a nuance that I know nothing about.

The B sentences are not necessarily equivalent any more once this nuance is taken into account.

mramosch May 20, 2020 at 4:11 PM, edited May 20, 2020 at 4:15 PM

So you are essentially saying that a B1, B2, B3 speaker has to inevitably know [LINGUISTICALLY] language A in order to link to A and should not draw any [LOGICAL] conclusion at all.

Because this nuance in (A) that you are mentioning is intrinsic to (A) and must already have been taken into consideration by the A-to-B translator, otherwise B would already be incorrect.

(Of course if translator A-to-B changes his mind later on, this would jeopardize the whole chain, but this is a general problem...)

Could you come up with an example of such a nuance? I personally can’t find any vulnerability in the logical approach yet and still consider it an option until proven wrong. (I know I am a persistent thorough little sucker :-)

As you can surely tell, my main background is Romance and Germanic languages.

So if e.g. the source A were English and I had to deal with e.g.

• You are

and I see B (German translation)

• Du bist

I can safely assume that these translations are correct

• Você é
• (Tu) eres
• Tu es
• (Tu) sei

However if I saw B

• Você é
• Ustedes son

I would know that these forms are ambiguous and wouldn’t draw any logical conclusions.

However if I saw B (Spanish translation)

• Nosotras

I would know that source (A) either is referring to women only or it doesn’t make any difference and the translation to Spanish just took the liberty to use its feminine form only - which is totally valid.

So without knowing anything of the source language (A) I can only consider linking languages that make that distinction too and do also use a feminine form, although the source might allow for a masculine form too.

Knowing the source language (A) of course I could see its real intention and in case of a bi-neutral source form even add a new Spanish sentence with its masculine counterpart.

If instead of ‘nosotras’ I saw

• Nosotros

that would tell me that the source A is either explicitly masculine or bi-neutral which would only allow for translations to languages that have a dedicated masculine form because I can’t evaluate source A.

However, if I see two translations, either of the same language or even across two different languages, where one uses a masculine form and the second a feminine form, I know that the source language allows for both, otherwise one of the two translations must be wrong.


So you see where I am going with that!

Thanuir May 21, 2020 at 2:16 PM

I would not link sentences that I do not understand.

No easy example comes to mind right now. I suspect this is a matter of balance:
If you interpret the equivalence of sentences strictly enough, misses will not necessarily occur, but you will not find many situations where the rule can be applied.
If you interpret the equivalence of sentences loosely, errors will occur.

mramosch May 20, 2020 at 1:15 PM, edited May 20, 2020 at 2:12 PM

> If A is linked to B and B is linked to C, then A and C are indirectly linked.

> Two sentences are indirectly linked if and only if they are two translations apart, but not one.

> So indirect links are not so much created; rather, they arise as a consequence of direct links.

————————————————————

Does that automatically imply that if there are (in addition to A) more sentences (even from different languages) directly linked to (B) - let’s simply call them A1, A2, A3 etc. - then they would all be automatically indirectly linked to (C) after (B) gets directly linked to (C)?

Or in other words, the creation of the

• direct link B - C

autocreates (or calculates)

• indirect link A - C
• indirect link A1 - C
• indirect link A2 - C
• indirect link A3 - C

given the fact that every member of Group A is directly linked to (B)?
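
To make the question concrete, here is a small Python sketch that compares the set of two-hop pairs before and after the connection B - C is added. The helper names (`add_connection`, `indirect_pairs`) are invented for this illustration, and the sketch only assumes the two-hop definition quoted above, not any knowledge of how Tatoeba computes it.

```python
def add_connection(direct, x, y):
    """Record a direct link in both directions."""
    direct.setdefault(x, set()).add(y)
    direct.setdefault(y, set()).add(x)

def indirect_pairs(direct):
    """All pairs that share a neighbour but are not directly linked."""
    pairs = set()
    for neighbours in direct.values():
        for x in neighbours:
            for y in neighbours:
                if x < y and y not in direct.get(x, set()):
                    pairs.add((x, y))
    return pairs

direct = {}
for member in ("A", "A1", "A2", "A3"):
    add_connection(direct, member, "B")

before = indirect_pairs(direct)
add_connection(direct, "B", "C")
after = indirect_pairs(direct)

print(sorted(after - before))
# [('A', 'C'), ('A1', 'C'), ('A2', 'C'), ('A3', 'C')]
```

Note that under the same definition the members of Group A are already two hops away from each other (via B) before C enters the picture; those pairs are in `before`.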

————————————————————

@TRANG

So when my simplistic point of view above (autocreation!!!) has to be translated into the world of graphs (that you mentioned in another post) I guess there is no creation of any stored INDIRECT links but rather a ‘calculation on display’ based on the object graph.

Could you share some short thoughts of how this works internally. You mentioned nodes (sentences) and their connections (links) as the only objects in this graph. So is this some kind of one/many-to-one/many clusters working together?

Or - if this is asking too much - you could simply provide a complete list of the circumstances under which links are created/calculated/generated.

DIRECT LINK
• creating a new translation (system is auto-creating a link)
• manually linking by authorized user



INDIRECT LINK





And in case you are wondering why I am interested in this information...

I am working on some suggestions for a better review workflow/UI and in order to be able to reasonably argue about improvements I’d prefer to have a pretty complete understanding of the underlying mechanics...

Thanuir May 20, 2020 at 1:48 PM

> Does that automatically imply that if there are (in addition to A) more sentences (even from different languages) directly linked to (B) - let’s simply call them A1, A2, A3 etc. - then they would all be automatically indirectly linked to (C) after (B) gets directly linked to (C)?

As far as I know, yes.

I do not know how Tatoeba finds the indirect links, but in principle they could all be determined like this: compute the second power of the adjacency matrix, change all positive numbers to ones, subtract the adjacency matrix from it, and finally zero the diagonal. This is a mathematician's solution, not a programmer's, so it is hardly practical.

See for example https://en.wikipedia.org/wiki/A...#Matrix_powers
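
For readers who want to try the matrix idea, here is a minimal pure-Python sketch on a made-up four-sentence graph. One deviation is deliberate: a plain subtraction would leave -1 wherever two sentences are directly linked but have no two-hop path between them, so the sketch uses a condition instead of subtracting. The matrix and variable names are illustrative assumptions only.

```python
# Toy adjacency matrix for four sentences (0..3).
# Direct links: 0-1, 0-2, 1-2 and 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
n = len(A)

# Second power: entry (i, j) counts the two-hop paths from i to j.
A2 = [
    [sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)]
    for i in range(n)
]

# Keep pairs with at least one two-hop path, no direct link, and i != j.
indirect = [
    [1 if i != j and A2[i][j] > 0 and A[i][j] == 0 else 0 for j in range(n)]
    for i in range(n)
]

for row in indirect:
    print(row)
# [0, 0, 0, 1]
# [0, 0, 0, 1]
# [0, 0, 0, 0]
# [1, 1, 0, 0]
```

Sentence 3 ends up indirectly linked to 0 and 1 (both via 2), while 0, 1 and 2 are only directly linked to each other.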

TRANG May 20, 2020 at 6:08 PM

> Could you share some short thoughts of how this works
> internally.

Imagine two tables, "sentences" and "links".

"sentences" has the following columns:
- id
- lang
- text

"links" has the following columns:
- sentence_id
- translation_id

Whenever you add a new sentence (A), a new line is added in "sentences".
- id=1, lang=eng, text=A

Whenever you add a translation (B) to the sentence (A), a new line is added in "sentences", then two lines are added in "links".
- id=2, lang=fra, text=B
- sentence_id=1, translation_id=2
- sentence_id=2, translation_id=1

The tables I have described are part of the files that we distribute under "Sentences" and "Links" on our Downloads page.
https://tatoeba.org/eng/downloads
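
As a rough illustration of the two tables described above, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The column names follow the description; the column types, the primary keys and the helper functions `add_sentence` and `add_translation` are assumptions made for this example, not Tatoeba's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang TEXT, text TEXT);
    CREATE TABLE links (
        sentence_id INTEGER,
        translation_id INTEGER,
        PRIMARY KEY (sentence_id, translation_id)
    );
""")

def add_sentence(lang, text):
    """Add one row to "sentences" and return its id."""
    cur = conn.execute(
        "INSERT INTO sentences (lang, text) VALUES (?, ?)", (lang, text)
    )
    return cur.lastrowid

def add_translation(sentence_id, lang, text):
    """Add one row to "sentences" and two rows (both directions) to "links"."""
    new_id = add_sentence(lang, text)
    conn.executemany(
        "INSERT INTO links VALUES (?, ?)",
        [(sentence_id, new_id), (new_id, sentence_id)],
    )
    return new_id

a = add_sentence("eng", "A")
b = add_translation(a, "fra", "B")
rows = conn.execute(
    "SELECT sentence_id, translation_id FROM links ORDER BY sentence_id"
).fetchall()
print(rows)  # [(1, 2), (2, 1)]
```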

> I am working on some suggestions for a better review
> workflow/UI

I can already tell you what a better workflow and UI could look like in the grand scheme.

1) We should have a page that allows contributors to check sentences alone, without their translations. This ensures that the items in the table "sentences" are correct.
2) We should have another page that shows only a pair of sentences and lets people confirm whether or not the two sentences are translations of each other. This ensures that the items in the table "links" are correct.
3) We should provide the possibility to attach meta-data to links, not just sentences.

mramosch May 20, 2020 at 9:17 PM, edited May 20, 2020 at 9:34 PM

OK - so let me get a little bit ‚unstructured‘ and ask you some loose questions to help me get on track. I will be calling this reciprocal pair of links a ‘connection’ from now on

1. Initially a connection between two sentences always points in two directions with the help of two links.
• A is a translation of B
• B is a translation of A.

2. Breaking a connection between two sentences is achieved by removing both links.

3. Removing both links of a connection
• sentence_id=1, translation_id=2
• sentence_id=2, translation_id=1
does not affect any other link to/from either of the two participants (id=1 and id=2)

4. Re-linking a sentence is achieved by breaking the old connection (removing 2 links) and establishing a new connection (adding 2 links)

5. Is the database considered as being inconsistent if for some reason one link of this pair survives and this ‘half connection’ only points in one direction? Something like
• A is a translation of B
• B is not a translation of A

6. Sentences that are considered as being ‘indirectly linked’ are simply sentences that are two hops away from each other.
• A==B
• B==C
• A—C

7. Finding all direct links of sentence A requires to query the sentence_ID field of ALL links in the database against the ID of sentence A.

8. Finding all indirect links of sentence A requires to query the sentence_ID field of ALL links in the database against the ID of every single result of the query conducted in (7.)

9. Turning an indirect link into a direct link is achieved by establishing a new connection (adding 2 links)
• A==B (existing connection)
• A==C (existing connection)
• B==D (existing connection)

• A—D indirect link because of two hops (A==B, B==D)

• A==D (newly created connection)

But creating A==D doesn’t change anything for the already existing 2 hop relationship between A—D (A==B, B==D) - so I am essentially left with a direct and an indirect link at the same time?!?!?

There is obviously something I got wrong at an earlier stage...

10. Is there a way to distinguish between
• a connection (2 links) that is automatically being supplied/created by the system when a user contributes a translation
• a connection (2 links) that is manually created by a user either by turning an indirect link into a direct link or by simply establishing a new connection

11. How can you determine/trace the owner of a sentence and all his/her metrics (sentence count etc.)

AlanF_US May 21, 2020 at 1:22 AM

1, 2, 3, 4: Yes.

5: Yes. If the database says that A is a translation of B, but B is not a translation of A, then it is inconsistent.

6: Yes.

7:
> Finding all direct links of sentence A requires to query the sentence_ID field of ALL links in the database against the ID of sentence A.

Well, yes and no. The database is organized in such a way that you don't need to page through all links in the database in order to find the ones connected to the ID of sentence A. But aside from efficiency considerations, the net result is the same: you are looking for all links associated with the ID of sentence A.

8.
> Finding all indirect links of sentence A requires to query the sentence_ID field of ALL links in the database against the ID of every single result of the query conducted in (7.)

Again, the algorithm is much more efficient than that, but the net result is the same.

9.
> But creating A==D doesn’t change anything for the already existing 2 hop relationship between A—D (A==B, B==D) - so I am essentially left with a direct and an indirect link at the same time?!?!?

Yes. In fact, between any two sentences that are directly linked (A==D), there can also be any number of indirect links (A==E, E==D; A==F, F==D; and so on).

10.
> Is there a way to distinguish between
• a connection (2 links) that is automatically being supplied/created by the system when a user contributes a translation
• a connection (2 links) that is manually created by a user either by turning an indirect link into a direct link or by simply establishing a new connection

Yes. There are various ways that one could keep track of such a distinction.

11.
> How can you determine/trace the owner of a sentence and all his/her metrics (sentence count etc.)

A database is designed to let you execute such queries, and do it efficiently, as long as you have recorded the relationships. In the same way that you can have one table that keeps track of sentences and their IDs, and another that keeps track of the links between pairs of sentences, you can have a table that associates sentences with owners, or vice versa.
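
A sketch of what point 11 could look like, using Python's `sqlite3`. The `sentence_owners` table, its columns and the user names are entirely hypothetical; the thread does not say how Tatoeba actually records ownership, only that a table associating sentences with owners would make such queries possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang TEXT, text TEXT);
    -- Hypothetical table: which user owns which sentence.
    CREATE TABLE sentence_owners (sentence_id INTEGER PRIMARY KEY, username TEXT);
""")
conn.executemany(
    "INSERT INTO sentences VALUES (?, ?, ?)",
    [(1, "eng", "A"), (2, "fra", "B"), (3, "deu", "C")],
)
conn.executemany(
    "INSERT INTO sentence_owners VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "alice")],
)

# Sentence count per owner, via a join on the sentence id.
counts = conn.execute("""
    SELECT o.username, COUNT(*)
    FROM sentences AS s
    JOIN sentence_owners AS o ON o.sentence_id = s.id
    GROUP BY o.username
    ORDER BY o.username
""").fetchall()
print(counts)  # [('alice', 2), ('bob', 1)]
```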

mramosch May 21, 2020 at 3:23 AM

That was a fine and helpful response, Alan.

5. Does this happen sometimes with the Tatoeba database - and if yes, is this always due to a bug or are there other possible reasons.

Can such an inconsistency make its way until the final LINK output file?

7./8. Well I am sure that the server is index optimized for that application but if somebody wants to work with the two downloadable offline output files of the database would the methods I described above be the starting point for a query and I would have to write any optimization myself?

9. I am not sure whether you really understood where I was going here.

I really meant that for my direct link A==D I could also have an indirect link A—D in the same list due to the ‘2 hop rule derivation’ from (A==B, B==D).

A—D is always present and isn’t simply invalidated by the fact that I created an additional direct link A==D. Theoretically I could even have several identical indirect links A—D (derived from several different 2 hops) for one direct link A==D.

If that is true, then for a sentence UI presentation like the Tatoeba sentence page I would have to diff all indirect links A—D against a potentially already existing direct link A==D, so that a translation does not show up as both a direct link and an indirect link in one listing of translations?

10./11. I was a little confused here because Trang made it look like the whole Tatoeba database is just comprised of these two files - SENTENCES and LINKS - and everything could be interpolated and calculated from them :-)

But it seems there is more information stored about relationships of records.

So the most important questions for me right now are

12. Which events do create a direct link?

DIRECT LINK CREATOR LIST
• creating a new translation -> system is auto-creating a connection (2 links)
• manually linking by authorized user
• manually de-linking and re-linking to another sentence by authorized user



13. Are indirect links solely derived from ‘two hop derivations’ in the graph or are there other methods/events for creating (respectively other fields for storing) indirect links somewhere in the database?

INDIRECT LINK CREATOR LIST
• Derivation from the object graph of the LINK file (2 hop rule) by the system



14. Can a programmer at Tatoeba retrieve the following information from the database.

• What - in the DIRECT LINK CREATOR LIST - has initiated the creation of every individual connection in the database?
• What - in the INDIRECT LINK CREATOR LIST - was responsible for the existence of every individual indirect connection in the database?

TRANG May 21, 2020 at 1:15 PM

9. We don't display a sentence as an indirect translation if it is also a direct translation.

The sentences are stored using a graph structure, but they are displayed using a table structure. When we turn the graph into a table, we have to make some choices.

Yes, technically speaking, if A = B and B = D and A = D, we could display D both in the list of "Translations" and the list of "Translations of translations". But we choose not to because it's not really useful.

10/11. The core of Tatoeba are those two files: sentences and links. If you would look at the rest of the files you'll see there's more.

The exact structure of Tatoeba's database is described here:
https://tatoeba.org/eng/wall/sh...#message_35234

12. Links are created when someone adds a translation or when an advanced contributor clicks the link button.

https://en.wiki.tatoeba.org/art.../intro-linking

13. Indirect links are solely derived from two hop derivations.

14. Your question is unclear. We can retrieve who has created the link. We can (but not always) retrieve if it was created by clicking on the "translate" button or by clicking on the "link" button.
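
Points 9 and 13 together can be sketched as follows: both lists are derived from the link rows, and anything that is already a direct translation is left out of the second list. The function name `translation_lists` and the sample ids are made up for this illustration; it is not the site's actual display code.

```python
# Link rows in the style of the "links" table: (sentence_id, translation_id),
# stored in both directions. Here 1 = A, 2 = B, 4 = D, 5 = E.
links = [(1, 2), (2, 1), (2, 4), (4, 2), (1, 4), (4, 1), (4, 5), (5, 4)]

def translation_lists(links, sentence_id):
    """Return (translations, translations of translations) for one sentence."""
    direct = {t for s, t in links if s == sentence_id}
    two_hops = {t for s, t in links if s in direct}
    # A sentence that is also a direct translation is not listed as indirect.
    indirect = two_hops - direct - {sentence_id}
    return sorted(direct), sorted(indirect)

print(translation_lists(links, 1))  # ([2, 4], [5])
```

Sentence 4 (D) is reachable from 1 (A) both directly and via 2 (B), but it only appears in the first list.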

mramosch May 21, 2020 at 9:00 PM, edited May 21, 2020 at 9:02 PM

13. So if I create a new sentence and there is no translation available yet, but I do indeed see some similar sentences that offer the opportunity for being useful as indirect links, there is no way of doing this explicitly because of the two-hop-rule?

So what is the procedure to achieve this, how do I place my new sentence two hops away from all the potential candidates for getting an indirect link?

10./11. Gonna have a look at those links, thanks!

5. (after Alan's answer) ???

7./8. (after Alan's answer) ???

————

CK May 21, 2020 at 9:10 PM, edited May 21, 2020 at 9:11 PM

One possibility is to leave a comment like I left on this sentence.

[#2645879] Tom has stopped crying. (CK) *audio*


Related:

[#6355107] Tom isn't crying anymore. (CK) *audio*


Eventually, perhaps these are related closely enough that some language can use the same sentence as a translation for both and then they will become indirectly-linked to each other.

mramosch May 21, 2020 at 9:21 PM

I understand, but I would rather like to know a way to achieve this right away instead of waiting for better times.

Thanuir May 22, 2020 at 7:00 AM

The more links there are in the database, the more indirect links arise.

So in general you can translate as many sentences as you are able to, in as many ways as you are able to.

If you want a translation for a particular sentence, you can ask someone who translates from that language to translate it.

TRANG May 21, 2020 at 9:47 PM

I will have to leave it to the rest of the community to answer questions that are still pending.

Note that we have a dev website where you can experiment as much as you need and are free to pollute the database with test sentences and test translations.

https://dev.tatoeba.org/

You can register a new account. If you need to be granted advanced contributor status over there so that you can use the linking feature, let me know what your account is.

rumpelstilzchen May 22, 2020 at 5:09 PM

> how do I place my new sentence two hops away from all the potential candidates for getting an indirect link?

You create a link between your new sentence and a sentence which has a direct link (i.e. it is "one hop away") to the "potential candidate".
But I don't understand why you want to have a sentence "two hops away" from another one. Can you give a concrete example?

> 5. Does this happen sometimes with the Tatoeba database - and if yes, is this always due to a bug or are there other possible reasons. Can such an inconsistency make its way until the final LINK output file?

I've found the following pairs in the current links.csv where only one part of the link is recorded in the database:
#247164 #5078553
#1423834 #3214227
#1918219 #1918220
#1918235 #1918236
#1943238 #3752771
#1943243 #3075927
#1943259 #3942190
#1943259 #3942318
#1943259 #3942320
#1943259 #3942329
#1943259 #3942351
#3778082 #5094901
#5755767 #7207453
#5755769 #7207455
#5850721 #5868889

Without further investigation I guess these inconsistencies are results of a bug (already fixed or still in the code).
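
A check of this kind could look roughly like the following Python sketch. The function name `one_way_links` and the sample ids are invented for illustration; this is not the script that produced the list above.

```python
def one_way_links(links):
    """Return the (sentence_id, translation_id) rows whose reverse row is missing."""
    link_set = set(links)
    return sorted((s, t) for s, t in link_set if (t, s) not in link_set)

# Made-up sample: 1<->2 and 5<->6 are complete, 3->4 has no counterpart.
links = [(1, 2), (2, 1), (3, 4), (5, 6), (6, 5)]
print(one_way_links(links))  # [(3, 4)]
```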

> 7./8. Well I am sure that the server is index optimized for that application but if somebody wants to work with the two downloadable offline output files of the database would the methods I described above be the starting point for a query and I would have to write any optimization myself?

You don't need the optimization if you don't care that the query takes a little bit longer.
You could also import the data into a local database.
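
One possible way to do the local import is sketched below with Python's built-in `sqlite3` module. The table layout, the column names, and the helper names `load_links` and `direct_translations` are assumptions made for illustration; the thread does not prescribe a schema.

```python
import sqlite3
from typing import Iterable, List, Tuple


def load_links(conn: sqlite3.Connection, rows: Iterable[Tuple[int, int]]) -> None:
    """Create a links table (if needed) and insert (sentence_id, translation_id) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "sentence_id INTEGER NOT NULL, "
        "translation_id INTEGER NOT NULL, "
        "PRIMARY KEY (sentence_id, translation_id))"
    )
    # The primary key doubles as an index for lookups by sentence_id.
    conn.executemany("INSERT OR IGNORE INTO links VALUES (?, ?)", rows)
    conn.commit()


def direct_translations(conn: sqlite3.Connection, sentence_id: int) -> List[int]:
    """Return the IDs directly linked to sentence_id, in ascending order."""
    cur = conn.execute(
        "SELECT translation_id FROM links "
        "WHERE sentence_id = ? ORDER BY translation_id",
        (sentence_id,),
    )
    return [row[0] for row in cur]


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
load_links(conn, [(1, 2), (2, 1), (1, 3), (3, 1)])
print(direct_translations(conn, 1))  # [2, 3]
```

With the data in a database, each lookup uses the index instead of scanning the whole file again.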

mramosch, May 22, 2020 at 8:31 PM (edited May 22, 2020 at 8:38 PM)

Thanks a lot for chiming in.

13. If I create a new sentence from scratch in my own language then there is no translation (direct link) available yet. OK?

I do however see some similar sentences that I want to show up as indirect links in the translation list beneath my naked new sentence, to indicate that they are ‘only similar’ - that’s what indirect links do.

However, according to the two-hop rule, there is no indirect link without a direct link in between: a translation of a translation!

So what hack of linking and unlinking do I have to perform to have a similar sentence show up as indirect link in the list of translations without having a directly linked translation yet?

Or is there a (legal) way of doing this explicitly?

5. I was wondering if there is a special reason behind duplicating/splitting every edge into two links if they only get created and destroyed in pairs anyway?

Instead of having two links
• A==B
• B==A

I could easily get along with
• A==B
and just read it out twice, the second time just in reverse (back-to-front).

In this case the links-file would only be half the size and inconsistency would be impossible.
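
A small sketch of that idea, under the assumption that a link is just a pair of integer sentence IDs. The helper names `to_undirected` and `read_both_ways` are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

Edge = Tuple[int, int]


def to_undirected(links: Iterable[Edge]) -> Set[Edge]:
    """Store each edge once, normalized as (smaller_id, larger_id)."""
    return {(min(a, b), max(a, b)) for a, b in links}


def read_both_ways(edges: Iterable[Edge]) -> Iterator[Edge]:
    """Yield every stored edge front-to-back and then back-to-front."""
    for a, b in sorted(edges):
        yield (a, b)
        if a != b:
            yield (b, a)


# Hypothetical sample: the pair 1/2 appears twice, the pair 3/4 only once.
edges = to_undirected([(1, 2), (2, 1), (4, 3)])
print(sorted(edges))                # [(1, 2), (3, 4)]
print(list(read_both_ways(edges)))  # [(1, 2), (2, 1), (3, 4), (4, 3)]
```

Because only the normalized pair is stored, a one-directional link cannot be represented at all.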

7./8. I just wanted to make sure whether the two downloadable files give me any bells and whistles or whether I am all alone with the pure basic data and my approach of querying the whole list is the right starting point before any optimization should kick in.

Querying the whole set several times just seems to be so ridiculously expensive.

rumpelstilzchen, May 23, 2020 at 5:03 AM

> I do however see some similar sentences that I want to show up as indirect links in the translation list beneath my naked new sentence, to indicate that they are ‘only similar’ - that’s what indirect links do.

Do they? I may be wrong (I'm rather new to Tatoeba) but I don't think indirect links are supposed to indicate similarity between sentences.
As I understand it they are only shown because they are helpful in finding indirect translations which could be turned into direct translations.
I guess your use case is described in https://github.com/Tatoeba/tatoeba2/issues/1902 but you want to have similarity between different languages.

(I would still be interested in a concrete example from Tatoeba's corpus.)

> So what hack of linking and unlinking do I have to perform to have a similar sentence show up as indirect link in the list of translations without having a directly linked translation yet?

Since indirect links don't exist in the database and are only calculated there is no way to do that without the direct link.
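
To illustrate what "only calculated" could mean, here is a hedged sketch that derives two-hop neighbours from the direct links alone. It follows the description in this thread (a translation of a translation) and is not taken from Tatoeba's code; `build_adjacency` and `indirect_translations` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(links: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each sentence ID to the set of IDs it is directly linked to."""
    adjacency: Dict[int, Set[int]] = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def indirect_translations(adjacency: Dict[int, Set[int]], sentence_id: int) -> Set[int]:
    """Return IDs exactly two hops away: neighbours of neighbours,
    minus the direct neighbours and the sentence itself."""
    direct = adjacency.get(sentence_id, set())
    two_hops: Set[int] = set()
    for neighbour in direct:
        two_hops |= adjacency.get(neighbour, set())
    return two_hops - direct - {sentence_id}


# Hypothetical chain of links: 1 - 2 - 3 - 4
adjacency = build_adjacency([(1, 2), (2, 3), (3, 4)])
print(indirect_translations(adjacency, 1))  # {3}
print(indirect_translations(adjacency, 5))  # set()
```

A sentence with no direct link has an empty neighbour set, so nothing can be derived for it, which matches the answer above.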

> 5. I was wondering if there is a special reason behind duplicating/splitting every edge into two links if they only get created and destroyed in pairs anyway?

Good question :-) I don't think that's necessary but that part of the code was written in 2010 so I don't know the reason: https://github.com/Tatoeba/tato...52525f1R45-R47

> Querying the whole set several times just seems to be so ridiculously expensive.

That's why I would import the data into a local database.
Did you write some program/script for querying the data? I guess the most time consuming part is reading in all the data.
And yes, the two files only contain the pure data, no bells and whistles included.

AmarMecheri, May 21, 2020 at 10:54 PM

@mramosh
If someone translates the orphan German sentences into English or French, I could follow with a Kabyle translation, and many others could do the same in their own languages, provided they understand the sentences well. That's my opinion, even though it could be seen as "unfair" and as an undue advantage for the most widely used languages. At the same time, I suggest that you follow our orphan Kabyle sentences wherever other translations make them visible. This could be mutually helpful for German and Kabyle sentences and, beyond that, for all other languages.

mramosch, May 21, 2020 at 11:19 PM

> I suggest to you to follow our orphan Kabyle sentences where they are made visible by other translations.

I am told we shouldn’t translate from languages that we don’t speak just by deducing their meaning from already existing translations to languages that we know.

AmarMecheri, May 22, 2020 at 1:59 PM (edited May 22, 2020 at 2:17 PM)

@mramosh
First, please notice that I had suggested reciprocity: German / Kabyle already translated and Kabyle / German already translated.
If you follow the reasoning suggested to you, most of the world's little-used languages will remain orphans. That goes against the spirit of a multilingual platform. Note that I don't mind too much, not at all; I'm used to racking my brain to understand after multiple cross translations and painstaking research. If you see what I mean, I think you will end up agreeing with me.
Only idiomatic expressions can be problematic, and this difficulty can be circumvented by a written exchange between two or more people who really want to work toward better mutual understanding. This was the case especially with @AlanF_US, who worked wonders to understand Kabyle idioms that most people have not mastered, only the initiated.

mramosch, May 22, 2020 at 2:15 PM (edited May 22, 2020 at 2:16 PM)

So give me an example!

e.g. I find a Kabyle sentence that is directly linked to a German one.

How do you suggest to proceed from here and for what goal?

AmarMecheri, May 22, 2020 at 2:21 PM

> e.g. I find a Kabyle sentence that is directly linked to a German one.
> How do you suggest to proceed from here and for what goal?

If you read carefully, the answer is given in my above comment.

mramosch, May 22, 2020 at 2:25 PM (edited May 22, 2020 at 3:17 PM)

I couldn’t figure it out; that’s why I was asking again.

The above seemed to me like you were asking us to add a kab-ger translation by looking at some already existing kab-eng or kab-french translations...

So again:

e.g. I find a Kabyle sentence that is directly linked to a German one.

How do you suggest to proceed from here and for what goal?

AmarMecheri, May 22, 2020 at 8:43 PM

And vice-versa.

Ger-kab / Kab-ger
That's what I wrote above.

How? >> With the help of both French and English.

For what goal? >>> For intercomprehension.

mramosch, May 22, 2020 at 11:20 PM

I am still not sure whether I understand you correctly, but if you want to have the German sentence (of a ger-kab translation pair) translated into English/French in order to increase the visibility of Kabyle by creating more translations and links between Kabyle and English/French, then you must ask a native English/French speaker to translate these German sentences, not a German speaker. We only translate from English/French to German because that’s what we know well.

If I guessed incorrectly, could you describe step by step how I, as a native German speaker, can be of use in your endeavor?

AmarMecheri, May 24, 2020 at 1:24 PM

@mramosh

> We only translate from English/French to German because that’s what we know well.

It's exactly what I said!

But don't worry about the Kabyle language.
We are tenacious!

mramosch, May 23, 2020 at 4:19 PM (edited May 23, 2020 at 4:21 PM)

Quick question:

If I have two Portuguese sentences
• Temos uma ampla variedade de livros.
• Nós temos uma ampla variedade de livros.

and the equivalent two sentences in Spanish
• Tenemos una gran variedad de libros.
• Nosotros tenemos una gran variedad de libros.

Both pairs merely distinguish a normal situation, where the pronoun is not necessary, from one where extra stress on the person is required (also marked by the intonation and the stress in a spoken audio sample).

The equality, despite the intonation, is ‘more or less’ comparable to a preference for structures like
• I know she is here.
• I know that she is here.

or contractions like
• She is here.
• She’s here.

So, when the sentence pair in Portuguese is already linked internally

• Temos uma ampla variedade de livros. == Nós temos uma ampla variedade de livros.

and the Spanish pair is linked the same way

• Tenemos una gran variedad de libros. == Nosotros tenemos una gran variedad de libros.

and I want to link the two languages, do I only link the two sentences with matching presence/absence of pronouns

• Temos uma ampla variedade de livros. == Tenemos una gran variedad de libros.
• Nós temos uma ampla variedade de livros. == Nosotros tenemos una gran variedad de libros.

or do I also cross-link, ignoring the pronoun, which is redundant anyway?

• Temos uma ampla variedade de livros. == Nosotros tenemos una gran variedad de libros.
• Nós temos uma ampla variedade de livros. == Tenemos una gran variedad de libros.

Thanuir, May 23, 2020 at 7:11 PM

If I were translating from Spanish into Finnish (which I do very rarely, since I know only a couple of words of Spanish), I would link the sentences with pronouns to each other and the sentences without pronouns to each other, assuming that the presence of the pronoun means the sentence emphasizes the doer. I would not cross-link.

I would not link the sentences in question within the same language, since they do emphasize different things.

But clearly someone does link them within a language, so apparently someone disagrees with me. You can link according to your own view.

Breaking links made by others without first discussing it with the link's creator is frowned upon, whereas creating links that someone deliberately left unmade is not. So it is more polite toward others to add missing links than to remove existing ones. If you are not sure, then, add links cautiously.

mramosch, May 14, 2020 at 8:25 PM

It took quite some time to find out that in order to find untranslated sentences in a certain language you have to choose

• Translations: EXCLUDE sentences having translations that match all the following criteria
• Language: ANY LANGUAGE

Not very intuitive IMHO ;-)
What about a simple NONE option in the LANGUAGE drop-down like above in
• Sentences - Show translations in:
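
For comparison, the same "exclude sentences having translations in any language" logic can be sketched offline as a set difference. The data shapes used here (a dict of sentence ID to language code and a list of ID pairs) and the name `untranslated_ids` are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Tuple


def untranslated_ids(
    sentence_langs: Dict[int, str],
    links: Iterable[Tuple[int, int]],
    lang: str,
) -> List[int]:
    """Return IDs of sentences in `lang` that take part in no link at all."""
    links = list(links)  # materialize so the pairs can be scanned twice
    linked = {a for a, _ in links} | {b for _, b in links}
    return sorted(
        sid
        for sid, sentence_lang in sentence_langs.items()
        if sentence_lang == lang and sid not in linked
    )


# Hypothetical sample: only sentence 1 (German) has a translation.
sentences = {1: "deu", 2: "eng", 3: "deu", 4: "deu"}
links = [(1, 2), (2, 1)]
print(untranslated_ids(sentences, links, "deu"))  # [3, 4]
```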

CK, May 14, 2020 at 10:37 PM

You can also use the advanced search to find which sentences by a given member have no translations.

This is an adaptation of an older "native speaker's contribution" page, with links separated into "with translations" and "with no translations."

http://tatoeba.ueuo.com/stats-2020-04-18v2.html

Click some of the "with no translations" links to see how this works.

mramosch, May 14, 2020 at 11:41 PM (edited May 14, 2020 at 11:43 PM)

Yes, I am doing this in the Advanced Search already...

But still, it would be more logical to say

• Search for all sentences with translations: NONE

than to say

• Search for all sentences and EXCLUDE all sentences with translations: ANY LANGUAGE

Aiji, May 15, 2020 at 1:41 AM

What's "intuitive" is often very subjective. What many people mean by "intuitive" is actually "intuitive to me". For example, I'm a human being and not a machine, so it's not really intuitive to me what "NONE" means in "Search for all sentences with translations: NONE".

To me YES / NO will be more intuitive than NONE. But since we're filtering translations, it does not really fit in the current form.

Actually, I personally can't see what is not intuitive in the sentence
"Exclude sentences having translations that match all the following criteria - Language: Any language"
Because that describes exactly what it does.

However, it's a known fact that the advanced search is a complicated feature and we are thinking about how we can make the search feature(s) more user-friendly in general. Therefore, your feedback is very much appreciated.

mramosch, May 21, 2020 at 9:18 PM (edited May 22, 2020 at 2:20 AM)

Well, what in my opinion would really be a big improvement, both for simplicity and for extended usability, is a little restructuring of the advanced search.

Right now it is divided into three sections

• Sentences
• Translations
• Sort

These are essentially all for reducing the dataset to a filtered minimum for display, but they don’t give me much choice about what I actually want to see.

Maybe I am missing out on some functionality because of ignorance but until now I could not achieve certain results that seem essential to me and should not take too much effort to be implemented.

So I would suggest a 4th section, SHOW, with roughly the same feature set as the other sections.

So if I e.g. wanted to have only sentences with audio displayed I could do this.

Right now I can only filter for base sentences that have/have not/any audio but the result still shows me all the direct/indirect links, whether THEY have audio or not.

I hope this makes sense.

Aiji, May 22, 2020 at 2:12 AM

It totally makes sense. Besides the feature you describe, many people have asked for many options to be extended or added to the advanced search.

I could link to many GitHub tickets but it's not very relevant considering their numbers. However, the advanced search, and the search in general, is receiving attention from developers. gillux has been working hard on refactoring the code so the feature(s) can be improved in the future. All we can do for now is to be more patient until improvements arrive.

In the meantime, could you explain why the possibility of searching for sentences only if they have translations with audio would be helpful to you?

mramosch, May 22, 2020 at 3:13 PM (edited May 22, 2020 at 3:16 PM)

Well, e.g. if I wanted to listen only to audio (for checking audio-to-text parity or whatever) in a monster like sentence #2, I could display, e.g., only direct translations with audio. That would reduce the list from several hundred lines down to maybe ~10. I can’t even tell, because you lose track of counting lines with an active speaker symbol when scrolling through these endless lists ;-). It would also give me a much better (even visual) idea of how many translations with audio a given sentence has.

Imagine I were searching for a certain combination of words and got 20 or 30 sentences as a result. Seeing only direct links with audio (about 5 to 10% or even less) would let me correlate these results much more easily than having to look at the other 95% of interlaced direct and indirect links without audio, which pollute the visible result and ruin the experience when I just intend to listen to audio line by line.

mramosch, May 22, 2020 at 3:46 PM (edited May 22, 2020 at 3:47 PM)

https://tatoeba.org/eng/wall/sh...#message_35176

Again - you are missing my point here.

It’s not about which of the ‘inclusive’ or ‘exclusive’ approaches is the better one, or whose logical analysis is superior or more commonly used.

It’s all about adding a simple missing NONE to the list in order to complete the set of options and thereby fill a gap, which would accommodate an easier workflow for those whose brains are wired that way.

It is not changing or cutting anything from your preferred method; it simply complements other people’s workflow.

Aiji, May 23, 2020 at 6:47 AM

The whole point of my message is that "I personally prefer this" is irrelevant. Good job on understanding close to nothing. Read again. And check your Dunning-Kruger curve.

mramosch, May 23, 2020 at 7:58 AM

And here we go again - a pissed off little boy wields and brandishes his patronizing index finger...

Wasting 3 paragraphs on just drawing the attention to

• “I personally prefer this” is irrelevant.

is utterly “irrelevant” in itself in the context of a proposal to add an option to the search function. Your conduct is highly unproductive and totally beside the point.

So please, just refrain from polluting my messages with unnecessary digressions and ill-mannered personal attacks!

Even if you think they might be justified!

End of story