Wall (5,868 threads)


mramosch, May 27, 2020 at 8:37 AM, edited May 27, 2020 at 8:43 AM

Is there a way to force a “case-sensitive search” for a word? In German, this would considerably reduce the number of false positives when searching for a noun that differs from the verb only in its capitalization.

e.g. Das Mitbringen...

AlanF_US, May 27, 2020 at 11:29 AM

There is no way to do this via our search, but you can do an ordinary case-insensitive search and use your browser (for instance Ctrl-F in Firefox) to search case-sensitively on the page.

As we have things set up, there is a single index for each language, which is case-insensitive for all languages that have capitalization. Using a case-sensitive index would require users to type capital letters when they need them, which would mean extra work. Furthermore, the fact that words are capitalized at the beginning of the sentence, irrespective of whether they are acting as a noun, could cause either false positives or false negatives.

In the case of "mitbringen/Mitbringen", I see 61 hits, none of which is capitalized. Those 61 results fit on a single page (if you have things configured so that you have 100 results per page), so a case-sensitive search is very quick.

mramosch, May 27, 2020 at 11:40 AM, edited May 27, 2020 at 11:41 AM

“mitbringen/Mitbringen” was just the example that made me decide to ask for this option, but I have had other occasions where a single case was hiding among hundreds of counterparts, and that was difficult to trace, as you can imagine.

I was rather thinking of something along the lines of adding a specifier (in the way “=” is used) right before the word at hand to request a case-sensitive search for this word only, but in the context of an entire phrase...

CK, May 27, 2020 at 12:03 PM

If you know how to do it, the easy way would be to download the exported files and do case-sensitive searches offline.
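For readers who want to try the offline route, here is a minimal Python sketch. It assumes the exported sentence file is tab-separated with an ID, a language code, and the sentence text on each line; the function name `find_case_sensitive` and the sample rows are made up for illustration.

```python
import re

def find_case_sensitive(lines, lang, word):
    """Return (id, text) pairs whose text contains `word` with exact casing.

    `lines` is any iterable of tab-separated rows: id, language code, text
    (an assumed layout for the exported sentence file).
    """
    # \b keeps "Mitbringen" from matching inside a longer word;
    # no re.IGNORECASE flag, so the match is case-sensitive.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    hits = []
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue  # skip malformed rows
        sentence_id, sentence_lang, text = parts
        if sentence_lang == lang and pattern.search(text):
            hits.append((sentence_id, text))
    return hits

# Made-up sample rows, not taken from the corpus.
sample = [
    "1\tdeu\tDas Mitbringen von Hunden ist verboten.",
    "2\tdeu\tKannst du etwas zu trinken mitbringen?",
    "3\teng\tMitbringen is a German word.",
]
matches = find_case_sensitive(sample, "deu", "Mitbringen")
```

To run it against a real export, pass an open file handle instead of `sample`. A word capitalized only because it starts a sentence will still match, which is the caveat AlanF_US raised above.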

mramosch, May 27, 2020 at 12:12 PM

Thanks, but...

I wanted to do the search on the most recent data possible and I wanted to do it inline, right in the place where I am working on something...

mramosch, May 27, 2020 at 10:38 AM, edited May 27, 2020 at 10:39 AM

Every now and then I see comments regarding some audio problems and people pinging @CK to help out, but at the same time I am told that CK has deactivated all notifications to his account and won’t see them.

Is there any official user/pseudo-user account like @audio where audio issues can be directed to without getting lost in oblivion?

AlanF_US, May 27, 2020 at 11:37 AM

You can send him a private message. While you're at it, you can ask him to turn on e-mail notifications so this kind of thing doesn't happen in the future. It wouldn't be the first time he's gotten that request.

No, there is no account called "audio". If there were, CK would still have to check it, since he's the one who deals with issues that need to be resolved by working with the audio. (Other admins can resolve issues that pertain to text.)

Seeing that you are an advanced contributor and can leave a tag, it's a good idea to leave a tag like "@change", since sentences with this tag are reviewed frequently.

mramosch, May 27, 2020 at 11:56 AM, edited May 27, 2020 at 11:58 AM

I am only involved in a few comment discussions, given the small number of contributions I have made so far, and I am already drowning in e-mail notifications.

I can imagine what it means to be in CK’s shoes, so I won’t bother him on this one. I do it often enough for other issues... ;-)

I’m just hinting that at least official administrative tasks should have a maintainer with some functioning channel of communication. I guess that’s what commenters expect from a comment section by default, instead of having to look up the names of the responsible persons for every request, because they might change along the way.

CK, May 27, 2020 at 11:41 AM, edited May 27, 2020 at 11:41 AM

You can tag the sentence with "@change audio" and leave a comment explaining exactly what needs to be done.

If these don't get fixed in a reasonable amount of time, send me a private message.

There are a number of items tagged @change audio that are hard to deal with, since there are no clear explanations about what needs to be done.

mramosch, May 27, 2020 at 11:42 AM

But is everybody eligible for using tags?

CK, May 27, 2020 at 11:56 AM

No, but every logged in member can leave a comment asking for someone to tag a sentence.

mramosch, May 27, 2020 at 12:00 PM, edited May 27, 2020 at 12:07 PM

Well, if that’s not redundant and complicated then what...? ;-)

Yesterday I filed about 15 requests after listening to and digging through 1000s of entries in an audio list. I don’t wanna repeat these redundant steps of asking somebody else to ask some friend of a friend of a friend, over and over again...

mramosch, May 25, 2020 at 3:36 PM, edited May 25, 2020 at 3:55 PM

DATA: CREATION, GATHERING and LOSS of information

I think it’s safe to say that every contributor on Tatoeba is trying to make the best use of their time by finding workflows that suit and complement their skills and knowledge about certain languages.

Every time we consume blocks of information (by reading sentences, going through metrics, analyzing graphs etc.) and correlate them in our brains with other blocks, we essentially create new information on the fly. Some of these pieces of information can be made permanent by contributing or translating sentences, or by linking existing ones. Subsequently, new pieces of information can be deduced from this data and made available in the UI, like indirect links etc.

However, when I observe my own workflow I realize how much useful knowledge I create on the fly: a vast temporary set of data that is impossible to remember in a structured way and therefore drops out of short-term memory very soon.

A consequence of this data loss is that a lot of redundant work has to be done over and over again by every single contributor.

In the last weeks I have given these underlying workings and mechanics a lot of thought and with the help of many of you I have come to some conclusions, which I’d like to share.


THE LINKING SYSTEM:

On the one hand, the simple design of the linking system makes it very easy to understand and work with. On the other hand, I consider it one of the major culprits for how valuable data gets lost.

Indirect links are ‘generated only’ or, more precisely put, they are ‘two-hop relationships’: translations of translations that are not stored but rather deduced from the generic SENTENCE and LINKS datasets.
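As a rough illustration of the two-hop deduction, here is a minimal Python sketch. It assumes links are symmetric pairs of sentence IDs; the function names and the sample IDs are hypothetical and say nothing about how the site actually computes this.

```python
from collections import defaultdict

def build_neighbors(links):
    """Map each sentence ID to the set of IDs it is directly linked to."""
    neighbors = defaultdict(set)
    for a, b in links:
        # Links are treated as symmetric (an assumption of this sketch).
        neighbors[a].add(b)
        neighbors[b].add(a)
    return neighbors

def indirect_links(links, sentence):
    """Sentences reachable in exactly two hops that are not direct links."""
    neighbors = build_neighbors(links)
    direct = neighbors[sentence]
    two_hop = set()
    for middle in direct:
        two_hop |= neighbors[middle]
    # Drop the sentence itself and anything already directly linked.
    return two_hop - direct - {sentence}

# Made-up link pairs: 1-2, 2-3, 2-4, 1-4.
links = [(1, 2), (2, 3), (2, 4), (1, 4)]
result = indirect_links(links, 1)
```

Here sentence 3 is the only indirect link of sentence 1: sentence 4 is also two hops away through 2, but it is already linked directly. Nothing about the result is stored; it is recomputed from the link pairs every time.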


PROBLEM:

So one way of contributing to Tatoeba is to turn these indirect links into direct ones if need be. That also means we are reading through tons of indirect links and evaluating, for every single one of them, whether it should be converted or not.

However, when we assess one to be a worthy candidate and link it to the base sentence on top of the list, we completely ignore the fact that the remaining, already checked indirect links (the non-eligible candidates) ARE in fact non-eligible. This information is stored nowhere! They remain a big unstructured blob.

So when the next contributor takes their turn, the whole list has to be re-assessed entirely. Against the benefit of maybe gaining one or two conversions from indirect links into direct ones stands the vast, inefficient, redundant task of having to re-read the list from scratch every single time.


PROPOSAL:

So I am proposing the introduction of a second LINK file that stores the information of a sentence being explicitly marked as an indirect translation of the base sentence. This second LINK list is just an extension and does in no way interfere with the existing dataset.


BENEFITS:

The UI can now display indirect links in two ways (e.g. different colors or different symbols) and present them in a grouped fashion, like direct links appear as a group on top of each sentence list.

So when a contributor is checking indirect links in the languages they work with, they can still press the link symbol, but instead of immediately triggering the linking action, it presents a small call-out selection box:

• Direct link (explicit)
• Indirect link (explicit)

Depending on the selection, this
• adds a link to the LINK dataset, or
• adds a link to the new LINK2 dataset.

Because of the grouped presentation the next contributor who checks for similar languages doesn’t have to go through the whole set of indirect links anymore but only through the group of not explicitly marked indirect links in order to find candidates for an indirect-to-direct conversion, knowing that the explicitly marked ones have already been checked by someone else, and hence are not eligible or suitable for being direct links.
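The grouping described here could be computed roughly as follows. This is only a sketch under assumptions: `link2` stands for the proposed second dataset, which does not exist today, and the function name, dictionary keys, and sample IDs are all hypothetical.

```python
def group_linked_sentences(sentence, links, link2):
    """Split sentences related to `sentence` into the three proposed groups.

    links: pairs from the existing LINK dataset (direct links).
    link2: pairs from the proposed, hypothetical LINK2 dataset
           (indirect links confirmed by a human).
    """
    def partners(pairs, s):
        # Treat every pair as symmetric.
        return {b if a == s else a for a, b in pairs if s in (a, b)}

    direct = partners(links, sentence)
    explicit = partners(link2, sentence) - direct
    two_hop = set()
    for middle in direct:
        two_hop |= partners(links, middle)
    # Whatever nobody has reviewed yet stays "implicit".
    implicit = two_hop - direct - explicit - {sentence}
    return {
        "direct": direct,
        "explicit_indirect": explicit,
        "implicit_indirect": implicit,
    }

# Made-up data: 2 translates 1; 3, 4 and 5 translate 2.
links = [(1, 2), (2, 3), (2, 4), (2, 5)]
link2 = [(1, 4)]  # someone judged 4 unsuitable as a direct link to 1
groups = group_linked_sentences(1, links, link2)
```

With this data the next contributor would only need to look at sentences 3 and 5, since sentence 4 has already been reviewed and sits in its own group.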

And if they wanted to double-check the already checked explicit ones they would also find them in a convenient group for doing so.

FURTHER IDEAS:

I haven’t got the complete picture of the direct-link mechanics yet, but I could very well imagine that after further investigation the same distinction could be useful for direct links, e.g. to show under which circumstances a direct link was created (system, user explicit, de-coupling etc.).

If such information is valuable to accommodate a certain workflow it could also be used to display direct links in a system of colors/symbols to further enhance information density and quality of the linked sentence list.


CONCLUSION:

My goal here is to provide a facility to make the linked sentence list as accurate as possible by getting rid of as many implicit indirect links as possible in favor of gaining direct links and explicit indirect links.

As an additional bonus we would also be able to create indirect links to new sentences that have no translations yet, and will therefore never show up as similar translations in any list because of the two-hops-rule.

This side effect essentially extends the model: indirect links stop being just translations of translations and become first-class members, real indicators of related sentences that have similar meanings. This would IMO drastically increase the value of the linked sentence list and the corpus in general.

And because of the modular approach with different LINK datasets and its non-interfering character, extended features could be introduced step-by-step over time.

AlanF_US, May 25, 2020 at 5:27 PM, edited May 25, 2020 at 5:30 PM

> So one way of contributing to Tatoeba is to turn these indirect links into direct ones if need be. That also means we are reading through tons of indirect links and evaluate for every single one of them whether it should be converted or not.

Please take into account that your focus and workflow may not match other people's. From the comments you've made recently, I think that you are particularly concerned with indirect links, and with judging whether they can and should be made into direct links. But the way that many (most?) people look at them, and the way that the database was set up, they are an epiphenomenon, something useful, but not essential, to producing direct links, and also useful in figuring out a likely meaning of a sentence.

> My goal here is to provide a facility to make the linked sentence list as accurate as possible by getting rid of as many implicit indirect links as possible in favor of gaining direct links and explicit indirect links.

By "explicit indirect links", I guess you mean indirect links that are marked as NOT being good direct links. The effort of marking up links in such a way feels like clearing a beach of sand with a teaspoon. I have the same attitude towards this idea as I do toward massive marking of sentences as OK in languages (such as English) where the overwhelming majority of sentences are OK. (One objection is that the task is so mind-numbing that even sentences with errors end up being marked as OK.) But even that gargantuan task has a number of people willing to do it whose only qualification is a knowledge of English. Who would do all this markup, given that we have a limited number of people, each of whom is knowledgeable in a limited number of language pairs?

It's important to take into account cognitive load (how much information someone can keep in their mind), screen real estate (how much room there is on the display), and computational complexity when proposing a feature. It's also worthwhile to balance it against all the other bugs that need to be fixed and features that could be implemented. It's great that you're thinking about how to improve our system, but I can't say I'm enthusiastic about this proposal.

mramosch, May 26, 2020 at 1:03 AM, edited May 26, 2020 at 2:21 AM

Let me first summarize the points you have made and quickly list them here in a non-chronological order that suits my line of thought and then respond to you in 3 different ways, being

• A. Picture a short UI scenario with a potential workflow without arguing too much
(let us see how you‘ll feel afterwards about the points you made, just letting it sink in as unbiased as possible)

• B. Give a short answer to the points you raised.

• C. Add some additional thoughts to the short version if necessary

That should make it an easily digestible read and maybe liven things up a bit on your enthusiasm scale ;-)


————————————————————

> 1. ‘Indirect Links’ - their prominence, usefulness but (non-)essentiality
> 2. Cognitive load - screen real estate - computational complexity
> 3. The effort of marking up links in such a way feels like clearing a beach of sand with a teaspoon.
> 4. Who would do all this markup, given that we have a limited number of people, each of whom is knowledgeable in a limited number of language pairs?


And yes, by ‘explicit indirect links’ I am referring to the links that would come from the new LINK2 dataset file I proposed to be implemented, and not from the two-hop deduction of the current LINK file - these would be the ‘implicit indirect links’.


————————————————————
A. - THE SCENARIO
————————————————————

• You open up a sentence page and you see the bold faced source sentence with 50+ linked sentences listed beneath

• About 6 of them are marked with a fat blue arrow
• 2 more are marked with a fat blue hyphen only (without the pointing arrow part)

• Then there are those 40+ gray links
• First the ones marked with a gray arrow, maybe 25+
• then the ones marked with a gray hyphen

This is exactly what you see in the current UI - except that neither the blue marked direct links nor the gray marked indirect links are divided into an ‘arrow’ group and a ‘hyphen’ group. But everything else is unchanged!

Today you wanna link some sentences because you know that the system is adding a lot of indirect links to the list every time you create a direct link, and many of them are likely to be potential candidates for direct links too.

• So you look up the sentences in the languages you are specialized in
• and you find 3 in the ‘blue block’ and 3 in the ‘grey block’
• You are of course interested in the ‘implicit indirect links’ marked with gray arrows
• and of course you have to read them sequentially

• You read the first sentence marked with a gray arrow and decide that it is a candidate for a direct link
• You click on the chain icon and choose the option ‘Direct Link’
• The gray arrow turns into a blue arrow instead
• and gets shifted up to the other ‘blue’ sentences immediately

• You read the next sentence and find that it has a similar meaning but is not eligible for a direct link
• So you click on the chain icon and choose ‘Indirect Link (explicit)’
• The gray arrow turns into a gray hyphen indicating that it is registered as not being eligible for a direct link anymore
• and gets immediately shifted down in the group of other sentences with gray hyphens

• Reading the third sentence you are not sure whether it is eligible for a direct link or not, so you simply leave it as is

• You do this for all the languages that you are confident of being experienced enough in.

Some time later someone else has the same idea of direct linking some sentences and comes across the same sentence. Having similar language expertise, this someone will only find the sentences you couldn’t decide upon in the list of gray arrows.

Nobody will redundantly be confronted with the already gray-hyphen marked sentences over and over again.


————————————————————
B. - SHORT COMMENT
————————————————————

> 1. ‘Indirect Links’ - their prominence, usefulness but (non-)essentiality

Look at our sentence page from above

• 1 source sentence (slightly bold faced)
• 8 sentences marked with blue arrows
• 40+ sentences marked with gray arrows!!!

I don’t buy the argument of non-essentiality when about 75% of the page is filled with sentences marked with gray arrows, especially not as a consumer of the corpus.

————————————————————

> 2. Cognitive load - screen real estate - computational complexity

As you have seen in the SCENARIO above, the interface has not changed at all besides compartmentalizing and displaying the former gray arrow section in two groups (arrows, hyphen) to distinguish between implicit and explicit indirect links

• implicit links being deduced from two-hops in the LINK dataset
• explicit links derived from human evaluation and input (LINK2 dataset)

The chain symbol now lets you choose your linking option.

So I can’t see any bad influence in regards to cognitive load or screen real estate issues.
Computational Complexity has to be assessed by the developer team.

————————————————————

> 3. The effort of marking up links in such a way feels like clearing a beach of sand with a teaspoon.

In the current interface, if you are reading a sentence in the indirectly linked section and you decide to move it over to the directly linked section, you have to link it by clicking on the chain symbol. According to you, that’s the whole purpose of displaying the indirectly linked sentences anyway.

But if you have already read the sentence and found it not eligible for a direct link, you might as well explicitly link it ‘indirectly’ in the same manner, in one go.

All it requires is to have the chain symbol present a little option call out to choose between ‘direct and indirect link’ (instead of triggering linking directly).

There is almost zero overhead involved for the first contributor but every subsequent contributor would greatly benefit from not having to assess the whole list of already read and evaluated entries again, just because an earlier human assessment was not stored properly.

In addition you could easily check all explicit links for linking errors because they are nicely grouped together.

One quick look tells you how many ‘unstable’ system-deduced indirect links you have that are ready to be converted to direct links or ‘stable’ explicit indirect links.

————————————————————

> 4. Who would do all this markup, given that we have a limited number of people, each of whom is knowledgeable in a limited number of language pairs?

Well I guess bullet 3 already gave away the solution.

When you read a sentence for the purpose of linking then link it in either case, according to its eligibility...


————————————————————
C. - SOME ADDITIONAL THOUGHTS
————————————————————

> 1. ‘Indirect Links’ - their prominence, usefulness but (non-)essentiality

The reason I am relatively concerned with indirect links is that, even taking into consideration their originally intended use and the history of the structure of the corpus, the gray arrows have become very prominent in the UI. This implies a certain ‘importance’ to the user of the corpus, who subsequently expects a certain standard of correctness with regard to the assignment to the ‘direct’ or ‘indirect’ group of sentences. If they find a lot of obvious direct links marked with gray arrows only, they might think: why bother about a distinction at all?

It’s not only the contributor who sees the UI but also the user this corpus is intended for. And as consumers they might not see the gray block merely as a springboard for a streamlined creation of direct links but rather as a source of additional information and inspiration. And as long as it is also a UI for data consumption, I think the displayed information deserves the highest possible level of accuracy.

In addition I find it a very conservative approach to refrain from upgrades in functionality (even if it sometimes requires a little paradigm shift) just because of clinging to some legacy conventions.

Rising from a mere convenience existence to a 1st class member of the data structure with relatively little effort (IMO) could be a good thing for indirect links and have positive consequences on the quality of the overall service.


————————————————————

> 2. Cognitive load - screen real estate - computational complexity

The division of the ‘blue’ group into blue arrows and blue hyphens is just hinting at a possible extension of the ‘gray indirect link model’ to the blue direct links group, as briefly mentioned in the FURTHER IDEAS section in the original post above. This is still a matter for the future, so let’s focus on the gray indirect links.


————
CONCLUSION
————

The UI is the mirror to the soul of the service. And you have to like it to do loads of tedious work in it. Every improvement goes a long way in attracting contributors who eventually keep the whole thing going. No contributors/users - no need for a service.

Being able to display the results of a query as condensed as possible without too much noise can boost productivity. That’s why I am trying to bring little things to your attention

https://tatoeba.org/eng/wall/sh...#message_35275
https://tatoeba.org/eng/wall/sh...#message_35171

Arguments like ‘Please take into account that your focus and workflow may not match other people's’ are nice pieces of advice. However, as long as they are not based on metrics or measurements they are not very helpful. A handful of people discussing with their own preferences in mind are very unlikely to be a good representation of what is really going on in the wild.

I recently talked to @deniko and he said ‘Well, one of my favorite things to do on Tatoeba is linking sentences.’ So, who knows what is really needed. There are obviously enough sentences being linked, which means they must have been read before. So chances are that 50% have been read and temporary knowledge about them has been discarded instead of being stored.

An argument like ‘It's also worthwhile to balance it against all the other bugs that need to be fixed and features that could be implemented’ of course must be accepted and can’t be reasonably argued against.

brauchinet, May 25, 2020 at 6:03 PM, edited May 25, 2020 at 6:10 PM

The Advanced Search offers a possibility to find indirect translations:
for example: German sentences containing the word "Boot" with indirect translations in French:
https://tatoeba.org/eng/sentenc...&sort_reverse=

You could then easily turn these into direct translations where appropriate.

(Unfortunately, the search doesn't display more than 1000 sentences - that's the reason why I chose "Boot" sentences.)
Or you could regularly go through all the recently added German sentences and see if they have indirect links to French and vice versa.

I know it's not what you proposed.

I feel the same as Alan - not many members will be particularly keen to focus on linking. Adding sentences, translating sentences, even correcting sentences involves more creativity. It's just more fun.

Also, I noticed that linking is relatively error-prone. People often do it on the fly, maybe with languages they don't know so well.

AlanF_US, May 25, 2020 at 8:42 PM

Good point, brauchinet. I hadn't thought about the fact that we already provide searches that display this information.

Two more points:

(1) Every time a new table (like the table of indirect links that mramosch proposed) is added, it now needs to be reviewed whenever anything it refers to changes. That means that every time that either of a pair of linked sentences is modified, or deleted, the table would need to be revisited.

(2) People will have different opinions on whether two sentences should be linked. It's common enough that person A would be reluctant to link two sentences, but wouldn't protest if person B did it, provided that person B had a good justification. With the approach that mramosch is proposing, person A would have to decide between:
- making a direct link
- blocking a link for everyone (which would require some kind of override/undo mechanism if people decided that this was not justified)
- doing nothing (which would be like what we're doing now, but doesn't conform to the model of giving each direct link a thumbs-up or thumbs-down vote that mramosch is trying to achieve)

mramosch, May 26, 2020 at 1:42 AM, edited May 26, 2020 at 12:38 PM

(1) As it is a table of links it follows the same rules and suffers the same problems as the already existing LINKS table that correlates sentences with translations.

Changing code upstream may break code downstream.

Mechanisms like blocking sentences after audio has been assigned are an example of protection against such cases.

The actual implementation, be it a link table or any other database record field, has to be decided by the developer team; that’s not my main point.

I just figured that the link table is as generic as possible and wouldn’t interfere or mess with the existing data structure, just add on.

(2) There is never any blocking of a link going on.

Clicking on the chain icon of an ‘explicit indirect link’ always lets you choose between
• Direct link
• Indirect link (implicit)

which simply deletes the link in the new dataset and, in the case of ‘Direct link’ also creates a new link in the existing LINK file.

Just like breaking and creating links works with regular direct links.
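The transitions described here can be sketched in a few lines of Python. This is an illustration under assumptions, not an existing API: `link2` stands for the proposed second dataset, and `apply_choice` and the choice names are hypothetical.

```python
CHOICES = ("direct", "explicit_indirect", "implicit_indirect")

def apply_choice(pair, choice, links, link2):
    """Update both datasets after a contributor picks an option for `pair`.

    links and link2 are sets of frozenset pairs; link2 stands for the
    proposed second dataset and is hypothetical.
    """
    if choice not in CHOICES:
        raise ValueError(f"unknown choice: {choice}")
    key = frozenset(pair)
    # Any new choice first clears an earlier explicit-indirect mark.
    link2.discard(key)
    if choice == "direct":
        links.add(key)
    elif choice == "explicit_indirect":
        link2.add(key)
    # "implicit_indirect" needs nothing more: the pair falls back to
    # whatever the two-hop deduction says about it.

links = {frozenset((1, 2)), frozenset((2, 3))}
link2 = set()
apply_choice((1, 3), "explicit_indirect", links, link2)  # marked as checked
was_marked = frozenset((1, 3)) in link2
apply_choice((1, 3), "direct", links, link2)             # later promoted
```

Nothing is ever blocked in this model: a later contributor can always move a pair to another state, and each move touches at most one entry in each set.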

And doing nothing just means you are leaving it for another person to decide whether a sentence eventually is a ‘direct link’ or an ‘explicit indirect link’.

Both are stable states, evaluated and executed by humans.

An implicit indirect link is always an unstable decision made by the machine and should be escaped from as soon as possible.

rumpelstilzchen, May 26, 2020 at 8:07 PM, edited May 26, 2020 at 8:18 PM

> An implicit indirect link is always an unstable decision made by the machine and should be escaped from as soon as possible.

This sentence sounds really strange to me.

You aren't familiar with graphs, are you?

Consider the following graph:
...A
../
.B
..\
...C

Here C is indirectly linked to A. Now when you add a direct link between A and C you get:
...A
../.|
.B.|
..\.|
...C

Now A and C are both directly and indirectly linked. Adding the direct link doesn't remove the indirect link. We just do not show it anymore in the UI (as we don't show "indirectly-indirectly-linked" sentences (i.e. "3-hop-links") and so on).
As long as there is a direct link between B and A and between B and C, there is also an indirect link between A and C, independent of all the other links in the database. I don't see any "unstable decision made by the machine" here.

Edit: Unfortunately whitespace isn't preserved when showing a wall message. You can ignore the dots in my ASCII art, they are only there to align the rest. (Note to myself: we should probably fix this)
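The A, B, C example can be checked with a short Python sketch. The helper name `pivots` is made up for illustration; it simply lists the sentences that connect two others in exactly two hops.

```python
def pivots(links, a, c):
    """Return the sentences that connect `a` and `c` in exactly two hops."""
    def neighbors(s):
        # Links are treated as symmetric pairs.
        return {y if x == s else x for x, y in links if s in (x, y)}
    return neighbors(a) & neighbors(c)

links = {("A", "B"), ("B", "C")}
before = pivots(links, "A", "C")  # indirect link via B

links.add(("A", "C"))             # add the direct link
after = pivots(links, "A", "C")   # the two-hop path is still there
```

Both calls return the same set containing only B, so adding the direct link changes what the UI chooses to display, not the underlying two-hop relationship.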

mramosch, May 26, 2020 at 8:45 PM, edited May 26, 2020 at 10:51 PM

> Now A and C are both directly and indirectly linked. Adding the direct link doesn't remove the indirect link. We just do not show it anymore in the UI

We already have discussed this with Trang in another message, I am totally aware of this fact.

By “unstable” I am not referring to data integrity but rather to the fact that it is not established yet, whether it actually is a direct translation or just an indirect one for real.

By making it explicitly ‘direct’ we establish its ‘direct’ status. By making it explicitly ‘indirect’ we establish its ‘indirect’ status. But if we leave it ‘implicitly indirect’, both statuses that matter in the real world are still possible. That’s why I said the machine made an ‘unstable’ decision.

mramosch, May 26, 2020 at 7:50 PM, edited May 26, 2020 at 7:57 PM

> I feel the same as Alan - not many members will be particularly keen
> to focus on linking. Adding sentences, translating sentences, even
> correcting sentences involves more creativity. It's just more fun.

Well, you have to widen your mind a little bit ;-)

I think you are not aware of the impact and consequences this approach has on other parts of Tatoeba.

• When you decouple a sentence you do not loose all the indirect links anymore just because they got deduced from the link you just broke. The ones that were made explicit will still populate your list.

And what’s even more amazing and crazy is the fact that you can deduce another ‘two-hop relation’ from the explicit-LINK table.

That means when you link two sentences, the system populates the list of each one not only with

• (1) translations (direct link)
• (2) translation-of-translation (implicit indirect link)

but also with

• (3) all explicit indirect links from the direct links (two-hop)

which can immediately be turned into direct links, for example.

So all the work that was done in the past can be leveraged even more and in different places.

And because (3) is deduced exactly the same way as (2) - just from a second LINK list - there is no hard-linking involved, which means breaking a link is clean and has zero side-effects ;-)
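The three groups above can be sketched in a few lines. This is only an illustration under stated assumptions, not Tatoeba's actual code: the names `build_index` and `related` are made up for this example, and the second table of ‘explicit indirect links’ is the hypothetical one proposed in this thread. Nothing is stored for (2) or (3), they are recomputed from the two pair lists each time, which is why removing a direct link leaves no leftovers.

```python
from collections import defaultdict


def build_index(pairs):
    """Turn undirected (a, b) pairs into a neighbour lookup."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def related(sentence, direct_pairs, explicit_pairs):
    """Return the groups (1), (2) and (3) for one sentence as three sets."""
    direct = build_index(direct_pairs)
    explicit = build_index(explicit_pairs)  # hypothetical second LINK table

    # (1) translations: sentences directly linked to this one
    translations = set(direct.get(sentence, set()))

    # (2) translations of translations, deduced on the fly
    implicit = set()
    for t in translations:
        implicit |= direct.get(t, set())
    implicit -= translations | {sentence}

    # (3) explicit indirect links of the direct translations (two-hop)
    two_hop = set()
    for t in translations:
        two_hop |= explicit.get(t, set())
    two_hop -= translations | implicit | {sentence}

    return translations, implicit, two_hop


# A-B and B-C are direct links; B-D is an explicit indirect link.
print(related("A", [("A", "B"), ("B", "C")], [("B", "D")]))
# After breaking the A-B link, all three groups for A are empty again.
print(related("A", [("B", "C")], [("B", "D")]))
```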

Thanuir Thanuir yesterday May 25, 2020 at 7:18 PM link Permalink

On the other hand, the ability to mark a direct link as bad would help in that it would replace the current asymmetry between removing and adding links. We would get a symmetric situation between adding a link and rejecting one.

rumpelstilzchen rumpelstilzchen 17 hours ago May 26, 2020 at 7:33 PM link Permalink

> As an additional bonus we would also be able to create indirect links to new sentences that have no translations yet,

I'm still interested in a concrete example (preferably from the current corpus) for this use case.

mramosch mramosch 16 hours ago, edited 16 hours ago May 26, 2020 at 8:32 PM, edited May 26, 2020 at 8:35 PM link Permalink

Don’t ask me what the UI would look like for this application, because AFAIK there isn’t even a dedicated interface for directly linking two sentences.

You can

• turn ‘implicit indirect links’ into ‘direct links’
• turn ‘explicit indirect links’ into ‘direct links’ (hopefully soon ;-) - just joking...)

or

• create a redundant new copy of an already existing sentence because your link partner doesn’t show up in the indirect links list and hope that you made no typo and the bot script cleans everything up properly.

As long as you consider indirect links only as translations-of-translations and a springboard for linking, this may not be attractive to you. But if you consider them similar sentences that are worth showing in the context of your source sentence, then the following situation is easy to imagine:

You enter a new sentence (A), and by default it has no links. You know of a sentence (B) that is similar but not eligible for a “direct link”, yet you still want it to show up under your base sentence as an indirect link.

You’d have to find a common friend (C) that happens to be a direct translation of the sentence you want to add (B), and then, with the hacks from above, link (A) and (C).

Now (B) is two hops away from (A) and will be shown in the list as an indirect link.

Which essentially means: without (C), no indirect connection between (A) and (B).

With the introduction of ‘explicit indirect links’ you don’t need this whole dance of finding and linking other sentences just in order to connect (A) to (B).

rumpelstilzchen rumpelstilzchen 10 hours ago May 27, 2020 at 3:13 AM link Permalink

> Don’t ask me how this UI interface would look like for this application,

I'm not asking about the UI.
What I would like to get from you is a concrete example of two sentences which you want to indirectly link, e.g. "This house is green" should be indirectly linked to "Pierre habite à Paris."

> As long as you consider indirect links only as translations-of-translations and a springboard for linking this may not be attractive for you, but if you consider them being similar sentences that are worth being present in the context of your source sentence then the following situation is easy to imagine

Are you speaking of indirectly linking sentences in the same language? Have you read already https://github.com/Tatoeba/tatoeba2/issues/1902?

TRANG TRANG 16 hours ago May 26, 2020 at 8:37 PM link Permalink

> However, when we assess it to be a worthy candidate and we link it to the
> base sentence on top of the list, we completely ignore the fact that the
> remaining already checked indirect links - the non eligible candidates - ARE
> in fact non-eligible. This information is nowhere to be stored!

This issue (or at least a similar one) has been reported already:
https://github.com/Tatoeba/tatoeba2/issues/1980

I think the hardest part is to design the UI. If you have ideas on that, please draw them :) A picture is worth a thousand words.

In any case I don't consider this a high priority. If the goal is to optimize the workflow of linking, then it would be too early to optimize. We still have a relatively efficient way to link, mentioned by brauchinet and also documented here:
https://en.wiki.tatoeba.org/art...uickly-linking

As long as you can easily find sentences to link from this method of linking, there's no real need to store which sentences are not translations of each other. When you get to the point where 50% of the results are sentences you cannot link, then we can start thinking about how we can optimize.

mramosch mramosch 15 hours ago, edited 15 hours ago May 26, 2020 at 9:31 PM, edited May 26, 2020 at 9:34 PM link Permalink

The goal is not only the optimization of the linking workflow but much more importantly

https://tatoeba.org/eng/wall/sh...#message_35346

which leaves us with even more sentences that can be linked, and in even different places.


And for the UI - as you can see in the SCENARIO above

https://tatoeba.org/eng/wall/sh...#message_35342

the interface does NOT have to be changed at all, apart from displaying the currently gray arrow sentences in two flavors - arrows and hyphens - and grouping them accordingly, just as an obvious distinction between implicit and explicit indirect links.

The chain symbol lets you select your linking option
• for gray arrows: direct link vs. explicit indirect link
• for gray hyphen: direct link vs. implicit indirect link
• for blue arrows: decouple/de-link vs. explicit indirect link

And as a bonus addition you can display all those new deduced ‘explicit indirect links’ in the list of EVERY directly linked sentence

• e.g. with a gray big dot, instead of arrow and hyphen

which makes the list even more complete and fertile for further linking, translating or simply consuming and enjoying.

TRANG TRANG 14 hours ago May 26, 2020 at 10:27 PM link Permalink

> which leaves us with even more sentences that can be linked, and in even
> different places.

We don't need to create "negative links" in order to find more sentences to link. We can use the translations of the indirect translations, and their translations, etc.

The limitation of our current system is that we are using a relational database, which is not optimized for calculating the whole graph around a sentence. We can only read the translations of translations. After that, it's way too slow. With another type of database, we could read the whole chain of translations and suggest that people link sentences that are more than 2 hops away from each other.
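Reading "the whole chain of translations" amounts to a breadth-first walk over the link graph with a hop limit. A minimal in-memory sketch, assuming the links fit in a list of pairs; the function name `translations_within` is invented for this example and it says nothing about how the real database is queried or how fast that would be:

```python
from collections import defaultdict, deque


def translations_within(links, start, max_hops):
    """Map every sentence reachable from `start` to its hop distance."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours[current]:
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)

    del distance[start]  # the start sentence is not its own translation
    return distance


chain = [("A", "B"), ("B", "C"), ("C", "D")]
print(translations_within(chain, "A", 2))  # what can be read today: 1 and 2 hops
print(translations_within(chain, "A", 3))  # D, three hops away, now shows up too
```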

> And for the UI

The design you have in mind would not be a good design because it would require twice the amount of clicks for a contributor to link sentences.

mramosch mramosch 12 hours ago, edited 12 hours ago May 27, 2020 at 12:49 AM, edited May 27, 2020 at 12:58 AM link Permalink

Well, if the refinement into gray arrow-hyphen-dot groups is already almost indistinguishable from the current gray arrow-only design, then I guess it shouldn’t be that hard to find a solution for the remaining little chain symbol issue too. I am sure you’ll easily come up with something adequate, like click/double-click, two buttons etc. to avoid the additional click ;-)

> As long as you can easily find sentences to link from this method of linking, there's no real need to store which sentences are not translations of each other. When you get to the point where 50% of the results are sentences you cannot link, then we can start thinking about how we can optimize.

It seems that you are primarily concerned with the contributor perspective; my approach is also focused on the UX. As a consumer, being able to assess the whole record at a single glance (and, as a contributor, knowing immediately where to start work and where to put effort in), simply because of a clever, structured presentation and the knowledge that e.g. a gray arrow means

• undefined status - yet to be defined

but the rest (gray hyphen, gray dot or blue arrow) means

• defined status - already checked by a human

is paramount because it gives me confidence in the accuracy of the service.

Seeing what extent of work has already been put into the record and assess its integrity just by the ratio of implicit:explicit (because everything essentially starts out as implicit without user involvement) gives me an idea of where to look best for reliable information and where to contribute improvements.

Thanks anyways for reading.

mramosch mramosch an hour ago, edited an hour ago May 27, 2020 at 11:26 AM, edited May 27, 2020 at 11:28 AM link Permalink

> The design you have in mind would not be a good design because it
> would require twice the amount of clicks for a contributor to link sentences.

Simply use an identical clone of the sentence page that (for clarity of intent) has a slightly modified chain symbol that on click offers this two-step process for selecting the target.

Contributors can switch back and forth between these two pages depending on the workflow they prefer for the task at hand.

Similar to switching between Normal/Edit Mode.

sharptoothed sharptoothed 2 days ago May 24, 2020 at 7:50 PM link Permalink

** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

Ricardo14 Ricardo14 yesterday May 25, 2020 at 9:12 PM link Permalink

Thank you :D

Guybrush88 Guybrush88 yesterday May 26, 2020 at 7:15 AM link Permalink

thanks

deniko deniko 6 days ago May 21, 2020 at 1:14 PM link Permalink

Not sure whether it has been discussed before - sorry if it has - but it looks like if you're using the new interface you can only change the flag to a language that is listed in your profile. As a corpus maintainer I change the flag to something that is not listed there, and I do it quite often - obviously, confirming which one it should be, if I'm unsure. Is it possible to list all the languages in the drop down menu, similar to the old interface? I guess it's a useful feature for everyone, not just for corpus maintainers.

TRANG TRANG 5 days ago May 21, 2020 at 2:10 PM link Permalink

It has not been discussed here on the Wall but I had sent some time ago a message to the corpus maintainers group in Gitter. I don't think a lot of people received it though because no one answered.

It has been originally mentioned by Aiji and CK in GitHub:
https://github.com/Tatoeba/tatoeba2/pull/2077

My questions would be:
- Is there a reason why you don't want to list the languages in your profile?
- Can you describe the usual situations where you'll change the language?
- How often exactly does it happen?

deniko deniko 5 days ago, edited 5 days ago May 21, 2020 at 2:18 PM, edited May 21, 2020 at 2:19 PM link Permalink

> Is there a reason why you don't want to list the languages in your profile?

For example, because I don't speak them?


> Can you describe the usual situations where you'll change the language?

For example:

#8724171

I asked the user to change the flag. If they don't change it in 6 more days (2 weeks after I noticed it's wrong), I'll confirm with Lisa or someone else that it's Japanese (which is my guess) and change it to Japanese.

> How often exactly does it happen?

Not too often, but I guess at least 1-2 times a month.

TRANG TRANG 5 days ago May 21, 2020 at 2:51 PM link Permalink

> For example, because I don't speak them?

In this case, it would be better to let someone who speaks the language handle the change, wouldn't it?

On your side, when you can identify that a sentence has the wrong language, but you're not sure what the correct language is, you can change the language to "Other language" ASAP so that it doesn't pollute the corpus for the languages that you speak.

Then you can leave a comment, where you can even ping directly some corpus maintainers: "This sentence was wrongly flagged as Ukrainian. I think it's Japanese. @small_snow @bunbuku @Pfirsichbaeumchen"

Then one of them will change it when seeing your comment if it is indeed Japanese.

deniko deniko 5 days ago May 21, 2020 at 2:56 PM link Permalink

> In this case, it would be better to let someone who speaks the language handle the change, wouldn't it?

I don't believe so. If the task is as simple as changing the flag, I can ask a speaker of this language to confirm the flag. If the speaker is a corpus maintainer or an admin they can fix that themselves, sure, once I bring it to their attention, but if they're not? It's not like I edit those sentences or something.

So what if it's say Slovak? I don't speak it, I can make a reasonable guess it is Slovak, I know who to confirm it with, but I don't know any corpus maintainer/admin who has it listed to ask them to change the flag.

TRANG TRANG 5 days ago May 21, 2020 at 5:02 PM link Permalink

> It's not like I edit those sentences or something.

But I'd argue that changing the language of a sentence can be just as important as editing the sentence itself.

> So what if it's say Slovak? I don't speak it, I can make a reasonable guess
> it is Slovak, I know who to confirm it with, but I don't know any corpus
> maintainer/admin who has it listed to ask them to change the flag.

In that case would there be any issue for you to just add Slovak to your list of languages?

deniko deniko 5 days ago May 21, 2020 at 5:11 PM link Permalink

> But I'd argue that changing the language of a sentence can be just as important as editing the sentence itself.

Well, it's all up to you. I find it very safe to do so. I can do it anyway by adding Japanese to the list of my languages although I don't understand it at all, so I still don't see the point of making the task more complicated than it is now.

> In that case would there be any issue for you to just add Slovak to your list of languages?

Well, I'd have to list all Slavic and Romance languages there in this case... And some Germanic just because they're relatively easy to understand in their written form through the languages I already know.

I just don't want to make the list too long. I enjoy having a relatively short list of languages I'm interested in because this list is used in some places to facilitate search, etc. Besides, there was some discussion about using this list to prioritize sentences in those languages when you see translations. That would be a sweet feature, but I don't want to prioritize languages that I'm not truly interested in.

TRANG TRANG 5 days ago May 21, 2020 at 9:28 PM link Permalink

> I can do it anyway by adding Japanese to the list of my languages although
> I don't understand it at all, so I still don't see the point of making the task
> more complicated than it is now.

Because by principle, you shouldn't be changing the language of a sentence to Japanese if you don't know Japanese and there is someone else who knows better than you and can perform this change instead of you.

The fact that you can do it anyway is not an intended feature. It is because we don't care much that you can do it anyway, but ideally, we'd rather you don't do it.

> Well, I'd have to list all Slavic and Romance languages there in this case...

No, not necessarily all of them. You would only have to do this for languages that are lacking corpus maintainers and that you are willing to get involved in.

deniko deniko 5 days ago May 22, 2020 at 8:03 AM link Permalink

Thanks for the explanation Trang.

gillux gillux 4 days ago May 22, 2020 at 3:48 PM link Permalink

What if the sentence is written in a language that has no corpus maintainer?

TRANG TRANG 4 days ago May 22, 2020 at 4:17 PM link Permalink

The language can always be changed to "Other language" until someone can confidently assign the correct language.

gillux gillux 4 days ago, edited 4 days ago May 22, 2020 at 4:53 PM, edited May 22, 2020 at 4:53 PM link Permalink

In that case, who would be that "someone"? Only corpus maintainers can change someone else’s sentence flag.

TRANG TRANG 4 days ago May 22, 2020 at 6:00 PM link Permalink

That someone will be a corpus maintainer most of the time, yes.

But that someone can be an advanced contributor if the owner of the sentence is inactive. Advanced contributors can adopt the sentence and change the language upon becoming the new owner. This case is probably very rare though.

That someone could also be a regular contributor, if the owner of the sentence is inactive and the sentence is adopted by an advanced contributor who then decides to unadopt it. This case is probably even rarer.

The most likely situation is that we find a corpus maintainer who has some knowledge in the language or who has taken the time to learn about the language and who is willing to take care of the language.

Among existing corpus maintainers, this would be typically someone like cueyayotl or Ricardo or shekitten who tend to have a broader interest in languages and could be temporary ambassadors for a language that doesn't have yet a native speaker as corpus maintainer.

Otherwise, there could be an advanced contributor who decides to step up and become corpus maintainer in order to take care of that language, seeing that it wasn't getting much love.

gillux gillux 4 days ago May 22, 2020 at 9:43 PM link Permalink

I get your point, but I have to disagree. The principle of restricting flag changes only to corpus maintainers having the proper profile language makes sense in theory. But now you are facing 3 corpus maintainers (so far) asking for full flag selection. So maybe that is not so practical after all.

I assume that most (if not all) sentences in need of flag correction are the result of a user mistake (or bad UI) when adding the sentence. In this context, aren’t most flag corrections more about understanding how the mistake happened, what was the original intention of that particular member, rather than knowledge in a particular language?

From your point of view, it looks like knowledge of a language is the only way one can accurately change someone else’s sentence flag. I disagree. I’d trust any corpus maintainer to change any sentence flag because they are all responsible and trustworthy. I know they will do the necessary research, ask the right person and make the correct decision, because that’s exactly what they are here for. Denying them this ability feels almost like you don’t really trust them after all. I wouldn’t be very happy about that if I were a corpus maintainer.

mramosch mramosch 4 days ago, edited 4 days ago May 22, 2020 at 10:28 PM, edited May 22, 2020 at 10:30 PM link Permalink

I think I was the reason that this discussion started because, out of ignorance, I caused the flag problem that had to be fixed manually by Denis.

I tried to direct-link a sentence by creating an already existing sentence (that only had an indirect link) for a second time in order to change the link from being indirect to being direct. I was told this is the best way of doing it if you don’t have AC permission to do linking with the official linker tool.

Because I recently removed my language information from my profile I only got the options ‚English‘ or ‚Other language‘ when trying to submit the new sentence.

I chose ‚Other language‘ because I was told that the bot would automatically detect and correct everything to eventually be left with one correct version and a direct link.

But all I achieved was the appearance of a flag with a question mark that was stuck and couldn’t be changed anymore.

Maybe this is a bug in the UI or a use case that the UI is not prepared for?

CK CK 4 days ago, edited 4 days ago May 23, 2020 at 1:04 AM, edited May 23, 2020 at 1:06 AM link Permalink

We have a duplicate-merging script that will merge exact matches together. If the language code (flag) is different, then the 2 sentences won't be merged because they are not an exact match.
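To illustrate why a sentence flagged "Other language" is not merged with its correctly flagged twin, here is a rough sketch of exact-match grouping. It is not the actual duplicate-merging script; the function name `find_exact_duplicates`, the `(id, lang, text)` tuple layout and the `"other"` placeholder code are assumptions made for this example.

```python
from collections import defaultdict


def find_exact_duplicates(sentences):
    """Group sentence ids that share both language code and text."""
    groups = defaultdict(list)
    for sentence_id, lang, text in sentences:
        # The language code is part of the key, so the same text under
        # a different flag lands in a different group.
        groups[(lang, text)].append(sentence_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


sentences = [
    (1, "deu", "Hallo."),
    (2, "deu", "Hallo."),
    (3, "other", "Hallo."),  # same text, different flag: not a match
]
print(find_exact_duplicates(sentences))  # [[1, 2]]
```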

CK CK 4 days ago, edited 4 days ago May 23, 2020 at 1:09 AM, edited May 23, 2020 at 1:09 AM link Permalink

> In your point of view, it looks like the knowledge of a language is the only way one can accurately change someone else’s sentence flag. I disagree. I’d trust any corpus maintainer to change any sentence flag because they are all responsible and trustworthy.

I agree with Gillux.

If there are corpus maintainers that you feel are not responsible and trustworthy then they should be changed back to advanced contributors.

TRANG TRANG 3 days ago May 24, 2020 at 10:51 AM link Permalink

I trust corpus maintainers for doing everything with their best intention but I don't think it's reasonable to expect every one of them to be equally competent for every task that corpus maintainers can perform.

For me this is about separation of responsibilities and about transparency. This is about how do we organize ourselves in the best possible way and how to make the features of Tatoeba reflect better this organization.

I know that in many cases it is possible for a corpus maintainer who has no tangible knowledge of a certain language to fix an error in a sentence in that language. This includes: setting the correct flag, fixing the punctuation and even fixing very basic mistakes. But in my opinion, if you have the chance to involve another corpus maintainer who has more experience with the language than you have, this should be your default course of action (even if what you need to fix feels obvious to you). And if that is your default course of action, then you don't really need the full list of languages by default.

It doesn't mean you are completely forbidden from making interventions in other languages. If there's a special situation, you can always add the language to your profile at any time, with a low level or unspecified level. Since these situations are supposed to be exceptions, the fact that you have to take these extra steps shouldn't be too much to ask.

I can understand that it's not practical because you may end up with unwanted languages showing up in the dropdown list when adding/translating sentences. But then it's a different problem.

Besides, if you're dealing with a language which seemingly has no corpus maintainer and, as a result, you get in touch with native speakers or/and do some research on the language, then it would be helpful for the community to know it from your profile. The next time there's maintenance to do on other sentences in that language, people could reach out to you and benefit from your past experience instead of asking assistance from a random corpus maintainer.

AlanF_US AlanF_US 2 days ago, edited 2 days ago May 24, 2020 at 5:53 PM, edited May 24, 2020 at 5:54 PM link Permalink

To me, identifying the language of a sentence belongs to a different category from modifying or deleting the sentence. In order to modify or delete a sentence, you need to know the language pretty well, and once you make the modification or deletion, it can become difficult or impossible to revert the change. But it requires a good deal less familiarity with a language to identify a sentence in it (or a sentence that does NOT belong to it), and in any case, an incorrect language identification is easy to switch later.

With the new UI design, changing language identification is grouped together with other operations on the sentence, so I can see why it might be tempting to require corpus maintainer privileges for all of them. But I don't think that logically or practically they fit into the same category.

TRANG TRANG 2 days ago May 24, 2020 at 9:04 PM link Permalink

I agree with what you said but the question still remains.

With the new design, a corpus maintainer now only sees the languages in their profile when editing the language of a sentence.

Should we bring back the full list of languages (as it used to be in the old design) or can we keep it restricted to the profile languages?

AlanF_US AlanF_US 2 days ago May 24, 2020 at 11:39 PM link Permalink

Sorry, I missed that question. I vote for showing the full list of languages.

Ricardo14 Ricardo14 2 days ago May 25, 2020 at 5:16 AM link Permalink

+1

brauchinet brauchinet 3 days ago May 24, 2020 at 12:01 PM link Permalink

This doesn't really belong here - a situation that I often witnessed:
Somebody writes a comment such as "Flag", "Change flag", "Not French!", "French!".
A corpus maintainer comes by and changes the flag.
The owner of the sentence reads the comment and goes "Huh? What?"

A possible solution would be to keep a log for changing flags. Don't know if it's worth it.

Pandaa Pandaa 2 days ago May 24, 2020 at 1:27 PM link Permalink

+1

TRANG TRANG 2 days ago May 24, 2020 at 9:07 PM link Permalink

> A possible solution would be to keep a log for changing flags.
> Don't know if it's worth it.

We have an old ticket for this already :)
https://github.com/Tatoeba/tatoeba2/issues/533

And yes, I think it would be worth it.

Aiji Aiji 4 days ago May 23, 2020 at 10:39 AM link Permalink

> But I'd argue that changing the language of a sentence can be just as important as editing the sentence itself.

> The fact that you can do it anyway is not an intended feature. It is because we don't care much that you can do it anyway, but ideally, we'd rather you don't do it.

What about linking sentences, where there are two languages involved?

Whatever the official guidelines become, in any case, please don't make the maintenance work a complicated ball of unnecessary bureaucracy. Right now, a non-negligible part of corpus maintenance rests on mutual help, both for detecting and correcting potential mistakes. And we sometimes need to trust each other's judgment, even (in particular) in languages we aren't fully capable in.

A simple example, illustrative more than completely exact: there is a sentence in Hebrew that I suspect doesn't correspond to the French. I will ask Alan about it. I will ask him to help me confirm the meaning of the sentence in English because that's our best common language. I'm sure that he will not get his explanation wrong due to a lack of ability, and I'm confident that I will not get mine wrong. If Alan can't help me, he may ask somebody else to help, maybe even in Hebrew instead of English, if that's their best common language. In the end, if my suspicion is confirmed, I will unlink the Hebrew from the French, although I don't speak a bit of Hebrew. There might have been a piece of information lost in translation, but I trust that my fellow maintainers and I, to the best of our combined ability, made the right choice.

(The fact that Alan understands French somehow goes against my example, but I hope you get the idea I wanted to express).

mramosch mramosch 4 days ago May 23, 2020 at 11:31 AM link Permalink

It just looks to me like the half-empty vs. half-full syndrome or in other words

• It is incorrect as long as it is not proven to be correct
vs.
• It is correct as long as it is not proven to be incorrect

and by ‘proven’ I mean something along the lines of e.g. ‘confirmed by a native speaker’ etc.

Perhaps the guidelines on Tatoeba should be a little clearer about which approach is preferable as a modus operandi when doing clean-up work. I don’t wanna have some other AC or CM constantly cleaning up the mess after me just because I assumed that I have to make my own guidelines for lack of official policies.

It seems that in my first day of linking sentences I have already stepped in some dodo and although having read all those endless threads of near-duplicates, indirect vs. direct links etc. I never really found some conclusion that sounded like a recommendation, even if just a temporary one.

Aiji’s example is a good illustration of the issue, even with only having to deal with languages that are relatively high in the food chain. But imagine less prominent languages, where there is almost no way of gaining traction without relying on second hand guesswork via already existing translations into better distributed languages.

So, is there a general recommendation for the “permissive/prohibitive goes first” problem or does every use case have to be considered carefully in its own domain?

TRANG TRANG 3 days ago May 24, 2020 at 11:26 AM link Permalink

> So, is there a general recommendation for the “permissive/prohibitive
> goes first” problem or does every use case have to be considered
> carefully in its own domain?

I think every use case should be considered carefully. When in doubt, just ask on the Wall.

TRANG TRANG 3 days ago May 24, 2020 at 11:12 AM link Permalink

> What about linking sentences, where there are two languages involved?

When there are two languages involved, you obviously don't have to have both languages in your profile. But you should know one of them.

If you don't know Arabic and don't know Kabyle, you shouldn't be linking Arabic and Kabyle sentences.

> Whatever the official guidelines become, in any case, please don't
> make the maintenance work a complicated ball of unnecessary
> bureaucracy.

There are no guidelines being changed here. The guideline of not making interventions in languages that you have no clue about is not a new thing, is it?

mramosch mramosch 3 days ago, edited 3 days ago May 24, 2020 at 10:30 AM, edited May 24, 2020 at 11:40 AM link Permalink

In Spanish, for example, although there are clearly defined grammatical rules for when to use which past tense, in practice there are regional preferences that stick to one tense only and avoid some other tenses almost completely (Preterite (Indefinido) vs. Present Perfect).

In German, same thing. The Past Tense (Imperfekt) is much more prominent in Germany. In Austria you may be writing ‚I bought a watch yesterday‘ but it sounds ridiculously posh when spoken out. Only ‚I have bought a watch yesterday‘ works in colloquial speech. And I think even Germans might in a certain context prefer it

• Ah, du warst gestern in der Stadt, Und? Hast du dir die neue, schicke Uhr gekauft?

instead of the clunky Imperfekt (Past Tense)

• Und? Kauftest du dir die neue, schicke Uhr?

——————————————————

How do we correctly incorporate such differences in the linking system?

Someone who argues with the correct grammatical rules in mind would apply fewer links. If the point of reference is actual usage, then we might end up with spaghetti-linking all over the place, even though each link is correct.

What is the ‚official‘ recommendation?

Thanuir Thanuir 2 days ago May 24, 2020 at 4:42 PM link Permalink

Whether the grammatical structures correspond is not essential. The meanings of the sentences should correspond to each other.

I don't know German, so I won't comment on your examples any further.

AlanF_US AlanF_US 2 days ago, edited 2 days ago May 24, 2020 at 8:22 PM, edited May 24, 2020 at 11:40 PM link Permalink

The official recommendation is that there is no official recommendation. :)

There is a huge amount of information that can be stored in a sentence, including register (formality). Some of it cannot be perfectly represented in a sentence in another language. As you've noted, once you realize this, you could react by going toward either of two extremes: linking almost nothing, or linking almost everything. A moderate path, relying on your intuition and on the precedents that you see around you, is best. But if in doubt, don't link.

moman moman 2 days ago May 24, 2020 at 5:33 PM link Permalink

Hi all,

There appears to be a slight issue with selecting language flags for a sentence. It has happened multiple times where it reverts back to an incorrect flag after I've specifically selected the correct one. And when editing it after the sentence exists, I have to click on the flag to change it; choosing the edit button and then changing the language will not save the change.

On another note, I'm really enjoying this site. Thanks to all who manage it!

gillux gillux 2 days ago May 24, 2020 at 6:01 PM link Permalink

Hi moman, thank you for the kind words.
You are describing two different problems.

1. Explicitly selecting the language flag while adding a new sentence doesn’t create a sentence with the flag you selected. Are you sure you are not selecting "Auto detect" as language flag? Could you show us a sentence on which that problem happened?

2. On an existing sentence, clicking the edit button and then changing the flag and then clicking OK won’t change the flag. This is a known problem. To edit the flag, just click on the flag without clicking on the edit button. This problem will go away once we switch to our new interface.

moman moman 2 days ago May 24, 2020 at 7:56 PM link Permalink

Hi gillux,

I'll have to post an example here when it does it again. I can't recall with which sentences it happened. I have never purposely clicked "Auto detect."

mramosch mramosch 12 days ago, edited 12 days ago May 14, 2020 at 8:54 PM, edited May 14, 2020 at 8:55 PM link Permalink

Is there an easy way to find out how many of the 503.559 German sentences are originals and how many are translations?

I could only filter out that there are 30.635 German sentences that have no translation at all, so they must be originals, but how many originals are there in total?

And what is the distribution of languages which the non-originals were translated from?
I guess the majority are translations from English sentences (either originals or translations themselves).

How can we retrieve those metrics? Anyone?

Aiji Aiji 12 days ago, edited 12 days ago May 15, 2020 at 1:48 AM, edited May 15, 2020 at 1:49 AM link Permalink

On Tatoeba directly, it's not possible yet.

Offline, if you're proficient in some programming language, you can probably use the exported sentences file and links file to find out.

If neither of the above is possible, you may want to wait until I add this function to Tatoeba playground, the external exploration tool I develop for fun and for others to explore the corpus in ways that aren't possible on Tatoeba (self-promotion ^^) https://github.com/agrodet/Tatoeba-playground. I plan an update this weekend and maybe I will incorporate this possibility. I have been thinking about it for quite some time :)

PS : Note that even if a sentence has no translation it might happen that it was a translation that was unlinked later on, hence not an original sentence.
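
For anyone who wants to try the offline route, here is a minimal Python sketch. It assumes the sentences export is tab-separated with the sentence id and language code in the first two columns, and that the links export is tab-separated with one pair of sentence ids per row; those layouts, the file names in the usage note and the helper names `count_unlinked` and `read_tsv` are assumptions for illustration. It only counts sentences without any direct link, which, as the PS says, is not the same thing as counting originals.

```python
import csv
from collections import Counter


def count_unlinked(sentence_rows, link_rows, lang):
    """Return (total, unlinked) counts for sentences in `lang`.

    sentence_rows: iterable of rows, id in column 0, language in column 1
                   (assumed export layout)
    link_rows:     iterable of rows holding two linked sentence ids (assumed)
    """
    # Every id that appears in any link has at least one direct translation.
    linked = set()
    for row in link_rows:
        linked.add(row[0])
        linked.add(row[1])

    totals = Counter()
    for row in sentence_rows:
        if row[1] != lang:
            continue
        totals["all"] += 1
        if row[0] not in linked:
            totals["unlinked"] += 1
    return totals["all"], totals["unlinked"]


def read_tsv(path):
    """Yield rows from a tab-separated export file (path is hypothetical)."""
    with open(path, encoding="utf-8", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
```

With the two export files downloaded, a call could look like `count_unlinked(read_tsv("sentences.csv"), read_tsv("links.csv"), "deu")`, assuming German is stored under the code `deu`.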

mramosch mramosch 5 days ago May 21, 2020 at 8:17 PM link Permalink

Hi Aiji,

which platform is this tool for - or is it web-based?

And what’s the news on the update?

TRANG TRANG 11 days ago May 15, 2020 at 9:23 PM link Permalink

> how many originals are there in total?

There are 80471 original sentences in German at the moment.

> How can we retrieve those metrics? Anyone?

This is how original sentences were calculated when we introduced this information in Tatoeba:
https://github.com/Tatoeba/tato...ationShell.php

You could technically install Tatoeba, import the sentences and contributions (using the file we export weekly) into the local database then run that shell. But let's say it's not the easiest way to do it :)

There's an issue in GitHub which would solve your problem here if it was implemented:
https://github.com/Tatoeba/tatoeba2/issues/2159
When this is implemented, you would be able to get the number from the search.

In the meantime, if you just need some occasional stats to get an idea, we can query the production database.

mramosch mramosch 11 days ago, edited 11 days ago May 15, 2020 at 9:31 PM, edited May 15, 2020 at 9:32 PM link Permalink

Thanks Trang!

Actually I would really like to see the sources for the translations of the remaining 420.000 sentences.

I guess around 3/4 will be translations from English sources, but the rest...?

gillux gillux 11 days ago May 15, 2020 at 9:54 PM link Permalink

It’s not exactly what you are asking, but you might be interested in this post: https://tatoeba.org/wall/show_message/21926

The data is 5+ years old, but it looks like it’s possible to rebuild it from source: https://github.com/tguinard/tatoeba_visualization

mramosch mramosch 11 days ago, edited 5 days ago May 15, 2020 at 10:15 PM, edited May 21, 2020 at 8:19 PM link Permalink

Woa, woa, woa...

That looks like a wallpaper from the 1970s. - Psychedelic...

Actually I was just hoping for an informative little five liner ;-)

Thanks anyways!

TRANG TRANG 11 days ago May 15, 2020 at 10:04 PM link Permalink

I've created this issue in GitHub:
https://github.com/Tatoeba/tatoeba2/issues/2325

This might be a task for someone during Kodoeba...

Otherwise, if I got the query correctly, German sentences are mostly translated from English, Esperanto, French, Japanese and Russian (for the top 5). Details here:
https://gist.github.com/trang/0...2caedb4c678c46

mramosch mramosch 11 days ago May 15, 2020 at 10:08 PM link Permalink

I see German in line 20 of the listing...

Sure you got the query right?

TRANG TRANG 11 days ago May 15, 2020 at 10:15 PM link Permalink

It's possible for people to translate from the same language. Here are some examples:

https://tatoeba.org/eng/sentences/show/331940
https://tatoeba.org/eng/sentences/show/331942
https://tatoeba.org/eng/sentences/show/340727
https://tatoeba.org/eng/sentences/show/341099
https://tatoeba.org/eng/sentences/show/347688
https://tatoeba.org/eng/sentences/show/349821
https://tatoeba.org/eng/sentences/show/349822
https://tatoeba.org/eng/sentences/show/350135
https://tatoeba.org/eng/sentences/show/387987
https://tatoeba.org/eng/sentences/show/430928

If you go to the sentence indicated in the logs in "This sentence was initially added as a translation of sentence ...", you'll see it leads to another German sentence.

mramosch mramosch 11 days ago May 15, 2020 at 10:20 PM link Permalink

But what’s the point here?

Translation from one language to the very same language? That’s not what I would consider a TRANS-lation.

Are you sure this is not just a linking error?

TRANG TRANG 11 days ago May 15, 2020 at 10:53 PM link Permalink

> But what’s the point here?

https://tatoeba.org/eng/wall/sh...#message_34400

> Are you sure this is not just a linking error?

It could be an error, yes. It does happen that people create translations from the wrong language as reported here:
https://github.com/Tatoeba/tatoeba2/issues/2132

In most cases, I think it's intentional. I can't say for sure though.

mramosch mramosch 11 days ago May 15, 2020 at 11:44 PM link Permalink

But this is pure chaos.

e.g.: https://tatoeba.org/eng/sentences/show/123970

We have a wide choice of books.
En esta tienda tenemos una gran variedad de libros.

These are two totally independent sentences, yet they appear as direct translations...

TRANG TRANG 11 days ago May 16, 2020 at 8:43 AM link Permalink

I'm not sure exactly what you think is chaos.

If you consider one of the sentences listed in "Translations" is not a correct translation of "当店にはいろいろな種類の本がございます。", then you can report it in the comment. Someone will unlink the sentence.

As a general rule: if B and C are correct translations of A, B and C don't have to be correct translations of each other.

AlanF_US AlanF_US 10 days ago May 16, 2020 at 3:37 PM link Permalink

>We have a wide choice of books.
> En esta tienda tenemos una gran variedad de libros.

>These are two totally independent sentences, yet they appear as direct translations...

"Totally independent" is not accurate. The translation of the Spanish is "In this store, we have a wide choice of books", so as you can see, the second part of the Spanish sentence matches the full English sentence. However, it would better to have a closer match, so I added a comment asking for an improvement, as Trang suggested.

mramosch mramosch 10 days ago, edited 7 days ago May 16, 2020 at 5:49 PM, edited May 19, 2020 at 7:18 PM link Permalink

By ‚totally independent‘ I meant that it is impossible to evaluate (direct link)

• En esta tienda tenemos una gran variedad de libros.
• Tenemos una gran variedad de libros.

as two sentences that can be equally treated - as if they were identical.

Either the source sentence has a local prepositional complement or not.

BTW: in the English translation neither of the two Spanish versions is directly linked -
and in the two Portuguese versions neither the Spanish nor the French versions are directly linked...

I find this pattern of lack of accuracy (regarding linking) all over the place!

When you look at a sentence page the most prominent property is not the single translation itself but rather the multiple lines of links - direct and indirect. So when I am new to Tatoeba I would definitely consider this fact as an integral part of the service. However, after looking up a few examples and finding out pretty quick that the whole linking thing is incomplete as hell, I would surely ask myself how reliable this service is and very likely move on in search of something more accurate.


EDIT:

The term ‘local prepositional complement’ caused some misunderstandings in the following posts.

So to clarify beforehand: This was just meant as a hint. I am not referring per se to the grammatical concept of using prepositions for adding additional information about placement of things or situations. Be it a particle, a preposition or even a dedicated case that conveys the notion of placement - the point is that it somehow has to be explicitly mentioned in the very sentence in order to be taken into consideration for any translation. Presuming implicit context will lead to inconsistent results.

Reducing a noun phrase to a pronoun (Our firm -> we) may be eligible for some translations from certain languages but the other way round would just be guesswork.

So I should rather have said:

Either the source sentence contains an explicitly mentioned notion of placement or not.
However, the point I am trying to make is the inconsistency in the usage of direct/indirect links, which inevitably leads to wrong assumptions or misunderstandings.

For the practical consequences in the Tatoeba corpus see further down

https://tatoeba.org/eng/wall/sh...#message_35230

Thanuir Thanuir 10 days ago May 16, 2020 at 6:34 PM link Permalink

Go ahead and start making links. A database built on volunteer work means that if you want something, you do it yourself.

I am currently translating Danish sentences into Finnish. While doing so, I link them to other sentences that I am sufficiently sure about. But I don't link my own translations at the same time, because then I would have to open each of them in its own tab. I did this at one point; it is very laborious.

The user interface would have to change significantly if one wanted it to show which of the displayed sentences are linked to each other, and to make it easy to add and remove those links.

Aiji Aiji 10 days ago May 17, 2020 at 2:22 AM link Permalink

> I would surely ask myself how reliable this service is and very likely move on in search of something more accurate.

Good luck with that.


> Either the source sentence has a local prepositional complement or not.

That's a very funny statement considering that the original sentence you were talking about is in Japanese, a language about which you apparently have no knowledge, and also considering that all the languages listed in your profile are Western European.
A thing that some people understand after staying a while in Tatoeba is that being so sure of oneself about how languages work is a very wrong approach. I think that's why we let this part to linguists and other specialists.


I'll stop here and not comment on the "service" part, so that my good friends around here don't have to advise me against unnecessary aggressiveness.

mramosch mramosch 9 days ago May 18, 2020 at 11:56 AM link Permalink

> A thing that some people understand after staying a while in Tatoeba is that being so sure of oneself about how languages work is a very wrong approach. I think that's why we let this part to linguists and other specialists.

Another thing that some people understand after staying a while in Tatoeba is that being so sure of oneself about how to interpret other people's profiles is a very wrong approach. I think that's why we let this part to the profile owner - whether and why they add some language to the list or not.

There may be some of them who won’t add languages to their profiles unless they have lived for a certain period in a part of the world where these languages are spoken natively - although they might know a thing or two about these languages.

You see, sometimes appearances are simply deceiving...

If you don’t agree I suggest you implement some policies that comments and contributions are only allowed by certified ‘linguists and other specialists’ and instead let them do the tedious conveyor belt work we ignorant dummies do ;-)

I have to quote you again - “Good luck with that!”

So when some ordinary sheep criticizes a feature that may have been implemented by you - don’t take it personally - we only see the feature and its effect on us but we don’t know YOU. I am sorry that I have rattled your cage...

And what a shame that you only consider those as friends who pat you on the shoulder but not the other ones who challenge you to sometimes have a more thorough look at things that you simply might have gotten used to over time.

Nevertheless, thanks for keeping on adding useful features to Tatoeba in the future. Much appreciated!

Aiji Aiji 8 days ago May 19, 2020 at 2:02 AM link Permalink

> You see, sometimes appearances are simply deceiving...

Did I say something about your knowledge in languages other than the ones listed in your profile? No, because I have no way to judge that.
Did I imply that your comment was Western European-biased? Yes, because it was.
Did I say that you have no knowledge in Japanese? Yes, because if you knew "one thing or two" in Japanese, you wouldn't make the following statement.
> Either the source sentence has a local prepositional complement or not.

So, even if I agree with you that appearances are sometimes deceiving, I think that in this case, they're far from being deceiving.

Now, for the second part of your comment, which is only here to provoke and is near the zero-level of argumentation, it amused me, so I will answer.
- First of all, belittling yourself so others will take your side only works in a schoolyard or a political debate. You're not a sheep. Nobody says you're a sheep. And nobody criticized your right to criticize.
- You're talking about things that have nothing to do with my comment, about the feature and how I was hurt, but again it doesn't work. If you have nothing to oppose to a comment, don't get lost in argumentation about unrelated topics. If you're debating alone, it's an ok strategy, but if somebody answers you, it costs you credibility.
- I think one is free to consider who their friends are. You should read what I wrote again because I don't think I said that people who don't agree are not my friend, but more something like "my friends here", period. Again, you're arguing on a statement that you wrongly extrapolated from a simple one. It doesn't work.

If we were in speech class, I would conclude by saying that your argumentation is a very good approach if you're giving a speech to a mass audience, but a very risky one if you're arguing with someone, because it relies more on extrapolation than solid ground. If you want to lead a discussion, you have to answer or counter-attack with solid, logical counter-argument(s) that destabilize the opposing party.

Nevertheless, thanks for keeping on giving feedback on your experience on Tatoeba. Much appreciated.

TRANG TRANG 10 days ago May 17, 2020 at 7:52 AM link Permalink

> I find this pattern of lack of accuracy (regarding linking) all over
> the place!

Could you point out some examples?

The Spanish sentences you mentioned were both translated from Japanese. And based on the comments on #8764247, they are both considered valid translations of the Japanese sentence. So there's no problem with the links there.

You are perhaps misunderstanding something about the structure of the Tatoeba corpus.

mramosch mramosch 8 days ago, edited 8 days ago May 19, 2020 at 7:55 AM, edited May 19, 2020 at 10:53 AM link Permalink

I don’t derive any conclusion whatsoever by taking into account the original Japanese sentences.

You wrote above:
> As a general rule: if B and C are correct translations of A, B and C don't have to be correct translations of each other.

That is understood, no problem with that.

But if the original sentence e.g. somehow includes some notion that is remotely related to a local complement expressed with a preposition in many western languages and therefore can be translated in two different ways

Sentence #123970: 当店にはいろいろな種類の本がございます。(A)
• Tenemos una gran variedad de libros. (B)
• En esta tienda tenemos una gran variedad de libros. (C)

then this ‘special property’ - this ‘duplicity’ as it were - is rooted in the source language (A) and has to be an equally valid argument for any translation into any other language.

So with that in mind - if I now compare Group (B)
• We have a wide choice of books.
• Tenemos una gran variedad de libros.
• Nous avons un large choix de livres.
• Nós temos uma ampla variedade de livros.
etc.

and, respectively, Group (C)
• In this store, we have a wide variety of books.
• En esta tienda tenemos una gran variedad de libros.
etc.

- you get the point - then I would subsequently assume that they ALL had to be directly linked to the Japanese source sentence.

But what I find instead is
• Tenemos una gran variedad de libros. (DIRECT)
• Nous avons un large choix de livres. (INDIRECT)
• We have a wide choice of books. (DIRECT)
• Nós temos uma ampla variedade de livros. (INDIRECT)

• En esta tienda tenemos una gran variedad de libros. (DIRECT)
• In this store, we have a wide variety of books. (INDIRECT)

Neither is there consistency within a group of identical sentences nor with regards to the intricacies of the source language.

And it goes on in the English version of Group (B)
Sentence #280025: We have a wide choice of books.
• Nous avons un large choix de livres. (DIRECT)
• Tenemos una gran variedad de libros. (INDIRECT)
• Nós temos uma ampla variedade de livros. (DIRECT)

I can even see
• En esta tienda tenemos una gran variedad de libros. (INDIRECT)

however no trace of
• In this store, we have a wide variety of books. (INDIRECT)

whereas in the Spanish version of
Sentence #8764247: En esta tienda tenemos una gran variedad de libros.
• We have a wide choice of books. (INDIRECT)
• Tenemos una gran variedad de libros. (INDIRECT)

both indirect links are shown but in the identical English version
Sentence #8768755: In this store, we have a wide variety of books.

none of the indirect links show up!!!

———————————————————

So what should someone new to Tatoeba conclude?

Sentence #123970: 当店にはいろいろな種類の本がございます。
• En esta tienda tenemos una gran variedad de libros. (DIRECT)

This must be wrong because the INDIRECT English version ‘should’ be correct? Or vice versa? Or do just

• En esta tienda tenemos una gran variedad de libros. (DIRECT)
• In this store, we have a wide variety of books. (INDIRECT)

not correspond with each other because they are obviously handled differently with regard to their linking to the Japanese original?


If you are not wildly proficient in a multitude of languages and cross-cultural domains, it’s pretty much impossible to come to a conclusion on your own, except for the conclusion that maybe there is either something very complicated going on under the hood that doesn’t meet your understanding and expectation of how it “should” work - or the quality of the service is not sufficient! Neither conclusion is desirable!

And as I already stated above:

> When you look at a sentence page the most prominent property is not the single translation itself but rather the multiple lines of links - direct and indirect.

So the average person will evaluate the cross language facility as being the prominent feature of Tatoeba and hence expect a certain accuracy and quality of the service.

The current workflow for contributors however doesn’t allow for a decent management and maintenance of the linking system when providing a new contribution.

Essentially, with every new translation we gain quantity and even an automatic ‘free link’ but at the same time we introduce a multiple of inaccuracies for every already existing translation that is not correctly being linked to the new entry immediately, which of course reduces the overall quality of the whole corpus, provided that you don’t see the “single sentence approach” as the main service of Tatoeba but rather how the whole prominent linking system plays well together.

Thanuir Thanuir 8 days ago May 19, 2020 at 8:34 AM link Permalink

If I understand your argument correctly, you feel that Tatoeba's user interface leads one to assume that it contains all possible direct and indirect links between sentences.

That is not the case at present, and hardly anyone who adds sentences to Tatoeba assumes it to be true. There are a few reasons:

1. As mentioned earlier, and as you mention too, the user interface makes this difficult.

2. To add a link between two sentences, a person has to know both languages well enough. If nobody using the service knows, say, both Estonian and Italian, then nobody will end up directly linking sentences between those two languages. In addition, not all users can link sentences easily.

3. Users disagree about what kinds of sentences should be linked to each other. For example, some link sentences such as “Hi.” and “Hi!” to each other, while others do not.

There may be other obstacles as well.

Of these obstacles, 2 is probably unavoidable. We don't want people linking sentences if they don't understand both of them well enough.

Obstacle 3 can be lowered by discussing what kinds of sentences should be linked to each other and which should not.

Obstacle 1 could be lowered by changing the user interface.

I am fairly sure that concrete proposals and contributions of work toward removing these obstacles would be gladly received.

TRANG TRANG 8 days ago, edited 7 days ago May 19, 2020 at 12:52 PM, edited May 19, 2020 at 3:15 PM link Permalink

Okay, let's consider this set of sentences:

[JPN] 当店にはいろいろな種類の本がございます。(#123970)
[SPA] En esta tienda tenemos una gran variedad de libros. (#8764247)
[ENG] In this store, we have a wide variety of books. (#8768755)

If I understand correctly, your problem is that [ENG] is shown as an *indirect* translation of [JPN]. You think that if [SPA] is a direct translation of [JPN], then [ENG] should be a direct translation of [JPN] as well.

Assuming someone who speaks both English and Japanese agrees with that, then we have two ways to solve this inaccuracy:
1) An advanced contributor has to link [ENG] to [JPN].
2) A regular contributor has to add a translation to [JPN] that is the exact same text as [ENG].

In both cases, [ENG] will become a direct translation of [JPN].

The inaccuracy that you've seen everywhere is the result of another rule: if A is translated into B and B is translated into C, C is not necessarily a valid translation of A. A human has to confirm that A and C are equivalent.

In our set of sentences, what happened was:
- [JPN] was translated into [SPA].
- [SPA] was translated into [ENG].

By the rule I just mentioned, we cannot automatically assume that [ENG] is a translation of [JPN]. We have to wait until someone explicitly makes these two sentences translations of each other.

If you are very confident that [ENG] is a valid translation of [JPN], then go ahead and add it as a translation.

mramosch mramosch 7 days ago May 19, 2020 at 1:58 PM link Permalink

Don’t get me wrong, I have never said that this is a task that can be automated. It’s obvious that it takes some human interaction to link these cases. That is the reason why I was questioning the available workflow.

My idea is that if (A) gets translated into (B) and another group of sentences - regardless of their origin (originals or translations) - happen to be an exact translation of each other in the whole group (including B of course) then I could consider them being (B1, B2, B3, B4 etc.) and under the rule (A)==(B)==(B1)==(B2)== etc. all of them had to be directly linked - including with (A).

Or can you think of a situation where this rule would be ambiguous?

BTW: Is the creation of the first link when translating a sentence the only automated creation of a link or are there other rules that could cause an automated creation of a link, be it direct or indirect?

Because if there is no auto-creation beside the first one I am wondering where all those INDIRECT inconsistencies like

• Tenemos una gran variedad de libros. (DIRECT)
• Nous avons un large choix de livres. (INDIRECT)
• We have a wide choice of books. (DIRECT)
• Nós temos uma ampla variedade de livros. (INDIRECT)

come from. We already have established the two ways for direct linking but someone must have created those INDIRECT links, too. Who or What is responsible for their existence?

Thanuir Thanuir 7 days ago May 19, 2020 at 2:30 PM link Permalink

I suspect that there are fairly few such sets of equivalent sentences. For example, personal pronouns and articles differ from language to language, which already rules out many equivalences.


Indirect link:

If A is linked to B and B is linked to C, then A and C are indirectly linked.

There is an indirect link between two sentences if and only if they are two translations apart, but not one.

So indirect links are not so much created as they arise as a consequence of direct links.
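
The definition above can be sketched in a few lines of Python. This is only an illustration, not Tatoeba's actual implementation: the helper names `build_graph` and `indirect_links` are made up, and treating every direct link as symmetric is an assumption. The sentence numbers are the ones discussed in this thread, with the two direct links TRANG described (JPN translated into SPA, SPA translated into ENG).

```python
from collections import defaultdict


def build_graph(direct_links):
    """Build an undirected adjacency map from (a, b) direct-link pairs."""
    graph = defaultdict(set)
    for a, b in direct_links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def indirect_links(graph, sentence):
    """Sentences two direct links away from `sentence`, but not one."""
    direct = graph.get(sentence, set())
    two_steps = set()
    for neighbour in direct:
        two_steps |= graph.get(neighbour, set())
    # Drop direct translations and the sentence itself.
    return two_steps - direct - {sentence}


# #123970 (JPN) <-> #8764247 (SPA) <-> #8768755 (ENG)
links = [(123970, 8764247), (8764247, 8768755)]
graph = build_graph(links)
print(indirect_links(graph, 123970))  # {8768755}
```

Nobody entered the link between #123970 and #8768755; it falls out of the two direct links, and it would become a direct link only once someone adds `(123970, 8768755)` to the list.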

mramosch mramosch 7 days ago, edited 7 days ago May 19, 2020 at 3:07 PM, edited May 19, 2020 at 3:13 PM link Permalink

Thanks :-)

Well, that explains a lot! So you are essentially inheriting one indirect link with every automated direct link on creation of a new translation as well as when linking two sentences manually.

1. What happens if somewhere along the chain somebody decides to unlink a sentence? Is there also some automated unlinking of indirect links going on?

2. Is there a way to find out whether
• a generated indirect link on automated creation (when adding a new translation) is in reality - as seen from a human perspective - more likely going to be useful as a direct link or an indirect one
• a generated indirect link on manual creation (when linking in post production) is in reality rather going to be useful as a direct link or an indirect one.

So all those incorrect indirect links I was referring to in my post above seem to be just wrong guesses of the automation which is essentially always applying an indirect connection. Either at creation or re-linking.

I am wondering whether the error rate would be bigger or smaller when applying a direct link as default. Or do you guys just play it safe by saying “better an incorrect indirect link than an incorrect direct link” - regardless of the hit ratio?

Thanuir Thanuir 7 days ago May 19, 2020 at 3:19 PM link Permalink

1. As I understand it, indirect links are computed from the direct ones, so if a direct link is cut, some indirect links will probably disappear (though sometimes the severed link itself turns into an indirect one).

2. I am not aware of any automatic way to assess the validity of links. It would probably be difficult. Some AI researcher might be able to build one.
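Point 1 can be illustrated with a minimal sketch. It assumes indirect links are simply "neighbours of neighbours that are not direct neighbours"; the helper names `build_graph` and `indirect_links` are made up for this illustration and are not Tatoeba's code.

```python
from collections import defaultdict

def build_graph(pairs):
    """Build an undirected adjacency map from (a, b) direct-link pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def indirect_links(graph, start):
    """Sentences two hops from `start`, minus direct neighbours and `start`."""
    direct = graph.get(start, set())
    two_hops = set()
    for neighbour in direct:
        two_hops |= graph.get(neighbour, set())
    return two_hops - direct - {start}

# A is directly linked to B and D; B is also linked to C and D.
before = build_graph([("A", "B"), ("B", "C"), ("A", "D"), ("B", "D")])
# Same graph after the direct link A-D has been cut.
after = build_graph([("A", "B"), ("B", "C"), ("B", "D")])

before_indirect = indirect_links(before, "A")
after_indirect = indirect_links(after, "A")   # D is now reachable only via B

# If the only link of A is cut, its indirect links vanish with it.
gone = indirect_links(build_graph([("B", "C")]), "A")
```

In this toy graph, cutting A-D turns D from a direct into an indirect translation of A, while cutting A's only link leaves A with no indirect links at all.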

Personally, I think of indirect links in two roles:

a: They are tools for creating direct links. I can conveniently turn them into direct links.
b: If there are no direct links, they still give some idea of the sentence's meaning. So they act as less reliable substitutes for direct links.

So I don't regard an indirect link between two sentences with differing meanings as a problem, but as an unavoidable consequence of the definition/nature of an indirect link.

...

If direct links were applied transitively, there would be an enormous number of errors. The sentences would not even need to be complicated.

For example: He swims. <-> Hän ui. <-> She is swimming.
Or: Tu manges. <-> You are eating. <-> Vous mangez.

TRANG TRANG 7 days ago May 19, 2020 at 3:59 PM link Permalink

As a general tip: you need to visualize the sentences as a graph.

There is a short explanation in the wiki about the structure of the corpus:
https://en.wiki.tatoeba.org/art...-is-structured

mramosch mramosch 7 days ago May 19, 2020 at 7:25 PM link Permalink

Trang, would you mind giving a short statement about the rule I mentioned a little further above:

> My idea is that if (A) gets translated to (B) and another group of sentences - regardless of their origin (originals or translations) - happen to be an exact translation of each other in the whole group (including B of course) then I could consider them being (B1, B2, B3, B4 etc.) and under the rule (A)==(B)==(B1)==(B2)== etc. all of them had to be directly linked - including to (A).

Or can you think of any situation where this rule would be ambiguous?

Thanuir Thanuir 7 days ago May 20, 2020 at 5:41 AM link Permalink

The system contains no information about which sentences are exact translations of each other.

TRANG TRANG 7 days ago May 20, 2020 at 9:36 AM link Permalink

Your rule is conflicting with the rule I mentioned: if A is translated into B and B is translated into C, C is not necessarily a valid translation of A.

Just replace C with B1, then B2.

Here's an example with a graph visualization: https://imgur.com/a/YMwGyS7

- In case #1, your rule works, you can have (A) directly linked to (B1) and (B2).
- In case #2, your rule doesn't work. It would be wrong to directly link (A) to (B1) and (B2).

mramosch mramosch 7 days ago, edited 7 days ago May 20, 2020 at 11:39 AM, edited May 20, 2020 at 12:03 PM link Permalink

@Thanuir
@TRANG

Sorry, but I should have rather linked the citation instead of just copying it. Because you missed the context. So I give you two examples from above with their context.

Both are meant to be executed by a HUMAN and not expected to be solved by ML.

1. ——————————————————
> Don’t get me wrong, I have never said that this is a task that can be automated. It’s obvious that it takes some human interaction to link these cases. That is the reason why I was questioning the available workflow.

> My idea is that if (A) gets translated into (B) and another group of sentences - regardless of their origin (originals or translations) - happen to be an exact translation of each other in the whole group (including B of course) then I could consider them being (B1, B2, B3, B4 etc.) and under the rule (A)==(B)==(B1)==(B2)== etc. all of them had to be directly linked - including with (A).

2. ——————————————————

But if the original sentence e.g. somehow includes some notion that is remotely related to a local complement expressed with a preposition in many western languages and therefore can be translated in two different ways

Sentence #123970: 当店にはいろいろな種類の本がございます。(A)
• Tenemos una gran variedad de libros. (B)
• En esta tienda tenemos una gran variedad de libros. (C)

then this ‘special property’ - this ‘duplicity’ as it were - is rooted in the source language (A) and has to be an equally valid argument for any translation into any other language.

So with that in mind - if I now compare Group (B)
• We have a wide choice of books.
• Tenemos una gran variedad de libros.
• Nous avons un large choix de livres.
• Nós temos uma ampla variedade de livros.
etc.

and respectively Group (C )
• In this store, we have a wide variety of books.
• En esta tienda tenemos una gran variedad de libros.
etc.

- you get the point - then I would subsequently assume that they ALL had to be directly linked to the Japanese source sentence.

————————————————————
————————————————————

Example 2 is just an extended version of example 1 due to the fact that the Japanese source can be split into two translations
• Our store... -> Group (C )
• We... -> Group (B)

We know for a fact that (B) is a translation of (A).
We know for a fact that (C) is a translation of (A).

So if I, as a human, assess that
• translation (B) is an exact unambiguous translation of all the unambiguous sentences in Group (B) then all the Group (B) members should also be a direct translation of (A).
• translation (C) is an exact unambiguous translation of all the unambiguous sentences in Group (C ) then all the Group (C ) members should also be a direct translation of (A).

And by ‘unambiguous’ I mean that all ‘du/Sie/You/Usted’ ambiguities are taken into consideration.

Of course I have to blindly rely on the information (A)==(B) and (A)==(C ) being correct.

And of course (B)!=(C ) [is not equal]

Do you see any problem with this approach? Or let me phrase it differently:

Can I safely draw this LOGICAL conclusion about (B) respectively (C ) even only having little or no LINGUISTIC knowledge about the SOURCE language (A)?

Thanuir Thanuir 6 days ago May 20, 2020 at 2:24 PM link Permalink

I'll first try to restate the question in my own words, so that you can tell whether I understood it correctly or not.

Your proposal is that if you understand sentences B and C and in your view their meaning is exactly the same, and if A is linked to B and that link is reliable, then A and C could also be linked directly.

I would be very careful with this. If there were two sentences, B and B', that are both exact translations of sentence C but differ in nuance, you cannot know which of them A corresponds to, or whether it corresponds to both.

For example:
C = "Hän on siellä aina." <-> B = "He is always there."
The sentence C' = "Hän on aina siellä." is not linked to those, but it would be a fully valid and exact translation of B.

You have some sentence A that is linked to sentence B. Is it a good translation of C? Maybe. Maybe that language also allows a freer word order than English, or has some other way of making subtle distinctions.

Still, I would say that there is no difference in meaning between sentences B and C. Sentence C' would not necessarily even come to mind when I thought about it.

So I doubt whether it is humanly possible to be sure that two sentences mean exactly the same thing, or whether the notion is even meaningful.

In practice there is an endless amount of work on Tatoeba for anyone, so a user can perfectly well concentrate on linking sentences in the languages they know. That work rarely runs out.

mramosch mramosch 6 days ago, edited 5 days ago May 20, 2020 at 2:51 PM, edited May 21, 2020 at 8:38 PM link Permalink

I never correlate B with C in any way. I just threw in C because it was a concrete real-life example of this Japanese sentence that could be split into two different threads, A - B and A - C...

Consider this example just being about A and B.

• If all sentences of Group B are unambiguously identical in several languages (B1, B2, B3)
• and sentence B is unambiguously identical with all the sentences in Group B

given that B is a DIRECT translation of A

• Can all sentences of Group B also be safely DIRECTLY linked to A?

————————————————————

In a concrete practical example

Sentence #123970: 当店にはいろいろな種類の本がございます。(A)
• Tenemos una gran variedad de libros. (B)

Group (B)
• We have a wide choice of books. (B1)
• Nous avons un large choix de livres. (B2)
• Nós temos uma ampla variedade de livros. (B3)
etc.

————————————————————

You can ask the same question about A -> C without ever correlating B and C.

Sentence #123970: 当店にはいろいろな種類の本がございます。(A)
• En esta tienda tenemos una gran variedad de libros. (C)

Group (C )
• In this store, we have a wide variety of books.
• En esta tienda tenemos una gran variedad de libros.
etc.

———————————————————

I am LOGICALLY inferring a correlation between Group B and sentence A simply based
• on my knowledge of language B
• on my knowledge of languages of Group B (B1, B2, B3)
• on the fact that B is a direct translation of A

Is this vulnerable?

Thanuir Thanuir 6 days ago, edited 6 days ago May 20, 2020 at 2:59 PM, edited May 20, 2020 at 3:00 PM link Permalink

I wouldn't dare do that, because even if the B sentences looked equivalent to me, A might make a distinction or carry a nuance that I know nothing about.

The B sentences are not necessarily equivalent any more once that nuance is taken into account.

mramosch mramosch 6 days ago, edited 6 days ago May 20, 2020 at 4:11 PM, edited May 20, 2020 at 4:15 PM link Permalink

So you are essentially saying that a B1, B2, B3 speaker has to inevitably know [LINGUISTICALLY] language A in order to link to A and should not draw any [LOGICAL] conclusion at all.

Because this nuance in (A) that you are mentioning is intrinsic to (A) and must already have been taken into consideration by the A-to-B translator, otherwise B would already be incorrect.

(Of course if the A-to-B translator changes his mind later on, this would jeopardize the whole chain, but this is a general problem...)

Could you come up with an example of such a nuance because I personally can’t find any vulnerability in the logical approach yet and still consider it as an option until proven wrong. (I know I am a persistent thorough little sucker :-)

As you can surely tell, my main background is in Romance and Germanic languages.

So if e.g. the source A were English and I had to deal with e.g.

• You are

and I see B (German translation)

• Du bist

I can safely assume that these translations are correct

• Você é
• (Tu) eres
• Tu es
• (Tu) sei

However if I saw B

• Você é
• Ustedes son

I would know that these forms are ambiguous and wouldn’t draw any logical conclusions.

However if I saw B (Spanish translation)

• Nosotras

I would know that source (A) either is referring to women only or it doesn’t make any difference and the translation to Spanish just took the liberty to use its feminine form only - which is totally valid.

So without knowing anything of the source language (A) I can only consider linking languages that make that distinction too and do also use a feminine form, although the source might allow for a masculine form too.

Knowing the source language (A) of course I could see its real intention and in case of a bi-neutral source form even add a new Spanish sentence with its masculine counterpart.

If instead of ‘nosotras’ I saw

• Nosotros

that would tell me that the source A is either explicitly masculine or bi-neutral which would only allow for translations to languages that have a dedicated masculine form because I can’t evaluate source A.

However, if I see two translations, either of the same language or even across two different languages, where one uses a masculine form and the second a feminine form, I know that the source language allows for both, otherwise one of the two translations must be wrong.


So you see where I am going with that!

Thanuir Thanuir 5 days ago May 21, 2020 at 2:16 PM link Permalink

I wouldn't link sentences that I don't understand.

No easy example comes to mind right now. I suspect this is a question of balance:
If you interpret the equivalence of sentences strictly enough, misses won't necessarily happen, but you won't find many situations where the rule can be applied.
If you interpret the equivalence of sentences loosely, there will be errors.

mramosch mramosch 7 days ago, edited 6 days ago May 20, 2020 at 1:15 PM, edited May 20, 2020 at 2:12 PM link Permalink

> If A is linked to B and B is linked to C, then A and C are indirectly linked.

> There is an indirect link between two sentences if and only if they are two translations apart, but not one.

> So indirect links are not really created as such; they arise as a consequence of direct links.

————————————————————

Does that automatically imply that if there are (in addition to A) more sentences (even from different languages) directly linked to (B) - let’s simply call them A1, A2, A3 etc. - then they would all be automatically indirectly linked to (C ) after (B) gets directly linked to (C )?

Or in other words, the creation of the

• direct link B - C

autocreates (or calculates)

• indirect link A - C
• indirect link A1 - C
• indirect link A2 - C
• indirect link A3 - C

given the fact that every member of Group A is directly linked to (B)?

————————————————————

@TRANG

So when my simplistic point of view above (autocreation!!!) has to be translated into the world of graphs (that you mentioned in another post) I guess there is no creation of any stored INDIRECT links but rather a ‘calculation on display’ based on the object graph.

Could you share some short thoughts of how this works internally. You mentioned nodes (sentences) and their connections (links) as the only objects in this graph. So is this some kind of one/many-to-one/many clusters working together?

Or - if this is asking too much - you could simply provide a complete list of the circumstances under which links are created/calculated/generated.

DIRECT LINK
• creating a new translation (system is auto-creating a link)
• manually linking by authorized user



INDIRECT LINK





And in case you are wondering why I am interested in this information...

I am working on some suggestions for a better review workflow/UI and in order to be able to reasonably argue about improvements I’d prefer to have a pretty complete understanding of the underlying mechanics...

Thanuir Thanuir 6 days ago May 20, 2020 at 1:48 PM link Permalink

> Does that automatically imply that if there are (in addition to A) more sentences (even from different languages) directly linked to (B) - let’s simply call them A1, A2, A3 etc. - then they would all be automatically indirectly linked to (C ) after (B) gets directly linked to (C )?

As far as I know, yes.

I don't know how Tatoeba finds indirect links, but in principle they could all be determined this way: compute the square of the adjacency matrix, change all positive entries to ones, subtract the adjacency matrix from the result, and finally zero out the diagonal. This is a mathematician's solution, not a programmer's, so it is hardly practical.

See for example https://en.wikipedia.org/wiki/A...#Matrix_powers
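The matrix recipe can be sketched on a toy graph in plain Python. One caveat: a literal subtraction leaves -1 wherever two directly linked sentences share no common neighbour, so this sketch skips direct links instead of subtracting them. The function name `indirect_matrix` is hypothetical and this is not how Tatoeba is known to compute anything.

```python
def indirect_matrix(adj):
    """Return a 0/1 matrix marking indirect links.

    `adj` is a symmetric 0/1 adjacency matrix given as a list of lists.
    """
    n = len(adj)
    result = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or adj[i][j]:
                continue  # skip the diagonal and existing direct links
            # Entry (i, j) of adj squared counts the two-hop paths i -> k -> j.
            paths = sum(adj[i][k] * adj[k][j] for k in range(n))
            result[i][j] = 1 if paths > 0 else 0
    return result

# Sentences 0=A, 1=B, 2=C with direct links A-B and B-C.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
indirect = indirect_matrix(adj)
```

Here only the pair (A, C) ends up marked as indirect. For a corpus with millions of sentences a dense matrix is impractical, which matches the remark that this is a mathematician's rather than a programmer's solution.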

TRANG TRANG 6 days ago May 20, 2020 at 6:08 PM link Permalink

> Could you share some short thoughts of how this works
> internally.

Imagine two tables, "sentences" and "links".

"sentences" has the following columns:
- id
- lang
- text

"links" has the following columns:
- sentence_id
- translation_id

Whenever you add a new sentence (A), a new line is added in "sentences".
- id=1, lang=eng, text=A

Whenever you add a translation (B) to the sentence (A), a new line is added in "sentences", then two lines are added in "links".
- id=2, lang=fra, text=B
- sentence_id=1, translation_id=2
- sentence_id=2, translation_id=1
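The two tables and the "one sentence row plus two link rows" behaviour can be tried out with a small in-memory SQLite sketch. The column types, the primary keys and the helper names `add_sentence` and `add_translation` are assumptions made for this illustration; only the table and column names come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sentences (id INTEGER PRIMARY KEY, lang TEXT, text TEXT);
    CREATE TABLE links (sentence_id INTEGER, translation_id INTEGER,
                        PRIMARY KEY (sentence_id, translation_id));
""")

def add_sentence(conn, lang, text):
    """Insert one row into "sentences" and return its new id."""
    cur = conn.execute(
        "INSERT INTO sentences (lang, text) VALUES (?, ?)", (lang, text))
    return cur.lastrowid

def add_translation(conn, sentence_id, lang, text):
    """Insert the translation, then one link row for each direction."""
    new_id = add_sentence(conn, lang, text)
    conn.executemany(
        "INSERT INTO links VALUES (?, ?)",
        [(sentence_id, new_id), (new_id, sentence_id)])
    return new_id

a = add_sentence(conn, "eng", "A")
b = add_translation(conn, a, "fra", "B")
rows = conn.execute(
    "SELECT sentence_id, translation_id FROM links "
    "ORDER BY sentence_id").fetchall()
```

After these two calls the "links" table holds exactly the pair of rows listed above, one per direction.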

The tables I have described are part of the files that we distribute under "Sentences" and "Links" on our Downloads page.
https://tatoeba.org/eng/downloads

> I am working on some suggestions for a better review
> workflow/UI

I can already tell you what a better workflow and UI could look like in the grand scheme.

1) We should have a page that allows contributors to check sentences alone, without their translations. This ensures that the items in the table "sentences" are correct.
2) We should have another page that shows only a pair of sentences and lets people confirm whether or not the two sentences are translations of each other. This ensures that the items in the table "links" are correct.
3) We should provide the possibility to attach meta-data to links, not just sentences.

mramosch mramosch 6 days ago, edited 6 days ago May 20, 2020 at 9:17 PM, edited May 20, 2020 at 9:34 PM link Permalink

OK - so let me get a little bit ‘unstructured’ and ask you some loose questions to help me get on track. I will be calling this reciprocal pair of links a ‘connection’ from now on.

1. Initially a connection between two sentences always points in two directions with the help of two links.
• A is a translation of B
• B is a translation of A.

2. Breaking a connection between two sentences is achieved by removing both links.

3. Removing both links of a connection
• sentence_id=1, translation_id=2
• sentence_id=2, translation_id=1
does not affect any other link to/from either of the two participants (id=1 and id=2)

4. Re-linking a sentence is achieved by breaking the old connection (removing 2 links) and establishing a new connection (adding 2 links)

5. Is the database considered as being inconsistent if for some reason one link of this pair survives and this ‘half connection’ only points in one direction? Something like
• A is a translation of B
• B is not a translation of A

6. Sentences that are considered as being ‘indirectly linked’ are simply sentences that are two hops away from each other.
• A==B
• B==C
• A—C

7. Finding all direct links of sentence A requires to query the sentence_ID field of ALL links in the database against the ID of sentence A.

8. Finding all indirect links of sentence A requires to query the sentence_ID field of ALL links in the database against the ID of every single result of the query conducted in (7.)

9. Turning an indirect link into a direct link is achieved by establishing a new connection (adding 2 links)
• A==B (existing connection)
• A==C (existing connection)
• B==D (existing connection)

• A—D indirect link because of two hops (A==B, B==D)

• A==D (newly created connection)

But creating A==D doesn’t change anything for the already existing 2 hop relationship between A—D (A==B, B==D) - so I am essentially left with a direct and an indirect link at the same time?!?!?

There is obviously something I got wrong at an earlier stage...

10. Is there a way to distinguish between
• a connection (2 links) that is automatically being supplied/created by the system when a user contributes a translation
• a connection (2 links) that is manually created by a user either by turning an indirect link into a direct link or by simply establishing a new connection

11. How can you determine/trace the owner of a sentence and all his/her metrics (sentence count etc.)

AlanF_US AlanF_US 6 days ago May 21, 2020 at 1:22 AM link Permalink

1, 2, 3, 4: Yes.

5: Yes. If the database says that A is a translation of B, but B is not a translation of A, then it is inconsistent.

6: Yes.

7:
> Finding all direct links of sentence A requires to query the sentence_ID field of ALL links in the database against the ID of sentence A.

Well, yes and no. The database is organized in such a way that you don't need to page through all links in the database in order to find the ones connected to the ID of sentence A. But aside from efficiency considerations, the net result is the same: you are looking for all links associated with the ID of sentence A.

8.
> Finding all indirect links of sentence A requires to query the sentence_ID field of ALL links in the database against the ID of every single result of the query conducted in (7.)

Again, the algorithm is much more efficient than that, but the net result is the same.
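To illustrate the efficiency point in 7 and 8, here is a minimal sketch of an in-memory index: the link rows are scanned once, and every later lookup touches only the rows of the sentences involved. The names `build_index`, `direct` and `two_hops` are hypothetical; the real database presumably relies on its own indexes.

```python
from collections import defaultdict

def build_index(link_rows):
    """Map each sentence_id to the set of its translation_ids (one pass)."""
    index = defaultdict(set)
    for sentence_id, translation_id in link_rows:
        index[sentence_id].add(translation_id)
    return index

def direct(index, sid):
    """Direct translations of `sid`: a single dictionary lookup."""
    return set(index.get(sid, ()))

def two_hops(index, sid):
    """Everything reachable in exactly two lookups, excluding `sid` itself."""
    found = set()
    for t in index.get(sid, ()):
        found |= index.get(t, set())
    found.discard(sid)
    return found

# Toy link rows in both directions: 1-2, 2-3 and 1-4.
rows = [(1, 2), (2, 1), (2, 3), (3, 2), (1, 4), (4, 1)]
index = build_index(rows)
direct_of_1 = direct(index, 1)
two_hops_of_1 = two_hops(index, 1)
```

The net result is the same as scanning the whole list for every query, just without the repeated full scans.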

9.
> But creating A==D doesn’t change anything for the already existing 2 hop relationship between A—D (A==B, B==D) - so I am essentially left with a direct and an indirect link at the same time?!?!?

Yes. In fact, between any two sentences that are directly linked (A==D), there can also be any number of indirect links (A==E, E==D; A==F, F==D; and so on).

10.
> Is there a way to distinguish between
• a connection (2 links) that is automatically being supplied/created by the system when a user contributes a translation
• a connection (2 links) that is manually created by a user either by turning an indirect link into a direct link or by simply establishing a new connection

Yes. There are various ways that one could keep track of such a distinction.

11.
> How can you determine/trace the owner of a sentence and all his/her metrics (sentence count etc.)

A database is designed to let you execute such queries, and do it efficiently, as long as you have recorded the relationships. In the same way that you can have one table that keeps track of sentences and their IDs, and another that keeps track of the links between pairs of sentences, you can have a table that associates sentences with owners, or vice versa.

mramosch mramosch 6 days ago May 21, 2020 at 3:23 AM link Permalink

That was a fine and helpful response, Alan.

5. Does this happen sometimes with the Tatoeba database - and if yes, is this always due to a bug or are there other possible reasons.

Can such an inconsistency make its way until the final LINK output file?

7./8. Well I am sure that the server is index optimized for that application but if somebody wants to work with the two downloadable offline output files of the database would the methods I described above be the starting point for a query and I would have to write any optimization myself?

9. I am not sure whether you really understood where I was going here.

I really meant that for my direct link A==D I could also have an indirect link A—D in the same list due to the ‘2 hop rule derivation’ from (A==B, B==D).

A—D is always present and isn’t simply invalidated by the fact that I created an additional direct link A==D. Theoretically I could even have several identical indirect links A—D (derived from several different 2 hops) for one direct link A==D.

If that is true, then for a sentence UI presentation like the Tatoeba sentence page I had to diff all indirect links A—D against a potentially already existing direct link A==D in order not to have a translation show up as both, a direct link and an indirect link in one listing of translations?

10./11. I was a little confused here because Trang made it look like the whole Tatoeba database is just comprised of these two files - SENTENCES and LINKS - and everything could be interpolated and calculated from them :-)

But it seems there is more information stored about relationships of records.

So the most important questions for me right now are

12. Which events do create a direct link?

DIRECT LINK CREATOR LIST
• creating a new translation -> system is auto-creating a connection (2 links)
• manually linking by authorized user
• manually de-linking and re-linking to another sentence by authorized user



13. Are indirect links solely derived from ‘two hop derivations’ in the graph or are there other methods/events for creating (respectively other fields for storing) indirect links somewhere in the database?

INDIRECT LINK CREATOR LIST
• Derivation from the object graph of the LINK file (2 hop rule) by the system



14. Can a programmer at Tatoeba retrieve the following information from the database.

• What - in the DIRECT LINK CREATOR LIST - has initiated the creation of every individual connection in the database?
• What - in the INDIRECT LINK CREATOR LIST - was responsible for the existence of every individual indirect connection in the database?

TRANG TRANG 6 days ago May 21, 2020 at 1:15 PM link Permalink

9. We don't display a sentence as indirect translation if it is also direct translation.

The sentences are stored using a graph structure, but they are displayed using a table structure. When we turn the graph into a table, we have to make some choices.

Yes, technically speaking, if A = B and B = D and A = D, we could display D both in the list of "Translations" and the list of "Translations of translations". But we choose not to because it's not really useful.
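The display choice for the A = B, B = D, A = D case can be sketched as a set difference. This is only an illustration with a hypothetical `translation_lists` helper, not the code behind the sentence page.

```python
# Direct links of the triangle A = B, B = D, A = D.
links = {
    "A": {"B", "D"},
    "B": {"A", "D"},
    "D": {"A", "B"},
}

def translation_lists(links, sid):
    """Return (direct, reachable in two hops, shown as indirect)."""
    translations = set(links[sid])
    reachable = set()
    for t in translations:
        reachable |= links[t]
    reachable.discard(sid)
    # Anything already shown as a direct translation is left out.
    of_translations = reachable - translations
    return translations, reachable, of_translations

translations, reachable, shown_indirect = translation_lists(links, "A")
```

For sentence A, both B and D are reachable in two hops, yet the "Translations of translations" list stays empty because both already appear as direct translations.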

10/11. The core of Tatoeba is those two files: sentences and links. If you look at the rest of the files, you'll see there's more.

The exact structure of Tatoeba's database is described here:
https://tatoeba.org/eng/wall/sh...#message_35234

12. Links are created when someone adds a translation or when an advanced contributor clicks the link button.

https://en.wiki.tatoeba.org/art.../intro-linking

13. Indirect links are solely derived from two hop derivations.

14. Your question is unclear. We can retrieve who has created the link. We can (but not always) retrieve if it was created by clicking on the "translate" button or by clicking on the "link" button.

mramosch mramosch 5 days ago, edited 5 days ago May 21, 2020 at 9:00 PM, edited May 21, 2020 at 9:02 PM link Permalink

13. So if I create a new sentence and there is no translation available yet, but I do indeed see some similar sentences that offer the opportunity for being useful as indirect links, there is no way of doing this explicitly because of the two-hop-rule?

So what is the procedure to achieve this, how do I place my new sentence two hops away from all the potential candidates for getting an indirect link?

10./11. Gonna have a look at those links, thanks!

5. (after Alan's answer) ???

7./8. (after Alan's answer) ???

————

CK CK 5 days ago, edited 5 days ago May 21, 2020 at 9:10 PM, edited May 21, 2020 at 9:11 PM link Permalink

One possibility is to leave a comment like I left on this sentence.

[#2645879] Tom has stopped crying. (CK) *audio*


Related:

[#6355107] Tom isn't crying anymore. (CK) *audio*


Eventually, perhaps these are related closely enough that some language can use the same sentence as a translation for both and then they will become indirectly-linked to each other.

mramosch mramosch 5 days ago May 21, 2020 at 9:21 PM link Permalink

I understand, but I would rather like to know a way to achieve this right away instead of waiting for better times.

Thanuir Thanuir 5 days ago May 22, 2020 at 7:00 AM link Permalink

The more links there are in the database, the more indirect links arise.

So in general you can simply translate as many sentences as you are able to, in as many ways as you are able to.

If you want a translation for a particular sentence, you can ask someone who translates from that language to translate it.

TRANG TRANG 5 days ago May 21, 2020 at 9:47 PM link Permalink

I will have to leave it to the rest of the community to answer questions that are still pending.

Note that we have a dev website where you can experiment as much as you need and are free to pollute the database with test sentences and test translations.

https://dev.tatoeba.org/

You can register a new account. If you need to be granted advanced contributor status over there so that you can use the linking feature, let me know what your account is.

rumpelstilzchen rumpelstilzchen 4 days ago May 22, 2020 at 5:09 PM link Permalink

> how do I place my new sentence two hops away from all the potential candidates for getting an indirect link?

You create a link between your new sentence and a sentence which has a direct link (i.e. it is "one hop away") to the "potential candidate".
But I don't understand why you want to have a sentence "two hops away" from another one. Can you give a concrete example?

> 5. Does this happen sometimes with the Tatoeba database - and if yes, is this always due to a bug or are there other possible reasons. Can such an inconsistency make its way until the final LINK output file?

I've found the following pairs in the current links.csv where only one part of the link is recorded in the database:
#247164 #5078553
#1423834 #3214227
#1918219 #1918220
#1918235 #1918236
#1943238 #3752771
#1943243 #3075927
#1943259 #3942190
#1943259 #3942318
#1943259 #3942320
#1943259 #3942329
#1943259 #3942351
#3778082 #5094901
#5755767 #7207453
#5755769 #7207455
#5850721 #5868889

Without further investigation I guess these inconsistencies are results of a bug (already fixed or still in the code).
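A check of this kind can be sketched in a few lines. The sketch assumes that links.csv is a tab-separated file with two integer columns; the function names are hypothetical, and the sample pairs below are made up for the demonstration rather than taken from the list above.

```python
import csv

def read_links(path):
    """Read (sentence_id, translation_id) rows from a tab-separated file."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(int(a), int(b)) for a, b in csv.reader(f, delimiter="\t")]

def one_way_links(link_rows):
    """Return the rows (a, b) whose reverse row (b, a) is missing."""
    rows = set(link_rows)
    return sorted((a, b) for (a, b) in rows if (b, a) not in rows)

# Made-up sample: 1-2 and 5-6 are complete pairs, 3 -> 4 has no reverse row.
sample = [(1, 2), (2, 1), (3, 4), (5, 6), (6, 5)]
missing_reverse = one_way_links(sample)
```

On a real download one would call `one_way_links(read_links("links.csv"))`, assuming the file layout matches.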

> 7./8. Well I am sure that the server is index optimized for that application but if somebody wants to work with the two downloadable offline output files of the database would the methods I described above be the starting point for a query and I would have to write any optimization myself?

You don't need the optimization if you don't care that the query takes a little bit longer.
You could also import the data into a local database.
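As a rough sketch of the local-database route, the links file could be loaded into SQLite and indexed, after which a two-hop query is a single self-join. The tab-separated layout, the index name and the `import_links` helper are assumptions; a `StringIO` object stands in for the downloaded file so that the example is self-contained.

```python
import csv
import io
import sqlite3

def import_links(conn, fileobj):
    """Load tab-separated link rows and index them by sentence_id."""
    conn.execute("CREATE TABLE IF NOT EXISTS links "
                 "(sentence_id INTEGER, translation_id INTEGER)")
    reader = csv.reader(fileobj, delimiter="\t")
    conn.executemany("INSERT INTO links VALUES (?, ?)",
                     ((int(a), int(b)) for a, b in reader))
    conn.execute("CREATE INDEX IF NOT EXISTS links_by_sentence "
                 "ON links (sentence_id)")
    conn.commit()

# Two hops away from :sid, excluding :sid itself and its direct translations.
INDIRECT_SQL = """
    SELECT DISTINCT l2.translation_id
    FROM links AS l1
    JOIN links AS l2 ON l2.sentence_id = l1.translation_id
    WHERE l1.sentence_id = :sid
      AND l2.translation_id != :sid
      AND l2.translation_id NOT IN (
          SELECT translation_id FROM links WHERE sentence_id = :sid)
    ORDER BY l2.translation_id
"""

conn = sqlite3.connect(":memory:")
sample = io.StringIO("1\t2\n2\t1\n2\t3\n3\t2\n")  # stand-in for links.csv
import_links(conn, sample)
indirect = [row[0] for row in conn.execute(INDIRECT_SQL, {"sid": 1})]
```

With the index in place, the database only looks up the rows of the sentences involved instead of scanning every link for each query.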

mramosch mramosch 4 days ago, edited 4 days ago May 22, 2020 at 8:31 PM, edited May 22, 2020 at 8:38 PM link Permalink

Thanks a lot for chiming in.

13. If I create a new sentence from scratch in my own language then there is no translation (direct link) available yet. OK?

I do however see some similar sentences that I want to show up as indirect links in the translation list beneath my naked new sentence, to indicate that they are ‘only similar’ - that’s what indirect links do.

However, according to the two-hop rule there is no indirect link without a direct link in between - translation of a translation!

So what hack of linking and unlinking do I have to perform to have a similar sentence show up as indirect link in the list of translations without having a directly linked translation yet?

Or is there a (legal) way of doing this explicitly?

5. I was wondering if there is a special reason behind duplicating/splitting every edge into two links if they only get created and destroyed in pairs anyway?

Instead of having two links
• A==B
• B==A

I could easily get along with
• A==B
and just read it out twice, the second time just in reverse (back-to-front).

In this case the links-file would only be half the size and inconsistency would be impossible.
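The single-edge idea can be sketched by storing each pair once in a canonical order (smaller ID first) and reading it in both directions. This is only an illustration with hypothetical helper names, not a proposal for how Tatoeba's export should look.

```python
def canonical(a, b):
    """Order a pair so that each connection has exactly one representation."""
    return (a, b) if a < b else (b, a)

def to_undirected(link_rows):
    """Collapse directed link rows into a set of undirected edges."""
    return {canonical(a, b) for a, b in link_rows}

def neighbours(edges, sid):
    """Read every edge in both directions to find the translations of `sid`."""
    return {b if a == sid else a for a, b in edges if sid in (a, b)}

rows = [(1, 2), (2, 1), (3, 4)]  # (3, 4) has no reverse row
edges = to_undirected(rows)
```

A half connection such as 3 -> 4 becomes an ordinary edge in this representation, so the kind of one-way inconsistency discussed in point 5 cannot be expressed at all.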

7./8. I just wanted to make sure whether the two downloadable files give me any bells and whistles or whether I am all alone with the pure basic data and my approach of querying the whole list is the right starting point before any optimization should kick in.

Querying the whole set several times just seems to be so ridiculously expensive.

rumpelstilzchen rumpelstilzchen 4 days ago May 23, 2020 at 5:03 AM link Permalink

> I do however see some similar sentences that I want to show up as indirect links in the translation list beneath my naked new sentence, to indicate that they are ‘only similar’ - that’s what indirect links do.

Do they? I may be wrong (I'm rather new to Tatoeba), but I don't think indirect links are supposed to indicate similarity between sentences.
As I understand it, they are only shown because they help in finding indirect translations that could be turned into direct translations.
I guess your use case is described in https://github.com/Tatoeba/tatoeba2/issues/1902, except that you want similarity between different languages.

(I would still be interested in a concrete example from Tatoeba's corpus.)

> So what hack of linking and unlinking do I have to perform to have a similar sentence show up as an indirect link in the list of translations without having a directly linked translation yet?

Since indirect links don't exist in the database and are only calculated, there is no way to do that without the direct link.
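
As a rough illustration of "only calculated", here is a sketch in Python. It assumes the direct links are already available as a dictionary of sets, and that an indirect link is any translation of a translation that is neither the sentence itself nor already a direct translation. The function name and the sample ids are hypothetical; this is not the code Tatoeba runs.

```python
def indirect_links(adjacency, sentence_id):
    """Return the ids reachable in exactly two hops (translations of translations)."""
    direct = adjacency.get(sentence_id, set())
    two_hop = set()
    for neighbor in direct:
        two_hop |= adjacency.get(neighbor, set())
    # Drop the sentence itself and anything that is already a direct link.
    return two_hop - direct - {sentence_id}


# Hypothetical direct links: 1 <-> 2 and 2 <-> 3; sentence 4 has no links at all.
adjacency = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}
from_linked = indirect_links(adjacency, 1)
from_unlinked = indirect_links(adjacency, 4)
```

Sentence 4 has no direct link, so the loop never runs and nothing can show up as indirect for it.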

> 5. I was wondering if there is a special reason behind duplicating/splitting every edge into two links if they only get created and destroyed in pairs anyway?

Good question :-) I don't think that's necessary but that part of the code was written in 2010 so I don't know the reason: https://github.com/Tatoeba/tato...52525f1R45-R47

> Querying the whole set several times just seems to be so ridiculously expensive.

That's why I would import the data into a local database.
Did you write a program or script for querying the data? I guess the most time-consuming part is reading in all the data.
And yes, the two files only contain the pure data, no bells and whistles included.
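
A minimal sketch of such an import with Python's built-in `sqlite3` module. It assumes the sentences file has tab-separated columns id, language, text and the links file has tab-separated columns sentence id, translation id; the table names, function names and sample rows are made up for illustration.

```python
import csv
import sqlite3


def create_db(path=":memory:"):
    """Create the two tables; the primary keys double as lookup indexes."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentences "
        "(id INTEGER PRIMARY KEY, lang TEXT, text TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links "
        "(sentence_id INTEGER, translation_id INTEGER, "
        "PRIMARY KEY (sentence_id, translation_id))"
    )
    return conn


def read_tsv(path):
    """Yield rows from a tab-separated file without treating quotes specially."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def load(conn, sentences, links):
    """Insert rows; both arguments can be lists or read_tsv(...) generators."""
    conn.executemany("INSERT OR REPLACE INTO sentences VALUES (?, ?, ?)", sentences)
    conn.executemany("INSERT OR IGNORE INTO links VALUES (?, ?)", links)
    conn.commit()


def translations(conn, sentence_id):
    """Look up the direct translations of one sentence via the index."""
    rows = conn.execute(
        "SELECT s.id, s.lang, s.text FROM links l "
        "JOIN sentences s ON s.id = l.translation_id "
        "WHERE l.sentence_id = ? ORDER BY s.id",
        (sentence_id,),
    )
    return rows.fetchall()


# Made-up sample rows; with real files, pass read_tsv("sentences.csv") etc.
conn = create_db()
load(
    conn,
    [(1, "deu", "Hallo."), (2, "eng", "Hello."), (3, "fra", "Salut.")],
    [(1, 2), (2, 1), (2, 3), (3, 2)],
)
result = translations(conn, 2)
```

The files are read once during the import; after that, each lookup goes through the index instead of scanning the whole list again.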

AmarMecheri, May 21, 2020 at 10:54 PM

@mramosch
If someone translates the orphan German sentences into English or French, I could follow in Kabyle, and many others could do the same in their own language, provided that they understand the sentence well. That's my opinion, even though it could be seen as "unfair" and as an undue advantage for the world's most widely used languages. At the same time, I suggest that you follow our orphan Kabyle sentences where they are made visible by other translations. This could be mutually helpful for German and Kabyle sentences, and eventually for all the others.

mramosch, May 21, 2020 at 11:19 PM

> I suggest that you follow our orphan Kabyle sentences where they are made visible by other translations.

I am told we shouldn’t translate from languages that we don’t speak just by deducing the meaning from existing translations into languages that we know.

AmarMecheri, May 22, 2020 at 1:59 PM, edited May 22, 2020 at 2:17 PM

@mramosch
First, please note that I suggested reciprocity: German / Kabyle already translated and Kabyle / German already translated.
If you follow the reasoning that was suggested to you, most of the world's little-used languages will remain orphans, which goes against the spirit of a multilingual platform. Note that I don't mind too much; I'm used to racking my brain to understand a sentence after multiple cross-translations and painstaking research. If you see what I am getting at, I think you will end up agreeing with me.
Only idiomatic expressions can be problematic, and that difficulty can be overcome by written exchanges between two or more people who really want to work toward better mutual understanding. This was the case especially with @AlanF_US, who worked wonders to understand Kabyle idioms that only the initiated have mastered.

mramosch, May 22, 2020 at 2:15 PM, edited May 22, 2020 at 2:16 PM

So give me an example!

e.g. I find a Kabyle sentence that is directly linked to a German one.

How do you suggest I proceed from here, and for what goal?

AmarMecheri, May 22, 2020 at 2:21 PM

> e.g. I find a Kabyle sentence that is directly linked to a German one.
> How do you suggest I proceed from here, and for what goal?

If you read carefully, the answer is given in my comment above.

mramosch, May 22, 2020 at 2:25 PM, edited May 22, 2020 at 3:17 PM

I couldn’t figure it out; that’s why I was asking again.

The above sounded to me as if you were asking us to add a kab-ger translation by looking at existing kab-eng or kab-French translations...

So again:

e.g. I find a Kabyle sentence that is directly linked to a German one.

How do you suggest I proceed from here, and for what goal?

AmarMecheri, May 22, 2020 at 8:43 PM

And vice versa.

Ger-kab / Kab-ger
That's what I wrote above.

How? With the help of both French and English.

For what goal? For mutual comprehension.

mramosch, May 22, 2020 at 11:20 PM

I am still not sure whether I understand you correctly. If you want the German sentence of a ger-kab translation pair translated into English/French, in order to increase the visibility of Kabyle by creating more translations and links between Kabyle and English/French, then you must ask a native English/French speaker to translate these German sentences, not a German speaker. We only translate from English/French to German because that’s what we know well.

If I guessed incorrectly, could you describe step by step how I, as a native German speaker, can be of use in your endeavor?

AmarMecheri, May 24, 2020 at 1:24 PM

@mramosch

> We only translate from English/French to German because that’s what we know well.

That's exactly what I said!

But don't worry about the Kabyle language.
We are tenacious!

mramosch, May 24, 2020 at 11:12 AM

Clicking a ‘linked’ entry in the log of a sentence highlights the corresponding sentence in the list of translations. A very hidden feature, but I’m glad I found it ;-)

When a sentence is unlinked, the ‘linked’ entry remains in the log and an additional ‘unlinked’ entry appears in the list.

Having gotten used to the highlighting feature, I really miss the same behavior for the ‘unlinked’ log entries, provided, of course, that the unlinked sentence is still in the list of translations as an indirect link.

Even highlighting the indirect link when clicking an outdated ‘linked’ entry would be useful for my workflow.

Needless to say, the reverse would be excellent too: clicking on a sentence in the translation list would highlight all relevant entries in the log, or at least the last relevant ‘linked’ or ‘unlinked’ entry, provided the UI and automatic scrolling allow for that feature.

CK, May 23, 2020 at 11:32 PM

** Stats - 2020-05-23 - Native Speakers with Contributions **

http://tatoeba.ueuo.com/stats-2020-05-23.html