AlanF_US August 23, 2019 at 2:42 AM, edited August 23, 2019 at 2:49 AM

Search is finding sentences that do not contain the specified word. This was reported 24 days ago (see link 1 below), and @gillux made a change that he thought might have fixed the problem, but apparently it didn't. For instance, a general search for "Tom" in English (see link 2 below) brings up these sentences:

Damn.
#1078143

Beat it.
#37902

Damn you!
#1135061

We talked.
#2107672

How unfortunate!
#2111810

These sentences are owned by a variety of people, and the logs don't reveal anything that suggests that they ever contained the word "Tom". Nor do all the sentences contain a tag, or audio. However, they are all short. If I set the sort order to random or to longest sentences first, I don't see any false hits.

Could it be that the search engine has seen so many sentences with "Tom" that it now hallucinates them even in sentences that don't contain the word?

I reported this problem on GitHub as well:

https://github.com/Tatoeba/tatoeba2/issues/1944

Link 1: https://tatoeba.org/eng/wall/sh...#message_32265

Link 2: https://tatoeba.org/eng/sentenc...io=&sort=words

Thanuir August 23, 2019 at 10:43 AM

I had some similar issues within a day.

deniko August 23, 2019 at 2:44 PM

Yeah, seems like the same problem resurfaced.

"Where is the butter?"

https://i.imgur.com/846O0b5.png

TRANG August 23, 2019 at 9:07 PM

I just searched again for "darn" as reported by brauchinet in the previous thread:
https://tatoeba.org/eng/sentenc...rom=eng&to=und

There are 12 results and 2 are incorrect:

#1202147 - I'm about to die.
#690834 - There are no comments for now.

Do you remember if those were the incorrect sentences when you also checked for this search?
(cf. https://tatoeba.org/eng/wall/sh...message_32269)

brauchinet August 24, 2019 at 6:25 AM, edited August 24, 2019 at 8:12 AM

I'm sure that these were not the incorrect results I found the first time.

I noticed that when I use words from wrong search results I get wrong results again.
For example, deniko's "what the fuck" -> "Where is the butter?"

Take "butter":
https://tatoeba.org/eng/sentenc...rom=eng&to=und
Many errors and even "Get the fuck out!"

Edit:
I don’t know if this is of any help:
The “darn” search finds 12 results. I ran the same search against the downloadable data.
The sentences that the search engine failed to find (that is, the ones whose places were taken by wrong results) are:
#643316 Darn!
#1359478 Well I'll be darned!
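
For anyone who wants to repeat this kind of cross-check, here is a minimal sketch of searching the downloadable data offline. It assumes the sentences export is a tab-separated file with the columns id, language code, and text. The function name `find_sentences` and the prefix matching are illustrative only and do not reproduce how the site's search engine matches words.

```python
import csv
import io
import re


def find_sentences(tsv_file, word, lang="eng"):
    """Yield (id, text) for sentences in `lang` with a word starting with `word`.

    Assumption: each row is id <TAB> language code <TAB> text.
    Prefix matching is a crude stand-in for the site's own word matching.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\w*", re.IGNORECASE)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip malformed rows
        sentence_id, sentence_lang, text = row[0], row[1], row[2]
        if sentence_lang == lang and pattern.search(text):
            yield int(sentence_id), text


# Small in-memory sample built from sentences quoted in this thread.
sample = io.StringIO(
    "643316\teng\tDarn!\n"
    "1202147\teng\tI'm about to die.\n"
    "690834\teng\tThere are no comments for now.\n"
    "1359478\teng\tWell I'll be darned!\n"
)
hits = list(find_sentences(sample, "darn"))
print(hits)
```

With a real export, the same function could be given an open file handle instead of the in-memory sample. Comparing its ids with the ids shown on the search results page would reveal both the missing sentences and the wrong ones.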

TRANG August 23, 2019 at 9:09 PM

To everyone: please report here any strange search results that you find. Posting the URL of the search is enough.

I really have no idea what is causing this right now. We need to find clues on how to reproduce it systematically.

cojiluc August 24, 2019 at 7:37 AM

The search below produces a wrong result: the search term does not appear in the sentence returned.

https://tatoeba.org/eng/sentenc...ery=histrionic

brauchinet August 24, 2019 at 7:54 AM

This search should have found:
#8110003 He became histrionic upon hearing the news.

AlanF_US August 24, 2019 at 7:19 PM

Trang, I think that your complete reindexing ( https://github.com/Tatoeba/tato...ment-524564520 ) solved the problem. When I search for such words as "Tom", "darn", "histrionic", "away", or "butter", I only see good results now. Thank you!

TRANG August 24, 2019 at 7:28 PM

Just a warning: this may solve the issue only for the short term. We have not identified the cause yet.

At least we know that the issue does not happen in the indexation of the main indexes. But something might be going wrong in the delta indexes, or during the merge of the delta indexes into the main indexes...

We will have to see if it happens again.

brauchinet August 26, 2019 at 11:45 AM, edited August 26, 2019 at 12:00 PM

Sorry to say it has happened again:

https://tatoeba.org/eng/sentenc...&=Any+language

Most of (all of?) the wrong results are recently added sentences.

Search: The audience gasped
https://tatoeba.org/eng/sentenc...&=Any+language
A driver was blocking the intersection.

Search:
https://tatoeba.org/eng/sentenc...&=Any+language
This city suffers from gridlock.

and so on.

TRANG August 26, 2019 at 2:55 PM

Thanks for reporting.

I updated the GitHub issue:
https://github.com/Tatoeba/tatoeba2/issues/1944

Still haven't identified the cause...

brauchinet August 26, 2019 at 6:09 PM, edited August 26, 2019 at 7:10 PM

One more observation:

Take a wrong search result (for example, from a search for "totally lost"):
https://tatoeba.org/eng/sentenc...rom=eng&to=und

#8130253 Tom read a book with his son.
Go back 5 English sentences (press "previous" 5 times with language=eng)
and you get the correct result:
#8130246 Tom was totally lost.

I tried quite a few, and it always worked.

(Well, it only works with the most recent sentences - it doesn't with older ones such as:
https://tatoeba.org/eng/sentenc...rom=eng&to=und
Kiev is the capital of Ukraine.)
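
The observation above amounts to a fixed offset within the sequence of English sentence ids. Here is a minimal sketch of that lookup. The function name `probable_intended_match` and the sample ids are hypothetical, not real Tatoeba data, and the offset of 5 is only what was observed in this thread, not a confirmed property of the indexer.

```python
def probable_intended_match(english_ids, wrong_id, offset=5):
    """Return the id `offset` positions before `wrong_id` among the English ids.

    Returns None if `wrong_id` is not in the list or has fewer than
    `offset` English sentences before it.
    """
    ordered = sorted(english_ids)
    try:
        position = ordered.index(wrong_id)
    except ValueError:
        return None
    if position < offset:
        return None
    return ordered[position - offset]


# Hypothetical ids; the gaps stand for sentences in other languages,
# which the "previous" button skips when language=eng.
english_ids = [100, 101, 104, 105, 107, 110, 111]
intended = probable_intended_match(english_ids, 110)
print(intended)
```

Because the gaps between ids vary, subtracting a constant from the sentence number would not work. The lookup has to count positions within one language, which matches the unequal steps seen in the sentence numbers.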

AlanF_US August 27, 2019 at 3:20 PM

Just curious: How did you figure that out?

brauchinet August 27, 2019 at 5:42 PM

Starting from one incorrect result sentence (e.g. Tom was totally lost), I got a chain of sentences:
#8130253 Tom read a book with his son
#8130259 He would always break his promise
#8130357 Tom was confused by what had happened
#8130381 Tom thought the attraction was mutual

Obviously the sentence numbers are increasing, but, to my disappointment, not by equal steps. This led me to the idea that maybe only English sentences count.

soliloquist August 26, 2019 at 8:23 PM

Searching for the word 'origami' yields #8130386 (Tom reviewed his notes.)

https://tatoeba.org/eng/sentenc...sort=relevance

AlanF_US August 26, 2019 at 8:33 PM

That instance follows the "brauchinet rule": pressing the "previous" button five times with the language set to "eng" gets you back to a sentence that contains the search word (in this case, #8130378).

soliloquist August 26, 2019 at 8:40 PM

I see. Thanks.

TRANG August 27, 2019 at 7:05 PM

This search seems to have resolved itself. I do not see "Tom reviewed his notes" in the results...

soliloquist August 27, 2019 at 7:18 PM

I added translations for that sentence after reporting the issue on the Wall. When I rechecked the search link after several minutes, the irrelevant sentence was gone. Could they be connected? Maybe translating a misindexed sentence triggered something.

cojiluc October 7, 2019 at 5:11 PM, edited October 7, 2019 at 5:14 PM

The search below finds no results:

https://tatoeba.org/eng/sentenc...rom=und&to=und

But there is actually a sentence containing these words:

https://tatoeba.org/eng/sentences/show/8130258

I suspect one cause:

Eleven sentences were added simultaneously, at exactly 2019-08-25 17:03:

https://tatoeba.org/eng/sentences/show/8130248
https://tatoeba.org/eng/sentences/show/8130249
https://tatoeba.org/eng/sentences/show/8130250
https://tatoeba.org/eng/sentences/show/8130251
https://tatoeba.org/eng/sentences/show/8130252
https://tatoeba.org/eng/sentences/show/8130253
https://tatoeba.org/eng/sentences/show/8130254
https://tatoeba.org/eng/sentences/show/8130255
https://tatoeba.org/eng/sentences/show/8130256
https://tatoeba.org/eng/sentences/show/8130257
https://tatoeba.org/eng/sentences/show/8130258

Perhaps the system cannot correctly assign numbers to the sentences if too many of them are added at the same time.

brauchinet October 7, 2019 at 5:49 PM

This is because "?" is treated as a one-letter wildcard,
as in: https://tatoeba.org/eng/sentenc...rom=por&to=und

AlanF_US October 8, 2019 at 2:29 AM

Exactly. On this wiki page:

https://en.wiki.tatoeba.org/art...w/text-search#

you will find the following:

"Leave punctuation out of your search string. Most punctuation will be ignored, but a final exclamation mark (!) or question mark (?) will actually interfere with the search."
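
To illustrate why a trailing "?" changes the outcome, here is a minimal sketch. It uses Python's standard `fnmatch` module, where "?" also matches exactly one character, as an analogy only. This is not how the Tatoeba search engine is implemented, and the helper `clean_query` is a hypothetical name for the workaround the wiki recommends.

```python
import fnmatch


def clean_query(query):
    """Strip a trailing run of '!' and '?' (and stray whitespace) from a search string."""
    return query.strip().rstrip("!?").rstrip()


# With a one-character wildcard, "butter?" needs one more character after
# "butter", so the exact word no longer matches.
exact_matches = fnmatch.fnmatchcase("butter", "butter?")
longer_matches = fnmatch.fnmatchcase("butters", "butter?")
print(exact_matches, longer_matches)

cleaned = clean_query("Where is the butter?")
print(cleaned)
```

Pasting a whole sentence into the search box therefore works best with the final punctuation mark removed first.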