Search is finding sentences that do not contain the specified word. This was reported 24 days ago (see link 1 below), and @gillux made a change that he thought might have fixed the problem, but apparently it didn't. For instance, a general search for "Tom" in English (see link 2 below) brings up these sentences:
Damn.
#1078143
Beat it.
#37902
Damn you!
#1135061
We talked.
#2107672
How unfortunate!
#2111810
These sentences are owned by a variety of people, and the logs don't reveal anything that suggests that they ever contained the word "Tom". Nor do all the sentences contain a tag, or audio. However, they are all short. If I set the sort order to random or to longest sentences first, I don't see any false hits.
Could it be that the search engine has seen so many sentences with "Tom" that it now hallucinates them even in sentences that don't contain the word?
I reported this problem on GitHub as well:
https://github.com/Tatoeba/tatoeba2/issues/1944
Link 1: https://tatoeba.org/eng/wall/sh...#message_32265
Link 2: https://tatoeba.org/eng/sentenc...io=&sort=words
I had some similar issues within a day.
Yeah, seems like the same problem resurfaced.
"Where is the butter?"
https://i.imgur.com/846O0b5.png
I just searched again for "darn" as reported by brauchinet in the previous thread:
https://tatoeba.org/eng/sentenc...rom=eng&to=und
There are 12 results and 2 are incorrect:
#1202147 - I'm about to die.
#690834 - There are no comments for now.
Do you remember if those were the incorrect sentences when you also checked for this search?
(cf. https://tatoeba.org/eng/wall/sh...message_32269)
I'm sure that these were not the incorrect results I found the first time.
I noticed that when I use words from wrong search results I get wrong results again.
For example, deniko's "what the fuck" -> "Where is the butter?"
Take "butter":
https://tatoeba.org/eng/sentenc...rom=eng&to=und
Many errors and even "Get the fuck out!"
Edit:
I don’t know if this is of any help:
The “darn” search finds 12 results. I did a search using the downloadable data.
The sentences that the search engine fails to find (that is, wrong sentences are displayed in their place) are:
#643316 Darn!
#1359478 Well I'll be darned!
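A minimal sketch of that kind of cross-check, for anyone who wants to repeat it. It assumes the downloadable export is a tab-separated file with one sentence per line (ID, language code, text); `read_export`, `find_sentences`, and `compare` are hypothetical helper names, and the prefix match is only a crude stand-in for whatever stemming the search engine really does. The sample rows are the four sentences quoted in this post and the one before it.

```python
import csv
import io
import re


def read_export(handle):
    """Yield rows from a tab-separated export: id, language, text (assumed layout)."""
    return csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def find_sentences(rows, word, lang="eng"):
    """Return {id: text} for sentences in `lang` with a word starting with `word`."""
    pattern = re.compile(r"\b" + re.escape(word), re.IGNORECASE)
    hits = {}
    for row in rows:
        if len(row) < 3:
            continue  # skip malformed lines
        sentence_id, sentence_lang, text = row[0], row[1], row[2]
        if sentence_lang == lang and pattern.search(text):
            hits[int(sentence_id)] = text
    return hits


def compare(expected_ids, engine_ids):
    """Return (missing, spurious) as sorted lists of sentence IDs."""
    expected, engine = set(expected_ids), set(engine_ids)
    return sorted(expected - engine), sorted(engine - expected)


# The four sentences quoted in this thread, in the assumed export layout.
SAMPLE = (
    "643316\teng\tDarn!\n"
    "690834\teng\tThere are no comments for now.\n"
    "1202147\teng\tI'm about to die.\n"
    "1359478\teng\tWell I'll be darned!\n"
)

if __name__ == "__main__":
    expected = find_sentences(read_export(io.StringIO(SAMPLE)), "darn")
    # Pretend the engine returned only the two wrong hits reported above.
    missing, spurious = compare(expected, [690834, 1202147])
    print("missing:", missing)
    print("spurious:", spurious)
```

To run it against a real export, replace the `io.StringIO(SAMPLE)` handle with something like `open("sentences.csv", encoding="utf-8", newline="")` (the file name is an assumption) and pass the IDs copied from the search results page as `engine_ids`.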
To everyone: please report here any strange search results that you find. Posting the URL of the search is enough.
I really have no idea what is causing this right now. We need to find clues on how to reproduce it systematically.
The search below produces a wrong result: the search term does not appear in the sentence that is returned.
https://tatoeba.org/eng/sentenc...ery=histrionic
This search should have found:
#8110003 He became histrionic upon hearing the news.
Trang, I think that your complete reindexing ( https://github.com/Tatoeba/tato...ment-524564520 ) solved the problem. When I search for such words as "Tom", "darn", "histrionic", "away", or "butter", I only see good results now. Thank you!
Just a warning: this may solve the issue only for the short term. We have not identified the cause yet.
At least we know that the issue does not happen in the indexation of the main indexes. But something might be going wrong in the delta indexes, or during the merge of the delta indexes into the main indexes...
We will have to see if it happens again.
Sorry to say it has happened again:
https://tatoeba.org/eng/sentenc...&=Any+language
Most of (all of?) the wrong results are recently added sentences.
Search: The audience gasped
https://tatoeba.org/eng/sentenc...&=Any+language
A driver was blocking the intersection.
Search:
https://tatoeba.org/eng/sentenc...&=Any+language
This city suffers from gridlock.
and so on.
Thanks for reporting.
I updated the GitHub issue:
https://github.com/Tatoeba/tatoeba2/issues/1944
Still haven't identified the cause...
One more observation:
take a wrong search result (for example: totally lost):
https://tatoeba.org/eng/sentenc...rom=eng&to=und
#8130253 Tom read a book with his son.
go back 5 English sentences (5 times "previous" with language=eng)
and you get the correct result:
#8130246 Tom was totally lost.
I tried quite a few, it always worked.
(Well, it only works with the most recent sentences - it doesn't with older ones such as:
https://tatoeba.org/eng/sentenc...rom=eng&to=und
Kiev is the capital of Ukraine.)
Just curious: How did you figure that out?
Starting from one incorrect result sentence (e.g. Tom was totally lost), I got a chain of sentences:
#8130253 Tom read a book with his son
#8130259 He would always break his promise
#8130357 Tom was confused by what had happened
#8130381 Tom thought the attraction was mutual
Obviously the sentence numbers are increasing, but, to my disappointment, not by equal steps. This brought me to the idea that maybe only English sentences count.
Searching for the word 'origami' yields #8130386 (Tom reviewed his notes.)
https://tatoeba.org/eng/sentenc...sort=relevance
That instance follows the "brauchinet rule": pressing the "previous" button five times with the language set to "eng" gets you back to a sentence that contains the search word (in this case, #8130378 ).
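The rule can be written down as a small lookup, which may help when checking more cases. This is only a sketch of the observation, not an explanation of the bug: `step_back` is a hypothetical helper, and the IDs in `TOY_ENG_IDS` are made up for illustration (they are not real Tatoeba sentence numbers). It assumes you already have a sorted list of the IDs of English sentences.

```python
from bisect import bisect_left


def step_back(eng_ids, wrong_id, steps=5):
    """Return the ID `steps` positions before `wrong_id` in sorted `eng_ids`.

    Returns None if `wrong_id` is not in the list or there are fewer than
    `steps` English sentences before it.
    """
    index = bisect_left(eng_ids, wrong_id)
    if index == len(eng_ids) or eng_ids[index] != wrong_id:
        return None  # wrong_id is not an English sentence in this list
    if index < steps:
        return None  # not enough earlier sentences to step back over
    return eng_ids[index - steps]


# Made-up IDs: gaps stand for sentences in other languages.
TOY_ENG_IDS = [10, 11, 13, 14, 17, 18, 20]

if __name__ == "__main__":
    # Five English sentences before 20 are 18, 17, 14, 13, 11.
    print(step_back(TOY_ENG_IDS, 20))
```

Only English sentences are counted, which matches the observation that the ID differences themselves are not constant.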
I see. Thanks.
This search seems to have fixed itself. I do not see "Tom reviewed his notes" in the results...
I added translations for that sentence after reporting the issue on the Wall. When I rechecked the search link after several minutes, the irrelevant sentence was gone. Could they be connected? Maybe translating a misindexed sentence triggered something.
The search below finds no results:
https://tatoeba.org/eng/sentenc...rom=und&to=und
But there is actually a sentence containing these words:
https://tatoeba.org/eng/sentences/show/8130258
I suspect one cause:
At exactly 2019-08-25 17:03, eleven sentences were added simultaneously:
https://tatoeba.org/eng/sentences/show/8130248
https://tatoeba.org/eng/sentences/show/8130249
https://tatoeba.org/eng/sentences/show/8130250
https://tatoeba.org/eng/sentences/show/8130251
https://tatoeba.org/eng/sentences/show/8130252
https://tatoeba.org/eng/sentences/show/8130253
https://tatoeba.org/eng/sentences/show/8130254
https://tatoeba.org/eng/sentences/show/8130255
https://tatoeba.org/eng/sentences/show/8130256
https://tatoeba.org/eng/sentences/show/8130257
https://tatoeba.org/eng/sentences/show/8130258
Perhaps the system cannot correctly assign numbers to the sentences if too many of them are added at the same time.
This is because "?" is treated as a one-letter wildcard,
as in: https://tatoeba.org/eng/sentenc...rom=por&to=und
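Because a trailing "?" acts as a wildcard, one workaround when pasting a whole sentence into the search box is to strip sentence-final punctuation first. A minimal sketch: `clean_query` is a hypothetical helper, not part of Tatoeba, and stripping a trailing "!" as well is a precaution based on the assumption that it can interfere in a similar way.

```python
def clean_query(query):
    """Strip surrounding whitespace and any trailing '?' or '!' from a query."""
    return query.strip().rstrip("?! \t")


if __name__ == "__main__":
    print(clean_query("Where is the butter?"))
```

Punctuation inside the query is left alone on purpose, since only the final mark is known from this thread to cause trouble.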
Exactly. On this wiki page:
https://en.wiki.tatoeba.org/art...w/text-search#
you will find the following:
"Leave punctuation out of your search string. Most punctuation will be ignored, but a final exclamation mark (!) or question mark (?) will actually interfere with the search."