AlanF_US
2019-08-23 02:42 - 2019-08-23 02:49
Search is finding sentences that do not contain the specified word. This was reported 24 days ago (see link 1 below), and @gillux made a change that he thought might have fixed the problem, but apparently it didn't. For instance, a general search for "Tom" in English (see link 2 below) brings up these sentences:

Damn.
#1078143

Beat it.
#37902

Damn you!
#1135061

We talked.
#2107672

How unfortunate!
#2111810

These sentences are owned by a variety of people, and the logs don't reveal anything that suggests that they ever contained the word "Tom". Nor do all the sentences contain a tag, or audio. However, they are all short. If I set the sort order to random or to longest sentences first, I don't see any false hits.

Could it be that the search engine has seen so many sentences with "Tom" that it now hallucinates them even in sentences that don't contain the word?

I reported this problem on GitHub as well:

https://github.com/Tatoeba/tatoeba2/issues/1944

Link 1: https://tatoeba.org/eng/wall/sh...#message_32265

Link 2: https://tatoeba.org/eng/sentenc...io=&sort=words

Thanuir
2019-08-23 10:43
I had some similar issues within a day.
deniko
2019-08-23 14:44
Yeah, seems like the same problem resurfaced.

"Where is the butter?"

https://i.imgur.com/846O0b5.png
TRANG
2019-08-23 21:07
I just searched again for "darn" as reported by brauchinet in the previous thread:
https://tatoeba.org/eng/sentenc...rom=eng&to=und

There are 12 results and 2 are incorrect:

#1202147 - I'm about to die.
#690834 - There are no comments for now.

Do you remember if those were the incorrect sentences when you also checked for this search?
(cf. https://tatoeba.org/eng/wall/sh...message_32269)
brauchinet
2019-08-24 06:25 - 2019-08-24 08:12
I'm sure that these were not the incorrect results I found the first time.

I noticed that when I use words from wrong search results I get wrong results again.
For example, deniko's "what the fuck" -> "Where is the butter?"

Take "butter":
https://tatoeba.org/eng/sentenc...rom=eng&to=und
Many errors and even "Get the fuck out!"

Edit:
I don’t know if this is of any help:
The “darn” search finds 12 results. I did a search using the downloadable data.
The ones the search engine failed to find (that is, the ones displayed wrongly as other sentences) are:
#643316 Darn!
#1359478 Well I'll be darned!
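A check like this against the downloadable data could look roughly like the sketch below. It assumes the export is a tab-separated file with three columns (sentence ID, language code, text); the function name `find_word` and the `prefix` option are made up for illustration. The site's search appears to match word forms such as "darned" for "darn", so the `prefix` option is only a crude stand-in for that, not how the real engine works.

```python
import csv
import re

def find_word(lines, word, lang="eng", prefix=False):
    """Return (id, text) pairs in `lang` whose text contains `word`.

    `lines` is any iterable of tab-separated lines (an open file works).
    With prefix=True, words that merely start with `word` also match.
    """
    tail = r"\w*" if prefix else ""
    pattern = re.compile(r"\b" + re.escape(word) + tail + r"\b", re.IGNORECASE)
    hits = []
    # QUOTE_NONE: quotation marks inside sentences are ordinary characters.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 3:
            continue  # skip malformed lines (assumed layout: id, lang, text)
        sentence_id, sentence_lang, text = row
        if sentence_lang == lang and pattern.search(text):
            hits.append((int(sentence_id), text))
    return hits

# Small in-memory sample using sentences quoted in this thread.
sample = [
    "643316\teng\tDarn!",
    "1359478\teng\tWell I'll be darned!",
    "1202147\teng\tI'm about to die.",
    "690834\teng\tThere are no comments for now.",
]
print(find_word(sample, "darn", prefix=True))
```

Comparing the IDs returned this way with the IDs shown by the site's search gives the two lists: sentences that are missing from the results and sentences that should not be there.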
TRANG
2019-08-23 21:09
To everyone: please report here any strange search results that you find. Posting the URL of the search is enough.

I really have no idea what is causing this right now. We need to find clues on how to reproduce it systematically.
cojiluc
2019-08-24 07:37
The search below produces a wrong result, since the searched term is nonexistent in the result.

https://tatoeba.org/eng/sentenc...ery=histrionic
brauchinet
2019-08-24 07:54
This search should have found:
#8110003 He became histrionic upon hearing the news.
AlanF_US
2019-08-24 19:19
Trang, I think that your complete reindexing ( https://github.com/Tatoeba/tato...ment-524564520 ) solved the problem. When I search for such words as "Tom", "darn", "histrionic", "away", or "butter", I only see good results now. Thank you!
TRANG
2019-08-24 19:28
Just a warning: this may solve the issue only for the short term. We have not identified the cause yet.

At least we know that the issue does not happen during the indexing of the main indexes. But something might be going wrong in the delta indexes, or during the merge of the delta indexes into the main indexes...

We will have to see if it happens again.
brauchinet
2019-08-26 11:45 - 2019-08-26 12:00
Sorry to say it has happened again:

https://tatoeba.org/eng/sentenc...&=Any+language

Most of (all of?) the wrong results are recently added sentences.

Search: The audience gasped
https://tatoeba.org/eng/sentenc...&=Any+language
A driver was blocking the intersection.

Search:
https://tatoeba.org/eng/sentenc...&=Any+language
This city suffers from gridlock.

and so on.
TRANG
2019-08-26 14:55
Thanks for reporting.

I updated the GitHub issue:
https://github.com/Tatoeba/tatoeba2/issues/1944

Still haven't identified the cause...
brauchinet
2019-08-26 18:09 - 2019-08-26 19:10
One more observation:

take a wrong search result (for example: totally lost):
https://tatoeba.org/eng/sentenc...rom=eng&to=und

#8130253 Tom read a book with his son.
go back 5 English sentences (5 times "previous" with language=eng)
and you get the correct result:
#8130246 Tom was totally lost.

I tried quite a few, it always worked.

(Well, it only works with the most recent sentences - it doesn't with older ones such as:
https://tatoeba.org/eng/sentenc...rom=eng&to=und
Kiev is the capital of Ukraine.)
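For anyone who wants to test this offset against the downloadable data, here is a minimal sketch. It assumes you already have the English sentence IDs as a sorted list; the function name `shifted_back` and the toy IDs are invented for illustration, and the offset of 5 is only what was observed here, not a known property of the search engine.

```python
import bisect

def shifted_back(eng_ids, shown_id, offset=5):
    """Return the ID `offset` English sentences before `shown_id`.

    `eng_ids` must be sorted in ascending order. Returns None if
    `shown_id` is not in the list or there are fewer than `offset`
    English sentences before it.
    """
    pos = bisect.bisect_left(eng_ids, shown_id)
    if pos == len(eng_ids) or eng_ids[pos] != shown_id:
        return None  # the shown sentence is not an English sentence we know
    if pos < offset:
        return None  # not enough earlier English sentences
    return eng_ids[pos - offset]

# Toy data: gaps stand for sentences in other languages.
eng_ids = [10, 11, 14, 15, 17, 20, 23]
print(shifted_back(eng_ids, 20))  # ID five English sentences before 20
```

If the sentence returned this way contains the search term for most wrong results, that supports the idea that recent English sentences are shifted by a fixed number of positions in the index.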

AlanF_US
2019-08-27 15:20
Just curious: How did you figure that out?
brauchinet
2019-08-27 17:42
Starting from one incorrect result sentence (e.g. Tom was totally lost), I got a chain of sentences:
#8130253 Tom read a book with his son
#8130259 He would always break his promise
#8130357 Tom was confused by what had happened
#8130381 Tom thought the attraction was mutual

Obviously the sentence numbers are increasing, but, to my disappointment, not by equal steps. This brought me to the idea that maybe only English sentences count.
soliloquist
2019-08-26 20:23
Searching for the word 'origami' yields #8130386 (Tom reviewed his notes.)

https://tatoeba.org/eng/sentenc...sort=relevance
AlanF_US
2019-08-26 20:33
That instance follows the "brauchinet rule": pressing the "previous" button five times with the language set to "eng" gets you back to a sentence that contains the search word (in this case, #8130378 ).
soliloquist
2019-08-26 20:40
I see. Thanks.
TRANG
2019-08-27 19:05
This search seems to have fixed itself. I do not see "Tom reviewed his notes" in the results...
soliloquist
2019-08-27 19:18
I added translations for that sentence after reporting the issue on the Wall. When I rechecked the search link after several minutes, the irrelevant sentence was gone. Could they be connected? Maybe translating a misindexed sentence triggered something.
brauchinet
11 days ago
This is because "?" is treated as a one-letter wildcard,
as in: https://tatoeba.org/eng/sentenc...rom=por&to=und
AlanF_US
11 days ago
Exactly. On this wiki page:

https://en.wiki.tatoeba.org/art...w/text-search#

you will find the following:

"Leave punctuation out of your search string. Most punctuation will be ignored, but a final exclamation mark (!) or question mark (?) will actually interfere with the search."
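If you paste whole sentences into the search box, a small helper along these lines would follow that advice by dropping a final "!" or "?" first. This is only a sketch: `clean_query` is a made-up name, and it does nothing about punctuation elsewhere in the string, which according to the wiki is mostly ignored anyway.

```python
import re

def clean_query(query):
    """Strip trailing '!', '?' and whitespace from a search string."""
    return re.sub(r"[\s!?]+$", "", query)

print(clean_query("Where is the butter?"))
```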