Tatoeba
CK, May 31, 2020 at 4:26:09 AM UTC

I think it might be a good idea to eliminate the question mark (?) wildcard for searching.

I think that users should be able to paste any sentence into the search form and get a result if that sentence is in the database. This doesn't happen now for sentences with question marks.

I think this benefit far outweighs the usefulness of the single-character wildcard.

If you're a member of this website and frequently use the ? wildcard, perhaps you could tell us how often and whether you'd be greatly inconvenienced if this were removed.

gillux, June 1, 2020 at 7:02:16 AM UTC

This is not something easy to implement at all. The question mark is not a searchable character in the first place.

CK, June 1, 2020 at 7:16:42 AM UTC (edited June 1, 2020 at 10:31:40 AM UTC)

How about just disabling the question mark as a wildcard and ignoring it in searches?

We can get both "Mr Smith" and "Mr. Smith" with this search.

https://tatoeba.org/eng/sentenc...uery=Mr.+Smith

Not getting results for sentences with question marks seems strange, since question marks are a part of language, and this is a website dealing with language.

It doesn't bother me that the question mark itself isn't searchable, but it does bother me that if you search for something as simple as the following, it appears that we don't yet have it.

Why are you doing that?
https://tatoeba.org/eng/sentenc...rom=und&to=und

It's not intuitive for new visitors that they have to avoid using question marks. I've been on this site a long time, and even I often have to search more than once because I forget to leave off the question marks.

Question marks are allowed in Google searches, which a lot of people are used to.
gillux, June 3, 2020 at 4:38:26 PM UTC (edited June 3, 2020 at 4:38:51 PM UTC)

This is probably not something easy to implement. In Manticore, the question mark and the star wildcards are one single feature. As a result, we can only enable both of them or neither.

Thanuir, June 1, 2020 at 9:31:01 AM UTC

I agree with CK on this. I'm not a power user of the search function, though.

orion17, June 2, 2020 at 8:48:45 AM UTC (edited June 2, 2020 at 8:49:30 AM UTC)

Please don't remove this feature. I use it every time I search for Arabic sentences, because Arabic is highly inflected: it uses not only suffixes, but also prefixes and infixes. The search engine used in Tatoeba doesn't seem to handle Arabic words perfectly. For example, when I search for sentences containing the word "علم", it only shows words with suffixes, like علمت, علموا, علمه, and so on. It doesn't show words like يعلم or تعلم. When I search for sentences containing "يعلم", it also shows words like يعلمون, but not words like تعلم, نعلم, or أعلم. But when I type "علم?", I find them.
This feature also helps me find words that follow the same pattern but have different roots, since Arabic is a Semitic language.
For example, it helps me search for sentences containing words with a pattern like مفعول. I can type this in the search box:
م??و?
and it'll show sentences containing words like مبروك مسرور مفتوح مشغول مجنون and so on.
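A minimal Python sketch of this kind of pattern search, assuming `?` stands for exactly one character and the pattern has to match a whole word. The helper `matching_words` and the word list are illustrative only; this is not Tatoeba's or Manticore's code.

```python
import re

def matching_words(pattern, words):
    """Return the words matching the whole pattern, where '?' is exactly one character."""
    regex = re.compile("".join("." if ch == "?" else re.escape(ch) for ch in pattern))
    return [w for w in words if regex.fullmatch(w)]

# The pattern from the post, built piece by piece so the character order is explicit:
# meem, two unknown letters, waw, one unknown letter.
pattern = "م" + "??" + "و" + "?"
words = ["مبروك", "مسرور", "مفتوح", "مشغول", "مجنون", "معلم", "كتاب"]
print(matching_words(pattern, words))
```

The five words from the post match, while the 4-letter معلم and the unrelated كتاب do not, because each `?` consumes exactly one letter.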

For your information, I requested this feature years ago (https://tatoeba.org/eng/wall/sh...#message_22849), and it has helped me a lot since then.

gillux, June 3, 2020 at 5:23:49 PM UTC

I have given this problem some more thought. I think we can make use of another feature of Manticore (blended characters) to make the search engine find the sentence that you want. Question marks used in the search query would be treated both as a wildcard and as a normal character. In other words, searching for "Is she?" would return the sentence "Is she?" as well as sentences containing "is shed". Given that shorter sentences appear first by default, copy-pasting an interrogative sentence into the search box should let you find it right away.
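A toy Python model of why this helps, under simplifying assumptions: without blending, `?` is dropped from indexed words, so the wildcard query word `she?` (four characters) cannot match the indexed `she`; with blending, `she?` is indexed as well, so the same wildcard also matches the literal question mark. The functions and the corpus are hypothetical, and Manticore's real handling of blended characters (its `blend_chars` setting) is more involved than this.

```python
import re

def index_tokens(sentence, blend_question_mark):
    """Toy indexer: the set of searchable tokens for one sentence."""
    text = sentence.lower()
    tokens = set(re.findall(r"\w+", text))  # '?' itself is not a searchable character
    if blend_question_mark:
        # Blended: also index a word together with an attached '?', e.g. "she?"
        tokens |= set(re.findall(r"\w+\?", text))
    return tokens

def query_regexes(query):
    """Each query word becomes a regex in which '?' means exactly one character."""
    words = re.findall(r"[\w?]+", query.lower())
    return [re.compile("".join("." if ch == "?" else re.escape(ch) for ch in w))
            for w in words]

def search(query, sentences, blend_question_mark):
    regexes = query_regexes(query)
    hits = [s for s in sentences
            if all(any(rx.fullmatch(t) for t in index_tokens(s, blend_question_mark))
                   for rx in regexes)]
    return sorted(hits, key=len)  # shorter sentences first

corpus = ["The hair is shed every spring.", "Is she?", "She is here."]
print(search("Is she?", corpus, blend_question_mark=False))
print(search("Is she?", corpus, blend_question_mark=True))
```

In this model the "is shed" sentence matches either way, while "Is she?" only shows up once the question mark is blended, and the length-based sort then puts it first.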

I will update dev with that setting tomorrow to let you guys try it out.

On a side note, the percent character (%) is also a wildcard that means "zero or one character". As a result, searching for "100%" currently matches sentences containing 1000.
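To make the three wildcards concrete, here is a small Python sketch that translates a search keyword into a regular expression. It assumes the semantics stated in this thread (`?` exactly one character, `*` any number of characters, `%` zero or one character) and whole-word matching; the function names are made up for illustration.

```python
import re

# Wildcard meanings as described in this thread (assumed, not taken from Manticore's source).
WILDCARDS = {
    "?": ".",   # exactly one character
    "*": ".*",  # any number of characters, including none
    "%": ".?",  # zero or one character
}

def keyword_regex(pattern):
    """Translate a search keyword containing wildcards into a compiled regex."""
    return re.compile("".join(WILDCARDS.get(ch, re.escape(ch)) for ch in pattern))

def keyword_matches(pattern, word):
    """True if the whole indexed word matches the keyword pattern."""
    return keyword_regex(pattern).fullmatch(word) is not None

for word in ["100", "1000", "10000"]:
    print(f"100% vs {word}: {keyword_matches('100%', word)}")
```

Under these assumptions "100%" matches both "100" and "1000", but not "10000", which is two characters longer than "100".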

gillux, June 4, 2020 at 8:03:46 AM UTC (edited June 4, 2020 at 8:03:51 AM UTC)

I updated https://dev.tatoeba.org/ with an experimental setting to solve the problem of not finding sentences when searching for interrogative sentences.

orion17, could you also try and let me know if it causes any problem when you search for "علم?" and the like?

orion17, June 4, 2020 at 1:12:26 PM UTC (edited June 4, 2020 at 1:16:03 PM UTC)

I have tried it, and it still works, except that I just wonder why the results on dev are fewer than those on main tatoeba.

I forgot to mention in my earlier post that when I searched for the verb "علم", it also showed sentences containing words with prefixes, like العلم, معلم, and so on. However, it showed neither the present tense forms (يعلم تعلم نعلم أعلم) nor the imperative form (اعلم).
Therefore, I need to use a question mark to find them. But that prevents me from finding these words when they also have suffixes, so I write *?علم* and it shows exactly what I want, such as يعلمون.
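The difference between the two queries can be sketched in Python, assuming whole-word matching with `?` as exactly one character and `*` as any run of characters. The word list is illustrative: a pattern with only the one-letter prefix wildcard requires the word to end right after the root, while `*?علم*` also allows suffixes.

```python
import re

def to_regex(pattern):
    # '?' = exactly one character, '*' = any run of characters (possibly empty)
    table = {"?": ".", "*": ".*"}
    return re.compile("".join(table.get(ch, re.escape(ch)) for ch in pattern))

def matching_words(pattern, words):
    rx = to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]

words = ["يعلم", "تعلم", "يعلمون", "علموا"]
prefix_only = "?" + "علم"                # one letter, then the root
prefix_and_suffix = "*?" + "علم" + "*"   # the query *?علم* from the post
print(matching_words(prefix_only, words))
print(matching_words(prefix_and_suffix, words))
```

The first pattern finds يعلم and تعلم only; the second also finds يعلمون. Neither finds علموا, because both require at least one letter before the root.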

gillux, June 4, 2020 at 4:18:15 PM UTC

> the results on dev are fewer than those on main tatoeba

The corpus of dev is an old copy of the main Tatoeba, so it only contains old sentences.

> it showed neither the present tense forms (يعلم تعلم نعلم أعلم) nor the imperative form (اعلم)

I suspect that’s because only words with 4 characters or more can be found using a different tense or form. "علم" is only 3 characters long.

This limitation is rather arbitrary; it was set because the search engine documentation suggests the stemmer may have problems dealing with short words (https://docs.manticoresearch.co...n-stemming-len). I think we could consider removing the 4-character limit.
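A minimal sketch of what a minimum stemming length does, assuming a deliberately naive stemmer that just drops a trailing "s" (hypothetical, not Manticore's stemmer). The `min_stemming_len` parameter only mirrors the name of the Manticore setting linked above: words shorter than the threshold bypass the stemmer, which protects short words like "is", but also means a 3-letter word such as علم is never stemmed.

```python
def toy_stem(word: str) -> str:
    # Hypothetical, deliberately naive stemmer: strip one trailing "s".
    return word[:-1] if word.endswith("s") else word

def normalize(word: str, min_stemming_len: int = 4) -> str:
    # Words shorter than the threshold skip the stemmer entirely.
    if len(word) < min_stemming_len:
        return word
    return toy_stem(word)

for w in ["cats", "gas", "is", "علم"]:
    print(w, normalize(w), normalize(w, min_stemming_len=1))
```

With the default threshold of 4, "gas" and "is" are left alone; lowering it to 1 lets the naive stemmer mangle them into "ga" and "i", which is the kind of problem the limit is meant to avoid.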

gillux, July 6, 2020 at 12:13:45 PM UTC

It’s now possible to search for entire questions on tatoeba.org.

Ricardo14, July 6, 2020 at 12:39:35 PM UTC

Whoa! Thank you!