Tatoeba
lipao (May 18, 2015 at 1:23:31 PM UTC, edited May 18, 2015 at 1:25:33 PM UTC)

Hello, I'll ask a silly question that must have been asked a thousand times before, but still: Is there any way to search for a part of some words, I mean, for an affix, or even for a single letter inside a word? Would it be possible somehow to teach the search query how to work with asterisks? (I'm computer illiterate, you know.)

gillux (May 18, 2015 at 6:45:45 PM UTC)

Hello lipao. No, it’s not possible the way you describe it, but you might want to have a look at this article: http://en.wiki.tatoeba.org/arti...w/text-search. Pretty much all you can do is described there.

lipao (May 18, 2015 at 7:04:44 PM UTC)

OK, thank you.

tornado (May 18, 2015 at 8:09:41 PM UTC)

I think what lipao is looking for is called a "wildcard search". It's especially useful to study root words and affixes. There are many search functions on that wiki page, but apparently using wildcards is not possible. Support for that in the future would be highly appreciated.

AlanF_US (May 18, 2015 at 10:37:05 PM UTC)

You can download the sentences in a given language and then search through the downloaded file. Naturally, that requires more effort, as well as knowledge of whichever program or programming language you use to process it, but it is an option in the absence of a wildcard search offered on the site itself.
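
For anyone who wants to try the download route, here is a minimal Python sketch of a wildcard search over an exported file. It assumes each line of the export is tab-separated as sentence id, language code, and text; the helper names `wildcard_to_regex` and `search_sentences` are made up for this example and are not part of any Tatoeba tool. The asterisk is treated as "any run of letters inside one word", which may or may not be what you want for a given language.

```python
import re

def wildcard_to_regex(pattern):
    """Turn a pattern such as 'un*able' into a whole-word regex."""
    # Escape the literal pieces and let each '*' match any run of word characters.
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)

def search_sentences(lines, lang, pattern):
    """Yield (id, text) for sentences in `lang` containing a matching word."""
    regex = wildcard_to_regex(pattern)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip lines that do not fit the assumed layout
        sentence_id, sentence_lang, text = fields
        if sentence_lang == lang and regex.search(text):
            yield sentence_id, text

# Small in-memory sample standing in for the downloaded file (assumed layout).
SAMPLE = [
    "1\teng\tThis is unbelievable.\n",
    "2\teng\tThe door is open.\n",
    "3\ttur\tKuş uçuyor.\n",
    "4\teng\tThat was unavoidable.\n",
]

matches = list(search_sentences(SAMPLE, "eng", "un*able"))
for sentence_id, text in matches:
    print(sentence_id, text)
```

With a real download, the same function can be fed an open file instead of the sample list, for example `open("sentences.csv", encoding="utf-8")`, where the file name is only a placeholder for whatever you downloaded.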

tornado (May 18, 2015 at 11:22:23 PM UTC, edited May 19, 2015 at 12:54:09 AM UTC)

I hadn't thought of that. Working on a huge file would definitely be much more difficult than using native support on Tatoeba, but it's better than nothing. Thank you.

tornado (June 27, 2015 at 1:10:25 PM UTC)

It's supported now. Thanks to those who spent their time and energy to implement that feature.

AlanF_US (June 27, 2015 at 3:09:39 PM UTC)

Yes, many thanks to gillux, who made this change.

You can't search for a single letter (because this would make the index huge), but you can search for a string of three or more letters.

tornado (June 27, 2015 at 4:28:20 PM UTC)

Searching for a single letter is obviously an extreme case, but some may want to search for two letters, depending on language.

I have also managed to conduct some wildcard searches that consist of two letters (they have at least one non-English letter).

http://tatoeba.org/eng/sentence...rom=tur&to=und
http://tatoeba.org/eng/sentence...rom=tur&to=und

Anyway, allowing three letters would usually be sufficient.

AlanF_US (June 27, 2015 at 5:37:57 PM UTC)

In the first search you mentioned, "uş" is represented internally as three characters: u + s + cedilla. In the second search, "tı" is represented internally as two characters, but the second one is two bytes long (in UTF-8). So maybe the criterion is that the string needs to be at least three bytes long.

Probably the summary, for people who are not interested in the technical details, is that searches for strings of three characters will work, while searches for strings of two characters may or may not, depending on what they are. Searches for strings of one character probably will not work.
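
The character and byte counts mentioned above can be checked with a few lines of Python. This is only a sketch for counting: it does not show which representation the search engine actually uses internally, and the helper name `describe` is invented for the example. It reports the length of each query in characters and in UTF-8 bytes, both in composed form (NFC) and in decomposed form (NFD).

```python
import unicodedata

def describe(text):
    """Return (characters, UTF-8 bytes) for the NFC and NFD forms of text."""
    result = {}
    for form in ("NFC", "NFD"):
        normalized = unicodedata.normalize(form, text)
        result[form] = (len(normalized), len(normalized.encode("utf-8")))
    return result

# "u\u015f" is "uş" and "t\u0131" is "tı"; escapes avoid any ambiguity
# about how the letters are encoded in this source text.
for query in ("u\u015f", "t\u0131", "ab"):
    print(query, describe(query))
```

Both two-letter Turkish queries come out at three bytes or more in either form, while a plain two-letter query such as "ab" is only two bytes, which is at least consistent with the guess that the limit is counted in bytes.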

danepo (May 19, 2015 at 7:31:03 AM UTC)

You can only search for whole words. You have to search like this:

mi|vi|ni|ili|li|ŝi|Tom|Mary|Tomo|Manjo|Maria havas|havis|havos|havu|havus hundon|hundojn|virhundon|virhundojn|hundinon|hundinojn

I think a search query can be longer than 200 characters.
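
Typing out such a query by hand gets tedious, so here is a small Python sketch that assembles one from lists of word forms. It assumes only what the example above shows: alternatives are separated by `|` and word positions by spaces. The function name `build_query` is hypothetical.

```python
def build_query(*groups):
    """Join the alternatives in each group with '|' and the groups with spaces."""
    return " ".join("|".join(group) for group in groups)

subjects = ["mi", "vi", "ni"]
verbs = ["havas", "havis", "havos"]
objects = ["hundon", "hundojn"]

query = build_query(subjects, verbs, objects)
print(query)  # mi|vi|ni havas|havis|havos hundon|hundojn
```

The result can be pasted into the search box as is.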