Hello, I'll ask a silly question that must have been asked a thousand times before, but still: is there any way to search for part of a word, I mean, for an affix, or even for a single letter inside a word? Would it be possible somehow to teach the search query how to work with asterisks? (I'm computer illiterate, you know.)
Hello lipao. No, it’s not possible the way you describe it, but you might want to have a look at this article: http://en.wiki.tatoeba.org/arti...w/text-search. Pretty much all you can do is described there.
OK, thank you.
I think what lipao is looking for is called a "wildcard search". It's especially useful to study root words and affixes. There are many search functions on that wiki page, but apparently using wildcards is not possible. Support for that in the future would be highly appreciated.
You can download the sentences in a given language and then search through the downloaded file. Naturally, that requires more effort, as well as knowledge of whichever program or programming language you use to process it, but it is an option in the absence of a wildcard search offered on the site itself.
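For anyone who wants to try the downloaded-file route, here is a minimal sketch in Python. It assumes each line of the file is tab-separated with the sentence text in the last column (check your own file, the layout may differ), and the function name wildcard_search is just something made up for this example. It treats * and ? as wildcards and matches them against single words, not against the whole sentence.

```python
import fnmatch
import re


def wildcard_search(lines, pattern):
    """Yield sentences that contain at least one word matching the wildcard pattern."""
    # fnmatch.translate turns a shell-style pattern such as "un*" into a regex.
    matcher = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    for line in lines:
        # Assumption: columns are tab-separated and the sentence is the last column.
        text = line.rstrip("\n").split("\t")[-1]
        words = re.findall(r"\w+", text)
        if any(matcher.match(word) for word in words):
            yield text


# Tiny made-up sample standing in for a downloaded file.
sample = [
    "1\teng\tShe is unhappy about it.\n",
    "2\teng\tHe was happy.\n",
    "3\teng\tThat was unkind.\n",
]
results = list(wildcard_search(sample, "un*"))
print(results)
```

With a real file you would pass an open file object instead of the sample list, for example open(path, encoding="utf-8"), where path is wherever you saved the download.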
I hadn't thought of that. Working on a huge file would definitely be much harder than having native support on Tatoeba, but it's better than nothing. Thank you.
It's supported now. Thanks to those who spent their time and energy to implement that feature.
Yes, many thanks to gillux, who made this change.
You can't search for a single letter (because this would make the index huge), but you can search for a string of three or more letters.
Searching for a single letter is obviously an extreme case, but some may want to search for two letters, depending on language.
I have also managed to run some wildcard searches with only two letters (each contains at least one non-English letter).
http://tatoeba.org/eng/sentence...rom=tur&to=und
http://tatoeba.org/eng/sentence...rom=tur&to=und
Anyway, allowing three letters would usually be sufficient.
In the first search you mentioned, "uş" is represented internally as three characters: u + s + cedilla. In the second search, "tı" is represented internally as two characters, but the second one is two bytes long (in UTF-8). So maybe the criterion is that the string needs to be at least three bytes long.
Probably the summary, for people who are not interested in the technical details, is that searches for strings of three characters will work, while searches for strings of two characters may or may not, depending on what they are. Searches for strings of one character probably will not work.
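If you want to check the character and byte counts mentioned above yourself, a small Python sketch like the one below will do it. It only counts things. How the search engine actually stores the text is a guess on my part, and the function name describe is made up for this example.

```python
import unicodedata


def describe(text):
    """Count code points (as typed and decomposed) and UTF-8 bytes in a string."""
    decomposed = unicodedata.normalize("NFD", text)  # splits letters such as ş into base + mark
    return {
        "code_points": len(text),
        "decomposed_code_points": len(decomposed),
        "utf8_bytes": len(text.encode("utf-8")),
    }


print(describe("u\u015f"))  # "uş" typed with the precomposed letter ş
print(describe("t\u0131"))  # "tı" with the dotless ı
print(describe("ab"))       # two plain ASCII letters for comparison
```

Both "uş" and "tı" come out at three UTF-8 bytes, while "ab" is only two, which fits the three-byte guess but does not prove it.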
You can only search for whole words. You have to search like this:
mi|vi|ni|ili|li|ŝi|Tom|Mary|Tomo|Manjo|Maria havas|havis|havos|havu|havus hundon|hundojn|virhundon|virhundojn|hundinon|hundinojn
I think you can search for more than 200 letters.
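Typing a long query like the one above by hand is tedious, so here is a rough Python sketch that builds one from word lists. The helper names build_query and verb_forms are made up for this example, and the only thing taken from the query above is its format: alternatives joined with | and word groups separated by spaces. The verb endings are the regular Esperanto ones that appear in that query.

```python
def verb_forms(stem):
    """Return the Esperanto verb forms used in the query above for a given stem."""
    return [stem + ending for ending in ("as", "is", "os", "u", "us")]


def build_query(*groups):
    """Join the alternatives in each group with | and the groups with spaces."""
    return " ".join("|".join(words) for words in groups)


subjects = ["mi", "vi", "ŝi", "Tom"]
objects = ["hundon", "hundojn"]
query = build_query(subjects, verb_forms("hav"), objects)
print(query)
print(len(query))  # handy if there turns out to be a length limit
```

The result can be pasted into the search box as it is.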