Tatoeba
Thanuir, April 3, 2020 at 7:00:52 AM UTC

I would like to have some example sentences using the words 'good-man' and 'good-wife'. Based on context the meaning might differ from that of 'a good man/wife'.

1. I added them to my vocabulary. However, the search engine seems to consider 'good man' and 'good-man' as identical expressions. Is there a way to add 'good-man' to my vocabulary?

Remark: In Finnish, when writing a compound word, if the first word ends and the last word begins with the same vowel, we write it thusly: 'ala-aste', 'linja-auto'. Writing 'ala aste' would mean two separate words (and typically be a mistake), though there might be cases where it would be correct language and would have a different meaning. As such, it would be nice if the search engine did not confuse these kinds of expressions, which mean different things.

2. I would appreciate examples of 'good-wife' and 'good-man' in the corpus. Examples illustrating whether these are the same as 'a good wife' (or man) would be particularly appreciated.

gillux, April 3, 2020 at 7:20:13 AM UTC

As you discovered, the search engine currently doesn’t distinguish between 'good-man' and 'good man'. If the hyphen were treated like a normal character, sentences with 'good-man' would no longer show up when searching for 'good', 'man' or any word sharing the same stem.

That said, our search engine (Manticore) has a feature I believe is exactly what we need: blended characters [1]. This feature would allow a sentence containing 'good-man' to be found by searching for 'good-man' as well as 'good' or 'man'.
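To make the three options concrete, here is a minimal toy sketch in Python. It is not Manticore's implementation; the functions `tokenize` and `matches` and the three mode names are hypothetical, made up for illustration only. It just models what happens when the hyphen is a separator (today's behaviour), an ordinary character, or a blended character.

```python
import re

def tokenize(text, mode):
    """Return the set of keywords a toy index would store for `text`.

    mode="separator": the hyphen splits words (the behaviour described above)
    mode="ordinary":  the hyphen is treated as part of the word
    mode="blended":   both the hyphenated whole and its parts are stored
    These mode names are illustrative, not Manticore settings.
    """
    text = text.lower()
    parts = set(re.findall(r"\w+", text))            # hyphen acts as a separator
    whole = set(re.findall(r"\w+(?:-\w+)*", text))   # hyphen kept inside the word
    if mode == "separator":
        return parts
    if mode == "ordinary":
        return whole
    if mode == "blended":
        return whole | parts
    raise ValueError(f"unknown mode: {mode}")

def matches(query, sentence, mode):
    """A sentence matches when every query keyword is among its keywords."""
    return tokenize(query, mode) <= tokenize(sentence, mode)

hyphenated = "The good-man came home."
separate = "He is a good man."

for mode in ("separator", "ordinary", "blended"):
    print(mode,
          matches("good-man", hyphenated, mode),
          matches("good-man", separate, mode),
          matches("good", hyphenated, mode))
```

In this toy model, only the blended mode lets 'good-man' find the hyphenated sentence without also matching 'good man', while a plain search for 'good' still finds both sentences. Stemming is left out entirely.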

Now my question is: should we treat the hyphen character as a blended character in all languages by default, or only in Finnish? I feel like other languages such as French or English could benefit from it, but I wonder if it could cause any harm in other languages I don’t know.

By the way, in Finnish, isn’t the colon character also used in a similar way, when you want to decline an abbreviation or something? I’m talking about this: https://en.wikipedia.org/wiki/C...ffix_separator

[1] https://docs.manticoresearch.co...ml#blend-chars

Thanuir, April 3, 2020 at 11:04:28 AM UTC

The Wikipedia article seems correct. For the definitive source, see https://www.kielikello.fi/-/kaksoispiste- . Note that the colon is not used to combine separate words. (The colon also has other uses, such as with quotes.)

Example of the genitive:
metri – metrin
m – m:n

Norwegian (Bokmål) uses a hyphen to the same effect, so the definite singular of 'tv' is 'tv-en'.
The same hyphen also seems to be used to join the parts of some compound words: https://no.wikipedia.org/wiki/Bindestrek

I think that blended characters would not help me here, but it might otherwise be a fine idea. I am not sufficiently knowledgeable about Manticore to have a strong opinion here.

Thanuir, April 3, 2020 at 11:08:26 AM UTC, edited April 3, 2020 at 11:08:46 AM UTC

On an unrelated note: Gmail indicated the email notification of your response as suspect. It was sent from the address trang.dictionary.project@gmail.com .

from: noreply <trang.dictionary.project@gmail.com>
to: *@gmail.com
date: Apr 3, 2020, 09:20
subject: Tatoeba - gillux has replied to you on the Wall
mailed-by: gmail.com
signed-by: gmail.com
security: Standard encryption (TLS) Learn more

Ricardo14, April 4, 2020 at 1:04:21 AM UTC

>On an unrelated note: Gmail indicated the email notification of your response as suspect. It was sent from the address trang.dictionary.project@gmail.com .

That happened to me too.

gillux, April 5, 2020 at 5:55:37 PM UTC

I recorded the issue: https://github.com/Tatoeba/tatoeba2/issues/2255

Thanuir, April 5, 2020 at 6:07:18 PM UTC

Thank you very much.