I would like to have some example sentences using the words 'good-man' and 'good-wife'. Based on context the meaning might differ from that of 'a good man/wife'.
1. I added them to my vocabulary. However, the search engine seems to consider 'good man' and 'good-man' as identical expressions. Is there a way to add 'good-man' to my vocabulary?
Remark: In Finnish, when writing a compound word, if the first word ends and the last word begins with the same vowel, we write it with a hyphen: 'ala-aste', 'linja-auto'. Writing 'ala aste' would mean two separate words (and would typically be a mistake), though there might be cases where it would be correct and have a different meaning. As such, it would be nice if the search engine did not confuse these kinds of expressions, which mean different things.
2. I would appreciate examples of 'good-wife' and 'good-man' in the corpus. Examples illustrating whether these are the same as 'a good wife' (or man) would be particularly appreciated.
As you discovered, the search engine currently doesn’t distinguish between 'good-man' and 'good man'. If the hyphen were treated like a normal character, sentences with 'good-man' would no longer show up when searching for 'good', 'man' or any word sharing the same stem.
That said, our search engine (Manticore) has a feature I believe is exactly what we need: blended characters. This feature would allow a sentence containing 'good-man' to be found by searching for 'good-man' as well as 'good' or 'man'.
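To illustrate the idea, here is a rough Python sketch of how a blended hyphen could behave. It is a simplified model and not Manticore's actual tokenizer: the names `index_tokens` and `matches` are made up for this example, and it ignores stemming entirely.

```python
import re

def index_tokens(text, blend_chars=""):
    """Return the set of tokens indexed for `text` (toy model, no stemming).

    Characters in `blend_chars` are treated both as separators and as
    regular characters, so 'good-man' yields 'good', 'man' and 'good-man'.
    """
    tokens = set()
    # A chunk is a run of word characters and blended characters.
    chunk_pattern = re.compile(r"[\w" + re.escape(blend_chars) + r"]+")
    for chunk in chunk_pattern.findall(text.lower()):
        # Index the individual words, as if the blended char were a separator.
        parts = re.findall(r"\w+", chunk)
        tokens.update(parts)
        # Also index the whole chunk when it really joins several words.
        if len(parts) > 1:
            tokens.add(chunk.strip(blend_chars))
    return tokens

def matches(query, sentence, blend_chars=""):
    """A sentence matches when it contains every token of the query."""
    return index_tokens(query, blend_chars) <= index_tokens(sentence, blend_chars)

# Hyphen as a plain separator: both spellings are indistinguishable.
print(matches("good-man", "He is a good man."))                    # True
# Hyphen as a blended character: the spellings are told apart...
print(matches("good-man", "He is a good man.", blend_chars="-"))   # False
print(matches("good-man", "The good-man came.", blend_chars="-"))  # True
# ...while searching for a single part still finds the compound.
print(matches("man", "The good-man came.", blend_chars="-"))       # True
```

In Manticore itself, this behaviour is configured through the `blend_chars` index setting rather than written by hand; the sketch is only meant to show what would and would not match.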
Now my question is: should we treat the hyphen character as a blended character in all languages by default, or only in Finnish? I feel like other languages such as French or English could benefit from it, but I wonder if it could cause any harm in other languages I don’t know.
By the way, in Finnish, isn’t the colon character also used in a similar way, when you want to decline an abbreviation or something? I’m talking about this: https://en.wikipedia.org/wiki/C...ffix_separator
The Wikipedia article seems correct. For the definitive source, see https://www.kielikello.fi/-/kaksoispiste- . Note that the colon is not used to combine separate words. (The colon also has other uses, such as with quotes.)
Example of the genitive:
metri – metrin
m – m:n
Norwegian (Bokmål) uses some kind of dash to the same effect, so that the definite singular of tv is tv-en.
The same kind of dash also seems to be used to bind words together in some compound words: https://no.wikipedia.org/wiki/Bindestrek
I think that blended characters would not help me here, but it might otherwise be a fine idea. I am not sufficiently knowledgeable about Manticore to have a strong opinion here.
On an unrelated note: Gmail indicated the email notification of your response as suspect. It was sent from the address firstname.lastname@example.org .
from: noreply <email@example.com>
date: 3 Apr 2020, 09:20
subject: Tatoeba - gillux has replied to you on the Wall
mailed-by: gmail.com
signed-by: gmail.com
security: Standard encryption (TLS)
>On an unrelated note: Gmail indicated the email notification of your response as suspect. It was sent from the address firstname.lastname@example.org .
That happened to me too.
I recorded the issue: https://github.com/Tatoeba/tatoeba2/issues/2255