Tatoeba
Thanuir, 3 April 2020 at 07:00:52 UTC

I would like to have some example sentences using the words 'good-man' and 'good-wife'. Based on context the meaning might differ from that of 'a good man/wife'.

1. I added them to my vocabulary. However, the search engine seems to consider 'good man' and 'good-man' as identical expressions. Is there a way to add 'good-man' to my vocabulary?

Remark: In Finnish, when writing a compound word, if the first word ends and the last word begins with the same vowel, we write it thusly: 'ala-aste', 'linja-auto'. Writing 'ala aste' would mean two separate words (and typically be a mistake), though there might be cases where it would be correct language and would have a different meaning. As such, it would be nice if the search engine did not confuse these kinds of expressions, which mean different things.

2. I would appreciate examples of 'good-wife' and 'good-man' in the corpus. Examples illustrating whether these are the same as 'a good wife' (or man) would be particularly appreciated.

gillux, 3 April 2020 at 07:20:13 UTC

As you discovered, the search engine currently doesn’t distinguish between 'good-man' and 'good man'. If the hyphen were treated like a normal character, sentences with 'good-man' would no longer show up when searching for 'good', 'man' or any word sharing the same stem.

That said, our search engine (Manticore) has a feature I believe is exactly what we need: blended characters [1]. This feature would allow a sentence containing 'good-man' to be found by searching for 'good-man' as well as 'good' or 'man'.
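To make the three options concrete, here is a toy sketch of how a hyphen could be handled as a separator, as a regular character, or as a blended character. It is only an illustration of the idea and not how Manticore actually tokenizes: the `tokenize` and `matches` helpers are hypothetical, it handles ASCII letters only, and it does no stemming.

```python
import re


def tokenize(text, mode):
    """Toy tokenizer; `mode` selects how the hyphen is handled.

    "separator": the hyphen splits words ('good-man' becomes 'good', 'man')
    "regular":   the hyphen is an ordinary character ('good-man' stays whole)
    "blended":   both: keep the whole token and also index its parts
    """
    text = text.lower()
    if mode == "separator":
        return re.findall(r"[a-z]+", text)
    tokens = []
    # A chunk is a run of letters, optionally joined by single hyphens.
    for chunk in re.findall(r"[a-z]+(?:-[a-z]+)*", text):
        tokens.append(chunk)
        if mode == "blended" and "-" in chunk:
            tokens.extend(chunk.split("-"))
    return tokens


def matches(query, sentence, mode):
    """True if every token of the query occurs among the sentence tokens."""
    indexed = set(tokenize(sentence, mode))
    return all(token in indexed for token in tokenize(query, mode))


sentence = "The good-man came home"
for mode in ("separator", "regular", "blended"):
    print(mode,
          matches("good", sentence, mode),
          matches("good-man", sentence, mode),
          matches("good-man", "A good man", mode))
```

In this sketch, only the blended mode finds the hyphenated sentence for both 'good' and 'good-man' while still keeping 'good-man' from matching 'a good man'.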

Now my question is: should we treat the hyphen character as a blended character in all languages by default, or only in Finnish? I feel like other languages such as French or English could benefit from it, but I wonder if it could cause any harm in other languages I don’t know.

By the way, in Finnish, isn’t the colon character also used in a similar way, when you want to decline an abbreviation or something? I’m talking about this: https://en.wikipedia.org/wiki/C...ffix_separator

[1] https://docs.manticoresearch.co...ml#blend-chars

Thanuir, 3 April 2020 at 11:04:28 UTC

The Wikipedia article seems correct. For the definitive source, see https://www.kielikello.fi/-/kaksoispiste- . Note that the colon is not used to combine separate words. (The colon also has other uses, such as with quotes.)

Example of the genitive:
metri – metrin
m – m:n

Norwegian (Bokmål) uses a hyphen to the same effect: the definite singular of tv is tv-en.
The same hyphen also appears to join the parts of some compound words: https://no.wikipedia.org/wiki/Bindestrek

I think that blended characters would not help me here, but they might otherwise be a fine idea. I am not sufficiently knowledgeable about Manticore to have a strong opinion here.

Thanuir, 3 April 2020 at 11:08:26 UTC, edited 3 April 2020 at 11:08:46 UTC

On an unrelated note: Gmail indicated the email notification of your response as suspect. It was sent from the address trang.dictionary.project@gmail.com .

from: noreply <trang.dictionary.project@gmail.com>
to: *@gmail.com
date: 3 Apr 2020, 09:20
subject: Tatoeba - gillux has replied to you on the Wall
mailed-by: gmail.com
signed-by: gmail.com
security: Standard encryption (TLS)

Ricardo14, 4 April 2020 at 01:04:21 UTC

>On an unrelated note: Gmail indicated the email notification of your response as suspect. It was sent from the address trang.dictionary.project@gmail.com .

That happened to me too.

gillux, 5 April 2020 at 17:55:37 UTC

I recorded the issue: https://github.com/Tatoeba/tatoeba2/issues/2255

Thanuir, 5 April 2020 at 18:07:18 UTC

Thank you very much.