
soliloquist's messages on the Wall (total 90)

soliloquist
11 days ago
Thanks for your explanation.
soliloquist
12 days ago
What is your opinion on using the asterisk on Spanish vocabulary items? Would it be useful, or more likely cause confusion?

https://tatoeba.org/eng/wall/sh...#message_32418
soliloquist
12 days ago - 12 days ago
In Turkish, words usually end with suffixes and there are dozens of them.

https://en.wiktionary.org/wiki/...rkish_suffixes

Adding new vocabulary items in nominative or infinitive forms (as in dictionaries) would cause most examples not to appear with the default behavior. For example, if you added the word 'school' to your vocabulary items in English, you would see examples of 'my school', 'your school', 'to school', 'from school', 'at school' etc., but in Turkish these are all expressed with suffixes, so you would need to add a lot of different forms to see them. If your purpose were to learn new vocabulary rather than to study suffixes, this would make things difficult.

I'm quoting from @Thanuir's reply:

> On the other hand, maybe someone wants examples of a particular form of a word

That's right, but at least there could be some tips on the vocabulary page (similar to the ones on the advanced search page), informing users about the precision (and hence the limitation) of the current design, and the possible advantages of using an asterisk if they are not looking for only a particular form of a word. It could be used as a stemmer on many occasions. I had a look at the vocabulary items others added, but never saw one with an asterisk. Most users may not even be aware of it. I don't think they were all interested in only a particular form. It might be the case sometimes, but the other way around is more likely.

I encourage anyone who's interested in adding new vocabulary items with 4 or more letters in Turkish to use an asterisk: 'bare infinitive + *' for verbs, and 'nominative + *' for nouns and possibly for others. If it's a relatively long word ending with the letters p, ç, t, or k, even that last letter before the asterisk can be dropped to get more examples affected by consonant alternation, which is a common phenomenon in Turkish.
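As a rough illustration of the tip above, here is a minimal Python sketch that turns a dictionary form into a search pattern with an asterisk. The function name and both length thresholds are my own assumptions (the post only says "4 or more letters" and "relatively long"), and the function expects a verb to be given already as its bare form.

```python
# Hypothetical helper: not part of Tatoeba, just a sketch of the rule of thumb.
ALTERNATING_FINALS = "pçtk"  # final consonants that often change before a suffix


def wildcard_pattern(word: str, min_len: int = 4, long_len: int = 5) -> str:
    """Return a vocabulary search pattern for a Turkish dictionary form.

    min_len  -- shortest word that gets an asterisk (from the post: 4 letters)
    long_len -- assumed cut-off for a "relatively long" word
    """
    if len(word) < min_len:
        # Short words are left alone; a wildcard would match too much.
        return word
    stem = word
    if len(word) >= long_len and word[-1] in ALTERNATING_FINALS:
        # Drop the final consonant so that altered forms still match.
        stem = word[:-1]
    return stem + "*"
```

With these assumed thresholds, 'okul' becomes 'okul*', while 'kitap' becomes 'kita*', which would also cover forms such as 'kitabı'.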

To use the vocabulary feature more efficiently, other users can share similar tips about their languages on the Wall, too. What works for one language may not work with another.
soliloquist
14 days ago
The stemming function of the vocabulary feature needs to be improved, I believe.

https://tatoeba.org/eng/vocabulary/of/soliloquist

This is especially important for agglutinative languages like Turkish. I noticed it when I checked Ivanovb's vocabulary list (https://tatoeba.org/eng/vocabulary/of/Ivanovb ). Some of the words he listed actually have more examples in the corpus than their counts on the vocabulary page, but the lack of stemming causes inconsistency.

Search links on the vocabulary page put an equal sign before each word, so if it is a verb, sentences containing its other conjugations won't show up, and if it is a noun, sentences containing its other forms (singular or plural) won't show up. This precision often doesn't work well and makes learning new vocabulary inefficient (at least for Turkish).
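To make the difference concrete, here is a small Python sketch that counts sentences by exact word match versus prefix match. It is only a toy model with a naive tokenizer and made-up sample sentences; it does not use or reproduce Tatoeba's actual search engine.

```python
import re

# Illustrative sample sentences, not taken from the corpus.
SENTENCES = [
    "Okula gidiyorum.",
    "Okulum çok büyük.",
    "Bu bir okul.",
    "Okuldan geldim.",
]


def tokens(sentence: str) -> list[str]:
    """Naive tokenizer: lowercase the text and keep runs of word characters."""
    return re.findall(r"\w+", sentence.lower())


def count_exact(word: str, sentences: list[str]) -> int:
    """Count sentences containing the word exactly (like '=word')."""
    return sum(1 for s in sentences if word in tokens(s))


def count_prefix(stem: str, sentences: list[str]) -> int:
    """Count sentences containing any token starting with stem (like 'stem*')."""
    return sum(1 for s in sentences if any(t.startswith(stem) for t in tokens(s)))
```

On this sample, an exact match for 'okul' finds one sentence, while a prefix match finds all four.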
soliloquist
18 days ago - 18 days ago
> For instance, if I was the most active contributor of English audio and one day I decide I want to record only sentences containing "lol", do you then start adding sentences with "lol"?

I don't think 'Tom' and 'lol' are comparable, but if you recorded audio for sentences with some other short and easy-to-pronounce name instead of Tom, I would appreciate your efforts and might try to contribute sentences with that name. Why not? Having your sentences recorded by a native speaker is a good thing.


> I understand that there is the desire for more content. But please think about it this way: you are basically saying that whoever is the most active contributor gets to decide how the corpus looks like. It shouldn't be that way.

On the contrary, I'm not comfortable with it because thousands of ill-constructed, strange machine translations in Turkish have been added here for years arising from the desire for more content, and that problem is worse than what we're discussing here. The Tom issue looks like a first world problem compared to that.

But after all, that's the nature of it unless a ban or quota is imposed.


> I don't blame you at all for trying to increase your chances at having your sentences recorded. But this is your personal choice there. In this particular case, you see no harm with Tom sentences and you're fine creating such sentences. But other people do not feel that way, they are not okay to make the trade-off you are making.

You'll need to convince CK. He's the locomotive of this tradition. Other members using Tom in their sentences are rather like cars attached to the locomotive. Even if you convinced them, the locomotive could go at top speed, but without the locomotive, the trend wouldn't last long.


> In general, we should not have to care if our decisions displease those who have more influence, more power, more authority or more money.

No objections.
soliloquist
20 days ago
> It probably would solve the issue. But this is not an issue that should be handled only with CK. Because other people could keep propagating Tom and Mary. And other people could just create the same problem but with other names (although at this point, it's a bit difficult to beat Tom and Mary). The real point is to make people aware of how the choice of names impacts this project.

> But would you be bothered if we reversed the trend? If yes, what would bother you? Why would you be unhappy not seeing an abundance of Tom sentences anymore?

Although I'm not a native speaker, sometimes I create English sentences and I try to use the name Tom because CK frequently records audio for Tom sentences. I think we should take this factor into account. There are not many people contributing audio. That's my point.
soliloquist
20 days ago
> In fact, many English-speaking countries have so many inhabitants and/or so much influence from elsewhere that a variety of names better reflects the national culture.

You're right. I'm not against that. It's something you native speakers need to discuss and settle.
soliloquist
20 days ago
If the concern is about multiculturalism, we should encourage users to contribute original sentences about their culture using local names. Since many of the sentences here are translated from English (either as first-hand translations or translations of translations) and there is a contribute-in-your-native-language policy, English (American) names will likely continue to dominate the corpus in one way or another. The real point is that CK is the most active user here. If you convinced him to use different names, so names like Sam, Joe or Jane began to spread instead of Tom and Mary, would it solve the issue?

Another approach might be to encourage using personal pronouns rather than proper names, so sentences would be more culture-neutral.

I'm not bothered by the abundance of Tom sentences here. Tom is a short, well-known (thanks to Tom & Jerry), easy-to-write and easy-to-pronounce name. Besides, there are other websites using corpora from different sources, including Tatoeba, and not all of them properly give credit to the original sources. Tom sentences indirectly serve as a trademark in such places.
soliloquist
2019-07-18 20:50 - 2019-07-18 21:42
Thanks!

Edit: I made a suggestion.

https://github.com/Tatoeba/tato...ment-512998686
soliloquist
2019-07-18 18:29
Yes, that would be a great time-saver for you, too. You're the second most active user after CK in terms of tagging. (+182,756)

https://tatoeba.j-langtools.com...art/chart2.php
soliloquist
2019-07-18 14:02
I thought you were using some bot/script for the 'List 907' tag. It requires a lot of effort to tag that many sentences one by one. I, too, have thousands of sentences that need to be tagged, but it's discouraging having to visit each sentence's page.

Let's hope the mass-tagging feature will be implemented in the future.

https://github.com/Tatoeba/tatoeba2/issues/785
soliloquist
2019-06-24 14:04
> I think it would be terribly awesome if we had separate spaces for each language. Members having interest in that language, be it natives or learners, could reach out to one another, follow what’s going on, exchange messages, actively *use* that language etc.

How about a forum like this one?

https://forum.wordreference.com

https://tatoeba.org/eng/wall/sh...#message_31426
soliloquist
2019-06-05 19:58
> Maybe we should acknowledge that different translation types exist, and find some way to mark them in Tatoeba? Maybe we need different link types

I agree with this. Some users are in favor of using links for translations only, but in a linguistic sense, the definition of a link is not restricted to translations. Synonymous sentences are semantically linked, too. Sentences with similar patterns are also logically linked. What can be linked and what can't is somewhat relative and vague.

I think it would be more comfortable to have different types of links: one for translations, one for synonymous or closely-related sentences in the same language, and one for sentences with similar patterns (or localized sentences like #7947998 ) in the same language. I'm not sure how all these categories could be shown at the same time without creating visual chaos, though. Maybe some filtering/hiding options or a tree view with a collapsing/expanding feature might be necessary.

Here is a rough image of the idea.

https://prnt.sc/ny3d99
soliloquist
2019-06-03 16:22
Thanks for that! :-)
soliloquist
2019-06-02 13:45
When you sent me the link of that page a few days ago, Turkmen was not included on the list. But now it is. You should exclude Turkmen. Those self-declared native Turkmen accounts are fake. There has probably never been a native Turkmen speaker on Tatoeba, unfortunately.
soliloquist
2019-05-30 20:02
I sent you a PM.
soliloquist
2019-05-30 12:27 - 2019-05-30 12:37
Thank you. I'm aware that native speaker counts might not be so reliable, but they would at least give a rough picture.

The last stats show that there's a sudden boost in Turkmen this month. Nearly 50 self-declared native Turkmen accounts have been created recently.

https://tatoeba.org/eng/users/for_language/tuk

Unfortunately, I suspect that most, if not all of those accounts might belong to the same person, possibly a non-native speaker. I explained my suspicion here: https://tatoeba.org/eng/sentenc...omment-1098662

The real problem here isn't using multiple accounts or contributing in non-native languages, but falsely declaring oneself a native speaker. Turkmen sentences here shouldn't be trusted as a reliable source until this issue is cleared up.

Having native speaker counts in the stats would help us notice such unusual changes, too.
soliloquist
2019-05-29 20:00
Thanks for the data. Have you ever considered including numbers of native speakers for each language in the stats? Watching their development might be interesting. The file "user_languages.tar.bz2" on the downloads page has the necessary information for that, I believe.
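For what it's worth, here is a minimal Python sketch of how such counts could be derived. It assumes the extracted file is tab-separated with the columns language code, skill level, username and details, and that level 5 marks a native speaker; I haven't verified that layout, so treat both as assumptions. The sample rows and usernames are made up.

```python
import csv
import io
from collections import Counter

# Made-up rows in the ASSUMED layout: lang <TAB> skill_level <TAB> username <TAB> details
SAMPLE_ROWS = [
    "tur\t5\tuser_a\t",
    "tur\t3\tuser_b\t",
    "tuk\t5\tuser_c\t",
    "tuk\t5\tuser_d\t",
    "eng\t\\N\tuser_e\t",
]


def native_counts(lines, native_level: str = "5") -> Counter:
    """Count users per language whose skill level equals native_level."""
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        lang, level = row[0], row[1]
        if level == native_level:
            counts[lang] += 1
    return counts


def native_counts_from_text(text: str) -> Counter:
    """Convenience wrapper for file contents already read into a string."""
    return native_counts(io.StringIO(text))
```

Running this on each stats snapshot and comparing the results would make a sudden jump for one language easy to spot.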
soliloquist
2019-05-29 19:56
By the way, I don't think linking sentences in the same language is wrong as long as they convey the same idea. I find it rather useful for studying synonymous words and phrases (especially if they're not linked to a sentence in a different language). Most projects using Tatoeba's data focus on links between different languages, but Tatoeba could be used for monolingual studies, too. And the current linking system is perhaps the most suitable way to do it unless there are two types of links (one for translations and one for synonyms). Some users leave comments to show synonymous/closely-related sentences, but I don't find that very useful for searching and studying.

Finding such sentences could also be possible by linking them separately to the same sentence in a different language and searching for indirectly-linked results, but this is still not a translation-independent approach. One shouldn't have to find a matching sentence to be linked to in a foreign language before creating synonymous groups in his native language. This would likely end up treating languages like English or French as superior to other languages.
soliloquist
2019-05-28 14:10
Just copy and paste the sentence you want to link as a new translation (pay attention to flags). The duplicate-merging script will link them together.

This method is an alternative, indirect way of linking for normal contributors. It's a lot easier for advanced contributors. They can link sentences with a single click.