Hi,
I just searched "chiamassero" for example sentences, and all the sentences showing the verb "chiamare" (vocabulary form) showed up.
In my personal language learning I need a thing like this: you give it a specific form like "chiamassero" or "chiameremmo", and that thing spits out "chiamare".
How is it done within Tatoeba? For which languages do you have this feature?
Thanks :)
For many languages, including Italian, our search engine (Manticore) has a feature called stemming. That means that it strips common endings from both the search words and the sentences against which it is looking for a match (unless you request an exact match by putting an equals sign before a search word). That's the behavior you saw when you searched for "chiamassero" and found sentences with "chiamare" (among other words, like "chiamarono"). The list of languages for which our search engine provides stemming can be found (see item 2) on this wiki page:
https://en.wiki.tatoeba.org/art...w/text-search#
You can get that link by clicking the "Help" link above the search bar.
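To make the stemming behavior described above concrete, here is a minimal Python sketch. It is a toy, not Manticore's or Snowball's actual algorithm: the suffix list and the names `toy_stem`, `tokenize`, and `matches` are made up for illustration, and the sample sentence is invented.

```python
import re

# Hypothetical, hand-picked endings for illustration only; the real Snowball
# Italian stemmer uses a much larger rule set.
TOY_SUFFIXES = ("assero", "eremmo", "arono", "are")

def toy_stem(word: str) -> str:
    """Strip the longest listed ending, keeping at least three letters."""
    word = word.lower()
    for suffix in sorted(TOY_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(sentence: str) -> list[str]:
    """Split a sentence into lowercase words, dropping punctuation."""
    return re.findall(r"\w+", sentence.lower())

def matches(query_word: str, sentence: str) -> bool:
    """A leading '=' requests an exact match; otherwise compare stems."""
    tokens = tokenize(sentence)
    if query_word.startswith("="):
        return query_word[1:].lower() in tokens
    target = toy_stem(query_word)
    return any(toy_stem(token) == target for token in tokens)

print(matches("chiamassero", "Devo chiamare un medico."))   # True
print(matches("=chiamassero", "Devo chiamare un medico."))  # False
```

Both the query and the sentence words are reduced to "chiam" before comparison, which is why the inflected query finds the infinitive. With the equals sign, no stripping happens and the match fails.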
Tatoeba doesn't have a facility for returning the dictionary form of an inflected word, but Wiktionary does. Simply type "chiamassero" into the search field, and you'll get this page:
https://en.wiktionary.org/wiki/chiamassero#Italian
with the text "third-person plural imperfect subjunctive of chiamare".
I've searched in Tatoeba's source code for stemming, and I've found that it uses Snowball, if I read the source right. My understanding is that "chiamassero" and "chiamare" and "chiameremmo" are all stemmed to "chiam". However, what I'd really want is for them to be stemmed to "chiamare", so my platform could look it up in a dictionary directly.
An example of a tool that does exactly what I want is MeCab. I give it a sentence, and it gives me back a list of words in their dictionary form, which I can look up directly in a dictionary.
Can I just look at the closest match in a dictionary for the stemmed form, for most languages, and get the correct word?
It is my understanding that Wiktionary's feature is based on manual contributions and is not algorithmic. Do you happen to know how extensive it is (for en.wiktionary)?
> I've searched in Tatoeba's source code for stemming, and I've found that it uses Snowball, if I read the source right.
Well, Tatoeba uses Manticore, and Manticore uses Snowball, so basically yes.
> My understanding is that "chiamassero" and "chiamare" and "chiameremmo" are all stemmed to "chiam". However, what I'd really want is for them to be stemmed to "chiamare", so my platform could look it up in a dictionary directly.
That's not stemming, but I can see how that would be a useful transformation.
> Can I just look at the closest match in a dictionary for the stemmed form, for most languages, and get the correct word?
You would need to write an algorithm to define "closest match" to match your needs. For instance, you could define things so that you always added "are" to the stemmed form, then looked for a match, and if that didn't work, you could add "ere", and so on. I don't imagine this being implemented within Tatoeba, but you could do it in software on your side.
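The try-each-ending idea above could be sketched in Python roughly like this. Everything here is an assumption for illustration: the function name `guess_dictionary_form`, the three infinitive endings, and the tiny word set standing in for a real dictionary.

```python
from typing import Iterable, Optional

# Assumed infinitive endings for Italian verbs, tried in this order.
ENDINGS = ("are", "ere", "ire")

def guess_dictionary_form(stem: str, dictionary: Iterable[str],
                          endings: Iterable[str] = ENDINGS) -> Optional[str]:
    """Return the first stem + ending found in the dictionary, else None."""
    known = set(dictionary)
    for ending in endings:
        candidate = stem + ending
        if candidate in known:
            return candidate
    return None

# Hypothetical stand-in for a real dictionary.
SAMPLE_DICTIONARY = {"chiamare", "credere", "dormire"}

print(guess_dictionary_form("chiam", SAMPLE_DICTIONARY))  # chiamare
print(guess_dictionary_form("cred", SAMPLE_DICTIONARY))   # credere
```

This only handles regular verbs whose stem plus a fixed ending is a dictionary entry. Nouns, adjectives, and irregular forms would need more rules or a different approach.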
> It is my understanding that Wiktionary's feature is based on manual contributions and is not algorithmic.
Yes, although I'm sure that people use algorithms to help them produce the manual contributions.
> Do you happen to know how extensive it is (for en.wiktionary)?
Well, for Russian, based on my experience, I estimate that for about 95% of relatively common inflected forms, someone has added a link to the dictionary form. Where that is not the case, it is often possible to find a link by looking at pages with similar words.
> That's not stemming, but I can see how that would be a useful transformation.
As is probably abundantly clear by now, I have no background in linguistics :P Sorry for the terminology barrier.
> You would need to write an algorithm to define "closest match" to match your needs. For instance, you could define things so that you always added "are" to the stemmed form, then looked for a match, and if that didn't work, you could add "ere", and so on. I don't imagine this being implemented within Tatoeba, but you could do it in software on your side.
Do you happen to know where I should be looking, or what the right term would be, to find a library or an API to do this in my place? I'm currently learning Japanese by myself because I've been blessed by MeCab, and that's what does all the heavy lifting in my platform :) but when I'm done with Japanese, I'd like to learn another language, possibly German, Swedish or Icelandic, not sure yet. I don't know what to search for - I just know it's something similar to what MeCab does for Japanese.
Lastly, an unrelated question. I see you're an admin so this goes right to the best person to answer it :)
I've tried translating a bunch of sentences into Italian. However, since I'm a bit of a shut-in and use almost only English online, I asked an Italian friend whether they sounded OK or not, and some sounded a bit unnatural.
That said, is it OK if I contribute anyway - for the sentences where I feel more confident I know a good translation - or is it counterproductive for the project to have translations which are bad?
> However, what I'd really want is for them to be stemmed to "chiamare", so
> my platform could look it up in a dictionary directly.
It seems what you want is lemmatization.
> Do you happen to know where I should be looking, or what the right term
> would be, to find a library or an API to do this in my place?
I would google things like "lemmatizer" or "NLP libraries" (NLP meaning "natural language processing").
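For comparison with stemming, a lemmatizer maps each inflected form to its dictionary form. A minimal lookup-table sketch in Python follows. The table entries are just the forms mentioned in this thread, the sample sentence is invented, and `LEMMA_TABLE` and `lemmatize_sentence` are hypothetical names, not the API of any real library. Real lemmatizers ship much larger tables or trained models.

```python
import re

# Tiny hand-written table: inflected form -> dictionary form.
LEMMA_TABLE = {
    "chiamassero": "chiamare",
    "chiameremmo": "chiamare",
    "chiamarono": "chiamare",
}

def lemmatize_sentence(sentence: str) -> list[str]:
    """Return one lemma per word; unknown words pass through unchanged."""
    words = re.findall(r"\w+", sentence.lower())
    return [LEMMA_TABLE.get(word, word) for word in words]

print(lemmatize_sentence("Se mi chiamassero, risponderei."))
# ['se', 'mi', 'chiamare', 'risponderei']
```

Unlike a stem such as "chiam", each result here is a word that can be looked up in a dictionary directly, provided the form is in the table.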
> That said, is it OK if I contribute anyway - for the sentences where I feel more
> confident I know a good translation - or is it counterproductive for the
> project to have translations which are bad?
Everyone will inevitably create bad translations at some point. It becomes counterproductive when the proportion of bad translations that you contribute becomes too high. It is difficult to really quantify this because there is no fixed threshold.
My advice is that you contribute as much as you want but always try your best for every translation you submit. If you have doubts about the quality and accuracy of your translation, then don't add it just yet. Take a bit more time to figure out what could be wrong, what could be improved.
If you are confident of your translation but it turns out to be a bad translation, then it's okay. We are not asking people to be perfect. You did your best, you are human, you can make mistakes. You "just" have to learn from your mistakes (and of course fix them).
> My advice is that you contribute as much as you want but always try your best for every translation you submit. If you have doubts about the quality and accuracy of your translation, then don't add it just yet. Take a bit more time to figure out what could be wrong, what could be improved.
That's definitely the base case. I wonder if I could ask for help from some Italian on tatoeba to see if my translations are good or not? If 19 out of 20 sound like what a native Italian would write or say, I can be confident in translating a few hundred more. :)
Please take a look at this page:
https://en.wiki.tatoeba.org/art...ow/non-native#
Yes, if you find an Italian speaker who is willing to check your sentences, that's great. Just make sure they can handle the volume of sentences that you write without getting overwhelmed.
Yep, that makes sense to me. But I am native. I've lived all my life in Italy, I'm still living in Italy, and I only plan on translating from English to Italian.
I would still like to validate that I'm writing natural-sounding sentences, because you could argue that, since most of the media I consume is not in Italian, I'm "rusty" in a way.
I see. That sounds good to me.
> I wonder if I could ask for help from some Italian on tatoeba to see if my
> translations are good or not?
Our main Italian contributor is Guybrush88 and I'm pretty sure he will be proofreading your translations as you add more of them. Otherwise, you can contact him via private message if you want to make sure he notices your contributions :)
https://tatoeba.org/eng/user/profile/Guybrush88
Thanks a lot! I was trying to think of a way to ask who to ask without being impolite, but I'll surely send him a message then :)
I'm not getting a reply - can I assume he's going to take a look at them anyway in his regular work on the website and tell me if the quality's not great?
Sorry for my late reply; I've been busy elsewhere these days. I took a look at some of your sentences and they seem perfect to me. When I have a bit more time, I'll reply to your PM properly.
Oh, no worries! No hurry at all - I didn't want to bother you; take your time with the reply :)