Wall (5459 threads)

cojiluc
9 days ago
The search below produces an error. I tried several times.

https://tatoeba.org/eng/sentenc...sort=relevance
Seael
10 days ago
Is there a way to delete wrong vocab items such as the full sentences included in
https://tatoeba.org/eng/vocabul...ces/spa?page=7 ,
https://tatoeba.org/eng/vocabul...ces/spa?page=8 ,
https://tatoeba.org/eng/vocabul...ces/spa?page=9 ,
https://tatoeba.org/eng/vocabul...es/spa?page=10 and
https://tatoeba.org/eng/vocabul...es/spa?page=11 ?

The few sentences I checked there were all right and could be created but they are not vocabulary items.
Thanuir
10 days ago
Not that I know of, unless whoever placed the sentence into their vocabulary removes it.

But if the sentences are fine, why not add them to Tatoeba while you are at it?

(If the sentences were short, then one could add them as parts of dialogues or such, but those did not seem to be short.)
Seael
10 days ago - 10 days ago
Yes, I'll do that later. It nags me, though, that those items will never disappear, as they are not likely to be removed by the ones who added them.

The fact that they can't be removed also makes the feature susceptible to being clogged with this kind of item in the long term.
Thanuir
10 days ago - 10 days ago
Yes, it is annoying. The problem will become severe if some language is overwhelmed by such bad terms. With español this does not seem to be the case quite yet, as the desired vocabulary looks extensive.
Ricardo14
10 days ago
Happy birthday, Lisa!
Feliz aniversário, Lisa!
Joyeux anniversaire, Lisa !
¡Feliz cumpleaños, Lisa!

https://tatoeba.org/eng/user/pr...rsichbaeumchen
Pfirsichbaeumchen
10 days ago
Thank you so much, Ricardo! Vielen Dank! 😊
Hybrid
10 days ago
Happy birthday!
Pfirsichbaeumchen
10 days ago
Thank you! 😊
marafon
10 days ago
С днём рождения, Лиза!
Pfirsichbaeumchen
10 days ago
Спасибо, Марина! Очень рада. 😊
mraz
10 days ago
Liebe Pfirsichbaeumchen, liebe Lisa!

Boldog születésnapot kívánok!

Kívánok jó egészséget, sok szerencsét, boldogságot és
minden jót!

Herzlichen Glückwunsch zum Geburtstag!

mraz
Pfirsichbaeumchen
10 days ago
Vielen Dank für den ungarischen Geburtstagsgruß, lieber Ernő! 🙂
soweli_Elepanto
10 days ago
Feliĉan novan vian jaron, Lisa!
Pfirsichbaeumchen
10 days ago
Multan dankon, kara Eŭgeno! 😊
sharptoothed
11 days ago
*Tatoeba Top 30 Languages Interactive Graphs*

Tatoeba Top 30 Languages Interactive Graphs have been updated:
https://tatoeba.j-langtools.com/igraph/
https://tatoeba.j-langtools.com/igraph/share.html
JulieCasino
11 days ago
Thank you for this information!
Thanuir
14 days ago
*Why and how to add sentences with words others want to see*

This is a collection of my thoughts about this subject; maybe someone else is also interested.


1. Go to https://tatoeba.org/swe/Vocabulary/add_sentences/ and choose your native (or very strong) language.

2. You have basically two options: Start from words that are part of no or few sentences (the beginning of the list) or from those that are part of a bit less than ten sentences (the end of the list).

3. Write a sentence or several sentences that use the words.


Motivation:

a. You are adding content that someone wants to see, or at least wanted to see.

b. You are adding to the diversity of the corpus. Since these words are not present (in great quantities), adding more sentences that use them makes the corpus more diverse, by one reasonable measure of diversity.
Furthermore, you are probably adding sentences that you would not otherwise have added; this is a collaboration between two people, one requesting a word and another adding a sentence. This creates more diverse sentences.
If several people add sentences that use a particular word, they are likely to add different types of sentences, hence increasing the diversity even more.

c. You are adding original sentences in the language. In some smaller languages, many sentences are translations. Translated sentences typically have simpler or at least restricted grammar and other distinguishing features. (When searching for more on this, "translationese" is a good keyword to begin with.) Having original sentences in all languages is valuable for the corpus and contributes to the diversity and the quality of the corpus. Of course, having translations is valuable, too, for obvious reasons.

-----

A. Starting from the beginning of the list

Pros: You will be adding sentences that use words which are almost non-existent in the corpus.

Cons: You will meet the same word again and again. If the word appears in no sentences and you add one, then that word now has one sentence and you will face it again when contributing sentences to words with a single sentence only. And again when adding sentences to words with two sentences, etc.
(The page is not dynamic, but if you refresh it or open the next page, the words might have changed positions. Unless you open all the pages at once, this is likely to happen.)
Some languages have poor words - even phrases in other languages - at the beginning of the list. One has to scroll past them.


B. Starting from the end of the list, from words that appear in nine, eight, seven, etc. sentences

Pros: When you add a sentence that uses a word used nine times, it will (later and after a refresh) vanish from the page, since it is used in ten or more sentences. This makes the list shorter, which feels nice. Measurable progress.
The words with some sentences are less likely to be in the wrong language, misspelled, very obscure, etc., so it is easier to contribute sentences that use them.
You will not be meeting the same words again and again when going through the list, since you will increase the number of sentences that use the word, but will move towards translating words that are less and less used.

Cons: You are not adding as much to the diversity as you could be, at first.
---

Notes and tips:

All of the following are considered as different "words" for this feature: "horse", "horses", "can lead a horse to". I doubt capital letters matter, but I have not checked this with any rigour. Adding sentences with declined words will still contribute to the corpus and have all the other benefits explained above, but will not be counted by this feature of Tatoeba's user interface.
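
The exact-form counting described above can be sketched roughly as follows. This is only an illustration, not Tatoeba's actual implementation: the function names, the sample sentences, the case-insensitive matching, and the rule "wanted while fewer than ten sentences match" are all assumptions based on what is said in this thread.

```python
import re

# Assumed threshold: an item leaves the "sentences wanted" list
# once ten or more sentences contain it (as described in this thread).
WANTED_THRESHOLD = 10


def count_exact(term, sentences):
    """Count sentences that contain `term` as an exact word or phrase.

    Inflected forms do not count: "horses" is not a match for "horse".
    Ignoring case is an assumption; the thread leaves that unchecked.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    return sum(1 for sentence in sentences if pattern.search(sentence))


def still_wanted(term, sentences, threshold=WANTED_THRESHOLD):
    """Return True while the item has fewer than `threshold` exact matches."""
    return count_exact(term, sentences) < threshold


# Hypothetical sample corpus.
sentences = [
    "The horse is fast.",
    "Horses eat grass.",
    "You can lead a horse to water.",
]

for item in ("horse", "horses", "can lead a horse to"):
    print(item, count_exact(item, sentences), still_wanted(item, sentences))
```

In this sketch, "horse" and "horses" get separate counts, which is why a sentence with an inflected form does not help the base form leave the list.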

Simply ignore words that you do not want to add sentences to, for whatever reason. Sometimes I put in a bit of effort to figure out unknown words in my native language, sometimes not.

It is not terribly useful to add phrases that do not give any indication about the meaning of the word. Thus, adding a sentence like "What is teratology?" is not very helpful, as it only suggests that "teratology" is a noun. Likewise, "Jacob likes teratology." is not terribly helpful in this context. "Jacob likes teratology and other oddities." is better.

Similarly, adding variations of a sentence where the word in question has the same role is not very useful in this context. E.g. "Teratology is related to deformities." and "Teratology has to do with deformities." would both be reasonable sentences, but adding both of them as sentences that use the word "teratology" does not reveal more about that word. Thus, it might not be the ideal thing to do. (Adding both of them as translations of a suitable sentence would, on the other hand, be completely fine and a useful thing to do.)

I think it is healthy for a single member to add a sentence or a few to any given word, but trying to clear the entire list by oneself might not be helpful. One is likely to start contributing similar sentences to any given word after the initial inspiration is exhausted. Having several members contribute a sentence or two each would be better for diversity. But if there is only a single member who adds sentences in this way for a given language, then so be it.

If it is difficult to come up with a sentence, I often take a look at a small group of words in the list and try to create a sentence that uses two or even three of them. Constraints breed creativity.

If there is a foreign word on the list, one can (but need not) add a sentence about its pronunciation, etymology, register, social position, meaning, language, script, etc. A single sentence like "Lentokonemekaanikko is a Finnish word." might be okay, but it is better to add more interesting sentences, such as "Lentokonemekaanikko is a Finnish word that combines three different words: "lento" means flight, "kone" machine and "mekaanikko" mechanic (as in a profession or person). The word means someone who repairs, maintains or builds aeroplanes."
Please excuse my English, but I hope the idea is clear: If there is a misspelled word, a foreign word, etc., it might be possible to contribute a meaningful sentence that uses it.
Another example: ""Thier" is a typical misspelling of "their"." would be a reasonable sentence, in my opinion. A sentence about why the misspelling is ubiquitous would make an even more interesting sentence, again in my opinion.
TRANG
14 days ago
Thanks a lot for this.

Whether it was intentional or not, I think you've just made a very good case for why we should put more effort into the vocabulary feature :)

Not a single sentence of your post suggests an improvement to the vocabulary feature; you have only described how to use it. And yet this post actually makes me really want to improve the feature. It convinces me a hundred times more than regular feature requests that will usually start with "It would be nice if..." or "I think it would be useful if...".

This is a beautiful post and I hope to see more of these.
Thanuir
14 days ago
You are welcome.

My main feature request is for someone to add sentences to the words I have added, but I know it is challenging to implement.
Shishir
14 days ago
Is there a way to delete some "words" from the vocabulary list? Someone added a couple of sentences by mistake...
Pfirsichbaeumchen
14 days ago
There is no way to do that right now. :-(
Shishir
13 days ago
hm pity...
Thanks for the info though :)
Thanuir
13 days ago
A person can delete the words they have added via https://tatoeba.org/swe/vocabulary/of/user (replacing 'user' with their own username). I do not know if it is possible to see who has added the sentences there.
soliloquist
13 days ago
The stemming function of the vocabulary feature needs to be improved, I believe.

https://tatoeba.org/eng/vocabulary/of/soliloquist

This is especially important for agglutinative languages like Turkish. I noticed it when I checked Ivanovb's vocabulary list (https://tatoeba.org/eng/vocabulary/of/Ivanovb ). Some of the words he listed actually have more examples in the corpus than their counts on the vocabulary page, but the lack of stemming causes inconsistency.

Search links on the vocabulary page have a preceding equal sign before words, so if it is a verb, sentences containing its other conjugations, and if it is a noun, sentences containing its other forms (singular or plural) won't show up. This precision often doesn't work well and creates inefficiency for learning new vocabulary (at least for Turkish).
Shishir
13 days ago
+1

same goes for Spanish.
soliloquist
11 days ago
What is your opinion on using the asterisk on Spanish vocabulary items? Would it be useful, or more likely cause confusion?

https://tatoeba.org/eng/wall/sh...#message_32418
Shishir
11 days ago
That would be helpful for nouns and adjectives (we have no cases or declensions in Spanish, only masculine, feminine, singular and plural form), but not always for verbs because we have many irregular verbs for which even the vowels in the stem might change like in hacer (to do) - hago (I do) - hizo (he did).

soliloquist
11 days ago
Thanks for your explanation.
Thanuir
13 days ago
As far as I know, this feature does not currently use stemming at all.

It is not clear if it should. On the one hand, stemming would prevent the same word from appearing multiple times in the "sentences wanted" vocabulary, which might be nice, and it would allow for much more diverse sentences even when everyone adds words in the infinitive or another fixed form.
On the other hand, maybe someone wants examples of a particular form of a word. At least in Finnish, some words have taken a life of their own in phrases or otherwise and no longer have the same meaning as one would expect, and sometimes there are quite clear distinctions that stemming removes.
TRANG
12 days ago
> Some of the words he listed have actually more examples in the corpus than their
> counts on the vocabulary page, but the lack of stemming causes inconsistency.

As Thanuir explained above, stemming can cause the reverse effect: in some cases, your vocabulary item would not show up in the "sentences wanted" list because it would already have lots of sentences with other forms (but not the form you specifically requested).

We have not designed the vocabulary feature for all possible use cases yet. The only use case we have designed for so far is that the user wants sentences that contain exactly what they have added as vocabulary, and that having more than 10 sentences with an exact match will be satisfying.

That is of course not the reality, but that is where we're starting from.

For stemming, I don't fully picture the use case behind it. When you add a verb in the infinitive form, for instance, do you really want sentences with any possible conjugation of the verb? And even then, do you really want all the sentences in Tatoeba that contain any of the possible conjugations of the verb? You would probably just want a small set, a custom list with one or two examples for each form.

My guess is that your suggestion regarding stemming is not just about stemming in itself and simply enabling stemming would not be enough. Perhaps the combination of these issues would actually fulfill the needs behind your suggestion to improve stemming:

https://github.com/Tatoeba/tatoeba2/issues/1281
https://github.com/Tatoeba/tatoeba2/issues/1690
https://github.com/Tatoeba/tatoeba2/issues/1715
soliloquist
11 days ago - 11 days ago
In Turkish, words usually end with suffixes and there are dozens of them.

https://en.wiktionary.org/wiki/...rkish_suffixes

Adding new vocabulary items in nominative or infinitive forms (as in dictionaries) would cause most examples to not appear with the default behavior. For example, if you added the word 'school' to your vocabulary items, you would see examples of 'my school', 'your school', 'to school', 'from school', 'at school' etc. but in Turkish, these are all shown with suffixes, so you would need to add a lot of different forms to see them. If your purpose was to learn new vocabulary rather than studying suffixes, it would create difficulty.

I'm quoting from @Thanuir's reply:

> On the other hand, maybe someone wants examples of a particular form of a word

That's right, but at least there could be some tips on the vocabulary page (similar to the ones on the advanced search page), informing users about the precision (and hence the limitation) of the current design, and the possible advantages of using an asterisk if they are not looking for only a particular form of a word. It could be used as a stemmer on many occasions. I had a look at the vocabulary items others added, but never saw one with an asterisk. Most users may not even be aware of it. I don't think they were all interested in only a particular form. It might be the case sometimes, but the other way around is more likely.

I encourage anyone who's interested in adding new vocabulary items with 4 or more letters in Turkish to use an asterisk: 'bare infinitive + *' for verbs, and 'nominative + *' for nouns and possibly for others. If it's a relatively long word ending with the letters p,ç,t, or k, even that last letter before the asterisk can be dropped to get more examples affected by consonant alternation, which is a common phenomenon in Turkish.

To use the vocabulary feature more efficiently, other users can share similar tips about their languages on the Wall, too. What works for one language may not work with another.
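
The Turkish tip above can be sketched as a small helper. This is a rough illustration under stated assumptions, not how Tatoeba's search works internally: `wildcard_item` and `MIN_LEN_TO_DROP` are hypothetical names, the cutoff for a "relatively long word" is a guess, and `fnmatchcase` from the Python standard library only approximates a trailing-asterisk search.

```python
from fnmatch import fnmatchcase

# Final consonants that often alternate before a suffix (p/ç/t/k).
ALTERNATING_FINALS = ("p", "ç", "t", "k")

# Assumed cutoff for a "relatively long word"; the thread gives no number.
MIN_LEN_TO_DROP = 5


def wildcard_item(base_form):
    """Turn a dictionary form into a vocabulary item with a trailing asterisk.

    Words shorter than 4 letters are returned unchanged, following the
    "4 or more letters" advice above.
    """
    if len(base_form) < 4:
        return base_form
    stem = base_form
    if len(base_form) >= MIN_LEN_TO_DROP and base_form.endswith(ALTERNATING_FINALS):
        # Drop the last letter so forms with consonant alternation still match.
        stem = base_form[:-1]
    return stem + "*"


# "okul" (school) and "kitap" (book) with a few suffixed forms.
for base, forms in {
    "okul": ["okulum", "okula", "okuldan"],
    "kitap": ["kitap", "kitabım", "kitaplar"],
}.items():
    item = wildcard_item(base)
    print(item, [form for form in forms if fnmatchcase(form, item)])
```

A shortened stem such as "kita*" can also pick up unrelated words that happen to start the same way, so the trick trades precision for coverage.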
Thanuir
11 days ago
I did not know that asterisk could be used. Thanks. Similar tips seem to apply to Finnish as to Turkish.
jegaevi
12 days ago
When I go to 'sentences wanted' and choose Hungarian, only 2 words come up: job and üdvözlet. Why is that? I remember that there were several words before. I had the same issue a few months ago, but I thought that there weren't any words added yet and that's why only these two words were displayed. I don't think that someone wrote 10 sentences for all the words except these two, so either I'm doing something wrong or it's a bug.
Thanuir
12 days ago - 12 days ago
Lisäsin juuri sanan "differenciálegyenletek" sanastooni ja se näkyy myös toivotuissa sanoissa. Ehkäpä joku kirjoitti lauseita tai tyhjensi sanastoaan. En tiedä miten käyttäjien jäädyttäminen vaikuttaa sanastoon, jos mitenkään.

I just added "differenciálegyenletek" to my vocabulary and I see it as something that needs sentences. Maybe someone wrote sentences or cleaned up their vocabulary. I do not know how freezing users affects the vocabulary, if it does.
jegaevi
12 days ago
I see the word that you wrote. I still think it's weird that only these two words remained. Maybe someone did write sentences for all the vocab items.
Thank you for checking and replying. :)
maaster
12 days ago
Under this sentence #901302 there are the same French sentences (with "vous") with and without audio. (The sentences with audio are the older ones.)
AlanF_US
12 days ago
The ones with audio have a nonbreaking space before the question mark. The ones without audio have a regular space. The fact that we have duplicates of this kind is a known issue.
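
For anyone who wants to spot such pairs in an export, here is a minimal sketch. It assumes you already have sentences as a mapping from ID to text; the IDs and sample sentences below are made up, and treating the narrow no-break space the same way is my own addition.

```python
from collections import defaultdict

NBSP = "\u00a0"         # no-break space, common before "?" and "!" in French
NARROW_NBSP = "\u202f"  # narrow no-break space (assumed to be worth folding too)


def normalize_spaces(text):
    """Replace non-breaking spaces with regular spaces."""
    return text.replace(NBSP, " ").replace(NARROW_NBSP, " ")


def find_space_duplicates(sentences):
    """Group sentence IDs whose texts differ only in the kind of space used.

    `sentences` maps a sentence ID to its text. Returns a dict from the
    normalized text to the sorted list of IDs sharing it (two or more).
    """
    groups = defaultdict(list)
    for sentence_id, text in sentences.items():
        groups[normalize_spaces(text)].append(sentence_id)
    return {text: sorted(ids) for text, ids in groups.items() if len(ids) > 1}


# Hypothetical IDs and texts.
sample = {
    1: "Comment allez-vous\u00a0?",
    2: "Comment allez-vous ?",
    3: "Où êtes-vous ?",
}
print(find_space_duplicates(sample))
```

This only reports candidates; deciding which copy to keep (for example the one with audio) is a separate question.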
TRANG
13 days ago
**Final decision on Tom and Mary**

I came to a conclusion on the Tom and Mary issue; the official decision is posted on the blog.

https://blog.tatoeba.org/2019/0...h-tom-and.html

The Tom and Mary discussion helped define a clearer process for making decisions in Tatoeba, a process that anyone can initiate. I dedicated another blog post to it. If you care about the future of Tatoeba, please read it.

https://blog.tatoeba.org/2019/0...rnance-in.html

In any case, thank you to everyone who participated in the conversation :)