Tips

Here you can ask general questions like how to use Tatoeba, report bugs or strange behavior, or simply socialize with the rest of the community.

Before asking a question, make sure to read the FAQ.

Wall (968 threads)

korobo4ka
14 minutes ago
Help?
To get all bul <> eng pairs, I tried
http://tatoeba.org/eng/sentence...ne/indifferent
That returned all bul sentences, translated or not.
What should I change?

CK
a minute ago
It's not really possible, but you can get a lot of them this way.

Search for many commonly used words, like this, from English to Bulgarian.

(a|an|the|in|on|at|for|he|she|they|we|us|Tom|Mary)

http://tatoeba.org/eng/sentence...eng&to=bul

You could also do the reverse by searching for commonly used Bulgarian words in the same way.
TRANG
20 hours ago - edited 18 hours ago
** Tatoeba update (January 24th, 2015) **

http://blog.tatoeba.org/2015/01...24th-2015.html


# Sentences with audio are locked

From now on, sentences cannot be deleted or edited if they have audio. We consider that a sentence with audio can safely be assumed to be correct and has no reason to be changed or deleted.

Admins can still remove the audio from the sentence, and therefore make it editable and deletable again, if it turns out that the sentence did have a mistake after all.


# UI fixes

* For advanced contributors: the link/unlink button has been moved before the arrow. Users reported that having the link/unlink button next to the text made them link or unlink by mistake more often than when the button was at the far left.

* The sentence link (which is generated automatically when you type #123, for instance) now also works when there is punctuation before the #. For instance, if your comment contains a parenthesis before the #, as in (#123), the link will now be displayed properly.

* When you link sentences from "Browse by language" while a language is selected in "Show translations in", all translations will no longer be displayed after linking; only translations in the selected language will be shown.
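
The announcement doesn't say how the #123 auto-linking from the second item is implemented. As a minimal sketch only, assuming a regular-expression approach, the Python below shows one way to recognise a reference even when punctuation such as "(" comes right before the #. The pattern and the `linkify` helper are hypothetical, not Tatoeba's code.

```python
import re

# Hypothetical pattern: "#" followed by digits, as long as it is not preceded by a
# letter, digit or underscore. This lookbehind accepts "(#123)" or ",#123"
# while still ignoring strings such as "abc#123".
SENTENCE_REF = re.compile(r"(?<!\w)#(\d+)")

# Base URL taken from the sentence links used on this wall; illustrative only.
BASE_URL = "http://tatoeba.org/eng/sentences/show/"


def linkify(comment: str) -> str:
    """Replace every #123-style reference in a comment with an HTML link."""
    return SENTENCE_REF.sub(
        lambda m: f'<a href="{BASE_URL}{m.group(1)}">#{m.group(1)}</a>', comment
    )


print(linkify("Same as (#123)."))
```

With this pattern, the #123 inside the parentheses is linked, while "abc#123" is left alone because a word character precedes the #.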


# New language

A language has been added: Eastern Punjabi. We already had the language "Punjabi", but it mistakenly contained sentences in Eastern Punjabi while it was meant to be Western Punjabi.
Therefore, "Punjabi" was renamed "Punjabi (Western)" and "Punjabi (Eastern)" has been added.
odexed
18 hours ago - edited 18 hours ago
I personally have much more trouble with the link/unlink button before the arrow. Besides, it's now placed right beneath the green #, so it doesn't seem very logical to me.
Also, I think that people use the link/unlink buttons less often than the arrows, so it would be more convenient if these arrows were at the edge of the page.
TRANG
18 hours ago
The change of position was based on feedback from this thread:
http://tatoeba.org/eng/wall/sho...#message_21526

If more people can't get used to the new position of the link/unlink buttons, we can change it back.
korobo4ka
15 hours ago
Yes, this almost gave me a heart attack :D but that's probably just me...
Now to the point: is there a poll somewhere, can we vote on this change?
TRANG
14 hours ago
> is there a poll somewhere, can we vote on this change?

There's no poll but anyone who doesn't like the change can voice their opinion in this thread :)
CK
12 hours ago - edited 12 hours ago
I've accidentally unlinked sentences both ways and have had to relink them, so I'm not sure one is better than the other.

Perhaps now that the sentence itself isn't clickable, the tools for linking/unlinking and the actual link to the sentences are too close to each other and such accidents are more likely to happen.

Have you thought about putting the tools for linking/unlinking on the other side of the sentence, just in front of the flag icon? This might help solve the problem.
sharptoothed
5 hours ago
+1
I'd prefer to have linking/unlinking button on the right of the sentence next to the flag icon.
korobo4ka
53 minutes ago
+1
sacredceltic
16 hours ago
>Also I think that people use link/unlink buttons more seldom than arrows so it would be more convenient if these arrows were on the edge of the page.

+1
Silja
2 hours ago
>Linking sentences from the "Browse by language" when a language is selected in "Show translations in" will no more display all translations after linking, but only show translations in the language selected.

This happens also when you link/unlink in a sentence list.
ricardo14
3 hours ago
Is it possible to display more than 10 sentences per page in a sentence list? http://tatoeba.org/eng/sentences_lists/index
Silja
2 hours ago
It would be nice to see more than 10 sentences everywhere you can browse sentences. :) I guess it should no longer be a problem for the server load now that we have a new server.
ravas
11 hours ago
Is it possible to receive a notification when someone links a sentence to one of mine?
AlanF_US
14 hours ago
Another recent fix (by gillux):

Previously, on a "Browse by language" page, linking a sentence to a main sentence caused all translations of the main sentence to be displayed, even when "Show translations in" was set to "None" or a single language. Now the displayed translations are correctly filtered according to the value of "Show translations in".
Silja
15 hours ago
Lately, the language detection has initially marked quite a few of my Finnish sentences as English. Here are some of them:

Tom justiinsa teki niin. http://tatoeba.org/fin/sentences/show/3785040
Nouse ylös ja taistele. http://tatoeba.org/fin/sentences/show/3785043
Tom hikoili. http://tatoeba.org/fin/sentences/show/3785774
No onko edes siistiä? http://tatoeba.org/fin/sentences/show/3786584
Onko Tom vielä hereillä? http://tatoeba.org/fin/sentences/show/3789015
Onko Tom edelleen hereillä? http://tatoeba.org/fin/sentences/show/3789014
Puhuvatko he ranskaa? http://tatoeba.org/fin/sentences/show/3790314
Emme me puhu ranskaa. http://tatoeba.org/eng/sentences/show/3793029
Ole varovainen. Älä heitä pois noita papereita. http://tatoeba.org/eng/sentences/show/3794756

Yes, some words in those sentences could also be English (no, me), but otherwise I really can't understand why they are detected as English. If I remember correctly, the language detector needs to be updated from time to time so that it "learns" better which combinations of letters belong to which language. Has this update been made recently?

I'm not complaining, because it's really only about 1 sentence in 100 that is detected wrongly, and it's no big deal to correct them manually, but I'm just curious. :)
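
Neither post describes how Tatoeba's language detector works internally. Many detectors compare character n-grams against per-language profiles, which matches the idea of "learning" which letter combinations belong to which language. The sketch below assumes that approach, with tiny hypothetical training texts and illustrative helper names; it is not Tatoeba's detector. With few trigrams to go on, a short sentence sharing words like "no" or "me" with English can easily end up closer to the English profile.

```python
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count character trigrams; padding with spaces makes word edges count too."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def score(sentence: str, profile: Counter) -> int:
    """Weight each trigram of the sentence by how often it occurs in the profile."""
    return sum(profile[t] * n for t, n in trigrams(sentence).items())


def detect(sentence: str, profiles: dict) -> str:
    """Return the language whose profile gives the sentence the highest score."""
    return max(profiles, key=lambda lang: score(sentence, profiles[lang]))


# Toy training texts (hypothetical, and far too small for real use).
profiles = {
    "eng": trigrams("no, me neither. we do not speak french. is tom still awake?"),
    "fin": trigrams("emme puhu ranskaa. onko tom vielä hereillä? nouse ylös."),
}

print(detect("Onko Tom vielä hereillä?", profiles))  # fin
```

A real detector is trained on far more text per language, which is why periodic retraining on newly added sentences can improve results.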
gillux
14 hours ago
> Has this update been made recently?
Yes, on the 17th of November, 2014.

I’m not familiar with the language detection tool so I can’t tell you much about its weaknesses.
Guybrush88
3 days ago - edited 3 days ago
It appears that the interface localization menu (the dropdown menu listing the languages into which the interface is translated) is badly rendered in Firefox 35.0 on Ubuntu 14.10: http://i57.tinypic.com/2przqzo.png

Every language name is displayed as in the screenshot, on every page of the website.
TRANG
3 days ago
Are you using the advanced language selector? If so, what does it look like without the advanced selector?
Guybrush88
3 days ago
Nope, I'm not using the advanced language selector.
TRANG
3 days ago - edited 3 days ago
Okay I've created an issue: https://github.com/Tatoeba/tatoeba2/issues/559

Edit: actually I just realized the list of UI languages doesn't use the advanced selector.
Guybrush88
3 days ago
thanks, Trang ☺
ricardo14
3 days ago
I'm using it and it looks fine to me. I prefer it to the standard one.
sacredceltic
16 hours ago
For me, the advanced language selector hasn't seemed to work lately. I don't know why...
RadiumCat
3 days ago - edited 3 days ago
Erm, why is it that whenever I write something in Punjabi (Gurmukhi), the punctuation mark shifts to the beginning of the sentence? Is it because Tatoeba thinks the text is in Shahmukhi, which is an RTL script? If so, could the admins kindly look into it?

Ta!
sabretou
3 days ago
Welcome back, Radium. Others and I have noticed this problem. The solution was to split Punjabi into Eastern and Western varieties (which is how ISO 639-3 does it). Eastern, or Indian, Punjabi is the one you'll most likely use, with the Gurmukhi script. This change has been pending for a long while now (most probably due to a lack of regular native Punjabi contributors).

You can still add sentences, but if you do, add your sentences to either the Western Punjabi or Eastern Punjabi lists after adding them. These lists will be used when splitting the language later.
TRANG
2 days ago
I looked quickly through the sentences, but since I don't know the language, I would need to know in which cases you put the punctuation in the wrong place to make it display in the right place.

Here are 2 sentences where the question mark is not in the same place.

http://tatoeba.org/eng/sentences/show/3788613
--> http://prntscr.com/5vvhv8

http://tatoeba.org/eng/sentences/show/3786275
--> http://prntscr.com/5vviny

Anyway I'd suggest you write your sentences as they normally should, without trying to hack the position of the punctuation. We'll see how we can fix it on our side.
CK
2 days ago - edited 2 days ago
I wonder if it makes a difference whether you're depending on the auto detect, or if you manually select the language before contributing a sentence.

If manually selecting the language first solves the problem, then at least temporarily, you could manually select the language first before making a contribution.
RadiumCat
2 days ago
TRANG: The first sentence is the one without the hack, i.e. it's written the way it's supposed to be written, with the punctuation mark at the end of the sentence (on the right side).

Someone else on here also advised that I keep on translating without manually fixing the punctuation, and that is what I'm doing now.

CK: I've tried selecting the language manually multiple times, but that doesn't seem to have any effect on the output. I strongly suspect the Tatoeba code is designed to treat Punjabi text input as Shahmukhi by default, which is an RTL script, while Gurmukhi is an LTR script, hence the mix-up. It would be nice if a fix could be implemented whereby one gets to choose which script to use. If not, that's OK too, as long as it treats Gurmukhi as LTR :)
TRANG
2 days ago
Okay, so I had misunderstood your problem. I thought you were writing in a script that is supposed to be RTL, but that some weird bug caused the punctuation to be misplaced. In fact, the script you are using is simply an LTR script.

Tommy_san reminded me of this discussion:
http://tatoeba.org/eng/wall/show_message/17381

If most of the Punjabi sentences in Tatoeba are written in Gurmukhi, then we will change the language code to "pan" and name the language "Punjabi (Eastern)". Sentences added in this language should then use the Gurmukhi script, and they will be displayed left to right.
We will add "Punjabi (Western)" later on, if there are contributors for it.
Does this sound okay?

Note that the direction of the language is not influenced by the way you choose the language. We have a list of language codes for which the sentence text should be displayed right to left, and "pnb" was in that list.
https://github.com/Tatoeba/tato...uages.php#L518
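
As described here, the display direction depends only on the language code, not on how the language was selected. A minimal sketch of that kind of lookup, assuming a set-based list: only "pnb" is confirmed by the post, "ara" and "heb" are assumed examples, and `text_direction` is a hypothetical helper.

```python
# Hypothetical subset of the RTL list; the real one is in the file linked above.
# Only "pnb" is confirmed by the post; "ara" and "heb" are assumed examples.
RTL_LANGUAGES = {"ara", "heb", "pnb"}


def text_direction(lang_code: str) -> str:
    """Return the HTML dir value for a sentence, based only on its language code."""
    return "rtl" if lang_code in RTL_LANGUAGES else "ltr"


print(text_direction("pnb"))  # rtl
print(text_direction("pan"))  # ltr
```

This is why moving Gurmukhi sentences from "pnb" to "pan" is enough to make them display left to right.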
AlanF_US
2 days ago
I believe that RadiumCat can also contribute in Western Punjabi. If that's the case, I'll add Western Punjabi at the same time, and reassign the sentence languages according to the lists.
RadiumCat
yesterday
Trang: Sounds great to me! :)

I primarily use Gurmukhi script for working with Punjabi sentences but as AlanF_US has rightly guessed in the comment below, I'm also familiar with Shahmukhi, along with major Western (and Eastern) Punjabi dialects so I think I do have the capacity to contribute using Shahmukhi every once in a while. I don't know if you feel that is good enough to add another language code though!

TRANG
20 hours ago
Eastern Punjabi has been added :)

Please check the sentences to fix the punctuation where needed:
http://tatoeba.org/eng/sentence...ne/indifferent
RadiumCat
19 hours ago - edited 19 hours ago
Thanks for the fix, Trang! :)

I've made all the necessary changes, I think :)
tommy_san
2 days ago
See http://tatoeba.org/wall/show_message/17381 for earlier discussions.

It seems that #1754557 is an example of a sentence written with the RTL script Shahmukhi.
TRANG
2 days ago
Ah right, there was this discussion. Thanks a lot for refreshing my memory.
sacredceltic
22 hours ago
There are still duplicates caused by apostrophes written in different forms:

http://tatoeba.org/fre/sentences/show/1401543
http://tatoeba.org/fre/sentences/show/549539

Couldn't we align these apostrophes with current typographic conventions and, more generally, convert them at input time?

To quote Wikipedia:
«
Traditionally, the apostrophe has the shape of a raised comma. This definition, "a comma placed slightly above the word", already appears in the first edition of the Dictionnaire de l'Académie française (1694)[1], and more recently in Jean-Pierre Lacroux: "A comma freed from the gravity that nailed it to the baseline"[2]. In German, in everyday or colloquial language, it is called Hochkomma, literally "high comma".

Because of the technical constraints of typewriter keyboards, and nowadays of computer keyboards, it is very often drawn as a straight vertical bar in digital documents. This apostrophe is then called the "typewriter apostrophe" (because it appeared with mechanical typewriters that used a single key for the apostrophe and the English opening or closing quotation mark, or even for other signs such as the acute accent), the "straight apostrophe" (because it is often straight for the English opening or closing quotation mark, but not always), the "computer apostrophe"[3], or other more colourful names[4]. The terms "typewriter apostrophe" and "typographic apostrophe" are used by Aurel Ramat[5].

According to typographers' practice, the typewriter apostrophe should not be used[6][7], and for Lacroux, for example, it "is not an apostrophe. […] Typographically, it is nothing"[2].
»
The technical constraint no longer exists, so...
Guybrush88
22 hours ago
There are also duplicates caused by the space before punctuation, especially in French. For example:

http://tatoeba.org/ita/sentences/show/444788
http://tatoeba.org/ita/sentences/show/1869503
sacredceltic
22 hours ago
The solution is conversion at input time.
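
A minimal sketch of the conversion proposed here, assuming Python and a hypothetical `duplicate_key` helper. It builds a normalised comparison key so that sentences differing only in apostrophe form (' vs ’) or in the space before ? ! : ; would be flagged as duplicates. Whether to also rewrite the stored text at input time is the policy question under discussion, and since French typography expects a space before these signs, dropping it is meant for comparison only.

```python
import re

# Assumed policy: the straight apostrophe (U+0027) compares equal to the
# typographic one (U+2019), which typographers consider the correct form.
APOSTROPHES = str.maketrans({"'": "\u2019"})

# Whitespace (including no-break U+00A0 and narrow no-break U+202F spaces, which
# French typography puts before these signs) immediately before ? ! : ;
SPACE_BEFORE_PUNCT = re.compile(r"[\s\u00a0\u202f]+(?=[?!:;])")


def duplicate_key(sentence: str) -> str:
    """Normalised form used to spot duplicates; the stored sentence is unchanged."""
    key = sentence.translate(APOSTROPHES)
    key = SPACE_BEFORE_PUNCT.sub("", key)
    return " ".join(key.split())  # also collapse repeated or leading/trailing spaces
```

Sentences whose keys are equal, such as "C'est vrai ?" and "C’est vrai?", would be candidates for merging.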
TRANG
yesterday
** Language skill in profile **

In the distant future, there are some changes that I would like to implement in order to improve the quality of the sentences in Tatoeba's corpus.
While thinking about it, I was wondering how to efficiently define someone's language skill, because one thing is sure: we will need to add a feature for members to indicate what languages they speak and how well they speak it. But what kind of information will we need exactly?

We can go for something like "beginner", "intermediate", "advanced", "fluent", "native". Or we could go with a scale from 0 to 10, where 0 is when you know nothing and 10 is when you are an expert in the language.
But these designations/scores may be kind of vague, and everyone has a different way to evaluate their level.

Then there's the question of written versus oral skills.
For instance in my case, I cannot speak, write or read Vietnamese, but I can understand it when spoken. Not everything, but if it's a casual conversation then I can understand. How would I define my level of Vietnamese on only one scale?

We could perhaps follow the CEFR:
http://en.wikipedia.org/wiki/Co..._for_Languages
Or we could use something else. I'm not sure if there are other frameworks out there that attempt to define the language skill of a person.

In any case, before we implement a feature that allows members to indicate the languages they know, we need to discuss and decide on a standard way to evaluate the language skill of a contributor. If you have anything to say on this issue, I'm listening.
Pfirsichbaeumchen
yesterday - edited yesterday
I'd try to keep it simple and go for something like "beginner", "intermediate", "advanced", "native level". In your case, "(mostly) passive knowledge" may be imprecise but would probably be enough, as details could be given individually in the profile description.

I think it is most important that members indicate their native language in their profiles and that non-native contributions are automatically marked as needing verification.

Edit: I'd like to add that I hope a way for users to indicate their native (or strongest) language in their profiles will be implemented in a future not too far away.
odexed
yesterday
Maybe we could recommend some online tests for those who can't estimate their level precisely, for example:
http://www.testden.com/challeng...FeHVcgodZTsA3A

I believe there are similar links for other languages, too.
sabretou
yesterday
I'm partial towards the ILR scale, compared to CEFR: https://en.wikipedia.org/wiki/ILR_scale

ILR is simpler and has fewer levels (more levels = more vagueness and incorrect assessment).

Perhaps we could make our own system. These scales are generally designed with dedicated speakers of a single language in mind. On Tatoeba, however, many of us are language nerds and therefore polyglots to varying degrees.

Going off your Vietnamese example, we could use a system where a language is listed on a user's profile along with a table of icons representing writing, speaking, reading and listening, which are then colour-coded to represent level: grey for no ability, red for poor ability, orange for medium, green for good. Native languages are represented with a special gilded row, a star or a similar mark.

So mine would look like:

Marathi (gold) : [L: Green] [R: Yellow] [S: Green] [W: Yellow]
English: [L: Green] [R: Green] [S: Green] [W: Green]
Hindi: [L: Green] [R: Yellow] [S: Green] [W: Yellow]
Urdu: [L: Yellow] [R: Red] [S: Yellow] [W: Red]
Japanese: [L: Red] [R: Red] [S: Red] [W: greyed]

If it's too complicated, we could leave this as a 'fine-tune' option and let users pick from a simpler colour-coded system based on ILR: 0: Grey, 1: Red, 2: Orange, 3: Yellow, 4: Green, 5: Gold (picking one of these automatically fills all four LRSW icons with that colour).
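
The profile display proposed above can be modelled quite directly. This is a minimal sketch, assuming Python and hypothetical class names, that uses the six-step ILR colour mapping from the last paragraph and implements the simple mode in which one pick fills all four LRSW icons.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """ILR-style levels, named after the colours in the proposal."""
    GREY = 0
    RED = 1
    ORANGE = 2
    YELLOW = 3
    GREEN = 4
    GOLD = 5


SKILLS = ("L", "R", "S", "W")  # listening, reading, speaking, writing


@dataclass
class LanguageSkill:
    language: str
    native: bool = False
    levels: dict = field(default_factory=lambda: {s: Level.GREY for s in SKILLS})

    def set_overall(self, level: Level) -> None:
        """Simple mode: one pick fills all four LRSW icons with the same colour."""
        self.levels = {s: level for s in SKILLS}

    def render(self) -> str:
        """Text version of the icon row, e.g. 'Hindi: [L: Green] [R: Yellow] ...'."""
        name = f"{self.language} (gold)" if self.native else self.language
        cells = " ".join(f"[{s}: {self.levels[s].name.title()}]" for s in SKILLS)
        return f"{name}: {cells}"


hindi = LanguageSkill("Hindi")
hindi.levels.update(L=Level.GREEN, R=Level.YELLOW, S=Level.GREEN, W=Level.YELLOW)
print(hindi.render())
```

Storing the four levels separately keeps the fine-tune option available, while `set_overall` covers the simpler one-pick mode.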
gillux
yesterday - edited yesterday
> we will need to add a feature for members to indicate what languages they speak and how well they speak it. But what kind of information will we need exactly?

To answer this question, we need to ask ourselves what we need this information for. If we need to know how well someone masters a language so that his or her contributions can be classified as trusted or not (for instance, automatically tagged as NNC), only two levels are required: native or not.

If we consider language skill as a way to tell users in what languages one is able to communicate with them, then I’d add a third level: able to read messages from other users.

Of course users are free to add more details about their knowledge in their profile description, like your oral understanding of Vietnamese, but I see no need to store more details in specific form areas if they are not gonna be used for your initial objective (improving the quality of the sentences).
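
gillux's minimal scheme (native, able to read messages, or not listed) can be sketched in a few lines. The Python below is hypothetical: the `Skill` enum, the helper names and the use of three-letter codes like those on this wall (eng, fra) are assumptions for illustration. It only shows how the two uses mentioned, automatic NNC tagging and knowing which languages someone can read, would follow from that data.

```python
from enum import Enum


class Skill(Enum):
    """Levels from the proposal; a language absent from the profile means neither."""
    NATIVE = "native"
    CAN_READ = "can read messages"


def needs_nnc_tag(sentence_lang: str, profile: dict) -> bool:
    """A new sentence gets the NNC tag unless its author is native in that language."""
    return profile.get(sentence_lang) is not Skill.NATIVE


def can_be_messaged_in(lang: str, profile: dict) -> bool:
    """Any listed level (native or reading) means messages in `lang` are understood."""
    return lang in profile


# Hypothetical profile for one member.
profile = {"fra": Skill.NATIVE, "eng": Skill.CAN_READ}
print(needs_nnc_tag("eng", profile))  # True
```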
Pfirsichbaeumchen
yesterday
+1
mraz
yesterday
+1
Silja
yesterday
Gillux has a valid point here: what is the aim of indicating language skills?

If the sole purpose is to distinguish native speakers from non-native ones, then we only need to ask users to specify their strongest language and nothing more. We should ask for the strongest language, since the native language is not necessarily the strongest one. This would also be the most useful information for the project. The simplest implementation would be a required field where users indicate their strongest language: a drop-down list with all the languages available in Tatoeba, plus "other" and a text field for typing in the language name for those whose strongest language is not yet available here.

If we also want general information about the users' overall language skills, then we should have a more sophisticated system where users can indicate all the language skills they have. As has already been said, users can describe their general language skills in more detail in their profile descriptions.

However, providing a structured way to indicate language skills would be more user-friendly. Many users (including me) have tried to list all the languages they know in their profiles and to define their proficiency level in each. If we go for a structured way to represent language skills, it would of course not be required information. Those who want to use it could do so, but you could always leave it empty, or change or remove the information whenever you want.

The language proficiency system could be something like what sabretou described: the users can evaluate their speaking, listening, writing, and reading abilities all individually. I also like sabretou's idea of colour coding. If we go for the overall language skills, I think we shouldn't create a system of our own, but rather use one of the existing systems. CEFR is OK for this.
TRANG
yesterday
> I see no need to store more details in specific form areas if they are not gonna be used
> for your initial objective (improving the quality of the sentences).

Yes you are right. I realized that I wrote my message to hint towards improving the quality of the sentences, but I didn't want to limit the reflection only to that initial objective.

You mentioned, for instance, that the language skill could be used to tell others which languages they can use to communicate with a member. It's something I didn't have in mind when I wrote the post, but it's actually very useful to know. I've often found myself browsing a user's comments to try to figure out whether they would understand me if I wrote in English, or whether I would need to use another language, and if so, which one.

Maybe others will think of more things that could be useful, but aren't necessary for improving the quality of the sentences.
sacredceltic
yesterday
>we will need to add a feature for members to indicate what languages they speak and how well they speak it.

Have you planned a lie detector? Because there are a few liars here, even if they have sometimes managed to make people believe they were native speakers and to get themselves stamped as such by CK...

And have you planned for detecting compulsive liars?
Most people overestimate their language levels.
TRANG
yesterday
Yes, it's a problem I've thought about. But I don't have a 100% reliable solution to offer.

I don't think many people will intentionally lie about their level, but I do think many people will either overestimate or underestimate their level, without any bad intentions.

What would you suggest to address this problem? How do you detect that someone really is a native speaker of a given language?
sacredceltic
22 hours ago
>How do you detect that someone really is a native speaker of a given language?

I don't, quite simply, and I maintain that those who grant themselves that power are misleading us.
"Native" doesn't mean anything anyway. There are native speakers who don't master their native language. They're even the majority of the population; that's why education was invented, with more or less success in that regard.
Unfortunately, even those who received only very little or very poor education take it upon themselves to write sentences on the Internet.

Tomorrow, in the very near future in fact, it is Google that will decide whether a sentence is correct or not. And to do so, Google will rely on corpora such as Tatoeba.
The errors they contain will become the new orthodoxy.