Pharamp, January 28, 2014 at 10:29:13 PM UTC

Hallo!

I noticed that a lot of Tatoeba sentences contain numerals instead of words to express quantity, time, length and so on.

Why is this practice so widespread? It has some really obvious disadvantages for language learners as well as for linguistic research.

What is your opinion about it?

CK, January 28, 2014 at 11:55:14 PM UTC

>Why is the practice so widespread?

In many cases, it's the natural way that native speakers write.

Pharamp, January 29, 2014 at 8:12:36 AM UTC

Yes, but some Tatoeba policies don't really follow how native speakers write. We always tried to be on the learners' side.

(p.s. I will mass-disadopt all my English sentences soon.)

User55521, January 29, 2014 at 12:31:47 PM UTC

>> Yes, but some Tatoeba policies don't really follow how
>> native speakers write. We always tried to be on the
>> learners' side.

Really? :o

Pharamp, January 29, 2014 at 1:19:04 PM UTC

Well, I was away for a couple of years, but I always had this impression. I could be wrong, though.

By the way, here is an example. Many Italian sentences in Tatoeba use personal pronouns as subjects. Italian is, however, a pro-drop language, which means it's more natural to say "Am a boy" rather than "I am a boy". You are allowed to explicitly express the subject and still get a natural sentence only in particular contexts. This context is often absent in Tatoeba.
However, it's obviously easier for a learner to read sentences with explicit personal pronouns, so these sentences should be kept, even if they are NOT (in my opinion) good examples for the Italian language.

alexmarcelo, February 1, 2014 at 7:49:31 PM UTC

> We always tried to be on the learners' side.
I'm afraid this is not true. Tatoeba never was and never will be intended for beginners; however, it can be used as a complementary resource for teachers and students, beginners included.

Pharamp, February 1, 2014 at 8:03:59 PM UTC

Tatoeba was never meant for beginners: true. But it was born and meant for learners. But I never said I agree with this direction...

alexmarcelo, February 1, 2014 at 8:08:49 PM UTC

Of course I agree that the reading of numbers, symbols or whatever (see my comment and examples below) should be indicated, be it in a comment, in an audio recording... but I don't agree that one should replace the usual, correct representation of symbols with their phonetic equivalents. As I said, notation is an integral part of a language and should be treated as such.

al_ex_an_der, January 29, 2014 at 12:22:27 AM UTC

I support this idea, Pharamp. That's a detail, but an important one for language learners. I think that within Tatoeba it's worth writing numbers as words even where native speakers write otherwise.

Pharamp, January 29, 2014 at 8:19:44 AM UTC

The ideas I had for solving this problem retroactively were:

1) Mass-edit the database. (Applicable only to a couple of languages, like Italian, French or English.) This violates some rules stated by Trang, because we should always ask the owner first, but it's quicker.

2) Mass-comment on sentences containing numbers, inviting their owners to change them. It's technically possible, but it should be done slowly in order to preserve the database. It also requires people to manually edit their sentences, which can be rather dull.

But first, it's fundamental to gather more opinions. If the community doesn't agree, we obviously won't touch anything at all.

al_ex_an_der, January 29, 2014 at 11:37:49 AM UTC

I doubt that this project would trigger enthusiasm if treated in an administrative way. It may be better if, in future too, the owner of the sentence is given the freedom to decide whether a number in the example is written in words or not. If not, he or she should be encouraged to indicate the spelling of the number in a comment beneath the example sentence. Some of us already do this, choosing between the two possibilities case by case depending on the length of the numeral.

Pharamp, January 29, 2014 at 1:23:55 PM UTC

I understand your view, but I'm looking at the problem in a more computational way. Some languages decline numbers. If you have a numeral written in digits, it's more difficult for a machine to get the right case. If you have a word, it's easier for a machine to tell whether it is a cardinal or an ordinal number.
In a future (and maybe only in our dreams) Tatoeba, each sentence should get part-of-speech tagging, and then people should be able to choose to display numbers as digits or as words.
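
A rough illustration of the display toggle described above, as a minimal Python sketch. Everything in it is hypothetical: `cardinal_en` and `render_numbers` are made-up names, not part of Tatoeba, and the sketch covers only English cardinals from 0 to 99. It deliberately leaves years, prices, and clock times untouched, and it does not attempt the declension problem mentioned above, which is the part that would actually need part-of-speech information.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def cardinal_en(n: int) -> str:
    """Spell out an English cardinal number in the range 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


# One or two digits that are not part of a longer number, a price,
# a clock time, or a decimal (assumption: those stay as digits).
_SMALL_NUMBER = re.compile(r"(?<![\w.,:$])\d{1,2}(?![\w:$%]|[.,]\d)")


def render_numbers(sentence: str, mode: str = "digits") -> str:
    """Return the sentence with small numbers shown as digits or as words.

    Only the digits-to-words direction is implemented; in "digits" mode
    the sentence is returned unchanged.
    """
    if mode == "digits":
        return sentence
    if mode != "words":
        raise ValueError("mode must be 'digits' or 'words'")
    return _SMALL_NUMBER.sub(lambda m: cardinal_en(int(m.group())), sentence)
```

With this sketch, `render_numbers("I have 5 apples.", "words")` gives "I have five apples.", while a sentence such as "The next train arrives at 7:50 p.m." is left as it is.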

Pharamp, January 29, 2014 at 1:26:38 PM UTC

By the way, this project *is* treated in an administrative way. We have guidelines for everything now and a lot of things are restricted, even if we don't like it. I don't personally see "Please try to express numbers in words" as a heavy imposition :)

User55521, January 30, 2014 at 9:15:38 AM UTC

Yes, we do have guidelines. But so far, the guidelines have been consistent with each other. The rule you're suggesting is clearly in conflict with other guidelines, like «We want sentences to remain as "raw" as possible».

I believe it's important to keep the entry barrier low, and keeping the rules to a minimum is part of that. The 'raw sentence' rule is the easiest to understand and explain (just write natural sentences, like you would do anywhere else).

sacredceltic, January 29, 2014 at 11:50:49 AM UTC

We already had this discussion months ago.
Some find it funny to translate 5 into 5. I find it completely silly on a service that is supposed to help people acquire linguistic skills.
I chose my side. I use words.

Pharamp, January 29, 2014 at 1:24:28 PM UTC

Sorry, I was absent. But thanks for your feedback!

Pharamp, January 29, 2014 at 1:27:06 PM UTC

p.s. Could you link it to me?

sacredceltic, January 29, 2014 at 2:31:50 PM UTC

I don't know how to query the Wall's history. After the service crash, I don't even know whether it was preserved. Sorry...

liori, January 30, 2014 at 9:55:44 PM UTC

I've uploaded the pre-crash dump of the wall messages here: http://tatoeba.org/files/downlo...e-crash.csv.xz

Pharamp, January 29, 2014 at 2:24:19 PM UTC

Such things should be avoided.
http://tatoeba.org/eng/sentences/show/3433
http://tatoeba.org/eng/sentences/show/504401

User55521, January 30, 2014 at 9:17:56 AM UTC

Why? I think they should be encouraged if both ways of writing are natural.

The only problem is with Tatoeba's display, not with the sentences themselves. Tatoeba should group similar sentences, and display only one when there are several alternatives.
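
One way to picture the grouping idea, as a minimal Python sketch rather than a description of how Tatoeba works. `variant_key` and `group_variants` are hypothetical names, and the sketch assumes English and only the number words zero through twelve. It treats two sentences as alternatives when they are identical after number words are replaced by digits and case is ignored.

```python
import re

# Assumption: only these thirteen English number words are recognised.
WORD_TO_DIGIT = {
    word: str(value)
    for value, word in enumerate(
        "zero one two three four five six seven eight "
        "nine ten eleven twelve".split()
    )
}
_NUMBER_WORD = re.compile(
    r"\b(" + "|".join(WORD_TO_DIGIT) + r")\b", re.IGNORECASE
)


def variant_key(sentence: str) -> str:
    """Normalise a sentence so digit and word spellings compare equal."""
    with_digits = _NUMBER_WORD.sub(
        lambda m: WORD_TO_DIGIT[m.group(1).lower()], sentence
    )
    return with_digits.casefold()


def group_variants(sentences: list[str]) -> list[list[str]]:
    """Group sentences that differ only in how small numbers are written."""
    groups: dict[str, list[str]] = {}
    for sentence in sentences:
        groups.setdefault(variant_key(sentence), []).append(sentence)
    return list(groups.values())
```

A display layer could then show one sentence per group and offer the others as alternatives. The normalisation is naive: it would also rewrite the pronoun in "One should know that", so a real implementation would need to tell numerals from other uses of the same word.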

Pharamp, January 30, 2014 at 10:14:22 AM UTC

Do we collect sentences native speakers would say or what they would write? Because if the latter is the case, half of the (Italian) corpus is nonsense.

User55521, January 30, 2014 at 11:20:09 AM UTC

Well, at least *I* thought we collect such sentences. I can't say anything about the Italian corpus since I don't know a single Italian word, but at least the Russian, Ukrainian and Belarusian subcorpora generally conform to such a definition.

brauchinet, January 30, 2014 at 11:30:36 AM UTC

Does "the latter" refer to "what they would write"? I always thought that we collect examples of "written" language in the first place.

User55521, January 30, 2014 at 12:05:49 PM UTC

Some languages represent only written text (e.g. Literary Chinese, Latin), some represent mostly spoken language (Iraqi Arabic, Cantonese). Most languages are in between, with some sentences representing spoken language and some representing written language. I don't see a problem here.

Ooneykcall, January 30, 2014 at 1:35:36 PM UTC

I thought the way of Tatoeba is to collect anything that could be of any actual use? This seems most reasonable, since language is a pretty complex thing... preventing input, however specific its use, doesn't serve the learner well.
In fact, spoken language examples are perhaps of more use to an advanced learner, since they are more difficult to come across and pick up; so neither language register should be discriminated against.

Concerning numbers, I think the practice some people adopted - write the number as it is, but post its wording in a comment below - is usually the best one. Longer numbers, like years, would distract from the rest of the sentence...

sacredceltic, January 30, 2014 at 6:33:34 PM UTC

> I think the practice some people adopted - write the number as it is, but post its wording in a comment below - is usually the best one.

No it isn't, because there are several ways to write numerals in words in the same language. So there should be a sentence for each way, with a corresponding audio.

Example:

80 in French is :
Quatre-vingt in France and Belgium
Octante or Huitante in Switzerland
...

Translating "80$" from English to "80$" in French is, at best, very silly and is of no help to anybody...

Ooneykcall, January 30, 2014 at 7:23:41 PM UTC

Hello, the master of absolutes.
Evidently you have wonderfully ignored the adverb 'usually', which is not negated by a special example (it is rather uncommon for a number to have more than one word associated with it, isn't it so?).

Mind that we do not translate words, we translate sentences. Those have numbers in most cases written numerically, as it is, though they can also be written with words. The former can be said to represent the tradition in writing, while the latter would be more demonstrative and also possible. Therefore, either is good, since either is well possible and used.

It is not of such grave importance as to rationalise forcing contributors to double their sentences in both fashions, though, since the rules of number creation are quite finite, generally, and so can quickly be learned by a fresh beginner.

It would certainly enrich the project to have all possible variations of a given sentence, but this tedious doubling is a lot of work which people can use to make other useful sentences instead. We hardly have enough contributors at the moment that assigning mundane tasks to people en masse would be the optimal way.
What's the big problem with comments as a partial supplementary measure, anyway?

I do agree on the subject of audio, but lack of it is not due to people writing things the way they do. (Sentences that do have audio are, I'm inclined to agree, better written in words, so that one can see the relationship between the writing and the pronunciation.)

In fact, even writing the number properly won't always help one get accustomed to the common pronunciation, since numbers, being among the commonest words in speech, are often pronounced very 'carelessly' with a lot of consonant and vowel reductions.

Concluding, I'm not actually against wordifying numbers myself - I think I do that more often than using math notation - but I don't think it's worth putting the strain of necessity on everyone. Posting the clarification in comments is an okay measure for now until some automated way can be introduced.

Thank you for having read this far!

PS. I understand you're trying to be concise, but it might be more polite to formulate your opinions as opinions and not facts even if you consider them factually true. It would give one the feeling that you are actually interested in hearing others' points...

sacredceltic, January 30, 2014 at 8:03:50 PM UTC

A very fine nuance indeed between "factually true opinions" and facts. It obviously sounds essential to the debate...
The fact that Tatoeba holds but one audio per sentence is not an opinion at all, not even a very factual one. It's just a fact.
And it is also a fact that hearing "Huitante" while learning French in Switzerland won't help you at all to understand what "quatre-vingt" means.

It might be that there are few instances of languages where the same numerals have different writings, although you can't know that, because there are 6,000+ languages in the world and some are very surprising when it comes to numerals...
As a matter of fact, any word, including numerals, can modify or be modified by the preceding or the following word in a sentence. I know that this phenomenon takes place in many languages. When a numeral is modified in such a way, writing it in digits doesn't help at all to learn its pronunciation.

sacredceltic, January 30, 2014 at 8:22:58 PM UTC

Illustration in French:

20 ans
20 mètres

If you're a native English speaker, you would think that 20 is pronounced the same way in both cases.
But you would be very wrong! And in some cases, you wouldn't be understood at all if you insisted on pronouncing it the same way.
French is a kind of "meta-language", i.e. a language that you cannot pronounce if you can't write its words properly, because it was derived from other languages by literate people. Hence the emphasis on orthography in French.
So knowing that 20 is written as "vingt" with a final t is ESSENTIAL to its pronunciation.

If you pronounce "20 ans" without knowing that 20, as a word, ends with a t, and so fail to make the necessary liaison with "ans", I won't understand what you say at all... even if I try very hard and although I'm very much used to hearing broken French from various foreigners, because I will believe you're trying to say something else that vaguely sounds like "vin-an" and which relates to nothing in my mind...

tanay, February 1, 2014 at 5:35:09 PM UTC

BTW, what about years? Is it better to write them with digits or in words?

alexmarcelo, February 1, 2014 at 5:57:55 PM UTC

As for Latin, I usually add both, because I think both structures (i.e., using Roman numerals and using words) are worth adding.
http://tatoeba.org/eng/sentences/show/1338465

alexmarcelo, February 1, 2014 at 5:59:15 PM UTC

At least in Portuguese, writing them in words would look weird and unusual. I'd rather add a comment saying how to read them.

gillux, February 1, 2014 at 7:24:21 PM UTC

Hello!

Frankly, Pharamp, this is quite shocking. I am strongly against “solving” this non-problem.

As long as natives write numbers using both digits and words in real life, I see no reason not to allow both. Languages are not that consistent, and Tatoeba shouldn't be either. Researchers and learners should embrace languages as they actually are, and not change them in order to match their own specific needs.

Furthermore, there are plenty of contexts where numbers are almost always written with digits, or almost always written in words. You just can’t mass-edit one way or the other for the sake of consistency. Of course every writing is possible, but conventions exist and should be reflected in Tatoeba’s content. Just to name a few: I was born in 1952. There are 1,952 sentences. The book costs $19.52. I got a 404 error. My phone number is 0123456789. I’ve got a 32-bit processor, and a Nintendo 64. The next train arrives at 7:50 p.m. She killed two birds with one stone. One should know that. Remember that two wrongs don’t make a right. Give me five! He can talk French twenty to the dozen. Le weekend du 15 août. Je me suis mis sur mon trente-et-un. Appliquons la règle de trois. Vingt-deux, voilà les flics. C’est trois fois rien. Mille mercis. Les mille et unes merveilles du monde. (Now I’ve got to add all these to Tatoeba.)

To me, the reading problem that others mentioned is a different problem that should be solved with a different solution, like adding audio, or adding readings (as Japanese already has, though it’s broken at the moment).

sacredceltic, February 1, 2014 at 7:40:30 PM UTC

Put yourself for a minute in the shoes of a learner of French:

If he is in Switzerland, he has been told that 80 is pronounced "Huitante" (or "Octante", depending on the canton...)
He decides to check the pronunciation of "Huitante" on Tatoeba ("huit" is, in itself, already a pronunciation challenge for all speakers of English, Spanish, Portuguese, Russian, Japanese, and the list goes on!)... and there, SURPRISE, he hears "katrevin"!?!

What will he get out of it?

I'll tell you bluntly: crap.

Not writing out numbers for learners is utter stupidity.

gillux, February 1, 2014 at 8:09:27 PM UTC

Welcome to the real world.

sacredceltic, February 1, 2014 at 11:49:30 PM UTC

You are not answering the objection.

How does hearing "katrevin" help a learner of French who reads "Huitante"?!?

alexmarcelo, February 1, 2014 at 7:42:51 PM UTC

+100

Besides, it would be nearly impossible to write things like these in words and yet make them look natural:

http://tatoeba.org/eng/sentences/show/2169927
http://tatoeba.org/eng/sentences/show/2945976
http://tatoeba.org/eng/sentences/show/2169906
http://tatoeba.org/eng/sentences/show/2583226
http://tatoeba.org/eng/sentences/show/2583268
http://tatoeba.org/eng/sentences/show/2169864
http://tatoeba.org/eng/sentences/show/2176019
http://tatoeba.org/eng/sentences/show/2176014
http://tatoeba.org/eng/sentences/show/2176011
http://tatoeba.org/eng/sentences/show/2176018

Terminology is a fundamental part of human language.

Pharamp, February 1, 2014 at 8:14:35 PM UTC

Hallo Gillux,

I think a lot of people here missed something I wrote, which went unnoticed because of the structure of this wall. But well, there is no point in going on with this issue.

Cheers.