What are best practices for linking similar sentences that only differ in punctuation?
A particularly common variant is a pair like "Hello!" and "Hello."; to me, they have different meanings, as signalled by the exclamation mark or the full stop.
Should these be linked as synonyms or not?
The same question arises across languages: should one link "Hei." only with "Hello." or also with "Hello!"?
(The exclamation mark has slightly different uses in different languages, but in many cases the meaning seems to be the same.)
Good question. I started to link them as synonyms a while ago. But it would be good to have a general agreement on this. An advantage of linking them is that people will not add the Hello! sentence if they clearly see that the Hello sentence exists.
Looking into it a bit more, the Wikipedia page is useful, as it often is: https://en.wikipedia.org/wiki/Exclamation_mark .
There are languages without an exclamation mark (old European languages, and presumably many non-European languages unless they have all adopted it, which I doubt), and it is used in different ways in different languages. Communicating these differences is possible only if sentences are made synonyms solely when their meanings really do overlap.
As such, it seems a good idea not to link, for example, "Hey." and "Hey!". If they are not linked, it is meaningful to link a word in another language to both of them, for example when that language does not use the exclamation mark at all, or uses it much more or much less often than English does. These distinctions are lost if all sentences are linked irrespective of punctuation.
I am employed as a professional editor at the British Columbia Legislative Assembly. Our house style guide takes account of the current regrettable tendency among some people to overuse exclamation marks. Our style guide mandates that in nearly all cases the exclamation marks be changed to periods, reserving exclamation marks for truly unusual occasions.
I have sometimes received emails in which EVERY sentence, even the most mundane, ended with an exclamation mark; other writers adopt bizarre and idiosyncratic conventions like ending all sentences with an ellipsis, as if they were unable to complete a rational thought.
A Member of the Legislative Assembly might seem, in his or her rhetorical excitement, to be saying: "Imagine that!" or "Shameful!" or "That Minister should have known better!" — even using the sort of intonation in speaking that might make it seem that an exclamation mark is somehow warranted. Regardless, our parliamentary reports style these sentences as: "Imagine that." or "Shameful." or "That Minister should have known better."
People who are interested could take a look at one of our Hansard (verbatim record of debates) issues, accessible from this menu, and just do a search for the "!" character:
It would be a very, very rare transcript that has even a single exclamation mark.
My recommendation would be to simply change all Tatoeba sentences ending in an exclamation mark to end with a period instead. Then let the automatic duplicate removal tool purge the unneeded duplicates.
I disagree with replacing all exclamation marks with full stops, as the exclamation mark is part of many languages and communicates meaning. There are valid sentences with exclamation marks.
I agree that it is rarely useful in formal writing (factorials aside), but that does not seem relevant for Tatoeba.
> My recommendation would be to simply change all Tatoeba sentences ending in an exclamation mark to end with a period instead.
I don't agree with this. However, I do agree that most English sentences with exclamation marks might be just as good or better without the exclamation marks. Non-native English speakers seem to overuse them here, so I assume that exclamation marks are more commonly used in some other languages.
I think some languages use exclamation marks naturally for imperative sentences, and maybe other uses.
Even in English, we more often than not see certain sentences with exclamation marks.
[#1557994] How annoying! (melospawn) *audio*
[#1915781] How arrogant! (donkirkby)
[#1913083] How beautiful! (CK) *audio*
[#729024] How romantic! (Shazback)
[#2173907] What a bad movie! (Hybrid)
[#1460433] What a beautiful city! (piksea)
[#1108267] What a beautiful day! (nata23)
[#528258] What a big dog! (fanty) *audio*
[#7357078] What a bizarre idea! (Eccles17)
[#2824472] What a great party! (eirik174)
For more, see sentences tagged as "exclamative".
I'm not sure I completely understood which sentences you're linking together, so I wouldn't say that I disagree, but I think I might disagree ^^
Languages that possess an exclamation mark, or whatever punctuation sign it may be, possess it for a reason. The fact that people don't know how to use them or overuse them is certainly sad, but « Mange. » and « Mange ! » are two completely different sentences.
Therefore, « linking them as synonyms » is, for me, a mistake. And I actually unlink this kind of pair when I see them. We wouldn't use the one instead of the other. And if we translate these two examples into Japanese, for example, they will give us two quite distinct sentences.
On a more global scale, I am against linking sentences of the same language together, except in the case of expressions, sayings, regionalisms, and other things that are not dictionary-friendly. Doing so could lead users and learners of the corpus to unfortunate mistakes, such as believing that two words have the same meaning regardless of context.
> « Mange. » and « Mange ! » are two completely different sentences.
"Completely different" overstates the case. Obviously they're not identical. But they contain the same vocabulary, the same grammatical structure, the same word order, and so on. It's just the punctuation that's different. A language learner who reads « Mange. » will know that they can produce the sentence « Mange ! » through a trivial transformation. So in the context of language learning (probably the major purpose for which Tatoeba is used), the two sentences have a very strong connection.
> And I actually unlink this kind of pair when I see them.
This makes me uncomfortable, especially if you do not know the reasons why they might have been linked in the first place. If your contribution to a group project is simply to undo someone else's contribution, and the change is not based on a firmly settled principle, someone's time is being wasted.
Linking two sentences, whether in the same language or in different languages, does not imply that the sentences are freely interchangeable in all situations. It means that the linked sentences are interchangeable in at least ONE situation. Sentences that differ in their use of a synonym ("That house is big" vs. "That house is large"), or in order of clauses ("In the summer, we go to the beach" vs. "We go to the beach in the summer") are interchangeable in many situations, which means it's perfectly justifiable to link them. Similarly, in many situations, the use of an exclamation point versus a period is a matter of taste, meaning that these sentences are indeed interchangeable.
Tatoeba is not designed to be a dictionary, much less the kind of dictionary where every place where multiple alternatives are offered must be accompanied by usage notes. I believe that our users understand that a link between sentences does not mean that we guarantee that they are identical, or that we are obligated to warn them how they might differ in emphasis or formality.
Furthermore, even if a sentence in Japanese could be linked only to "Eat!" and not to "Eat.", or vice versa, there's no problem if someone links "Eat!" with "Eat." It's a basic principle here that the fact that a sentence can be linked to another sentence does not imply that it can necessarily be linked to every sentence to which the second sentence is linked. That's the reason we treat direct and indirect links differently in the user interface.
One purpose served by linking "Eat!" with "Eat." is that "Eat!" may be translated into language X while "Eat." is not, and users looking at "Eat." are better served by being able to see the translation of "Eat!", even if it's marked as an indirect link. They can use their judgment to determine whether that translation (with or without a transformation) will suit their needs.
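To make the direct versus indirect distinction concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the toy data, the placeholder translation string, and the function names are hypothetical, and this is not how Tatoeba itself is implemented. It treats links as an undirected graph, where direct links are a sentence's neighbours and indirect links are sentences two hops away that are not also direct.

```python
from collections import defaultdict

# Hypothetical toy data: "Eat!" is linked to "Eat." and to a translation
# in some language X. The placeholder string stands in for that translation.
LINKS = [
    ("Eat!", "Eat."),
    ("Eat!", "<translation of Eat! in language X>"),
]


def build_graph(links):
    """Store each link in both directions, since links are symmetric."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def direct_links(graph, sentence):
    """Sentences explicitly linked to the given sentence."""
    return set(graph.get(sentence, set()))


def indirect_links(graph, sentence):
    """Sentences two hops away, excluding direct links and the sentence itself."""
    direct = direct_links(graph, sentence)
    two_hops = set()
    for neighbour in direct:
        two_hops |= graph.get(neighbour, set())
    return two_hops - direct - {sentence}


graph = build_graph(LINKS)
print(direct_links(graph, "Eat."))
print(indirect_links(graph, "Eat."))
```

In this sketch, someone looking at "Eat." sees the language X translation only as an indirect link, so linking "Eat!" with "Eat." never claims that the translation fits "Eat." directly.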
Personally, I link sentences if and only if their meanings overlap in at least one situation where I cannot find a strictly better translation, or where they are close enough and a strictly better translation is not yet in Tatoeba.
For sentences in the same language, this is usually not the case, as different words or punctuation marks suggest or emphasize different things, most of the time.
So "Hei." and "Hei!" are not good translations of each other, since one is more excited than the other. The sentence itself is its best translation; no need to link them.
On the other hand, the words "ekvivalentti" and "yhtäpitävä" have precisely the same meaning in mathematical Finnish, so in any such sentence, one can be replaced by the other while conserving the meaning. So such sentences could be made synonyms. (One would still be communicating a little bit about one's usage of foreign versus more Finnish words, which is why I do not make them synonyms myself, but I would not object to someone else doing it, nor would I remove the links. Communicating that difference is not a big deal in modern Finland.)
Examples of why a single situation where two sentences have the same meaning is not sufficient grounds for linking them:
Context: A dead animal was found in a region where the lion is the only great cat around. "It was killed by a lion." and "It was killed by a great cat." would be essentially the same in that context, yet they should not be made synonyms.
Context: Mathematics research within function theory (i.e. complex analysis). "But wait, the function is differentiable." and "But wait, the function is analytic." have precisely the same meaning, since differentiable complex functions are also analytic. But making these sentences synonyms would be a mistake, since in many other contexts they have a crucially different meaning.
Likewise, if you are excited about meeting someone, you might say "God morgen!", and if you are feeling less energetic, maybe you would say "God morgen.". There certainly exists a situation between those two where they are interchangeable. But still, most of the time, the tone of the greeting makes a difference for the meaning and the exclamation mark suggests the tone.
I agree that where two sentences are equivalent only because there is a situational context that eliminates other possibilities, it doesn't make sense to link them. But I still think that where they are equivalent except for the presence or lack of an exclamation point, it does make sense to link them. In that case, as I mentioned, the vocabulary and grammar are identical, and producing one sentence from another is a trivial matter of adding or removing an exclamation mark, so there's a value in making sure that people who see one see the other as well. Note that I'm not trying to convince you to link such sentences yourself. But as I said, I'm not comfortable with someone actively unlinking them because there doesn't seem to be a widely held opinion that that should be done.
To me, this sounds like an ad hoc decision based on the feature of many (maybe all?) European languages that adding the exclamation mark is a trivial thing. I am not sure this is true of every language.
For Finnish, when adding a question mark to a statement, one often does more: "True. True?" could be "Totta. Tottako?" (not good sentences, but I hope you get the point). This might be true of structures related to the exclamation mark in some languages.
This reminds me of the issue of using standardized names of people or cities. Names are inflected (I hope this is the right word; taivuttaa) in Finnish, and especially for foreign names this is non-trivial, so using standardized names would be a loss.
I am generally against almost all links within a language, for the reasons mentioned earlier; I understand I might be in the minority here, so I do not remove those links, most of the time. But when posing the question I was more interested in links between languages, and I still am.
Different languages use the full stop and the exclamation mark in slightly different contexts, which is not communicated at all if sentences with and without exclamation marks are linked indiscriminately. So I suggest not linking phrases across languages where one has a full stop and the other has an exclamation mark without being familiar with the use of the symbols in both. One might have broader use of a symbol than the other, for example.
That sounds reasonable.
Your reasoning seems biased to me on at least two points: it is Europe-centered, and it is purpose-centered (Tatoeba is used to learn a language).
>But they contain the same vocabulary, the same grammatical structure, the same word order, and so on.
It is funny how you do not say "the same meaning", although it is my mistake for saying "different sentences" and not "different meaning". However, since we're debating in English, you'll surely excuse me for that mistake.
And now, if you can tell me contexts where "Mange." and "Mange !" are the same, in terms of meaning, and not of "trivial transformation", I'll be happy to hear some of them. But then I guess "Tu vas au restaurant." and "Tu vas au restaurant ?" are also similar sentences that can be linked. And I shall be doomed.
Long story short: don't tell me how to maintain our French corpus; French natives are here to debate that. :)
For most of the other points, Thanur has explained them probably better than I could have.
If you have a policy of unlinking sentences within the French corpus that differ only in the use of an exclamation mark versus a period, then yes, that's your prerogative. I thought that the French was being offered as an example, but that you were talking about unlinking such pairs even in English, where we don't have such a policy. I still think that many people can profit from seeing both elements of such a pair even in French, but they can be found in other ways, such as via search.
Then let me apologize for not being precise enough. I was talking only about this specific case.
As another example, I would not unlink "Qu'est-ce que tu fais ?" and "Que fais-tu ?" if they were linked. Personally, I don't find it necessary, but such a link is not harmful and "may" be helpful to somebody.
I was not sufficiently precise, either.