slomox's messages on the Wall (total 81)

slomox, March 20, 2012 at 12:15:06 PM UTC

It would be nice if there were some information about the technical steps needed to create a new language. That could help raise some appreciation for the task, or somebody could write a script to automatise some of the steps. Or somebody from the community could adopt the new language creation procedure.

slomox, November 14, 2011 at 10:32:36 AM UTC

What is the correct procedure to link two sentences that are completely unconnected so far?

I could just add a duplicate of one of the sentences as a translation to the other and wait for the duplicate merge script to resolve the duplication. But there must be a better way. I just cannot find it at the moment...

slomox, November 9, 2011 at 3:04:38 PM UTC

>>Unlike the Romance languages however Chinese languages are connected by a common script

You cut the most important part from the quote. The script is etymon-based and can therefore overarch much more variety than a sound-based script.

>Yes they do: official Mandarin is Putonghua

And that's a different term.

>By the way, why did you forget German and English in your list?

Because if I had decided to make the list encompass every language that ever did that, it would probably be a list of all the languages ever adopted as the official language of any territory.

>You're denying that the current PRC's regime is exterminating and levelling cultures and languages?

Where does this conclusion come from? I do not and I don't think that I said anything that would allow this conclusion.

I don't want to engage in an argument with you, Sacredceltic. That would be pointless, because our opinions are basically the same. We could only battle about the semantic subtleties or minor ambiguities of our sentences.

Politically I'm on your side, I just disagree about some methods. Like calling people senile because you disagree with their supposed opinions. Or denying the existence of an overarching superidentity to protect the overarched subidentities.

slomox, November 9, 2011 at 12:54:17 PM UTC

Just for the record: Pretty much everybody agrees that "Chinese" is not a single language, but a family of languages comparable to the Romance languages. Unlike the Romance languages, however, the Chinese languages are connected by a common script that is not sound-based but at its core etymon-based. So while English, Dutch, German and Low Saxon have the related but different words "sun", "zon", "Sonne" and "Sünn" respectively, Chinese script has only one sign for it (let's represent it for our purposes with the Unicode sign for "sun": ☼). The different languages all write ☼, but pronounce it according to their own phonological rules. (This fails, however, if different Chinese languages use unrelated words, similar to how English uses "horse" while the other languages use "paard", "Pferd", "Peerd". Chinese would have two different signs in a case like this.) That's why the Chinese have always considered themselves members of one culture despite all the differences. Among the Chinese languages, Mandarin is the most widely spoken one. Therefore it dominates official contexts and the media, and if the West speaks about "Chinese" it almost always means Mandarin.

If you ask me, it would be useful to rename cmn to "Mandarin" in the Tatoeba localization repositories. I do, however, find it funny how Sacredceltic starts to grouse when somebody just innocently uses the common and widespread term "Chinese" when referring to Mandarin. The reference to "senile communist party leaders" is especially funny because both "Chinese" and "Mandarin" are western terms without any direct correspondence in Chinese. Chinese has its own separate terms to differentiate between the varieties. "Senile communist party leaders" are well able to discern them and probably don't worry much about the English terms. It's even funnier that Sacredceltic suspects the worst political motives when you can find the exact same method of "everybody I reign over just speaks a dialect of what I speak, which is a language, not a dialect" all over Europe.

slomox, November 9, 2011 at 12:03:25 PM UTC

Like Catalan or Occitan or Sicilian.

slomox, November 9, 2011 at 11:35:35 AM UTC

+1

Just like "Spanish" or "French" or "Italian".

slomox, October 14, 2011 at 3:45:09 PM UTC

>Of course it will not be a simple "copy paste" of tatoeba
>code with only a change in the content

It won't? The Tatoeba software is Open Source and available for download, isn't it? You just need to install an instance of it and put in words instead of whole sentences. Or am I missing something?

slomox, September 16, 2011 at 4:52:48 PM UTC

You don't like English? Then I'll speak Low Saxon. Shall I tell you why that doesn't surprise you? It's because you _want_ to believe it. All your thinking revolves around it. That's the reason.

slomox, September 16, 2011 at 4:07:06 PM UTC

Currently we have 12 "* killed *" sentences out of total 1,064,858. If we extrapolate that to the point where we have 100,000 "* killed *" sentences, we'll have 8.87 billion sentences total. The cluttering will probably always stay relative.
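The extrapolation can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the figures quoted above; the variable names are made up for illustration, and it assumes the share of "* killed *" sentences stays constant as the corpus grows.

```python
# Figures quoted in the post; variable names are illustrative only.
killed_now = 12            # current number of "* killed *" sentences
total_now = 1_064_858      # current total number of sentences
killed_target = 100_000    # hypothetical future number of "* killed *" sentences

# Assumption: the proportion of "* killed *" sentences stays the same.
share = killed_now / total_now
total_at_target = killed_target / share

print(f"projected total: {total_at_target / 1e9:.2f} billion")
# prints: projected total: 8.87 billion
```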

And if a name cannot be identified as a person's name from context, then translating it as the name of a place or brand isn't a mistranslation. In "I love Paris." Paris can be a city, a first name, a last name, an asteroid, a movie, a ship, a plant etc. Translating it as any of these is neither a misunderstanding (our sentences have no context) nor a mistranslation.

slomox, August 16, 2011 at 3:56:51 PM UTC

I have no specific knowledge of French law, but I'm pretty sure that Tatoeba cannot be considered a collective work. Each single edit can be pinpointed to a single person. Tatoeba is just a platform where the authors can publish and connect their content. At least that's the way Wikipedia and the Wikimedia Foundation handle it. The Wikimedia Foundation would never claim any authorship of the Wikipedia contents. If only because of the legal risks of claiming responsibility for other people's posts, Tatoeba shouldn't do so.

And it's not legally acceptable to reuse the whole Tatoeba database and omit the attribution of the individual authors. Creative Commons is designed for the needs of individuals: content creators who are willing to grant rights to the public, and the re-users of this content. It seeks to balance the respective interests of these two groups and to make it easy for them to share and to use shared content. In the original concept of Creative Commons, content accumulators like Tatoeba or Wikipedia do not appear as a separate group. From a CC perspective they are just re-users of the CC-licensed content of their contributors.

slomox, August 16, 2011 at 3:05:35 PM UTC

>"The data come from the tatoeba projects <link to the project>"

Actually that's not enough. To fulfil the license's requirements the licensee needs to name the authors of the individual sentences. They are the licensors.

In practice most people will probably accept a "from Tatoeba" attribution, but if you want to be safe you should attribute the sentence's author(s).

slomox, August 16, 2011 at 1:39:25 PM UTC

Wikipedia has CC-BY-SA and Tatoeba has CC-BY. So if you use text from Wikipedia the resulting new work has to be licensed under a free license too. Text from Tatoeba can be re-used in works under any license, even if it's proprietary.

Therefore it's okay to post Tatoeba content on Wikipedia, but not the other way round.

But depending on the extent of the quotes it should be okay to use simple sentences from Wikipedia, because simple sentences are not copyrightable or licensable at all.

slomox, August 13, 2011 at 11:51:15 AM UTC

Mine got through anyways. (Finally! Somebody likes me!)

Perhaps there should be a message sending throttle.

slomox, August 7, 2011 at 2:25:08 PM UTC

Me neither :( But you are Tux the penguin and I'm a silhouette, not the most attractive partners for a polyamorous person ;)

slomox, August 7, 2011 at 1:33:50 PM UTC

This is discriminatory against polyamorous people...

slomox, August 6, 2011 at 2:55:13 PM UTC

I see, thanks. I had hoped that I could somehow help to make Low Saxon automatically detectable, but Google usually doesn't give a fuck about languages that are not commercially exploitable due to lack of a sizable and demanding internet population.

So I'll have to wait until Tatoeba gets its own system. Is this something that is being worked on or just an idea for the distant future?

slomox, August 6, 2011 at 12:50:12 PM UTC

When I add sentences I can activate automatic detection. On what is this automatic detection based?

slomox, August 4, 2011 at 11:44:07 AM UTC

What do you mean by naturalistic parts of text? If you want real naturalistic parts of text why do you want to take them from imageboards? Or do you mean naturalistic parts of text common for imageboards? Like "Cool story bro." or "Tits or GTFO!" or "Impossibru!"?

slomox, August 4, 2011 at 11:38:36 AM UTC

Posting original content onto an imageboard is de facto equivalent to releasing a work into the public domain. But of course you never know whether the original poster is the original creator.

Whether the content gets archived is irrelevant. But it should be okay in most cases. Think about offline jokes. "A Rabbi, a Priest and a Minister walk into a bar ..." Have you ever considered copyright concerns before retelling a funny joke you heard at a party?

slomox, August 4, 2011 at 11:31:21 AM UTC

Well, short memes like "I don't want to live on this planet anymore!" or "In need of a new meme? Why not Zoidberg?" or "I like turtles." or "Paint me like one of your French girls." are clearly okay. Nobody can seriously claim copyright on simple one-line jokes.

The same goes for short sentences from books etc. But you shouldn't split up longer sequences of text into single sentences and insert them all into Tatoeba.