It would be nice if there were some information about the technical steps that need to be taken to create a new language. That could help to raise some appreciation for the task, or somebody could write a script to automate some of the steps. Or somebody from the community could take over the new language creation procedure.
What is the correct procedure to link two sentences that are completely unconnected so far?
I could just add a duplicate of one of the sentences as a translation to the other and wait for the duplicate merge script to resolve the duplication. But there must be a better way. I just cannot find it at the moment...
>>Unlike the Romance languages however Chinese languages are connected by a common script
You cut the most important part from the quote. The script is etymon-based and can therefore overarch much more variety than a sound-based script.
>Yes they do: official Mandarin is Putonghua
And that's a different term.
>By the way, why did you forget German and English in your list?
Because if I had decided to make the list encompass every language that ever did that, it would probably be a list of all the languages ever adopted as the official language of any territory.
>You're denying that the current PRC's regime is exterminating and levelling cultures and languages?
Where does this conclusion come from? I do not and I don't think that I said anything that would allow this conclusion.
I don't want to engage in an argument with you, Sacredceltic. That would be pointless, because our opinions are basically the same. We could only battle about the semantic subtleties or minor ambiguities of our sentences.
Politically I'm on your side, I just disagree about some methods. Like calling people senile because you disagree with their supposed opinions. Or denying the existence of an overarching superidentity to protect the overarched subidentities.
Just for the record: Pretty much everybody agrees that "Chinese" is not a single language, but a family of languages comparable to the Romance languages. Unlike the Romance languages however Chinese languages are connected by a common script that's not sound-based but in its core etymon-based. So while English, Dutch, German and Low Saxon have the related but different words "sun", "zon", "Sonne" and "Sünn" respectively, Chinese script has only one sign for it (let's represent it for our purposes with the Unicode sign for "sun": ☼). The different languages all write ☼, but pronounce it according to their own phonological rules. This fails, however, if different Chinese languages use unrelated words, similar to how English uses "horse" while the other languages use "paard", "Pferd" and "Peerd"; Chinese would have two different signs in a case like this. That's why the Chinese always considered themselves members of one culture despite all the differences. Among the Chinese languages Mandarin is the most widely spoken one. Therefore it dominates official contexts and the media, and if the West speaks about "Chinese" it almost always means Mandarin.
If you ask me, it would be useful to rename cmn to "Mandarin" in the Tatoeba localization repositories. I do find it funny, however, how Sacredceltic starts to grouse when somebody just innocently uses the common and widespread term "Chinese" when referring to Mandarin. The reference to "senile communist party leaders" is especially funny because both "Chinese" and "Mandarin" are western terms without any direct correspondence in Chinese. Chinese has its own separate terms to differentiate between the varieties. "Senile communist party leaders" are well able to discern them and probably don't worry much about the English terms. It's even funnier that Sacredceltic suspects the worst political motives when you can find the exact same method of "everybody who I reign over just speaks a dialect of what I speak, which is a language, not a dialect" all over Europe.
Like Catalan or Occitan or Sicilian.
+1
Just like "Spanish" or "French" or "Italian".
>Of course it will not be a simple "copy paste" of tatoeba
>code with only a change in the content
It won't? The Tatoeba software is Open Source and available for download, isn't it? You just need to install an instance of it and put in words instead of whole sentences. Or am I missing something?
You don't like English? Then I'll speak Low Saxon. Shall I tell you why that doesn't surprise you? It's because you _want_ to believe it. All your thinking revolves around that. That's the reason.
Currently we have 12 "* killed *" sentences out of total 1,064,858. If we extrapolate that to the point where we have 100,000 "* killed *" sentences, we'll have 8.87 billion sentences total. The cluttering will probably always stay relative.
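The extrapolation above is simple proportional scaling and can be checked with a few lines of Python. This is only a sketch: it assumes the share of "* killed *" sentences stays constant as the corpus grows, and the function name is made up for illustration.

```python
def extrapolate_total(matching_now: int, total_now: int, matching_target: int) -> float:
    """Estimate the corpus size at which `matching_target` matching sentences
    exist, assuming the current share of matching sentences stays constant."""
    share = matching_now / total_now
    return matching_target / share


# Figures from the comment above: 12 matches among 1,064,858 sentences.
total = extrapolate_total(12, 1_064_858, 100_000)
print(f"{total / 1e9:.2f} billion sentences")  # prints "8.87 billion sentences"
```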
And if a name cannot be identified as being a person's name in context, then translating it as the name of a place or brand isn't a mistranslation. In "I love Paris." Paris can be a city, a first name, a last name, an asteroid, a movie, a ship, a plant etc. Translating it as any of these is neither a misunderstanding (our sentences stay without context) nor a mistranslation.
I have no specific knowledge of French laws, but I'm pretty sure that Tatoeba cannot be considered a collective work. Each single edit can be pinpointed to a single person. Tatoeba is just a platform where the authors can publish and connect their content. At least that's the way Wikipedia and the Wikimedia Foundation handle it. The Wikimedia Foundation would never claim any authorship for the Wikipedia contents. If only because of the legal risks of claiming responsibility for other people's posts, Tatoeba shouldn't do so.
And it's not legally acceptable to reuse the whole Tatoeba database and omit the attribution of the individual authors. Creative Commons is designed for the needs of individuals: content creators who are willing to grant rights to the public, and the re-users of this content. It seeks to balance the respective interests of these two groups and to make it easy for them to share and to use shared content. In the original concept of Creative Commons content accumulators like Tatoeba or Wikipedia do not appear as a separate group. From a CC perspective they are just re-users of the CC-licensed content of their contributors.
>"The data come from the tatoeba projects <link to the project>"
Actually that's not enough. To fulfil the license's requirements the licensee needs to name the authors of the single sentences. They are the licensors.
In practice most people will probably accept a "from Tatoeba" attribution, but if you want to be safe you should attribute the sentence's author(s).
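To illustrate what a per-author attribution could look like in practice, here is a minimal Python sketch. The `Sentence` record, its field names and the sample values are hypothetical and do not reflect Tatoeba's actual export format; the wording of the attribution line is just one possible choice, not legal advice.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    # Hypothetical record layout, not the real Tatoeba export format.
    sentence_id: int
    text: str
    author: str


def attribution_line(sentence: Sentence) -> str:
    """Build a credit line that names the sentence's author (the licensor),
    not just the platform."""
    return (
        f'"{sentence.text}" by {sentence.author} '
        f"(Tatoeba sentence #{sentence.sentence_id}), licensed under CC BY"
    )


# Placeholder values for illustration only.
example = Sentence(sentence_id=1, text="I love Paris.", author="example_user")
print(attribution_line(example))
```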
Wikipedia has CC-BY-SA and Tatoeba has CC-BY. So if you use text from Wikipedia the resulting new work has to be licensed under a free license too. Text from Tatoeba can be re-used in works under any license, even if it's proprietary.
Therefore it's okay to post Tatoeba content on Wikipedia, but not the other way round.
But depending on the extent of the quotes it should be okay to use simple sentences from Wikipedia, because simple sentences are not copyrightable or licensable at all.
Mine got through anyways. (Finally! Somebody likes me!)
Perhaps there should be a message sending throttle.
Me neither :( But you are Tux the penguin and I'm a silhouette, not the most attractive partners for a polyamorous person ;)
This is discriminatory against polyamorous people...
I see, thanks. I had hoped that I could somehow help to make Low Saxon automatically detectable, but Google usually doesn't give a fuck about languages that are not commercially exploitable due to lack of a sizable and demanding internet population.
So I'll have to wait until Tatoeba gets its own system. Is this something worked on or just an idea for the distant future?
When I add sentences I can activate automatic detection. On what is this automatic detection based?
What do you mean by naturalistic parts of text? If you want real naturalistic parts of text why do you want to take them from imageboards? Or do you mean naturalistic parts of text common for imageboards? Like "Cool story bro." or "Tits or GTFO!" or "Impossibru!"?
Posting original content onto an imageboard is de facto equivalent to releasing a work into the public domain. But of course you never know whether the original poster is the original creator.
Whether the content gets archived is irrelevant. But it should be okay in most cases. Think about offline jokes. "A Rabbi, a Priest and a Minister walk into a bar ..." Have you ever considered copyright concerns before retelling a funny joke you heard at a party?
Well, short memes like "I don't want to live on this planet anymore!" or "In need of a new meme? Why not Zoidberg?" or "I like turtles." or "Paint me like one of your French girls." are clearly okay. Nobody can seriously claim copyright on simple one-line jokes.
The same with short sentences from books etc. But you shouldn't split up longer sequences of text into single sentences and insert them all into Tatoeba.