Wall messages by slomox (81 total)

slomox, 20 March 2012, 12:15:06 UTC

It would be nice if there were some information about the technical steps that need to be taken to create a new language. That could help to raise some appreciation for the task, or somebody could write a script to automate some of the steps. Or somebody from the community could adopt the new language creation procedure.

slomox, 14 November 2011, 10:32:36 UTC

What is the correct procedure to link two sentences that are completely unconnected so far?

I could just add a duplicate of one of the sentences as a translation to the other and wait for the duplicate merge script to resolve the duplication. But there must be a better way. I just cannot find it at the moment...

slomox, 9 November 2011, 15:04:38 UTC

>>Unlike the Romance languages however Chinese languages are connected by a common script

You cut the most important part from the quote. The script is etymon-based and can therefore overarch much more variety than a sound-based script.

>Yes they do: official Mandarin is Putonghua

And that's a different term.

>By the way, why did you forget German and English in your list?

Because if I had decided to make the list encompass every language that ever did that, it would probably be a list of all the languages ever adopted as the official language of any territory.

>You're denying that the current PRC's regime is exterminating and levelling cultures and languages?

Where does this conclusion come from? I do not and I don't think that I said anything that would allow this conclusion.

I don't want to engage in an argument with you, Sacredceltic. That would be pointless, because our opinions are basically the same. We could only battle about the semantic subtleties or minor ambiguities of our sentences.

Politically I'm on your side, I just disagree about some methods. Like calling people senile because you disagree with their supposed opinions. Or denying the existence of an overarching superidentity to protect the overarched subidentities.

slomox, 9 November 2011, 12:54:17 UTC

Just for the record: Pretty much everybody agrees that "Chinese" is not a single language, but a family of languages comparable to the Romance languages. Unlike the Romance languages however Chinese languages are connected by a common script that's not sound-based but in its core etymon-based. So while English, Dutch, German and Low Saxon have the related but different words "sun", "zon", "Sonne" and "Sünn" respectively, Chinese script has only one sign for it (let's represent it for our purposes with the Unicode sign for "sun": ☼). The different languages all write ☼, but pronounce it according to their own phonological rules. (This however fails if different Chinese languages use unrelated words, similar to how English uses "horse" while the other languages use "paard", "Pferd", "Peerd". Chinese would have two different signs in a case like this.) That's why the Chinese always considered themselves members of one culture despite all the differences. Among the Chinese languages Mandarin is the most widely spoken one. Therefore it dominates official contexts and the media, and if the West speaks about "Chinese" it almost always means Mandarin.

If you ask me, it would be useful to rename cmn to "Mandarin" in the Tatoeba localization repositories. I do however find it funny how Sacredceltic starts to grouse when somebody just innocently uses the common and widespread term "Chinese" when referring to Mandarin. The reference to "senile communist party leaders" is especially funny because both "Chinese" and "Mandarin" are western terms without any direct correspondence in Chinese. Chinese has its own separate terms to differentiate between the varieties. "Senile communist party leaders" are well able to discern them and probably don't worry much about the English terms. It's even funnier that Sacredceltic suspects the worst political motives when you can find the exact same method of "everybody who I reign over just speaks a dialect of what I speak, which is a language, not a dialect" all over Europe.

slomox, 9 November 2011, 12:03:25 UTC

Like Catalan or Occitan or Sicilian.

slomox, 9 November 2011, 11:35:35 UTC

+1

Just like "Spanish" or "French" or "Italian".

slomox, 14 October 2011, 15:45:09 UTC

>Of course it will not be a simple "copy paste" of tatoeba
>code with only a change in the content

It won't? The Tatoeba software is Open Source and available for download, isn't it? You just need to install an instance of it and put in words instead of whole sentences. Or am I missing something?

slomox, 16 September 2011, 16:52:48 UTC

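Di smeckt Engelsch nich? Snack ik Platt. Schall ik di seggen, wo dat an liggt, dat di dat nich wunnert? Dat liggt dor an, dat du dat löven _wullt_. All dien Denken dreiht sik dor üm. Dor liggt dat an. [English: You don't like English? Then I'll speak Low Saxon. Shall I tell you why that doesn't surprise you? It's because you _want_ to believe it. All your thinking revolves around it. That's why.]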

slomox, 16 September 2011, 16:07:06 UTC

Currently we have 12 "* killed *" sentences out of a total of 1,064,858. If we extrapolate that to the point where we have 100,000 "* killed *" sentences, we'll have 8.87 billion sentences total. The cluttering will probably always stay relative.

And if a name cannot be identified as being a person's name in context, then translating it as a name of a place or brand isn't a mistranslation. In "I love Paris." Paris can be a city, a first name, a last name, an asteroid, a movie, a ship, a plant etc. Translating it as any of these is neither a misunderstanding (our sentences stay without context) nor a mistranslation.
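
For anyone who wants to check the extrapolation, here is a minimal sketch in Python. It assumes the share of matching sentences stays constant as the corpus grows; the function name `extrapolate_total` is made up for illustration and is not part of any Tatoeba tooling.

```python
def extrapolate_total(matching: int, total: int, target_matching: int) -> float:
    """Scale the corpus linearly until `matching` has grown to `target_matching`.

    Assumes the ratio matching/total stays constant (a simplification).
    """
    if matching <= 0:
        raise ValueError("matching must be positive")
    return total * target_matching / matching


# Figures from the post: 12 matching sentences out of 1,064,858 in total.
projected = extrapolate_total(12, 1_064_858, 100_000)
print(f"{projected / 1e9:.2f} billion sentences")  # prints "8.87 billion sentences"
```

The ratio itself does not change under this assumption, which is the point: the matching sentences stay at roughly one in 88,700.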

slomox, 16 August 2011, 15:56:51 UTC

I have no specific knowledge of French laws, but I'm pretty sure that Tatoeba cannot be considered a collective work. Each single edit can be pinpointed to a single person. Tatoeba is just a platform where the authors can publish and connect their content. At least that's the way Wikipedia and the Wikimedia Foundation handle it. The Wikimedia Foundation would never claim any authorship for the Wikipedia contents. If only because of the legal risks of claiming responsibility for other people's posts, Tatoeba shouldn't do so.

And it's not legally acceptable to reuse the whole Tatoeba database and omit the attribution of the single authors. Creative Commons is designed for the needs of individuals: content creators who are willing to grant rights to the public, and the re-users of this content. It seeks to balance the respective interests of these two groups and to make it easy for them to share and to use shared content. In the original concept of Creative Commons, content accumulators like Tatoeba or Wikipedia do not appear as a separate group. From a CC perspective they are just re-users of the CC-licensed content of their contributors.

slomox, 16 August 2011, 15:05:35 UTC

>"The data come from the tatoeba projects <link to the project>"

Actually that's not enough. To fulfil the license's requirements the licensee needs to name the authors of the single sentences. They are the licensors.

In practice most people will probably accept a "from Tatoeba" attribution, but if you want to be safe you should attribute the sentence's author(s).
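
As a rough illustration of per-sentence attribution, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Sentence` record, its field names, the sample data and the wording of the credit line are hypothetical, and this is not legal advice or an official Tatoeba format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    # Hypothetical record layout; field names are assumptions for this sketch.
    sentence_id: int
    text: str
    author: str


def attribution_line(sentence: Sentence, source: str = "Tatoeba",
                     license_name: str = "CC BY") -> str:
    """Build a credit line that names the sentence's author, not just the site."""
    return (f'"{sentence.text}" by {sentence.author} '
            f"({source}, sentence #{sentence.sentence_id}), "
            f"licensed under {license_name}")


# Made-up example data, not a real Tatoeba entry.
example = Sentence(sentence_id=1, text="I love Paris.", author="example_user")
print(attribution_line(example))
```

A re-user with many sentences could generate one such line per sentence, or group sentences by author to keep the credits short.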

slomox, 16 August 2011, 13:39:25 UTC

Wikipedia has CC-BY-SA and Tatoeba has CC-BY. So if you use text from Wikipedia the resulting new work has to be licensed under a free license too. Text from Tatoeba can be re-used in works under any license, even if it's proprietary.

Therefore it's okay to post Tatoeba content on Wikipedia, but not the other way round.

But depending on the extent of the quotes it should be okay to use simple sentences from Wikipedia, because simple sentences are not copyrightable or licensable at all.

slomox, 13 August 2011, 11:51:15 UTC

Mine got through anyways. (Finally! Somebody likes me!)

Perhaps there should be a message sending throttle.

slomox, 7 August 2011, 14:25:08 UTC

Me neither :( But you are Tux the penguin and I'm a silhouette, not the most attractive partners for a polyamorous person ;)

slomox, 7 August 2011, 13:33:50 UTC

This is discriminatory against polyamorous people...

slomox, 6 August 2011, 14:55:13 UTC

I see, thanks. I had hoped that I could somehow help to make Low Saxon automatically detectable, but Google usually doesn't give a fuck about languages that are not commercially exploitable due to lack of a sizable and demanding internet population.

So I'll have to wait until Tatoeba gets its own system. Is this something being worked on or just an idea for the distant future?

slomox, 6 August 2011, 12:50:12 UTC

When I add sentences I can activate automatic detection. On what is this automatic detection based?

slomox, 4 August 2011, 11:44:07 UTC

What do you mean by naturalistic parts of text? If you want real naturalistic parts of text why do you want to take them from imageboards? Or do you mean naturalistic parts of text common for imageboards? Like "Cool story bro." or "Tits or GTFO!" or "Impossibru!"?

slomox, 4 August 2011, 11:38:36 UTC

Posting original content onto an imageboard is de facto equivalent to releasing a work into the public domain. But of course you never know whether the original poster is the original creator.

Whether the content gets archived is irrelevant. But it should be okay in most cases. Think about offline jokes. "A Rabbi, a Priest and a Minister walk into a bar ..." Have you ever considered copyright concerns before retelling a funny joke you heard at a party?

slomox, 4 August 2011, 11:31:21 UTC

Well, short memes like "I don't want to live on this planet anymore!" or "In need of a new meme? Why not Zoidberg?" or "I like turtles." or "Paint me like one of your French girls." are clearly okay. Nobody can seriously claim copyright on simple one-line jokes.

The same goes for short sentences from books etc. But you shouldn't split up longer sequences of text into single sentences and insert them all into Tatoeba.