Wall (5,552 threads)

sharptoothed
23 hours ago
** Stats & Graphs **

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/
Guybrush88
16 hours ago
thanks :)
sharptoothed
2 hours ago
you're welcome :-)
BakirHamou
4 hours ago
Mutual intelligibility between the varieties of the Amazigh language is fairly high, especially between dialects that are geographically close, such as Kabyle and Chaoui, or Kabyle and Chenoui (north-west of Algeria).

Mutual intelligibility is also high among the so-called Zenati dialects, even when they are geographically far apart. Thus, Chaoui and Chenoui speakers understand each other perfectly well, just as they understand the Riffian dialect spoken in Morocco and all the dialects spoken in the enormous string of oases scattered across south-western Algeria (Béchar, Naama, El-Bayadh, Adrar, Timimoun, etc.). These latter dialects, in turn, owing to their geographical proximity to the Moroccan dialects spoken further west in North Africa, are also understood by the Berber-speaking populations of central and southern Morocco (populations who speak Shilha).
CK
5 days ago
** Is it legal to use CC-BY sentences on Tatoeba.org? **

There seems to be a discrepancy on this page.

https://blog.tatoeba.org/search?q=cc-by

Trang says, "Anything that basically doesn't say "You can do absolutely whatever you want with this" is NOT compatible with CC-BY."

However, later she says, "Anything that is under CC-BY is compatible with CC-BY. "

My interpretation is that since people use our data under a CC-BY license, that we can't use other people's CC-BY material since those who use our data can't also do a CC-BY for the other source. Trang's first statement seems to indicate this, since people who release material under CC-BY require that they be given credit for the material and do not grant the right to "do absolutely whatever you want with this."
TRANG
5 days ago
I've updated the article.
https://blog.tatoeba.org/2011/0...d-content.html

The line you quoted was more of a simplified guideline for people who are not too familiar with licenses. It was also written back in 2011 when we overall had much less experience with licenses. It is obviously not a precise legal statement.

In general, you should avoid making interpretations out of blog posts when it comes to licenses. You should instead read the license text and make your own interpretation based on that text, as it is the original source.
CK
4 days ago
I think you're mistaken and that you should ask a lawyer first if you are going to encourage members to take someone else's CC-BY material and put it into the Tatoeba Corpus which is distributed under its own CC-BY license, which only requires attribution to tatoeba.org.

FROM:
https://creativecommons.org/lic...by/4.0/deed.en

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
shekitten
4 days ago - 4 days ago
CC-BY is literally designed specifically to work like this. Reusing CC-BY content and making it also CC-BY is ideal. There are more restrictive licenses (not permitted on Tatoeba) that REQUIRE this.
CK
4 days ago
Some licensors choose the BY license, which requires attribution to the creator as the only condition to reuse of the material.
FROM: https://creativecommons.org/faq/

How can someone using the Tatoeba Corpus and properly crediting the Tatoeba Project, know that they also need to give credit to a third party as well? If you don't specifically give credit to the third party, then you would be violating the third party's CC-BY license, I think.



Thanuir
3 days ago
Yes, someone publishing Tatoeba data would have to figure out how to credit the third parties (if they do not want to breach the terms of CC licenses), too, which would be quite challenging.
TRANG
4 days ago
As shekitten said, CC BY is designed for reuse. No one in their right mind would choose a CC BY license if they didn't want their content to be reused somewhere else. So it is nonsense to say that you cannot reuse CC BY content into another CC BY content.

What you are pointing out is that we may not be doing the attribution properly. So let's break it down:

1) You must give appropriate credit

That is done by adding a comment on the sentence with a link to the original source. I believe that is appropriate enough, but I suppose the best way to be sure is to contact the author to confirm. And indeed, we may not have been very diligent on that, so we could add it as a guideline that when copying from another CC BY content, one should always try to contact the author to ask about attribution. Now if the author doesn't reply, I think we're still very, very safe sticking to adding a comment with a link to the original source.

2) provide a link to the license

This is done indirectly: since we provide a link to the source, the source will have the link to the license. In the comment on Tatoeba, I think the mention of the license name and version is fine and there is no need to additionally put a link to the license itself.

3) and indicate if changes were made

This is done with the logs: whenever someone edits the sentence, the logs indicate when and how the sentence has been modified.


With all of that, I think we're okay.


Now I understand very well that one flaw of CC BY is that if you want to be 100% sure that you're doing attribution properly, it can be very tedious work, because indeed, you would have to drag along all the attributions from previous reuses. And that is actually why we agreed to introduce CC0 when Common Voice approached us. We know how painful that is with CC BY, and we know that CC0 would alleviate this pain. With CC0, there is no need to worry about this whole trail of attribution when content is reused in content that is then again reused, and again reused.

In our case with CC BY though, we are following some common sense and we assume that someone who shares their work under CC BY is okay with indirect attribution. Meaning that if I create CC BY content and you reuse my CC BY content into your own CC BY content and shekitten reuses your CC BY content, I'm okay if shekitten only gives attribution to you and not to me. Because indirectly, I'm still being given attribution (through you).

I think it is a fairly reasonable assumption. But if for some reason, we are copying from someone who is not okay with this concept of indirect attribution, then we can figure out something. We can readapt the way we give credit by adding some warning on the Downloads page about the people who are not okay with indirect attribution so that projects that reuse our content will know that they need to mention these people. But again, we don't have to reject copied sentences from external CC BY sources right off the bat because it's really borderline paranoia to do so.
Thanuir
4 days ago
The link to the license should be direct. (The target website might vanish.) But luckily there already is a direct link to CC-BY license on the sentence page. I have no idea what happens if the original was licensed under a non-French version of the license; is the French license sufficient or should one also link to the original license?

...

Here is more information on what is proper attribution: https://wiki.creativecommons.or...mparison_chart

In particular, the legality is not a matter of what the author intended or wanted to accomplish, but rather of the license.
In any case, only the original source needs to be attributed, not any intermediate sources.
TRANG
4 days ago
> is the French license sufficient or should one also link to the original license?

The French license would not be enough. Each version of CC BY should be considered as a different license, even if they are very similar.

For the context, this topic was brought up because of this sentence:
https://tatoeba.org/eng/sentences/show/8242255

So going from this example, if we want to be absolutely strict about attribution, then we would have to ask shekitten to also post a link to the CC BY 4.0 license (not just mention the license name). And we may have to do other things in order to be 99.999999% safe legally speaking.

> In any case, only the original source needs to be attributed

It's very clear that when possible the original source needs to be attributed. But when content gets mixed and remixed, it can be difficult and confusing to find out who is the very first author. And in such cases, I think we are still safe if we are only attributing to the intermediate source. No one is going to sue Tatoeba because we didn't give them attribution directly, but instead gave attribution to someone who reused their content. They will most likely just let us know and we can update the information when we find out that we were referencing an intermediate source.

My whole paragraph about "indirect attribution" was mostly to argue that there is no imminent danger in referencing an intermediate source unknowingly and therefore we do not need to reject every sentence copied from other CC BY sources (concretely, it doesn't make sense to mark shekitten's Láadan sentences in red).

CK still thinks that no matter what, it is wrong to let contributors copy into Tatoeba sentences from other CC BY sources and that I risk being sued for it. In other words, that we should completely forbid people from copying sentences from other CC BY content.

I think that enforcing such a rule would be unreasonable. I know there is a risk and I know we are not handling the whole legal aspect perfectly, but that's normal considering that we have grown on a very scarce (nearly non-existent) budget and considering that the topic of intellectual property in the internet era is still a fairly new territory.

If anyone would like to help out and investigate on the safety of allowing Tatoeba members to copy CC BY sentences and on what else we can do at this stage to avoid any risk of lawsuit, I would be infinitely grateful. On my side, I cannot put any more effort into this topic.
shekitten
4 days ago - 4 days ago
I think the issue of how to attribute the sentences - whether to just attribute Tatoeba or to directly attribute the original source - is ultimately up to whoever is making use of Tatoeba data to resolve.

It's indisputable that it is possible to release a CC-BY work (such as Tatoeba) that makes use of other CC-BY works (such as individual sentences), and I'm not the first person who has ever done this. From our end, all we have to do is attribute the original source.

From the POV of the downstream service, that's for them to resolve. I would say the most legal and ethical practice is to attribute both, which is entirely within the realm of possibility on their end and is something they should already be doing.
Thanuir
3 days ago
Personally, I think that there is nothing morally wrong with breaking copyright laws, as they hurt humanity in general and the mission of Tatoeba in particular.

I guess the risk to be sued due to the content of Tatoeba is tiny. I think the risk to be sued on a legally sound basis is even tinier, since even copying entire single sentences from a book while breaking the order of sentences is not obviously wrong, AFAIK. (This is not legal advice.)

I read that some computational linguistics researchers, when they want to share a corpus, put the sentences in a random order so that the original work can not be recovered from there, but their approach is ad hoc and has not been tested in court. I do not remember the source anymore.

I would suggest linking to the source and the license when using CC-BY-licensed content. The effort is not big when one should link to the source anyway and the source should link to the license, so one can simply copy the link from there.
shekitten
3 days ago
I agree that there's nothing morally wrong with breaking copyright laws, but I still think there is something morally questionable about not attributing a source - particularly in cases where you are making money off of that source.
CK
3 days ago - 3 days ago
* Here are 2 statements by shekitten and my comments.


> From the POV of the downstream service, that's for them to resolve ...

I think that when the Tatoeba Project distributes their corpus with the understanding that it's OK to use it if attribution is given to the Tatoeba Project, the implication is that it is free to use with no other restrictions.


> ... but I still think there is something morally questionable about not attributing a source ...

I think when a person releases their material under a CC-BY license that they do it with the expectation that they will receive attribution when their material is reused. I don't think that they expect it to be reused without attribution as suggested in another comment above.

So, regardless of the legal aspect, I would say it's morally wrong to reuse CC-BY material that is going to be redistributed without the required attribution that the person who chose the CC-BY license wants and expects.




* Additional comments.

If it is indeed possible to add CC-BY material to the Tatoeba Corpus and redistribute it under Tatoeba's own CC-BY license, then perhaps TRANG should find all the parallel corpora with CC-BY licenses and import them all into the Tatoeba Corpus, like she did with the public domain Tanaka Corpus.

Also, if this is true, would it mean that anyone who reuses bilingual pairs from http://www.manythings.org/anki in another project would not need to give attribution to the Tatoeba Project anymore?

I think that it would be a lot better and safer for us to not include CC-BY licensed material by others in the Tatoeba Corpus. We have a number of native speakers who can easily add their own material without needing to reuse (steal?) CC-BY material.
Thanuir
3 days ago
There is no "stealing" with copyright breaches, and there is no copyright breach on Tatoeba users' side, here, so please do not use needlessly inflammatory language.

I agree that the current situation is inconvenient for people who would like to republish Tatoeba content. If someone wanted to do that, they would have to figure out a way of identifying which sentences need further attribution, and then either provide that or exclude those sentences.

It would be helpful to those people if the sentences requiring further attribution would be marked somehow, perhaps with a different "license" option: CC-BY license with additional attribution required or something like that as a licensing option, maybe.
TRANG
2 days ago
I want to stress that no one said that CC BY material should not be given attribution. It is very, very clear that we should give attribution when reusing CC BY material.

But you are apparently advocating for "viral" attribution and you are also advocating to forbid people from mixing CC BY content just because those who reuse the remix might not give proper attribution. I don't know if you realize that this point of view is also morally questionable and is a creativity killer...

We are going to do our best to be as fair as possible to everyone who is a content creator, but we cannot take measures that are disconnected from reality.

> perhaps TRANG should find all the parallel corpora with CC-BY licenses
> and import them all into the Tatoeba Corpus

I would but I'm not interested in quantity. Tatoeba still has too many flaws and there's really, really a lot of challenges to solve on a software engineering level, on a UI/UX level, on an organizational level... Having more sentences is at the very bottom of my priorities. We are not scalable enough for the corpus to grow much faster than ~2000 sentences per day.

> Also, if this is true, would it mean that anyone who reuses bilingual pairs
> from http://www.manythings.org/anki in another project would not need to
> give attribution to the Tatoeba Project anymore?

Yes, those who reuse the Anki bilingual pairs do not need to give attribution to Tatoeba. These subsets have been processed and reorganized in a different manner than what Tatoeba originally provides. There has been actual work put into reorganizing the data and it's enough work that Tatoeba does not need to be attributed anymore. Giving attribution to manythings.org alone would be completely fine and I would personally find it outrageous if people were forced to also give attribution to Tatoeba.
CK
2 days ago - 2 days ago
I still think you're wrong about not needing to credit the person who has released something as CC-BY, if it is then distributed as CC-BY by someone else. I still think that you should not be distributing someone else's CC-BY material to others under your own license, and assuming that it is then OK for others to use that material if they give your website credit, but not give credit to the original person who released material under a CC-BY license. I think this is morally wrong, not following the spirit of CC-BY and a copyright infringement. A copyright owner has the right to control distribution of his/her material. If they choose to distribute their material for just the cost of attribution, their rights are being violated if that is not done.
TRANG
2 days ago - 2 days ago
If that is really what you believe, why do you only give attribution to Tatoeba when you reuse the Tatoeba corpus in your projects instead of giving attribution to every contributor individually?

Or why haven't you protested against the release of the Tatoeba corpus since the beginning?

Each contributor has provided their sentences to Tatoeba under CC BY (or CC0 since early 2019), and Tatoeba is only packaging them into one big corpus.
CK
6 hours ago - 6 hours ago
> If that is really what you believe, why do you only give attribution to Tatoeba when you reuse the Tatoeba corpus in your projects instead of giving attribution to every contributor individually?

I thought that the message on the downloads page meant that developers could use the Tatoeba Corpus if they credited tatoeba.org. Even now, the message on the downloads page implies that.

Actually, most of my projects have a direct link to each sentence's page on tatoeba.org and the username of the owner of the sentence. I think perhaps a couple of projects only include a link to the page on tatoeba.org.

The only project that didn't have that was http://www.manythings.org/anki. I have corrected that today, by inserting one extra field on each line to include attribution. This does add a bit to the file sizes, but shouldn't really bother people too much.

You can see a quick screenshot, so you don't need to download a file.

https://imgur.com/a/08iX5Gh



Ricardo14
yesterday
CK,

I have agreed that all the sentences I post on Tatoeba now "belong" to Tatoeba. That said, whatever Tatoeba wants or needs to do with them, I'll have no objection. I truly believe that every single user feels the same way. :)
CK
yesterday - yesterday
I thought that we all agreed to let the Tatoeba Project release our sentences under their CC-BY license. However, at this time, the "Terms of Use" are only in French, so I'm not sure what people are agreeing to now.
shekitten
yesterday
Tatoeba is attributing our CC-BY content by our username, and then doing the standard thing that people do with CC-BY work: reusing it with attribution and releasing the whole thing as CC-BY. Using a CC-BY license is a way of giving forwards permission to anyone who wants to reuse your work.

If you really want to be within the letter and arguably the spirit of CC-BY, you should be attributing individual contributors. This is what people are generally supposed to be doing when they reuse content from Wikipedia as well.
TRANG
11 hours ago
You can find the old Terms of Use here:
https://en.wiki.tatoeba.org/art...erms-of-use-v1

It says: "for any text to which you hold the copyright, by submitting it, you agree to license it under the Creative Commons Attribution License 2.0 (fr)."

The new Terms of Use were written with the goal that nothing should change for people who still contribute under CC BY, but by taking into account that Tatoeba will expand to allow more than just CC BY.

The whole section about intellectual property describes that.
https://tatoeba.org/eng/terms_of_use#section-6

But the two relevant paragraphs are:

"L’infrastructure technique de Tatoeba utilise par défaut, pour la contribution de phrases textuelles, la licence Creative Commons Attribution 2.0 France (CC-BY 2.0 FR)."
= This is saying that we use CC BY 2.0 FR as the default license.

"Lors de la contribution, sur notre Site Internet, d’une phrase dont vous êtes propriétaire, en votre qualité d’auteur·e, vous attribuez une licence à cette phrase."
= This is saying that when you contribute a sentence that is your own sentence, you are applying a license to this sentence.

If you combine these two paragraphs, the idea is that when someone submits a sentence to Tatoeba, by default, they license it under CC BY 2.0 FR.
Thanuir
3 days ago
Yes, providing attribution is the polite thing to do.
Thanuir
2 days ago - 2 days ago
From the English CC-BY 2.0 legal code, https://creativecommons.org/lic...2.0/legalcode. This is just an example; one needs to read the relevant license of what is to be added to Tatoeba if one wants to be sure.

I am quoting or referencing the parts that might be problematic for Tatoeba, or that are otherwise good to know. Not a lawyer, not legal advice, and so on. I am not suggesting any particular way of going forward, here, just trying to figure out what the license exactly says. A native speaker or someone with background in law should go through the text, too.

...

From part 1, definitions:
"
"Collective Work" means a work, such as a periodical issue, anthology or encyclopedia, in which the Work in its entirety in unmodified form, along with a number of other contributions, constituting separate and independent works in themselves, are assembled into a collective whole. A work that constitutes a Collective Work will not be considered a Derivative Work (as defined below) for the purposes of this License.

"Derivative Work" means a work based upon the Work or upon the Work and other pre-existing works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which the Work may be recast, transformed, or adapted, except that a work that constitutes a Collective Work will not be considered a Derivative Work for the purpose of this License. For the avoidance of doubt, where the Work is a musical composition or sound recording, the synchronization of the Work in timed-relation with a moving image ("synching") will be considered a Derivative Work for the purpose of this License.
"

Basically, this means that the sentence itself could be a part of a collection, but any translations are derivative works, and any edits also create a derivative work. Any larger collection that includes both original CC-BY sentences and their modifications is a derivative work (I think).

...

This part is 3.b, i.e. rights given to for example Tatoeba project:
"to create and reproduce Derivative Works"

This is 3.d.
"to distribute copies or phonorecords of, display publicly, perform publicly, and perform publicly by means of a digital audio transmission Derivative Works. "

Recall that translations are derivative works. I think this implies that every translation of an outside sentence with the CC-BY license should attribute the original source, provide copyright notice (if any), and link to the relevant license. This is currently unfeasible on Tatoeba, since many user interfaces for translating do not suggest the original sentence is under specific attribution requirements.

...

This part is from under restrictions, 4.a.:
"...You may not offer or impose any terms on the Work that alter or restrict the terms of this License or the recipients' exercise of the rights granted hereunder. You may not sublicense the Work. You must keep intact all notices that refer to this License and to the disclaimer of warranties...."

I do not know if Tatoeba is sublicensing the sentences or translations thereof.

I do not know whether all the different CC-BY licenses are compatible enough with CC-BY 2.0 French that we do not "alter or restrict the terms of this License or the recipients' exercise of the rights granted hereunder". I suspect it would be better to use the same license as the original.

We also need to add any potential disclaimers of warranties from the source, if any are written there.

...

Restrictions, 4.b. This is important.
"
If you distribute, publicly display, [...] the Work or any Derivative Works or Collective Works, You must keep intact all copyright notices for the Work and give the Original Author credit reasonable to the medium or means You are utilizing by conveying the name (or pseudonym if applicable) of the Original Author if supplied; the title of the Work if supplied; to the extent reasonably practicable, the Uniform Resource Identifier, if any, that Licensor specifies to be associated with the Work, unless such URI does not refer to the copyright notice or licensing information for the Work; and in the case of a Derivative Work, a credit identifying the use of the Work in the Derivative Work (e.g., "French translation of the Work by Original Author," or "Screenplay based on original Work by Original Author"). Such credit may be implemented in any reasonable manner; provided, however, that in the case of a Derivative Work or Collective Work, at a minimum such credit will appear where any other comparable authorship credit appears and in a manner at least as prominent as such other comparable authorship credit.
"

This outlines exactly what information should be included when giving credit to the creator.

First, how to give credit due. It should be "reasonable to the medium or means You are utilizing";
and later
"Such credit may be implemented in any reasonable manner; provided, however, that in the case of a Derivative Work or Collective Work, at a minimum such credit will appear where any other comparable authorship credit appears and in a manner at least as prominent as such other comparable authorship credit.".

Second, the contents of the credit notice. It should include the name of the Original Author if supplied; the title of the Work if supplied; to the extent reasonably practicable, the Uniform Resource Identifier, if any, that Licensor specifies to be associated with the Work, unless such URI does not refer to the copyright notice or licensing information for the Work; and in the case of a Derivative Work, a credit identifying the use of the Work in the Derivative Work (e.g., "French translation of the Work by Original Author").

...

Section 7, termination:
"This License and the rights granted hereunder will terminate automatically upon any breach by You of the terms of this License."
CK
yesterday
For Stylish users, I've updated this.

Tatoeba.org - Hide Top 5 Language Stats on Home
https://userstyles.org/styles/1...-stats-on-home

Description:

This hides the top 5 language stats on the main page and brings the previews of the latest Wall postings up closer to the top of the page.

To see all the language stats, you will need to go to this URL: https://tatoeba.org/eng/stats/s...es_by_language

This is what it looks like on my computer.
https://userstyles.org/style_sc...g?r=1571027016

There are other userstyles for tatoeba.org, written by me and others.
Here's a search.

https://userstyles.org/styles/b...eba&type=false
Pfirsichbaeumchen
yesterday - yesterday
Replying to this thread rather than starting a new one because the topic is related.

The "top five language stats" seem to occupy too much space. The fields for every language seem to be too high, resulting in too much line spacing. The font seems to be too large. This is what it looks like for me: https://imgur.com/a/mIuNdpq.

I wonder if it wouldn't be better to either remove it completely or make it (much) smaller and display more languages instead. The numbers of the top five languages aren't really so interesting and don't show a lot of diversity. Making the languages displayed random would be another option.
CK
yesterday - yesterday
Personally, I'd vote for something more like what's on the main non-logged in page instead of these Top 5 languages.

https://imgur.com/a/Brg7euP

I think regular members would find this more interesting.

Also perhaps adding a note about when the "day" began, or making these numbers reflect the last 24 hours would be nice. Maybe the line about "supported languages" isn't needed, since the link directs to a page that shows all the supported languages.

Another idea would be to display the last 2 or 3 full days of these stats instead, not including today's stats.

https://tatoeba.org/eng/contrib...meline/2019/10


That said, I've grown accustomed to seeing Wall message previews at the top, which I like, so I'm not sure we really need anything above those.
Pfirsichbaeumchen
yesterday
Good suggestions worth considering. That would indeed be more interesting. 🙂
sabretou
yesterday
I would like it if the Top 5 languages listed the top 5 contributed-to languages of the day, or maybe the week, as opposed to an unchanging all-time list.
TRANG
18 hours ago
> The "top five language stats" seem to occupy too much space.

This is part of the transition to the responsive UI. You will see that in general, anything with a list of clickable items will take more space. This was already done for:
- the list of tags
- the links on the sidebar on the profile page

My guess is that the space it takes is not really the problem itself, but rather that the information displayed is too irrelevant for you.

> I wonder if it wouldn't be better to either remove it completely

There was an issue about it that I already wanted to solve several weeks ago:
https://github.com/Tatoeba/tatoeba2/issues/842

I aborted my plans because the whole Kabyle discussion was a bit too overwhelming and I had no time to really think carefully about what this block can be replaced with.

I had two ideas in mind:
- Displaying the same stats that are displayed on the homepage for non-authenticated users.
- Displaying the top 5 languages, but limited to the languages that the user has added in their profile.

Removing the whole block was also something I considered, but doing that would remove the possibility to access stats of all languages. The link to https://tatoeba.org/eng/stats/s...es_by_language is only available from the stats block and I wasn't sure where is the best place to put it if this block was to be removed.
Pfirsichbaeumchen
15 hours ago - 15 hours ago
I immediately noticed it for the links in the sidebar on my profile page. I was hoping it was only temporary because I use that sidebar a lot and it now requires me to scroll up and down a lot because of the immense spacing. I didn't think that could be intentional. I would probably have kept silent if I hadn't noticed that design choice spreading, in this case to the top five languages.

I think all suggestions made for the latter so far would be improvements to what there is now.

On the other hand, the field where you write comments seems to have shrunk. There I would find something bigger more comfortable. It felt good the way it was before. 🙂
TRANG
13 hours ago
The increase in spacing is intentional because we will be reusing design patterns from Material Design (https://material.io/) as much as possible. In Material Design, there is more emphasis on space because it takes the mobile experience into account. If you are browsing from a mobile phone, you need more space between items in order to be able to tap on the desired item.

Having a "compact mode" would definitely be a possibility in the future, but as of now, we are still designing with the default paddings and margins that come with the AngularJS Material framework (https://material.angularjs.org/).

I've taken note of the comment form. It has indeed shrunk, but that one was not intentional.
CK
14 hours ago
> The link to https://tatoeba.org/eng/stats/s...es_by_language is only available from the stats block, and I wasn't sure where the best place to put it would be if this block were removed.

A logical place for the link would be in the drop-down menu, just below "Show activity line."
Perhaps both could go under "Community" rather than "Contribute."
maaster
4 days ago
I think English sentences on Tatoeba should also be checked by corpus maintainers who are native English speakers.
Right now, about 90% of this work is done by non-native speakers.
Pfirsichbaeumchen
4 days ago
I've been seeing Alan doing a lot of the work. More helpers would be very welcome.

• Everyone can use the comment section to suggest corrections.
• Everyone can use the rating system.
• Advanced contributors can put @change tags.
• Corpus maintainers can apply necessary changes for inactive members.
CK
4 days ago - 4 days ago
Also, I proofread most incoming English sentences and add good ones that I want to use in my own projects to List 907.

@maaster
You can easily see your own English sentences that are on this list with this search.

https://tatoeba.org/eng/sentenc...sort=relevance


https://tatoeba.org/eng/sentences/show/3946394
We recommend adding sentences and translations in your strongest language. If you are interested primarily in having your sentences corrected, you should try a site like Lang-8.com, where that's the focus.
Thanuir
4 days ago - 4 days ago
Do you comment, rate or tag the ones that you do not think are okay?
maaster
2 days ago
Tagging the wrong ones, or the ones suspected of being wrong, isn't really enough, I think.
I still can't be sure that the rest are fine, since not all sentences are checked and fixed.

There are some languages in which every sentence written these days is checked.
But that doesn't work in every language.
(E.g. all sentences in Danish written by non-native speakers are tagged. Well, those sentences are much rarer than the English ones.)
maaster
2 days ago - 2 days ago
Of course I saw them.
Perhaps this tag can be confusing for others: people may think that those sentences are correct and the other ones aren't.

(I know what that tag is; I asked you about it two or three years ago.)
AlanF_US
4 days ago
Thank you, Pfirsichbaeumchen. I just want to add the following:

(1) If you think there's an error in a sentence, but you're not a native speaker of the language it's written in, you can always add the "@check" or "@needs native check" tag as an alternative to "@change".
(2) Whether or not you leave a tag, you should leave a comment. These are more likely to be seen than tags.
maaster
3 days ago
Yes, he does, I can see that as well (and sometimes Objectivesee does it too).
Patgfisher and cueayotl did it as well.
(And from what I can see so far, even my tagged sentences don't get checked.)

I think that, as in some other languages, it would be nice if the English sentences written by non-native speakers were tagged with OK (or with change) so that other users also know whether the sentence is O.K.
In many cases, English sentences by non-native speakers are ignored, even if they are wrong.
Thanuir
4 days ago
For statistics, there currently are (rounded to one significant digit):

* 50 000 orphan English sentences: https://tatoeba.org/deu/Activit..._sentences/eng
* 80 English sentences with the @change tag: https://tatoeba.org/deu/Tags/sh...th_tag/561/eng
* 6 English sentences with the @check tag: https://tatoeba.org/deu/tags/sh...th_tag/841/eng
* 50 English sentences with the @needs native check tag: https://tatoeba.org/deu/Tags/sh...h_tag/1207/eng

The number of orphan sentences is huge, but the other task queues are in very good shape, I would say.
AlanF_US
3 days ago
Thanks for the analysis, Thanuir. I believe the number of orphan sentences in English will always be high because:
- the backlog is so high (50,000 would require a year's worth of effort at the rate of 136 per day, which would require hours' worth of work)
- the task of dealing with them is laborious and involves so many judgment calls
- the benefits are relatively small
- there's no good way of sorting out the ones that have been looked at from the ones that haven't
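
The daily rate in the first point can be checked with a quick calculation. The sketch below uses the figures from the post (50,000 sentences, one year); the 30 seconds of review time per sentence is purely a hypothetical assumption for illustration, not a number anyone in this thread has given.

```typescript
// Daily rate needed to clear a backlog within a given number of days.
function dailyRate(backlog: number, days: number): number {
  return backlog / days;
}

// Hours of work per day, given an assumed review time per sentence.
function hoursPerDay(sentencesPerDay: number, secondsPerSentence: number): number {
  return (sentencesPerDay * secondsPerSentence) / 3600;
}

const rate = dailyRate(50000, 365);   // figures from the post above
const hours = hoursPerDay(rate, 30);  // 30 s per sentence is a made-up assumption

console.log(`${rate.toFixed(1)} sentences per day`);
console.log(`${hours.toFixed(2)} hours per day at 30 s per sentence`);
```

The rate comes out just under 137 sentences per day, so even a modest per-sentence review time adds up to more than an hour of work every day for a year.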

There are many orphan sentences that are grammatically correct but that I would not want to adopt, for one reason or another:
- they're a little unnatural or just old-fashioned
- they reflect a sentiment that I don't agree with

Having said that, I do adopt orphan sentences from time to time. I just don't consider adoption of those sentences as a high priority compared to other tasks that I could be doing, such as marking or fixing incorrect sentences.
Guybrush88
3 days ago
my two cents to add some more info to this analysis: there are 19 orphan sentences containing audio, which might be useful to English learners because of the audio (there are users who use this as their main criterion for translating native sentences, afaik): https://tatoeba.org/ita/sentenc...sort=relevance
mramosch
7 days ago - 7 days ago
Hello folks,

is anybody who contributes audio recordings using a mobile device for recording (tablet or phone)?

If yes, could you please give me some links to your sentences and if possible add a little note whether the internal microphone of the device was used or an external accessory, like ANALOG for the headphone jack or DIGITAL with lightning or even USB...

And a little note about the location of recording would be helpful, too. Like indoors, outdoors, living room, toilet etc. - just to get an idea how much of the room acoustics got captured by the device.

Thanks in advance!


P.S.: Of course all other contributions regarding desktop/laptop setups, recording equipment and recording/editing software are welcome, too.
Rockaround
7 days ago
I would also be interested to know if someone is recording from a mobile phone. Last I checked, the software suggested in the wiki worked only on computers. If you are using a phone, I would appreciate a quick guide on how to do it, in addition to what mramosch already asked. Thanks!
TRANG
3 days ago
I would also be interested to know :)

We could then update our wiki page to include instructions on how to record from phone or tablet.
gillux
2 days ago
Sadly we do not provide support for recording from mobile phones. But I agree that it would be a great addition.

I think it should be possible to re-implement the functionality of the Shtooka recorder in JavaScript using the Web Audio API. That would allow recording from the web browser of a mobile phone (or a computer). I tried to hack a proof of concept today: https://dev.tatoeba.org/tatorec/ This should allow you to record one sentence and download it as a WAV file. Please let me know if it works on your phone.
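
To illustrate the "download it as a WAV file" step, here is a minimal sketch of how recorded samples could be packed into a 16-bit PCM WAV file. This is not the code of the proof of concept linked above; `encodeWav` is a hypothetical helper name, and the sketch assumes mono audio that has already been captured as a `Float32Array` (in a browser, the Web Audio API can provide samples in that form, for example via `AudioBuffer.getChannelData`). The capture step itself is left out.

```typescript
// Encode mono float samples (range -1..1) as a 16-bit PCM WAV file.
// Hypothetical helper for illustration; not taken from the Tatoeba code base.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + audio data
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                          // size of the fmt chunk
  view.setUint16(20, 1, true);                           // format 1 = uncompressed PCM
  view.setUint16(22, 1, true);                           // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true);              // bytes per sample frame
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    const scaled = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  }
  return buffer;
}
```

In a browser, the returned buffer could then be wrapped in a `Blob` with type `audio/wav` and offered as a download link.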
maaster
3 days ago - 3 days ago
I don't think that anyone should add a thousand sentences one after another with the same word(s), on the same theme. That makes Tatoeba fucking boring.

Now, I can hardly find a sentence that would be worth translating.

A bit more creativity please or else I'll fall asleep reading the tenth sentence.
Pandaa
3 days ago
I think the workable approach is the one you have already started on: translating the Hungarian sentences from time to time, which also makes the corpus a little more diverse. It may take a bit more time, but it is worth it.
I often see that the sentences with rarer words weren't even written by native speakers.
maaster
2 days ago
Yes, but that's just l'art pour l'art.
Ergulis
3 days ago
I believe that by choosing the translate mode you get mostly filtered sentences from native English speakers, or you can translate directly from contributors whom you already know and whose sentences you like.

I just try to ignore certain kinds of sentences, especially those about religion.
maaster
2 days ago
I usually translate the last 2-3-4 pages of a language, and I find about two of those sentences interesting to translate. I translate from everybody; I don't distinguish between contributors. Everyone can add interesting, rare etc. sentences that I can learn from.

I don't have any problem with sentences about religion, they can be interesting. Religion is a controversial and deep theme.
Thanuir
2 days ago
I recommend:

* Search for long sentences (not translated to any language or to Hungarian in particular).
* Search for short sentences (as above).
* Search for random sentences.
* Translate from different languages by using the previous methods.
* Search for an interesting tag and translate everything there.

You can also play around with the search settings - audio only, native only, everything including orphan sentences, etc.

The sentences on Tatoeba are ageless, so there is little reason to only check the recent ones.

---

Also, in general, let people add the sentences they add, unless they are obviously harmful. Tatoeba is a volunteer project, so the right way to fix things is to add, translate and organize the content you find interesting, while letting others do as they wish.
Pfirsichbaeumchen
2 days ago - 2 days ago
I mostly use a search like this: https://tatoeba.org/deu/sentenc...=&sort=random. Maybe you'll find that more interesting.

If you want creativity, you could try your hand at these sentences: https://tatoeba.org/deu/sentenc...=&sort=random.

Just save the links in your profile.
sharptoothed
6 days ago
* Tatoeba Top 30 Languages Interactive Graphs *

Tatoeba Top 30 Languages Interactive Graphs have been updated:
https://tatoeba.j-langtools.com/igraph/
https://tatoeba.j-langtools.com/igraph/share.html
Ricardo14
6 days ago
Thank you so very much!
sharptoothed
3 days ago
you're welcome :-)
jegaevi
3 days ago
Sentences wanted

If I add English vocabulary items to the list, is it likely that any native English speaker will write sentences for them?
I'm asking because I'm planning on adding some as I'm reading a book and collecting unknown words from it. But I don't want to bother with that if it's to no avail.

Also, I saw a few words in Cyrillic script among the English words.
Thanuir
3 days ago
Some of my English words have had sentences added, some have not. I do not know whether this has been due to them being vocabulary words or to blind luck. Many English contributions use quite basic vocabulary, but more varied sentences do exist in the corpus.

I usually add unknown words (when reading something on a computer) and then once in a while go through my vocabulary and remove the words I understand, leaving the ones I do not, and checking any sentences with those in case they help with understanding.

There are quite a lot of English vocabulary items, so in any case it will take quite some time before any particular one is addressed with multiple sentences, unless one gets lucky.
jegaevi
3 days ago - 3 days ago
Thanks for replying! :)
I guess I'll add a few more then and see what will happen.
MisterTrouser
7 days ago
Feature suggestion:
Mark sentences as "not directly linked".

Reason / Why does it help?
In short: it enables users to work together on finding directly linkable sentences.

If someone tries to find indirectly linked sentences that could be directly linked, he/she can use this search:
https://tatoeba.org/eng/sentenc...io=&sort=words

He/she reads through all 65 pages of sentences and concludes that none of the indirectly linked sentences can be directly linked.

Now a second user wants to do the same: trying to directly link indirectly linked sentences. This second user will also have to read through all 65 pages of sentences, only to find that there is nothing to do.
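
The idea can be sketched in a few lines of code. This is only an illustration under stated assumptions, not how Tatoeba stores or searches links: sentences are plain numeric ids, direct links are pairs of ids, and the names `findCandidates` and `notLinkable` are hypothetical. Two sentences count as indirectly linked when they share a directly linked neighbour. Pairs that a previous reviewer has marked as "not directly linked" are filtered out, so the next reviewer only sees pairs nobody has looked at yet.

```typescript
type Link = [number, number];

// Order-independent key for a pair of sentence ids.
function pairKey(a: number, b: number): string {
  return a < b ? `${a}-${b}` : `${b}-${a}`;
}

// Returns pairs that share a neighbour, are not directly linked,
// and have not been marked as "not directly linked" by a reviewer.
function findCandidates(links: Link[], notLinkable: Link[]): Link[] {
  const neighbours: Record<number, number[]> = {};
  const excluded: Record<string, boolean> = {};

  for (const [a, b] of links) {
    (neighbours[a] = neighbours[a] || []).push(b);
    (neighbours[b] = neighbours[b] || []).push(a);
    excluded[pairKey(a, b)] = true; // already directly linked
  }
  for (const [a, b] of notLinkable) {
    excluded[pairKey(a, b)] = true; // reviewed and rejected earlier
  }

  const candidates: Link[] = [];
  for (const hub of Object.keys(neighbours)) {
    const around = neighbours[Number(hub)];
    for (let i = 0; i < around.length; i++) {
      for (let j = i + 1; j < around.length; j++) {
        const a = Math.min(around[i], around[j]);
        const b = Math.max(around[i], around[j]);
        if (a === b || excluded[pairKey(a, b)]) continue;
        excluded[pairKey(a, b)] = true; // report each pair only once
        candidates.push([a, b]);
      }
    }
  }
  return candidates;
}
```

With links 1-2, 2-3 and 3-4, the candidates are the pairs (1, 3) and (2, 4). Once a reviewer marks (1, 3) as not directly linkable, only (2, 4) is shown to the next reviewer.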
TRANG
3 days ago
CK opened a GitHub issue for this:
https://github.com/Tatoeba/tatoeba2/issues/1980