Wall (7138 threads)

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.


superduperimpose, September 2, 2024 at 9:53:48 PM UTC

Some sentences have this info "This sentence is original and was not derived from translation."

Is this information anywhere in the downloadable data?
thank you!

Yorwba, September 4, 2024 at 6:54:50 PM UTC

It's in the sentences_base file.
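
For readers who want to pull this flag out of the export, here is a minimal sketch. It assumes the layout that, as far as I understand, the downloads page describes for sentences_base.csv: two tab-separated columns (sentence id, base), where a base of 0 marks an original sentence, a positive number is the id of the sentence it was translated from, and \N means unknown. The function name and the sample rows are made up for illustration.

```python
import csv
import io

def original_sentence_ids(lines):
    """Yield ids of sentences whose base field is 0 (assumed to mean 'original')."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        sentence_id, base = row[0], row[1]
        if base == "0":
            yield int(sentence_id)

# Made-up sample rows. With the real export, pass an open file instead,
# e.g. open("sentences_base.csv", encoding="utf-8").
sample = io.StringIO("1\t0\n2\t1\n3\t\\N\n4\t0\n")
originals = list(original_sentence_ids(sample))
```

Because the function is a generator, it can stream through the full export without loading it into memory.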

superduperimpose, September 4, 2024 at 7:07:22 PM UTC

You're right. It's right there. Sorry, I just didn't see it.

superduperimpose, August 31, 2024 at 11:54:06 AM UTC

Is the format of transcriptions (Japanese, if that makes any difference) explained anywhere? (Nothing in the wiki, afaik.)

I found three different cases (there may be more):

A: [Kanji|Reading] which makes sense

B: [Kanji1Kanji2|Reading1|Reading2] which is probably short for [Kanji1|Reading1][Kanji2|Reading2]

C: [Kanji1Kanji2|Reading] which probably means the two Kanji combined have this reading

Is this correct?
And can I expect to always find something that fits either A, B or C?
That is, can I expect to *never* find something like [Kanji1Kanji2Kanji3|Reading1|reading2], i.e. a number of Kanji and a number of readings that are not equal? (In that case, how would I know whether Reading1 belongs to Kanji1Kanji2 or just Kanji1?)

I hope my ad-hoc syntax makes sense.

Yorwba, August 31, 2024 at 1:38:45 PM UTC

I assume you're asking this question because you want to transform the data programmatically (otherwise you could just handle edge cases whenever you encounter them). If my assumption is correct, it might be easiest to look at Tatoeba's own code for Japanese transcriptions. (Note that Tatoeba is AGPL-licensed, in case that's an issue for you.)

The validation code for user-provided furigana is here: https://github.com/Tatoeba/tato...ption.php#L220 but I think it might not apply to those that are generated automatically using MeCab.

The testcases might also be helpful: https://github.com/Tatoeba/tato...onTest.php#L27

If you just want to display furigana using HTML <ruby> tags, our code for that is here: https://github.com/Tatoeba/tato...naTrait.php#L9 To be honest, it's not written in an easily readable manner, but I think what it does is basically to assume without validation that there are at least as many kanji as there are readings, and if there is a kanji without reading (|| or end of list) it will merge it with the preceding kanji until the numbers are equal.

So [Kanji1Kanji2Kanji3|Reading1|reading2] would be equivalent to [Kanji1|Reading1][Kanji2Kanji3|reading2], I think.
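
The merging rule described above can be sketched as follows. This is not Tatoeba's code, only an illustration of the behavior as Yorwba describes it, and it assumes that every character in the base part of a bracket group counts as one kanji. The names `split_group`, `parse_transcription` and `to_ruby` are hypothetical.

```python
import re
from html import escape

# One bracket group: [base|reading1|reading2|...]
GROUP = re.compile(r"\[([^\[\]|]+)\|([^\[\]]*)\]")

def split_group(base, readings):
    """Pair each character of `base` with a reading. A character whose
    reading is missing or empty is merged into the preceding character."""
    if len(readings) > len(base):
        raise ValueError(f"more readings than characters in {base!r}")
    pairs = []
    for i, ch in enumerate(base):
        reading = readings[i] if i < len(readings) else ""
        if reading or not pairs:
            pairs.append([ch, reading])
        else:
            pairs[-1][0] += ch  # no reading of its own: join the previous kanji
    return [(b, r) for b, r in pairs]

def parse_transcription(text):
    """Split a transcription into (text, reading) segments.
    Plain text outside brackets gets a reading of None."""
    segments = []
    pos = 0
    for match in GROUP.finditer(text):
        if match.start() > pos:
            segments.append((text[pos:match.start()], None))
        segments.extend(split_group(match.group(1), match.group(2).split("|")))
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], None))
    return segments

def to_ruby(segments):
    """Render segments as HTML, wrapping annotated text in <ruby> tags."""
    return "".join(
        f"<ruby>{escape(base)}<rt>{escape(reading)}</rt></ruby>" if reading else escape(base)
        for base, reading in segments
    )

# Placeholder letters stand in for kanji and readings.
merged = split_group("ABC", ["x", "y"])
```

Under these assumptions, cases A, B and C from the question all fall out of the same loop: a group with one reading keeps all of its characters together, and a group with as many readings as characters splits one to one.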

superduperimpose, August 31, 2024 at 3:07:10 PM UTC

Yes, ruby is a good example. This looks good, thanks!
I will take a look at the code, especially the one where it handles unequal numbers of Kanji and readings.

charcoalis, August 27, 2024 at 12:23:06 PM UTC

When you search on Tatoeba.org, it only shows 1000 results. That is, it shows a maximum of 10 pages. It says the total number of results, but it only shows 1000. How can I fix this?

Guybrush88, August 27, 2024 at 1:34:43 PM UTC

This is a technical limitation to avoid overloading the server.

brauchinet, August 28, 2024 at 10:05:50 AM UTC

I also wonder if this limit of 1000 sentences is too low.
I use this feature to find recently added sentences (in German) and sometimes the last 1000 sentences don't even cover one day.
The limit doesn't apply to sentences of specific users. Some of them own a huge amount of sentences (> 700000).
Currently, displaying or even re-sorting these is reasonably fast.
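
If working offline is acceptable, one possible way around the cap is to filter the downloadable export instead of the search results. The sketch below assumes the sentences_detailed.csv layout as I understand it from the downloads page: tab-separated sentence id, language code, text, username, date added, date last modified. The function name and sample rows are made up, and since the exports are only refreshed periodically, this will not show sentences added in the last few hours.

```python
import csv
import io
from datetime import datetime

def added_since(lines, lang, since):
    """Yield (id, text, date_added) for sentences in `lang` added on or after `since`."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 6 or row[1] != lang:
            continue
        try:
            added = datetime.strptime(row[4], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # unknown or placeholder dates such as \N
        if added >= since:
            yield int(row[0]), row[2], added

# Made-up sample rows standing in for the real export.
sample = io.StringIO(
    "1\tdeu\tGuten Morgen.\talice\t2024-08-27 09:00:00\t2024-08-27 09:00:00\n"
    "2\tdeu\tGute Nacht.\tbob\t2019-01-01 00:00:00\t\\N\n"
    "3\teng\tHello.\tcarol\t2024-08-28 10:00:00\t2024-08-28 10:00:00\n"
    "4\tdeu\tHallo.\tdave\t\\N\t\\N\n"
)
recent_ids = [sid for sid, _, _ in added_since(sample, "deu", datetime(2024, 8, 27))]
```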

sharptoothed, August 25, 2024 at 4:04:08 PM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

Ergulis, August 17, 2024 at 5:46:37 PM UTC (edited August 17, 2024 at 6:06:15 PM UTC)

While searching for a solution to my problem with text on Tatoeba being displayed in italics, I tried downloading another browser. From what Google offered me, I chose Brave. To my great surprise, the site displays normally in it; the italics are gone.
It seems that something went wrong with the settings in my usual browsers (Edge, Google Chrome, even Firefox), resulting in the issue.

Ergulis, August 17, 2024 at 5:56:02 PM UTC

There is a shield in the Brave browser. If it is on, the text shows normally. However, if I disable it, the italics appear even there. Very strange.

Yorwba, August 17, 2024 at 7:25:11 PM UTC

https://support.brave.com/hc/en...while-browsing indicates that the shield combines various blocking features that you can also toggle individually using the advanced controls. My guess is that you have a non-standard system font that shows up as italics and the font fingerprinting protection in Brave, when enabled, is preventing the browser from loading it.

In Firefox, by right-clicking the italic text and selecting "Inspect", you should be able to open a panel with three columns, the rightmost of which shows "Layout" initially, but one of the other options is "Fonts", which should show you which font is being used.

Ergulis, August 18, 2024 at 11:21:13 AM UTC (edited August 20, 2024 at 8:27:25 PM UTC)

Thank you for your insight, Yorwba. I checked that and found out that the Noto Sans Italic font is used. If I disable it, the site displays normally. However, that works only temporarily, until the next launch. I just need to figure out how to change it permanently.
I'm glad to understand the problem and for now, I'm ok with running Tatoeba on Brave.

PrasantaHembram, August 10, 2024 at 7:06:39 PM UTC

Hi,
I'm reaching out to inquire about importing thousands of bilingual English-Santali sentences into the Tatoeba database. I have a large collection of sentences in two languages that I'd like to contribute to the platform. Could you please provide guidance on the recommended format for preparing the sentence files, the process for uploading them to the database, and any specific requirements or guidelines for ensuring data quality and consistency? I'd greatly appreciate any assistance or documentation to help me import my sentence collection efficiently.

Thanks
Prasanta Hembram

gillux, August 12, 2024 at 10:35:58 AM UTC

Hello, this sounds awesome, but Tatoeba does not support mass import of sentences just yet. This is because we lack the resources to implement a proper import system. If you know how to program, you are welcome to contribute such a system. If you know anybody who is willing to implement an import system, you can ask them. If you want to get notified about any progress on that matter, you can mention your interest on this GitHub issue thread: https://github.com/Tatoeba/tatoeba2/issues/1762

As for importing sentences in general, you should care about the license of the data you want to contribute. It should be legal to re-use the data, as Tatoeba will publish it under Creative Commons CC-BY.

As for the data quality, the sentences should follow these rules: https://en.wiki.tatoeba.org/art...h-explanations There are no particular expectations in terms of consistency, because Tatoeba already receives contributions from various people who are not really following any consistency guidelines.

As for the data format, since we don’t have the tool to import just yet, there is no requirement yet, but I think CSV or TSV should be okay.
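
Since no import tool exists, any file layout is only a guess. As a purely hypothetical illustration, a two-column TSV with one sentence pair per row could be prepared like this; the header row, the language codes used as column names (eng, and sat for Santali) and the function name are assumptions, not a format Tatoeba has specified.

```python
import csv
import io

def write_pairs(pairs, out):
    """Write (english, santali) sentence pairs as tab-separated rows."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["eng", "sat"])  # hypothetical header: one language code per column
    for english, santali in pairs:
        # Strip stray whitespace so each cell holds exactly one sentence.
        writer.writerow([english.strip(), santali.strip()])

# Placeholder text stands in for a real translation.
buffer = io.StringIO()
write_pairs([(" Hello. ", "<Santali translation>")], buffer)
tsv_text = buffer.getvalue()
```

Using the csv module rather than joining strings by hand means that a sentence containing a tab or a quotation mark is quoted properly instead of silently breaking the columns.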

PrasantaHembram, August 14, 2024 at 3:14:13 PM UTC

Hi, @gillux. Thank you for the information. I have some basic programming knowledge, but I'm not confident in my ability to contribute to the development of an import system. Will refer someone. I think for now, only admins can do mass import and is used rarely ?? and only way to contribute right now is to add/translate sentences one by one.

gillux, August 16, 2024 at 3:31:59 PM UTC

> I have some basic programming knowledge, but I'm not confident in my ability to contribute to the development of an import system.

I think that creating an import system is a complex task, too. Not only on the technical level, but also on the social level, as one can see from the discussions on the GitHub issue page. I think that such an import system needs to be designed collaboratively, so you are more than welcome to share your ideas.

> I think for now, only admins can do mass import and is used rarely ??

Admins used to be able to do some kind of basic mass import, but, for technical reasons, not anymore.

> and only way to contribute right now is to add/translate sentences one by one.

That is correct.

sharptoothed, August 11, 2024 at 5:58:47 AM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

miketheknight, July 13, 2020 at 1:04:10 PM UTC

In advanced search there's a checkbox "Owned by a self-identified native". Would it be reasonable to extend this functionality to "Owned or approved by a self-identified native"?

TRANG, July 16, 2020 at 7:31:34 PM UTC

It could be.

Can you tell us what made you think of this? With a bit more context, we can better assess whether we should extend the checkbox as you suggested or whether we should add another search option.

Note that there has been a similar request raised on GitHub:
https://github.com/Tatoeba/tatoeba2/issues/2261

miketheknight, July 17, 2020 at 8:22:47 AM UTC (edited July 17, 2020 at 8:23:14 AM UTC)

Because I like working with sentences added by native speakers. They are less likely to be awkward, they are less likely to contain structural mistakes. I even enjoy noticing typical mistakes that native speakers make - for example, I used to pronounce "they're" and "there" differently in English a long time ago, and only after having noticed that native speakers of English regularly confuse "they're", "there" and "their" in writing did I understand those three have identical pronunciation.

Anyway, I have a lot of reasons to work only with sentences added by native speakers, so I almost always use the "Added by self-identified native speakers" checkbox in my searches.

However, I think I'm missing out on sentences added by non-native speakers that were approved / corrected by native speakers. I don't see why those would be any worse than sentences added by native speakers.

So I believe it would be useful to treat "Sentences added by native speakers" + "Sentences approved by native speakers" as one set.

The github link is not the same. If I OK an English sentence, it doesn't make it any more reliable than it was before me okaying it, but it's important for a sentence to be reviewed by a native speaker.

AlanF_US, July 17, 2020 at 3:52:34 PM UTC

> only after having noticed that native speakers of English regularly confuse "they're", "there" and "their" in writing did I understand those three have identical pronunciation.

Not identical, at least not for every speaker, but definitely similar. :)

morbrorper, July 18, 2020 at 9:10:41 PM UTC

There are a lot of correct sentences that belong to users that have not indicated their native language. And I have come across quite a few sentences with errors, by native contributors. Nevertheless, I think this would be a useful feature.

morbrorper, July 19, 2020 at 8:51:45 AM UTC

In relation to this, I would like to call for an overview of all the sentences having the @needs native check tag. I find it discouraging to see sentences with this tag being ignored for ages.

There are also quite a few sentences that have both this tag and an "OK" tag, which is confusing. I understand that may be because the person who OK's a sentence does not always have the right to delete other people's tags, or they just forget; this makes me think the issue is perhaps not best handled using tags.

AlanF_US, July 19, 2020 at 1:44:05 PM UTC

> In relation to this, I would like to call for an overview of all the sentences having the @needs native check tag. I find it discouraging to see sentences with this tag being ignored for ages.

In which languages? I make a point of frequently reviewing the English sentences with this tag (along with @check and @change). Sometimes they build up in the short term because they are owned by an active member who hasn't had a chance to get through all of them yet.

> There are also quite a few sentences that have both this tag and an "OK" tag, which is confusing.

Again, in which languages?

morbrorper, July 19, 2020 at 3:12:05 PM UTC

OK, I looked a bit closer, using the web interface, and found that Norwegian Bokmål actually accounts for more than half of the total ~4500 sentences. It is among these that I found the OK'd ones.

Other languages that stand out are Japanese (412), and Mandarin Chinese (384). To get the whole picture, for each language, maybe somebody could run an SQL query?

Indeed, English has very few unhandled @NNC requests, for which I am grateful.
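
Short of an SQL query on the live database, the per-language overview asked for above could be approximated from the downloadable exports. This sketch assumes the layouts as I understand them from the downloads page: sentences.csv as tab-separated id, language code, text, and tags.csv as tab-separated sentence id, tag name. The function name and the sample rows are made up.

```python
import csv
import io
from collections import Counter

def tag_counts_by_language(sentence_lines, tag_lines, tag_name):
    """Count, per language code, the sentences carrying the tag `tag_name`."""
    tagged = set()
    for row in csv.reader(tag_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and row[1] == tag_name:
            tagged.add(row[0])
    counts = Counter()
    for row in csv.reader(sentence_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and row[0] in tagged:
            counts[row[1]] += 1
    return counts

# Made-up sample rows standing in for the real exports.
sentences = io.StringIO("1\tnob\tA\n2\tnob\tB\n3\tjpn\tC\n4\teng\tD\n")
tags = io.StringIO(
    "1\t@needs native check\n"
    "2\t@needs native check\n"
    "2\tOK\n"
    "3\t@needs native check\n"
    "4\tOK\n"
)
counts = tag_counts_by_language(sentences, tags, "@needs native check")
```

Calling `counts.most_common()` lists the languages with the largest backlog first. Collecting the ids for a second tag in the same way would also show which sentences carry both tags.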

AlanF_US, July 19, 2020 at 6:11:42 PM UTC

Norwegian Bokmål has no corpus maintainers. Maybe it's time to recruit one.

Thanuir, July 20, 2020 at 7:17:08 AM UTC

That would be a big improvement.

Thanuir, August 7, 2024 at 6:07:49 AM UTC

Regarding the Norwegian corpus: there are several sentences with an easily noticeable spelling error that multiple users have reported. In many cases, one of those users is a native speaker of Norwegian. In many cases, the error is also documented with dictionary references.

Even a person who is not a native speaker of Norwegian could go through the sentences and make such obvious corrections, simply leaving alone the ones that are not well documented or supported.

CK, July 29, 2020 at 5:43:48 AM UTC

We have a new Japanese voice.

Lowteq has contributed 123 audio files.

https://tatoeba.org/eng/sentenc...how/168586/und

Ricardo14, July 29, 2020 at 1:56:58 PM UTC

Amazing! :D

TRANG, June 30, 2020 at 10:44:50 PM UTC (edited June 30, 2020 at 10:45:02 PM UTC)

** What's New on Tatoeba? - Your biweekly recap #20 **

(What's New on Tatoeba will be published biweekly until the end of August.)


EVENT

It has now been one month since our Kodoeba event[1] started.

※ As far as the internal code goes:

• Our participants[2] have solved five issues, and seven others are on their way. You can find the details on GitHub[3].
• Alexs has asked for feedback about the tags: https://tatoeba.org/eng/wall/show_message/35555. Be sure to share your thoughts if you'd like to see the tags in Tatoeba become more useful!

※ As for the external projects:

• lbdx has updated Tatominer: https://tatoeba.org/eng/wall/show_message/35527.
• The other projects are starting to take shape, but it's still too early to showcase anything. We'll have to wait until mid or end of July.

[1] https://blog.tatoeba.org/2020/0...kodoeba-1.html
[2] https://blog.tatoeba.org/2020/0...ticipants.html
[3] https://github.com/orgs/Tatoeba/projects/1


UPDATES

※ The search has been improved for languages using Arabic scripts, Indonesian and Tagalog. Many thanks to Yorwba.

※ The number of messages in the private messages has been localized, thanks to Ricardo14.

※ There's now a reset icon in the inputs of the advanced search. Thanks to Roverandom789133 for adding this.

※ We no longer unnecessarily store IPs in our contributions logs. Thanks to jpear1 for cleaning this up.


ON THE WALL

※ Trang has been working on making the landing page responsive: https://tatoeba.org/eng/wall/show_message/35464

※ gillux has asked which sentences would be a good candidate to print on a Tatoeba T-shirt or mug: https://tatoeba.org/eng/wall/show_message/35547

※ tommg has announced the release of his language learning app that uses Tatoeba's data: https://tatoeba.org/eng/wall/show_message/35512


LANGUAGES

※ Ricardo14 posted some updates about the progress of the translation of our UI: https://tatoeba.org/eng/wall/show_message/35518.

※ A new UI language has been enabled on the dev website: Serbian.

※ As usual, thanks to all the members who helped to translate the website!


----------

If you'd like to help with the development of Tatoeba, report issues, or are just curious, have a look at the GitHub repository.

If you want to help us translate the website to your language, you can join us on Transifex: https://www.transifex.com/tatoe...ite/dashboard/ and check this article on the wiki https://en.wiki.tatoeba.org/art...e-translation.

If you're especially happy with one of the updates, don't hesitate to personally thank our developers :) They're working in the shadows but they'll be glad to hear your feedback.

----------

Last recap: https://tatoeba.org/eng/wall/show_message/35504
See this recap on the blog: http://blog.tatoeba.org/2020/06...weekly_30.html

samir_t, July 4, 2020 at 9:03:27 PM UTC

I would like to know why the FAQ is not published in the Kabyle interface although the translation is finished on Transifex.

TRANG, July 4, 2020 at 9:39:58 PM UTC

Translating wiki content through Transifex is a new process. In this case, we simply didn't pay enough attention and we missed the Transifex notifications about how the FAQ was fully translated into Kabyle. So it was simply forgotten (sorry!).

The wiki is a separate application from the tatoeba.org website. The UI languages available on the wiki are actually not in sync with the languages available on tatoeba.org. Currently, the wiki doesn't support Kabyle yet. We first have to add it as a supported language, then we have to manually add the Kabyle translations.

But in any case, we will make sure the Kabyle translation of the FAQ is made available soon :) Thanks for reporting it and thanks for translating!

samir_t, July 5, 2020 at 12:15:23 AM UTC

Thanks.

TRANG, July 9, 2020 at 2:33:53 PM UTC

It's online now: https://kab.wiki.tatoeba.org/articles/show/faq

I have a couple of questions:

(1) What would be a translation of "main" (or "main page") in Kabyle?

For instance, the URL of the English main page is like this:
https://en.wiki.tatoeba.org/articles/show/main

While for French it's "page-principale" instead of "main":
https://fr.wiki.tatoeba.org/art...age-principale

For each language the name in the URL is in the language itself. For now, I have named the Kabyle page "main" but it would be more suitable to have a string in Kabyle.

(2) Same question for the FAQ URL. Is it fine to leave it as "faq" or is there another acronym in Kabyle?

Ricardo14, July 9, 2020 at 5:19:28 PM UTC

samir_t replied on another thread - https://tatoeba.org/eng/wall/sh...#message_35614

"The translation of "main" in Kabyle is "agejdan".

As for the FAQ URL, it would be better to leave it as "faq".

Thanks."

https://prnt.sc/tevi8q