Wall - Tatoeba

Wall (7,120 threads)

Tips

Before asking a question, make sure to read the FAQ.

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.

Latest messages

sharptoothed, 2 days ago
sharptoothed, 2 days ago
TATAR1, 2 days ago
AlanF_US, 3 days ago
sharptoothed, 4 days ago
Shanaz, 7 days ago
Qaztat, 7 days ago
TATAR1, 7 days ago
Tartar, 7 days ago
menaud, 10 days ago

sharptoothed, September 15, 2024 at 7:14:01 AM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

Ergulis, September 8, 2024 at 5:33:28 AM UTC

The Tatoeba website has been running very slowly since yesterday's shutdown. Is anyone else experiencing the same?

DJ_Saidez, September 8, 2024 at 5:58:00 AM UTC

It is slow for me too.

LanguageExpert, September 8, 2024 at 2:17:01 PM UTC (edited September 8, 2024 at 2:19:20 PM UTC)

Yes, it's been slow for me too.

PaulP, September 9, 2024 at 3:33:53 AM UTC

Not just slow. Mostly I get the message „Tatoeba is currently unavailable. We are sorry for the inconvenience. You can check our blog or Twitter for more information.”
But neither the blog nor Twitter gives any information. Yesterday I managed barely 10% of the work I usually do in a day.

gillux, September 9, 2024 at 9:19:17 AM UTC

There was a little problem indeed! Tatoeba should run smoothly now.

small_snow, September 10, 2024 at 10:41:08 AM UTC (edited September 10, 2024 at 10:45:10 AM UTC)

It's running nice and snappy now 🤭. Thank you very much!

maaster, September 6, 2024 at 1:18:04 PM UTC

I think adopting and unadopting sentences doesn't really work.
One has to beg for one's unadopted sentences to be adopted.
(I suppose that's why the sentences aren't checked.)
And I think unadopted sentences simply remain largely unchecked.

(As for me, I don't like adopting sentences added by others.)

PaulP, September 8, 2024 at 5:01:41 AM UTC

I don't understand, Maaster. I regularly adopt and correct sentences in „my languages”. If an author unadopts sentences, there is no need to beg anyone to change them. You can use a simple link to find them all:

https://tatoeba.org/eo/activiti..._sentences/epo

(Change "epo" to the code of your language.)

superduperimpose, September 2, 2024 at 9:53:48 PM UTC

Some sentences have this info "This sentence is original and was not derived from translation."

Is this information available anywhere in the downloadable data?
Thank you!

Yorwba, September 4, 2024 at 6:54:50 PM UTC

It's in the sentences_base file.
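
For anyone who wants to read that flag programmatically, here is a minimal Python sketch. It assumes the layout described on Tatoeba's downloads page: a tab-separated file (assumed here to be named sentences_base.csv) with no header and two columns, the sentence ID and a base value that is 0 for an original sentence, the ID of the source sentence for a translation, or \N when unknown. The function name `load_original_ids` is hypothetical, and the column meanings should be checked against the current downloads page before relying on them.

```python
import csv
import io


def load_original_ids(lines):
    """Return the IDs of sentences whose base value is "0" (original).

    Assumed row layout (tab-separated, no header):
        sentence_id <TAB> base
    where base is "0" for an original sentence, the ID of the source
    sentence for a translation, or "\\N" when unknown.
    """
    originals = set()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed rows
        sentence_id, base = row
        if base == "0":
            originals.add(int(sentence_id))
    return originals


# Tiny in-memory sample: 1 is original, 2 was translated from 1, 3 is unknown.
sample = io.StringIO("1\t0\n2\t1\n3\t\\N\n")
print(sorted(load_original_ids(sample)))  # [1]
```

To read the real file, open it with `open("sentences_base.csv", encoding="utf-8", newline="")` and pass the file object instead of `sample`.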

superduperimpose, September 4, 2024 at 7:07:22 PM UTC

You're right. It's right there. Sorry, I just didn't see it.

superduperimpose, August 31, 2024 at 11:54:06 AM UTC

Is the format of transcriptions (Japanese, if that makes any difference) explained anywhere? (There's nothing in the wiki, AFAIK.)

I found three different cases (there may be more):

A: [Kanji|Reading] which makes sense

B: [Kanji1Kanji2|Reading1|Reading2] which is probably short for [Kanji1|Reading1][Kanji2|Reading2]

C: [Kanji1Kanji2|Reading] which probably means the two Kanji combined have this reading

Is this correct?
And can I expect to always find something that fits A, B or C?
That is, can I expect to *never* find something like [Kanji1Kanji2Kanji3|Reading1|reading2], i.e. a number of kanji and readings that are not equal? (In that case, how would I know whether Reading1 belongs to Kanji1Kanji2 or just Kanji1?)

I hope my ad-hoc syntax makes sense.

Yorwba, August 31, 2024 at 1:38:45 PM UTC

I assume you're asking this question because you want to transform the data programmatically (otherwise you could just handle edge cases whenever you encounter them). If my assumption is correct, it might be easiest to look at Tatoeba's own code for Japanese transcriptions. (Note that Tatoeba is AGPL-licensed, in case that's an issue for you.)

The validation code for user-provided furigana is here: https://github.com/Tatoeba/tato...ption.php#L220, but I think it might not apply to readings that are generated automatically using MeCab.

The testcases might also be helpful: https://github.com/Tatoeba/tato...onTest.php#L27

If you just want to display furigana using HTML <ruby> tags, our code for that is here: https://github.com/Tatoeba/tato...naTrait.php#L9. To be honest, it's not written in an easily readable way, but I think what it basically does is assume, without validation, that there are at least as many kanji as there are readings; if a kanji has no reading (|| or the end of the list), it merges that kanji with the preceding one until the numbers are equal.

So [Kanji1Kanji2Kanji3|Reading1|reading2] would be equivalent to [Kanji1|Reading1][Kanji2Kanji3|reading2], I think.
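
The merging rule described above can be sketched in Python as follows. This is only a reading of that description, not a port of Tatoeba's PHP code: the regular expression, the function names, and the choice to raise ValueError when there are more readings than characters are all assumptions, and, as noted, automatically generated (MeCab) readings may follow different conventions. The `<ruby>` output is plain markup and not necessarily what Tatoeba itself emits.

```python
import html
import re

# One bracketed group such as [漢字|かん|じ]: base text, then one or more
# "|reading" pieces (a reading may be empty, as in [今日|きょう|]).
GROUP_RE = re.compile(r"\[([^|\]]+)((?:\|[^|\]]*)+)\]")


def split_group(base, readings):
    """Pair characters of `base` with `readings`, merging unread characters.

    The i-th character takes the i-th reading; a character whose reading
    is empty or missing is merged into the preceding group.
    """
    if len(readings) > len(base):
        raise ValueError(f"more readings than characters in {base!r}")
    groups = []  # list of [base_text, reading]
    for i, char in enumerate(base):
        reading = readings[i] if i < len(readings) else ""
        if reading == "" and groups:
            groups[-1][0] += char  # no reading: extend the previous group
        else:
            groups.append([char, reading])
    return [tuple(g) for g in groups]


def parse_transcription(text):
    """Split a transcription into (base, reading) pairs; plain text gets ""."""
    pairs = []
    pos = 0
    for m in GROUP_RE.finditer(text):
        if m.start() > pos:
            pairs.append((text[pos:m.start()], ""))
        # Drop the empty piece that precedes the first "|".
        readings = m.group(2).split("|")[1:]
        pairs.extend(split_group(m.group(1), readings))
        pos = m.end()
    if pos < len(text):
        pairs.append((text[pos:], ""))
    return pairs


def to_ruby(text):
    """Render the pairs as simple HTML <ruby> markup."""
    out = []
    for base, reading in parse_transcription(text):
        if reading:
            out.append(f"<ruby>{html.escape(base)}<rt>{html.escape(reading)}</rt></ruby>")
        else:
            out.append(html.escape(base))
    return "".join(out)


print(split_group("ABC", ["r1", "r2"]))  # [('A', 'r1'), ('BC', 'r2')]
print(parse_transcription("[明日|あした]は[晴|は]れ"))
```

With placeholder characters, `split_group("ABC", ["r1", "r2"])` reproduces the equivalence above: the third character has no reading, so it joins the second.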

superduperimpose, August 31, 2024 at 3:07:10 PM UTC

Yes, ruby is a good example. This looks good, thanks!
I will take a look at the code, especially the part that handles unequal numbers of kanji and readings.

charcoalis, August 27, 2024 at 12:23:06 PM UTC

When you search on Tatoeba.org, it only shows 1000 results. That is, it shows a maximum of 10 pages. It says the total number of results, but it only shows 1000. How can I fix this?

Guybrush88, August 27, 2024 at 1:34:43 PM UTC

This is a technical limitation to avoid overloading the server.

brauchinet, August 28, 2024 at 10:05:50 AM UTC

I also wonder if this limit of 1000 sentences is too low.
I use this feature to find recently added sentences (in German) and sometimes the last 1000 sentences don't even cover one day.
The limit doesn't apply to the sentences of specific users. Some of them own a huge number of sentences (more than 700,000).
Currently, displaying or even re-sorting these is reasonably fast.

sharptoothed, August 25, 2024 at 4:04:08 PM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

Ergulis, August 17, 2024 at 5:46:37 PM UTC (edited August 17, 2024 at 6:06:15 PM UTC)

While searching for a solution to my problem with text on Tatoeba being displayed in italics, I tried downloading another browser. Of the options Google offered me, I chose Brave. To my great surprise, the site displays normally there; the italics are gone.
It seems that something went wrong with the settings in my usual browsers (Edge, Google Chrome, even Firefox), causing the issue.

Ergulis, August 17, 2024 at 5:56:02 PM UTC

There is a shield in the Brave browser. If it is on, the text displays normally. However, if I disable it, the italics appear even there. Very strange.

Yorwba, August 17, 2024 at 7:25:11 PM UTC

https://support.brave.com/hc/en...while-browsing indicates that the shield combines various blocking features that you can also toggle individually using the advanced controls. My guess is that you have a non-standard system font that shows up as italics and the font fingerprinting protection in Brave, when enabled, is preventing the browser from loading it.

In Firefox, by right-clicking the italic text and selecting "Inspect", you should be able to open a panel with three columns, the rightmost of which shows "Layout" initially, but one of the other options is "Fonts", which should show you which font is being used.

Ergulis, August 18, 2024 at 11:21:13 AM UTC (edited August 20, 2024 at 8:27:25 PM UTC)

Thank you for your insight, Yorwba. I checked and found that the Noto Sans Italic font is being used. If I disable it, the site displays normally. However, this only works temporarily, until the next launch. I just need to figure out how to change it permanently.
I'm glad to understand the problem, and for now, I'm OK with running Tatoeba on Brave.

PrasantaHembram, August 10, 2024 at 7:06:39 PM UTC

Hi,
I'm reaching out to inquire about importing thousands of bilingual English-Santali sentences into the Tatoeba database. I have a large collection of sentences in two languages that I'd like to contribute to the platform. Could you please provide guidance on the recommended format for preparing the sentence files, the process for uploading them to the database, and any specific requirements or guidelines for ensuring data quality and consistency? I'd greatly appreciate any assistance or documentation to help me import my sentence collection efficiently.

Thanks
Prasanta Hembram

gillux, August 12, 2024 at 10:35:58 AM UTC

Hello, this sounds awesome, but Tatoeba does not support mass import of sentences just yet. This is because we lack the resources to implement a proper import system. If you know how to program, you are welcome to contribute such a system. If you know anybody who is willing to implement an import system, you can ask them. If you want to be notified about any progress on that matter, you can mention your interest on this GitHub issue thread: https://github.com/Tatoeba/tatoeba2/issues/1762

As for importing sentences in general, you should pay attention to the license of the data you want to contribute. It must be legal to reuse the data, as Tatoeba will publish it under Creative Commons CC-BY.

As for data quality, the sentences should follow these rules: https://en.wiki.tatoeba.org/art...h-explanations There are no particular expectations in terms of consistency, because Tatoeba already receives contributions from various people who are not really following any consistency guidelines.

As for the data format, since we don't have an import tool yet, there is no requirement yet, but I think CSV or TSV should be okay.
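
Since no import format exists yet, the following Python sketch only illustrates one way to prepare bilingual pairs as TSV, in line with the CSV/TSV suggestion above. The four-column layout, the function names, the duplicate filtering, and the language codes ("eng" for English and "sat" for Santali, assumed to be the ISO 639-3 codes Tatoeba uses) are all assumptions, not a Tatoeba specification.

```python
def to_tsv_lines(pairs, src_lang="eng", tgt_lang="sat"):
    """Turn (source, translation) pairs into tab-separated lines.

    Hypothetical column layout: src_lang, source, tgt_lang, translation.
    Tatoeba has no import format yet, so adjust this to whatever a future
    import tool expects.
    """
    lines = []
    seen = set()
    for source, translation in pairs:
        # Collapse all whitespace (including tabs and newlines) to single
        # spaces so a sentence can never break the column layout.
        source = " ".join(source.split())
        translation = " ".join(translation.split())
        if not source or not translation:
            continue  # drop incomplete pairs
        if (source, translation) in seen:
            continue  # drop exact duplicates
        seen.add((source, translation))
        lines.append("\t".join([src_lang, source, tgt_lang, translation]))
    return lines


def write_tsv(pairs, path, **kwargs):
    """Write the lines to a UTF-8 file with Unix line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in to_tsv_lines(pairs, **kwargs):
            f.write(line + "\n")


# Placeholder text only; replace with real sentence pairs.
example = [("Hello.", "  <Santali\ttranslation> "), ("Incomplete.", "")]
print(to_tsv_lines(example))
```

Plain tab joining is used instead of the `csv` module so that quotation marks inside sentences are written as-is rather than being escaped.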

PrasantaHembram, August 14, 2024 at 3:14:13 PM UTC

Hi, @gillux. Thank you for the information. I have some basic programming knowledge, but I'm not confident in my ability to contribute to the development of an import system. I will refer someone. I think that for now, only admins can do mass imports, and that this is rarely used?? And the only way to contribute right now is to add/translate sentences one by one.

gillux, August 16, 2024 at 3:31:59 PM UTC

> I have some basic programming knowledge, but I'm not confident in my ability to contribute to the development of an import system.

I think that creating an import system is a complex task, too. Not only on the technical level, but also on the social level, as one can see from the discussions on the GitHub issue page. I think that such an import system needs to be designed collaboratively, so you are more than welcome to share your ideas.

> I think that for now, only admins can do mass imports, and that this is rarely used??

Admins used to be able to do some kind of basic mass import, but, for technical reasons, not anymore.

> And the only way to contribute right now is to add/translate sentences one by one.

That is correct.

sharptoothed, August 11, 2024 at 5:58:47 AM UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/
