Tatoeba

Wall (7,123 threads)

Tips

Before you ask a question, please read the Frequently Asked Questions (FAQ).

We aim to maintain a healthy atmosphere for civilized discussions. Please read our rules against bad behavior.


sharptoothed, 2024-09-15 07:14:01 UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

Ergulis, 2024-09-08 05:33:28 UTC

The Tatoeba website has been running very slowly since yesterday's shutdown. Do any of you have a similar experience?

DJ_Saidez, 2024-09-08 05:58:00 UTC

It is slow for me too.

LanguageExpert, 2024-09-08 14:17:01 UTC (edited 2024-09-08 14:19:20 UTC)

Yes, it's been slow for me too.

PaulP, 2024-09-09 03:33:53 UTC

Not just slow. Mostly I get the message „Tatoeba is currently unavailable. We are sorry for the inconvenience. You can check our blog or Twitter for more information.”
But neither the blog nor Twitter gives any information. Yesterday I managed barely 10% of the work I usually do in a day.

gillux, 2024-09-09 09:19:17 UTC

There was a little problem indeed! Tatoeba should run smoothly now.

small_snow, 2024-09-10 10:41:08 UTC (edited 2024-09-10 10:45:10 UTC)

It's running nice and snappy again 🤭. Thank you very much.

maaster, 2024-09-06 13:18:04 UTC

I think adopting and unadopting sentences doesn't really work.
One has to beg for one's unadopted sentence(s) to be adopted.
(I suppose that's why the sentences aren't checked.)
And I think sentences that are simply unadopted mostly remain unchecked.

(As for me, I don't like adopting sentences added by others.)

PaulP, 2024-09-08 05:01:41 UTC

I don't understand, Maaster. I regularly adopt and correct sentences in „my languages”. If an author unadopts sentences, no one needs to be begged to change them. You can use a simple link to find them all:

https://tatoeba.org/eo/activiti..._sentences/epo

(change "epo" to the code of your language).

superduperimpose, 2024-09-02 21:53:48 UTC

Some sentences have this info "This sentence is original and was not derived from translation."

Is this information anywhere in the downloadable data?
Thank you!

Yorwba, 2024-09-04 18:54:50 UTC

It's in the sentences_base file.

superduperimpose, 2024-09-04 19:07:22 UTC

You're right. It's right there. Sorry, I just didn't see it.
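For anyone reading that file programmatically, here is a minimal Python sketch. It assumes the layout described on Tatoeba's downloads page: one `sentence_id<TAB>base` pair per line, where `0` marks an original sentence, a positive number is the ID of the sentence it was translated from, and `\N` means the origin is unknown. The names `classify_base` and `read_bases` are hypothetical, and the sample rows are made up for illustration.

```python
import io
from typing import Iterable, Iterator, Tuple

UNKNOWN = r"\N"  # assumed marker for "origin unknown" (MySQL-style NULL)


def classify_base(base: str) -> str:
    """Describe the base field of one sentences_base row (assumed semantics)."""
    if base == UNKNOWN:
        return "unknown"
    if base == "0":
        return "original"
    return f"translated from #{int(base)}"


def read_bases(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (sentence_id, description) for each 'id<TAB>base' line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines, e.g. after a trailing newline
        sentence_id, base = line.split("\t")
        yield int(sentence_id), classify_base(base)


if __name__ == "__main__":
    # Made-up sample rows, not real Tatoeba data.
    sample = io.StringIO("1\t0\n2\t1\n3\t\\N\n")
    for sentence_id, description in read_bases(sample):
        print(sentence_id, description)
```

To list only original sentences, keep the rows whose description is `"original"`.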

superduperimpose, 2024-08-31 11:54:06 UTC

Is the format of transcriptions (Japanese, if that makes any difference) explained anywhere? (There is nothing in the wiki, AFAIK.)

I found three different cases (there may be more):

A: [Kanji|Reading] which makes sense

B: [Kanji1Kanji2|Reading1|Reading2] which is probably short for [Kanji1|Reading1][Kanji2|Reading2]

C: [Kanji1Kanji2|Reading] which probably means the two Kanji combined have this reading

Is this correct?
And can I expect to always find something that fits A, B, or C?
That is, can I expect to *never* find something like [Kanji1Kanji2Kanji3|Reading1|reading2], i.e. unequal numbers of Kanji and readings? (In that case, how would I know whether Reading1 belongs to Kanji1Kanji2 or just Kanji1?)

I hope my ad-hoc syntax makes sense.

Yorwba, 2024-08-31 13:38:45 UTC

I assume you're asking this question because you want to transform the data programmatically (otherwise you could just handle edge cases whenever you encounter them). If my assumption is correct, it might be easiest to look at Tatoeba's own code for Japanese transcriptions. (Note that Tatoeba is AGPL-licensed, in case that's an issue for you.)

The validation code for user-provided furigana is here: https://github.com/Tatoeba/tato...ption.php#L220 but I think it might not apply to those that are generated automatically using MeCab.

The testcases might also be helpful: https://github.com/Tatoeba/tato...onTest.php#L27

If you just want to display furigana using HTML <ruby> tags, our code for that is here: https://github.com/Tatoeba/tato...naTrait.php#L9 To be honest, it's not written in an easily readable manner, but I think what it does is basically to assume without validation that there are at least as many kanji as there are readings, and if there is a kanji without reading (|| or end of list) it will merge it with the preceding kanji until the numbers are equal.

So [Kanji1Kanji2Kanji3|Reading1|reading2] would be equivalent to [Kanji1|Reading1][Kanji2Kanji3|reading2], I think.
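To make that merging rule concrete, here is a small Python sketch of it. It is not Tatoeba's code (that is the PHP linked above); it only implements the behaviour as described in this thread, assuming the `[base|reading1|reading2|...]` syntax. The names `pair_readings` and `to_ruby` are hypothetical; surplus readings are ignored and no HTML escaping is done.

```python
import re
from typing import List, Tuple

# Assumed syntax from this thread: [base|reading1|reading2|...]
GROUP = re.compile(r"\[([^|\]]+)\|([^\]]*)\]")


def pair_readings(base: str, readings: List[str]) -> List[Tuple[str, str]]:
    """Pair each base character with a reading. A character whose reading is
    empty (an empty slot between bars) or missing (past the end of the list)
    is merged into the preceding segment. Surplus readings are ignored."""
    segments: List[Tuple[str, str]] = []
    for i, char in enumerate(base):
        reading = readings[i] if i < len(readings) else ""
        if reading == "" and segments:
            prev_base, prev_reading = segments[-1]
            segments[-1] = (prev_base + char, prev_reading)
        else:
            segments.append((char, reading))
    return segments


def to_ruby(transcription: str) -> str:
    """Replace every [base|readings] group with HTML <ruby> markup (no escaping)."""

    def render(match: "re.Match[str]") -> str:
        base, readings = match.group(1), match.group(2).split("|")
        return "".join(
            f"<ruby>{b}<rt>{r}</rt></ruby>" for b, r in pair_readings(base, readings)
        )

    return GROUP.sub(render, transcription)


if __name__ == "__main__":
    print(to_ruby("[今日|きょう]は[天気|てん|き]がいい"))
```

Under this rule, case B (`[Kanji1Kanji2|Reading1|Reading2]`) yields two segments, case C (`[Kanji1Kanji2|Reading]`) stays one segment, and `[Kanji1Kanji2Kanji3|Reading1|reading2]` becomes `[Kanji1|Reading1][Kanji2Kanji3|reading2]`, matching the guess above.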

superduperimpose, 2024-08-31 15:07:10 UTC

Yes, ruby is a good example. This looks good, thanks!
I will take a look at the code, especially the one where it handles unequal numbers of Kanji and readings.

charcoalis, 2024-08-27 12:23:06 UTC

When you search on Tatoeba.org, it only shows 1000 results. That is, it shows a maximum of 10 pages. It says the total number of results, but it only shows 1000. How can I fix this?

Guybrush88, 2024-08-27 13:34:43 UTC

This is a technical limitation to avoid overloading the server.

brauchinet, 2024-08-28 10:05:50 UTC

I also wonder whether this limit of 1000 sentences is too low.
I use this feature to find recently added sentences (in German), and sometimes the last 1000 sentences don't even cover one day.
The limit doesn't apply to sentences of specific users. Some of them own a huge number of sentences (> 700,000).
Currently, displaying or even re-sorting these is reasonably fast.

sharptoothed, 2024-08-25 16:04:08 UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

Ergulis, 2024-08-17 17:46:37 UTC (edited 2024-08-17 18:06:15 UTC)

While searching for a solution to my problem with text on Tatoeba being displayed in italics, I tried downloading another browser. From what Google offered me, I chose Brave. To my great surprise, text displays normally there; the italics are gone.
It seems that something went wrong with the settings in my usual browsers (Edge, Google Chrome, even Firefox), causing the issue.

Ergulis, 2024-08-17 17:56:02 UTC

There is a shield in the Brave browser. If it is on, the text shows normally. However, if I disable it, the italics appear even there. Very strange.

Yorwba, 2024-08-17 19:25:11 UTC

https://support.brave.com/hc/en...while-browsing indicates that the shield combines various blocking features that you can also toggle individually using the advanced controls. My guess is that you have a non-standard system font that shows up as italics and the font fingerprinting protection in Brave, when enabled, is preventing the browser from loading it.

In Firefox, by right-clicking the italic text and selecting "Inspect", you should be able to open a panel with three columns, the rightmost of which shows "Layout" initially, but one of the other options is "Fonts", which should show you which font is being used.

Ergulis, 2024-08-18 11:21:13 UTC (edited 2024-08-20 20:27:25 UTC)

Thank you for your insight, Yorwba. I checked that and found out that the Noto Sans Italic font is used. If I disable it, the site displays normally. However, this only works temporarily, until the next launch. I just need to figure out how to change it permanently.
I'm glad to understand the problem, and for now I'm OK with running Tatoeba on Brave.

PrasantaHembram, 2024-08-10 19:06:39 UTC

Hi,
I'm reaching out to inquire about importing thousands of bilingual English-Santali sentences into the Tatoeba database. I have a large collection of sentences in two languages that I'd like to contribute to the platform. Could you please provide guidance on the recommended format for preparing the sentence files, the process for uploading them to the database, and any specific requirements or guidelines for ensuring data quality and consistency? I'd greatly appreciate any assistance or documentation to help me import my sentence collection efficiently.

Thanks
Prasanta Hembram

gillux, 2024-08-12 10:35:58 UTC

Hello, this sounds awesome, but Tatoeba does not support mass import of sentences just yet. This is because we lack the resources to implement a proper import system. If you know how to program, you are welcome to contribute such a system. If you know anybody who is willing to implement an import system, you can ask them. If you want to be notified about any progress on that matter, you can mention your interest on this GitHub issue thread: https://github.com/Tatoeba/tatoeba2/issues/1762

As for importing sentences in general, you should care about the license of the data you want to contribute. It should be legal to re-use the data, as Tatoeba will publish it under Creative Commons CC-BY.

As for data quality, the sentences should follow these rules: https://en.wiki.tatoeba.org/art...h-explanations There are no particular expectations in terms of consistency, because Tatoeba already receives contributions from various people who are not really following any consistency guidelines.

As for the data format, since we don't have an import tool just yet, there is no requirement yet, but I think CSV or TSV should be okay.
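Since there is no official format yet, the following Python sketch is only one possible way to prepare such a file: a header row with the two language codes, then one tab-separated sentence pair per line. Sentences containing tabs or line breaks are rejected because they would break the TSV layout. The function name `write_pairs_tsv` and the header convention are assumptions, not a Tatoeba requirement; `eng` and `sat` are the ISO 639-3 codes for English and Santali.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_pairs_tsv(
    pairs: Iterable[Tuple[str, str]],
    out: TextIO,
    src_lang: str = "eng",
    dst_lang: str = "sat",
) -> int:
    """Write (source, translation) pairs as TSV; return the number of rows written.

    Hypothetical layout: a header row of language codes, then one pair per line.
    """
    out.write(f"{src_lang}\t{dst_lang}\n")
    count = 0
    for source, translation in pairs:
        source, translation = source.strip(), translation.strip()
        for text in (source, translation):
            if not text:
                raise ValueError("empty sentence")
            if "\t" in text or "\n" in text or "\r" in text:
                raise ValueError(f"tab or line break inside sentence: {text!r}")
        out.write(f"{source}\t{translation}\n")
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    # Placeholder text only; real Santali sentences would go in the second column.
    write_pairs_tsv([("Hello.", "<Santali sentence>")], buf)
    print(buf.getvalue(), end="")
```

Writing lines directly instead of using the `csv` module avoids quote escaping, which would otherwise alter sentences that contain `"`.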

PrasantaHembram, 2024-08-14 15:14:13 UTC

Hi, @gillux. Thank you for the information. I have some basic programming knowledge, but I'm not confident in my ability to contribute to the development of an import system. I will refer someone. I think for now, only admins can do mass import and is used rarely ?? and only way to contribute right now is to add/translate sentences one by one.

gillux, 2024-08-16 15:31:59 UTC

> I have some basic programming knowledge, but I'm not confident in my ability to contribute to the development of an import system.

I think that creating an import system is a complex task, too. Not only on the technical level, but also on the social level, as one can see from the discussions on the GitHub issue page. I think that such an import system needs to be designed collaboratively, so you are more than welcome to share your ideas.

> I think for now, only admins can do mass import and is used rarely ??

Admins used to be able to do some kind of basic mass import, but, for technical reasons, not anymore.

> and only way to contribute right now is to add/translate sentences one by one.

That is correct.

sharptoothed, 2024-08-11 05:58:47 UTC

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/