Wall (7277 threads)

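Reminder

Before asking a question, make sure you have read the FAQ.

Our goal is to maintain a healthy atmosphere for civil discussion. Please read our rules against bad behavior.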



sharptoothed (December 14, 2025 at 6:07:02 AM UTC)

✹✹ Stats & Graphs ✹✹

Tatoeba Stats, Graphs & Charts have been updated:
https://tatoeba.j-langtools.com/allstats/

small_snow (December 16, 2025 at 11:06:53 AM UTC)

How are you?
Thank you, as always!

sharptoothed (December 16, 2025 at 5:17:31 PM UTC)

I'm fine.
You're welcome! :-)

LeviHighway (December 13, 2025 at 1:31:41 PM UTC)

I'm posting here to ask if anyone knows of collaborative language learning platforms like Tatoeba that I can contribute to. I like it when my content can be directly useful to others.

frpzzd (December 13, 2025 at 6:14:31 PM UTC)

What kind of features would you have in mind for such a platform?

I've daydreamed for a while about a collaborative and open-source framework for writing open-source language/grammar textbooks containing structured data for things like grammar topics, vocab lists, and exercises. There are several language textbooks that I really admire, but they frustrate me for 2 reasons. (1) Their data (e.g. list of grammar topics, vocabulary per chapter, etc) is not delivered in a structured format and can only be extracted by OCR/scraping (which often has errors) or manual data entry, so it's hard to use it in tandem with other resources such as Tatoeba. (2) Textbook licenses make it hard to use textbook content for legal reasons as well.

Another part of this daydream is the idea of a language textbook with non-linear progression between chapters. Each chapter or exercise would have a list of "dependencies", i.e. other chapters whose content it depends on, but a reader would be free to traverse the chapters in any order so long as they complete each chapter's dependencies before reading it. All chapters together would form a DAG (directed acyclic graph) structure, not just a tree.
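As a rough illustration of the dependency idea, here is a minimal Python sketch. The chapter names are made up for the example, and it relies only on the standard-library `graphlib` module (Python 3.9+) to check that the dependencies contain no cycle and to produce one valid reading order.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical chapters: each key maps to the set of chapters it depends on.
chapters = {
    "alphabet": set(),
    "nouns": {"alphabet"},
    "pronouns": {"alphabet"},
    "adjectives": {"nouns"},
    "verbs": {"nouns", "pronouns"},
}


def reading_order(dependencies):
    """Return one valid chapter order, or None if the dependencies contain a cycle."""
    try:
        # static_order() yields every chapter after all of its dependencies.
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None


order = reading_order(chapters)
```

If two chapters ever listed each other as dependencies, `reading_order` would return `None`, which is how a framework could reject a textbook that is not a DAG.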

There are a few permissively licensed language textbooks out there, but they are usually one-off projects by some institution or professor that don't allow ongoing contributions, and they still lock up their data in an inconsistent/un-parseable format. I also find that a lot of more recent language textbooks de-emphasize grammar and instead focus on trying to simulate immersion through activities and media, which I personally find unhelpful.

So for now this is just a bit of a pipe dream. However, if I had some help and knew that a considerable number of people would be interested in contributing to open source language web-textbooks, I might try programming a framework for such textbooks myself.

LeviHighway (December 14, 2025 at 5:59:00 AM UTC, edited December 14, 2025 at 5:59:17 AM UTC)

About dependencies, you mean something like: before you start this lesson, you have to look through these words first?

frpzzd (December 14, 2025 at 7:11:57 AM UTC)

Regarding dependencies, I was thinking that not only vocabulary but also grammar concepts could be involved in dependencies. For example, imagine the following grammar concepts in Spanish which might correspond to chapters / sections in a textbook:

1) The alphabet and pronunciation
2) Nouns and grammatical gender
3) Plural nouns and grammatical number
4) Spanish subject pronouns
5) Adjective-noun agreement
6) The verb "ser"

Of course the alphabet is needed for all of these. The topics (2), (3) and (4) could be studied more or less independently of each other. However, (5) depends on both (2) and (3), since Spanish adjectives agree based on gender and number. Further, (6) might depend on (3) and (4) since conjugating verbs in Spanish requires some understanding of grammatical number and subject pronouns. So a person could progress through these topics in, say, any of the following orders:

1 --> 2 --> 3 --> 4 --> 5 --> 6
1 --> 2 --> 3 --> 4 --> 6 --> 5
1 --> 2 --> 3 --> 5 --> 4 --> 6
1 --> 4 --> 2 --> 3 --> 5 --> 6

This kind of thing would be especially handy for someone with some limited background knowledge of the language. In this hypothetical web textbook, the user could check off the grammar topics they had already studied, and the app would indicate which chapters they could study next with their background knowledge.
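The "check off what you know" feature could be sketched roughly as follows. This is only an illustration in Python: the chapter numbers and dependencies come from the Spanish example, while the function names are invented for the sketch.

```python
# Dependencies from the Spanish example: chapter -> chapters it requires.
DEPENDENCIES = {
    1: set(),
    2: {1},
    3: {1},
    4: {1},
    5: {2, 3},
    6: {3, 4},
}


def available_next(completed):
    """Chapters not yet studied whose dependencies are all completed."""
    done = set(completed)
    return sorted(
        chapter
        for chapter, needs in DEPENDENCIES.items()
        if chapter not in done and needs <= done
    )


def is_valid_order(order):
    """Check that every chapter appears after all of its dependencies."""
    seen = set()
    for chapter in order:
        if not DEPENDENCIES[chapter] <= seen:
            return False
        seen.add(chapter)
    return True
```

For a reader who has already covered chapters 1, 2 and 3, `available_next({1, 2, 3})` gives `[4, 5]`, and `is_valid_order` accepts each of the four orders listed in the message.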

LeviHighway (December 14, 2025 at 8:16:02 AM UTC, edited December 14, 2025 at 8:16:37 AM UTC)

Do you have any existing processes for this? I mean mainly for learning content or just structure.

frpzzd (December 15, 2025 at 2:44:15 AM UTC)

I don't know what you mean by that, can you be a little more specific?

Anyway, I don't mean to fill up the wall with all of my long messages. Feel free to send me a DM if you want more details.

lingomaxim (December 13, 2025 at 6:55:33 PM UTC)

I had forgotten about it until reading this, but there's a site called LangCorrect, which is not quite what you described but may still be interesting.

If I remember correctly, it's a writing prompt type thing in a bunch of languages. You can get corrections from native/fluent speakers as well as help people learning your own native language.

alt (December 12, 2025 at 6:56:58 AM UTC)

I was wondering if anyone had a way to search for Japanese verbs that includes all possible conjugations? I'm trying to automatically pull some sentences for my Anki cards, but just searching for the dictionary form of a verb doesn't work well for proper sentences. Is there maybe some Manticore syntax I could use here? Thanks!

LeviHighway (December 12, 2025 at 7:14:22 AM UTC)

You can just strip the verb ending. 読 would match 読む, 読みます, 読んだ, 読んで, 読まない, 読める, 読もう, etc.

gillux (December 12, 2025 at 9:32:37 AM UTC)

Hello,

Searching in Japanese on Tatoeba is currently limited in that it only works at a character level, not at a "word" level. Put differently, every character is treated as a word of its own. We would like to improve that situation, but we don’t have the resources to do so at the moment. Note that Tatoeba is an open project and contributions are welcome.

If you are looking for a way to automate the process of retrieving example sentences containing any form of a given verb, I recommend that you download the Japanese Tatoeba corpus as a file and process it yourself with a natural language processing toolkit. Weekly exports are available at https://tatoeba.org/downloads.
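As a starting point for that kind of processing, here is a minimal Python sketch. It assumes the downloaded file is tab-separated with three columns (sentence id, language code, text) and that Japanese is coded as `jpn`; check the format notes on the downloads page before relying on that. The function name is invented for the example, and the plain substring test stands in for whatever a real NLP toolkit would do.

```python
import csv
import io


def sentences_containing(tsv_file, lang, forms):
    """Yield (id, text) for sentences in `lang` that contain any of `forms`."""
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip lines that do not match the assumed layout
        sentence_id, sentence_lang, text = row
        if sentence_lang == lang and any(form in text for form in forms):
            yield sentence_id, text


# Tiny in-memory stand-in for the downloaded file (made-up rows).
sample = io.StringIO(
    "1\tjpn\t彼は手を振りました。\n"
    "2\tjpn\t雨が降っています。\n"
    "3\teng\tHe waved his hand.\n"
)
matches = list(sentences_containing(sample, "jpn", ["振り", "振っ"]))
```

With a real export, you would pass `open(path, encoding="utf-8", newline="")` instead of the `StringIO` sample.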

If you are looking for a way to manually search for sentences containing any form of a few verbs, let me explain some ways this can currently be achieved. I'll take the verb 振る(ふる) as an example.

1. First, you need to use quotes around Japanese keywords:

振ります
→ will match sentences containing the four characters 振, り, ま and す anywhere in a sentence, in any order (but the provided order is prioritized).

"振ります"
→ will match those four characters contiguously.

2. You can use @text and @transcription to match furigana, too:

@transcription "ふります" @text "振ります"
→ will match sentences having "ふります" in furigana and "振ります" in sentence text.

3. There is a caveat: the furigana will not be specifically matched against the corresponding kanji, for example:

@transcription "ふ" @text "振"
→ could potentially match the sentence 降りるふりをしました。 because of the presence of unrelated ふ elsewhere in the sentence.
→ however, such mismatches become very rare if you use two or more characters in @transcription

4. Regarding your question about how to match all forms of a verb, let’s first consider verbs whose reading is unambiguous, such as 思う. If you search for the root "思", you will get unwanted matches such as 不思議. The easiest way to exclude these is to provide a suffix to 思. Because 思う is a godan verb, there are only 6 possible suffixes:

思う(おもう)
→ 思わ, 思い, 思う, 思え, 思お, 思っ

We can give all the forms separated by the OR operator "|":

"思わ"|"思い"|"思う"|"思え"|"思お"|"思っ"
→ this search shows all sentences containing the verb 思う conjugated

5. If the verb is ichidan, and the root has more than one character, you can just search for the root:

"食べ"
→ this search shows all sentences containing the verb 食べる conjugated

6. If the verb is ichidan and the root has only one character, because of the caveat explained at 3, you need to provide a second character. It gets tricky because there are many possible second characters:

見る(みる)
"見る"|"見ま"|"見さ"|"見ら"|"見た"|"見な"|"見ろ"|"見よ"|"見て"|"見え"
→ brings up a few false positives such as この指輪ね、祖母の形見なの。

7. If the verb's reading is ambiguous, you can combine examples 2 and 4:

振る(ふる)
→ 振ら, 振り, 振る, 振れ, 振ろ, 振っ

So the final search is:
(@text "振ら" @transcription "ふら") | (@text "振り" @transcription "ふり") | (@text "振る" @transcription "ふる") | (@text "振れ" @transcription "ふれ") | (@text "振ろ" @transcription "ふろ") | (@text "振っ" @transcription "ふっ")

(We need to use parentheses because the implicit AND operator has a higher priority than OR "|")
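Since typing these queries by hand is error-prone, they can also be generated. The following Python sketch only assembles the query strings described in steps 4 and 7; the function names are invented for the example and nothing here talks to Tatoeba itself.

```python
def or_query(forms):
    """Join quoted forms with the OR operator, as in step 4."""
    return "|".join(f'"{form}"' for form in forms)


def text_and_reading_query(pairs):
    """Build the step 7 query from (text, reading) pairs."""
    groups = (
        f'(@text "{text}" @transcription "{reading}")'
        for text, reading in pairs
    )
    return " | ".join(groups)


# Godan stems of 思う, taken from step 4.
omou = or_query(["思わ", "思い", "思う", "思え", "思お", "思っ"])

# Stems of 振る paired with their readings, for the ambiguous case in step 7.
furu = text_and_reading_query(
    zip(
        ["振ら", "振り", "振る", "振れ", "振ろ", "振っ"],
        ["ふら", "ふり", "ふる", "ふれ", "ふろ", "ふっ"],
    )
)
```

The resulting strings can be pasted into the search box as they are. The parentheses around each group are kept for the precedence reason given in step 7.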

CK (December 12, 2025 at 11:22:06 AM UTC, edited December 12, 2025 at 11:24:25 AM UTC)

Something else that might save you time is to use Jim Breen's wwwjdic.

Look up a verb, click the "links" link, find the "verb conjugation" link, then copy all the forms, and find what you need to search for as gillux explains.

This should save you a lot of typing. Here's an example of one verb conjugation page.
https://www.edrdg.org/cgi-bin/w...A4%A8%A4%EB_v1

The site also has an "example search".

https://www.edrdg.org/cgi-bin/w...A4%A8%A4%EB_1_

AlanF_US (December 12, 2025 at 1:46:18 PM UTC)

These responses by @gillux and @CK contain useful information that should be made available on the Tatoeba wiki as well.

frpzzd (December 12, 2025 at 3:40:55 PM UTC)

Here are a few more links that you may find useful.

The Python NLP library called spaCy has pipelines available for Japanese:

https://spacy.io/models/ja

Using this library, you can download spaCy models and use them to analyze a sentence. Part of this pipeline includes a component that attempts to split the sentence into individual words, and reduce each word to its "base form". I personally use this library to generate fill-in-the-blank vocabulary exercises for German and Russian (a la Clozemaster) by searching for Tatoeba sentences containing words whose base forms are a target word that I want to study.
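To give an idea of the fill-in-the-blank step, here is a minimal Python sketch. It does not call spaCy itself, so it runs without any model installed: the token analysis below is written by hand for illustration, and `make_cloze` is a name invented for the example. With spaCy, the same triples could be built from each token's `text`, `lemma_` and `whitespace_` attributes.

```python
def make_cloze(tokens, target_lemma, blank="____"):
    """Blank out every token whose lemma equals `target_lemma`.

    `tokens` is a list of (surface, lemma, trailing_whitespace) triples.
    Returns None if the target lemma does not occur in the sentence.
    """
    pieces = []
    found = False
    for surface, lemma, space in tokens:
        if lemma == target_lemma:
            pieces.append(blank)
            found = True
        else:
            pieces.append(surface)
        pieces.append(space)
    return "".join(pieces) if found else None


# Hand-written analysis of a German sentence (not real spaCy output).
# With spaCy: [(t.text, t.lemma_, t.whitespace_) for t in nlp(sentence)]
tokens = [
    ("Er", "er", " "),
    ("ging", "gehen", " "),
    ("nach", "nach", " "),
    ("Hause", "Haus", ""),
    (".", ".", ""),
]
exercise = make_cloze(tokens, "gehen")
```

Matching on the base form rather than the surface form is what lets a target word like "gehen" pick up a sentence that only contains "ging".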

You may also find this useful:

https://github.com/cl-tohoku/J-UniMorph

This dataset purports to list all (most?) inflected forms for many Japanese words. Not only that, but the data file also lists how each form is inflected.

alt (December 12, 2025 at 4:06:38 PM UTC)

Thanks @gillux and @CK, I had not known that search only works at the character level. I was initially just snipping off the last character of the verb but, as you said, I'd be running into issues with words containing just the root. I think obtaining all the conjugations separately and searching for those is probably the way to go for now.


LeviHighway (November 5, 2025 at 3:36:24 PM UTC)

I think we need a warning when someone adds a sentence that is longer than the upper limit. I was adding translations to some long sentences and didn't realize one was longer than the limit, so the sentence was cut in half.

Babelball (December 10, 2025 at 1:02:32 AM UTC)

In my opinion, a small counter beside or below the writing box, showing how many words out of n one has written (n being the word limit), would be good, so one knows how close one is to the limit; something like the character counter one might see on social media sites.

cafoc64474 (December 8, 2025 at 11:32:44 PM UTC, edited December 9, 2025 at 2:30:17 AM UTC)

I hope it is OK to post this here. I made a small list of Lower Sorbian words. It is based on Tatoeba.
=====================================================
Lower Sorbian

I am - Ja som
You are - Ty sy
He is - Wón jo
She is - Wóna jo

(we) are - Smy
(we) are - Smej
(you) are - Sćo
they are - Wóni su

why - cogodla
where - Źo
No - ně
everything - wšykno
nothing - nic


I know - Wěm
You know - Ty wěš
He knows - Wón wě
Nobody knows - Nichten njewě


today - źinsa

Monday - pónjeźele
Tuesday - wałtora
Wednesday - srjody
Thursday - stwórtka
Friday - pětk
Sunday - njeźela


car - awto
bird - ptašk
school - šula
bridge - móst
doctor - gójc
church - cerkwja
dictionary - słownik
horse - kóń
key - kluc
hospital - chórownja
sister - sotša

LeviHighway (December 9, 2025 at 8:16:26 AM UTC, edited December 9, 2025 at 8:18:08 AM UTC)

I've always thought it would be nice if Tatoeba had a built-in dictionary that everyone could contribute to, following the same logic this website uses for sentences...

I was thinking about OmegaWiki, but Tatoeba has much better software than that.

CK (December 9, 2025 at 11:48:16 AM UTC, edited December 9, 2025 at 11:48:56 AM UTC)

Another way to think about this is that perhaps the strength of the Tatoeba Project is that it focuses on one thing and does it fairly well. It makes its data available to others who can figure out other ways to manipulate the data.

LeviHighway (December 9, 2025 at 1:12:13 PM UTC)

Well, I just can't find a dictionary website or tool that is similar to the Tatoeba project! OmegaWiki and Glosbe are similar, but OmegaWiki was canceled and Glosbe is not maintained and does not have a community. I am not capable of starting a website myself, so I wish there were one, though :p

cafoc64474 (December 9, 2025 at 1:41:57 PM UTC, edited December 9, 2025 at 1:47:28 PM UTC)

I like the UI of the Interslavic dictionary website. On the other hand, the website itself doesn't have a way to insert new words into the dictionary. That is done by entering words into a large Excel table.
I think it would be great if a website like Tatoeba but for a dictionary existed.

lehunghup (December 9, 2025 at 4:58:49 AM UTC)

Hi everyone,

I have tried to use the sentence export page on Tatoeba, but unfortunately, I am not managing to do it correctly. Could someone please help me download the following data (preferably as a CSV file, zipped or not)?

1. All Kabyle sentences
—with their translations in French, English, and Spanish,
—and with an audio recording attached.

2. All original Kabyle sentences that:
—do not have translations,
—do not have audio recordings,
—but without duplicated sentences if possible.

Any help, guidance, or explanation on how to extract these properly would be greatly appreciated.

Thank you very much in advance!

brauchinet (December 9, 2025 at 7:40:57 AM UTC)

https://tatoeba.org/de/wall/sho...#message_41510

CK (December 9, 2025 at 7:47:47 AM UTC)

This post seems to have been copy-and-pasted by a Vietnamese speaker.
It's something that appeared on the Wall before.
If another admin doesn't delete this within 24 hours, I will.
Strictly speaking, we don't have a policy about this, I think.