
kakul, February 25, 2015, 5:45:49 UTC

Hello All

I am a student from India and would like to work on improving the audio recording feature used on Tatoeba. I saw this topic in the GSoC 2015 ideas, and I would like to know the key points I should keep in mind when proposing changes. In a conversation with Trang, I learned that quality and pronunciation are among the most important points. Initially I thought a web-based recording system was a priority, but since Trang says it is of lower priority, I will focus on the current system.

Here are a few observations:

1. The audio files carry information such as the author's name in their ID3 tags, but this is not shown to users.

2. The audio is probably recorded at a 44.1 kHz sampling rate. The bit rate is constant within a file but varies across files.

3. A Flash player (at least in Firefox) is used to play the file referenced in the hyperlink tag.

4. The audio recording process is quite tedious for both users and admins.

5. There is no support for multiple pronunciations, accents, or speaker genders for a word.

Some problems collected from the discussions on GitHub Issues:

1. Fully automating the audio recording process is infeasible, as the contributor's voice quality and accent can only be judged by the human ear.

2. Parsing audio file tags is difficult because of the ID3 version issue.

3. Changes to a sentence cannot be reflected in its audio, or vice versa (the current solution is to make those sentences uneditable).

4. For some earlier recordings, it is difficult to trace the author or to find their ID on Tatoeba.

Some suggestions:

1. Instead of Flash, we can use HTML5 audio, which is more flexible and easier to manipulate through JavaScript. The problem of multiple accents/genders can then be solved by loading the required file only when a click event occurs.

2. A feasible amount of the process can be automated. Ideally the process should also be reversible, so that glitches and conflicts can be handled without hassle. This involves a proper structure for the databases/directories. The main aim is to reduce the workload of the person carrying out the upload/verification process.

3. Use compatible command-line tools like mediainfo to extract ID3 tags, and write a script to automate the process. I believe mediainfo performs well for that purpose. Even then, some files may require manual inspection where the author's name, etc. can't be determined.

4. Experimental support for web-based audio recording using the HTML5 getUserMedia API, which uses the microphone/webcam for recording. I couldn't find official documentation on the recording quality, but I believe it is roughly the same as what we use on Tatoeba.
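
A minimal sketch related to suggestion 3 and the ID3 version problem listed above: before extracting tags, a script could first check which ID3 versions a file carries. It uses only the Python standard library rather than mediainfo. The function name `detect_id3` and the Latin-1 decoding of the artist field are assumptions for illustration, not part of Tatoeba's tooling.

```python
import io


def detect_id3(stream):
    """Report which ID3 tag versions a binary MP3 stream carries.

    `stream` is any seekable binary file object (hypothetical helper).
    """
    result = {"id3v2": None, "id3v1": None}

    # An ID3v2 tag starts the file with a 10-byte header:
    # b"ID3", major version, revision, flags, 4-byte size.
    stream.seek(0)
    header = stream.read(10)
    if len(header) == 10 and header[:3] == b"ID3":
        result["id3v2"] = f"2.{header[3]}.{header[4]}"

    # An ID3v1 tag is the last 128 bytes of the file and starts with b"TAG".
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    if size >= 128:
        stream.seek(size - 128)
        tail = stream.read(128)
        if tail[:3] == b"TAG":
            # Layout: TAG(3) title(30) artist(30) album(30) year(4) comment(30) genre(1)
            # Assumption: the artist field is Latin-1 encoded.
            artist = tail[33:63].split(b"\x00")[0].decode("latin-1").strip()
            result["id3v1"] = {"artist": artist}

    return result
```

Files for which both entries come back as `None`, or whose artist field is empty, would be the ones left for manual inspection.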

I would like to know what the members have to say about this. I may have some wrong ideas about the workflow, and I may be missing some key areas, so I welcome your suggestions.

Cheers :)

odexed, February 25, 2015, 6:08:26 UTC (edited February 25, 2015, 6:14:32 UTC)

Hi! First of all, thank you for your participation in this project. I do hope you'll succeed.
As a user, I really want two things: first, faster audio, i.e. not having to wait a long time after I click the sentence I want to listen to; and second, different pronunciations from native speakers from different countries.

P.S. I believe all the things you have described above are also important (of course, the most important thing is still the high quality of audio).

kakul, February 25, 2015, 10:49:00 UTC

Hi odexed

I'm not in a position to say anything on behalf of the project yet, as I haven't begun contributing to it. But I'll certainly keep your reply in mind when I get involved.

kakul, February 25, 2015, 10:50:32 UTC

My post was meant to reach the developers of the project, to learn their views, and specifically to discuss with them how to proceed with the development.

TRANG, February 26, 2015, 12:07:49 UTC

Hi Kakul, it's nice to see that you took the time to analyse further how audio works in Tatoeba.

If you would like to interact more with the developers, then our Google group and IRC channel are the best places to go. The Wall is a place for discussion with the contributors of the corpus, and most people here have very little technical knowledge.

It is, however, important to discuss new features here, to find out what's more important, what's less important, and what you may have forgotten but need to take into account in your project.

About that, there is one thing you didn't list: the license.

A few people have expressed interest in using the audio in their own projects. But at this stage, we cannot redistribute all our audio the same way we distribute the corpus, because of license issues. There are audio files for which we know the author agreed to their use in Tatoeba (because they sent them to us), but we did not ask whether they agree to their audio being used in other projects and, if so, whether they want to restrict it to non-commercial projects.

Since we would like to be able to share our audio legally, you will need to take into account how to manage the license.
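
One way to picture the license point is as per-recording metadata that a redistribution step can check. The sketch below is purely illustrative: the `AudioRecording` fields, the `redistributable` helper, and the license strings are hypothetical and do not describe Tatoeba's actual database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioRecording:
    """Hypothetical record for one audio file."""
    sentence_id: int
    author: Optional[str]
    # None means the author was never asked about reuse outside Tatoeba.
    license: Optional[str] = None


def redistributable(rec, commercial=False):
    """Return True only if the recorded license clearly allows this reuse."""
    if rec.license is None:
        return False  # unknown permission: do not redistribute
    if commercial and "-NC" in rec.license:
        return False  # non-commercial licenses exclude commercial projects
    return True
```

Treating a missing license as "not redistributable" keeps the existing files, whose authors were never asked, out of any export until their permission is recorded.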

tommy_san, February 27, 2015, 8:37:44 UTC

Hi. I'm really happy that you're interested in the audio feature. I think the issue is more challenging than it looks and it's worth your whole summer.


> 3. Using compatible command-line tools like mediainfo to extract ID3 tags and write a script to automate the process.

We've already identified most of the contributors of audio.
https://github.com/Tatoeba/tato...mment-73865084
http://goo.gl/ymiKR9
Actually, audio files of most contributors don't have ID3 tags, so we needed to rely on some other information.
What you could do is to analyze the voices and check whether all the sentences in a list are read by the same person.
What you could also do is to standardize the ID3 tags of all the existing audio files.


> 4. Experimental support for web-based audio recording using HTML5 getUserMedia API which uses microphone/webcam for recording .

Have you tried out the Shtooka Recorder?
http://a4esl.com/temporary/tatoeba/shtooka/
If you're thinking of a function to record sentences one by one in a browser, you couldn't beat the Shtooka Recorder. It would be much more comfortable to record using it offline.
If you're thinking of an online version of the Shtooka Recorder, that might be nice, but I don't see many advantages.


Here are some more ideas.

1. Should we control the quality of audio, and if so, how?

I personally want only good audio files, which roughly involves three aspects.
(a) Pronunciation should be correct. For example, the "i" in the Japanese word "yasashisa" shouldn't be pronounced. (http://www.forvo.com/word/yasashisa/ is wrong.)
(b) The way of reading should match the situation where the sentence is likely used. For example, a rude sentence shouldn't sound polite and vice versa.
(c) There shouldn't be too much noise.

Currently, I listen to all the Japanese recordings and when I find some files not good enough, I ask the contributor to re-record them. Some people might think that I'm being too intrusive, that Tatoeba should be open and accept everything. If the majority of the community think so, I can stop doing it and simply ask them to send the files directly to an admin. If many members are interested in the quality, and not only the quality of sound, it would be nice if there were a system that enables us to check audio files easily, either before or after they're uploaded.


2. What could (and should in my opinion) happen in the future is having recordings of dialogs read by two people, so you'll need to support this kind of audio as well.


3. Some audio files are louder or quieter than others. It would be nice if you could normalize them, not based on the MP3 files, but on the FLAC and WAV files that admins have.
You also need to keep in mind that some sentences should be read louder or more quietly than others, so you can't simply normalize all the files.


4. Concerning license, I'd suggest that you thoroughly read the website of Creative Commons.
http://creativecommons.org/
None of us actually knows what we should do.
http://groups.google.com/group/...5d5872ca6759eb
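
As a rough illustration of idea 3 above, here is a sketch of peak normalization on WAV samples with a target level that can differ per file. It assumes 16-bit PCM WAV input; the function names and the -3 dBFS default are arbitrary choices for the example, and peak level is only a crude stand-in for how loud a recording actually sounds.

```python
import math
import sys
import wave
from array import array

FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample


def read_wav_samples(path):
    """Read all samples from a 16-bit PCM WAV file (channels interleaved)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def peak_dbfs(samples):
    """Peak level in dB relative to full scale; -inf for silence."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak / FULL_SCALE)


def normalize_peak(samples, target_dbfs=-3.0):
    """Scale samples so the peak lands at target_dbfs; silence is unchanged."""
    current = peak_dbfs(samples)
    if current == float("-inf"):
        return array("h", samples)
    gain = 10 ** ((target_dbfs - current) / 20)
    # Clamp to the 16-bit range in case rounding pushes a sample past it.
    return array("h", (max(-32768, min(32767, round(s * gain))) for s in samples))
```

A sentence that is meant to be shouted or whispered could be given its own `target_dbfs` instead of the default, so that not every file is forced to the same level.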

kakul, February 27, 2015, 16:29:04 UTC

Hello tommy_san

Thanks for your reply.

I have mid-semester exams at my college this coming week, so I'll be a little busy preparing for them. I'll reply to your message as soon as I'm free. I'll also look into the points you've mentioned and probably discuss them in detail on the Google group, as Trang suggests.

Thanks again and apologies for the delay in my response.