Tatoeba
Selena777, October 14, 2015 at 7:31:45 AM UTC

I can't correctly download any list that isn't in Latin script. There are only strange symbols instead of normal letters. Is it a bug?

Ricardo14, October 14, 2015 at 1:29:09 PM UTC (edited October 14, 2015 at 1:33:05 PM UTC)

I opened a ticket for that - https://github.com/Tatoeba/tatoeba2/issues/830 =)

wells, October 15, 2015 at 7:53:34 AM UTC

Can you add to the bug that having a byte-order mark might fix it? (Needs some testing though.)

(I don't have a github account)

User55521, October 15, 2015 at 9:44:46 AM UTC (edited October 15, 2015 at 9:47:59 AM UTC)

UTF-8 doesn’t have a real byte-order mark because the byte-order is always the same in UTF-8.

Although some Windows applications do put a byte-order mark in UTF-8 files (not to indicate byte order, but to signal the encoding), it may cause problems with Unix applications, so I think it shouldn't be added to CSV files. Many Unix applications don't expect a BOM in UTF-8 files and don't strip it, so the first entry of the file would end up with incorrect characters before it.
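
To illustrate the point above, here is a minimal Python sketch of what a reader sees when a UTF-8 file starts with a BOM. The sample row is made up for illustration and is not taken from a real Tatoeba export.

```python
import codecs

# Hypothetical sample row; not from an actual exported list.
text = "1,Привет\n"

# The UTF-8 "BOM" is always the same three bytes: EF BB BF.
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")

# A reader that does not expect a BOM keeps U+FEFF as part of the first field.
naive = with_bom.decode("utf-8")

# A BOM-aware reader ("utf-8-sig" in Python) strips it.
aware = with_bom.decode("utf-8-sig")

print(naive.startswith("\ufeff"))  # the stray character is still there
print(aware == text)               # the BOM-aware decode matches the original
```

Whether a given Unix tool behaves like the naive reader or the BOM-aware one depends on the tool, so this only shows the mechanism.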

Ricardo14, October 16, 2015 at 12:33:20 AM UTC

done!

CK, October 14, 2015 at 1:45:45 PM UTC (edited October 30, 2019 at 10:34:03 AM UTC)

[not needed anymore- removed by CK]

Selena777, October 14, 2015 at 4:25:56 PM UTC

There is UTF-8 in my browser settings. Should I change the settings in Excel?

wells, October 14, 2015 at 5:12:27 PM UTC

When you import the file into Excel with the wizard, there's a drop-down called 'File origin' where you need to select the UTF-8 option.

You cannot change the encoding after importing. (You must re-import to fix it.)

A screenshot of the relevant screen of the wizard:
http://wiki.redcomponent.com/in...oding_html.png

Selena777, October 15, 2015 at 6:37:17 AM UTC

I don't see such a page when I download files.
Actually, I didn't know there was a special program for downloading them. How can I find it?

wells, October 15, 2015 at 7:50:32 AM UTC

You can launch the wizard if you save the file to disk and then open it manually in Excel.

If you open it just by double-clicking on the file, you will bypass the wizard (which leads to Excel guessing wrong).

The dialogue is called 'Text Import Wizard'. This is how it's launched in Excel 2007:

> To start the Text Import Wizard, on the Data tab, in the Get External Data group, click From Text. Then, in the Import Text File dialog box, double-click the text file that you want to import.

The important thing is selecting the "65001: Unicode (UTF-8)" option in the wizard.
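
As a rough illustration of what "Excel guessing wrong" means, the Python sketch below decodes the same UTF-8 bytes twice: once with a wrong legacy code page and once as UTF-8. Windows-1251 is only an assumed example of a wrong guess; the code page Excel actually picks depends on the system's regional settings.

```python
# Hypothetical sample text; not from an actual exported list.
text = "Привет"

# The bytes as they would be stored in a UTF-8 file.
data = text.encode("utf-8")

# Wrong guess: the bytes are interpreted as Windows-1251 (an assumption
# made for this example), which produces the "strange symbols".
garbled = data.decode("cp1251")

# Right choice: the equivalent of picking "65001: Unicode (UTF-8)".
correct = data.decode("utf-8")

print(garbled)
print(correct)
```

The underlying bytes are the same in both cases; only the interpretation differs, which is why re-importing with the right option fixes the display.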

AlanF_US, October 15, 2015 at 12:40:28 PM UTC

Thanks for the information about Excel. Another option is to use Calc, the spreadsheet program from OpenOffice.org, which is thoughtful enough to ask you which encoding you want to use when you open the file. OpenOffice is free.

Selena777, October 15, 2015 at 3:00:19 PM UTC

How can I save the file as .txt?
I tried, I can only save it as .html.

wells, October 16, 2015 at 5:52:58 AM UTC

I'm confused about how you end up saving them as html. They're text files and Excel should be able to import them no matter what their file extension is.

The relevant HTTP headers the server sends look like this:

HTTP/1.1 200 OK
[...]
Content-Type: application/vnd.ms-excel
[...]
Content-Disposition: attachment;filename=export_list_2994.csv

I'm not seeing anything particularly wrong and my browser recommends a .csv file name on the download dialogue.

The Content-Type should be text/csv, though that shouldn't affect anything.
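
For comparison, here is a small Python sketch of what the suggested headers could look like. It is not Tatoeba's server code: the helper name `csv_download_headers` is hypothetical, and the `charset=utf-8` parameter is an extra assumption that goes beyond what the post proposes.

```python
from email.message import Message


def csv_download_headers(filename: str) -> Message:
    """Build illustrative download headers for a CSV export (hypothetical helper)."""
    headers = Message()
    # text/csv as suggested in the post; the charset parameter is an assumption.
    headers["Content-Type"] = "text/csv; charset=utf-8"
    headers.add_header("Content-Disposition", "attachment", filename=filename)
    return headers


headers = csv_download_headers("export_list_2994.csv")
print(headers.as_string())
```

Declaring the charset in the header does not guarantee that a spreadsheet program will honor it once the file has been saved to disk.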

Selena777, October 15, 2015 at 2:46:43 PM UTC

I tried to download Cyrillic script, and also tried Hebrew as a test. I haven't tried to download Japanese or Arabic sentences.

wells, October 14, 2015 at 3:06:08 PM UTC

It's possible that your text editor (or other program) tries to guess the encoding and messes up. Set the encoding to UTF-8 manually if the program has such an option.

Perhaps there should be a UTF-8 byte-order mark in the CSV?
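
Both ideas in this post can be tried locally with a short Python sketch: read the file with the encoding set explicitly, then re-save a copy that starts with a UTF-8 BOM. The file names and sample rows are made up, and a comma-separated layout is assumed; the real export may differ.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for a downloaded list.
rows = [["1", "Привет"], ["2", "שלום"]]

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "export_list.csv")      # stand-in for the download
    dst = os.path.join(tmp, "export_list_bom.csv")  # copy with a BOM added

    # Create the stand-in file: UTF-8 without a BOM.
    with open(src, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)

    # Read it back with the encoding set manually instead of guessed.
    with open(src, encoding="utf-8", newline="") as f:
        loaded = list(csv.reader(f))

    # Re-save with "utf-8-sig", which writes the BOM bytes first.
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(loaded)

    with open(dst, "rb") as f:
        raw = f.read()

print(loaded == rows)
print(raw[:3])  # the first three bytes of the re-saved copy
```

Whether the BOM copy opens correctly on double-click still needs testing in the spreadsheet program itself.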