I can't correctly download any list that isn't in Latin script. I get strange symbols instead of normal letters. Is this a bug?
I opened a ticket for that - https://github.com/Tatoeba/tatoeba2/issues/830 =)
Can you add to the bug that having a byte-order mark might fix it? (Needs some testing though.)
(I don't have a github account)
UTF-8 doesn’t have a real byte-order mark because the byte-order is always the same in UTF-8.
Some Windows applications do use a byte-order mark this way (i.e. not as a byte-order mark but to signal the encoding), but it can cause problems with Unix applications, so I think it shouldn't be added to CSV files. Many Unix applications don't expect a BOM in UTF-8 files and don't strip it, so the first entry of the file would end up with incorrect characters before it.
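To make that concrete, here is a minimal Python sketch of what a UTF-8 "BOM" is and what happens when a reader doesn't strip it. The sample row is made up for illustration and is only an assumption about what an exported line might look like.

```python
import codecs

# Hypothetical sample row; the real export format may differ.
row = "1\tПривет\n"

plain = row.encode("utf-8")         # no BOM
with_bom = row.encode("utf-8-sig")  # prepends the three bytes EF BB BF

assert with_bom == codecs.BOM_UTF8 + plain

# A reader that does not expect a BOM keeps it as U+FEFF in the first field.
naive_first = with_bom.decode("utf-8").split("\t")[0]
# A BOM-aware reader (the "utf-8-sig" codec) strips it.
aware_first = with_bom.decode("utf-8-sig").split("\t")[0]

print(repr(naive_first))  # '\ufeff1'
print(repr(aware_first))  # '1'
```

The stray U+FEFF in the first field is the "incorrect chars before the first entry" problem described above.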
done!
My browser is already set to UTF-8. Should I change the settings in Excel?
When you import the file into Excel with the wizard, there's a drop-down called 'File origin' where you need to select the UTF-8 option.
You cannot change the encoding after importing. (You must re-import to fix it.)
A screenshot of the relevant screen of the wizard:
http://wiki.redcomponent.com/in...oding_html.png
I don't see such a page when I download files.
Actually, I didn't know there was a special program for downloading them. How can I find it?
You can launch the wizard if you save the file to disk and then open it manually in Excel.
If you open it just by double-clicking on the file, you will bypass the wizard (which leads to Excel guessing wrong).
The dialogue is called 'Text Import Wizard'. This is how it's launched in Excel 2007:
> To start the Text Import Wizard, on the Data tab, in the Get External Data group, click From Text. Then, in the Import Text File dialog box, double-click the text file that you want to import.
The important thing is selecting the "65001: Unicode (UTF-8)" option in the wizard.
Thanks for the information about Excel. Another option is to use Calc, the spreadsheet program from OpenOffice.org, which is thoughtful enough to ask you which encoding you want to use when you open the file. OpenOffice is free.
How can I save the file as .txt?
I tried, but I can only save it as .html.
I'm confused about how you end up saving them as HTML. They're text files, and Excel should be able to import them no matter what their file extension is.
The relevant HTTP headers the server sends look like this:
HTTP/1.1 200 OK
[...]
Content-Type: application/vnd.ms-excel
[...]
Content-Disposition: attachment;filename=export_list_2994.csv
I'm not seeing anything particularly wrong and my browser recommends a .csv file name on the download dialogue.
Strictly speaking, the Content-Type should be text/csv, but that shouldn't affect anything.
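For reference, a small Python sketch of how a client could read headers like these. Note the assumption: the "charset=utf-8" parameter is my addition to show how an encoding could be declared; the server does not currently send it, and once the file is saved to disk the header is gone anyway, so this would not by itself change what Excel does.

```python
from email.message import Message

# Build a header set similar to the one above. The charset parameter is
# hypothetical; the real response uses application/vnd.ms-excel without it.
headers = Message()
headers["Content-Type"] = "text/csv; charset=utf-8"
headers["Content-Disposition"] = "attachment; filename=export_list_2994.csv"

media_type = headers.get_content_type()    # media type without parameters
charset = headers.get_content_charset()    # declared encoding, lower-cased
filename = headers.get_filename()          # file name suggested for saving

print(media_type, charset, filename)
```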
I tried to download Cyrillic script, and also tried Hebrew as a test. I haven't tried to download Japanese or Arabic sentences.
It's possible that your text editor (or other program) tries to guess the encoding and messes up. Set the encoding to UTF-8 manually if the program has such an option.
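As a rough illustration of that wrong guess, here is a Python sketch that writes a UTF-8 file and reads it back twice: once with a Windows code page (cp1252, picked only as an example of a wrong guess) and once with UTF-8 set explicitly. The file name and the sample row are made up.

```python
import os
import tempfile

# Hypothetical sample row; the real export format may differ.
text = "1\tПривет\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export_list.csv")  # hypothetical file name

    # Save the data as UTF-8, the way the downloaded file is encoded.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    # Wrong guess: every UTF-8 byte is treated as one cp1252 character.
    with open(path, encoding="cp1252", errors="replace", newline="") as f:
        garbled = f.read()

    # Explicit UTF-8: the original text comes back intact.
    with open(path, encoding="utf-8", newline="") as f:
        correct = f.read()

print(garbled)
print(correct)
```

The first print shows the kind of "strange symbols" reported at the top of the thread; the second shows the original text.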
Perhaps there should be a UTF-8 byte-order mark in the CSV?