NVG - proposal to use UTF-8 character encoding

Started by Nich, 20:45, 30 August 19

Nich

For what seems like an eternity now, the NVG FTP archive has been using the ISO-8859-1 character encoding for metadata, such as the file_id.diz files that are found in almost every ZIP file on the site, and the 00_table.csv file that combines all the information in the file_id.diz into a single file.

For many years, this was all well and good. ISO-8859-1 can accommodate accented characters in French, Spanish and German, and the vast majority of non-British CPC users are from these countries. However, it's 2019 now, and the world has moved on, and UTF-8 is now the de facto standard for character encoding. I can't properly write the names of authors from the Balkans or Hungary using ISO-8859-1.
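A minimal Python sketch of the limitation described above. The sample characters (ő from Hungarian, č from several South Slavic alphabets) are illustrative picks, not names taken from the archive, and `fits_latin1` is a hypothetical helper.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text can be stored as ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1("é"))     # True: é is in ISO-8859-1
print(fits_latin1("ő"))     # False: ő (U+0151) is not
print(fits_latin1("č"))     # False: č (U+010D) is not
print("ő".encode("utf-8"))  # b'\xc5\x91': UTF-8 stores it as two bytes
```

Any character that fails this check has no single-byte slot in ISO-8859-1, which is why such names cannot be written correctly in the current metadata.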

I don't know how many people still use NVG for downloading CPC software, let alone use the data in it for their own applications. I know CPCGamesCD uses the data, although it hasn't been updated for nearly three years. Nonetheless I want to consult with CPC users before making any changes.

Does anyone have or foresee any problems with re-encoding the metadata to UTF-8?

troels

Hi Nich. Perhaps insert a BOM at the beginning of the file to make it possible to detect the format used.
("The UTF-8 representation of the BOM is the hexadecimal byte sequence 0xEF,0xBB,0xBF.", https://en.wikipedia.org/wiki/Byte_order_mark)
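As a rough illustration of how a reader of the files could use that byte sequence, here is a small Python sketch. The helper name `has_utf8_bom` and the sample bytes are assumptions made for the example; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the Python standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 BOM (0xEF, 0xBB, 0xBF)."""
    return data.startswith(codecs.BOM_UTF8)


# Hypothetical file contents, with and without a BOM.
with_bom = codecs.BOM_UTF8 + "café".encode("utf-8")
without_bom = "café".encode("utf-8")

print(has_utf8_bom(with_bom))     # True
print(has_utf8_bom(without_bom))  # False

# The "utf-8-sig" codec drops a leading BOM if one is present,
# so both inputs decode to the same text.
print(with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig"))  # True
```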

Nich

Quote from: troels on 20:57, 31 August 19
Hi Nich. Perhaps insert a BOM at the beginning of the file to make it possible to detect the format used.
("The UTF-8 representation of the BOM is the hexadecimal byte sequence 0xEF,0xBB,0xBF.", https://en.wikipedia.org/wiki/Byte_order_mark)

The same link suggests that the BOM isn't really necessary for UTF-8, and my intention is that every file_id.diz will be re-encoded in UTF-8.

One of the clever things about UTF-8 is that if a file_id.diz file is still encoded in ISO-8859-1, the bytes for any accented characters in it will usually not form valid UTF-8, so the two encodings can be told apart without a BOM.
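The observation above can be sketched in Python as a "try UTF-8 first, fall back to ISO-8859-1" check. This is only an illustration under the assumption that every file is in one of those two encodings; `decode_metadata` is a hypothetical name, not part of any NVG tooling.

```python
from typing import Tuple


def decode_metadata(data: bytes) -> Tuple[str, str]:
    """Decode data as UTF-8 if it is valid UTF-8, otherwise as ISO-8859-1.

    Returns the decoded text and the name of the encoding that was used.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return data.decode("iso-8859-1"), "iso-8859-1"


print(decode_metadata(b"caf\xe9"))      # ('café', 'iso-8859-1')
print(decode_metadata(b"caf\xc3\xa9"))  # ('café', 'utf-8')

# Why "usually": the ISO-8859-1 text "Ã©" is the bytes C3 A9,
# which also happen to be valid UTF-8 for "é".
print(decode_metadata("Ã©".encode("iso-8859-1")))  # ('é', 'utf-8')
```

The last call shows the rare ambiguous case: some ISO-8859-1 byte pairs are also well-formed UTF-8, so the check is a strong heuristic rather than a guarantee.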

Docent

Quote from: Nich on 23:19, 01 September 19
The same link suggests that the BOM isn't really necessary for UTF-8, and my intention is that every file_id.diz will be re-encoded in UTF-8.

One of the clever things about UTF-8 is that if a file_id.diz file is still encoded in ISO-8859-1, the bytes for any accented characters in it will usually not form valid UTF-8, so the two encodings can be told apart without a BOM.

Add a BOM and re-encode as UTF-8 only those files that contain characters with code values above 127; leave the other files unchanged.
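A minimal Python sketch of this suggestion, assuming each input file is currently ISO-8859-1 and has not been converted before (running it twice on the same file would double-encode it). The function name `reencode_if_needed` is hypothetical.

```python
import codecs


def reencode_if_needed(data: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to BOM-prefixed UTF-8, but only when needed.

    Pure ASCII input (all byte values below 128) is returned unchanged,
    because it is byte-for-byte identical in both encodings.
    """
    if all(byte < 128 for byte in data):
        return data
    text = data.decode("iso-8859-1")
    return codecs.BOM_UTF8 + text.encode("utf-8")


print(reencode_if_needed(b"Plain ASCII title"))  # b'Plain ASCII title'
print(reencode_if_needed(b"caf\xe9"))            # b'\xef\xbb\xbfcaf\xc3\xa9'
```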

genesis8

Sorry for being late to reply.
It isn't a problem for me; I will just have to adjust some code on my site.
____________
Amstrad news site at Genesis8 Amstrad Page

roudoudou

Quote from: Nich on 20:45, 30 August 19
I don't know how many people still use NVG for downloading CPC software, let alone use the data in it for their own applications. I know CPCGamesCD uses the data, although it hasn't been updated for nearly three years. Nonetheless I want to consult with CPC users before making any changes.
Hi Nich
First, thanks for being one of the first (the first?) people to host CPC software!
I was using NVG back in the 90's and that was great!
You're asking if there are people using NVG nowadays.
I'm not anymore, since cpc-power.com is, as far as I know, the ultimate reference, and the cpcrulez forum also contains many new dumps.
So... how can you convince me (us?) to come back to NVG?
Can you tell us more about the files on NVG? Sources, exclusive content...
Regards
My pronouns are RASM and ACE

Nich

Quote from: roudoudou on 11:21, 22 October 19
First, thanks for being one of the first (the first?) people to host CPC software!
I was using NVG back in the 90's and that was great!

I didn't set up the NVG archive. I believe @llopis did that way back in 1994. I took over the administration of the archive in January 2001.

Quote
You're asking if there are people using NVG nowadays.
I'm not anymore, since cpc-power.com is, as far as I know, the ultimate reference, and the cpcrulez forum also contains many new dumps.
So... how can you convince me (us?) to come back to NVG?
Can you tell us more about the files on NVG? Sources, exclusive content...

I readily acknowledge that CPC-POWER and CPCRulez are the 'go to' sites for CPC downloads nowadays and NVG's importance has diminished considerably, and I'm actually OK with that.

There are various projects I would like to do to tidy up and reorganise NVG, but I don't have a lot of time to do that, and I think CPC-POWER and CPCRulez are fairly well organised anyway.

My aim with NVG has always been to focus on games, and instead of providing multiple versions of each game, try to provide the single best version that I can find. Of course, the definition of 'best' is subjective. There isn't really any content that's exclusive to NVG, as anything that is released nowadays will also be available on CPC-POWER and CPCRulez.

I would also like to modernise and improve CPC Game Reviews, but that will take a long time, as my web development skills are rather basic after all these years. :-[

Nich

It's been a long time since I first suggested modifying NVG to use UTF-8. The reason for the delay is that I was waiting for a new release of CPCGamesCD. However, that hasn't happened, and no news is forthcoming at this time, so my plan now is to start using UTF-8 from 1st April 2020.
