Author Topic: NVG - proposal to use UTF-8 character encoding  (Read 989 times)

Offline Nich

  • Supporter
  • 6128 Plus
  • *
  • Posts: 699
  • Country: gb
  • CPC Game Reviews webmaster
    • CPC Game Reviews
  • Liked: 543
  • Likes Given: 679
NVG - proposal to use UTF-8 character encoding
« on: 22:45, 30 August 19 »
For what seems like an eternity now, the NVG FTP archive has been using the ISO-8859-1 character encoding for metadata, such as the file_id.diz files that are found in almost every ZIP file on the site, and the 00_table.csv file that combines all the information in the file_id.diz into a single file.

For many years, this was all well and good. ISO-8859-1 can accommodate accented characters in French, Spanish and German, and the vast majority of non-British CPC users are from these countries. However, it's 2019 now, the world has moved on, and UTF-8 is the de facto standard for character encoding. I can't properly write the names of authors from the Balkans or Hungary using ISO-8859-1, because letters such as č and ő are missing from it.
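To illustrate the limitation described above, here is a minimal Python sketch. The sample names and the helper `fits_latin1` are hypothetical, chosen only for illustration; none of them comes from the NVG metadata.

```python
# Sample names are illustrative assumptions, not actual NVG metadata.
SAMPLES = ["José", "Müller", "Erdős", "Kovačević"]


def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # At least one character (e.g. ő or č) has no ISO-8859-1 code.
        return False


for name in SAMPLES:
    # UTF-8 can encode every sample; ISO-8859-1 only some of them.
    print(name, fits_latin1(name), name.encode("utf-8"))
```

The first two names fit in ISO-8859-1, while the last two can only be stored losslessly in an encoding such as UTF-8.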

I don't know how many people still use NVG for downloading CPC software, let alone use the data in it for their own applications. I know CPCGamesCD uses the data, although it hasn't been updated for nearly three years. Nonetheless I want to consult with CPC users before making any changes.

Does anyone have or foresee any problems with re-encoding the metadata to UTF-8?

Offline troels

  • CPC464
  • **
  • Posts: 6
  • Liked: 0
  • Likes Given: 0
Re: NVG - proposal to use UTF-8 character encoding
« Reply #1 on: 22:57, 31 August 19 »
Hi Nich. Perhaps insert a BOM at the beginning of the file to make it possible to detect the format used.
("The UTF-8 representation of the BOM is the hexadecimal byte sequence 0xEF,0xBB,0xBF.", https://en.wikipedia.org/wiki/Byte_order_mark)
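A rough sketch of how a reader of the metadata could use the BOM to tell the two encodings apart. The function names are hypothetical, and the fallback assumes that any file without a BOM is still in the legacy ISO-8859-1 encoding.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if `data` starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_metadata(data: bytes) -> str:
    """Decode file_id.diz-style bytes, using the BOM as the format marker."""
    if has_utf8_bom(data):
        # Drop the three BOM bytes, then decode the rest as UTF-8.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # Assumption: no BOM means the file has not been re-encoded yet.
    return data.decode("iso-8859-1")
```

For example, `decode_metadata(b"\xef\xbb\xbfcaf\xc3\xa9")` and `decode_metadata(b"caf\xe9")` both yield the same text, "café".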

Offline Nich

Re: NVG - proposal to use UTF-8 character encoding
« Reply #2 on: 01:19, 02 September 19 »
Quote
Hi Nich. Perhaps insert a BOM at the beginning of the file to make it possible to detect the format used.
("The UTF-8 representation of the BOM is the hexadecimal byte sequence 0xEF,0xBB,0xBF.", https://en.wikipedia.org/wiki/Byte_order_mark)

The same link suggests that the BOM isn't really necessary for UTF-8, and my intention is that every file_id.diz will be re-encoded in UTF-8.

One of the clever things about UTF-8 is that if a file_id.diz file is encoded in ISO-8859-1, any accented characters that are present are usually not valid UTF-8.
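The property mentioned above can be sketched in Python as a strict UTF-8 decode with a fallback. The function name `decode_guess` is hypothetical, and this is only one possible heuristic, not a description of how NVG's files are actually processed.

```python
from typing import Tuple


def decode_guess(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding), trying strict UTF-8 before ISO-8859-1."""
    try:
        # A lone ISO-8859-1 accented byte such as 0xE9 ("é") is a UTF-8
        # lead byte that must be followed by continuation bytes
        # (0x80-0xBF), so ordinary Latin-1 text fails this strict decode.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value is defined in ISO-8859-1, so this cannot fail.
        return data.decode("iso-8859-1"), "iso-8859-1"


print(decode_guess(b"caf\xe9 noir"))  # Latin-1 bytes: rejected as UTF-8
print(decode_guess(b"caf\xc3\xa9"))   # UTF-8 bytes: accepted as UTF-8
```

The word "usually" matters: a Latin-1 byte pair such as 0xC3 0xA9 ("Ã©") happens to be valid UTF-8 as well, so a check like this cannot be fully reliable on its own.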

Offline Docent

  • CPC6128
  • ****
  • Posts: 166
  • Country: pl
  • Liked: 104
  • Likes Given: 0
Re: NVG - proposal to use UTF-8 character encoding
« Reply #3 on: 00:17, 03 September 19 »
Quote
The same link suggests that the BOM isn't really necessary for UTF-8, and my intention is that every file_id.diz will be re-encoded in UTF-8.

One of the clever things about UTF-8 is that if a file_id.diz file is encoded in ISO-8859-1, any accented characters that are present are usually not valid UTF-8.

Add a BOM and re-encode as UTF-8 only those files that contain characters with code values above 127; leave the other files unchanged.
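A minimal sketch of that rule in Python. The function name `convert_if_needed` is hypothetical, and the code assumes that any input without a BOM is currently ISO-8859-1.

```python
import codecs


def convert_if_needed(data: bytes) -> bytes:
    """Re-encode to BOM-prefixed UTF-8 only when non-ASCII bytes occur."""
    if data.startswith(codecs.BOM_UTF8):
        # Already converted; avoid double-encoding on a second run.
        return data
    if all(byte < 128 for byte in data):
        # Pure ASCII is byte-identical in both encodings: leave untouched.
        return data
    # Assumption: files without a BOM are in ISO-8859-1.
    text = data.decode("iso-8859-1")
    return codecs.BOM_UTF8 + text.encode("utf-8")
```

With this approach, pure-ASCII files keep their exact bytes, and the BOM marks precisely the files whose bytes changed.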

Offline genesis8

  • CPC6128
  • ****
  • Posts: 187
  • Country: fr
    • Genesis8 Amstrad Page
  • Liked: 51
  • Likes Given: 21
Re: NVG - proposal to use UTF-8 character encoding
« Reply #4 on: 10:21, 22 October 19 »
Sorry for being late to reply.
It isn't a problem for me; I will just have to adjust some code on my site.
____________
Amstrad news site at Genesis8 Amstrad Page

Offline roudoudou

  • 6128 Plus
  • ******
  • Posts: 708
  • Country: fr
    • urban exploration
  • Liked: 984
  • Likes Given: 619
Re: NVG - proposal to use UTF-8 character encoding
« Reply #5 on: 13:21, 22 October 19 »
Quote
I don't know how many people still use NVG for downloading CPC software, let alone use the data in it for their own applications. I know CPCGamesCD uses the data, although it hasn't been updated for nearly three years. Nonetheless I want to consult with CPC users before making any changes.

Hi Nich,
First, thanks for being one of the first (the first?) people to host CPC software!
I was using NVG back in the 90s and that was great!
You're asking if there are people using NVG nowadays.
I'm not anymore, since cpc-power.com is, as far as I know, the ultimate reference, and the cpcrulez forum also contains many new dumps.
So... how can you convince me (us?) to come back to NVG?
Can you tell us more about the files on NVG? Sources, exclusive content...
Regards
use RASM, the best assembler ever made :p

I will survive

Offline Nich

Re: NVG - proposal to use UTF-8 character encoding
« Reply #6 on: 23:17, 23 October 19 »
Quote
First, thanks for being one of the first (the first?) people to host CPC software!
I was using NVG back in the 90s and that was great!

I didn't set up the NVG archive. I believe @llopis did that way back in 1994. I took over the administration of the archive in January 2001.

Quote
You're asking if there are people using NVG nowadays.
I'm not anymore, since cpc-power.com is, as far as I know, the ultimate reference, and the cpcrulez forum also contains many new dumps.
So... how can you convince me (us?) to come back to NVG?
Can you tell us more about the files on NVG? Sources, exclusive content...

I readily acknowledge that CPC-POWER and CPCRulez are the 'go to' sites for CPC downloads nowadays and NVG's importance has diminished considerably, and I'm actually OK with that.

There are various projects I would like to do to tidy up and reorganise NVG, but I don't have a lot of time to do that, and I think CPC-POWER and CPCRulez are fairly well organised anyway.

My aim with NVG has always been to focus on games, and instead of providing multiple versions of each game, try to provide the single best version that I can find. Of course, the definition of 'best' is subjective. There isn't really any content that's exclusive to NVG, as anything that is released nowadays will also be available on CPC-POWER and CPCRulez.

I would also like to overhaul CPC Game Reviews, as it really needs modernising and improving, but that will take a long time as my web development skills are rather basic after all these years. :-[