Topic: "Tokenizer" for A-lot-of-Text-games ..

SRS
"Tokenizer" for A-lot-of-Text-games ..
« on: 00:06, 30 September 16 »
Transferring BASIC adventures from other platforms often leads to the "low memory" problem on the CPC in BASIC.

Sometimes I do a manual "tokenizing": if I see a big chunk of text repeated in several lines, e.g. "You are watching a", I reduce it to a short string ... or "token".

Do we have some software out there that scans, let's say, 60 room descriptions and so on for "common text parts" (longer than three or four chars) and produces a replacement table? Like "you do" becomes "T01" ... so we could compress / recycle parts of text "on the fly", maybe with an RSX and using the second RAM bank for adventures?

Hope you get my point ...

Example :

Code: [Select]
"You are standing at the bar inside the tavern. There ..."

becomes

"Aat the bar inside the tavern. There ..."

With decoding:
Code: [Select]
IF LEFT$(hoel$,1)="A" THEN PRINT"You are standing ";:GOTO 998

« Last Edit: 00:09, 30 September 16 by SRS »

Prodatron
Re: "Tokenizer" for A-lot-of-Text-games ..
« Reply #1 on: 01:14, 30 September 16 »
Better to use something like #A instead of just A, so you can replace phrases anywhere in a sentence. With the help of INSTR and MID$ it's easy to write a string replace function.
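A minimal sketch of such a replace function, written in Python so it can be tried on a PC before porting. The table name TOKENS and the function name expand are made up for illustration; only the "#A" marker and its phrase come from the example in this thread. The loop maps fairly directly onto INSTR, LEFT$ and MID$ in BASIC, and it assumes "#" never occurs in the game text itself.

```python
# Hypothetical token table: "#A" and its phrase follow the example in the thread.
TOKENS = {"#A": "You are standing "}


def expand(text, tokens=TOKENS):
    """Replace every token in text with its phrase, wherever it occurs."""
    for token, phrase in tokens.items():
        # str.find plays the role of INSTR, but is 0-based and returns -1 if absent.
        pos = text.find(token)
        while pos != -1:
            # Same idea as LEFT$(t$, p-1) + phrase$ + MID$(t$, p+LEN(token$)) in BASIC.
            text = text[:pos] + phrase + text[pos + len(token):]
            # Continue searching after the phrase that was just inserted.
            pos = text.find(token, pos + len(phrase))
    return text


print(expand("#Aat the bar inside the tavern. There ..."))
```

Because the marker is searched for anywhere in the string, the same token also works in the middle of a sentence, which a test on the first character alone cannot do.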
Regarding "common text parts": that is nothing other than a dictionary-based compression algorithm.
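For the scanning side, here is a rough sketch of a greedy dictionary builder, meant to run on a PC rather than on the CPC. It is not an existing tool: the function name build_table, its parameters and the "#A", "#B", ... token scheme are assumptions for illustration, and it assumes "#" does not appear in the original text. It repeatedly picks the phrase whose replacement saves the most characters, counting the one copy that has to be kept in the table.

```python
def build_table(texts, min_len=4, max_len=20, max_tokens=26):
    """Greedily choose phrases to replace with two-character tokens #A..#Z."""
    texts = list(texts)
    table = {}
    for i in range(max_tokens):
        token = "#" + chr(ord("A") + i)
        # Collect every substring of a useful length that holds no earlier token.
        candidates = set()
        for t in texts:
            for n in range(min_len, max_len + 1):
                for start in range(len(t) - n + 1):
                    sub = t[start:start + n]
                    if "#" not in sub:
                        candidates.add(sub)
        best, best_saving = None, 0
        for sub in sorted(candidates):  # sorted so ties are resolved the same way each run
            count = sum(t.count(sub) for t in texts)  # non-overlapping occurrences
            # Each use shrinks by len(sub) - len(token); the phrase is stored once in the table.
            saving = count * (len(sub) - len(token)) - len(sub)
            if saving > best_saving:
                best, best_saving = sub, saving
        if best is None:
            break  # nothing left that would save memory
        table[token] = best
        texts = [t.replace(best, token) for t in texts]
    return table, texts


rooms = [
    "You are standing at the bar.",
    "You are standing in the cellar.",
    "You are standing on the roof.",
]
table, packed = build_table(rooms)
for token, phrase in table.items():
    print(token, repr(phrase))
print(packed)
```

The brute-force candidate search is slow for large texts but should be acceptable for a few dozen room descriptions. The limits min_len and max_len are guesses and would need tuning against real game text.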
« Last Edit: 01:16, 30 September 16 by Prodatron »


Docent
Re: "Tokenizer" for A-lot-of-Text-games ..
« Reply #2 on: 16:27, 03 October 16 »
Hey, you've just invented the LZW compression algorithm :)


SRS
Re: "Tokenizer" for A-lot-of-Text-games ..
« Reply #3 on: 23:19, 03 October 16 »
If I had, it would be the SRS algorithm :)

But I have not seen anything like this for single PRINT commands so far. :)