Hello,
I have created a BASIC tokeniser (which also writes the AMSDOS header) for a big BASIC project. I know WinApe can do it by 'typing' pasted text, but it is slow (there is a x10 speed mode, but you need to switch it on and off) and it can't handle several files... With this tool, I can easily add several BAS files to a DSK file.
It works for me and seems to produce the same tokens as my tests with WinApe. The only missing piece: no floating point tokeniser... If you know how to create the 5 bytes from a string, feel free to tell me (I have some documentation and a link to the ROM assembly, but I can't figure out how to do it in Python).
Link to project (https://jolletx.visualstudio.com/CPCBasicator/_git/CPCBasicator)
https://www.cpcwiki.eu/index.php?title=Technical_information_about_Locomotive_BASIC&mobileaction=toggle_view_desktop#Floating_Point_data_definition
Yes, I know this one (I put a link to it in the doc directory of the project).
It is just not so easy to reimplement (even with a readable ASM routine).
https://github.com/Bread80/CPC6128-Firmware-Source look for REAL_5byte_to_real
Also, my project was not using floating point numbers so no real need for them :D
@SagaDS
if floatnumber=0 then
  exponent=0
  mantissa=0
else
  exponent=-127
  norm=abs(floatnumber)/(2^exponent)
  while norm>=1 or norm<0.5
    exponent=exponent+1
    norm=abs(floatnumber)/(2^exponent)
  wend
  if sgn(floatnumber)=1 then norm=norm-0.5
  mantissa=int(norm*(2^32))
  exponent=exponent+128
endif
4 bytes little-endian mantissa and 1 byte exponent
I wrote a small program in Locomotive BASIC to do the conversion. It should not be a problem to convert any part of it to Python or something else.
100 MODE 2 : ZONE 16
110 float!=-123.456
120 PRINT "Number: ",float!
130 PRINT
140 IF float!<>0 THEN GOTO 170
150 PRINT "Special case: Set all bytes to zero."
160 GOTO 350
170 sign=SGN(float!)
180 intDigits=INT(LOG(sign*float!)/LOG(2))
190 exponent=128+intDigits+1
200 mantissa=(sign*float!)/(2^intDigits)
210 PRINT "Sign: ",sign
220 PRINT "Exponent: ",exponent,
230 PRINT "&";RIGHT$("0"+HEX$(exponent),2)
240 mantissa=mantissa-1
250 mantissa=mantissa/2
260 PRINT "Mantissa:"
270 FOR i=1 TO 4
280 mantissa=mantissa*256
290 intPart=INT(mantissa)
300 mantissa=mantissa-intPart
310 IF i=1 AND sign<0 THEN intPart=intPart+128
320 PRINT i;": ",intPart,
330 PRINT "&";RIGHT$("0"+HEX$(intPart),2)
340 NEXT
350 PRINT
360 PRINT "Memory:"
370 FOR i=0 TO 4
380 byte=PEEK(@float!+i)
390 PRINT i;": ",byte,
400 PRINT "&";RIGHT$("0"+HEX$(byte),2)
410 NEXT
420 PRINT
Some explanation:
- Line 110: Modern languages usually have some kind of parsing function to convert a string to a floating point number. Here I use the BASIC interpreter to do this work. Afterwards the floating point number is deconstructed into its parts.
- Line 140: Value 0.0 is a special case, stored as five bytes of &00.
- Line 180: Instead of searching for the first set bit, its position can be calculated with a logarithm.
- Line 190: Because the result of the logarithm is rounded down, the exponent is increased by one.
- Line 200: The mantissa value will always start with 1.xxx (in binary).
- Line 240: Remove the leading 1. The mantissa is now 0.xxx
- Line 250: Reserve one bit for the sign. Shift the mantissa right. It is now 0.0xxx
- Line 280: Shift the mantissa left by 8 bits. Extract these 8 bits.
- Line 310: Only for the first mantissa byte: Add the sign bit.
- Line 370: As can be seen, in memory the values are stored in reverse order: least significant mantissa byte first, exponent last.
Finally, I have to say that I only did a few tests. There could be some corner cases left where unexpected things happen. Also, using a floating point number itself to do the calculations could introduce rounding errors. This effect is minimized on modern systems that use 64-bit floats, which carry more precision than the 32 mantissa bits needed.
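For anyone porting this, here is a minimal Python sketch of the same steps. It is only a transcription of the BASIC program above, not the code used in CPC Basicator, and the function name is made up for illustration. Like the BASIC version it truncates the last mantissa bit instead of rounding, so the lowest byte can differ by one from what the CPC stores.

```python
import math


def cpc_real_bytes_truncating(value: float) -> bytes:
    """Return the 5 bytes in memory order (mantissa low byte first, exponent last)."""
    if value == 0:
        return bytes(5)                                # line 140: all five bytes are zero
    sign = -1 if value < 0 else 1                      # line 170
    magnitude = abs(value)
    int_digits = math.floor(math.log2(magnitude))      # line 180: BASIC INT rounds down
    exponent = 128 + int_digits + 1                    # line 190
    mantissa = magnitude / 2.0 ** int_digits           # line 200: now 1.xxx
    mantissa = (mantissa - 1) / 2                      # lines 240-250: now 0.0xxx
    high_to_low = []
    for i in range(4):                                 # lines 270-340
        mantissa *= 256
        int_part = int(mantissa)
        mantissa -= int_part
        if i == 0 and sign < 0:
            int_part += 128                            # line 310: sign bit
        high_to_low.append(int_part)
    # Line 370: memory holds the mantissa bytes in reverse order, then the exponent.
    return bytes(reversed(high_to_low)) + bytes([exponent])


print(cpc_real_bytes_truncating(10.0).hex())
```

For 10 this gives 00 00 00 20 84, and for -123.456 the exponent byte is &87, as in the BASIC output.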
This works on every CPC and emulator:
- Load the ASCII file (every line begins with a number)
- Save it with SAVE"xzy
Now you have a BASIC program on your disc / cassette
Quote from: GUNHED on 15:11, 20 July 25
This works on every CPC and emulator:
- Load the ASCII file (every line begins with a number)
- Save it with SAVE"xzy
Now you have a BASIC program on your disc / cassette
There are several ways to generate BASIC files.
My goal here was to generate them directly into a DSK produced on the PC.
Quote from: McArti0 on 21:35, 19 July 25
@SagaDS
if floatnumber=0 then
  exponent=0
  mantissa=0
else
  exponent=-127
  norm=abs(floatnumber)/(2^exponent)
  while norm>=1 or norm<0.5
    exponent=exponent+1
    norm=abs(floatnumber)/(2^exponent)
  wend
  if sgn(floatnumber)=1 then norm=norm-0.5
  mantissa=int(norm*(2^32))
  exponent=exponent+128
endif
4 bytes little-endian mantissa and 1 byte exponent
Thanks for the proposed algorithm (@lightforce6128 too).
I will give it a go sometime in the future.
I just hope that Python will not alter the float precision in a way that changes the result...
That is why I was looking for a text parser (hence the ROM information) instead of a conversion from float...
You have to calculate with double-precision numbers (Python's float is already a double).
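If the worry is precision lost in a binary float, one option is to skip floats entirely. Here is a minimal sketch that parses the text with Python's fractions.Fraction and does all arithmetic exactly. The function name and the round-half-up choice are my assumptions. The ROM parser may round differently on some inputs, so the result should be compared against WinApe.

```python
from fractions import Fraction


def cpc_real_from_text(text: str) -> bytes:
    """Encode a decimal string as 5 bytes (memory order) using exact arithmetic."""
    value = Fraction(text)                 # accepts e.g. "10", "-0.5", "1.5e3"
    if value == 0:
        return bytes(5)
    sign_bit = 0x80000000 if value < 0 else 0
    value = abs(value)
    # Find e such that 1/2 <= value / 2**e < 1, starting from a bit-length guess.
    e = value.numerator.bit_length() - value.denominator.bit_length()
    while value >= Fraction(2) ** e:
        e += 1
    while value < Fraction(2) ** (e - 1):
        e -= 1
    # Scale to a 32-bit integer and round half up (assumed rounding rule).
    scaled = value * Fraction(2) ** (32 - e)
    mantissa = (2 * scaled.numerator + scaled.denominator) // (2 * scaled.denominator)
    if mantissa == 1 << 32:                # rounding carried into the next power of two
        mantissa >>= 1
        e += 1
    exponent = e + 128
    if not 1 <= exponent <= 255:
        raise ValueError(f"{text!r} does not fit in the exponent byte")
    # Bit 31 is always 1 after normalising, so it is replaced by the sign bit.
    mantissa = (mantissa & 0x7FFFFFFF) | sign_bit
    return mantissa.to_bytes(4, "little") + bytes([exponent])


print(cpc_real_from_text("10").hex())
```

The same input text always gives the same bytes, whatever the host's float handling.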
I have implemented lightforce6128's algorithm.
I had to change one thing when testing with more values in Python (it worked as-is in BASIC):
intDigits=int(math.log(sign*floatnumber)/math.log(2))
if intDigits<0:
intDigits-=1
exponent=128+intDigits+1
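The extra step is presumably needed because Python's int() truncates toward zero, whereas BASIC's INT rounds down. Here is a minimal sketch of an alternative that avoids the logarithm altogether, using math.frexp from the standard library. The function name is hypothetical. frexp returns m and e with abs(x) = m * 2**e and 0.5 <= m < 1, which is the normalisation the exponent byte needs.

```python
import math


def cpc_exponent_byte(value: float) -> int:
    """Return the biased exponent byte for a value (0 for zero)."""
    if value == 0:
        return 0
    _, e = math.frexp(abs(value))   # abs(value) == m * 2**e with 0.5 <= m < 1
    return 128 + e


for sample in (10.0, 0.25, -123.456):
    print(sample, hex(cpc_exponent_byte(sample)))
```

One reason to prefer this: with int() followed by a decrement, an exact negative power of two such as 0.25 may come out one step too low, because the logarithm quotient is already a whole number there.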
New version v1.0 pushed.
It took me a while to figure it out in TypeScript. I hope it's correct...
https://github.com/benchmarko/CPCBasicTS/blob/8496dd96ecc1a2585626637fc14b6d23e3c0952f/src/CodeGeneratorToken.ts#L375C1-L394C3
private static floatToByteString(number: number) {
    let mantissa = 0,
        exponent = 0,
        sign = 0;

    if (number !== 0) {
        if (number < 0) {
            sign = 0x80000000;
            number = -number;
        }
        exponent = Math.ceil(Math.log(number) / Math.log(2));
        mantissa = Math.round(number / Math.pow(2, exponent - 32)) & ~0x80000000;
        if (mantissa === 0) {
            exponent += 1;
        }
        exponent += 0x80;
    }
    return CodeGeneratorToken.convInt32ToString(sign + mantissa) + CodeGeneratorToken.convUInt8ToString(exponent);
}
And the reverse (bytes to number):
https://github.com/benchmarko/CPCBasicTS/blob/8496dd96ecc1a2585626637fc14b6d23e3c0952f/src/BasicTokenizer.ts#L98C1-L114C3
...
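For Python users, here is a minimal sketch of the reverse direction. It is not the TypeScript code linked above, just a reading of the layout described in this thread (4 bytes of little-endian mantissa with the sign in the top bit, then the exponent byte biased by 128). The function name is made up.

```python
import math


def cpc_real_to_float(data: bytes) -> float:
    """Decode 5 bytes in memory order into a Python float."""
    if len(data) != 5:
        raise ValueError("expected exactly 5 bytes")
    exponent = data[4]
    if exponent == 0:
        return 0.0                                   # zero is the special case
    mantissa = int.from_bytes(data[:4], "little")
    negative = bool(mantissa & 0x80000000)           # top bit carries the sign
    mantissa |= 0x80000000                           # restore the implied leading 1
    # The mantissa is a 32-bit fraction in [0.5, 1), hence the extra -32.
    value = math.ldexp(mantissa, exponent - 128 - 32)
    return -value if negative else value


print(cpc_real_to_float(bytes.fromhex("0000002084")))
```

Decoding the output of an encoder and comparing it with the input is a cheap round-trip test for corner cases.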
Thanks @SagaDS for writing and sharing this! Love the name, by the way! :)
Alternatives to CPC Basicator
As others noticed, one can let Locomotive BASIC read ASCII and produce binary. It is its own reference implementation. Of course, this is only interesting if it is all done without any manual step: have in your toolchain a continuous integration step that drives an open-source emulator to read an ASCII text file into the BASIC interpreter and save a binary file. CPC Basicator does it in one step, very quickly, without launching an emulator.
Even more, do we actually need a binary BASIC program? One can simply save the BASIC program as a text file on the tape/disc and call it a day. That's what I did in color-flood-for-amstrad-cpc (https://github.com/cpcitor/color-flood-for-amstrad-cpc). Job done!
If that was all, then one might question the value of CPC Basicator.
The real benefit CPC Basicator can have
To me, the real benefit of such a program would be to create binary files that the firmware BASIC is incapable of producing!
Consider this:
* a program that, given an ASCII input that the regular BASIC interpreter would accept, always produces exactly the same binary as the regular BASIC interpreter, byte for byte -- that's what CPC Basicator currently aims at
* yet, given some ASCII input that the regular interpreter would reject (or partly ignore, like a line that starts with REM or ' and has no line number), produces a binary with something more interesting, which the regular BASIC interpreter would be incapable of producing
Some real world use for such a program
Some prods have a mixed BASIC/binary loader. The point is to have a file that the firmware recognizes as a BASIC program, so the firmware is not reset when it runs, yet its payload is actually Z80 binary code, all in one file. This is typically done by hiding the compiled Z80 code in comment lines.
To do this, I have seen somewhere an ASM source file that, interspersed with actual Z80 instructions as assembly source, hard-codes some bytes so that the result of running an assembler on that source is a file that the firmware recognizes and can load and run as a BASIC program. Even when commented, such assembly source is at best readable, not practical to write. One might imagine wanting to write long BASIC programs with many short assembly parts.
One could imagine that CPC Basicator is extended to make such a program easier to make.
Let's get wild now
I see two ways:
* modify the output so that CPC Basicator generates not a binary program but assembly source code (with a choice of Z80 syntax), to be later interspersed with actual assembly source code and processed by a regular pre-existing assembler to make the actual BASIC binary file. Not obvious how to put the pieces together. Would allow links between various assembled parts (like, Z80 code hidden in line 100 could reference Z80 code and data hidden in line 120 or even BASIC structures).
* expanding the syntax accepted by CPC Basicator with useful constructs, and have it call an external assembly program to compile each part. Each assembly part would be independent and could not refer to each other.
Examples:
* define a pointer label that is the address of any part
100 print $(PTR: my_address) "Hello"
110 print peek( $(@ my_address) )
* lines given as an inline binary stream:
100 $(BIN: C0 20 43 50 43 20 72 75 6c 65 73 0a)
100 '$(BIN: 20 43 50 43 20 72 75 6c 65 73 0a)
* (let's get wild) insert assembly code anywhere:
100 $(PTR: clrscr) '$(ASM
  ld hl, # 0xC000
  ld (hl), a
  ld de, # 0xC001
  ld bc, # 0x3FFF
  ldir
  ret
)
110 call $(@ clrscr)
One could even imagine referencing, from the Z80 assembly, the addresses of lines or even of individual tokens, to change strings or even adjust code at run time. Crazy? Yes. Useful? I don't know. If someone finds a use then it is useful.
* autonumbering mode, use labels instead of lines in your source, get a regular BASIC program with generated lines. This can be activated locally.
$(set autonumber on)
PRINT "This line will automatically get a number and prints once."
$(LINENO: loop) PRINT "Hello CPC Basicator (in a loop)"
GOTO $(# loop)
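To make the autonumbering idea concrete, here is a minimal Python sketch of such a pass. It only handles the three constructs shown in the example above, the function name and the start/step defaults are invented, and it would also replace a $(# ...) that happens to sit inside a string literal, which a real tool would have to avoid.

```python
import re

LABEL_DEF = re.compile(r"^\$\(LINENO:\s*(\w+)\)\s*")   # $(LINENO: name) at line start
LABEL_REF = re.compile(r"\$\(#\s*(\w+)\)")              # $(# name) anywhere in a line


def autonumber(lines, start=10, step=10):
    """Number the lines and resolve label references in a second pass."""
    labels = {}
    numbered = []
    number = start
    for line in lines:
        line = line.strip()
        if not line or line.startswith("$(set "):       # skip blanks and directives
            continue
        match = LABEL_DEF.match(line)
        if match:
            labels[match.group(1)] = number             # remember where the label lives
            line = line[match.end():]
        numbered.append((number, line))
        number += step
    # Second pass: labels may be used before they are defined.
    return [
        f"{n} " + LABEL_REF.sub(lambda m: str(labels[m.group(1)]), text)
        for n, text in numbered
    ]


source = [
    "$(set autonumber on)",
    'PRINT "This line will automatically get a number and prints once."',
    '$(LINENO: loop) PRINT "Hello CPC Basicator (in a loop)"',
    "GOTO $(# loop)",
]
print("\n".join(autonumber(source)))
```

With the example above, the last line becomes 30 GOTO 20.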
Other ideas:
* given as input a binary file containing a valid BASIC program, turn it into text again ("CPC DeBasicator")
* given as input a binary file containing a hacked BASIC program (using any of the known hacks: out-of-order line numbers, line number zero, comment with binary values), turn it into text again but with some extra information, like hidden lines, decode binary into proper $(BIN ...) instructions, etc
(Did anyone say "transpiler"?)
Notice that the CPC BASIC interpreter accepts (nearly?) anything you throw at it, provided it has a valid line number and is not too long. This means that, technically, the $(...) syntax is not a good choice: while it is not valid run-time syntax, the regular interpreter does generate a valid (as in "you can save and load it") binary BASIC file from it. So this would need a decision: adopt $() on the grounds that no sane BASIC program would use it? Define some different syntax?
Or maybe I just got too crazy and this is of no use. Your turn to imagine things now. 8)
Wouah, that's a lot of ideas :o
Really appreciate it ;) Sorry for not seeing it earlier, I was on vacation.
As I said, my main goal had 2 points:
- Make sure that the BASIC I write on a PC is read directly by the interpreter on the CPC and not parsed from ASCII every time: as my main program was composed of several BASIC files of several kB each (using the MERGE instruction), I wanted to test them as quickly as possible on an emulator (with binary files I was sure it would be quick)
- I also use this to keep all my comments in my source, either marked as 'not translated to CPC' or on lines reusing the same line number, meaning that only useful instructions are sent to the CPC (not comments)
As I said, I decided to create this tool for myself. For a small BASIC program (like a launcher), ASCII is enough.
Now, on your proposals:
On the BASIC/assembly mix in one file, I recognize the C64 way of doing things (ZX80 listings also use comments with Z80 code inside). But I have never seen this on CPC. Not sure it is possible. That would need investigation.
I see it more as an assembler generating the BASIC stub directly at the correct address... I've just watched a C64 video on this kind of assembly: https://www.youtube.com/watch?v=cFWNo1GjMP0 . You assemble it and a 10 SYSxxxx line is generated, so you can run it easily.
But inlining assembly in BASIC seems a bit off topic for this tool, for my taste ;D
About a Debasicator, one already exists; I think a simple search will find what you want. It is easy to code... I think DiskEditor tools can do it. And a deobfuscator for invalid BASIC statements/lines seems like a project of its own.
Autonumbering with labels is a good idea. A bit hard to put in place because of all labels that will be needed inside a BASIC program, but yes that can be helpful.