Rasm Z80 assembler

roudoudou · 13:14, 22 May 23

Quote from: zhulien on 13:11, 22 May 23
SJASM throws up a warning for some things which break the relocation. Perhaps hi and lo could do that, and people who need relocation wouldn't use those. It is 'sort of' a cop-out, but a reasonable one until a way to resolve that can be found.

If there is no 8-bit usage, there won't be any 8-bit output.

Typhon · 21:54, 08 June 23

Might not be much in the grand scheme of things, but rasm helped me do my first assembly language program for the CPC ever. Z80 assembler defeated me when I was a kid, but 30 years later, mwahahahaha! So thank you Edouard! (I'm using VSCode/Rasm/Caprice32 on Ubuntu 22.04 LTS)

roudoudou · 19:00, 19 June 23

new release out! https://github.com/EdouardBERGE/rasm/releases/tag/v1.9

Rasm v1.9 - 2023 Guilty Pleasure

- ludicrous speed enhancement
- new struct export for embedded Rasm
- enforce LD r8,im8 usage (to avoid confusion, where possible, between a memory reference and a calculation using parentheses)
- enforce AMSDOS name proper format
- may export files on DSK with another user
- new REMU chunk in snapshots for debug usage and precise breakpoints
- new RMxx chunks in snapshots for ROM usage
- new directive ROMBANK to manage ROM in snapshot output
- new directive SRL8 and shift aliases RLC/RRC for 16-bit registers
- new conditionals RST #38
- audio filtering for DMA conversion
- various prototypes in order to design relocatable code (still in progress but documented)
- changed DJNZ timing to "loop"
- bugfix some autotests and leaks with embedded Rasm
- up to date documentation
- each OS build of rasm is still tested with more than 1100 automatic tests

builds for windows XP, windows x64
builds for MacOS ARM and MacOS Intel

zhulien · 21:07, 20 June 23

I have a suggested feature I haven't seen a Z80 assembler support. What are your thoughts?

Normally people code their assembly language program with a 64k memory map and they need to deal with complex memory banking if they want large programs and/or other overlay methods.

My suggestion is this. I can elaborate if you need, but it is to introduce a couple of memory models with supporting directives.

Model directive:

Model small ; default behaviour no different to now

Model large ; 4mb memory model

Model gigantic ; 16mb memory model

Instead of programs being assembled purely from org to himem as in small, large and gigantic assemble the code into the banks instead, as follows.

Bank directives:

Set Global 0040, 3fff, 8000, a410
Set Banked 4000, 7fff ;can be hardcoded but if flexible can work on other platforms

Global ; code after this is assembled in the global areas

Banked ; code here is assembled only inside the banks

Set farcall blah ; blah is the farcall function in global ram

Align keyword in front of labels that need alignment.

Banked code is byte aligned to either 64bits for large memory model, or 256 bytes for gigantic model whenever a call or jump or label is referenced that is defined within the banked ram and aligned. This allows a small global farcall subroutine to make fast 16bit far calls.

Large memory model suits very nicely large ram or rom assemblies up to a 4mb plus binary without much thought. And gigantic also but with virtual memory support in the farcall function or a future 16mb ram expansion.

It means I can make a high level language generate rasm source and not be constrained by memory and memory now is cheap.

CPC users have no excuse to not buy a 2mb or 1mb Gemini card.

zhulien · 04:13, 21 June 23

Correction:

Banked code is byte aligned to either 64 bytes for large memory model, or 256 bytes for gigantic model

roudoudou · 06:44, 21 June 23

You should read the documentation, there is already a complete banking system

For example it's possible to assemble in RAM and ROM (lower and upper) when building snapshots, you may use BANK gathered by 64K or use a 64K space for a single 16K bank

Alignment is already possible with ALIGN directive and also CONFINE for small arrays

And if you need moar than what already exists, you can simply use BANK for as many temporary 64K spaces as you want (and use SAVE)

zhulien · 07:52, 21 June 23

Oops

zhulien · 07:53, 21 June 23

I have now read the memory management and alignment documentation. Great to see it there. I was hoping a single assembly could handle all the addressing and substitute the necessary far calls instead of the programmer. Rasm is open source, right? Maybe I can try to figure out how to add the large and gigantic schemes. They are sort of like the ROM schemes but not totally. As mentioned, a single 16-bit aligned address can address either 4mb or 16mb. The reason we don't align every address is that neighbouring functions in the same bank don't need far calls and can freely call each other normally.
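The packed 16-bit far address described in this post could be encoded along the following lines. This is a minimal sketch, not anything rasm provides: the bit layout (bank number in the high part, aligned slot in the low part) and the names `pack_far` and `unpack_far` are assumptions made for illustration; only the 0x4000-0x7FFF window and the 64/256-byte alignments come from the thread.

```python
# Sketch only: the bit layout below is an assumption, not rasm behaviour.
WINDOW_BASE = 0x4000   # start of the banked window (from the thread)
BANK_SIZE = 0x4000     # 16 KB per bank


def pack_far(bank, address, align):
    """Pack (bank, address inside 0x4000-0x7FFF) into one 16-bit value."""
    offset = address - WINDOW_BASE
    if not 0 <= offset < BANK_SIZE:
        raise ValueError("address is outside the banked window")
    if offset % align:
        raise ValueError("far target must be aligned")
    slots = BANK_SIZE // align          # aligned entry points per bank
    packed = bank * slots + offset // align
    if not 0 <= packed <= 0xFFFF:
        raise ValueError("bank does not fit in a 16-bit far address")
    return packed


def unpack_far(packed, align):
    """Recover (bank, address) from a packed 16-bit far address."""
    slots = BANK_SIZE // align
    bank, slot = divmod(packed, slots)
    return bank, WINDOW_BASE + slot * align
```

With 64-byte alignment the highest value, 0xFFFF, decodes to bank 255 at 0x7FC0, which gives 256 banks of 16 KB, i.e. 4 MB. With 256-byte alignment the same value decodes to bank 1023, i.e. 16 MB.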

roudoudou · 08:26, 21 June 23

Quote from: zhulien on 07:53, 21 June 23
I have now read the memory management and alignment documentation. Great to see it there. I was hoping a single assembly could handle all the addressing and substitute the necessary far calls instead of the programmer. Rasm is open source, right? Maybe I can try to figure out how to add the large and gigantic schemes. They are sort of like the ROM schemes but not totally. As mentioned, a single 16-bit aligned address can address either 4mb or 16mb. The reason we don't align every address is that neighbouring functions in the same bank don't need far calls and can freely call each other normally.

You said you "can make a high level language generate rasm source and not be constrained by memory and memory now is cheap." so it won't be up to the programmer

(only your generator)

From my point of view you can already use rasm with 16M of BANK if you want; it is up to your program to track what you do (I did a working 8M ROM proof of concept myself years ago)

So, go for it!

zhulien · 09:15, 21 June 23

There is a difference in me working out the memory usage and deciding what goes into a bank and actually generating code than me generating assembler for rasm. The latter is easier at my end. If I generate code then it becomes a full compiler.

zhulien · 09:17, 21 June 23

To use the 16kb blocks, rasm currently won't track what is a local or far call will it?

roudoudou · 09:38, 21 June 23

you can organize yourself the way you want with many kinds of labels => http://rasm.wikidot.com/label:labels

zhulien · 09:50, 21 June 23

As the coder, how will I know before assembly what is going to fit in a bank before overflowing to the next bank, and moving a whole function to the next bank and then changing any labels that were local into far calls? Only the assembler knows that, unless I want things to be in specific banks.

roudoudou · 10:01, 21 June 23

Quote from: zhulien on 09:50, 21 June 23
As the coder, how will I know before assembly what is going to fit in a bank before overflowing to the next bank, and moving a whole function to the next bank and then changing any labels that were local into far calls? Only the assembler knows that, unless I want things to be in specific banks.

Your generator will have to keep count of what it's doing: basics!

Whenever it outputs an instruction, it already knows where it is. You can also buffer any of your functions to defer output. It's a generator problem, not an assembler problem (all of my generators do that)

andycadley · 11:06, 21 June 23

The problem with trying to hide how banking works is that banking takes instructions and thus affects the state of the processor. So silently replacing a CALL with a bunch of instructions to manipulate memory, then do a CALL, then revert memory might subtly introduce bugs if, for example, the called function relied on the state of the carry flag. So then you have to start making the assembler defend against this by PUSHing all the registers etc, impacting performance unnecessarily and possibly causing other issues with stack usage.

At some point you're either writing assembler, with all the low level management requirements it brings, or you want a higher level language like C, which can abstract away some of the complexity for you.

zhulien · 16:20, 21 June 23

I would do it this way: pass 1 almost as usual, but instead of counting just one set of label addresses etc., if anything is found to be within the banking RAM we introduce a pass 1b. It is almost the same as pass 1, but now works out which of those labels would be far addresses and which calls would be changed to far calls. Pass 2 runs as normal but honours what was identified as far calls.

Far calls I would substitute with either a defined block of code like a macro, or something hardcoded: push hl; ld hl, nnnn; push hl; ret. Yes, it is more than 3 bytes, hence why we had a pass 1b.

But, nnnn is the address of the bank routine that would itself rotate the stack then push current bank, push current return address, set new bank and return to the call address - of course it returns to the proper location upon the far call returning.
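The behaviour of the bank routine described above can be modelled at a high level. The following is a toy simulation under stated assumptions, not Z80 code and not a rasm feature: the `Machine` class, its methods and the packed address layout (bank in the high part, 64-byte slot in the low part) are all hypothetical.

```python
# Toy model of a far-call dispatcher; all names here are hypothetical.
BANK_SIZE = 0x4000
WINDOW_BASE = 0x4000
ALIGN = 64                      # "large" model alignment from the thread
SLOTS = BANK_SIZE // ALIGN      # aligned entry points per bank


class Machine:
    def __init__(self):
        self.bank = 0           # bank currently mapped at 0x4000-0x7FFF
        self.stack = []         # stands in for the Z80 stack
        self.routines = {}      # (bank, address) -> Python callable

    def define(self, bank, address, fn):
        """Register a routine and return its packed 16-bit far address."""
        if (address - WINDOW_BASE) % ALIGN:
            raise ValueError("far routines must be aligned")
        self.routines[(bank, address)] = fn
        return bank * SLOTS + (address - WINDOW_BASE) // ALIGN

    def far_call(self, packed):
        bank, slot = divmod(packed, SLOTS)
        address = WINDOW_BASE + slot * ALIGN
        self.stack.append(self.bank)        # remember the caller's bank
        self.bank = bank                    # map the callee's bank
        try:
            return self.routines[(bank, address)](self)
        finally:
            self.bank = self.stack.pop()    # restore it on return


m = Machine()
leaf = m.define(7, 0x4100, lambda mc: mc.bank)
outer = m.define(2, 0x4000, lambda mc: (mc.bank, mc.far_call(leaf)))
result = m.far_call(outer)      # nested far call: bank 2, then bank 7
```

The model deliberately ignores registers and flags. On a real Z80 the dispatcher itself executes instructions, so preserving HL, the flags and the stack layout is where the actual routine would need care.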

Why do this? Because it is still low level, but it gives an "almost" flat 4mb or 16mb memory model for programmers, so they can just code their software and load assets without too many banking headaches. If you think a farcall is expensive, then put functions that are often called in loops next to each other, so that farcalls are rarely made inside loops.

Anyway, to me it is a good novel idea

andycadley · 17:18, 21 June 23

Problems I can see:

- You increase code size in Pass 1B, so all the addresses you calculate (including what is in banked addresses) are wrong. Really you need a linker to handle this better.
- It's not sufficient to only manipulate CALLs like this, since a banked routine may no longer be in the same address space as the data it needs to access, so you have to come up with convoluted modifications for absolutely everything: for example an LDIR might cross multiple banks because you've hidden the details of it, so you have to replace it with code that can detect it needs to bank switch. And pretty soon almost every assembly instruction has to get swapped out with a CALL to a routine that does that instruction but for a flat memory model.

It would be much, much easier to manage this in a higher level language, which can more easily abstract away the intent of code and generate more optimal assembly for it.

roudoudou · 17:18, 21 June 23

or your generator makes the code of all your functions. It already knows the boundaries of all functions, can sort them by call hierarchy and optimise the very last calls by placing them in the same bank (it also already knows the minimum and maximum size of each function)
once the first optimisation step is done, it can also gather functions into fewer banks using a simple cargo loading algorithm
finally any assembler will assemble the source
done
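One simple cargo loading heuristic a generator could use for the gathering step is first-fit decreasing bin packing. The post does not say which algorithm is meant, so this choice, the function name `pack_functions` and the example sizes are assumptions. The sketch also ignores the call hierarchy step and packs purely by size.

```python
BANK_SIZE = 0x4000


def pack_functions(sizes, bank_size=BANK_SIZE):
    """sizes: {function name: size in bytes}.

    Returns a list of banks, each a list of names (first-fit decreasing).
    """
    banks, free = [], []
    # Largest first; ties broken by name so the result is deterministic.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > bank_size:
            raise ValueError(f"{name} is larger than a bank")
        for i, room in enumerate(free):
            if size <= room:                # first bank with enough room
                banks[i].append(name)
                free[i] -= size
                break
        else:                               # no bank had room: open a new one
            banks.append([name])
            free.append(bank_size - size)
    return banks


example = pack_functions(
    {"draw": 9000, "music": 7000, "menu": 6000, "io": 5000, "init": 3000})
```

Here the five functions total 30000 bytes and end up in two 16 KB banks: `draw` with `music`, and `menu` with `io` and `init`.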

zhulien · 20:12, 21 June 23

Have you never wanted to assemble something larger than 40K without mucking around with banking?

I have looked at the supersnapshot functionality and the C code already knows what bank the labels are in as well as the orgzones, so *there is 'almost' logic in place to allow for programs in RAM that can be up to 4mb or 16mb*. To me that is a good step towards allowing better software to be more easily coded on CPC.

I do understand that a full high level language can do all that, if it compiles to native code. But besides macros and a LOT of planning about memory locations, having the assembler do that allows someone to code low level with little mucking around.

A supposed memory map would be such as:

(global variables and buffers)

0040 - 3fff
8000 - a410 (perhaps slightly lower) and also contains stack and a few banking service routines configured by assembler macros/directives

(medium, large and gigantic almost-automatic banked memory)

4000 - 7fff from bank 0 to bank 256 (in medium 512KB mode) - far address support with byte alignment at 4 for FAR REFERENCED or ALIGNED labels only
4000 - 7fff from bank 0 to bank 256 (in large 4MB mode) - far address support with byte alignment at 64 for FAR REFERENCED or ALIGNED labels only
4000 - 7fff from bank 0 to bank 256 (in gigantic 16M mode) - far address support with byte alignment at 256 for FAR REFERENCED or ALIGNED labels only

The whole point is that a user doesn't need to work out which labels are far, and can deal with the far addresses as 16 bits instead of e.g. 24 or 32 bits. And we would have 'almost' a flat 4mb or 16mb memory model for the most part. As for data and such, you can of course code your blit routines to be in global RAM and use 16-bit far addresses, since they would be labels that are far or aligned.
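How far a single 16-bit value reaches at each alignment can be checked with a little arithmetic. This sketch assumes the far address is exactly 16 bits and that every bank is a full 16 KB window. The helper name `reach` is made up.

```python
ADDRESS_VALUES = 1 << 16    # distinct 16-bit far addresses
BANK_SIZE = 0x4000          # 16 KB window at 0x4000-0x7FFF


def reach(align):
    """Bytes and 16 KB banks reachable with one 16-bit aligned far address."""
    total = ADDRESS_VALUES * align
    return total, total // BANK_SIZE


for align in (4, 64, 256):
    total, banks = reach(align)
    print(f"align {align:3}: {total // 1024:6} KB in {banks} banks")
```

Under these assumptions 64-byte alignment covers 4 MB in 256 banks and 256-byte alignment covers 16 MB in 1024 banks, matching the large and gigantic models. Alignment at 4 bytes covers 256 KB in 16 banks, so a 512 KB medium model would need 8-byte alignment or a slightly wider address.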

andycadley · 20:45, 21 June 23

Having only 16K of usable data would make having 16MB of RAM pretty pointless though. And the minute you need your data to be pageable the whole scheme falls down, something that just doesn't need to happen with a high level language doing this for you.

zhulien · 21:57, 21 June 23

it's not 16k of usable data, it's 16k of "global" usable data / buffer space. Anyway, it was a thought because RASM already supports quite a lot of what would have needed to be done anyway out of the box, just not all the way there.

zhulien · 07:16, 22 June 23

I modified a small C compiler so that the following memory layout occurs, and I am assembling with rasm.

0040 to 3fff for heap
4000 and up for logic followed by literals and workspace

The literals and workspace I could fit at 8000 and below himem if the logic wasn't so big.

So... I will try to make it a ROM to see what happens. But if the banking as mentioned was already in place then it would just work...

The C compiler can compile itself, which is why I chose this one.

If I choose the cpr models how do the calls between 16k roms work?

andycadley · 08:51, 22 June 23

Quote from: zhulien on 21:57, 21 June 23
it's not 16k of usable data, it's 16k of "global" usable data / buffer space. Anyway, it was a thought because RASM already supports quite a lot of what would have needed to be done anyway out of the box, just not all the way there.

Yes, but that's effectively got to be your entire "heap". Any data that any routine accesses has got to be accessible by a 16-bit pointer, because the Z80 doesn't really have the flat memory model your code wants to pretend it does. So anything you want to pass between functions either needs to live in a register or somewhere in the main 64K of address space.

I think it's going to be a very weird program that needs nearly 16MB of code and just 16K of heap space. Most 128K games, for example, use the extra memory almost exclusively for data and not code.

andycadley · 08:54, 22 June 23

Quote from: zhulien on 07:16, 22 June 23
If I choose the cpr models how do the calls between 16k roms work?

There is no such thing as a call between ROMs. The code has to arrange the memory map into a suitable fashion and then do an ordinary CALL and then arrange it back again afterwards as required.

zhulien · 10:30, 22 June 23

Actually I have a heap across banks too, but it is not yet used by the C compiler, only by assembler code. I plan to make it accessible to the C compiler though. Also virtual memory, so that 128k users can have 4mb of VM.

There are multiple ways to go from one ROM to another, the same as with RAM. You can in fact just flow from one to another instantly, making a call a moot point. This method is useful for iterating through multi-bank lists, or if a function is split between 2 banks instead of wasting the space at the end of a bank. As PC always increments rather than decrements when executing, it is only useful with forward iterating; it can be heap or context switches...

With selectively aligned assets as well as functions you can then perform a far call via ram which translates the packed 16bit far address into the correct bank and aligned offset. This makes programming with far calls easier than having to deal with 24bit addresses.
