128k memory handling in Assembly

Started by keith56, 05:22, 26 November 16

keith56

I'm trying to add CPC+ sprites to my assembly game, and I'm struggling to work out the best way to handle the 128k banks.

My game is set up like this:
Bank 0 (0000) Primary level graphics and Level code
Bank 1 (4000) Screen Buffer 1
Bank 2 (8000) Core game code
Bank 3 (C000) Screen Buffer 2
Bank 4             Music and 128k sprite data
Bank 5             Graphics for loading and continue screens
Bank 6             Level Loader
Bank 7             Extra Level Graphics and Font

I use bank switch modes C1 and C3, so that Bank 7 is always at &C000 and my screen buffers are at &4000, which all works great.
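
For reference, the two configurations described above can each be selected with a single OUT. This is only a minimal sketch: the label names are made up for illustration, and it assumes the standard 6128 Gate Array RAM configuration port at &7Fxx.

```asm
; Sketch only: MapBuffer1/MapBuffer2 are illustrative names, not from the game.
; Banks visible at &0000 / &4000 / &8000 / &C000 in each configuration:
;   C1 = 0,1,2,7  -> bank 1 (screen buffer 1) at &4000, bank 7 at &C000
;   C3 = 0,3,2,7  -> bank 3 (screen buffer 2) at &4000, bank 7 at &C000
MapBuffer1:
    ld bc,&7FC1        ; B = &7F selects the Gate Array, C = &C1 is the RAM config
    out (c),c
    ret
MapBuffer2:
    ld bc,&7FC3        ; same port, RAM config C3
    out (c),c
    ret
```

Either way the drawing code always writes to &4000-&7FFF, and banks 0, 2 and 7 stay where they are.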

The trouble is I now need to shift sprite data into the &4000 area when the Plus ASIC is enabled, but C4-C7 all use the same area!
I am working around this by copying sprites to a temp area, but I don't have the spare CPU overhead for such luxuries!

Can anyone suggest a better idea? either for this game, or so I do it right for my next project

One idea I had was to put a Memcopy routine at, say, &9000 in bank 2, and at the same place in bank 6. I assume I could then switch to mode C2 and code execution would continue in the bank 6 copy? I have not tried this, but from my understanding of the CPC hardware it should work.

Did I go wrong from the start? Should I have put my screen buffer at &0040 instead? Would that have made things better, or does it cause more problems in the long run?


Oh, and I have an X-Mem on order, and I plan to use that in future games too, so please advise how 128k programming scales up to 256k+

Any advice would be very welcome.
Chibi Akumas: Comedy-Horror 8-bit Bullet Hell shooter!
Learn ARM, 8086, Z80, 6502 or 68000 with my tutorials: www.assemblytutorial.com
My Assembly programming book is available now on amazon!

Axelay


In terms of the best arrangement, I'd say a lot depends on the quantities of background graphics and sprites, and on whether you are using interrupts and/or hardware scrolling of screens.


Since you mention using C4-C7 I assume you are only using C1/C3 while updating the background graphics so all the background graphics are visible and/or the screen is at &4000 in either case for simplifying the background update?  Could all the background graphics fit in one bank if not sharing space with level code?


How many software sprites are you using in addition to the Plus sprites?  I mean, if you are only or mostly using Plus sprites, do you need the screen buffered?  Or is the game intended to be compatible with both Plus & CPC?


Have you or can you put any amount of core code or tables not involved directly in writing to the screen in an extended memory bank and move sprite data to main memory?  Or move the extra background graphics to main memory instead and move the sprite frames to bank 7?

arnoldemu

@keith56: If the game is 128KB only then I would exchange the roles of code and level graphics in the main ram. Use that for plus sprites, and put the level graphics in a bank.

If it's meant to be 64KB compatible but you get the best game on 128KB then it's going to be tough and you'll have to copy into main RAM and then into Plus sprite RAM (I hope you're not doing much animation).


Problem solved.

OR consider making a cart game (easy if you have a C4CPC). In this case all code, sprites and graphics can be in cart (put some variables in RAM, and have your screens here). Use the lower ROM cart paging and upper ROM cart paging to your advantage! Your screens can be at &8000-&bfff and &c000-&ffff and then you have full freedom to page in ASIC registers at &4000-&7fff. This may be better for a future game. The cart page at &4000-&7fff was probably decided to be there because it's easier for cart games. The "free RAM" could be used for decompression of data if cart space is low (but then you've got up to 512KB you can fill).

My games. My Games
My website with coding examples: Unofficial Amstrad WWW Resource

andycadley

As I've said before, the Plus hardware is awkward to use from 128K code because the ASIC pages in at entirely the wrong place to be useful (it makes a lot more sense when running from cartridge). That said, your plan sounds like it should work (assuming you don't have any variables/stack usage in the banks you switch out). Your overall layout seems pretty sound though, I certainly always favoured the C1/C2 combination, though it makes page 3 awkward to use when you're assuming page 7 will remain constant. I'm not convinced there is a "better" arrangement, short of running from cart and taking advantage of the better paging and bigger effective address space (since you can write-through ROM space)

keith56

Thanks for all the replies!
To answer the questions below:

Yes, the reason for the weird memory allocations is that I want the game to work on 64k, 128k and 128k Plus machines. Hence "nice to have" data is in the banks, and half the sprites are in a separate place (they are extra animation frames which won't be shown on 64k).

The game doesn't use hardware scrolling, and as it's a scrolling shooter the double buffering is a must. Also, I am using my own interrupt sequence. The Plus sprites are being used to provide better quality character sprites as a "bonus" on the Plus machine.

I have just got the C4CPC, and would like to try to make the game cart compatible, but it's not possible now. The game has 9 stages that all have their own program code, and they would all need rewriting and testing for a change as dramatic as that. Once the game is finished and debugged I may look at it, but right now it's too much of a change when the game is almost done.

I think I've figured out a solution. I've put a small "sprite updater" in the spare space of Bank 7. I can then switch to memory mode C2. As I'm usually using C1 or C3, Bank 7 stays at &C000, so the currently running code has not moved, and I can now move the sprite data straight from Bank 4 to the ASIC, as nothing is overlapping. I had to avoid using the stack until I switched back to C1/C3, but I managed that OK.

Don't know why I didn't think of that this morning. Working out this memory management feels like thinking around some weird Rubik's cube.

Well, the plus sprite is animating at least! I still need to hook it into the player input and the like, but I think I can manage it.

keith56


Here is the code I ended up using. I loaded this into bank 7 and switch to C2 when doing the copy. This means the code does not move, and even with the Plus ASIC enabled I can copy directly from bank 4, 6 or 7 (bank 5 is hidden by the Plus ASIC).

Interrupts must be disabled before calling this, firstly because C2 switches the stack out of memory, and secondly because I alter the stack pointer to speed up sprite copying. The stack pointer is restored when the routine exits so normal service can resume, but you will need to re-enable interrupts yourself after it returns if you need them.

I will be releasing all the source code to the game once it's finished, but I thought I should upload this now since I asked the question!

Plus_CopySprite:
; b  = sprite number
; a  = source RAM config - I use C2 in my code
; hl = source memory location
; Interrupts must be disabled on entry
    push af
    ld a,b
    ld e,0
    add a,&40
    ld d,a                  ; de = &4000 + sprite number * &100 (destination in ASIC RAM)

    ld a,(BankSwitch_128k_Current_Plus1-1) ; this stores the memory bank used by the main game loop
    ld i,a                  ; keep it in I so it can be restored on exit
    pop af
    ld bc,&7F00             ; Gate Array port
    or %11000000            ; switch to RAM config
    out (c),a               ; send it

    ld bc,&7FB8             ; turn Plus ASIC on (paged in at &4000)
    out (c),c
    ;ld bc,&0100
    ld b,&20                ; 32 loops of 8 bytes = 256 bytes

    ld (Plus_CopySprite_StackRestore_Plus2-2),sp ; patch the ld sp below
    ld sp,hl                ; sp = source
    ex de,hl                ; hl = destination
Plus_CopySprite_Loop:
    pop de
    ld (hl),e
    inc hl
    ld (hl),d
    inc hl
    pop de
    ld (hl),e
    inc hl
    ld (hl),d
    inc hl
    pop de
    ld (hl),e
    inc hl
    ld (hl),d
    inc hl
    pop de
    ld (hl),e
    inc hl
    ld (hl),d
    inc hl
    dec b
    jp nz,Plus_CopySprite_Loop
    ld sp,&0000:Plus_CopySprite_StackRestore_Plus2 ; operand is overwritten with the saved sp

    ;ldir

    ld bc,&7FA0             ; turn ASIC off
    out (c),c

    ld a,i                  ; RAM config used by the main game loop
    ld bc,&7F00             ; Gate Array port
    or %11000000            ; switch memory
    out (c),a               ; send it

    ret
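
A call to the routine above might look something like the sketch below. The sprite number and the PlayerFrame1 address are hypothetical values chosen for illustration, and it assumes the ASIC has already been unlocked and that the frame is stored as 256 bytes in bank 4.

```asm
; Sketch of a caller. PlayerFrame1 is a made-up address, not from the game.
PlayerFrame1 equ &1000       ; hypothetical: 256 bytes of sprite data inside bank 4

    di                       ; the stack is switched out while C2 is active
    ld b,0                   ; Plus sprite 0 -> destination &4000 in ASIC RAM
    ld a,2                   ; RAM config C2 (the routine ORs in %11000000 itself)
    ld hl,PlayerFrame1       ; address as seen under C2, where bank 4 is at &0000-&3FFF
    call Plus_CopySprite
    ei                       ; only if interrupts were enabled before the call
```

Note that the source address is the one seen under C2, so data in bank 6 would be addressed at &8000-&BFFF and data in bank 7 at &C000-&FFFF.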

Ast

Just use inc l instead of inc hl if you only copy one sprite into ASIC sprite RAM.

If you need to transfer more sprites, just use inc l:inc hl instead of inc hl:inc hl.

To gain more speed, just remove your loop (unroll it)...


_____________________

Ast/iMP4CT. "By the power of Grayskull, i've the power"

http://amstradplus.forumforever.com/index.php
http://impdos.wikidot.com/
http://impdraw.wikidot.com/

All friends are welcome !

keith56

Thanks for spotting that! I copied the code from my general sprite routine, and forgot that I could rely on the data being aligned. No point wasting good CPU cycles for nothing!

roudoudou

Quote from: Ast on 14:01, 27 November 16
Just do
Inc l instead of inc hl if you only copy one sprite into asic sprite ram


If you need to transfert more sprite, just do : inc l:inc hl instead of inc hl:inc hl


To win more speed, just delete your loop...


No, just do 'inc l' all the time and 'inc h' at the end of each sprite  ;D
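
Put together, the inner loop with that change might look like the sketch below. It relies on each Plus sprite occupying exactly one 256-byte page (&4000, &4100, ...), so L wraps back to zero after the last byte. The label name is illustrative, and the register setup is assumed to be the same as in the routine posted earlier.

```asm
; Sketch: assumes hl = start of an ASIC sprite (l = 0), sp = source data,
; b = &20 and interrupts disabled, as in Plus_CopySprite.
Plus_CopySprite_FastLoop:
    pop de
    ld (hl),e
    inc l               ; 1 us on the CPC, against 2 us for inc hl
    ld (hl),d
    inc l
    pop de
    ld (hl),e
    inc l
    ld (hl),d
    inc l
    pop de
    ld (hl),e
    inc l
    ld (hl),d
    inc l
    pop de
    ld (hl),e
    inc l
    ld (hl),d
    inc l
    dec b
    jp nz,Plus_CopySprite_FastLoop
    inc h               ; l is back at 0, so h+1 is the start of the next sprite
```

Going by the usual CPC timing tables, each pass of the inc hl version costs 48 us for 8 bytes and this version 40 us, so the saving is about 1 us per byte.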
use RASM, the best assembler ever made :p

I will survive

Ast

In all cases, push is the fastest!

keith56

Quote from: Ast on 00:21, 28 November 16
In all cases, push is the fastest!

You mean I should be using PUSH DE rather than POP DE? - or just that "stack misuse" is faster than LDIR?


Axelay

Quote from: keith56 on 05:09, 28 November 16
You mean I should be using PUSH DE rather than POP DE? - or just that "stack misuse" is faster than LDIR?


I would think it would be that stack use is faster, although, POP is faster than PUSH so you should be sticking with POP.  I'm pretty sure if you used PUSH, it would be no faster than using an LDI list (which would be faster than LDIR).
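
To put rough numbers on that, here is a sketch comparing the cost of moving two bytes each way. The timings are CPC microseconds (1 NOP = 1 us) taken from the usual CPC instruction timing tables. The fragments and their labels are illustrative only, and each one assumes its pointers are already set up.

```asm
; Cost of copying two bytes, in CPC microseconds.
ViaPop:                 ; sp = source, hl = destination (page aligned)
    pop de              ; 3
    ld (hl),e           ; 2
    inc l               ; 1
    ld (hl),d           ; 2
    inc l               ; 1   total 9 us

ViaLdi:                 ; hl = source, de = destination
    ldi                 ; 5
    ldi                 ; 5   total 10 us (ldir would be 6 us per byte)

ViaPush:                ; hl = source (page aligned), sp = end of destination
    ld e,(hl)           ; 2
    inc l               ; 1
    ld d,(hl)           ; 2
    inc l               ; 1
    push de             ; 4   total 10 us, and the destination fills downwards
```

On this count a generic copy through PUSH only matches the LDI list, because the register pair has to be loaded from memory first. PUSH only pulls ahead when the values are loaded as immediates or reused.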


I recall there was some discussion about compressed Plus sprites here a while back, perhaps there might be something in it that will interest you?

keith56

That's an interesting article, but in this case I only need raw speed, not compression. The game only has 2 frames of "Flying" animation for the player character (and I have enough memory for that!)

I'm assuming writing the compressed sprite is not faster than the uncompressed one? (only smaller)

Axelay

Quote from: keith56 on 09:17, 28 November 16
Thats an interesting article, but in this case, I only need raw speed, not compression, the game only has 2 frames of "Flying" animation for the player character (and I have enough memory for that!)

I'm assuming writing the compressed sprite is not faster than the uncompressed one? (only smaller)


Yes, the compressed sprites are about saving memory.  From memory the routines discussed were a bit slower than a straight copy, but not much slower in some cases.

roudoudou

Quote from: keith56 on 05:09, 28 November 16
You mean I should be using PUSH DE rather than POP DE? - or just that "stack misuse" is faster than LDIR?


He means compiled sprites:

LD HL,#1234
PUSH HL
; 7 NOPs for 2 bytes. It's the fastest generic way, but you need 512 bytes of code for one sprite

But if you gather some statistics, you can use one 16-bit register for volatile values and the other registers for recurrent values. Then you gain space and speed again, as you do the LD only once for each sprite.
You can also use AF to clear bytes (initialised with xor a ; ccf)

Another optimisation is to analyse transitions if you use a sequence of sprites. Then you push only the differences.
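
As a rough illustration of the idea, a compiled sprite might start like the sketch below. The label names and pixel values are made up, and it assumes the ASIC is paged in at &4000, interrupts are disabled, and sprite 0 is the target.

```asm
; Sketch of a compiled sprite. Names and pixel values are hypothetical.
DrawSprite0_Compiled:
    ld (DrawSprite0_RestoreSP+1),sp ; save sp into the operand of the ld sp below
    ld sp,&4100          ; one past the end of sprite 0, since PUSH fills downwards
    ld de,&0000          ; recurrent value (two transparent pixels), loaded once
    ld hl,&0102          ; volatile value: h goes to byte 255, l to byte 254
    push hl              ; 3 + 4 = 7 us for these two bytes
    push de              ; 4 us for two more bytes, no reload needed
    push de
    ld hl,&0304
    push hl
    ; ... and so on until all 256 bytes have been written
DrawSprite0_RestoreSP:
    ld sp,&0000          ; operand patched by the first instruction
    ret
```

Because PUSH writes from the top of the sprite downwards, the data is emitted in reverse order, last row first.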







keith56

OK, I will stick with what I am using for now, as only 4 of the 12 Plus sprites I use are updated per frame, and I only really need the Plus version to be as fast as the normal 6128 version.

I will have a look again later, and if I need to squeeze more out I will write a Sprite compiler to update only the bytes that have changed.

Thanks for all the advice!
Keith

Ast

Quote from: keith56 on 23:19, 28 November 16
Ok, I will stick with what I am using for now, as there are only 4 of the 12 plus sprites I use are being updated per frame, and I only really need the plus version to be as fast as the normal 6128 version

I will have a look again later, and if I need to squeeze more out I will write a Sprite compiler to update only the bytes that have changed.

Thanks for all the advice!
Keith

Note that there are 16 hardware sprites on the Amstrad Plus, not 12...

keith56

Noted.
I'm holding 4 sprites back, as I would like to add a 2 player mode later!
