Title: Optmising compressed Plus sprites
Post by: redbox on 16:21, 02 October 10

I have written a routine that copies a Plus hardware sprite stored in a compressed format (i.e. instead of &01,&02,&03,&04 etc it's stored as &12, &34) to the ASIC. Can anyone make it any faster...?

Code Select


        ld hl,&2000      ;stored sprite data location
        ld de,&4000      ;address of ASIC sprite location 

copy_spr_asic:    
        ld b,128         ;number of compressed pieces of data for sprite

spr_asic_loop:    
        ld a,(hl)        ;get byte of compressed data
        sra a            ;shift the bits right
        sra a            ;4 times so that, for example,
        sra a            ;&17 becomes &01
        sra a

        ld (de),a        ;copy it to the ASIC in uncompressed form
        inc de           ;increase ASIC location to next byte

        ld a,(hl)        ;get same byte of compressed data
        and %00001111    ;and delete bits 7 to 4, e.g. turns &17 into &07

        ld (de),a        ;copy it to the ASIC in uncompressed form
        inc de           ;increase ASIC location to next byte

        inc hl           ;move onto next byte of compressed data

        djnz spr_asic_loop

        ret

And while we're on the subject of optimisation, does anyone know how to convert T-states into microseconds for the CPC?

Title: Re: Optmising compressed Plus sprites
Post by: fano on 16:34, 02 October 10

Due to architecture constraints, all instruction timings are multiples of 4 T-states on the CPC. 4 T-states = 1 µs = 1 NOP = 1 CRTC character width = 1 Mode 1 character width.

Here is a chart of the most used instructions. WinAPE has a NOP counter for checking instruction timings too (there is one at Quasar as well, but it is in French).

http://www.grimware.org/doku.php/documentations/devices/z80
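As a rough illustration of the conversion, here is a minimal Python sketch (Python is used purely as a calculator; `CPC_NOPS`, `ORIGINAL_LOOP` and the function names are made up for this example). It assumes the usual CPC timings in NOPs for the instructions of the first routine and adds up one pass of its loop. The CPC cost of an instruction is not always its Z80 T-state count simply rounded up, so only instructions used in that routine are listed; anything else should be checked against the chart.

```python
# CPC cost in NOPs (1 NOP = 1 microsecond = 4 T-states) for the instructions
# of the first routine. Values are assumed from the usual CPC timing charts.
CPC_NOPS = {
    "ld a,(hl)": 2,
    "sra a": 2,
    "ld (de),a": 2,
    "inc de": 2,
    "and n": 2,
    "inc hl": 2,
    "djnz (taken)": 4,
}

# One pass of spr_asic_loop, in program order.
ORIGINAL_LOOP = (
    ["ld a,(hl)"] + ["sra a"] * 4 + ["ld (de),a", "inc de"]
    + ["ld a,(hl)", "and n", "ld (de),a", "inc de"]
    + ["inc hl", "djnz (taken)"]
)


def loop_cost_us(instructions):
    """Total cost of a straight run of instructions, in microseconds."""
    return sum(CPC_NOPS[name] for name in instructions)


def us_to_tstates(us):
    """On the CPC one microsecond corresponds to 4 T-states of the 4 MHz Z80."""
    return us * 4


per_byte = loop_cost_us(ORIGINAL_LOOP)
print(per_byte, "us per packed byte")
print(us_to_tstates(per_byte), "T-states per packed byte, as the CPC runs it")
```

Under these assumptions one pass of the original loop costs 28 µs, which gives a baseline for the faster versions later in the thread.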

I have written this type of packed sprite code too. I'd suggest storing the right pixel in the left (upper) half of the byte and the left pixel in the right (lower) half, like this: 0b11110000, 0b33332222 and so on, where each digit shows which pixel that bit belongs to (unlike what I did the first time for RD128+). This way you can write the first pixel directly without shifting, and you do not have to reload the byte to get the second pixel. Another thing is that the ASIC does an 'AND 15' when it reads a pixel, so you don't have to care about the 4 upper bits.
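A minimal Python sketch of that byte layout, for a capture or conversion tool running on a PC. The names `pack_reversed` and `asic_reads` are hypothetical, and the masking behaviour is taken from the description above rather than from any hardware documentation.

```python
def pack_reversed(pixels):
    """Pack 4-bit pixels two per byte, with the FIRST pixel in the low nibble."""
    if len(pixels) % 2:
        raise ValueError("need an even number of pixels")
    packed = bytearray()
    for first, second in zip(pixels[0::2], pixels[1::2]):
        packed.append(((second & 15) << 4) | (first & 15))
    return bytes(packed)


def asic_reads(value):
    """Model of the ASIC keeping only the low 4 bits of a sprite byte."""
    return value & 15


packed = pack_reversed([1, 2, 3, 4])
print(packed.hex())           # the two packed bytes
print(asic_reads(packed[0]))  # the first pixel, available without any shifting
```

With this layout the packed byte can be written to the ASIC as it is for the first pixel, and only the second pixel needs the nibbles moved.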

Title: Re: Optmising compressed Plus sprites
Post by: redbox on 17:14, 02 October 10

Quote from: fano on 16:34, 02 October 10
Here is a chart of the most used instructions. WinAPE has a NOP counter for checking instruction timings too (there is one at Quasar as well, but it is in French).
http://www.grimware.org/doku.php/documentations/devices/z80

Thanks for the link, I should have thought to look in Grim's website ;) Where is the NOP counter in WinAPE? The help file doesn't load on my version (I'm annoyingly using Windows Vista on my home laptop and the help file doesn't load on it).

Quote from: fano on 16:34, 02 October 10
I have written this type of packed sprite code too. I'd suggest storing the right pixel in the left (upper) half of the byte and the left pixel in the right (lower) half, like this: 0b11110000, 0b33332222 and so on, where each digit shows which pixel that bit belongs to (unlike what I did the first time for RD128+). This way you can write the first pixel directly without shifting, and you do not have to reload the byte to get the second pixel. Another thing is that the ASIC does an 'AND 15' when it reads a pixel, so you don't have to care about the 4 upper bits.

That's really great Fano, thanks. I did wonder about whether the ASIC bothered with the upper 4 bits! Am glad I wrote my own capture routine now because I can easily adjust it to write the data in the reversed format.

Title: Re: Optmising compressed Plus sprites
Post by: fano on 17:34, 02 October 10

Look at the picture: it is under the registers, where it says "T 0". The red cross resets the counter.

Title: Re: Optmising compressed Plus sprites
Post by: TFM on 22:30, 02 October 10

Ok, guys, nice so far :) Now replace these slow "SRA A" instructions with the much quicker "RRCA" instructions, which saves a lot of time. That's the way I do it in FutureOS ;) 8) :laugh:

Title: Re: Optmising compressed Plus sprites
Post by: redbox on 23:01, 02 October 10

Quote from: TFM/FS on 22:30, 02 October 10
Ok, guys, nice so far :) Now replace these slow "SRA A" instructions with the much quicker "RRCA" instructions, which saves a lot of time. That's the way I do it in FutureOS ;) 8) :laugh:

Yes, that's a good point!

I was using SRA A because I didn't realise the ASIC performed an AND %00001111 on the number (as pointed out by Fano earlier), but now that we know it does, RRCA can be used instead.

So now we have:

Code Select


;Display Plus hardware sprite stored in reverse compressed format
;e.g. &01,&02,&03,&04 etc is stored as &21,&43 etc

        ld hl,&2000      ;stored sprite data location
        ld de,&4000      ;address of ASIC sprite location 

copy_spr_asic:    
        ld b,128         ;number of compressed pieces of data for sprite

spr_asic_loop:    
        ld a,(hl)        ;get byte of reverse compressed data
        ld (de),a        ;copy it to the ASIC (ASIC does a AND %00001111 and ignores upper 4 bits)
        inc de           ;increase ASIC location to next byte

        rrca             ;rotate the bits right
        rrca             ;4 times to swap the nibbles, for example
        rrca             ;this turns &37 into &73, which the ASIC reads as &03
        rrca
        ld (de),a        ;copy it to the ASIC
        inc de

        inc hl           ;move onto next byte of compressed data

        djnz spr_asic_loop
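The swap from SRA A to RRCA relies on the ASIC ignoring the upper nibble. The following Python sketch models both instructions on an 8-bit value and checks that, after the AND 15, four of either leave the same pixel. The helper names are hypothetical; the flag behaviour of the real instructions is not modelled.

```python
def rrca(a):
    """Z80 RRCA: rotate the accumulator right by one, bit 0 wraps round to bit 7."""
    return ((a >> 1) | ((a & 1) << 7)) & 0xFF


def sra(a):
    """Z80 SRA: shift right by one, leaving bit 7 (the sign bit) unchanged."""
    return (a >> 1) | (a & 0x80)


def four_times(op, a):
    """Apply a one-bit operation four times, as the routines do."""
    for _ in range(4):
        a = op(a)
    return a


print(hex(four_times(rrca, 0x37)))  # nibbles swapped
print(hex(four_times(sra, 0x37)))   # upper nibble moved down

# Once the ASIC has masked the value, both give the original upper nibble.
same_pixel = all(
    four_times(rrca, a) & 15 == four_times(sra, a) & 15 == a >> 4
    for a in range(256)
)
print(same_pixel)
```

On the CPC, RRCA takes 1 µs against 2 µs for SRA A, so the four rotates cost 4 µs instead of 8 µs per packed byte.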

Title: Re: Optmising compressed Plus sprites
Post by: Axelay on 07:06, 03 October 10

You should be able to replace those 3 16bit incs with 8 bit incs as well, or at least 2 of them if you don't want to keep the source data page aligned.

Title: Re: Optmising compressed Plus sprites
Post by: TFM on 19:56, 03 October 10

Quote from: redbox on 23:01, 02 October 10
Yes, that's a good point!
I was using SRA A because I didn't realise the ASIC performed an AND %00001111 on the number (as pointed out by Fano earlier), but now that we know it does, RRCA can be used instead.

It's a feature, not a bug, but I'm sure at the beginning they were planning 256 colours, then the "management" told the developers "Use only half of the ASIC RAM, we save the other part". But what managers don't know is that if you take only half of the RAM (4 of 8 bits) then you have only 1/16 of the colours (16 instead of 256) - what a pity.

Title: Re: Optmising compressed Plus sprites
Post by: arnoldemu on 09:33, 05 October 10

Unroll the code.
Use a lookup table to get shifted pixel data (256 bytes for lookup table ;) )
inc de -> inc e
use asics "auto and &f"
if hl is aligned you can also use inc l

Code Select


        ld hl,&2000      ;stored sprite data location
        ld de,&4000      ;address of ASIC sprite location 

rept 128
        ld a,(hl)        ;get byte of compressed data
        ld c,a
        ld a,(bc)   ; use lookup table to get shifted version
        ld (de),a
        inc e           ;increase ASIC location to next byte
        ld (de),a        ;copy it to the ASIC in uncompressed form
        inc e           ;increase ASIC location to next byte

        inc hl           ;move onto next byte of compressed data
endm

Title: Re: Optmising compressed Plus sprites
Post by: redbox on 10:09, 05 October 10

Quote from: arnoldemu on 09:33, 05 October 10
Use a lookup table to get shifted pixel data (256 bytes for lookup table ;) )
if hl is aligned you can also use inc l

This is fast, but obviously at the expense of size of code.

What would the lookup table be like? I assume at initialization you LD BC,table but then can't work out what format the table would be or why then loading the table into A doesn't wipe out the byte of compressed data we've just loaded into it previously...?

Also, what does 'page-aligned' mean? Axelay mentioned it earlier and I did change the INCs in the old routine to 8-bit but left the last one alone as I didn't understand the page-aligned bit...? :(

Title: Re: Optmising compressed Plus sprites
Post by: arnoldemu on 10:30, 05 October 10

Quote from: redbox on 10:09, 05 October 10
This is fast, but obviously at the expense of size of code.

What would the lookup table be like? I assume at initialization you LD BC,table but then can't work out what format the table would be or why then loading the table into A doesn't wipe out the byte of compressed data we've just loaded into it previously...?

Also, what does 'page-aligned' mean? Axelay mentioned it earlier and I did change the INCs in the old routine to 8-bit but left the last one alone as I didn't understand the page-aligned bit...? :(

The table would be 256 bytes long (one value for each of the possible values in the compressed data). It would effectively store the value of the compressed data shifted to the right 4 times. so &a3 would result in &0a, and &a0 would also result in &0a.
This table would be initialised at the beginning of your program and never modified after. It could go into ROM if the game was cartridge based.

The table should be positioned in RAM so that the low byte of its address is 0. It is then aligned to a 256-byte boundary, i.e. its start is a multiple of 256.
Then, we only need to do LD B,table/256 to set its location for the code.

When we load the compressed data, this forms the lower 8-bits of the address. We load the data into the C register. Now we have formed the address in the table. We then read from this address (into A register) to get the shifted pixel. C remains unchanged.
We can write the shifted pixel. Then we can use C itself and write that to ram, knowing that the asic will AND the data.
The key here is that C is 8-bit value, it can be used to form address in table to lookup, and itself is part of the data written to asic ram.
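As a sketch of the idea in Python: the table, the way B and C form the address, and the two values that end up being written. The base address `TABLE_BASE` and the function names are assumptions for illustration only.

```python
TABLE_BASE = 0x9000  # hypothetical page-aligned address (low byte is 0)

# One entry per possible packed byte: the byte shifted right four times.
TABLE = bytes(i >> 4 for i in range(256))


def table_address(c):
    """Address read by LD A,(BC) when B = table/256 and C = packed byte."""
    b = TABLE_BASE >> 8
    return (b << 8) | c


def depack_byte(c):
    """Return the two pixels produced from one packed byte."""
    first = TABLE[table_address(c) - TABLE_BASE]  # shifted pixel from the table
    second = c & 15  # C is written unchanged; the ASIC's AND 15 does the rest
    return first, second


print(hex(TABLE[0xA3]), hex(TABLE[0xA0]))
print(depack_byte(0x12))
```

Because the table is page-aligned, forming the address costs nothing more than copying the packed byte into C.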

Effectively, aligning something means you position it in RAM so that its start address is a multiple of some value, and this then means that you can make assumptions about how to access and move through the data.

So, a compressed sprite is 128 bytes. If you locate it so that the low 8 bits of its address are 0 or &80, you can use LD HL, to set the initial address and then INC L to move through the data, knowing that incrementing L will never wrap past &FF inside the sprite and so H never needs to be modified.
So by doing both of these, you can now use an instruction that is 2 times faster :)
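A small Python check of that alignment rule, which could be run over a memory map when deciding where to place sprite data. The function name is hypothetical.

```python
def low_byte_never_wraps(start, length):
    """True if the first and last byte of the block share the same high byte,
    so an 8-bit INC on the low register is enough to walk through it."""
    return (start >> 8) == ((start + length - 1) >> 8)


print(low_byte_never_wraps(0x2000, 128))  # packed sprite at &2000
print(low_byte_never_wraps(0x2080, 128))  # packed sprite at &2080
print(low_byte_never_wraps(0x20C0, 128))  # straddles a page boundary
print(low_byte_never_wraps(0x4000, 256))  # a 256-byte ASIC sprite at &4000
```

The same test explains why INC E is safe on the ASIC side: each hardware sprite occupies 256 bytes starting on a page boundary.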

Title: Re: Optmising compressed Plus sprites
Post by: redbox on 12:17, 05 October 10

Quote from: arnoldemu on 10:30, 05 October 10
The key here is that C is 8-bit value, it can be used to form address in table to lookup, and itself is part of the data written to asic ram.
Effectively, aligning something means you position it in RAM so that its start address is a multiple of some value, and this then means that you can make assumptions about how to access and move through the data.

I understand how the routine and page-aligning works now, many thanks for the explanations :)

I see in your routine that you have forgotten to LD A,C after the first LD (DE),A : INC E, as otherwise the same value loaded from the look-up table would be copied to the ASIC twice...?

So here are the two routines fully optimised:

Code Select


;Display Plus hardware sprite stored in reverse compressed format
;e.g. &01,&02,&03,&04 etc is stored as &21,&43 etc
;49 x 128 = 6272 nominal T-states
;on the CPC each instruction is stretched to a multiple of 4 T-states: 13 x 128 = 1664 microseconds

            org &8000

            ld hl,&2000          ;stored sprite data location (page-aligned)
            ld de,&4000          ;address of ASIC sprite location

repeat 128
            ld a,(hl)            ;get byte of reverse compressed data
            ld (de),a            ;copy it to the ASIC (ASIC does a AND %00001111 and ignores upper 4 bits)
            inc e                ;increase ASIC location to next byte

            rrca                 ;rotate the bits right
            rrca                 ;4 times to swap the nibbles, for example
            rrca                 ;this turns &37 into &73, which the ASIC reads as &03
            rrca
            ld (de),a            ;copy it to the ASIC
            inc e

            inc l               ;move onto next byte of compressed data
endm

            ret

Code Select


;Display Plus hardware sprite stored in compressed format
;e.g. &01,&02,&03,&04 etc is stored as &12,&34 etc
;48 x 128 = 6144 nominal T-states
;on the CPC each instruction is stretched to a multiple of 4 T-states: 13 x 128 = 1664 microseconds

            org &8000

            ld hl,&2000          ;stored sprite data location (page-aligned)
            ld de,&4000          ;address of ASIC sprite location
            ld b,table/256       ;table (page-aligned)

repeat 128
            ld a,(hl)            ;get byte of compressed data
            ld c,a               ;keep it in C: low byte of the table address and also the second pixel
            ld a,(bc)            ;use lookup table to get shifted version

            ld (de),a            ;copy the first pixel (upper nibble) to the ASIC
            inc e                ;increase ASIC location to next byte
            ld a,c               ;get the original byte back
            ld (de),a            ;copy it to the ASIC (ASIC does a AND %00001111 and ignores upper 4 bits)
            inc e                ;increase ASIC location to next byte

            inc l                ;move onto next byte of compressed data
endm

            ret

            org &9000

table:        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00,&00,&00,&00,&00,&00,&00
              defb &01,&01,&01,&01,&01,&01,&01,&01,&01,&01,&01,&01,&01,&01,&01,&01
              defb &02,&02,&02,&02,&02,&02,&02,&02,&02,&02,&02,&02,&02,&02,&02,&02
              defb &03,&03,&03,&03,&03,&03,&03,&03,&03,&03,&03,&03,&03,&03,&03,&03
              defb &04,&04,&04,&04,&04,&04,&04,&04,&04,&04,&04,&04,&04,&04,&04,&04
              defb &05,&05,&05,&05,&05,&05,&05,&05,&05,&05,&05,&05,&05,&05,&05,&05
              defb &06,&06,&06,&06,&06,&06,&06,&06,&06,&06,&06,&06,&06,&06,&06,&06
              defb &07,&07,&07,&07,&07,&07,&07,&07,&07,&07,&07,&07,&07,&07,&07,&07
              defb &08,&08,&08,&08,&08,&08,&08,&08,&08,&08,&08,&08,&08,&08,&08,&08
              defb &09,&09,&09,&09,&09,&09,&09,&09,&09,&09,&09,&09,&09,&09,&09,&09
              defb &0A,&0A,&0A,&0A,&0A,&0A,&0A,&0A,&0A,&0A,&0A,&0A,&0A,&0A,&0A,&0A
              defb &0B,&0B,&0B,&0B,&0B,&0B,&0B,&0B,&0B,&0B,&0B,&0B,&0B,&0B,&0B,&0B
              defb &0C,&0C,&0C,&0C,&0C,&0C,&0C,&0C,&0C,&0C,&0C,&0C,&0C,&0C,&0C,&0C
              defb &0D,&0D,&0D,&0D,&0D,&0D,&0D,&0D,&0D,&0D,&0D,&0D,&0D,&0D,&0D,&0D
              defb &0E,&0E,&0E,&0E,&0E,&0E,&0E,&0E,&0E,&0E,&0E,&0E,&0E,&0E,&0E,&0E
              defb &0F,&0F,&0F,&0F,&0F,&0F,&0F,&0F,&0F,&0F,&0F,&0F,&0F,&0F,&0F,&0F

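On paper the second routine is slightly faster (48 against 49 T-states per byte), but on the CPC, where each instruction is stretched to a multiple of 4 T-states, both come to 13 microseconds per byte, and the second has the overhead of the table...!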

Title: Re: Optmising compressed Plus sprites
Post by: arnoldemu on 12:20, 05 October 10

Quote from: redbox on 12:17, 05 October 10
I understand how the routine and page-aligning works now, many thanks for the explanations :)

I see in your routine that you have forgotten to LD A,C after the first LD (DE),A : INC E, as otherwise the same value loaded from the look-up table would be copied to the ASIC twice...?

I was planning to swap the roles of HL and DE then I could use LD (DE),C for the final write.
This would make it slightly faster again because you don't need the ld a,c then.

Title: Re: Optmising compressed Plus sprites
Post by: redbox on 12:40, 05 October 10

Quote from: arnoldemu on 12:20, 05 October 10
I was planning to swap the roles of HL and DE then I could use LD (DE),C for the final write.
This would make it slightly faster again because you don't need the ld a,c then.

You mean you could use LD (HL),C ;)

But yes, this would revise it to 44 x 128 = 5632 T-states, which on the CPC (each instruction stretched to a multiple of 4 T-states) is 12 x 128 = 1536 microseconds, making it faster than the other routine (about 10% fewer T-states) :)
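The per-byte costs in this exchange can be cross-checked with a short Python sketch that totals both the nominal Z80 T-states and the CPC microseconds for each unrolled body. The microsecond column assumes the usual CPC timings, and the "swapped" listing is only a guess at what the HL/DE swap would look like, since no such listing is posted in the thread.

```python
# (nominal T-states, CPC microseconds) per instruction. The second value
# assumes the usual CPC timings, where every instruction takes whole NOPs.
COST = {
    "ld a,(hl)": (7, 2), "ld a,(de)": (7, 2), "ld a,(bc)": (7, 2),
    "ld (de),a": (7, 2), "ld (hl),a": (7, 2), "ld (hl),c": (7, 2),
    "ld c,a": (4, 1), "ld a,c": (4, 1), "rrca": (4, 1),
    "inc e": (4, 1), "inc l": (4, 1),
}

ROUTINES = {
    "rrca": ["ld a,(hl)", "ld (de),a", "inc e"] + ["rrca"] * 4
            + ["ld (de),a", "inc e", "inc l"],
    "table": ["ld a,(hl)", "ld c,a", "ld a,(bc)", "ld (de),a", "inc e",
              "ld a,c", "ld (de),a", "inc e", "inc l"],
    # Hypothetical reconstruction: packed data read through DE, ASIC written
    # through HL, so the second pixel can be stored with LD (HL),C.
    "table, swapped": ["ld a,(de)", "ld c,a", "ld a,(bc)", "ld (hl),a",
                       "inc l", "ld (hl),c", "inc l", "inc e"],
}


def totals(listing, repeats=128):
    """Return (T-states, microseconds) for `repeats` copies of the listing."""
    tstates = sum(COST[name][0] for name in listing)
    micros = sum(COST[name][1] for name in listing)
    return tstates * repeats, micros * repeats


for name, listing in ROUTINES.items():
    print(name, totals(listing))
```

Under these assumptions the RRCA and table versions both take 13 µs per packed byte on a real CPC, and only the swapped version drops to 12 µs. Dividing a T-state total by 4 therefore underestimates the time when instructions with 7 T-states are involved.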

Title: Re: Optmising compressed Plus sprites
Post by: Grim on 12:56, 05 October 10

Below is an alternative version using the stack to fetch and process 2 bytes of packed data in one go, which is a little bit faster but comes with constraints on the interrupts (which might be a no-go in some cases).

Code Select


			org &1000
			run $,spriteDepack

asic_sprite0		equ &4000

			; Fill the lookup table
			call spriteDepack_init
			; Depack sprite
			ld hl,data_sprite
			ld de,asic_sprite0
			call spriteDepack
			ret


			; Copy packed sprite into the asic RAM
			;
			; Input
			;  HL = address of packed sprite data
			;  DE = address of ASIC sprite
spriteDepack:
			; disable interrupts and init the stack
			di
			ld (spriteBlit_var_sp),sp
			ld sp,hl
			ex de,hl

			; Init the lookup table pointer
			ld d,lut_depackSprite / 256

			repeat 64
				; fetch 4 packed pixels
				pop bc		;3
		
				; write pixel 1
				ld (hl),c	;2
				inc l		;1
				; lookup pixel 2
				ld e,c		;1
				ld a,(de)	;2
				; write pixel 2
				ld (hl),a	;2
				inc l		;1
				; write pixel 3
				ld (hl),b	;2
				inc l		;1
				; lookup pixel 4
				ld e,b		;1
				ld a,(de)	;2
				; write pixel 4
				ld (hl),a	;2
				inc l		;1 (<- will be overwritten with the ORG adjustment below)
						;= 21us @ 4 pixels
			rend			;= 21*64 = 1344us
						

			org $-1			; move one byte back to overwrite an useless inc l
spriteBlit_var_sp	equ $+1
			ld sp,0
			ei
			ret			; 845 bytes... uuwwh!


			; Fill the 256 bytes lookup table
spriteDepack_init:
			ld hl,lut_depackSprite
spriteDepack_init_loop
			ld a,l
			rrca
			rrca
			rrca
			rrca
			;and %1111 ; would be more meaningful than useful =)
			ld (hl),a
			inc l
			jr nz,spriteDepack_init_loop
			ret


data_sprite		; some packed sprite data here


			align 256
lut_depackSprite		; lookup table here
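To see what the unrolled loop above produces, here is a Python model of it. It is only a sketch: `depack_with_pop` and `LUT` are names invented for the model, the lookup table mirrors what `spriteDepack_init` builds (each index rotated right four times, i.e. its nibbles swapped), and the ASIC's AND 15 is applied at the end.

```python
# Table as filled by the init loop: index rotated right 4 times (nibble swap).
LUT = bytes(((i >> 4) | (i << 4)) & 0xFF for i in range(256))


def depack_with_pop(packed):
    """Model of the loop: each POP BC fetches two packed bytes (C first, then B)
    and four bytes are written to the sprite."""
    out = bytearray()
    for i in range(0, len(packed), 2):
        c, b = packed[i], packed[i + 1]  # POP BC: C = (SP), B = (SP+1)
        out += bytes([c, LUT[c], b, LUT[b]])
    return bytes(out)


def asic_view(written):
    """Pixels as the ASIC sees them, keeping only the low nibble of each byte."""
    return [value & 15 for value in written]


packed = bytes([0x21, 0x43])  # pixels 1,2,3,4 with the first pixel in the low nibble
print(asic_view(depack_with_pop(packed)))
```

The model shows that this routine expects the reversed packing (first pixel in the low nibble), since C and B are written to the sprite unchanged.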

Title: Re: Optmising compressed Plus sprites
Post by: redbox on 13:57, 05 October 10

That's an interesting way to take it further Grim, but will this affect the DMA if you are using it on the Plus...?

This process has been incredibly interesting for me as I've learnt new techniques and also generally how to optimize code and I'm currently going over lots of routines and giving them the treatment :)

One other thing it's taught me is it's not possible to update all 16 hardware sprites in one 50hz frame! :o

Title: Re: Optmising compressed Plus sprites
Post by: Axelay on 14:15, 05 October 10

Quote from: redbox on 13:57, 05 October 10
One other thing it's taught me is it's not possible to update all 16 hardware sprites in one 50hz frame! :o

Every project I've worked on so far has only required animation at 1/4 of 50hz at most. Unless you have a lot of small incremental frames you should be able to get away with updating, say 4 sprites every frame, moving through them in blocks of 4, and it would look perfectly fine.

Title: Re: Optmising compressed Plus sprites
Post by: redbox on 14:29, 05 October 10

Quote from: Axelay on 14:15, 05 October 10
Every project I've worked on so far has only required animation at 1/4 of 50hz at most. Unless you have a lot of small incremental frames you should be able to get away with updating, say 4 sprites every frame, moving through them in blocks of 4, and it would look perfectly fine.

I will give that a go and let you know!

Title: Re: Optmising compressed Plus sprites
Post by: Grim on 15:58, 05 October 10

Quote from: redbox on 13:57, 05 October 10
will this affect the DMA if you are using it on the Plus...?

It should not interfere in any way with the DMA (except if you're using DMA-interrupts where strange things could happen :).

Quote
One other thing it's taught me is it's not possible to update all 16 hardware sprites in one 50hz frame! :o

Indeed, even using raw sprite data and an uber-fast-copy routine, it takes a little bit more than one frame to update all the hardware sprites.

Title: Re: Optmising compressed Plus sprites
Post by: TFM on 21:50, 05 October 10

You can update all hardware sprites (16) in 16 ms, a frame has 20 ms, just use a LDI:LDI:LDi... construction ;)

Title: Re: Optmising compressed Plus sprites
Post by: Grim on 22:44, 05 October 10

Quote from: TFM/FS on 21:50, 05 October 10
You can update all hardware sprites (16) in 16 ms, a frame has 20 ms, just use a LDI:LDI:LDi... construction ;)

Errr... What?

One frame is 312*64 = 19968us with regular CRTC settings.

16 sprites
256 bytes each
LDI = 5us / byte

16*256*5 = 20480us = 20.5ms

Am I missing something? (or... is that LDI runs in 4us in your parallel universe? :)
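The same arithmetic as a tiny Python check, with the figures taken from the post above (the constant names are made up):

```python
LINES_PER_FRAME = 312   # regular CRTC settings
US_PER_LINE = 64
SPRITES = 16
BYTES_PER_SPRITE = 256
US_PER_LDI = 5          # one byte copied per LDI

frame_us = LINES_PER_FRAME * US_PER_LINE
copy_us = SPRITES * BYTES_PER_SPRITE * US_PER_LDI

print(frame_us)            # time available in one frame
print(copy_us)             # time needed by an unrolled LDI copy
print(copy_us - frame_us)  # shortfall in microseconds
```

The 16 ms figure only comes out if a byte is copied in 4 µs, since 16 x 256 x 4 = 16384.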

Title: Re: Optmising compressed Plus sprites
Post by: TFM on 01:53, 06 October 10

It does ;) ;D

Title: Re: Optmising compressed Plus sprites
Post by: arnoldemu on 09:34, 07 October 10

Quote from: TFM/FS on 01:53, 06 October 10
It does ;) ;D

I remember you've got a 6Mhz CPC?
Has anyone tried making a 6Mhz Plus yet?

Title: Re: Optimising compressed Plus sprites
Post by: Sykobee (Briggsy) on 14:47, 07 October 10

Quote from: arnoldemu on 09:34, 07 October 10
Has anyone tried making a 6Mhz Plus yet?

Why not skip straight to a 16MHz Z80 CPU? I think the CPC used a 16MHz base clock, divided by four for the Z80 so you'd want to bypass that bit of logic.

Wonder if anyone has a Minimig compatible CPC FPGA implementation yet.

Title: Re: Optmising compressed Plus sprites
Post by: steve on 15:42, 07 October 10

Not (yet) CPC compatible, but there is the V6Z80P+, available in very small quantities at Retroleum. It costs £85 + p&p and uses a 16 MHz Z80 and an FPGA to generate a VGA display with blitter and sound. It has 1152 KB of RAM.

I have no idea how to reprogram the FPGA to make it cpc compatible but it is theoretically possible.

The designer of this board is now working on an eZ80 system which I will also get when it becomes available.

The eZ80 runs at up to 50 MHz, which would be equivalent to a Z80 running at 200 MHz. Who needs a Core i7 :laugh:
There is a free tcp/ip stack written for this chip so net access should be possible, once a browser has been ported.

Title: Re: Optmising compressed Plus sprites
Post by: TFM on 17:51, 07 October 10

Quote from: arnoldemu on 09:34, 07 October 10
I remember you've got a 6Mhz CPC?
Has anyone tried making a 6Mhz Plus yet?

Not me, it's more complicated in the Plus. The crystal oscillator is different.

In the good old 6128 you basically only have to swap the crystal (24 MHz instead of 16 MHz) and the Z80 (B or H instead of A). BTW: the crystals should be switchable (switches for all three connections; a single wire can also be OK, though not in every CPC).

Edit: Here is an attempt at a page (just from memory):
http://www.cpcwiki.eu/index.php/6_MHz_CPC

Quote from: Briggsy on 14:47, 07 October 10
Why not skip straight to a 16MHz Z80 CPU? I think the CPC used a 16MHz base clock, divided by four for the Z80 so you'd want to bypass that bit of logic.
Wonder if anyone has a Minimig compatible CPC FPGA implementation yet.

Right. It's a 16 MHz crystal. You can run the whole system at 6 MHz (I only tested a few boards!) but not at 8 MHz; 8 definitely doesn't work. If the Z80 is to run faster than 6 MHz then you have to use a separate crystal for the Z80 and one for the rest of the system.

The overclocking of the _WHOLE_ system has the following features:
- Faster CPU
- Better Graphics resolution
- Faster Floppy with formats holding 50% more data
- Sound is also better.

Disadvantages:
- Not all expansions can work with the increased bus speed
(Eprom Boards do, ROM-RAM-Boxes don't do).

Title: Re: Optmising compressed Plus sprites
Post by: Executioner on 01:57, 13 October 10

Another possible routine:

Code Select


ld b,xlatcs / 256
repeat 128
  ld c,(hl)   ;2   (comments are the running total in microseconds)
  ldi         ;7
  ld a,(bc)   ;9
  ld (de),a   ;11
  inc e       ;12
endr

align 256
.xlatcs
repeat 256
  db $ / 16 and 15
endr

Title: Re: Optmising compressed Plus sprites
Post by: Sykobee (Briggsy) on 14:46, 13 October 10

Shame the CPC 6128 didn't come at 6MHz by default, I think the Z80B was out by then. Extra costs were a no-no though for Amstrad.

Title: Re: Optmising compressed Plus sprites
Post by: TFM on 19:37, 13 October 10

Quote from: Briggsy on 14:46, 13 October 10
Shame the CPC 6128 didn't come at 6MHz by default, I think the Z80B was out by then. Extra costs were a no-no though for Amstrad.

Why not use 16 MHz? Look at Anne, the PCW Plus ;) It had a 16 MHz Z80. But even using a good old Z80H at 8 MHz would be nice.

However, if the Plus ran at a multiple of 4 MHz, it wouldn't be a problem to switch it back to 4 MHz for games (for example, or whenever needed). Amstrad just saved money. Making the ASIC faster could be one problem... ???

Has anybody ever tried to replace the crystal of the CPC Plus?

Title: Re: Optmising compressed Plus sprites
Post by: Executioner on 03:54, 14 October 10

Quote from: TFM/FS on 21:50, 05 October 10
You can update all hardware sprites (16) in 16 ms, a frame has 20 ms, just use a LDI:LDI:LDi... construction ;)

As Grim said, you can't do that, but you can do it with compiled sprites:

Code Select


ld hl,#0304  ;3
push hl         ;8

And sometimes you can get away with 8 bit loads or pushing the same value. Even if you set every pixel individually this way, it's 8 * 128 * 16 = 16384 us plus a bit of overhead for storing the stack pointer.

Title: Re: Optmising compressed Plus sprites
Post by: TFM on 06:06, 14 October 10

Well.... and what about ....

LD HL,&XXXX:PUSH HL
LD HL,&XXXX:PUSH HL
LD HL,&XXXX:PUSH HL
LD HL,&XXXX:PUSH HL ;already 8 bytes transferred.... and counting ;D
...
..
.
... and so on an on and on...... ok, it uses a lot of memory but, it's damn quick... Now recalculate, but keep in mind folks that the PUSH writes two bytes ;D :laugh: ;)

EDIT: To explain this more in detail, the interrupts are off, the SP is saved and at the start of that "load all 16 sprites"-routine the SP points to the upper end of the sprite area (memory mapped I/O activated!) ... And you see you can do it all in one FRAME. TFM frames again... ;)
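As an illustration of how such a compiled sprite could be generated on a PC, here is a minimal Python sketch. All names are hypothetical and it makes no attempt at the register-reuse tricks mentioned elsewhere in the thread. Because PUSH writes downwards, the pixel pairs are emitted from the end of the sprite backwards, with H holding the pixel at the higher address.

```python
def compile_sprite(pixels):
    """Return the (H, L) register pairs in the order they must be pushed."""
    if len(pixels) % 2:
        raise ValueError("need an even number of pixels")
    pairs = []
    for addr in range(len(pixels) - 2, -1, -2):  # walk backwards from the end
        low, high = pixels[addr], pixels[addr + 1]
        pairs.append((high, low))  # H goes to the higher address, L below it
    return pairs


def run_pushes(pairs, size):
    """Simulate the pushes with SP starting just past the end of the sprite."""
    memory = bytearray(size)
    sp = size
    for h, l in pairs:
        sp -= 1
        memory[sp] = h  # PUSH stores the high register first...
        sp -= 1
        memory[sp] = l  # ...then the low register at the next address down
    return bytes(memory)


def as_source(pairs):
    """Format the pairs as assembler source lines."""
    return ["LD HL,&%02X%02X:PUSH HL" % (h, l) for h, l in pairs]


pixels = [n % 16 for n in range(256)]  # dummy 16x16 sprite
pairs = compile_sprite(pixels)
print(as_source(pairs)[0])
print(run_pushes(pairs, 256) == bytes(pixels))
```

Simulating the pushes is a cheap way to confirm that the generated code rebuilds the sprite in the right byte order before trying it on the real machine.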

BTW: I use similar techniques, but way more advanced, for FilmeMacher / MovieMaker

Title: Re: Optmising compressed Plus sprites
Post by: Grim on 07:28, 14 October 10

Quote from: Executioner on 01:57, 13 October 10
Code Select


ld b,xlatcs / 256
repeat 128
  ld c,(hl)   ;2
  ldi         ;7
  ld a,(bc)   ;9
  ld (de),a   ;11
  inc e       ;12
endr

If the byte fetched by the ld c,(hl) is &00, the following LDI will decrement B. This will screw up the LUT pointer and half of the remaining pixels in the sprite (those fetched from the lookup table pointed to by BC). Also, when C=&x0, the LDI will decrement x (which is used for the lookup) and corrupt another pixel. Unless the sprite graphics use no transparency at all, it seems you will run into some problems with that.
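The effect can be reproduced with a short Python model of what LDI does to BC between `ld c,(hl)` and `ld a,(bc)`. The table page `TABLE_PAGE` is an arbitrary assumption and the function names are made up for this sketch.

```python
TABLE_PAGE = 0x90  # hypothetical high byte of the page-aligned lookup table


def after_ldi(b, c):
    """LDI decrements the 16-bit BC pair as a side effect."""
    bc = (((b << 8) | c) - 1) & 0xFFFF
    return bc >> 8, bc & 0xFF


def lookup_is_wrong(c):
    """True if LD A,(BC) after the LDI no longer fetches the entry for c."""
    b_after, c_after = after_ldi(TABLE_PAGE, c)
    if b_after != TABLE_PAGE:
        return True  # the pointer has left the table page altogether
    return (c_after >> 4) != (c >> 4)  # entry for a different upper nibble


bad = [c for c in range(256) if lookup_is_wrong(c)]
print([hex(c) for c in bad])
```

Only packed bytes whose low nibble is 0 are affected, but &00 also moves B off the table page for every byte that follows, which is why transparent areas are the worst case.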

Compiled hardware sprites are fast, but that's quite a big, heavy and sad machinery for a handful of pixels imo. Thank you Amstrad! (should have bought an Amiga sooner... :)

Title: Re: Optmising compressed Plus sprites
Post by: Executioner on 12:57, 14 October 10

Quote from: TFM/FS on 06:06, 14 October 10
Well.... and what about ....

LD HL,&XXXX:PUSH HL
LD HL,&XXXX:PUSH HL

Isn't that exactly what I had in my post?

Title: Re: Optmising compressed Plus sprites
Post by: Executioner on 13:09, 14 October 10

Quote from: Grim on 07:28, 14 October 10
If the byte fetched by the ld c,(hl) is &00, the following LDI will decrement B. This will screw up the LUT pointer ...

Yes, that's a slight problem, but you can overcome it by restricting it to 15 colours in every second pixel, which should be enough in 99.99% of cases (i.e. for pixels A (0..14) and B (0..14), the value stored is (A + 1) * 16 + B, and the lookup table translates it back by storing (C / 16) - 1).
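One possible reading of that scheme, sketched in Python. The post does not say where each table entry sits, so this sketch assumes the entry for a stored value is placed one position lower, where the decremented C points after the LDI. `encode`, `decode` and `TABLE` are hypothetical names.

```python
def encode(a, b):
    """Store pixel a (0..14) as a + 1 in the upper nibble and b (0..14) below it."""
    if not (0 <= a <= 14 and 0 <= b <= 14):
        raise ValueError("both pixels must be in the range 0..14")
    return (a + 1) * 16 + b


# Assumed layout: the entry for stored value v lives at index v - 1, because
# LDI has already decremented C by the time LD A,(BC) runs.
TABLE = bytearray(256)
for v in range(16, 256):
    TABLE[v - 1] = v // 16 - 1


def decode(v):
    """Pixels written for one stored byte: the LDI copy, then the table value."""
    first = v & 15               # LDI copies v; the ASIC keeps the low nibble
    second = TABLE[v - 1] & 15   # lookup through the decremented C
    return first, second


print(hex(encode(0, 0)))   # smallest stored value, never &00
print(decode(encode(3, 7)))
```

Since the smallest stored value is &10, the decrement never borrows from B, so the table pointer stays on its page.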

The compressed version of the PUSH mechanism can actually be nearly as memory efficient as compressed sprites for less complex sprites. A lot of the time you can simply do something like:

LD H,L
PUSH HL
PUSH HL
PUSH HL
PUSH HL
PUSH HL
PUSH HL
PUSH HL
PUSH HL

for a whole row of one solid colour.

I'm writing something with 3 colour (plus transparent) sprites (like Frogger has) and it just moves values around between registers HL, DE, BC and A and pushes whichever register it needs.

Title: Re: Optmising compressed Plus sprites
Post by: TFM on 17:37, 14 October 10

Quote from: Executioner on 12:57, 14 October 10
Isn't that exactly what I had in my post?

I don't know... which one was it? Ok... I start reading this looooong thread from the beginning.

But I'm glad that I'm not the only one with such ideas ;D

Title: Re: Optmising compressed Plus sprites
Post by: Longshot on 15:05, 29 October 10

Quote
And sometimes you can get away with 8 bit loads or pushing the same value. Even if you set every pixel individually this way, it's 8 * 128 * 16 = 16384 us plus a bit of overhead for storing the stack pointer.

Little error :
LD HL,xxxx = 3 us
PUSH HL = 4 us
7 x 128 x 16 = 14.336 ms
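The corrected total, and how it compares with one frame, as a quick Python check. The instruction timings are the ones quoted in this post, the frame length is the 312 x 64 µs figure given earlier in the thread, and the constant names are made up.

```python
US_LD_HL_NN = 3
US_PUSH_HL = 4
PAIRS_PER_SPRITE = 256 // 2   # each PUSH writes two sprite bytes
SPRITES = 16
FRAME_US = 312 * 64           # one frame with regular CRTC settings

total_us = (US_LD_HL_NN + US_PUSH_HL) * PAIRS_PER_SPRITE * SPRITES
print(total_us)               # all 16 sprites, every pixel set individually
print(FRAME_US - total_us)    # time left in the frame
```

So the worst case fits in a frame with roughly 5.6 ms to spare, before counting the overhead of saving and restoring the stack pointer.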

Title: Re: Optmising compressed Plus sprites
Post by: TFM on 21:29, 29 October 10

Hey Longshot, good to see you here :) Right, push takes longer than pop if I remember right ;)

Title: Re: Optmising compressed Plus sprites
Post by: Executioner on 00:32, 03 November 10

Quote from: Longshot on 15:05, 29 October 10
PUSH HL = 4 us

Yeah, for some reason I had it in mind it took 5us. So it's even faster than I thought :)

CPCWiki forum

General Category => Programming => Topic started by: redbox on 16:21, 02 October 10