Hi everybody,
I'm currently working on a game/demo, and more precisely on the sprite display routines.
My background restore routine takes approximately 8 raster lines per sprite (mode 1 sprite, 4 bytes wide by 16 lines).
Is it possible to do better? Note that I use 4 LDIs per line to copy my data. After many, many attempts (using the stack, for example), I have to admit that I failed.
Can someone suggest a way to restore the background in less time?
Thanks for any reply.
So you need about 32 µs for every scanline (4 bytes). That is 5*4=20 µs for the four LDIs, which leaves about 12 µs for changing the target address. You could optimize that a bit.
It also depends on whether you have a single-color background or a background with GFX.
Taken all together, your routines seem to be very good. :)
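As a rough check of the arithmetic above, here is a small Python sketch. It only restates figures quoted in the thread (32 NOPs per sprite line, 5 NOPs per LDI) and assumes the usual CPC timing of 64 NOPs per raster line. The constant names are made up for illustration.

```python
# Timing figures quoted in the thread (1 NOP = 1 microsecond on the CPC).
NOPS_PER_RASTER_LINE = 64   # assumption: one CPC raster line lasts 64 NOPs
LDI_NOPS = 5                # cost of one LDI, as stated above
NOPS_PER_SPRITE_LINE = 32   # figure reported for the restore routine
SPRITE_LINES = 16
BYTES_PER_LINE = 4

copy_nops = BYTES_PER_LINE * LDI_NOPS               # time spent in the LDIs
overhead_nops = NOPS_PER_SPRITE_LINE - copy_nops    # left for address handling
total_nops = NOPS_PER_SPRITE_LINE * SPRITE_LINES    # whole sprite
raster_lines = total_nops / NOPS_PER_RASTER_LINE

print(copy_nops, overhead_nops, total_nops, raster_lines)
```

So any saving has to come out of the 12 NOPs of address handling, or from replacing the LDIs with something cheaper.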
Thanks TFM. Of course, I was talking about a background with GFX.... If the background were empty, I'd use the stack pointer to do the job.
Here is the complete code:
;
; Entry : H=x, L=y
;
di
ld (oldpi1+1),sp ; save the stack pointer
xor a ; Reset A
ld c,h ; c=h=X
sla l ; l=l*2
ld h,tbadr/256 ; hl=table (screen address)
ld sp,hl
ld b,a ; b=0
ld a,c ; save c in A (so A=x)
;
; 32 nops -> 8 rasterlines (for 16 lines)
pop hl ; hl=C0XX
add hl,bc ; bc=x ; hl=#c0xx+x
ld e,l:ld d,h ; de=hl=#c0xx+x
res 7,h ; hl=#40xx+x
ldi:ldi:ldi:ldi ; copy data from #4000 to #c000
ld c,a:ld b,0 ; Restore BC
; Copy 16 times for 16 lines
oldpi1 ld sp,0
ei
ret
I added some comments.
Well, if the background were empty, then using SP to blank the 16 KB of V-RAM would of course be very quick. But if you keep a table of the V-RAM addresses (e.g. the upper left corner of each sprite) of all the sprites you put on screen, then you don't need to clear all 16 KB. Just go back to this table and clean the area the sprite used before. OK, the disadvantage is that you need a "clear sprite" routine for every sprite size. But it can save you a frame or two. :)
Regarding your code... pretty nice, by the way.... Which screen format do you use (x and y)? Do you use hardware scrolling?
@TFM (http://www.cpcwiki.eu/forum/index.php?action=profile;u=179) :
All the y lines are precalculated in a table like this:
#c000,#c800,#d000....etc
line 0 -> #c000
line 1 -> #c800
For example, x values can range from 0 to 80.
And I do this:
find the y line and add x to get the correct address.
If x=10 and line=2 -> address=#d000+x
No BC26 trick, only a pop hl, which gives us the y line address!
(-:
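To make the table idea concrete, here is a minimal Python sketch of how such a line-address table could be built and read. It assumes a standard CPC screen at #C000 with 80 bytes per line and no hardware scrolling. The names `line_address`, `build_table` and `lookup` are hypothetical. The table is limited to 128 entries because the routine above doubles an 8-bit Y to index a single 256-byte page.

```python
SCREEN_BASE = 0xC000
BYTES_PER_CHAR_ROW = 0x50    # assumption: standard 80-byte-wide screen
PIXEL_LINE_STEP = 0x800      # each pixel line inside a character row is #800 further

def line_address(y):
    """Address of the first byte of pixel line y."""
    return SCREEN_BASE + (y // 8) * BYTES_PER_CHAR_ROW + (y % 8) * PIXEL_LINE_STEP

def build_table(lines=128):
    """Table of 16-bit little-endian addresses, 2 bytes per line."""
    table = bytearray()
    for y in range(lines):
        table += line_address(y).to_bytes(2, "little")
    return table

def lookup(table, x, y):
    """What 'pop hl' followed by 'add hl,bc' computes: table[y] + x."""
    lo, hi = table[2 * y], table[2 * y + 1]
    return ((hi << 8) | lo) + x

table = build_table()
print(hex(lookup(table, 10, 2)))
```

With x=10 and line 2 this gives #D00A, matching the example above.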
My routine uses triple buffering and I'm very proud of it. But I don't think I can optimise it any further myself.
I want to find a faster way to restore the background.
This will be used in my next demo and in my first CPC Plus game... As for hardware scrolling, I really don't know yet what I will choose, but I'd really love to play a game that uses hardware scrolling. So wait and see.
Triple buffering ... can't have too much beer that way! 8)
Looking forward to see one of your upcoming prods. :)
Thanks... I gained 2 NOPs by removing ld b,0 entirely :-) ^^
Replacing ld b,0 with ld b,a is a good idea, but I can't because register c is already saved in register a.
Yes, saw that.... too late ;-)
Summary.... So I gain 2 µs per line, 32 µs for my background restore routine. But I'm still looking for a better restore routine...
Maybe Captain Future will find a new way. Who knows ?
Quote from: Ast on 23:06, 02 September 14
Summary.... So I gain 2 µs per line, 32 µs for my background restore routine. But I'm still looking for a better restore routine...
Maybe use POP and PUSH, since it's only 4 bytes per tile:
ld hl,base_address
ld sp,hl
pop bc
pop de
set 7,h
push de
push bc
res 7,h
; and so on
Good idea (I was already thinking of this approach), but I use the stack pointer to fetch the screen address. So it will be slower this way.
Quote from: Executioner on 01:35, 03 September 14
Maybe use POP and PUSH, since it's only 4 bytes per tile:
ld hl,base_address
ld sp,hl
pop bc
pop de
set 7,h
*
push de
push bc
res 7,h
*
; and so on
* Shouldn't this contain a LD SP,HL ?
I'll give it a try and post the latest result here.
Quote from: TFM on 20:03, 03 September 14
* Shouldn't this contain a LD SP,HL ?
Yeah, wouldn't work very well without it :)
Unfortunately, all my tests take more machine time, approximately 4 to 10 NOPs more..... :o
Empires rise and fall in 10 nops :laugh:
Well, since I love to work with overscan I never use triple buffering, but...
Let's also add something like this to Executioner's routine:
EXX
POP BC
POP DE
POP HL
EXX
Alternatively, you can also use IX and IY (even though each is 1 µs slower).
It all depends how big your sprite is in X.
Quote from: TFM on 16:10, 04 September 14
Empires rise and fall in 10 nops :laugh:
Well, since I love to work with overscan I never use triple buffering, but...
Let's also add something like this to Executioner's routine:
EXX
POP BC
POP DE
POP HL
EXX
Alternatively, you can also use IX and IY (even though each is 1 µs slower).
It all depends how big your sprite is in X.
That's exactly what I do. The sprite is 4 bytes by 16 lines (mode 1).
Some examples? Here they come:
;
; Restore BackGround
; Using the Sp Register
; h=x ; l=y
;
; Example 1
backg4 di
ld b,0
ld c,h ; bc=X
ld a,l
ld h,tbadr/256
add a,a ; hl=screen address table
ld l,a
ld sp,hl
; repeat 16 times
pop hl ; hl=#c0xx
add hl,bc ; hl=#c0xx+x
res 7,h ; h=#40
ld d,h ; d=#40
ld e,l ; e=xx+x
ld sp,hl ; sp=#40xx+x
pop ix:pop af ; sp=sp+4
set 7,H ; hl=#c0xx+x
add hl,bc
push af:push ix
ex de,hl
inc hl:inc hl
ld sp,hl ; 38 nops
;
; Example 2
backg4 di
ld bc,4
exx
ld b,0
ld c,h ; bc=X
ld a,l
ld h,tbadr/256
add a,a ; hl=screen address table
ld l,a
; repeat 16 times
ld d,h
ld e,l
ld a,(hl)
inc l
ld h,(hl)
ld l,a
add hl,bc
res 7,a
ld sp,hl
exx ; 15
pop af:pop de ; 21
exx
add hl,bc
set 7,a
ld sp,hl
exx
push de:push af
exx
inc e:inc e
ld h,d
ld l,e ; 41 us
;
oldsp ld sp,0
ei
ret
; Example 3
backg4 di
ld b,0
ld c,h ; bc=X
ld a,l
defb #dd:ld h,tbadr/256
add a,a ; hl=screen address table
defb #dd:ld l,a
;
; repeat 16 times
pop hl
add hl,bc
res 7,h
ld sp,hl
pop de:pop af
;
add hl,bc
inc hl:inc l:inc hl:inc l
;
set 7,h
push af:push de
ld sp,ix
inc sp:inc sp ; 40 us
;
Note that I didn't finish some of these routines because they already used too much time....
Summary:
As the sprite is only 4 bytes by 16 lines, you only need 2 POPs (to read the data from the #4000 background) and 2 PUSHes (to write the data to #c000, the 1st physical screen).
Note that h=x and l=y. I first fetch the screen address for y, then I add x (saved in the bc register).
So hl now holds the new screen address.
Now hl=#c0xx. A simple res 7,h (you reset bit 7 of h) gives hl=#40xx.
A simple set 7,h (you set bit 7 of h again) gives hl=#c0xx.
Changing the state of bit 7 of h:
When bit 7=0, I can read data from my restore screen.
When bit 7=1, I can write data to the physical screen.
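The bit-7 trick in the summary above can be modelled in a few lines of Python. This is only a sketch of the addressing logic, not of the Z80 routine or its timing. It assumes a 64 KB memory image with the clean background at #4000 and the visible screen at #C000 in the standard 80-byte-wide layout. The helpers `line_address`, `restore_background` and `demo` are hypothetical names.

```python
SPRITE_WIDTH = 4     # bytes per sprite line
SPRITE_HEIGHT = 16   # pixel lines

def line_address(y):
    """Screen address of pixel line y (assumed standard layout at #C000)."""
    return 0xC000 + (y // 8) * 0x50 + (y % 8) * 0x800

def restore_background(memory, x, y):
    """Copy the 4x16 byte area from the #4000 buffer back to the #C000 screen."""
    for row in range(SPRITE_HEIGHT):
        dst = line_address(y + row) + x   # #C0xx + x, bit 7 set
        src = dst & 0x7FFF                # same effect as 'res 7,h' -> #40xx + x
        memory[dst:dst + SPRITE_WIDTH] = memory[src:src + SPRITE_WIDTH]

def demo():
    memory = bytearray(0x10000)
    for addr in range(0x4000, 0x8000):            # fake background pattern
        memory[addr] = addr & 0xFF
    memory[0xC000:0x10000] = memory[0x4000:0x8000]  # screen starts as a copy
    for row in range(SPRITE_HEIGHT):              # "draw" a sprite at x=10, y=20
        dst = line_address(20 + row) + 10
        memory[dst:dst + SPRITE_WIDTH] = b"\xff" * SPRITE_WIDTH
    restore_background(memory, 10, 20)
    return memory[0xC000:0x10000] == memory[0x4000:0x8000]

print(demo())
```

Because the two buffers differ only in bit 7 of the high byte, a single source address calculation serves both the read and the write side.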
Here comes a better version... 1 more NOP gained... So 3 NOPs gained since the start of this topic.
Thanks to Olivier for his precious help.
Oh, you want to know how I do it? Just replace the last ldi (5 NOPs) with ld a,(hl):ld (de),a (2+2 NOPs)
;
; Entry : H=x, L=y
;
di
ld (oldpi1+1),sp ; save the stack pointer
xor a ; Reset A
ld c,h ; c=h=X
sla l ; l=l*2
ld h,tbadr/256 ; hl=table (screen address)
ld sp,hl
ld b,a ; b=0
ld a,c ; save c in A (so A=x)
;
; 32 nops -> 8 rasterlines (for 16 lines)
pop hl ; hl=C0XX
add hl,bc ; bc=x ; hl=#c0xx+x
ld e,l:ld d,h ; de=hl=#c0xx+x
res 7,h ; hl=#40xx+x
ldi:ldi:ldi ; copy data from #4000 to #c000
Ld a,(hl):ld (de),a ; instead of the last ldi -> 1 nop win
ld c,a: ; Restore BC
; suppressing ld b,0 -> 2 nops win
; Copy 16 times for 16 lines
oldpi1 ld sp,0
ei
ret
Quote from: Ast on 17:48, 05 September 14
Here comes a better version... 1 more NOP gained... So 3 NOPs gained since the start of this topic.
Thanks to Olivier for his precious help.
Oh, you want to know how I do it? Just replace the last ldi (5 NOPs) with ld a,(hl):ld (de),a (2+2 NOPs)
I'm sorry but it's wrong
.....
ld a,c ; save c in A (so A=x)
.....
Ld a,(hl):ld (de),a ; instead of the last ldi -> 1 nop win
ld c,a: ; Restore BC
And you're right... A is supposed to hold the saved x, but ld a,(hl) overwrites it, so ld c,a no longer restores x and it doesn't work.... But wait, I'm working on a new version.
There is something else to optimise further....
For most kinds of sprites (depending on the way they are drawn), you could make one routine for horizontal movement
and a different one for vertical movement.
This way you only update the edges of each sprite.
So, in your example, you use only two LDIs instead of four for x movement.
You sacrifice some pixel detail of the background behind your sprite in exchange for more speed.
Just an idea...
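One way to picture the edge-only idea is to work out which byte columns are actually uncovered when the sprite moves horizontally. The Python sketch below is only an illustration under a strong assumption: the sprite is drawn as an opaque rectangle, so the columns it still covers get overwritten anyway. The function name `uncovered_columns` is made up.

```python
def uncovered_columns(old_x, new_x, width=4):
    """Byte columns covered at old_x but no longer covered at new_x."""
    old = set(range(old_x, old_x + width))
    new = set(range(new_x, new_x + width))
    return sorted(old - new)

# Moving a 4-byte-wide sprite 2 bytes to the right uncovers 2 columns,
# so only 2 bytes per line need restoring instead of 4.
print(uncovered_columns(10, 12))
```

A jump of 4 bytes or more uncovers the whole old rectangle, so the saving only applies to small steps.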
Hi!
Quote from: fgbrain
There is something else to optimise further....
He's right: no doubt there's much more CPU time to save by optimising the other parts of the code. I mean especially the way sprites are put on screen: there's likely more optimisation to find there (than in low-CPU-time background restore routines), as fgbrain wrote.
Anyway, here are my own thoughts about restoring the background.
Used tips are:
- stack not used, SP used to add on HL
- LDI then LDD on next line, and so on
- when adding 8 to H, if bit3=0, set 3,H is enough
- you'll need 8 variants of the call, one for each of the 8 possible starting pixel-lines
I won't code them all; I only wrote the call for one case.
In the sample code, the sprite starts at the 3rd pixel-line; it's 16 pixels high, so
. 6 pixel-lines to restore on the 1st character-line
. a full 8 pixel-lines to restore on the middle character-line: not in order, because res/set are used
. 2 pixel-lines to restore on the last character-line
ld de,#D000 ; case "starts at 3rd pixel-line" out of 8 cases
; init
ld bc,#8FF
ld sp,#50
ld h,d:ld l,e:res 7,h
; 6 pixels lines only
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 3,h:set 3,d ;010
ldd:ldd:ldd:ld a,(hl):ld (de),a:ld a,d:add a,b:ld d,a:ld h,d:res 7,h ;011
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 3,h:set 3,d ;100
ldd:ldd:ldd:ld a,(hl):ld (de),a:ld a,d:add a,b:ld d,a:ld h,d:res 7,h ;101
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 3,h:set 3,d ;110
ldd:ldd:ldd:ld a,(hl):ld (de),a ;111
; next character-line
add hl,sp:ld d,h:ld e,l:set 7,d
; full character = 8 pixel-lines
ldi:ldi:ldi:ld a,(hl):ld (de),a:res 3,h:res 3,d ;111
ldd:ldd:ldd:ld a,(hl):ld (de),a:res 4,h:res 4,d ;110
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 3,h:set 3,d ;100
ldd:ldd:ldd:ld a,(hl):ld (de),a:res 5,h:res 5,d ;101
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 4,h:set 4,d ;001
ldd:ldd:ldd:ld a,(hl):ld (de),a:res 3,h:res 3,d ;011
ldi:ldi:ldi:ld a,(hl):ld (de),a:res 4,h:res 4,d ;010
ldd:ldd:ldd:ld a,(hl):ld (de),a ;000
; next character-line
add hl,sp:ld d,h:ld e,l:set 7,d
; 2 pixels lines only
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 3,h:set 3,d ;000
ldd:ldd:ldd:ld a,(hl):ld (de),a ;001
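The unusual line order in the middle block above can be checked with a short Python sketch. It replays the res/set operations applied to H (taken from the listing) and records which pixel line, bits 5 to 3 of H, is addressed at each step. The starting value #78 is an assumption for a line-7 address in the #4000 buffer. The names `apply_op` and `visited_lines` are hypothetical.

```python
def apply_op(op, bit, value):
    """Emulate 'set bit,r' or 'res bit,r' on an 8-bit value."""
    mask = 1 << bit
    return (value | mask) if op == "set" else (value & ~mask)

# res/set operations on H between the 8 lines of the middle block
MIDDLE_BLOCK_OPS = [("res", 3), ("res", 4), ("set", 3), ("res", 5),
                    ("set", 4), ("res", 3), ("res", 4)]

def visited_lines(start_h, ops):
    """Pixel-line numbers (bits 5..3 of H) visited in order."""
    h = start_h
    lines = [(h >> 3) & 7]
    for op, bit in ops:
        h = apply_op(op, bit, h)
        lines.append((h >> 3) & 7)
    return lines

order = visited_lines(0x78, MIDDLE_BLOCK_OPS)
print(order)

# "set 3,h is enough": when bit 3 is clear, setting it equals adding 8.
set3_equals_add8 = all((h | 8) == h + 8 for h in range(256) if not h & 8)
print(set3_equals_add8)
```

Each of the 8 pixel lines is touched exactly once, each step costs a single 2-NOP bit operation per register, and the block ends on line 0, ready for the next add hl,sp.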
Definitely not my cup of tea, but a nice way to come back to z80 after Summer! :D
Well done, but the same idea as yesterday. :laugh:
Joking aside, it was exactly the optimisation I wanted. Thanks.
Quote from: fgbrain on 07:07, 06 September 14
There is something else to optimise further....
For most kinds of sprites (depending on the way they are drawn), you could make one routine for horizontal movement
and a different one for vertical movement.
This way you only update the edges of each sprite.
So, in your example, you use only two LDIs instead of four for x movement.
You sacrifice some pixel detail of the background behind your sprite in exchange for more speed.
Just an idea...
That works very well for very big sprites, but if a sprite is only about 16*24 or so, the math to calculate it takes more µs than a brute-force "restoration" algorithm. So yes, size matters ;-)
Right, but it's also good for what I need, 4 bytes by 16 lines.... ;D