BackGround Restore

Started by Ast, 16:50, 02 September 14

Ast

Hi everybody,
I'm currently working on a game/demo, and more precisely on its sprite display routines.
My background restore routine takes approximately 8 raster lines per sprite (mode 1 sprite, 4 bytes wide by 16 lines).

Is it possible to do better? Note that I use 4 LDIs per line to copy the data. After many attempts (using the stack, for example), I have to admit that I failed.

Can someone suggest a faster way to restore the background?

Thanks for any reply.

_____________________

Ast/iMP4CT. "By the power of Grayskull, i've the power"

http://amstradplus.forumforever.com/index.php
http://impdos.wikidot.com/
http://impdraw.wikidot.com/

All friends are welcome !

TFM

So you need about 32 µs for every scanline (4 bytes). That is 5*4 = 20 µs for the four LDIs, which leaves about 12 µs for changing the target address. You could optimize that a bit.
It also depends on whether you have a single-color background or a background with graphics.

Taken all together, your routines seem to be very good.  :)
TFM of FutureSoft
Also visit the CPC and Plus users favorite OS: FutureOS - The Revolution on CPC6128 and 6128Plus

Ast

Thanks TFM. Of course, I was talking about a background with graphics. If the background were empty, I'd use the stack pointer to do the job.

Here comes the complete code:




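        ;
        ; Entry: H = x, L = y
        ;
        di
        ld (oldpi1+1),sp    ; save the stack pointer (patches the ld sp below)

        xor a               ; A = 0
        ld c,h              ; C = x
        sla l               ; L = y*2 (two bytes per table entry)
        ld h,tbadr/256      ; HL = entry in the screen address table
        ld sp,hl            ; SP reads the table
        ld b,a              ; B = 0, so BC = x
        ld a,c              ; keep a copy of x in A
        ;
        ; 32 NOPs -> 8 raster lines (for 16 lines)
        ;
        pop hl              ; HL = #C0xx (start of the screen line)
        add hl,bc           ; HL = #C0xx + x
        ld e,l:ld d,h       ; DE = HL = destination on screen
        res 7,h             ; HL = #40xx + x = source in the background copy
        ldi:ldi:ldi:ldi     ; copy 4 bytes from #4000 to #C000
        ld c,a:ld b,0       ; restore BC = x (LDI decrements BC)

        ; repeat the block above 16 times, once per line

oldpi1  ld sp,0             ; operand patched with the saved SP
        ei
        ret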
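
As a rough cross-check of the figure in the comment above, the per-line cost can be tallied in a few lines of Python. The NOP costs below are the commonly quoted CPC timings (1 NOP = 1 µs); they are an assumption of this sketch rather than something measured in the thread, and the helper name `tally` is made up for illustration.

```python
# Assumed CPC instruction costs in NOPs (1 NOP = 1 microsecond).
NOPS = {
    "pop hl": 3,
    "add hl,bc": 3,
    "ld e,l": 1,
    "ld d,h": 1,
    "res 7,h": 2,
    "ldi": 5,
    "ld c,a": 1,
    "ld b,0": 2,
}

# One line of the restore loop, as posted above.
LINE_BODY = [
    "pop hl", "add hl,bc", "ld e,l", "ld d,h", "res 7,h",
    "ldi", "ldi", "ldi", "ldi",
    "ld c,a", "ld b,0",
]

def tally(instructions):
    """Sum the assumed NOP cost of a list of instructions."""
    return sum(NOPS[name] for name in instructions)

per_line = tally(LINE_BODY)
per_sprite = per_line * 16          # 16 lines per sprite
raster_lines = per_sprite / 64      # one CPC raster line lasts 64 NOPs

print(per_line, per_sprite, raster_lines)
```

Under these assumptions the body comes to 33 NOPs per line, in line with the roughly 32 NOPs and 8 raster lines quoted, and the split becomes visible: 20 NOPs of copying against 13 NOPs of address handling.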



I have added some comments.

TFM

Well, if the background were empty, then using SP to blank the 16 KB of video RAM is of course very quick. But if you keep a table with the video RAM addresses (e.g. the upper left corner) of all the sprites you put on screen, you don't need to clear all 16 KB. Just go back to this table and clear the area each sprite occupied before. OK, the disadvantage is that you need a "clear sprite" routine for every sprite size. But it can save you a frame or two.  :)


Regarding your code... pretty nice, by the way. Which screen format do you use (x and y)? Do you use hardware scrolling?

Ast

@TFM:

All the y lines are precalculated in a table like this:

#C000, #C800, #D000, ... etc.

line 0 -> #C000
line 1 -> #C800

The x value can be, for example, 0 to 80.

Then I do this:

look up the address of line y and add x to get the final address.

If x=10 and line=2 -> address = #D000 + x

No BC26 trick, just a pop hl that gives us the address of line y!

(-:
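
The lookup described above can be sketched in Python. The sketch assumes the standard CPC screen layout (base #C000, 80 bytes per character row, 8 pixel lines per row spaced #800 apart); the thread only shows the first table entries, so the formula and the names `line_address` and `screen_address` are assumptions for illustration.

```python
SCREEN_BASE = 0xC000    # first physical screen, as in the posts above
BYTES_PER_ROW = 80      # assumed standard screen width in bytes
LINE_STRIDE = 0x800     # distance between two pixel lines of one character row

def line_address(y):
    """Address of the first byte of pixel line y (assumed standard layout)."""
    return SCREEN_BASE + (y // 8) * BYTES_PER_ROW + (y % 8) * LINE_STRIDE

# Precalculated table, one entry per pixel line (what tbadr would hold).
TABLE = [line_address(y) for y in range(200)]

def screen_address(x, y):
    """Table lookup plus x, mirroring 'pop hl' followed by 'add hl,bc'."""
    return TABLE[y] + x

print([hex(a) for a in TABLE[:3]])     # first three entries
print(hex(screen_address(10, 2)))      # the x=10, line=2 example
```

The first three entries come out as #C000, #C800 and #D000, matching the values listed in the post.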

My routine uses triple buffering and I'm very proud of it, but I don't believe I can optimise it any further on my own.
I want to find a faster way to restore the background.


This will be used in my next demo and in my first CPC Plus game. As for hardware scrolling, I really don't know yet what my choice will be, but I'd love to play a game that uses it. So wait and see.

TFM

Triple buffering ... can't have too much beer that way!  8)

Looking forward to seeing one of your upcoming prods.  :)

Ast

Thanks... I gained 2 NOPs by removing ld b,0 completely :-) ^^


Replacing ld b,0 with ld b,a was a good idea, but I can't do it because register A is already used to save register C.

TFM

Yes, saw that.... too late ;-)

Ast

To sum up: I save 2 µs per line, so 32 µs over my whole background restore routine. But I'm still looking for a better restore routine...
Maybe Captain Future will find a new way. Who knows?

Executioner

Quote from: Ast on 23:06, 02 September 14
To sum up: I save 2 µs per line, so 32 µs over my whole background restore routine. But I'm still looking for a better restore routine...

Maybe use POP and PUSH, since it's only 4 bytes per tile:


ld hl,base_address
ld sp,hl
pop bc
pop de
set 7,h
push de
push bc
res 7,h

; and so on


Ast

Good idea (I was already thinking of doing it this way), but I use the stack pointer to fetch the screen address, so it will be slower this way.

TFM

Quote from: Executioner on 01:35, 03 September 14
Maybe use POP and PUSH, since it's only 4 bytes per tile:


ld hl,base_address
ld sp,hl
pop bc
pop de
set 7,h
*
push de
push bc
res 7,h
*
; and so on



* Shouldn't this contain a LD SP,HL ?

Ast

I'll give it a try and post the latest result here.

Executioner

Quote from: TFM on 20:03, 03 September 14
* Shouldn't this contain a LD SP,HL ?

Yeah, wouldn't work very well without it :)
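
To illustrate the point about SP, here is a small Python model of the POP/PUSH copy. It is a sketch under assumptions: memory is a plain 64 KB bytearray, the helper names are hypothetical, and SP is repositioned to the destination plus 4 before the pushes, because PUSH decrements SP before it writes.

```python
def pop16(mem, sp):
    """Model of POP: read a little-endian word at SP, then SP += 2."""
    value = mem[sp] | (mem[(sp + 1) & 0xFFFF] << 8)
    return value, (sp + 2) & 0xFFFF

def push16(mem, sp, value):
    """Model of PUSH: SP -= 2, then store the word little-endian at SP."""
    sp = (sp - 2) & 0xFFFF
    mem[sp] = value & 0xFF
    mem[(sp + 1) & 0xFFFF] = value >> 8
    return sp

def stack_copy4(mem, src, reposition_sp=True):
    """Copy 4 bytes from src (#4000 buffer) to src|#8000 (#C000 screen)."""
    sp = src                                   # ld sp,hl
    bc, sp = pop16(mem, sp)                    # pop bc
    de, sp = pop16(mem, sp)                    # pop de
    if reposition_sp:
        sp = ((src | 0x8000) + 4) & 0xFFFF     # point SP just past the destination
    sp = push16(mem, sp, de)                   # push de
    sp = push16(mem, sp, bc)                   # push bc
    return sp

mem = bytearray(0x10000)
mem[0x4000:0x4004] = bytes([0x11, 0x22, 0x33, 0x44])
stack_copy4(mem, 0x4000)
print(mem[0xC000:0xC004].hex())      # destination after the copy

mem2 = bytearray(0x10000)
mem2[0x4000:0x4004] = bytes([0x11, 0x22, 0x33, 0x44])
stack_copy4(mem2, 0x4000, reposition_sp=False)
print(mem2[0xC000:0xC004].hex())     # destination if SP is never moved
```

In this model, leaving SP untouched makes the pushes land back on the source bytes, so the screen never changes. Note also that the new SP has to be the destination address plus 4, not the destination address itself.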

Ast

Unfortunately, all my tests take more machine time, approximately 4 to 10 NOPs more... :o




TFM

Empires rise and fall in 10 NOPs  :laugh:


Well, since I love to work with overscan I never use triple buffering, but...

Let's also add something like this to Executioner's routine:

EXX
POP BC
POP DE
POP HL
EXX

Alternatively, you can also use IX and IY (even if they are 1 µs slower).

It all depends on how wide your sprite is in X.

Ast

Quote from: TFM on 16:10, 04 September 14
Empires rise and fall in 10 NOPs  :laugh:


Well, since I love to work with overscan I never use triple buffering, but...

Let's also add something like this to Executioner's routine:

EXX
POP BC
POP DE
POP HL
EXX

Alternatively, you can also use IX and IY (even if they are 1 µs slower).

It all depends on how wide your sprite is in X.


That's exactly what I do. The sprite is 4 bytes by 16 lines (mode 1).


Some examples? Here they come:




;
;   Restore BackGround
;   Using the Sp Register
;   h=x ; l=y
;


; Example 1


backg4   di
   
   ld b,0
   ld c,h      ; bc=X
   ld a,l
   ld h,tbadr/256
   add a,a      ; hl=screen adress table
   ld l,a
   ld sp,hl   
   
   ; repeat 16 times
   pop hl          ; hl=#C0xx
   add hl,bc       ; hl=#C0xx+x
   res 7,h         ; h=#40
   ld d,h          ; d=#40
   ld e,l          ; e=xx+x
   ld sp,hl        ; sp=#40xx+x
   pop ix:pop af   ; sp=sp+4
   set 7,h         ; hl=#C0xx+x
   add hl,bc
   push af:push ix
   ex de,hl
   inc hl:inc hl
   ld sp,hl        ; 38 nops
   ;


; Example 2


backg4   di
   ld bc,4
   exx   
   ld b,0
   ld c,h      ; bc=X
   ld a,l
   
   ld h,tbadr/256
   add a,a      ; hl=screen adress table
   ld l,a

   ; repeat 16 times
   ld d,h     
   ld e,l
   ld a,(hl)
   inc l
   ld h,(hl)
   ld l,a
   add hl,bc
   res 7,a
   ld sp,hl
   exx      ; 15
   pop af:pop de   ; 21
   exx
   add hl,bc
   set 7,a
   ld sp,hl
   exx
   push de:push af
   exx
   inc e:inc e
   ld h,d
   ld l,e        ; 41 us
;


oldsp    ld sp,0
   ei
   ret


; Example 3


backg4   di
   
   ld b,0
   ld c,h      ; bc=X
   ld a,l
   
   defb #dd:ld h,tbadr/256
   add a,a      ; hl=screen adress table
   defb #dd:ld l,a


;
;   repeat 16 times
   pop hl
   add hl,bc
   res 7,h
   ld sp,hl
   pop de:pop af
;
  add hl,bc
  inc hl:inc l:inc hl:inc l
;
   set 7,h
   push af:push de
   ld sp,ix
   inc sp:inc sp            ; 40 us
;



Note that I didn't finish some of these routines because they were already taking too much time...

Ast

Summary:


As the sprite is only 4 bytes by 16 lines, you only need 2 POPs (to read the data from the background copy at #4000) and 2 PUSHes (to write the data to the first physical screen at #C000).


Remember that H=x and L=y. I first fetch the screen address for y, then I add x (saved in the BC register),
so HL holds the new screen address.
Now HL=#C0xx. A simple res 7,h resets bit 7 of H, so HL=#40xx.
A simple set 7,h sets bit 7 of H again, so HL=#C0xx.


Changing the state of bit 7 of H:


When bit 7=0, I can read data from my restore screen.
When bit 7=1, I can write data to the physical screen.
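
The bit 7 trick works because the two buffers sit exactly #8000 apart, so their addresses differ only in the top bit. A minimal Python sketch (the function names are hypothetical) shows the mapping:

```python
def to_background(addr):
    """Model of 'res 7,h': clear bit 15 of the 16-bit address."""
    return addr & 0x7FFF

def to_screen(addr):
    """Model of 'set 7,h': set bit 15 of the 16-bit address."""
    return addr | 0x8000

screen = 0xD00A                      # e.g. line 2, x = 10
background = to_background(screen)
print(hex(screen), hex(background), hex(to_screen(background)))
```

The offset inside the buffer (the low bits) is untouched, which is why one RES or SET is enough to switch between source and destination.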




Ast

Here comes a better version... 1 more NOP gained, so 3 NOPs gained since the start of this topic.

Thanks to Olivier for his precious help.

Oh, you want to know how I did it? Just replace the last ldi (5 NOPs) with ld a,(hl):ld (de),a (2+2 NOPs).



        ;
        ; Entry: H = x, L = y
        ;
        di
        ld (oldpi1+1),sp    ; save the stack pointer

        xor a               ; A = 0
        ld c,h              ; C = x
        sla l               ; L = y*2
        ld h,tbadr/256      ; HL = entry in the screen address table
        ld sp,hl
        ld b,a              ; B = 0
        ld a,c              ; save C in A (so A = x)
        ;
        ; 32 NOPs -> 8 raster lines (for 16 lines)
        ;
        pop hl              ; HL = #C0xx
        add hl,bc           ; BC = x, so HL = #C0xx + x
        ld e,l:ld d,h       ; DE = HL = #C0xx + x
        res 7,h             ; HL = #40xx + x
        ldi:ldi:ldi         ; copy data from #4000 to #C000
        ld a,(hl):ld (de),a ; instead of the last ldi -> 1 NOP gained

        ld c,a              ; restore BC
                            ; ld b,0 removed -> 2 NOPs gained

        ; repeat 16 times for 16 lines

oldpi1  ld sp,0
        ei
        ret

Urusergi

Quote from: Ast on 17:48, 05 September 14
Here comes a better version... 1 more NOP gained, so 3 NOPs gained since the start of this topic.

Thanks to Olivier for his precious help.

Oh, you want to know how I did it? Just replace the last ldi (5 NOPs) with ld a,(hl):ld (de),a (2+2 NOPs).

I'm sorry, but that's wrong: A no longer holds x when BC is restored.


.....
   ld a,c               ; save c in A (so A=x)
.....

   ld a,(hl):ld (de),a  ; instead of the last ldi -> 1 NOP gained


   ld c,a               ; restore BC

Ast

And you're right... indeed, C has to get x back, so it doesn't work. But wait, I'm working on a new version.
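
The problem can be reproduced with a small Python model of one loop iteration. This is a sketch, not the author's code: registers are held in a dict, flags are ignored, and the variant names (`ldi4`, `ldi4_no_ld_b`, `ldi3`) are labels invented here for the three versions posted in this topic.

```python
def ldi(mem, r):
    """Model of LDI: (DE) <- (HL), then HL += 1, DE += 1, BC -= 1."""
    mem[r["de"]] = mem[r["hl"]]
    r["hl"] = (r["hl"] + 1) & 0xFFFF
    r["de"] = (r["de"] + 1) & 0xFFFF
    r["bc"] = (r["bc"] - 1) & 0xFFFF

def restore_line(mem, line_addr, x, variant):
    """Run one line of the loop and return BC as left for the next line."""
    r = {"a": x, "bc": x}                      # setup: A = x, BC = x
    r["de"] = (line_addr + r["bc"]) & 0xFFFF   # pop hl / add hl,bc / ld e,l:ld d,h
    r["hl"] = r["de"] & 0x7FFF                 # res 7,h
    if variant == "ldi3":
        for _ in range(3):
            ldi(mem, r)
        r["a"] = mem[r["hl"]]                  # ld a,(hl) overwrites the saved x
        mem[r["de"]] = r["a"]                  # ld (de),a
    else:
        for _ in range(4):
            ldi(mem, r)
    r["bc"] = (r["bc"] & 0xFF00) | r["a"]      # ld c,a
    if variant == "ldi4":
        r["bc"] &= 0x00FF                      # ld b,0
    return r["bc"]

def fresh_memory():
    """64 KB of zeroes with recognisable background bytes at #4000."""
    mem = bytearray(0x10000)
    for offset in range(0x10):
        mem[0x4000 + offset] = 0xA0 + offset
    return mem

for variant, x in [("ldi4", 10), ("ldi4_no_ld_b", 10),
                   ("ldi4_no_ld_b", 2), ("ldi3", 10)]:
    mem = fresh_memory()
    bc = restore_line(mem, 0xC000, x, variant)
    print(variant, x, hex(bc), mem[0xC000 + x:0xC004 + x].hex())
```

In this model the `ldi3` variant still copies the four bytes but leaves C holding the last background byte instead of x, which matches Urusergi's objection. It also suggests that dropping ld b,0 is only safe when x is at least 4, because otherwise the four LDIs borrow from B.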

fgbrain

There is something else to optimise further...

For most kinds of sprites (it depends on the way they are drawn), you could write one routine for horizontal movement
and a different one for vertical movement.
This way you only update the edges of each sprite.
So, in your example, you use only two LDIs instead of four for x movement.

You sacrifice some pixel detail of the background behind your sprite for more speed.

Just an idea...
_____

6128 (UK keyboard, Crtc type 0/2), 6128+ (UK keyboard), 3.5" and 5.25" drives, Reset switch and Digiblaster (selfmade), Inicron Romram box, Bryce Megaflash, SVideo & PS/2 mouse, Magnum Lightgun, X-MEM, X4 Board, C4CPC, Multiface2 X4, RTC X4 and Gotek USB Floppy emulator.

Overflow

Hi!

Quote from: fgbrain
There is something else to optimise further....
He's right: no doubt there is much more CPU time to save by optimising the other parts of the code. I mean especially the way sprites are put on screen: there is likely more optimisation to find there (than in a low-CPU-time background restore routine), as fgbrain wrote.

Anyway, here are my own thoughts about restoring the background.
The tricks used are:
- the stack is not used; SP is used as a constant to add to HL
- LDI on one line, then LDD on the next line, and so on
- when adding 8 to H, if bit 3 is 0, set 3,H is enough
- you'll need 8 variants of the routine, one for each of the 8 possible starting pixel-lines
I won't code them all; I only wrote one case.
In the sample code, the sprite starts at the 3rd pixel-line; it is 16 pixels high, so there are
  .  6 pixel-lines to restore on the 1st character-line
  .  8 full pixel-lines to restore on the middle character-line: not in order, because res/set are used
  .  2 pixel-lines to restore on the last character-line

ld de,#D000 ; case "starts at 3rd pixel-line" out of 8 cases

; init
ld bc,#8FF
ld sp,#50
ld h,d:ld l,e:res 7,h 

; 6 pixels lines only
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 3,h:set 3,d                      ;010
ldd:ldd:ldd:ld a,(hl):ld (de),a:ld a,d:add a,b:ld d,a:ld h,d:res 7,h ;011
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 3,h:set 3,d                      ;100
ldd:ldd:ldd:ld a,(hl):ld (de),a:ld a,d:add a,b:ld d,a:ld h,d:res 7,h ;101
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 3,h:set 3,d                      ;110
ldd:ldd:ldd:ld a,(hl):ld (de),a                                      ;111

; next character-line
add hl,sp:ld d,h:ld e,l:set 7,d

; full character = 8 pixel-lines
ldi:ldi:ldi:ld a,(hl):ld (de),a:res 3,h:res 3,d ;111
ldd:ldd:ldd:ld a,(hl):ld (de),a:res 4,h:res 4,d ;110
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 3,h:set 3,d ;100
ldd:ldd:ldd:ld a,(hl):ld (de),a:res 5,h:res 5,d ;101
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 4,h:set 4,d ;001
ldd:ldd:ldd:ld a,(hl):ld (de),a:res 3,h:res 3,d ;011
ldi:ldi:ldi:ld a,(hl):ld (de),a:res 4,h:res 4,d ;010
ldd:ldd:ldd:ld a,(hl):ld (de),a                 ;000

; next character-line
add hl,sp:ld d,h:ld e,l:set 7,d

; 2 pixels lines only
ldi:ldi:ldi:ld a,(hl):ld (de),a:set 3,h:set 3,d ;000
ldd:ldd:ldd:ld a,(hl):ld (de),a                 ;001
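
The set/res ordering in the middle block can be checked with a short Python sketch. It assumes, as in the standard CPC layout, that bits 3 to 5 of H select the pixel-line inside a character-line; the operation list is transcribed from the "full character" block above and the function name is hypothetical.

```python
# Bit operations applied to H (and D) after each copied line.
FULL_ROW_OPS = [
    ("res", 3), ("res", 4), ("set", 3), ("res", 5),
    ("set", 4), ("res", 3), ("res", 4),
]

def walk_lines(start_h, ops):
    """Return the pixel-line index (bits 3-5 of H) at the start and after each op."""
    h = start_h
    visited = [(h >> 3) & 7]
    for op, bit in ops:
        if op == "set":
            h |= 1 << bit
        else:
            h &= ~(1 << bit) & 0xFF
        visited.append((h >> 3) & 7)
    return visited

order = walk_lines(0x78, FULL_ROW_OPS)   # H = #78: line 7 of a row in the #4000 buffer
print(order)
print(sorted(order) == list(range(8)))
```

Each step changes a single bit, so one SET or RES per register replaces the 8-bit addition, and all eight pixel-lines are still visited exactly once.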



Definitely not my cup of tea, but a nice way to come back to z80 after Summer! :D



Unregistered from CPCwiki forum.

Ast

Well done, but it's the same idea as yesterday's.  :laugh:


Joking aside, it was exactly the kind of optimisation I wanted. Thanks.

TFM

Quote from: fgbrain on 07:07, 06 September 14
There is something else to optimise further...

For most kinds of sprites (it depends on the way they are drawn), you could write one routine for horizontal movement
and a different one for vertical movement.
This way you only update the edges of each sprite.
So, in your example, you use only two LDIs instead of four for x movement.

You sacrifice some pixel detail of the background behind your sprite for more speed.

Just an idea...


That works very well for very big sprites, but if a sprite is only about 16*24 or so, the math needed to work out the edges takes more µs than a brute-force "restoration" algorithm. So yes, size matters ;-)
