Calculating next line down when drawing a sprite

ervin · 14:03, 09 January 24

Hi folks.

For the last few days I've been working on a simple sprite routine, and I'm trying to change the way I calculate the next line down.
In the past I've always had a lookup table called yAddress, from which I simply look up the address of any given line on-screen (yes, for each line of a sprite).

I know that's a naive way of doing it, which is why I want to learn the "proper" way of moving down the screen when drawing a sprite.
The difficulty I'm having of course is the point at which video memory wraps around every 8 lines.

I've looked at a number of routines, including those used by ChibiAkumas, cpctelera, and several other examples I've found in posts in the forums.
But no matter what, I just don't seem to understand what is going on in those routines.

The most basic example is to check the carry flag after adding &0800 to my video RAM pointer, but I'm not sure how that technique will cope with a double-buffered screen, where writing to a 2nd screen at &8000 won't overflow the way that writing to a screen at &C000 would. I can't think of a way to catch an overflow (which would set the carry flag) on the 2nd screen.

Then there are other ways of moving down the screen, including the mysterious use of "AND &38" used by cpctelera amongst others. I have no idea what that does or why it works.

This is from the cpctelera cpct_drawSprite function.
(Of course all the LDIs can be ignored for the purposes of this question. I understand the purpose of them).

;; Input Parameters (6 bytes):
;;  (2B HL) sprite - Source Sprite Pointer (array with pixel data)
;;  (2B DE) memory - Destination video memory pointer
;;  (1B C ) width  - Sprite Width in *bytes* [1-63] (Beware, *not* in pixels!)
;;  (1B B ) height - Sprite Height in bytes (>0)

   ;; Modify code using width to jump in drawSpriteWidth
   ld    a, #126           ;; [2] We need to jump 126 bytes (63 LDIs*2 bytes) minus the width of the sprite * 2 (2B)
   sub   c                 ;; [1]    to do as much LDIs as bytes the Sprite is wide
   sub   c                 ;; [1]
   ld (ds_drawSpriteWidth+#4), a ;; [4] Modify JR data to create the jump we need

   ld    a, b              ;; [1] A = Height (used as counter for the number of lines we have to copy)
   ex   de, hl             ;; [1] Instead of jumping over the next line, we do the inverse operation (because 
                           ;; .... it is only 4 cycles and not 10, as a JP would be)

ds_drawSpriteWidth_next:
   ;; NEXT LINE
   ex   de, hl             ;; [1] HL and DE are exchanged every line to do 16bit maths with DE. 
                           ;; .... This line reverses it before proceeding to copy the next line.
ds_drawSpriteWidth:
   ;; Draw a sprite-line of n bytes
   ld   bc, #0x800  ;; [3] 0x800 bytes is the distance in memory from one pixel line to the next within every 8 pixel lines
                    ;; ... Each LDI performed will decrease this by 1, as we progress through memory copying the present line
   jr__0            ;; [3] Self modifying instruction: the '00' will be substituted by the required jump forward. 
                    ;; ... (Note: Writing JR 0 compiles but later it gives odd linking errors)
   ldi              ;; [5] <| 63 LDIs, which are able to copy up to 63 bytes each time.
   ldi              ;; [5]  | That means that each Sprite line should be 63 bytes width at most.
   ldi              ;; [5]  | The JR instruction at the start makes us ignore the LDIs we don't need 
   ldi              ;; [5] <| (jumping over them) That ensures we will be doing only as much LDIs 
   ldi              ;; [5] <| as bytes our sprite is wide.
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
   ldi              ;; [5] <|
   ldi              ;; [5] <|
   ldi              ;; [5]  |
   ldi              ;; [5]  |
 
   dec   a          ;; [1] Another line finished: we discount it from A
   ret   z          ;; [2/4] If that was the last line, we safely return

   ;; Jump destination pointer to the start of the next line in video memory
   ex   de, hl      ;; [1] DE has destination, but we have to exchange it with HL to be able to do 16bit maths
   add  hl, bc      ;; [3] We add 0x800 minus the width of the sprite (BC) to destination pointer 
   ld    b, a       ;; [1] Save A into B (B = A)
   ld    a, h       ;; [1] We check if we have crossed video memory boundaries (which will happen every 8 lines). 
                    ;; .... If that happens, bits 13,12 and 11 of destination pointer will be 0
   and   #0x38      ;; [2] keep only bits 13, 12 and 11 of the new memory address (00xxx000 00000000)
   ld    a, b       ;; [1] Restore A from B (A = B)
   jp   nz, ds_drawSpriteWidth_next ;; [3] If any bit from {13,12,11} is not 0, we are still inside 
                                    ;; .... video memory boundaries, so proceed with next line

   ;; Every 8 lines, we cross the 16K video memory boundaries and have to
   ;; reposition destination pointer. That means our next line is 16K-0x50 bytes back
   ;; which is the same as advancing 48K+0x50 = 0xC050 bytes, as memory is 64K 
   ;; and our 16bit pointers cycle over it
   ld   bc, #0xC050           ;; [3] We advance destination pointer to next line
   add  hl, bc                ;; [3]  HL += 0xC050
   jp ds_drawSpriteWidth_next ;; [3] Continue copying

Does someone have a few moments to explain a simple, fast technique? And how it works?
I'd be ever so grateful for any assistance.

Thanks!

Prodatron · 14:53, 09 January 24

ld a,h
and #38
->
test hl,%00111000 00000000

This checks whether the line-within-character position (the xxx bits in ..xxx........... of the screen address) has wrapped back to 0.
If it has, a new character row has been reached, so you have to add #C050 (if the screen width is 80 bytes).
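
As a rough sketch of why this test works wherever the screen sits, the address arithmetic can be modelled in Python. It assumes a standard screen that is 80 bytes wide and starts on a 16K boundary; the function names are illustrative and not part of cpctelera. Bits 13..11 of the address hold the line within the character row, so they are all zero only right after the +#800 step has overflowed out of the 16K bank. Unlike a carry check, this also works for a screen at #8000.

```python
# Model of the CPC screen address maths (80-byte-wide screen assumed).
# All names are hypothetical helpers for illustration only.

def line_address(base, y, x=0):
    """Address of byte x on pixel line y for a 16K screen starting at base."""
    return (base + (y // 8) * 0x50 + (y % 8) * 0x800 + x) & 0xFFFF

def next_line(addr):
    """One pixel line down: ADD #800, test bits 13..11, ADD #C050 on wrap."""
    addr = (addr + 0x800) & 0xFFFF        # next line inside the same character row
    if (addr >> 8) & 0x38 == 0:           # LD A,H / AND #38 gives zero: bank overflowed
        addr = (addr + 0xC050) & 0xFFFF   # same as going back #4000 and forward #50
    return addr

def walk(base, x, lines):
    """Addresses of `lines` consecutive pixel lines, starting at the top line."""
    addr = line_address(base, 0, x)
    out = [addr]
    for _ in range(lines - 1):
        addr = next_line(addr)
        out.append(addr)
    return out
```

For example, next_line(0xB80A) lands on #805A even though adding #800 to #B80A never sets the carry flag.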

If your screens are at #4000 and #C000, there is a faster way:

Replace...

   ld    b, a       ;; [1] Save A into B (B = A)
   ld    a, h       ;; [1] We check if we have crossed video memory boundaries (which will happen every 8 lines). 
                    ;; .... If that happens, bits 13,12 and 11 of destination pointer will be 0
   and   #0x38      ;; [2] keep only bits 13, 12 and 11 of the new memory address (00xxx000 00000000)
   ld    a, b       ;; [1] Restore A from B (A = B)
   jp   nz, ds_drawSpriteWidth_next ;; [3] If any bit from {13,12,11} is not 0, we are still inside 
                                    ;; .... video memory boundaries, so proceed with next line

With...

   bit   6, h       ;; [2] bit 14 of the address: set inside #4000-#7FFF and #C000-#FFFF
   jp   nz, ds_drawSpriteWidth_next ;; [3] still inside the screen bank, so proceed with next line
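
A small Python model of this shortcut, under the same assumptions as the thread (80-byte-wide screen, hypothetical helper names), suggests why it depends on the screen bases: for screens at #4000 and #C000, bit 14 of the address is set inside the bank and clear just after an overflow, so one bit is enough. For a screen at #8000, bit 14 is always clear and the test misfires.

```python
# Sketch only: models BIT 6,H as the wrap test. Names are illustrative.

def line_address(base, y, x=0):
    """Address of byte x on pixel line y for a 16K screen starting at base."""
    return (base + (y // 8) * 0x50 + (y % 8) * 0x800 + x) & 0xFFFF

def next_line_bit6(addr):
    """One pixel line down, using only bit 6 of H (bit 14 of the address)."""
    addr = (addr + 0x800) & 0xFFFF
    if (addr >> 8) & 0x40 == 0:           # BIT 6,H sets Z: we left the screen bank
        addr = (addr + 0xC050) & 0xFFFF   # reposition to the next character row
    return addr
```

With a screen at #8000 the very first step already goes wrong: next_line_bit6(0x8000) gives #4850 instead of #8800, so the AND #38 version is the one to use there.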

ervin · 15:05, 09 January 24

Thanks!!!
I'll spend some time to analyse your suggestion.

ervin · 07:13, 10 January 24

Thanks @Prodatron !!!
That technique is brilliant!
Very simple, and very fast.

roudoudou · 08:22, 10 January 24

Since the adjustment is only needed on 1 line in 8 (12.5% of the time), you can move it into a routine and use CALL Z,adjust

and if your adjust routine is located in #38, you can do something like RST Z,#38 (in fact a JR Z,$-1) saving space and time

old forgotten tips (re)discovered in 64Nops issue 1

you can also check some old topics here
https://www.cpcwiki.eu/forum/programming/ufd-tecnnology-(ultra-fast-drawing)/
https://www.cpcwiki.eu/index.php/Programming:Fast_Sprites

Prodatron · 08:51, 10 January 24

JR Z,$+1
(but usually you need RST#38 for interrupt stuff)

roudoudou · 09:10, 10 January 24

No, $-1 because the relative address will be encoded as #FF which is also RST #38 opcode

Prodatron · 11:15, 10 January 24

Oh, at least in WinApe $ is the address at which the current instruction starts. The Z80 uses the address after the 2-byte JR instruction as the origin for the relative value.

So if you have
#4000 JR $-1
it means a jump to #3FFF, so the assembler sets the relative value to -3 (#FD), because the origin is #4002.

If you have
#4000 JR $+1
it means a jump to #4001, so the relative value is -1 (#FF), which is also the opcode for RST #38 and is located at #4001.

I didn't know that $ could be different in other assemblers?

Or do you mean JR -1 instead of JR $-1? That would be OK if the assembler accepts it that way (WinApe, however, would then try to jump to address #FFFF).
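
The displacement arithmetic from this post can be checked with a few lines of Python. This is only a sketch of the encoding rule (the offset is measured from the byte after the 2-byte JR); the function name is made up for illustration.

```python
# Sketch of how a JR displacement byte is derived. Hypothetical helper name.

def jr_displacement(jr_address, target):
    """Displacement byte for a JR located at jr_address that jumps to target."""
    offset = target - (jr_address + 2)    # origin is the address after the JR
    if not -128 <= offset <= 127:
        raise ValueError("target is out of range for JR")
    return offset & 0xFF                  # two's complement, as stored in memory

# JR Z has opcode #28, so JR Z,$+1 at #4000 assembles to the bytes #28 #FF.
JR_Z_PLUS_1 = bytes([0x28, jr_displacement(0x4000, 0x4001)])
```

The second byte, #FF, is the RST #38 opcode, which is what the CPU executes when the jump to #4001 is taken.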

roudoudou · 11:25, 10 January 24

you make me doubt ^_^ i use the fake opcode RST <cond>,#38 anyway

you're right, it's $+1 !

ervin · 11:42, 10 January 24

Quote from: roudoudou on 08:22, 10 January 24:
since the check occurs only 12% of the time, you can do a routine for this, with CALL Z,adjust

and if your adjust routine is located in #38, you can do something like RST Z,#38 (in fact a JR Z,$-1) saving space and time

old forgotten tips (re)discovered in 64Nops issue 1

you can also check some old topics here
https://www.cpcwiki.eu/forum/programming/ufd-tecnnology-(ultra-fast-drawing)/
https://www.cpcwiki.eu/index.php/Programming:Fast_Sprites

Thanks!
I'd love to try something with RST, but I don't understand how that sort of thing works.
Whenever people talk about RST #38, I get very confused.

andycadley · 12:44, 10 January 24

RST is just a very specific CALL instruction with a limited number of fixed addresses it can use, the advantage being it's a single byte opcode and thus faster.
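
To make the comparison concrete, here is a small Python sketch of the two encodings. The helper names are invented for illustration; the opcode values are the standard Z80 ones (RST n is the single byte #C7 OR n, CALL nn is #CD followed by the address, low byte first).

```python
# Sketch comparing the byte encodings of RST and CALL. Names are illustrative.

RST_TARGETS = (0x00, 0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38)

def rst_opcode(target):
    """Single-byte opcode for RST to one of the eight fixed addresses."""
    if target not in RST_TARGETS:
        raise ValueError("RST can only reach #00, #08, ... #38")
    return 0xC7 | target

def call_bytes(target):
    """Three-byte encoding of an unconditional CALL to target."""
    return bytes([0xCD, target & 0xFF, target >> 8])
```

So a call to the routine at #38 is either the one byte #FF (RST #38) or the three bytes #CD #38 #00 (CALL #0038).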

ervin · 14:44, 10 January 24

Quote from: andycadley on 12:44, 10 January 24:
RST is just a very specific CALL instruction with a limited number of fixed addresses it can use, the advantage being it's a single byte opcode and thus faster.

Thanks @andycadley
Maybe I will try it.

lightforce6128 · 03:32, 11 January 24

Usually it is good to keep any comparisons and jumps out of the inner loop. For modern CPUs this is to relieve prefetching, branch prediction, etc., although those CPUs use many techniques to improve on this. For the old Z80 it is mainly to spare opcode bytes, because reading and processing an opcode byte costs at least 4 cycles / 1 NOP.

So instead of checking on every line whether the address simply advances or has to go back to reach the next character row, it might be faster to split the loop into eight loops, where the first loop handles lines 0, 8, 16, ..., the next handles lines 1, 9, 17, ..., and so on. With this, the correction of the line address only has to be done once (somewhere between lines 0 and 8), and does not have to be checked on every single line. Also, the organization of the sprite data gets closer to the organization of the screen data, which suggests that less processing is needed.
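
One possible reading of this idea, sketched in Python rather than Z80 and assuming an 80-byte-wide screen: lines are grouped by their position within the character row, each group is stepped with a constant +#50, and the wrap test only runs when moving from one group to the next. The function names and the returned correction count are illustrative only.

```python
# Sketch of drawing in eight interleaved passes. Names are hypothetical.

def line_address(base, y, x=0):
    """Address of byte x on pixel line y for a 16K screen starting at base."""
    return (base + (y // 8) * 0x50 + (y % 8) * 0x800 + x) & 0xFFFF

def group_addresses(base, y0, height, x=0):
    """Return (groups, corrections): line addresses per pass, and wrap fixes used."""
    groups = []
    corrections = 0
    start = line_address(base, y0, x)
    for k in range(min(8, height)):
        if k > 0:
            start = (start + 0x800) & 0xFFFF          # move to the next pass
            if (start >> 8) & 0x38 == 0:              # wrapped out of the bank
                start = (start + 0xC050) & 0xFFFF
                corrections += 1
        count = len(range(y0 + k, y0 + height, 8))    # lines handled by this pass
        # Inner loop: same line position one character row down, no test needed.
        groups.append([(start + i * 0x50) & 0xFFFF for i in range(count)])
    return groups, corrections
```

For a 16-line sprite whose top is on screen line 5, the passes cover lines 5 and 13, then 6 and 14, and so on, with a single correction when the passes move from line 7 to line 8.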

Anthony Flack · 01:54, 16 January 24

If you know which character line you're on, switching to the next line can be as simple as flipping a bit in h. It's only if you don't know that you have to check.

If you know you're on an even line, you can always get to the next line with set 3,h. So just by using that you can potentially replace half of your newline calls. For instance you could LDI one way, then set 3,h and LDD back again.
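
A quick way to convince yourself of this is to model SET 3,H in Python (a sketch with made-up helper names, assuming an 80-byte-wide screen). Bit 3 of H is bit 11 of the address, which is clear on even lines, so setting it is the same as adding #800 and can never overflow out of the bank.

```python
# Sketch: SET 3,H as a "next line" step from an even line. Names are illustrative.

def line_address(base, y, x=0):
    """Address of byte x on pixel line y for a 16K screen starting at base."""
    return (base + (y // 8) * 0x50 + (y % 8) * 0x800 + x) & 0xFFFF

def set_3_h(addr):
    """Model of SET 3,H: force bit 11 of the 16-bit address to 1."""
    return addr | 0x0800
```

On an odd line the bit is already set and the address does not change, which is why the line parity has to be known in advance.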

ervin · 02:54, 16 January 24

Thanks Anthony & lightforce6128.
