Hi folks.
For the last few days I've been working on a simple sprite routine, and I'm trying to change the way I calculate the next line down.
In the past I've always had a lookup table called yAddress, from which I simply look up the address of any given screen line (yes, once for each line of a sprite).
I know that's a naive way of doing it, which is why I want to learn the "proper" way of moving down the screen when drawing a sprite.
The difficulty, of course, is the point every 8 lines at which the video memory address wraps around.
I've looked at a number of routines, including those used by ChibiAkumas, cpctelera, and several other examples I've found in posts in the forums.
But no matter what, I just don't seem to understand what is going on in those routines.
The most basic approach is to add &0800 to my video RAM pointer and check the carry flag, but I'm not sure how that copes with a double-buffered screen: a pointer into a 2nd screen at &8000 never passes &FFFF, so the add won't overflow the way it does for a screen at &C000. I can't think of a way to catch the wrap (which is what sets the carry flag) on the 2nd screen.
Then there are other ways of moving down the screen, including the mysterious "AND &38" used by cpctelera amongst others. I have no idea what that does or why it works.
This is from the cpctelera cpct_drawSprite function.
(Of course all the LDIs can be ignored for the purposes of this question. I understand the purpose of them).
;; Input Parameters (6 bytes):
;; (2B HL) sprite - Source Sprite Pointer (array with pixel data)
;; (2B DE) memory - Destination video memory pointer
;; (1B C ) width - Sprite Width in *bytes* [1-63] (Beware, *not* in pixels!)
;; (1B B ) height - Sprite Height in bytes (>0)
;; Modify code using width to jump in drawSpriteWidth
ld a, #126 ;; [2] We need to jump 126 bytes (63 LDIs*2 bytes) minus the width of the sprite * 2 (2B)
sub c ;; [1] to do as much LDIs as bytes the Sprite is wide
sub c ;; [1]
ld (ds_drawSpriteWidth+#4), a ;; [4] Modify JR data to create the jump we need
ld a, b ;; [1] A = Height (used as counter for the number of lines we have to copy)
ex de, hl ;; [1] Instead of jumping over the next line, we do the inverse operation because
;; .... it is only 4 cycles and not 10, as a JP would be)
ds_drawSpriteWidth_next:
;; NEXT LINE
ex de, hl ;; [1] HL and DE are exchanged every line to do 16bit maths with DE.
;; .... This line reverses it before proceeding to copy the next line.
ds_drawSpriteWidth:
;; Draw a sprite-line of n bytes
ld bc, #0x800 ;; [3] 0x800 bytes is the distance in memory from one pixel line to the next within every 8 pixel lines
;; ... Each LDI performed will decrease this by 1, as we progress through memory copying the present line
jr__0 ;; [3] Self modifying instruction: the '00' will be substituted by the required jump forward.
;; ... (Note: Writing JR 0 compiles but later it gives odd linking errors)
ldi ;; [5] <| 63 LDIs, which are able to copy up to 63 bytes each time.
ldi ;; [5] | That means that each Sprite line should be 63 bytes width at most.
ldi ;; [5] | The JR instruction at the start makes us ignore the LDIs we don't need
ldi ;; [5] <| (jumping over them) That ensures we will be doing only as much LDIs
ldi ;; [5] <| as bytes our sprite is wide.
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
ldi ;; [5] <|
ldi ;; [5] <|
ldi ;; [5] |
ldi ;; [5] |
dec a ;; [1] Another line finished: we discount it from A
ret z ;; [2/4] If that was the last line, we safely return
;; Jump destination pointer to the start of the next line in video memory
ex de, hl ;; [1] DE has destination, but we have to exchange it with HL to be able to do 16bit maths
add hl, bc ;; [3] We add 0x800 minus the width of the sprite (BC) to destination pointer
ld b, a ;; [1] Save A into B (B = A)
ld a, h ;; [1] We check if we have crossed video memory boundaries (which will happen every 8 lines).
;; .... If that happens, bits 13,12 and 11 of destination pointer will be 0
and #0x38 ;; [2] leave out only bits 13,12 and 11 from new memory address (00xxx000 00000000)
ld a, b ;; [1] Restore A from B (A = B)
jp nz, ds_drawSpriteWidth_next ;; [3] If any bit from {13,12,11} is not 0, we are still inside
;; .... video memory boundaries, so proceed with next line
;; Every 8 lines, we cross the 16K video memory boundaries and have to
;; reposition destination pointer. That means our next line is 16K-0x50 bytes back
;; which is the same as advancing 48K+0x50 = 0xC050 bytes, as memory is 64K
;; and our 16bit pointers cycle over it
ld bc, #0xC050 ;; [3] We advance destination pointer to next line
add hl, bc ;; [3] HL += 0xC050
jp ds_drawSpriteWidth_next ;; [3] Continue copying
Does someone have a few moments to explain a simple, fast technique? And how it works?
I'd be ever so grateful for any assistance.
Thanks!
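ld a,h
and #38
->
test hl,%00111000 00000000 (pseudo-code, there is no such Z80 instruction)
This checks whether the pixel-line position within the character row (bits 13-11, shown as ..xxx........... of the screen address) is back at 0 again.
If it is, adding #0800 has carried out of the 16K screen bank and a new character row has been reached, so you have to add #c050 to land back inside the bank, #50 bytes further on (if the screen width is 80 bytes).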
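To see why the AND #38 test also works for a second screen at #8000, here is a minimal sketch that models the arithmetic in Python rather than Z80, so it can be run on a PC. It assumes a standard 80-byte-wide, 200-line screen on a 16K-aligned base. The names line_address, next_line, y_address_table and walk_down are made up for this illustration, and the model tracks only the start of each line (the cpctelera routine adds #0800 minus the sprite width because the LDIs have already advanced the pointer).

```python
SCREEN_WIDTH = 0x50  # assumed: bytes per character row on a standard screen


def line_address(base, y):
    """Address of the first byte of pixel line y, as a yAddress table would hold it."""
    return base + (y % 8) * 0x0800 + (y // 8) * SCREEN_WIDTH


def next_line(addr):
    """Move one pixel line down: ADD #0800, test bits 13-11, fix up with #C050."""
    addr = (addr + 0x0800) & 0xFFFF          # 16-bit add, wraps like HL does
    if ((addr >> 8) & 0x38) == 0:            # LD A,H / AND #38 gives Z
        addr = (addr + 0xC050) & 0xFFFF      # same as -#4000 + #50 in 16 bits
    return addr


def y_address_table(base):
    return [line_address(base, y) for y in range(200)]


def walk_down(base):
    """Start at line 0 and step down 199 times using next_line only."""
    addr = base
    out = [addr]
    for _ in range(199):
        addr = next_line(addr)
        out.append(addr)
    return out


# The same test works for any 16K-aligned screen base, with or without carry.
for base in (0xC000, 0x8000, 0x4000):
    assert walk_down(base) == y_address_table(base)
```

The test never looks at the carry flag, only at whether the three "line within character row" bits have rolled over to zero, which happens for a bank at #8000 just as it does for one at #C000.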
If your screens are at #4000 and #c000, there is a faster way:
Replace...
ld b, a ;; [1] Save A into B (B = A)
ld a, h ;; [1] We check if we have crossed video memory boundaries (which will happen every 8 lines).
;; .... If that happens, bits 13,12 and 11 of destination pointer will be 0
and #0x38 ;; [2] leave out only bits 13,12 and 11 from new memory address (00xxx000 00000000)
ld a, b ;; [1] Restore A from B (A = B)
jp nz, ds_drawSpriteWidth_next ;; [3] If any bit from {13,12,11} is not 0, we are still inside
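With...
bit 6,h ;; Z only when adding #0800 has moved HL out of #4000-#7fff or #c000-#ffff
jp nz, ds_drawSpriteWidth_next ;; [3] bit 6 still set: we are still inside #4000-#7fff or #c000-#ffff
BIT does not modify A, so the save and restore through B is no longer needed.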
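A rough way to check which screen bases the BIT 6,H shortcut is valid for is to model it in Python (not Z80, so it can be run on a PC). This is only a sketch under the assumption of a standard 80-byte-wide, 200-line screen; line_address, next_line_bit6 and walks_correctly are names invented for the example.

```python
def line_address(base, y):
    """Reference address of pixel line y on an 80-byte-wide screen."""
    return base + (y % 8) * 0x0800 + (y // 8) * 0x50


def next_line_bit6(addr):
    """Move one line down, using BIT 6,H as the 'left the bank' test."""
    addr = (addr + 0x0800) & 0xFFFF
    if ((addr >> 8) & 0x40) == 0:            # BIT 6,H gives Z
        addr = (addr + 0xC050) & 0xFFFF
    return addr


def walks_correctly(base):
    """True if stepping down all 200 lines matches the reference table."""
    addr = line_address(base, 0)
    for y in range(1, 200):
        addr = next_line_bit6(addr)
        if addr != line_address(base, y):
            return False
    return True


print({hex(b): walks_correctly(b) for b in (0x0000, 0x4000, 0x8000, 0xC000)})
```

In this model the shortcut holds for #4000 and #C000, where bit 14 of the address is set everywhere inside the bank and clear just outside it, and fails for #0000 and #8000, where bit 14 is always clear.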
Thanks!!!
I'll spend some time to analyse your suggestion. :)
Thanks @Prodatron !!!
That technique is brilliant!
Very simple, and very fast.
Since the adjustment is only needed on one line in eight (12.5% of the time), you can move it into a routine and use CALL Z,adjust.
And if your adjust routine is located at #38, you can do something like RST Z,#38 (in fact a JR Z,$-1), saving space and time.
Old forgotten tips (re)discovered in 64Nops issue 1.
You can also check some old topics here:
https://www.cpcwiki.eu/forum/programming/ufd-tecnnology-(ultra-fast-drawing)/
https://www.cpcwiki.eu/index.php/Programming:Fast_Sprites
JR Z,$+1
(but usually you need RST#38 for interrupt stuff)
No, $-1 because the relative address will be encoded as #FF, which is also the RST #38 opcode ;D
Oh, at least in WinApe $ is the address at which the current instruction starts. The Z80 uses the address just after the 2-byte JR instruction as the origin for the relative value.
So if you have
#4000 JR $-1
it means a jump to #3FFF, so the assembler sets the relative value to -3 (#FD), as the origin is #4002.
If you have
#4000 JR $+1
it means a jump to #4001, so the assembler sets the relative value to -1 (#FF), which is also the RST #38 opcode and sits at #4001.
I didn't know that $ could mean something different in other assemblers?
Or do you mean JR -1 instead of JR $-1? That would be OK if the assembler accepts it that way (but WinApe would then try to jump to address #FFFF).
you make me doubt ^_^ i use the fake opcode RST <cond>,#38 anyway ;D
you're right, it's $+1 !
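The displacement arithmetic from this exchange can be checked with a small sketch, written in Python rather than Z80 so it runs on a PC. It assumes, as described for WinApe above, that $ is the address of the JR instruction itself. jr_displacement and rst_opcode are hypothetical helper names, not part of any assembler.

```python
def jr_displacement(instr_addr, target):
    """Displacement byte for a 2-byte JR located at instr_addr."""
    offset = target - (instr_addr + 2)   # the Z80 counts from the byte after the JR
    if not -128 <= offset <= 127:
        raise ValueError("target out of JR range")
    return offset & 0xFF


def rst_opcode(vector):
    """Single-byte opcode of RST vector, for vector in #00, #08, ... #38."""
    if vector & ~0x38:
        raise ValueError("not an RST vector")
    return 0xC7 | vector


JR_Z = 0x28  # opcode byte of JR Z,e

# JR $-1 at #4000 targets #3FFF; JR $+1 targets #4001, its own second byte.
print(hex(jr_displacement(0x4000, 0x3FFF)))
print(hex(jr_displacement(0x4000, 0x4001)))

# "RST Z,#38": the displacement byte of JR Z,$+1 doubles as an RST #38 opcode.
fake_rst_z = bytes([JR_Z, jr_displacement(0x4000, 0x4001)])
assert fake_rst_z[1] == rst_opcode(0x38)
```

When Z is set, the jump lands on the #FF byte and executes it as RST #38. When Z is clear, execution falls through to the instruction after the two bytes.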
Quote from: roudoudou on 08:22, 10 January 24
Since the adjustment is only needed on one line in eight (12.5% of the time), you can move it into a routine and use CALL Z,adjust.
And if your adjust routine is located at #38, you can do something like RST Z,#38 (in fact a JR Z,$-1), saving space and time.
Old forgotten tips (re)discovered in 64Nops issue 1.
You can also check some old topics here:
https://www.cpcwiki.eu/forum/programming/ufd-tecnnology-(ultra-fast-drawing)/
https://www.cpcwiki.eu/index.php/Programming:Fast_Sprites
Thanks!
I'd love to try something with RST, but I don't understand how that sort of thing works.
Whenever people talk about RST #38, I get very confused. :laugh:
RST is just a very specific CALL instruction with a limited number of fixed addresses it can use, the advantage being it's a single byte opcode and thus faster.
Quote from: andycadley on 12:44, 10 January 24
RST is just a very specific CALL instruction with a limited number of fixed addresses it can use, the advantage being it's a single byte opcode and thus faster.
Thanks @andycadley. Maybe I will try it.
Usually it is good to keep comparisons and jumps out of the inner loop. On modern CPUs this takes pressure off prefetching, branch prediction, etc., although those CPUs use many techniques to soften the cost. On the old Z80 it is mainly about saving opcode bytes, because fetching and processing each opcode byte costs at least 4 cycles / 1 NOP.
So instead of checking on every line whether the address simply advances or has to go back to reach the next character row, it might be faster to split the loop into eight loops, where the first handles lines 0, 8, 16, ..., the second lines 1, 9, 17, ..., and so on. With this, the correction of the line address only has to be done once (somewhere between lines 0 and 8), and not checked on every single line. Also, the organization of the sprite data gets closer to the organization of the screen data, which suggests that less processing is needed.
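One possible reading of the eight-loop idea, sketched in Python rather than Z80 so the addresses can be checked on a PC: within each pass the pointer only ever advances by #50, and the bank test is done just once between passes. It assumes a standard 80-byte-wide screen; draw_order and line_address are illustrative names, and a real routine would also need the sprite data stored in (or indexed by) the same interleaved order.

```python
def line_address(base, y):
    """Reference address of pixel line y on an 80-byte-wide screen."""
    return base + (y % 8) * 0x0800 + (y // 8) * 0x50


def draw_order(base, y0, height):
    """Return ([(sprite_row, screen_address), ...], number_of_bank_corrections)."""
    order = []
    corrections = 0
    start = line_address(base, y0)
    for k in range(min(8, height)):
        if k > 0:
            # Between passes: one pixel line down, with the usual wrap test.
            start = (start + 0x0800) & 0xFFFF
            if ((start >> 8) & 0x38) == 0:
                start = (start + 0xC050) & 0xFFFF
                corrections += 1
        addr = start
        for row in range(k, height, 8):      # rows k, k+8, k+16, ...
            order.append((row, addr))
            addr += 0x50                     # next character row, same pixel line
    return order, corrections


order, corrections = draw_order(0xC000, 3, 21)
assert all(addr == line_address(0xC000, 3 + row) for row, addr in order)
print([row for row, _ in order], corrections)
```

For a 21-line sprite starting on screen line 3, the wrap test runs 7 times instead of 20, and the #C050 correction is applied once.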
If you know which character line you're on, switching to the next line can be as simple as flipping a bit in h. It's only if you don't know that you have to check.
If you know you're on an even line, you can always get to the next line with set 3,h. So just by using that you can potentially replace half of your newline calls. For instance you could LDI one way, then set 3,h and LDD back again.
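The even-line shortcut can be checked the same way. This is a small Python model (not Z80), assuming a 16K-aligned, 80-byte-wide screen; next_line_from_even and line_address are names made up for the sketch.

```python
def line_address(base, y):
    """Reference address of pixel line y on an 80-byte-wide screen."""
    return base + (y % 8) * 0x0800 + (y // 8) * 0x50


def next_line_from_even(addr):
    """SET 3,H: sets bit 11 of the address. Only valid on an even pixel line."""
    return addr | 0x0800


# On every even line, setting bit 11 gives exactly the next line's address.
for base in (0x4000, 0x8000, 0xC000):
    for y in range(0, 200, 2):
        assert next_line_from_even(line_address(base, y)) == line_address(base, y + 1)
```

It works because the pixel line within the character row sits in bits 13-11, so on an even line bit 11 is clear and setting it is the same as adding #0800 with no carry into the higher bits. On an odd line the bit is already set and the address would not change, which is why you have to know which line you are on.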
Thanks Anthony & lightforce6128.