Hello,
I have written a function to double-buffer part of my screen.
I draw the room background and its contents into a back-buffer sprite, then I draw that to the screen with the cpctelera sprite function.
I also use the double buffer to scroll between two rooms.
Here is my function to copy a masked sprite to my back buffer. I call it once for each row of the sprite to be copied:
void BlitMasked(UCHAR* pDest, UCHAR* pSprite, UCHAR pWidth)
{
    UCHAR posPixDest;
    UCHAR byteMask, byteColor;
    for (posPixDest = 0; posPixDest < pWidth; posPixDest++)
    {
        /* sprite data is interleaved: mask byte, then colour byte */
        byteMask = *pSprite++;
        byteColor = *pSprite++;
        pDest[posPixDest] = (pDest[posPixDest] & byteMask) | byteColor;
    }
}
How can I optimize this function to run faster? :D
Thanks.
Converting it to asm would make it faster.
If you want to keep it as C:
void BlitMasked(UCHAR* pDest, UCHAR* pSprite, UCHAR pWidth)
{
    UCHAR byteMask, byteColor;
    int count = pWidth;
    UCHAR *pDestPtr = pDest; /* take initial copy */
    /* compare count against 0 */
    while (count != 0)
    {
        byteMask = *pSprite++;
        byteColor = *pSprite++;
        *pDestPtr = (*pDestPtr & byteMask) | byteColor;
        ++pDestPtr; /* increment it for each byte */
        --count; /* decrement count */
    }
}
Changes:
1. Count down. It is quicker to compare against 0 than against another number. An equal/not-equal comparison is also quicker than a more complex comparison such as less than, greater than, etc.
2. Keep a local pointer (pDestPtr), initialise it and increment it.
It is quicker to use a pointer like this than pDest[index]. For a uchar pointer, pDest[index] is equivalent to *(pDest+index), so by using a pointer you avoid that add. For a uword pointer it is *(pDest+(index<<1)), so you avoid an add and a shift/multiply by two.
3. Sometimes it is quicker to use pre-increment/pre-decrement. I am not so sure about that on sdcc and z80.
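As a quick check that changes 1 and 2 leave the behaviour untouched, here is a minimal host-side sketch that puts the indexed loop and the count-down pointer loop side by side. The names blit_indexed, blit_pointer and rows_match are made up for illustration, and the counter is kept as a UCHAR here rather than an int. Whether the second form is really faster depends on the code SDCC generates, so it is worth measuring.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char UCHAR;

/* Indexed form: every pDest[i] access implies an add of the index. */
void blit_indexed(UCHAR *pDest, const UCHAR *pSprite, UCHAR width)
{
    UCHAR i;
    assert(pDest && pSprite);
    for (i = 0; i < width; i++) {
        UCHAR mask  = *pSprite++;   /* sprite data is interleaved: */
        UCHAR color = *pSprite++;   /* mask byte, then colour byte */
        pDest[i] = (UCHAR)((pDest[i] & mask) | color);
    }
}

/* Pointer form: count down to zero and walk the destination pointer. */
void blit_pointer(UCHAR *pDest, const UCHAR *pSprite, UCHAR width)
{
    UCHAR count = width;
    assert(pDest && pSprite);
    while (count != 0) {
        UCHAR mask  = *pSprite++;
        UCHAR color = *pSprite++;
        *pDest = (UCHAR)((*pDest & mask) | color);
        ++pDest;
        --count;
    }
}

/* Returns 1 when both versions produce the same row (hypothetical helper). */
int rows_match(const UCHAR *background, const UCHAR *sprite, UCHAR width)
{
    UCHAR a[255], b[255];
    memcpy(a, background, width);
    memcpy(b, background, width);
    blit_indexed(a, sprite, width);
    blit_pointer(b, sprite, width);
    return memcmp(a, b, width) == 0;
}
```

rows_match only compares results. It says nothing about speed, which has to be checked on the real target or in the generated assembly.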
For asm the code looks a bit like this:
BC is your count set to width and decremented each time around the loop
DE is pDestPtr
HL is your pSprite
loop:
ld a,(de) ;; same as *pDestPtr
and (hl) ;; same as & *pSprite
inc hl ;; same as ++pSprite;
or (hl) ;; same as | *pSprite
inc hl ;; same as ++pSprite;
ld (de),a ;; same as *pDestPtr =
inc de ;; same as ++pDestPtr
dec bc ;; same as --count;
jp nz,loop ;; same as while (count!=0)
Thanks a lot, I'll study your solutions with care.
Quote from: arnoldemu on 18:16, 14 November 15
dec bc ;; same as --count;
jp nz,loop ;; same as while (count!=0)
Warning:
dec bc does not set the Z flag.
So the loop will not work as expected.
References, including solutions:
- Z80 Instruction Set - WikiTI (http://wikiti.brandonw.net/index.php?title=Z80_Instruction_Set) says "dec Q" does not change any flag, Q means 16-bit register
- Infinite Loop (http://z80.info/z80prog.htm) states the problem
- 16-bit counters (http://z80-heaven.wikidot.com/flags-and-bit-level-instructions#toc17) offers a classical solution
- Looping with 16 bit counter (http://wikiti.brandonw.net/index.php?title=Z80_Optimization) offers a faster variant (the principle is good, not sure their code is correct, though, seems to be missing some parts).
Given that the original code had count as UCHAR (8 bit unsigned?), the count could just be in register B and DJNZ could be used.
Quote
BC is your count set to width and decremented each time around the loop
Use just B instead of BC, because I don't think the width will be greater than 255 bytes!
Then B will be your counter and you'll have to replace "dec bc : jp nz,xxxx" with a simple "djnz xxxx"
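On the C side, the matching idea would be to keep the counter 8-bit and test it at the bottom of the loop. This is only a sketch: blit_masked_u8 is a made-up name, and it is an assumption that SDCC turns this shape into DJNZ, so check the generated assembly. Note the early return: a do/while entered with a zero UCHAR counter would run 256 times, just like DJNZ with B = 0.

```c
#include <assert.h>

typedef unsigned char UCHAR;

/* Masked blit of one row with an 8-bit down counter (hypothetical name). */
void blit_masked_u8(UCHAR *pDest, const UCHAR *pSprite, UCHAR width)
{
    assert(pDest && pSprite);
    if (width == 0)
        return;                     /* avoid the 256-iteration wrap-around */
    do {
        UCHAR mask  = *pSprite++;   /* mask byte, then colour byte */
        UCHAR color = *pSprite++;
        *pDest = (UCHAR)((*pDest & mask) | color);
        ++pDest;
    } while (--width != 0);         /* decrement, then test against zero */
}
```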
....
Quote from: cpcitor on 10:58, 07 December 15
- Looping with 16 bit counter (http://wikiti.brandonw.net/index.php?title=Z80_Optimization) offers a faster variant (the principle is good, not sure their code is correct, though, seems to be missing some parts).
It works perfectly but is outdated. My optimization is here (in Variable length loops):
Z80 programming techniques - Loops (http://map.grauw.nl/articles/fast_loops.php)
Quote from: Urusergi on 16:15, 07 December 15
It works perfectly but is outdated. My optimization is here (in Variable length loops):
Z80 programming techniques - Loops (http://map.grauw.nl/articles/fast_loops.php)
Interesting, but why use B and D instead of B and C? ld bc,#0a03 would loop #20a times. It does become a little tricky to convert from an initial value in BC, since you can't swap the registers B and C directly, so you'd have to do ld a,b:inc a:ld b,c:ld c,a or something.
I'm also not sure why you're bothering to disable interrupts before the self-modifying code. Even if your interrupt routine could corrupt the A register, you'd need the DI before you set it.
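The "#0a03 loops #20a times" arithmetic for the two-register counter discussed above can be checked with a small host-side simulation. This is a sketch under assumptions: split_count and run_split_loop are made-up names, the inner do/while stands in for DJNZ on B and the outer one for a decrement and jump on the second register, and the split rule (inner = low byte of the count, outer = high byte of count-1, plus one) is my reading of the conversion, chosen so that counts with a zero low byte also come out right.

```c
#include <assert.h>

typedef unsigned char UCHAR;

/* Split a 16-bit iteration count into two 8-bit counters. */
void split_count(unsigned int n, UCHAR *inner, UCHAR *outer)
{
    assert(n <= 0xFFFFu);                           /* 16-bit count only */
    *inner = (UCHAR)(n & 0xFF);                     /* low byte          */
    *outer = (UCHAR)((((n - 1) >> 8) & 0xFF) + 1);  /* passes to make    */
}

/* Emulates: loop: <body> / djnz loop / dec <outer> / jp nz,loop */
unsigned long run_split_loop(UCHAR inner, UCHAR outer)
{
    unsigned long iterations = 0;
    do {
        do {
            ++iterations;           /* stands in for the loop body      */
        } while (--inner != 0);     /* DJNZ: 0 wraps, so 256 next pass  */
    } while (--outer != 0);         /* outer 8-bit counter              */
    return iterations;
}
```

With inner = #0a and outer = #03, the first pass runs #0a times and the two remaining passes run 256 times each, which gives #20a in total.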