
Partial Masked Sprite copy to another: optimization

Started by Arnaud, 20:22, 13 November 15


Arnaud

Hello,
I have written a function to double-buffer part of my screen.

I draw the room background and its contents into a back-buffer sprite, then I draw that sprite to the screen with the CPCtelera sprite function.
I also use the double buffer to scroll between two rooms.

Here is my function to copy a masked sprite into my back buffer; I call it for each row of the sprite to be copied:


void BlitMasked(UCHAR* pDest, UCHAR* pSprite, UCHAR pWidth)
{
    UCHAR posPixDest;
    UCHAR byteMask, byteColor;

    /* pSprite holds pWidth (mask, colour) byte pairs for one row */
    for (posPixDest = 0; posPixDest < pWidth; posPixDest++)
    {
        byteMask = *pSprite++;   /* mask byte: 1 bits keep the background */
        byteColor = *pSprite++;  /* colour byte: sprite pixels to OR in */

        pDest[posPixDest] = (pDest[posPixDest] & byteMask) | byteColor;
    }
}
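A minimal hosted-C sketch for checking the masking logic off-target (e.g. on a PC) before running it on the CPC. It assumes the sprite data is stored as interleaved (mask, colour) byte pairs, row after row, exactly as the loop above reads them, and that the back buffer is a plain linear array. backBuffer, BACK_W, BACK_H, clearBackBuffer and drawMaskedSprite are hypothetical names for illustration, not CPCtelera API.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char UCHAR;   /* assumption: UCHAR is an 8-bit unsigned type */

#define BACK_W 8               /* hypothetical back-buffer width, in bytes */
#define BACK_H 4               /* hypothetical back-buffer height, in rows */

static UCHAR backBuffer[BACK_W * BACK_H];

/* Same logic as BlitMasked above: one row of (mask, colour) pairs. */
void BlitMasked(UCHAR* pDest, UCHAR* pSprite, UCHAR pWidth)
{
    UCHAR posPixDest;
    for (posPixDest = 0; posPixDest < pWidth; posPixDest++)
    {
        UCHAR byteMask = *pSprite++;
        UCHAR byteColor = *pSprite++;
        pDest[posPixDest] = (pDest[posPixDest] & byteMask) | byteColor;
    }
}

/* Hypothetical: fill the back buffer, standing in for drawing the room background. */
void clearBackBuffer(UCHAR fill)
{
    memset(backBuffer, fill, sizeof backBuffer);
}

/* Hypothetical: draw a width x height masked sprite at byte column x, row y,
   calling BlitMasked once per sprite row. */
void drawMaskedSprite(UCHAR x, UCHAR y, UCHAR* pSprite, UCHAR width, UCHAR height)
{
    UCHAR row;
    assert(x + width <= BACK_W && y + height <= BACK_H);  /* sprite must fit */
    for (row = 0; row < height; row++)
    {
        BlitMasked(&backBuffer[(y + row) * BACK_W + x], pSprite, width);
        pSprite += 2 * width;   /* skip this row's (mask, colour) pairs */
    }
}
```

For example, with a 2x2 sprite whose left column has mask 0x00 and colours 0xAA/0xBB, and whose right column has mask 0xFF and colour 0x00, drawing it over a background filled with 0x11 leaves the sprite colours in the left column and 0x11 (the background showing through) in the right one.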


How can I optimize this function to run faster? :D
Thanks.

arnoldemu

Converting it to asm would make it faster.

If you want to keep it in C:


void BlitMasked(UCHAR* pDest, UCHAR* pSprite, UCHAR pWidth)
{
    UCHAR byteMask, byteColor;
    int count = pWidth;
    UCHAR *pDestPtr = pDest; /* take initial copy */

    /* compare count against 0 */
    while (count != 0)
    {
        byteMask = *pSprite++;
        byteColor = *pSprite++;

        *pDestPtr = (*pDestPtr & byteMask) | byteColor;
        ++pDestPtr; /* increment it for each byte */
        --count;    /* decrement count */
    }
}


Changes:
1. Count down. It is quicker to compare against 0 than against an arbitrary number, and an equal/not-equal comparison is also quicker than a more complex one such as less-than or greater-than.
2. Keep a local pointer, pDestPtr: initialise it and increment it.
It is quicker to step a pointer like this than to use pDest[index]. For a UCHAR pointer, pDest[index] is equivalent to *(pDest+index), so using the pointer avoids that add. For a UWORD pointer the byte address is pDest+(index<<1), so you avoid both an add and a shift (multiply by two).
3. Sometimes it is quicker to use pre-increment/pre-decrement. I'm not sure whether that holds for SDCC on the Z80.
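Points 1 and 2 can be compared side by side in a small sketch. clearIndexed, clearCountDown and wordByteAddress are hypothetical helpers, and UWORD is assumed to be a 16-bit unsigned type. Both clear functions do the same work and differ only in how the loop is controlled and how the address is formed; whether SDCC actually emits faster Z80 code for the second form is not something this sketch can show.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char UCHAR;
typedef unsigned short UWORD;   /* assumption: 16-bit unsigned, as on the CPC */

/* Indexed form: every access forms pDest + index. */
void clearIndexed(UCHAR* pDest, UCHAR count)
{
    UCHAR index;
    for (index = 0; index < count; index++)
        pDest[index] = 0;
}

/* Form suggested above: step the pointer and count down to 0. */
void clearCountDown(UCHAR* pDest, UCHAR count)
{
    while (count != 0)
    {
        *pDest = 0;
        ++pDest;
        --count;
    }
}

/* For 16-bit elements, pWords[index] lives at byte offset index << 1. */
UCHAR* wordByteAddress(UWORD* pWords, UCHAR index)
{
    assert(sizeof(UWORD) == 2);
    return (UCHAR*)pWords + ((size_t)index << 1);
}
```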

In asm the code looks roughly like this:

BC is your count, set to width and decremented each time around the loop
DE is pDestPtr
HL is pSprite

loop:
ld a,(de) ;; same as *pDestPtr
and (hl)  ;; same as & *pSprite
inc hl ;; same as ++pSprite;
or (hl) ;; same as | *pSprite
inc hl ;; same as ++pSprite;
ld (de),a ;; same as *pDestPtr =
inc de ;; same as ++pDestPtr
dec bc ;; same as --count;
jp nz,loop ;; same as while (count!=0)
My website with coding examples: Unofficial Amstrad WWW Resource

Arnaud

Thanks a lot, I'll study your solutions carefully.

cpcitor

Quote from: arnoldemu on 18:16, 14 November 15

dec bc ;; same as --count;
jp nz,loop ;; same as while (count!=0)


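Warning: dec bc does not set the Z flag (16-bit INC and DEC leave the flags unchanged).

So jp nz,loop tests whatever Z was left by the last or (hl), and the loop will not work as expected: it stops at the first result byte that is zero rather than after width bytes, and runs on past the end if no result byte is zero.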
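The effect can be modelled in C. In this sketch (run_loop and its buggy switch are hypothetical names), Z is taken from the last OR result, since the 16-bit INC/DEC instructions leave it untouched. The non-buggy path models a common fix, testing BC explicitly with ld a,b : or c before the jp nz; that fix is an assumption here, not necessarily one of the referenced solutions. Clobbering A at that point is harmless because it has already been stored.

```c
#include <assert.h>
#include <stdint.h>

/* Models the blit loop above and returns how many bytes it writes.
   buggy != 0: "dec bc / jp nz,loop", so Z comes from the last "or (hl)".
   buggy == 0: "dec bc / ld a,b / or c / jp nz,loop", so Z means BC == 0.
   dest_len caps the model so it cannot run past the buffer. */
unsigned run_loop(uint8_t *dest, const uint8_t *sprite,
                  uint16_t bc, unsigned dest_len, int buggy)
{
    unsigned written = 0;
    int zero_flag;
    assert(dest_len > 0);
    do {
        uint8_t a = dest[written];    /* ld a,(de) */
        a &= *sprite++;               /* and (hl) : inc hl */
        a |= *sprite++;               /* or (hl)  : inc hl */
        zero_flag = (a == 0);         /* Z as set by OR */
        dest[written++] = a;          /* ld (de),a : inc de */
        bc--;                         /* dec bc: flags unchanged */
        if (!buggy)
            zero_flag = (bc == 0);    /* ld a,b : or c */
    } while (!zero_flag && written < dest_len);
    return written;
}
```

With bc = 3 and a sprite whose second output byte comes out as 0, the fixed loop writes 3 bytes while the buggy one stops after 2.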

References, including solutions:
Had a CPC since 1985, currently software dev professional, including embedded systems.

In 2013 I made the first CPC cross-dev environment that auto-installs the C compiler and tools: cpc-dev-tool-chain, a portable toolchain for C/ASM development targeting the CPC, later forked into CPCTelera.

Executioner

Given that the original code had count as UCHAR (8-bit unsigned?), the count could just live in register B and DJNZ could be used.
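On the C side, the same idea is to keep the counter as a UCHAR and use a do/while that decrements and tests it at the bottom, which is the shape of a DJNZ loop. BlitMasked8 is a hypothetical name and this is only a sketch: whether SDCC actually turns it into DJNZ depends on the compiler version and options. As with DJNZ, a width of 0 would run 256 times, so the sketch asserts a non-zero width.

```c
#include <assert.h>

typedef unsigned char UCHAR;   /* assumption: UCHAR is an 8-bit unsigned type */

void BlitMasked8(UCHAR* pDest, const UCHAR* pSprite, UCHAR pWidth)
{
    UCHAR count = pWidth;      /* 8-bit counter, like register B */
    assert(pWidth != 0);       /* 0 would wrap and run 256 times */
    do
    {
        UCHAR byteMask = *pSprite++;
        UCHAR byteColor = *pSprite++;
        *pDest = (*pDest & byteMask) | byteColor;
        ++pDest;
    } while (--count != 0);    /* decrement, then loop while non-zero */
}
```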

Ast

Quote
BC is your count set to width and decremented each time around the loop

Use a simple B instead of BC, because I don't think the width will ever be greater than 255 bytes!
Then B will be your counter, and you can replace "dec bc : jp nz,xxxx" with a simple "djnz xxxx".
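DJNZ decrements B and then jumps if the result is non-zero, so a width of 0 is the one value to watch for: it gives 256 iterations, not 0. A small C model of that counting (djnz_iterations is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Counts how many times the body of "ld b,n / loop: <body> / djnz loop" runs. */
unsigned djnz_iterations(uint8_t b)
{
    unsigned iterations = 0;
    do {
        ++iterations;             /* loop body */
        b = (uint8_t)(b - 1);     /* djnz: decrement B (8-bit wrap-around)... */
    } while (b != 0);             /* ...and jump back while B != 0 */
    assert(iterations >= 1 && iterations <= 256);
    return iterations;
}
```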

....
_____________________

Ast/iMP4CT. "By the power of Grayskull, i've the power"

http://amstradplus.forumforever.com/index.php
http://impdos.wikidot.com/
http://impdraw.wikidot.com/

All friends are welcome !

Urusergi

Quote from: cpcitor on 10:58, 07 December 15

       
• Looping with 16 bit counter offers a faster variant (the principle is good, not sure their code is correct, though, seems to be missing some parts).

It works perfectly but is outdated. My optimization is here (in Variable length loops):
Z80 programming techniques - Loops

Executioner

Quote from: Urusergi on 16:15, 07 December 15
It works perfectly but is outdated. My optimization is here (in Variable length loops):
Z80 programming techniques - Loops

Interesting, but why use B and D instead of B and C? ld bc,#0a03 would loop #20a times. This does become a little tricky to convert from an initial count in BC, since there is no instruction to swap B and C, so you'd have to do ld a,b : inc a : ld b,c : ld c,a or something like that.
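The counting can be checked with a small C model. It assumes the loop shape loop: <body> / djnz loop / dec c / jr nz,loop (B as the inner counter, C as the outer one); nested_iterations, split_naive and split_adjusted are hypothetical helpers. The model also exposes a corner case of the simple conversion (B = low byte, C = high byte + 1): when the low byte is 0, B starts at 0 and the first inner pass runs 256 times instead of 0. Computing C from count - 1, a commonly used adjustment stated here as an assumption rather than taken from the thread, covers every count from 1 to 65535.

```c
#include <assert.h>
#include <stdint.h>

/* Counts body executions of: loop: <body> / djnz loop / dec c / jr nz,loop
   (C = 0 behaves like 256 outer passes, as on the Z80). */
uint32_t nested_iterations(uint8_t b, uint8_t c)
{
    uint32_t iterations = 0;
    do {
        do {
            ++iterations;                 /* loop body */
            b = (uint8_t)(b - 1);         /* djnz loop */
        } while (b != 0);
        c = (uint8_t)(c - 1);             /* dec c */
    } while (c != 0);                     /* jr nz,loop */
    return iterations;
}

/* Simple conversion: B = low byte, C = high byte + 1 (wrong when the low byte is 0). */
void split_naive(uint16_t count, uint8_t *b, uint8_t *c)
{
    *b = (uint8_t)(count & 0xFF);
    *c = (uint8_t)((count >> 8) + 1);
}

/* Adjusted conversion: B = low byte, C = high byte of (count - 1), plus 1. */
void split_adjusted(uint16_t count, uint8_t *b, uint8_t *c)
{
    assert(count != 0);
    *b = (uint8_t)(count & 0xFF);
    *c = (uint8_t)(((uint16_t)(count - 1) >> 8) + 1);
}
```

For example, nested_iterations(0x0a, 0x03) gives 0x20a, matching the post; for a count of 0x0200 the simple split gives 0x300 iterations while the adjusted one gives 0x200.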

I'm also not sure why you bother disabling interrupts before the self-modifying code. Even if your interrupt routine could corrupt the A register, you'd need the DI before you set A.
