Help needed to speed up copying one sprite array into another

Started by Arnaud, 18:12, 17 August 17


Arnaud

Hello,
I'm trying to copy a sprite into another sprite before drawing it to the screen. I have coded a function in C, but it's really too slow.



void BlitBackbuffer(u8 destX, u8 destY, u8 *srcSprite, u8 srcWidth, u8 srcHeight)
{
    u8 yDest = srcHeight;
    u8* destMem = sBackBuffer + destX + VIEW_CX*destY;

    while (yDest != 0)
    {
        cpct_memcpy(destMem, srcSprite, srcWidth);

        srcSprite += srcWidth;
        destMem += VIEW_CX;
        --yDest;
    }
}


Has anyone already done this in assembly, or does anyone have an idea to speed up my code?

Here is my test project, built with CPCTelera 1.4.2.

Thanks,
Arnaud.

roudoudou

To speed things up, you could write a dedicated function for each possible sprite width (source or destination, depending on your data).
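A minimal sketch of that idea in C, not taken from the thread: one copy routine specialised for a fixed width of 4 bytes, plus a generic fallback. The names blit_w4, blit_generic and blit_rows are hypothetical, and VIEW_CX is assumed to be 44 (the value that appears later in the thread).

```c
#include <stdint.h>
#include <string.h>

#define VIEW_CX 44  /* assumed back-buffer width in bytes */

/* Hypothetical specialised routine: every row is exactly 4 bytes wide,
   so the inner copy is fully unrolled and needs no per-byte loop. */
static void blit_w4(uint8_t *dest, const uint8_t *src, uint8_t height)
{
    while (height--) {
        dest[0] = src[0];
        dest[1] = src[1];
        dest[2] = src[2];
        dest[3] = src[3];
        src  += 4;
        dest += VIEW_CX;
    }
}

/* Generic fallback for any other width. */
static void blit_generic(uint8_t *dest, const uint8_t *src,
                         uint8_t width, uint8_t height)
{
    while (height--) {
        memcpy(dest, src, width);
        src  += width;
        dest += VIEW_CX;
    }
}

/* Pick the specialised routine when one exists for this width. */
void blit_rows(uint8_t *dest, const uint8_t *src,
               uint8_t width, uint8_t height)
{
    switch (width) {
    case 4:  blit_w4(dest, src, height);             break;
    default: blit_generic(dest, src, width, height); break;
    }
}
```

Whether the specialised version pays off depends on how many distinct widths the game really uses; each one costs extra code size.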

Docent

Quote from: Arnaud on 18:12, 17 August 17

Try this asm copy routine:

ld a, yDest
ld de, destMem
ld hl, srcSprite
ld bc, srcWidth
copy_loop:
push bc
push de
ldir
ld bc, #VIEW_CX
pop de
ex de, hl
add hl, bc
ex de, hl
pop bc
dec a
jr nz, copy_loop



You'll probably need some additional code to save the registers sdcc requires and to set up the initial registers differently.


ervin

How about this as your main.c?


#include <cpctelera.h>
#include "test.h"

#define VIEW_X      0
#define VIEW_Y      0

u8 buffer0[G_TEST_W*G_TEST_H];
u8 buffer1[G_TEST_W*G_TEST_H];

void main(void)
{
   u8* pBuffer;
   u8 shift;
   u8 x;

   cpct_disableFirmware();

   pBuffer=buffer0;
   shift=0;
   x=VIEW_X;

   while(1)
   {
      cpct_memcpy(pBuffer,g_test,G_TEST_W*G_TEST_H);
      cpct_drawSprite(pBuffer,cpctm_screenPtr(CPCT_VMEM_START,x,VIEW_Y),G_TEST_W,G_TEST_H);
      cpct_memset_f8(pBuffer,0x00,G_TEST_W*G_TEST_H);

      if (shift==0){
         pBuffer=buffer1;
         shift=1;
         x+=G_TEST_W;
      }
      else{
         pBuffer=buffer0;
         shift=0;
         x-=G_TEST_W;
      }

      cpct_drawSprite(pBuffer,cpctm_screenPtr(CPCT_VMEM_START,x,VIEW_Y),G_TEST_W,G_TEST_H);
   }
}

ervin

Or you could do it with one buffer.
(I should have written that way in the first place.)


#include <cpctelera.h>
#include "test.h"

#define VIEW_X      0
#define VIEW_Y      0

u8 buffer[G_TEST_W*G_TEST_H];

void main(void)
{
   u8 shift;
   u8 x;

   cpct_disableFirmware();

   shift=0;
   x=VIEW_X;

   while(1)
   {
      cpct_memcpy(buffer,g_test,G_TEST_W*G_TEST_H);
      cpct_drawSprite(buffer,cpctm_screenPtr(CPCT_VMEM_START,x,VIEW_Y),G_TEST_W,G_TEST_H);
      cpct_memset_f8(buffer,0x00,G_TEST_W*G_TEST_H);

      if (shift==0){
         shift=1;
         x+=G_TEST_W;
      }
      else{
         shift=0;
         x-=G_TEST_W;
      }

      cpct_drawSprite(buffer,cpctm_screenPtr(CPCT_VMEM_START,x,VIEW_Y),G_TEST_W,G_TEST_H);
   }
}

Arnaud

@ervin, @Docent, @roudoudou : thanks for the advice, it runs faster now.

Here is the CPCTelera code for the copy:


void CopyData(u8 yDest, u8* destMem, u8 *srcSprite, u8 srcWidth)
{
__asm
        push ix; Save ix before making changes
        ld ix, #0; ix points to the top of the stack
        add ix, sp
        ld a, 4(ix); yDest

        ld e, 5(ix); destMem
        ld d, 6(ix)

        ld l, 7(ix); srcSprite
        ld h, 8(ix)

        ld c, 9(ix); srcWidth
        ld b, #0

    copy_loop :
        push     bc
        push     de
        ldir
        ld    bc, #VIEW_CX
        pop    de
        ex     de, hl
        add    hl, bc
        ex     de, hl
        pop    bc
        dec a
        jr nz, copy_loop

        pop ix; Restore IX before returning
__endasm;
}
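For readers wondering where offsets such as 4(ix) and 9(ix) come from, here is a small sketch in C that derives them. It assumes what the routine above implies: after push ix, the saved IX sits at 0(ix)..1(ix), the return address at 2(ix)..3(ix), a u8 parameter takes 1 byte and a pointer takes 2. The function param_offsets is a hypothetical helper, not part of CPCTelera or SDCC.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed parameter sizes on the Z80 stack, as implied by the routine above. */
enum { SIZE_U8 = 1, SIZE_PTR = 2 };

/* Saved IX (2 bytes) + return address (2 bytes) precede the parameters. */
enum { FIRST_PARAM_OFFSET = 4 };

/* Fill offsets[i] with the IX displacement of parameter i.
   Returns the total number of parameter bytes on the stack. */
uint8_t param_offsets(const uint8_t *sizes, size_t count, uint8_t *offsets)
{
    uint8_t offset = FIRST_PARAM_OFFSET;
    for (size_t i = 0; i < count; i++) {
        offsets[i] = offset;          /* low byte of parameter i */
        offset = (uint8_t)(offset + sizes[i]);
    }
    return (uint8_t)(offset - FIRST_PARAM_OFFSET);
}
```

For CopyData(u8 yDest, u8* destMem, u8* srcSprite, u8 srcWidth) the sizes are 1, 2, 2, 1, which gives displacements 4, 5, 7 and 9, matching the loads in the routine.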

Arnaud

Hello,
I need more help  ;) because I also want to draw masked sprites.
I tried to modify the previous assembly code; it seems easy, but I wasn't able to do it (I am really lost with asm).

Here is the C code I'd like to convert to asm; it's adapted from CPCTelera's cpct_drawSpriteMaskedAlignedTable:

void CopyDataMasked(u8 yDest, u8* destMem, u8 *srcSprite, u8 srcWidth, u8* maskTable) {
    while (yDest != 0) {
        u8 i = 0;
        for (i = 0; i < srcWidth; i++) {
            u8 sprite = srcSprite[i];
            u8 mask = maskTable[sprite];
            u8 dest = destMem[i];
            dest &= mask;
            dest |= sprite;
            destMem[i] = dest;
        }
        srcSprite += srcWidth;
        destMem += VIEW_CX;
        --yDest;
    }
}


My goal is to modify the previous assembly code (Reply #5) by adding this part from CPCTelera (cf. cpct_drawSpriteMaskedAlignedTable.asm), in order to get transparency:

   ld    a, (bc)   ;; [2] Get next byte from the sprite
   ld    l, a      ;; [1] Access mask table element (table must be 256-byte aligned)
   ld    a, (de)   ;; [2] Get the value of the byte of the screen where we are going to draw
   and (hl)        ;; [2] Erase background part that is to be overwritten (Mask step 1)
   or    l         ;; [1] Add up background and sprite information in one byte (Mask step 2)
   ld  (de), a     ;; [2] Save modified background + sprite data information into memory


I think I have to modify this part of the code (Reply #5), but I don't know how to do it:

ex     de, hl
add    hl, bc
ex     de, hl


Thanks,
Arnaud.

Xifos

I think the part of the code you have to modify is the ldir.

You must replace the ldir with a loop that does the transparency part.

ronaldo

@Arnaud You could start by looking at the code SDCC generates from your C source and modifying it. That would be an interesting way to start if you still need to train your asm abilities.

With respect to what you want to do, you are solving only part of the problem. Blitting a sprite into another sprite buffer requires knowing the sizes of both. Your code assumes a constant size for the "canvas" sprite. I would prefer writing a solution for any sprite size, as that would be valid for all CPCtelera users. In fact, that one is on the todo list.

How about starting with your first attempts at the problem? We can assist you in creating a good solution, and that would also help you improve your asm skills :)
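As a rough illustration of the "any sprite size" idea, here is a C sketch that takes the dimensions of both the destination buffer and the source sprite. The name blit_sprite, its parameter order and the bounds check are assumptions made for this example; this is not the CPCtelera API.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical generic blit: copies a src_w x src_h sprite into a
   destination sprite buffer that is dest_w bytes wide and dest_h rows
   high, at byte position (x, y).
   Returns 1 on success, 0 if the sprite would not fit. */
int blit_sprite(uint8_t *dest, uint8_t dest_w, uint8_t dest_h,
                uint8_t x, uint8_t y,
                const uint8_t *src, uint8_t src_w, uint8_t src_h)
{
    if (x + src_w > dest_w || y + src_h > dest_h)
        return 0;                       /* would write outside the buffer */

    uint8_t *row = dest + (size_t)y * dest_w + x;
    for (uint8_t r = 0; r < src_h; r++) {
        memcpy(row, src, src_w);        /* one sprite row */
        src += src_w;                   /* next source row */
        row += dest_w;                  /* next destination row */
    }
    return 1;
}
```

The only change from the fixed-size version is that the row stride comes from dest_w instead of the constant VIEW_CX.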

Docent

Quote from: Arnaud on 07:56, 19 August 17

Here you go..

ld a, yDest
ld de, destMem
ld hl, maskTable
exx
ld hl, srcSprite
exx
copy_loop:
ex af, af'
ld c, srcWidth
push de
line_loop:
exx
ld a, (hl) ; sprite = srcSprite[i];
inc hl
exx
ld l, a ;
ld a, (de) ; dest = destMem[i];
and (hl) ; dest &= maskTable[sprite];
or l ; dest |= sprite;
ld (de),a ; destMem[i] = dest;
inc de
dec c
jr nz, line_loop

ld bc, #VIEW_CX
pop de
ex de, hl
add hl, bc ; destMem += VIEW_CX;
ex de, hl
ex af, af'
dec a
jr nz, copy_loop

As before, you need some additional code to save the registers sdcc requires and to set up the initial registers differently.

have fun :)

btw: maskTable needs to be 256-byte aligned.
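To show why the alignment matters, here is a C sketch of what `ld l, a` followed by `and (hl)` relies on: when the table starts on a 256-byte boundary, replacing the low byte of its address with the sprite byte lands exactly on maskTable[sprite]. The table contents are an assumption for illustration only (Mode 0, with pen 0 treated as transparent); mask_table, build_mask_table, entry_by_low_byte and blend are hypothetical names.

```c
#include <stdint.h>
#include <stdalign.h>

/* 256-byte aligned: the low byte of each entry's address equals its index. */
alignas(256) uint8_t mask_table[256];

/* Assumed mask rule: Mode 0, pen 0 is transparent.
   Left pixel uses bits 7,5,3,1 (0xAA), right pixel bits 6,4,2,0 (0x55). */
void build_mask_table(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint8_t mask = 0;
        if ((v & 0xAA) == 0) mask |= 0xAA;  /* left pixel transparent: keep background */
        if ((v & 0x55) == 0) mask |= 0x55;  /* right pixel transparent: keep background */
        mask_table[v] = mask;
    }
}

/* Equivalent of `ld l, a`: keep the high part of the table address
   and overwrite only the low byte with the sprite byte. */
const uint8_t *entry_by_low_byte(uint8_t sprite_byte)
{
    uintptr_t base = (uintptr_t)mask_table;
    return (const uint8_t *)((base & ~(uintptr_t)0xFF) | sprite_byte);
}

/* dest = (dest & mask) | sprite, as in the inner loop above. */
uint8_t blend(uint8_t background, uint8_t sprite_byte)
{
    return (uint8_t)((background & *entry_by_low_byte(sprite_byte)) | sprite_byte);
}
```

If the table were not aligned, the low-byte trick would point at a different entry (or outside the table), which is why the asm version cannot simply add the index.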

Arnaud

@Docent: Thanks a lot for the conversion

@ronaldo : You are right, and I have added and renamed some parameters in order to make the function useful to CpcTelera users.

To begin with, I'm trying to adapt the code to Cpctelera format in an .s file; afterwards I'll add / rename parameters.
I carefully read the parameters in the right order, but it crashes  :doh:

Here is the asm code.


extern void drawBackBufferMaskedAlignedTable(u8 dest_y, u8* dest_mem, u8 *src_sprite, u8 src_width, u8* mask_table) __z88dk_callee;

_drawBackBufferMaskedAlignedTable::

        push ix; Save ix before making changes
        ld ix, #0; ix points to the top of the stack
        add ix, sp

        ld a, 4(ix); yDest

        ld e, 5(ix); destMem
        ld d, 6(ix)

        ld l, 10(ix); maskTable
        ld h, 11(ix)
        exx

        ld l, 7(ix); srcSprite
        ld h, 8(ix)   
        exx

    copy_loop_masked :
        ex    af, af'
        ld c, 9(ix); srcWidth
        ld b, #0
        push de

    line_loop :
        exx
        ld    a, (hl); sprite = srcSprite[i];
        inc    hl
        exx
        ld    l, a;
        ld    a, (de); dest = destMem[i];
        and (hl); dest &= maskTable[sprite];
        or l; dest |= sprite;
        ld(de), a; destMem[i] = dest;
        inc    de
        dec     c
        jr     nz, line_loop

        ld    bc, #44 ;VIEW_CX
        pop    de
        ex     de, hl
        add    hl, bc; destMem += VIEW_CX;
        ex     de, hl
        ex    af, af'
        dec    a
        jr     nz, copy_loop_masked
       
        pop ix; Restore IX before returning


And the full project.

Thanks for help,
Arnaud

Xifos

What is the effect of __z88dk_callee ?
Don't you have to restore the stack yourself at the end of the function ?

Arnaud

According to the cpctelera comments in the code, I have to put the return address back on the stack, as this function uses the __z88dk_callee convention.

So you are probably right: I have to restore the stack at the end of the function. I'm trying this in a small example.

Arnaud

Well, I don't know what to do.

When my asm code is inline, it works.

void copyData(u8* sprite, u8* memory, u8 width, u8 height)
{
    sprite;
    memory;
    width;
    height;
__asm
        push ix; Save ix before making changes
       
        ld ix, #0; ix points to the top of the stack
        add ix, sp

        ld l, 4(ix); sprite
        ld h, 5(ix)
       
        ld e, 6(ix); memory
        ld d, 7(ix)

        ld c, 8(ix); width
        ld b, #0
       
        ld a, 9(ix); height
       
    copy_loop:
        push     bc
        push     de
        ldir
        pop    de
        ex     de, hl
        ld    bc, #VIEW_CX
        add    hl, bc
        ex     de, hl
        pop    bc
        dec a
        jr nz, copy_loop

        pop ix; Restore IX before returning
__endasm;
}



When I put it in an .s file, it doesn't work.

I don't know whether I must set the __z88dk_callee function attribute; some cpctelera functions use it, others don't, but with or without it my code crashes.

I guess some registers, or the SP, are not restored.

Xifos

Weird.

Is it ok to use the alternate register set ?
You don't have firmware or cpctelera code under interrupts using it ?

With __z88dk_callee i don't think you need to push the return address.
Just adjust stack according to the parameters pushed.

Arnaud

Quote from: Xifos on 12:40, 21 August 17
Weird.
There must be a logical explanation  :D

Quote from: Xifos on 12:40, 21 August 17
Is it ok to use the alternate register set ?
I don't know.

Quote from: Xifos on 12:40, 21 August 17
You don't have firmware or cpctelera code under interrupts using it ?
I disabled interrupts.

Quote from: Xifos on 12:40, 21 August 17
With __z88dk_callee i don't think you need to push the return address.
Just adjust stack according to the parameters pushed.
I was thinking about that, but in the asm generated by SDCC it seems the compiler updates the SP according to the size of the parameters:


call    _drawBackBuffer
ld    hl, #6
add    hl, sp
ld    sp, hl
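A toy model in C may help to see why the choice of convention matters. It treats SP as a plain number and only tracks who removes the parameter bytes: the caller (the `ld hl, #6 / add hl, sp / ld sp, hl` sequence above) or the callee (the __z88dk_callee convention). The function sp_after_call is hypothetical and models nothing beyond this bookkeeping.

```c
#include <stdint.h>

enum { RET_ADDR_BYTES = 2 };

/* Returns SP after a complete call sequence.
   caller_cleans: the caller adds param_bytes to SP after the call.
   callee_cleans: the callee removes the parameters before returning. */
uint16_t sp_after_call(uint16_t sp, uint8_t param_bytes,
                       int caller_cleans, int callee_cleans)
{
    sp = (uint16_t)(sp - param_bytes);      /* caller pushes the parameters */
    sp = (uint16_t)(sp - RET_ADDR_BYTES);   /* call pushes the return address */

    /* ... function body runs and leaves SP balanced ... */

    if (callee_cleans)
        sp = (uint16_t)(sp + param_bytes);  /* callee drops the parameters */
    sp = (uint16_t)(sp + RET_ADDR_BYTES);   /* return address is consumed */

    if (caller_cleans)
        sp = (uint16_t)(sp + param_bytes);  /* caller-side add hl, sp fix-up */
    return sp;
}
```

SP only ends up where it started when exactly one side cleans up. If the C declaration and the asm routine disagree about the convention, the stack drifts by the parameter size on every call.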


Xifos

In reply #13, you were talking about the copyData function when you said it was not working from an .s file ?

In that case, forget about the alternate registers; that's not the problem.

Without the __z88dk_callee thing, the copyData function should work, even from an asm file.
(except that you must be sure #VIEW_CX is defined)

I must say I am not at ease with the linker...

Docent

Quote from: Arnaud on 18:08, 20 August 17

You need to terminate your asm function with ret - otherwise, when called, it won't return and execution will continue into whatever follows it in memory.
Ret is not needed if you put the asm code between __asm/__endasm directives in C source code (and you don't use __naked in the function definition) - sdcc will add it itself.

Xifos

Of course the ret !
I should have seen it !
:doh:

Arnaud

Quote from: Docent on 14:45, 21 August 17
You need to terminate your asm function with ret - otherwise, when called, it won't return and execution will continue into whatever follows it in memory.
Ret is not needed if you put the asm code between __asm/__endasm directives in C source code (and you don't use __naked in the function definition) - sdcc will add it itself.

Yes, that's it!  :D
I really need to acquire some basic asm knowledge.

Now I'll try to get the second function working.

Arnaud

I have added a new parameter, sprite_width, to replace the constant VIEW_CX.

To store this parameter temporarily I put it in an asm variable; is this a good way to do it?

extern void drawBackBuffer(u8 *sprite, u8 sprite_width, u8* memory, u8 width, u8 height);

_drawBackBuffer::
        push ix; Save ix before making changes
       
        ld ix, #0; ix points to the top of the stack
        add ix, sp

        ld l, 4(ix); sprite
        ld h, 5(ix)
       
        ld c, 6(ix); sprite_width   
        ld b, #00
        ld (sprite_width), bc;

        ld e, 7(ix); memory
        ld d, 8(ix)

        ld c, 9(ix); width
       
        ld a, 10(ix); height
       
    copy_loop:
        push     bc
        push     de
        ldir
        pop    de
        ex     de, hl
        ld    bc, (sprite_width)
        add    hl, bc
        ex     de, hl
        pop    bc
        dec a
        jr nz, copy_loop

        pop ix; Restore IX before returning
        ret

sprite_width:
    .dw #0000

Arnaud

Here is the code for the back buffer copy:

  • drawBackBuffer
  • drawBackBufferMasked
  • drawBackBufferMaskedAlignedTable

I modified the order of the parameters and added a comment header.

All remarks are welcome  :D

I'll make a better example project and propose it all to cpctelera.

demoniak

Small optimisation :

_drawBackBuffer::
        push ix; Save ix before making changes
       
        ld ix, #0; ix points to the top of the stack
        add ix, sp

        ld l, 4(ix); sprite
        ld h, 5(ix)
       
        ld a, 10(ix); buffer_width   
        ld (buffer_width+1), a;
        ld b, #00
       
        ld a, 7(ix); height

        ld e, 8(ix); memory
        ld d, 9(ix)
       
    copy_loop:
        ld c, 6(ix); width
        push     de
        ldir
        pop    de
        ex     de, hl
buffer_width:
        ld    c, 0
        add    hl, bc
        ex     de, hl
        dec a
        jr nz, copy_loop

        pop ix; Restore IX before returning
        ret
       

Docent

Quote from: demoniak on 20:36, 22 August 17

You managed to squeeze 15 T-states out of the copy loop - nice!
But I don't like self-modifying code, so I thought I'd try to get rid of it. I went with a few undocumented instructions and saved 10 more T-states than your code, and 25 compared with my initial version - over 21% speedup :)


_drawBackBuffer::
push ix; Save ix before making changes

ld ix, #0; ix points to the top of the stack
add ix, sp

ld l, 4(ix); sprite
ld h, 5(ix)

ld c, 6(ix); width

ld a, 7(ix); height

ld e, 8(ix); memory
ld d, 9(ix)

ld b, 10(ix); buffer_width

ld ixh, b  ; undocumented opcode: 0xdd, 0x60
ld ixl, c   ; undocumented opcode: 0xdd, 0x69 ; 8

ld b, #00

copy_loop:
push de
ldir
pop de
ex de, hl
ld c, ixh ; undocumented opcode: 0xdd, 0x4c
add hl, bc
ex de, hl
ld c, ixl  ; undocumented opcode: 0xdd, 0x4d
dec a
jr nz, copy_loop
        pop ix; Restore IX before returning
        ret
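The savings quoted above can be cross-checked by adding up the per-row loop overhead (everything except the LDIR itself) for the three versions in this thread. The sketch below uses nominal Z80 T-state counts; as far as I know the CPC stretches instruction timings, so real figures on the machine will differ. The function names are hypothetical.

```c
#include <stddef.h>

/* Nominal Z80 T-states per instruction (not CPC-adjusted). */
enum {
    T_PUSH = 11, T_POP = 10, T_EX_DE_HL = 4, T_ADD_HL_BC = 11,
    T_LD_BC_FROM_MEM = 20,   /* ld bc, (nn)        */
    T_LD_C_FROM_IX_D = 19,   /* ld c, d(ix)        */
    T_LD_C_IMM = 7,          /* ld c, n            */
    T_LD_C_IX_HALF = 8,      /* ld c, ixh / ixl    */
    T_DEC_A = 4, T_JR_NZ_TAKEN = 12
};

static unsigned sum(const unsigned *t, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += t[i];
    return total;
}

/* Loop that reloads the stride with ld bc, (sprite_width). */
unsigned overhead_memory_variable(void)
{
    static const unsigned t[] = {
        T_PUSH, T_PUSH, T_POP, T_EX_DE_HL, T_LD_BC_FROM_MEM,
        T_ADD_HL_BC, T_EX_DE_HL, T_POP, T_DEC_A, T_JR_NZ_TAKEN };
    return sum(t, sizeof t / sizeof t[0]);
}

/* Self-modifying version: ld c, 6(ix) plus a patched ld c, n. */
unsigned overhead_self_modifying(void)
{
    static const unsigned t[] = {
        T_LD_C_FROM_IX_D, T_PUSH, T_POP, T_EX_DE_HL, T_LD_C_IMM,
        T_ADD_HL_BC, T_EX_DE_HL, T_DEC_A, T_JR_NZ_TAKEN };
    return sum(t, sizeof t / sizeof t[0]);
}

/* Version keeping both widths in ixh / ixl. */
unsigned overhead_ix_halves(void)
{
    static const unsigned t[] = {
        T_PUSH, T_POP, T_EX_DE_HL, T_LD_C_IX_HALF,
        T_ADD_HL_BC, T_EX_DE_HL, T_LD_C_IX_HALF, T_DEC_A, T_JR_NZ_TAKEN };
    return sum(t, sizeof t / sizeof t[0]);
}

/* LDIR costs 21 T-states per byte, 16 for the last one (width >= 1). */
unsigned ldir_tstates(unsigned width)
{
    return 21u * (width - 1u) + 16u;
}
```

With these figures the three loops cost 97, 82 and 72 T-states of overhead per row, which agrees with the 15 and 10 T-state savings mentioned. The percentage gained per row depends on the sprite width, because the LDIR cost grows with every byte copied.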

Xifos

I'm sorry, this is off topic, but:
Does the sdcc assembler support instructions written directly with ixl/ixh and iyl/iyh ?

I'm still using .db #0xDD or #0xFD in my code.
