[CPCTelera asm] Partially screen fill needs optimisation

Started by Arnaud, 07:09, 20 October 18

Arnaud

Hello, here is my function to partially fill the screen. It fills an area of 72*160 bytes (11520 bytes) with 0 and, needless to say, it takes a lot of time  ;)
I use SP to speed up execution, and I save and restore it at the end of each line (maybe that's too often).

If anyone has an idea for improving this function, it would be welcome  :D
Arnaud.

ervin

Hi Arnaud.

I'm wondering, do you need to disable/enable interrupts for every scan line?
Would you prefer to disable/enable only for every character line?
Or would it be enough to do it once, around the entire procedure?

If you would be ok with only doing it for every character line, you'd only need to save/restore SP once per character line.
If you would be ok with DI/EI once around the entire procedure, you'd only need to save/restore SP once.

Also, you have a counter for raster lines in B', and you perform a DJNZ with it.
Then later you use a counter for character lines (A), and you perform DEC A and JR NZ with it.
Character lines [20] * raster lines [8] = 160, so you could just have one DJNZ operation, with B=160 at the start of the loop.
You wouldn't need the DEC A and JR NZ parts.

ervin

Actually, I just realised that my suggestion to stop using A as a counter is no good, because A works together with the code that calculates the next raster line, rather than the next character line.
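
To illustrate why the two counters are tied to the address arithmetic, here is a minimal C sketch of the interleaved screen layout. It assumes the standard CPC layout (video memory at 0xC000, 80 bytes per line, 0x800 bytes between raster lines); `scanline_addr` and `raster_pass_step` are hypothetical helpers, not CPCtelera functions. The 0x50 and 0x1C0 steps are the ones that appear in the listing later in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed standard CPC screen layout (not stated in the thread):
   video memory at 0xC000, 80 bytes per line, 8 raster lines per
   character line, 0x800 bytes between consecutive raster lines. */
enum {
    VMEM_START     = 0xC000,
    BYTES_PER_LINE = 0x50,
    RASTER_STRIDE  = 0x800,
    CHAR_LINES     = 20,     /* character lines covered by the fill */
    RASTER_LINES   = 8       /* raster lines per character line */
};

/* Hypothetical helper: address of the first byte of pixel line y. */
static uint16_t scanline_addr(unsigned y) {
    assert(y < 200);
    return (uint16_t)(VMEM_START
                      + (y / RASTER_LINES) * BYTES_PER_LINE
                      + (y % RASTER_LINES) * RASTER_STRIDE);
}

/* Net address change after one pass over all character lines:
   CHAR_LINES steps of +0x50, then the +0x1C0 correction. */
static unsigned raster_pass_step(void) {
    return CHAR_LINES * BYTES_PER_LINE + 0x01C0;
}
```

Under these assumptions, pixel line 1 sits 0x800 bytes after pixel line 0, while pixel line 8 sits only 0x50 bytes after it. Two different steps are therefore needed, and each loop counter belongs to one of them.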

Targhan

Most of the CPU time is spent PUSHing. The rest of the code can maybe be optimized, but even if you do, you won't gain much. So in my opinion, this code cannot be substantially optimized.
Targhan/Arkos
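
A back-of-the-envelope check of that point, as a C sketch. It uses the per-instruction costs (in CPC NOPs) annotated in the listing posted later in the thread. The DJNZ cost is an assumption, since the listing leaves it unannotated, and the helper names are hypothetical.

```c
/* Timings in CPC "NOPs" (1 NOP = 1 microsecond). The PUSH, EXX, ADD and
   LD SP,HL figures are the ones annotated in the thread's listing; the
   DJNZ cost is an assumption, since the listing does not annotate it. */
enum {
    PUSHES_PER_LINE = 32,
    NOPS_PER_PUSH   = 4,
    NOPS_EXX        = 1,
    NOPS_ADD_HL     = 3,
    NOPS_LD_SP_HL   = 2,
    NOPS_DJNZ_TAKEN = 4    /* assumed */
};

/* NOPs spent on the PUSH instructions for one 64-byte line. */
static unsigned push_nops_per_line(void) {
    return PUSHES_PER_LINE * NOPS_PER_PUSH;
}

/* NOPs spent on everything else in the inner loop for one line:
   two EXX, one ADD HL,BC, one LD SP,HL and the DJNZ. */
static unsigned overhead_nops_per_line(void) {
    return 2 * NOPS_EXX + NOPS_ADD_HL + NOPS_LD_SP_HL + NOPS_DJNZ_TAKEN;
}

/* Share of the per-line time taken by the PUSHes, as a whole percentage. */
static unsigned push_share_percent(void) {
    unsigned push = push_nops_per_line();
    return 100 * push / (push + overhead_nops_per_line());
}
```

With these figures the PUSHes account for 128 of roughly 139 NOPs per line, so `push_share_percent()` comes out at 92. Trimming the remaining overhead can only touch the last few percent.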

ervin

How about this?

It now has an inner loop and an outer loop, and the saving/restoring of SP is only done by the outer loop.
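So the DI/EI and SP operations are done 8 times (once per pass of the outer loop) instead of 160.
By the cycle counts annotated in the code (about 10 NOPs per save/restore), this saves roughly 1500 NOPs per call.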
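
As a rough check on that saving, here is a small C sketch. It assumes the per-instruction costs annotated in the listing ([1] DI, [5] LD (nn),SP, [3] LD SP,nn, [1] EI) and uses a hypothetical helper, `saved_nops`, that is not part of CPCtelera. The listing loads the outer counter A with 8, so the save/restore runs once per outer pass; a restore once per character line would correspond to 20 passes.

```c
/* Cost of one SP save/restore, in CPC NOPs, summed from the listing's
   own annotations: DI [1] + LD (nn),SP [5] + LD SP,nn [3] + EI [1]. */
enum { SAVE_RESTORE_NOPS = 1 + 5 + 3 + 1 };

/* Hypothetical helper: NOPs saved by doing the save/restore `passes`
   times instead of once for each of `lines` scanlines. */
static unsigned saved_nops(unsigned lines, unsigned passes) {
    return (lines - passes) * SAVE_RESTORE_NOPS;
}
```

With these assumptions, `saved_nops(160, 8)` gives 1520 and `saved_nops(160, 20)` gives 1400.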

Also, I added EXX before the RET, as we need to switch back to standard registers before returning from the function.

PLEASE BE CAREFUL with this code if you are sensitive to brightly flashing colours on-screen.

#include <cpctelera.h>

void fillView(u8*) __z88dk_fastcall;

u8 counter;

void main(void) {
   cpct_disableFirmware();
   cpct_clearScreen_f64(0x00);
   cpct_setBorder(1);
   cpct_setDrawCharM1(2,0);

   counter=0;

   while (1){
      fillView((u8*)CPCT_VMEM_START+160);
      counter++;
   }
}

void fillView(u8* backBuffer) __z88dk_fastcall {
   __asm

   ;; Parameter *backBuffer* is passed directly in the HL register (__z88dk_fastcall convention)

   ;; Skip the 8-byte right border
   ld   bc, #0x0008               ;; [3] BC = 8-byte border width
   or   a                         ;; [1] Clear carry so SBC subtracts exactly BC
   sbc  hl, bc                    ;; [4] HL = HL - BC
   
   ld   de, #0x01C0               ;; [3] DE = 0x01C0: 20 lines * 0x50 (0x640) + 0x01C0 = 0x800, the next pixel line
   ld   c,  #0x50                 ;; [2] BC = 80-byte scanline (0x0050); B is already 0
   ld   a,  #08                   ;; [2] A = number of raster lines per character line

   exx
   ld de,(_counter) ;; set fill colour to counter variable, in both D and E
   ld d,e

   ld c,#20
   ld b,c

;; Fill the central 64 bytes of each 80-byte scanline

fillLoop_raster:
   ;; Save SP to restore it later, as this function makes use of it
   di                             ;; [1] Disable interrupts first
   ld   (fv_restore_sp + 1), sp   ;; [5] Save SP to recover it later on

fillLoop_character:
   exx               ;; [1] Switch to Standard Registers

   ;; Move SP to the end of the array       
   add  hl, bc       ;; [3] HL += BC (0x50) Start CharacterLine (HL points to the end of the array)
   ld   sp, hl       ;; [2] SP = HL (SP points to the end of the array)     
   
   exx               ;; [1] Switch to Alternate Registers
   
   ;; Fill Center View 64-bytes
   push de           ;; [4] Push 8-bytes
   push de           ;; [4] 
   push de           ;; [4]
   push de           ;; [4]
   
   push de           ;; [4] Push 8-bytes
   push de           ;; [4] 
   push de           ;; [4]
   push de           ;; [4] 

   push de           ;; [4] Push 8-bytes
   push de           ;; [4] 
   push de           ;; [4]
   push de           ;; [4]

   push de           ;; [4] Push 8-bytes
   push de           ;; [4] 
   push de           ;; [4]
   push de           ;; [4]

   push de           ;; [4] Push 8-bytes
   push de           ;; [4] 
   push de           ;; [4]
   push de           ;; [4]
   
   push de           ;; [4] Push 8-bytes
   push de           ;; [4] 
   push de           ;; [4]
   push de           ;; [4] 

   push de           ;; [4] Push 8-bytes
   push de           ;; [4] 
   push de           ;; [4]
   push de           ;; [4]
   
   push de           ;; [4] Push 8-bytes
   push de           ;; [4] 
   push de           ;; [4]
   push de           ;; [4]
   
   djnz fillLoop_character
   ld   b,  c        ;; [1] B' = C' Character Lines to fill (20)
   
   exx               ;; [1] Switch to Default Registers
   add  hl, de       ;; [3] HL += DE (Next PixelLine End)
   exx               ;; [1] Switch to Alternate Registers
   
fv_restore_sp:
   ld   sp, #0000    ;; [3] Placeholder for restoring SP value before returning
   ei                ;; [1] Reenable interrupts   

   dec  a            ;; [1] A-- Raster lines left to fill
   jr   nz, fillLoop_raster ;; [2/3]
   
   exx
   ret               ;; [3] Return 

   __endasm;
}
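
To make the traversal order easier to follow, here is a plain-C model of the assembly loop. It is only a sketch: `screen` stands in for the 16 KB of video memory (offsets from 0xC000), `fill_view_model` is a hypothetical name, and a PUSH is modelled as two byte writes below the current SP.

```c
#include <assert.h>
#include <stdint.h>

enum { VMEM_SIZE = 0x4000 };       /* 16 KB, offsets relative to 0xC000 */

static uint8_t screen[VMEM_SIZE];  /* stand-in for CPC video memory */

/* Model of the fill: same start offset, same address steps, and
   32 two-byte pushes per line. Returns the number of bytes written. */
static unsigned fill_view_model(uint8_t *vmem, unsigned start, uint8_t colour) {
    unsigned written = 0;
    unsigned hl = start - 8;               /* skip the 8-byte right border */
    for (int raster = 0; raster < 8; raster++) {
        for (int row = 0; row < 20; row++) {
            hl += 0x50;                    /* end of this line's fill area */
            assert(hl >= 64 && hl <= VMEM_SIZE);
            unsigned sp = hl;
            for (int push = 0; push < 32; push++) {
                vmem[--sp] = colour;       /* PUSH stores the high byte ... */
                vmem[--sp] = colour;       /* ... then the low byte, moving down */
                written += 2;
            }
        }
        hl += 0x01C0;                      /* 20*0x50 + 0x1C0 = 0x800: next raster line */
    }
    return written;
}
```

Calling `fill_view_model(screen, 160, 0xFF)` mirrors the call in `main` and writes 160 lines of 64 bytes, leaving 8 untouched bytes on each side of every line.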


Arnaud

Thanks @ervin for the code  :)
It is indeed faster, but the interrupt isn't serviced often enough and the music slows down.
I'll try adding a counter to save/restore SP every two scanlines.

ervin

Every 2 scanlines is a really good idea.
That way you could add the PUSH code twice inside your loop, and only perform DJNZ four times per character line.
It'll use a little more memory that way, but you'll get a very small performance improvement.
Also, you won't need a counter if you use this technique.
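
To get a feel for the trade-off, the C sketch below estimates how long interrupts stay disabled per DI/EI pass for a given number of scanlines. It is a rough estimate only: it uses the costs annotated in the listing, ignores loop overhead, and assumes the usual CPC figure of one interrupt every 52 scanlines of 64 NOPs each. `masked_nops` is a hypothetical helper.

```c
enum {
    LINE_FILL_NOPS    = 32 * 4,         /* 32 PUSH DE at [4] each */
    LINE_SETUP_NOPS   = 1 + 3 + 2 + 1,  /* EXX, ADD HL,BC, LD SP,HL, EXX */
    SAVE_RESTORE_NOPS = 1 + 5 + 3 + 1,  /* DI, LD (nn),SP, LD SP,nn, EI */
    INTERRUPT_PERIOD  = 52 * 64         /* assumed: one interrupt per 52 scanlines */
};

/* Approximate NOPs between DI and EI when `lines_per_pass` scanlines
   are filled inside one save/restore pair (loop overhead ignored). */
static unsigned masked_nops(unsigned lines_per_pass) {
    return SAVE_RESTORE_NOPS
         + lines_per_pass * (LINE_FILL_NOPS + LINE_SETUP_NOPS);
}
```

Under these assumptions, 20 lines per pass keeps interrupts off for about 2710 NOPs out of a 3328-NOP interrupt period, whereas 2 lines per pass keeps them off for about 280.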
