CPCWiki forum

General Category => Programming => Topic started by: opqa on 12:30, 25 January 15

Title: Fast MODE 2 text printing routines
Post by: opqa on 12:30, 25 January 15
I've always thought that the strange memory layout of the CPC could be useful for one single thing: fast text printing. But to take advantage of it, "special" printing routines that work line by line instead of character by character have to be used. Oddly enough, the firmware doesn't use this technique, nor have I seen it used anywhere else, so I decided to write my own routines and share them with the forum.
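
For readers unfamiliar with the layout mentioned above, here is a minimal Python sketch of how MODE 2 screen bytes are addressed. It assumes the default screen at &C000 with no hardware scrolling and 80 bytes per pixel line; the function name `screen_address` is hypothetical and is not part of the attached routines.

```python
SCREEN_BASE = 0xC000   # default screen start (assumption: no hardware scroll)
BYTES_PER_LINE = 80    # in MODE 2 one byte is one character column

def screen_address(row, line, col):
    """Address of the byte for text row 0-24, pixel line 0-7, column 0-79."""
    return SCREEN_BASE + line * 0x800 + row * BYTES_PER_LINE + col

# The 80 bytes of one pixel line of a text row are contiguous ...
first = screen_address(0, 0, 0)
last = screen_address(0, 0, 79)
print(hex(first), hex(last))          # 0xc000 0xc04f
# ... while the next pixel line of the same characters is &800 bytes away.
print(hex(screen_address(0, 1, 0)))   # 0xc800
```

Moving along one pixel line is a plain increment, which is why a line-by-line routine can be fast, while stepping to the next pixel line needs extra address arithmetic.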

In the attached zip there are three routines, copycharset, superfastprint, and fastprint.

Copycharset must be invoked once. It copies the whole 2 KB ROM charset into RAM, rearranged line by line. The portion of RAM that holds this copy can be chosen at assembly time by altering the CHARSET_BASE constant; it can be placed at the beginning of any 256-byte page. Of course, if you're not going to use the whole charset you can overwrite the space taken by the unused characters. The only problem is that this space will be evenly distributed in 8 different "holes" along the 2 KB, somewhat like the free space of the CPC screen is spread along its 16 KB. Once the charset has been copied you can use superfastprint or fastprint as specified in the comments.
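
The rearrangement can be illustrated with a short Python sketch. It is only a model of the idea, not the attached copycharset source: it assumes the ROM charset is 256 glyphs of 8 bytes each stored glyph after glyph, and the function name and dummy data are hypothetical.

```python
def rearrange_charset(rom_charset):
    """Turn 256 chars x 8 bytes (char-major) into 8 pages x 256 bytes (line-major)."""
    assert len(rom_charset) == 2048
    out = bytearray(2048)
    for char in range(256):
        for line in range(8):
            out[line * 256 + char] = rom_charset[char * 8 + line]
    return bytes(out)

# Dummy data standing in for the ROM charset (assumption, not real font data).
dummy = bytes((c * 8 + l) & 0xFF for c in range(256) for l in range(8))
ram = rearrange_charset(dummy)
# Line 3 of character 65 now lives at page 3, offset 65.
print(ram[3 * 256 + 65] == dummy[65 * 8 + 3])   # True
```

With this layout the bytes of an unused character sit at the same offset in each of the eight pages, which is where the eight "holes" come from.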

Superfastprint is the fastest, but it needs to disable the interrupts while printing, as it uses the stack pointer to read the text buffer.
Fastprint is a little bit slower, but it can be interrupted safely.
There's a third option with an intermediate speed. Superfastprint can be assembled with the REENABLE_INTERRUPTS constant set to 1; it then runs with interrupts disabled but briefly re-enables them every 64 printed bytes (~7.5 scanlines).
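
The "~7.5 scanlines" figure can be checked with a line of arithmetic. The sketch below assumes the 7,5 us/byte lower bound of the stack-based loop and the 64 us duration of one CPC scanline; the function name is hypothetical.

```python
US_PER_BYTE = 7.5      # lower bound of the stack-based core loop
US_PER_SCANLINE = 64   # one CPC scanline lasts 64 us

def scanlines_between_ei(bytes_printed=64):
    """Scanlines spent with interrupts disabled between two re-enable points."""
    return bytes_printed * US_PER_BYTE / US_PER_SCANLINE

print(scanlines_between_ei())   # 7.5
```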

Here are some performance results, obtained by printing a whole 2000-character screen. In the cases where interrupts are fully or partially enabled, the interrupt routine has been patched to "ei:ret" so that its duration doesn't influence the result.

Superfastprint: 60,71 us/char
Superfastprint*: 63,56 us/char
Fastprint: 64,97 us/char

* with interrupt re-enabling

I think the results are quite good compared to another fast text output routine I've seen around: these ones are smaller in size, at least as fast and more versatile (they can print almost all the charset and much more than a whole screen at once).
Title: Re: Fast MODE 2 text printing routines
Post by: TFM on 17:49, 26 January 15
Nice piece of work!!!  :)

Best value I ever got was around 45 us for one character, but it slows down with the complexity of the text.
Displaying characters quick with FutureOS - YouTube (https://www.youtube.com/watch?v=rrOPw3a0uFU)

(starting at second 27 or so).
Title: Re: Fast MODE 2 text printing routines
Post by: Ast on 20:20, 26 January 15
Well done TFM.
Title: Re: Fast MODE 2 text printing routines
Post by: Prodatron on 02:07, 27 January 15
Opqa, that's very impressive and much more useful than only copying the same single char hundreds of times over the whole screen.
It seems that your method is good for printing complete screens of "real" text. I wonder what the performance is for small text portions?
You probably know this:
Programming:Fast Textoutput - CPCWiki (http://www.cpcwiki.eu/index.php/Programming:Fast_Textoutput)
It works with interrupts enabled and has an average speed of 65 NOPs/char, which is the same speed as your IRQ-friendly Fastprint routine. It requires some memory but keeps this average speed for small text portions, too. I wonder what your routine would reach when used for short texts? Unfortunately I haven't had time for a closer look at it yet.

CU,
Prodatron
Title: Re: Fast MODE 2 text printing routines
Post by: opqa on 12:15, 27 January 15
You're right, I meant to write about this but couldn't until now. In its current state it scales down really badly. I did all the tests and optimizations on whole-screen printing, so there is room for improvement in this respect. I've been experimenting with faster versions for smaller texts that can't print as many characters at a time. But in the end, the routine is only going to be good for printing relatively long chunks of text, because it's based on the fact that printing along the same character line of text is fast, while changing from one line to another is slow.

It all depends on the use. Your routine is great for general-purpose use, but it can't print the whole charset and is a little larger than mine (3,5KB vs ~2.3KB).

My routines are good for printing large chunks of text and, occasionally, small portions (the speed per character is lower for small texts, but printing a small text is always fast in absolute terms). But they are bad at printing many small portions of text one after another in quick succession.

@TFM (http://www.cpcwiki.eu/forum/index.php?action=profile;u=179)
Your video is also impressive. What printing technique are you using, char per char or line per line? Is there source code available?

PS: If anyone is interested, the "core" of the routines is simple. For the interrupt-friendly version:

ld a,(bc)
ld l,a
ldd

Where BC holds the pointer to the text buffer, H the charset base address (its 3 LSBs select the character line being printed), and DE the screen address. The routine prints the text backwards, from right to left. The lower bound for this routine is 64us/char (8us/byte).

And for the faster version:

pop bc
ld l,c
ldi
ld l,b
ldi

Where SP holds the pointer to the text buffer, H the charset base address and DE the screen address as before. This one prints the text forward in pairs of bytes. As a lower bound it takes 15us/pair, so 7,5us/byte -> 60us/char.
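
The two lower bounds can be reproduced by summing the instruction timings. The Python sketch below uses the usual CPC figures in NOPs (1 NOP = 1 us); the helper name `us_per_char` is hypothetical, and loop and setup overhead are deliberately left out.

```python
# Instruction timings in NOPs (1 NOP = 1 us on the CPC).
NOPS = {"ld a,(bc)": 2, "ld l,a": 1, "ldd": 5,
        "pop bc": 3, "ld l,c": 1, "ld l,b": 1, "ldi": 5}

def us_per_char(instructions, bytes_copied):
    """Cost of 8 pixel lines when `instructions` copy `bytes_copied` bytes."""
    per_byte = sum(NOPS[i] for i in instructions) / bytes_copied
    return per_byte * 8

fastprint = ["ld a,(bc)", "ld l,a", "ldd"]
superfastprint = ["pop bc", "ld l,c", "ldi", "ld l,b", "ldi"]
print(us_per_char(fastprint, 1))        # 64.0
print(us_per_char(superfastprint, 2))   # 60.0
```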
Title: Re: Fast MODE 2 text printing routines
Post by: TFM on 18:17, 27 January 15
Quote from: opqa on 12:15, 27 January 15
@TFM (http://www.cpcwiki.eu/forum/index.php?action=profile;u=179)
Your video is also impressive. What printing technique are you using, char per char or line per line? Is there source code available?


Hi! Well, I basically use control codes, which are quick when printing the same character multiple times in X or Y. With increasing diversity in the text it gets a bit slower. However, you can use all 256 characters and you can change them; it's not fixed to a few unchangeable characters...


Yours is actually very impressive. And printing out a whole page is what actually happens in applications, games and demos. Maybe not in word processors, but in this case the user is slower than the routine.  :laugh:


Great work!!!  :)
Title: Re: Fast MODE 2 text printing routines
Post by: Gryzor on 19:02, 27 January 15
What are the normal speeds though, for comparison's sake?
Title: Re: Fast MODE 2 text printing routines
Post by: TFM on 19:47, 27 January 15
Quote from: Gryzor on 19:02, 27 January 15
What are the normal speeds though, for comparison's sake?


The following section may give you an idea, even if it does not measure that alone:
Speedcheck - CPCWiki (http://www.cpcwiki.eu/index.php/Speedcheck#Display_.2F_Print_characters_on_Screen_.2F_Printer)
(The most interesting part could be the display of 64 KB hex dump).
I don't know if there is a HEX monitor for Symbos, but Prodi can for sure add that data.

Title: Re: Fast MODE 2 text printing routines
Post by: Ast on 04:47, 28 January 15
Just for fun, and for my iMPdraw, I wrote my own routine to print a fast 8x8 char.
My routine takes 53 NOPs per char (8x8)... Maybe I can do better using the stack pointer, but I'm not sure  :-X
Title: Re: Fast MODE 2 text printing routines
Post by: opqa on 09:52, 28 January 15
I've been working on further optimizations to improve the printing speed for short texts. I've focused on the interrupt-friendly fastprint routine; the changes I've made don't alter the long-text speed or other functionality. The only drawback is that one of the optimizations prevents porting it to ROM, but it is a minor one and could be undone.

These are the numbers for fastprint right now:

For 1 char:     370 us    -> 370 us/char
For 10 chars:   946 us    -> 94,6 us/char
For 40 chars:   2875 us   -> 71,87 us/char
For 80 chars:   5475 us   -> 68,44 us/char
For 160 chars:  10650 us  -> 66,56 us/char
For 1000 chars: 65077 us  -> 65,1 us/char
For 2000 chars: 129866 us -> 64,93 us/char

So now I think downscaling is not so bad, it becomes competitive from about 1 line of text on. Maybe it can be improved further, I don't know. New sources attached to this post.
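
The per-character figures follow directly from the totals, as this small Python sketch shows. The last step is only a rough estimate: it assumes, hypothetically, that the per-character cost is constant and equal to the 2000-char average, so that what remains for a single character is fixed setup overhead.

```python
# (characters, total microseconds) as posted above
measurements = [(1, 370), (10, 946), (40, 2875), (80, 5475),
                (160, 10650), (1000, 65077), (2000, 129866)]

def us_per_char(chars, total_us):
    return total_us / chars

for chars, total in measurements:
    print(f"{chars:5d} chars: {us_per_char(chars, total):7.2f} us/char")

# Rough fixed overhead, assuming (hypothetically) a constant per-char cost
# equal to the 2000-char average.
per_char = us_per_char(2000, 129866)
print(round(370 - per_char))   # 305
```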

Quote from: Ast on 04:47, 28 January 15
Just for the fun and for my iMPdraw, i do my own routine to print a fast char 8x8.
My routine takes 53 nops per char (8x8).... Maybe I can do better using the stack pointer but i'm not sure  :-X

This number is impressive, almost unbelievable. Is it really for a general 8x8 char? Could you paste the relevant part of the source code?
Title: Re: Fast MODE 2 text printing routines
Post by: Ast on 17:47, 28 January 15
No problem, i'll post it tonight so you'll see...
:D


Edit: And yes, it displays an 8x8 char (MODE 2), so one MODE 2 char with 8 lines.
I coded it last night for iMPdraw's text display.
Title: Re: Fast MODE 2 text printing routines
Post by: Prodatron on 19:40, 28 January 15
I am really curious about it! :)
Title: Re: Fast MODE 2 text printing routines
Post by: Ast on 21:15, 28 January 15
As promised, here is my printing routine...
All you have to know is that the font must be converted.



afftxt
            ld bc,#C000 ; screen address (you know why?)
            ld de,fnt   ; fnt = address where your converted font is loaded
            sub 32      ; A = char to print; the font starts at the space char
            ld h,0
            ld l,a
            add hl,hl ; x2
            add hl,hl ; x4
            add hl,hl ; x8
            add hl,de ; add the font base to the glyph offset
            ex de,hl  ; glyph address in DE
            ld h,b
            ld l,c    ; screen address in HL
;
;          here comes the display
;
            ld a,(de)
            ld (hl),a     ; #c0xx
            inc de
            set 3,h      ; #C8xx
            ld a,(de)
            ld (hl),a
            inc de
            set 4,h      ; #D8xx
            ld a,(de)
            ld (hl),a
            inc de
            res 3,h     ; #D0xx
            ld a,(de)
            ld (hl),a
            inc de
            set 5,h     ; #F0xx
            ld a,(de)
            ld (hl),a
            inc de
            set 3,h     ; #F8xx
            ld a,(de)
            ld (hl),a
            inc de
            res 4,h    ; #E8xx
            ld a,(de)
            ld (hl),a
            inc de
            res 3,h    ; #E0xx
            ld a,(de)
            ld (hl),a
            ret

It's possible to save a few us if you mix inc de and inc e.


Have a good fun !
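
The font conversion this routine relies on can be derived from its set/res sequence. The Python sketch below replays that sequence on the high byte of the screen address and lists the pixel line reached at each step; the function names are hypothetical, and the conversion shown is simply the reordering that this sequence implies.

```python
def line_order():
    """Pixel-line order visited by the set/res sequence in the routine above."""
    h = 0xC0                      # high byte of the screen address, line 0
    steps = [("set", 3), ("set", 4), ("res", 3), ("set", 5),
             ("set", 3), ("res", 4), ("res", 3)]
    order = [(h >> 3) & 7]        # bits 3-5 of H select the pixel line
    for op, bit in steps:
        h = h | (1 << bit) if op == "set" else h & ~(1 << bit)
        order.append((h >> 3) & 7)
    return order

def convert_font(glyph):
    """Reorder the 8 bytes of one glyph so they can be read sequentially."""
    return bytes(glyph[line] for line in line_order())

print(line_order())   # [0, 1, 3, 2, 6, 7, 5, 4]
```

Only one bit of H changes per step (a Gray-code walk), which is what lets a single 2-NOP set or res replace a 16-bit addition.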

Title: Re: Fast MODE 2 text printing routines
Post by: Prodatron on 21:31, 28 January 15
But that is 8x8 NOPs for the core part (LD A,(DE):LD (HL),A:INC DE:RES/SET x,H:...) plus a lot more NOPs for all the stuff around it.
How do you get to 53?
Title: Re: Fast MODE 2 text printing routines
Post by: Ast on 21:35, 28 January 15
In fact, at the beginning I only used inc e, so 56 NOPs... sorry for the mistake!  :laugh:
Hi Prodatron! Isn't that correct?


7x8=56 nops


Edit: Using inc e/inc de only costs 4 NOPs more,
so 60 us for printing any char...


or you can use this way :

pop hl ; adr
ldi       ; transfer
pop hl
ldi      ; 8 times

but 64 us...
Title: Re: Fast MODE 2 text printing routines
Post by: Prodatron on 22:02, 28 January 15
Yes, you can use INC E instead of INC DE, as each char is probably 256 byte aligned. So now it's 56 NOPs for the core part. But you still have all the overhead stuff around it like calculating the address of the matrix for each new char, jumping back to the next screen address etc.
The interesting value is the true average time for one char which includes really everything, even the loop code.

Opqa's code is very impressive, as it's a completely new idea and, for large texts, as fast as my method or even faster when using SP. What about adding it to the Wiki article? With my routine a 256-character charset is possible at the cost of the average speed going from 65 to 67 NOPs.
Title: Re: Fast MODE 2 text printing routines
Post by: opqa on 22:31, 28 January 15
I like Ast's idea a lot. You've used the same set/res technique as in the routine by Prodatron et al. I experimented with it in my own routines but it didn't fit well.

But... I've been thinking about Ast's routine and it has the potential to become at least as fast as my routine, without those downscaling problems.

What I would do is combine Ast's charset arrangement with mine. That is, the character's ASCII code takes the whole low byte of the address, and the line inside the character takes the three LSBs of the high byte, but "disordered" as in Ast's routine. In that case "inc de" can always be replaced by "inc d"; the only requirement for the charset is to be 256-byte aligned.

The core code would be like this:


B - Charset base address
DE - Screen address
HL - Text buffer address

; Build the next character address
ld b,CHARSET_BASE ; 2
ld c,(hl) ; 2               
             ; 4 until now

; Start printing
ld a,(bc)  ; 2
ld (de),a  ; 2
inc b      ; 1
set 3,d    ; 2

ld a,(bc)
ld (de),a
inc b
set 4,d

ld a,(bc)
ld (de),a
inc b
res 3,d

ld a,(bc)
ld (de),a
inc b
set 5,d

ld a,(bc)
ld (de),a
inc b
set 3,d

ld a,(bc)
ld (de),a
inc b
res 4,d

ld a,(bc)
ld (de),a
inc b
res 3,d

ld a,(bc) ; 2
ld (de),a ; 2
res 5,d   ; 2
          ; 7x7 + 6 = 55
          ; 59 until now

; Increase screen address
inc de    ; 2
; Increase text buffer address
inc hl    ; 2

          ; TOTAL = 63us


The loop overhead still has to be added, but there are many possible variations: it can either be partially unrolled as in my routines, which reduces its impact on long texts while penalizing short ones, or it can be reduced to just "dec ixl: jp nz,beginning", which would add 5 extra us.
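
To see that the charset arrangement and the set/res sequence fit together, the core can be modelled in Python. This is only a sketch under assumptions: a 64 KB bytearray stands in for memory, the charset page order is the one implied by the set/res sequence, and `build_charset`, `print_char` and the glyph data are hypothetical.

```python
ORDER = [0, 1, 3, 2, 6, 7, 5, 4]          # line stored in charset page 0..7
STEPS = [("set", 3), ("set", 4), ("res", 3), ("set", 5),
         ("set", 3), ("res", 4), ("res", 3), ("res", 5)]

def build_charset(mem, base_hi, glyphs):
    """Store glyphs {code: 8 bytes}: page k holds pixel line ORDER[k]."""
    for code, rows in glyphs.items():
        for page, line in enumerate(ORDER):
            mem[((base_hi + page) << 8) | code] = rows[line]

def print_char(mem, base_hi, code, de):
    """Mirror of the unrolled core: returns the screen address after the char."""
    b, d, e = base_hi, de >> 8, de & 0xFF
    for i, (op, bit) in enumerate(STEPS):
        mem[(d << 8) | e] = mem[(b << 8) | code]     # ld a,(bc) : ld (de),a
        if i < 7:
            b += 1                                   # inc b (not after the last line)
        d = d | (1 << bit) if op == "set" else d & ~(1 << bit)
    return ((d << 8) | e) + 1                        # inc de

mem = bytearray(0x10000)
build_charset(mem, 0x80, {65: bytes(range(1, 9))})   # hypothetical glyph data
next_de = print_char(mem, 0x80, 65, 0xC000)
print([mem[0xC000 + line * 0x800] for line in range(8)])   # [1, 2, 3, 4, 5, 6, 7, 8]
print(hex(next_de))                                         # 0xc001
```

The final res 5,d brings D back to pixel line 0, so a single inc de is enough to reach the next character cell.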

EDIT:
The previous routine using the stack (not interrupt friendly): 61,5 us/char + loop overhead, if my sums are right. There would also be some overhead from the code that sets up the stack pointer, but only once, not per char.

H   - Charset base address
DE - Screen address
SP - Text buffer address

pop bc            ; 3
ld h,CHARSET_BASE ; 2
ld l,c            ; 1
                   ; 6 until now
ld a,(hl)  ; 2
ld (de),a  ; 2
inc h      ; 1
set 3,d    ; 2

ld a,(hl)
ld (de),a
inc h
set 4,d

ld a,(hl)
ld (de),a
inc h
res 3,d

ld a,(hl)
ld (de),a
inc h
set 5,d

ld a,(hl)
ld (de),a
inc h
set 3,d

ld a,(hl)
ld (de),a
inc h
res 4,d

ld a,(hl)
ld (de),a
inc h
res 3,d

ld a,(hl) ; 2
ld (de),a ; 2
res 5,d   ; 2
          ; 7x7 + 6 = 55
              ; 61 until now
; Increase screen address
inc de    ; 2
; Build the next character address
ld h,CHARSET_BASE ; 2
ld l,b            ; 1
             ; 66 until now
ld a,(hl)  ; 2
ld (de),a  ; 2
inc h      ; 1
set 3,d    ; 2

ld a,(hl)
ld (de),a
inc h
set 4,d

ld a,(hl)
ld (de),a
inc h
res 3,d

ld a,(hl)
ld (de),a
inc h
set 5,d

ld a,(hl)
ld (de),a
inc h
set 3,d

ld a,(hl)
ld (de),a
inc h
res 4,d

ld a,(hl)
ld (de),a
inc h
res 3,d

ld a,(hl) ; 2
ld (de),a ; 2
res 5,d   ; 2
          ; 7x7 + 6 = 55
          ; 121 until now
inc de ; 2
          ; TOTAL = 123 per character pair


This version is actually not so interrupt-unfriendly: since the text buffer is only read once, if it is considered single-use then the routine can be used without disabling interrupts. You just need to take care to reserve some spare bytes before the beginning of the text buffer.
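
The totals in both listings can be cross-checked with a few sums. The Python sketch below uses the per-instruction NOP counts given in the comments; the loop cost of 5 NOPs per character is only one possible choice (dec ixl: jp nz), assumed here for both variants, and the function names are hypothetical.

```python
LINE = 2 + 2 + 1 + 2        # ld a,(rr) : ld (de),a : inc h/b : set/res n,d
LAST_LINE = 2 + 2 + 2       # no inc on the eighth line
GLYPH = 7 * LINE + LAST_LINE        # 55 NOPs for the eight pixel lines
LOOP = 2 + 3                        # dec ixl : jp nz,... (one possible loop)

def simple_version():
    """ld b,n : ld c,(hl) + glyph + inc de : inc hl."""
    return (2 + 2) + GLYPH + (2 + 2)

def stack_version_per_char():
    """pop bc once per pair; ld h,n : ld l,r + glyph + inc de for each char."""
    pair = 3 + 2 * ((2 + 1) + GLYPH + 2)
    return pair / 2

print(simple_version(), stack_version_per_char())                  # 63 61.5
print(simple_version() + LOOP, stack_version_per_char() + LOOP)    # 68 66.5
```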
Title: Re: Fast MODE 2 text printing routines
Post by: Ast on 22:35, 28 January 15
Inc E will work in every case, as each char is 256-byte aligned.
I didn't say that Opqa's code is not impressive; I only said my print routine could be faster, that's all... I just want to help in this topic, nothing more.  ;D


So when you do the count it's 53 us, no more!!!!


Here is the calculation:
ld a,(de):ld (hl),a:set/res:inc e ; 7 us
ld a,(de):ld (hl),a                       ; 4 us


->(7x7)+4 = 53!
Title: Re: Fast MODE 2 text printing routines
Post by: Prodatron on 23:53, 28 January 15
@Ast (http://www.cpcwiki.eu/forum/index.php?action=profile;u=573): Yes, that's the time for the core part. But for printing text strings you need to include all other stuff around it as well to get a realistic result.

@Opqa: Wow, I like this new solution very much! 68 or 66,5 NOPs [if using SP] per char, but static (doesn't depend on the size of the text anymore)! (MaV+my one would be 65 or 67 NOPs [if addressing all 256 chars]).

Maybe it's time to extend the Wiki article. TBH I love these hardcore CPC/Z80 optimization topics and discussions, thanks a lot! :)
Title: Re: Fast MODE 2 text printing routines
Post by: opqa on 22:15, 29 January 15
Well, so here's the new idea made real. I've implemented the "simple" version which doesn't use the stack. It's much better than all the previous routines posted in this thread: a little bit slower for long texts, but still very fast, and the visual behaviour is better.

There is a single routine that can be assembled to support either "short" texts of up to 256 characters or much longer ones. You choose between the two with the assembly variable LONGTEXT. The long-text variant has a small additional initial overhead for setting up the counters and an even smaller one for the outer loop counter (every 256 chars, so the impact is minimal).
Title: Re: Fast MODE 2 text printing routines
Post by: pmeier on 21:17, 04 May 18
Is there a chance to get newfasttext working for MODE 1?

I'm just programming a very simple game, but now I want to draw a level faster. And this routine looks perfect...

UPDATE: I found cpct_drawStringM1_f in CPCtelera. Unfortunately it's a little bit too slow. (1s for the whole screen.)
You see, I try to do my homework ;-) Any comments appreciated.
Title: Re: Fast MODE 2 text printing routines
Post by: pmeier on 16:08, 21 May 18
Sorry to ask again, but I was not able to speed up cpct_drawStringM1_f() from CPCtelera. I tried to hardcode the foreground and background colors. This did not improve the speed.

And modifying newfasttext is beyond my skills. I studied http://cpctech.cpc-live.com/source/sixpix.html (http://cpctech.cpc-live.com/source/sixpix.html) which does a mode 2 to mode 1 conversion.

Background: I'm writing a little BASIC game, that displays the MODE 1 text levels with cpct_drawStringM1_f(). It's already quite playable, but faster level switching would improve the fun a lot...
Title: Re: Fast MODE 2 text printing routines
Post by: ronaldo on 20:40, 21 May 18
Quote from: pmeier on 21:17, 04 May 18
Is there a chance to get newfasttext working for MODE 1?

I'm just programming a very simple game, but now I want to draw a level faster. And this routine looks perfect...

UPDATE: I found cpct_drawStringM1_f in CPCtelera. Unfortunately it's a little bit too slow. (1s for the whole screen.)
You see, I try to do my homework ;-) Any comments appreciated.
Drawing mode 1 and mode 2 text is different, because the pixel encoding is totally different. cpct_drawStringM1_f is optimized for speed, and it's quite fast compared to other similar routines. If you want to draw text much faster, you probably need to switch to drawing sprites and creating a custom font made of sprites. Drawing coloured text out of ROM character definitions requires converting them to pixel values, and that takes CPU time for modes other than mode 2.
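
The conversion cost can be made concrete with a small Python model. It is a sketch of the MODE 1 pixel encoding only (four pixels per byte, pen bit 0 in the high nibble and pen bit 1 in the low nibble), not CPCtelera's code; the function names are hypothetical.

```python
def mode1_pixel(pen, pos):
    """Byte mask for one MODE 1 pixel: pen 0-3 at position 0 (left) to 3."""
    bit0 = (pen & 1) << (7 - pos)          # pen bit 0 goes to bits 7..4
    bit1 = ((pen >> 1) & 1) << (3 - pos)   # pen bit 1 goes to bits 3..0
    return bit0 | bit1

def glyph_row_to_mode1(row, pen, paper):
    """Expand one 8-pixel, 1-bit glyph row into two MODE 1 screen bytes."""
    out = []
    for half in range(2):
        byte = 0
        for pos in range(4):
            lit = (row >> (7 - (half * 4 + pos))) & 1
            byte |= mode1_pixel(pen if lit else paper, pos)
        out.append(byte)
    return bytes(out)

print(glyph_row_to_mode1(0b11110000, pen=3, paper=0).hex())   # ff00
print(glyph_row_to_mode1(0b10100000, pen=1, paper=0).hex())   # a000
```

In MODE 2 a glyph row is already a valid screen byte, so this whole step disappears; that is why the MODE 2 routines in this thread can be plain copy loops.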

Another thing I don't understand: are you using text drawing routines to draw maps on screen? Why don't you use sprites for a game? Why use text? Is it that you are programming your game in BASIC and using text drawing routines as RSX commands?
Title: Re: Fast MODE 2 text printing routines
Post by: pmeier on 08:12, 22 May 18
My idea was, just code a simple BASIC game, with the knowledge I already had when I was 12. But then I found these nice assembler routines, which could speed up screen drawing.

Of course you wonder why I just don't code everything in assembler, use sprites etc. But frankly that's beyond my skills. I'm glad so far that I could adapt the method cpct_drawStringM1_f to MAXAM and use fonts from RAM.

My levels have only two colors at the moment. But my hack didn't speed up the code. And of course the level has many spaces. So there is plenty of room for optimization...

Thank you very much for your answer. Maybe I should rework the whole concept. (I don't use RSX, just calls to the assembler routines.)
And of course I already noticed that I have to double the pixels and the examples I found are also well commented, but still a huge challenge for me...
Title: Re: Fast MODE 2 text printing routines
Post by: ronaldo on 21:52, 22 May 18
Okay, I understand what you are doing. If that is your idea, then I would advise you to do a full-BASIC game as a start. Create your game, engine, animations, etc. in BASIC and finish your game. After finishing one complete, fully functional game, it would be better to accept greater challenges, like adding assembler routines as RSX, programming in C or using sprite drawing routines. For instance, to draw screen sprites (like you are doing with characters), CPCtelera's drawTile functions are lightning-fast. Adapting them for following projects could be a great improvement. But I think going step by step is a better and more rewarding approach.


You can also look at other approaches like using 8BP, which is an RSX game engine for BASIC games. It could be a nice option for you too. In any case, I always advise going step by step, enjoying and learning at each phase, and not trying to progress too fast :) .


By the way, I reviewed CPCtelera string drawing routines and I worked out new ways to improve and make them much faster. Your comment gave me a great idea. Thank you :)
Title: Re: Fast MODE 2 text printing routines
Post by: pmeier on 09:05, 23 May 18
Great, you gave me many keywords to improve my project... and finally you say I helped you... cool  ;D
Thanks for your help... put you on my credits list...
Title: Re: Fast MODE 2 text printing routines
Post by: ervin on 12:58, 23 May 18
Quote from: ronaldo on 21:52, 22 May 18
I reviewed CPCtelera string drawing routines and I worked out new ways to improve and make them much faster. Your comment gave me a great idea. Thank you :)


This sounds fantastic!
What sort of improvements have you discovered?
Title: Re: Fast MODE 2 text printing routines
Post by: SRS on 20:38, 23 May 18
Quote from: ervin on 12:58, 23 May 18

This sounds fantastic!
What sort of improvements have you discovered?
Simple (too simple, I guess) "solution": Assume Char 32 is always blank (as cpctelera reads chars from ROM, so it must be blank) ...

So:

_cpct_drawStringM2::
drsm2_nextChar:   cp #32                          ;; is this "Space"?
   jr z, drsm2_save        ;; yes, skip drawing it
   

   push hl                             ;; [11] Save HL and DE to the stack before calling draw char
   push de                             ;; [11]
   ld  b, a                            ;; [ 4] B = Next character to be drawn
   call cpct_drawCharM2_asm            ;; [17] Draw next char
   pop  de                             ;; [10] Recover HL and DE from the stack
   pop  hl                             ;; [10]

   ;; Increment pointer values

   drsm2_save:

   inc  de                             ;; [ 6] DE += 1 (point to next position in video memory, 8 pixels to the right)
   inc  hl                             ;; [ 6] HL += 1 (point to next character in the string)

Tried it, you can compare to the original "Easy Strings Example" with this:

Title: Re: Fast MODE 2 text printing routines
Post by: ronaldo on 07:34, 27 May 18
Quote from: SRS on 20:38, 23 May 18
Simple (to simple i guess) "solution": Assume Char 32 always blank (as cpctelera reads chars
Well, the idea is not so simple. In fact, your idea seems right until you think twice of it. Spaces have to be printed for two reasons: they are not always made of the same colour of the background (you can pick a paper colour when you call drawString and drawChar), and you also cannot safely assume that your background will be empty.


So, I'm afraid to say that this optimization cannot be used, even if it gains some cycles.
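
The objection can be illustrated with a toy model in Python. It is a sketch only: the screen is reduced to a list of character cells, and `draw_string` is a hypothetical stand-in, not CPCtelera's function.

```python
def draw_string(screen, text, skip_spaces):
    """Write `text` over `screen` (a list of cells); return glyphs actually drawn."""
    drawn = 0
    for pos, ch in enumerate(text):
        if skip_spaces and ch == " ":
            continue               # optimisation: leave the cell untouched
        screen[pos] = ch
        drawn += 1
    return drawn

clean = list("     ")              # background already cleared
dirty = list("XXXXX")              # background with old content
print(draw_string(clean, "A B C", True), "".join(clean))   # 3 A B C
print(draw_string(dirty, "A B C", True), "".join(dirty))   # 3 AXBXC
```

Skipping spaces saves two of the five draws in both cases, but only the cleared background ends up showing the intended text.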
Title: Re: Fast MODE 2 text printing routines
Post by: pmeier on 13:37, 27 May 18
Right, I clear the screen beforehand. The speed gain can be measured and is also noticeable. So thank you. Maybe it's not a candidate for the cpctelera API, but my private API uses the current pen/paper colors and the blank optimization, together with a screen clear and a wait for frame flyback.
Title: Re: Fast MODE 2 text printing routines
Post by: SRS on 18:43, 27 May 18
Quote from: ronaldo on 07:34, 27 May 18
Well, the idea is not so simple. In fact, your idea seems right until you think twice of it. Spaces have to be printed for two reasons: they are not always made of the same colour of the background (you can pick a paper colour when you call drawString and drawChar), and you also cannot safely assume that your background will be empty.


So, I'm afraid to say that this optimization cannot be used, even if it gains some cycles.
As said: too simple ;)

But not if you know you do not need to change pen/paper for the blank parts. So there it may save some cycles.

But not in general, absolutely agree with you.
Title: Re: Fast MODE 2 text printing routines
Post by: ronaldo on 13:36, 10 June 18
Today I had some time to put ideas into code and I have a working version for mode 0. I have made a test and these are the results:
   // Testing with 2 simple strings                          | New ver. | Old ver. |
   cpct_drawStringM0("Hello World!", (u8*)0xC000, 3, 5);  // | 10391 us | 14734 us | => ~42% faster
   cpct_drawStringM0("Dolly man",    (u8*)0xC0A0, 1, 9);  // |  7829 us | 11062 us | => ~41% faster
I still have to finish documentation before pushing up the code to development branch, but the improvement is quite remarkable. The improvement comes at a cost of +13 bytes for the total space taken by function code. Although there is still some little room for improvement, it won't be so significant.
In my mental designs, Mode 1 function should improve more than Mode 0. We'll see :)
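
The "% faster" figures can be reproduced from the two timings, here read as throughput gain (old time divided by new time, minus one). A minimal Python check, with a hypothetical helper name:

```python
def percent_faster(old_us, new_us):
    """How much more work per unit time the new routine does, in percent."""
    return (old_us / new_us - 1) * 100

print(round(percent_faster(14734, 10391)))   # 42
print(round(percent_faster(11062, 7829)))    # 41
```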
Title: Re: Fast MODE 2 text printing routines
Post by: ervin on 13:51, 10 June 18
Quote from: ronaldo on 13:36, 10 June 18
Today I had some time to put ideas into code and I have a working version for mode 0. I have made a test and these are the results:
   // Testing with 2 simple strings                          | New ver. | Old ver. |
   cpct_drawStringM0("Hello World!", (u8*)0xC000, 3, 5);  // | 10391 us | 14734 us | => ~42% faster
   cpct_drawStringM0("Dolly man",    (u8*)0xC0A0, 1, 9);  // |  7829 us | 11062 us | => ~41% faster
I still have to finish documentation before pushing up the code to development branch, but the improvement is quite remarkable. The improvement comes at a cost of +13 bytes for the total space taken by function code. Although there is still some little room for improvement, it won't be so significant.
In my mental designs, Mode 1 function should improve more than Mode 0. We'll see :)

WOW!!!
I'm really looking forward to trying the new routine!

Any chance you could show the improved code?
;D
Title: Re: Fast MODE 2 text printing routines
Post by: ronaldo on 16:41, 10 June 18
Quote from: ervin on 13:51, 10 June 18
Any chance you could show the improved code?
Yes, of course :) . As I said before, just needed some time to finish documentation ;) .

You may want to check Strings code folder (https://github.com/lronaldo/cpctelera/tree/development/cpctelera/src/strings) under CPCtelera`s development branch (https://github.com/lronaldo/cpctelera/tree/development).

Functions improved include:

Hope you enjoy it ;)
Title: Re: Fast MODE 2 text printing routines
Post by: ronaldo on 21:59, 28 June 18
And there you go, optimized versions for cpct_drawStringM1 are now ready and pushed to CPCtelera's development branch:

   // Testing with a 40-character Mode 1 string.
   // cpct_drawStringM1_f ==> Old fast version  (379 bytes in total)
   // cpct_drawStringM1   ==> New version       (216 bytes in total, 163 bytes less, 43% less space)
   cpct_drawStringM1_f("0123456789012345678901234567890123456789", (u8*)0xC0A0, 3, 5);  // | 24486 us |
   cpct_drawStringM1  ("0123456789012345678901234567890123456789", (u8*)0xC000, 3, 5);  // | 19501 us | => ~25% faster

As you can see, the new version takes 163 bytes less than previous fast version and is ~25% faster.
Moreover, there is an interesting new side effect: it is easy to add a new version of the same function that uses your own character set. You only need to place your character set at 0x3800 and then remove the lines in any cpct_drawString/cpct_drawChar function that enable and disable ROM. Drawing is decoupled from the cpct_drawString/cpct_drawChar functions (it is implemented in the cpct_drawCharMx_inner_asm functions), which also enables their direct use. You may create your own versions of the cpct_drawString/cpct_drawChar functions that call the inner functions, without the cost of including their code in your binary.
Title: Re: Fast MODE 2 text printing routines
Post by: ervin on 00:45, 29 June 18
Fantastic!
Thanks so much for your work on this!
Title: Re: Fast MODE 2 text printing routines
Post by: Widukind on 08:59, 29 June 18
[ot]What an interesting topic this is. I am new to retro programming, but as a former Z80 and ARM assembler programmer I enjoy reading such topics a lot, including Prof Ronaldo's work (http://www.memoryfull.net/articles.php?id=25). We can still learn a lot in retrospect! So please continue. :-)

Quote: I was worried about the decreasing knowledge my students shown about how programs actually work. Then I thought that asking them to create games for the Amstrad CPC could be a great way to force them to deal with low-level stuff and learn from it. I also tend to create ways to transform assignments into real world projects. I don't want my students to hand me what they think I expect from them: I want them to develop real projects for real people.
[/ot]