Author Topic: Fast text output in mode 2, ~60,5 nops per character  (Read 693 times)

Offline arnolde

  • CPC464
  • **
  • Posts: 22
  • Country: at
    • Awards
OK, so... I had this idea for a new way of printing characters on the screen… ;)

I mean, honestly, I don't think I'm really the first one to think of this, but it's not mentioned anywhere (at least not in the places I know...). CPCWiki says Prodatron's routine is the fastest, but technically this one is a tiny bit faster (121735 nops for the whole screen -> 60.5 nops per character).

The way it works is quite simple: Beforehand, the font data must be rearranged so that every (aligned) memory page contains just one line of each character one after the other.
When displayed, the text is looped through 8 times, once for each scanline, and only the respective scanline of each character is written to the screen. That way, the slow new-line calculation has to occur only 8 times (in this case it's just +#30, i.e. jumping over the non-displayed part of screen memory) and the rest is just "inc" (or ldi, for that matter).
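A minimal sketch of the font preparation described above. It assumes the source font holds 256 characters of 8 consecutive bytes each at a hypothetical label font_src, and that font_table (the name used in the routine further down the thread) is a page-aligned 2 KB buffer. It is only an illustration, not necessarily how the font was actually prepared.
Code: [Select]
rearrange_font:
ld hl,font_src         ;hypothetical: source font, 8 bytes per character
ld d,font_table/#100   ;destination page for scanline 0 (page aligned)
ld e,0                 ;e = character code = offset within each page
  charloop:
  push de              ;remember page 0 and the character offset
  ld b,8               ;8 scanlines per character
    rowloop:
    ld a,(hl)          ;next scanline byte of this character
    ld (de),a          ;store it in page n at offset = character code
    inc hl
    inc d              ;same offset, next page (next scanline)
    djnz rowloop
  pop de
  inc e                ;next character
  jr nz,charloop       ;e wraps to 0 after character #ff
ret
After this, page n of font_table holds scanline n of all 256 characters, so a character code can be used directly as the low byte of the font address.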

Of course, it has a lot of drawbacks: The text must be prepared as exactly 2000 contiguous bytes in memory. All empty positions have to be filled with " " (#20), and control characters won't work (#00 and #ff will even crash the routine badly). The font has to be prepared; there's no reading font data directly from ROM. (And I imagine that all this is reason enough why no serious programmer ever considered this approach really worthwhile…)
But, apart from speed, there are a few other advantages as well: You don't have to clear the screen first, because all empty positions are "drawn" as well. The routine is quite small (and could be made much smaller in exchange for a little speed). It supports custom fonts, as long as they are arranged the way the routine needs them, and all of the 256 characters can be used (OK, except for #00 and #ff as mentioned…)

So, would the experienced programmers please tell me who already thought of this method and where it's applied? As I said, I'm sure I'm not the first one... :laugh:

Offline andycadley

  • Supporter
  • 6128 Plus
  • *
  • Posts: 1.015
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #1 on: 17:05, 07 June 21 »
It's not a million miles from how a lot of games render background "tiles", to be honest (although sometimes they use a line-by-line approach, sacrificing some speed to avoid as much tearing).


Obviously the downside is that the whole screen needs to be text (or graphics constructed from font entries), but there are probably plenty of places where that's true. It also doesn't play so well with hardware scrolling, which might be more of a downside for text-heavy data, tbh.


Not sure why character codes of #00 or #FF need to be problematic though?

Offline zhulien

  • 6128 Plus
  • ******
  • Posts: 881
  • Country: au
  • aka Vorax
    • 8bitology
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #2 on: 17:28, 07 June 21 »
To me that doesn't sound like the fastest way to output text, but rather a very specific scenario of wanting to fill an entire screen of text fast.


What if you only wanted to print 'Hello, World!' as fast as possible?


Likely the fastest text routine for printing just 'Hello, World!' is still faster than 121735 nops.

Similarly for writing to the screen: I think the fastest way to write text to a screen is to detach it from the physical screen and write to a logical text screen that is linear and uncomplicated. Only when you want to display that logical screen on the physical screen do you need to convert to our beloved screen layout as fast as possible. Imagine a Protext document (which is likely a linked list of lines) as their form of a virtual screen, but a virtual screen could also be a 2000-character block of RAM. How fast can you write to that? If you have multiple tasks running at once, each writing to its own 2000-character logical screen, you could press ctrl+1 (for example) to display logical screen 1 on the physical screen, ctrl+2 to display logical screen 2, etc. You can have very fast writing to a logical screen if you are not looking at it.
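As a rough sketch of what writing to such a linear logical screen could look like: the routine below stores one character at a given row and column of a 2000-byte buffer. The label logical_screen and the register interface are assumptions made for this example only.
Code: [Select]
;in: a = character, b = row (0-24), c = column (0-79)
;hypothetical label: logical_screen = 2000-byte linear text buffer
put_char:
ld l,b
ld h,0                 ;hl = row
add hl,hl              ;*2
add hl,hl              ;*4
add hl,hl              ;*8
add hl,hl              ;*16
ld d,h
ld e,l                 ;de = row*16
add hl,hl              ;*32
add hl,hl              ;*64
add hl,de              ;hl = row*80
ld e,c
ld d,0
add hl,de              ;+ column
ld de,logical_screen
add hl,de              ;hl = address inside the buffer
ld (hl),a              ;a is untouched by the address arithmetic
ret
When characters are written one after another, the address calculation is only needed once; each further character is then just ld (hl),a followed by inc hl.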

Offline arnolde

  • CPC464
  • **
  • Posts: 22
  • Country: at
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #3 on: 17:57, 07 June 21 »
Quote
Not sure why character codes of #00 or #FF need to be problematic though?
Because of the LDI command for copying:
If C were to be 0, B would be decremented and the wrong 2nd character would be displayed
And if L is #ff, H would be incremented and we'd get the data for the next scanlines
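The two cases can be traced step by step. The character values below are made up for illustration; the register behaviour is that of the standard LDI instruction.
Code: [Select]
;LDI: (DE) <- (HL), then HL+1, DE+1 and BC-1, all as 16-bit operations
;
;case 1: first character of the pair is #00, second is "A" (#41)
;  pop bc      -> B=#41, C=#00
;  ld l,c:ldi  -> BC-1 borrows from B: B=#40, C=#ff
;  ld l,b      -> L=#40, so "@" is drawn instead of "A"
;
;case 2: a character is #ff
;  ld l,c      -> L=#ff
;  ldi         -> HL+1 carries into H: H=H+1, L=#00
;  L is reloaded for the next character, but H stays incremented,
;  so all following characters read the page of the next scanline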

Offline arnolde

  • CPC464
  • **
  • Posts: 22
  • Country: at
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #4 on: 18:08, 07 June 21 »
Quote
To me that doesn't sound like the fastest way to output text, but rather a very specific scenario of wanting to fill an entire screen of text fast.
Of course. I'm afraid I didn't mention clearly enough that this is indeed for a very specific scenario, i.e. filling the whole CPC screen with characters in mode 2. By bragging about 121735 / 60.5 nops, I just wanted to compare with the speed figures given for the other methods listed in the "Source Code" section of the Wiki.

But, as I stated myself, you are right: in most cases this is not very practical, because laying out a text to fit in 2000 bytes would probably need more computation time than using a char-by-char routine and processing carriage returns etc. on the fly.

The routine could of course be adapted to print just one line or a fixed number of characters at a specific position on the screen, and I guess it would not do too badly in the speed competition either.

Offline arnolde

  • CPC464
  • **
  • Posts: 22
  • Country: at
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #5 on: 18:20, 07 June 21 »
Quote
What if you only wanted to print 'Hello, World!' as fast as possible?
Code: [Select]
hello_text: db "Hello, world! "

fast_print_mode2:
di                     ;b/c we mess with the stack
ld iy,hello_text
ld (spgarage),sp       ;backup sp
ld de,#c000            ;start of screen memory
ld h,font_table/#100   ;1st line of symbol data
  lineloop:
  ld sp,iy             ;every newline -> restart text from beginning
      repeat 7     
      pop bc       ;3  ;pop fetches two characters at once
      ld l,c       ;1  ;point HL to font data of 1st character
      ldi          ;5  ;transfer to screen
      ld l,b       ;1  ;same with the/
      ldi          ;5  ;/other character
      rend             
  inc h                ;next line in font memory (is aligned)
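  ld a,e:sub 14:ld e,a ;back to the left edge (14 bytes were written)
  ld a,d:add 8:ld d,a  ;next scanline is #800 further on
  jp nz,lineloop       ;and if d=0 we're done
ld sp,(spgarage)       ;restore sp
ei
ret
spgarage: dw 0         ;storage for the saved stack pointer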
~70 nops/char... not horribly slow either...  ;D

Offline andycadley

  • Supporter
  • 6128 Plus
  • *
  • Posts: 1.015
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #6 on: 18:31, 07 June 21 »
Quote
Because of the LDI command for copying:
If C were to be 0, B would be decremented and the wrong 2nd character would be displayed
And if L is #ff, H would be incremented and we'd get the data for the next scanlines
Is LDI the fastest way here though? You don't really need 16-bit incrementing because you know it will only change the high byte every 256 characters. My gut says this can be unrolled differently to cope with that and get a better overall performance though I haven't looked hard enough at timings to confirm...

Offline arnolde

  • CPC464
  • **
  • Posts: 22
  • Country: at
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #7 on: 18:36, 07 June 21 »
Quote
Is LDI the fastest way here though?
Well, as far as I understand it, without LDI we'd need 4 nops for the data transfer and 2 for inc E and inc L; that's 6, while LDI takes only 5 nops. But there might be another way?
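For comparison, a sketch of an LDI-free inner step with timings in nops. Since L is reloaded from the text for every character anyway, the inc l could arguably be dropped, which would leave 5 nops per copy, the same as LDI. The 8-bit inc e is an assumption that only holds while the screen address does not cross a 256-byte boundary, so an unrolled version would need an inc d at the known crossing points.
Code: [Select]
pop bc       ;3  ;fetch two characters
ld l,c       ;1  ;font data of 1st character
ld a,(hl)    ;2  ;read font byte
ld (de),a    ;2  ;write to screen
inc e        ;1  ;8-bit only: wrong if E wraps from #ff to #00
ld l,b       ;1  ;font data of 2nd character
ld a,(hl)    ;2
ld (de),a    ;2
inc e        ;1
That is 15 nops per pair, the same as the pop/ld/ldi/ld/ldi version (3+1+5+1+5). H and BC are never modified here, so character codes #00 and #ff would no longer be a problem in this variant.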

Offline m_dr_m

  • 464 Plus
  • *****
  • Posts: 308
  • Country: gb
  • http://orgams.wikidot.com/
    • OrgaMS!
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #8 on: 19:58, 07 June 21 »
You are right, many thought of that! Typically it's not worth it. An alternative way is to arrange the font in Gray order so you only have RES/SET one bit to change line.
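One possible reading of the Gray-order idea, as a sketch for a single character cell. It assumes DE points at the top scanline of the cell in unscrolled screen memory, and that HL points at 8 font bytes stored in the scanline order 0,1,3,2,6,7,5,4 without crossing a page; this layout is hypothetical and not taken from any particular routine.
Code: [Select]
;scanline number = bits 3-5 of D (each scanline is #800 further on)
ld a,(hl):ld (de),a:inc l   ;scanline 0
set 3,d                     ;000 -> 001
ld a,(hl):ld (de),a:inc l   ;scanline 1
set 4,d                     ;001 -> 011
ld a,(hl):ld (de),a:inc l   ;scanline 3
res 3,d                     ;011 -> 010
ld a,(hl):ld (de),a:inc l   ;scanline 2
set 5,d                     ;010 -> 110
ld a,(hl):ld (de),a:inc l   ;scanline 6
set 3,d                     ;110 -> 111
ld a,(hl):ld (de),a:inc l   ;scanline 7
res 4,d                     ;111 -> 101
ld a,(hl):ld (de),a:inc l   ;scanline 5
res 3,d                     ;101 -> 100
ld a,(hl):ld (de),a         ;scanline 4
res 5,d                     ;100 -> 000, back on the top scanline
Each SET/RES costs 2 nops, against 4 nops for the ld a,d:add 8:ld d,a sequence used earlier in the thread.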


Orgams' routine is a bit slower than Prodatron's & al:
 - 57 Nops for space.
 - 75 Nops for other chars.
 - So ~72 Nops on average for English text (and less for assembly).
This includes string management and properly looping at C7FF->C800 when screen has scrolled.


That's still very fast, as the 'type' command in the monitor would demonstrate.


I completely agree with @zhulien : your routine would only be faster in scenarios ... that might seldom occur.
A fast text-line clear routine is ~8 Nops per character.
If a string is 60 chars long rather than 80, clearing the line and printing with Prodatron's routine would take 80*8 + 60*65 = 4540 nops, which is less than 80*60.5 = 4840.


Now, if the line is already cleared, displaying a space becomes even faster: you just skip it, lowering the average per char even further.


Related: http://www.cpcwiki.eu/forum/programming/fast-mode-2-text-printing-routines/

Offline GUNHED

  • 6128 Plus
  • ******
  • Posts: 2.844
  • Country: de
  • Reincarnation of TFM
    • FutureOS - The quickest OS for the CPC and Plus
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #9 on: 21:04, 07 June 21 »
60 NOPs is very nice, even with FutureOS the minimum is 45 NOPs per character.
http://futureos.de --> Get the revolutionary FutureOS (Update: 2021.01.24)
http://futureos.cpc-live.com/files/LambdaSpeak_RSX_by_TFM.zip --> Get the RSX-ROM for LambdaSpeak :-) (Updated: 2021.07.15)

Offline arnolde

  • CPC464
  • **
  • Posts: 22
  • Country: at
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #10 on: 21:22, 07 June 21 »
Quote
Related: http://www.cpcwiki.eu/forum/programming/fast-mode-2-text-printing-routines/

Thank you for this link. I was looking for a thread like that before posting this, but I apparently used the wrong keywords... I will devour it later!
Quote
You are right, many thought of that! Typically it's not worth it.
Yeah, I'm starting to understand that. I played around a little with ways to prepare the layout, but even if I only leave the loop when finding #0D#0A (and fill the rest of the line with " " instead), it takes too much time and "my" display method is slower in the end.
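The kind of layout step meant here could be sketched as follows: one text line is copied into an 80-byte row of the 2000-byte buffer, and the remainder is padded with spaces once #0D is found. Labels and the register interface are assumptions for illustration, not the code that was actually tried.
Code: [Select]
;in:  hl = source text (lines end with #0d,#0a)
;     de = start of an 80-byte row in the 2000-byte buffer
;out: hl = start of the next source line, de = start of the next row
pad_line:
ld b,80                ;characters left in this row
  copyloop:
  ld a,(hl)
  cp #0d
  jr z,fill            ;end of line found: pad the rest
  ld (de),a
  inc hl
  inc de
  djnz copyloop
ret                    ;row is full; hl points at whatever follows
  fill:
  inc hl               ;skip #0d
  inc hl               ;skip #0a
  ld a,#20             ;" "
  fillloop:
  ld (de),a
  inc de
  djnz fillloop
ret
Every character passes through a compare and a conditional jump here, which gives an idea of why the preparation can cost more than the fast display routine saves.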

Offline arnolde

  • CPC464
  • **
  • Posts: 22
  • Country: at
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #11 on: 21:53, 07 June 21 »
Quote
http://www.cpcwiki.eu/forum/programming/fast-mode-2-text-printing-routines/
OK. His main routine is literally the same as mine:  :picard2:
Code: [Select]
pop bc
ld l,c
ldi
ld l,b
ldi
But he also has a quite clever iteration calculation so he can display all lengths and not just a whole page.

Offline m_dr_m

  • 464 Plus
  • *****
  • Posts: 308
  • Country: gb
  • http://orgams.wikidot.com/
    • OrgaMS!
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #12 on: 07:04, 08 June 21 »
Quote
60 NOPs is very nice, even with FutureOS the minimum is 45 NOPs per character.

Impressive. I wonder how you get this number. It seems either too big, too small or irrelevant  :)

Now on a related note, I plan to make the ultimate hex editor (see https://www.cpcwiki.eu/forum/applications/monogams-to-behave-more-like-a-file-based-hex-editor/),
with a memory dump so fast your CPC will age slower than you.
I also plan to use a dedicated mini font (e.g. height either 6 or 7 lines rather than 8) for more real estate.

Offline GUNHED

  • 6128 Plus
  • ******
  • Posts: 2.844
  • Country: de
  • Reincarnation of TFM
    • FutureOS - The quickest OS for the CPC and Plus
    • Awards
Re: Fast text output in mode 2, ~60,5 nops per character
« Reply #13 on: 14:42, 08 June 21 »
Quote
Impressive. I wonder how you get this number. It seems either too big, too small or irrelevant  :)
All explained in the topic you posted before. In brief: smart control codes.

Quote
I also plan to use a dedicated mini font (e.g. height either 6 or 7 lines rather than 8) for more real estate.
Funny! With FutureTex I use 9 scanlines and the middle 7 of them are for character data (for most chars), just to make visibility better. You definitely must have gotten eagle eyes.  :o
« Last Edit: 14:46, 08 June 21 by GUNHED »