Author Topic: Ingenious uses of the index register where CPU cycles get saved  (Read 240 times)

Offline AHack

  • CPC664
  • ***
  • Posts: 72
  • Country: au
  • Liked: 113
I was looking at ways to improve my map unpacker, where a byte holds 4 pixels' worth of data at the cost of being limited to 4 colours in an 8x8 block. Each 8x8 block can have its own colour attribute, much like the way the C64 uses its chars. The unpacker produces a strip of graphics for the CRTC scroll update. There are really fast ways to do this at the cost of memory, but because I'm working within a 64KB limit I need to think about memory usage.

I'm willing to sacrifice memory for a 256 colour LUT.

Each byte of map data works this way:
LR of first 2 pixels packed like 01010101
LR of the second 2 pixels packed like 10101010
And combined will be 11111111

The LUT holds pixel data for 2 mode 0 pixels and expects the pixel data in the 01010101 bit positions. In the bit positions where that mask is zero you set any one of 16 possible values, and the combination selects the 2 mode 0 pixels you get back.
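As a rough model of the layout described above, the Python sketch below builds the two LUT indices from one map byte. The names `PAIR_MASK`, `ATTR_MASK` and `lut_indices` are made up for illustration, and shifting the second pair right by one bit to line it up with the 01010101 positions is an assumption, since the post does not say how that alignment is done.

```python
PAIR_MASK = 0x55  # 01010101: bits that carry one pair of pixels
ATTR_MASK = 0xAA  # 10101010: bits left free for the colour attribute

def lut_indices(map_byte, attr_bits):
    """Return the two LUT indices (first pair, second pair) for one map byte."""
    if attr_bits & ~ATTR_MASK & 0xFF:
        raise ValueError("attribute bits must sit in the 10101010 positions")
    first = (map_byte & PAIR_MASK) | attr_bits
    # Assumption: the second pair is shifted down into the 01010101 positions.
    second = ((map_byte >> 1) & PAIR_MASK) | attr_bits
    return first, second

print([hex(i) for i in lut_indices(0x63, 0xAA)])  # ['0xeb', '0xbb']
```

With a 256-entry table, each index would then select one byte holding 2 mode 0 pixels.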

For the solution I came up with this using IX:

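ld b,(ix+n) ; n = position in map; B = packed map byte (4 pixels)

ld a,b
and d ; keep the pixel-pair bits
or e ; combine the colour attribute bits
ld l,a ; L = index into the LUT (H assumed to hold the LUT's page)
ld c,(hl) ; get 2 mode 0 pixels

ld a,b
and d ; keep the pixel-pair bits
or e ; combine the colour attribute bits
ld l,a ; L = index into the LUT
ld b,(hl) ; get 2 mode 0 pixels
push bc ; push out 4 mode 0 pixels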

This is unrolled for the length of the graphics strip, and its output is 2 pixels times the length of the strip. I tried many different ways of doing this and this is by far the quickest. I think the reason IX works better here is that it holds information doubled up, so you don't need to add to the address to get to the next bit of data. The stack push also saves updating the destination address.
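To see how the `push bc` output lands in a strip buffer, here is a small Python model of the Z80 stack write. `push bc` stores B at SP-1 and C at SP-2 and leaves SP two lower, so successive pushes fill memory downwards. The `push_bc` helper, the 4-byte buffer and the sample bytes are invented for the illustration.

```python
def push_bc(mem, sp, b, c):
    """Model Z80 PUSH BC: B goes to SP-1, C to SP-2, SP ends up 2 lower."""
    mem[sp - 1] = b
    mem[sp - 2] = c
    return sp - 2

strip = bytearray(4)   # hypothetical 4-byte strip buffer
sp = len(strip)        # SP starts just past the end of the buffer

sp = push_bc(strip, sp, b=0x22, c=0x11)  # first push lands at the top
sp = push_bc(strip, sp, b=0x44, c=0x33)  # second push lands below it

print(strip.hex(), sp)  # 33441122 0
```

Within each push, C sits at the lower address, but whole pushes move downwards, so code built this way would presumably walk the strip from its far end to leave the bytes in ascending order.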

Has anyone else used the index registers in clever ways to save cycles?
« Last Edit: 22:02, 15 May 19 by AHack »

Offline andycadley

  • Supporter
  • 6128 Plus
  • *
  • Posts: 826
  • Liked: 371
I occasionally like to use JP (IX) or JP (IY) in loops as it's faster than JP nnnn but doesn't hog a useful register in the way JP (HL) does.
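For a sense of scale, the documented Z80 timings are 10 T-states for `JP nnnn`, 8 for `JP (IX)` or `JP (IY)`, and 4 for `JP (HL)`. The sketch below just adds up the difference over a loop. The iteration count is an arbitrary example, and on the CPC instruction timings are stretched to multiples of 4 T-states, so the real figures differ.

```python
# Documented Z80 T-state counts for the three jump forms.
T_STATES = {"jp nnnn": 10, "jp (ix)": 8, "jp (hl)": 4}

def t_states_saved(iterations, slow="jp nnnn", fast="jp (ix)"):
    """T-states saved by using `fast` instead of `slow` once per iteration."""
    return iterations * (T_STATES[slow] - T_STATES[fast])

print(t_states_saved(1000))  # 2000
```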

It's always worth remembering they are there though. A lot of Z80 coders get into the mindset of "don't use index registers, they're slow", but the workarounds, particularly when spare registers are scarce, are sometimes worse overall.

Offline GUNHED

  • 6128 Plus
  • ******
  • Posts: 1.108
  • Country: de
  • Reincarnation of TFM
  • Liked: 581
Even more coders forget about the second register set. Using it properly really gives a boost. Sadly the native OS doesn't use it (except for some interrupt stuff). They came from the 8080, obviously.  ;)
http://futureos.de --> Get the revolutionary FutureOS (Recent update: 2019.01.14)
http://futureos.cpc-live.com/files/LambdaSpeak_RSX_by_TFM.zip --> Get the RSX-ROM for LambdaSpeak :-) (Ver.: 2018.08.15)

Offline AHack

  • CPC664
  • ***
  • Posts: 72
  • Country: au
  • Liked: 113
Yep, the second register set is crucial to critical inner loops... I find it best used to load and do work on the A register, then swap with ex, do more work with A, then store and swap back. There are also the undocumented instructions that use ixl, ixh, iyl and iyh, which are not as bad as their 16-bit counterparts. Sometimes you can avoid an ex and use something like or ixl, which is quicker than doing ex,ex combinations.

But I can understand why programmers avoid the 16-bit index registers... it's so hard to justify using them. I use them on rare occasions like my example above, because that was the quickest out of all the ways I tried.