Author Topic: Fast decrement of 16-bit value via index register?  (Read 970 times)

0 Members and 1 Guest are viewing this topic.

Offline Cwiiis

  • CPC664
  • ***
  • Posts: 137
  • Country: gb
    • Blog
    • Awards
So I have a structure in memory that, at least for compact, readable code, it makes sense to use an index register to access. Within it are a couple of 16-bit values that depending on the contents of a, get incremented or decremented.

Increment isn't a big deal;

Code: [Select]
    inc (ix+Struct1)
    jr nz,NextThing
    inc (ix+Struct1+1)
    jr NextThing

But the problem comes with decrement... Given it doesn't affect the carry flag, I don't see a way of easily doing the decrement without also affecting the a register, which I'd like to preserve... Currently I load it into hl, decrement hl and load back, but that's going to be twice as slow as the above.

Code: [Select]
    ld l,(ix+Struct1)
    ld h,(ix+Struct1+1)
    dec hl
    ld (ix+Struct1),l
    ld (ix+Struct1+1),h

I guess I could do something like this instead:

Code: [Select]
    ld h,a
    ld a,&ff
    add (ix+Struct1)
    ld (ix+Struct1),a
    ld a,h
    jr c,NextThing
    dec (ix+Struct1+1)
    jr NextThing

But that just seems so arcane... Am I missing something even simpler/better?
like
1
Members reacted like:
Urusergi,

Offline Cwiiis

  • CPC664
  • ***
  • Posts: 137
  • Country: gb
    • Blog
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #1 on: 20:25, 24 October 21 »
Heh, of course as soon as I type it out, I realise I actually don't need to preserve a and so that second one reads fine without the ld h,a / ld a,h... Still interested to hear if anyone has anything faster though :)
like
0
No reactions

Offline fgbrain

  • CPC6128
  • ****
  • Posts: 249
  • Country: gr
    • index.php?action=treasury
    • Chaos CPC Homepage
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #2 on: 20:55, 24 October 21 »

why use IX in the first place?
its much faster to use HL instead..


Quote


ld hl, ix + Struct1

dec (hl)   ;  OR inc (hl)



and you're done.
like
0
No reactions
_____

6128 (UK keyboard, Crtc type 0/2), 6128+ (UK keyboard), 3.5" and 5.25" drives, Reset switch and Digiblaster (selfmade), Inicron Romram box, Bryce Megaflash, SVideo & PS/2 mouse, , Magnum Lightgun, X-MEM, X4 Board, C4CPC, Multiface2 X4, RTC X4 and Gotek USB Floppy emulator.

Offline Cwiiis

  • CPC664
  • ***
  • Posts: 137
  • Country: gb
    • Blog
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #3 on: 21:36, 24 October 21 »

why use IX in the first place?
its much faster to use HL instead..





and you're done.

Indeed, this would be quicker if it wasn't part of a larger structure with multiple values - I did do this without IX first, but the amount of address manipulation required made the code pretty unreadable and though likely quicker, wasn't so much quicker to be worth the cost in readability (at least for now). The examples I gave above are dummy examples, not the actual code.
like
1
Members reacted like:
eto,

Offline Cwiiis

  • CPC664
  • ***
  • Posts: 137
  • Country: gb
    • Blog
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #4 on: 21:42, 24 October 21 »

why use IX in the first place?
its much faster to use HL instead..





and you're done.

Oh, I actually misread this - you're not done there because that's an 8-bit inc/dec, but it is faster in the case of over/underflow as it only has the one IX access in that case instead of the current two. Nice :)
like
0
No reactions

Offline Prodatron

  • 6128 Plus
  • ******
  • Posts: 877
  • Country: de
  • Back on the Z80
    • index.php?action=treasury
    • SymbOS SYmbiosis Multitasking Based Operating System
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #5 on: 22:14, 24 October 21 »
ld hl, ix + Struct1
Well, such a Z80 opcode doesn't exist :) but I guess you mean something like this:
Instead of...


LD IX,datarecord
...
do some stuff with (IX+struct1)...


...you would do...


LD HL,datarecord+struct1
...
do some stuff with (HL)


But as soon as you have multiple data records with the same structure, you would have to patch your code or use e.g. LD BC,struct1:ADD HL,BC to achieve the same what you can do with IX.


But this could still be faster, if you don't need to manipulate more things inside the data record.


Example:

Code: [Select]
ld ix,datarecord      ;4
[...]
ld l,(ix+Struct1)     ;5
ld h,(ix+Struct1+1)   ;5
dec hl                ;2
ld (ix+Struct1),l     ;5
ld (ix+Struct1+1),h   ;5 -> 26 NOPs

Vs.:

Code: [Select]
ld hl,datarecord      ;3
[...]
ld bc,Struct1         ;3
add hl,bc             ;3
ld c,(hl)             ;2
inc hl                ;2
ld b,(hl)             ;2
dec bc                ;2

ld (hl),b             ;2
dec hl                ;2
ld (hl),c             ;2 -> 23 NOPs

If you have several more things to do with your data structure, especially if the affected bytes are not very close to each other, the IX-usage could be faster again. That can be different from case to case.
« Last Edit: 22:28, 24 October 21 by Prodatron »
like
0
No reactions

GRAPHICAL Z80 MULTITASKING OPERATING SYSTEM

Offline Prodatron

  • 6128 Plus
  • ******
  • Posts: 877
  • Country: de
  • Back on the Z80
    • index.php?action=treasury
    • SymbOS SYmbiosis Multitasking Based Operating System
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #6 on: 22:26, 24 October 21 »
Summary: if you can move through your data structure in a sequential way, the HL-methode is faster. If you have to access it in very a random way, the IX-methode ist faster.
like
0
No reactions

GRAPHICAL Z80 MULTITASKING OPERATING SYSTEM

Offline m_dr_m

  • 464 Plus
  • *****
  • Posts: 324
  • Country: gb
  • http://orgams.wikidot.com/
    • OrgaMS!
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #7 on: 22:36, 24 October 21 »
But the problem comes with decrement...

Then store the opposite of the value and increment it!


Also: consider structure of arrays rather than array of structures.
« Last Edit: 22:38, 24 October 21 by m_dr_m »
like
1
Members reacted like:
Cwiiis,

Offline andycadley

  • Supporter
  • 6128 Plus
  • *
  • Posts: 1.022
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #8 on: 23:41, 24 October 21 »
If you'll never have more than 256 records you can page align things so you only need Inc h to move to the next field of a record and Inc l to move between records. You can also jump to specific points in either case by directly loading h or l with the relevant offsets.
like
1
Members reacted like:
m_dr_m,

Offline fgbrain

  • CPC6128
  • ****
  • Posts: 249
  • Country: gr
    • index.php?action=treasury
    • Chaos CPC Homepage
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #9 on: 00:29, 25 October 21 »

Quote
Oh, I actually misread this - you're not done there because that's an 8-bit inc/dec, but it is faster in the case of over/underflow as it only has the one IX access in that case instead of the current two. Nice
my bad, you are right.


Back to original question,carry flag is not affected with the INC as well with DEC.


perhaps then some other flag is suitable for your task ??
like
0
No reactions
_____

6128 (UK keyboard, Crtc type 0/2), 6128+ (UK keyboard), 3.5" and 5.25" drives, Reset switch and Digiblaster (selfmade), Inicron Romram box, Bryce Megaflash, SVideo & PS/2 mouse, , Magnum Lightgun, X-MEM, X4 Board, C4CPC, Multiface2 X4, RTC X4 and Gotek USB Floppy emulator.

Offline eto

  • Supporter
  • 6128 Plus
  • *
  • Posts: 507
  • Country: de
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #10 on: 11:37, 25 October 21 »

perhaps then some other flag is suitable for your task ??


DEC affects the P/V flag, but only for 80h.


As A doesn't need to be preserved, might this work?


Code: [Select]
dec (ix+Struct1)
ld a,(ix+Struct1)
cp a,&ff
jr nz,nextThing
dec (ix+Struct1+1)
jr nextThing
like
0
No reactions

Offline MaV

  • Supporter
  • 6128 Plus
  • *
  • Posts: 1.126
  • Country: at
  • Ius summum saepe summa est malitia.
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #11 on: 14:11, 25 October 21 »

Not tested, but should be a viable solution.

Code: [Select]
    xor a
    cp (ix+datastruct)
    jr nz, dec_low_byte_only
    dec (ix+datastruct+1)
dec_low_byte_only:
    dec (ix+datastruct)
    jr somewhereelse

NOPs when not jumping:
1
5
2
6
6
3
--
23

NOPs when jumping:
1
5
3
6
3
--
18
like
0
No reactions
Black Mesa Transit Announcement System:
"Work safe, work smart. Your future depends on it."

Offline MaV

  • Supporter
  • 6128 Plus
  • *
  • Posts: 1.126
  • Country: at
  • Ius summum saepe summa est malitia.
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #12 on: 14:44, 25 October 21 »
However, your solution does not look arcane to me, @Cwiiis , if you'd just change the &ff to -1 and let the assembler do its work (I deleted the two unnecessary instructions, since you stated that A does not need to be preserved).
"Add -1 to the first byte, (save it), and if it underflows decrement the second byte" reads just fine.



Code: [Select]
    ld a, -1
    add (ix+Struct1)
    ld (ix+Struct1), a
    jr c,NextThing
    dec (ix+Struct1+1)
    jr NextThing



exit early:
 2
 5
 5
 3
--
15


decrement high byte as well:
 2
 5
 5
 2
 6
 3
--
23

like
2
Members reacted like:
Urusergi,m_dr_m,
Black Mesa Transit Announcement System:
"Work safe, work smart. Your future depends on it."

Offline m_dr_m

  • 464 Plus
  • *****
  • Posts: 324
  • Country: gb
  • http://orgams.wikidot.com/
    • OrgaMS!
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #13 on: 14:46, 25 October 21 »
If you are fine with 15 bits values encoded bizarrely, you can test Sign flag after the first dec.
But that has no avantage over the -val encoding.
« Last Edit: 11:32, 02 November 21 by m_dr_m »
like
0
No reactions

Offline Cwiiis

  • CPC664
  • ***
  • Posts: 137
  • Country: gb
    • Blog
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #14 on: 15:47, 25 October 21 »
Does seem like my last solution was fine really, perhaps I wasn't missing much in the end - in this particular case, struct of arrays doesn't really make sense and I think this is probably good enough. Could maybe save some cycles not using IX, at the expense of readability but at this point, it's definitely premature optimisation.

Thanks for all the discussion, has definitely helped :) I think I'm only a few routines away from being able to demo something, though limited time makes progress pretty slow...
like
2
Members reacted like:
MaV,m_dr_m,

Offline Urusergi

  • CPC6128
  • ****
  • Posts: 252
  • Country: es
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #15 on: 14:46, 01 November 21 »
Code: [Select]
    ld a, -1
    add (ix+Struct1)
    ld (ix+Struct1), a
    jr c,NextThing
    dec (ix+Struct1+1)
    jr NextThing

I really liked the routine, but at the same speed I think 14 is better than 15 bytes, isn't it?  8)

Code: [Select]
    xor a
    cp (ix+Struct1)
    dec (ix+Struct1)
    jr c,NextThing
    dec (ix+Struct1+1)
    jr NextThing


Cheers.
like
4
Members reacted like:
dthrone,MaV,Cwiiis,m_dr_m,

Offline Prodatron

  • 6128 Plus
  • ******
  • Posts: 877
  • Country: de
  • Back on the Z80
    • index.php?action=treasury
    • SymbOS SYmbiosis Multitasking Based Operating System
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #16 on: 00:32, 02 November 21 »
That looks promising, unfortunately the NOP you win with XOR A compared to LD A,-1 you loose with

dec (ix+Struct1)

instead of

ld (ix+Struct1), a

as second is just a write (5 Nops), while first one (6 Nops) is a read/write.
like
0
No reactions

GRAPHICAL Z80 MULTITASKING OPERATING SYSTEM

Offline Urusergi

  • CPC6128
  • ****
  • Posts: 252
  • Country: es
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #17 on: 01:55, 02 November 21 »
That looks promising, unfortunately the NOP you win with XOR A compared to LD A,-1 you loose with

dec (ix+Struct1)

instead of

ld (ix+Struct1), a

as second is just a write (5 Nops), while first one (6 Nops) is a read/write.


Yes, I'm aware, that's why I said -at the same speed-  ;D  but in this case we gain one byte  8)
 
like
1
Members reacted like:
m_dr_m,

Offline Prodatron

  • 6128 Plus
  • ******
  • Posts: 877
  • Country: de
  • Back on the Z80
    • index.php?action=treasury
    • SymbOS SYmbiosis Multitasking Based Operating System
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #18 on: 02:24, 02 November 21 »
Ops you are right!
So you win probably? your code is the most optimized!
Always cool to see that there is still some more optimization possible with Z80 code
like
2
Members reacted like:
Urusergi,m_dr_m,

GRAPHICAL Z80 MULTITASKING OPERATING SYSTEM

Offline m_dr_m

  • 464 Plus
  • *****
  • Posts: 324
  • Country: gb
  • http://orgams.wikidot.com/
    • OrgaMS!
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #19 on: 11:31, 02 November 21 »
If you don't mind more variation in time taken and greater max time:


Code: [Select]
dec (ix+struct1)
jp p,.next
ld a,(ix+struct1)
inc a
jr nz,.next
dec (ix+struct1+1)
.next


Early exit 1: 9 (half of the time)
Early exit 2: 18 (~half of the time)
Early exit average: 13.5


Still slower than my other suggestions, though!
like
3
Members reacted like:
Cwiiis,Urusergi,Prodatron,

Offline Cwiiis

  • CPC664
  • ***
  • Posts: 137
  • Country: gb
    • Blog
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #20 on: 12:06, 02 November 21 »
If you don't mind more variation in time taken and greater max time:


Code: [Select]
dec (ix+struct1)
jp p,.next
ld a,(ix+struct1)
inc a
jr nz,.next
dec (ix+struct1+1)
.next


Early exit 1: 9 (half of the time)
Early exit 2: 18 (~half of the time)
Early exit average: 13.5


Still slower than my other suggestions, though!


Ah, this is nice and I hadn't thought of this :) I think I need to prepare for the worst branch to be taken, so this may or may not work, but there may be things I can do to mitigate that, in which case this could be the base for the best solution... Very nice, thanks!
like
1
Members reacted like:
m_dr_m,

Offline Urusergi

  • CPC6128
  • ****
  • Posts: 252
  • Country: es
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #21 on: 22:43, 02 November 21 »
Ops you are right!So you win probably? your code is the most optimized!Always cool to see that there is still some more optimization possible with Z80 code

Thanks, but it's the most optimized in size, speed isn't the best I would like. @m_dr_m is the winner  8)

Code: [Select]
dec (ix+struct1)
jp p,.next
ld a,(ix+struct1)
inc a
jr nz,.next
dec (ix+struct1+1)
.next


Early exit 1: 9 (half of the time)
Early exit 2: 18 (~half of the time)
Early exit average: 13.5

Very intelligent! I like it so much. I think your code is the fastest possible, according to the requirements of @Cwiiis
like
2
Members reacted like:
m_dr_m,Prodatron,

Offline m_dr_m

  • 464 Plus
  • *****
  • Posts: 324
  • Country: gb
  • http://orgams.wikidot.com/
    • OrgaMS!
    • Awards
Re: Fast decrement of 16-bit value via index register?
« Reply #22 on: 23:11, 02 November 21 »

Edit: That's basically @andycadley 's suggestion!
Edit2: With hl, the add -1 trick is shorter and very slightly faster.

Indexed addressing is so slow that you might consider doing your own indexing.
If you have few structs, you can align them every &100.


Code: [Select]
; H = Struct MSB
ld l,struct1
ld a,(hl):add -1;ld (hl),a
jr c,.next
[...]


If you have few fields, and no more than 128 structs (or to be more precise, 256/n where n=size of biggest field):


Code: [Select]
; L = Struct LSB
ld h,struct1
[...]

Dropping the average time to ~11 nops.
« Last Edit: 23:19, 02 November 21 by m_dr_m »
like
4
Members reacted like:
Urusergi,roudoudou,Prodatron,MaV,