CPCWiki forum

General Category => Programming => Topic started by: Cwiiis on 18:20, 24 October 21

Title: Fast decrement of 16-bit value via index register?
Post by: Cwiiis on 18:20, 24 October 21
So I have a structure in memory that, at least for compact, readable code, it makes sense to use an index register to access. Within it are a couple of 16-bit values that depending on the contents of a, get incremented or decremented.

Increment isn't a big deal;

    inc (ix+Struct1)
    jr nz,NextThing
    inc (ix+Struct1+1)
    jr NextThing


But the problem comes with decrement... Given it doesn't affect the carry flag, I don't see a way of easily doing the decrement without also affecting the a register, which I'd like to preserve... Currently I load it into hl, decrement hl and load back, but that's going to be twice as slow as the above.

    ld l,(ix+Struct1)
    ld h,(ix+Struct1+1)
    dec hl
    ld (ix+Struct1),l
    ld (ix+Struct1+1),h


I guess I could do something like this instead:

    ld h,a
    ld a,&ff
    add (ix+Struct1)
    ld (ix+Struct1),a
    ld a,h
    jr c,NextThing
    dec (ix+Struct1+1)
    jr NextThing


But that just seems so arcane... Am I missing something even simpler/better?
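
For reference, a commented sketch of why the carry test in that last version works (untested, and `Struct1` is given a hypothetical offset only so the fragment assembles on its own). `inc` can branch on Z because a wrap to 0 is exactly when the high byte needs the carry. With `dec` the borrow happens when the low byte wraps from 0 to &ff, and Z does not signal that. Adding &ff produces a carry out for every starting value except 0, so "carry set" means "no borrow". This variant clobbers A.

```asm
Struct1 equ 0                ; hypothetical field offset, for illustration only

    ld a,&ff                 ; -1 as an unsigned byte
    add a,(ix+Struct1)       ; A = low - 1; carry set unless low was 0
                             ;   low=&05: &05+&ff=&104 -> A=&04, carry set
                             ;   low=&00: &00+&ff=&0ff -> A=&ff, carry clear
    ld (ix+Struct1),a        ; store the new low byte (LD leaves the flags alone)
    jr c,NextThing           ; carry set: no borrow, high byte unchanged
    dec (ix+Struct1+1)       ; low wrapped 0 -> &ff, so borrow from the high byte
NextThing:
```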
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Cwiiis on 18:25, 24 October 21
Heh, of course as soon as I type it out, I realise I actually don't need to preserve a and so that second one reads fine without the ld h,a / ld a,h... Still interested to hear if anyone has anything faster though :)
Title: Re: Fast decrement of 16-bit value via index register?
Post by: fgbrain on 18:55, 24 October 21

why use IX in the first place?
it's much faster to use HL instead..


Quote

ld hl, ix + Struct1

dec (hl)   ;  OR inc (hl)



and you're done.
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Cwiiis on 19:36, 24 October 21
Quote from: fgbrain on 18:55, 24 October 21

why use IX in the first place?
it's much faster to use HL instead..





and you're done.

Indeed, this would be quicker if it wasn't part of a larger structure with multiple values - I did do this without IX first, but the amount of address manipulation required made the code pretty unreadable and, though likely quicker, it wasn't enough quicker to be worth the cost in readability (at least for now). The examples I gave above are dummy examples, not the actual code.
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Cwiiis on 19:42, 24 October 21
Quote from: fgbrain on 18:55, 24 October 21

why use IX in the first place?
it's much faster to use HL instead..





and you're done.

Oh, I actually misread this - you're not done there because that's an 8-bit inc/dec, but it is faster in the case of over/underflow as it only has the one IX access in that case instead of the current two. Nice :)
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Prodatron on 20:14, 24 October 21
Quote from: fgbrain on 18:55, 24 October 21
ld hl, ix + Struct1

Well, such a Z80 opcode doesn't exist :) but I guess you mean something like this:
Instead of...


LD IX,datarecord
...
do some stuff with (IX+struct1)...


...you would do...


LD HL,datarecord+struct1
...
do some stuff with (HL)


But as soon as you have multiple data records with the same structure, you would have to patch your code or use e.g. LD BC,struct1:ADD HL,BC to achieve the same thing you can do with IX.


But this could still be faster, if you don't need to manipulate more things inside the data record.


Example:


ld ix,datarecord      ;4
[...]
ld l,(ix+Struct1)     ;5
ld h,(ix+Struct1+1)   ;5
dec hl                ;2
ld (ix+Struct1),l     ;5
ld (ix+Struct1+1),h   ;5 -> 26 NOPs


Vs.:


ld hl,datarecord      ;3
[...]
ld bc,Struct1         ;3
add hl,bc             ;3
ld c,(hl)             ;2
inc hl                ;2
ld b,(hl)             ;2
dec bc                ;2

ld (hl),b             ;2
dec hl                ;2
ld (hl),c             ;2 -> 23 NOPs


If you have several more things to do with your data structure, especially if the affected bytes are not very close to each other, the IX-usage could be faster again. That can be different from case to case.
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Prodatron on 20:26, 24 October 21
Summary: if you can move through your data structure in a sequential way, the HL method is faster. If you have to access it in a very random way, the IX method is faster.
Title: Re: Fast decrement of 16-bit value via index register?
Post by: m_dr_m on 20:36, 24 October 21
Quote from: Cwiiis on 18:20, 24 October 21
But the problem comes with decrement...

Then store the opposite of the value and increment it!


Also: consider structure of arrays rather than array of structures.
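
A minimal sketch of the "store the opposite" idea, under assumptions not stated in the post: the field holds -value in two's complement, `Struct1` is a hypothetical offset, and the real value is only needed occasionally. Decrementing the value then becomes the cheap increment from the first post. If the same field is also incremented often, the problem just moves to the other direction.

```asm
Struct1 equ 0                ; hypothetical field offset, for illustration only

; value = value - 1, done as stored = stored + 1
    inc (ix+Struct1)         ; low byte of -value
    jr nz,NextThing          ; no wrap to 0: high byte unchanged
    inc (ix+Struct1+1)       ; wrapped &ff -> 0: carry into the high byte
NextThing:

; Recovering the real value when it is needed: HL = 0 - stored
    xor a                    ; A = 0, carry clear
    sub (ix+Struct1)         ; A = 0 - low, carry = borrow
    ld l,a
    ld a,0                   ; LD does not touch the carry
    sbc a,(ix+Struct1+1)     ; A = 0 - high - borrow
    ld h,a
```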
Title: Re: Fast decrement of 16-bit value via index register?
Post by: andycadley on 21:41, 24 October 21
If you'll never have more than 256 records you can page align things so you only need Inc h to move to the next field of a record and Inc l to move between records. You can also jump to specific points in either case by directly loading h or l with the relevant offsets.
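
A rough sketch of that layout, with assumed names: `Table` is a hypothetical page-aligned base address, byte k of record r lives at `Table + k*256 + r`, and L already holds the record number. The low and high bytes of a 16-bit field then sit in adjacent pages.

```asm
Table   equ &8000            ; hypothetical; the low byte of the address must be 0
Struct1 equ 0                ; hypothetical page index of the field's low byte

    ld h,Table/256+Struct1   ; select the low-byte page; L = record number
    ld a,(hl)
    add a,-1                 ; carry set unless the low byte was 0
    ld (hl),a
    jr c,NextThing           ; no borrow, done
    inc h                    ; same record, next page = high byte
    dec (hl)
NextThing:
```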
Title: Re: Fast decrement of 16-bit value via index register?
Post by: fgbrain on 22:29, 24 October 21

Quote
Oh, I actually misread this - you're not done there because that's an 8-bit inc/dec, but it is faster in the case of over/underflow as it only has the one IX access in that case instead of the current two. Nice
my bad, you are right.


Back to the original question: the carry flag is not affected by INC either, just as with DEC.


perhaps then some other flag is suitable for your task ??
Title: Re: Fast decrement of 16-bit value via index register?
Post by: eto on 09:37, 25 October 21
Quote from: fgbrain on 22:29, 24 October 21

perhaps then some other flag is suitable for your task ??


DEC sets the P/V flag, but only when the value was 80h before the decrement (80h -> 7Fh).


As A doesn't need to be preserved, might this work?



dec (ix+Struct1)
ld a,(ix+Struct1)
cp a,&ff
jr nz,nextThing
dec (ix+Struct1+1)
jr nextThing
Title: Re: Fast decrement of 16-bit value via index register?
Post by: MaV on 12:11, 25 October 21

Not tested, but should be a viable solution.


    xor a
    cp (ix+datastruct)
    jr nz, dec_low_byte_only
    dec (ix+datastruct+1)
dec_low_byte_only:
    dec (ix+datastruct)
    jr somewhereelse


NOPs when not jumping:
1
5
2
6
6
3
--
23

NOPs when jumping:
1
5
3
6
3
--
18
Title: Re: Fast decrement of 16-bit value via index register?
Post by: MaV on 12:44, 25 October 21
However, your solution does not look arcane to me, @Cwiiis (https://www.cpcwiki.eu/forum/index.php?action=profile;u=3482), if you'd just change the &ff to -1 and let the assembler do its work (I deleted the two unnecessary instructions, since you stated that A does not need to be preserved).
"Add -1 to the first byte, (save it), and if it underflows decrement the second byte" reads just fine.




    ld a, -1
    add (ix+Struct1)
    ld (ix+Struct1), a
    jr c,NextThing
    dec (ix+Struct1+1)
    jr NextThing



exit early:
2
5
5
3
--
15


decrement high byte as well:
2
5
5
2
6
3
--
23

Title: Re: Fast decrement of 16-bit value via index register?
Post by: m_dr_m on 12:46, 25 October 21
If you are fine with 15-bit values encoded bizarrely, you can test the Sign flag after the first dec.
But that has no advantage over the -val encoding.
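
One possible reading of that encoding, as a sketch only (the post does not spell it out, so everything here is an assumption): the low byte only ever holds 0..&7f and the high byte holds the other 8 bits. The common path is then one `dec` plus one branch, the same shape as the increment in the negated encoding, which may be what "no advantage" refers to. `Struct1` is a hypothetical offset.

```asm
Struct1 equ 0                ; hypothetical field offset, for illustration only

    dec (ix+Struct1)         ; within 0..&7f, only 0 -> &ff sets the Sign flag
    jp p,NextThing           ; positive: no borrow needed
    res 7,(ix+Struct1)       ; &ff -> &7f, the largest 7-bit low value
    dec (ix+Struct1+1)       ; borrow from the high 8 bits
NextThing:
```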
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Cwiiis on 13:47, 25 October 21
Does seem like my last solution was fine really, perhaps I wasn't missing much in the end - in this particular case, struct of arrays doesn't really make sense and I think this is probably good enough. Could maybe save some cycles not using IX, at the expense of readability but at this point, it's definitely premature optimisation.

Thanks for all the discussion, has definitely helped :) I think I'm only a few routines away from being able to demo something, though limited time makes progress pretty slow...
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Urusergi on 13:46, 01 November 21

    ld a, -1
    add (ix+Struct1)
    ld (ix+Struct1), a
    jr c,NextThing
    dec (ix+Struct1+1)
    jr NextThing


I really liked the routine, but at the same speed I think 14 is better than 15 bytes, isn't it?  8)


    xor a
    cp (ix+Struct1)
    dec (ix+Struct1)
    jr c,NextThing
    dec (ix+Struct1+1)
    jr NextThing
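
The same routine annotated, as a sketch: the sizes are the standard Z80 encodings, `Struct1` is a hypothetical offset, and `NextThing` is placed directly after the code only so the fragment assembles on its own.

```asm
Struct1 equ 0                ; hypothetical field offset, for illustration only

    xor a                    ; 1 byte   A = 0
    cp (ix+Struct1)          ; 3 bytes  0 - low: carry set unless low is 0
    dec (ix+Struct1)         ; 3 bytes  INC/DEC leave the carry untouched
    jr c,NextThing           ; 2 bytes  low was not 0: no borrow
    dec (ix+Struct1+1)       ; 3 bytes  low was 0: borrow from the high byte
    jr NextThing             ; 2 bytes  -> 14 bytes in total
NextThing:
```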



Cheers.
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Prodatron on 23:32, 01 November 21
That looks promising, unfortunately the NOP you win with XOR A compared to LD A,-1 you lose with

dec (ix+Struct1)

instead of

ld (ix+Struct1), a

as the second is just a write (5 NOPs), while the first (6 NOPs) is a read/write.
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Urusergi on 00:55, 02 November 21
Quote from: Prodatron on 23:32, 01 November 21
That looks promising, unfortunately the NOP you win with XOR A compared to LD A,-1 you lose with

dec (ix+Struct1)

instead of

ld (ix+Struct1), a

as the second is just a write (5 NOPs), while the first (6 NOPs) is a read/write.


Yes, I'm aware, that's why I said -at the same speed-  ;D  but in this case we gain one byte  8)
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Prodatron on 01:24, 02 November 21
Oops, you are right!
So you probably win: your code is the most optimized!
Always cool to see that there is still some more optimization possible with Z80 code
Title: Re: Fast decrement of 16-bit value via index register?
Post by: m_dr_m on 10:31, 02 November 21
If you don't mind more variation in time taken and greater max time:



dec (ix+struct1)
jp p,.next
ld a,(ix+struct1)
inc a
jr nz,.next
dec (ix+struct1+1)
.next



Early exit 1: 9 (half of the time)
Early exit 2: 18 (~half of the time)
Early exit average: 13.5


Still slower than my other suggestions, though!
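
The same routine with per-line NOP counts, as a sketch: it uses the per-instruction figures already quoted in this thread and adds the worst case, which is just their sum on the longest path (taken when the low byte was 0 before the decrement). `struct1` is a hypothetical offset and the label is written as `Next:` rather than `.next` so the fragment does not depend on one assembler's local-label syntax.

```asm
struct1 equ 0                ; hypothetical field offset, for illustration only

    dec (ix+struct1)         ; 6  new low byte, Sign = its bit 7
    jp p,Next                ; 3  0..&7f: cannot have wrapped (9 so far)
    ld a,(ix+struct1)        ; 5  &80..&ff: only &ff means a wrap
    inc a                    ; 1  &ff + 1 = 0 sets Z
    jr nz,Next               ; 3 if taken (18 so far), 2 if not taken
    dec (ix+struct1+1)       ; 6  borrow from the high byte -> 23 worst case
Next:
```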
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Cwiiis on 11:06, 02 November 21
Quote from: m_dr_m on 10:31, 02 November 21
If you don't mind more variation in time taken and greater max time:



dec (ix+struct1)
jp p,.next
ld a,(ix+struct1)
inc a
jr nz,.next
dec (ix+struct1+1)
.next



Early exit 1: 9 (half of the time)
Early exit 2: 18 (~half of the time)
Early exit average: 13.5


Still slower than my other suggestions, though!


Ah, this is nice and I hadn't thought of this :) I think I need to prepare for the worst branch to be taken, so this may or may not work, but there may be things I can do to mitigate that, in which case this could be the base for the best solution... Very nice, thanks!
Title: Re: Fast decrement of 16-bit value via index register?
Post by: Urusergi on 21:43, 02 November 21
Quote from: Prodatron on 01:24, 02 November 21
Oops, you are right!
So you probably win: your code is the most optimized!
Always cool to see that there is still some more optimization possible with Z80 code

Thanks, but it's only the most optimized in size; the speed isn't as good as I would like. @m_dr_m (https://www.cpcwiki.eu/forum/index.php?action=profile;u=3015) is the winner  8)

Quote from: m_dr_m on 10:31, 02 November 21
dec (ix+struct1)
jp p,.next
ld a,(ix+struct1)
inc a
jr nz,.next
dec (ix+struct1+1)
.next



Early exit 1: 9 (half of the time)
Early exit 2: 18 (~half of the time)
Early exit average: 13.5

Very intelligent! I like it so much. I think your code is the fastest possible, according to the requirements of @Cwiiis (https://www.cpcwiki.eu/forum/index.php?action=profile;u=3482)
Title: Re: Fast decrement of 16-bit value via index register?
Post by: m_dr_m on 22:11, 02 November 21

Edit: That's basically @andycadley (https://www.cpcwiki.eu/forum/index.php?action=profile;u=327)'s suggestion!
Edit2: With hl, the add -1 trick is shorter and very slightly faster.

Indexed addressing is so slow that you might consider doing your own indexing.
If you have few structs, you can align them every &100.



; H = Struct MSB
ld l,struct1
ld a,(hl):add -1:ld (hl),a
jr c,.next
[...]



If you have few fields, and no more than 128 structs (or to be more precise, 256/n where n=size of biggest field):



; L = Struct LSB
ld h,struct1
[...]


Dropping the average time to ~11 nops.