News:

Printed Amstrad Addict magazine announced, check it out here!

Main Menu
avatar_ronaldo

Possible bug in SDCC 3.5.0

Started by ronaldo, 20:52, 18 July 15

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

ronaldo

I was doing tests while introducing new optimizations in CPCtelera, using __z88dk_callee calling convention, when this problem arised.

In the Sprites example, there is a function drawLogo with this code:

void drawLogo() {
    u8* pvideo;
    cpct_clearScreen_f64(0);
    cpct_setPalette(G_logo_palette, 4);
    cpct_setVideoMode(1);     
    pvideo = cpct_getScreenPtr(SCR_VMEM, 20, 4);

    // Relevant part: Function ends with a Call to other Function
    cpct_drawSprite(G_CPCt_logo, pvideo, LOGO_W, LOGO_H);
}

As the function ends with a call to other function, when using __z88dk_callee convention, the generated code does a optimization that may result in a bug. This is the generated code for the relevant part:

;; ... previous code from drawLogo function ....

;src/main.c:84: cpct_drawSprite(G_CPCt_logo, pvideo, LOGO_W, LOGO_H);
    ld    c, l
    ld    b, h
    ld    de,#_G_CPCt_logo
    ld    hl,#0xBF28
    push    hl
    push    bc
    push    de
    jp    _cpct_drawSprite   ;;; <<<<<<==== THIS OPTIMIZATION MAY BE FAULTY
;src/main.c:90: void main(void) {
;    ---------------------------------
; Function main
; ---------------------------------
_main::

;; ... code for main function continues here ....

As __z88dk_callee convention expects the callee to clean up the stack, SDCC uses a JP instead of a CALL to call the cpct_drawSprite function. This aims at returning directly to drawLogo caller instead of returning first to drawLogo and then executing another RET to return to the caller.

However, the problem here is not using CALL. CALL inserts the return address at the top of the stack, and this generated code DOES NOT. cpct_drawSprite function is hand-coded in assembly and has no way to know if this optimization has been applied or not. This means, effectively, that there is no way to know if parameters start at SP+2 or at SP. So, cpct_drawSprite gets confused interpreting first parameter as return address, as it expects to be called by a CALL instruction.

Using standard SDCC calling convention does not show this problem, as no optimization is done by the compiler (a normal CALL is performed).

Shall we consider this a bug? Should this optimization take this into account and POP/PUSH the return address for the callee to found it at the top of the stack?

FloppySoftware

Quote from: ronaldo on 20:52, 18 July 15
Shall we consider this a bug? Should this optimization take this into account and POP/PUSH the return address for the callee to found it at the top of the stack?
IMHO: Yes, of course.
Even the standard strcpy function included in the SDCC Z80 library, will fail, following your argument:

_strcpy:
pop bc
pop de
pop hl
push hl
push de
push bc
push de
xor a, a
loop:
cp a, (hl)
ldi
jr NZ, loop
pop hl
ret

floppysoftware.es < NEW URL!!!
cpm-connections.blogspot.com.es

ronaldo

I'm trying to open a issue for SDCC developers, but sourceforge is having problems today and there is no way to access developer panels.

Alcoholics Anonymous

#3
Quote from: ronaldo on 21:39, 18 July 15
I'm trying to open a issue for SDCC developers, but sourceforge is having problems today and there is no way to access developer panels.

SF has been down for two days.

This is a bug and it's caused by the peephole optimizer.

With sdcc, if you compile by turning the peephole optimizer off (--no-peep) then the call will not be optimized into a jump.

In fact you can see it is peephole rule #114:


replace restart {
    call    %1
    ret
} by {
    jp    %1
    ; peephole 114 replaced call at end of function by jump (tail call optimization).
}


This rule is only going to be valid for functions that don't take parameters or for __z88dk_fastcall functions.  I think you can be clever about it and say if there is a push before the call then you cannot do this optimization.  I'm going to try to suss that out now.



EDIT:  And this is how I fixed it.  I replace that rule with two rules:


replace restart {
    push    %1
    call    %2
    ret
} by {
    push    %1
    call    %2
    __z88dk_poison__
    ret
    ; peephole 114a prevent tail code optimization for functions taking parameters
}

replace restart {
    call    %1
    ret
} by {
    jp    %1
    ; peephole 114b replaced call at end of function by jump (tail call optimization).
}



The first rule is found first and if there is a push prior to the call, indicating the function takes parameters, then it poisons the code with "__z88dk_poison__" to prevent the second rule from working.  If, however, there is no push then the second rule will do the jp optimization.

At the very end of the rules file I added this:


replace {
    __z88dk_poison__
} by {
    ; z88dk poison removal
}



which removes the poison line so you have valid assembler after the peephole step.


A test:


extern void func_callee(int a, int b) __z88dk_callee;
extern void func_fastcall(int a) __z88dk_fastcall;
extern void func(int a, int b);


void test1(void)
{
   func_callee(1, 2);
}

void test2(void)
{
   func_fastcall(1);
}

void test3(void)
{
   test1();
}

void test4(void)
{
   func(1, 2);
}



Asm output:



;    ---------------------------------
; Function test1
; ---------------------------------
_test1::
;tail.c:10: }
    ld    hl,#0x0002
    push    hl
    ld    l, #0x01
    push    hl
    call    _func_callee
    ret
;tail.c:13: {
;    ---------------------------------
; Function test2
; ---------------------------------
_test2::
;tail.c:15: }
    ld    hl,#0x0001
    jp    _func_fastcall
;tail.c:18: {
;    ---------------------------------
; Function test3
; ---------------------------------
_test3::
;tail.c:20: }
    jp    _test1
;tail.c:23: {
;    ---------------------------------
; Function test4
; ---------------------------------
_test4::
;tail.c:25: }
    ld    hl,#0x0002
    push    hl
    ld    l, #0x01
    push    hl
    call    _func
    pop    af
    pop    af
    ret


where you can see the jumps are retained for the fastcall and void functions but the other functions taking params still do call/ret.


ronaldo

Thank you very much, @Alcoholics Anonymous. I have implemented your solution and it works :).

I'm doing some tests to include it in next push to CPCtelera.

ronaldo

@Alcoholics Anonymous: This commit fixes the problem. Thank you very much for your help :D



Alcoholics Anonymous

#6
I'm glad to hear it works.  The bug should still be reported to sdcc along with suggested fix if you want to do that.  sf is *still* down (3 days now) and they are talking about heading into next week before more services are up so it may take a few days yet.

If you'd like to try for fun, here's a peephole rule set we're using in z88dk:
sdcc_peeph.3 - Google Drive

It's the sdcc rule set + some bugfixes having to do with static variables (and this jp optimization thing) + about 140 new rules trying to reduce code size.  It's really only a beginning as we need to examine more source code to make things better but many of the rules are trying to fix the access to static variables.  One of the first steps taken when optimizing z80 c code is moving from local to static variables where this is possible and sdcc is generating really bad code for that.  Sometimes using --reserve-regs-iy will also lead to better code so it's worthwhile to try that.  The ruleset is determined with --max-allocs-per-node200000 so it's intended to work on compiler output when optimization level is high.

Now the caveat:  sdcc's peephole optimizer is unable to determine live registers when z88dk_fastcall and z88dk_callee functions are present so it errs conservatively and considers DEHL always live.  This means many rules cannot be applied when they should be.  Instead we're ignoring live variables determination in many cases so it's possible to get buggy code out the other end.  But so far that hasn't been the case.  We deal with it by supplying more than one ruleset and having this one as optionally selectable on the command line ("-SO3").

If you try it let me know if you see any significant changes in code size and speed.  If anyone else has some real programs they'd like to possibly treat to more aggressive peephole optimization, I don't mind using those as test cases for coming up with more rules.

Alcoholics Anonymous

I had to make one more change to fix the bug properly.  The rule removing the poison at the end of the rules file is not in the best place.  Rules just prior to that are trying to change jp to jr but if that invalid opcode poison line is present, the rules cannot compute jump distance across it and will leave jps as they are.

As before rule 114 is replaced with:


replace restart {
    push    %1
    call    %2
    ret
} by {
    push    %1
    call    %2
    nop
    ret
    ; peephole 114a prevent tail code optimization for functions taking parameters
}

replace restart {
    call    %1
    ret
} by {
    jp    %1
    ; peephole 114b replaced call at end of function by jump (tail call optimization).
}


I've changed "__z88dk_poison__" to a nop which works just as well but is not as scary.

The poision removal is done just before rule 146 and the first barrier:


///////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////
// POISON REMOVE
///////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////

barrier

replace {
    nop
} by {
    ; poison removal
}

.... sdcc rule follows:

// These ex-generating rules should be among the last ones since ex counts as a read from both hl and de for notUsed().
barrier

replace restart {
    ld    d,h
    ld    e,l
} by {
    ; peephole 146 used ex to load hl into de.
    ex    de,hl
} if notUsed('hl')

...



ronaldo

Thank you very much, @Alcoholics Anonymous :D. I'm implementing it in CPCtelera ASAP.

Also, will open an issue to SDCC developers when SF becomes available again.

Alcoholics Anonymous

Quote from: ronaldo on 13:34, 21 July 15
Also, will open an issue to SDCC developers when SF becomes available again.

Ronaldo I hope you don't mind but I was reminded of a similar earlier reported issue and tacked it on there.  I gathered you had forgotten about it :)

https://sourceforge.net/p/sdcc/bugs/2374/

ronaldo

Yep, you are right. While the web was off, concentrated on other things and forgot to report the bug. My fault  :-[

Is @alvin your nick there? Thank you for remembering :)

ronaldo

@Alcoholics Anonymous: I think the bug has more cases not covered by the peephole, like this one:

    push    af
    inc    sp
    jp    _cpct_etm_drawTileBox2x4


The "inc sp" previous to the call prevents your peephole from matching, and the bug appears again. I have created a new peephole, mimicking your previous one:

replace restart {
    push    %1
    inc     %2
    call    %3
    ret
} by {
    push    %1
    inc     %2
    call    %3
    nop
    ret
    ; peephole 114c prevent tail code optimization for functions taking parameters, with last parameter being an 8-bits one.
}


This seems to have done the trick, but I'm not sure if that could cause lateral effects. Any idea about that?

FloppySoftware

Just a fast comment.

I had problems with INC SP and the interrupts with MESCC and the PCW under CP/M, causing some erratic and random strange behaviour when running the compiled programs.

That was due to corruption in the local variables and pushed arguments in the stack because the interrupt system can push and pop values in the stack.
floppysoftware.es < NEW URL!!!
cpm-connections.blogspot.com.es

ronaldo

#13
Yep, you have to be careful about using INC SP, as you are exposing a byte of the stack when you do it. Moving the SP 1 byte forward leaves the previous top byte of the stack out of it. If an interrupt happens just after the INC, PUSH operations are going to overwrite the byte you left out with INC SP. If that byte was valuable information, you will lose it.

In this case, INC SP has no effect, as previous instruction PUSHes AF, wanting to insert the value of A in the stack. So, INC SP is just removing F from the stack, which is unwanted information. It can be safely overwritten by PUSHes if an interrupt occurs previous to the CALL. :)

Alcoholics Anonymous

Quote from: ronaldo on 12:39, 13 August 15
@Alcoholics Anonymous: I think the bug has more cases not covered by the peephole, like this one:

    push    af
    inc    sp
    jp    _cpct_etm_drawTileBox2x4


Right, passing chars as parameter.  I'd forgotten about that because I changed all my functions to accept int instead of char to avoid problems.  This feature is there because of the 8051 which can have a very limited stack space available so saving bytes may make a difference.  On the z80 it's more a waste of cycles and there was some discussion some time back about whether to get rid of this feature.

Where it causes problems is in function pointer calls.   The difference is in vararg functions versus regular functions.  In vararg functions, char is promoted to int before being pushed onto the stack so that chars will occupy two bytes on the stack.  This means vararg function pointers and regular function pointers must be different types in sdcc and I haven't actually checked that to see if that's the case (it's a bug if it isn't the case).  Under z88dk there is only one function pointer type so you can call a vararg function or a regular function using the same function pointer and because of that we have to make sure chars occupy two bytes on the stack.

Once you change a parameter to int (from char), the code generated is different for the function.  sdcc is much better with generating code for char types so by changing things to int you do lose something in the code generation.  You could assign the int param to a local char before using it to get around that.

Anyway for library functions we decided to change all char params to int.  This makes no difference to the lib functions as they are written in assembler and this allows function pointers to lib functions to always work.  C code compiled by sdcc will use one byte for char params and we depend on sdcc to generate errors if you try to assign the address of a vararg function to a regular function pointer (and vice versa) for regular C code to avoid crashes.

It would be better if sdcc just used two bytes for char params in the z80's case at least.

Quote
The "inc sp" previous to the call prevents your peephole from matching, and the bug appears again. I have created a new peephole, mimicking your previous one:

replace restart {
    push    %1
    inc     %2
    call    %3
    ret
} by {
    push    %1
    inc     %2
    call    %3
    nop
    ret
    ; peephole 114c prevent tail code optimization for functions taking parameters, with last parameter being an 8-bits one.
}


This seems to have done the trick, but I'm not sure if that could cause lateral effects. Any idea about that?

I think it could have side effects.


The bug has been fixed within sdcc a different way:


replace restart {
    call    %1
    ret
} by {
    jp  %1
    ; peephole 114 replaced call at end of function by jump (tail call optimization).
} if symmParmStack(%1)


It's using a condition I didn't know existed.  It should work in the char case too.


There's another bug in the peephole rules that was fixed:
https://sourceforge.net/p/sdcc/bugs/2407/

so it's worthwhile to do an update on your install.

ronaldo

Thank you for your detailed  answer, @Alcoholics Anonymous. Just after using this temporal fix for the problem I went straight to sourceforge to notify the bug. I read the changelog and I found the fix you mention, which is indeed much better. I also saw the other fix about variable left shifting. I updated SDCC to the last version and now everything is working fine :).

I imagine that the problem with int vs char would have been a harsh debate. It's not immediately clear which position is better in terms of performance. In this case, I'd tend to set one of them by default but enable the user to change that behaviour with switches. I think is sufficiently important for advanced users to have the ability to decide by themselves.

Powered by SMFPacks Menu Editor Mod