[cpctelera 1.4][sdcc 3.5] how to use assembly code inside C-function

gryken · 23:12, 09 January 17

Hello,

I'm trying to find out how to embed assembly code inside a C function.

I've found A LOT of examples, for instance:
- http://www.cpcmania.com/Docs/Programming/Programming.htm
- http://cpcwiki.eu/index.php/SDCC_and_CPC
- http://sitedesteph.free.fr/cpc/lec/index.php?page=sdcc&cour=
- http://cpcrulez.fr/coding-crossdev_sdcc-01-developper_en_C_pour_CPC.htm

I would like to read the C parameters inside the assembly code,
and send back a return value.

I'm completely lost
- sometimes, they use stack
- sometimes, they use ix, directly

But these examples seem to be outdated for SDCC 3.5 (inside CPCTELERA 1.4)...

- Since CPCTELERA 1.4, is there no need to use a custom crt0.s? (Nor putchar_s for printf, stdio?)
- Is the compilation option "--fno-omit-frame-pointer" the default one? A mandatory one?
- When I tried to handle input parameters with (ix), the program crashed every time.
- Is the return value passed in the HL register?

Thanks for your help

Code example :

#include <cpctelera.h>
#include <stdio.h>

/* function prototypes  */
void choisir_mode(unsigned char mode);
void wait_key(void);

/* end prototypes */

const unsigned char message1[] = "current_mode : %d";

int main(void) {
    unsigned char m;
  /* loop over the 3 graphics modes (from 0 to 2) */
  for (m=0;m<=2;m++)
  {
  
  choisir_mode(m); 

   printf(message1,m);

   wait_key();
  
  }
   
   return(0);
}


/*-----------------------------------------------------------------------*/

/*   wait for a key press               */
/* no entry param ; no return param */
void wait_key(void)
{
  __asm
   call 0xBB18
  __endasm;
}

/*-----------------------------------------------------------------------*/

/*   one parameter  (with stack SP)   (not ix) */ 
/* no return value */
void choisir_mode(unsigned char mode)   /* firmware SCR_SET_MODE */
{
  
  __asm
    ld    hl,#2
    add    hl,sp
    ld    a,(hl)
    call    #0xBC0E   ; SCR_SET_MODE
  __endasm;
    
}

SRS · 15:39, 10 January 17

cpcmania is a real good source to learn ASM in SDCC

You need to remember SDCC ASM is not MAXAM style, so:

CALL 0XBB18 has to be CALL #0xBB18

Parameters are given at (IX) with SDCC,

so for your example

void choisir_mode(unsigned char mode)   /* firmware SCR_SET_MODE */
{
__asm

ld a, 4 (IX) ; gets the byte from "mode"
call    #0xBC0E   ; SCR_SET_MODE
  __endasm;
}

Your program crashes, imho, because the stack pointer gets changed in your mode routine.

You can only use PRINTF from cpctelera programs (and your other two examples) if you leave the firmware enabled. With the firmware enabled, a lot of the cpctelera internal functions won't work.

ronaldo · 18:06, 10 January 17

Hi @gryken ,

Many questions in your post, and not easy answers for all of them. I'll try to give you some clues for you to continue learning. Let's give it a go,

Quote from: gryken on 23:12, 09 January 17
I would like to get c parameters, inside assembly code.
And send back, a return value.

Communication between function callers and callees is performed by "calling conventions". You may look for this in google and will find many interesting references. These conventions are sets of strict rules on where and how to put/retrieve parameters.

The most standard one is to put parameters on the stack. Before the call is performed, the caller function pushes the parameters onto the stack in reverse order. Then, when the call is performed, the return address is also pushed on top of the stack. The parameters are then available there for the callee. The callee may retrieve them the way it wants, provided it follows these rules: 1) it must leave the stack pointer at the same location it was, and 2) the return address must be at the top of the stack.

Then, after returning from the function, the caller will automatically pop the parameters from the stack again, leaving it as it was before the call. Think of it for a moment: if you leave the stack pointer in a different place, this would have explosive consequences: all the local variables and return addresses are kept on the stack. If it gets changed, variables will get unexpected values and return addresses could point to any random place in memory. This is why it is so important to follow the rules of the "calling convention" strictly.
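To make the stack layout described above concrete, here is a minimal host-side model in plain C. It is not SDCC or CPCtelera code: Z80Stack, push8, push16, callee_read_mode and call_with_mode are made-up names, the return address 0x1234 is arbitrary, and it assumes an 8-bit parameter takes one byte on the stack (which is what reading it at SP+2 implies).

```c
#include <stdint.h>

/* Model of a Z80 stack: it grows downwards and SP points at the last byte pushed. */
typedef struct {
    uint8_t  mem[32];
    uint16_t sp;        /* index into mem; starts one past the end */
} Z80Stack;

static void stack_init(Z80Stack *s) { s->sp = (uint16_t)sizeof s->mem; }

static void push8(Z80Stack *s, uint8_t v) { s->mem[--s->sp] = v; }

/* PUSH rr stores the high byte first, so the low byte ends at the lower address. */
static void push16(Z80Stack *s, uint16_t v) {
    push8(s, (uint8_t)(v >> 8));
    push8(s, (uint8_t)(v & 0xFF));
}

/* Callee side: skip the 2-byte return address,
   like LD HL,#2 / ADD HL,SP / LD A,(HL). SP itself is not modified. */
static uint8_t callee_read_mode(const Z80Stack *s) {
    return s->mem[s->sp + 2];
}

/* Caller side: push the parameter, CALL (pushes the return address),
   run the callee, RET (pops the return address), then drop the parameter. */
static uint8_t call_with_mode(Z80Stack *s, uint8_t mode) {
    uint8_t seen;
    push8(s, mode);              /* parameter */
    push16(s, 0x1234);           /* return address pushed by CALL (arbitrary) */
    seen = callee_read_mode(s);  /* callee body */
    s->sp += 2;                  /* RET pops the return address */
    s->sp += 1;                  /* caller removes its own parameter */
    return seen;
}
```

After call_with_mode returns, sp is back where it started, which is rule 1) above seen from the caller's side.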

With respect to the return value, if you are returning a byte, it must be held in L register; if it is a word, in HL and, if it is a double-word, in the pair DE:HL.

Then, to retrieve parameters from the stack, there are 2 main methods: 1) popping them and then pushing them again to leave the stack unchanged, 2) pointing to the stack with HL or an index register (IX or IY) and copying the values to registers. Just one piece of advice: if you use IX or IY in your own assembly routines, be sure to save and restore their values. SDCC uses IX and IY to access local variables and won't notice they changed, leading your program to undefined behaviour.

Quote from: gryken on 23:12, 09 January 17
I'm completely lost
- sometimes, they use stack
- sometimes, they use ix, directly

But these examples seem to be outdated for SDCC 3.5 (inside CPCTELERA 1.4)...

In fact, CPCMania tutorials and others are outdated. In old versions of the SDCC compiler, IX was synchronized with SP on every call. That let programmers get parameters by simply accessing the stack using IX as an index. This was deprecated and removed because it was unnecessarily CPU consuming. Now, if you want IX to point to the stack, you can do it on your own, at the start of your functions.

Quote from: gryken on 23:12, 09 January 17
- Since CPCTELERA 1.4, there no need to use a custom crt0.s ? (neither putchar_s for printf, stdio ?)

All these things are part of what CPCtelera automates. The main idea behind CPCtelera is to save the developer time: no need to keep doing "bureaucratic" stuff when all you want is to program. You may do these things if you really need them, but it is automated for you in 99% of the cases. You just create your project, write your code and compile.

Also, CPCtelera has its own version of putchar, so printf works in CPCtelera. If you navigate the examples folder, you'll find many of them. You can also find examples written in assembler.

Quote from: gryken on 23:12, 09 January 17
- Is compilation option "--fno-omit-frame-pointer" the default one ? mandatory one ?
- When i tried to handle input parameters with (ix), the program crash every time,
- Is return value using HL register ?

This option is deprecated and its use should be avoided at all costs. Its implementation under SDCC has bugs that won't be fixed (because it's deprecated) and it caused generated code to be bigger and slower. It was used by some developers for comfort: it caused SDCC not to use IX register for local variables, so that it could be used to get stack parameters without the need for saving/restoring it. That looks like a gain, but it was instead a big loss, as the cost of having worse code generated was much much greater than the comfort gained.

This and previous comments explain why your code crashes when you try to use IX to get parameters. It won't be pointing to the stack unless you make it point there by yourself.
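For reference, making IX point to the stack yourself could look roughly like the sketch below. It is untested and rests on several assumptions: SDCC 3.5 for z80 with the default calling convention discussed in this thread, two 8-bit parameters occupying one byte each on the stack, and the SDCC __naked attribute (which makes the compiler emit no prologue or epilogue). add_bytes is a hypothetical function, and the #else branch is only a stand-in so the file also builds with an ordinary host C compiler.

```c
#if defined(__SDCC)

/* Hypothetical example: returns a + b (8-bit), reading both parameters through IX. */
unsigned char add_bytes(unsigned char a, unsigned char b) __naked
{
    __asm
        push ix          ; save IX, the C caller may be using it
        ld   ix,#0
        add  ix,sp       ; IX = SP
        ; layout now: 0(ix),1(ix) saved IX / 2(ix),3(ix) return address
        ;             4(ix) = a / 5(ix) = b
        ld   a,4(ix)     ; A = a
        add  a,5(ix)     ; A = a + b
        ld   l,a         ; 8-bit return value goes in L
        pop  ix          ; restore IX, SP is back at the return address
        ret              ; naked function: we must return ourselves
    __endasm;
}

#else

/* Host stand-in with the same observable result (wraps at 8 bits). */
unsigned char add_bytes(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}

#endif
```

The offsets are 4 and 5 rather than 2 and 3 because the push ix at the top adds two more bytes between IX and the parameters.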

As previously said, yes, L, HL and DE:HL are used for returning values, depending on its type.

With respect to writing inline assembly code, I personally recommend you not to do it. Using CPCtelera it will be much more comfortable for you to write your assembly code on separate assembly .s files. CPCtelera will automatically compile .s files under your src/ folder. So, if you wanted to create a simple function like in your example, you may do it this way.

main.c:

#include <cpctelera.h>
#include <stdio.h>

// Please, prefer always single-line comments to multi-line

// function prototypes
// ( Required for the compiler to know these functions exists.
//   As they are defined in other file, the linker will link 
//   them after compilation ends )
//
void choisir_mode(u8 mode);  // u8 = unsigned char in CPCtelera
void wait_key(void);

// Global constant ( This works well because it is a constant. 
//   Remember that global variables won't get initialized
//   because of SDCC's embedded behaviour )
//
const unsigned char message1[] = "current_mode : %d";

int main(void) {
   // CPCtelera typedefs are more comfortable to 
   // work with the exact byte-sizing you desire
   u8 m;    // u8 stands for unsigned 8-bits. Same as unsigned char m;
  
   // loop over the 3 graphics modes (from 0 to 2)
   for (m=0;m<=2;m++) {
      choisir_mode(m);
      
      // You don't actually need message1 to be declared constant
      // You may write it directly inside printf. Same effect.
      printf(message1,m);  
      
      wait_key();
   }
   
   // Parentheses are not required here
   return(0);
}

firmware_funcs.s:

;;
;; Choisir_mode function
;; input: 1 parameter (mode) 8-bits
;; output: nothing
;;
choisir_mode::       ;; 2 colons make it a global symbol
   ld    hl,#2     ;; /  Make HL point to SP+2 
   add   hl,sp     ;; \  (as the first 2 bytes are the return address)
   
   ld    a,(hl)    ;; A = mode to be set (got from the stack)
   call  #0xBC0E   ;; Firmware function SCR_SET_MODE

   ret             ;; Return

;;
;; WaitKey function
;; input:  nothing
;; output: nothing
;;
wait_key::
   call  #0xBB18   ;; Call firmware function.

   ret             ;; Return

If you compile these two files under your CPCtelera project, it should work (disclaimer: untested code. Didn't compile it myself).

Last, but not least, CPCtelera is free software. All library functions are written in assembler and you may read sources. All CPCtelera sources have been written with tons of comments to help those wanting to learn from them. You may find the code interesting for you. The easiest way to navigate and read its code is to read it directly in Github.

SRS · 18:24, 10 January 17

ah well - my knowledge is deprecated - sigh

One hint ...

wait_key::
call #0xBB18 ;; Call firmware function.

ret ;; Return

Faster and smaller: JP #0xBB18 and no RET

Docent · 00:14, 11 January 17

SDCC supports a number of calling convention methods for parameters passing.
Basically, you have four options:
Default calling convention
All parameters are passed on the stack, in right-to-left order (the last parameter is first put on the stack etc).
Return values are passed in registers. 8-bit return values should be passed in L, 16-bit values in HL, 32-bit values in DEHL.

smallc calling convention
Forced by putting keyword __smallc after function declaration
All parameters are passed on the stack, in left-to-right order.
Return values are passed in registers. 8-bit return values should be passed in L, 16-bit values in HL, 32-bit values in DEHL.

z88dk fastcall calling convention
Forced by putting keyword __z88dk_fastcall after function declaration.
Supports only one parameter of max 32 bits, passed in registers.
8-bit values should be passed in L, 16-bit values in HL, 32-bit values in DEHL.
Return values are passed in registers. 8-bit return values should be passed in L, 16-bit values in HL, 32-bit values in DEHL.

z88dk callee convention
Forced by keyword __z88dk_callee after function declaration.
Parameters are passed on the stack, but the caller does not adjust the stack for the parameters after the call; the callee has to remove them. Can be combined with other keywords, e.g. __smallc.
Return values are passed in registers. 8-bit return values should be passed in L, 16-bit values in HL, 32-bit values in DEHL.

Except default option, all other calling conventions are implemented to support existing libraries or code.
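The difference between right-to-left and left-to-right parameter order can be seen in a small host-side model. This is plain C, not SDCC output: ArgStack, arg_push16, arg_peek16, enter_default and enter_smallc are illustrative names, the return address 0x4000 is arbitrary, and both parameters are 16 bits wide to keep the offsets simple.

```c
#include <stdint.h>

enum { ARG_STACK_SIZE = 16 };

typedef struct {
    uint8_t  mem[ARG_STACK_SIZE];
    unsigned sp;                   /* index of the last byte pushed */
} ArgStack;

/* PUSH rr: high byte first, so the low byte sits at the lower address. */
static void arg_push16(ArgStack *s, uint16_t v) {
    s->mem[--s->sp] = (uint8_t)(v >> 8);
    s->mem[--s->sp] = (uint8_t)(v & 0xFF);
}

/* Read the little-endian 16-bit word at SP+offset. */
static uint16_t arg_peek16(const ArgStack *s, unsigned offset) {
    return (uint16_t)(s->mem[s->sp + offset] |
                      (s->mem[s->sp + offset + 1] << 8));
}

/* Stack as the callee sees it on entry to f(first, second), default order. */
static ArgStack enter_default(uint16_t first, uint16_t second) {
    ArgStack s = { {0}, ARG_STACK_SIZE };
    arg_push16(&s, second);   /* right-to-left: last parameter pushed first */
    arg_push16(&s, first);
    arg_push16(&s, 0x4000);   /* return address pushed by CALL (arbitrary) */
    return s;
}

/* Same call with left-to-right order, as described for __smallc. */
static ArgStack enter_smallc(uint16_t first, uint16_t second) {
    ArgStack s = { {0}, ARG_STACK_SIZE };
    arg_push16(&s, first);    /* left-to-right: first parameter pushed first */
    arg_push16(&s, second);
    arg_push16(&s, 0x4000);
    return s;
}
```

With the default order the first parameter is found at SP+2 and the second at SP+4; with left-to-right order the two offsets swap, so an assembly routine written for one order reads the wrong values under the other.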

SDCC uses IX register as a frame pointer. Frame pointer is a pointer to space for function's local variables, allocated on the stack. After storing used registers on stack, pushing arguments on stack and setting frame pointer, SDCC will call a function and then restore stack pointer and saved registers.

This behavior can be adjusted with compile options listed below or function attributes like __z88dk_callee.

--callee-saves-bc compile option tells the compiler not to save BC on the stack before calling a function. If you do not use BC in your asm function, it can save time and size required to store and restore BC from the stack for each function call.

The --fomit-frame-pointer compile option causes the frame pointer to be omitted when the function uses no local variables. As per the SDCC documentation, for the z80 code generator the frame pointer will be omitted for all functions if this option is used.

--fno-omit-frame-pointer will never omit the frame pointer, ie. frame pointer will be always set up for each function even without local variables. This will generate some overhead in prologue and epilogue of each function, but guarantee that IX will always point to frame pointer and local vars can be accessed via IX+n addressing.

This also explains why you saw both stack and IX methods of referencing function parameters in various examples - some of these examples used --fno-omit-frame-pointer and accessed parameters through IX register, while others just popped them from the stack.

BTW: If you want full control of your asm function, you can use the __naked function attribute. It prevents the compiler from generating prologue and epilogue code for that function.
This gives you full control of the function code, but you are fully responsible for saving any registers that may need to be preserved, and for returning via ret.

In case of call BC0E - AF, BC, DE and HL are destroyed, so you should preserve them just to be on the safe side.

Sykobee (Briggsy) · 21:43, 11 January 17

Are there some performance metrics for the different types?

I presume that _naked would be like writing the CPC firmware functions which do the same? Low-level, lots of embedded assembler, not-portable, C for structure, type coding?

gryken · 20:23, 12 January 17

Hi,

Thanks everybody, for all these precious answers !!

Docent · 04:36, 13 January 17

Quote from: Sykobee (Briggsy) on 21:43, 11 January 17
Are there some performance metrics for the different types?

I presume that _naked would be like writing the CPC firmware functions which do the same? Low-level, lots of embedded assembler, not-portable, C for structure, type coding?

Using __naked will be always nonportable, because there is no guarantee that other compiler supports it, and because it tells SDCC compiler to skip any function assembly prologue and epilogue it usually adds to each function. It basically forces you to provide the body of such function in assembly.
Obviously, using __naked will be faster because it will eliminate additional code added by the compiler.
--fno-omit-frame-pointer adds additional call at the beginning of each function to setup ix register and a pop ix at the end.
Speedwise, the combination of __naked and __z88dk_fastcall for functions with one parameter and __z88dk_callee for others, together with --fomit-frame-pointer and (if bc is not used) --callee-saves-bc will give you probably the best results.

ronaldo · 13:35, 15 January 17

Quote from: Docent on 04:36, 13 January 17
Obviously, using __naked will be faster because it will eliminate additional code added by the compiler.
--fno-omit-frame-pointer adds additional call at the beginning of each function to setup ix register and a pop ix at the end.
Speedwise, the combination of __naked and __z88dk_fastcall for functions with one parameter and __z88dk_callee for others, together with --fomit-frame-pointer and (if bc is not used) --callee-saves-bc will give you probably the best results.

As I said before, --fomit-frame-pointer and --fno-omit-frame-pointer are deprecated for Z80, have bugs and have not been supported by SDCC developers since 3.0. They are also not recommended, as the overall code that SDCC generates with them is worse (see this analysis).

The best recommendation for including asm functions in a CPCtelera project is writing them in their own .s file (not using __naked inline functions) and using the best calling convention for the parameters being passed (__z88dk_fastcall or __z88dk_callee are usually better than the standard call, but make sure to understand calling conventions well before using them). No need to give any other hint to the compiler, as the compiler won't assume anything about your function. It will just save what it is using, set up the call and make it. Better not to use --callee-saves-bc, as it forces callees to always save BC, even when it's not required. SDCC will take care of BC for you only when required, same as other standard registers, if you don't pass this option to the compiler.

Then, if you make use of IX or IY and change them in your functions, be sure to save them to prevent side-effects, as SDCC may use them for accessing parameters on the stack. That's better approach than asking SDCC not to use them or set up them for you.

SRS · 21:05, 16 January 17

Is using Z88DK (and its enhanced libraries or optimized SDCC version) planned for version 1.5 ? seems to generate faster / better code nowadays.

Docent · 01:58, 17 January 17

Quote from: ronaldo on 13:35, 15 January 17
As I said before, --fomit-frame-pointer and --fno-omit-frame-pointer are deprecated for Z80, have bugs and have not been supported by SDCC developers since 3.0.

Can you provide the source of this information? I did not find any information about --fomit-frame-pointer and --fno-omit-frame-pointer being deprecated anywhere: neither in the sdcc documentation, wiki, mailing lists and forums, nor in the source code of the z80 code gen.
--fno-omit-frame-pointer couldn't have been unsupported since version 3.0, because it was introduced in 2012 in version 3.2 of SDCC (see this source commit: https://sourceforge.net/p/sdcc/code/8058/).

Quote from: ronaldo on 13:35, 15 January 17
They are also not recommended, as the overall code that SDCC generates with them is worse (see this analysis).

Your results contradict my experience with SDCC so far. First of all, you should get similar code speed results from compiling with --fomit-frame-pointer and without it, because the z80 codegen of SDCC tries to generate a function frame pointer only if needed (or if forced by the option --fno-omit-frame-pointer) and in most cases generates the same code with --fomit-frame-pointer or without it.
The option --fno-omit-frame-pointer generates an additional call to set the frame pointer in ix in the function prologue and an additional pop ix in the function epilogue, which in simple functions makes a significant overhead of even 10-20%. As your results for ofp compared to default are worse by about a similar percentage, perhaps you measured the code generated with --fno-omit-frame-pointer?

Quote from: ronaldo on 13:35, 15 January 17
The best recommendation for including asm functions in a CPCtelera project is writing them in their own .s file (not using __naked inline functions) and using the best calling convention for the parameters being passed (__z88dk_fastcall or __z88dk_callee are usually better than the standard call, but make sure to understand calling conventions well before using them). No need to give any other hint to the compiler, as the compiler won't assume anything about your function.

Actually it is more complex than that: it analyses which registers are in use in the current function and can be used in its sub functions, and which are required to store the result of a function, and allocates/saves them accordingly.
So it's important to provide the correct definition for return values in the function header. For example, if you return a 32-bit result in hl & de, but declare the return value as being only 16 bits, the code gen may assume that de is free to use and doesn't need saving before calling such a function. This of course can lead to hard-to-track crashes.
Btw: Keeping assembler functions in separate asm files may have one disadvantage for some people: such files won't be optimized by the peephole optimizer, because that is done in the code gen and not in the assembler or linker.
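The point about declared return widths can be illustrated with a tiny host-side model of the DE:HL register pair. This is plain C and only shows how a 32-bit value splits across the two registers; RetRegs, return_u32, read_as_u32 and read_as_u16 are made-up names, not anything SDCC provides.

```c
#include <stdint.h>

/* Model of the two 16-bit registers used for return values. */
typedef struct {
    uint16_t de;   /* upper 16 bits of a 32-bit result */
    uint16_t hl;   /* lower 16 bits, or a whole 16-bit result */
} RetRegs;

/* What an asm routine returning a 32-bit value leaves in DE:HL. */
static RetRegs return_u32(uint32_t v) {
    RetRegs r;
    r.de = (uint16_t)(v >> 16);
    r.hl = (uint16_t)(v & 0xFFFFu);
    return r;
}

/* A caller that was told "32-bit return" reads both registers. */
static uint32_t read_as_u32(RetRegs r) {
    return ((uint32_t)r.de << 16) | r.hl;
}

/* A caller that was told "16-bit return" only looks at HL.
   From its point of view DE is not part of the result at all. */
static uint16_t read_as_u16(RetRegs r) {
    return r.hl;
}
```

If the prototype says 16 bits while the routine really writes DE as well, the caller not only loses the upper half: whatever it was keeping in DE across the call is overwritten without it knowing, which is the kind of crash described above.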

Quote from: ronaldo on 13:35, 15 January 17
Better not to use --callee-saves-bc, as it forces callees to always save BC, even when it's not required. SDCC will take care of BC for you only when required, same as other standard registers, if you don't pass this option to the compiler.

You are right, using this option has no effect in context of __naked function I was talking about.

ronaldo · 12:07, 17 January 17

Quote from: SRS on 21:05, 16 January 17
Is using Z88DK (and its enhanced libraries or optimized SDCC version) planned for version 1.5 ? seems to generate faster / better code nowadays.

No, it's not planned. That would mean a complete redesign of almost everything. There should be a really strong reason for that. Being under the impression that Z88DK generates better code at the present moment is not enough to justify such a redesign.

However, we are closely following the progress of Z88DK, as it is actually making great progress and it's a great source of inspiration. We always have it on our radar, in any case.

ronaldo · 13:05, 17 January 17

Quote from: Docent on 01:58, 17 January 17
Can you provide the source of this information? I did not find any information about --fomit-frame-pointer and --fno-omit-frame-pointer being deprecated anywhere: neither in the sdcc documentation, wiki, mailing lists and forums, nor in the source code of the z80 code gen.
--fno-omit-frame-pointer couldn't have been unsupported since version 3.0, because it was introduced in 2012 in version 3.2 of SDCC (see this source commit: https://sourceforge.net/p/sdcc/code/8058/).

Yes, of course. It was told to us by @Alcoholics Anonymous while discussing a code that triggered one of the bugs this modifier has. We were talking about --fomit-frame-pointer and --old-ralloc.

Quote from: Alcoholics Anonymous
The omit-frame-pointer and especially old-ralloc stuff is not supported by sdcc anymore. There are bugs there that will probably never be fixed. Maybe if a code snippet can be made to reproduce the omit-frame-pointer problem they'll accept it. It's better to change the library code to save ix so that omit-frame-pointer doesn't have to be used.

Then that continued with me sending the bug to SDCC developers and chatting with them, confirming these words. I don't have a reference to the chat conversation I can find.

Quote from: Docent on 01:58, 17 January 17
Your results contradict my experience with SDCC so far. First of all, you should get similar code speed results from compiling with --fomit-frame-pointer and without it, because the z80 codegen of SDCC tries to generate a function frame pointer only if needed (or if forced by the option --fno-omit-frame-pointer) and in most cases generates the same code with --fomit-frame-pointer or without it.
The option --fno-omit-frame-pointer generates an additional call to set the frame pointer in ix in the function prologue and an additional pop ix in the function epilogue, which in simple functions makes a significant overhead of even 10-20%. As your results for ofp compared to default are worse by about a similar percentage, perhaps you measured the code generated with --fno-omit-frame-pointer?

That's why we do experiments, because measurements are the real values we want, not just overall experience. However, I personally did these experiments because my overall experience with --fomit-frame-pointer wasn't good. Generated code got bigger because SDCC stopped using IX to access local variables (it stops using it as frame pointer). Then it relied on HL and generally did many more operations to access arrays, structures and even local variables when several of them needed to be used in calculations. As that was my experience, I set up experiments to test it on real use cases, along with other modifiers to see which were optimal. Results are quite clear, as --fomit-frame-pointer loses in almost all cases. Also, the worst part is its bigger binary size. There might be particular code constructs and uses that improve code generation with --fomit-frame-pointer, but the important part is whether this behaviour is general or particular. Moreover, with the bugs it has, it has too many drawbacks to justify its general use currently. I'd keep it only for particular situations, knowing well what to expect.

There is no mistake: the option being used was --fomit-frame-pointer. You can repeat the tests yourself. Even better, I suggest you prepare new test sets and make comparisons. That will be useful for all of us, as more tests and situations will give a better understanding.

Quote from: Docent on 01:58, 17 January 17
Actually it is more complex than that: it analyses which registers are in use in the current function and can be used in its sub functions, and which are required to store the result of a function, and allocates/saves them accordingly.
So it's important to provide the correct definition for return values in the function header. For example, if you return a 32-bit result in hl & de, but declare the return value as being only 16 bits, the code gen may assume that de is free to use and doesn't need saving before calling such a function. This of course can lead to hard-to-track crashes.

This does not apply to your own assembly code. When calling a function you implement in assembly, SDCC saves the registers it is using, no matter what registers your function uses. You can easily check it, just compile this code:

#include <stdio.h>
unsigned char test() __naked {
   __asm
      ld hl, #0xFF
      ret
   __endasm;
}
void main(void) {
   unsigned char  i, j;
   
   // Loop forever
   while (1) {
      for (i=0; i < 10; i++) {
         for (j=0; j < 20; j++) {
            test();
            printf("i: %d, j: %d\n\r", i, j);
         }
      }
   }
}

SDCC generates this code to call test function:

00106$:
;src/main.c:36: test();
    push    bc
    call    _test
    pop    bc

It saves BC on the stack, as it is using B and C for the i, j variables and does not make any assumptions about your assembly code. If it analyzed your assembly, it would know that BC is not used and there is no need to save it.

Quote from: Docent on 01:58, 17 January 17
Btw: Keeping assembler functions in separate asm files may have for some people one disadvantage - such files wont be optimized by peep hole optimizer because it is done in the code gen and not in assembler or linker.

That's not a disadvantage, that's normal behaviour to be expected. You normally don't want any kind of optimization to mess up your code. When you program in asm, it is because you want to do it yourself, so there is no reason to have peephole optimizations changing your code. What would you think if you went on debugging your code and found it had been changed?

However, you can easily configure the build so that peephole optimizations are applied to your assembly code, if that's what you want. There is no need to write it inline.
