this in asm : ((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2)+ #c000

Started by funkheld, 22:05, 16 July 20

funkheld



Hi, good afternoon.

How can you write this for the CPC in asm, please?
nx and ny are integers (2 bytes).
Thank you.
Greetings
((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2) + #c000

ervin

Whenever I need to see how something in C would look in asm, I check the obj folder of the project.
There should be an asm file generated.
For example, if you have main.c, you would get main.asm in the obj folder.
Inside that you can see what the generated asm looks like, and then work with that.

roudoudou

Quote from: ervin on 08:19, 17 July 20
Inside that you can see what the generated asm looks like, and then work with that.
Bad idea for this, because there are many optimisations a compiler will not see:
(ny/8)*80 => easier to clear the 3 low bits and multiply by 10
(ny%8)*2048+#C000 => easier to do an AND 7, multiply by 8 (shift left 3 times), then ADD #C0 plus the upper 8 bits of the previous result
then adding nx/2 => many optimisations depend on the software context (signed, unsigned, limits, ...)
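
For anyone who wants to check the first two rewrites on a PC before writing any Z80 code, here is a minimal C sketch. It assumes ny is unsigned, and the function names are made up for illustration.

```c
#include <stdint.h>

/* (ny / 8) * 80 without a division: clear the 3 low bits, then multiply by 10. */
static uint16_t row_offset(uint16_t ny)
{
    return (uint16_t)((ny & 0xFFF8u) * 10u);
}

/* (ny % 8) * 2048 without a modulo: keep the 3 low bits, then shift left by 11. */
static uint16_t line_offset(uint16_t ny)
{
    return (uint16_t)((ny & 7u) << 11);
}
```

A shift by 11 on a 16-bit value is the same as a shift by 3 applied to the high byte only, which is why the AND 7 result can be added straight into the upper 8 bits.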

My pronouns are RASM and ACE

ervin

Quote from: roudoudou on 08:50, 17 July 20
Bad idea for this, because there are many optimisations a compiler will not see:
(ny/8)*80 => easier to clear the 3 low bits and multiply by 10
(ny%8)*2048+#C000 => easier to do an AND 7, multiply by 8 (shift left 3 times), then ADD #C0 plus the upper 8 bits of the previous result
then adding nx/2 => many optimisations depend on the software context (signed, unsigned, limits, ...)


Yes that's a good point.

roudoudou

This website allows you to choose among many compilers and see what each one does.
Switch C++ to C, choose the SDCC 4.0.0 compiler,
then create a C function:
int adr(int nx, int ny) {
    return ((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2) + 0xc000;
}
The output code is really, really bad (huge, slow, and even a CALL to a MODULO function). SDCC is not clever enough to simplify MOD 8 into AND 7.
Without verification, I would do something like this instead:
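ld hl,(ny)
ld a,l : and %11111000 : ld l,a    ; clear the 3 low bits: HL = (ny/8)*8
ld b,h : ld c,l                    ; BC = HL
add hl,hl                          ; x2
add hl,hl                          ; x4
add hl,bc                          ; x5
add hl,hl                          ; x10, so HL = (ny/8)*80
;
ld a,(ny) : and 7 : add a : add a : add a : add h : add #C0 : ld h,a    ; H = H + (ny%8)*8 + #C0
;
ld de,(nx) : srl d : rr e : add hl,de    ; HL = HL + nx/2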
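
Since the routine above was posted "without verification", one way to test it on a PC is a step-by-step C model in which each variable stands for a Z80 register. This is only a sketch: the function name is hypothetical and it assumes unsigned 16-bit nx and ny.

```c
#include <stdint.h>

/* C model of the Z80 routine: hl, bc, de and a mirror the registers. */
static uint16_t screen_addr_model(uint16_t nx, uint16_t ny)
{
    uint16_t hl = ny;                                  /* ld hl,(ny) */
    hl = (uint16_t)((hl & 0xFF00u) | (hl & 0x00F8u));  /* ld a,l : and %11111000 : ld l,a */
    uint16_t bc = hl;                                  /* ld b,h : ld c,l */
    hl = (uint16_t)(hl + hl);                          /* add hl,hl  (x2) */
    hl = (uint16_t)(hl + hl);                          /* add hl,hl  (x4) */
    hl = (uint16_t)(hl + bc);                          /* add hl,bc  (x5) */
    hl = (uint16_t)(hl + hl);                          /* add hl,hl  (x10) */

    uint8_t a = (uint8_t)(ny & 7u);                    /* ld a,(ny) : and 7 */
    a = (uint8_t)(a + a);                              /* add a  (x2) */
    a = (uint8_t)(a + a);                              /* add a  (x4) */
    a = (uint8_t)(a + a);                              /* add a  (x8) */
    a = (uint8_t)(a + (hl >> 8));                      /* add h */
    a = (uint8_t)(a + 0xC0u);                          /* add #C0 */
    hl = (uint16_t)(((uint16_t)a << 8) | (hl & 0xFFu)); /* ld h,a */

    uint16_t de = (uint16_t)(nx >> 1);                 /* ld de,(nx) : srl d : rr e */
    return (uint16_t)(hl + de);                        /* add hl,de */
}
```

For nx=100 and ny=120 the model gives #C4E2, the same constant SDCC computes further down in this thread.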



roudoudou

Oops, I forgot to mention the website with live compilation => https://gcc.godbolt.org/
and the "-mz80" option to add in order to produce Z80 code in the right-hand window.

GUNHED

Quote from: funkheld on 22:05, 16 July 20
((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2) + #c000


The same, but simpler...


(ny * 10) + (ny * 256) + (nx / 2) + #c000


Do you really need 16 bits? If ny is 16 bits, then the result would need 3 bytes.
http://futureos.de --> Get the revolutionary FutureOS (Update: 2023.11.30)
http://futureos.cpc-live.com/files/LambdaSpeak_RSX_by_TFM.zip --> Get the RSX-ROM for LambdaSpeak :-) (Updated: 2021.12.26)

roudoudou

Quote from: GUNHED on 13:46, 17 July 20

The same, but simpler...


(ny * 10) + (ny * 256) + (nx / 2) + #c000


Do you really need 16 bits? If ny is 16 bits, then the result would need 3 bytes.

No, no, no, no, your formula is completely wrong.

funkheld

ervin;
sdcc has no asm solution;

greeting



#include <stdio.h>
#include <stdlib.h>


void main()
{
   unsigned char *pScreen = (unsigned char *)0xC000;
   unsigned int nx;
  unsigned int ny;
  unsigned int z;
 
  nx=100;
  ny=120;
  z=255;
 
  pScreen[((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2)] = z;
}



sdcc-asm:


;--------------------------------------------------------
; File Created by SDCC : free open source ANSI-C Compiler
; Version 4.0.2 #11722 (MINGW32)
;--------------------------------------------------------
   .module scradr
   .optsdcc -mz80
   
;--------------------------------------------------------
; Public variables in this module
;--------------------------------------------------------
   .globl _main
;--------------------------------------------------------
; special function registers
;--------------------------------------------------------
;--------------------------------------------------------
; ram data
;--------------------------------------------------------
   .area _DATA
;--------------------------------------------------------
; ram data
;--------------------------------------------------------
   .area _INITIALIZED
;--------------------------------------------------------
; absolute external ram data
;--------------------------------------------------------
   .area _DABS (ABS)
;--------------------------------------------------------
; global & static initialisations
;--------------------------------------------------------
   .area _HOME
   .area _GSINIT
   .area _GSFINAL
   .area _GSINIT
;--------------------------------------------------------
; Home
;--------------------------------------------------------
   .area _HOME
   .area _HOME
;--------------------------------------------------------
; code
;--------------------------------------------------------
   .area _CODE
;scradr.c:4: void main()
;   ---------------------------------
; Function main
; ---------------------------------
_main::
   push   ix
;scradr.c:15: pScreen[((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2)] = z;
   ld   hl, #0xc4e2
   ld   (hl), #0xff
;scradr.c:16: }
   pop   ix
   ret
   .area _CODE
   .area _INITIALIZER
   .area _CABS (ABS)


roudoudou

Quote from: funkheld on 14:26, 17 July 20
ervin;
sdcc has no asm solution;

Obviously, you hard-coded the nx and ny values in the function, so SDCC computed the result at compile time. You must pass these values as parameters...
See further up in this thread: the SDCC output code is there.

GUNHED

Quote from: roudoudou on 13:48, 17 July 20
No, no, no, no, your formula is completely wrong.
Yours too, the "/8" is missing. You only entered "/".

funkheld

hello, why is the "compiler explorer" for win10 with sdcc?


thanks

roudoudou

Quote from: GUNHED on 14:37, 17 July 20
Yours too, the "/8" is missing. You only entered "/".
You miss the point:
(var/8)*80 is very different from var*10.
I'll let you try with var=1 and var=2.
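
To make the var=1 and var=2 test concrete, here is a small C sketch (hypothetical function names) that can be compiled on a PC. Integer division throws away the remainder, so the two expressions only agree when var is a multiple of 8.

```c
/* The expression from the original formula: the division truncates first. */
static unsigned exact(unsigned var)
{
    return (var / 8) * 80;
}

/* The proposed shortcut: no truncation at all. */
static unsigned shortcut(unsigned var)
{
    return var * 10;
}
```

With var=1 the first gives 0 and the second 10; with var=2 they give 0 and 20; with var=8 both give 80.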



ronaldo

Quote from: roudoudou on 08:59, 17 July 20
The output code is really, really bad (huge, slow, and even a CALL to a MODULO function). SDCC is not clever enough to simplify MOD 8 into AND 7.
Without verification, I would do something like this instead:
Before openly criticizing the compiler, you may want to pay careful attention to what you do and consider that you are programming in C, not in ASM. Therefore, there are rules to consider, like using proper data types so the compiler knows what you want to do. If you pass signed integers to the compiler, you cannot expect it to use AND 7 for MOD 8, because that's wrong. You can only do that optimization if your data is unsigned. In fact, if you give proper data types, and use proper flags, results are not far from optimal in this case: https://gcc.godbolt.org/z/r1EM94. SDCC is not the best compiler in the world, but it's not that bad either.
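
A short C sketch shows why the compiler cannot swap MOD 8 for AND 7 on signed values. The function names are made up, and the AND result assumes the usual two's complement representation of negative numbers.

```c
/* Signed modulo: since C99 the result takes the sign of the dividend. */
static int mod8_signed(int v)
{
    return v % 8;
}

/* Bit mask: always yields a value in 0..7 (two's complement assumed). */
static int and7(int v)
{
    return v & 7;
}

/* Unsigned modulo: here the mask and the modulo always agree. */
static unsigned mod8_unsigned(unsigned v)
{
    return v % 8u;
}
```

For v = -3 the signed modulo gives -3 while the mask gives 5, so the two are only interchangeable when the type is unsigned.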


For this particular formula, you have an explained and commented implementation here: https://github.com/lronaldo/cpctelera/blob/development/cpctelera/src/video/cpct_getScreenPtr.asm. The only difference is that it does not divide nx by 2, because it works on bytes, not on pixels.

HAL6128

Isn't there a firmware instruction (&BC1D) available to calculate this? Maybe it's interesting to go through the source code?
...proudly supported Schnapps Demo, Pentomino and NQ-Music-Disc with GFX

roudoudou

Quote from: ronaldo on 17:28, 17 July 20
Before openly criticizing the compiler
This afternoon I did a stream about this function and its compilation, where I modified the C function to let SDCC optimise as best it can (unsigned char, or even a table for the lines).

I'm afraid there is no miracle  ;D
The CPCtelera optimised function suits well. The only thing missing to be perfect is an inline function.

funkheld

Hello, this is written in ccz80.
It works fine with a table.

The table (array word list) is a y-table holding the address of the leftmost byte of each line.

These are always 8 bytes, one below the other:
#C000,
#C800,
#D000,
#D800,
#E000,
#E800,
#F000,
#F800,

These are the next 8 bytes, one below the other:
#C050,
#C850,
#D050,
#D850,
#E050,
#E850,
#F050,
#F850,

Then x runs from 0 to 79:
#C000 to #C04F
#C800 to #C84F ...


include "cpc6128.ccz";

word p,py,pxy,x,y;
byte z;

array word list = {
       #C000, #C800, #D000, #D800, #E000, #E800, #F000, #F800,
      #C050, #C850, #D050, #D850, #E050, #E850, #F050, #F850,
      #C0A0, #C8A0, #D0A0, #D8A0, #E0A0, #E8A0, #F0A0, #F8A0,
      #C0F0, #C8F0, #D0F0, #D8F0, #E0F0, #E8F0, #F0F0, #F8F0,
      #C140, #C940, #D140, #D940, #E140, #E940, #F140, #F940,
      #C190, #C990, #D190, #D990, #E190, #E990, #F190, #F990,
      #C1E0, #C9E0, #D1E0, #D9E0, #E1E0, #E9E0, #F1E0, #F9E0,

      #C230, #CA30, #D230, #DA30, #E230, #EA30, #F230, #FA30,
      #C280, #CA80, #D280, #DA80, #E280, #EA80, #F280, #FA80,
      #C2D0, #CAD0, #D2D0, #DAD0, #E2D0, #EAD0, #F2D0, #FAD0,
      #C320, #CB20, #D320, #DB20, #E320, #EB20, #F320, #FB20,
      #C370, #CB70, #D370, #DB70, #E370, #EB70, #F370, #FB70,
      #C3C0, #CBC0, #D3C0, #DBC0, #E3C0, #EBC0, #F3C0, #FBC0,
      #C410, #CC10, #D410, #DC10, #E410, #EC10, #F410, #FC10,
      #C460, #CC60, #D460, #DC60, #E460, #EC60, #F460, #FC60,
      #C4B0, #CCB0, #D4B0, #DCB0, #E4B0, #ECB0, #F4B0, #FCB0,
      #C500, #CD00, #D500, #DD00, #E500, #ED00, #F500, #FD00,
      #C550, #CD50, #D550, #DD50, #E550, #ED50, #F550, #FD50,
      #C5A0, #CDA0, #D5A0, #DDA0, #E5A0, #EDA0, #F5A0, #FDA0,
      #C5F0, #CDF0, #D5F0, #DDF0, #E5F0, #EDF0, #F5F0, #FDF0,
      #C640, #CE40, #D640, #DE40, #E640, #EE40, #F640, #FE40,
      #C690, #CE90, #D690, #DE90, #E690, #EE90, #F690, #FE90,
      #C6E0, #CEE0, #D6E0, #DEE0, #E6E0, #EEE0, #F6E0, #FEE0,
      #C730, #CF30, #D730, #DF30, #E730, #EF30, #F730, #FF30,
      #C780, #CF80, #D780, #DF80, #E780, #EF80, #F780, #FF80 
      };

mode(2);
cls();

p=list;

x=65;
y=199;


for (y = 0; y <= 199; y=y+1)
{
  py=p+y*2;
  for (x = 0; x <= 79; x=x+1)
  {
    pxy=**py+x;
   *pxy=129;
  }
}
 
while (1);
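
Typing 200 addresses by hand is error-prone, so the same y-table could also be generated. The following C sketch is not part of the ccz80 program: the names are hypothetical and it assumes the standard 80-byte-wide screen at #C000.

```c
#include <stdint.h>

#define SCREEN_BASE  0xC000u
#define SCREEN_LINES 200

/* Address of the leftmost byte of each of the 200 screen lines. */
static uint16_t line_addr[SCREEN_LINES];

/* Fill the table once, using the formula from the thread title (without nx). */
static void build_line_table(void)
{
    for (int y = 0; y < SCREEN_LINES; ++y) {
        line_addr[y] = (uint16_t)(SCREEN_BASE + (y / 8) * 80 + (y % 8) * 2048);
    }
}

/* Address of byte column x (0..79) on line y: one lookup and one addition. */
static uint16_t byte_addr(unsigned x, unsigned y)
{
    return (uint16_t)(line_addr[y] + x);
}
```

The table costs 400 bytes of memory, but it replaces the multiplications and shifts with a single lookup per line.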

AMSDOS

Quote from: HAL 6128 on 19:20, 17 July 20
Isn't there a firmware instruction (&BC1D) available to calculate this? Maybe it's interesting to go through the source code?


It does, in its own way. For example, in MODE 1 the X-coordinate ranges from 0 to 318, in MODE 0 this becomes 0 to 158, and the Y-coordinate ranges from 0 to 199. The memory address is returned in HL, C contains a bit mask for this pixel (I don't know what that means?  :( ) and B contains the number of pixels stored in a byte, minus 1.


Another way of using the firmware is to base the graphical routine on &C000 and use &BC26 SCR NEXT LINE to calculate the address of the next line.


EDIT: On my quest to understand the bit mask: it relates to how each mode stores each pixel in a byte, so in MODE 0 the byte mask is &AA (&x10101010), &88 in MODE 1 (&x10001000) and &08 in MODE 2 (&x00001000). So it could be used to blank certain inner pixels within a byte.
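
As a rough illustration of those masks, here is a C sketch with a hypothetical function name. It assumes the usual CPC pixel layout, where the leftmost pixel of a byte uses the highest bits and each following pixel is shifted one bit to the right.

```c
#include <stdint.h>

/* Mask of the bits that belong to pixel x inside its screen byte. */
static uint8_t pixel_mask(unsigned mode, unsigned x)
{
    switch (mode) {
    case 0:  return (uint8_t)(0xAAu >> (x & 1u));  /* 2 pixels per byte */
    case 1:  return (uint8_t)(0x88u >> (x & 3u));  /* 4 pixels per byte */
    default: return (uint8_t)(0x80u >> (x & 7u));  /* MODE 2: 8 pixels per byte */
    }
}
```

Under that assumption the &AA and &88 values above are the masks of the leftmost pixel in MODE 0 and MODE 1, while &08 in MODE 2 would be the mask of the fifth pixel in a byte (the leftmost one being &80).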
* Using the old Amstrad Languages :D   * with the Firmware :P
* I also like to problem solve code in BASIC :)   * And type-in Type-Ins! :D

Home Computing Weekly Programs
Popular Computing Weekly Programs
Your Computer Programs
Updated Other Program Links on Profile Page (Update April 16/15 phew!)
Programs for Turbo Pascal 3

funkheld


Which "C" do you recommend for the CPC?
SDCC or z88dk?

thank you.
greeting

ronaldo

Quote from: roudoudou on 20:13, 17 July 20
This afternoon I did a stream about this function and its compilation, where I modified the C function to let SDCC optimise as best it can (unsigned char, or even a table for the lines).

I'm afraid there is no miracle  ;D
The CPCtelera optimised function suits well. The only thing missing to be perfect is an inline function.
Next time you can simply click on my link and see the miracle (https://gcc.godbolt.org/z/r1EM94). It stops using modint and uses AND 7, as expected. Passing unsigned ints makes generated code nearly optimal. And that is without aggressive optimizations. Of course, human-coded assembly always has an advantage if coded well. But SDCC is doing what you are asking it to do: if you use X data type it has to respect what that data type means.

Quote from: roudoudou on 20:13, 17 July 20
The CPCtelera optimised function suits well. The only thing missing to be perfect is an inline function.
What exactly do you mean by inline function in this context? Are you referring to using the keyword "inline" in C to hint to the compiler that it should embed the code instead of making a call? If that is what you want, you only need to type "inline" in front of the function declaration in the header file.


Quote from: funkheld on 08:27, 18 July 20
Which "C" do you recommend for the CPC?
SDCC or z88dk?
Both are very good. Neither of them comes close to modern compilers such as GCC or Clang, but they do a great job. Both have advantages and disadvantages, and these fluctuate over time. Just take into account that there are even developers collaborating on both compilers, and they share common knowledge. The differences are not relevant in many cases (it mostly depends on what you want to do). In any case, I think I would start with z88dk for a single project nowadays.

roudoudou

Quote from: ronaldo on 14:59, 18 July 20
Next time you can simply click on my link and see the miracle
I do not see a significant difference on your link. The formula really needs to be changed to see a small improvement.
About inline: yes, I was talking about the inline prefix, because with an optimised function the management of the stack and parameters is nearly the same size as the function code itself  :o
The best improvement occurs when replacing (ny / 8) * 80 with (ny & 0xF8) * 10.

ronaldo

Quote from: roudoudou on 15:14, 18 July 20
I do not see a significant difference on your link. The formula really needs to be changed to see a small improvement.
About inline: yes, I was talking about the inline prefix, because with an optimised function the management of the stack and parameters is nearly the same size as the function code itself  :o
Well, I see a great deal of difference between these 2 versions. Left one receives 2 unsigned ints and returns a void*. Right one receives 2 ints and returns a int. From my point of view, there is a great deal of difference. In fact, as I have said on previous posts, excluding parameter management, the code from the left version is nearly optimal. Only problem is that SDCC 4.0 still does not accept __z88dk_callee convention for C functions. With that calling convention, there would probably be minimal differences between optimal version and compiled version. I'm afraid I cannot nearly say the same about the right version. In fact, I agree with you in that right version is very bad code for the purposes of this function.


With respect to inlining, you can just write "inline" in the header file. However, that is only a hint to the compiler: the compiler can always decide on the best strategy (inlining or not). In CPCtelera there is also a macro (cpctm_screenPtr) which you can use when your values are known at compile time. That gets converted into a single 2-byte value by the compiler. Moreover, you can call cpct_getScreenPtr from assembly code, just putting the passed values into registers, so you get rid of the parameter passing code. In any case, parameter passing code in CPCtelera is done with the __z88dk_callee convention, which takes a total of 13 microseconds and 4 bytes, or 24.5% of the total time cost of the function. That is significant for such a small function, but not nearly as bad as using IX to access the stack.
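
To illustrate the two ideas (a macro for values known at compile time and an inline hint for the run-time version), here is a minimal C sketch. SCREEN_ADDR, screen_addr and SPRITE_POS are hypothetical names, not CPCtelera's actual macro or function, and x is a byte column rather than a pixel.

```c
#include <stdint.h>

/* Hypothetical macro: with constant arguments the compiler can fold
   the whole expression into a single 16-bit constant. */
#define SCREEN_ADDR(base, x, y) \
    ((uint16_t)((base) + ((y) / 8u) * 80u + ((y) % 8u) * 2048u + (x)))

/* Evaluated at compile time: this initialiser is a constant expression. */
static const uint16_t SPRITE_POS = SCREEN_ADDR(0xC000u, 50, 120);

/* Run-time variant; "inline" is only a hint that the compiler may ignore. */
static inline uint16_t screen_addr(uint16_t base, uint8_t x, uint8_t y)
{
    return SCREEN_ADDR(base, x, y);
}
```

The macro version costs nothing at run time, while the function version still pays for the arithmetic even when the call itself is inlined.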


The last thing you can easily do is create an assembler macro that expands to the code of the function, to force inlining. If you know what you are doing and the costs you are paying (mainly space costs), you may do such a thing. It may be useful in some contexts.

reidrac

I use a similar function in my current project. Instead of hardcoding 0xc000, I use a variable that tells me which memory address to use so I can draw in the back buffer (generally) or in the active video memory.

Something like this:

return (uint8_t *)(vidmem_addr + x + ((y / 8) * 80) + ((y % 8) * 2048));
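
A fuller sketch of that idea, written so it can be tested on a PC, might look like the block below. The function name and the #4000 back buffer address are assumptions for illustration only; the function returns the address as a 16-bit number, which on the CPC would be cast to uint8_t * as in the line above.

```c
#include <stdint.h>

/* Base address of the buffer currently being drawn to (hypothetical default). */
static uint16_t vidmem_addr = 0xC000u;

/* Address of byte column x (0..79) on line y (0..199), relative to vidmem_addr.
   Unsigned 8-bit parameters let the compiler use AND 7 and shifts. */
static uint16_t get_screen_addr(uint8_t x, uint8_t y)
{
    return (uint16_t)(vidmem_addr + x + ((y / 8u) * 80u) + ((y % 8u) * 2048u));
}
```

Switching between the visible screen and a back buffer then only means changing vidmem_addr before drawing.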


And the code that SDCC generates isn't that bad. I was looking to write that by hand but I made a mistake and I looked at the cpctelera code (that is brilliant, by the way); and I don't want to be "influenced" too much to risk breaking the licence or adding a dependency :D

Anyway, at the end of the day you want to save yourself some time and be able to write more complex code by using C. That's never going to be free, but those extra cycles you spend on so-so ASM code are totally worth it.

I'm with Ronaldo on this one: you want to use the right types to make the compiler more efficient.

I think I'll use the SDCC code as base-line and improve it a bit, probably moving it to __z88dk_callee.

EDIT: actually, that's a common issue in the code that folk often post in cpc wiki forum: they don't use the most appropriate types. I know it comes with experience, but is important to write C code that is good for SDCC/Z88DK.
Released The Return of Traxtor, Golden Tail, Magica, The Dawn of Kernel, Kitsune`s Curse, Brick Rick and Hyperdrive for the CPC.

If you like my games and want to show some appreciation, you can always buy me a coffee.

fgbrain


Guys...
Search the wiki!!!


There's an excellent routine by Executioner for Fast Plot
for each screen mode....


http://www.cpcwiki.eu/index.php/Programming:Fast_plot


The quickest way to calculate this formula without precalculations... I guess.
_____

6128 (UK keyboard, Crtc type 0/2), 6128+ (UK keyboard), 3.5" and 5.25" drives, Reset switch and Digiblaster (selfmade), Inicron Romram box, Bryce Megaflash, SVideo & PS/2 mouse, , Magnum Lightgun, X-MEM, X4 Board, C4CPC, Multiface2 X4, RTC X4 and Gotek USB Floppy emulator.
