Hi, good afternoon.
How can I write this in asm for the CPC, please?
nx and ny are integers (2 bytes).
Thank you.
Greetings
((ny / 8 * 80) + ((ny % 8 * 2048) + (nx / 2)+ #c000
Whenever I need to see how something in C would look in asm, I check the obj folder of the project.
There should be an asm file generated.
For example, if you have main.c, you would get main.asm in the obj folder.
Inside that you can see what the generated asm looks like, and then work with that.
Quote from: ervin on 08:19, 17 July 20
Inside that you can see what the generated asm looks like, and then work with that.
Bad idea in this case, because there are many optimisations a compiler will not see:
(ny/8)*80 => easier to clear the 3 low bits and multiply by 10
(ny%8)*2048+#C000 => easier to do an AND 7, multiply by 8 (three left shifts), then ADD #C0 plus the upper 8 bits of the previous result
then adding nx/2 => many optimisations depend on the software context (signed, unsigned, limits, ...)
Quote from: roudoudou on 08:50, 17 July 20
Bad idea in this case, because there are many optimisations a compiler will not see:
(ny/8)*80 => easier to clear the 3 low bits and multiply by 10
(ny%8)*2048+#C000 => easier to do an AND 7, multiply by 8 (three left shifts), then ADD #C0 plus the upper 8 bits of the previous result
then adding nx/2 => many optimisations depend on the software context (signed, unsigned, limits, ...)
Yes that's a good point.
This website lets you choose among many compilers and see what each one generates.
Switch C++ to C, choose the SDCC 4.0.0 compiler,
then create a C function:
int adr(int nx, int ny) {
return ((ny / * 80) + ((ny % * 2048) + (nx / 2)+ 0xc000;
}
The generated code is really, really bad: huge, slow, and it even CALLs a modulo function. SDCC is not clever enough to simplify MOD 8 into AND 7.
Without verification, I would do something like this instead:
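ld hl,(ny)
ld a,l : and %11111000 : ld l,a   ; clear the 3 low bits: hl = (ny/8)*8
ld b,h : ld c,l                   ; bc = hl
add hl,hl                         ; x2
add hl,hl                         ; x4
add hl,bc                         ; x5
add hl,hl                         ; x10 => hl = (ny/8)*80
;
ld a,(ny) : and 7 : add a : add a : add a : add h : add #C0 : ld h,a   ; h = (ny%8)*8 + h + #C0
;
ld de,(nx) : srl d : rr e : add hl,de   ; hl = hl + nx/2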
Oops, I forgot to mention the website with live compilation => https://gcc.godbolt.org/
and the "-mz80" option you need to add in order to produce Z80 code in the right-hand window
(https://i.postimg.cc/sXMKD1XX/2020-07-17-095652-3840x1080-scrot.png)
Quote from: funkheld on 22:05, 16 July 20
((ny / 8 * 80) + ((ny % 8 * 2048) + (nx / 2)+ #c000
The same, but more simple...
((ny * 10) + ((ny * 256) + (nx / 2)+ #c000
Do you really need 16 Bit? If ny is 16 Bit then the result would need 3 bytes.
Quote from: GUNHED on 13:46, 17 July 20
The same, but more simple...
((ny * 10) + ((ny * 256) + (nx / 2)+ #c000
Do you really need 16 Bit? If ny is 16 Bit then the result would need 3 bytes.
No, no, no, no, your formula is completely wrong
ervin;
sdcc has no asm solution;
greeting
#include <stdio.h>
#include <stdlib.h>
void main()
{
unsigned char *pScreen = (unsigned char *)0xC000;
unsigned int nx;
unsigned int ny;
unsigned int z;
nx=100;
ny=120;
z=255;
pScreen[((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2)] = z;
}
sdcc-asm:
;--------------------------------------------------------
; File Created by SDCC : free open source ANSI-C Compiler
; Version 4.0.2 #11722 (MINGW32)
;--------------------------------------------------------
.module scradr
.optsdcc -mz80
;--------------------------------------------------------
; Public variables in this module
;--------------------------------------------------------
.globl _main
;--------------------------------------------------------
; special function registers
;--------------------------------------------------------
;--------------------------------------------------------
; ram data
;--------------------------------------------------------
.area _DATA
;--------------------------------------------------------
; ram data
;--------------------------------------------------------
.area _INITIALIZED
;--------------------------------------------------------
; absolute external ram data
;--------------------------------------------------------
.area _DABS (ABS)
;--------------------------------------------------------
; global & static initialisations
;--------------------------------------------------------
.area _HOME
.area _GSINIT
.area _GSFINAL
.area _GSINIT
;--------------------------------------------------------
; Home
;--------------------------------------------------------
.area _HOME
.area _HOME
;--------------------------------------------------------
; code
;--------------------------------------------------------
.area _CODE
;scradr.c:4: void main()
; ---------------------------------
; Function main
; ---------------------------------
_main::
push ix
;scradr.c:15: pScreen[((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2)] = z;
ld hl, #0xc4e2
ld (hl), #0xff
;scradr.c:16: }
pop ix
ret
.area _CODE
.area _INITIALIZER
.area _CABS (ABS)
Quote from: funkheld on 14:26, 17 July 20
ervin;
sdcc has no asm solution;
Obviously: you hard-code the nx and ny values inside the function, so SDCC computes the result at compile time. You must pass these values as parameters...
See further up in this thread: the SDCC output code is there.
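To get generated code for the general case, the coordinates have to arrive as parameters instead of constants. A minimal sketch of that; the name screen_addr and the unsigned 16-bit types are assumptions for illustration, not taken from the posts above:

```c
#include <stdint.h>

/* Hypothetical helper: nx is a pixel column, ny a pixel line (0..199).
   Because both values arrive as parameters, the compiler cannot fold
   the whole expression into a single constant. */
uint16_t screen_addr(uint16_t nx, uint16_t ny)
{
    return (uint16_t)(((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2) + 0xC000u);
}
```

Called with nx=100 and ny=120 this gives #C4E2, the same constant that appears in the `ld hl, #0xc4e2` line of the listing above.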
Quote from: roudoudou on 13:48, 17 July 20
No, no, no, no, your formula is completely wrong
Yours too, the "/8" is missing. You only entered "/".
hello, why is the "compiler explorer" for win10 with sdcc?
thanks
Quote from: GUNHED on 14:37, 17 July 20
Yours too, the "/8" is missing. You only entered "/".
you miss the point
(var/8)*80 is very different from var*10
i let you try with var=1 and var=2
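The difference comes from the integer division, which throws away the remainder before the multiplication. A small sketch to try it; the function names are made up for this example:

```c
#include <stdint.h>

/* Offset of the character row that contains pixel line ny.
   The integer division truncates, so lines 0..7 all give 0. */
uint16_t row_offset(uint16_t ny)
{
    return (uint16_t)((ny / 8) * 80);
}

/* The proposed "simplification", which skips the truncation step. */
uint16_t times_ten(uint16_t ny)
{
    return (uint16_t)(ny * 10);
}
```

For var=1 and var=2 the first function returns 0 both times, while the second returns 10 and 20. The two only agree when var is a multiple of 8.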
Quote from: roudoudou on 08:59, 17 July 20
The generated code is really, really bad: huge, slow, and it even CALLs a modulo function. SDCC is not clever enough to simplify MOD 8 into AND 7.
Without verification, I would do something like this instead:
Before openly criticizing the compiler, you may want to look carefully at what you are doing and consider that you are programming in C, not in ASM. There are rules to follow, such as using proper data types so the compiler knows what you want to do. If you pass signed integers to the compiler, you cannot expect it to use AND 7 for MOD 8, because that would be wrong. You can only do that optimization if your data is unsigned. In fact, if you give proper data types and use proper flags, results are not far from optimal in this case: https://gcc.godbolt.org/z/r1EM94. SDCC is not the best compiler in the world, but it's not that bad either.
For this particular formula, there is an explained and commented implementation here: https://github.com/lronaldo/cpctelera/blob/development/cpctelera/src/video/cpct_getScreenPtr.asm. The only difference is that it does not divide nx by 2, because it works on bytes, not on pixels.
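The signed versus unsigned point can be checked on any C compiler. A short sketch (function names are invented here; it assumes C99 or later, where % truncates toward zero, and a two's complement machine):

```c
/* Remainder as C defines it for signed ints: the sign follows the dividend. */
int mod8_signed(int v)
{
    return v % 8;
}

/* Bit mask version: equals v % 8 only when v is not negative. */
int and7(int v)
{
    return v & 7;
}

/* With an unsigned type the two forms always agree,
   so a compiler is allowed to emit AND 7 here. */
unsigned mod8_unsigned(unsigned v)
{
    return v % 8u;
}
```

For v = -1 the signed remainder is -1 but the mask gives 7, which is why a compiler must not replace one with the other for plain int.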
Isn't there a firmware instruction (&BC1D) available to calculate this? Maybe it's interesting to go through the source code?
Quote from: ronaldo on 17:28, 17 July 20
Before openly criticizing the compiler
This afternoon I did a stream about this function and its compilation, where I modified the C function to let SDCC optimize as well as it can (unsigned char, or even a table for the lines).
I'm afraid there is no miracle ;D
The CPCtelera optimised function suits well. Only an inline version is missing for it to be perfect.
Hello, this is written in ccz80.
It works fine with a table.
The table (array word list) is a y-table holding the start address of each screen line at the left edge.
The first 8 lines below each other are:
#C000,
#C800,
#D000,
#D800,
#E000,
#E800,
#F000,
#F800,
These are the next 8 lines below each other:
#C050,
#C850,
#D050,
#D850,
#E050,
#E850,
#F050,
#F850,
Each line then runs from x = 0 to 79:
#C000 to #C04F
#C800 to #C84F........
include "cpc6128.ccz";
word p,py,pxy,x,y;
byte z;
array word list = {
#C000, #C800, #D000, #D800, #E000, #E800, #F000, #F800,
#C050, #C850, #D050, #D850, #E050, #E850, #F050, #F850,
#C0A0, #C8A0, #D0A0, #D8A0, #E0A0, #E8A0, #F0A0, #F8A0,
#C0F0, #C8F0, #D0F0, #D8F0, #E0F0, #E8F0, #F0F0, #F8F0,
#C140, #C940, #D140, #D940, #E140, #E940, #F140, #F940,
#C190, #C990, #D190, #D990, #E190, #E990, #F190, #F990,
#C1E0, #C9E0, #D1E0, #D9E0, #E1E0, #E9E0, #F1E0, #F9E0,
#C230, #CA30, #D230, #DA30, #E230, #EA30, #F230, #FA30,
#C280, #CA80, #D280, #DA80, #E280, #EA80, #F280, #FA80,
#C2D0, #CAD0, #D2D0, #DAD0, #E2D0, #EAD0, #F2D0, #FAD0,
#C320, #CB20, #D320, #DB20, #E320, #EB20, #F320, #FB20,
#C370, #CB70, #D370, #DB70, #E370, #EB70, #F370, #FB70,
#C3C0, #CBC0, #D3C0, #DBC0, #E3C0, #EBC0, #F3C0, #FBC0,
#C410, #CC10, #D410, #DC10, #E410, #EC10, #F410, #FC10,
#C460, #CC60, #D460, #DC60, #E460, #EC60, #F460, #FC60,
#C4B0, #CCB0, #D4B0, #DCB0, #E4B0, #ECB0, #F4B0, #FCB0,
#C500, #CD00, #D500, #DD00, #E500, #ED00, #F500, #FD00,
#C550, #CD50, #D550, #DD50, #E550, #ED50, #F550, #FD50,
#C5A0, #CDA0, #D5A0, #DDA0, #E5A0, #EDA0, #F5A0, #FDA0,
#C5F0, #CDF0, #D5F0, #DDF0, #E5F0, #EDF0, #F5F0, #FDF0,
#C640, #CE40, #D640, #DE40, #E640, #EE40, #F640, #FE40,
#C690, #CE90, #D690, #DE90, #E690, #EE90, #F690, #FE90,
#C6E0, #CEE0, #D6E0, #DEE0, #E6E0, #EEE0, #F6E0, #FEE0,
#C730, #CF30, #D730, #DF30, #E730, #EF30, #F730, #FF30,
#C780, #CF80, #D780, #DF80, #E780, #EF80, #F780, #FF80
};
mode(2);
cls();
p=list;
x=65;
y=199;
for (y = 0; y <= 199; y=y+1)
{
py=p+y*2;
for (x = 0; x <= 79; x=x+1)
{
pxy=**py+x;
*pxy=129;
}
}
while (1);
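The same y-table does not have to be typed in by hand, it can be filled once at start-up. A sketch in C, assuming the default screen at #C000 with 80 bytes per line; the names line_addr, build_line_table and byte_addr are made up for this example:

```c
#include <stdint.h>

#define SCREEN_LINES 200

/* Start address of each of the 200 pixel lines (left edge of the screen). */
static uint16_t line_addr[SCREEN_LINES];

/* Fill the table once, using the formula from the first post without nx. */
void build_line_table(void)
{
    uint16_t y;
    for (y = 0; y < SCREEN_LINES; y++) {
        line_addr[y] = (uint16_t)(0xC000u + (y / 8) * 80 + (y % 8) * 2048);
    }
}

/* Address of byte column x (0..79) on line y (0..199):
   one table lookup and one addition. */
uint16_t byte_addr(uint8_t x, uint8_t y)
{
    return (uint16_t)(line_addr[y] + x);
}
```

After build_line_table() the entries match the hand-written list, for example #C000, #C800 and #C050 for lines 0, 1 and 8, and #FF80 for line 199. The cost is 400 bytes of RAM for the table.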
Quote from: HAL 6128 on 19:20, 17 July 20
Isn't there a firmware instruction (&BC1D) available to calculate this? Maybe it's interesting to go through the source code?
It does, in its own way. For example, in MODE 1 the X-coordinate ranges from 0 to 318, in MODE 0 this becomes 0 to 158, and the Y-coordinate ranges from 0 to 199. The memory address is returned in HL, while C contains a bit mask for this pixel (I don't know what that means? :( ) and B contains the number of pixels stored in a byte, minus 1.
Another way of using the firmware is to base the graphics routine on &C000 and use &BC26 SCR NEXT LINE to calculate the address of the next line.
EDIT: On my quest to understand the bit mask: it relates to how each mode stores each pixel in a byte, so in MODE 0 the mask is &AA (&x10101010), in MODE 1 it is &88 (&x10001000) and in MODE 2 it is &08 (&x00001000). So it could be used to blank certain pixels within a byte.
Which "C" do you recommend for the CPC?
SDCC or z88dk?
thank you.
greeting
Quote from: roudoudou on 20:13, 17 July 20
This afternoon I did a stream about this function and its compilation, where I modified the C function to let SDCC optimize as well as it can (unsigned char, or even a table for the lines).
I'm afraid there is no miracle ;D
The CPCtelera optimised function suits well. Only an inline version is missing for it to be perfect.
Next time you can simply click on my link and see the miracle (https://gcc.godbolt.org/z/r1EM94). It stops using modint and uses AND 7, as expected. Passing unsigned ints makes the generated code nearly optimal, and that is without aggressive optimizations. Of course, human-coded assembly always has an advantage if coded well. But SDCC is doing what you are asking it to do: if you use data type X, it has to respect what that data type means.
Quote from: roudoudou on 20:13, 17 July 20
The CPCtelera optimised function suits well. Only an inline version is missing for it to be perfect.
What do you exactly mean by inline function in this context? Are you referring to using the keyword "inline" in C to hint the compiler on embedding the code instead of making a call? If that is what you want, you only need to type-in "inline" in front of the function declaration in the header file.
Quote from: funkheld on 08:27, 18 July 20
Which "C" do you recommend for the CPC?
SDCC or z88dk?
Both are very good. Neither is close to modern compilers such as GCC or Clang, but they do a great job. For a single project, I personally think I would choose z88dk nowadays. However, both have advantages and disadvantages, and these shift over the long term. Bear in mind that some developers collaborate on both compilers and share common knowledge. The differences are not relevant in many cases (it mostly depends on what you want to do). In any case, I would start with z88dk for single projects nowadays.
Quote from: ronaldo on 14:59, 18 July 20
Next time you can simply click on my link and see the miracle
I do not see a significant difference in your link. The formula really needs to be changed to see a small improvement.
About inline: yes, I was talking about the inline prefix, because with an optimised function the stack and parameter management is nearly the same size as the function code itself :o
The best improvement comes from replacing (ny / 8) * 80 with (ny & 0xF8) * 10.
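That replacement is safe as long as ny fits in 8 bits, which is the case for screen lines 0..199. A quick sketch that compares both forms over the whole 8-bit range (function names are invented for this example):

```c
#include <stdint.h>

/* Row offset with an explicit division. */
uint16_t row_div(uint8_t ny)
{
    return (uint16_t)((ny / 8) * 80);
}

/* Row offset with the mask: clearing bits 0..2 equals (ny / 8) * 8. */
uint16_t row_mask(uint8_t ny)
{
    return (uint16_t)((ny & 0xF8) * 10);
}

/* Returns 1 when both forms agree for every 8-bit value of ny. */
int forms_agree(void)
{
    unsigned ny;
    for (ny = 0; ny < 256; ny++) {
        if (row_div((uint8_t)ny) != row_mask((uint8_t)ny)) {
            return 0;
        }
    }
    return 1;
}
```

Note that the 0xF8 mask also drops any bits above bit 7, so with a 16-bit ny larger than 255 the two forms would no longer match.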
Quote from: roudoudou on 15:14, 18 July 20
I do not see a significant difference in your link. The formula really needs to be changed to see a small improvement.
About inline: yes, I was talking about the inline prefix, because with an optimised function the stack and parameter management is nearly the same size as the function code itself :o
Well, I see a great deal of difference between these 2 versions. The left one receives 2 unsigned ints and returns a void*. The right one receives 2 ints and returns an int. From my point of view, that is a great deal of difference. In fact, as I have said in previous posts, excluding parameter management, the code of the left version is nearly optimal. The only problem is that SDCC 4.0 still does not accept the __z88dk_callee convention for C functions. With that calling convention, there would probably be minimal differences between the optimal version and the compiled version. I'm afraid I cannot say nearly the same about the right version. In fact, I agree with you that the right version is very bad code for the purposes of this function.
With respect to inlining, you can just write "inline" in the header file. However, that is only a hint to the compiler: the compiler can always decide on the best strategy (inlining or not). In CPCtelera there is also a macro (cpctm_screenPtr (https://github.com/lronaldo/cpctelera/blob/64679f191c324d9fbeae8682e145dc8c5d710984/cpctelera/src/video/video_macros.h#L83)) which you can use when your values are known at compile time. It gets converted into a single 2-byte value by the compiler. Moreover, you can call cpct_getScreenPtr from assembly code by just putting the values into registers, so you get rid of the parameter-passing code. In any case, parameter passing in CPCtelera uses the __z88dk_callee convention, which costs a total of 13 microseconds and 4 bytes, or 24.5% of the total time cost of the function. That matters for such a small function, but it is not nearly as bad as using IX to access the stack.
The last thing you can easily do is create an assembler macro that expands to the code of the function, to force inlining. If you know what you are doing and the costs you are paying (mainly space), you may do that. It may be useful in some contexts.
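To illustrate the compile-time idea in plain C: when the coordinates are literals, a macro lets the compiler reduce everything to one 16-bit constant. This is only a sketch with a hypothetical macro name, not CPCtelera's own definition; X is a byte column here, not a pixel:

```c
#include <stdint.h>

/* Hypothetical macro (not the CPCtelera one): VMEM is the screen base,
   X a byte column (0..79), Y a pixel line (0..199). */
#define SCREEN_ADDR_CONST(VMEM, X, Y) \
    ((uint16_t)((VMEM) + 80u * ((unsigned)(Y) >> 3) \
                       + 2048u * ((unsigned)(Y) & 7u) + (X)))

/* With literal arguments the result is a constant expression,
   so it can even be used where C requires one, such as an enum. */
enum { SPRITE_HOME = SCREEN_ADDR_CONST(0xC000u, 50, 120) };
```

SPRITE_HOME ends up as #C4E2 with no code executed at run time. With variable arguments the same macro still works, but then the arithmetic is done at run time like any other expression.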
I use a similar function in my current project. Instead of hardcoding 0xc000, I use a variable that tells me which memory address to use so I can draw in the back buffer (generally) or in the active video memory.
Something like this:
return (uint8_t *)(vidmem_addr + x + ((y / 8) * 80) + ((y % 8) * 2048));
And the code that SDCC generates isn't that bad. I was looking to write that by hand but I made a mistake and I looked at the cpctelera code (that is brilliant, by the way); and I don't want to be "influenced" too much to risk breaking the licence or adding a dependency :D
Anyway, at the end of the day you want to save yourself some time and be able to write more complex code by using C. That's never going to be free, but the extra cycles you spend on so-so ASM code are totally worth it.
I'm with Ronaldo on this one: you want to use the right types to make the compiler more efficient.
I think I'll use the SDCC code as base-line and improve it a bit, probably moving it to __z88dk_callee.
EDIT: actually, that's a common issue in the code that folk often post in the CPC Wiki forum: they don't use the most appropriate types. I know it comes with experience, but it is important to write C code that is good for SDCC/Z88DK.
Guys...
Search the wiki!!!
There's an excellent routine by Executioner for Fast Plot
for each screen mode....
http://www.cpcwiki.eu/index.php/Programming:Fast_plot
The quickest way to calculate this formula without precalculations... I guess.
I altered the routine used in Fast Plot to calculate the screen address. I don't know whether it still contains bits of assembly that aren't relevant for calculating the screen address, but otherwise the routine works.
org &8000
ld hl,0
ex de,hl
ld hl,0
.caline
;; input: de = x (0..79), hl = y (0..199)
;; exit: hl = screen address
ld a,l ;; a = lowbyte y
and 7 ;; isolate Bit 0..2
ld h,a ;; =y mod 8 to h
xor l ;; a = bit 3..7 of y
ld l,a ;; = (y\8)*8 to l
ld c,a ;; store in c
ld b,&60 ;; b = &c0\2 = Highbyte Screenstart\2
add hl,hl ;; hl * 2
add hl,hl ;; hl * 4
add hl,bc ;; + bc = startaddress
add hl,hl ;; of the raster line
add hl,de ;; hl = screen address
ret
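To see why this works, the register steps can be modelled in C and compared with the plain formula. This is only a sketch; the function names are made up, and x is a byte column as in the routine above:

```c
#include <stdint.h>

/* C model of the register steps; x = byte column (0..79), y = line (0..199). */
uint16_t caline_model(uint16_t x, uint16_t y)
{
    uint8_t l = (uint8_t)y;                  /* ld a,l                     */
    uint8_t h = (uint8_t)(l & 7);            /* and 7 : ld h,a  -> y mod 8 */
    uint16_t bc, hl;
    l = (uint8_t)(l ^ h);                    /* xor l : ld l,a  -> (y/8)*8 */
    bc = (uint16_t)(0x6000u | l);            /* ld c,a : ld b,&60          */
    hl = (uint16_t)(((uint16_t)h << 8) | l);
    hl = (uint16_t)(hl * 4);                 /* add hl,hl twice            */
    hl = (uint16_t)(hl + bc);                /* add hl,bc                  */
    hl = (uint16_t)(hl * 2);                 /* add hl,hl                  */
    return (uint16_t)(hl + x);               /* add hl,de                  */
}

/* The formula from the thread, with x already in bytes. */
uint16_t formula(uint16_t x, uint16_t y)
{
    return (uint16_t)(0xC000u + (y / 8) * 80 + (y % 8) * 2048 + x);
}

/* Returns 1 when model and formula agree for every on-screen position. */
int caline_matches_formula(void)
{
    uint16_t x, y;
    for (y = 0; y < 200; y++)
        for (x = 0; x < 80; x++)
            if (caline_model(x, y) != formula(x, y))
                return 0;
    return 1;
}
```

The trick is that h holds y mod 8 and l holds (y/8)*8, so the final doubling turns h*1024 into (y mod 8)*2048, l*5 into (y/8)*80 and &6000 into &C000 in one go.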
I sometimes have the feeling that certain system routines cause sdcc to shut down the system? ::)
greeting
My mind has been blown! :o
Based on @AMSDOS (https://www.cpcwiki.eu/forum/index.php?action=profile;u=330) code: 158 cycles; compared with my SDCC based version: 341 cycles.
And it works nicely!
Quote from: reidrac on 08:24, 19 July 20
My mind has been blown! :o
Based on @AMSDOS (https://www.cpcwiki.eu/forum/index.php?action=profile;u=330) code: 158 cycles; compared with my SDCC based version: 341 cycles.
And it works nicely!
The original Fast Plot Routine that this is based on was published in CPC Amstrad International with an article by Matthias Uphoff in 1989, it can be looked up on CPC-Rulez:
https://cpcrulez.fr/coding_src-list-fast-plot.htm
But it only worked in MODE 1; @Executioner (https://www.cpcwiki.eu/forum/index.php?action=profile;u=17) converted it to MODE 0. All I did was remove the bit masking required for plotting, and the 'srl e', because sprites are normally handled in bytes. So this won't handle pixel-perfect animation. Maybe it would be better with 'srl e': then e=0 and e=1 give the same screen address, so perhaps that second position could be used for an animated sprite.
Quote from: funkheld on 08:04, 19 July 20
I sometimes have the feeling that certain system routines cause sdcc to shut down the system?
greeting
The danger is that if variables are outside their intended range, the formula sends the graphic offscreen, somewhere sensitive, corrupting code. The firmware routines that calculate screen positions or the next or previous line safely keep it on the screen. They are slower than calculating screen addresses directly, but they prevent accidents from happening.
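If you calculate the address yourself, a cheap range check before writing avoids that kind of accident. A minimal sketch; the name checked_addr and the choice of 0 as the "off screen" marker are assumptions for this example:

```c
#include <stdint.h>

/* Returns the screen address for byte column x and line y,
   or 0 when the position is off screen (0 is never a valid
   address inside the #C000..#FFFF screen area). */
uint16_t checked_addr(uint16_t x, uint16_t y)
{
    if (x >= 80 || y >= 200) {
        return 0;   /* caller must skip the write */
    }
    return (uint16_t)(0xC000u + (y / 8) * 80 + (y % 8) * 2048 + x);
}
```

The caller only writes when the result is not 0. The two comparisons cost a few cycles per call, so in an inner loop it is usually better to check the sprite rectangle once before drawing.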
Quote from: AMSDOS on 10:18, 19 July 20
The original Fast Plot Routine that this is based on was published in CPC Amstrad International with an article by Matthias Uphoff in 1989, it can be looked up on CPC-Rulez:
https://cpcrulez.fr/coding_src-list-fast-plot.htm
But it only worked in MODE 1; @Executioner (https://www.cpcwiki.eu/forum/index.php?action=profile;u=17) converted it to MODE 0. All I did was remove the bit masking required for plotting, and the 'srl e', because sprites are normally handled in bytes. So this won't handle pixel-perfect animation. Maybe it would be better with 'srl e': then e=0 and e=1 give the same screen address, so perhaps that second position could be used for an animated sprite.
I made some tweaks myself to better work with my codebase. I'm only interested in the address; this is used to draw sprites mostly.
Quote from: reidrac on 16:57, 18 July 20
And the code that SDCC generates isn't that bad. I was looking to write that by hand but I made a mistake and I looked at the cpctelera code (that is brilliant, by the way); and I don't want to be "influenced" too much to risk breaking the licence or adding a dependency :D
If you want to use the code you are completely free to do so. We are deciding on moving CPCtelera to an MIT/Apache license, and that is the most likely decision for CPCtelera 2.0. In the meantime, you can consider this post as license permission. Even without it, we would never use the license to claim anything from other Amstrad authors. In fact, this is completely aligned with the MIT/Apache license premises, and that's why we will be changing to it. We started with GPL because that was the default on most of our projects, to encourage further free and open software. But in the case of CPCtelera I think that is much more reasonably accomplished with the less restrictive licenses.
So, please, don't be afraid of breaking any license: the code is there to be used whenever you feel it may be of use for you. And of course, if anyone finds bugs, improvements or better ideas on how to do things, they are always welcome.
Quote from: ronaldo on 18:22, 19 July 20
If you want to use the code you are completely free to do so. We are deciding on moving CPCtelera to an MIT/Apache license, and that is the most likely decision for CPCtelera 2.0. In the meantime, you can consider this post as license permission. Even without it, we would never use the license to claim anything from other Amstrad authors. In fact, this is completely aligned with the MIT/Apache license premises, and that's why we will be changing to it. We started with GPL because that was the default on most of our projects, to encourage further free and open software. But in the case of CPCtelera I think that is much more reasonably accomplished with the less restrictive licenses.
So, please, don't be afraid of breaking any license: the code is there to be used whenever you feel it may be of use for you. And of course, if anyone finds bugs, improvements or better ideas on how to do things, they are always welcome.
Thank you, appreciated!
I think MIT is perfect for this type of library, and I may release code too. Since Kitsune's Curse I've been working on more generic code instead of code custom to the project, so it may be reusable by someone else. I'm still using cpcrslib for keyboard/joystick, but I plan to write something a bit more minimalistic (I did it for the Speccy and I'm happy with the result).
Anyway, it was a surprise finding that nice routine in CPC Wiki forum. I'm using it currently!
CPCtelera 2.0. ....... ;) ;)
1.5 cpctelera is not even out yet.
what's with 2.0?
burns the licenses.
life is easier without licenses ...
Quote
burns the licenses.
life is easier without licenses ...
That's the purpose of the MIT license ;)
Quote from: roudoudou on 15:04, 17 July 20
you miss the point
(var/8)*80 is very different from var*10
i let you try with var=1 and var=2
You too. :) In his formula, there are no brackets.
Quote from: GUNHED on 08:57, 22 July 20
You too. :) In his formula, there are no brackets.
I know you miss the point, it's useless to say it again ;D
try with value=1 and value=2, see the difference
Quote from: roudoudou on 09:03, 22 July 20
I know you miss the point, it's useless to say it again ;D
try with value=1 and value=2, see the difference
In his original post there were no brackets. See first post.
Quote from: GUNHED on 09:05, 22 July 20
In his original post there were no brackets. See first post.
You are wrong. Check the original post again, slowly; maybe you will understand why you completely miss the point...
The title has a computable formula with brackets:
this in asm : ((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2)+ #c000 (https://www.cpcwiki.eu/forum/programming/this-in-asm-((ny-8)-*-80)-((ny-8)-*-2048)-(nx-2)-c000/msg189911/#msg189911)
The post has a formula with 2 typos (it cannot be computed because 2 brackets are never closed); you can obviously guess it wasn't copy/pasted, hence the typos:
((ny / 8 * 80) + ((ny % 8 * 2048) + (nx / 2)+ #c000
He removed the 2 closing brackets because of the smileys ;D
Is it OK for you now?
Whatever. Back to topic... ;D ;D ;D
Yeah, please Fix the formula in the first post :P using CODE tag ( # icon )
(https://i.postimg.cc/RFhT7CrH/2020-07-22-102414-3840x1080-scrot.png)
here in the forum the "(" and the ")" are not output correctly.
jscompile POC with no optimisation for:
function main()
{
var intA = ((ny / 8) * 80) + ((ny % 8) * 2048) + (nx / 2) + c000;
}
generates:
main:
call main_constructor
ret
main_constructor:
ld hl, ny
push hl
ld hl, 8
push hl
call sys_div
ld hl, 80
push hl
call sys_mult
ld hl, ny
push hl
ld hl, 8
push hl
call sys_mod
ld hl, 2048
push hl
call sys_mult
call sys_add
ld hl, nx
push hl
ld hl, 2
push hl
call sys_div
call sys_add
ld hl, c000
push hl
call sys_add
pop hl
ld (main__inta), hl
ret
main__inta: defw 0
It is a typical expected RPN output.