Just a question from a newbie in assembler

HAL6128 · 12:08, 07 March 12

Hi,
I tried Executioner's Fast-Plot routine in Mode 1 (http://www.cpcwiki.eu/index.php/Programming:Fast_plot),
just to draw a diagonal line from the upper left corner down to the bottom, and compared its speed with the BASIC DRAW command. But BASIC's DRAW is faster. Did I make a mistake?
Thanks for helping.

10 t1=TIME:CALL &4000:t2=TIME:PRINT (t2-t1)/300
11 CALL &BB18:CLS
15 t1=TIME:ORIGIN 0,400:DRAWR 399,-399:t2=TIME:PRINT (t2-t1)/300
16 CALL &BB18:CLS
20 GOTO 10

Assembler code for the Fast-Plot routine (from the label "plot" onwards the code is Executioner's):

        org &4000
cmask   equ &b6a3                                                               ;Graphics Pen
        ld a,&01
        call &bc0e                                                              ;Mode 1
        ld hl,&0001                                                             ;starting point at Y (Low-Byte = 1) 
        ld de,&0001                                                             ;starting point at X (Low-Byte = 1)
        ld b,199                                                                ;counter: 199
        
loop    push bc                                                                 ;save bc
        push de                                                                 ;save de
        push hl                                                                 ;save hl
        call plot                                                               ;call plot routine
        pop hl                                                                  ;rescue hl
        pop de                                                                  ;rescue de
        inc l                                                                   ;increase l plus 1
        inc e                                                                   ;increase e plus 1
        pop bc                                                                  ;rescue b
        djnz loop                                                               ;b minus 1 > 0 then back to loop 
        ret
        
plot    ld a,l                                                                  ;A = Lowbyte of Y
        and %00000111                                                           ;isolate Bit 0 to 2 for division with 8 (means l MOD 8)
        ld h,a                                                                  ;H = result from A
        xor L                                                                   ;now isolate Bit 3 to 7 for last part of division
        ld l,a                                                                  ;L = result from A
        ld c,a                                                                  ;C = result from A
        ld b,&60                                                                ;B = &C0\2 = Highbyte Screenstart\2
        add hl,hl                                                               ;HL * 2
        add hl,hl                                                               ;HL * 4
        add hl,bc                                                               ;+ BC: L part is now * 5, plus &6000
        add hl,hl                                                               ;* 2 again: (Y MOD 8) * &800 + (Y\8) * 80 + &C000
        ld a,e                                                                  ;Lowbyte of X to A
        srl d                                                                   ;calculate X\4, because
        rr e                                                                    ;4 pixel per byte
        srl e
        add hl,de                                                               ;+ HL = Screenaddress
        ld c,%10001000                                                          ;Bitmask for MODE 1
        and %00000011                                                           ;A = X MOD 4
        jr z,nshift                                                             ;-> = 0, no shift
shift  srl c                                                                   ;move bitmask to pixel
        dec a                                                                   ;loop counter
        jr nz,shift                                                             ;-position
  
nshift ld a,(cmask)                                                            ;get color mask
        xor (hl)                                                                ;XOR screenbyte
        and c                                                                   ;AND bitmask
        xor (hl)                                                                ;XOR screenbyte
        ld (hl),a                                                               ;new screenbyte
        ret                                                                     ;done

Ygdrazil · 12:37, 07 March 12

Hi!

Try to disable the interrupts :-) (DI)

/Ygdrazil

MaV · 12:54, 07 March 12

Quote from: hal 6128 on 12:08, 07 March 12
Did I made an error or mistake?

Lines are consecutive pixels, therefore the next pixel to plot is one of the possible eight neighbours.

In your special case of a vertical line the neighbours are above and below the current pixel. Depending on the direction you go, you'd only need to calculate the next address for the above or the below pixel, the pixel mask stays the same. Thus you'd only calc the byte address and mask of the first pixel and then calc the new byte address.

If you calc every pixel on the line, the algorithm is wasting cycles.

Axelay · 13:54, 07 March 12

Quote from: hal 6128 on 12:08, 07 March 12
Hi,
I tried "Executioners" Fast-Plot-Routine in Mode 1 (http://www.cpcwiki.eu/index.php/Programming:Fast_plot)
Just to draw a line from left upper corner diagonal to bottom and made a compersion in speed with the BASIC Draw command. But BASICs DRAW is faster. Did I made an error or mistake?
Thanks for helping.

You are using BASIC to time a very short piece of assembly. Try doing more lines at once, such as 8 lines with the code below, and the difference should become more apparent.

But I think a better comparison would be to not do the line draw in BASIC at all, but to use the firmware line draw routines from assembly and compare that with the fast plot routine. Otherwise too much of the time you are measuring is down to the BASIC interpreter's speed.

10 t1=TIME:CALL &4000:t2=TIME:PRINT (t2-t1)/300
11 CALL &BB18:CLS
15 t1=TIME:FOR x=0 to 15 step 2:ORIGIN x,400:DRAWR 399,-399:NEXT x:t2=TIME:PRINT (t2-t1)/300
16 CALL &BB18:CLS
20 GOTO 10

org &4000

cmask   equ &b6a3                                                               ;Graphics Pen
        ld a,&01
        call &bc0e                                                              ;Mode 1
        ld hl,&0001                                                             ;starting point at Y (Low-Byte = 1) 
        ld de,&0001                                                             ;starting point at X (Low-Byte = 1)
        ld c,8
outerloop
        push de
        call drawline
        pop de
        inc de
        ld hl,1
        dec c
        jr nz,outerloop
        ret

drawline
        ld b,199                                                                ;;counter: 199
        
loop    push bc                                                                 ;save bc
        push de                                                                 ;save de
        push hl                                                                 ;save hl
        call plot                                                               ;call plot routine
        pop hl                                                                  ;rescue hl
        pop de                                                                  ;rescue de
        inc l                                                                   ;increase l plus 1
        inc e                                                                   ;increase e plus 1
        pop bc                                                                  ;rescue b
        djnz loop                                                               ;b minus 1 > 0 then back to loop 
        ret
        
plot    ld a,l                                                                  ;A = Lowbyte of Y
        and %00000111                                                           ;isolate Bit 0 to 2 for division with 8 (means l MOD 8)
        ld h,a                                                                  ;H = result from A
        xor L                                                                   ;now isolate Bit 3 to 7 for last part of division
        ld l,a                                                                  ;L = result from A
        ld c,a                                                                  ;C = result from A
        ld b,&60                                                                ;B = &C0\2 = Highbyte Screenstart\2
        add hl,hl                                                               ;HL * 2
        add hl,hl                                                               ;HL * 4
        add hl,bc                                                               ;+ BC: L part is now * 5, plus &6000
        add hl,hl                                                               ;* 2 again: (Y MOD 8) * &800 + (Y\8) * 80 + &C000
        ld a,e                                                                  ;Lowbyte of X to A
        srl d                                                                   ;calculate X\4, because
        rr e                                                                    ;4 pixel per byte
        srl e
        add hl,de                                                               ;+ HL = Screenaddress
        ld c,%10001000                                                          ;Bitmask for MODE 1
        and %00000011                                                           ;A = X MOD 4
        jr z,nshift                                                             ;-> = 0, no shift
shift  srl c                                                                   ;move bitmask to pixel
        dec a                                                                   ;loop counter
        jr nz,shift                                                             ;-position
  
nshift ld a,(cmask)                                                            ;get color mask
        xor (hl)                                                                ;XOR screenbyte
        and c                                                                   ;AND bitmask
        xor (hl)                                                                ;XOR screenbyte
        ld (hl),a                                                               ;new screenbyte
        ret                                                                     ;done

HAL6128 · 14:45, 07 March 12

Oh, I see: MC is many times faster when plotting several lines rather than just one.
So does that mean the overhead between commands is much smaller in MC, while the DRAW command itself in BASIC (or rather the MC code behind it) is already pretty fast?
And if I want more speed when drawing a line or plotting many points, there might be a faster or different routine than Executioner's Fast-Plot routine in Mode 1, as MaV and Ygdrazil mentioned above?

Quote from: Axelay on 13:54, 07 March 12
You are using BASIC to time a very short piece of assembly. Try doing more lines at once, such as 8 lines with the code below, and the difference should become more apparent.
...
But I think a better comparison would be to not do the line draw in BASIC at all, but use the firmware line draw routines from assembly and compare that with the fast plot routine. Otherwise too much of that time you are measuring is about the BASIC interpreters speed.

HAL6128 · 16:14, 07 March 12

OK, I tried a comparison in plain assembler. I don't know if I got it right (MC is tough for beginners...).
This code uses the firmware routines.

        org &5000
        ld hl,0
        ld de,0
        call &bd10              ;reset Timer        
        
        call &bd0d              ;read Timer and store it in RAM
        ld (&4ffc),de
        ld (&4ffe),hl
        
        ld de,0                 ;origin 0,400
        ld hl,400
        call &bbc9
        
        ld de,399               ;draw line
        ld hl,-399
        call &bbf6
 
        call &bd0d              ;read timer since then and store it in RAM
        ld (&4ff8),de
        ld (&4ffa),hl
        ret

...now read out the result from BASIC (am I right in how I sum up the four-byte figure?)

10 t1=PEEK(&4FFC)*256*256*256+PEEK(&4FFD)*256*256+PEEK(&4FFE)*256+PEEK(&4FFF)
20 t2=PEEK(&4FF8)*256*256*256+PEEK(&4FF9)*256*256+PEEK(&4FFA)*256+PEEK(&4FFB)
30 PRINT (t2-t1)/300

...got the result of 33.28 (but don't know what the number means - fractions of seconds or whatever??)

The same procedure with "Executioners" Fast-Plot-Routine.
...got the result of 5.12.
So, this seems to be faster then?

MaV · 16:34, 07 March 12

Quote from: hal 6128 on 16:14, 07 March 12
The same procedure with "Executioners" Fast-Plot-Routine.
...got the result of 5.12.
So, this seams to be faster then?

Seems so. I haven't checked the assembly thoroughly, but the Firmware routines do a lot of extra calculation for different reasons, which you don't have to (and don't do).

My previous post was about how to optimise your routine with Executioner's Fast-Plot even further. There's a lot still that can be optimized here.
But anyway, nice work for a beginner. You'll learn all that in time.

Executioner · 06:49, 08 March 12

I'm not 100% sure, but I think the firmware DRAW routines are quite fast and don't use a pixel PLOT routine at all. Rather, they use what's called Bresenham's line drawing algorithm: they first determine whether the line is to be stepped horizontally or vertically, then use single pixel mask rotates and address increments, which is much faster than recalculating the address and mask for each individual pixel. So the firmware draw could be much faster than using my fast plot routine.

Axelay · 11:24, 08 March 12

Quote from: hal 6128 on 16:14, 07 March 12
...now read out result from BASIC (Am I right with summerize the four byte figure?)
10 t1=PEEK(&4FFC)*256*256*256+PEEK(&4FFD)*256*256+PEEK(&4FFE)*256+PEEK(&4FFF)
20 t2=PEEK(&4FF8)*256*256*256+PEEK(&4FF9)*256*256+PEEK(&4FFA)*256+PEEK(&4FFB)
30 PRINT (t2-t1)/300
...got the result of 33.28 (but don't know what the number means - fractions of seconds or whatever??)

The same procedure with "Executioners" Fast-Plot-Routine.
...got the result of 5.12.
So, this seams to be faster then?

When storing a 16 bit register pair with an instruction like ld (&4ff8),de the least significant byte, in this case e, is stored at &4ff8, and then the most significant byte, d, is stored at the following address, &4ff9. So your BASIC program should be:

10 t1=PEEK(&4FFD)*256*256*256+PEEK(&4FFC)*256*256+PEEK(&4FFF)*256+PEEK(&4FFE)
20 t2=PEEK(&4FF9)*256*256*256+PEEK(&4FF8)*256*256+PEEK(&4FFB)*256+PEEK(&4FFA)
30 PRINT t2-t1

I've simply moved the peeked values around rather than the multiplications, and I've also changed the time to straight 300ths of a second, as the numbers are quite small!
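
One way to sidestep the byte-order juggling, as a rough and untested sketch: store the low word (HL) first so the four bytes lie in memory lowest byte first, and let the machine code do the subtraction. The labels tstart and telapsed are made up for this example, and the defs directive is an assumption about your assembler (some use ds instead). It also assumes the timed code runs for less than about 218 seconds, so the low 16 bits of the difference are enough.

```asm
        call &bd0d              ;KL TIME PLEASE: DEHL = time in 1/300 s
        ld (tstart),hl          ;low word first ...
        ld (tstart+2),de        ;... then high word: bytes now run low to high

        ;... code to be timed goes here ...

        call &bd0d              ;read the timer again
        ld bc,(tstart)          ;low word of the start time
        or a                    ;clear carry
        sbc hl,bc               ;HL = elapsed 1/300ths (low 16 bits)
        ld (telapsed),hl
        ret

tstart   defs 4                 ;start time, 4 bytes (hypothetical label)
telapsed defs 2                 ;elapsed time, 2 bytes (hypothetical label)
```

From BASIC the result would then be the PEEK of the first byte of telapsed plus 256 times the PEEK of the second, using whatever addresses the assembler assigns.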

What Executioner just mentioned about the firmware rang a bell, and it does appear to be so from the vague entry in my old firmware book. It also mentions that the firmware draw routine recalculates the required length of its vertical or horizontal lines after each individual line draw for better accuracy, so taking that into account, this test diagonal line is a worst case scenario for the firmware routine. If you try the firmware routine with less severe angles, or straight horizontal or vertical lines, it will give faster results.

MaV · 12:11, 08 March 12

Quote from: Executioner on 06:49, 08 March 12
I'm not 100% sure, but I think the firmware DRAW routines are quite fast and don't use a pixel PLOT routine at all, rather they use what's called Bresenham's line drawing algorithm and determine first if the line is to be stepped horizontally or vertically, then use single pixel mask rotates and address increments which is much faster than recalculating the address and mask for each individual pixel. So the firmware draw could be much faster than using my fast plot routine.

That's the way the firmware draws lines.

As I said, there's no need to determine the new pixel position by recalculating it completely. Instead you "move" to the next neighbouring position by one pixel. That usually means either moving horizontally, which requires determining the new pixel mask and perhaps adding one to the screen address, or moving up one line (usually by subtracting &800) and retaining the mask (since the next pixel is directly above the current one). In the worst case you'll have to do both of the above because the line is diagonal.

@hal 6128: I would not recommend that you try to implement a Bresenham algorithm yet, because it is more advanced and not suited to beginner level. You would be easily discouraged. Try horizontal, vertical and diagonal lines first, until you think you've completely understood them.

Now, if you look at the fast pixel plot routine, the first part determines the screen address (up to the line with the comment "+ HL = Screenaddress"). Once you have that address, you don't need to calculate it again. Instead, if you need to move vertically, just add or subtract an offset to find the next byte.
The second part of the fast pixel plot determines the exact pixel position within that screen address. You have to change that part whenever you try to move the pixel horizontally.
When you need to move diagonally, you need to combine both of the above.
And for all that you really need to get the hang of the CPC's screen layout!

If all that seems too difficult at first, try to write a BASIC version that does the same thing, and if that works, convert that to assembly.
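
A rough, untested sketch of the stepping idea above, for a diagonal going down and to the right in Mode 1. It assumes the default, unscrolled screen at &C000 with 80 bytes per line, that HL and C have been set up once by the first part of the plot routine (screen address and pixel mask), and that E holds the colour mask (for example the byte read from cmask). The labels diag, nobyte and next are made up for illustration.

```asm
;HL = screen address of first pixel, C = its pixel mask
;E = colour mask, B = number of pixels to draw (1 to 255)
diag    ld a,e                  ;same masking trick as in the plot routine
        xor (hl)
        and c
        xor (hl)
        ld (hl),a               ;set the pixel
        rrc c                   ;move the mask one pixel to the right
        jr nc,nobyte            ;carry only when %00010001 wraps to %10001000
        inc hl                  ;wrapped: the pixel is in the next byte
nobyte  ld a,h
        add a,8                 ;one scanline down = address + &800
        ld h,a
        jr nc,next              ;no overflow: still inside the character row
        ld a,l                  ;overflow: add &C050 to reach the
        add a,&50               ;top scanline of the next character row
        ld l,a
        ld a,h
        adc a,&c0
        ld h,a
next    djnz diag
        ret
```

One line down is +&800 here because the Y of the plot routine counts from the top of the screen; going up would subtract instead. Only the first pixel needs the full address and mask calculation.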

HAL6128 · 16:59, 08 March 12

Quote from: Axelay on 11:24, 08 March 12
When storing a 16 bit register pair with an instruction like ld (&4ff8),de the least significant byte, in this case e, is stored at &4ff8, and then the most significant byte, d, is stored in the following register, &4ff9. So your BASIC program should be:

10 t1=PEEK(&4FFD)*256*256*256+PEEK(&4FFC)*256*256+PEEK(&4FFF)*256+PEEK(&4FFE)
20 t2=PEEK(&4FF9)*256*256*256+PEEK(&4FF8)*256*256+PEEK(&4FFB)*256+PEEK(&4FFA)
30 PRINT t2-t1

Thanks for the hint. Now the figure makes sense! Even if I divide "t2-t1" by 300 I get a small number, but in seconds. Great.

HAL6128 · 17:05, 08 March 12

Quote from: MaV on 12:11, 08 March 12
@hal 6128: I would not recommend you to try to implement a Bresenham algorithm, because this is more advanced and not suited for a beginner level. You would be easily discouraged. Try horizontal, vertical and diagonal lines first, until you think you've completely understood it.

Now, if you look at the fast pixel plot routine, the first part tries to determine the screen address (up to the line with the comment "+ HL = Screenaddress". Once you have that address, you don't need to recalculate it again. Instead just add/subtract an offset to it to find the next byte if you need to move vertically, if necessary.
The second part of the fast pixel plot determines the exact pixel position within that screen address. You have to change that part whenever you try to move the pixel horizontally.
When you need to move diagonally, you need to combine both of the above.
And for all that you really need to get the hang of the CPC's screen layout!

If all that seems too difficult at first, try to write a BASIC version that does the same thing, and if that works, convert that to assembly.

OK, I see what you mean. It's a pity, because the Bresenham algorithm itself (integer only) is easy in BASIC, but it still has a plot routine built in, so nothing is gained... I'll try to follow your recommendations.
By the way: I tried compiling it with FaBaCOM, but even the compiled binary took 1.2 seconds.

10 REM
20 REM Bresenham algorithm (2) - only Integer
30 REM
40 REM * initialising
50 REM
60 MODE 2
70 ORIGIN 0,0
80 INK 1,26:INK 0,0:BORDER 13
90 DEFINT a-z
100 e=0:x1=0:x2=0:y1=0:y2=0:deltax=0:deltay=0
110 REM
120 REM * input
130 REM
140 LOCATE 2,2:PRINT"line from ... to .."
150 LOCATE 2,4:INPUT"x1:",x1
160 LOCATE 2,5:INPUT"x2:",x2
170 LOCATE 2,7:INPUT"y1:",y1
180 LOCATE 2,8:INPUT"y2:",y2
190 REM
200 REM * preparation
210 REM
220 t1!=TIME
230 ORIGIN x1,y1
240 deltax=x2-x1:deltay=y2-y1
250 e=2*deltay-deltax
260 REM
270 REM * calculation & output
280 REM
290 FOR i=1 TO deltax
300 PLOT x,y
310 IF e>0 THEN y=y+1:e=e+(2*deltay-2*deltax) ELSE e=e+2*deltay
320 x=x+1
330 NEXT i
340 t2!=TIME
350 LOCATE 50,24:PRINT"Time:";:PRINT USING "##.#";(t2!-t1!)/300;:PRINT " seconds"
360 CALL &BB18

Executioner · 23:29, 08 March 12

You should try timing the firmware PLOT routine in a loop just to test the speed difference between the firmware routine and my PLOT routine
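
Such a test could look roughly like this. It is only a sketch: it uses the firmware entry &BBEA (GRA PLOT ABSOLUTE) and assumes the firmware may corrupt BC, DE and HL, hence the pushes. In Mode 1 the firmware coordinates step by 2 per pixel in both directions and Y counts from the bottom, so the loop walks a 199-pixel diagonal similar to the fast plot test. The label floop is made up.

```asm
        ld de,0                 ;X in firmware coordinates
        ld hl,398               ;Y in firmware coordinates (top scanline)
        ld b,199                ;counter: 199 pixels
floop   push bc
        push de
        push hl
        call &bbea              ;GRA PLOT ABSOLUTE at DE,HL
        pop hl
        pop de
        pop bc
        inc de                  ;one Mode 1 pixel to the right = X + 2
        inc de
        dec hl                  ;one scanline down = Y - 2
        dec hl
        djnz floop
        ret
```

Timing this loop and the fast plot loop in the same way compares plot against plot, rather than plot against the firmware line draw.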

TFM · 23:50, 08 March 12

Yeah! Because for drawing lines one shouldn't use a plot routine ;-) Plot is for plotting dots :-)

HAL6128 · 10:44, 30 March 12

I have another quick question where I'm stuck for information:

I've read the post "Basic programming tips" and am trying to find out how the CALL command in BASIC works if I add some parameters.

I don't know if my assumption about this example is right.
The MC routine behind the call is from Sean McManus's little BASIC type-in game "Alien Intervention":

CALL &9C40,3,0,24

The Assembler routine at &9C40 starts with:

ld l,(ix+&0000)
ld h,(ix+&0002)
ld a,(ix+&0004)
.... and so on

I've read that the CALL routine puts the parameters on the stack in reverse order.
1.) What does that mean? Does the CALL routine always work from back to front and put two bytes on the stack for each parameter?

CALL &9C40,3,0,24 > means in memory &18,&00,&00,&00,&03,&00 (Am I right?)

2.) How does register IX know, when the call is made, that the data is stored in RAM starting at &BFF8?

Thanks in advance for helping.

mahlemiut · 22:40, 30 March 12

At a guess, I'd presume the BASIC CALL function will push each value onto the stack, then set IX to point to the stack pointer at that time, and then call the ASM routine.

Stacks always grow backwards through RAM, PUSHing something onto the stack will decrease the stack pointer, POPping something off of the stack will increase the stack pointer.

For example, SP starts at 0xC000. Pushing the 3 onto the stack first will decrease SP by 2 (now is 0xBFFE), and place the value pushed at that location. This will be repeated for the remaining parameters, so the 0 is written to 0xBFFC, and the 24 to 0xBFFA. BASIC may well use the stack for its own purposes also, so it places the address of the parameters in the IX register.

You should be right with what is expected, but if you're doing this in an emulator, you could set a debugger breakpoint at 0x9C40, and see what the data is at the location IX points to.

Now I just hope I haven't confused you.

HAL6128 · 11:00, 02 April 12

Quote from: mahlemiut on 22:40, 30 March 12
....Now I just hope I haven't confused you.

No, not at all. Thank you for the hint. I found out that the first 2 bytes of the stack aren't used for parameters by CALL or RSX commands. Why, I don't know at the moment (I'm trying to find out), but it starts at &BFFD with the first value (Low-Byte), than &BFFC (High-Byte) and so on. The IX register is automatically set to the last stack address that was filled.
And finally the A register holds the number of parameters you've passed, so it's possible to use a CP instruction in assembler to check whether the number of parameters is correct.

arnoldemu · 13:20, 02 April 12

Quote from: hal 6128 on 11:00, 02 April 12

No, not at all. Thank you for the hint. I found out, that the first 2 Bytes of the stack won't be used for parameters by CALL or RSX commands. Why, I don't know at the moment (...try to find it out...) but it starts at &BFFD with the first value (Low-Byte), than &BFFC (High-Byte) and so on. IX Register is automatically filled with the last SP address filled.
And finally the A Register has the value or number of parameter you've transfered, so it's possible to use a CP command in assembler to check if the number of parameters have been correct.

You should use IX to get your parameters.
and A always has the number of parameters.
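
For a call like CALL &9C40,3,0,24, a parameter check and fetch could be sketched like this. It is not the routine from the game, just an illustration under the layout described in this thread: A holds the parameter count and IX points at the last parameter, two bytes each, low byte first. The label getpar is made up.

```asm
getpar  cp 3                    ;A = number of parameters passed
        ret nz                  ;wrong count: go straight back to BASIC
        ld l,(ix+0)             ;last parameter (24), low byte
        ld h,(ix+1)             ;last parameter, high byte
        ld e,(ix+2)             ;second parameter (0), low byte
        ld d,(ix+3)             ;second parameter, high byte
        ld c,(ix+4)             ;first parameter (3), low byte
        ld b,(ix+5)             ;first parameter, high byte
        ;... use BC, DE and HL here ...
        ret
```

Reading through IX this way means the routine never has to know the absolute stack address.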

Executioner · 00:37, 04 April 12

Quote from: hal 6128 on 11:00, 02 April 12
No, not at all. Thank you for the hint. I found out, that the first 2 Bytes of the stack won't be used for parameters by CALL or RSX commands. Why, I don't know at the moment (...try to find it out...) but it starts at &BFFD with the first value (Low-Byte), than &BFFC (High-Byte) and so on.

The first two bytes on the stack are the internal return address for the BASIC interpreter to get back to where it was before it executed the CALL command.

P.S. The bold word should be then, not that anyone in the world seems to know when to use which word any more.

HAL6128 · 07:57, 04 April 12

Thanks to all for helping me understand. Now it's becoming clearer.

P.S. And yes, apart from the fact that my English is crap... In German "then = dann" and "than = als". So "than" looks more similar to "dann" because of the "a" in the word... maybe a weak excuse of mine for than/then.

Bryce · 08:22, 04 April 12

And what about "denn" ?

Bryce.

HAL6128 · 08:33, 04 April 12

Yes, you're right. "denn" has the same meaning as "als". So: "than = denn" and "then = dann". Oh, shitty... :-\

Bryce · 08:58, 04 April 12

Wait till TFM comes online, then we can discuss the finer points of Hochdeutsch such as "als-wie"

Bryce.

MaV · 09:01, 04 April 12

Quote from: hal 6128 on 08:33, 04 April 12
Yes, you're right. "denn" has the same meaning as "als". So: "than = denn" and "then = dann". Oh, shitty... :-\

To give you a bit of comfort: I'd say about half of the posters with English as their native tongue seem to have trouble distinguishing between then and than. The error is about as frequent as das and dass in German.

@Bryce: if in doubt, you should just use both in this order " als wie " or rather "ois wia" in dialects.

Bryce · 09:31, 04 April 12

Luckily I learnt "proper German", so there's no doubt: Als and wie should NEVER be next to each other in a sentence, no matter what order!

Bryce.
