
Let it snow, let it snow, let it snow...

Started by redbox, 23:38, 05 December 10


redbox

I have programmed this routine as a proof of concept - it works, but it's insanely slow and takes nearly a whole frame.

I would normally optimise it myself but, as you might have guessed, I'd probably run out of time to get it good enough for the upcoming release date  ;)

I don't want to do anything insane with it, just speed it up.  The other routines I've got to work alongside it are programmed nicely but I don't want to run out of time or have to cut this routine down so it has less of an 'effect'.

Anyone want to help...?  8)


        org &8000

        di

; <<<<<< MAIN LOOP >>>>>>

loop:        ld b,&f5        ; wait for v-sync
ml2:        in a,(c)
        rra
        jr nc,ml2

        LD BC,&7F10        ; change border colour to see routine time
        OUT (C),C
        LD BC,&7F54
        OUT (C),C

        call plot        ; do the plotting

        ld hl,&9000
        ld (snow_table_pt),hl
        ld hl,scr_buffer
        ld (scr_buffer_pt),hl

        LD BC,&7F10        ; change border colour to see routine time
        OUT (C),C
        LD BC,&7F4B
        OUT (C),C

        jr loop

; <<<<< SUBROUTINES >>>>>

plot:        ld b,160        ; number of plots (X res in MODE 0 is 160)

plot_loop:    ld hl,(snow_table_pt)    ; get screen address from table
        ld e,(hl)        ; and put it into DE
        inc hl
        ld d,(hl)
        ld hl,(scr_buffer_pt)    ; get buffer data
        ld a,(hl)        ; and put it into A
        ex de,hl        ; put screen address (DE) from earlier into HL
        ld (hl),a        ; plot buffer data back to screen

        call next_line        ; calculate next pixel line down

        ex de,hl        ; put HL (next pixel line down) into DE

        ld hl,(snow_table_pt)   ; get screen address pointer
        ld (hl),e        ; and put DE (next pixel line down)...
        inc hl            ; ...into the table
        ld (hl),d
        inc hl
        ld (snow_table_pt),hl    ; and store it

        ex de,hl        ; put screen address back into HL

        push hl            ; preserve screen address
        ld a,(hl)        ; get what's on the screen
        ld hl,(scr_buffer_pt)    ; and store it into...
        ld (hl),a        ; ...the screen buffer table
        inc l
        ld (scr_buffer_pt),hl
        pop hl            ; restore screen address into HL and A is what's on screen

        push af            ; preserve what's on screen
        ld a,(toggle)        ; get the toggle to see which pixel we're doing
        xor 1            ; toggle it between 1 and 0
        ld (toggle),a        ; store it for next time
        jr z,right_pixel    ; if it's 0, do the right pixel else do left pixel

left_pixel:    pop af            ; get back what was on screen
        and %01010101         ; AND to clear the left pixel (the right pixel is kept)
        or &80            ; OR it with what we want to plot
        jr pixel_continue    ; and continue

right_pixel:    pop af            ; same as above but for right pixel
        and %10101010
        or &40

pixel_continue:    ld (hl),a        ; plot it on the screen

        djnz plot_loop        ; decrease B and loop back if not 0

        ret

next_line:      ld a,8            ; add 8 to H (the high byte)
            add a,h             ; &800 is next pixel line down
            ld h,a
            ret nc              ; return if no overflow

            ld de,&C050        ; otherwise add &C000+&50 to HL
            add hl,de           ; &C000 is top of screen, &50 next character line down
            ret

; <<<<< BUFFERS >>>>>

toggle:        defb 0

snow_table_pt:    defw &9000        ; contains 160 * screen addresses (16 bits), &140 bytes in length

scr_buffer_pt:     defw scr_buffer

        org &9200        ; align to page boundary so the <256-byte buffer can use INC L etc.

scr_buffer:    defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
        defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
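For anyone checking the address arithmetic off-machine, here is a small Python model of next_line. It is only a sketch, not CPC code, and it assumes the standard 16K screen at &C000 with 80-byte (&50) character lines, as the routine itself does.

```python
def next_line(hl: int) -> int:
    """Model of the Z80 next_line routine: move a CPC screen address
    one pixel line down (16K screen at &C000, &50 bytes per character line)."""
    h = (hl >> 8) + 8                       # ld a,8 / add a,h
    out = ((h & 0xFF) << 8) | (hl & 0xFF)   # ld h,a (L is untouched)
    if h > 0xFF:                            # carry: ran off the top of &F8xx
        out = (out + 0xC050) & 0xFFFF       # add hl,&C050 -> next character line
    return out

# Walk down the 8 pixel lines of the first character line and into the next.
addr = 0xC000
for line in range(9):
    print(f"line {line}: &{addr:04X}")
    addr = next_line(addr)
```

The last line printed is &C050: after eight steps the address wraps back round to the top of the screen, one character line (&50 bytes) further on.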

Axelay

Well I need a break from my current project, so I thought I'd have a go  :)


From the di at the beginning of the code sample you've given, I've assumed I can use the stack and alternate registers freely.  At a rough count, the loop is down to 36 NOPs, so 160 snowflakes should take 90 scan lines plus the loop set-up overhead.  Of course, it takes longer when the screen address crosses a character line, so ideally only a few snowflakes would do that each frame.
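The budget arithmetic above as a quick Python check. It assumes the usual CPC figures of 1 NOP = 1 µs, 64 µs per scan line and 312 scan lines per 50 Hz frame (these are assumptions of the sketch, not stated in the post). The 2-NOP penalty for a flake that crosses a character line is a hand count (JR NC falling through, then ADD HL,BC), so treat it as an estimate.

```python
NOPS_PER_LINE = 64      # assumption: 64 us per scan line, 1 NOP = 1 us on the CPC
LINES_PER_FRAME = 312   # assumption: 50 Hz PAL frame

def scanlines(flakes: int, loop_nops: int = 36, crossing: int = 0, extra: int = 2) -> float:
    """Scan lines taken by the plot loop (loop set-up overhead not included).
    crossing: flakes whose address crosses a character line this frame;
    extra: additional NOPs each of those costs (JR NC not taken + ADD HL,BC)."""
    return (flakes * loop_nops + crossing * extra) / NOPS_PER_LINE

print(scanlines(160))                                         # 90.0
print(f"{scanlines(160) / LINES_PER_FRAME:.0%} of a frame")   # 29% of a frame
print(scanlines(160, crossing=160))                           # 95.0
```

Even in the worst case, where every flake crosses a character line in the same frame, this estimate only adds about five scan lines.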


I have taken the toggle check out of the loop, which means all snowflakes in a given frame are plotted as left pixels or as right pixels, all at once.  If you'd prefer some left and some right each frame, I'd recommend calling the loop twice, with the preloaded registers set appropriately each time, rather than putting the toggle back in the main loop.  At the speed the snowflakes currently move left and right, though, I don't think it'll be an issue!  They need to be slower, really.
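Hoisting the toggle out of the loop can be modelled like this: pick the AND mask and OR value once, like preloading E and D, then run the inner loop with them fixed. Calling it twice with different values gives a mix of left and right pixels in one frame. This is a Python sketch in which a bytearray stands in for screen RAM. The names plot_batch, LEFT and RIGHT are hypothetical, and the values assume ink 15, as in the sample that follows.

```python
# Mode 0: left pixel uses bits 7,5,3,1; right pixel uses bits 6,4,2,0.
LEFT = (0x55, 0xAA)    # (AND mask keeping the right pixel, OR value: left pixel in ink 15)
RIGHT = (0xAA, 0x55)   # (AND mask keeping the left pixel, OR value: right pixel in ink 15)

def plot_batch(screen: bytearray, addrs, side) -> None:
    """Inner loop with the pixel choice hoisted out: no per-flake test."""
    mask, value = side                 # chosen once, like preloading E and D
    for a in addrs:
        screen[a] = (screen[a] & mask) | value

screen = bytearray(8)
frame = 0
side = LEFT if frame % 2 == 0 else RIGHT   # per-frame toggle
plot_batch(screen, [0, 1, 2], side)

# Or, to get both sides in one frame, call the loop twice:
plot_batch(screen, [4, 5], LEFT)
plot_batch(screen, [6, 7], RIGHT)
print(screen.hex())                        # aaaaaa00aaaa5555
```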


Anyway, this demonstrates 5 snowflakes; you'll need to add more unique addresses yourself if you want more!
org &8000
run &8000

di

; <<<<<< MAIN LOOP >>>>>>

LD BC,&7F0f ; use colour 15 for snowflakes
OUT (C),C
LD BC,&7F4b ;  and set to white
OUT (C),C

    ld bc,&7F8c
    out (c),c ; set mode 0

loop: ld b,&f5 ; wait for v-sync
ml2: in a,(c)
rra
jr nc,ml2

LD BC,&7F10 ; change border colour to see routine time
OUT (C),C
LD BC,&7F54
OUT (C),C

call plot ; do the plotting

ld hl,&9000
ld (snow_table_pt),hl
ld hl,scr_buffer
ld (scr_buffer_pt),hl

LD BC,&7F10 ; change border colour to see routine time
OUT (C),C
LD BC,&7F4B
OUT (C),C

; this example only does five snowflakes, so delay to ensure it doesn't execute twice per frame
    ld b,50
dly:
    defs 60
    djnz dly

jr loop

; <<<<< SUBROUTINES >>>>>

; can only be bothered putting in 5 unique addresses!
plot: ld b,5 ; number of plots (X res in MODE 0 is 160)

; initialise loop init variables
ld (SaveSP+1),sp ; save the SP at the end of the plot routine
ld sp,&9000 ; screen address list table
exx ; swap in alternate registers
ld de,scr_buffer ; preload de' with buffer pointer
ld bc,&c050 ; and bc' with value to add to hl if next address line requires reset
exx ; swap back out alternate registers

; do pixel select outside of loop for speed, and preload de with mask and
; snow flake byte (have changed to ink 15 for this example)
ld a,(toggle) ; get the toggle to see which pixel we're doing
add &80 ; toggle bit 7 between &00 and &80
ld (toggle),a ; store it for next time
jr c,right_pixel ; carry is set when &80 wraps back to &00: do the right pixel, else do the left pixel

left_pixel:
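ld e,85 ; E = AND mask (%01010101): clear the left pixel, keep the right
ld d,170 ; D = OR value (%10101010): left pixel in ink 15
jr pixel_continue ; and continue

right_pixel:
ld e,170 ; E = AND mask (%10101010): clear the right pixel, keep the left
ld d,85 ; D = OR value (%01010101): right pixel in ink 15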
pixel_continue:

plot_loop:
exx ; swap to alternate registers
pop hl ; get old screen address into hl'
ld a,(de) ; ld saved byte from de'
ld (hl),a ; and write to address just popped

; stack in use, so cannot call 'next_line'.  could load ix & iy with routine & return
; address and jp (ix) / jp (iy), but faster to simply include next_line in loop

next_line: ld a,8 ; add 8 to H (the high byte)
add a,h ; &800 is next pixel line down
ld h,a
jr nc,nl_reset_skip
add hl,bc ; bc' was preloaded with &C050 (&C000 = top of screen, &50 = next character line down)
.nl_reset_skip

push hl ; now put new address from hl' back to address list pointed to by stack
;pop hl ; would need to move forward the sp again to point to next address, with a 'wasted' pop
        ; but can save extra exx pair by doing it later
ld a,(hl) ; ld screen byte from hl'
ld (de),a ; and save to buffer pointed to by de'
inc e ; and move de' on with an 8-bit inc, as the buffer is page-aligned at &9200

exx ; swap out the alternate register set
pop hl ; this is the pop we needed to do before anyway, here it means we have the new address in hl
       ; and can mask with the values preloaded into de
and a,e ; a already contains the data from the screen, so mask it with e to clear the target pixel
or a,d ; now OR with the pixel data in d
; note, the and a,e above could be removed if you only use ink 15

; if we hadn't delayed the pop hl, we would have needed to put exx
; before and after this next instruction
ld (hl),a ; plot it on the screen

djnz plot_loop ; decrease B and loop back if not 0
.SaveSP
ld sp,0
ret


; <<<<< BUFFERS >>>>>

toggle: defb 0

snow_table_pt: defw &9000 ; contains 160 * screen addresses (16 bits), &140 bytes in length

scr_buffer_pt: defw scr_buffer
org &9000
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0


org &9200 ; align to page boundary so the <256-byte buffer can use INC L etc.

scr_buffer: defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
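The save-under bookkeeping in this loop works as follows for each flake: restore the byte the flake covered, step down a line, save what is about to be covered, then plot. Below is a Python model of one frame. It is a sketch in which memory is a 64K bytearray, and plot_frame, table and buffer are hypothetical stand-ins for the stack-addressed table at &9000 and the page-aligned buffer at &9200. It also assumes the buffer starts out holding the real background bytes. The Z80 version starts with zeros, which is fine on a cleared screen.

```python
def next_line(hl: int) -> int:
    """CPC screen address one pixel line down (same maths as next_line)."""
    h = (hl >> 8) + 8
    out = ((h & 0xFF) << 8) | (hl & 0xFF)
    return (out + 0xC050) & 0xFFFF if h > 0xFF else out

def plot_frame(mem: bytearray, table: list, buffer: list, mask: int, value: int) -> None:
    """One pass of the loop per flake: restore, step down, save, plot."""
    for i, old in enumerate(table):
        mem[old] = buffer[i]                    # pop hl / ld a,(de) / ld (hl),a
        new = next_line(old)                    # inlined next_line
        table[i] = new                          # push hl (write back to the table)
        buffer[i] = mem[new]                    # ld a,(hl) / ld (de),a / inc e
        mem[new] = (mem[new] & mask) | value    # and e / or d / ld (hl),a

mem = bytearray(0x10000)
mem[0xC000:] = bytes([0x12]) * 0x4000           # arbitrary background pattern
table = [0xC000, 0xC015]                        # hypothetical flake start addresses
buffer = [mem[a] for a in table]                # assumption: buffer primed with background
for frame in range(2):
    plot_frame(mem, table, buffer, 0x55, 0xAA)  # left pixel, ink 15
print([f"&{a:04X}" for a in table])             # ['&D000', '&D015']
```

After two frames, only the two bytes under the flakes differ from the background. Everything the flakes passed over has been put back.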



redbox

Interesting stuff Axelay, and it certainly is fast!  I knew you wouldn't be able to resist using the stack  ;)

The screen addresses in my table start at the left of the screen (x=0) and go to the right (x=159).  Obviously, because 1 byte holds two pixels, I had to have a toggle routine which decided whether to print the left or right pixel at the current screen address.  I did try and do this by XORing the pixel (to swap it between &80 and &40) but came unstuck when I needed to AND/OR it with what was on the screen, so I used the quick fix of a toggle  ;)

Will dissect the routine later and see if I can get it working in my main loop.
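For what it's worth, XOR can drive the toggle as long as the AND mask is flipped along with the pixel value. A Python sketch of that idea, using ink 1 as in redbox's routine, follows. The variable names and the four-entry table are hypothetical. On the Z80, the mask flip would be an XOR &FF (or CPL if the mask is in A).

```python
value, mask = 0x80, 0x55      # start on a left pixel (ink 1): mask keeps the right pixel
screen = bytearray(2)
addrs = [0, 0, 1, 1]          # hypothetical table: x = 0..3 share two screen bytes
for a in addrs:
    screen[a] = (screen[a] & mask) | value
    value ^= 0xC0             # &80 <-> &40: swap which pixel we plot
    mask ^= 0xFF              # &55 <-> &AA: swap which pixel we keep
print(screen.hex())           # c0c0
```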

Axelay

Quote from: redbox on 09:45, 06 December 10
Interesting stuff Axelay, and it certainly is fast!  I knew you wouldn't be able to resist using the stack  ;)

The screen addresses in my table start at the left of the screen (x=0) and go to the right (x=159).  Obviously, because 1 byte holds two pixels, I had to have a toggle routine which decided whether to print the left or right pixel at the current screen address.  I did try and do this by XORing the pixel (to swap it between &80 and &40) but came unstuck when I needed to AND/OR it with what was on the screen, so I used the quick fix of a toggle  ;)

Will dissect the routine later and see if I can get it working in my main loop.
I admit it is hard remembering that, sometimes, the stack isn't the best way  :)


I may have misunderstood what the toggle was trying to do then, in which case, just leave that part out.

redbox

I got it working but had to put the toggle back into the loop because of the way I'm drawing pixels to the screen.

However, your routine is over twice as fast as mine so many thanks Axelay!  :)

I love the way you use the alternate register set (which is something Fano mentioned to me before) and this is a good example of it working well - I did try it but couldn't make it work, so your code will help with that in the future.

TFM

TFM of FutureSoft
Also visit the CPC and Plus users favorite OS: FutureOS - The Revolution on CPC6128 and 6128Plus

redbox


TFM

In case somebody wants to take a look at the source:

- goto: www.futureos.de

- click on Download (blue box, at the left)

- scroll down to "Source Codes"

- click on "Applications", save the DSK file on your system and transfer it to a CPC with a 3.5" drive and XD-DOS, V-DOS or FutureOS (or, in the worst case, use an emulator...).

  (It's 0.7 MB Vortex format, you can read it with FutureOS, V-DOS or XD-DOS)

- Mount DSK... (usually on drive B, when using XD-DOS and MAXAM)

- SCH.MAX contains the source (also the SCH.GFX file is needed, contains GFX)

Happy X-Mas ;-)

Devilmarkus

Well... The same is also possible without Future OS.

I'll attach the original DSK + sources.
Have fun!
When you put your ear on a hot stove, you can smell how stupid you are ...

Amstrad CPC games in your webbrowser

JavaCPC Desktop Full Release

TFM

Quote from: Devilmarkus on 21:01, 06 December 10
Well... The same is also possible without Future OS.

I'll attach the original DSK + sources.
Have fun!

Thanks for providing the original source  :)  (which I don't have). It may not be that different...  ::)

Devilmarkus

I also adapted the original source with disassembled GFX and made an all-in-one assembler file.

CALL &100 - starts the demo
CALL &169D - stores the demo to disk

ENTER while demo is running: Change scene

Something about the falling snow and how it "sticks" on the grass and trees:
(Originally in German; translated below)
Quote
Hi Markus,

On the CPC I wrote two different snow routines.
In one of them (whose sources I can't find at the moment), the snow settled on the
background graphics.
In the other one (the one I sent you), there were two versions of the background graphics:
a green version without snow and a white, snow-covered version.

When a snowflake touched a green pixel, that pixel was replaced by the corresponding
white pixel from the snow-covered version. The advantage was that I could better
control where on the trees the snow should settle and where it shouldn't.
This means that, for each of your graphics, you also have to draw and include a version with snow
in all the places where the snow should settle.

Regards,
Georg
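Georg's trick is essentially a per-pixel copy from a second, pre-drawn background. Below is a minimal Python sketch of the idea. It makes several assumptions: one list entry per pixel rather than packed CPC bytes, made-up pen numbers, a flake that stops as soon as it touches green, and a hypothetical drop_flake helper. Drawing and erasing the flake itself is left out.

```python
SKY, GREEN, WHITE = 0, 1, 2      # hypothetical pen numbers

def drop_flake(x: int, y: int, screen, snowy):
    """Move a flake at (x, y) down one pixel row.
    Returns the new y, or None once the flake has landed.
    screen: live picture; snowy: pre-drawn snow-covered copy (lists of rows)."""
    below = y + 1
    if below >= len(screen):                 # fell off the bottom
        return None
    if screen[below][x] == GREEN:            # touched a green pixel:
        screen[below][x] = snowy[below][x]   # swap in the snowy version's pixel
        return None
    return below

screen = [[SKY, SKY],
          [GREEN, GREEN]]
snowy = [[SKY, SKY],
         [WHITE, GREEN]]                     # the artist only wants snow on the left
print(drop_flake(0, 0, screen, snowy), screen[1])  # None [2, 1]
print(drop_flake(1, 0, screen, snowy), screen[1])  # None [2, 1]
```

Where the snowy copy is still green, the copy changes nothing. That is how the artist decides where snow may settle.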

Axelay

Oh, I just realised I left the toggle part of my sample done a 'different' way from what the comments say. How... mildly annoying  :)


Quote from: redbox on 17:42, 06 December 10
I got it working but had to put the toggle back into the loop because of the way I'm drawing pixels to the screen.

However, your routine is over twice as fast as mine so many thanks Axelay!  :)

I love the way you use the alternate register set (which is something Fano mentioned to me before) and this is a good example of it working well - I did try it but couldn't make it work, so your code will help with that in the future.
 
Happy to help! :)


Hmm, if you've put the toggle back in the loop, perhaps a way to speed it up slightly, if you haven't already and it fits your requirements, would be to use the mask itself as the toggle?  (using RLCA, for example)
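RLCA maps &55 to &AA and back again, so rotating the AND mask once per flake alternates left and right with no separate toggle variable. A Python sketch follows. The helper names are hypothetical, and it assumes the OR value is derived from a byte holding the wanted ink in both pixels (&FF for ink 15, &C0 for ink 1), masked with the complement of the current mask.

```python
def rlca(a: int) -> int:
    """Z80 RLCA on an 8-bit value: rotate left, old bit 7 goes into bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF

def plot_flakes(screen: bytearray, addrs, both: int) -> None:
    """Plot one pixel per table entry, alternating left/right.
    `both` holds the wanted ink in both pixels; the OR value is `both`
    masked with the complement of the current AND mask."""
    mask = 0x55                                  # first flake: left pixel
    for a in addrs:
        screen[a] = (screen[a] & mask) | (both & ~mask & 0xFF)
        mask = rlca(mask)                        # &55 <-> &AA: the mask is the toggle

screen = bytearray(2)
plot_flakes(screen, [0, 0, 1], 0xFF)             # ink 15, three flakes
print(screen.hex())                              # ffaa
```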

redbox

Quote from: Axelay on 08:52, 07 December 10
Oh, just realised I left the toggle part in my sample being done a 'different' way to the comments, how... mildly annoying  :)

Don't worry, I did see what you've done.  Nice use of the carry flag  :)

Quote from: Axelay on 08:52, 07 December 10
Hmm, if you've put the toggle back in the loop, perhaps a way to speed it slightly, if you havent already and it works with your requirements, would be to use the mask itself as the toggle?  (using rlca for example)

Now that is a good idea.  Will see how much frame time I have left when I've bolted on the other routines and may well come back to that.

Working at bit level with the Z80 is so incredibly quick and often overlooked.  I sometimes start thinking about it when day-dreaming (thinking I'll come up with a mega screen address routine, but you've already done something towards that with set 3,h - haha!) and I never tell the truth when someone asks "what you thinking about?"  8)

redbox

Quote from: redbox on 11:20, 07 December 10
Working at bit level with the Z80 is so incredibly quick and often overlooked.

Talking about bit manipulation got me thinking about a solution to a problem I was having...

I basically wanted to turn an ASIC hardware sprite X location (stored at &6000, &6008, &6010 ... to &6078) into its corresponding ASIC hardware sprite data location (stored at &4000, &4100, &4200 ... to &4F00).  I could have used tables or nested loop additions etc, but instead I came up with this:


;; entry - HL contains ASIC hardware sprite X position (stored at &6000, &6008, &6010 ... to &6078)

        ld a,l            ;load L (ASIC X pos) into A
        rrca              ;divide it...
        rrca              ;...by 8 to turn it into a hardware sprite number
        rrca              ;...between 0 and 15
        and %00001111     ;keep only bits 0-3 (sprite number 0-15)
        add &40           ;A now contains between &40 to &4F
        ld h,a            ;put this into the high byte
        ld l,0            ;and reset the low byte
        ld (spr_asic),hl  ;HL now contains ASIC sprite location (&4000 to &4F00), store it in buffer



I used 3 * RRCA with the AND because it's quicker than 3 * SRA A (RRCA takes 4 T-states against 8 for SRA A, so it comes out ahead even with the AND). 

Just thought I'd share it  8)
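To double-check the mapping for all 16 sprites, here is a Python model of the routine. It is a sketch that mirrors the register operations step by step, and asic_sprite_data is a made-up name.

```python
def asic_sprite_data(x_reg: int) -> int:
    """ASIC sprite X-position register address (&6000, &6008 ... &6078)
    -> that sprite's data address (&4000, &4100 ... &4F00)."""
    a = x_reg & 0xFF                          # ld a,l
    for _ in range(3):                        # rrca x3: rotate right, bit 0 -> bit 7
        a = ((a >> 1) | ((a & 1) << 7)) & 0xFF
    a &= 0x0F                                 # and %00001111
    a += 0x40                                 # add &40
    return a << 8                             # ld h,a / ld l,0

for n in (0, 1, 15):
    print(f"&{0x6000 + 8 * n:04X} -> &{asic_sprite_data(0x6000 + 8 * n):04X}")
```

Every X-register address is a multiple of 8 below &6080, so the three bits RRCA rotates round into the top are always zero here. As far as I can see, the AND is a safety net rather than strictly required.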

steve

If you want snow, you should get a late-1970s computer where the microprocessor accessed memory at any time, even while the circuitry was generating the screen display, resulting in "snow". This was a real problem with a machine called the Nascom, and may also have been a problem with the TRS-80 Model 1.

If you could turn off the firmware or hardware that prevents clashes between the microprocessor and the display circuitry, you would get the snow for free.
