I have programmed this routine as a proof of concept - it works, but it's insanely slow and takes nearly a whole frame.
I would normally optimise it myself but, as you might have guessed, I would probably run out of time to make it good for a near release date ;)
I don't want to do anything insane with it, just speed it up. The other routines I've got to work alongside it are programmed nicely but I don't want to run out of time or have to cut this routine down so it has less of an 'effect'.
Anyone want to help...? 8)
org &8000
di
; <<<<<< MAIN LOOP >>>>>>
loop: ld b,&f5 ; wait for v-sync
ml2: in a,(c)
rra
jr nc,ml2
LD BC,&7F10 ; change border colour to see routine time
OUT (C),C
LD BC,&7F54
OUT (C),C
call plot ; do the plotting
ld hl,&9000
ld (snow_table_pt),hl
ld hl,scr_buffer
ld (scr_buffer_pt),hl
LD BC,&7F10 ; change border colour to see routine time
OUT (C),C
LD BC,&7F4B
OUT (C),C
jr loop
; <<<<< SUBROUTINES >>>>>
plot: ld b,160 ; number of plots (X res in MODE 0 is 160)
plot_loop: ld hl,(snow_table_pt) ; get screen address from table
ld e,(hl) ; and put it into DE
inc hl
ld d,(hl)
ld hl,(scr_buffer_pt) ; get buffer data
ld a,(hl) ; and put it into A
ex de,hl ; put screen address (DE) from earlier into HL
ld (hl),a ; plot buffer data back to screen
call next_line ; calculate next pixel line down
ex de,hl ; put HL (next pixel line down) into DE
ld hl,(snow_table_pt) ; get screen address pointer
ld (hl),e ; and put DE (next pixel line down)...
inc hl ; ...into the table
ld (hl),d
inc hl
ld (snow_table_pt),hl ; and store it
ex de,hl ; put screen address back into HL
push hl ; preserve screen address
ld a,(hl) ; get what's on the screen
ld hl,(scr_buffer_pt) ; and store it into...
ld (hl),a ; ...the screen buffer table
inc l
ld (scr_buffer_pt),hl
pop hl ; restore screen address into HL and A is what's on screen
push af ; preserve what's on screen
ld a,(toggle) ; get the toggle to see which pixel we're doing
xor 1 ; toggle it between 1 and 0
ld (toggle),a ; store it for next time
jr z,right_pixel ; if it's 0, do the right pixel else do left pixel
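left_pixel: pop af ; get back what was on screen
and %01010101 ; clear the left pixel's bits (the right pixel is kept)
or &80 ; OR in the left pixel we want to plot
jr pixel_continue ; and continue
right_pixel: pop af ; same as above but for right pixel
and %10101010 ; clear the right pixel's bits (the left pixel is kept)
or &40 ; OR in the right pixel we want to plot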
pixel_continue: ld (hl),a ; plot it on the screen
djnz plot_loop ; decrease B and loop back if not 0
ret
next_line: ld a,8 ; add 8 to H (the high byte)
add a,h ; &800 is next pixel line down
ld h,a
ret nc ; return if no overflow
ld de,&C050 ; otherwise add &C000+&50 to HL
add hl,de ; &C000 is top of screen, &50 next character line down
ret
; <<<<< BUFFERS >>>>>
toggle: defb 0
snow_table_pt: defw &9000 ; contains 160 * screen addresses (16 bits), &140 bytes in length
scr_buffer_pt: defw scr_buffer
org &9200 ; align to page boundary for <256 bytes (use INC L etc)
scr_buffer: defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
Well I need a break from my current project, so I thought I'd have a go :)
From the di at the beginning of the code sample you've given, I've assumed I can use the stack and alternate registers freely. At a rough count the loop is down to 36 NOPs per snow flake, so 160 snow flakes should take about 90 scan lines (160 x 36 = 5760 NOPs, at 64 NOPs per scan line), plus the loop setup overhead. Of course, it takes longer when the screen address crosses a character line, so it would be ideal if only some snow flakes did that each frame.
I have taken the toggle check out of the loop, which means all snow flakes in a given frame are plotted as left pixels, or all as right pixels. If you prefer a mix of left and right each frame, I'd recommend calling the loop twice with the preloaded registers set appropriately, rather than putting the toggle back in the main loop. At the speed the snow flakes currently move left and right, though, I don't think it'll be an issue! They need to be slower, really.
Anyway, this demonstrates 5 snow flakes, you'll need to add more unique addresses yourself if you want more!
org &8000
run &8000
di
; <<<<<< MAIN LOOP >>>>>>
LD BC,&7F0f ; use colour 15 for snowflakes
OUT (C),C
LD BC,&7F4b ; and set to white
OUT (C),C
ld bc,&7F8c
out (c),c ; set mode 0
loop: ld b,&f5 ; wait for v-sync
ml2: in a,(c)
rra
jr nc,ml2
LD BC,&7F10 ; change border colour to see routine time
OUT (C),C
LD BC,&7F54
OUT (C),C
call plot ; do the plotting
ld hl,&9000
ld (snow_table_pt),hl
ld hl,scr_buffer
ld (scr_buffer_pt),hl
LD BC,&7F10 ; change border colour to see routine time
OUT (C),C
LD BC,&7F4B
OUT (C),C
; this example only does five snow flakes, so delay to ensure it doesn't execute twice per frame
ld b,50
dly:
defs 60
djnz dly
jr loop
; <<<<< SUBROUTINES >>>>>
; can only be bothered putting in 5 unique addresses!
plot: ld b,5 ; number of plots (X res in MODE 0 is 160)
; initialise loop init variables
ld (SaveSP+1),sp ; store SP into the 'ld sp' operand at SaveSP, so it is restored at the end of the plot routine
ld sp,&9000 ; screen address list table
exx ; swap in alternate registers
ld de,scr_buffer ; preload de' with buffer pointer
ld bc,&c050 ; and bc' with value to add to hl if next address line requires reset
exx ; swap back out alternate registers
; do pixel select outside of loop for speed, and preload de with mask and
; snow flake byte (have changed to ink 15 for this example)
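ld a,(toggle) ; get the toggle to see which pixel we're doing
add &80 ; flip bit 7, so the toggle alternates between &80 and 0
ld (toggle),a ; store it for next time
jr c,right_pixel ; carry is set when it wraps back to 0: do the right pixel, else do left pixel
left_pixel:
ld e,85 ; AND mask (%01010101) clears the left pixel
ld d,170 ; OR value (%10101010) is the left pixel in ink 15
jr pixel_continue ; and continue
right_pixel:
ld e,170 ; AND mask (%10101010) clears the right pixel
ld d,85 ; OR value (%01010101) is the right pixel in ink 15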
pixel_continue:
plot_loop:
exx ; swap to alternate registers
pop hl ; get old screen address into hl'
ld a,(de) ; ld saved byte from de'
ld (hl),a ; and write to address just popped
; stack in use, so cannot call 'next_line'. could load ix & iy with routine & return
; address and jp (ix) / jp (iy), but faster to simply include next_line in loop
next_line: ld a,8 ; add 8 to H (the high byte)
add a,h ; &800 is next pixel line down
ld h,a
jr nc,nl_reset_skip
add hl,bc ; bc' was preloaded with &C050: &C000 is top of screen, &50 next character line down
.nl_reset_skip
push hl ; now put new address from hl' back to address list pointed to by stack
;pop hl ; would need to move forward the sp again to point to next address, with a 'wasted' pop
; but can save extra exx pair by doing it later
ld a,(hl) ; ld screen byte from hl'
ld (de),a ; and save to buffer pointed to by de'
inc e ; and move de' on with 8 bit inc, as buffer page aligned to &9200
exx ; swap out the alternate register set
pop hl ; this is the pop we needed to do before anyway, here it means we have the new address in hl
; and can mask with the values preloaded into de
and a,e ; a already contains the data from the screen, so mask it with e to clear the target pixel
or a,d ; now OR in the pixel data in d
; note, the and a,e above could be removed if you only use ink 15
; if we hadnt delayed the pop hl, we would have needed to put exx
; before and after this next instruction
ld (hl),a ; plot it on the screen
djnz plot_loop ; decrease B and loop back if not 0
.SaveSP
ld sp,0
ret
; <<<<< BUFFERS >>>>>
toggle: defb 0
snow_table_pt: defw &9000 ; contains 160 * screen addresses (16 bits), &140 bytes in length
scr_buffer_pt: defw scr_buffer
org &9000
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
defb &00,&c0,&15,&c0,&05,&c0,&10,&c0,&20,&c0
org &9200 ; align to page boundary for <256 bytes (use INC L etc)
scr_buffer: defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
defb &00,&00,&00,&00,&00,&00,&00,&00,&00,&00
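The "call the loop twice" idea mentioned before the listing could be sketched roughly as below. This is only a sketch under assumptions, not a tested replacement: it assumes the address table at &9000 has been reordered so the first 80 entries are left-pixel snow flakes and the last 80 are right-pixel ones, and the pass counter in C and the labels next_pass and no_wrap are made up for the example. Since the stack pointer is walking the table, the loop can't actually be CALLed, so it is run twice in place with D and E swapped between the passes. It uses scr_buffer from the listing above.

```z80
plot:       ld (SaveSP+1),sp    ; SP is borrowed as the table pointer, so save it
            ld sp,&9000         ; address table (assumed: left-pixel flakes first)
            exx
            ld de,scr_buffer    ; de' = saved background bytes
            ld bc,&c050         ; bc' = adjustment for the next character line
            exx
            ld de,&aa55         ; d = left pixel in ink 15, e = mask clearing the left pixel
            ld c,2              ; two passes: left pixels, then right pixels
next_pass:  ld b,80             ; snow flakes per pass (half of 160)
plot_loop:  exx
            pop hl              ; old screen address
            ld a,(de)
            ld (hl),a           ; restore what was under the flake
            ld a,8
            add a,h
            ld h,a              ; next pixel line down
            jr nc,no_wrap
            add hl,bc           ; crossed a character line
no_wrap:    push hl             ; write the new address back into the table
            ld a,(hl)
            ld (de),a           ; save what is under the new position
            inc e
            exx
            pop hl              ; step SP past this entry, hl = new address
            and e               ; clear the target pixel
            or d                ; and OR in the snow flake
            ld (hl),a
            djnz plot_loop
            ld a,d              ; swap pixel and mask for the other pixel
            ld d,e
            ld e,a
            dec c
            jr nz,next_pass
SaveSP:     ld sp,0             ; operand patched with the saved SP
            ret
```

As in the listing, this relies on interrupts staying disabled while SP points at the table.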
Interesting stuff Axelay, and it certainly is fast! I knew you wouldn't be able to resist using the stack ;)
The screen addresses in my table start at the left of the screen (x=0) and go to the right (x=159). Obviously, because 1 byte holds two pixels, I had to have a toggle routine which decided whether to print the left or right pixel at the current screen address. I did try and do this by XORing the pixel (to swap it between &80 and &40) but came unstuck when I needed to AND/OR it with what was on the screen, so I used the quick fix of a toggle ;)
Will dissect the routine later and see if I can get it working in my main loop.
Quote from: redbox on 09:45, 06 December 10
Interesting stuff Axelay, and it certainly is fast! I knew you wouldn't be able to resist using the stack ;)
The screen addresses in my table start at the left of the screen (x=0) and go to the right (x=159). Obviously, because 1 byte holds two pixels, I had to have a toggle routine which decided whether to print the left or right pixel at the current screen address. I did try and do this by XORing the pixel (to swap it between &80 and &40) but came unstuck when I needed to AND/OR it with what was on the screen, so I used the quick fix of a toggle ;)
Will dissect the routine later and see if I can get it working in my main loop.
I admit it is hard remembering that, sometimes, the stack isn't the best way :)
I may have misunderstood what the toggle was trying to do then, in which case, just leave that part out.
I got it working but had to put the toggle back into the loop because of the way I'm drawing pixels to the screen.
However, your routine is over twice as fast as mine so many thanks Axelay! :)
I love the way you use the alternate register set (which is something Fano mentioned to me before) and this is a good example of it working well - I did try it but couldn't make it work, so your code will help with that in the future.
Snowfall:
http://www.youtube.com/watch?v=VtVYaQuDw5c
Quote from: TFM/FS on 19:25, 06 December 10
http://www.youtube.com/watch?v=VtVYaQuDw5c
Thanks, but doesn't help me much :)
Nice algorithm though.
In case somebody wants to take a look at the source:
- go to: http://www.futureos.de
- click on Download (blue box, at the left)
- scroll down to "Source Codes"
- click on "Applications", save the DSK file on your system, transfer to a CPC with 3.5" drive and XD-DOS, V-DOS or FutureOS (in the worst case take an emulator...).
(It's 0.7 MB Vortex format, you can read it with FutureOS, V-DOS or XD-DOS)
- Mount DSK... (usually on drive B, when using XD-DOS and MAXAM)
- SCH.MAX contains the source (also the SCH.GFX file is needed, contains GFX)
Happy X-Mas ;-)
Well... The same is also possible without Future OS.
I'll attach the original DSK + sources.
Have fun!
Quote from: Devilmarkus on 21:01, 06 December 10
Well... The same is also possible without Future OS.
I'll attach the original DSK + sources.
Have fun!
Thanks for providing the original source (which I don't have) :) It may not be that different... ::)
I also adapted the original source with disassembled GFX and made an all-in-one assembler file.
CALL &100 - starts the demo
CALL &169D - stores the demo to disk
ENTER while demo is running: Change scene
Something about the falling snow and how it "sticks" on the grass and trees:
(Translated from German)
Quote
Hi Markus,
I wrote two different snow routines on the CPC.
In one of them (whose sources I can't find right now) the snow settled on the background graphics.
In the other (the one I sent you) there were two versions of the background graphics:
a green version without snow and a white, snow-covered version.
When a snow flake touched a green pixel, that pixel was replaced by the corresponding
white pixel from the snow-covered version. The advantage was that I had better control over
where the snow should settle on the trees and where it should not.
That means you also have to draw and include a version of each of your graphics with snow
in all the places where it should settle.
Regards,
Georg
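A rough sketch of the pixel replacement step in Z80, not Georg's actual code. It assumes the snow-covered version of the picture sits at &4000-&7FFF with exactly the same layout as the visible screen at &C000, and that C holds a pixel select mask (&AA for the left pixel, &55 for the right). The routine name stick_pixel and that memory layout are assumptions made for the example, and deciding whether the flake really touched a green pixel is left to the caller.

```z80
; entry: hl = screen address (&c000-&ffff) the snow flake touched
;        c  = &aa for the left pixel, &55 for the right pixel
; exit:  a, d and e corrupted
stick_pixel:
            ld d,h
            res 7,d         ; &c000-&ffff -> &4000-&7fff (assumed snowy copy)
            ld e,l
            ld a,(de)       ; byte from the snow-covered version
            xor (hl)        ; bits where it differs from the screen
            and c           ; keep only the differences in the chosen pixel
            xor (hl)        ; chosen pixel now comes from the snowy copy, the other is unchanged
            ld (hl),a
            ret
```

The XOR/AND/XOR sequence merges the two bytes under the mask without needing a second mask for the pixel that stays.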
Oh, just realised I left the toggle part in my sample being done a 'different' way to the comments, how... mildly annoying :)
Quote from: redbox on 17:42, 06 December 10
I got it working but had to put the toggle back into the loop because of the way I'm drawing pixels to the screen.
However, your routine is over twice as fast as mine so many thanks Axelay! :)
I love the way you use the alternate register set (which is something Fano mentioned to me before) and this is a good example of it working well - I did try it but couldn't make it work, so your code will help with that in the future.
Happy to help! :)
Hmm, if you've put the toggle back in the loop, perhaps a way to speed it up slightly, if you haven't already and it works with your requirements, would be to use the mask itself as the toggle? (using rlca for example)
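A minimal sketch of that, assuming D holds the pixel byte (&80 or &40, as in the original routine) and E holds the AND mask (&55 or &AA). The routine name next_pixel is made up for the example:

```z80
; entry: d = pixel byte (&80 left or &40 right), e = AND mask (&55 or &aa)
; exit:  d and e switched to the other pixel of the byte, a corrupted
next_pixel: ld a,e
            rlca            ; rotating the mask flips it: &55 -> &aa, &aa -> &55
            ld e,a
            ld a,d
            xor &c0         ; flips the pixel byte: &80 <-> &40
            ld d,a
            ret
```

Inside the stack-based loop it would have to be inlined rather than CALLed, and placed after the ld (hl),a, since A still holds the screen byte before that point.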
Quote from: Axelay on 08:52, 07 December 10
Oh, just realised I left the toggle part in my sample being done a 'different' way to the comments, how... mildly annoying :)
Don't worry, I did see what you've done. Nice use of the carry flag :)
Quote from: Axelay on 08:52, 07 December 10
Hmm, if you've put the toggle back in the loop, perhaps a way to speed it up slightly, if you haven't already and it works with your requirements, would be to use the mask itself as the toggle? (using rlca for example)
Now that is a good idea. Will see how much frame time I have left when I've bolted on the other routines and may well come back to that.
Working at bit level with the Z80 is so incredibly quick and often overlooked. I sometimes start thinking about it when day-dreaming (thinking I'll come up with a mega screen address routine, but you've already done something towards that with set 3,h - haha!) and I never tell the truth when someone asks "what you thinking about?" 8)
Quote from: redbox on 11:20, 07 December 10
Working at bit level with the Z80 is so incredibly quick and often overlooked.
Talking about bit manipulation got me thinking about a solution to a problem I was having...
I basically wanted to turn an ASIC hardware sprite X location (stored at &6000, &6008, &6010 ... to &6078) into its corresponding ASIC hardware sprite data location (stored at &4000, &4100, &4200 ... to &4F00). I could have used tables or nested loop additions etc, but instead I came up with this:
;; entry - HL contains ASIC hardware sprite X position (stored at &6000, &6008, &6010 ... to &6078)
ld a,l ;load L (ASIC X pos) into A
rrca ;divide it...
rrca ;...by 8 to turn it into a hardware sprite number
rrca ;...between 0 and 15
and %00001111 ;ignore everything in rotated bits apart from 0-15
add &40 ;A now contains between &40 to &4F
ld h,a ;put this into the high byte
ld l,0 ;and reset the low byte
ld (spr_asic),hl ;HL now contains ASIC sprite location (&4000 to &4F00), store it in buffer
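I used 3 * RRCA with the AND because it's quicker than 3 * SRA A: on the CPC each RRCA takes 1 NOP and the AND takes 2 (5 in total), whereas each SRA A takes 2 (6 in total).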
Just thought I'd share it 8)
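If L really is always one of &00, &08 ... &78, then bits 0-2 and bit 7 are already zero, so the three rotates only bring zero bits into the top of A and the AND could probably be dropped. A sketch under that assumption (the label asic_spr_addr is hypothetical, and OR stands in for ADD since the bits can't overlap):

```z80
; entry: hl = ASIC sprite X position address (&6000, &6008 ... &6078)
; assumption: L is an exact multiple of 8 and below &80
asic_spr_addr:
            ld a,l          ; a = 8 * sprite number
            rrca
            rrca
            rrca            ; a = sprite number (0-15), only zero bits rotated into the top
            or &40          ; a = &40 to &4f
            ld h,a          ; put this into the high byte
            ld l,0          ; and reset the low byte
            ld (spr_asic),hl
            ret
spr_asic:   defw 0
```

If L could ever hold anything else, the version with the AND is the safer one.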
If you want snow, you should get a late-70s computer where the microprocessor could access memory at any time, even while the circuitry was generating the screen display, which resulted in "snow". This was a real problem with a machine called the Nascom, and may also have been a problem with the TRS-80 Model 1.
If you could turn off the firmware or hardware that prevents clashes between the microprocessor and the display circuitry, you would get the snow for free.