CPCWiki forum

General Category => Programming => Topic started by: db6128 on 05:11, 06 December 12

Title: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: db6128 on 05:11, 06 December 12
Edit: This is now massively outdated, as it's a lot faster and I've also released several other versions, including most recently QuadLife (page 3) and also a MODE 1 version that I might carry on with at some point. Please see the later posts for up-to-date information. :)

[original post]

Well, now that several embarrasingly self-inflicted bugs are gone and 03:40 in the morning has arrived—here it is! the first and most basic version of my implementation of John Conway's Game of Life (http://www.conwaylife.com/wiki/index.php?title=Conway%27s_Game_of_Life) for the CPC, ticking along at what I estimate as a not-at-all-shabby rate of about 20000 cells, i.e. just under 2 generations per second, all in glorious double-height MODE 0 pixels, no less!

Perhaps before the following text sends you to sleep, regardless of your timezone, just RUN"AMSTRIFE on the attached DSK :D By the way, the name is not necessarily final and, yes, probably very silly, but it's better than nothing!

Backstory: DISCLAIMER THIS BIT MIGHT BE VERY BORING SKIP IT IF YOU WANT ;) I spent ages—and lost far too much sleep and other advisable activities—playing around with Life and related cellular automata in more modern contexts as a (very amateur) hobby and way to learn more about programming. I  started with a terminal-based version programmed in C as one of my earliest programs while beginning to learn that language. That eventually led to an implementation in C/SDL that eventually got pretty darn fast—but became so horribly hackish (hideous macros and bitfields everywhere) that I ended up deleting it outright. Next, I taught myself the basics of C++ and eventually put together an object-oriented variant that incorporated various fairly decent features (see below) and was running pretty brilliantly for a while. But... it started to get slow, seemingly suddenly and for no reason (perhaps it was my laptop...), and that put me off the idea for a while, especially after I broke my laptop (...could it see the future?). I'm now back on Windows (secured a laptop with W7 whilst I still could) but haven't bothered to set up SDL or anything else, or even a proper IDE really.

Couple all of that with my renewed interest and/or extreme procrastination in the world of the CPC, and this is what we get! I've been wanting to do this for a while, but my command of ASM is only just getting acceptable, and it's been quite tricky to work out the logic behind this on the Z80 and in a fairly optimised way. Having said that, most of the progress was made tonight—er, well, it's now the very early morning, so I mean last night...—which is definitely very encouraging!

Of course, this is just the most basic first stage and a proof-of-concept. I'm sure plenty of people have done similar things before. I hope to get it to a state where it stands out, though—not that it'll ever be Hashlife (http://en.wikipedia.org/wiki/Hashlife) or anything, but—by optimising and expanding it as much as my abilities allow. Immediate plans include...

* quite possibly a MODE 1 version and thus a larger world, at least for the basic algorithm (see below) – assuming it doesn't get far too slow with all the extra RAM-ing and suchlike
* configurability, particularly including an editor for the initial pattern and in-game changes (reading keyboard input does not look fun... Come back, firmware; all is forgiven)
* a custom-made text-displaying routine and font with which to display statistics on request; real-time would slow things to a crawl and really just gets in the way, anyhow
* and then the actually fun stuff, ported from (mostly my memory of) my relatively successful C++ version, namely things like these:
—identifying just-born and just-died cells with distinct colours;
—implementing Generations, the variant in which cells have lifespans, which changes their interactions considerably;

—adding the option to give the world/field a topology, i.e. wrapping edges/bounded grids (http://golly.sourceforge.net/Help/bounded.html): toroidally, Klein bottle–esque, spherical, etc.
—implementing QuadLife, in which cells can have a user-defined number of up to four colours and inherit their colour from their 'parents'; and
—my proudest achievement so far since I started wasting my time with this back in the early days of struggling (and not sleeping) to teach myself C: something I nicknamed Inheritors, which combines Generations and the colour-inheriting system to create technicolour madness all over your screen.


With the emphasised caveats about how rudimentary this is aside, it is quite a bit faster than a completely naïve version would be and already supports completely customisable rules as to which numbers of neighbours allow cells to survive or cause them to be born. Having said that, standard Life is only of any real interest when using its classic rules (i.e. the ones set up here). I hope to have Generations working ASAP because it enables various combinations of rules, many of which produce really interesting behaviour. I could link to some examples, but I'll save that as a treat for you all, and it wouldn't be as good if you weren't watching it on a CPC, anyway. ;)

Z80babble: My immediate plan was to leave Generations limited to a maximal lifespan of 16 generations, which would allow me to remain using one byte per cell as in the bog-standard version of Life you see before you. However, I am seriously tempted to store cells in pairs of bytes and plough through the world at a ridiculous pace using my new best friend, SP >:-D That would also be necessary for more 'complex' 'algorithms' (scare-quoted because I don't want to mislead people into thinking I'm any good at maths...) such as Inheritors.

Anyway, I hope you enjoy this. Please give me any feedback or suggestions you have! And again, please stay tuned for many—and massive—updates. :)
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 03:47, 08 December 12
My expirements with MODE 1 have showed that it is not worth it: even without double-buffering, it doesn't allow much extra size, due to the memory required rapidly increasing with the dimensions of the grid - and the relatively small increase in the size of the grid is almost complete offset by the smaller pixels meaning that I could only fill just over a quarter of the screen.

So, good old double-height MODE 0 it is! I've maxed that out: from my initial post, it's now up from 128x96 to 128x110, and I have just &1000 bytes of memory free for code and data. I guess I can move single-use code and data into areas that will subsequently be overwritten. I might need to remove the double-buffering for certain algorithms/variants; having said that, it doesn't look too bad with a single screen; sure, you see a line moving up as the cells are updated, but it's not really distracting, actually quite cool in its own way.

Speaking of double-buffering, though, I wonder if anyone has an idea about this: when I have it enabled, there is slight flicker, seemingly between the two screens, at the bottom of the grid (only visible if it's populated at the time, of course). Why could this be? I can't think of anything I'm doing wrong that would cause it, but then I can't see why it would just happen either.

I've attached the slightly updated (126x110, etc.) version if you can offer any thoughts about the above issue, and just for anyone who's interested. There's a lot more to come from this, but I'm having a little break for a couple of days. . . Although I said that yesterday, and so today I ended up . . . just trying to program something different instead. :D
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Axelay on 07:09, 08 December 12
Quote from: db6128 on 03:47, 08 December 12
Speaking of double-buffering, though, I wonder if anyone has an idea about this: when I have it enabled, there is slight flicker, seemingly between the two screens, at the bottom of the grid (only visible if it's populated at the time, of course). Why could this be? I can't think of anything I'm doing wrong that would cause it, but then I can't see why it would just happen either.



I've had a bit of a look at what I presume is the point you are doing the double buffering, and I think the problem is you have not waited for vsync with the swap to &4000 after your screen copy.


I've attached a screen grab from Winape halted at the breakpoint I inserted just before your code swaps to the screen at &4000.  The horizontal dotted line Winape has put on the screen shows that at the point you are telling the CRTC to now use &4000 as the visible screen, the beam is not quite halfway through refreshing the monitor.  More importantly, the CRTC is in the middle of displaying the current screen, and it will not reflect a change to the screen base address until it begins a new screen.  So potentially, your code will go on to start modifying &c000 while the CRTC is still pointing to that area of memory for the current screen.  I believe that is what is visible briefly at the bottom of the screen.


Also, I hope you dont mind a comment on your screen copy.  You have used a short LDI string, I guess because LDI strings are faster than LDIR?  But your LDI string is too short if speed was the reason for using it.  LDI is only 1 NOP faster than LDIR per byte, and with just 8 LDIs, you only have 8 NOPs saved in that loop compared to LDIR, but your loop structure with two counters is at least 8 NOPs by itself, so there is no speed gain with the LDI list that short.  Although if I am not mistaken your copy completes with HL=&0000?  If that's correct then a faster loop would be to replace both the A and IXH counter checks with LD A,H / OR A,L / JR NZ,Loop after your LDI string.

Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 16:09, 08 December 12
Hi Axelay, thanks for a great reply! I have been heavily editing this post (as is characteristic of me) as I think more about your suggestions.

QuoteI've had a bit of a look at what I presume is the point you are doing the double buffering, and I think the problem is you have not waited for vsync with the swap to &4000 after your screen copy. [...] just before your code swaps to the screen at &4000 [...]  the beam is not quite halfway through refreshing the monitor.  More importantly, the CRTC is in the middle of displaying the current screen, and it will not reflect a change to the screen base address until it begins a new screen.  So potentially, your code will go on to start modifying &c000 while the CRTC is still pointing to that area of memory for the current screen.  I believe that is what is visible briefly at the bottom of the screen.
This is fantastic - thanks very much! :) In fact, now that I think about it, that would only have started after I changed my vertical counter to run negatively over the scanlines instead of from top to bottom - so that I could use DEC C:JP NZ instead of DEC C:LD A,C:CP 110 - which was an infinitesimal optimisation at best.

Thus, the new rendering frame began at the bottom of the screen - just in time for the CRTC to see it, I presume, as per your insights. Actually, I might have been looping backwards even before I increased the screen size, but if so, I assume that the lowest line was too high for me to notice the problem.

Changing the loops back to incrementing from 0 to 109, a.k.a. rendering from top to bottom, has everything working perfectly again! :)  That's just the timing of the refresh being kind to me, though! For a moment I was concerned that I might have to do two flybacks or something, or that there'd be no way around it. But after what you said, I realised that it seems I don't need to do even one V-sync, if the CRTC won't apply the new address until the next frame - certainly, removing the wait does not seem to harm anything. So, that's a second good hint!

However, I wonder if something like that will re-emerge when I implement more complex algorithms that might change this rather precarious balance between my code and the (however real or virtual) electron gun..

If I wanted to be really obsessive about trivial optimisations, I suppose I could store my table of scanline addresses in reverse order and go back to decrementing . . . but insanity lies that way, I think, considering how minor the outer loop  over the lines is. In fact, I might even see if I can take another slight hit to speed but gain some valuable memory by removing the table and just incrementing the address as I go along. At this rate, an extra 512 bytes might be a valuable commodity soon! Edit: Yes, it seems that adding manually will take only one more NOP but remove the need for that table - which I have just enthusiastically deleted. Ah, I love finding new tricks! :V

QuoteAlso, I hope you dont mind a comment on your screen copy.  You have used a short LDI string, I guess because LDI strings are faster than LDIR?  But your LDI string is too short if speed was the reason for using it.  LDI is only 1 NOP faster than LDIR per byte, and with just 8 LDIs, you only have 8 NOPs saved in that loop compared to LDIR, but your loop structure with two counters is at least 8 NOPs by itself, so there is no speed gain with the LDI list that short.  Although if I am not mistaken your copy completes with HL=&0000?  If that's correct then a faster loop would be to replace both the A and IXH counter checks with LD A,H / OR A,L / JR NZ,Loop after your LDI string.
I can't complain about constructive comments and suggestions, really; delving into someone else's machine code, especially mine, is an effort that I think should usually be appreciated - and anyway, I think everyone here is too nice to steal anything!

I believe my calculations for this were a bit off before, but the unrolled variant should still be slightly faster (or maybe my concept of micro-time is a bit off), about 15 ms, than a simple LDIR.

Your suggestion about HL is very clever - at first it seemed like the same 16-bit loop counter 'trick' (http://wikiti.brandonw.net/index.php?title=Z80_Optimization#Looping_with_16_bit_counter) could still win (thankfully the added overhead of using IXH was minimal due to its low value) if I increased the level of unrolling - but then I realised that my sums had been wrong (again...) - your version seems to be slightly faster, saving me about 6 ms! Which more than outweighs the puny 1 NOP per scanline additional cost of summing to the next scanline on the screen manually instead of using the table.

Here are my latest sums:

; simple
ldir ; ~&7800x6 = 184324 NOPs / 184 ms

; unroll x8
; thus outer counter must be &7800/&100/8 = 15
ldi ; 8x5 = 40 NOPs
dec a:jr nz ; 4 = 44 x 256
dec ixl:jr nz ; 5 * 15
; = 44x256x15 + 5*15
; = 169035 NOPs / 169 ms

; unroll x12 (16 is not divisible into (&7800/&100))
; thus outer counter must be &7800/&100/12 = 10
ldi ; 12x5 = 60 NOPs
dec a:jr nz; 4 = 64 x 256
dec ixl: jr nz  ; 5 * 10
; = 64x256x10 + 5*10
; = 163890 NOPs / 164 ms

; Axelay's suggestion
ldi ; 16x5 = 80 NOPs
ld a,h ; 1
or l ; 1
jr nz ; 3
; = 85 * 1920 = 163200 NOPs / ~163 ms


So, you're now a contributing author! :D (Incidentally, even though there's no point changing back, I wonder if these would have been enough of a gain to stop the flickering, haha)

Btw, as for unrolling, that's a pretty nifty CLS and memory clear at the beginning, right? ;) It felt even more epic when there was just one gigantic contiguous area of memory to clear! But that layout had the corollary that my screen and data were not adjacent, meaning that I had to do two LDIRs per frame - so, out of a single-time loop and one that happens ever frame, it was obvious which one would have its way with my RAM layout.

Really appreciate the great comments. In particular, it might have taken me a while, perhaps forever, to realise how I'd messed up the screen refreshing, so that's brilliant. Thanks a lot!

Edit to add: I might as well attach the latest revision incorporating all the changes I've mentioned/rambled at length about:
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Axelay on 09:42, 10 December 12
Quote from: db6128 on 16:09, 08 December 12
So, you're now a contributing author! :D (Incidentally, even though there's no point changing back, I wonder if these would have been enough of a gain to stop the flickering, haha)
 
8)

...But I guess I'm hoping you eventually find a way to use both screens as buffers so that copy isnt needed at all.  :)


Quote from: db6128 on 16:09, 08 December 12
Btw, as for unrolling, that's a pretty nifty CLS and memory clear at the beginning, right? ;) It felt even more epic when there was just one gigantic contiguous area of memory to clear! But that layout had the corollary that my screen and data were not adjacent, meaning that I had to do two LDIRs per frame - so, out of a single-time loop and one that happens ever frame, it was obvious which one would have its way with my RAM layout.


Yep, the SP is great for 'misusing'.   :)  Most or all of my previous projects feature using the stack in some way other than 'normal', Sub Hunter especially, though I didn't think of that particular approach myself, that came from Operation Wolf via a comment about it from arnoldemu.



Quote from: db6128 on 16:09, 08 December 12
Really appreciate the great comments. In particular, it might have taken me a while, perhaps forever, to realise how I'd messed up the screen refreshing, so that's brilliant. Thanks a lot!




You're welcome.  I look forward to seeing how this, um, evolves.   :D
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 22:59, 10 December 12
Thanks to everyone who's Liked this :)

Quote from: Axelay on 09:42, 10 December 12
8)

...But I guess I'm hoping you eventually find a way to use both screens as buffers so that copy isnt needed at all.  :)
See, the reason I don't use the screen as the buffer is that a simple on/off format is (theoretically) inefficient and also that I've planned to store various other things once I add more features/variations.

I suppose that, for the simplest version where cells indeed can only be on or off (which I might find out is the only one that can run anywhere near fast enough!), it might be the case that the extra speed gained (again, theoretically) from the current type of storage is outweighed by the large copy, and so just using the screen(s) might be best. So now you've made me consider creating an alternative version to test that some time!

Actually, sorry :P but I've already changed the branch after the LDIs - it's a byte shorter, 2 NOPs faster, and not dependent on the final address(es) to use JP PE,label instead.

QuoteYep, the SP is great for 'misusing'.   :)  Most or all of my previous projects feature using the stack in some way other than 'normal', Sub Hunter especially, though I didn't think of that particular approach myself, that came from Operation Wolf via a comment about it from arnoldemu.
It's really handy, too handy to leave in its boring normal role! Aside from CLSing, the other things I've used it for so far are for ploughing through addresses of scanlines in sprites or to store commonly used 16-bit numbers (e.g. the offset needed to go up or down a line). I try to avoid subroutines, so the SP is rarely needed for anything else, in which case I can save and restore it; at all other times, it would be sitting doing nothing if I didn't give it a task, wasting that valuable ADD HL,SP instruction and forcing me to use normal registers instead. :D
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 14:30, 11 December 12
Another minor update and DSK, really just as a proof-of-concept to show how fast it's capable of running without double-buffering the screen – which, for this program, requires copying its entire 16 kB due to the major differences between frames – and also due to a couple of quite significant (and in retrospect, obvious!) optimisations I made. These speed-ups have the following results:

* Disabling double-buffering and hence copying of the screen saves 85 ms per generation.
* More clever (translation: less stupid) mathematics save 3 NOPs for every cell, which, over the current 126×110 grid, is almost 42 ms per generation!
* Different improvements save 2 NOPs for every cell that changes its state at all (on to off or vice-versa); if we assume (with no real mathematical basis : P) that this includes 25% of them, that would increase the saving almost to 50 ms.

So, in total, this version is faster by about 130 ms per generation than the last one I posted. I think that's quite evident as you watch it. I'd estimate it does at least 2.5 generations per second, maybe approaching 3, which is faster than the ~2 achieved by the same source but with double-buffering enabled.

In fact, compared to the first version I posted here, which had a sizeable number of suboptimal parts, the double-buffered variant of this updated version runs slightly faster – despite the fact that it now has 14 more rows! :)

This is now at a stage where it's almost fast enough that the real-time rendering is not too ugly/distracting and therefore double-buffering is not completely necessary. However, I don't really want to remove the option (yes, you'll be able to choose, once I get far enough to have a menu and so forth!). Firstly, it looks nice – if you can tolerate the reduced speed – and more professional; I try to imagine what the 'lay(wo)man user of the CPC' (a mythical creature nowadays, perhaps? :) ) would think, and I think they might prefer, maybe be impressed by, double-buffering! Secondly, once I put more complex algorithms in, the rendering might be too slow and thus make double-buffering important to keep things looking nice, or at least tolerable. Quite a conundrum. : P

My next plan is to add whichever of the following is most cooperative with my attempts to code it: Generations, which allows cells to have lifespans rather than just being on or off; or QuadLife, which allows cells to have one of up to four colours, which are inherited from their 'parents'. In relation to the previous paragraph, if I wanted to have >16 ages/states in Generations, which I'd quite like (although only after getting the <= 16 version working), I would need to disable double-buffering anyway in order to free an extra byte for each cell (and also need to decrease the vertical size of the grid).

Ideally, I'll have one program comprising several automata run by dedicated routines, each optimised for purpose, rather than one algorithm that tries to do everything and is unnecessarily slow on simple rules. But one thing at a time! Anyway, I have to stop CPCing for a few days and do some 'real life'... sigh

Prodatron: I just saw in the list of related threads that you made a version of Life for SymbOS. Thanks for clicking Like on my one! I was seemingly able to compile your source using WinAPE, but SymbOS (2, disc version, WinAPE set to 64+256 kB) tells me that it is "not executable in the SymbOS environment". Is it only for the next edition? Anyway, keep up the good work on all of your projects! I hope I'm not too humbled when I do see your version. :D
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Axelay on 14:38, 11 December 12
Quote from: db6128 on 22:59, 10 December 12
See, the reason I don't use the screen as the buffer is that a simple on/off format is (theoretically) inefficient and also that I've planned to store various other things once I add more features/variations.

I suppose that, for the simplest version where cells indeed can only be on or off (which I might find out is the only one that can run anywhere near fast enough!), it might be the case that the extra speed gained (again, theoretically) from the current type of storage is outweighed by the large copy, and so just using the screen(s) might be best. So now you've made me consider creating an alternative version to test that some time!

Actually, sorry :P but I've already changed the branch after the LDIs - it's a byte shorter, 2 NOPs faster, and not dependent on the final address(es) to use JP PE,label instead.


Devastated!  :laugh:    One day I must have another look at those instructions like JP PE,addr again, I'm so used to using JR for the byte saving I dont usually think about the other options available with JP.

With the software buffer copy, well I'm probably just thinking in terms of game engines too much, the 8 vbls your buffer & screen copy takes seems staggering to me, while compared to the processing per pixel you have to do it looks to be comparatively modest.
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Gryzor on 14:43, 11 December 12
Rubbish gfx, no sound. And a mode 0 game with only two colours? Pffft. Control is also so slow it seems your input has no effect at all.
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Gryzor on 14:44, 11 December 12
...heheh, always loved Life implementations. Now that I think of it I might look for an Android version to just have around... mesmerising concept, and I like what you did there! I understand optimisation comes first, but your to-do ideas are very nice too...

How about a cell counter at some corner?
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 15:06, 11 December 12
Axelay: Yeah, it definitely hurts a lot having to do the copy! Even if I exclude the screen, it still burns. :D But I can't envision another way. For my previous versions in C and C++, I considered combining the evolving pass over the grid with the copy, i.e. getting each cell's properties from one buffer, checking whether it needs to change, saving it to the other buffer, and eventually flipping the pointers to the two buffers (rather than copying) after the full grid has been processed. I settled on a separate block copy as that seemed much simpler and possibly a bit faster. Perhaps my reasoning about the speed was flawed for high-level languages, but on the Z80, if I'm not mistaken, flipping back and forth between the two buffers with every cell would slow things down massively.

Gryzor: Uh-oh, I've been found out! I thought I could just do a lot of PR and gullible people looking for the next Crysis would just give me their money without asking questions. I would have gotten away with it, too...

Thanks for the nice feedback! Optimisation can take a back-seat now*, as I'm quite happy with its current state, so features will be the name of the game. I agree that a counter would be good: in my C(++) versions, I always wanted to include (but never did due to having no idea how to make a GUI) statistics not only for total population and generation#, but also for things like the number of births, deaths, ageings, etc. in the last generation. However, on the CPC, I think these will have to be calculated upon request only (by comparing the two buffers, just before the copy), because to increment/decrement as I go along would slow things down a lot – not only because 16-bit numbers would be needed, but also because I'd probably have to use IX/IY. And we know what they're like! But yeah, there will be statistics of some kind. Thanks for the good suggestion. :)

* I'm sure most programmers know what Donald Knuth would say about your "optimisation comes first", but the Z80 doesn't leave us much choice for things like this!


Quotealways loved Life implementations
Next up, a CPC emulator running in Life running on a CPC? Haha, it's always impressive what people can do with Turing machines and the like when they put their minds to it.
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Gryzor on 15:36, 11 December 12
I think you could 'play' Life in the original Sims, actually. It would be cool if someone made a cpc-emu mod running Life then. A very esoteric joke.
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Prodatron on 00:46, 12 December 12
Wow, it's VERY impressive, how fast this is! Maybe this is the fastest GoL-Implementation on the CPC we have ever seen?!
Realising a GoL-based Turing-machine on the CPC comes closer and closer!  :P
I didn't look at the code, but I wonder:...
- is it optimized for the 23/3 rule? (the original Conway rule for his Game of Life)
- so is the rule "hardwired" or would it be possible to configure other rules?
This summer I developed such an application on the Z80, too. It isn't optimized (and still bugy), and I only focused on the interface/options/figure library/rule settings/memory friendly etc.:

SymbOS - Conway's Game of Life application (http://www.youtube.com/watch?v=SDCTJO9Su9s#)

Maybe you can adopt some of the options and features, too, in your optimized version? If I can provide something, just tell me.

CU,
Prodatron

Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 02:38, 12 December 12
I was just going to post another update, but now I have to fire a giant wall of text at poor Prodatron  :-X

QuoteWow, it's VERY impressive, how fast this is! Maybe this is the fastest GoL-Implementation on the CPC we have ever seen?!
Thanks very much! That would be much-appreciated coming from anyone, but especially one of the CPC wizards. :)

Despite searching quite a bit, the only other versions I could find were one by Nich of CPCGameReviews and other projects (http://users.durge.org/~nich/) and a type-in from Amstrad Action back in 1989 (side-note: I very much like, and hope to take advantage of, how the Web Archive has PDFs of loads of issues). Neither seemed to have been designed with speed in mind.

I don't want mine just to be a barebones show of speed, of course, but being as fast as possible is an important starting-point for any later features, when we have so few MHz to work with - which isn't always a bad thing for forcing us to program well! I can't claim to be a genius in coding; it's only because both the algorithm is simple and the Z80 is limited that I have been able to figure out ways to save cycles, and it's been a really educational experience. More on that story below!

QuoteRealising a GoL-based Turing-machine on the CPC comes closer and closer! :P
Haha, I already proposed the idea of a CPC running Life, and that Life running a CPC emulator. Well, they say nothing's impossible!

QuoteI didn't look at the code, but I wonder:...
- is it optimized for the 23/3 rule? (the original Conway rule for his Game of Life)
- so is the rule "hardwired" or would it be possible to configure other rules?
Rules for birth and survival are looked up from tables, so they could be totally changed if the user wanted. However, I find that Life itself doesn't work very well with rules other than Conway's original one - more evidence for how good a discovery it was, I think! On the other hand, perhaps my main priority is to add support for Generations, in which cells have a lifespan longer than 1: If their count of neighbours goes to a 'death-causing' level, they start to age until they reach their maximal age and die, but during ageing they stop being included in the neighbour-counts of other cells but do continue to occupy space, preventing other cells from being born there. It changes the dynamics a lot and enables many different rules, some of which are really good. Check out the site for the old program MCell and the author Mirek's own rule Star Wars - it's totally fantastic :D

QuoteThis summer I developed such an application on the Z80, too. It isn't optimized (and still bugy), and I only focused on the interface/options/figure library/rule settings/memory friendly etc.:
Great; thanks for the info, and especially the video, since I couldn't find any footage/pictures, after mentioning it earlier:
QuoteProdatron: I just saw in the list of related threads that you made a version of Life for SymbOS. Thanks for clicking Like on my one! I was seemingly able to compile your source using WinAPE, but SymbOS (2, disc version, WinAPE set to 64+256 kB) tells me that it is "not executable in the SymbOS environment". Is it only for the next edition? Anyway, keep up the good work on all of your projects! I hope I'm not too humbled when I do see your version. :D
Speaking of which, the sorts of user-friendly features in your version are very similar to what I wanted to add to my previous versions in C(++) and SDL. Back then, I was halted by a total absence of knowledge about programming GUIs! Now, for the CPC, I want to have configurability but not let it affect speed. So, for example, I would only calculate and display statistics if the user requested them, not constantly/real-time. Ideally, there would be enough clock cycles to do things like that continuously, but there aren't! However, your version looks great - as does all of SymbOS, a very impressive achievement - and the options are eerily similar to what I would program if I could. :D Thanks for the offer of info; I'll definitely let you know if I have any questions/ideas/etc. once my program starts coming together more!

Anyway, thanks again for the compliments :)

Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 03:00, 12 December 12
(update at end of post)

Axelay: You'd think I'd remember this, having found out the hard way exactly the same before in my C(++) versions, but here goes: Copying to the future as you go along does not work, because obviously (again, only in retrospect) that'll overwrite cell 90 (for example) with old information after its future has just been altered by changes in cell 89, and so on. So, indeed, the entire buffer generated by the previous iteration has to be copied before one begins the next. I just found this out during my latest frenzied session of coding, which meant that work was a total a waste of time. Just like when I supposedly learnt the same lesson before!

But!

I removed pointless code that was wasting 3 NOPS for every cell! Admittedly, later and unrelated changes have cancelled out 2 of the 3 - but that's still a saving of 14 ms. Besides, I've also removed more pointless code that was wasting 3 NOPs for every cell that changed state . . . and . . . reallocation of registers has saved me another 4 for the same cells. :D  So that's a guaranteed saving of 14 ms every generation, plus a variable speed-up that rapidly scales up as the degree of action on the grid increases, with an saving of 7 NOPs per changing cell compared to the last version. I tried to estimate how many ms per generation that would add up to on average, but I have no idea what percentage of changing cells to use . . . So let's just pick a figure out of thin air and say it's 30 ms better. : P

Of course, that's hardly significant when compared to the overall time, and less significant after my previous and larger speed-ups - especially if using double-buffering, which is now included as DUBLBUFF.BIN (note: most previous versions already double-buffered, as I only disabled it see how much faster it was) . . . but it's still a nice saving! This must be close to totally optimal now . . . or at least close enough to stop wasting time trying  ;)  Overall, I'm quite confident that I've ironed out all the silly mistakes and found shortcuts almost everywhere that's possible - or worth doing!

So, I think I'll celebrate my recent successes with the Z80 by bumping this up to v0.1, which you can download below! Again, thanks to everyone for the compliments, hugely appreciated, and please stay tuned for lots of features in the future. :)
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: arnoldemu on 10:26, 12 December 12
I like all the optimisations you are doing.

Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 13:57, 12 December 12
(Edit: slight update to the DSK: I was copying slightly too large a chunk of memory in non-double-buffered mode, after forgetting to change back something I was testing)

arnoldemu: Thanks; I'm glad you're enjoying reading :) You'll like the next bit, then!

As it turns out - although not including the fixes for silly delays that shouldn't have been there in the first place, which have stayed fixed - the saving of 4 NOPs for changing cells was outweighed by the added overhead of 2 NOPs on all cells due to the register reallocation that made the quicker changes possible - i.e. using IXH as a loop counter, etc. Changing cells would have to represent half of the entire population during every generation to make that worthwhile, which definitely is not the case. It might be my imagination, but I think the difference is noticeable. . . again  ::)

So, although 0.1 should indeed have been a bit faster than 0.04 due to 0.04 having silly delays, this one should be markedly faster than both of the two previous versions. Of course, we're still talking about 10s of ms, but it all counts!

I'd say this algorithm is definitely as good as it gets and now worth moving on from to focus on features, because although I'm proud of the speed I've achieved, I've proved that any further attempts I make to speed it up . . . don't actually work! :D Here's the latest DSK.
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Axelay on 14:20, 12 December 12
Yes, it's been fun to follow the development.  :)


Quote from: db6128 on 03:00, 12 December 12

(update at end of post)

Axelay: You'd think I'd remember this, having found out the hard way exactly the same before in my C(++) versions, but here goes: Copying to the future as you go along does not work, because obviously (again, only in retrospect) that'll overwrite cell 90 (for example) with old information after its future has just been altered by changes in cell 89, and so on. So, indeed, the entire buffer generated by the previous iteration has to be copied before one begins the next. I just found this out during my latest frenzied session of coding, which meant that work was a total a waste of time. Just like when I supposedly learnt the same lesson before!



So are you saying you need two copies of the 'current' cell map to produce a new generation?



Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 14:25, 12 December 12
I just reuploaded the DSK of 0.11 due to it previously copying slightly too big a chunk of memory in non-double-buffered mode, which was left behind from testing. Sorry for spamming you all with DSKs :|

Axelay: I'm glad you've found it interesting! By the way, I've heard a lot about your games, so I'm going to repay the favour by checking them out tonight. :) Yep, we need to copy the most recently generated future to the present, so they're both now the same, and then compute (write) the new future whilst leaving the present (from which it's read) completely unchanged.

Edit: This is at least the case for the particular way that I process the evolution. A more naive method would probably make copy-as-you-go possible. I chose my one because it saves a lot of time-wasting during the checking process. However, it does necessitate the full copy afterwards. On balance, though, it's totally worth it.
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Prodatron on 16:42, 12 December 12
Quote from: db6128 on 02:38, 12 December 12Rules for birth and survival are looked up from tables, so they could be totally changed if the user wanted.
That's really good! As I could see Nich C. implemented it hardcoded (but that doesn't make a big difference for the optimization)

Quote from: db6128 on 02:38, 12 December 12However, I find that Life itself doesn't work very well with rules other than Conway's original one - more evidence for how good a discovery it was, I think!
Regarding the complexity of different "life forms" in such a world, this is really true! Though some of the other alternative worlds are producing also very interesting results (like the copy- or the labyrinth-world). But yes, you can spend much more time with Conway's 23/3 world.

Quote from: db6128 on 02:38, 12 December 12On the other hand, perhaps my main priority is to add support for Generations, in which cells have a lifespan longer than 1: If their count of neighbours goes to a 'death-causing' level, they start to age until they reach their maximal age and die, but during ageing they stop being included in the neighbour-counts of other cells but do continue to occupy space, preventing other cells from being born there. It changes the dynamics a lot and enables many different rules, some of which are really good.
I never saw this before, this sounds really interesting!

Quote from: db6128 on 02:38, 12 December 12Prodatron: I just saw in the list of related threads that you made a version of Life for SymbOS. Thanks for clicking Like on my one! I was seemingly able to compile your source using WinAPE, but SymbOS (2, disc version, WinAPE set to 64+256 kB) tells me that it is "not executable in the SymbOS environment". Is it only for the next edition? Anyway, keep up the good work on all of your projects! I hope I'm not too humbled when I do see your version.
Sorry, I missed this before. Did you assemble the "App-GameOfLife1.asm" (which is somehow the "make"-file, which includes the other ones)? Anyway I will attach the actual version as single files/DSK/source codes.
My version is clearly slower than yours one. I think the main reason is, I am using a bit-field instead of a byte-field. As my field is 64x64, only 512bytes (=64x64/8) are required for one copy of the field and 1KB for both fields together. But this makes it probably much slower, because of all this bit shifting/handling stuff.

You can find the two core-routines at:
CELCLC (calculates the whole field) and CELCNT (subroutine used by celclc, which counts the neighbours of one field)
The average time for calculating one cell is about 145 NOPs - just for generating it in the new bitfield (without the graphic output).
I wonder how much time is needed for the "core"-calculation of one of your cells?

CU,
Prodatron
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 19:33, 12 December 12
QuoteThat's really good! As I could see Nich C. implemented it hardcoded  (but that doesn't make a big difference for the optimization)
In fact, in many cases, it can be quicker to look it up from a table, as I don't have to do multiple CPs/JRs – this is always faster when, as with S23, there are at least two possible 'yes' numbers and the first is not found, which occurs most of the time – and it will certainly be a lot faster for rules with more possible matches.

QuoteI never saw [Generations] before, this sounds really  interesting!
Yep, you're in for a real treat when you check out some of those rules.  :D

QuoteDid you assemble the "App-GameOfLife1.asm" (which is somehow the  "make"-file, which includes the other ones)? Anyway I will attach the actual  version as single files/DSK/source codes.
Yeah, I started from that file, and WinAPE said everything worked (although I  had to remove some weird characters at the start of each file), but SymbOS  wouldn't execute the resulting file.

QuoteMy version is clearly slower than yours one. I think the main  reason is, I am using a bit-field instead of a byte-field. As my field is 64x64,  only 512bytes (=64x64/8) are required for one copy of the field and 1KB for both  fields together. But this makes it probably much slower, because of all this bit  shifting/handling stuff.
It's a great example of the classic trade-off between speed and memory! I can  see why you used this method, which is very efficient on RAM – very nice in  itself, and important for SymbOS, I imagine. Me, I'm monopolising almost the  entire memory just to save NOPs.

QuoteThe average time for calculating one cell is about 145 NOPs - just  for generating it in the new bitfield (without the graphic output).
I wonder  how much time is needed for the "core"-calculation of one of your  cells?
I guess I'll do some sums later and let you know. Whatever the time, it'll be even better now than it was before, as I'll explain in my next post. :D
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 19:40, 12 December 12
I think I found one more silly mistake, although it apparently wasn't messing everything up as one would imagine it would, so maybe something else was counteracting it...?! Whatever, it's gone now.

And folks, I've been doing a little something that I'm going to christen (hopefully I'm the inventor of this phrase ;) ) 'unrolling branches': I duplicated my end-of-loop code into each place where it's needed, rather than having to jump to it from various other points; and I also changed some branches to take advantage of how JR is 1 NOP faster when the condition is false. A full explanation would be a bit complex and boring, but as far as I understand the relevant timings, this:
* saves 3 NOPs for dead cells that are born
* saves 2 NOPs for dead cells that stay dead
* saves 2 NOPs for alive cells that die
* costs 2 NOPs for alive cells that survive, one of which is due to the added distance meaning that I have to to use DEC B:JP NZ instead of DJNZ

Considering that most cells are dead and stay dead, 2 NOPs each is bound to be a massive saving in total. For dead cells that are born, there is also a bigger saving of 3 NOPs. Also, although there's an added NOP for live cells that stay alive. I imagine that, in an actual game, many live cells die – either as part of complex, ever-changing patterns or as part of oscillating shapes. And guess what? Live cells that die save 2 NOPs!

To analyse it a little more: The slight cost of live cells that stay alive (2 NOPs) would only hypothetically matter on a grid in which most cells had settled into still-life or oscillating patterns; in either case, it would be outweighed by large speed-ups from the majority of cells that will be dead and stay dead (save 2 NOPs), and in the case of oscillators, from the changing cells (save 3 NOPs from dead to alive or 2 NOPs from live to dead).

So, overall, this is a massive boost to speed. Although, yes, also to my greedy usage of RAM. :P

Earlier, after I realised that I could unroll the branches like this, I also realised that (A) it's 2012-12-12 and (B) the successful revision would be v0.12. Pretty cool! I have given it an even more interesting version number to celebrate. :D Please have a look if you are interested; I think the gain in speed is especially significant this time! :)

The files are MAXSPEED.BIN (no double-buffering) and DUBLBUFF.BIN.
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Prodatron on 19:50, 12 December 12
Great stuff!  :)

Would a 6502 be able to perform the same??  :P

Btw, it's quite easy to measure the time:
- set a breakpoint in WinApe at the beginning and at the end of the routine
- when reaching the first breakpoint, clear the timer (directly below the registers)
- when reaching the second breakpoint you will see the amount of NOPs
(maybe you already knew this)

CU,
Prodatron
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 20:20, 12 December 12
Thanks for the positive feedback! And for the useful hint, which means I can now answer your question:

Just from running through the first 40 or so generations of the non-double-buffered version in WinAPE, including the copy of the buffers, each seems to take approximately 440–460 ms, so about 2.222... per second. This suggests that my previous estimates were maybe a bit high, especially as the versions only get slower as we go back in time. :P Since the copy takes 74337 NOPs, and assuming an average 450000 NOPs per total generation, that means the processing of each cell takes an average of (450000-74337)/(126*110) = 27.1 NOPs.

Of course, these figures are not necessarily reliable for further-evolved random patterns, or for user-designed patterns (something that I am, of course, going to make possible) – however, if anything, both are likely to be faster than my numbers suggest, as the number of living and especially constantly surviving cells will probably decrease with time and/or in engineered patterns, which would speed things up even more, as I described earlier. In other words, we can't lose :D
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Axelay on 13:02, 13 December 12
Quote from: db6128 on 14:25, 12 December 12
Axelay: I'm glad you've found it interesting! By the way, I've heard a lot about your games, so I'm going to repay the favour by checking them out tonight. :) Yep, we need to copy the most recently generated future to the present, so they're both now the same, and then compute (write) the new future whilst leaving the present (from which it's read) completely unchanged.



Hmm, I guess considering the number of SET/RES instructions you might need to add to avoid copying the visible screen, the copy may well be the faster option after all!


With respect to my games, if you are interested (mad?  ;) ) then Edge Grinder (http://formatwar.net/view_article.php?art=collabortition_1) also has commented (after a fashion) source.
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 20:39, 13 December 12
Quote from: Axelay on 13:02, 13 December 12Hmm, I guess considering the number of SET/RES instructions you might need to add to avoid copying the visible screen, the copy may well be the faster option after all!
Sorry, maybe being dense, but I'm not sure what you mean. Do you mean if I was using the screen to store the cells?
QuoteWith respect to my games, if you are interested (mad? ) then Edge Grinder also has commented (after a fashion) source.
Oh, good to know; thanks! These games look great. I'm going to load them up soon and see whether I have any bare minimum of skill left from my days of playing Hellfire on the Mega Drive. :D
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 20:41, 13 December 12
I heard you like MODE 1

Game of Life for the Amstrad CPC in MODE 1 (http://www.youtube.com/watch?v=MN0sPujfOg8#)
The reason that it's not as much slower as you might expect, despite processing 21844 cells instead of 13860, is that, thanks to the horizontal resolution being 256, I can move north/south simply by INC/DECing H.

Although this is a nice proof-of-concept, I don't know whether I'll pursue it much further. The only way to get the additional vertical height that the grid is screaming out for will be to use the extended RAM, but flipping back and forth between the two buffers will then take 2 additional NOPs for every cell that changes. So, although I could stretch as far as a 254×174 grid and still have a passable amount of space for future increases in the size of the program, it would be massively slow by that point, due not only to the doubled vertical resolution but also to the slowdown required to page between base and extended RAM.

Do we really need a larger grid that much? (Regardless of the computational requirements, it's getting ever so slightly hard to see : P) The only contexts in which I've found large grids to be necessary during my previous travels are
(A) for Turing machines and other computer-within-computer–type things, whether based on Life or some other cellular automaton – but these require much more space than 256×256 and are obviously far beyond the processing power of the Z80
(B) more relevantly to my plans and to the reality of the CPC, for some of the rules of Generations – but most of those have cells with more than 3 ages, to which MODE 1 is not conducive, due to Generations really requiring different colours for different ages (a limitation that would affect any automaton with >3 states).

Still, any thoughts/opinions/etc.?

Anyhow, back to normality! Safer ground – phew! With a video I recorded of the latest version of the normal MODE 0–based edition. It would be nice if it might attract the attention of people who saw this thread but didn't try the program yet. ;)

John Conway's Game of Life on the Amstrad CPC -- fastest code in the West! (http://www.youtube.com/watch?v=BAv9h0UlYR4#)
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: TFM on 21:09, 13 December 12
That's really a great application, the best implementation ever seen on CPC. Greets!!!
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Prodatron on 22:26, 13 December 12
This is still so damn fast!

Quote from: db6128 on 20:41, 13 December 12
I heard you like MODE 1

Game of Life for the Amstrad CPC in MODE 1 (http://www.youtube.com/watch?v=MN0sPujfOg8#)
The reason that it's not as much slower as you might expect, despite processing 21844 cells instead of 13860, is that, thanks to the horizontal resolution being 256, I can move north/south simply by INC/DECing H.

Although this is a nice proof-of-concept, I don't know whether I'll pursue it much further. The only way to get the additional vertical height that the grid is screaming out for will be to use the extended RAM, but flipping back and forth between the two buffers will then take 2 additional NOPs for every cell that changes.

If I see it correctly, you do the field generation and conversion to the screen in the same step. Is that the way you do it? So I wonder, if you could seperate it into two steps without loosing performance (maybe you could even gain speed, as you have more registers for each step - on the other hand you have to read the data twice...). If you do it in two steps, you can store and calculate the two fields completely in the 2nd 64K ram bank (out #7f00,#c2). Only for plotting it on the screen, you change to the 1st ram bank. You only need to do memory banking about 3 or 4 times for each generation.

In any case, IMHO the Mode1 version is VERY impressive for an 8bit system, as it shows, that the CPC can do this even at a very high resolution!

CU,
Prodatron
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 01:49, 14 December 12
Quote from: TFM/FSThat's really a great application, the best implementation ever seen on CPC. Greets!!!
Wow, thank you very much! :)

Quote from: ProdatronThis is still so damn fast!
Thanks to you, too, again! :) Yes, the main advantage of MODE 1, and a big one, is that it enables rows of 256 cells wide, which therefore are aligned on page boundaries, which makes things a lot faster when I need to navigate up and down. That's why it doesn't slow down as much as might be predicted from the increased resolution (i.e. like it would if I had to add and subtract numbers other than 256).

Another boost, which I didn't think to mention before for some reason, is that this version doesn't have to plot twice vertically as in MODE 0; that's not a huge consumer of NOPs compared to other things, but it definitely counts. Incidentally, I have an option in my MODE 0 source to make the pixels single-height, but it doesn't look particularly nice!

Quote from: ProdatronIf I see it correctly, you do the field generation and conversion to the screen in the same step. Is that the way you do it?
Yeah, it's all done at once. But what you said next is very interesting and might also mean that I don't give up on the idea of a MODE 1 version quite yet... so excuse me while I think aloud in my next paragraphs ;)

Quote from: ProdatronSo I wonder, if you could seperate it into two steps without loosing performance (maybe you could even gain speed, as you have more registers for each step - on the other hand you have to read the data twice...).
I probably wouldn't have considered separating the steps for the MODE 0 version, but since you've mentioned it and the potential benefits due to reallocation of registers, maybe it just might be better to separate the two, even despite having to read the buffer twice... It's worth a try, anyway. I'm coding that version now and will report the timings when I can.

Anyway, in a – completely hypothetical! – MODE 1 version using the extended RAM, that would definitely be the best way to do it, because...
QuoteIf you do it in two steps, you can store and calculate the two fields completely in the 2nd 64K ram bank (out #7f00,#c2). Only for plotting it on the screen, you change to the 1st ram bank. You only need to do memory banking about 3 or 4 times for each generation.
Almost a whole 64 kB... Depending on how much space I allow for the program, I'm calculating that I could have 190, 222, maybe even 238 lines of vertical resolution – wow! I would save a tonne of NOPs by not having to page back and forth for every changing cell – which I now feel would have been very silly – instead, only before and after drawing. Also, even besides the RAM-paging, having a full 64 kB to play with would let me go back to the faster method of flipping the buffers that I use in the MODE 0 version; the current MODE 1 has to use a slower method to fit in the lower 64 kB alongside the screen. Actually, with separate computing and drawing passes, being able to reallocate registers would enable an even faster way of doing it! What an array of possibilities :O

So, combining all of these intriguing possibilities, you've made a fantastic suggestion – thanks a lot! :) This does make me quite a bit more likely to try this out, even if it's only due to extreme curiosity...  Since the current slower method generates about 1.2 generations per second (at an optimistic estimate), I wonder if, using all these advantages, I could at least double the vertical resolution and not go below 1 generation per second – which I would say is good enough, especially when we consider the numbers of cells that would be involved.

Although it wouldn't matter for this thanks to your idea, just to confirm a silly question: If I did want to use the full 128 kB from one bit of code, which would want to access bytes from both of the 64 kB pages – i.e. flipping between configurations 0 and 2 – I have to duplicate it at the same addresses in both pages, right, so that it's still there when they are swapped? Silly question, I know...

QuoteIn any case, IMHO the Mode1 version is VERY impressive for an 8bit system, as it shows, that the CPC can do this even at a very high resolution!
Again, thanks a lot for the great positive feedback! And someone with your experience on the CPC/Z80 doesn't technically need to have a Humble Opinion, so it's appreciated even more. :P

And I hope it's clear that I really appreciate these suggestions; I'll let you know if I get any of them going. But there I was just doing simple things like sketching out how I can implement the four-colour–inheriting system of QuadLife; thinking about Generations, too; and other things... instead, now I'll be distracted by thoughts about an entire other version! Haha.
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Axelay on 13:10, 14 December 12
Quote from: db6128 on 20:39, 13 December 12
Sorry, maybe being dense, but I'm not sure what you mean. Do you mean if I was using the screen to store the cells?
Ah, sorry.  Wrote out some thoughts, came to the conclusion I was wrong and just wrote the conclusion.    So this time I'll just leave it!  As you were saying the 'current' screen was required and I was trying to think of ways to avoid the initial copy, instead just swapping between the two.   It would perhaps use SET & RES to swap BC between the 'previous' screen (actually the visible one) and the new current one.   Possibly DE could always point to the buffer...  But then I thought having to do 2 RES & SET per byte, as there were two cells to a byte, would be slower than the single LDI.  Except *that* was wrong too!  There's two bytes per two cells in the mode 0 version.  And from what you've said I guess you're only writing cells that change, so maybe it could still be faster not copying the screen, assuming the cells that change are in a minority.  If I understood the code I looked at correctly, the cells were being masked in or out individually as they changed, so I'm also wondering about whether handling the cells in pairs (or 4's for mode 1) might provide any advantage.  Say by only reading the original byte once and handling all the cells in it at once, and only then doing a single write back to the screen buffer if any of them change, rather than masking the individual changed cells in or out of the screen data.  But after my last post, I expect to realise that this too is wrong after I post it!  :) .... Ah no, realised it already...  :laugh:  Without copying the screen, there'd be a need to copy cells that havent changed in the last two generations rather than one, which makes a real mess of things.


Quote from: db6128 on 20:39, 13 December 12
Oh, good to know; thanks! These games look great. I'm going to load them up soon and see whether I have any bare minimum of skill left from my days of playing Hellfire on the Mega Drive. :D

Well I shouldnt imagine any of them will be that troubling!  :)
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 03:28, 15 December 12
Axelay: I think I'm more confused than you now, and I wrote the thing! : P Copying the screen is only necessary for double-buffering – I do need two copies of the cells, but not of their visual representation, if that's what you thought? – and, if enabled, isn't a huge hassle, really, as it represents a reasonably small fraction of the time required for each generation (about 70 ms on top of the 440 ms for the cells alone). Combining it with my data wouldn't work for the current methods or any of the ones I have planned, but I can see the appeal theoretically. I appreciate you taking the time to think over things like that and the swapping method, in any case!

Prodatron: I tried your suggestion of doing the plotting separately, and it did enable me to write a clever and fast routine for translating the data to pixels. It was only a little bit slower. I don't think I'll use it in MODE 0, but, as I said, it will definitely be essential for MODE 1. Of course, in the latter, the same nice and clever method won't work, because I have to process four pixels at a time rather than just two. :|

Next up, since some people are never happy...
Quote from: Gryzor on 14:43, 11 December 12Rubbish gfx, no sound. And a mode 0 game with only two colours? Pffft. Control is also so slow it seems your input has no effect at all.
...Let's see if I can't do something about that ;) 

Conway's Game of Life variant QuadLife for the Amstrad CPC! (http://www.youtube.com/watch?v=t2hUY900Y80#)
(video updated to reflect the newer version I've uploaded since originally posting this and show more of its features) [Edit One Million: well, it's already out of date now...]

This is QuadLife! Every cell has one of four types, conveniently represented by colours. A newly born cell inherits the colour/type that is the majority amongst its parents – or, if all three were different, the new cell inherits the one type/colour that none of its parents had.

The video shows how groups of particular colour can migrate and perhaps even 'outcompete' others in interesting ways. Randomly assigning colours can also provide its own type of interesting experience in terms of watching how the subsequent dynamics play out. Of course, things like this would be most interesting in the context of 'competitions' between different custom-drawn patterns.

And on that note, since I'm starting to get to grips with programming the keyboard, thanks to a guide from Kevin's site (which I initially found as a sneaky copy-and-paste without attribution on some other site), an editor shouldn't be too far away. In any case, I've already added a few little options for interaction. So, as well as enjoying this photorealistic depiction of the struggle for existence, you can now press (updated)...Graphics, controls... Gryzor probably will insist that I still have to add sound effects or something. Are tank noises OK, or do you want lasers? :D

This might remain separate from the 'basic', even-more-optimised variant, but I am very pleased with – and surprised by – how well it turned out, and also how quickly it came together. To get around the overhead introduced by the inheriting system, I had to make some significant changes and develop several new clever tricks, and surprisingly, they all worked with a rare minimum of fuss and frustration. 

Edit:

Aaaaaand apparently I just chopped about 80 ms off each generation of QuadLife.

But that's not to ignore good ol' Original MODE 0 Version, as it's just gotten another boost of almost 100 ms per generation, taking it to around 350 ms for each.
Game of Life CPC - latest speed-ups, now almost 3 generations per second (http://www.youtube.com/watch?v=jXLEPNa9e6I#)
Is 3 generations per second possible? I guess we'll eventually find out one way or another!

I've bundled both, now including both double-buffered and non-double-buffered versions of each, in the (yet another) attached DSK. Please enjoy and let me know what you think :)
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: Axelay on 05:49, 16 December 12

Quote from: db6128 on 03:28, 15 December 12I think I'm more confused than you now, and I wrote the thing! : P
Glad I could help!  :laugh:


Quote from: db6128 on 03:28, 15 December 12
Copying the screen is only necessary for double-buffering – I do need two copies of the cells, but not of their visual representation, if that's what you thought?
By the way you were AND/OR-ing the pixels in (iirc, or not), I thought you were preserving the contents of the bytes other pixel(s), so I thought that was why you were copying the whole screen, so you would only need to modify the changed cells on the visible screen.


Quote from: db6128 on 03:28, 15 December 12
Combining it with my data wouldn't work for the current methods or any of the ones I have planned, but I can see the appeal theoretically.
I thought this might well be the case, unfortunately after studying your earlier code a bit to confirm to myself my answer to your original question was good, I just couldnt quite manage to stop thinking about some of the rest of it.  :)
Title: Re: AMSTRIFE—John Conway’s Game of Life for the CPC—fast + many features on the way!
Post by: db6128 on 06:42, 16 December 12
Yep, the pixels are masked, and cells are only plotted if they are changing. However, the screen does not have to be copied. That's only necessary if a static duplicate is desired to enable flicker-free transition between generations via double-buffering. There have to be two distinct buffers of the cells themselves (so that accumulating writes to the future do not affect reads from the present and thus mess everything up), but there is no such limitation to their pixels on the screen; altering the display on-the-fly is no problem at all.

My latest optimisations since the last DSK have gained significant speed-ups in both the basic MODE 0 version and QuadLife. QuadLife has gained about 80 ms per generation and thus is now running as fast as the non-colourised version used to, whereas that has sped up by 100 ms per generation and – whereas it wasn't quite there when I edited my previous post – can now indeed hit the milestone of 333 ms per generation, and thus 3 generations per second, fairly often. :)

I need a little break from moving squares for a while, I think...! but I doubt it will be long before I start on an editor and also some kind of basic interface for customising some parameters.

One thing I must correct: my calculations about MODE 1 were wrong... by a factor of 2, embarrassingly. The most lines I could get and still have some space left for code would by 118 (by 254), and I don't know if that's really worth all the trouble: mainly having to do the plotting entirely separately, naively, and using 4 possible shifts; but also losing colours (=less possible states/types); and other things, too. I'm considering other ways to do it, but I don't know if I'll judge it to be feasible enough in the end, at least at the kind of speed I want. So, MODE 0 is what I'm really thinking about.
Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: Gryzor on 20:49, 16 December 12
Wow, now something is starting to shape at least. No laser sounds, of course, but gulping or withering-away sounds would be nice (good luck deciding which cells would emit a sound. O the cacophony!)

Yes, quite some interesting things going on there with your options, nice complexity! Thanks, this is turning out really nice :)

Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: db6128 on 00:43, 18 December 12
Quote from: Gryzor on 20:49, 16 December 12No laser sounds, of course, but gulping or withering-away sounds would be nice (good luck deciding which cells would emit a sound. O the cacophony!)
I know! Sir Alan, why so few channels ;_;

QuoteWow, now something is starting to shape at least. [...] Yes, quite some interesting things going on there with your options, nice complexity! Thanks, this is turning out really nice :)
Thanks very much; I'm glad you're enjoying it!




Latest updates: The monochrome version now breaks the 3 generations per second (333 ms per generation) barrier for most frames, meaning that it's over 1.5× faster than the very first one I posted here. Ah, younger days... :D And QuadLife has also been sped up a lot, now hitting about 380 ms per generation instead of the slothful 450 ms of its youth. I've also resized the screen so that there's no gap at the bottom with the unused rows 28–31, which will also allow me to use their &800 bytes (4 rows × 8 scanlines × 64 bytes) of memory for not boring and definitely awesome things.

Once again, both are available in max-speed and double-buffered forms in the attached DSK, so as usual, please enjoy! :)
Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: db6128 on 02:12, 21 December 12
Confirmed to work on real hardware! Just as good... Actually, it seems even better/faster :D
Looks flashier on an actual TV, too, I must say – even if this is only one of 3 TVs (2 CRT, 1 flat) on which this RGB SCART cable works (1) with colour and/or (2) without ghosting/shuddering sideways – something I'll have to ask the seller about if it carries over to any more sets (I can believe that the two that didn't work might be due to their own faults, not the cable).

The only gripe is that, unlike WinAPE, my 6128 only accepts the first key of input and is unresponsive after that. I was just setting up the &F6 PPI A read, &F4 set matrix, etc. once and then only doing LD B,&F4:IN A,(C):BIT n,A within the main loop – is that not right?
Another program I tried seemingly crashed until I disabled my keyboard-reading code altogether, although that might be because it was using IN A,(C):JP P,0 to reset if the key in bit 7 was pressed, which might be a bad way to do it and might have made it reset every time.
Basically, on real hardware, which steps of that lengthy keyboard-initialising process are needed only once, and which are needed every time I want to test a key/line? Thanks in advance.

Also, I have to load things onto my 6128 via tape right now; can anyone suggest a better workflow than WinAPE -> 2cdt -> tapir?

Anyhow, off to find that BASIC type-in that supposedly identifies which CRTCs I have in my CPCs. The suspense... bad news for the 6128, at least :|
Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: db6128 on 18:03, 25 January 13
No, despite the date of the previous post, this project did not experience its own private Mayan apocalypse!

But a combination of holidays, extreme laziness, trying to buy a million synthesisers, and selling-my-laptop-and-having-to-use-a-crappy-desktop meant that I haven't felt like working on this for a while. However, I should be receiving a new laptop tomorrow – basically, the same exterior but several times better inside! – meaning that I will be able to compute in comfort, rather than having a desktop sitting on my floor (no, I don't have any other space here; that's how bad it is! Short of moving it into the kitchen – which is freezing), and will have a computer that I actually like. :P

So, WinAPE will doubtlessly be one of the first new residents of that oh-so-fresh hard disk, and I dare say that my constant procrastination will eventually redirect itself from extremely pointless things to slightly less pointless things such as the Game of Life. ;) As I am fairly confident that it's almost exactly as fast as is possible (I know I've said that a million times, but look at it now: over 3 generations per second in monochrome mode!), including some additional speed-ups since my last released DSK, I will move to focussing on actual functionality, starting with a basic pattern editor.

I did wonder whether this could be a candidate for the 16 kB competition, but then it's not really a game, despite its name... Does anyone have any opinions on its validity or the lack thereof? Although to be honest, I don't think I'll have enough time to produce anything worthwhile, and I don't know whether the code being in ROM (mapped to the same range as the screen, thus freeing up the lower 48 kB) would offer a sufficient number of advantages. But I guess you never know what might happen if I find enough time... and, crucially, inspiration! Some of my previous breakthroughs did happen very rapidly, after all. :D
Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: Gryzor on 16:18, 30 January 13
Just an idea - if you do a ROM version with it, maybe you could do a (slowed-down) version that runs in a corner window while the user can use the computer?
Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: db6128 on 16:29, 30 January 13
If so – world's biggest if! – that's actually a brilliant idea. But... I haven't a clue of how the separate window would be implemented, though; I imagine it would be easiest in BASIC, but I have no idea how it could be made to run in any other environment, particularly if the user was likely to be doing anything graphical of their own.

I don't have enough money to keep that laptop (returning it unopened), so programming won't be fun, but I may still give it a go. Sadly, no promises, as I don't know if I'll manage even to keep up with all my normal work starting soon, let alone programming on top of that. :( But again, y'never know.
Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: Gryzor on 15:53, 31 January 13
Oh yes, obviously in BASIC, I doubt it could be made to work under anything else or if a program was loaded. Still it'd be fun :)

Too bad about your laptop, wish you the best :(

Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: litwr on 15:27, 20 July 14
I am late but full of curiosity.  I hope somebody may continue talk.

Quote from: Prodatron on 00:46, 12 December 12This summer I developed such an application on the Z80, too. It isn't optimized (and still bugy), and I only focused on the interface/options/figure library/rule settings/memory friendly etc.:SymbOS - Conway's Game of Life application (http://www.youtube.com/watch?v=SDCTJO9Su9s#)


I can't understand 0 and 9 stay alive/new cell conditions.  IMHO 9 is impossible case, and 0 is very difficult to implement.

Quote from: db6128 on 16:29, 30 January 13If so – world's biggest if! – that's actually a brilliant idea. But... I haven't a clue of how the separate window would be implemented, though; I imagine it would be easiest in BASIC, but I have no idea how it could be made to run in any other environment, particularly if the user was likely to be doing anything graphical of their own.


Are there any news?  BTW I think it is possible to spend only about 65 NOPs per cell calculation.
Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: Prodatron on 18:25, 20 July 14
Quote from: litwr on 15:27, 20 July 14
I can't understand 0 and 9 stay alive/new cell conditions.  IMHO 9 is impossible case, and 0 is very difficult to implement.
Haha, yes, 9 was from an old beta version, which is shown in the video. The actual version only has 0-8.
But why is 0 difficult to implement? It's just a simple value which represents the number of neighbours, and this could be also 0.
There are quite interesting worlds you can create if using the 0 as well!
Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: litwr on 17:03, 21 July 14
Wow!  Thank you for this reply.  :)
Quote from: Prodatron on 18:25, 20 July 14
But why is 0 difficult to implement? It's just a simple value which represents the number of neighbours, and this could be also 0.
There are quite interesting worlds you can create if using the 0 as well!
"stay 0" condition is not difficult, but "born 0"... See the picture of the cellular automaton "period 168 knightship" designed by David Eppstein which has "stay 01" and "born 013568" conditions - http://litwr2.atspace.eu/demo/demo.html - or try it with Golly or Xlife. ;)  
BTW where to take the new version of GoL for SymbOS?  This OS looks like something impossible for 8-bit computer...  The only dark spot is the absence of free sources.  I know man who wants to use it with Commodore.
Quote from: Prodatron on 19:50, 12 December 12Great stuff!  :)  Would a 6502 be able to perform the same??  :P 

http://litwr2.atspace.eu/xlife/xlife4cbm.html   ;D
Title: Re: Conway’s Game of Life for CPC—very fast+many features coming! New: QuadLife! p.3
Post by: kelp7 on 21:46, 21 July 14
Wow, love this project. Have been enamoured with Conway's "Life" since I first saw it as a kid on a VIC-20. Think I might write a version for my Z80 machine. Will be checking this out in an emulator :)
Powered by SMFPacks Menu Editor Mod