News:

Printed Amstrad Addict magazine announced, check it out here!

Main Menu

A new fast Mandelbrot generator

Started by litwr, 13:08, 13 December 21

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

litwr

I have just released a super-fast Mandelbrot generator program for the Amstrad CPC and several other platforms. Its algorithm doesn't use multiplications at all, it uses squaring instead. Benchmark results are here: the Z80 shows some advantages of its 16-bit registers but a home PDP-11 shows even slightly better results.

Gryzor


ComSoft6128

I'll record this for YouTube later today :)

ComSoft6128

#3
16 screens are generated by M4.BIN & 16 screens are generated by M16.BIN.
Please note that screen generation is not automatic, keyboard input (one key) is required.



Not emulated - original hardware and software.
Please note that  the aspect ratio for this YouTube video is 16:9 but the CPC monitor
has an aspect ratio of 4:3 so you may wish to adjust your viewing device accordingly.

litwr

Quote from: ComSoft6128 on 16:29, 13 December 21
16 screens are generated by M4.BIN & 16 screens are generated by M16.BIN.
Please note that screen generation is not automatic, keyboard input (one key) is required.

Not emulated - original hardware and software.
Please note that  the aspect ratio for this YouTube video is 16:9 but the CPC monitor
has an aspect ratio of 4:3 so you may wish to adjust your viewing device accordingly.
Thank you very much. The music is very good.  IMHO your video from real hardware looks better than the same images from emulators.

ComSoft6128

Glad you like it :)


The music is provided by a second 6128 Running the famous Soundtrakker program by @BSC .

cpcitor

This is extremely impressive. 30 years ago there were already Mandelbrot demo, I think I saw one sporting 45 seconds.

There is something wrong with the zoom. The set becomes distorted.
I've seen this before in a Mandelbrot program I made on PC, when zooming too much compared to the precision.
My guess is: this is possible so fast because computation is done with extremely limited precision.
Had a CPC since 1985, currently software dev professional, including embedded systems.

I made in 2013 the first CPC cross-dev environment that auto-installs C compiler and tools: cpc-dev-tool-chain: a portable toolchain for C/ASM development targetting CPC, later forked into CPCTelera.

Prodatron

#7
Quote from: litwr on 13:08, 13 December 21
I have just released a super-fast Mandelbrot generator program for the Amstrad CPC and several other platforms. Its algorithm doesn't use multiplications at all, it uses squaring instead. Benchmark results are here: the Z80 shows some advantages of its 16-bit registers but a home PDP-11 shows even slightly better results.

Very very impressive!
And thanks to ComSoft for the YT video!

Btw I only had a quick look at the source code, but this...

        push hl     ;*100        add hl,hl        add hl,hl
        pop de
        add hl,de
        push hl
        add hl,hl
        add hl,hl
        pop de
        add hl,de
        add hl,hl
        add hl,hl


->

        ld e,l
        ld d,h
        add hl,hl   ;2
        add hl,hl   ;4
        add hl,de   ;5
        ld e,l
        ld d,h
        add hl,hl   ;10
        add hl,hl   ;20
        add hl,de   ;25
        add hl,hl   ;50
        add hl,hl   ;100

...will save 10 NOPs for the *100/3 routine (but probably not making a remarkable effect).
It's a very cool project, @litwr, I am thinking about porting it :)

GRAPHICAL Z80 MULTITASKING OPERATING SYSTEM

litwr

Quote from: cpcitor on 20:53, 13 December 21
This is extremely impressive. 30 years ago there were already Mandelbrot demo, I think I saw one sporting 45 seconds.

There is something wrong with the zoom. The set becomes distorted.
I've seen this before in a Mandelbrot program I made on PC, when zooming too much compared to the precision.
My guess is: this is possible so fast because computation is done with extremely limited precision.
It would be very interesting to test this demo from the past.  I know one very fast Mandelbrot for the CPC - https://www.octoate.de/2012/09/13/mavdelbrot-a-fast-mandelbrot-set-calculation-by-mav/ - It also uses 16-bit fixed point math but I didn't know how many points it calculates.  I have used an algo which calculates 128x256 points.
You are right about some distortion but you may also call it scaling. :) I have calculated the same image #8 using the CPC program and a Linux C++ program which uses the standard 8-byte floating point.  The CPC algo builds the picture on the right.

litwr

Quote from: Prodatron on 21:43, 13 December 21
Very very impressive!
And thanks to ComSoft for the YT video!

Btw I only had a quick look at the source code, but this...

...will save 10 NOPs for the *100/3 routine (but probably not making a remarkable effect).
It's a very cool project, @litwr, I am thinking about porting it :)
Thank you very much!
Thanks to the attempt to make the code better.  But the selected code is outside any fast cycles so I optimized it for size not for speed.  Indeed it will be fantastic to port this program somewhere else.
And, indeed, a lot of thanks for your fantastic software for the CPC.

Prodatron

#10
Quote from: litwr on 20:02, 14 December 21
I know one very fast Mandelbrot for the CPC - https://www.octoate.de/2012/09/13/mavdelbrot-a-fast-mandelbrot-set-calculation-by-mav/
I remember this from MaV, it was really fast as well.
Holy shit, this is already 9 years ago! :D
But I guess yours is again much faster.

Quote from: litwr on 20:10, 14 December 21And, indeed, a lot of thanks for your fantastic software for the CPC.
Oh, spasibo, this is very nice to hear.

GRAPHICAL Z80 MULTITASKING OPERATING SYSTEM

MaV

Black Mesa Transit Announcement System:
"Work safe, work smart. Your future depends on it."

litwr

Thanks for the positive feedback.  I couldn't stop and did several more optimizations...
Using the ZX Spectrum fan help, I have been able to speed up the Amstrad code by 20%!  Now the Amstrad results are 10% better than for the more expensive and later BBC Master.  It is really astonishing.  For 16-bit optimized code the Z80 is much better than for the 6502.  However the Z80 optimized code is much harder to get than the 6502 optimized code.  BTW the program uses the image y-symmetry, 14.5-bit arithmetic, and a lookup table for squares.
Quote from: cpcitor on 20:53, 13 December 21There is something wrong with the zoom. The set becomes distorted.
I checked the issue more carefully.  I have found out that I was slightly incorrect in my previous post about this.  I suggested that the program draws distorted images because of the precision problem.  However actually the images are not distorted at all!  The fixed point arithmetic used is quite good for the task.  The two images I showed above are different because the CPC program loses precision during an image transformation, but this only affects the Mandelbrot parameters - the images themselves are rather perfect.  I add a perfect image to my page - if anybody has interest I can attach my C-program that builds perfect images for the Mandelbrot parameters given. However fractal programs are rather ubiquitous things.  :)

MaV

Quote from: litwr on 17:29, 22 December 21BTW the program uses the image y-symmetry, 14.5-bit arithmetic, and a lookup table for squares.
I am wondering how the data in the lookup table is organised. I was pondering the idea back in the day, but a 64k*2 byte table for all possible square values is simply not possible on a stock 6128.
That is probably the one optimisation that will give you the highest speed boost. Even if not using a table, a separate square function can improve speed a bit (which I used back then).

You could also implement a cardioid check to avoid calculating to maximum depth in most of the inner area.
Black Mesa Transit Announcement System:
"Work safe, work smart. Your future depends on it."

ralferoo

The biggest speed up will come algorithmically... ;)


If you remember fractint on the PC, it was already very fast and then they added the line following mode, where it decides which pixels to calculate by following a line until the number of iterations changes, then does all the points in that area, and then tries to follow the edge of the region with the same iteration count. When it's made a completely enclosed loop, it can flood fill the area.


There are some gotchas - it's possible to miss some fractal features if the box you trace completely contains it, which is most obviously true for a zoomed-out mandelbrot. Also, when you're zoomed in a lot, there will be areas where this entirely breaks down - in the interesting areas, the iterations per pixel will be changing a lot, and you'll need to give up edge tracing in these parts.


As an aside, you might want to also look at doing Julia sets. These are also done using the equation z'=z*z+c, but instead of c being the initial coordinate, c is a constant across the entire set. You can actually think of the Mandelbrot set as a specialism of the Julia set that's useful for finding interesting areas, but if you find an interesting area of the Mandelbrot set, the Julia sets around there will also be very interesting and you can get some really pretty animations by calculating all the Julia sets with c going from one interesting point on the Mandelbrot to another.


Gryzor

Haha Fractint-that brought back some memories!

litwr

#16

Thanks but I didn't have an intention to make the fastest Mandelbrot.  ;)  I have just ported a tiny good program.  I didn't even change Mandelbrot parameters.  IMHO I could have chosen other values for better pictures.
Quote from: MaV on 23:08, 25 December 21
I am wondering how the data in the lookup table is organised. I was pondering the idea back in the day, but a 64k*2 byte table for all possible square values is simply not possible on a stock 6128.
That is probably the one optimisation that will give you the highest speed boost. Even if not using a table, a separate square function can improve speed a bit (which I used back then).

You could also implement a cardioid check to avoid calculating to maximum depth in most of the inner area.
The algo uses an 11.5 KB lookup table for squares.  The author of the original program made also a variant for the ZX Spectrum, he published it only recently - https://github.com/smaslovski/foobar - this code is documented better than mine (in particular the table structure is described there)  but it uses 14-bit arithmetic and this made it a bit faster.  He also used completely different coding based on the SP use that doesn't make the code faster but disables interrupts.

MaV

Quote from: ralferoo on 09:40, 28 December 21
The biggest speed up will come algorithmically... ;)


If you remember fractint on the PC, it was already very fast and then they added the line following mode, where it decides which pixels to calculate by following a line until the number of iterations changes, then does all the points in that area, and then tries to follow the edge of the region with the same iteration count. When it's made a completely enclosed loop, it can flood fill the area.
I remember fractint well. Back when I did program MaVdelbrot I wondered if I should implement such an algorithm. I decided against it, simply because I tried to avoid any discussion over why not every pixel is calculated and worrying over somebody calling it cheating. There's always the one guy who rears his ugly head ...
Nine years later I mostly don't care about these, simply because you need to learn to get over the BS on the internet because the amount of it exploded in the last years, but I cared about that back then.

QuoteThere are some gotchas - it's possible to miss some fractal features if the box you trace completely contains it, which is most obviously true for a zoomed-out mandelbrot. Also, when you're zoomed in a lot, there will be areas where this entirely breaks down - in the interesting areas, the iterations per pixel will be changing a lot, and you'll need to give up edge tracing in these parts.
With 16-bit fixed point arithmetic, I would not worry too much about zooming in. ;)


QuoteAs an aside, you might want to also look at doing Julia sets. These are also done using the equation z'=z*z+c, but instead of c being the initial coordinate, c is a constant across the entire set. You can actually think of the Mandelbrot set as a specialism of the Julia set that's useful for finding interesting areas, but if you find an interesting area of the Mandelbrot set, the Julia sets around there will also be very interesting and you can get some really pretty animations by calculating all the Julia sets with c going from one interesting point on the Mandelbrot to another.
If I recall correctly, while I programmed MaVdelbrot I figured that Julia sets need a bit more precision over the mandelbrot set; not by much, but again 16-bit arithmetics will limit you.
I may have been completely wrong, and unfortunately I can't remember what led me to that conclusion, so take it with a grain of salt.


fractint - if you remember - had a preview mode for Julia sets in the later versions that drew an approximation of the Julia set in a small window next to a mandelbrot set and you could cruise around with the cursor keys. The longer one stayed at one spot in the mandelbrot the clearer the contours of the Julia set emerged out of the pixel chaos. This was I think the first time next to reading about it that I fully realised that a Julia set and the corresponding part of the mandelbrot showed self-similarity.

Black Mesa Transit Announcement System:
"Work safe, work smart. Your future depends on it."

MaV

Quote from: litwr on 19:04, 28 December 21
Thanks but I didn't have an intention to make the fastest Mandelbrot.  ;)  I have just ported a tiny good program.  I didn't even change Mandelbrot parameters.  IMHO I could have chosen other values for better pictures.
Well, you are always on the lookout for programs to compare retro computers, so such a thing was bound to happen. ;)


I could look for the values of the MaVdelbrot pictures. But you'd need to draw 320x200 and in overscan (I can't recall now which dimensions I used for overscan).



QuoteThe algo uses an 11.5 KB lookup table for squares.  The author of the original program made also a variant for the ZX Spectrum, he published it only recently - https://github.com/smaslovski/foobar - this code is documented better than mine (in particular the table structure is described there)  but it uses 14-bit arithmetic and this made it a bit faster.  He also used completely different coding based on the SP use that doesn't make the code faster but disables interrupts.
Interesting. And I have never heard of a squaring algorithm before. Seeing how Maslovski uploaded this code two years ago, this must be a recent discovery, right?

Black Mesa Transit Announcement System:
"Work safe, work smart. Your future depends on it."

litwr


Happy New 2022 Year to All!  :)
Quote from: MaV on 19:28, 30 December 21
Well, you are always on the lookout for programs to compare retro computers, so such a thing was bound to happen. ;)


I could look for the values of the MaVdelbrot pictures. But you'd need to draw 320x200 and in overscan (I can't recall now which dimensions I used for overscan).
It would be interesting to know the details. ;)

Quote from: MaV on 19:28, 30 December 21
Interesting. And I have never heard of a squaring algorithm before. Seeing how Maslovski uploaded this code two years ago, this must be a recent discovery, right?
It was a private repo, he made it public only several weeks ago because of my request.  ;)  However a similar algo was used in a good program for the BBC Micro a year before - https://stardot.org.uk/forums/viewtopic.php?f=54&t=15116

Powered by SMFPacks Menu Editor Mod