Screen fill benchmark?

IndyUK · 12:14, 02 February 22

Hi Everyone,

I've been experimenting with using the Stack to plot pixels to the screen as it's quite evident from everyone here that using the Stack is the fastest way. So, now that I have something up and running and it's filling the screen with a colour, it's looking good. However, I read in a very old post here that in order to know whether what I have is fast, I need to be able to beat the vsync time of 20K us. Is that true? Is that in all screen modes? Judging by the timings I have conducted, the best I have for a flood fill is over 100K us in Mode 1. I used the builtin Timer in WinApe to do the test.

FYI: in order to stay below the vsync time, I was only able to plot ~1000 pixels, which by my reckoning won't amount to many sprites on screen.

What do you think?

Thanks

roudoudou · 12:49, 02 February 22

Filling the whole screen means writing 16384 bytes (in fact less...), so it can be done with 8192 PUSHes => 32768 us => 1.64 vsync
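As a quick check of those figures, here is a minimal Python sketch of the arithmetic. It assumes the timings used elsewhere in this thread (one PUSH writes 2 bytes and costs 4 NOPs, 1 NOP is about 1 us, a frame is rounded to 20000 us). The constant and function names are made up for illustration.

```python
SCREEN_BYTES = 16384   # nominal screen size quoted in the thread
BYTES_PER_PUSH = 2     # PUSH writes one 16-bit register pair
NOPS_PER_PUSH = 4      # assumption from the thread: 4 NOPs, 1 NOP ~ 1 us
FRAME_US = 20000       # rounded frame time used in the thread


def push_fill_cost(screen_bytes=SCREEN_BYTES):
    """Return (pushes, microseconds, frames) for a PUSH-only fill."""
    pushes = screen_bytes // BYTES_PER_PUSH
    micros = pushes * NOPS_PER_PUSH
    frames = micros / FRAME_US
    return pushes, micros, frames


pushes, micros, frames = push_fill_cost()
print(pushes, micros, round(frames, 2))
```

This ignores the cost of setting SP and reloading registers, so a real routine would be somewhat slower.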

IndyUK · 13:36, 02 February 22

Quote from: roudoudou on 12:49, 02 February 22
Filling the whole screen means writing 16384 bytes (in fact less...), so it can be done with 8192 PUSHes => 32768 us => 1.64 vsync

Trouble is, I have one routine to fit all cases (I know it's not the best way; blame high-level languages for giving me this habit). Basically, I pass the various data values into the required registers and then call the routine. I really designed it for plotting sprites, for which it will probably work OK-ish.

Based on your stats, it doesn't appear that we could ever beat the vsync. There would inevitably be screen tearing, which probably explains why most CPC games opted for the Speccy screen size. As a further experiment I might try resizing the screen, running my code on that, and seeing what time I get.

gurneyh · 18:08, 02 February 22

Hi,

You can't race the beam, but is it really necessary to redisplay everything?
And on the CPC, it is possible to use page flipping to avoid tearing.

Nworc · 18:20, 02 February 22

Quote from: IndyUK
However, I read in a very old post here that in order to know whether what I have is fast, I need to be able to beat the vsync time of 20K us. Is that true?

20k us is just the time between two vsyncs. I would call it a frame. I think what they mean is that whatever you do must fit within that frame, or else the result will run below 50 FPS and won't look nice because of the stuttering (unless what you are doing still looks convincing at 25 FPS).

One trick you already mentioned: reduce the screen size. There are many more options to this.

The need to touch each and every byte of the screen area is very questionable, as this is only rarely required. One trick is to know which areas of the screen really need to be touched to make Frame 2 out of Frame 1; in most cases that is far fewer than 16384 bytes. You can see your program as the implementation of a function which takes Frame 1 plus some state as input parameters to produce Frame 2.
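To illustrate the "Frame 1 plus state gives Frame 2" idea, here is a small Python sketch that only writes the bytes that differ between two frames. It is purely a model: on a real CPC you would derive the changed areas from your game state rather than compare whole frames, and the function names are hypothetical.

```python
def changed_offsets(frame1: bytes, frame2: bytes) -> list[int]:
    """Offsets of the bytes that differ between two equally sized frames."""
    if len(frame1) != len(frame2):
        raise ValueError("frames must have the same size")
    return [i for i, (a, b) in enumerate(zip(frame1, frame2)) if a != b]


def apply_changes(screen: bytearray, frame2: bytes, offsets: list[int]) -> None:
    """Write only the changed bytes into the screen buffer."""
    for i in offsets:
        screen[i] = frame2[i]


# Example: a 16384-byte screen where only 4 bytes change.
frame1 = bytes(16384)
frame2 = bytearray(frame1)
frame2[100:104] = b"\xff" * 4

screen = bytearray(frame1)
offsets = changed_offsets(frame1, bytes(frame2))
apply_changes(screen, bytes(frame2), offsets)
print(len(offsets), "bytes touched instead of", len(screen))
```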

Quote from: IndyUK
Judging by the timings I have conducted, the best I have for a flood fill is over 100K us in Mode 1. I used the builtin Timer in WinApe to do the test.

You can also use a calculator: a screen is considered to be 16384 bytes, as roudoudou said. Using PUSH to touch every byte is as fast as you can get, and costs 4 NOPs per word, i.e. 2 NOPs per byte. That gives 16384 * 2 = 32768 NOPs, and 32768 / 20000 = 1.64 vsyncs or frames. You can use microsecond and NOP synonymously, as these two things are almost the same on the Amstrad.
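Turning the calculation around gives a rough per-frame byte budget. The Python sketch below assumes 64 NOPs per scan line (consistent with the "8 scan lines / 512 nops" figure later in the thread) and 312 lines per frame, which is my assumption for a standard 50 Hz display; the names are made up for illustration.

```python
NOPS_PER_LINE = 64      # 8 scan lines = 512 NOPs, as quoted in the thread
LINES_PER_FRAME = 312   # assumption: standard 50 Hz CRTC settings
NOPS_PER_FRAME = NOPS_PER_LINE * LINES_PER_FRAME


def frame_budget_bytes(nops_per_byte: int) -> int:
    """Upper bound on bytes written per frame at a given cost per byte."""
    return NOPS_PER_FRAME // nops_per_byte


print(NOPS_PER_FRAME)            # NOPs available in one frame
print(frame_budget_bytes(2))     # PUSH-only writing, 2 NOPs per byte
```

With PUSH at 2 NOPs per byte the ceiling is a bit under 10000 bytes per frame, before any overhead for game logic, interrupts or register setup.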

IndyUK · 20:44, 02 February 22

Quote from: gurneyh on 18:08, 02 February 22
Hi,

You can't race the beam, but is it really necessary to redisplay everything?
And on the CPC, it is possible to use page flipping to avoid tearing.

Is it at all possible to query where the beam is at any given moment? i.e. which address line has it reached.

roudoudou · 21:57, 02 February 22

You can't query it; you can only compute where you are.
Or make an estimate, which doesn't have to be a microsecond-precise calculation.
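A rough idea of what "computing where you are" can look like, as a Python sketch: if you know how many NOPs have elapsed since the last vsync, you can estimate the current scan line. The 64 NOPs per line and 312 lines per frame are assumptions for a standard 50 Hz setup, and the function name is hypothetical.

```python
def beam_line(nops_since_vsync: int,
              nops_per_line: int = 64,
              lines_per_frame: int = 312) -> int:
    """Estimated scan line, counted from the start of vsync (not from the
    top of the visible picture)."""
    return (nops_since_vsync // nops_per_line) % lines_per_frame


print(beam_line(0))      # right at vsync
print(beam_line(640))    # ten lines later
print(beam_line(19968))  # one full frame later, back to line 0
```

In practice the elapsed time would come from counting interrupts and the known cost of your own code, so the result is only as good as that bookkeeping.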

lmimmfn · 02:14, 03 February 22

What about splitting the screen in two horizontally, with both halves showing the same memory? Filling one half then fills the other as well, and saves half the time.

Jean-Marie · 04:33, 03 February 22

Quote from: IndyUK on 20:44, 02 February 22
Is it at all possible to query where the beam is at any given moment? i.e. which address line has it reached.

I read an interview with Elmar Krieger somewhere regarding his game Prehistorik 2. He said he didn't use a double buffer because it was too memory-demanding, and he wanted the game to run in 64 KB.
He explained that he queried the position of the electron beam before redrawing the sprites, to make sure it would not interfere and create any flicker. I'm going to try to find his exact words.

Edit: found it here: https://es.paperblog.com/los-trucos-de-programacion-de-la-demoescena-siguen-siendo-utiles-en-la-era-de-los-ghz-y-gb-entrevista-a-elmar-krieger-5969026/
"How about your printing sprites routines? What made them so efficient?
Titus wanted the games to run on the CPC 464 with 64KB, and together with the large overscan resolution this made it impossible to use page flipping (i.e. having two screens and updating one while showing the other). The sprite routine in Super Cauldron therefore kept track of the position of the electron beam in the monitor, and of sprite positions, sizes and overlaps, and scheduled the background restoration and sprite drawing such that it was not caught in the middle by the electron beam, thus avoiding flicker without double buffering."
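To make the scheduling idea a little more concrete, here is a conservative Python sketch of the kind of test such a routine might perform: given the beam's current line and how long a sprite takes to draw, would the beam pass through the sprite's lines while it is being drawn? This is only my illustration, not Krieger's actual routine. It assumes 64 NOPs per scan line, lines measured in the same units as the beam position, and no wrap-around at the end of the frame.

```python
def beam_crosses_sprite(beam_line: int, draw_nops: int,
                        sprite_top: int, sprite_height: int,
                        nops_per_line: int = 64) -> bool:
    """True if the beam could be inside the sprite's lines during the draw."""
    lines_travelled = -(-draw_nops // nops_per_line)  # ceiling division
    beam_end = beam_line + lines_travelled
    sprite_bottom = sprite_top + sprite_height - 1
    # The ranges [beam_line, beam_end] and [sprite_top, sprite_bottom] overlap.
    return beam_line <= sprite_bottom and beam_end >= sprite_top


# Beam at line 100, draw takes 512 NOPs (8 lines), sprites are 16 lines high.
print(beam_crosses_sprite(100, 512, 120, 16))  # sprite well below the beam
print(beam_crosses_sprite(100, 512, 105, 16))  # beam would hit it mid-draw
print(beam_crosses_sprite(100, 512, 50, 16))   # beam has already passed it
```

A scheduler could draw the sprites for which this returns False first and postpone the others until the beam has moved on.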

Axelay · 11:54, 03 February 22

Quote from: IndyUK on 12:14, 02 February 22
I've been experimenting with using the Stack to plot pixels to the screen as it's quite evident from everyone here that using the Stack is the fastest way. So, now that I have something up and running and it's filling the screen with a colour, it's looking good. However, I read in a very old post here that in order to know whether what I have is fast, I need to be able to beat the vsync time of 20K us. Is that true? Is that in all screen modes? Judging by the timings I have conducted, the best I have for a flood fill is over 100K us in Mode 1. I used the builtin Timer in WinApe to do the test.

I wouldn't agree that using the stack is the fastest way to write to the screen as a blanket statement. There is overhead involved in setting it up to write to the screen above what you would need for a standard register pair, so for 'small runs' as with sprites only a few bytes wide, it possibly isn't the best approach. You might be better off using the stack to read the sprite data instead, assuming you aren't using direct addressing, because then you'd only need to set up the stack once to read the entire sprite data rather than needing to alter it for new pixel lines on the screen.

If your whole screen flood fill is taking over 100k us, I can't help but think you have taken a wrong turn somewhere.

Even using a basic LDIR on the screen memory will do it in just under 100k us! In terms of clearing a screen, the way I would do it to avoid tearing, if not using a double buffer as already suggested, would be to wait for vsync, set all the inks to the same colour (or maybe, in mode 0, change to mode 2 and set just one colour), then clear the actual screen data however you like in whatever time it takes. Then wait for vsync again to set the colours back.
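The "just under 100k us" figure for LDIR can be checked with a short Python sketch. It assumes the usual fill idiom (write the first byte, then LDIR copies it forward over the remaining 16383 bytes) and CPC timings of 6 NOPs per repeating LDIR iteration and 5 NOPs for the last one. Those timings are my assumption, register setup is ignored, and the function name is made up.

```python
def ldir_fill_nops(screen_bytes: int = 16384,
                   nops_repeat: int = 6,
                   nops_last: int = 5) -> int:
    """NOPs for an LDIR that propagates the first byte over the whole screen."""
    copies = screen_bytes - 1          # the first byte is written by hand
    return (copies - 1) * nops_repeat + nops_last


nops = ldir_fill_nops()
print(nops, round(nops / 20000, 2))    # total NOPs and frames at 20000 us
```

That is roughly three times the cost of the PUSH-only estimate given earlier in the thread, but still below 100k us.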

IndyUK · 13:11, 03 February 22

Quote from: Axelay on 11:54, 03 February 22
If your whole screen flood fill is taking over 100k us, I can't help but think you have taken a wrong turn somewhere. Even using a basic LDIR on the screen memory will do it in just under a 100k us!

You know I'm beginning to wonder whether my routine is not the best approach for a flood fill as it was mainly designed for typically sized sprites. However, if one thought of the entire screen as a large sized sprite, I wouldn't have thought that it would matter as long as the code was efficient in the first place but, I do see your point about not necessarily being the correct approach/option. As I mentioned in my original post, I used the WinApe Timer function to measure the time. I placed a breakpoint one instruction before the start, where I reset the WinApe timer and on the final RET where I read off the final value. Is that how it's done?

In your humble opinion, could you please give me some sort of indication of how long a 16x16 block should take to fill? I am still learning this stuff and as you say maybe my code needs to be re-examined.

Thanks

Axelay · 14:41, 03 February 22

Quote from: IndyUK on 13:11, 03 February 22
You know I'm beginning to wonder whether my routine is not the best approach for a flood fill as it was mainly designed for typically sized sprites. However, if one thought of the entire screen as a large sized sprite, I wouldn't have thought that it would matter as long as the code was efficient in the first place but, I do see your point about not necessarily being the correct approach/option. As I mentioned in my original post, I used the WinApe Timer function to measure the time. I placed a breakpoint one instruction before the start, where I reset the WinApe timer and on the final RET where I read off the final value. Is that how it's done?

In your humble opinion, could you please give me some sort of indication of how long a 16x16 block should take to fill? I am still learning this stuff and as you say maybe my code needs to be re-examined.

Thanks

Yes, that sounds right about using the Winape timer.

By 16x16, do you mean 16x16 mode 1 pixels? You mentioned mode 1 and pixels in your original post, so I will assume that you mean sprites 4 bytes wide by 16 lines high, for 64 bytes in total.

So as a rough estimate on that basis, perhaps 8 scan lines / 512 NOPs for a direct addressed sprite of that size, or a banked sprite over a blank background, or twice that for a traditional banked sprite with masking. You also need to consider adding between half and as much again for clearing each sprite, depending on your background.

In the case of a banked sprite, I have assumed unrolling the code for each byte in a pixel line, and not using a DJNZ or whatever for each and every byte. If you want fast code a loop of that frequency just adds too much time to your routine. In my opinion, of course.
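Those rough figures can be turned into a small Python estimator. It takes the numbers above at face value: 512 NOPs for 64 bytes (so 8 NOPs per byte), double for masking, plus a clearing cost of between 0.5x and 1x the draw cost. The function and parameter names are hypothetical, and the 20000 us frame is the rounded value used in this thread.

```python
def sprite_cost_nops(width_bytes: int, height_lines: int,
                     nops_per_byte: int = 8, masked: bool = False,
                     clear_factor: float = 0.5) -> int:
    """Estimated NOPs to clear and draw one sprite."""
    draw = width_bytes * height_lines * nops_per_byte
    if masked:
        draw *= 2                       # "twice that" for masking
    return int(draw * (1 + clear_factor))


width_bytes = 16 // 4                   # 16 mode 1 pixels, 4 pixels per byte
best = sprite_cost_nops(width_bytes, 16, masked=False, clear_factor=0.5)
worst = sprite_cost_nops(width_bytes, 16, masked=True, clear_factor=1.0)
print(best, 20000 // best)              # cheapest case, sprites per frame
print(worst, 20000 // worst)            # most expensive case
```

These are upper bounds on sprite counts only; game logic, interrupts and any scrolling all come out of the same frame.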
