
triple buffering

Started by ervin, 04:13, 22 April 24


ervin

Hi folks.

I'm experimenting with using the extra RAM in the 6128, and I've read a few posts about triple buffering.
I'm trying to figure out how I could benefit from triple buffering, and how to use it sensibly.
Can anyone offer some tips?
Or share any experiences with triple buffering?

Thanks!

andycadley

The problem with triple buffering on the CPC comes down to only having 64K of usable video memory and each screen display being 16K. If you can reduce the usable area down to 8K, I guess it would be more manageable although your screen would be tiny. 

There's also the issue of finding a convenient memory arrangement that lets you access the right pages of RAM both for reading graphics data and writing to whichever target screen you need at the time. The 128K banking arrangements aren't terribly helpful in that regard, especially given you need suitable locations for things like the interrupt handler and stack too.

It's probably less of an issue in a GX/Plus cartridge game, because you can bank ROM much more flexibly and RAM is a lot more plentiful when you don't need to store code in it. You also have the benefit of the screen split hardware, which could save you from having to replicate things like a status bar in different banks.

McArti0

You can create a system with software only for addresses 4000-7FFF based on jumps to other banks and use the C0, C4-7 settings
Then you have access to 3 buffers bank 0,2,3. Of course the firmware doesn't work. Mode 2 interrupts, common stack in each bank.
CPC 6128, Whole 6128 and Only 6128, with .....
NewPAL v3 for use all 128kB RAM by CRTC as VRAM
TYPICAL :) TV Funai 22FL532/10 with VGA-RGB-in.

McArti0

But we've already talked about this:
https://www.cpcwiki.eu/forum/index.php?msg=235994

Maybe tell us where you have doubts?

ervin

Thanks for your replies guys.
I've managed to get things working with different RAM banking schemes (thanks to the discussions in that other thread), but I haven't tried any code related to triple buffering yet.
I'm still at the thinking and understanding stage.
Assuming I can figure out how to do it, what would the benefit be?

@abalore has mentioned that triple buffering was one of the most important optimisations in Alcon.
https://www.cpcwiki.eu/forum/programming/hyperdrive-development/msg223048/#msg223048

And @arnoldemu mentioned a long time ago that by using triple buffering, there is no need to sync with frame flyback.
https://www.cpcwiki.eu/forum/amstrad-cpc-hardware/60hz-cpc/msg4856/#msg4856

@gerald mentioned something similar here.
https://www.cpcwiki.eu/forum/programming/frame-locking/msg65913/#msg65913

Is that potentially the main benefit?
Would I do something like the following?

buffer A is currently visible.
buffer B has been drawn completely.
buffer C is being drawn to.

When buffer C is finished, buffer A goes to the back of the queue, buffer B is made visible, and buffer C moves up. Buffer A then starts getting drawn to.

buffer B is currently visible.
buffer C has been drawn completely.
buffer A is being drawn to.

When buffer A is finished, buffer B goes to the back of the queue, buffer C is made visible, and buffer A moves up. Buffer B then starts getting drawn to.

etc.

Is that the idea?
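
The rotation described above can be sketched as a three-slot queue. This is only an illustration of the bookkeeping, not CPC code: the names `buffers` and `rotate` are made up for the example, and it assumes a strict first-in, first-out order where a finished buffer is never skipped.

```python
from collections import deque

def rotate(buffers):
    """Advance the queue once the buffer being drawn to is finished.

    buffers[0] is visible, buffers[1] is complete and waiting,
    buffers[2] is being drawn to (hypothetical layout for this sketch).
    """
    buffers.rotate(-1)  # visible buffer goes to the back of the queue
    return buffers

buffers = deque(["A", "B", "C"])
for _ in range(3):
    visible, ready, drawing = buffers
    print(f"visible={visible} ready={ready} drawing={drawing}")
    rotate(buffers)
```

After the first rotation B is visible, C is waiting and A is being drawn to, which matches the second state listed in the post.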

GUNHED

Quote from: McArti0 on 07:56, 22 April 24
You can create a system with software only for addresses 4000-7FFF based on jumps to other banks and use the C0, C4-7 settings
Then you have access to 3 buffers bank 0,2,3. Of course the firmware doesn't work. Mode 2 interrupts, common stack in each bank.

A common stack in each bank? So you copy the stack when switching banks? That won't work in real life. Better to use a separate stack for each bank.
http://futureos.de --> Get the revolutionary FutureOS (Update: 2024.10.27)
http://futureos.cpc-live.com/files/LambdaSpeak_RSX_by_TFM.zip --> Get the RSX-ROM for LambdaSpeak :-) (Updated: 2021.12.26)

GUNHED

The whole triple buffer thing only makes sense if the time needed to draw one frame varies hugely from one frame to the next.

Or you use two screens and one buffer for the background. That only makes sense if you work with static pictures, without scrolling.

McArti0

Quote from: GUNHED on 15:32, 22 April 24
A common stack in each bank? So you copy the stack when switching the bank?

No. I understand it as not changing SP when switching banks.

andycadley

Quote from: GUNHED on 15:32, 22 April 24
Quote from: McArti0 on 07:56, 22 April 24
You can create a system with software only for addresses 4000-7FFF based on jumps to other banks and use the C0, C4-7 settings
Then you have access to 3 buffers bank 0,2,3. Of course the firmware doesn't work. Mode 2 interrupts, common stack in each bank.
A common stack in each bank? So you copy the stack when switching the bank? Won't work in real life. Better use a own stack for every bank.

It can work as long as you're very, very confident about when bank switching will occur. It's pretty much a sure-fire route to shooting yourself in the foot unless you're enormously careful though.

andycadley

The goal of triple buffering is never to be stalled waiting for a display flip to happen (unless rendering is so quick it takes less than a frame).

It's a more complex setup though and RAM gets very tight. More often than not you can get away with just double buffering if you can keep rendering time down to a minimum. It's really only when it takes between a frame and a frame and a half to render on average that you're really likely to be winning.
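
A rough way to see where the win comes from is to simulate it. The sketch below is a simplified model, not measured CPC behaviour: it assumes flips only happen at vertical blank every 20 ms, that finished frames are shown in order with none skipped, and that the renderer may only start a frame once its target buffer is off screen. The function name `simulate` and its parameters are invented for the example.

```python
def simulate(render_times_us, num_buffers, period_us=20_000):
    """Return the vsync time (in microseconds) at which each frame is first shown."""
    shows = []   # shows[k] = vsync time at which frame k becomes visible
    finish = 0   # time at which the renderer finished the previous frame
    for k, render in enumerate(render_times_us):
        # Frame k reuses the buffer of frame k - num_buffers, which is free
        # once the frame after that one (k - num_buffers + 1) is on screen.
        blocker = k - num_buffers + 1
        start = max(finish, shows[blocker]) if blocker >= 0 else finish
        finish = start + render
        next_vsync = -(-finish // period_us) * period_us  # round up to a vsync
        earliest = shows[-1] + period_us if shows else 0  # one flip per vsync
        shows.append(max(next_vsync, earliest))
    return shows

frames = [24_000] * 10  # every frame takes 1.2 refresh periods to render
print("double:", simulate(frames, 2))
print("triple:", simulate(frames, 3))
```

Under these assumptions, two buffers show a new frame only every second refresh, because the renderer sits idle until each flip. With three buffers the renderer never waits, so the ten frames are all on screen after 240 ms instead of 400 ms, though the spacing between them is uneven.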

abalore

I don't know if you would call it triple buffering. Alcon had two switching visible buffers and a third invisible buffer which holds a clean copy of the background. That makes the sprite erasing in the other two buffers a lot faster.

andycadley

Yeah, that's not triple buffering, it's just using three buffers. The clean background doesn't even need to be accessible to the Gate Array, which simplifies things even further.

McArti0

I've read it and I understand it now. Triple buffering helps when most frames render within 20ms but some suddenly take slightly longer, or one frame even takes twice as long.
Then there is still time to render it, because we have two buffers left to display. Of course, that means a lag of 40ms.

On a CPC with a 512kB expansion, we can even create a quadruple buffer with the C1+ and C3+ settings, because many banks are available in place of bank 7 and the entire 64kB is visible as VRAM.

djaybee

So, yeah, triple-buffering has the advantage that you (OP) describe.

You can work with a single buffer, but it's got to remain clean enough at all times (e.g. constantly working incrementally).

You can work with a double buffer, where one buffer (front) is clean and gets displayed while you work into the other buffer (back). If you can't immediately page-flip (either because the hardware doesn't support it or because you don't want the tearing to be visible), you end up in a situation where you've finished drawing into your back buffer and you have to wait for the page-flip before you can draw your next frame.

With a triple buffer, you have 2 back buffers: once you're done drawing into one of them, you can immediately start drawing into the other one, so you get a higher frame rate than with a double buffer since you can always start drawing (with the understanding that there'll be some judder). This is not useful if you can draw faster than your refresh rate, but that's a good problem to have.

Double- and Triple-buffering on the CPC face 3 main challenges:
-all the buffers must fit in the low 64kB of RAM.
-if you use the firmware, it carves out 2 chunks of RAM in those low 64kB.
-bank mapping only has a small number of configurations, which creates constraints. E.g. using banks 1 and 3 for buffers feels natural, until you realize that no memory config maps bank 1 at the same time as banks 4-6, so you can't directly copy graphics from banks 4-6 to bank 1.
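
The bank mapping constraint in the last point can be checked against the standard 6128 configuration table. The table below is the usual C0 to C7 layout; the helper name `configs_mapping` is made up for this sketch.

```python
# Standard 128K RAM configurations (&C0..&C7). Each tuple lists which
# 16K bank is visible to the CPU at &0000, &4000, &8000 and &C000.
RAM_CONFIGS = {
    0xC0: (0, 1, 2, 3),
    0xC1: (0, 1, 2, 7),
    0xC2: (4, 5, 6, 7),
    0xC3: (0, 3, 2, 7),
    0xC4: (0, 4, 2, 3),
    0xC5: (0, 5, 2, 3),
    0xC6: (0, 6, 2, 3),
    0xC7: (0, 7, 2, 3),
}

def configs_mapping(*banks):
    """Return the configurations in which all the given banks are visible at once."""
    wanted = set(banks)
    return [cfg for cfg, layout in RAM_CONFIGS.items() if wanted <= set(layout)]

for extra in (4, 5, 6):
    print(f"bank 1 with bank {extra}:", [hex(c) for c in configs_mapping(1, extra)])
print("banks 2, 3 with bank 4:", [hex(c) for c in configs_mapping(2, 3, 4)])
```

No configuration shows bank 1 together with bank 4, 5 or 6, whereas banks 2 and 3 stay visible in every one of C4 to C7. That is why buffers in banks 2 and 3 are easier to feed from the extra RAM.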

For triple-buffering specifically, the smaller your screen, the easier things get. Smaller than 5.33kB, 3 buffers fit in 1 bank, that's very easy. Smaller than 8kB, 3 buffers fit in 2 non-contiguous banks, that's very feasible and very similar to regular double-buffering. Smaller than 10.66kB, 3 buffers fit in 2 contiguous banks. Up to 16kB, 3 buffers need 3 banks. Beyond that and up to 21.33kB, you need all 4 banks and madness lies ahead (you can do resolutions like 312x280, 352x248, 376x232 in mode 1).
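
Those thresholds are just 16kB, 32kB and 64kB divided by three, plus the 8kB case where two buffers share a bank. A small sketch of the arithmetic, with invented helper names and ignoring any CRTC addressing constraints on where a buffer may start:

```python
BANK = 16 * 1024  # bytes in one 16K bank

def screen_bytes(width, height, pixels_per_byte):
    """Size of a bitmap screen; mode 1 packs 4 pixels per byte, mode 0 packs 2."""
    return (width // pixels_per_byte) * height

def banks_if_contiguous(size):
    """Banks needed for 3 buffers when they may straddle bank boundaries."""
    return -(-3 * size // BANK)

def banks_if_separate(size):
    """Banks needed for 3 buffers when each must sit inside a single bank."""
    per_bank = BANK // size
    return -(-3 // per_bank) if per_bank else None

for w, h in ((312, 280), (352, 248), (376, 232)):
    size = screen_bytes(w, h, 4)
    print(f"{w}x{h} mode 1: {size} bytes, {banks_if_contiguous(size)} banks for 3 buffers")
```

All three mode 1 resolutions come out just under 64kB / 3 (about 21845 bytes), so three of them only fit if the whole 64kB is given over to screen memory.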

ervin

Thanks everyone for your replies.
All of that gives me a lot to think about.

Anthony Flack

I'm not aware of any CPC games that use triple buffering, are there? 

I am doing the exact same thing as abalore; I have a front buffer, a back buffer and a clean buffer for restoring the background. I guess we both independently concluded this was fastest. 

ervin

Thanks Anthony.
It sounds like it might be the most useful idea for me as well.

Anthony Flack

If you want to do something along these lines, the memory arrangement I used:

Main code goes in bank 0.

Front/back buffers are in bank 2 and 3 at &8000 and &c000.

Clean buffer, compiled sprites, and any other code all swap in to bank 1 at &4000, so that you can copy from any of these banks into either screen buffer. 

ervin

Sounds like a good scheme.
Thanks!

djaybee

Deep inside, I can't stop thinking about a closely related question: what situations would benefit from triple-buffering, i.e. what are the types of graphics where the performance gains of triple-buffering outweigh the memory costs?

Typically, triple-buffering results in an unsteady frame rate, which is especially visible at high frame rates, such that it's not necessarily a good idea for situations where the level of complexity is similar from frame to frame (which is the case for anything that's heavy on sprites and background graphics). On the other hand, 3D games, especially those drawn with polygons, might have a slower and inherently unsteady frame rate, such that those might be more appropriate situations for triple-buffering. The good news is that those games might rely less on backgrounds, sprites and bitmaps in general, so the memory pressure from having 3 buffers might not be so high.

No, I don't have time to try this (I already have a lot of code on my plate), but, if I did, especially on 6128, I'd be using a 136x160 mode 0 display, with my buffers in banks 2 and 3, core code in bank 0, and banks 1 and 4-7 for situations where code and data can be paged in and out. In such a scheme, I would use memory modes 0 and 4 through 7, but I also note that memory mode 2 could be useful if there's some compressed data that needs to be decompressed on demand (with all the usual caveats about memory mode 2 and interrupts, of course).
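
As a quick check on why a 136x160 mode 0 display suits banks 2 and 3, the sizes work out as follows. This assumes plain bitmap buffers at 2 pixels per byte and nothing else stored in those two banks.

```python
WIDTH, HEIGHT = 136, 160
PIXELS_PER_BYTE = 2                  # mode 0
TWO_BANKS = 2 * 16 * 1024            # banks 2 and 3 together

buffer_bytes = (WIDTH // PIXELS_PER_BYTE) * HEIGHT
total_bytes = 3 * buffer_bytes
spare_bytes = TWO_BANKS - total_bytes

print(buffer_bytes, total_bytes, spare_bytes)  # 10880 32640 128
```

Three buffers fill the two banks with only 128 bytes to spare, so this is close to the largest display that fits the scheme.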

McArti0

Quote from: djaybee on 13:02, 25 April 24
Deep inside, I can't stop thinking about a closely related question: what situations would benefit from triple-buffering, i.e. what are the types of graphics where the performance gains of triple-buffering outweigh the memory costs?

Fast 50fps gameplay with huge explosions, where you need to copy 10kB to the screen.

andycadley

Triple buffering doesn't necessarily produce a more variable frame rate; indeed, the main reason for doing it is to get a more consistent frame rate than double buffering in any case where double buffering can stall due to excessive waiting.

Typically it works best when render time for a frame is a little over a frame (or a little over two frames etc). And, of course, you can still rate limit actual frame swaps to make things more consistent.

I don't think it's worth the pain on a standard CPC. As I said it might be more interesting on a GX4000 game as you have a lot more free RAM to play with, a lot more flexibility in terms of screen splitting and you can run much more from ROM to give you a much larger effective address space (assuming your screen buffers can be write only).

GUNHED

Sorry, the GX4000 has only half the RAM compared to the CPC6128. Furthermore the CRTC RAM is 64 KB in both cases. ROM is another thing. The GX4000 itself has none.

andycadley

Quote from: GUNHED on 15:04, 25 April 24
Sorry, the GX4000 has only half the RAM compared to the CPC6128. Furthermore the CRTC RAM is 64 KB in both cases. ROM is another thing. The GX4000 itself has none.

It doesn't work like that in practice.

When you write cartridge software, you don't typically need much actual RAM for storing stuff (because 99% of code and assets can be in ROM), thus you typically have almost all of the 64K free to dedicate to whatever the CRTC needs. And you can be a lot smarter about arranging things by leaving ROMs paged in and relying on write-through for updates (assuming you don't need masking etc.), which gives you an effective usable address space of 96K (and obviously up to 512K of code/data space in total).

On a 128K CPC most of the RAM tends to end up storing code + assets and you have to work with just 64K of effective address space (assuming you're not ROM software).

djaybee

Quote from: McArti0 on 14:18, 25 April 24
Fast gameplay 50fps with huge explosions. You need to copy 10kB to screen.

Oh, interesting, I hadn't considered that explicitly. Essentially, if most frames take well under 20ms but some take more than that, having a triple buffer lets you "borrow" time from a short frame for a later long one.

I think I once did something like that in a demo for the Atari ST: my code took near-constant time, but the music player I used didn't, and I made it so that the average would fit in my 20ms budget (40064 NOPs on the ST). Overall, the delay never added up to more than the size of the bottom + top borders, so my frame boundary never crossed into the visible part of the display.
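
For reference, the 40064 figure matches the usual numbers quoted for a 50Hz ST display. The constants below are assumptions not stated in the post: 313 scanlines per frame, 512 CPU cycles per scanline, and 4 cycles per NOP on the 68000.

```python
CYCLES_PER_LINE = 512   # assumed CPU cycles per scanline at 50Hz
LINES_PER_FRAME = 313   # assumed scanlines per 50Hz frame
CYCLES_PER_NOP = 4      # assumed cost of one 68000 NOP

cycles_per_frame = CYCLES_PER_LINE * LINES_PER_FRAME
nops_per_frame = cycles_per_frame // CYCLES_PER_NOP

print(cycles_per_frame, nops_per_frame)  # 160256 40064
```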
