
Proof of concept: adaptive software sprite renderer

Started by cngsoft, 14:50, 27 September 12


cngsoft

While thinking about the many obstacles I'm going to run into in the development of "Parasol Stars" for CPC, I started thinking about how to improve the performance of a general-purpose sprite renderer, knowing that most sprites are basically rectangles whose opacity decreases as we move away from the centre towards the edges.

The starting point will be the all-purpose minimal scanline renderer, as defined by Executioner a while ago:

; BC points to the sprite pixels, DE points to the target buffer
; H (L is irrelevant) points to a segment-aligned precalculated mask table

ld a,(bc) ; 2 NOPs
inc c ; 1 NOP (inc bc, 2 NOPs if not every sprite is contained within one segment)
ld l,a ; 1 NOP
ld a,(de) ; 2 NOPs
and (hl) ; 2 NOPs
or l ; 1 NOP
ld (de),a ; 2 NOPs
inc e ; 1 NOP (inc de, 2 NOPs if not every scanline is contained within one segment)
...


This means that, depending on whether we can keep all data aligned to 256-byte segments or not, rendering one byte takes between 12 and 14 NOPs. For further bytes, we can either repeat the whole code several times in memory (unrolling) or use an additional register as a counter (looping), for example DEC XL: JR NZ,DRAWBYTE: RET. Either way, the "..." stands for the bridge towards the next iteration, whether unrolled or looped.
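To sanity-check the logic away from the CPC, here is a minimal Python model of what each byte of the loop computes. It assumes MODE 0 with pen 0 as the transparent colour (left pixel in bits 7,5,3,1, i.e. `0xAA`; right pixel in bits 6,4,2,0, i.e. `0x55`). `mode0_mask`, `MASK_TABLE` and `draw_scanline` are hypothetical names, and the NOP totals simply add up the figures in the listing's comments.

```python
# Hypothetical model of the minimal masked scanline renderer.
# Assumes MODE 0 with pen 0 transparent: left pixel = bits 7,5,3,1 (0xAA),
# right pixel = bits 6,4,2,0 (0x55).

def mode0_mask(byte):
    """Value to AND with the background: keep background pixels where the sprite is pen 0."""
    mask = 0
    if (byte & 0xAA) == 0:   # left pixel transparent: keep background left pixel
        mask |= 0xAA
    if (byte & 0x55) == 0:   # right pixel transparent: keep background right pixel
        mask |= 0x55
    return mask

# The 256-byte, segment-aligned table that H points to, indexed by L = pixel byte.
MASK_TABLE = [mode0_mask(b) for b in range(256)]

def draw_scanline(sprite_row, screen, offset):
    """Same result as repeating LD A,(BC) ... LD (DE),A once per byte."""
    for i, pix in enumerate(sprite_row):                     # ld a,(bc) / inc c / ld l,a
        bg = screen[offset + i]                              # ld a,(de)
        screen[offset + i] = (bg & MASK_TABLE[pix]) | pix    # and (hl) / or l / ld (de),a

# Per-byte cost in NOPs, summed from the comments in the listing.
NOPS_ALIGNED = 2 + 1 + 1 + 2 + 2 + 1 + 2 + 1   # with inc c / inc e
NOPS_UNALIGNED = NOPS_ALIGNED + 1 + 1          # inc bc / inc de cost 2 NOPs each
```

For example, drawing the row `[0x00, 0xAA, 0x55, 0xFF]` over a background of `0x0F` bytes leaves the first byte untouched, merges the two half-transparent bytes with the background, and overwrites the last one completely.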

However, if we assume that in most cases the pixel data of each scanline will start with blank pixels, continue with opaque pixels, and end with blank ones again, we'll notice an interesting feature: fully blank bytes (the ones being read from BC) will be equal to 0, and fully opaque bytes will have their mask table values (those found at (HL)) set to 0 too.
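The property can be verified exhaustively off-line. Under the same assumed MODE 0 layout (pen 0 transparent, `0xAA`/`0x55` pixel masks), this hypothetical `classify` helper sorts every possible byte value into blank, opaque or partial:

```python
# Check: blank bytes read as 0, and only fully opaque bytes have a mask of 0.
# Assumes MODE 0 with pen 0 transparent (left pixel 0xAA, right pixel 0x55).

def mode0_mask(byte):
    mask = 0
    if (byte & 0xAA) == 0:
        mask |= 0xAA
    if (byte & 0x55) == 0:
        mask |= 0x55
    return mask

def classify(byte):
    """Return 'blank', 'opaque' or 'partial' for one sprite byte."""
    if byte == 0:
        return "blank"      # both pixels are pen 0: nothing to draw
    if mode0_mask(byte) == 0:
        return "opaque"     # both pixels non-zero: a plain write is enough
    return "partial"        # one pixel transparent: must mask

counts = {"blank": 0, "opaque": 0, "partial": 0}
for b in range(256):
    counts[classify(b)] += 1
```

Only 30 of the 256 values (one pixel transparent, the other not) really need the read-mask-write; the single value 0 can be skipped, and the 225 fully opaque values can be written directly.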

Here's my first attempt at taking advantage of this property. It assumes the same (lack of) features from the minimal example:


.readnull ; expecting blank bytes
ld a,(bc)
inc c
and a
jr z,.drawnull ; found a fully blank byte
; not a blank one, we must mask it
ld l,a
ld a,(de)
and (hl)
or l
ld (de),a
inc e
...
; we assume opaque bytes follow
.readfull ; expecting opaque bytes
ld a,(bc)
inc c
ld l,a
ld a,(hl)
and a
jr z,.drawfull ; found a fully opaque byte
; not an opaque one, we must mask it
ld a,(de)
and (hl)
or l
ld (de),a
inc e
...
; we assume blank bytes follow
jr .readnull
.drawnull ; draw nothing at all
inc e
...
jr .readnull
.drawfull ; draw a whole byte
ld a,l
ld (de),a
inc e
...
jr .readfull


Again, the "..." would stand for whatever code handles the next bytes to be drawn, if any: for example, using XL again as a counter, it could be DEC XL: RET Z. No optimisations have been made either, beyond reusing registers wherever possible.
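A rough Python model of the two-state loop can confirm that it draws exactly what the plain masking loop would, while counting which path each byte takes. It assumes the same hypothetical MODE 0 mask table (pen 0 transparent) and leaves out the "..." bridges; `draw_adaptive` is an illustrative name, not part of any existing tool.

```python
# Hypothetical model of the .readnull/.readfull renderer.
# Assumes MODE 0 with pen 0 transparent (left pixel 0xAA, right pixel 0x55).

def mode0_mask(byte):
    mask = 0
    if (byte & 0xAA) == 0:
        mask |= 0xAA
    if (byte & 0x55) == 0:
        mask |= 0x55
    return mask

MASK_TABLE = [mode0_mask(b) for b in range(256)]

def draw_adaptive(sprite_row, screen, offset):
    """Draw one scanline and return how many bytes took each path."""
    paths = {"drawnull": 0, "drawfull": 0, "masked": 0}
    state = "readnull"                        # each scanline starts expecting blank bytes
    for i, pix in enumerate(sprite_row):
        if state == "readnull":
            if pix == 0:                      # and a / jr z,.drawnull
                paths["drawnull"] += 1        # inc e only: background untouched
                continue                      # jr .readnull
            state = "readfull"                # mask this byte, then fall into .readfull
        else:
            if MASK_TABLE[pix] == 0:          # ld a,(hl) / and a / jr z,.drawfull
                screen[offset + i] = pix      # ld a,l / ld (de),a
                paths["drawfull"] += 1
                continue                      # jr .readfull
            state = "readnull"                # mask this byte, then jr .readnull
        # Common masking code: ld a,(de) / and (hl) / or l / ld (de),a
        screen[offset + i] = (screen[offset + i] & MASK_TABLE[pix]) | pix
        paths["masked"] += 1
    return paths
```

With a row like `[0, 0, 0xAA, 0xFF, 0xFF, 0x55, 0, 0]`, the four edge bytes take .drawnull and the two centre bytes take .drawfull. The fast paths only fire when a byte matches what the current state expects: an opaque byte met in .readnull, or a blank one met in .readfull, still goes through the full mask.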

What do you think of this idea? Does it have any potential of being useful?
(if you can't see the banner right now my server is currently offline)

Axelay

My feeling is that the saving sounds a bit marginal? The core bit of code that you are looking to avoid is


ld l,a
ld a,(de)
and (hl)
or l
ld (de),a



So that's a piece of code that's 8 NOPs, if I have that correct. But to test whether to skip it, using


and a
jr z,drawnull


is 4 NOPs if it is blank, and 3 if not (all timings off the top of my head, though!).
So if the ratio of blank to non-blank bytes was 50/50, you'd use 4 NOPs for the blank byte and 3+8 for the non-blank one, for a total of 15, versus 2x8 if you didn't test at all, which is a saving of just 1 NOP for every 2 bytes thought to be possibly blank. Perhaps it might be worth it if you are printing a lot of sprites with quite irregular shapes and you're sure the proportion of blank start & end bytes is very high, but it will end up being slower if much less than 50% of those bytes are in fact blank. Compare that to the saving from something like unrolling the loop that prints the bytes (not having the dec ixl, jr nz,drawbyte): that's what, 5 NOPs for every byte you would save if you had to use ixl?
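The arithmetic above can be replayed in a few lines of Python, using only the timings quoted in the post (which are themselves estimates). Solving 4p + 11(1 - p) = 8 for the blank fraction p gives the break-even point behind "much less than 50%": p = 3/7, roughly 43%. The constant and function names are hypothetical.

```python
from fractions import Fraction

# Timings in NOPs as quoted in the post (estimates, "off the top of my head").
MASK_CODE = 8        # ld l,a / ld a,(de) / and (hl) / or l / ld (de),a
TEST_BLANK = 4       # and a + jr z taken
TEST_NONBLANK = 3    # and a + jr z not taken
LOOP_OVERHEAD = 5    # dec ixl + jr nz,drawbyte, saved on every byte by unrolling

def cost_with_test(blank, non_blank):
    """Total NOPs when every byte is tested for blankness first."""
    return blank * TEST_BLANK + non_blank * (TEST_NONBLANK + MASK_CODE)

def cost_without_test(blank, non_blank):
    """Total NOPs when every byte is simply masked."""
    return (blank + non_blank) * MASK_CODE

# Blank fraction p at which both versions cost the same:
# TEST_BLANK*p + (TEST_NONBLANK + MASK_CODE)*(1 - p) == MASK_CODE
BREAK_EVEN = Fraction(TEST_NONBLANK, TEST_NONBLANK + MASK_CODE - TEST_BLANK)
```

With one blank and one non-blank byte this gives 15 versus 16 NOPs; with 3 blank bytes out of 7 both versions cost 56 NOPs, and below that ratio the test costs more than it saves. Unrolling, by contrast, saves `LOOP_OVERHEAD` on every byte regardless of content.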

Might a bigger saving be had from a not-so-general-purpose sprite routine, or rather two: one that masks all the bytes, used for the top and bottom of sprites (if you are cutting sprites into pieces), and another for the larger middle parts that presumes a certain amount of the centre does not require masking at all, if that suits the sprites?
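The two-routine suggestion could look like this as a Python sketch, assuming MODE 0 with pen 0 transparent and that byte columns `solid_from` to `solid_to - 1` are fully opaque in every row of a middle piece; all names here are hypothetical. A pre-processing check guards that assumption, since writing a non-opaque byte directly would wipe the background behind its transparent pixel.

```python
# Hypothetical sketch: a general masking routine plus a "solid centre" routine.
# Assumes MODE 0 with pen 0 transparent (left pixel 0xAA, right pixel 0x55).

def mode0_mask(byte):
    mask = 0
    if (byte & 0xAA) == 0:
        mask |= 0xAA
    if (byte & 0x55) == 0:
        mask |= 0x55
    return mask

def draw_row_masked(row, screen, offset):
    """General routine: mask every byte (top and bottom pieces of the sprite)."""
    for i, pix in enumerate(row):
        screen[offset + i] = (screen[offset + i] & mode0_mask(pix)) | pix

def draw_row_solid_centre(row, screen, offset, solid_from, solid_to):
    """Middle-piece routine: mask only the edges, write the centre directly."""
    for i, pix in enumerate(row):
        if solid_from <= i < solid_to:
            screen[offset + i] = pix    # no background read needed
        else:
            screen[offset + i] = (screen[offset + i] & mode0_mask(pix)) | pix

def centre_is_solid(rows, solid_from, solid_to):
    """Pre-processing check that every centre byte of every row is fully opaque."""
    return all(mode0_mask(row[i]) == 0
               for row in rows for i in range(solid_from, solid_to))
```

When `centre_is_solid` holds, both routines produce identical output; the second one just skips the background read and mask for the centre bytes.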

arnoldemu

If your sprite has lots of wasted space around the edge, consider generating "compressed" sprites:

A code for each of these:
- skip x bytes
- mask x bytes
- direct write x bytes
- end of line

A general line could become:

<skip x bytes> <mask 1 byte> <direct write x bytes> <end of line>


Pre-process the gfx to generate this data, and replay it to uncompress. The compression code could pick the best encoding for each case, depending on the cost of the branches and the number of repetitions.
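The scheme can be sketched as a pair of Python functions: an encoder that turns a scanline into SKIP/MASK/COPY runs plus the pixel bytes they consume, and a decoder that replays them. The op names, the tuple layout and the MODE 0 mask (pen 0 transparent) are assumptions for illustration; a real CPC version would pack the ops into bytes and replay them in Z80 code.

```python
from itertools import groupby

# Hypothetical "compressed sprite" encoder/decoder.
# Assumes MODE 0 with pen 0 transparent (left pixel 0xAA, right pixel 0x55).

def mode0_mask(byte):
    mask = 0
    if (byte & 0xAA) == 0:
        mask |= 0xAA
    if (byte & 0x55) == 0:
        mask |= 0x55
    return mask

def classify(byte):
    if byte == 0:
        return "SKIP"              # fully blank: leave the background alone
    if mode0_mask(byte) == 0:
        return "COPY"              # fully opaque: direct write
    return "MASK"                  # partly transparent: read, mask, write

def encode_line(row):
    """Turn one scanline into (op, count) runs plus the pixel bytes they use."""
    ops, data = [], []
    for kind, run in groupby(row, key=classify):
        run = list(run)
        ops.append((kind, len(run)))
        if kind != "SKIP":
            data.extend(run)       # skipped bytes are not stored at all
    if ops and ops[-1][0] == "SKIP":
        ops.pop()                  # trailing blanks are implied by end of line
    ops.append(("EOL", 0))
    return ops, data

def decode_line(ops, data, screen, offset):
    """Replay the runs onto the screen buffer, starting at offset."""
    src, dst = 0, offset
    for kind, count in ops:
        if kind == "EOL":
            break
        if kind == "SKIP":
            dst += count
            continue
        for k in range(count):
            pix = data[src + k]
            if kind == "COPY":
                screen[dst + k] = pix
            else:
                screen[dst + k] = (screen[dst + k] & mode0_mask(pix)) | pix
        src += count
        dst += count
```

A row like `[0, 0, 0xAA, 0xFF, 0xFF, 0x55, 0, 0]` encodes as SKIP 2, MASK 1, COPY 2, MASK 1, EOL, storing only four pixel bytes. Picking the "best case" (for example, folding a one-byte SKIP into a neighbouring MASK run when the branch costs more than the masking) is left out of this sketch.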

Good where sprites are big, like in Final Fight. A pain to clip on the left/right side, though :(

Something I considered doing but haven't tried yet.
My games. My Games
My website with coding examples: Unofficial Amstrad WWW Resource

SyX

I use a variant of arnoldemu's "compressed sprites": I split the masking and the printing processes, and instead of having precompiled sprites, I have precompiled masks (a Python script takes a sprite and generates a list of instructions using the following commands: COPY_BG2SPR, MASK_LEFT value, SKIP value, MASK_RIGHT value).

cngsoft

Thanks for the replies, they're much appreciated. After all, this is a game where these behemoths must move around the screen!



Arnoldemu's solution provides the best performance IMO, albeit at the cost of a nonstandard sprite data format (no more peeking with WinAPE) that nevertheless takes just two or three lines of C to parse and generate. I'm already pondering a format where a couple of bytes could store the widths of the blank edges and the opaque body, and whose only flaw is that internal gaps would get merged with the body, as seen in the areas painted purple in the third sprite.



I'll keep thinking about how to encode the purple gaps within that couple of bytes, although it's a problem that will probably mostly solve itself while I redraw the graphics for MODE 0 and details get unavoidably lost in the process.
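One possible reading of that couple-of-bytes format, as a Python sketch: byte 0 holds the width of the left blank edge, byte 1 the width of the body, and the right edge is whatever remains of the sprite width. This layout, the assumption that rows are under 256 bytes wide, and the names `line_header`/`merged_gaps` are all hypothetical; the second function lists the blank bytes inside the body, i.e. the internal gaps that would get merged with it.

```python
# Hypothetical two-byte header per scanline: (left blank width, body width).
# The right blank edge is implied: sprite width - left - body.

def line_header(row):
    n = len(row)
    left = 0
    while left < n and row[left] == 0:                     # leading fully blank bytes
        left += 1
    right = 0
    while right < n - left and row[n - 1 - right] == 0:   # trailing fully blank bytes
        right += 1
    body = n - left - right                                # everything between, gaps included
    return bytes([left, body])

def merged_gaps(row):
    """Indices of blank bytes inside the body: gaps the header cannot express."""
    left, body = line_header(row)
    return [i for i in range(left, left + body) if row[i] == 0]
```

For the row `[0, 0, 0xAA, 0, 0xFF, 0x55, 0]` the header is `(2, 4)`, and byte 3 is reported as a merged gap: it sits between non-blank bytes, so it falls inside the body and would be drawn as part of it.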

(also, hello, new profile icon!)
