
Real-time sprite scaling

Started by ervin, 16:32, 26 June 21


ervin

Hi everyone.

For a long time I've been very interested in the way sprite scaling worked back in the day, on the hardware that could actually do it.
I've been researching this topic for the last few months, with the intention of creating a sprite scaling routine for the CPC.

After running into a lot of dead-ends, I stumbled across the technique used by the Neo Geo console.
The Neo Geo created its huge sprites out of 16x16 pixel tiles, and each tile could be shrunk (by the hardware) to any one of 16 "magnifications".

I thought that maybe I had finally found a technique that would be viable for the CPC.
However, this has been an extremely challenging project, and there is still a long way to go.

The first thing I had to figure out was how to split a byte into individual pixel colours, and then plot that pixel colour individually where it is required. That's not so bad, admittedly (even though I was quite chuffed when I figured it out).
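As a rough illustration of what splitting a byte into individual pixel colours involves, here is a minimal C sketch. It assumes the standard CPC mode 0 layout, in which the left pixel's pen bits 0, 1, 2, 3 sit in byte bits 7, 3, 5, 1 and the right pixel uses the same pattern one bit lower. The function names are hypothetical and this is not ervin's code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8; /* same name as cpctelera's u8 */

/* CPC mode 0: one byte = two 4-bit pens, bits interleaved.
   Left pixel:  pen bit 0 -> byte bit 7, bit 1 -> bit 3, bit 2 -> bit 5, bit 3 -> bit 1.
   Right pixel: the same pattern one bit lower (bits 6, 2, 4, 0). */

/* Pen (0-15) of the left pixel. */
u8 m0_left_pen(u8 b) {
    return (u8)(((b >> 7) & 1)
              | (((b >> 3) & 1) << 1)
              | (((b >> 5) & 1) << 2)
              | (((b >> 1) & 1) << 3));
}

/* Pen of the right pixel: shift its bits into the left-pixel slots first
   (the left pixel's bit 7 falls off the top). */
u8 m0_right_pen(u8 b) {
    return m0_left_pen((u8)(b << 1));
}

/* Spread a pen into the left-pixel bit positions. */
u8 m0_pen_to_left(u8 pen) {
    assert(pen < 16);
    return (u8)(((pen & 1) << 7)   /* bit 0 -> bit 7 */
              | ((pen & 2) << 2)   /* bit 1 -> bit 3 */
              | ((pen & 4) << 3)   /* bit 2 -> bit 5 */
              | ((pen & 8) >> 2)); /* bit 3 -> bit 1 */
}

/* Build a whole byte from a left and a right pen. */
u8 m0_pack(u8 left, u8 right) {
    return (u8)(m0_pen_to_left(left) | (m0_pen_to_left(right) >> 1));
}
```

Because the right pixel's bits are the left pixel's bits shifted down by one, a single decode routine and a single encode routine cover both pixels.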
Then I had to figure out how to build a large sprite from lots of tiles. Again, not so bad.
The tricky part was figuring out how to scale each tile, even armed with the knowledge of how the Neo Geo did it.

So I turned to BlitzMAX, and got to work on a prototype.
That was a much easier way to figure out the finer points of the scaling algorithm, and when I finally got it working, I was very pleased with the results.

For the last few weeks I've been porting the prototype to SDCC (via cpctelera).
I now have something that runs, and produces the same results as my BlitzMAX prototype.

HOWEVER...

The current version is in unoptimised C, with only a few lines of z80 asm.
Also, each individual operation is currently a function call, because I had to break everything down into small pieces in order to make sense of the code as I was writing it.

As a result, this first version is SLOW.
When I say SLOW, I mean SLOW... like a snail crawling uphill through treacle.
Possibly slower.

We're talking 3 seconds per frame (yes, SPF, not FPS), when displaying the sprite at its largest size.
It does get faster as the sprite is scaled down (as it has less work to do), but if you're interested in trying the initial version of the program, please run it on the fastest speed that your emulator of choice provides.

Z scales the sprite down, and X scales it up.

Now to optimise!!!

Targhan

I wrote this article a long time ago, but it is still relevant. Using fixed-point arithmetic to zoom a sprite is a good basis for a generic sprite zoom (you can see the result in the asteroid zoom in the introduction of Orion Prime!)
Heck, I even use this technique to play samples!
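A minimal sketch of the fixed-point idea, assuming nothing about Targhan's actual code: step through the source row with an 8.8 fixed-point position and let the integer part pick each source pixel. For clarity the row holds one pen per byte rather than packed mode 0 bytes, and `zoom_row_fixed` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;

/* Nearest-neighbour horizontal zoom using 8.8 fixed point.
   src: src_w pixels (one pen per byte), dst: dst_w pixels. */
void zoom_row_fixed(const u8 *src, u8 src_w, u8 *dst, u8 dst_w) {
    assert(dst_w > 0);
    /* Source pixels advanced per destination pixel, in 8.8. */
    u16 step = (u16)(((u16)src_w << 8) / dst_w);
    u16 pos = 0; /* current source position, 8.8 */
    for (u8 x = 0; x < dst_w; x++) {
        dst[x] = src[pos >> 8]; /* integer part selects the source pixel */
        pos += step;
    }
}
```

Shrinking and enlarging use the same loop; only `step` changes (below 1.0 repeats source pixels, above 1.0 skips them).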
Targhan/Arkos

Arkos Tracker 2.0.1 now released! - Follow the news on Twitter!
Disark - A cross-platform Z80 disassembler/source converter
FDC Tool 1.1 - Read Amsdos files without the system

Imperial Mahjong
Orion Prime

Arnaud

Hi @ervin,
it's really interesting. Even if it's too slow for real-time animation (for now), it could be really useful for having different sprite sizes without keeping them all in memory.

How much memory does your scaling code take?

redbox

Really impressive work so far ervin.

Quote from: ervin on 16:32, 26 June 21
The Neo Geo created its huge sprites out of 16x16 pixel tiles, and each tile could be shrunk (by the hardware) to any one of 16 "magnifications".

Have you considered using lookup tables?  You could have a 256 byte table of bytes pre-calculated for each of the magnifications and reading the table would be really quick.
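One possible reading of redbox's suggestion, sketched in C for the 2x magnification only: two 256-byte tables turn any mode 0 byte into a byte holding its left pixel twice or its right pixel twice, so enlarging a row becomes table reads rather than bit shuffling. The table and function names are hypothetical, and other magnifications would need tables of their own.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/* Mode 0 pixel masks: left pixel in bits 7,5,3,1; right pixel in bits 6,4,2,0. */
#define LEFT_MASK  0xAA
#define RIGHT_MASK 0x55

static u8 dbl_left[256];  /* byte -> its left pixel in both slots */
static u8 dbl_right[256]; /* byte -> its right pixel in both slots */

/* Precompute both tables once (512 bytes in total). */
void build_double_tables(void) {
    for (unsigned b = 0; b < 256; b++) {
        u8 l = (u8)(b & LEFT_MASK);
        u8 r = (u8)(b & RIGHT_MASK);
        dbl_left[b]  = (u8)(l | (l >> 1)); /* copy left pixel into the right slot */
        dbl_right[b] = (u8)(r | (r << 1)); /* copy right pixel into the left slot */
    }
}

/* 2x horizontal zoom of n bytes: each source byte becomes two bytes,
   with no per-pixel bit manipulation at draw time. dst needs 2*n bytes. */
void double_row(const u8 *src, u8 n, u8 *dst) {
    for (u8 i = 0; i < n; i++) {
        dst[2 * i]     = dbl_left[src[i]];
        dst[2 * i + 1] = dbl_right[src[i]];
    }
}
```

The right-pixel slot is simply the left-pixel slot shifted down one bit, so the tables can be built with masks and shifts, without decoding pens.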

ervin

Quote from: Targhan on 17:18, 26 June 21
I wrote this article a long time ago, but it is still relevant. Using fixed-point arithmetic to zoom a sprite is a good basis for a generic sprite zoom (you can see the result in the asteroid zoom in the introduction of Orion Prime!)
Heck, I even use this technique to play samples!

That's a fantastic article - thanks for the link!

I've gone for a slightly different approach.
I actually used Bresenham's algorithm to help me determine which pixels to draw at each of the 16 magnification levels.

Also, I have another table which I use to determine which tile to change at each magnification level.
The sprite consists of 8x8 tiles, so I used Bresenham to create another table for 8 magnification levels.

Here is the 16x16 table:


u8 const scale_16[]={
0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,
0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,
0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,
0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,
0,1,0,0,1,0,0,1,0,0,0,1,0,0,1,0,
0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,0,
0,1,0,1,0,1,0,1,0,0,1,0,1,0,1,0,
1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,
1,0,1,0,1,0,1,1,0,1,0,1,0,1,0,1,
1,0,1,1,0,1,0,1,1,0,1,1,0,1,0,1,
1,0,1,1,0,1,1,1,0,1,1,0,1,1,0,1,
1,1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,
1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,
1,1,1,1,0,1,1,1,1,1,1,1,0,1,1,1,
1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
};
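The error-accumulator form of Bresenham's algorithm is enough to generate a table like this. In the sketch below (a hypothetical `build_scale_table`, not necessarily how ervin built his), row k keeps k+1 of `size` pixels, and starting the accumulator at size/2 centres the picks. With a size of 16 it reproduces the 16x16 table above row for row; with a size of 8 it gives an 8-level table of the kind described for choosing tiles.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/* Fill out[size * size]. Row k (level k) marks with 1 the pixels to draw
   when k+1 of the size source pixels are kept. */
void build_scale_table(u8 *out, u8 size) {
    assert(size > 0 && size <= 64);
    for (u8 k = 0; k < size; k++) {
        u8 keep = (u8)(k + 1);   /* pixels drawn at this level */
        u8 err = (u8)(size / 2); /* start half-way so the picks are centred */
        for (u8 i = 0; i < size; i++) {
            err = (u8)(err + keep);
            if (err >= size) {   /* crossed a threshold: draw this pixel */
                err = (u8)(err - size);
                out[k * size + i] = 1;
            } else {
                out[k * size + i] = 0;
            }
        }
    }
}
```

Each row ends up with exactly k+1 ones: over the row the accumulator gains `size * (k + 1)` and loses `size` per drawn pixel, finishing back at its starting value.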


However, I've got a working prototype of a technique that should result in a large speedup, even without optimisations.
I'm going to use a function for each of the 16 magnification levels, and those functions won't rely on the 16x16 lookup table.
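A rough C sketch of what a function per magnification level could look like: each routine hard-codes which source pixels it keeps, taken here from rows 1, 7 and 15 of the table above, so no table lookup happens while drawing. Only three levels are filled in, pixels are stored one pen per byte, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;

/* A routine scales one 16-pixel tile row and returns how many pixels it wrote. */
typedef u8 (*scale_fn)(const u8 *src, u8 *dst);

/* Level 1: keep pixels 3 and 11, as in row 1 of scale_16. */
static u8 scale_lvl1(const u8 *src, u8 *dst) {
    dst[0] = src[3];
    dst[1] = src[11];
    return 2;
}

/* Level 7: keep every even pixel, as in row 7. */
static u8 scale_lvl7(const u8 *src, u8 *dst) {
    dst[0] = src[0];  dst[1] = src[2];  dst[2] = src[4];  dst[3] = src[6];
    dst[4] = src[8];  dst[5] = src[10]; dst[6] = src[12]; dst[7] = src[14];
    return 8;
}

/* Level 15: full size, so a straight copy. */
static u8 scale_lvl15(const u8 *src, u8 *dst) {
    memcpy(dst, src, 16);
    return 16;
}

/* Only three levels exist in this sketch; the others are NULL. */
static const scale_fn scale_by_level[16] = {
    NULL, scale_lvl1, NULL, NULL, NULL, NULL, NULL, scale_lvl7,
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, scale_lvl15
};

/* Dispatch by level; the level must be one that is implemented. */
u8 scale_tile_row(u8 level, const u8 *src, u8 *dst) {
    assert(level < 16 && scale_by_level[level] != NULL);
    return scale_by_level[level](src, dst);
}
```

On the CPC such routines would presumably end up as unrolled assembly; the function-pointer table is just the C way of selecting one.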

ervin

Quote from: Arnaud on 17:21, 26 June 21
Hi @ervin,
it's really interesting. Even if it's too slow for real-time animation (for now), it could be really useful for having different sprite sizes without keeping them all in memory.

How much memory does your scaling code take?

The entire BIN file is 10,087 bytes, but that includes 8K of sprite data.
So the program itself is less than 2K.
But that also includes a couple of small lookup tables.

Also, I'm using an 8K buffer to draw into, and then that buffer is doubled vertically to fill the screen.
So I'm effectively working with a 128x128 resolution.
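A sketch of the vertical doubling step, under stated assumptions: the 8K buffer is 128 rows of 64 bytes (128 mode 0 pixels), and the screen is a hypothetical 64-byte-wide, 256-line layout using the CPC's usual arrangement of eight scanlines per character row, 0x800 bytes apart. A plain array stands in for screen memory and `memcpy` stands in for LDIR or stack-based copying; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;
typedef uint16_t u16;

#define BUF_W 64  /* bytes per buffer row: 128 mode 0 pixels */
#define BUF_H 128 /* buffer rows: 64 * 128 = 8192 bytes */
#define SCR_W 64  /* hypothetical screen width in bytes */

/* Offset of scanline y in the interleaved layout:
   8 scanlines per character row, each 0x800 bytes apart. */
static u16 line_offset(u16 y) {
    assert(y < 256);
    return (u16)((y >> 3) * SCR_W + (y & 7) * 0x800);
}

/* Copy each buffer row onto two consecutive scanlines. */
void present_doubled(const u8 *buf, u8 *screen) {
    for (u16 row = 0; row < BUF_H; row++) {
        const u8 *src = buf + row * BUF_W;
        memcpy(screen + line_offset(2 * row), src, BUF_W);
        memcpy(screen + line_offset(2 * row + 1), src, BUF_W);
    }
}
```

At 64 bytes by 256 lines such a screen fills exactly 16K, and each buffer row costs two 64-byte copies.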

The sprites are drawn at (mode 0) pixel accuracy, rather than at byte-aligned positions.
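Drawing at mode 0 pixel accuracy means writing one pixel without disturbing its neighbour in the same byte. A minimal sketch of that read-modify-write, assuming a 64-byte (128-pixel) buffer row and the standard mode 0 masks (0xAA for the left pixel, 0x55 for the right); `plot_m0` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/* Spread a pen (0-15) into the left-pixel bits 7, 3, 5, 1 of a mode 0 byte. */
static u8 pen_left_bits(u8 pen) {
    return (u8)(((pen & 1) << 7) | ((pen & 2) << 2) | ((pen & 4) << 3) | ((pen & 8) >> 2));
}

/* Plot pen at pixel x (0-127) in one 64-byte buffer row.
   Even x is the left pixel of byte x/2, odd x the right pixel. */
void plot_m0(u8 *row, u8 x, u8 pen) {
    u8 *b = row + (x >> 1);
    assert(x < 128 && pen < 16);
    if (x & 1)
        *b = (u8)((*b & 0xAA) | (pen_left_bits(pen) >> 1)); /* keep left, set right */
    else
        *b = (u8)((*b & 0x55) | pen_left_bits(pen));        /* keep right, set left */
}
```

A byte-aligned renderer skips the masking entirely, which is why byte-width "pixels" are so much cheaper.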

ervin

Quote from: redbox on 17:40, 26 June 21
Really impressive work so far ervin.

Have you considered using lookup tables?  You could have a 256 byte table of bytes pre-calculated for each of the magnifications and reading the table would be really quick.

Yes, I am using some lookup data for the 16 magnification levels.
I've described it a bit better in my reply to Targhan.
8)

sigh

Have you spoken to Optimus?
Optimus created a fantastic demo that involved walking through a Doom style labyrinth with real time sprite scaling. You controlled this manually too.

ervin

Quote from: sigh on 11:56, 27 June 21
Have you spoken to Optimus?
Optimus created a fantastic demo that involved walking through a Doom style labyrinth with real time sprite scaling. You controlled this manually too.

Is this the demo you mean?


https://www.youtube.com/watch?v=V0NXGO5b25k

It is indeed very impressive, especially considering it is written in C!
It looks as though it is drawing byte-width "pixels" (forgive me if I am mistaken), whereas I'm going for actual mode 0 width pixels.

sigh

Yes, it's in C, so it hasn't even been optimised and you can already see the potential if it were to be put into assembly. It's an old demo.

There was also another example shown on this forum: it was 3D scaling, but you were walking through a gallery of sprites. That was amazing too.
When you say "...I'm going for actual mode 0 width pixels" what is the difference between the two?

redbox

Quote from: sigh on 09:17, 28 June 21
When you say "...I'm going for actual mode 0 width pixels" what is the difference between the two?

One byte holds two pixels in Mode 0.  They're interleaved within the byte which complicates things...


ervin

Quote from: sigh on 09:17, 28 June 21
Yes, it's in C, so it hasn't even been optimised and you can already see the potential if it were to be put into assembly. It's an old demo.

There was also another example shown on this forum: it was 3D scaling, but you were walking through a gallery of sprites. That was amazing too.
When you say "...I'm going for actual mode 0 width pixels" what is the difference between the two?

That other project... do you mean this one?
http://www.cpcwiki.eu/forum/games/chunk-pixel-curator/

That's something I worked on many years ago.  8)
The "pixels" in that were byte-width.

redbox's explanation of the difference between byte-width and pixel-width is spot on.

andycadley

Quote from: redbox on 13:23, 28 June 21
One byte holds two pixels in Mode 0.  They're interleaved within the byte which complicates things...

I'm glad my scrawling in MS Paint lives on.  :laugh:

zhulien

There seem to be some ways to scale in hardware on the CPC, but I don't know the specifics. There is a SNES Mode 7-style scroller in either The Demo or The Terrific Demo (sorry, I can't remember which one) which is pretty cool and might be usable for sprites. I also used some CRTC OUTs from an Amstrad Action which doubled the size of all display objects automatically, which was kind of cool.


Is there a wiki page that at least at the high level outlines which hardware tricks can be used on the CPC in general?  (can a CPC display like an Atari 2600 for example with almost no video RAM and direct writes to the video chips)?

Optimus

Quote from: ervin on 12:52, 27 June 21
Is this the demo you mean?
It is indeed very impressive, especially considering it is written in C!
It looks as though it is drawing byte-width "pixels" (forgive me if I am mistaken), whereas I'm going for actual mode 0 width pixels.


That's a specialized vertical column scaler, and indeed it works in bytes only. Also, as far as I remember, the zooming sprite in this video was just brute-force code, not well optimized yet. The rendering parts are in assembly even if the whole project is in C. I have other ideas for optimizing that sprite rendering with different unrolled assembly code, but I haven't tried them yet.


I think the fastest method I've done, especially if you want pixel perfect Mode 0 instead of Byte, is in my old demo X-Kore https://www.cpc-power.com/index.php?page=detail&num=7691
I regret that I didn't use that method, with better graphics, for many more of the effects I had in mind at the time.


But anyway, this is as good as I could get it. The trick is that I do waste memory here: I have stored the bitmap in 4 different configurations, with shifted pixels or 2X scaled pixels, and in such a way that later a series of unrolled LDIs (different for different zoom levels) will go through the bitmap, and with INC/DEC H (or DEC L to delay for more scaling) will traverse through the precalced bytes and use them accordingly. So, this method will eat 4x your bitmap data, and some more space (maybe less than 16k) for a series of unrolled codes. This way, while the scaling will be pixel precise, the real code will copy bytes and avoid wasting CPU for merging left/right pixels, etc.


This is the fastest I could find to scale horizontally, and then you have control on the vertical which bitmap line to copy and where, so you can do a regular 2d scale or also the pseudo-3d effects or sine distorts/cylinder in the demo. The negatives are of course way more memory consumption, also this method is not gonna work if you want zooming sprites with transparent parts that should blend with the background. The method is more for solid bitmap lines over a black background.


I've found an old early assembly preview of the X-scaler which I'll try to attach. It's a bit dirty code, you'll see the series of LDIs for different zoom levels and then the bitmap lines replicated 4 times. One is like "12 34 56" the next is shifted right like "*1 23 45" then the two scaled versions "11 33 55" and "22 44 66" so the LDI source will be very near on top of these 4 lines and copy and traverse vertically with the fewest steps.
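A C sketch of how the four precomputed lines Optimus describes relate to the original line, assuming standard mode 0 bytes. `build_variants` and `zoom2x_from_variants` are hypothetical names; the sketch does not reproduce his memory layout (lines placed so that INC/DEC H moves between them) or the unrolled LDI sequences, only the data those LDIs would copy from.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/* From one line of n mode 0 bytes ("12 34 56"), build four lines:
   plain   "12 34 56"
   shifted "*1 23 45"  (one pixel to the right, pen 0 at the left edge)
   dbl_l   "11 33 55"  (left pixels doubled)
   dbl_r   "22 44 66"  (right pixels doubled) */
void build_variants(const u8 *src, u8 n,
                    u8 *plain, u8 *shifted, u8 *dbl_l, u8 *dbl_r) {
    u8 prev = 0; /* pen 0 fills the gap at the left edge */
    for (u8 i = 0; i < n; i++) {
        u8 cur = src[i];
        u8 l = (u8)(cur & 0xAA); /* left pixel bits */
        u8 r = (u8)(cur & 0x55); /* right pixel bits */
        plain[i]   = cur;
        /* previous byte's right pixel becomes the left pixel,
           this byte's left pixel becomes the right pixel */
        shifted[i] = (u8)(((prev & 0x55) << 1) | (l >> 1));
        dbl_l[i]   = (u8)(l | (l >> 1));
        dbl_r[i]   = (u8)(r | (r << 1));
        prev = cur;
    }
}

/* With the variants in place, a 2x zoom is pure byte copying
   (on the CPC, a run of LDIs). dst needs 2*n bytes. */
void zoom2x_from_variants(const u8 *dbl_l, const u8 *dbl_r, u8 n, u8 *dst) {
    for (u8 i = 0; i < n; i++) {
        *dst++ = dbl_l[i];
        *dst++ = dbl_r[i];
    }
}
```

Any zoom level then becomes a fixed sequence of byte copies, each output byte taken from whichever of the four lines already holds the right pixel pair, which is where the extra memory goes.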


p.s. The code or idea might be intimidating, hard to read, or hard to generate other data for. I've recently been rewriting the thing as a C project which generates the bitmap and zoom data on the fly, to make it easier and more automatic for a future demo or game. I was thinking of using it to do a pseudo-scaled road for a futuristic racing game, with more scaling and offset data, but it's on hold right now. The issue is always the memory with this technique. Maybe on the GX4000, with the extra cartridge memory and hardware sprites on top for free, this would work better.

sigh

Quote from: Optimus on 09:59, 29 June 21
but it's on hold right now.
:'( :(
I would love to see this technique used to create a third person game on the CPC. I could imagine it looking like a souped up version of Xybots.

Sykobee (Briggsy)

Quote from: zhulien on 09:08, 29 June 21
There seem to be some ways to scale in hardware on the CPC, but I don't know the specifics.

Is there a wiki page that at least at the high level outlines which hardware tricks can be used on the CPC in general?  (can a CPC display like an Atari 2600 for example with almost no video RAM and direct writes to the video chips)?


There's nothing that can do scaling of objects on the screen. You can use rasters to repeat parts of the screen, even to an individual line, which can give a full screen vertical scale/warp effect seen in demos.


So a non-integer scaling factor routine for sprites would be all software - and a bit of a pain on a Z80 with the CPC screen layout. You'd presumably use 4.4 fixed point integer maths which would limit your scaling choices but it'd be a lot of overhead especially horizontally - but maybe you could design different width sprites for the same object and only scale the appropriate sprite vertically if you can accept a bit of oddness.
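A sketch of the vertical-only case with 4.4 fixed point, assuming each sprite row is copied whole (for example from a pre-drawn sprite of the right width, as suggested above). The step is the number of source rows advanced per output row, in sixteenths, which is exactly what limits the scaling choices; `pick_rows_4_4` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;

/* Choose which source row each output row copies.
   step is 4.4 fixed point: 0x10 = 1.0 (unscaled), 0x20 = half height,
   0x08 = double height. Writes up to max_out row indices; returns the count. */
u8 pick_rows_4_4(u8 src_h, u8 step, u8 *out, u8 max_out) {
    u16 pos = 0;                 /* wider accumulator, still 4 fraction bits */
    u16 end = (u16)src_h << 4;
    u8 n = 0;
    assert(step > 0);
    while (pos < end && n < max_out) {
        out[n++] = (u8)(pos >> 4); /* integer part = source row */
        pos += step;
    }
    return n;
}
```

Only the step is 4.4; the accumulator needs more integer bits once the sprite is taller than 15 rows.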


If you want bigger scaling graphics, you might choose not to use a bitmap, but a list of line-oriented drawing instructions (skip x pixels, plot x pixels, change colour, end of line); you can scale the x, and skip/repeat lines as necessary to scale the y. Again, not trivial, and there's only so much the CPC can draw in a frame or two anyway, so really large scaled objects are not viable.
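A sketch of the line-oriented format described above, with a hypothetical two-byte encoding (opcode, argument) and pixels stored one pen per byte. Horizontal scaling is applied to run boundaries in 8.8 fixed point, so a long run costs one fill rather than per-pixel decisions; vertical scaling would simply choose which command lines to render. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;

/* Hypothetical encoding: opcode byte plus one argument byte,
   except OP_END, which ends the line. */
enum { OP_END = 0, OP_SKIP = 1, OP_PLOT = 2, OP_COLOUR = 3 };

/* Render one line into dst (dst_w pixels), scaled horizontally by
   scale/256 (8.8 fixed point: 256 = 1x, 128 = half, 512 = double).
   Run boundaries are scaled from the running unscaled x, so rounding
   errors do not build up from run to run. */
void render_line(const u8 *cmd, u16 scale, u8 *dst, u8 dst_w) {
    u16 x = 0;  /* unscaled x position */
    u8 pen = 0; /* current colour */
    for (;;) {
        u8 op = *cmd++;
        if (op == OP_END)
            return;
        u8 arg = *cmd++;
        if (op == OP_COLOUR) {
            pen = arg;
            continue;
        }
        assert(op == OP_SKIP || op == OP_PLOT);
        u16 x0 = (u16)(((uint32_t)x * scale) >> 8);
        x = (u16)(x + arg);
        u16 x1 = (u16)(((uint32_t)x * scale) >> 8);
        if (op == OP_PLOT)
            for (u16 i = x0; i < x1 && i < dst_w; i++)
                dst[i] = pen;
    }
}
```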

zhulien

Is this 3D scroll and ball scroll a SNES Mode 7-like effect or a software effect?



https://www.youtube.com/watch?v=4VJtG44cscw


Targhan

It is only rasters, used in a clever way :). Trailblazer uses the same technique.

eto

Quote from: Targhan on 09:43, 02 July 21
It is only rasters, used in a clever way :) . Trailblazer uses the same technique.


Quick guess: a 3x6 "font", and a screen that is filled with black on the left and right, with three columns in the middle drawn in pens 1, 2 and 3, whose colours are then set on every scanline to give the impression of moving text? And every now and then the screen is filled with a different arrangement of the columns.
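eto's guess, which Targhan confirms below, can be sketched on the data side: the bitmap never changes, and the per-scanline work is deciding which ink each of pens 1, 2 and 3 receives. The sketch assumes the message has already been expanded into 3-bit font rows and only builds the per-scanline ink table; the palette writes themselves, which must be timed to each scanline, are not shown, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;

/* glyph_rows: the message as 3-bit font rows, one per entry
   (bit 2 = left column/pen 1, bit 0 = right column/pen 3).
   Each font row is shown for lines_per_row scanlines; scroll moves the
   text up by that many scanlines. inks[y][c] = ink for pen c+1 on line y. */
void build_raster_inks(const u8 *glyph_rows, u16 n_rows, u16 scroll,
                       u8 lines_per_row, u8 ink_on, u8 ink_off,
                       u8 (*inks)[3], u16 n_lines) {
    assert(n_rows > 0 && lines_per_row > 0);
    for (u16 y = 0; y < n_lines; y++) {
        u8 bits = glyph_rows[((y + scroll) / lines_per_row) % n_rows];
        for (u8 col = 0; col < 3; col++)
            inks[y][col] = ((bits >> (2 - col)) & 1) ? ink_on : ink_off;
    }
}
```

Scrolling the text only changes `scroll`, so each frame rebuilds a small table instead of touching screen memory.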

Targhan

Yes, exactly. Using a hardware pause would reveal the underlying graphics.

ervin

Hi everyone.

I've been working on speeding this up, and have made it quite a bit faster.
However, there is still a lot of work to do, and 99% of the code is still in C.
There are a *lot* of things I can do to make it run much faster.
8)

If you're interested, I've attached scale_1.dsk to the original post.

sigh

This is faster. It really speeds up when you zoom out to 1 pixel.
How much faster do you think you could potentially get this to run?

ervin

Quote from: sigh on 10:47, 04 July 21
This is faster. It really speeds up when you zoom out to 1 pixel.
How much faster do you think you could potentially get this to run?

There are *many* things left to do.
My loops are extremely inefficient (still prototype quality), my screen redraw is slow, and most important of all, almost everything is in C.
So there is immense potential for a huge speed up.
8)

ervin

Alrighty.

The loops have been made much more efficient, as I am now pre-determining the routine to run for each column of the sprite (8 columns), and storing that routine in an array of function pointers (previously I was making that determination for every row of every tile, which was extremely inefficient, both in terms of speed and code size).
Also, the screen redraw has been changed from being LDIR based, to stack-abuse based (though that code can be improved further).
These changes have resulted in an 8% speed improvement.
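A sketch of the change described above: decide once per frame which routine each of the 8 columns needs, store it in an array of function pointers, and let the per-row loop call through that array. The routines, level thresholds and names are all hypothetical stand-ins, with pixels stored one pen per byte.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

#define COLS   8  /* tile columns across the sprite */
#define TILE_W 16 /* pixels per tile row */

/* Each routine scales one tile row and returns how many pixels it wrote. */
typedef u8 (*row_fn)(const u8 *src, u8 *dst);

static u8 row_full(const u8 *src, u8 *dst) {
    for (u8 i = 0; i < TILE_W; i++) dst[i] = src[i];
    return TILE_W;
}
static u8 row_half(const u8 *src, u8 *dst) {
    for (u8 i = 0; i < TILE_W / 2; i++) dst[i] = src[2 * i];
    return TILE_W / 2;
}
static u8 row_none(const u8 *src, u8 *dst) {
    (void)src; (void)dst; /* column shrunk away entirely */
    return 0;
}

/* Once per frame: pick a routine per column (thresholds made up for the sketch). */
void plan_columns(const u8 *level, row_fn *plan) {
    for (u8 c = 0; c < COLS; c++)
        plan[c] = level[c] == 0 ? row_none
                : level[c] < 8  ? row_half
                                : row_full;
}

/* Per row: just call through the table, no decisions in the inner loop.
   src holds COLS consecutive tile rows; returns the output width. */
u8 draw_row(row_fn const *plan, const u8 *src, u8 *dst) {
    u8 w = 0;
    for (u8 c = 0; c < COLS; c++)
        w = (u8)(w + plan[c](src + c * TILE_W, dst + w));
    return w;
}
```

The saving comes from moving the choice out of the row loop: it is made 8 times per frame instead of once per row of every tile.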

The next job is to try to improve the screen redraw code a bit more.
Then it will be time to start converting the scaling routines to asm, and then we'll see some real speed-ups.
Hopefully.
;D
