CRTC help - *** REWARD OFFERED ***

Started by ervin, 14:53, 09 April 13

ervin

Hi all.

This question may be familiar to some of you, but I'm going to ask again because this time it should be possible!
;D

I've been writing a program for quite some time now called Chunky Pixel Collision.
It is using big fat "pixels", that are 4x4 mode 1 pixels in size (or really 2x4 mode 0 pixels, as the game runs in mode 0).

Therefore I only need to draw the first line of each pixel, so I only want to write the first line of each group of 4 lines for the whole screen.
This means that the entire screen can be drawn by writing 4K of data.

I would like the other 3 lines of each pixel to be drawn by using the CRTC to repeat raster lines, much like how the chunky bits of Batman Forever work, or the Sugarlumps demo, or Overflow's Backtro.
I've been researching it a lot over the last few weeks, and I have to admit defeat.
:'(

I just can't get it, and I'm finding my motivation to continue with the project waning.
(cue violins)

The screen drawing phase of my program is predictable and uses the same number of NOPS for each raster line.

It is just this code 50 times:

ld hl,xx
ld de,yy
call LDI80


LDI80 is 80 unrolled LDI commands, followed by RET.
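
As a rough cross-check of what this copy loop costs, here is a small Python sketch (not part of the original post). It assumes the commonly quoted CPC timings in NOPs (1 NOP = 1 µs): LD rr,nn = 3, CALL = 5, LDI = 5, RET = 3. The names are made up for illustration.

```python
# Assumed CPC instruction timings in NOPs (1 NOP = 1 microsecond).
NOPS = {"ld_rr_nn": 3, "call": 5, "ldi": 5, "ret": 3}

BYTES_PER_LINE = 80   # one chunky line of the 80x50 frame buffer
LINES = 50            # number of chunky lines copied per screen


def copy_cost_nops(lines=LINES, bytes_per_line=BYTES_PER_LINE):
    """Cost of `lines` repetitions of: ld hl,xx / ld de,yy / call LDI80."""
    per_line = (
        2 * NOPS["ld_rr_nn"]              # ld hl,xx and ld de,yy
        + NOPS["call"]                    # call LDI80
        + bytes_per_line * NOPS["ldi"]    # 80 unrolled LDIs
        + NOPS["ret"]                     # ret at the end of LDI80
    )
    return lines * per_line


if __name__ == "__main__":
    print(copy_cost_nops())
```

Under those assumptions the copy comes to roughly one full 50 Hz frame, which is in line with the "something like 20 KNOPs" figure mentioned later in the thread.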

Can anyone help me?
I am literally offering to pay a monetary reward to someone that can make it work correctly and efficiently, in order to achieve the highest possible frame rate. I'd also like it to work with all CRTC types, if possible.

fano

A little question: what is the average cost of your program, in NOPs (µs), excluding the screen-drawing part (the (in)famous LDI80 code)?
"NOP" is the perfect program : short , fast and (known) bug free

Follow Easter Egg products on Facebook !

ervin

Do you mean the rest of the game loop?
The processing of the sprites and things like that?

If so, then the answer to that question is: it varies.
Every frame can take a different amount of time.

Would that matter?
I'm thinking that the screen-drawing part could be started by waiting for a vsync (after the part of the program where the speed can vary), and then it would be completely, 100% predictable every time the screen is drawn.

fano

I asked because there are different approaches if you really want to use the CRTC.

For the first approach, consider refreshing code that works like your LDI drawing but uses the CRTC:

Your current drawing code takes something like 20 KNOPs; using CRTC screen line splitting would take something like 12 KNOPs.
The problem is that you need to do this every 20 KNOPs (each monitor frame), so you have only something like 8 KNOPs per frame left for everything else. The result could be slower than your LDI approach.


It is difficult to explain, so here is a simple example:
For one game frame, your renderer takes 20 KNOPs and your other code 20 KNOPs, total = 40 KNOPs.
If you use CRTC rendering, refreshing takes 12 KNOPs out of every monitor frame, leaving only 8 KNOPs, so the main work has to be interrupted.
So you have this:
frame 1: 8 KNOPs work, 12 KNOPs refresh
frame 2: 8 KNOPs work, 12 KNOPs refresh
frame 3: 4 KNOPs work, 12 KNOPs refresh
Total = 20 + 36 = 56 KNOPs!
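
The comparison above can be written down as a small Python sketch. This only models the rounded numbers from the example (a 20 KNOP monitor frame, 12 KNOPs of CRTC refresh per frame); the function names are hypothetical and nothing here is measured on real hardware.

```python
import math

FRAME_NOPS = 20_000   # approx. one monitor frame, as rounded in the example


def software_copy_total(work_nops, copy_nops):
    """Game work followed by one LDI-style copy of the frame buffer."""
    return work_nops + copy_nops


def crtc_split_total(work_nops, refresh_nops, frame_nops=FRAME_NOPS):
    """Game work interleaved with a fixed CRTC refresh cost in every monitor frame.

    Returns (total NOPs spent, number of monitor frames touched).
    """
    free_per_frame = frame_nops - refresh_nops
    frames = math.ceil(work_nops / free_per_frame)
    return work_nops + frames * refresh_nops, frames


if __name__ == "__main__":
    print(software_copy_total(20_000, 20_000))   # LDI approach
    print(crtc_split_total(20_000, 12_000))      # CRTC approach
```

With these inputs the model gives 40 KNOPs for the LDI approach and 56 KNOPs spread over 3 monitor frames for the CRTC approach, matching the totals in the example.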


A second approach would be more complicated for you but would be very powerful. You may have to review your program design, as you must cut your code into fixed-time parts (something like 48 NOPs each).

fano

After rereading your previous thread about this (CRTC help), I read that your code already takes something between 60 and 80 KNOPs just to draw the savage sprite. Could I take a look at your code to see whether some little things can be optimised? I think it is possible to do something faster than LDI (3 NOPs/byte instead of 5, but that needs some code modifications first).

ralferoo

Quote from: ervin on 14:53, 09 April 13
It is just this code 50 times:

ld hl,xx
ld de,yy
call LDI80

Similar to the question (and answers ;) ) I gave the previous time you asked this, are xx and yy constant each time or do you want them to be easily patchable (i.e. at an obvious address)?

To me, it looks like you're trying to make a frame buffer of 80x50 bytes and then copy that to the screen each loop. It's relatively easy to knock this up, but I do wonder why you want to waste 20000 cycles just copying graphics data around anyway. You'd be far better just having the 2 buffers and flicking between them. Otherwise, you'll end up with having to take at least 2 frames per update, and half of that time will just be copying from one buffer to another.

So, it can be done, but I'd question if it's the right thing to be done...

arnoldemu

I have been thinking over how this can be done with hardware.

It comes down to line-by-line splitting, which means setting the screen address every line. This doesn't work on CRTC type 2, and timing is critical. It would be possible to fill in the free time with other game code.

I did think of various hardware methods, and none gave it to you for free.

The only computer where it is potentially "free" (as in you can keep your code as it is, and use interrupts to trigger the repeated lines) is the Plus.

If I have some time I'll create some examples of various ideas that could be explored.

As others have said, trying to do it in hardware, because of these problems, may end up slower for you, than doing it in software.

My games. My Games
My website with coding examples: Unofficial Amstrad WWW Resource

fano

Quote from: arnoldemu on 08:34, 10 April 13
I have been thinking over how this can be done with hardware.

It comes down to line-by-line splitting, which means setting the screen address every line. This doesn't work on CRTC type 2, and timing is critical. It would be possible to fill in the free time with other game code.
For sure, without a programmable interrupt that would be difficult. I'll add that you only have to set up the address every 4 scanlines if you set R9=0 and R4=0, so every 256 NOPs, which may leave you something like 240 NOPs, more or less, to do something else. AFAIK (but I have never tried it) there is a way to make it compatible with CRTC 2 using R9=1 and R4=0 and splitting the scanline with R0, but it would consume more time.

ervin

Quote from: fano on 15:58, 09 April 13
I asked because there are different approaches if you really want to use the CRTC.

For the first approach, consider refreshing code that works like your LDI drawing but uses the CRTC:

Your current drawing code takes something like 20 KNOPs; using CRTC screen line splitting would take something like 12 KNOPs.
The problem is that you need to do this every 20 KNOPs (each monitor frame), so you have only something like 8 KNOPs per frame left for everything else. The result could be slower than your LDI approach.

It is difficult to explain, so here is a simple example:
For one game frame, your renderer takes 20 KNOPs and your other code 20 KNOPs, total = 40 KNOPs.
If you use CRTC rendering, refreshing takes 12 KNOPs out of every monitor frame, leaving only 8 KNOPs, so the main work has to be interrupted.
So you have this:
frame 1: 8 KNOPs work, 12 KNOPs refresh
frame 2: 8 KNOPs work, 12 KNOPs refresh
frame 3: 4 KNOPs work, 12 KNOPs refresh
Total = 20 + 36 = 56 KNOPs!

A second approach would be more complicated for you but would be very powerful. You may have to review your program design, as you must cut your code into fixed-time parts (something like 48 NOPs each).

:(
Wow - sounds like it may be a lost cause after all!

Unfortunately it isn't really possible to cut my code up into time-fixed parts, as even the rendering of a sprite will change from frame to frame: my game involves real-time sprite scaling, with sprites growing a bit each frame. This is done through heavy use of self-modifying code.

ervin

Quote from: ralferoo on 19:08, 09 April 13
Similar to the question (and answers ;) ) I gave the previous time you asked this, are xx and yy constant each time or do you want them to be easily patchable (i.e. at an obvious address)?

To me, it looks like you're trying to make a frame buffer of 80x50 bytes and then copy that to the screen each loop. It's relatively easy to knock this up, but I do wonder why you want to waste 20000 cycles just copying graphics data around anyway. You'd be far better just having the 2 buffers and flicking between them. Otherwise, you'll end up with having to take at least 2 frames per update, and half of that time will just be copying from one buffer to another.

So, it can be done, but I'd question if it's the right thing to be done...

xx refers to the 1st "pixel" of a line in the frame buffer (which is indeed 80x50).
It is the address to copy data from.

yy refers to the screen memory, so of course for the 1st line it will be &c000, for the 2nd it will be &e000, the 3rd will be &c050, the 4th will be &e050, etc.
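
For reference, the yy sequence above can be generated with a few lines of Python. This is only a sketch, assuming the standard CPC screen layout (screen at &c000, 80 bytes per line, 8 scanlines per character row, &800 bytes between consecutive scanlines of a row); `dest_address` is a made-up name.

```python
SCREEN_BASE = 0xC000
BYTES_PER_LINE = 80
SCANLINE_STRIDE = 0x800   # offset between consecutive scanlines of a character row


def dest_address(chunky_line):
    """Screen address of the first scanline of chunky line 0..49.

    Chunky line n is real scanline 4*n, i.e. scanline 0 or 4 of
    character row n // 2.
    """
    char_row, half = divmod(chunky_line, 2)
    return SCREEN_BASE + char_row * BYTES_PER_LINE + half * 4 * SCANLINE_STRIDE


if __name__ == "__main__":
    for n in range(4):
        print(f"&{dest_address(n):04x}")
```

The first four values come out as &c000, &e000, &c050 and &e050, the same as the list in the post.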

I can't afford to use double-buffering, as one of the goals of my project is to be able to run on an unmodified cpc464.
My game will have LOTS of sprites in it, of varying sizes, and I can't spare the memory.

Sprites will start off small (i.e. far away) and grow as they get "closer" to the player. But as they approach, new sprites will continue to enter the play area in the distance.
This means that they will need to be sorted, and drawn from furthest away to closest. There will be overdraw using this method.
Consequently, using a frame buffer seems to be the most efficient way to do it (though admittedly I'm still a bit of an amateur with z80, so I may be wrong!).

ervin

Quote from: arnoldemu on 08:34, 10 April 13
I have been thinking over how this can be done with hardware.

It comes down to line-by-line splitting, which means setting the screen address every line. This doesn't work on CRTC type 2, and timing is critical. It would be possible to fill in the free time with other game code.

I did think of various hardware methods, and none gave it to you for free.

The only computer where it is potentially "free" (as in you can keep your code as it is, and use interrupts to trigger the repeated lines) is the Plus.

If I have some time I'll create some examples of various ideas that could be explored.

As others have said, trying to do it in hardware, because of these problems, may end up slower for you, than doing it in software.

Definitely sounds like my software method is the way to go.

Thanks everyone (once again!) for your suggestions.
I'll stop bothering everyone with this concept now!
:-[

ervin

Quote from: fano on 17:03, 09 April 13
After rereading your previous thread about this (CRTC help), I read that your code already takes something between 60 and 80 KNOPs just to draw the savage sprite. Could I take a look at your code to see whether some little things can be optimised? I think it is possible to do something faster than LDI (3 NOPs/byte instead of 5, but that needs some code modifications first).

You're most welcome to examine my code.
:)
Though it's pretty horrific and I understand if it sends you running away in fear!

Note also that it is in ccz80 with a heck of a lot of inline assembler.
(ccz80 is awesome).

I'm curious about your idea regarding something faster than LDI... what do you have in mind?

ralferoo

Quote from: ervin on 15:39, 10 April 13
xx refers to the 1st "pixel" of a line in the frame buffer (which is indeed 80x50).
...
yy refers to the screen memory, so of course for the 1st line it will be &c000, for the 2nd it will be &e000, the 3rd will be &c050, the 4th will be &e050, etc.
...
Consequently, using a frame buffer seems to be the most efficient way to do it (though admittedly I'm still a bit of an amateur with z80, so I may be wrong!).
The point is that say your framebuffer is at &8000, then you're making an exact copy of it to &c000 (ignoring line addresses for now).

Instead of rendering to your framebuffer and copying it to the screen, you might as well use double-buffer flipping. The idea is that when &c000 is visible on screen, you draw to your framebuffer (now known as the off-screen buffer) at &8000. When you flip, &8000 becomes visible and you can now draw to &c000 without it being visible on screen (assuming you flip and then wait for vsync). Flip again and you're back to &c000 being visible and drawing into &8000.
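
The bookkeeping behind that flip can be sketched in Python as below. This models only which buffer is visible and which one is drawn into. It assumes the two buffers sit at &c000 and &8000 as in the description, and that the visible page is selected through CRTC register 12 (bits 5-4 = address bits 15-14, so &30 and &20). The class and function names are hypothetical, and no hardware is touched.

```python
BUFFERS = (0xC000, 0x8000)   # the two 16K pages used in the description


def crtc_r12_for(base):
    """R12 value that selects the 16K page containing `base` (offset bits left at 0)."""
    return (base >> 10) & 0x30


class DoubleBuffer:
    def __init__(self):
        self.visible = 0          # index into BUFFERS of the displayed page

    @property
    def visible_base(self):
        return BUFFERS[self.visible]

    @property
    def draw_base(self):
        return BUFFERS[1 - self.visible]

    def flip(self):
        """Swap roles; returns the R12 value to program before waiting for vsync."""
        self.visible = 1 - self.visible
        return crtc_r12_for(self.visible_base)


if __name__ == "__main__":
    db = DoubleBuffer()
    print(hex(db.draw_base), hex(db.flip()), hex(db.draw_base))
```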

Now, back to the line addresses... If you're using the chunky mode, you can reconfigure the CRTC to have 1-pixel-high characters, so you don't actually get the normal line interleaving effect at all: 1st line at &c000, 2nd at &c050, 3rd at &c0a0, etc. You can try having 2-pixel-high characters, but these will then repeat (depending on how you do it), so you just need twice as much graphics data. Good for stippling, but more data. If you are doing it like this, the 2nd line would actually be at &c800, not &e000, and &d000-&ffff would be unused.
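
To see where those line addresses come from, here is a Python sketch of the CRTC-to-memory mapping. It assumes the usual CPC wiring (MA13-MA12 to A15-A14, RA2-RA0 to A13-A11, MA9-MA0 to A10-A1), a display start of MA = &3000 and R1 = 40. The function names are illustrative only and the result should be checked on an emulator or real hardware.

```python
START_MA = 0x3000        # R12=&30, R13=&00: screen starts at &c000
CHARS_PER_LINE = 40      # R1=40, two bytes per CRTC character


def crtc_to_cpu_address(ma, ra):
    """Byte address fetched for CRTC memory address `ma` and raster address `ra`."""
    page = (ma & 0x3000) << 2      # MA13..MA12 -> A15..A14
    row = (ra & 0x07) << 11        # RA2..RA0   -> A13..A11
    offset = (ma & 0x03FF) << 1    # MA9..MA0   -> A10..A1
    return page | row | offset


def line_start(line, lines_per_char):
    """Address of the first byte of a scanline when R9 = lines_per_char - 1."""
    char_row, ra = divmod(line, lines_per_char)
    return crtc_to_cpu_address(START_MA + char_row * CHARS_PER_LINE, ra)


if __name__ == "__main__":
    print([hex(line_start(n, 1)) for n in range(3)])   # 1-pixel-high characters
    print([hex(line_start(n, 2)) for n in range(3)])   # 2-pixel-high characters
```

One caveat with this mapping: only MA0-MA9 reach the address bus, so with 1-pixel-high characters the fetches stay inside a 2 KB window of the page. That matters if all 50 lines (4000 bytes) are meant to sit in one contiguous block.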
