RSX question - performance

Started by eto, 12:12, 30 June 21


eto

How much performance does the RSX system cost compared to a direct call?

I had a situation where I was drawing a tiny sprite, using one routine to delete the old sprite and restore the background behind it, and another routine to draw the sprite. Something like this:


200 FRAME
210 |CLEARSPRITE
220 |DRAWSPRITE,x,y


The sprite moved smoothly in the lower part of the screen, but in the upper part it was choppy. I realised that this was simply not fast enough: the sprite was deleted but not redrawn before the beam passed the sprite position. As soon as I merged the clearsprite code into the drawsprite routine, everything was fine:

200 FRAME

220 |CLEARDRAWSPRITE,x,y

This (to me) only makes sense if I am either missing something or the overhead of the RSX is massive, in which case a direct call would make MUCH more sense in such situations.

Any thoughts?

asertus

Basically, an RSX is a normal call. There is just one indirection to retrieve the specific call from the RSX list, so the overhead should be minimal.

Maybe the problem is that the raster sync was off: if the beam was in the middle of the screen when you updated, the lower part of the screen would be fine but the upper part would not.



andycadley

There's probably quite a lot of overhead in looking up the RSX name compared to a direct machine code call. But it's likely to be entirely negligible compared to parsing a line of BASIC, converting numeric text to proper numbers etc.

GUNHED

A direct CALL is always WAY quicker.


Also: The more RSX you have, the LOOOONGER it takes to find it.
http://futureos.de --> Get the revolutionary FutureOS (Update: 2023.11.30)
http://futureos.cpc-live.com/files/LambdaSpeak_RSX_by_TFM.zip --> Get the RSX-ROM for LambdaSpeak :-) (Updated: 2021.12.26)

eto

Quote from: GUNHED on 10:58, 01 July 21
A direct CALL is always WAY quicker.


Also: The more RSX you have, the LOOOONGER it takes to find it.


Indeed... there's a huge difference. I have now done a quick check with an RSX that only executes a RET.

I compared the execution time of a CALL, RSX or 2 RSXs to a REM statement. The overhead is as follows:


With no parameters:

  • 0.49ms for a CALL
  • 0.95ms for an RSX at 1st position
  • 1.63ms for an RSX at 8th position

With 2 parameters:

  • 1.49ms for a CALL
  • 1.95ms for the RSX at first position in the list
  • 2.64ms for the RSX at position 8
  • 4.59ms for calling two RSXs (one without parameters) at positions 7 and 8

The numbers are probably not exact and only as good as I can measure them in BASIC, but they show the issue.


Of the measurements with 2 parameters, the third is what I am doing now and the fourth is what I did before. That is a lot of overhead compared to the 20ms I have per frame, so it's no wonder that the sprite vanished. It was probably just fast enough to be deleted but then not fast enough to be redrawn before that part of the screen was displayed.

It is definitely worth considering putting time-critical RSXs at the front of the list, or even using CALLs instead.

(fun fact: each parameter adds almost exactly 0.5ms to the execution time.)
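
For anyone who wants to repeat the measurement, here is a minimal sketch of this kind of timing loop. It is not the exact program used above. It assumes a do-nothing RSX with the hypothetical name |NOP has been installed; the name, the variable names and the loop count are only illustrative. TIME counts in 1/300 s, so the elapsed ticks are converted to milliseconds and divided by the number of iterations.

```basic
10 REM Timing sketch: cost of one statement relative to a REM
20 REM Assumption: an RSX called |NOP (hypothetical name) that only does a RET
30 n%=1000
40 t=TIME
50 FOR i%=1 TO n%
60 REM
70 NEXT
80 tref=TIME-t:REM reference: loop with an empty REM line
90 t=TIME
100 FOR i%=1 TO n%
110 |NOP
120 NEXT
130 trsx=TIME-t:REM same loop with the RSX in place of the REM
140 PRINT "Overhead per call:";(trsx-tref)/300*1000/n%;"ms"
```

To time a direct CALL or an RSX with parameters, only line 110 needs to change. The result is still limited by the 1/300 s resolution of TIME, so a larger loop count gives steadier numbers.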

redbox

Would be interesting to know if the kernel still checks all ROMs for the RSX even when you are using a software defined RSX...?


&BCD1    KL LOG EXT

Action   Logs on a new RSX to the firmware
Entry    BC contains the address of the RSX's command table, HL contains the address of four bytes exclusively for use by the firmware
Exit     DE is corrupt, and all other registers are preserved

&BCD4    KL FIND COMMAND

Action   Searches an RSX, background ROM or foreground ROM, to find a command in its table
Entry    HL contains the address of the command name (in RAM only) which is being searched for
Exit     If the name was found in an RSX or background ROM then Carry is true, C contains the ROM select address, and HL contains the address of the routine; if the command was not found, then Carry is false, C and HL are corrupt; in either case, A, B and DE are corrupt, and all others are preserved
Notes    The command names should be in upper case and the last character should have &80 added to it; the sequence of searching is RSXs, then ROMs with lower numbers before ROMs with higher numbers



eto

Quote from: redbox on 14:10, 01 July 21
Would be interesting to know if the kernel still checks all ROMs for the RSX even when you are using a software defined RSX...?

As long as there is no name collision, this should work. RSXs in RAM are searched first, so an RSX in RAM will win over any RSX in ROM with the same name.

GUNHED


About REM commands for checking delays...

Funny thing: the 'REM' command takes a different amount of time than " ' ", which takes longer.


Basically ... Come over to our lovely Z80 machine language! And if you want to use your system really freely then I would have some nice suggestions regarding the OS for it.  ;D :)

eto

Quote from: GUNHED on 14:37, 04 July 21
Come over to our lovely Z80 machine language!

Obviously that's what I am doing, otherwise I would not have questions about RSXes ;-)

But at the same time, some things are fast enough in BASIC and WAY simpler to develop there. Of course, over the months and years this might change, but I don't see the point of going "Assembler only" just for the sake of it.



NotFound

Quote from: GUNHED on 14:37, 04 July 21
Funny thing the 'REM' command takes a different time compared to " ' " - which takes longer.
The REM is just one token, but " ' " uses two tokens, the one for " ' " and a preceding ":" that is not shown in listings.

zhulien

Part of the overhead of RSXs can be avoided by how you structure the RSXs you use. E.g. don't animate your sprites with a separate call for each animation step; instead set up an animation structure and have a single RSX animate it automatically, even for multiple sprites. Not only does this make programming them easier, but they also animate and move automatically in the background. Likewise, collision detection can be a single call that checks an entire class of collisions. This doesn't mean you lose the creativity and simplicity of using BASIC to program the games.

SkulleateR

I would love to see commands like |GET,1,100,100,150,150 to save gfx from the screen to (in this example) a predefined slot 1, and then with |PUT,1,220,100 you would put it back ...


Guess it would be slow but could be cool for adventure/rpg/strategy games :)

GUNHED

That would be as fast / as slow as the Z80 routines dealing with it.


But where do you want to store the data? See, it needs a memory management system.


Maybe the 2nd 64 KB, the 128 KB EEPROM of LambdaSpeak III?

SkulleateR

well, let's say we stay at a max of 50x50 pixels, you got 3 bytes per pixel so 750 bytes for a 50x50 "image"


so maybe it's possible to reserve lets say 5 places for this, that makes 3KB in total, guess that's possible even on a 464 with only 64KB and that would make this usable on any CPC out there :)

eto

Quote from: SkulleateR on 13:47, 15 July 21
50x50 pixels, you got 3 bytes per pixel so 750 bytes for a 50x50 "image"

In Mode 0 it's 2 pixels per byte, so you need 1250 bytes.

With such a fixed limitation you either waste memory or you don't have enough. It probably makes more sense to reserve memory in Basic and hand over the address of the buffer. The calculation is simple and the rest of the Assembler logic is relatively easy.

It would not be slow, the core part of the RSX will still work as fast as pure machine code. You just have to consider a significant overhead for the RSX call and the parameters.
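
To illustrate the calculation mentioned above, here is a rough sketch of how the buffer size could be worked out in BASIC for each screen mode. It relies on the CPC packing 2, 4 and 8 pixels per byte in Modes 0, 1 and 2, and it assumes the block starts on a byte boundary; the variable names are only illustrative.

```basic
10 REM Bytes needed to store a w x h pixel block in each screen mode
20 REM Assumption: the block starts on a byte boundary
30 w%=50:h%=50
40 FOR m%=0 TO 2
50 ppb%=2^(m%+1):REM pixels per byte: 2, 4, 8
60 bpl%=(w%+ppb%-1)\ppb%:REM bytes per line, rounded up
70 PRINT "Mode";m%;"needs";bpl%*h%;"bytes"
80 NEXT
```

For Mode 0 this gives 25 bytes per line and so the 1250 bytes mentioned above. A block that starts in the middle of a byte would need one extra byte per line.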

SkulleateR

But then you need to know where to store and not overwrite the mem area later ....


Maybe something like |RESON |RESOFF to either reserve the space for the |GET |PUT command or leave it free if you don't want to use it ?

eto

Quote from: SkulleateR on 14:10, 15 July 21
But then you need to know where to store and not overwrite the mem area later ....


Maybe something like |RESON |RESOFF to either reserve the space for the |GET |PUT command or leave it free if you don't want to use it ?


In BASIC I know exactly where I load machine code, and where the free memory for BASIC ends. I have control over this and can "reserve" parts of memory with the MEMORY command. Thus I can reserve exactly the amount of memory that I need and can be sure that it won't be overwritten.
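
A minimal sketch of what such a reservation could look like, assuming it runs at the start of the program before anything else has been loaded above HIMEM. The size and the variable names are only examples.

```basic
10 REM Reserve a block just below the current top of BASIC memory
20 size%=1250:REM bytes needed, e.g. a 50x50 block in Mode 0
30 MEMORY HIMEM-size%:REM lower BASIC's upper limit by size% bytes
40 buf=HIMEM+1:REM first byte that BASIC will no longer use
50 PRINT "Buffer at &";HEX$(buf,4);", length";size%
```

The address in buf can then be handed to a machine code routine as a parameter. BASIC itself stays below the new HIMEM, so the block is safe from program and variable growth.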



SkulleateR

Enough people out there that don't know the memory map of the computer but still code in Basic ;)


What I mean: if you make an extension for a beginners' language like BASIC, it SHOULD be beginner-friendly, even if most of the people using it WILL know what to do and where to use those memory areas ..

andycadley

One of the downsides of the firmware is the lack of a nicer allocator that plays well with BASIC unfortunately. On some other platforms you could use Strings to accomplish the same thing with a bit of hackery (indeed the SAM coupe actually had BASIC functions for storing graphics snapshots in strings like this) but I guess even if you can find the right calls to allocate a new string variable the 255 character limit will be somewhat constraining.

NotFound

Quote from: andycadley on 14:53, 15 July 21
One of the downsides of the firmware is the lack of a nicer allocator that plays well with BASIC unfortunately. On some other platforms you could use Strings to accomplish the same thing with a bit of hackery (indeed the SAM coupe actually had BASIC functions for storing graphics snapshots in strings like this) but I guess even if you can find the right calls to allocate a new string variable the 255 character limit will be somewhat constraining.
One-dimensional integer arrays are more useful than strings as an allocator. Just divide the byte count by 2 and subtract 1 to get the DIM size needed, and you can ERASE the array when it is no longer needed.
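
As a sketch of that idea: an integer array element takes 2 bytes and DIM counts from element 0, which is where the "divide by 2 and subtract 1" comes from. The array name and size below are only examples. The address is taken at the point of use rather than stored in a variable, on the cautious assumption that BASIC may move arrays when new variables are created.

```basic
10 REM Use an integer array as a buffer of nbytes% bytes
20 nbytes%=1250
30 DIM buf%((nbytes%+1)\2-1):REM elements 0..624 = 625 x 2 bytes
40 PRINT "Buffer address:";@buf%(0):REM address to pass to machine code
50 ERASE buf%:REM give the memory back when the buffer is no longer needed
```

The +1 in line 30 rounds odd byte counts up to the next whole element.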

eto

Quote from: SkulleateR on 14:38, 15 July 21
it SHOULD be beginners friendly even if most of the people using it WILL know

100% agreed. But how would you reserve memory in machine code so there is no conflict with Basic? I am not aware of any routine that is doing that, but would be eager to learn that. If that doesn't exist, any solution inside of the library will either be limited or overwrite data that might be used by Basic.

If it's too complicated to explain the usage of MEMORY, then maybe NotFound's idea is easy enough to explain: dim buffer%(width*height/4):|get,@buffer%(0),x,y,width,height

As it gets a bit more complicated with odd x positions or widths, it might be even better if the buffer size calculation is done by the library, e.g. like this:
buffersize%=0:|buffersize,@buffersize%,width,height:dim buffer%(buffersize%):|get,@buffer%(0),x,y,width,height
(of course the buffer calculation only needs to be done once during initialisation)
