CPC four times faster...

Started by GUNHED, 16:43, 06 July 22

GUNHED

... can this be the solution?

https://en.wikipedia.org/wiki/R800_(CPU)

It needs fewer cycles per opcode. So running at 4 MHz would be like running at 16 MHz, because all opcodes execute more quickly.

But it needs to be soldered on a little PCB to replace the regular Z80.

Can it be done?
http://futureos.de --> Get the revolutionary FutureOS (Update: 2023.11.30)
http://futureos.cpc-live.com/files/LambdaSpeak_RSX_by_TFM.zip --> Get the RSX-ROM for LambdaSpeak :-) (Updated: 2021.12.26)

Bread80

There are much faster Z80 processors which are still available new. Take a look at the Agon project: https://www.thebyteattic.com/p/agon.html

TotO

No and no. :)

They do not support undocumented Z80 features.
"You make one mistake in your life and the internet will never let you live it down" (Keith Goodyer)

eto

Quote from: GUNHED on 16:43, 06 July 22
Can it be done?

Apart from doubting that you can still buy the CPU, I doubt that the R800 could make use of its advantages in the CPC.

Although some commands might benefit from it, with the gate array activating the WAIT signal so often, I guess that most commands would still be stretched to the same length. The 4x speed-up would probably shrink to 1.2x or so.

Bread80

Quote from: eto on 18:18, 07 July 22
Although some commands might benefit from it, with the gate array activating the WAIT signal so often, I guess that most commands would still be stretched to the same length. The 4x speed-up would probably shrink to 1.2x or so.

Indeed. The limit in the CPC is mainly the RAM. Upgrade that to modern SRAM and you have a lot more potential for speed-ups.

GUNHED

The trick of the Z80 replacements is not that they access the RAM more quickly. They are quicker because their opcodes need less time. So a 2 us command will be done in 1 us, for example.

The availability of undocumented opcodes is a valid point. However, we would mostly need only the undocumented opcodes, not the illegal ones.

Undocumented opcodes are simply not documented (like LD A,Xlow or so).

Illegal opcodes are opcodes which have some kind of function, but only by accident.

What we would need for the CPC is a small PCB carrying the faster Z80 replacement, so we can swap the CPU very easily. Maybe even by adding a switch to the computer.  :)

eto

Quote from: GUNHED on 10:29, 08 July 22
The trick of the Z80 replacements is not that they access the RAM more quickly. They are quicker because their opcodes need less time. So a 2 us command will be done in 1 us, for example.

If I understood correctly, this will not work in the CPC. I might be wrong, but this is my understanding of how this works:

Let's take LD A,(BC) as an example. This takes 7 T-states. On the Z80, the opcode is read in the first 4 clock cycles, and in the next 3 cycles the CPU reads from (BC) and loads the value into A. In the CPC, the CPU will now wait for another clock cycle, as the gate array makes sure the CPU does not access the bus while the gate array is using it.

Now I don't know exactly how many states LD a,(BC) will take on the R800, but the absolute best it can do is 2 T-states, as it still has to fetch the opcode and after that access the RAM. So let's use this.

If the R800 were in a system without a gate array, you would be right, it would simply be faster. But not in the CPC:

The R800 reads the opcode during the first clock cycle, 4x faster than the Z80. But then it stops, as the gate array puts the CPU into wait. So for the next 3 cycles, although the CPU could do something, it has to wait until the gate array has released WAIT.

Once this happens, the R800 can perform the next part, reading from RAM and loading the value into A, again much faster than the Z80, in a single cycle, only to wait for another 3 cycles until the gate array is finished.

As a result, LD A,(BC) is exactly as fast as with a Z80: 8 clock cycles or 2 us. The only difference is that the Z80 is working during 7 clock cycles and waiting for one cycle, while the R800 is working during 2 cycles and waiting for 6 clock cycles.
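
The reasoning above can be put into a few lines of Python. This is only a sketch of the simplified model described in this post, not a cycle-exact description of the gate array: it assumes that the CPU gets one access window every 4 clock cycles and that every bus access (opcode fetch or data read) has to start at such a window. The function name and the 1-cycle figures for the faster CPU are made up for illustration.

```python
def stretched_cycles(mcycle_lengths, slot=4):
    """Clock cycles an instruction occupies when each bus access must start on a window.

    mcycle_lengths: cycles the CPU itself needs for each bus access (fetch, read, write).
    slot: assumed spacing of CPU access windows in clock cycles (4 cycles = 1 us at 4 MHz).
    """
    t = 0
    for length in mcycle_lengths:
        t = -(-t // slot) * slot   # WAIT is held until the next access window opens
        t += length                # the CPU does its own work for this access
    return -(-t // slot) * slot    # the next opcode fetch has to wait for a window too


z80 = stretched_cycles([4, 3])    # LD A,(BC) on a Z80: 4-cycle fetch + 3-cycle read
fast = stretched_cycles([1, 1])   # hypothetical best case: 1 cycle per access
print(z80, fast)                  # both come out as 8 cycles in this model
```

Under these assumptions the faster CPU only converts working cycles into waiting cycles; the total stays at two windows because the instruction still needs two bus accesses.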

GUNHED

Ok, in your example the new CPU will not need 7 T-states, it only needs 2 T-states. Round it up to 4 = 1 us. And there you go.  :) (I don't know if the numbers are correct, but it shall serve as an example.) Instead of 2 us, it takes only 1 us.
The GA aligns at 1 us boundaries, which equals 4 T-states.

Well, yes. Only 2 times faster, but: YEAH!
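
The rounding rule used in this post can be written down as a small sketch. It treats the instruction as a whole and rounds its T-states up to the next 1 us boundary; whether both memory accesses of an instruction can really share one 1 us window is exactly the point under discussion in this thread, so this shows the arithmetic of the claim, not a verified timing. The name rounded_us and the 2 T-state figure are assumptions taken from the example.

```python
import math

T_STATES_PER_US = 4  # 4 MHz clock, gate array aligning to 1 us boundaries


def rounded_us(t_states):
    """Round an instruction's T-states up to whole microseconds."""
    return math.ceil(t_states / T_STATES_PER_US)


print(rounded_us(7))  # Z80 LD A,(BC): 7 T-states
print(rounded_us(2))  # assumed 2 T-states on the replacement CPU
```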

andycadley

I suspect you wouldn't get much of a speedup without redesigning how the gate array manages contention on the bus, but any speed increase you do get will likely break everything like tape loading etc.

Would be interesting to see if anyone fancies trying it out I guess though.

eto

As I understand it, you can't access the RAM while in wait. The R800 also has to fetch the opcode first, and then perform e.g. a RAM read.

GUNHED

What I really don't like is that people tell me what can't be done.  :) :) :)

Let's focus on what CAN be done.  ;D Well, and in this case all theory is grey, because obviously we start from different 'data' regarding CPU alternatives, just to put it in a friendly way, because you're a great CPC user.

Whatever... Now it's really time to be positive and give new ideas a try.  :) :) :) :) :) :) :)

eto

Quote from: GUNHED on 11:38, 08 July 22
What I really don't like is that people tell me what can't be done.  :) :) :)

But...

Quote from: GUNHED on 16:43, 06 July 22
... can this be the solution?
Can it be done?

But I'll do as requested, will now sit back and watch the solutions popping up.

andycadley

I'm not sure it's being "unfriendly" to the idea, more just an acceptance that the way the Gate Array is wired up pretty much precludes speeding up the processor by much. It might go a smidge faster in cases that are mostly register based, but even then there is a hard limit on how many read operations the Z80 can execute that will limit the rate of instruction fetches (and even Z80 block instructions require at least an instruction read per iteration).

Of course if you could replace the Gate Array with something that didn't need to stall the bus so often (perhaps something with its own internal shadow of video RAM) then you could run the CPC faster even with a stock Z80. At which point putting in a faster CPU might well be viable...

GUNHED

Quote from: eto on 12:53, 08 July 22
Quote from: GUNHED on 11:38, 08 July 22
What I really don't like is that people tell me what can't be done.  :) :) :)

But...

Quote from: GUNHED on 16:43, 06 July 22
... can this be the solution?
Can it be done?

But I'll do as requested, will now sit back and watch the solutions popping up.

Like you?  ;D   :) :) :)

GUNHED

Quote from: andycadley on 13:51, 08 July 22
I'm not sure it's being "unfriendly" to the idea, more just an acceptance that the way the Gate Array is wired up pretty much precludes speeding up the processor by much. It might go a smidge faster in cases that are mostly register based, but even then there is a hard limit on how many read operations the Z80 can execute that will limit the rate of instruction fetches (and even Z80 block instructions require at least an instruction read per iteration).

Of course if you could replace the Gate Array with something that didn't need to stall the bus so often (perhaps something with its own internal shadow of video RAM) then you could run the CPC faster even with a stock Z80. At which point putting in a faster CPU might well be viable...

You got the point there. The solution would be to have fewer T-States in fewer us cycles. 

Prodatron

Quote from: eto on 10:52, 08 July 22
As a result, LD A,(BC) is exactly as fast as with a Z80: 8 clock cycles or 2 us. The only difference is that the Z80 is working during 7 clock cycles and waiting for one cycle, while the R800 is working during 2 cycles and waiting for 6 clock cycles.

You are correct.
In effect the CPU has access to 1 MHz RAM (as "the other 1 MHz" of the total 2 MHz is used by the gate array).

So a command like LD A,(BC) or ADD A,(HL), which accesses a total of 2 bytes, will always take 2 us. It makes no difference how fast the CPU is working internally.

Most Z80 commands are limited by this RAM access, as they mostly consist of M1, RD or WR cycles. Each of these M1, RD and WR cycles will always take at least 1 us, so a CPU which is internally faster won't be noticeably faster in the CPC environment.

There are a few exceptions like 16-bit arithmetic (ADD HL,dd etc.) which do internal work, and these will indeed be 2-3 times faster with an R800.
Other commands like LD r,(IX+n) will be a little bit faster, but not by much.

So as long as you have this limited RAM speed in the CPC, a faster CPU won't help that much.
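
A rough way to play with these numbers is the sketch below. It assumes that a stock Z80 instruction takes its T-states rounded up to whole microseconds (an approximation that fits the three instructions listed here but is not exact for every Z80 instruction on the CPC), and that each memory access costs at least 1 us. For the faster CPU it assumes the best case, where internal work costs nothing and only the memory accesses remain. These are not measured R800 timings, and the function names are made up for illustration.

```python
import math

# (mnemonic, Z80 T-states, memory accesses: opcode/operand fetches plus data reads/writes)
INSTRUCTIONS = [
    ("LD A,(BC)", 7, 2),
    ("ADD HL,BC", 11, 1),
    ("LD r,(IX+n)", 19, 4),
]


def z80_us(t_states, accesses):
    """Rough CPC timing for a stock Z80: T-states rounded up to 1 us, at least 1 us per access."""
    return max(math.ceil(t_states / 4), accesses)


def fast_cpu_us(accesses):
    """Assumed best case for a faster CPU: internal work is free, each access still costs 1 us."""
    return accesses


for name, t_states, accesses in INSTRUCTIONS:
    before, after = z80_us(t_states, accesses), fast_cpu_us(accesses)
    print(f"{name:12} {before} us -> {after} us  ({before / after:.2f}x)")
```

In this model LD A,(BC) gains nothing, ADD HL,BC drops from 3 us to 1 us, and LD r,(IX+n) only goes from 5 us to 4 us, which is the pattern described above.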


The MSX TurboR, which uses the R800 CPU, has much faster internal RAM, so it can use its full power and behave like a 28 MHz Z80 (while running at 7 MHz). That is also the reason why they always try to use the internal RAM rather than external memory expansions on the TurboR: the expansion bus still uses the normal clock, so that would slow down the CPU again.

GRAPHICAL Z80 MULTITASKING OPERATING SYSTEM

eto

Quote from: Prodatron on 09:56, 09 July 22
There are a few exceptions like 16-bit arithmetic (ADD HL,dd etc.) which do internal work, and these will indeed be 2-3 times faster with an R800.
Other commands like LD r,(IX+n) will be a little bit faster, but not by much.

Right, that's how I came to my initial guess that we could maybe see an increase of 20%. With the additional information now, I guess it's even less for normal, unoptimised use cases. On the Atari ST we have a very similar situation, where pure CPU upgrades do not help much. Even a 32 MHz CPU will give you just a 10-20% advantage over an 8 MHz CPU as long as no cache or fast RAM/ROM is used.
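
To see how such a guess comes about, here is a small sketch that combines per-category speed-ups into an overall figure. The time shares in the mix are hypothetical values chosen only for illustration, not measurements of any real CPC program; the speed-up factors follow the rough figures quoted above.

```python
def overall_speedup(mix):
    """mix: list of (share of original run time, speed-up of that share)."""
    assert abs(sum(share for share, _ in mix) - 1.0) < 1e-9
    new_time = sum(share / speedup for share, speedup in mix)
    return 1.0 / new_time


# Hypothetical time shares, chosen only for illustration (not measured):
mix = [
    (0.80, 1.0),   # memory-bound instructions: no gain
    (0.10, 3.0),   # 16-bit arithmetic: up to about 3x
    (0.10, 1.25),  # indexed loads: a little faster
]
print(f"{overall_speedup(mix):.2f}x")
```

With this made-up mix the result is roughly 1.1x. The large share of memory-bound time dominates, so even a generous gain on the remaining instructions changes little.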

Quick question: Would a dk'tronics-compatible RAM extension be possible that is hooked directly to the CPU and not to the normal bus? As only the CPU can access it anyway, might this be an option to implement fast RAM? If yes, a much faster Z80 version combined with fast RAM might work. As long as the CPU is accessing only the additional RAM, it would not have to wait for the gate array. This could also be decoupled from the bus, so the gate array can work on the normal RAM. As soon as the CPU requires access to standard RAM, it will slow down to normal speed, as it has to sync with the WAIT signal. So e.g. screen updates will not benefit, but all computations in the background could. On the ST this roughly doubled the perceived speed, even in games that could benefit from it (Elite 2 Frontier with a 28 MHz CPU incl. cache was about twice as fast).

TotO

It is a recurrent question on cpcwiki.
The best answer is to make faster programs. ;D

When the Z80 uses an external RAM expansion, there is no bus arbiter that would allow it to run faster.
Anyway, you will always need to write to the video RAM to display something.

There is no way to make old games run faster without patching or reprogramming them. But there are many ways to make using the CPC faster, by using ROM or mass storage instead of tape or floppy disk.

Bread80

Quote from: GUNHED on 11:38, 08 July 22
What I really don't like is that people tell me what can't be done.  :) :) :)

Well, it's not so much that it can't be done. It's just that you'd need to replace the processor, RAM and gate array. And by that point it would be easier to draw up an entirely new motherboard.

BTW if you cache video RAM in the gate array, as somebody suggested, then all you need is: processor, gate array, SRAM (single chip), ROM, AY and some glue logic. Which feels like a pretty simple machine.

But I feel as though the real question is how much software would actually benefit from a faster machine. Productivity software of course, but most games are glued to the frame flyback and won't notice any effect.

GUNHED

IMHO it would be enough to replace the Z80 (which is in a socket anyway).

A CPC can be run at 6 MHz (replace the 16 MHz crystal with a 24 MHz one). But that's of course a different topic (somewhere in this forum).

If the Z80, let's say, uses 3 cycles for an opcode, the new Z80 replacements only need 2 or 1. That's the way to save time, because there's no need to alter anything other than the CPU. An example command could be ADD HL,BC.

Any kind of software would benefit. And scrolling in games could be smoother.

As PDT said, "It won't help that much". However, it would be of great interest to try it out and see what we get from it. Since only the CPU gets replaced, it wouldn't harm anything.

BSC

Fascinating how one can fail to grasp what others explain at length and come back to some utterly marginal aspect just to avoid accepting that someone's "idea" was something better kept to oneself  :laugh:
** My SID player/tracker AYAY Kaeppttn! on github **  Some CPC music and experiments ** Other music ** More music on scenestream (former nectarine) ** Some shaders ** Some Soundtrakker tunes ** Some tunes in Javascript

My hardware: ** Schneider CPC 464 with colour screen, 64k extension, 3" and 5,25 drives and more ** Amstrad CPC 6128 with M4 board, GreaseWeazle.

TotO

People wanting a CPC 4x faster with a 16-colour MODE 1 have bought an Atari ST. :D

robcfg

I'd say, try it and see what happens.

Worst case scenario, you get little to no improvement. Best case scenario, you'll be getting some noticeable improvement.

eto

Quote from: TotO on 11:19, 09 July 22
When the Z80 uses an external RAM expansion, there is no bus arbiter that would allow it to run faster.
Anyway, you will always need to write to the video RAM to display something.

What about "internal" fast RAM that is not connected to the bus?

If we had a Z80-compatible CPU that can run at 8 or 16 MHz and/or execute commands much faster internally, we could add the RAM directly to the CPU instead of to the normal bus and decouple the CPU and fast RAM from the internal bus. Then we could let it run at full speed as long as no access to the bus is required. As soon as we need to access the normal bus, we would slow the clock down to 4 MHz and pass the WAIT signal through to the CPU. Of course screen updates would be slow, but everything that is computed while accessing fast RAM would be a lot faster. A "Super Z80" similar to the R800 could run at e.g. 16 MHz and execute most M-cycles in just a single clock cycle.
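
As a back-of-the-envelope check of this idea, the sketch below adds up access times for a trace of memory accesses. It is only a toy model under assumed numbers: an access to the fast RAM costs one clock at 16 MHz, an access over the normal bus costs 1 us, and any overhead for switching the clock or resynchronising with WAIT is ignored. The names and the two example traces are made up for illustration.

```python
FAST_ACCESS_US = 1 / 16   # assumed: one M-cycle per clock at 16 MHz from fast RAM
BUS_ACCESS_US = 1.0       # access over the normal bus, synchronised to the gate array


def run_time_us(trace):
    """trace: sequence of 'fast' or 'bus', one entry per memory access."""
    cost = {"fast": FAST_ACCESS_US, "bus": BUS_ACCESS_US}
    return sum(cost[kind] for kind in trace)


def speedup(trace):
    """Speed-up against a stock CPC, where every access costs BUS_ACCESS_US."""
    return len(trace) * BUS_ACCESS_US / run_time_us(trace)


compute_loop = ["fast"] * 100                 # code and data entirely in fast RAM
screen_update = ["fast"] * 50 + ["bus"] * 50  # code in fast RAM, writes to video RAM
print(f"{speedup(compute_loop):.1f}x  {speedup(screen_update):.1f}x")
```

In this model pure computation in fast RAM gets the full 16x, while a loop where half of the accesses go to the normal bus stays below 2x, because the slow accesses dominate the run time.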

Maybe even a compatibility mode could be implemented, so that the "Super Z80" always behaves like a normal Z80.

I know that this is still not easy to do and there is no such "Super Z80", and it's still the question if anybody wants that, but just from a general perspective, would that potentially work? Or am I missing something in that logic?

Here is a quick diagram of what I mean, of course probably lacking lots of details... (attachment not included)
