CPCWiki forum

General Category => Amstrad CPC hardware => Topic started by: Bread80 on 17:29, 14 October 24

Title: The CPC Revision Zero (Article)
Post by: Bread80 on 17:29, 14 October 24
As you probably know the Amstrad CPC was originally going to be based around a 6502 processor but the designers couldn't deliver. Last year Roland Perry took a PCB of the prototype to a Spanish meetup where user Deepbf photographed it.

How did it work, what were its capabilities and, above all, can it be fixed? The short answer is that it has a *lot* of problems but it's still interesting to see what they did.

https://bread80.com/2024/10/14/amstrad-cpc-revision-zero-can-it-be-fixed/
 
(https://i0.wp.com/bread80.com/wp-content/uploads/2024/10/CPCRev0_Top_Annotated.jpg?resize=1024%2C406&ssl=1)
Title: Re: The CPC Revision Zero (Article)
Post by: Prodatron on 20:08, 14 October 24
Quote from: Bread80 on 17:29, 14 October 24
originally going to be based around a 6502 processor but the designers couldn't deliver.
They couldn't deliver? As I remember it, they switched to the Z80 because of Locomotive Software and their great BASIC (best decision anyway :) ). Or wasn't Locomotive Software the main reason?
Title: Re: The CPC Revision Zero (Article)
Post by: GUNHED on 23:24, 14 October 24
IIRC the main reason was the lack of power. So they switched to Z80.
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 23:27, 14 October 24
Various reasons have been cited over the years, quite which comes first isn't entirely clear and probably never will be. The 6502 hardware clearly wasn't ready and needed an overhaul and they didn't have working system software either. Locomotive could give them a Z80 BASIC, which solved half the problem and the design wasn't that difficult to rework into a Z80 based machine, fixing the rest.
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 23:29, 14 October 24
Quote from: GUNHED on 23:24, 14 October 24
IIRC the main reason was the lack of power. So they switched to Z80.
That I'd doubt, a 2Mhz 6502 (which is what they were probably aiming at) would've out performed the 4Mhz Z80.
Title: Re: The CPC Revision Zero (Article)
Post by: GUNHED on 23:35, 14 October 24
Quote from: andycadley on 23:29, 14 October 24
Quote from: GUNHED on 23:24, 14 October 24
IIRC the main reason was the lack of power. So they switched to Z80.
That I'd doubt, a 2Mhz 6502 (which is what they were probably aiming at) would've out performed the 4Mhz Z80.
Have a look at Driller and lots of other stuff please.  :)
Title: Re: The CPC Revision Zero (Article)
Post by: lmimmfn on 00:41, 15 October 24
For curiosity reasons I would love to see a working CPC with a 6592. Not with basic/ROM but just a demo to see what would be different. 
Would the CRTC refresh be the same?
Would the first 256 bytes fast access of the 6502 make a difference?
Title: Re: The CPC Revision Zero (Article)
Post by: lmimmfn on 02:23, 15 October 24
^^6502 grrr
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 07:46, 15 October 24
Quote from: GUNHED on 23:35, 14 October 24
Have a look at Driller and lots of other stuff please.  :)
On what? As far as I'm aware the only 6502 version is on the C64, which not only has a slower 1Mhz CPU but also an absolutely terrible layout for the display in its "bitmap" mode, making it probably the worst of all three of the major 8-bits for 3D graphics (since you can't easily use its tile mode for that either).
Title: Re: The CPC Revision Zero (Article)
Post by: GUNHED on 17:25, 15 October 24
Well, then show something where the 6502 (2 MHz) is quicker than the Z80 (4 MHz). Guess you'll have a hard time searching.  ;) :)
Title: Re: The CPC Revision Zero (Article)
Post by: ZorrO on 18:15, 15 October 24
The BBC Micro, with a 2MHz 6502, has a Basic twice as fast as the CPC's.
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 19:18, 15 October 24
Yeah, on average the 6502 would be about 2.5 times faster at the same clock speed, but is typically clocked lower.

https://retrocomputing.stackexchange.com/questions/5748/comparing-raw-performance-of-the-z80-and-the-6502
Title: Re: The CPC Revision Zero (Article)
Post by: Dubliner on 19:38, 15 October 24
Quote from: andycadley on 23:27, 14 October 24
Various reasons have been cited over the years, quite which comes first isn't entirely clear and probably never will be. The 6502 hardware clearly wasn't ready and needed an overhaul and they didn't have working system software either. Locomotive could give them a Z80 BASIC, which solved half the problem and the design wasn't that difficult to rework into a Z80 based machine, fixing the rest.
Actually, we know what came first. The first (external) development team delivered zero software and the motherboard that is now known as Prototype 0. Roland Perry tried to make it work but it was obvious it was absolute garbage. Mark-Eric Jones also thought there was little chance of making that Prototype 0 work. Since they needed to start from zero, they used the Z80 because Locomotive Software already had support for it, among other reasons. Extract from my interview with MEJ:

What help did you have from Amstrad and the other subcontractors?
We worked very closely with our friends at Locomotive software who developed the operating software for the CPC464.  We were also in regular contact with Roland Perry who was managing the project overall.

What is your opinion about using the Z80 processor instead of the 6502 that was originally planned? Did you have any influence in this design change?
We chose the Z80 rather than the 6502 for two main reasons: 1) It had good support for DRAM (which we also used for some video display functionality in the Gate Array) and 2) Our friends at Locomotive software had much better support for the Z80.
Title: Re: The CPC Revision Zero (Article)
Post by: Prodatron on 20:38, 15 October 24
Quote from: andycadley on 19:18, 15 October 24
Yeah, on average the 6502 would be about 2.5 times faster at the same clock speed, but is typically clocked lower.

https://retrocomputing.stackexchange.com/questions/5748/comparing-raw-performance-of-the-z80-and-the-6502
Very old discussion. Even the Atari8 guys say, that a 4MHz Z80 may be faster than their 2MHz 6502 if coded in a good way.
Anyway this always depends on what you are doing.
For simple stuff the 6502 is probably faster.
For calculations, 16bit etc stuff, the Z80 just wins.
But the differences are not that big. What really sucks with the 6502 is the fixed 256byte stack:
You can do neither high-level languages nor preemptive multitasking in a good way; for both you have to use tricks etc.
Title: Re: The CPC Revision Zero (Article)
Post by: GUNHED on 23:02, 15 October 24
Quote from: ZorrO on 18:15, 15 October 24
The BBC Micro, with a 2MHz 6502, has a Basic twice as fast as the CPC's.
Well, you think so? It ain't that simple. Have a look here:

https://www.youtube.com/watch?v=Gj-DdSD6C3k
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 23:12, 15 October 24
Sure there are lots of factors. The supporting hardware can make a massive difference (the C64 is way faster at printing text characters, because it only has to update a single byte to do it). And certain algorithms suit certain architectures better.

For some things the 6502 will need three to four times fewer cycles, for others the Z80 will nudge ahead (16-bit maths being a common one). And implementations of things like BASIC will make a massive swing (Sinclair BASIC is massively slower than Locomotive BASIC, even though both machines have similar speed Z80s).

But, on average the 2.5 rule is about right. And it would've fitted better in the CPC design, since its bus access patterns would've avoided the slowdown the Z80 has to deal with because its bus access is less predictable. Personally I'm glad they switched, because I prefer Z80 coding, but I'm not daft enough to ignore that the design was obviously a bit compromised by the change (a 2Mhz 6502 without slowdown might've been a bit more capable of shifting a 16K bitmap about, for example).
Title: Re: The CPC Revision Zero (Article)
Post by: lmimmfn on 02:13, 16 October 24
There was a similar conversation regarding 6502 use on the speccy - https://www.sinclairzxworld.com/viewtopic.php?t=4929

Interesting regarding dram refresh and the 6502s limitations regarding I/O.
It would be interesting to know whether a 6502, if used on the CPC, would result in a higher or lower bus sync and therefore CPU clock frequency.
Title: Re: The CPC Revision Zero (Article)
Post by: eto on 10:22, 16 October 24
Quote from: GUNHED on 23:02, 15 October 24
Well, you think so? It ain't that simple. Have a look here:
Oh that video... I love how they basically over-weighted the only category, where the BBC was superior... I mean, comparing the available hardware expansions of a totally new system to a years old system - and giving one system 22 points and the other 5 - and then the BBC wins by 15 points.  :laugh:

Generally speaking, don't trust a benchmark you didn't write yourself ;-) As long as there is no standardized benchmark suite with clear rules about "optimising" you will never be able to give a clear answer. And even then you will have some people come up with "but if I use that particular optimisation, then it's different".

And in the end it doesn't matter anyway as real-world performance is relevant. And there - as we all know - the CPU performance is in most cases just partially relevant for the user experience.
Title: Re: The CPC Revision Zero (Article)
Post by: Shaun M. Neary on 13:37, 16 October 24
Quote from: GUNHED on 23:35, 14 October 24
Quote from: andycadley on 23:29, 14 October 24
Quote from: GUNHED on 23:24, 14 October 24
IIRC the main reason was the lack of power. So they switched to Z80.
That I'd doubt, a 2Mhz 6502 (which is what they were probably aiming at) would've out performed the 4Mhz Z80.
Have a look at Driller and lots of other stuff please.  :)
Provide better examples please.

Sorry but Freescape games on the Z80 were slow, sluggish piles of crap.
Title: Re: The CPC Revision Zero (Article)
Post by: dodogildo on 16:27, 16 October 24
How about this, as a valid comparison:

Stunt Car Racer on C64's 1 MHz. 6502 vs CPC's 4 MHz. Z80

:picard:
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 18:22, 16 October 24
Quote from: dodogildo on 16:27, 16 October 24
How about this, as a valid comparison:

Stunt Car Racer on C64's 1 MHz. 6502 vs CPC's 4 MHz. Z80

:picard:
Which demonstrates how silly it is to try and use a full blown commercial product as if it were a CPU benchmark. Everything from the supporting hardware, skill of the coder and different compromises made to produce something commercially viable on different platforms has an influence.
Title: Re: The CPC Revision Zero (Article)
Post by: HAL6128 on 19:48, 16 October 24
Hey gents, aren't we too old for such kind of "length" comparisons?
Our beloved homies are as they are for 40 years... Everything else is sugar-coating. :P

...let's continue the thread of @Bread80
Title: Re: The CPC Revision Zero (Article)
Post by: ajcasado on 20:12, 16 October 24
Would it be possible to have an expansion interface with a 6502 or compatible, similar to the Z80 ones that existed for the Commodore and BBC Micro? This would allow compiling code for both targets and comparing the results on the same hardware, and perhaps make it easier to port software for machines with the 6502 (I'm thinking of Maniac Mansion for the C64). A WDC W65C816S could be used.
Title: Re: The CPC Revision Zero (Article)
Post by: Bread80 on 21:12, 16 October 24
Quote from: lmimmfn on 02:13, 16 October 24
It would be interesting to know whether a 6502, if used on the CPC, would result in a higher or lower bus sync and therefore CPU clock frequency.
If I understand the 6502 correctly, it makes a memory request on every clock cycle (depending on the opcode, of course). So a 6502 running at 2MHz generates two memory requests per microsecond whereas the Z80 generates a maximum of one request per microsecond. The CPC also requires two memory accesses per microsecond for reading video data.

The CPC specs 4164-20 DRAMs. These require 330ns for a read or write cycle. The CPC also uses the optimised sequential CAS cycles to read the two video data bytes.

Using the same scheme on a 6502 would require you to pause it for half of each microsecond. So, even if you ran it at 2MHz you'd only get an effective 1MHz clock speed. You could get clever and not pause it when not accessing RAM (i.e. for ROM, I/O or cycles which don't access RAM) but it's still not great.

The BBC got around that issue by using faster DRAMs - 4816A-3. If I'm not mistaken these require about 230ns for a read and 285ns for a write.

You could also use 4164-12 DRAMs which are 230ns for a read or write cycle. That would allow one CPU and one video access per CPU cycle.
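
A rough way to sanity-check those numbers is to add up the DRAM time needed in each microsecond. This is only a sketch: it assumes every access costs one full DRAM cycle (so it ignores the sequential CAS saving mentioned above), and the function and parameter names are made up for illustration.

```python
def ram_time_per_us(cpu_accesses_per_us, video_accesses_per_us, dram_cycle_ns):
    """Nanoseconds of DRAM time needed per microsecond (simplified model)."""
    return (cpu_accesses_per_us + video_accesses_per_us) * dram_cycle_ns


def fits_in_budget(cpu_accesses_per_us, video_accesses_per_us, dram_cycle_ns):
    """True if all accesses fit into the 1000 ns available per microsecond."""
    needed = ram_time_per_us(cpu_accesses_per_us, video_accesses_per_us, dram_cycle_ns)
    return needed <= 1000


VIDEO = 2  # video bytes fetched per microsecond, as stated in the post

# Z80 at 4MHz: at most one access per microsecond, 330 ns DRAM cycle
print(ram_time_per_us(1, VIDEO, 330), fits_in_budget(1, VIDEO, 330))  # 990 True
# 6502 at 2MHz: up to two accesses per microsecond, same DRAMs
print(ram_time_per_us(2, VIDEO, 330), fits_in_budget(2, VIDEO, 330))  # 1320 False
# 6502 at 2MHz with 230 ns DRAMs
print(ram_time_per_us(2, VIDEO, 230), fits_in_budget(2, VIDEO, 230))  # 920 True
```

Under that simplified model a 2MHz 6502 overruns the slot with 330ns parts and only just fits with 230ns parts, which is in line with the reasoning above.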
Title: Re: The CPC Revision Zero (Article)
Post by: Bread80 on 21:16, 16 October 24
Quote from: HAL6128 on 19:48, 16 October 24
Hey gents, aren't we too old for such kind of "length" comparisons?
Our beloved homies are as they are for 40 years... Everything else is sugar-coating. :P

...let's continue the thread of @Bread80
My opinion on the 'which is faster' debate is as follows: If we're still arguing about it forty years later then there's clearly no definitive answer and, therefore, the entire subject is pointless.

As for the thread, I was expecting an interesting discussion about what's actually on the board :shrug:
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 22:18, 16 October 24
Quote from: Bread80 on 21:16, 16 October 24
As for the thread, I was expecting an interesting discussion about what's actually on the board :shrug:
As your original article notes, the board is barely functional and key control signals weren't even wired up. It seems pretty obvious that the "designers" just didn't really have much of a clue what they were doing and it's not much of a surprise that Amstrad ditched it in favour of getting someone with some actual experience to build something.
Title: Re: The CPC Revision Zero (Article)
Post by: ZorrO on 23:18, 16 October 24
The only really slow function in BBC Basic is LOG. If a benchmark uses it, the CPC turns out to be a bit faster. But this is a rarely used function, and without it the BBC Micro's Basic is twice as fast as the CPC's.

And as for assembler, I'm no expert, but from what I've read: with the Z80 clocked twice as fast as the MOS, single-byte calculations and byte-by-byte copying are faster on the MOS. With 16-bit numbers and registers, which the MOS doesn't have (so it has to do them in installments), the Z80 is faster at calculating and copying. A Z80 clocked four times faster does even 8-bit calculations faster. And I have no idea which of the two, 8-bit or 16-bit, appears in code more often.
Title: Re: The CPC Revision Zero (Article)
Post by: PulkoMandy on 12:41, 17 October 24
As Bread80 says, the limiting factor in most 8 bit machines is not the CPU, but the memory bandwidth. They all deal with this in different and interesting ways.

On the CPC, the CPU is slightly slowed down. On the C64, it is slowed down only on some lines while the data is copied to some internal video chip memory. On the Alice 32 and VG5000, you have separate video RAM not accessible directly by the CPU. On Thomson TO and MO machines, the memory is 16 bit wide so that the video logic can fetch 16 bits at a time, and the CPU accesses one of the other banks (and later on they changed this to use "page mode", so the video logic can do two 8 bit accesses to related memory addresses in quick succession).

And that's just a few examples I know of.

Without taking this into account, it's difficult to compare just the CPU. Yes, maybe in theory one can run some loops and computations slightly faster, or use less instructions. But then you have to take into account the interaction with other logic on the motherboard and it gets a bit more complicated. So you have to run benchmarks on a complete machine, and transplanting a CPU in a machine designed for another is biasing your results if the goal is comparing the CPU (you are testing one in its native environment, and the other in an "alien" environment).

So, what could a 6502 CPC have looked like? We can only speculate. The motherboard shows that the people attempting to do it might not have had a clear vision about it either. By contrast, the production machines show, I think, that MEJ had a pretty good idea what they were doing and a good understanding of the chipset they used. And that's what makes a machine fast and cheap, not any particular choice of CPU.
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 15:19, 17 October 24
Quote from: PulkoMandy on 12:41, 17 October 24
On the C64, it is slowed down only on some lines while the data is copied to some internal video chip memory. On the Alice 32 and VG5000, you have separate video RAM not accessible directly by the CPU. On Thomson TO and MO machines, the memory is 16 bit wide so that the video logic can fetch 16 bits at a time.

The C64 also cheats slightly in this regard, in that its "colour RAM" (which is only 4 bits wide) has a second port connected directly to the VIC so that the graphics hardware can read from it without disturbing the CPU, effectively giving it 12-bit video bandwidth. Although that has the downside that you can't move the colour information around in RAM like you can everything else - this is why a lot of C64 games stick to a consistent three colour background, so they don't need to shift data around as much.
Title: Re: The CPC Revision Zero (Article)
Post by: eto on 20:24, 17 October 24
Quote from: Bread80 on 21:16, 16 October 24
As for the thread, I was expecting an interesting discussion about what's actually on the board :shrug:
I guess there is not much to discuss as there is not very much on the board that is "surprising" as it seems to be very similar to the architecture we have in the final product.

Actually that's the only surprising aspect for me that they already had the core architecture when they still thought about the 6502.

It also shows how simple the architecture was and that the only reason we ended with a machine that we love was the genius that has been put into the GateArray.
Title: Re: The CPC Revision Zero (Article)
Post by: SerErris on 11:52, 18 October 24
Some cents from my end on that debate.

The Z80 has a lot more registers, which can even be used in different modes. The RAM models are vastly different and the 6502 actually has no 16bit operations at all, not even 16bit load operations. Even the stack is always an 8bit operation, with the only exception being the call and return operations. They need to fetch two bytes for the address...

The 6502 has many other limitations like pages, where effectively you need to watch out that your code does not cross a page boundary, or things will get really bad.

Also the 6502 had no IO commands, which means the IO interfaces always needed to be memory mapped. That is not a problem if you have lots of otherwise unused RAM, however in the time and age of 64 kbyte that was an issue. Maybe a minor thing anyhow.

All in all, for the skilled assembler programmer and looking at pure CPU benchmarks, I would say that the formula is 6502 at 1Mhz = Z80 at 2Mhz. However as soon as 16bit calculations come into play, that is all lost and the Z80 gains the upper hand. Esp. if you make massive use of registers you can save a lot of memory cycles here.

Just for quick comparison:

6502@2Mhz (1 clock = 1 cycle)       Z80@4Mhz (4 clocks = 1 cycle)
LDA #imm = 2 cycles (1µs)           LD  A,#imm = 2 cycles (2µs)
ADC B = 2 cycles (1µs)              ADC A,B = 1 cycle (1µs)
ADC Zeropage = 3 cycles (1.5µs)     ADC A,Memory = 2 cycles (2µs)
ADC $FFFF = 4 cycles (2µs)          ADC A,($FFFF) = 6 cycles (6µs)
  ; This operation does not exist on the Z80, it would be
  ; LD B,A  (1 cycle)
  ; LD A,($FFFF) (4 cycles)
  ; ADC A,B (1 cycle)


Looks like the 6502 is much faster, right?

However if you have the address in HL, you could do:
                                    ADC A,(HL) (2 Cycles) 2µs
        ; this will speed up the process dramatically, esp in loops.
        ; and processor commands like DJNZ are a really dramatic improvement.
       
Adding two 16 bit numbers from memory and then storing them back into a 3rd position is really painful:
6502
LDA N1LO (4 cycles)
CLC (2 cycles)
ADC N2LO (4 cycles)
STA RSLTLO (4 cycles)
LDA N1HI (4 cycles)
ADC N2HI (4 cycles)
STA RSLTHI (4 cycles)

Total (26 cycles = 13 µs)

Now how does z80 do?
LD    HL,(N1) (5 NOPs)
LD    DE,(N2) (6 NOPs)
ADD  HL,DE  (3 NOPs)
LD    (RST),HL (5 NOPs)

Total (19 NOPs = 19 µs)

So even there - the 6502 has a slight gain.

If we now compare C64 with CPC that will result in
26µs C64
19µs CPC

Consider that in a loop and it will add up very fast.
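
To keep the units straight, the totals above can be converted to microseconds before comparing them. A minimal sketch, using only the cycle and NOP counts listed in this post (the helper name is hypothetical, and a CPC NOP is taken as 1µs as above):

```python
def micros(count, counts_per_us):
    """Convert a cycle (or NOP) count to microseconds."""
    return count / counts_per_us


# 6502: LDA, CLC, ADC, STA, LDA, ADC, STA (cycle counts from the listing above)
ADD16_6502_CYCLES = 4 + 2 + 4 + 4 + 4 + 4 + 4
# Z80 on the CPC: LD HL,(nn), LD DE,(nn), ADD HL,DE, LD (nn),HL in NOPs
ADD16_Z80_NOPS = 5 + 6 + 3 + 5

print(micros(ADD16_6502_CYCLES, 2.0))  # 6502 at 2MHz: 13.0
print(micros(ADD16_6502_CYCLES, 1.0))  # 6502 at 1MHz (C64): 26.0
print(micros(ADD16_Z80_NOPS, 1.0))     # Z80 on the CPC, 1 NOP per µs: 19.0
```

The same 26 cycles are quicker than the Z80 version at 2MHz and slower at 1MHz, so raw cycle counts only mean something once the clock is fixed.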

But again all of that does not tell us anything about the speed of the system. You would program the 6502 using the zero page for pretty much everything, and then it is fast. If it is used like a Z80, it will be very slow. The same can be said the other way around. If you ignored all 16 bit registers and just used A and B, that would be dramatically slow and inefficient, as memory cycles are slow on the Z80.

All in all - lots of discussions with no outcome ... 

Let's put it in perspective:
Both are dead slow compared to anything we have today, even the tiniest Atmel microcontroller. Hell, multiplication/division anyone? Or floating point?
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 13:17, 18 October 24
Quote from: SerErris on 11:52, 18 October 24
Just for quick comparison:

6502@2Mhz (1 clock = 1 cycle)       Z80@4Mhz (4 clocks = 1 cycle)


I mostly agree with your points above, but I'll point out the maths error here. Either you meant to write 6502@1Mhz or you need to double the number of cycles for Z80.
Title: Re: The CPC Revision Zero (Article)
Post by: SerErris on 13:37, 18 October 24
No, that is exactly how it is - the number of cycles are measured differently

A cycle on 6502 is a single clock.
A cycle (M cycle) on Z80 is 4 clocks (let's ignore the potential optimized commands for now).

So on a 1Mhz 6502 a single cycle would take 1µs. On a 2Mhz 6502 it would take 0.5µs.

Interestingly (I found this out just now), a NOP also takes 2 cycles on the 6502.

So a NOP will be 2µs@1Mhz and 1µs@2Mhz.

The Z80 is also measured in NOPs - which is easier to calculate - so 1 NOP = 4 clock cycles = 1 M cycle = 1µs@4Mhz.

So that is the proof. The CPC can process NOPs twice the speed of a C64 
;D That makes the CPC twice as fast as the C64 on doing nothing ;D
Title: Re: The CPC Revision Zero (Article)
Post by: SerErris on 13:43, 18 October 24
One last note:

I would consider the Z80 speed to be 1/4 of the clock rate. So it is actually a 1Mhz CPU - that is also the minimum time spent on any opcode, and any opcode actually takes a multiple of that. This is esp. true on a CPC, which synchronizes the CPU to 4 clock cycles per operation.

If we look at the 6502 by contrast, it can use multiples of the clock directly for different opcode timings (e.g. 2,3,4 ...).
So the speed of the 6502 is really 1Mhz, whereas the Z80 is not really at 4Mhz, but very similar to a 1Mhz clock with 4 subdivisions .. if you see my point.

And if you then compare both - the Z80 is actually quite some bit faster than the 6502. But again that is just my opinion on how to compare stuff.
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 13:46, 18 October 24
Yes, but that's exactly my point. If 1 cycle on the 6502@2Mhz takes 0.5µs and 1 cycle on the Z80 takes 1µs, then you can't directly compare cycle counts. The Z80 is taking twice as long to execute the same number of cycles (how long a NOP is doesn't really matter).

It works fine like that if you're talking about a 6502@1Mhz, such as in the C64.
Title: Re: The CPC Revision Zero (Article)
Post by: GUNHED on 16:37, 18 October 24
You just never admit if somebody else is right, do you?  ;) :)
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 17:26, 18 October 24
Quote from: GUNHED on 16:37, 18 October 24
You just never admit if somebody else is right, do you?  ;) :)
Dude, there are some things which are opinions and everyone can have their own. This one is just basic maths. 

1000 cycles at 1us per cycle takes 1 second. 1000 cycles at 0.5us per cycle takes 0.5 seconds. Surely you can see that? It's not like you have to say the 6502 is better or anything?   ::)
Title: Re: The CPC Revision Zero (Article)
Post by: GUNHED on 13:33, 19 October 24
Quote from: andycadley on 17:26, 18 October 24
Quote from: GUNHED on 16:37, 18 October 24
You just never admit if somebody else is right, do you?  ;) :)
Dude, there are some things which are opinions and everyone can have their own. This one is just basic maths.

1000 cycles at 1us per cycle takes 1 second. 1000 cycles at 0.5us per cycle takes 0.5 seconds. Surely you can see that? It's not like you have to say the 6502 is better or anything?  ::)
Lad, pure math is not everything. With CPUs there's lots more to that. And a portion of humor would be desirable in this case too.  :laugh:

And btw: 1000 cycles at 1us per cycle takes NOT 1 second.
It's 1000000 cycles at 1us per cycle that takes 1 second.
That much about math!  :)
Title: Re: The CPC Revision Zero (Article)
Post by: SerErris on 16:53, 19 October 24
Quote from: andycadley on 13:46, 18 October 24
Yes, but that's exactly my point. If 1 cycle on the 6502@2Mhz takes 0.5µs and 1 cycle on the Z80 takes 1µs, then you can't directly compare cycle counts. The Z80 is taking twice as long to execute the same number of cycles (how long a NOP is doesn't really matter).

It works fine like that if you're talking about a 6502@1Mhz, such as in the C64.
But isn't that actually why I put in all the µs and not just the cycles?

Who cares about cycles?

And yes a 2Mhz 6502 is pretty much the same speed as a 4Mhz Z80 in those operations - this is actually what I said.

So a NOP on a 2Mhz 6502 is 1µs, and the same on the Z80@4Mhz.

Anyhow I think the point is clear.

What makes this whole point moot is that you will program them totally differently anyhow, because of the advantages of both architectures.
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 17:24, 19 October 24
Quote from: GUNHED on 13:33, 19 October 24
And btw: 1000 cycles at 1us per cycle takes NOT 1 second.
It's 1000000 cycles at 1us per cycle that takes 1 second.
That much about math!  :)
Quite right, my bad for rushing a post before heading out.

But the point still stands, if you're going to invent a unit like "cycle" rather than comparing clocks, it doesn't make sense to make them incomparable.

Quote from: SerErris on 16:53, 19 October 24
But isn't that actually why I put in all the µs and not just the cycles?

Who cares about cycles?

And yes a 2Mhz 6502 is pretty much the same speed as a 4Mhz Z80 in those operations - this is actually what I said.

It's really unclear what you're trying to compare in your original post because you keep switching between 1Mhz and 2Mhz 6502 timings. And I'm not sure you can say they're the same speed when the 6502 is doing a 16-bit add in 13us vs the Z80 taking 19us, even though that's arguably where the Z80 is supposed to be stronger.

But yes, this doesn't alter the fact that the CPUs require different approaches to achieve optimal results (hence it being difficult to just do a 1:1 routine timing) nor that changes to the overall architecture can make even more substantial differences (not that makes much difference if we're talking about a 6502 based CPC as per the OP).
Title: Re: The CPC Revision Zero (Article)
Post by: GUNHED on 22:03, 20 October 24
Well, is there a game / program / application on a 2 MHz 6502 which is pretty good - so that we can see, if this can be done in a quicker way on the 4 MHz Z80 on CPC?

Of course it shouldn't be something which mainly uses screen access, instead it shall use CPU in a heavy way (and I'm not talking about data-transfer into screen).

My best examples are still freescape and vector games. And in 8 Bit world, the CPC seems to me to be leading.  :)


BTW: In Forth a 16 Bit addition is done this way:
POP HL
POP DE
ADD HL,DE ; This takes 3 * 3us = 9 us only  ;)
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 22:40, 20 October 24
As I said, it's a dumb way of comparing CPU performance to just point at individual games but if you really must then Elite on the BBC B is probably the best version.



And the Atari 8-bit port of Total Eclipse is pretty impressive (even if not quite a 2Mhz CPU)

Title: Re: The CPC Revision Zero (Article)
Post by: GUNHED on 12:59, 21 October 24
CPC beats them all  :) :) :)
Title: Re: The CPC Revision Zero (Article)
Post by: SerErris on 15:24, 21 October 24
Quote from: GUNHED on 22:03, 20 October 24
Well, is there a game / program / application on a 2 MHz 6502 which is pretty good - so that we can see, if this can be done in a quicker way on the 4 MHz Z80 on CPC?
Of course it shouldn't be something which mainly uses screen access, instead it shall use CPU in a heavy way (and I'm not talking about data-transfer into screen).
My best examples are still freescape and vector games. And in 8 Bit world, the CPC seems to me to be leading.  :)
BTW: In Forth a 16 Bit addition is done this way:
POP HL
POP DE
ADD HL,DE ; This takes 3 * 3us = 9 us only  ;)

Yeah if both values are on the stack? So this is typical fortran/C style - call a function, put arguments on the stack.

That is all good, but you would need to still read the variable into HL/DE and push them, then call the function, and then return.

And then that does not make any sense whatsoever - because still you would need to have painful stack handling to even be able to return to the calling function ... It is a fast add, but it does not load the variables first, which the 19 NOPs version does.

So the question is:

How do we get the values from variable space onto the stack?
How do we call the add routine then?
How do we handle the stack correctly inside the add routine, so that it calculates correctly and returns safely?

That is all possible and standard technology, but will cost more NOPs to do it.

A typical implementation would more likely be a call to an add routine with the variables handed over in registers, and even then you would probably only run ADD HL,DE inside the routine, which adds a CALL and a RET to the equation.

All in all as ADD is a single Opcode (even for 16 bit), it does not make any sense to call it, or to push the variables to add to the stack and then call it. Everything will take longer than that.

If you would want a universal add routine, that would probably use the two index registers:

main:
    LD IX,varin1         ; IX = address of the first variable (not its contents)
    LD IY,varin2         ; IY = address of the second variable
    CALL add16
    ...
    RET
   
add16:                   ; named add16 so the label does not clash with the ADD mnemonic
    ;adds two 16-bit numbers from variables indexed by IX,IY
    ;returns result in HL
    LD  L, (IX+0)        ; Load the lower byte of the first 16-bit number
    LD  H, (IX+1)        ; Load the upper byte of the first 16-bit number

    LD  E, (IY+0)        ; Load the lower byte of the second 16-bit number
    LD  D, (IY+1)        ; Load the upper byte of the second 16-bit number

    ADD  HL, DE          ; Add DE to HL (HL = HL + DE)
    RET



varOut:
    DW 0
varin1:
    DW 0
varin2:
    DW 0


And to be really universal you would probably push the addresses of the variables instead of the values - so all in all the handling around it will require much more time than calculating it directly.
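
To put a rough number on that overhead, here is a small tally in CPC NOPs (1 NOP = 1µs). The per-instruction timings in the table are the commonly quoted CPC values and should be treated as assumptions, not measurements, and the sequence assumes the caller loads the variable addresses into IX/IY with immediate loads.

```python
# Assumed CPC timings in NOPs (1 NOP = 1 microsecond); check before relying on them.
NOPS = {
    "LD IX,nn": 4,
    "LD IY,nn": 4,
    "CALL nn": 5,
    "LD r,(IX+d)": 5,
    "ADD HL,DE": 3,
    "RET": 3,
    "LD HL,(nn)": 5,
    "LD DE,(nn)": 6,
}


def total_nops(sequence):
    """Sum the NOP cost of a list of instruction names."""
    return sum(NOPS[name] for name in sequence)


# Direct version: load both values from memory and add them in place
inline = ["LD HL,(nn)", "LD DE,(nn)", "ADD HL,DE"]

# "Universal" version: set up IX/IY, call, four indexed loads, add, return
routine = (["LD IX,nn", "LD IY,nn", "CALL nn"]
           + ["LD r,(IX+d)"] * 4
           + ["ADD HL,DE", "RET"])

print(total_nops(inline))   # 14
print(total_nops(routine))  # 39
```

With those assumed timings the generic routine costs well over twice as much as loading and adding directly, which is the point being made above.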
Title: Re: The CPC Revision Zero (Article)
Post by: Prodatron on 15:51, 21 October 24
Quote from: SerErris on 11:52, 18 October 24
Adding two 16 bit numbers from memory and then storing them back into a 3rd position is really painful:
6502
LDA N1LO (4 cycles)
CLC (2 cycles)
ADC N2LO (4 cycles)
STA RSLTLO (4 cycles)
LDA N1HI (4 cycles)
ADC N2HI (4 cycles)
STA RSLTHI (4 cycles)

Total (26 cycles = 13 µs)

Now how does z80 do?
LD    HL,(N1) (5 NOPs)
LD    DE,(N2) (6 NOPs)
ADD  HL,DE  (3 NOPs)
LD    (RST),HL (5 NOPs)

Total (19 NOPs = 19 µs)

So even there - the 6502 has a slight gain.
I never coded in 6502, but I can imagine that in many cases the Z80 will still win in exactly this situation (adding 16bits):
- for a 16bit addition the 6502 HAS to load 16bit values from memory and HAS to save them to memory again after adding them; you only have three 8bit registers at all
- the Z80 will load values and then keep them in the registers while working with them. You will hardly see exactly the example above in any code. Only at the very beginning of a function and at the very end you may have these (slow) LD HL,(nn)/(nn),HL instructions or something similar

A pure 16bit addition takes 3 microseconds for the Z80. If I am not completely wrong, on the 6502 when you use XY for storing one parameter and the result you are not able to do it faster than in 6.5 or 5.5 microseconds. So in this case the Z80 has the double speed :D 

At least these are my thoughts...
Title: Re: The CPC Revision Zero (Article)
Post by: andycadley on 16:37, 21 October 24
Quote from: Prodatron on 15:51, 21 October 24
A pure 16bit addition takes 3 microseconds for the Z80. If I am not completely wrong, on the 6502 when you use XY for storing one parameter and the result you are not able to do it faster than in 6.5 or 5.5 microseconds. So in this case the Z80 has the double speed :D

At least these are my thoughts...

IIRC a store to zero page is 3 cycles, a store to an arbitrary memory location is 4. So the quickest you could write a two byte value to RAM is 3us (6 cycles @2Mhz). I'd agree that a Z80 routine that can do multiple 16-bit calculations using only register values should be quicker.

Again micro-benchmarking isn't much better than comparing entire games because there's not enough context to know what a real world scenario would be. Longer, more thought out benchmarking over the many, many times this has come up over the decades has generally settled on a 1Mhz 6502 being roughly on par with a 2.5Mhz Z80 (which tallies with the Spectrum and CPC generally outperforming the C64 on CPU heavy loads but giving machines like the A8 or BBC B a slight edge).

In all cases the supporting hardware is far more likely to swing any one task, other than just raw number crunching, in favour or against a particular machine depending on how good a fit it is and/or how well the code was optimised for it.
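
For what it's worth, the 2.5 rule of thumb can be applied to nominal clock speeds like this. It is only a sketch: the clocks are rounded nominal figures (assumptions, not taken from this thread apart from the CPC, C64 and BBC values), and it ignores the CPC's slowdown and any other contention.

```python
RULE_OF_THUMB = 2.5  # a 1MHz 6502 is treated as roughly a 2.5MHz Z80


def z80_equivalent_mhz(mhz_6502):
    """Rough Z80 clock that would match a 6502 at the given clock."""
    return mhz_6502 * RULE_OF_THUMB


# Rounded nominal clocks in MHz (assumed values)
machines_6502 = {"C64": 1.0, "Atari 8-bit": 1.79, "BBC B": 2.0}
machines_z80 = {"Spectrum": 3.5, "CPC": 4.0}

for name, mhz in machines_6502.items():
    equivalent = z80_equivalent_mhz(mhz)
    faster_than = [z for z, clock in machines_z80.items() if equivalent > clock]
    print(name, round(equivalent, 2), "edges out:", faster_than)
```

On those figures the C64 lands below both Z80 machines while the Atari and BBC B land slightly above, matching the pattern described above.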