
Why are cycles like this on the Amstrad? And other peculiar questions...

Started by Optimus, 11:23, 08 March 25


Optimus

This is a bit of an "explain it like I'm five" question about hardware design decisions.

I think I've read explanations about this before, but I am curious from a hardware perspective: why does the Amstrad hardware round cycles up to multiples of 4? Why, for example, didn't this happen on the ZX Spectrum? What is so different in the hardware design that they had to round the cycles up? (I heard the CPC shares the cycles equally between 4 processors, but don't other platforms also have more than just the CPU, and have to do sharing too?)

Also, is it a bad guess to think the following? The CPC has its Z80 at a round 4 MHz, but the ZX Spectrum is at 3.5 MHz, while the Atari ST is at 8 MHz and the Amiga at 7.something. So the naive question is: does a round frequency versus an obscure fractional one say anything about these different designs? If the ZX Spectrum had also rounded up the cycles, would it have had a rounder MHz figure? Naive question.

Then, some side questions. People mock the Z80 because they say it has a 4-bit ALU and that this must be why it's slow, and I saw a question: "Why does an ADD on the Z80 take a full 4 cycles?" But I know even a NOP takes 4 cycles, so the ALU shouldn't be the reason. Is the 4-bit ALU a disadvantage, or just a design choice that works in parallel with other things, so that from the programmer's perspective you don't see it as a performance loss? If it used an 8-bit ALU, would some instructions be faster? Then again, since a NOP takes 4 cycles anyway, ADD doesn't seem to take longer than it should. And we have a good 16-bit ADD HL,DE, which would take longer if we did it with ADD and ADC.
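For a sense of scale on the ADD HL,DE point, here is a small Python sketch (an illustration, not something from the thread) that sums the documented Z80 T-state counts for the single 16-bit instruction against one possible ADD/ADC sequence doing the same job through A. The instruction sequence and the `T_STATES`, `total_t_states` and `add_hl_de_via_a` names are hypothetical; the timings are the standard Z80 figures, before any CPC rounding.

```python
# Documented Z80 T-state counts (standard timing tables, not CPC-adjusted).
T_STATES = {
    "ADD HL,DE": 11,
    "LD A,L": 4,
    "ADD A,E": 4,
    "LD L,A": 4,
    "LD A,H": 4,
    "ADC A,D": 4,
    "LD H,A": 4,
}

# HL = HL + DE done one byte at a time through the accumulator (clobbers A).
EIGHT_BIT_VERSION = ["LD A,L", "ADD A,E", "LD L,A", "LD A,H", "ADC A,D", "LD H,A"]


def total_t_states(instructions):
    """Sum the T-states of a straight-line instruction sequence."""
    return sum(T_STATES[ins] for ins in instructions)


def add_hl_de_via_a(h, l, d, e):
    """Model what the 8-bit sequence computes: low bytes first, carry into the high bytes."""
    a = l + e                  # LD A,L / ADD A,E
    carry = a >> 8             # carry flag out of the low-byte add
    l = a & 0xFF               # LD L,A
    a = h + d + carry          # LD A,H / ADC A,D
    h = a & 0xFF               # LD H,A
    return h, l


if __name__ == "__main__":
    print("ADD HL,DE:       ", total_t_states(["ADD HL,DE"]), "T-states")
    print("ADD/ADC sequence:", total_t_states(EIGHT_BIT_VERSION), "T-states")
    print([hex(x) for x in add_hl_de_via_a(0x12, 0xFF, 0x00, 0x01)])
```

Run as a script, it reports 11 T-states for ADD HL,DE versus 24 for the six-instruction sequence, and the single instruction also leaves A untouched.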

P.S. Finally, I am kind of annoyed by the talk that "the CPC is really like a Z80 at 3.3 MHz because of the cycle rounding". I've seen even worse takes in some recent demoparty slides, where both the CPC and the Spectrum are described as "Z80 at 1 MHz (4 MHz / 4 cycles)". Is that to compare them to the 6502? But even a NOP on the 6502 takes 2 cycles, so should it then be "6502 at 0.5 MHz (1 MHz / 2 cycles)" or "Atari ST 68000 at 2 MHz (8 MHz / 4 cycles)"? (OK, almost: some 68000 instructions really look Z80-like to me, taking 4, 8 or 10 cycles, and even the simplest ones take 4.) I mean, the clock frequency is just the clock frequency; every machine then has its own bottlenecks.
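The "divide the clock by the shortest instruction" figure from those slides can be applied the same way to every CPU. This sketch just does that arithmetic with the numbers quoted above (the dictionary and function names are made up for illustration). It yields 1.0, 0.5 and 2.0 million simple instructions per second for the Z80, 6502 and 68000 respectively, which only restates clock speed divided by a constant and says nothing about how much work each instruction does.

```python
# (clock in Hz, cycles taken by the shortest instruction), as quoted in the post.
MACHINES = {
    "Z80 at 4 MHz": (4_000_000, 4),     # NOP = 4 T-states
    "6502 at 1 MHz": (1_000_000, 2),    # NOP = 2 cycles
    "68000 at 8 MHz": (8_000_000, 4),   # NOP = 4 cycles
}


def peak_simple_instructions_per_second(clock_hz, min_cycles):
    """Upper bound on instructions per second if every instruction were the shortest one."""
    return clock_hz // min_cycles


if __name__ == "__main__":
    for name, (clock_hz, min_cycles) in MACHINES.items():
        rate = peak_simple_instructions_per_second(clock_hz, min_cycles)
        print(f"{name}: {rate / 1e6:.1f} million simple instructions/s")
```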

P.P.S. One more curiosity, and maybe a question about what roughly happens at the hardware level: C64 hardware sprites steal cycles from the CPU. CPC+ sprites, as far as I know, don't steal any; they are totally independent. Are there advantages or disadvantages to each approach, and why didn't the C64 design its sprites to work this way? I also know the SAM Coupé loses some of its cycles to video RAM access, so only if you disable the display do you get the full 6 MHz performance (I heard an assembler did that while assembling). So maybe the cycle-sharing design of the CPC had the advantage that it didn't lose cycles later when the other hardware worked in parallel? (Though effectively it did, since it rounds the cycles up; it's just not obvious?)

pelrun

The CPC doesn't just round up instructions for no reason - those extra wait cycles are where the gate-array fetches video data from memory. It's not possible for the RAM to be accessed by multiple devices simultaneously, so when the video needs a byte, the CPU has to wait. The way the video memory addresses are generated serves double duty for refreshing the system DRAM, which is another critically necessary task.

Every platform that has memory-mapped video and doesn't have dual-ported RAM (and nothing in this era had that) has to solve the same problem of bus contention, and they all do it in different ways. The C64 has "badlines" where cycles are stolen to copy video data to a buffer (you've even mentioned this yourself). Other 6502 machines literally lose a full 50% of their effective bus bandwidth to video RAM access. The ZX Spectrum does the cheapest possible thing, and literally pauses the CPU clock while it accesses VRAM. Bugs in that implementation caused visual snow in certain cases on early Speccys.

> People mock the Z80

That's just the standard tribal garbage of "my platform is better than yours" that persists to this day. There are so many other things that go into how fast a platform is that the cpu is often the least important part. Anyone at a demoparty who is crowing about the power of their platform over others has missed the point of a demo entirely and needs re-education. It's not how big your thing is, it's what you can make it do in *spite* of those limitations. It's why every demo on a modern platform has some other drastic limitation imposed upon it, or it's not fun.

andycadley

Quote from: pelrun on 12:06, 08 March 25
> The ZX Spectrum does the cheapest possible thing, and literally pauses the CPU clock while it accesses VRAM. Bugs in that implementation caused visual snow in certain cases on early Speccys.
This is not correct. The Spectrum divides its RAM into contended and uncontended RAM. Uncontended RAM runs at full speed (as does the ROM), but contended RAM takes a very big hit for every access, since it is shared with the video hardware.

You don't need any special timing to access RAM without corruption; it always "just works". The "snow" effect occurs if the I register points at contended RAM, because the Z80 puts the value of IR on the address bus during a refresh cycle, and that confuses the ULA into thinking it's a video access.
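A small sketch of the condition described above: during the refresh part of each opcode fetch the Z80 puts I on the upper half of the address bus and R on the lower half, so whether that address lands in contended RAM depends only on I. The 0x4000-0x7FFF range is assumed to be the 48K Spectrum's contended area; the function names are hypothetical, and this only checks the condition, it does not model the ULA itself.

```python
# Assumption: 48K Spectrum memory map, where the RAM shared with the ULA
# (the "contended" RAM) occupies 0x4000-0x7FFF.
CONTENDED_START = 0x4000
CONTENDED_END = 0x7FFF


def refresh_address(i_reg, r_reg):
    """Address driven during the refresh part of an opcode fetch: I on A15-A8, R on A7-A0."""
    return ((i_reg & 0xFF) << 8) | (r_reg & 0xFF)


def refresh_hits_contended_ram(i_reg, r_reg):
    """True if the refresh address falls in contended RAM, the condition for snow."""
    return CONTENDED_START <= refresh_address(i_reg, r_reg) <= CONTENDED_END


if __name__ == "__main__":
    for i in (0x3F, 0x40, 0x7F, 0x80):
        print(hex(i), refresh_hits_contended_ram(i, 0x00))
```

Because R only ever changes the low byte, the check reduces to "is I between 0x40 and 0x7F".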

Optimus

So I guess, in the end, every platform avoids the contention in its own way. That's why other platforms lose those cycles later, while the screen is being drawn, whereas on the CPC we don't have to care about this (which is more stable, to be honest).

All 8-bits end up at roughly similar limits, since more speed always involves compromises. For example, I wondered: if the 6502 is so much better at the same MHz, why do most 8-bits stick to 1 MHz rather than 2 or 4 MHz and blow the Z80 platforms out of the water? The answer could just be RAM: they'd need much faster RAM. I heard the 6502 accesses memory on every single cycle.

Optimus

Damn, now I'm bothered. Looking at the actual Z80 cycle counts, most of them are 1 cycle less than on the CPC.
Most are like 7, 11 or 15, so 1 less than the rounded-up value.
INC/DEC of a 16-bit register is a bigger loss: it's 6 cycles, rounded up to 8.

http://www.z80.info/z80time.txt

But I see PUSH being an exception, with a much worse loss. I didn't know that. It's 11 cycles? Rounding it up would give 12, or 3 NOPs. But I know that on the CPC, PUSH is 4 NOPs and POP is 3 NOPs (unrounded, POP is 10 cycles).
Why so much more loss? Who knows... maybe because it accesses 2 bytes at once? But it goes from 11 cycles to 16, a loss of 5!
And PUSH was already used to write data very fast, for clearing the screen or filling solid colours or patterns, even on the CPC. It was already awesome. Now that makes me think how much more awesome it would be on the ZX Spectrum with its much smaller VRAM, or maybe on the SAM Coupé. Minus their memory contention :P
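To put rough numbers on the PUSH idea, this sketch computes how long a fully unrolled run of PUSHes would take to cover a whole screen, using the figures from this post (PUSH = 4 NOPs on the CPC, 11 T-states on a plain Z80). The screen sizes (16K for the CPC, the Spectrum's 6,144-byte bitmap), the 3.5 MHz Spectrum clock and the function name are assumptions, and the Spectrum figure deliberately ignores contention, so treat it as a best case.

```python
def push_fill_time_us(screen_bytes, push_t_states, cpu_mhz):
    """Time for a fully unrolled run of PUSHes to cover screen_bytes, in microseconds.

    Each PUSH writes 2 bytes. Loop overhead, setting up SP, interrupts and
    memory contention are all ignored.
    """
    pushes = screen_bytes // 2
    return pushes * push_t_states / cpu_mhz


FRAME_US = 20_000  # one 50 Hz frame

if __name__ == "__main__":
    cpc = push_fill_time_us(16_384, 16, 4.0)      # CPC: PUSH = 4 NOPs = 16 T-states
    speccy = push_fill_time_us(6_144, 11, 3.5)    # Spectrum bitmap only, no contention
    print(f"CPC:      {cpc:.0f} us ({cpc / FRAME_US:.2f} frames)")
    print(f"Spectrum: {speccy:.0f} us ({speccy / FRAME_US:.2f} frames)")
```

The CPC figure comes out at a bit over one and a half frames for the full 16K, while the uncontended Spectrum estimate is under half a frame; real contention on the Spectrum's screen area would push its number up.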

McArti0

It all starts with the length of a pixel on your screen.
One entire line of the TV image is drawn in 64 µs.
The question is how many points you want drawn on the screen in that time, and how wide you want that group of points to be.
Amstrad chose to draw 640 pixels in 40 µs. To do this they had to use 16 MHz.
Simple division by 4 gives 4 MHz, which can be connected to the Z80A.

Sinclair wanted to draw 256 pixels in a narrower 36 µs.
This gives the 7.1 MHz clock.
Simply dividing 7.1 by 2 gives approximately 3.55 MHz.
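The arithmetic above, as a tiny sketch: pixels drawn per microsecond is the pixel clock in MHz, and the CPU clock is that divided down. The function name is hypothetical. Note that 36 µs is a rounded figure, so the Spectrum result (about 7.11 and 3.56 MHz) only approximates the 48K machine's actual 7 MHz pixel clock and 3.5 MHz CPU clock.

```python
def pixel_clock_mhz(pixels, active_us):
    """Pixels drawn per microsecond equals the pixel clock in MHz."""
    return pixels / active_us


if __name__ == "__main__":
    cpc_pixel = pixel_clock_mhz(640, 40)      # 640 pixels in 40 us
    speccy_pixel = pixel_clock_mhz(256, 36)   # 256 pixels in about 36 us
    print(f"CPC:      {cpc_pixel:.2f} MHz pixel clock, CPU {cpc_pixel / 4:.2f} MHz")
    print(f"Spectrum: {speccy_pixel:.2f} MHz pixel clock, CPU {speccy_pixel / 2:.2f} MHz")
```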
CPC 6128, Whole 6128 and Only 6128, with .....
NewPAL v3 for use all 128kB RAM by CRTC as VRAM
One chip driver for 512kB extRAM 6128
TYPICAL :) TV Funai 22FL532/10 with VGA-RGB-in.

McArti0

Look at the cycle lengths of the opcodes in the Z80 manual and how they are described.

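PUSH:
11 clocks, but split into machine cycles of 5, 3 and 3. On the CPC each one is stretched to a multiple of 4: 5 (4+1) becomes 8, 3 becomes 4, 3 becomes 4.
Overall 8+4+4 = 4+4+4+4 = 16 (4 NOPs).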
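The breakdown above turned into a small sketch: round each machine cycle (M-cycle) up to a multiple of 4 instead of rounding the instruction total. The M-cycle splits are from the Z80 manual; the rounding rule and function names are a simplified model for illustration. It reproduces the CPC figures discussed in this thread (PUSH 4 NOPs, POP 3, INC HL 2), but the real mechanism is the gate array's WAIT signal, so it is not guaranteed to match every instruction (I/O instructions in particular can differ).

```python
# Machine-cycle (M-cycle) T-state breakdowns from the Z80 manual.
M_CYCLES = {
    "NOP": [4],
    "LD A,n": [4, 3],
    "INC HL": [6],
    "ADD HL,DE": [4, 4, 3],
    "POP HL": [4, 3, 3],
    "PUSH HL": [5, 3, 3],
    "JP nn": [4, 3, 3],
    "CALL nn": [4, 3, 4, 3, 3],
}


def round_up_to_4(t):
    """Round a T-state count up to the next multiple of 4."""
    return (t + 3) // 4 * 4


def naive_cpc_t_states(instr):
    """Round the instruction's total T-states up to a multiple of 4."""
    return round_up_to_4(sum(M_CYCLES[instr]))


def per_mcycle_cpc_t_states(instr):
    """Round each M-cycle up to a multiple of 4, as in the PUSH breakdown above."""
    return sum(round_up_to_4(t) for t in M_CYCLES[instr])


def nops(t_states):
    """CPC timing unit: 1 NOP = 4 T-states = 1 microsecond."""
    return t_states // 4


if __name__ == "__main__":
    for name in M_CYCLES:
        naive = naive_cpc_t_states(name)
        per_m = per_mcycle_cpc_t_states(name)
        print(f"{name:10} Z80 {sum(M_CYCLES[name]):2}T  "
              f"naive {nops(naive)} NOPs  per-M-cycle {nops(per_m)} NOPs")
```

Only PUSH differs between the two rules in this table: the naive total gives 3 NOPs, while rounding its 5-T-state first M-cycle up to 8 gives the 4 NOPs seen on a real CPC.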

andycadley

Quote from: Optimus on 14:11, 08 March 25
> So I guess, in the end, every platform avoids the contention in its own way. That's why other platforms lose those cycles later, while the screen is being drawn, whereas on the CPC we don't have to care about this (which is more stable, to be honest).

> All 8-bits end up at roughly similar limits, since more speed always involves compromises. For example, I wondered: if the 6502 is so much better at the same MHz, why do most 8-bits stick to 1 MHz rather than 2 or 4 MHz and blow the Z80 platforms out of the water? The answer could just be RAM: they'd need much faster RAM. I heard the 6502 accesses memory on every single cycle.
A 2 MHz 6502 pretty much meant using much more expensive RAM. 4 MHz wasn't even an option.

The designers of both CPUs weren't idiots, but they took different approaches. The 6502 is designed for lower clock frequencies, but that involves other compromises.
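A back-of-the-envelope sketch of why clock speed runs into RAM cost: if the 6502 touches memory on every cycle and the video hardware is interleaved with it (one of the approaches pelrun described earlier), the RAM has to complete two accesses per CPU cycle. The interleaving assumption and the function name are illustrative, and real DRAM timing (precharge, row/column strobes) makes the requirement tighter than this simple division suggests.

```python
def ns_per_ram_access(cpu_mhz, accesses_per_cpu_cycle):
    """Nanoseconds available per RAM access when every CPU cycle needs that many accesses."""
    return 1000 / (cpu_mhz * accesses_per_cpu_cycle)


if __name__ == "__main__":
    # Assumption: one CPU access plus one interleaved video access per CPU cycle.
    for mhz in (1, 2, 4):
        print(f"6502 at {mhz} MHz: {ns_per_ram_access(mhz, 2):.0f} ns per access")
```

Each doubling of the CPU clock halves the time the RAM gets per access (500, 250, then 125 ns here), which is where the extra cost comes from.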

pelrun

Quote from: andycadley on 13:32, 08 March 25
> This is not correct.
I think you're reading more into what I said than what I actually said. Yes, uncontended RAM can be accessed independently without slowdown. Accessing the contended RAM at the same time as the ULA results in the ULA pausing the CPU clock, which is *how* that "very big hit for every access" works. And I don't recall anyone talking about the data read by the CPU being corrupted?
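A simplified sketch of what "the ULA pausing the CPU clock" costs, using the commonly cited 48K Spectrum figures: the first contended T-state of a frame at 14335, 224 T-states per line, 128 contended T-states per line over 192 display lines, and the delay pattern 6, 5, 4, 3, 2, 1, 0, 0. These constants are assumptions taken from community timing documentation rather than from this thread, the function name is hypothetical, and it only applies to accesses to contended RAM (0x4000-0x7FFF); it ignores how contention interacts with the internal cycles of individual instructions.

```python
# Commonly cited 48K Spectrum contention figures (assumptions, see lead-in).
FIRST_CONTENDED_T = 14335
T_PER_LINE = 224
CONTENDED_T_PER_LINE = 128
DISPLAY_LINES = 192
PATTERN = [6, 5, 4, 3, 2, 1, 0, 0]


def contention_delay(t):
    """Extra T-states the ULA holds a contended-RAM access that starts at frame T-state t."""
    offset = t - FIRST_CONTENDED_T
    if offset < 0:
        return 0                      # still in the top border: no contention
    line, pos = divmod(offset, T_PER_LINE)
    if line >= DISPLAY_LINES or pos >= CONTENDED_T_PER_LINE:
        return 0                      # bottom border, or side border / retrace of a line
    return PATTERN[pos % 8]           # repeating 8-T-state pattern during the display


if __name__ == "__main__":
    for t in range(14333, 14345):
        print(t, contention_delay(t))
```

An access that lands at the worst point of the pattern waits 6 extra T-states, which is the "very big hit" compared with uncontended RAM.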

Bread80

Looking at this from a different perspective: the Amstrad has 16K of video memory, while the Speccy's screen is under 7K (6,144 bytes of bitmap plus 768 bytes of attributes). So the Amstrad has to read more than twice as much video data per frame. Reading that 16K requires two video bytes per 1 µs cycle, which leaves very little time for the CPU. If they had designed it to use only 8K of video memory, it would only require one video read per µs. In that case they probably could have fitted that read within the Z80's refresh cycle and, possibly, paused the CPU less often.
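The bandwidth figures above as a quick sketch. It assumes the standard CPC screen of 80 bytes per scanline drawn during a 40 µs visible line, over 200 lines; the 8K case is the hypothetical 40-bytes-per-line design from the post, and the function name is made up for illustration.

```python
def video_bytes_per_us(bytes_per_line, active_line_us):
    """Average rate at which the video hardware must fetch bytes during a visible line."""
    return bytes_per_line / active_line_us


if __name__ == "__main__":
    # Assumption: standard CPC screen, 80 bytes per scanline over a 40 us visible line, 200 lines.
    print("CPC 16K screen:", video_bytes_per_us(80, 40), "bytes/us")
    print("Hypothetical 8K screen:", video_bytes_per_us(40, 40), "bytes/us")
    print("Bytes fetched per CPC frame:", 80 * 200)
```

At 4 MHz a microsecond is only 4 T-states, so two video fetches per microsecond leave the CPU a single memory slot in each one.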

As others have said, the Speccy uses separate RAM for video data and only pauses the CPU when the CPU wants to access that video RAM. It could therefore be argued that the Speccy actually has the more sophisticated design, although the Amstrad has the flexibility to locate the video buffer anywhere in RAM.

The Enterprise Elan solved this all very neatly: the 'first' 64k block is video RAM. If the machine only has 64k installed it is also used as general RAM. The CPU must be paused if it requires access (I'm not sure of the methodology or the timings). If the machine has > 64k installed the video RAM is kept purely for video and the remainder will be used as general RAM with no bus contention and no delays.

Prodatron

Quote from: Bread80 on 17:27, 09 March 25
> The Enterprise Elan [...] If the machine has > 64k installed the video RAM is kept purely for video and the remainder will be used as general RAM with no bus contention and no delays.
In fact, the Nick can access the four 16K segments #FC to #FF for video RAM, so the CPU is slowed down when accessing this area as well. The slowdown seems to be similar to the CPC's; it does not make a very big difference whether you execute code in the first 64K or in another area.

Of course, a 128K Enterprise will still use parts of the "first" 64K for CPU memory, as the video RAM is usually 16K in total, like on the CPC, and the remaining 48K is not wasted but used for programs as well.
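A sketch of the addressing side of this point: the Z80 sees four 16K pages, each mapped to a 16K segment number, and only accesses that land in segments #FC to #FF share RAM with the Nick. The `page_map` values and function names are hypothetical, and the code only classifies accesses; it says nothing about how long the resulting wait is.

```python
# Segments the Nick can use for video RAM, per the post above.
VIDEO_SEGMENTS = range(0xFC, 0x100)


def segment_for_address(addr, page_map):
    """page_map[n] is the 16K segment number mapped into Z80 page n (0-3)."""
    return page_map[(addr >> 14) & 3]


def access_shares_ram_with_nick(addr, page_map):
    """True if a CPU access to addr lands in one of the video segments #FC-#FF."""
    return segment_for_address(addr, page_map) in VIDEO_SEGMENTS


if __name__ == "__main__":
    page_map = [0x00, 0x01, 0xFF, 0x02]   # hypothetical mapping for illustration
    for addr in (0x0000, 0x4000, 0x8000, 0xC000):
        print(hex(addr), access_shares_ram_with_nick(addr, page_map))
```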

I wonder how they did it for the PCW. Here you can use the first 128K for the video ram. So is the timing for the remaining RAM different as well? I guess not, as that would require additional logic probably?

GRAPHICAL Z80 MULTITASKING OPERATING SYSTEM

MaV

For the 4-bit ALU, I think it's best to carefully read Ken Shirriff's blog post:
The Z-80 has a 4-bit ALU. Here's how it works.

At the beginning there's a link to an interview with F. Faggin. Basically, they needed to change the ALU to avoid copying too much of his previous designs at Intel. He also describes how the first machine cycle was deliberately made to always take 4 clock cycles. I remember reading that the reason was to be more predictable for peripheral hardware, but can't remember if this was another Faggin interview.
Because of that, a faster 8-bit ALU did not make sense, as it'd have to wait to finish the first machine cycle. But because of the 4-bit ALU, they could save a few transistors in their design. Apart from that, a 4-bit ALU also makes computing with BCD numbers easier (it's also why the 6502 has two 4-bit ALUs in serial, IIRC, though their BCD implementation is patented, so copying that was out of the question anyways.)
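To show what doing 8-bit arithmetic with a 4-bit ALU means, here is a sketch that adds two bytes as two nibble-sized passes, low nibble first, passing the carry between them. The carry out of the low nibble is exactly the Z80's half-carry (H) flag, which fits the BCD point above. This models the arithmetic only, not the Z80's actual circuitry or timing (Shirriff's post covers those), and the function name is hypothetical.

```python
def add8_with_4bit_alu(a, b, carry_in=0):
    """Add two bytes as two 4-bit passes, low nibble first, like a 4-bit ALU would."""
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    half_carry = lo >> 4               # carry out of bit 3: the Z80's H flag
    hi = (a >> 4) + (b >> 4) + half_carry
    carry = hi >> 4                    # carry out of bit 7: the C flag
    result = ((hi & 0x0F) << 4) | (lo & 0x0F)
    return result, carry, half_carry


if __name__ == "__main__":
    for a, b in ((0x12, 0x34), (0x0F, 0x01), (0xFF, 0x01)):
        r, c, h = add8_with_4bit_alu(a, b)
        print(f"{a:02X} + {b:02X} = {r:02X}  C={c} H={h}")
```

As noted above, the opcode fetch takes 4 clock cycles regardless, which is why the two passes don't show up as extra cost in instruction timings.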
Black Mesa Transit Announcement System:
"Work safe, work smart. Your future depends on it."
