CPC Z80 Commands and how long they take...

TFM · 18:18, 12 January 15

Well, that clearly shows that you can code!

Maybe they will never call it 'good code', but yours will always be faster.

ralferoo · 22:25, 13 January 15

Quote from: Executioner on 10:45, 12 January 15
It's only an assumption, but I'd think the internal operation is something like (using your terminology):

G1: /WAIT high, address and data bus multiplexed for Z80 use
G2: /WAIT low, address and data bus still for Z80 use
G3: /WAIT low, CRTC address on address bus, memory read into GA shift register
G4: /WAIT low, CRTC address + 1 on address bus, memory read into GA shift register

It makes the design of the GA a lot simpler to use alternate cycles for the video memory accesses, but you're right it could be that way too.

However, the reason I think my way is more likely is this:

On the original 464 gate array, pin 1 is ¬CPUADDR, pin 40 is MA0/CCLK. Not only is MA0/CCLK used for A0 during a video access, it's also the clock for the AY chip. This is 2MHz into the AY, rather than 4MHz, so it's either a 50:50 duty cycle and the video memory is accessed every 2nd cycle, or it's a 75:25 duty cycle and the video memory is accessed the way you describe. It's also possible that ¬CPUADDR and CCLK are both 2MHz and out of phase though... BTW, the CPU clock is output on pin 39.

Another clue that it's my way is that there's a 74LS373 which latches the data from RAM output/GA data bus during the memory read cycle and holds it on the CPU side data bus. This shouldn't be needed if there's a long CPU side ownership of the RAM chips. (The 74LS244 isn't part of this process - it's there so that the GA can peek at the CPU side data bus during the RAS latch period when the RAM isn't outputting anything, so that the GA can read data sent to its IO port).

There is, of course, a simple way to answer this - attaching a scope to pins 1 and 40 of the gate array and watching what happens to those signals...

Executioner · 00:50, 14 January 15

Quote from: ralferoo on 22:25, 13 January 15
There is, of course, a simple way to answer this - attaching a scope to pins 1 and 40 of the gate array and watching what happens to those signals...

Only simple if you have a scope of course

robcfg · 12:08, 14 January 15

Isn't the clock frequency of the AY 1MHz rather than 2MHz?

Bryce · 12:35, 14 January 15

Quote from: robcfg on 12:08, 14 January 15
Isn't the clock frequency of the AY 1MHz rather than 2MHz?

Yes, the AY runs at 1MHz.

Bryce.

TFM · 16:50, 14 January 15

True this. Therefore the CPC has a cool bass. Now for other frequencies use the PlayCity.

MaV · 15:03, 15 January 15

I love that discussion and clarification on the interrelation of the GA and the Z80.

Another thing that bothers me a bit here is how the Z80 and the GA synchronise themselves at startup, or if they need to at all. The Z80 does a 2 cycle delay before it commences work, but there is no info about the GA in this case. (But I'm sure an oscilloscope can make that procedure a simple task.)
That bit seems a bit esoteric for all practical purposes, but it would still be nice to know it.

arnoldemu · 20:52, 15 January 15

Quote from: MaV on 15:03, 15 January 15
I love that discussion and clarification on the interrelation of the GA and the Z80.

Another thing that bothers me a bit here is how the Z80 and the GA synchronise themselves at startup, or if they need to at all. The Z80 does a 2 cycle delay before it commences work, but there is no info about the GA in this case. (But I'm sure an oscilloscope can make that procedure a simple task.)
That bit seems a bit esoteric for all practical purposes, but it would still be nice to know it.

I would like to know this so I can emulate it.

ralferoo · 10:07, 16 January 15

Quote from: MaV on 15:03, 15 January 15
Another thing that bothers me a bit here is how the Z80 and the GA synchronise themselves at startup, or if they need to at all. The Z80 does a 2 cycle delay before it commences work, but there is no info about the GA in this case.

I'm not sure it matters, actually...

Normally the latches in the 2-bit counter the GA uses for the 4-cycle counter would have random values (or even indeterminate values) on reset unless the hardware makes a particular effort to clear them on reset. However, on the first clock cycle, the adder would interpret these bits as either 0 or 1 and so on the next clock cycle they would have a defined value. So, the GA could have a random number of cycles between 0 and 4 before ¬WAIT becomes high and the Z80 can proceed.

However, if I remember rightly the colour palette is reset to black (but not the border colour), so in that case the GA would have reset circuitry and so they might have forced the 2-bit counter to an initial value. That's considered good practice, but not always necessary.
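
The power-up behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the `ready_value` parameter are made up here, and the idea that ¬WAIT goes high when the counter reads 0 is a guess, not something taken from a datasheet.

```python
def cycles_until_wait_high(initial_counter, ready_value=0):
    """Count clocks until a free-running 2-bit counter reaches ready_value.

    Assumption: the GA raises /WAIT whenever the counter equals ready_value.
    """
    counter = initial_counter & 0b11   # whatever the latches powered up as
    cycles = 0
    while counter != ready_value:
        counter = (counter + 1) & 0b11  # 2-bit counter wraps from 3 to 0
        cycles += 1
    return cycles


# Delay for each possible power-up state of the two latches.
delays = {start: cycles_until_wait_high(start) for start in range(4)}
print(delays)
```

In this simplified model the delay before the Z80 can proceed is anywhere from 0 to 3 clocks, and it becomes a fixed number if reset circuitry forces the counter to a known value.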

MaV · 10:05, 19 January 15

Quote from: ralferoo on 10:07, 16 January 15
I'm not sure it matters, actually...

That's what I implied with "esoteric". It probably has no practical impact if it is left out. (But wouldn't you just love to call your emulator or FPGA implementation 100% correct?)
However, I can see the necessity of knowing this for replacement gate arrays (in CPLD, FPGA or whatever other form) once those chips begin to fail in bulk (which is a long way away if Bryce is correct).

Quote from: ralferoo on 10:07, 16 January 15
However, if I remember rightly the colour palette is reset to black (but not the border colour), so in that case the GA would have reset circuitry and so they might have forced the 2-bit counter to an initial value. That's considered good practice, but not always necessary.

That's an interesting observation. I know what I'm going to do, once I receive my scope.

Bryce · 12:56, 19 January 15

The GA is running well within its specs and runs relatively cool at all times. So it is very unlikely that a GA would fail unless you powered the CPC with 12V or reversed the polarity.

Bryce.

freemac · 17:37, 19 January 15

In JavaCPC there is one table of instruction timings for a standard Z80 and another for the Z80 as it runs in the Amstrad. You can compare them (this approach is better for emulation).
Timing is modulo 4 so that RAM access alternates between its two purposes: VRAM=>CRTC (pixels) and RAM<=>Z80 interactions. The CRTC is directly linked to the Z80's WAIT_n (this approach is better for FPGA).
The one exception is MEM_WR, which takes one more modulo-4 cycle than its own modulo-4 cycle would suggest.

In my FPGA version I used a hacked WAITn input on the Z80. The original WAITn of the FPGA Z80 core (called T80, from opencores.org) has timing problems, apparently because the core recycles 8080 source code that is wrapped inside the Z80.

fgbrain · 08:31, 02 October 16

The pdf posted in the first page has some errors...

1. PUSH IX/IY and POP IX/IY can't both take 5 NOPs. POP is always 1 NOP faster.
2. The command DEX HIX is a mistake.

Please fix if you agree..

endangermice · 17:38, 27 October 16

Fascinating thread, it's answered a lot of questions I've had about how the Gate Array interfaces with the CPU. I'm on the lookout for a little more information and I'm hoping one of you might be able to help me out. I'm currently trying to figure out the exact Z80 instruction timings from the Z80 Users Manual, which I found at: http://www.phy.davidson.edu/FacHome/dmb/py310/Z80.Instruction%20set.pdf

I have a question on the grouping of the T-states. For example, DJNZ if 0 takes 8 T-states. What I'm interested in is how the T-states are broken down; in this case it's two groups of T-states, (5,3). What do these groups of T-states represent? I'm hoping they show how many T-states the CPU spends in a particular cycle of execution, e.g. 5 T-states spent in the fetch cycle and 3 T-states spent in the decode and execution cycle. Is this correct? If so, is there any documentation on which CPU cycle of execution is represented by the numbers for each instruction? For example, RLC (IX+d) has 23 T-states which are separated into 6 groups of T-states. What are the 6 different states that the CPU is in for each group of T-states? I've looked through the manual and I can't seem to find this info, though I may have just missed it!

The reason I'd like to know this is that I'm attempting to implement Wait on my Z80 emulation so that I can emulate how the Gate Array uses it in order to arbitrate between the CPU and CRTC access of memory. If I can emulate this timing correctly, time critical effects such as rasters should finally look correct.

From what I understand, the CPU needs to be in a particular type of machine cycle in order to honour the request to wait. I've read that it will honour a wait request if it's in the fetch, memory read, IO or interrupt acknowledge cycles, so I need to emulate the different states in order to get Wait to work correctly.

I'm after a complete version of what's on the Wikipedia page for the Z80: https://en.wikipedia.org/wiki/Zilog_Z80, the table under the "Instruction execution" section which shows what the machine cycles are for a few instructions. Is there a version of this for all Z80 instructions including undocumented ones?

andycadley · 19:50, 27 October 16

The official Zilog docs are a pretty good place to start, they include details of how many M-Cycles an instruction is broken down into and how many T-states each of those M-Cycles take. Unfortunately no details of the exact breakdown of what each M-Cycle is doing, but you can usually have an educated guess.

http://z80.info/zip/z80cpu_um.pdf

endangermice · 20:02, 27 October 16

Thanks Andy, I think that's what I'm going to have to do. From what I understand, it only matters when the CPU is in a cycle that accesses memory. In those scenarios, it has to wait so that the Gate Array can access memory first. I can debug through the existing code and see where the memory reads and writes are occurring, which will allow me to build up an idea of how it all goes together. Fortunately the Z80 class is broken down nicely into a series of sub methods which I can hopefully use to determine the type of machine cycle that's in progress.

Thanks again for the heads up.

Cheers,

Damien

1024MAK · 20:11, 27 October 16

In order to understand the Z80's wait input pin, you have to understand why it was provided. The wait input does not actually stop the Z80 if the Z80 is not accessing the address and data busses.

The idea behind the wait input is very simple. It allows slow devices to be directly connected to the data bus. When wait is asserted and a transaction is taking place on the data bus, the Z80 will continue to hold the address lines steady and, if writing, it will continue to hold the data bus lines steady. It will continue to do this until the wait line is de-asserted, at which point the Z80 will complete the bus transaction. This enables hardware to signal to the Z80 that it is accessing slow memory or slow I/O devices. The bus transaction is synchronised to the Z80 clock.

Amstrad very cleverly made use of this feature (as have other designers previously) to use it to synchronise the Z80 CPU to a video system that requires regular memory read accesses. This is why there are chips that isolate the Z80 address and data bus from the RAM address and data bus lines.
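
As a rough illustration of the mechanism, here is a small Python sketch of a single memory read cycle. It assumes the textbook behaviour where the wait line is sampled during T2 and again during each wait state (Tw). The function name and the `wait_low_at` callback are hypothetical, invented for this example.

```python
def memory_read_states(wait_low_at):
    """List the T-states of one memory read cycle.

    wait_low_at(clock) must return True while /WAIT is asserted (low)
    at the given clock index, counted from T1 = 0.
    """
    states = ["T1", "T2"]
    clock = 1                      # /WAIT is first sampled during T2
    while wait_low_at(clock):
        states.append("Tw")        # address (and data, on a write) stay held
        clock += 1
    states.append("T3")            # transaction completes once /WAIT is high
    return states


print(memory_read_states(lambda clock: False))      # fast memory
print(memory_read_states(lambda clock: clock < 3))  # device needs two waits
```

If the Z80 is not in a bus cycle at all, nothing samples the wait line, which is why asserting it during internal operations has no effect.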

Mark

endangermice · 20:22, 27 October 16

Yes, that's my understanding too and it makes perfect sense, there's no need to do anything on a wait signal if the CPU is only working internally.

It is a clever way to arbitrate the sharing of memory access between the CPU and CRTC - simple yet elegant!

endangermice · 21:11, 27 October 16

I'm attempting to figure out how DJNZ = 0 takes 3us to execute. So here's my theory:

The Z80 user's manual states that it takes 8 T-states to execute the instruction using 2 machine cycles, one lasting for 5 clock cycles, the other lasting for 3. Timings of the instruction on the CPC are 3us. I believe the cycles are:

Machine Cycle 1 - Decrement B (5 T-states)
Machine Cycle 2 - Check if B is 0 or not 0 (3 T-states)

I'm assuming the CRTC 1MHz clock is derived from the 4MHz CPU clock by sending a clock pulse to the CRTC every 4th CPU clock cycle. The only way I can see that DJNZ = 0 could take 3us is if the end of the instruction coincides with the 4th clock cycle, when the Gate Array is looking to read 2 screen bytes from memory to draw the next line for a column. On the 4th clock cycle, the DJNZ instruction has finished executing and the Z80 is about to read from memory to retrieve the next instruction. At the same time, the Gate Array sets WAIT to low and holds it there for 2 cycles as it retrieves 2 bytes from memory in order to draw the next line of the next character. This causes the CPU to wait for 2 cycles and therefore extends the number of T-states for machine cycle 2 for the DJNZ instruction from 3 to 5. The timing for DJNZ = 0 therefore ends up as:

Machine Cycle 1 - Decrement B (5 T-states)
Machine Cycle 2 - Check if B is 0 or not 0 (3 T-states) + 2 T-states held by Gate Array reading 2 bytes from memory (5 T-states)

The timing for DJNZ = 0 becomes (5,5) or 3us.

Does this sound plausible? Since DJNZ = 0 does not require the Z80 to access memory I cannot see how else the instruction timing could be stretched to 3us.

Executioner · 21:23, 27 October 16

The current version of JEMU has all the instruction break-downs, it doesn't have a table of times. https://sourceforge.net/p/jemu/code/HEAD/tree/JEMU/src/jemu/core/cpu/Z80.java

The only thing it doesn't currently emulate properly is the SCF/CCF flag bit 5/3 handling. The CPC implementation holds the /WAIT signal low for 3 of every 4 T-States so all reads and writes get aligned to the same (of each 4) clock cycles.

endangermice · 21:59, 27 October 16

Richard, thank you for the source code, I think it's going to be very useful - though I do feel a bit guilty that I'm effectively plagiarising your hard work!

I see from your implementation of DJNZ that you're reading the jump address upfront which completely changes my theory on the operation of that instruction. In your sequence of events there is a memory read of 1 byte during the instruction, to get the address of the label we need to jump to (I'd forgotten for a moment that this is a relative jump so it can only be a signed byte in size thus representing a jump of 128 bytes backwards or forwards).

It looks like I can figure out the times by counting up the calls to the cycle() method.

Thanks for the info, I shall use it to rethink my theory.

Cheers,

Damien

Executioner · 22:08, 27 October 16

Weird changing font there. It was so small I couldn't read it, now it's larger than normal. The JEMU Z80 core is based on the original Java core I wrote but completely re-worked based on a combination of http://z80.info/z80ins.txt and the original Z80 User Manual which also has a break-down of cycles (and many errors, but that's ok if you use common sense).

endangermice · 22:31, 27 October 16

Yes, the html editor is (like all the ones I've come across) a little unpredictable. I try to make sure I paste text in from Notepad to avoid additional formatting, but occasionally it sets the font size to tiny when I press the reply button. There's probably some devious logic at work somewhere just to keep us on our toes. Mind you, I think anyone willing to write code in Javascript deserves a medal, it really is one of the worst languages. Anyway, hopefully my previous reply is now at a more expected font size!

One thing I am beginning to realise is that when it comes to emulation, new information causes a constant reworking of code - still if it makes it more accurate then it's all worth it. I'm very impressed with both WinAPE and Kev's Arnold emulator. I can't promise to ever reach your heights, but that won't stop me from trying!

Thanks for the additional info, it will be very useful. I'm still relatively new to emulation but I do feel like I'm genuinely getting a grip on it - but the wish to perfect is something that's difficult to resist!

endangermice · 23:00, 27 October 16

Sorry about the random post earlier - it seems that you can't delete posts on this forum - or I just can't work out how!

So I've been through DJNZ in your Java CPC Z80 class, and I believe the timings can be derived as follows:

DJNZ = 0
======
- 5 T-states. 4 T-states to fetch the opcode including 1 T-state to check wait, 1 T-state in the DJNZ method - for the decrement
- 3 T-states for Fetchbyte including 1 T-state for checking wait

DJNZ != 0
======
- 5 T-states. 4 T-states to fetch the opcode including 1 T-state to check wait, 1 T-state in the DJNZ method - for the decrement
- 3 T-states for Fetchbyte including 1 T-state for checking wait
- 5 T-states for the relative jump (jre)
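
To see how these groups turn into the 3us and 4us figures on the CPC, here is a rough Python sketch. It assumes that /WAIT is high on only one clock in every four (as in the G1 to G4 description earlier in the thread), that the Z80 samples /WAIT in T2 of each cycle that uses the bus, and that internal cycles are never stretched. The function name, the `ready_phase` parameter and the tuple layout are made up for illustration, and I/O cycles and any write-cycle special cases are ignored.

```python
def cpc_tstates(m_cycles, ready_phase=1):
    """T-states an instruction occupies on a CPC-like machine.

    m_cycles    -- list of (nominal_tstates, uses_bus), one per machine cycle
    ready_phase -- clock index (mod 4) on which /WAIT is high (an assumption)
    """
    start = (ready_phase - 1) % 4   # chosen so the opcode fetch needs no waits
    t = start
    for nominal, uses_bus in m_cycles:
        if uses_bus:
            # /WAIT is sampled in T2 (clock t + 1); add Tw until it is high.
            t += (ready_phase - (t + 1)) % 4
        t += nominal
    # The following opcode fetch has to align its own T2 in the same way.
    t += (ready_phase - (t + 1)) % 4
    return t - start


djnz_b_zero = [(5, True), (3, True)]                  # fetch + dec, fetch offset
djnz_b_nonzero = [(5, True), (3, True), (5, False)]   # plus internal jump cycle

for name, cycles in (("DJNZ = 0", djnz_b_zero), ("DJNZ != 0", djnz_b_nonzero)):
    total = cpc_tstates(cycles)
    print(name, total, "T-states =", total // 4, "us")
```

Under these assumptions the offset fetch picks up three wait states, and in the B = 0 case the next opcode fetch picks up one more, which is where the extra microsecond comes from in both cases.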

It's interesting that the processor fetches the relative jump address before checking whether b is 0. You could potentially save 3 T-states by not doing this until the value of b has been determined. In this scenario, if b were 0, 3 less T-states would be required.

PulkoMandy · 09:26, 28 October 16

Quote from: endangermice on 23:00, 27 October 16
It's interesting that the processor fetches the relative jump address before checking whether b is 0. You could potentially save 3 T-states by not doing this until the value of b has been determined. In this scenario, if b were 0, 3 less T-states would be required.

That would not quite work. You would still need to increment the PC, so the next instruction read is actually the next instruction, and not the jump address.

So, you could skip the actual memory read, but you would still need to increment the PC, which would still use some T-States.

In the end it is simpler to use the same "fetch from PC and increment PC" as everywhere else, instead of wasting some space for a dedicated "increment PC without fetching" operation.
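
A minimal Python sketch of that point, using a made-up `TinyCpu` class (not JEMU's code): DJNZ reuses the same fetch-and-increment helper as everything else, so the PC is already past the offset byte whether or not the jump is taken.

```python
class TinyCpu:
    """Hypothetical, heavily reduced CPU: just PC, B and a byte array."""

    def __init__(self, memory, pc=0, b=0):
        self.memory = memory
        self.pc = pc
        self.b = b

    def fetch_byte(self):
        # Shared "fetch from PC and increment PC" step.
        value = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def djnz(self):
        # Called after the opcode byte has already been fetched.
        self.b = (self.b - 1) & 0xFF
        offset = self.fetch_byte()      # always fetched, so PC always moves on
        if self.b != 0:
            if offset >= 0x80:          # sign-extend the 8-bit displacement
                offset -= 0x100
            self.pc = (self.pc + offset) & 0xFFFF


cpu = TinyCpu(memory=[0x10, 0xFE], b=2)   # 0x10 0xFE: DJNZ back to itself
cpu.fetch_byte()                          # opcode fetch
cpu.djnz()                                # B becomes 1, jump taken
pc_after_taken = cpu.pc
cpu.fetch_byte()                          # opcode fetch again
cpu.djnz()                                # B becomes 0, falls through
pc_after_fallthrough = cpu.pc
print(pc_after_taken, pc_after_fallthrough)
```

Skipping the fetch when B is 0 would need a separate "increment PC only" step in `djnz`, which is the extra special case being argued against.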
