CRTC and HSync

Started by endangermice, 09:38, 18 October 16


endangermice

I've been spending the last few weeks attempting to accurately emulate the CRTC (you'll see my progress in the CRTC Question thread).

I have some pretty good basic emulation working, but now I'm stuck trying to figure out what to do in a scenario where R0 (Horizontal Total) is set to a column that occurs before the position of HSync. In this scenario, the counter never reaches the HSync column and therefore HSync is never triggered. If this happens in my emulator, it freezes since the Gate Array is no longer receiving the HSync timing signal.

How is this handled in other emulators? I notice that they continue to function in this scenario. Do they always send an HSync signal even if it cannot be triggered by the CRTC counter? How is the timing calculated for this? Does it default to a 300Hz rate if HSync is not raised within an expected time?

My hunch is that the CRTC somehow compensates for this scenario, but I'm unsure exactly how.

Many thanks,

Damien
For all the latest Starquake remake news check out my website - www.endangermice.co.uk

PulkoMandy

No, the CRTC won't do anything. There will be no HSync in that line. Maybe there will be one in the next line.


The results of this are:
- It is not possible for the gate array to change video mode until an HSync occurs
- The interrupts are based on the HSync, so the interrupts don't happen
- And of course the screen will run out of sync


Everything else should continue to work normally.


Why does your gate array "freeze"? Why does it need the HSync signal to run?


Of course, it is not a good idea to do this permanently. But, there are some use cases for a line with no HSync.


For example, for a vertical split screen, each CTM rasterline will be made of (at least) two CRTC lines: the left one without an HSync, the right one with an HSync. This allows the address to be changed at the start of each line (with some additional tricks).
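
As a rough illustration of the point above (not taken from any real emulator), the horizontal counter can be modelled with equality comparators only. The function name `count_hsyncs` and the character-clock loop are assumptions made for this sketch; R0=63 and R2=46 are the standard CPC values, and 40 is just an arbitrary R0 below R2.

```python
def count_hsyncs(r0, r2, char_clocks):
    """Count HSync starts over `char_clocks` character clocks.

    Hypothetical model: the horizontal counter is 8 bits wide and is only
    ever compared for equality against R2 (HSync position) and R0 (total).
    """
    hcc = 0      # horizontal character counter
    hsyncs = 0
    for _ in range(char_clocks):
        if hcc == r2:
            hsyncs += 1              # HSync starts on this character
        if hcc == r0:
            hcc = 0                  # end of line: counter restarts
        else:
            hcc = (hcc + 1) & 0xFF   # 8-bit wrap
    return hsyncs


one_frame = 64 * 312  # character clocks in a standard frame
standard = count_hsyncs(r0=63, r2=46, char_clocks=one_frame)  # 312: one HSync per line
short = count_hsyncs(r0=40, r2=46, char_clocks=one_frame)     # 0: counter never reaches 46
```

Nothing in the loop stalls when HSync is missing; anything driven by HSync (mode changes, interrupts) simply does not happen.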

endangermice

Thank you as always for your very helpful reply. It appeared as though my emulator was freezing due to the fact that I could no longer hear any sound when I pressed the delete key. However this appears to be due to the fact that the disabling of interrupts causes the keyboard to no longer be read and also stops the PSG from updating.

I was attempting to figure out why the Batman demo dies on my emulator when it gets to the bit where it zooms into the window just before the bats fly through (near the beginning). I've since traced this to two problems: 1. I wasn't applying an & mask to the lower 3 bits of my raster counter, and 2. I was waiting until my raster counter equalled the MaxRaster (R9) before moving on to a new row. The problem with this is that if the register is changed at a certain time, the counter could miss the equality check and march on into infinity. I have replaced this with an equals-or-greater check and now the bats are working!

When the CRTC registers are set, are they always set immediately or are some deferred until the next frame? I presume they are set with immediate effect, but if this is not the case it could affect all sorts of timings so it would be great to be sure....

Thanks again - you are really helping me make a lot of progress :) .

PulkoMandy

This is where the differences between the CRTC types start to matter.


On the original CRTC2, most of the updates are delayed, to the start of a new screen, or a new character line. But, on the CRTC1 for example, this is missing or buggy and some changes will apply immediately. I don't remember the exact details for all of them. For example, R12 and R13 changes are taken into account when the VCC is reset to 0 (after hitting R4 value). However, on CRTC1, the changes are taken into account immediately as long as VCC is still 0, for the whole first character line.


In some cases, the counters WILL march on to "infinity"... or rather, to overflow and start over from 0.

endangermice

Yes, I'm getting close to the point where I'm going to have to code in changes for all the different CRTC types, mainly for completeness.

You're right about the counters, my strategy of having a greater or equals check is probably incorrect since it doesn't allow counters to run into infinity (which might actually be what the running program expects).

Instead I should constrain the counters to their size in bits. This appears to be directly related to the registers that act upon them. For example it appears that the column counter is 8 bits wide seeing as both the R0 and R1 registers have a range of 8 bits. For this I should change my integer counter to a byte which will automatically constrain the numbers to 8 bits.

The vertical counter on the other hand appears to have a 7 bit range seeing as R4 and R6 have a range of 7 bits, but I will likely have to constrain this with an & mask as the 8 bit byte is the closest C# data type.

PulkoMandy

Comparisons are always for equality. The reason for that is, it is much simpler to implement in electronics. Equality is done with a bit-by-bit XOR. Greater-or-equal is done with a subtraction, which needs management of carries, etc, and is much more complex.


So, unless there is a very good reason to use greater-or-equal, you will find only equality comparators.


And, yes, a simple mask will do, and the counter width matches the associated registers width. No wasted bits :)
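
To make the equality-plus-mask behaviour concrete, here is a minimal sketch. The helper name `steps_until_match` is hypothetical; the 7-bit and 8-bit widths are the ones suggested earlier in the thread for the vertical and horizontal counters.

```python
def steps_until_match(counter, target, width_bits):
    """Count increments until a masked counter equals `target`.

    Hypothetical helper: the counter is compared for equality only, and it
    wraps to 0 when it overflows its width instead of growing forever.
    """
    mask = (1 << width_bits) - 1
    target &= mask
    steps = 0
    while counter != target:
        counter = (counter + 1) & mask
        steps += 1
    return steps


# A 7-bit counter already at 50 when the register is rewritten to 38:
# it misses the match, runs up to 127, wraps to 0 and matches at 38.
missed = steps_until_match(counter=50, target=38, width_bits=7)
# An 8-bit counter that has not yet passed the target matches normally.
normal = steps_until_match(counter=10, target=20, width_bits=8)
```

With a greater-or-equal check the first case would match immediately, which is not what a program relying on the overflow expects.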

endangermice

Well, it has been a few days and I'm pleased to report that I've been making good progress. A lot of the more complex stuff is working now, particularly after I fixed a bug with the halt instruction in the Z80 CPU implementation.

I'm now trying to fix the remaining glitches. One of these is with drawing rasters by writing directly to the Gate Array.

I have modified Kev's simple rasters example 1 (http://www.cpcwiki.eu/index.php/Programming:Simple_Raster_Example_1) so that it outputs just 1 horizontal line (tested working in WinAPE). When I run the same code on my emulator, the single line starts at column 20 and wraps around onto the next line. This is clearly a timing issue. If I remove the defs 20 delay (which in the demo listing delays the colour change position so it is not visible), my rasterline looks correct.

Having examined the code, I think it's a problem with the timing of the interrupts. They're coming in on the correct lines, i.e. 52 lines apart, with the first one occurring 2 HSyncs after the start of VSync. However, my interrupts occur as soon as HSync starts, which means that the CRTC column counter is (with standard CRTC settings) always at character column 46, which would be in the hidden area of the screen. The fact that Kev's code specifically has a delay to ensure that the raster doesn't start drawing immediately suggests that this is not correct.

Can someone confirm where the CRTC column counter should be when an interrupt is triggered?

Many thanks,


Damien

arnoldemu

@endangermice :

My suggestion is to run Richard Wilson's "plustest".
www.winape.net

This will tell you if instruction timings and interrupt timings are accurate. Run the instruction timing and the interrupt tests. Don't worry about the other tests they are for the Amstrad plus. Fix the timing issues first.

After this you can then run my "acid" tests.
http://www.cpcwiki.eu/forum/emulators/amstrad-cpc-'acid'-test/

In this test I would recommend running the cpu tests first (some in z80tests and some in cpctests). Then run the various cpctests. Some are automatic.
There are tests to verify R register, some for interrupts that kind of thing.

Then work on the cpctests.

For the crtc tests you need to read the comments because they are mostly visual and it's not obvious what they are doing. I haven't finished organising them.

A lot of your questions will be answered by the tests and feel free to post in the acid test thread if you have questions :)

My games. My Games
My website with coding examples: Unofficial Amstrad WWW Resource

endangermice

Hi Kev,

Thank you for this, I think these tests are going to be extremely useful. I've just been running through the Instruction Timing Test and pretty much everything is failing so I think I know where to start looking.

Are there any details on these tests? For example - I'm not too sure what the instruction timing test is showing me, I presume it's instruction number followed by the number of detected T-States vs the number of T-States expected e.g.

The first result in WinAPE reads as 00:1; in my emulator it reads as 00:2/1. Since NOP should take 4 T-states, the 1 result doesn't make any sense. I wondered whether it represents 1 set of 4 T-states, though instruction CC reads as 5/3, which should be 17/10 according to http://clrhome.org/table/. Do you know how the results are supposed to be read? Is it something to do with the padding of the instruction lengths due to the video priority over the processor?

Thanks again,

Damien

arnoldemu

Correct. It is detecting the timing due to padding from the gate-array.

1 corresponds to 1us, or 4 t states.

http://www.cpctech.org.uk/docs/timing.html

I think there is a mistake with a couple of instructions in this doc.
http://www.cpctech.org.uk/docs/instrtim.html

The best thing is to implement the delays added by the gate-array itself; then you get accurate timing *within* the instruction (i.e. timing for instruction fetch etc), but these will do as a start.

endangermice

Thanks for the additional info. I had an initial stab at adding in the GateArray delay during lunch today (I lead an exciting life!) and a lot more instructions are now passing the test, so it definitely looks like I'm on the right track. I have a few instructions which are still failing and it's possible that they have the wrong timings. For example, for DJNZ (opcode 0x10) I have the timings of:

13 T-states for not 0 (my gate array code pads the 13 to 16 = 4us)
8 T-states for 0 (no padding required, 8/4 = 2us)

The test expects 4/3us, implying that the 8 T-states is incorrect. Indeed, if I change this to 9, the test passes. I've had a sneaky look through your Z80 implementation for Arnold, and it appears you're measuring everything in us, not T-states, and that you've already taken into account the delay imposed by the gate array. This leads me to suspect that the actual T-states are unimportant (unless you're trying to emulate the gate array and the access of memory 100% accurately), so I could change the 8 to 9, 10, 11 or 12; as long as it doesn't cross another us boundary, it doesn't matter.
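
The padding arithmetic described above can be sketched as follows. This is only the simple round-up-to-4 model from this post, with a hypothetical function name; it is not how the hardware arrives at its figures.

```python
def padded_us(t_states):
    """Round a T-state count up to whole microseconds (4 T-states each)."""
    return (t_states + 3) // 4


djnz_taken = padded_us(13)      # 13 is padded to 16 T-states, i.e. 4us
djnz_not_taken = padded_us(8)   # 8 needs no padding, i.e. 2us (the test expects 3us)
# Any value from 9 to 12 lands in the same 3us bucket.
three_us_bucket = [t for t in range(1, 17) if padded_us(t) == 3]
```

So under pure rounding, 9, 10, 11 and 12 are indistinguishable, but 8 can never produce the 3us the test expects.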

Am I understanding this correctly? I got the T-state values from http://clrhome.org/table/ which you mentioned has some inaccurate figures. However, I did double check some of the values against the Zilog documentation and it also states 8 T-states for DJNZ = 0. How does the CPC arrive at 3us for the timing? Even when padded by the gate array, the 8 T-states will provide a timing value of 2us. It looks like something more is going on here?

The table you kindly linked in your last reply http://www.cpctech.org.uk/docs/instrtim.html shows timings in us only so it's impossible for me to accurately compare the two, but perhaps that doesn't matter?

Thanks again for the help - I think I'm making sense of the XNACPC code I "inherited" and hopefully in the not too distant future I'll be able to release something useful!

Thanks again,

Damien

PulkoMandy

The padding is not just rounding up to the next multiple of 4, what happens is a bit more complex.


The z80 alternates between different states at each clock pulse. In some of these states, it performs a bus access (read or write to memory or a peripheral). In others, it performs internal tasks (reading or writing a register, decoding an instruction, etc).


The gate array also needs to access the memory for video display. When doing so, it raises a signal to the z80 to tell it "don't access memory in this cycle, I need it for the video".


If the signal reaches the z80 during an "internal" cycle, nothing changes. However, if it reaches the z80 during a bus access cycle, the cycle is cancelled and the z80 will retry the same operation at the next cycle.


All instructions start with the same sequence:
T-State 1) Read the instruction from RAM
T-State 2) Decode the instruction


The next steps differ depending on the instruction.


Usually, it is the instruction fetch (cycle 1 of the instruction) that is delayed by the Gate Array, but, in some cases, it can be some other access during the execution of an instruction. So, to be completely correct, you would have to emulate each of these states and the exact timing at which things happen. This is required for accurate emulation of split rasters and it is why most emulators shift them by a few pixels when compared to real hardware, because they are only accurate to the microsecond.

endangermice

I was reading something about this late last night in a thread that Bryce started in January about the length of Z80 instructions, and how the gate array, via the wait signal, causes the Z80 to hold off for one or more cycles when the gate array itself needs to access memory: http://www.cpcwiki.eu/forum/programming/cpc-z80-commands-and-how-long-they-take/1

The difficulty is that there's no firm agreement on exactly when the accesses occur, though there are several theories, so it might be interesting to attempt to implement one or more of these and see what happens.

It is clear that the Z80 implementation that I've inherited from XNACPC is not entirely up to the task. It essentially runs the processor through a set number of T-states per frame; the number is derived by dividing the clock speed by the FPS, so in this case 4,000,000 / 50 = 80,000 T-states per frame. The Z80 is then told to execute instructions in a while loop until the desired number of T-states has been reached. During execution it posts various callbacks to other classes, e.g. the GateArray, PSG and CRTC. It attempts to feed a 1MHz clock to the PSG and CRTC. However, the actual execution order is not precise. It keeps a running counter of T-states after each instruction. Following the instruction, it goes through the counter and fires a callback for every four T-states. If T-states are left over, they're added to the next batch after the next instruction.

The problem with this is that although it will fire off enough callbacks to create an effective 1MHz clock, those callbacks do not fire at a precise time, i.e. they won't be exactly every 4 T-states: if an instruction takes 8 T-states, 2 callbacks will be fired at once. They really should be spaced out evenly.
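
The difference can be seen in a small sketch. Both function names and the instruction lengths are made up for illustration; this is not the XNACPC code, only a model of the two scheduling strategies described here.

```python
def batched_ticks(instruction_lengths):
    """T-state timestamps of 1MHz callbacks fired only after each instruction."""
    now = 0       # T-states elapsed
    pending = 0   # T-states not yet converted into callbacks
    fired = []
    for length in instruction_lengths:
        now += length
        pending += length
        while pending >= 4:
            fired.append(now)   # all of these share the same timestamp
            pending -= 4
    return fired


def stepped_ticks(instruction_lengths):
    """T-state timestamps of 1MHz callbacks when stepping one T-state at a time."""
    now = 0
    fired = []
    for length in instruction_lengths:
        for _ in range(length):
            now += 1
            if now % 4 == 0:
                fired.append(now)
    return fired


lengths = [8, 4, 13]                # hypothetical instruction lengths in T-states
batched = batched_ticks(lengths)    # [8, 8, 12, 25, 25, 25]
stepped = stepped_ticks(lengths)    # [4, 8, 12, 16, 20, 24]
```

Both versions deliver six callbacks over the same 25 T-states, but only the stepped one delivers them at evenly spaced times.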

I think a better way would be to try and simulate the Z80 more accurately by allowing the code to literally move through 1 T-state at a time, so that the callbacks can be sent at exactly the right time. If I can also emulate the proper cycles for decode, fetch etc., I can then emulate the wait state and, depending on the status of the Z80, decide whether it should pause for a cycle or not.

The good news is that I believe that the actual guts of the Z80 emulation are good; I just need to refactor and change the way it executes. I think the idea of running through a set number of T-states per frame is a good one. MonoGame / XNA can be made to produce an accurate 50Hz timer, so I just need to ensure that the other emulated components are fed with accurate timing and then it should all hopefully work.....

One question I still have is how the 8 T-states for DJNZ for 0 become 3us. Richard's Plus test program always expects this to be the result. I understand that what is going on internally is more complicated than a simple padding of instructions to the nearest us; however, even emulators that deal in a resolution of us rather than T-states give 3us as the timing for DJNZ for 0. Since DJNZ is already 8 T-states or 2us, why does the WinAPE test expect it to be 3us? Is it to do with the cycles that the Z80 has to go through to process that instruction, which effectively allow the gate array to pause it for 1 cycle and thus stretch the timing for that instruction beyond the 2us period?

PulkoMandy

First, be aware that the CPC display is not at exactly 50Hz. A frame is 312 lines of 64us each = 19968 microseconds. 50Hz would be a frame time of 20000 microseconds.
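
The frame arithmetic works out as follows. The constant names are chosen for this sketch, and the 4 T-states per microsecond figure comes from the 4MHz clock mentioned earlier in the thread.

```python
LINES_PER_FRAME = 312
US_PER_LINE = 64
T_STATES_PER_US = 4   # 4MHz Z80 clock

frame_us = LINES_PER_FRAME * US_PER_LINE        # 19968 microseconds
frame_t_states = frame_us * T_STATES_PER_US     # 79872 T-states
refresh_hz = 1_000_000 / frame_us               # slightly above 50Hz

# Difference from a frame budget based on exactly 50Hz (80,000 T-states).
excess_t_states = 4_000_000 // 50 - frame_t_states   # 128
```

An emulator that budgets 80,000 T-states per frame therefore runs 128 T-states (32us, half a scanline) too long on every frame.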


As for DJNZ, yes, my guess is that one or more of the internal cycles in the instruction are delayed. Internally, the instruction is executed like:


DEC B
JP NZ,address


It needs 3 memory accesses, 1 to read the instruction, and 2 to fetch the destination address. It's possible that the fetch of the address happens at a cycle where the gate array locks the bus, and in that case, it can be delayed. And when the instruction ends, it is now on the 9th T-State and the gate array locks the bus again, so the next instruction can't be fetched immediately, and it needs to wait again.


It's hard to tell without knowing the exact T-State timing of the instruction and what happens in each cycle.
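
One way to test the idea is a toy model like the one below. It is a sketch under stated assumptions, not a description of the real hardware: it assumes the gate array lets the Z80 start a memory access only once every 4 T-states, that machine cycles without a memory access are never stalled, and that the instruction starts aligned. The function name is hypothetical. The machine-cycle splits are the Zilog-documented ones (DJNZ is a two-byte relative jump, so it reads the opcode and one displacement byte; 5+3+5 T-states when taken, 5+3 when not).

```python
def cpc_duration_us(m_cycles):
    """Estimate an instruction's duration in microseconds on the CPC.

    m_cycles: list of (t_states, is_bus_access) tuples, one per machine cycle.
    Assumed model: a bus-access cycle is stalled until the next 4 T-state
    boundary; internal cycles run without stalling.
    """
    t = 0
    for length, is_bus_access in m_cycles:
        if is_bus_access:
            t += (-t) % 4   # wait states up to the next 4 T-state boundary
        t += length
    t += (-t) % 4           # the next opcode fetch is a bus access too
    return t // 4


NOP = [(4, True)]
DJNZ_TAKEN = [(5, True), (3, True), (5, False)]
DJNZ_NOT_TAKEN = [(5, True), (3, True)]
CALL_CC_TAKEN = [(4, True), (3, True), (4, True), (3, True), (3, True)]
CALL_CC_NOT_TAKEN = [(4, True), (3, True), (3, True)]

timings = {
    "NOP": cpc_duration_us(NOP),                                  # 1
    "DJNZ taken": cpc_duration_us(DJNZ_TAKEN),                    # 4
    "DJNZ not taken": cpc_duration_us(DJNZ_NOT_TAKEN),            # 3
    "CALL cc taken": cpc_duration_us(CALL_CC_TAKEN),              # 5
    "CALL cc not taken": cpc_duration_us(CALL_CC_NOT_TAKEN),      # 3
}
```

Under these assumptions the model reproduces the figures quoted in this thread (00:1 for NOP, 4/3 for DJNZ, 5/3 for opcode CC), which at least suggests the per-access stall explanation is plausible.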

endangermice

Yes of course, I think you've mentioned the video timing anomaly before. The problem occurs because you can't divide 625 PAL lines exactly in half, so rather than 312.5 lines (which is what's needed for an exact 50Hz) the display draws 312 instead. I presume the monitor is happy with losing 32us of timing per frame; it's certainly pretty tolerant when playing around with various CRTC timings, more tolerant than my Trinitron TV!

Interestingly, is the 32us made up elsewhere, perhaps during V-Blank by the computer in order to provide a 50Hz timing or do we just accept that the CPC doesn't quite output 50Hz?

I think before refactoring the Z80 code, I might write a little simulation which tries to insert the extra wait states into some selected instructions and see what timings I get back. That will hopefully confirm whether or not my approach is valid and is a lot easier to debug than a full blown implementation. If I can get the fundamentals right and a decent framework in place refactoring the existing code will be a whole lot easier. I suspect I might have a few questions about when the Gate Array holds up the CPU, though I think it would be fun to put the theories floated around in Bryce's timing thread to the test!


I'm slightly worried about performance - I'm writing this in C#, but I don't want to fall back to C++ (everyone uses that!) so I think a lot of optimisation is going to have to take place. I'm most likely going to have to go with unsafe code and access arrays via pointers (back to the good old days....) which isn't quite in the spirit of the Managed environment but I really don't need bounds checking! I'm getting full speed right now, but that's running on a 4GHZ 5960x which is a pretty unrealistic platform! Anyway it will be a good opportunity to road-test Microsoft's Performance analysis tools in VS2015.

Thanks again for the info; as always, it is invaluable. Should I ever release my work, you'll all be credited! I initially thought I'd just have a stab at enhancing an existing emulator, but I think I've got the accuracy bug and, to be honest, getting my head around these concepts is really satisfying. The CRTC emulation is a lot better now I've re-written it, but I now need to get the timing accurate so I can move forward and get those split screens and rasters looking good! I think it will also resolve some issues where certain demos make the emulator hang. It's all about registers not being set etc. at exactly the right time.

andycadley


Quote from: endangermice on 13:39, 27 October 16
I'm slightly worried about performance - I'm writing this in C#, but I don't want to fall back to C++ (everyone uses that!) so I think a lot of optimisation is going to have to take place. I'm most likely going to have to go with unsafe code and access arrays via pointers (back to the good old days....) which isn't quite in the spirit of the Managed environment but I really don't need bounds checking! I'm getting full speed right now, but that's running on a 4GHZ 5960x which is a pretty unrealistic platform! Anyway it will be a good opportunity to road-test Microsoft's Performance analysis tools in VS2015.
Honestly I'd try and avoid it if possible. The .NET optimizer is pretty damn good at producing fast code and the usual killer for performance is too much object creation/deletion (where unsafe code isn't going to help). Unsafe code is actually often slower, because the bounds checking inherent in safe code allows for a whole swathe of optimisations that have to be disabled when you start playing with unsafe code simply because the optimiser can't guarantee what the side effects of something may be.

endangermice

I'd love to avoid it if I can since it feels like a step backwards. My day job involves low level video processing and we write a lot of our encoders in C++, though we have had good success converting some libraries to C#. It would be nice to have a wholly managed emulator so maybe I'll have a go and see where the bottlenecks are, maybe it won't be so bad. It would be interesting to see what the performance difference is between safe and unsafe code. There is a lot of evidence to say that safe code can be just as fast, but I have been forced to go unsafe in the past - particularly when it comes to writing into screen buffers directly. With the standard Windows Forms pixel manipulation for example, it's impossible to draw to the screen fast enough so I had to go a level down. It will be interesting to see how it all pans out....
