Optimal values of CRTC reg 3 for half-character horizontal scrolling

andycadley · 13:32, 11 March 24

I suspect the answer is "it was just easier to build the hardware to assert WAIT regardless."

It could have been designed more like the Speccy, where the Z80 is only slowed when accessing memory visible to the GA and only during the actual screen area, but it'd have been a more complicated design and the savings probably not considered worth it at the time.

martin464 · 17:30, 11 March 24

gotcha, thanks Andy. Looks like there's a sequencer in the gate array that follows a preset pattern, so the multiplexers always connect RAM to either the CPU or the CRTC following this 16 step sequence.

andycadley · 17:34, 11 March 24

Seems likely. I guess the design makes sense when you think about the 464 since all of the RAM is potentially video memory and the display can technically cover any part of the border area.

It's a lot more annoying on 128K and above because there is a lot of RAM that could have been used to run code faster, but adding all the bus arbitration logic would probably have been expensive.

martin464 · 03:18, 12 March 24

yes, there'd probably be no overscan otherwise, so maybe it's just as well!
I was thinking about this, and it's quite hard to work out how much it actually slows CPU execution down

most of the instructions seem to do their memory access on the same phase of the cycle, and there are only a few where it's stretching them. I suppose it's been worked out... but my guess was maybe a 15% hit to optimised code, maybe less than that
I've read some statements claiming the slowdown is a major hit that surely must be wrong, possibly from those Speccy owners you mentioned...

McArti0 · 07:51, 12 March 24

Too many 'maybe' and 'suppose'...
Tell me how much slower stack-based instructions are? CALL, RET, RET z, PUSH HL, POP HL.

martin464 · 14:18, 12 March 24

sure but how to measure this. good to mention the worst case ones
it can be slower but the % slower has to be an average

Looking at these, the stack based ones are about 20% slower, with PUSH coming off worst. JP nnnn is 20% slower.

Instruction	Z80 T-states	CPC T-states	Slower
POP RR	10	12	20%
PUSH RR	11	16	45%
RET	10	12	20%
CALL	17	20	18%
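
For anyone wanting to check the percentages, here is a rough Python sketch that recomputes them. It assumes the two number columns are standard Z80 T-states and the effective T-states on the CPC (where 1 NOP = 4 T-states); the names TIMINGS and slowdown_percent are made up for the illustration.

```python
# (instruction, standard Z80 T-states, effective T-states on the CPC),
# copied from the figures quoted in the post. The column meanings are an
# assumption, as the post does not label them.
TIMINGS = [
    ("POP RR", 10, 12),
    ("PUSH RR", 11, 16),
    ("RET", 10, 12),
    ("CALL", 17, 20),
]


def slowdown_percent(z80_t, cpc_t):
    """Extra time taken on the CPC, as a percentage of the Z80 timing."""
    return 100.0 * (cpc_t - z80_t) / z80_t


for name, z80_t, cpc_t in TIMINGS:
    # Rounded to whole percent, as in the post.
    print(f"{name}\t{z80_t}\t{cpc_t}\t{slowdown_percent(z80_t, cpc_t):.0f}%")
```

The rounded results match the figures quoted: 20%, 45%, 20% and 18%.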

None of the 1 nop instructions will be affected, nor are a few others, like JP (IX), which is the same. JR is the same, and JR cc is only 1 cycle slower when not taken, 7 vs 8.

it depends on what your code is doing
If it's using PUSH to copy data the hit is significant, and OUT suffers too at 33% slower
If your code is using other instructions it depends what they are and every unaffected instruction reduces the total slowdown as a % of the total

So it's impossible to do anything but make an approximate estimate as it's code specific and how many times a piece loops through and what instructions you're using
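
To put a number on the "it depends on your code" point, here is a small Python sketch of the weighted average. The stack timings are the ones quoted above; the 4/4 entry for a 1 nop instruction and 10/12 for JP nnnn are filled in to match the "not affected" and "20% slower" statements. The instruction mixes are purely hypothetical, not measured from any real routine, and the names are invented for the example.

```python
# instruction -> (standard Z80 T-states, effective T-states on the CPC)
TIMINGS = {
    "1-NOP op": (4, 4),    # unaffected, per the post
    "POP RR": (10, 12),
    "PUSH RR": (11, 16),
    "RET": (10, 12),
    "CALL": (17, 20),
    "JP nnnn": (10, 12),   # 20% slower, per the post
}


def mix_slowdown_percent(mix):
    """Overall slowdown for a mix given as {instruction: execution count}."""
    z80_total = sum(TIMINGS[op][0] * n for op, n in mix.items())
    cpc_total = sum(TIMINGS[op][1] * n for op, n in mix.items())
    return 100.0 * (cpc_total - z80_total) / z80_total


# Hypothetical mixes, for illustration only.
general = {"1-NOP op": 60, "PUSH RR": 10, "POP RR": 10,
           "CALL": 5, "RET": 5, "JP nnnn": 10}
push_copy = {"PUSH RR": 100}

print(f"general mix:   {mix_slowdown_percent(general):.1f}%")
print(f"PUSH copying:  {mix_slowdown_percent(push_copy):.1f}%")
```

Every unaffected instruction adds to both totals equally, which is why it pulls the overall percentage down, while a loop made entirely of PUSH sits at the full PUSH penalty.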

Ok how about this, about 10-20% normally but double that for routines hitting the stack or IO

Not that any of this is really the biggest problem
Most chunks of code have plenty of scope for optimisation. Remember routines you did in 200 nops, then figured out a way to do in 100 and swore to yourself that's now the dog's nuts ultimate, and the next day you got it to 88? So until someone has gone all the way with optimising, they can't be blaming the gate array!
