Creating a replacement gate array

Started by Bread80, 18:11, 29 April 21

McArti0

#50
@Bread80
GUNHED showed that the CPC works with 24/6MHz clocks, which is 166ns per tick.
In your diagram we can see that the CPC needs only 1.5 ticks from CAS to CAS and from RAS to RAS (at the fastest).
On this overclocked CPC that is 1.5 x 166ns = 250ns.
With 16/4MHz and the standard GA, the CPC does RAS-CAS-RAS-CAS-CAS: 3 reads/writes per 1us.

So at 32/8MHz we need only 4 read/write (4x RAS-CAS) cycles per 1us (2 for software, 2 for screen RAM).
4 x 250ns = 1000ns = 1us .... :P
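
The arithmetic above can be checked with a small sketch. It assumes, as the post does, that one RAS-CAS access takes 1.5 CPU clock ticks and that the 250ns access time proven at 6MHz is reused for the four accesses. The function names are made up for illustration, and multiplexer delays are not modelled.

```c
#include <stdio.h>

/* Length of one RAS-CAS access in ns, assuming it takes 1.5 CPU clock
   ticks. One tick at cpu_mhz lasts 1000/cpu_mhz ns, so 1.5 ticks
   last 1500/cpu_mhz ns. */
int access_ns(int cpu_mhz) {
    return 1500 / cpu_mhz;
}

/* Total time in ns for n back-to-back accesses, each as long as one
   access at the given CPU clock. */
int accesses_total_ns(int n, int cpu_mhz) {
    return n * access_ns(cpu_mhz);
}

/* Print the budget for the 24/6MHz overclock case. */
void print_budget(void) {
    printf("one access at 6MHz: %dns\n", access_ns(6));
    printf("four accesses:      %dns\n", accesses_total_ns(4, 6));
}
```

Calling print_budget() from a main() prints both figures, so they can be compared against the 1us gate array cycle.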
CPC 6128, Whole 6128 and Only 6128, with .....
NewPAL v3 for use all 128kB RAM by CRTC as VRAM
One chip driver for 512kB(to640) extRAM 6128
TYPICAL :) TV Funai 22FL532/10 with VGA-RGB-in.

Singaja

@Bread80 
This is fascinating. I'm not really that familiar with embedded programming, but I do have a question about code in your 
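void fsigs_program_init(PIO pio, uint sm, uint offset, uint first_pin) {
   // Hand each of the six consecutive pins over to the PIO block
   for (int i=0;i<6;i++) {
      pio_gpio_init(pio, first_pin+i);
   }
   // Make all six pins outputs
   pio_sm_set_consecutive_pindirs(pio, sm, first_pin, 6, true);
   // Start from the default config generated for the fsigs program
   pio_sm_config c = fsigs_program_get_default_config(offset);
   // Map the six pins as the OUT pin group
   sm_config_set_out_pins(&c, first_pin, 6);
   // Load the config into the state machine (this does not start it)
   pio_sm_init(pio, sm, offset, &c);
}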

Does loop unrolling provide any advantage, or since it's init does it not really matter for a blazing-fast ARM running at 160MHz?

Bread80

Quote from: McArti0 on 13:57, 26 August 24
@Bread80
GUNHED showed that the CPC works with 24/6MHz clocks, which is 166ns per tick.
In your diagram we can see that the CPC needs only 1.5 ticks from CAS to CAS and from RAS to RAS (at the fastest).
On this overclocked CPC that is 1.5 x 166ns = 250ns.
With 16/4MHz and the standard GA, the CPC does RAS-CAS-RAS-CAS-CAS: 3 reads/writes per 1us.

So at 32/8MHz we need only 4 read/write (4x RAS-CAS) cycles per 1us (2 for software, 2 for screen RAM).
4 x 250ns = 1000ns = 1us .... :P

But you also need time for the multiplexers to actually change the addresses going into the RAMs. As mentioned above that's 40 to 50ns per change. There are four changes per cycle (CPU/CRTC and address high/address low), which adds 160 to 200ns per cycle.

An 8MHz CPU actually needs RAS-CAS-RAS-CAS-RAS-CAS-CAS. Two full cycles for the CPU, then the sequential access for video.

But an 8MHz CPU is doing one memory cycle per half of the 1us gate array cycle. The second half of that 1us will still be mostly occupied by the video access, with the end result that the CPU will be paused every other memory cycle and not much real gain in performance.

Bread80

Quote from: Singaja on 14:12, 26 August 24
@Bread80
This is fascinating. I'm not really that familiar with embedded programming, but I do have a question about code in your
void fsigs_program_init(PIO pio, uint sm, uint offset, uint first_pin) {
   for (int i=0;i<6;i++) {
      pio_gpio_init(pio, first_pin+i);
   }
   pio_sm_set_consecutive_pindirs(pio, sm, first_pin, 6, true);
   pio_sm_config c = fsigs_program_get_default_config(offset);
   sm_config_set_out_pins(&c, first_pin, 6);
   pio_sm_init(pio, sm, offset, &c);
}

Does loop unrolling provide any advantage, or since it's init does it not really matter for a blazing-fast ARM running at 160MHz?

There's probably a function which can initialise multiple pins in one go, but I've not bothered to look up what it's called. In any case, this is code which only runs once at startup.

If I really cared about performance I'd be writing directly to hardware registers. The built-in functions are all slowed down by parameter validation and generalised code.

McArti0

I think we're misunderstanding each other. I'm talking about another, new GA with different timing. How can you talk about a small increase in performance when the bandwidth of soft RAM increases x2? CPU MIPS increase x2 too.

Bread80

Quote from: McArti0 on 15:32, 26 August 24
I think we're misunderstanding each other. I'm talking about another, new GA with different timing. How can you talk about a small increase in performance when the bandwidth of soft RAM increases x2? CPU MIPS increase x2 too.

Quite probably there's a misunderstanding somewhere.

With original hardware and DRAM I think there's little opportunity for improvement. A faster CPU is still constrained by the amount of time required for the gate array to read its two bytes of data. With a new main board and SRAM there's tonnes of potential.

Bread80

Quote from: McArti0 on 15:32, 26 August 24
I think we're misunderstanding each other. I'm talking about another, new GA with different timing. How can you talk about a small increase in performance when the bandwidth of soft RAM increases x2? CPU MIPS increase x2 too.

Are you thinking of splitting the video data reads? So each 1us cycle becomes two 500ns cycles, with one CPU access and one video access per (half) cycle? That might well be doable. Each (half) cycle would then be two memory accesses plus four multiplexer updates. The best case timing there is 2x160ns for memory and 4x40ns for multiplexers, a total of 480ns. Might be possible but very tight.

BTW the original specced RAM is 200ns. 160ns is not twice as fast.
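
As a rough check of that half-cycle budget, here is a minimal sketch. It assumes the accesses and multiplexer changes happen strictly one after another with no overlap, and uses the 160ns access time and the 40 to 50ns multiplexer delays mentioned in this thread. The function names are hypothetical.

```c
#include <stdio.h>

/* Time in ns for one half cycle: `accesses` DRAM accesses plus
   `mux_changes` multiplexer address changes, done strictly in sequence. */
int half_cycle_ns(int accesses, int access_ns, int mux_changes, int mux_ns) {
    return accesses * access_ns + mux_changes * mux_ns;
}

/* Slack in ns against the window; a negative value means it does not fit. */
int slack_ns(int used_ns, int window_ns) {
    return window_ns - used_ns;
}

/* Print the budget for 40ns and 50ns multiplexers in a 500ns window. */
void print_half_cycle_budget(void) {
    int best  = half_cycle_ns(2, 160, 4, 40);  /* best case quoted above */
    int worse = half_cycle_ns(2, 160, 4, 50);  /* slower multiplexers */
    printf("40ns muxes: %dns, slack %dns\n", best, slack_ns(best, 500));
    printf("50ns muxes: %dns, slack %dns\n", worse, slack_ns(worse, 500));
}
```

Under these assumptions the 40ns case leaves 20ns to spare in a 500ns half cycle, while 50ns multiplexers overrun it by 20ns.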

McArti0

Quote from: Bread80 on 17:23, 26 August 24
160ns is not twice as fast.

It doesn't have to be.
The catch is that the standard 4MHz CPC reads memory 3x per 1us.
The 8MHz version needs not 6x per 1us but only 4x per 1us,
only 1.33x more ...
but GUNHED showed that the CPC works overclocked 1.5x.

Bryce

Quote from: McArti0 on 17:34, 26 August 24
Quote from: Bread80 on 17:23, 26 August 24
160ns is not twice as fast.
It doesn't have to be.
The catch is that the standard 4MHz CPC reads memory 3x per 1us.
The 8MHz version needs not 6x per 1us but only 4x per 1us,
only 1.33x more ...
but GUNHED showed that the CPC works overclocked 1.5x.

I remember looking into this many years ago (just overclocking, not a new GA architecture). Overclocking is possible, but there are issues with reading disks and some restrictions on how you can use the AY and read tapes too. It's an interesting idea anyway.
With the idea you're suggesting, it may be easier to resolve disk issues etc, but from a first glance, I think it will be extremely difficult to get the timing exactly right and it would probably require new firmware (to slow down the routines) to sync the CPU with the FDC and tape reading, which are essentially hardwired to use the CPU clock for timing.

Bryce.

McArti0

#59
If you do it with an RP2350 you can make it a speed step that is switched on and off via port 3FFF, like Normal/Turbo.

PS. I was also thinking about starting to implement some ED 00-3F instructions, because those opcodes are unused on the Z80.

Bread80

Quote from: McArti0 on 17:34, 26 August 24
Quote from: Bread80 on 17:23, 26 August 24
160ns is not twice as fast.
It doesn't have to be.
The catch is that the standard 4MHz CPC reads memory 3x per 1us.
The 8MHz version needs not 6x per 1us but only 4x per 1us,
only 1.33x more ...
but GUNHED showed that the CPC works overclocked 1.5x.

I'll try arguing my point from a different direction.

A Z80 memory cycle takes four clock cycles: (typically) two clocks for refresh and two for memory access. The original gate array timings use (roughly) two of those four clocks for reading video data. The CPU is put in a wait state during those video reads but much of that waiting is during the refresh part of the cycle, so doesn't actually slow down the CPU.

If you up the CPU speed to 8MHz you now get 8 CPU clocks per gate array cycle. If you dedicate the same amount of time to video reads you end up with four clocks for the CPU and four clocks for the video read. Since the CPU still takes four clocks to execute a memory cycle you get four clocks when the CPU is executing and four when it is paused. The end result is that you still only get one CPU instruction executed per gate array cycle.

If you're using 160ns DRAM then you can obviously shorten the length of time required for video reads. It's possible that this period is now short enough that the CPU can now execute two memory cycles per gate array cycle - one operating unimpeded, the other stretched with wait states, but my gut feeling is that there is still not enough time for the CPU to execute two full cycles per gate array cycle.

I'm not saying that it *can't* be done. I'm just saying that *I* don't think it is possible, but I'm happy to be proved wrong.

The only way to answer the question is for someone to sit down with the relevant data sheets (160ns DRAM, multiplexers, Z80, HAL etc) and draw up a timing diagram to see how it pans out.
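
Until someone draws that diagram, the 8MHz argument can at least be put into numbers with a deliberately crude sketch. It assumes a 1us gate array cycle and four CPU clocks per memory cycle, and that the CPU makes no progress at all while video reads are in progress. That last assumption ignores both the overlap with the refresh half of the cycle and any wait-state stretching, so the model does not describe the stock 4MHz machine. The function name is hypothetical and this is not a timing analysis.

```c
/* Whole, unstretched CPU memory cycles that fit into the part of a 1us
   gate array cycle left over after the video reads.
   ASSUMPTION: CPU and video time never overlap. */
int cpu_mem_cycles_per_ga_cycle(int cpu_mhz, int video_ns) {
    int clock_ns = 1000 / cpu_mhz;      /* 125ns at 8MHz */
    int mem_cycle_ns = 4 * clock_ns;    /* four clocks per memory cycle */
    int cpu_ns = 1000 - video_ns;       /* time left for the CPU */
    return cpu_ns / mem_cycle_ns;       /* integer division rounds down */
}
```

In this model an 8MHz CPU with 500ns of video reads gets one memory cycle per gate array cycle, and shortening the video reads to 320ns still gives one. Whether a second, stretched cycle can be squeezed in is exactly what the timing diagram would have to show.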

McArti0

Quote from: Bread80 on 11:32, 27 August 24
If you up the CPU speed to 8MHz you now get 8 CPU clocks per gate array cycle. If you dedicate the same amount of time to video reads you end up with four clocks for the CPU and four clocks for the video read. Since the CPU still takes four clocks to execute a memory cycle you get four clocks when the CPU is executing and four when it is paused. The end result is that you still only get one CPU instruction executed per gate array cycle.

Here you are making the mistake of adding things up in your head, without paper and pencil.
The new timing is simple:
RAS-CAS for soft
RAS-CAS for video 2+2
RAS-CAS for soft
RAS-CAS for video 2+2
Sum=8

Imagine you are making a GA for the CPC with a video throughput of 1MB per second: Hsync 7.8kHz, Vsync 25Hz.

You change the video RAS-CAS-CAS to RAS-CAS.

McArti0

McArti0

#63
And an error. In the left mod, READY is longer than nMREQ  :-X
But maybe it's not an error?

Bread80

Quote from: McArti0 on 09:21, 28 August 24
Something like that....

As I mentioned above, show me something with *timings* and I might be willing to accept it. Until then it's just a theory.

Bread80

BTW this is the timing diagram I use these days. It shows not just the signal timings but also the update delays for the relevant ICs.

I can share the original SVG if you like, but the forum software won't let me upload it.

GUNHED

Yes, quite some file formats need to be ZIPped to be uploaded.
http://futureos.de --> Get the revolutionary FutureOS (Update: 2024.10.27)
http://futureos.cpc-live.com/files/LambdaSpeak_RSX_by_TFM.zip --> Get the RSX-ROM for LambdaSpeak :-) (Updated: 2021.12.26)

McArti0

ZIP is accepted by the forum.

Bread80

Here's the zipped SVG. It was created with Inkscape, if that makes any difference.

McArti0

@Bread80 
The 4164-15 datasheet says that the full cycle is 270ns.
270 x 2 = 540 > 500
It's tight, very tight.
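
The same check, written as a tiny sketch. It only compares full DRAM cycle times against the window and does not model multiplexer delays; the function names are hypothetical.

```c
/* Returns 1 if `accesses` full DRAM cycles of cycle_ns each fit in
   window_ns, otherwise 0. */
int fits_window(int accesses, int cycle_ns, int window_ns) {
    return accesses * cycle_ns <= window_ns;
}

/* Longest full cycle time in ns that still lets `accesses` cycles fit. */
int max_cycle_ns(int accesses, int window_ns) {
    return window_ns / accesses;
}
```

For two accesses in 500ns the longest full cycle that fits is 250ns, so by the datasheet figure the 270ns part is 20ns per access out of spec.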

Bread80

Nice work. It looks like that may well work. But, one fly in the ointment: /CPU_ADDR is also the clock for the AY sound chip.

McArti0

#71
Quote from: Bread80 on 12:29, 04 September 24
But, one fly in the ointment: /CPU_ADDR is also the clock for the AY sound chip.

Bushnell or Alcorn used to say that if something doesn't work properly, you call it a feature.  ;D
This is a feature to make Atari ST mods work.   :laugh:

PS. The AY in the Spectrum runs at 1.77MHz; the AY (Yamaha) in the Atari ST runs at 2MHz.

Bread80

Part 3: The 'conditional signals' (/244EN, /MWE and /CAS) which are asserted for a portion of the gate array cycle only if an input pin is asserted (or de-asserted). https://bread80.com/2024/09/16/pico-garry-2350-part-3-csigs-conditional-signals/

Bread80

Part 4: Triggering the /ROMEN and /RAMRD signals. These depend on the states of /RD, A15 and A14 as well as the upper and lower ROM enable states.

https://bread80.com/2024/11/11/pico-garry-2350-part-4-memory-read-select/

McArti0

#74
Quote
the port address of the Gate Array is dependent address line A15 (and only A15).
Are you 100% sure?  :o
LOGON wrote that A14 must be Hi. ::)

CALL &BCC8: REM disable events
OUT &7FFF,&10: REM GA border

OUT &3CFF,&5F: REM A14=0 and CAN NOT SET BORDER COLOR !!!
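
The two decoding rules being argued about can be written down side by side. This is only a sketch of the two hypotheses, not a statement of which one the real gate array implements; the function names are made up.

```c
#include <stdint.h>

/* Hypothesis 1: the gate array responds whenever A15 is low. */
int ga_selected_a15_only(uint16_t port) {
    return (port & 0x8000) == 0;
}

/* Hypothesis 2: A15 must be low and A14 must be high. */
int ga_selected_a15_a14(uint16_t port) {
    return (port & 0xC000) == 0x4000;
}
```

Port &7FFF is selected under both rules, while &3CFF (A15=0, A14=0) is selected only under the first, which is why the BASIC test above can tell them apart.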