I'm thinking about making a ROM board for the CPCs (not plus). I want to use a CPLD instead of discrete logic for the ROM register and the decoding logic.
While having a look at the signals in the CPC EXP connector, I saw the BUSREQ and BUSAK signals and thought: why not try to fit a tiny DMA controller in the CPLD to accelerate copying data from ROM? Sounds nice to me, but as I have not studied CPC internals, I don't know if this is possible, and if it has some limitations. Some things that I think might be troublesome:
- RAM refresh: I would like to avoid having to handle refresh. Will this impose a limit on the maximum block size I can copy without losing data?
- Collision with internal CPC peripherals: to do DMA, is it enough just to request the bus (BUSREQ) and start when I get the BUSAK? Or should I be careful with other CPC internal peripherals that might access the bus while I have been given ownership by the CPU (e.g. CRTC accessing RAM to draw the screen).
Any suggestion or warning is welcome ;D
doragasu,
Good luck trying to get a DMA to work, but from what I understand the Z80 "wait" signal is used to get video data and if you use this as well you will have trouble.
The BUSREQ/BUSACK signals are a matched pair for block transfers, and the data transfer speed must not exceed the speed of the memory itself.
One other factor to account for is interrupt processing, which further complicates a DMA implementation; another is that external memory expansions may have other limitations.
rpalmer
Thanks for the info. It looks like it will not be easy, so I think I'll abandon the idea :( . I have found an expansion board using an Intel 8237 DMA controller, but it is for Aleste CPC clones, that had extra pins for DMA in the expansion port...
Could you take some pictures of that board, please?
I'm sorry but I don't have such pictures. When I say "I found an expansion board...", I mean I found data about it on the Internet. Fortunately somebody already took photographs, you can find them here (http://www.cpcwiki.eu/index.php/Magic_Sound_Board).
Cool, thanks!
OK, I have been reading here and there, and having a look at the CPC6128 schematics, and I think I'm missing some details to decide whether the DMA is realizable or not, so here are some more questions...
1.- READY signal pulse duration: I have read (http://cpctech.cpc-live.com/docs/instrtim.html) that each microsecond the GA pulls the READY signal low to insert WAIT states on the CPU while it fetches two bytes from memory. The reference above also states that this causes instruction execution times to be rounded up to a multiple of 1 us, so if I understand this mechanism properly, between one and three wait states are inserted depending on the instruction being executed. Is this right? How is this accomplished? I don't think the GA is partially decoding instructions to know how many WAIT states to insert; I'm sure it must be a quite simple mechanism (that doesn't come to my mind).
2.- CPU WAIT states insertion: the READY signal is used to insert WAIT states on the CPU. If I understand this mechanism correctly, this signal must only be asserted during memory reads and writes. So how does the GA know when to pull the READY signal low? Again, I don't think the GA partially decodes instructions...
3.- /CPU signal: while browsing schematics, I can see that /CPU and /CASADDR signals from the GA, are both used to multiplex CPU and CRTC addresses (also rows and columns). So, I suppose /CPU signal should be 1 MHz with a 25% duty cycle, right? Also how are /CPU and READY signals related? Is the rising edge of /CPU synchronized with the falling edge of READY?
Is there detailed documentation about how exactly these signals work?
You can look at the "gate array decapped" thread, where people have reverse engineered the gate array from pictures of the die. It is indeed a rather simple chip.
It just locks the CPU bus whenever it needs to access the RAM. Sometimes the CPU does not need the BUS, and no extra wait state results. Sometimes the CPU needs to access the bus, and it has to wait.
If you want to do DMA, you can use the BUSRQ/BUSAK signals to lock the z80 from accessing the bus. But, you can't stop the gate array! This means your DMA engine will need to watch for the WAIT/READY actions from the gate array and only use the remaining cycles. Possible, but probably not that simple.
Thanks for the info!
So it just generates 1 wait state each microsecond "without looking" what's happening on the CPU, right? And if the CPU is not reading/writing, it just ignores the wait state, right?
Yes, that's the idea. I'm not sure about the exact sequencing of what happens on the pins, but the CPU will wait only when it tries to access the bus, during wait states it can still perform any internal operations.
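To make the "it just generates wait states without looking" idea concrete, here is a minimal sketch in Python of how blind wait states can stretch every instruction to a whole number of microseconds. It is a simplified model, not taken from the GA schematic: it assumes the Z80 samples WAIT during T2 of every machine cycle, and that READY releases the CPU during one clock out of every four. The names `READY_PHASE` and `stretched_time_us` are made up for this illustration, and I/O cycles are ignored.

```python
# Simplified model (an assumption, not taken from the GA schematic):
# - the Z80 samples WAIT during T2 of every machine cycle;
# - READY releases the CPU during one clock out of every four (4 MHz clock).
CLOCKS_PER_US = 4
READY_PHASE = 1  # hypothetical clock index (mod 4) at which T2 may complete


def stretched_time_us(m_cycles):
    """Duration in microseconds of one instruction, given the nominal
    length in clocks of each of its machine cycles."""
    t = 0
    for length in m_cycles:
        t2 = t + 1  # T2 is the second clock of the machine cycle
        waits = (READY_PHASE - t2) % CLOCKS_PER_US
        t += length + waits
    # The next opcode fetch is subject to the same alignment.
    t += (READY_PHASE - (t + 1)) % CLOCKS_PER_US
    return t // CLOCKS_PER_US


for name, cycles in [("NOP", [4]), ("LD A,(HL)", [4, 3]),
                     ("PUSH HL", [5, 3, 3]), ("LDI", [4, 4, 3, 5])]:
    print(f"{name:10s} {sum(cycles):2d} T-states -> {stretched_time_us(cycles)} us")
```

In this model no instruction decoding is needed on the GA side: a machine cycle that reaches T2 at the "wrong" phase simply idles for one to three clocks, which is where the rounding to whole microseconds comes from.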
Pulkomandy,
The WAIT/READY as I understand it is mostly for Video data access.
You can use this on a real Z80 DMA to pause the transfers, but as to how effective a DMA becomes is another issue.
See attached, CE/WAIT details for how this works.
rpalmer
I have quickly browsed the Z80 DMA engine documentation, and I have seen that it takes at least 3 cycles (750ns@4MHz) to read a byte, and another 3 to write it. Cycle time of the RAM chips is 270ns, so maybe it could be cut to 2 cycles to read and 2 to write. This is of course without taking into account the READY signal.
Unfortunately there is no access to the /CAS and /RAS signals. That would have allowed page reads/page writes in 1 cycle each (except the first one).
It would be interesting to see the timing of the /CAS and /RAS signals generated by the GA.
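The cycle counting in this post can be checked with a few lines of Python. This is only the arithmetic, assuming the 4 MHz clock and the 270 ns RAM cycle time quoted above; `clocks_for` is a helper name invented for the example.

```python
import math

CLOCK_HZ = 4_000_000   # CPC Z80 clock
RAM_CYCLE_NS = 270     # RAM cycle time quoted in the post


def clocks_for(duration_ns, clock_hz=CLOCK_HZ):
    """Smallest whole number of clock periods that covers duration_ns."""
    period_ns = 1e9 / clock_hz
    return math.ceil(duration_ns / period_ns)


print("clock period (ns):", 1e9 / CLOCK_HZ)
print("3 clocks (ns):", 3 * 1e9 / CLOCK_HZ)
print("clocks needed per RAM access:", clocks_for(RAM_CYCLE_NS))
```

A 270 ns cycle does not fit in one 250 ns clock period, which is why two clocks per access is the natural lower bound without page mode.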
I have coded a preliminary implementation of the DMA that reads from flash/ROM (activating fn_oe and fn_ce) and writes to CPC RAM (activating n_wr and n_mreq). In this preliminary implementation, each read/write takes two CPU cycles (I suppose this can be optimized, more on this later). I have also implemented wait states.
I wrote a dirty simulation that copies 3 bytes using DMA, from ROM at $C000 to RAM at $1000. On the second byte write I have inserted a wait state, and two on the third write. This is the resulting simulation. Tips/corrections are welcome:
(http://i.imgur.com/7OgoLwk.png)
Now let's start with the questions:
1. How does it look? Is there anything wrong? Should this work?
2. As I wrote before, this can be optimized. At least I'm sure that I can read from Flash/ROM in a single cycle (Flash chip is rated 90 ns). But can I write to RAM also in a single cycle? I suspect it might be possible. Why? Because of two reasons:
- First: GA reads. I think I read somewhere (I cannot find where) that each microsecond, the GA inserts a wait state, and during it, it reads TWO BYTES. So if in a single cycle, two bytes are read, it should be possible to read just one, shouldn't it?
- Second: The wait states mechanism. If my interpretation of the CPC6128 schematics AND the wait states mechanism is right, for example if the CPU wants to read a byte, and the GA inserts a wait state at T2 (see graph below), the CPU buses are "disconnected" from RAM (multiplexed) for the CRTC/GA to read the data. When the wait state is removed, the CPU continues the read operation: CPU address and control signals are applied to the RAM, which must perform the read IN A SINGLE CYCLE (T3 in graph below). Is this correct? If so, a single cycle must be enough for the RAM to be read (or I am missing something).
(http://i.imgur.com/24q7a3U.png)
But there is also something that contradicts this... I had a look at the DRAM datasheet (HM4864-2) and it has a 270 ns cycle time... a bit more than a 250 ns cycle... So maybe my interpretation of the schematics and/or the wait states mechanism is wrong...
Liked. Not because I think you'll manage it, but because you've gone to the bother of trying it.
Bryce.
Quote from: Bryce on 22:46, 24 January 17
Liked. Not because I think you'll manage it, but because you've gone to the bother of trying it.
Bryce.
Hehe, thanks for my first like on this forum.
It looks like most people think that it is not doable, or that it is very very difficult to get DMA to work, but I'm still missing a detailed explanation about why. I understand that I must insert wait states to pause the DMA (using the READY signal), and that interrupts might cause problems. But if I implement the wait states and if interrupts are disabled when using DMA (which makes sense, because during DMA the CPU is halted), I don't see why this should not work. What am I missing?
I don't know enough about the cpc hardware signals so what I am saying can't be treated as fully correct.
From what I understand, using ready to pause the cpu should be ok. I don't see how it would cause a problem for the gate-array or the cpu.
True, it may cause interrupts to be delayed, but I don't think they will be missed.
The Gate-Array will assert them until the CPU answers.
What you may see is that if interrupts are delayed by more than 32*64us, then the next interrupt is moved and the one that synchronises with the vsync may not happen.
BASIC and firmware may see this as a missed interrupt. But for programs that enable and use the DMA and know about it, this is no problem at all.
Which method of DMA are you choosing?
Is it one that interleaves its accesses with the CPU, so that it runs at the same time and CPU speed is not changed, or one that will halt the CPU?
The Atari ST blitter and Amiga blitter can do both.
My thoughts about DMA: transferring data using the CPU takes about 2/3us per cycle at best; more generally it's around 5us (LDI). So if you can read/write bytes faster than that, then the DMA will be better. 1us read, 1us write is not bad; 1us read/write would be great.
Another thought: the gate-array controls access to RAM for the Z80; there is a 74LS which opens the bus to the Z80. What I don't know is whether the bus is open and it closes the bus for its own access or not. Now, because the gate-array controls the RAM, it can read two bytes by toggling cas/ras (not sure which) quickly to read two bytes at a time. You will not get access to these signals from the Z80 side.
I think of the ram living on the gate-array side and it is the gate-keeper and opens the gates for the z80 when it wants to.
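To put the CPU-versus-DMA figures from this post side by side, here is a small Python calculation. The per-byte costs are the ones mentioned in the post (about 5 us per byte with LDI, 2 us for a separate read and write, 1 us for a combined read/write); the 16 KiB block size is just an assumption picked for illustration.

```python
def copy_time_ms(n_bytes, us_per_byte):
    """Total copy time in milliseconds for a fixed per-byte cost."""
    return n_bytes * us_per_byte / 1000.0


BLOCK = 16 * 1024  # hypothetical block size: 16 KiB

cpu_ldi = copy_time_ms(BLOCK, 5)  # ~5 us per byte with LDI
dma_2us = copy_time_ms(BLOCK, 2)  # 1 us read + 1 us write
dma_1us = copy_time_ms(BLOCK, 1)  # combined read/write in 1 us

print(f"LDI copy:        {cpu_ldi:.3f} ms")
print(f"DMA, 2 us/byte:  {dma_2us:.3f} ms")
print(f"DMA, 1 us/byte:  {dma_1us:.3f} ms")
```

Under these assumptions even the slower DMA variant is 2.5 times faster than LDI, and the 1 us variant is 5 times faster.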
"Now, because the gate-array controls the ram, it can read two bytes by toggling cas/ras (not sure which) quickly to read two bytes at a time. You will not get access to these signals from the z80 side."
The RAS/CAS signals are for setting up the internal RAM address, not for reading/writing two bytes at a time. You can think of this as setting up the lower address and upper address of RAM within each chip.
The use of the ready line is (as I understand it) for access to RAM by the CRTC, so any use of it will most likely cause the display to "glitch" during display refresh. This may not be a problem in the short term, but will get annoying if it occurs too many times.
Another issue is interrupts, which the CPC uses to keep track of the time since the machine was last reset/switched on. Stopping them will effectively leave timings out of sync for programs which rely on it (not to mention VSYNC issues).
Also the GA is not the master, but rather a cooperative friend as it interlaces video access with normal Z80 access and this is why the use of DMA is so difficult to get right on the CPC.
You could implement the Z80 DMA chip, but then the DMA transfer would be suspended whenever an interrupt is seen or when the "READY" line is active, but then you need the ready line to suspend the Z80. So it is like trying to turn the page of a book while you stand on it.
rpalmer
Thanks for the information and thoughts, it is much appreciated!
Quote from: arnoldemu on 09:15, 25 January 17
Which method of DMA are you choosing?
Is it one that interleaves its accesses with the CPU, so that it runs at the same time and CPU speed is not changed, or one that will halt the CPU?
I'm using #BUSREQ to stop the CPU, I'm afraid. I didn't even think about interleaving, and THAT for sure would be difficult. As the READY signal from the GA is not directly tied to the latch (it goes through an 82 ohm resistor), I might drive the line and add additional wait states. But that for sure would be a challenge, and also I'm afraid I might stress the GA READY output.
Quote from: arnoldemu on 09:15, 25 January 17
My thoughts about DMA is that transferring data using the cpu is about 2/3us per cycle at the best, more generally it's around 5us (LDI). so if you can read/write bytes faster than that then the DMA will be better. 1us read, 1 us write is not bad, 1 us read/write would be great.
I'm positive that I can read from flash in 1 cycle. I suppose I can write in two cycles, maybe one could be possible. But as I do not know exactly how the GA works, I will not know until I test it. Add one wait state, and I should be able to read/write once each microsecond. But I could be wrong.
Quote from: arnoldemu on 09:15, 25 January 17
You will not get access to these signals from the z80 side.
Yeah, too bad I cannot access the signals, and also I have seen no detailed timing information about them.
Quote from: rpalmer on 11:13, 25 January 17
The RAS/CAS signals are for setting up the internal RAM address, not for reading/writing two bytes at a time. You can think of this as setting up the lower address and upper address of RAM within each chip.
I think he is talking about Page Mode read, which can substantially accelerate sequential reads. You select the row using #RAS and then select the column using #CAS. Then if you want to read another column that is in the same row (page), you can immediately select the column using again #CAS without the need to select the row. Using this method, reads should take around 160 ns, instead of the usual 270 ns. But this doesn't explain how the GA is supposed to read two bytes in a wait cycle (250 ns).
Quote from: rpalmer on 11:13, 25 January 17
Another issue is interrupts which the CPC uses for the time the machine was last reset/switched on. Stopping this will effectively leave timings out of sync for programs which rely on it (not to mention VSYNC issues).
Hum, I have to investigate this issue. Anyway IIRC #BUSREQ has priority over #IRQ, so maybe I do not even need to disable interrupts, the DMA will just delay them (and the user must make sure the transfer is small enough not to delay them too much).
Quote from: rpalmer on 11:13, 25 January 17
You could implement the Z80 DMA chip, but then the DMA transfer would be suspended whenever an interrupt is seen or when the "READY" line is active, but then you need the ready line to suspend the Z80.
About the interrupts, I'm not sure as I wrote above. And about pausing DMA during WAIT cycles, I don't think that it would slow DMA too much (and for sure not more than it already does to the CPU).
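As a rough way to size a transfer so that interrupts are only delayed and not disturbed, one can turn the "32*64us" figure mentioned earlier in the thread into a byte budget. This is a worst-case sketch in Python under loud assumptions: the interrupt is taken to be raised right when the burst starts, the whole budget is taken to be usable, and `max_burst_bytes` is a name invented here.

```python
LINE_US = 64          # one scanline, from the "32*64us" figure in the thread
MAX_DELAY_LINES = 32  # delay beyond which the interrupt sequence is disturbed


def max_burst_bytes(us_per_byte, budget_us=LINE_US * MAX_DELAY_LINES):
    """Largest burst (in bytes) that fits in the delay budget."""
    return budget_us // us_per_byte


print("budget (us):", LINE_US * MAX_DELAY_LINES)
print("max burst at 1 us/byte:", max_burst_bytes(1))
print("max burst at 2 us/byte:", max_burst_bytes(2))
```

Larger copies would then have to be split into several bursts, releasing #BUSREQ in between so the CPU can service the pending interrupt.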
As for how the GA works, see the "Gate Array Decapped" thread, in the Hardware section of the forum.
Quote from: rpalmer on 11:13, 25 January 17
Also the GA is not the master, but rather a cooperative friend as it interlaces video access with normal Z80 access and this is why the use of DMA is so difficult to get right on the CPC
It's the exact opposite.
The GA is the master; the Z80 just obeys the READY/WAITn signal to synchronise to its allocated DRAM access slot.
If you obey the rules, there should not be any issue having a DMA access.
Rules are :
- You must request the bus from the Z80 (BUSRQ/BUSAK)
- one bus access every 1ms, that is, between 2 WAITn you can only access one address. You shall respect the WAITn!
- The address shall be stable half a 4MHz cycle before the read (like in the Z80 timing diagram). This is to allow the address to be stable for the RAS cycle.
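The "one access between two WAITn" rule can be expressed as a simple check on a list of access times. The sketch below is in Python and assumes the rule means one access per 1 us slot, with slots of four 4 MHz clocks counted from clock 0; on real hardware the slot boundaries are set by READY, not by an arbitrary clock 0. `slot_violations` is a hypothetical helper name.

```python
from collections import Counter

CLOCKS_PER_SLOT = 4  # 4 MHz clocks per 1 us slot


def slot_violations(access_clocks):
    """Return the slot numbers (in us) that contain more than one access."""
    slots = Counter(t // CLOCKS_PER_SLOT for t in access_clocks)
    return sorted(slot for slot, n in slots.items() if n > 1)


print(slot_violations([0, 4, 8]))         # one access per slot
print(slot_violations([0, 2, 4, 9, 11]))  # two slots are used twice
```

A DMA engine that respects the rule never produces a non-empty list here, whatever the transfer length.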
I was suspecting there was a restriction like this. Why only one address per microsecond?
I made a typo: one access per microsecond (µs), not millisecond (ms).
The GA is doing all the magic of translating the access to the DRAM (ras/cas generation, address multiplexing), and only one access per microsecond is supported.
OK, clear then. I have to give a read to the GA decapping thread to read the juicy details. Anyway, for ROM to RAM writes I should be able to do a read/write every microsecond even with this restriction.
Thanks!
I have started analysing the GA using the schematic from the GA decapped thread. Here you can have a look at some of the signals during a complete sequence (1 us):
(http://i.imgur.com/m7GCAp1.png)
Almost everything makes sense... except the READY signal. If you look at the #RAS and #CAS signals, you can see that each microsecond three reads are made: two for the GA (#CPU = 1) and one for the CPU (#CPU = 0). The two GA reads are done using page read mode, as I suspected (#RAS is strobed once, and #CAS is strobed twice). The CPU read is done only if #MREQ = 0 (and it is not a refresh cycle), otherwise #CAS is not lowered. It makes perfect sense...
... BUT... READY signal is lowered when the CPU performs the read, and not when the GA does it!!! Everywhere I have read just the opposite, so maybe my interpretation of the signals is just wrong...
I assume you don't own a logic analyser? If I find time I'll wire up a GA and give you a screenshot of what the reality looks like.
Bryce.
Quote from: Bryce on 22:58, 26 January 17
I assume you don't own a logic analyser? If I find time I'll wire up a GA and give you a screenshot of what the reality looks like.
That would be much appreciated. I have a cheap 12 MHz Aliexpress one, but it is not fast enough to spot changes coming from rising and falling edges of a 16 MHz clock. If you could show in the same capture #PHI, #CAS, #RAS, READY, and #CPU, that could help a lot. #MREQ, #M1 and #244EN can also be interesting (and in fact any other control signal, but these are the ones I'm most intrigued about).
Quote from: doragasu on 22:29, 26 January 17
Almost everything makes sense... excepting the READY signal. If you look at the #RAS and #CAS signals, you can see that each microsecond three reads are made: two for the GA (#CPU = 1) and one for the CPU (#CPU = 0). The two GA reads are done using page read mode, as I suspected (#RAS is strobed once, and #CAS is strobed twice). The CPU read is done only if #MREQ = 0 (and it is not a refresh cycle), otherwise #CAS is not lowered. It makes perfect sense...
... BUT... READY signal is lowered when the CPU performs the read, and not when the GA does it!!! Everywhere I have read just the opposite, so maybe my interpretation of the signals is just wrong...
Indeed, there is an error in the state decoding.
- READY : U304 output should not be inverted (ie it's a NAND2 with one inverted input)
- CCLK : one inverter is missing on the output (ie U306 is a NOR2)
Note that the schematic is a work in progress ;)
Bonus, a trace captured on a 464 + 40010
[attach=2]
@gerald (http://www.cpcwiki.eu/forum/index.php?action=profile;u=250) Thanks a lot! It looks like I almost nailed it: other than the READY signal, my drawing looks the same as the capture :)
Quote from: gerald on 18:24, 27 January 17
- READY : U304 output should not be inverted (ie it's a NAND2 with one inverted input)
I suppose you mean it's an AND2 with one inverted input. I'll have to update my drawing. Thanks again for the info and for the RE work on the GA!
Quote from: doragasu on 20:32, 27 January 17
I suppose you mean it's an AND2 with one inverted input. I'll have to update my drawing. Thanks again for the info and for the RE work on the GA!
Yes :picard:
I assume you don't need me to capture it now? :)
Bryce.
Quote from: Bryce on 10:38, 28 January 17
I assume you don't need me to capture it now? :)
If you're like me, nothing will prevent you from getting your analyser out. But it may not be the best time with all the rust you should have on your bench ;)
Actually, that's a trace I captured when developing my RAM/ROM extension card.
Tell me about it, I have never vacuumed this room as often as I have in the last 3 days! :D
Bryce.
Quote from: Bryce on 10:38, 28 January 17
I assume you don't need me to capture it now? :)
Bryce.
I don't need it, but thank you very much anyway!
Working on the schematic right now...
One more question: How much current can I safely draw from the expansion port 5V rail? I made a quick estimation, and I think my cart should not draw more than 50 mA: 19 mA the CPLD, 8 mA the transceivers, 22 mA the flash chip. Flash chip power draw is computed for 4e6 reads per second, and transceivers for inputs/outputs switching at 4 MHz (which will happen only for the clock line). So real power consumption should be much lower.
I don't think 50 mA can cause problems, but I'm asking just to be on the safe side...
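The estimate in this post adds up as follows. The Python snippet only restates the three figures given above; the dictionary keys and the 50 mA limit variable are naming choices made for this example.

```python
# Worst-case current estimates from the post, in mA.
budget_ma = {"CPLD": 19, "transceivers": 8, "flash": 22}
LIMIT_MA = 50  # upper bound quoted in the post

total_ma = sum(budget_ma.values())
print(f"total: {total_ma} mA (limit {LIMIT_MA} mA, margin {LIMIT_MA - total_ma} mA)")
```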
50mA is absolutely no problem to take from the expansion port. Other expansions pull a lot more (>200mA).
Bryce.
While Bryce is correct that 50 mA will not hurt, it must also be considered against what expansions are connected before/after yours, as they too will draw from the CPC if they have no separate power supply.
As always, thanks for the quick responses!
Quote from: rpalmer on 12:50, 06 February 17
While bryce is correct that 50ma will not hurt, it must also be considered against what expansions are before/after yours as they too will draw from the CPC if they have no separate power supplied.
Of course, but I'm afraid that is out of my control ;)
On Centronics versions of the CPC, the supply traces to the expansion port are seriously wide and can easily supply whatever your PSU can throw at it.
It's only on edge connector CPCs that I'd be worried. The last few millimeters of the 5V trace go down to 0.4mm width. Theoretically it could supply about 1.2A; however, it will be getting hot at about 1A and would instantly burn through if any short circuit occurred.
Bryce.
i couldn't find much info on the net for these hd64b180rop (http://www.microbeetechnology.com.au/store/hd64b180rop-6mhz-enhanced-z80-microprocessor.html) but it says it has enhancements such as dma inbuilt. I couldn't find whether they are pin compatible with z80 or not either.
Eh.... No. It has 64pins and a Z80 has 40pins, so it's definitely not pin compatible.
Bryce.
hehe, yes, i missed that 64pins note. silly me.
A render while I wait for the PCBs to arrive...
(http://i.imgur.com/nRwkCap.png)
And now a question:
Currently I have to map 10 registers in the I/O range: the one for the ROM selection, and another 9 for the DMA engine. The ROM select register is at 0xDFXX, but where should I map all the other nine? For the current tests I'm using bits A7 ~ A15, so they are mapped at 0xD80X, 0xD88X, 0xD90X, 0xD98X, etc... but I'm not sure this is a good place to map them. As always, suggestions are welcome.
A fine looking piece of kit. What's the third row of holes above the header for?
Regarding the address assignments, I'd take a look at the table here: http://www.cpcwiki.eu/index.php/I/O_Port_Summary and choose some area that doesn't clash with other popular hardware or choose an address that's already used for something that shouldn't or wouldn't be used with your device at the same time.
Bryce.
Quote from: doragasu on 07:56, 14 February 17
A render while I wait for the PCBs to arrive...
And now a question:
Currently I have to map on the I/O range 10 registers, the one for the ROM selection, and other 9 for the DMA engine. The ROM select register is at 0xDFXX, but where should I map all the other nine? For the current tests I'm using bits A7 ~ A15, so they are mapped at 0xD80X, 0xD88X, 0xD90X, 0xD98X, etc... but I'm not sure this is a good place to map them. As always, suggestions are welcome.
The CPC I/O ports are partially decoded.
So your choice of D8xx and D9xx may not be good:
Hardware device       | Read/Write | b15 | b14 | b13 | b12 | b11 | b10 | b9  | b8  | b7  | b6  | b5  | b4  | b3  | b2  | b1  | b0  |
Gate-Array            | Write only | 0   | 1   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   |
RAM Configuration     | Write only | 0   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   |
CRTC                  | Read/Write | -   | 0   | -   | -   | -   | -   | r1  | r0  | -   | -   | -   | -   | -   | -   | -   | -   |
ROM select            | Write only | -   | -   | 0   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   |
Printer port          | Write only | -   | -   | -   | 0   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   |
8255 PPI              | Read/Write | -   | -   | -   | -   | 0   | -   | r1  | r0  | -   | -   | -   | -   | -   | -   | -   | -   |
Expansion Peripherals | Read/Write | -   | -   | -   | -   | -   | 0   | -   | -   | -   | -   | -   | -   | -   | -   | -   | -   |
Ideally choose a range within Expansion Peripherals and one that is not yet taken.
Your ROM select port is fine; you can freely use that.
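To see why D8xx and D9xx are questionable, the partial decoding table can be turned into a small lookup. The Python sketch below encodes only the "0" and "1" bits from the table as a mask/value pair per device; it ignores the r1/r0 register-select bits and any decoding done by third-party expansions. `selected_devices` and the example address 0xF8E0 are made up for illustration, not a recommendation of a free port.

```python
# (name, mask, value): the device responds when (port & mask) == value.
DEVICES = [
    ("Gate-Array",            0xC000, 0x4000),  # b15=0, b14=1
    ("RAM Configuration",     0x8000, 0x0000),  # b15=0
    ("CRTC",                  0x4000, 0x0000),  # b14=0
    ("ROM select",            0x2000, 0x0000),  # b13=0
    ("Printer port",          0x1000, 0x0000),  # b12=0
    ("8255 PPI",              0x0800, 0x0000),  # b11=0
    ("Expansion Peripherals", 0x0400, 0x0000),  # b10=0
]


def selected_devices(port):
    """List every device from the table that decodes this 16-bit port."""
    return [name for name, mask, value in DEVICES if (port & mask) == value]


for port in (0xDF00, 0xD800, 0xD900, 0xF8E0):
    print(f"&{port:04X}: {selected_devices(port)}")
```

With this table, 0xD8xx and 0xD9xx select both the ROM select register (b13=0) and the expansion range (b10=0), so a write to a DMA register there would also change the selected ROM.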
Quote from: Bryce on 09:28, 14 February 17
A fine looking piece of kit. What's the third row of holes above the header for?
Third row is connected pin by pin to the second one. It's to be able to solder to the PCB a card edge connector instead of a DIL pin header, in case you prefer connecting the cart directly to the CPC instead of using an MX4. (BTW the render has a flaw, the MX4 connector must be soldered on the other side of the PCB, but I'm too lazy to change it).
Thanks all for the info on the port mappings.
Nice solution.
Bryce.
I have been working on the DMA. Now I synchronize writes with the READY signal. In the following simulation:
1. DMA is started by writing to $DB00 (will remap these registers later).
2. Controller requests DMA (lowers #BUSRQ).
3. Controller waits until bus is granted (#BUSAK is lowered).
4. Controller waits for the first SYNC pulse (to synchronize following transfers).
5. One byte is read from the Flash src address in 1 cycle.
6. The byte read and the dst address are put on the buses; #MREQ and #WR signals are lowered to write to RAM.
7. Controller waits until the READY pulse occurs, to make sure the byte is written to CPC internal RAM. Then address and data are tristated, and #MREQ and #WR are raised. Addresses are increased (if requested).
8. Steps 5 to 7 are repeated until transfer is completed.
Byte read/write (steps 5 to 7) takes 1us (4 clock cycles). If my current understanding of the CPC internals is correct, this should work.
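The sequence described in this post can be sketched as a tiny behavioural model in Python. This is not the CPLD code: the bus request/grant handshake is reduced to a starting clock, the SYNC point is modelled as a 4-clock slot boundary, and `ready_phase` (assumed to be 1, 2 or 3) is a hypothetical position of the READY pulse inside each slot.

```python
CLOCKS_PER_SLOT = 4  # 4 MHz clocks per 1 us slot


def dma_copy(rom, ram, src, dst, count, start_clock=0, ready_phase=3):
    """Copy count bytes from rom[src:] to ram[dst:], one byte per slot.

    start_clock is the clock at which #BUSAK is seen low.
    Returns the number of clocks elapsed since start_clock.
    """
    clock = start_clock
    # Wait for the first SYNC point (modelled as a slot boundary).
    while clock % CLOCKS_PER_SLOT != 0:
        clock += 1
    for i in range(count):
        data = rom[src + i]  # flash read, 1 clock
        clock += 1
        # Data and destination address driven, #MREQ/#WR low:
        # hold until the READY pulse of this slot.
        while clock % CLOCKS_PER_SLOT != ready_phase:
            clock += 1
        ram[dst + i] = data
        clock += 1           # release the buses, raise #MREQ/#WR
    return clock - start_clock


rom = bytearray(0x10000)
rom[0xC000:0xC003] = b"\x11\x22\x33"
ram = bytearray(0x10000)
elapsed = dma_copy(rom, ram, src=0xC000, dst=0x1000, count=3)
print(elapsed, "clocks,", ram[0x1000:0x1003].hex())
```

The example copies three bytes from $C000 to $1000, like the earlier simulation in the thread, and each byte costs exactly one 4-clock slot once the controller is synchronized.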
(http://i.imgur.com/XlQjagD.png)
I have synthesized the design, and it fits in a 2€ CPLD :) (currently using 104/128 macrocells).
Bonus: I have implemented another DMA mode. Does anyone dare to guess what it is for? :P
(http://i.imgur.com/S9IZ7h8.png)
I believed it would be possible. Thank you for making it true :)
Is the new dma mode a blitter with masking like on Amiga?
Well, I still don't know if it's possible. It should be, according to my understanding of the CPC. But I cannot be sure until I test this on the real machine (captures above are simulations, with stimulus generated by me).
The first DMA mode works with the CPU stopped. Although the "bonus" DMA mode works in parallel with the CPU, I'm afraid it is not a blitter. I do not know how the Amiga blitter works, but I suppose it works in parallel with the CPU, accessing memory while the CPU does not need it. I don't think that can be done with the CPC because I suppose the RAM chips are almost always busy. It looks like the GA is always reading from RAM, even when screen is not being drawn (during CRT vertical blanking for instance).
What could maybe be done is a mode running in parallel with the CPU and stealing cycles from it. E.g. it could run in parallel with CPU, and let the CPU access RAM once each 2 microseconds (instead of once each microsecond). But I don't think this could be very useful...
Doragasu cannot log in right now, so I'm posting news on his behalf...
Quote from: doragasu
After a long hiatus, I am resuming this project. I got the PCB fabricated and assembled a prototype. Plugged it into a CPC, and after a bit of debugging, I got the basic ROM switch function almost working. And I say almost because ROMs apparently get properly enumerated (the ones printing messages on screen print them as intended), but once the "READY" prompt appears, the CPC freezes: there is no response to keypresses... I would appreciate help to debug this thing, because I am currently out of ideas.
Also while debugging this, I have noticed something that looks strange. If I unplug the CPC and test continuity between #EXP (pin 48) and GND, I find there is a short. In the schematic I have, this pin is only connected to the PIO and to a 2k2 pull-up, so I suppose I should not see this short! I have 2 CPCs, and this is happening in both of them, so if the PIO has been damaged (presumably because of the ROMBA), it has happened in both of them. So is my PIO broken, or am I missing something? Can anyone please check if there is a short between pin 48 of the expansion port and GND?
Answer to doragasu: If you look at the schematics of the disk drive section of the 6128, you'll see that /EXP is connected to GND via LK7. I suspect that Amstrad intended using the EXP signal to detect whether a disk drive is connected (the DDI-1 does the same thing when connected to a 464). Because of this, the EXP signal can't be used on the CPC for anything else.
So your CPCs aren't broken, that's the way it's meant to be.
Bryce.
Bryce,
You can still use the /EXP pin on the expansion port provided that the Disk Interface is not connected.
rpalmer
On a 464 yes, but on a 6128 the "interface" is always connected.
Bryce.
And on a 6128, if you open the link, your cpc will try to boot from a CPM floppy. No more basic :o
Quote from: gerald on 16:33, 13 March 18
And on a 6128, if you open the link, your cpc will try to boot from a CPM floppy. No more basic :o
The reason is the DISC ROM initialisation: if the current ROM is detected as ROM zero, it boots CP/M; otherwise the initialisation continues and returns to BASIC.
See below (I disassembled the DISC ROM from KDS and added comments over a decade ago):
LC1BC
;***************************************************************
; INITIALISATION OF DISC ROM
; INPUT
; C FLAG IF ROM NOT IN SLOT 7
; CALLED BY
; NONE.
;
;***************************************************************
JR C,LC1C4
CALL &B912 ;CHECK CURRENT ROM
OR A ;ROM ZERO ?
JR Z,LC1DC ;YES, REBOOT CP/M
LC1C4 PUSH IY ;NO
PUSH DE
LD DE,&FB00 ;-&500
ADD HL,DE ;SIZE OF WORK SPACE NEEDED FOR DISC ROM
PUSH HL
INC HL
PUSH HL
POP IY
CALL LC5DD ;SETUP DPB'S AND SEND SPECIFY COMMAND
CALL LCCA0 ;SETUP AMSDOS CALLS TO DISC ROM
POP HL
POP DE
POP IY
SCF
RET
;
LC1DC
; PURPOSE BOOT CP/M SYSTEM.
; INPUT
; NONE
; CALLED BY
; LC1BC - COMMON ENTRY POINT FOR AUTO BOOT and |CPM
;
LD SP,&C000
LD IY,&AC48 ;Chain Address to next ticker block
LD DE,&AD33 ;ADDRESS OF AREA TO CLEAR
LD BC,&A5 ;165 BYTES
CALL LCAAF ;CLEAR MEMORY FROM &AD33 TO &ADD8
LD HL,&AD41
DEC (HL)
LD A,&81
LD (03),A
XOR A
LD (04),A
LD HL,LC033 ;START OF CONTROL-? JUMP ENTRIES
LD DE,&BE80 ;LOCATION TO STORE JUMP TABLE
LD BC,&3F ;SIZE = 63 BYTES
LDIR ;DOWNLOAD RSX JUMP TABLE
CALL LC0C0
CALL LC5DD ;SETUP DPB'S AND SEND FDC SPECIFY COMMAND
LC20A LD C,&41 ;SECTOR &41
LD DE,00 ;(REG E) DRIVE A, (REG D) TRACK = 0
LD HL,&0100 ;LOAD ADDRESS
CALL LC666 ;READ IN BOOT SECTOR
CALL C,LC2AC ;CHECK SECTOR CONTENTS
JR NC,LC224 ;READ FAILED OR EMPTY BOOT SECTOR
EX DE,HL ;PUT LOAD ADDRESS INTO REG DE
LD BC,LC17F ;ADDRESS OF JUMP VECTOR
LD SP,&AD33 ;SET STACK POINTER
JP LC177 ;SETUP INTERRUPT ADDRESS & JUMP TO &100
LC224 LD A,&0F ;CP/M BOOT FAILED ALERT NUMBER
CALL LCAB8 ;DISPLAY ALERT
JR LC20A ;TRY AGAIN
At last I recovered access to my account ;D
Thanks for the replies. Finally I found the problem and got the ROMBA working (at least as a simple Rombox).
Now I have Augusto working on a test program for the DMA modes ;)
Guess what, DMA is working!!! Both DMA modes in fact:
- We can play digital audio with approximately 0% CPU overhead. The limits are the bits per sample (8) and the maximum sample length (64 KiB). Sampling rate can be configured on the fly, and ranges from 3906 Hz to 50 kHz (and beyond).
- We can transfer data from external ROM to internal CPC RAM at 1 byte per microsecond. While doing DMA the Z80 is stopped, but this is much faster than doing the copy using software routines.
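To get a feel for the audio limits listed above, here is a small illustrative sketch (Python, not part of the cart's tooling) of how long a maximum-length sample lasts at the two quoted rates. The function name is made up for this example; only the 64 KiB limit, the 8-bit sample size and the two rates come from the post.

```python
# Illustrative arithmetic only; names here are hypothetical, not ROMBA APIs.
MAX_SAMPLE_BYTES = 64 * 1024  # maximum sample length quoted in the post (64 KiB)


def sample_duration_s(length_bytes: int, rate_hz: float) -> float:
    """Playback time of an 8-bit sample stream (one byte per sample)."""
    return length_bytes / rate_hz


# The two sampling rates mentioned in the post.
for rate in (3906, 50_000):
    print(f"{rate} Hz: {sample_duration_s(MAX_SAMPLE_BYTES, rate):.2f} s")
```

So a full 64 KiB sample lasts a bit under 17 seconds at the lowest quoted rate and a bit over one second at 50 kHz.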
Too bad the CPLD is not bigger, because on a bigger one, I could try making a 2D DMA engine, and THAT could be really useful to speed up rendering sprites!
Quick and dirty DMA test ROM by Augusto Ruiz. The ROM does the following:
1. Display the CPCTelera logo while playing "Hadouken" audio sample (using DMA).
2. Copy the background (16 KiB) to the video memory 50 times using a SW routine, then scroll it a bit.
3. Copy the background (16 KiB) to the video memory 50 times using DMA, then scroll it a bit.
4. Print the screen green.
https://www.youtube.com/watch?v=Yejb4n5Q0BQ
Using DMA is about 5 times faster! :o 8)
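A rough back-of-the-envelope check of the test above, as a Python sketch. The DMA figure (1 byte per microsecond) is from the earlier post; the software cost of 5 µs per byte is only an assumption chosen to be consistent with the "about 5 times faster" observation, not a measured value of the actual routine in the test ROM.

```python
# Rough timing estimate for steps 2 and 3 of the test ROM.
BLOCK_BYTES = 16 * 1024   # background size (16 KiB), from the post
COPIES = 50               # number of copies per step, from the post
DMA_US_PER_BYTE = 1       # DMA rate, from the post
SW_US_PER_BYTE = 5        # ASSUMPTION: picked to match "about 5 times faster"


def total_seconds(us_per_byte: float,
                  block_bytes: int = BLOCK_BYTES,
                  copies: int = COPIES) -> float:
    """Total time in seconds to copy the block `copies` times."""
    return us_per_byte * block_bytes * copies / 1_000_000


dma_s = total_seconds(DMA_US_PER_BYTE)
sw_s = total_seconds(SW_US_PER_BYTE)
print(f"DMA: {dma_s:.2f} s, software (assumed): {sw_s:.2f} s, ratio {sw_s / dma_s:.1f}x")
```

Under these assumptions the DMA step takes a bit less than a second in total, against roughly four seconds for the software copy.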
Quite nice! :) :) :)
And everybody else always told me that DMA wouldn't be possible on CPC. ;)
This is truly amazing work!
I wonder about the potential for it to operate along side the M4, perhaps they can be mapped separately?
Quote from: GUNHED on 23:29, 18 April 18
Quite nice! :) :) :)
And everybody else always told me that DMA wouldn't be possible on CPC. ;)
And everybody else was correct. The "DMA" that Doragasu is showing isn't true DMA, because the CPU is being halted while the transfers are happening. In true DMA the CPU would continue to execute commands while the RAM is being accessed by others.
But I'm still impressed with this project, very cool indeed.
Bryce.
Quote from: Bryce on 07:44, 19 April 18
And everybody else was correct. The "DMA" that Doragasu is showing isn't true DMA, because the CPU is being halted while the transfers are happening. In true DMA the CPU would continue to execute commands while the RAM is being accessed by others.
But I'm still impressed with this project, very cool indeed.
Bryce.
Sorry Bryce, but a DMA can "suspend" a Z80 to transfer data (see the attached Product Specification for how a Z80 DMA can operate). Doragasu's DMA engine would be operating in something like "Byte-at-a-time" mode (as defined in the attached PDF).
rpalmer
Ok, that's slightly closer to what I would consider "real DMA". I thought he was loading the entire 16K in a single halt. However, the CPU is still not getting to do a lot while the transfer is occurring, is it?
Bryce.
I don't really understand what I'm looking at, the logo moves stops then the sound plays then the screen changes and wobbles, is the dma taking place to transfer the sound while the logo is moving?
Quote from: tjohnson on 07:53, 20 April 18
I don't really understand what I'm looking at, the logo moves stops then the sound plays then the screen changes and wobbles, is the dma taking place to transfer the sound while the logo is moving?
It's just a quick timing test.
After the logo, an unchanging screen is displayed (with red at top and bottom). Actually, the screen is displayed 50 times, copying from ROM, but it is just the same screen, so it appears that nothing changed. Then the screen wobbles, and afterwards the same screen is displayed again. This time, the screen is again displayed 50 times, but now copying from ROM using DMA. Finally, the display goes green. Notice how much shorter the time to display the screen 50 times is with DMA than without.
I guess that in principle, this can be applied to any data being copied from ROM to RAM. I didn't really follow why it can't be used for sprites? Is it because it would have to be copied to a non-linear set of locations in video RAM? I guess that you could set up a number of small DMA transfers (one for each scan line of the sprite?) I guess it is not useful for small sprites because the setup time overhead would dominate the time taken for the transfer so it would be pointless?
@Bryce (http://www.cpcwiki.eu/forum/index.php?action=profile;u=225) Your initial guess is correct: I request the bus from the CPU, so the CPU is completely halted while the 16 KB are transferred. I could try not requesting the bus, inserting wait states instead (to pause the Z80 while the DMA does its transfers), and also lowering the DMA transfer rate (e.g. 1 write every 2 us instead of 1 write per us) to let the CPU do things in parallel. But I am not sure that would be more useful than what I have running right now, because it would mean doubling the time needed for a DMA transfer, while heavily slowing down the CPU (50%?) during the transfer. BTW, I disagree when you say this is not DMA. It is not a requirement for DMA that the CPU runs in parallel (this is just a desirable feature), and in fact many old systems halt the CPU for ROM to RAM DMA (e.g. the Gameboy Advance and the Genesis/Megadrive).
@tjohnson (http://www.cpcwiki.eu/forum/index.php?action=profile;u=2129) What is happening is what @Munchausen (http://www.cpcwiki.eu/forum/index.php?action=profile;u=792) has explained, with the addition of the "hadouken" audio sample play. The DMA can also be used to play digital audio samples, this time without halting the CPU (let's call this "Audio DMA"). When you start an Audio DMA transfer, samples are read from the specified ROM address and written directly to the "DAC" without disturbing the CPU operation (in fact I synchronize reads so the CPU can continue reading from ROM while the DMA engine also reads audio samples from the ROM!), and at the programmed sampling rate. Basically this allows playing audio samples with almost 0% CPU usage (just the writes to the DMA registers to start the operation).
@Munchausen (http://www.cpcwiki.eu/forum/index.php?action=profile;u=792) About moving sprites, it is basically what you have already written: for this to be useful, the DMA engine needs to be aware of the screen layout, to be able to copy "square/rectangle" ROM regions. You can program a transfer for each line, but starting a DMA transfer requires 6 writes to the I/O region... Also, for moving sprites, the DMA engine should have some masking capabilities (e.g. if the pixel colour is 0xF, skip the write). I could implement these features on a bigger CPLD/FPGA, but I would not like to increase the cost of the cartridge.
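To illustrate why a linear DMA engine cannot copy a sprite rectangle in one go, here is a small Python sketch of the standard CPC screen layout. It assumes the default firmware setup (screen base at &C000, 80 bytes per character row); the function names are invented for this example, and only the "6 writes per transfer" figure comes from the post.

```python
# Sketch of the CPC screen layout under default settings (an assumption).
SCREEN_BASE = 0xC000        # ASSUMPTION: default screen base address
BYTES_PER_CHAR_ROW = 80     # ASSUMPTION: default screen width in bytes
WRITES_PER_TRANSFER = 6     # I/O writes needed to start one DMA transfer (from the post)


def line_address(y: int, x_byte: int = 0) -> int:
    """Address of byte column x_byte on pixel line y."""
    return SCREEN_BASE + (y % 8) * 0x800 + (y // 8) * BYTES_PER_CHAR_ROW + x_byte


def sprite_line_starts(x_byte: int, y: int, height: int) -> list[int]:
    """Start address of each sprite line; one linear transfer per entry."""
    return [line_address(y + row, x_byte) for row in range(height)]


def setup_writes(height: int) -> int:
    """Total I/O writes if every sprite line needs its own transfer."""
    return height * WRITES_PER_TRANSFER


starts = sprite_line_starts(x_byte=10, y=0, height=4)
print([hex(a) for a in starts])   # consecutive lines are &800 bytes apart
print(setup_writes(16), "I/O writes for a 16-line sprite")
```

Consecutive pixel lines sit &800 bytes apart, so each sprite line needs its own transfer and its own set of register writes, which is where the setup overhead comes from.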
Quote from: Munchausen on 23:40, 18 April 18
I wonder about the potential for it to operate along side the M4, perhaps they can be mapped separately?
I do not know about M4 internals, but unfortunately I don't think they can "cooperate" (e.g. use DMA to transfer data from M4 memory to internal CPC RAM), because the current implementation only allows reading from the ROM embedded inside the cartridge.
Quote from: doragasu on 18:04, 22 April 18
@Munchausen (http://www.cpcwiki.eu/forum/index.php?action=profile;u=792) About moving sprites, basically it is what you have already written: for this to be useful, the DMA engine needs to be aware of the screen layout, to be able to copy "square/rectangle" rom regions. You can program transfers for each line, but as starting a DMA transfer requires 6 writes to I/O region... Also for moving sprites, the DMA engine should have some masking capabilities (e.g. if the pixel colour is 0xF, skip write). I could implement these features on a bigger CPLD/FPGA, but I would not like to increase the cost of the cartridge.
Thanks for the explanation. I'm quite amazed how well this works. I don't know what the extra cost would be, but it would be very cool if you could DMA whole sprites, with masking!
Quote from: doragasu
I do not know about M4 internals, but unfortunately I don't think they can "cooperate" (e.g. use DMA to transfer data from M4 memory to internal CPC RAM) because current implementation only allows reading from the ROM embedded inside the cartridge.
I was thinking more that if you could have both connected at the same time, it would be possible to download things using the M4 and copy them into your DMA ROM expansion. But the M4 already maps 16 ROMs, so is it possible to perhaps map your expansion to roms 17-32?
I guess the next question would be, if you can copy to the AY with the only cost being the setup overhead, can you use it to do DMA to other peripherals? Though I'm not sure there is a really good use case.
Quote from: Munchausen on 22:58, 22 April 18
I was thinking more that if you could have both connected at the same time, it would be possible to download things using the M4 and copy them into your DMA ROM expansion. But the M4 already maps 16 ROMs, so is it possible to perhaps map your expansion to roms 17-32?
Actually the M4 maps ROMs 0 to 31, but only if they are used; unused ROM slots can be used by any other hardware (no ROMDIS or other action is taken by the M4). So there should be no problems as long as it doesn't clash with the M4 ROM number itself (default 6).
I have currently mapped ROMs 0 to 511 on $DF00 and $DF01 IO addresses. Changing the mapping is just a matter of modifying the mux in the VHDL code, very easy. But you have to be careful with modifications because the CPLD is almost full (96% used so far) and I have yet to add a bit of code to allow writing to the Flash from the CPC itself. If the new mapping requires more logic, it might not fit.
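As an illustration of the numbers above, a Python sketch of how a 9-bit ROM number (0 to 511) could be split across the two I/O addresses. The split shown (low 8 bits to $DF00, the ninth bit to $DF01) is an assumption made for this example; the real register layout is whatever the cart's programming documentation says. The 16 KiB ROM size is the standard CPC ROM slot size.

```python
# Hypothetical split of a ROM number across the two ports named in the post.
ROM_SIZE = 16 * 1024   # one CPC ROM slot
ROM_COUNT = 512        # ROMs 0 to 511, from the post
PORT_LOW = 0xDF00      # I/O address from the post
PORT_HIGH = 0xDF01     # I/O address from the post


def rom_select_writes(rom: int) -> list[tuple[int, int]]:
    """Return (port, value) pairs selecting `rom`.

    ASSUMPTION: low byte goes to PORT_LOW and bit 8 to PORT_HIGH.
    """
    if not 0 <= rom < ROM_COUNT:
        raise ValueError(f"ROM number out of range: {rom}")
    return [(PORT_LOW, rom & 0xFF), (PORT_HIGH, rom >> 8)]


print(rom_select_writes(0x1A5))
print(ROM_COUNT * ROM_SIZE // (1024 * 1024), "MiB of addressable ROM")
```

512 slots of 16 KiB add up to 8 MiB of addressable ROM.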
Currently I do not support DMA to the IO range. I'm starting to repeat myself a lot, but OK, I could implement it on a bigger CPLD/FPGA...
For the CPLD, currently I am using a Lattice LCMXO256C. It has 128 macrocells and costs 2 Eur. I could swap it for a LCMXO640C, that has 320 macrocells, and I suppose all the features we are discussing should fit inside it. But price rises from 2 Eur to 7.5 Eur. For that price maybe it would be better using a small FPGA instead of a CPLD. But that would require redesigning the PCB.
Currently the DMA is more of a proof of concept than something really useful. It happened just because a lot of the CPLD was unused and I wanted to do something with all that logic. But if this sparks interest, I might consider designing a second iteration of the cart with a simple sprite engine :D
I for one would love it. DMA to any IO device would be awesome!
I started to wonder the other day about what can be achieved if the RAM is exchanged for dual port SRAM...
This project is getting more and more amazing. Will be fun to code a game for it using 1 us transfers.
First thing I would like to do is a video player ;-)
... finally Captain Future Video in HiRes TrueColor and 50 FPS :D
Anybody signed up for doing the DIY 4 GB RAM extension yet?
I want a pair of them! :) :) :)
How does it work? Does it transfer only from ROM to RAM or can it also transfer Bytes within the first 64kb of RAM?
Digging up an old topic because I'm very interested in this project: any progress? Any links to the sources needed to build one?
Quote from: doragasu on 07:56, 14 February 17
(http://i.imgur.com/nRwkCap.png)
The cart is 100% working, including DMA functions. Unfortunately it seems development of the two games that were considering using it is halted, and no more devs have shown interest in it, so it is collecting dust in a drawer.
I usually open source my designs after a game has been released for them, but if there is interest, I might consider open sourcing this without waiting anymore, since maybe that is better than having it growing a thicker dust layer.
Well said! I'm pretty sure that new games/tools projects can use it w/o waiting. 8)
The way I see it, the M4 does DMA into ROM space. Because you are making a ROM board, you can very well do DMA to ROM space too. In fact, even if you make an external RAM, you can DMA to that external RAM as well. The CRTC, for example, cannot display directly from that external RAM, but you could enhance a CPC a lot with such DMA: super fast vector graphics maths in RAM, or super fast new instructions to do memory transformations (almost instant from the Z80's point of view).
Hi!
Somehow I think most of us missed that the card is finished and working now. It would be great if you open a new topic, something like "XXX release" and explain the working card in the first post. Also please tell us where to buy it and how much it would cost. :) :) :)
I'm very interested in supporting the card with my software, but it's kind of hard to read through all these posts.
In this forum lots of posts get published, it's easy to miss the important stuff (like this great expansion) between all the other things.
Quote from: doragasu on 18:06, 03 October 20
The cart is 100% working, including DMA functions. Unfortunately it seems development of the two games that were considering using it is halted, and no more devs have shown interest on it, so it is collecting dust in a drawer.
I usually open source my designs after a game has been released for them, but if there is interest, I might consider open sourcing this without waiting anymore, since maybe that is better than having it growing a thicker dust layer.
That's a great idea. Currently you cannot buy it. I am pretty busy and I have not produced the carts. When I release the schematics and sources, if anyone is interested in producing it, there is no problem. If no one wants to produce it, but it gains enough interest to send a batch to the assembly line, I can do the work.
But first I need some time to sit down and release everything properly.
Hi doragasu
Very interesting project...
I must have missed this project first time around!
How does it work?
What CPLD is used and can the CPLD be reprogrammed from the CPC or is a special device needed for (re)programming the CPLD?
My knowledge of CPLDs is minimal :(
Please do release the documentation...
Regards, Ygdrazil
Quote from: doragasu on 08:47, 10 October 20
That's a great idea. Currently you cannot buy it. I am pretty busy and I have not produced the carts. When I release the schematics and sources, if anyone is interested in producing it, there is no problem. If no one wants to produce it, but it gains enough interest to send a batch to the assembly line, I can do the work.
But first I need some time to seat and release everything properly.
In this community we have some specialists in making PCBs for the CPC. Would be great if they can help. :)
I would love to buy two cards. If anybody else is interested please post here, so we can assess how big a first batch of this great project should be. :)
The CPLD is a LAMXO256C from Lattice. To program it, you need a programmer, but building one is very cheap, you can use any FT2232 board like this: https://www.aliexpress.com/item/32961246303.html
I built one using that board and designed and 3D printed a case:
(https://i.imgur.com/zkj0zAi.jpg)
To write ROMs into the cart, you need a programmer. I have also designed one. It is very simple: it just has a single microcontroller chip and a bunch of passives. Here you can see the programmer with the cart attached:
(https://i.imgur.com/q2sK2JH.jpg)
The cart case in the photo is one I designed, 3D printed in ABS, then did a treatment using acetone steam to smooth the surface. Some more photos:
(https://i.imgur.com/NTrlSeK.jpg)
(https://i.imgur.com/62bFvRQ.jpg)
(https://i.imgur.com/rBicdaf.jpg)
An MX4 version would be great, or a version with drive through expansion port. ;)
The PCB has been designed with dual connector, so you can mount a DIL connector and plug it into an MX4. And as you can see, the programmer has also the connector, so you can program carts with the MX4 format. What I have not done is designing a 3D printable cartridge case using the MX4 format.
Great, now I just need to wait until I can get a ready-to-be-used card.
Open sourcing it would be great. I'm curious about how you deal with the GA restrictions. It would be a shame if all your work ended up in a bin.
@Duke (https://www.cpcwiki.eu/forum/index.php?action=profile;u=1624) could an M4 board do DMA with an internal software upgrade?
Quote from: roudoudou on 12:22, 24 October 20
@Duke (https://www.cpcwiki.eu/forum/index.php?action=profile;u=1624) could an M4 board do DMA with an internal software upgrade?
Maybe possible, however it would probably be at the cost of some other feature, like rom/cart emu/hack menu at runtime.
Quote from: roudoudou on 12:22, 24 October 20
@Duke (https://www.cpcwiki.eu/forum/index.php?action=profile;u=1624) could an M4 board do DMA with an internal software upgrade?
What would be the gain?
DMA is great for RAM, but not for ROM - well, except Cartridges maybe.
Quote from: GUNHED on 18:23, 25 October 20
What would the gain?
have DMAs with the M4 card ;D
;D ;D ;D
Well, it took a lot of time, but finally I had a slot to sit down and open source the project. You can find everything (schematics, sources, HDL, case CAD files) here:
https://gitlab.com/doragasu/romba
Currently documentation is scarce, but I plan to add at least some more README.md files for each subproject. Again, when I get the time, so no promises about the schedule.
Great! 8)
Now, with this kind of free project, it is IMHO better not to offer the Gerber files, to avoid eBay sellers making money off your back by just selling PCBs to end users without any support. The technical content sharing your knowledge is more than enough. Thank you!!!
Well, let's say I was not planning to become a millionaire with this :D
Hi, reading the internet documentation this thing is super amazing!!!
Will there be somebody who can build them and sell me one?
Since I'm more a software person I would like to support it with FutureOS (and future games) too.
Quote from: doragasu on 21:16, 08 December 20
Well, let's say I was not planning to become a millionaire with this :D
Obviously... :-\
I have added more README.md files in each software/firmware subproject, and updated the main one. Find it in the Gitlab repo:
https://gitlab.com/doragasu/romba/
I have added almost all the documentation I wanted, excepting the registers mapping info and a brief description about how to use them.
And finally added the programming guide, explaining how CPC developers can program the thing. Read it here:
https://gitlab.com/doragasu/romba/-/blob/master/PROGRAMMING.md
BTW, support for RetroVM emulator was added some time ago, but I do not know if the code hit the public releases. Time to talk to the author to see if it is supported.
Looking at my RVM looks like there's support for X-Mem and Dandanator, and that's it.
It is not essential, but having at least one emulator supporting this would be very beneficial. I don't know what 4MHz was using for development, maybe a private version of RVM with ROMBA support?
4MHz have never used my cart (as far as I know). The only people working with it are Augusto Ruíz and Rhino from Batman Group (but it has been a long time since I talked to him, so I do not know how development is going).
Quote from: doragasu on 18:21, 12 December 20
4MHz have never used (as far as I know) my cart. The only people that are working with it are Augusto Ruíz and Rhino from Batman Group (but there is a lot of time since I talked to him, so I do not know how development is going).
that's some serious news!
Quote from: roudoudou on 18:41, 12 December 20
that's some serious news!
Well, I do not know if they will use my cart though. I think Augusto most likely will, but about Rhino, I am not that sure.
My bad then!
I kind of recall someone mentioning DMA re: Lady Phoenix and scroll, so I assumed it was based on your ROMBA. I'm probably wrong and it wasn't even using DMA :D
EDIT: that's right, they were using the Dandanator, as explained here: https://www.retromaniac.es/2018/04/tres-juegos-para-tres-grafistas.html
Quote from: doragasu on 19:08, 12 December 20
Well, I do not know if they will use my cart though. I think Augusto most likely will, but about Rhino, I am not that sure.
Vespertino, the next game from Batman Group, will surely use a cartridge ... according to these words from Rhino himself
https://www.cpcwiki.eu/forum/games/new-game-from-bg-games-it-will-be-announced-tonight/msg174201/#msg174201
Well, who will build a batch of this great card?
I do not own the equipment required for producing these in series, so that's the reason I have absolutely no problem if anyone wants to produce the carts and take the money.
But if nobody wants to make them, I can try building a batch. For that to happen, though, it needs to attract interest, so I can send the PCBs to an assembler without the cost getting too high (as you might know, cost scales down with volume: making a few boards is very expensive, making a lot of them is cheap). And that is the reason I wanted a game released for these beauties, because buying one might not be very interesting if you have nothing special to run on it.
It would also be interesting to know how much people are willing to pay for the cart, so I can get an idea of how many boards I need to assemble to meet the price point.
I have all the equipment to build these. If there is enough interest I could run a small batch.
Bryce.
Quote from: Bryce on 09:32, 14 December 20
I have all the equipment to build these. If there is enough interest I could run a small batch.
Bryce.
OK, here we go !! .. 1 Romba for me, please :D
Hi,
I'm interested too. One for me, please.
Thanks!
Quote from: doragasu on 17:48, 12 December 20
And finally added the programming guide, explaining how CPC developers can program the thing. Read it here:
https://gitlab.com/doragasu/romba/-/blob/master/PROGRAMMING.md
Eventually I had time to look at the documentation, it's great. But maybe I did miss a bit...
Missing:
- Is it possible to copy from RAM to RAM
- How do I program the ROM content (8 MB of ROM)
@GUNHED (https://www.cpcwiki.eu/forum/index.php?action=profile;u=2029) Thanks for looking to the docs.
It is not possible to copy from RAM to RAM; currently you can only copy from ROM to RAM, or from ROM to DAC (for sound playback).
To program the ROM you need two things:
* The cartridge programmer, which plugs into the USB port of your PC.
* The programmer software (romba-cli).
The programmer software has a separate README.md file explaining how to read and write the ROM, you can read it here: https://gitlab.com/doragasu/romba/-/blob/master/src/romba-cli/README.md
EDIT: Above I wrote "it is not possible to copy from RAM to RAM", but that is not true. It is in fact possible, but it is not implemented. It could be done with a bigger CPLD, but that would increase the price, and the speed would also be half that of a ROM to RAM copy: instead of copying one byte per microsecond, you would get one byte copied every two microseconds.
Today I have good news and bad news.
The good news is that ROMBA has reached version 1.0. I have implemented a new register (CONF) with the bits needed to be able to write to the flash chip from the CPC itself. And with this, the CPLD usage has reached 100%, it's filled to the brim. You can read the details in the programming documentation (https://gitlab.com/doragasu/romba/-/blob/master/PROGRAMMING.md).
The bad news is that today I checked the CPLD at Digikey (LAMXO256C or the compatible LCMXO256C) and they have raised the price from near 2 Eur to 7 Eur. That's a f*cking 3.5x price increase!!! I have been buying these CPLDs since 2017 and they always cost less than 2 Eur! I hope this is temporary, due to COVID or other problems, and that they go back to their old price; the new one does not make sense. For 7 Eur you can put a Spartan 3 FPGA there, with a whole lot more logic.
OK for a Spartan 3 and more functions like
Dma
2 more AY
1 sid chip
FPU unit for CPC
...
:P
Quote from: doragasu on 16:42, 27 December 20
The good news is that ROMBA has reached version 1.0. I have implemented a new register (CONF) with the bits needed to be able to write to the flash chip from the CPC itself.
This is awesome!!! :) :) :)
Quote from: roudoudou on 18:01, 27 December 20
Ok for spartan 3 ans more fonctions like
Dma
2 more AY
1 sid chip
That already exists, use the PlayCity and the SpeakSID.
Quote from: doragasu on 16:42, 27 December 20
The bad news is that today I checked the CPLD at Digikey (LAMXO256C or the compatible LCMXO256C) and they have raised the price from near 2 Eur to 7 Eur.
Well, maybe then it's better to use the bigger FPGA and enable RAM to RAM DMA. Even with 2 us per byte this is super awesome. Most software does use copy/move of RAM blocks way more often than from ROM to RAM. :)
With an FPGA I should be able to implement RAM to RAM copy and more DMA channels. And I could make DMA channels more complex. For example the current DMA engine is linear, but I could easily implement a 2D engine able to copy rectangular zones. I have not investigated if it is possible to map the screen RAM to external RAM. If possible, adding also some RAM to the cart could allow the DMA to implement some masking capabilities (e.g. skip copying a pixel according to a bitmap mask). Without the on-cart RAM some masking could also be added, but it would be byte per byte instead of pixel per pixel (i.e. on mode 0 each mask bit would be applied to two pixels, while on mode 1 it would be applied to four pixels).
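A minimal Python sketch of the byte-level masking idea described above, where one mask bit decides whether a whole screen byte is written. This is only an illustration of the concept, not the cart's behaviour (no masking is implemented); the function name and data are invented. The pixels-per-byte table reflects the CPC screen modes.

```python
# Byte-granular masked copy: one mask bit per screen byte (concept sketch only).
PIXELS_PER_BYTE = {0: 2, 1: 4, 2: 8}  # CPC screen modes: pixels packed in one byte


def masked_copy(dest: bytearray, src: bytes, mask_bits: list[int]) -> bytearray:
    """Write src[i] into dest[i] only where mask_bits[i] is 1."""
    for i, (byte, keep) in enumerate(zip(src, mask_bits)):
        if keep:
            dest[i] = byte
    return dest


background = bytearray(b"\xAA\xAA\xAA\xAA")
sprite = b"\x11\x22\x33\x44"
result = masked_copy(background, sprite, [1, 0, 0, 1])
print(result.hex())
print("In mode 0 each mask bit covers", PIXELS_PER_BYTE[0], "pixels")
```

Because the mask works on whole bytes, a sprite edge can only be as fine as one byte, which is two pixels in mode 0 and four in mode 1.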
Anyway I do not think I will be developing this, at least not any time soon. In my experience, the more complex I make a cart, the fewer people are interested in using it! A bit off topic, but in addition to this CPC cart, I have developed a WiFi enabled Megadrive cart (https://github.com/doragasu/mw) with a well documented API (https://doragasu.github.io/mw-api/doc/html/index.html) to ease its usage, three different NES cartridges (one of them still not public, the others can be found here (https://github.com/doragasu/mojo-nes) and here (https://gitlab.com/doragasu/mojo-nes-mk3)) and a GameBoy/GameBoy Color cartridge with MBC5 support. Only the simplest designs (the NES cartridge without mapper, the NES cartridge with discrete mapper and the Megadrive cartridge without the WiFi parts assembled) have been used so far. The Megadrive cart with WiFi has not attracted a lot of interest from devs, and the same goes for the NES cart with MMC3 support (extended up to 128 megabits) and the MBC5 Gameboy cart. So with each cart I complete, I feel less push to spend time making complex carts!
With the CPC the Gate Array / CRTC can only use the first 64 KB for video memory. That's hardwired.
The Z80 can access lots of KB's, but the Video-RAM must be inside the first 64 KB main memory.
Anything else: Awesome!!!
Well, this might be the cause for the price of the CPLD I am using to skyrocket: https://hackaday.com/2021/01/18/pandemic-chip-shortages-are-shutting-down-automotive-production/
I am using an automotive part (LAMXO256C) so this could be perfectly the cause. I suppose things will get back to normal eventually.
Hi there!
Since this is one of the most fascinating hardware projects for the CPC ... let me please ask ...
Are there any news? What's the status quo? Did the project reach final stage? Can it be bought somewhere?
could I ask something related to the topic
Revaldinho's RAM pack for the 464 entirely replaces the onboard ram with external ram
wouldn't this method open up access to all the pins on the ram chips that are not available on the expansion bus?
that open up further performance options maybe
then, couldn't dma to expanded ram outside the base 64k be ultra fast and have interrupts enabled?
the cpc can just do its own thing while the expanded ram is being written to or read from
i'm imagining a Revaldinho style PCB with built in DMA support and storage device connection (say to IDE disc)
then a scroller game can be playing and when it gets near needing new background bitmaps it tells the device to fill up a 16k bank from a file and the cpc carries on. when it needs it, it's loaded in
or for video playback the device could switch the base 64k like with double buffer but you're double buffering the screen part of the base 64k with expanded ram banks. like a 2nd gatearray to determine which banks the cpc sees as comprising the base 64k
you could do all that if the cpc didn't care about what was happening to ram it wasn't directly able to address?
maybe the onboard ram could still be available as well somehow for other purposes (the onboard becomes the 'expanded' ram and the 'expanded' the onboard)
i imagine the device having a small stack to remember commands needed to load files into ram, like
Track 3, Sector 1, Read x sectors to ram bank 5 so the cpc doesn't need to hand hold it just setup and its automatic
i await the 'it won't work' :)
Quote from: martin464 on 14:19, 26 February 24
Revaldinho's RAM pack for the 464 entirely replaces the onboard ram with external ram
Not entirely. The GateArray still can only access the internal 64K. If I am not mistaken they are in shadow mode in parallel to the first 64K of the external RAM.
hmmm, this is the kind of thing i'd like to confirm heh
his documentation said a CPC with bad internal ram might start working, that's what made me think it was replacing the internal ram
Edit - you were right Eto.
Checking his doc again reveals that in full shadow mode RAMDIS is asserted for all memory reads 'disabling CPU reads from base memory completely'. But it then reveals 'the internal RAM is only used by the CRTC chip to drive the video display'.
So even with RAMDIS asserted all the time, the video still reads from internal RAM. This is set in stone somehow. I guess the CRTC is directly connected to the internal RAM and doesn't go through the GA?
Quote from: martin464 on 11:52, 27 February 24
This is set in stone somehow I guess the CRTC is directly connected to the internal ram and doesn't go through the GA?
To understand all of this you probably need to look at the schematics of the CPC (https://www.cpcwiki.eu/imgs/6/6d/Schaltplan_cpc_464.jpg) and how the ICs work together. Unlike most other systems the CPC does not have a video chip. The CRTC and the GateArray together are the "video chip" of the Amstrad. And they both are not directly connected to the normal address and data bus for accessing the screen RAM but to an area which is separated from the normal bus via mixers and latches. That's why you will never be able to use external RAM for that, as there is simply no physical connection from CRTC and GateArray to the external RAM.
It works like this (high level):
The CRTC is basically responsible for the video address generation (except for the lowest bit). The internal RAM ICs put the data stored at that address on their data pins and the Gate Array reads those 8 bits of information. Then the GateArray quickly sets the lowest bit of the address (address+1) and reads another 8 bits of data from the RAM.
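The fetch pattern described above can be sketched as follows. This is a deliberately simplified model and not the real CPC address mapping: it assumes the CRTC value simply forms every address bit above the lowest one, and the function names are made up for illustration.

```python
def video_fetch_addresses(crtc_address: int) -> tuple[int, int]:
    """Return the two RAM addresses read for one CRTC address.

    Simplified model: the CRTC supplies every address bit except the lowest,
    and the Gate Array reads with that bit at 0 and then at 1.
    """
    base = crtc_address << 1  # CRTC bits sit above the lowest address bit
    return base, base | 1     # first byte, then "address + 1"


def fetch_line(first_crtc_address: int, characters: int) -> list[int]:
    """Addresses fetched for a run of consecutive CRTC addresses."""
    addresses = []
    for offset in range(characters):
        addresses.extend(video_fetch_addresses(first_crtc_address + offset))
    return addresses
```

For example, `fetch_line(0, 3)` yields six consecutive byte addresses: two per CRTC address.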
Got it, thanks Eto. The basic thought I had was: if Doragasu's DMA is throttling down its transfer rate to accommodate the internal RAM (either from not having the pins on the expansion bus or due to the GA/CRTC limitations), then the same is not true of external RAM. One could transfer via DMA at max rate with interrupts enabled if none of the RAM undergoing DMA transfer was connected to the bus?
So you'd want to disconnect blocks of RAM during DMA data transfer from the expansion bus and the device would also need to be a RAM expansion itself in order to manage which 16k blocks were flagged as 'DMA transfer' and 'visible' unless it could somehow merrily do this without causing a problem?
To clarify, say the CPC is running a game, you could load game data into external RAM while the game was still playing without any speed limitations. But external RAM only
You can use nBUSRQ. Switch off the CPU and write whatever you like to internal RAM.
You actually have a window of only 1µs in every 4µs. The problem is that you can only write to RAM when the Gate Array allows it. The address bus is disconnected from the RAM via the multiplexers most of the time. So you would be able to transfer a byte every 4µs, which is still faster than what the CPU can achieve, because it would need to read from ROM and then write to RAM. That is at least two M1 cycles, plus 1 additional cycle for the read and 2 additional for the write. That does not include logic for looping or anything else in the Z80.
So from the CPU's point of view your transfer is much faster (at least 4x), because you can copy in every single 4µs cycle.
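To put the one-byte-per-4µs figure into numbers, here is a small worked calculation. It is only a sketch: it assumes every 4µs window is actually used and ignores the time spent waiting for the bus to be granted; the function name is made up.

```python
SLOT_PERIOD_US = 4  # one usable write window every 4 us, as stated above


def dma_copy_time_us(byte_count: int, slot_period_us: int = SLOT_PERIOD_US) -> int:
    """Time in microseconds to copy byte_count bytes, one byte per window."""
    if byte_count < 0:
        raise ValueError("byte_count must be non-negative")
    return byte_count * slot_period_us


rom_size = 16 * 1024  # one 16k ROM
total_us = dma_copy_time_us(rom_size)
print(f"{rom_size} bytes take {total_us} us ({total_us / 1000:.3f} ms)")
# prints "16384 bytes take 65536 us (65.536 ms)"
```

So a full 16k ROM would keep the CPU off the bus for about 65 ms under these assumptions.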
On the other hand, it is only worthwhile for larger memory transfers, as the CPU does nothing in the meantime.
From a signal perspective you would first need to assert BUSREQ and then wait for the CPU to answer with BUSACK; then you can start working.
The next thing you need to do is to drive all the CPU signals (RD, MREQ etc.) so that the GateArray does its job. If you just try to write to the memory, it will not go through. Timing is also crucial here.
Quote from: martin464 on 10:43, 29 February 24
Got it thanks Eto. The basic thought I had was if Doragasu's DMA is throttling down its transfer rate to accommodate the internal RAM (either from not having the pins on the expansion bus or due to the GA/CRTC limitations) then the same is not true of external RAM. One could transfer via DMA at max rate with interrupts enabled if none of the RAM undergoing DMA transfer was connected to the bus?
So you'd want to disconnect blocks of RAM during DMA data transfer from the expansion bus and the device would also need to be a RAM expansion itself in order to manage which 16k blocks were flagged as 'DMA transfer' and 'visible' unless it could somehow merrily do this without causing a problem?
To clarify, say the CPC is running a game, you could load game data into external RAM while the game was still playing without any speed limitations. But external RAM only
That could be done. But IMO the most interesting use cases for DMA in games would be to write to video RAM, and that is not possible with an external peripheral, because of what @eto has explained.
My current DMA engine is able to copy 1 byte per microsecond, it basically does the following:
1. Request the CPU bus.
2. (in parallel to 1) Synchronize with the GA.
3. Once you got the bus and synchronization, use the allocated CPU slot each microsecond to copy the data.
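The three steps above can be sketched as a toy simulation in 1µs ticks. This is not the actual CPLD design: `busack_delay` and `sync_delay` are hypothetical tick counts standing in for the time needed to get the bus and to lock onto the GA timing, and both are assumed to start counting at the same moment.

```python
def simulate_dma(source: bytes, busack_delay: int, sync_delay: int):
    """Step a toy DMA engine in 1 us ticks; return (copied bytes, ticks used).

    busack_delay and sync_delay are hypothetical tick counts for getting the
    bus and for synchronizing with the Gate Array; both start at tick 0.
    """
    dest = bytearray()
    tick = 0
    bus_granted = False
    synced = False
    while len(dest) < len(source):
        # Steps 1 and 2 progress in parallel on every tick.
        bus_granted = bus_granted or tick >= busack_delay
        synced = synced or tick >= sync_delay
        if bus_granted and synced:
            # Step 3: copy one byte per tick (1 us) in the CPU slot.
            dest.append(source[len(dest)])
        tick += 1
    return bytes(dest), tick
```

In this model the total time is the slower of the two setup delays plus one tick per byte, for example `simulate_dma(b"ABCD", 2, 3)` returns `(b"ABCD", 7)`.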
While doing ROM to RAM DMA, it internally disconnects from the CPC bus to read the data from ROM without disturbing the machine. So if you had on-cart RAM, it could do exactly the same and copy from ROM to on-cart RAM (or from on-cart RAM to on-cart RAM) while the CPC bus is disconnected, achieving up to 4 bytes per microsecond, or even much much higher if you use an on-cart oscillator instead of using the 4 MHz clock from the CPC (although this brings other problems that would require a larger CPLD for the same design to fit).
But as I wrote earlier, what would be the point? The cool use cases I can think about (like for example creating an engine to draw hardware sprites) would require writing to video RAM, and we cannot.
Not only games..... Some of my applications really would *LOVE* the possibility of using DMA transfer from one Expansion-RAM block to another. For example: Copy data from block &D4 to &D5 (in first 512 KB E-RAM). Or just to be able to do a quick copy in RAM from one address to another. That would really speed up the work with bigger files being kept in RAM (for working quickly with them). :) :) :)
Whenever the DMA card is completely finished, I would like to buy (at least) one of them please. The first application I would like to make 'DMA ready' would be my text editor FutureTex I guess. :) :) :)
Forcing a short high pulse while READY is low can latch the screen data in the 373.
Next, force READY low so that no new data is stored in the latch, and read the data from the 373.
Grab the data and pull WR low.
That gives DMA from screen RAM to another place in RAM.