ROM board with a tiny DMA engine

Started by doragasu, 23:33, 14 January 17


AugustoRuiz

Doragasu cannot log in right now, so I'm posting news on his behalf...


Quote from: doragasu
After a long hiatus, I am resuming this project. I got the PCB fabricated and assembled a prototype. I plugged it into a CPC, and after a bit of debugging, I got the basic ROM switch function almost working. I say almost because the ROMs apparently get enumerated properly (the ones that print messages on screen print them as intended), but once the "READY" prompt appears, the CPC freezes: there is no response to keypresses... I would appreciate help debugging this thing, because I am currently out of ideas.
Also while debugging this, I have noticed something that looks strange. If I unplug the CPC and test continuity between #EXP (pin 48) and GND, I find there is a short. In the schematic I have, this pin is only connected to the PIO and to a 2k2 pull-up, so I suppose I should not see this short! I have 2 CPCs, and this is happening in both of them, so if the PIO has been damaged (presumably because of the ROMBA), it has happened in both of them. So is my PIO broken, or am I missing something? Can anyone please check if there is a short between pin 48 of the expansion port and GND?

Bryce

Answer to doragasu: If you look at the schematics of the disk drive section of the 6128, you'll see that /EXP is connected to GND via LK7. I suspect that Amstrad intended using the EXP signal to detect whether a disk drive is connected (the DDI-1 does the same thing when connected to a 464). Because of this, the EXP signal can't be used on the CPC for anything else.
So your CPCs aren't broken, that's the way it's meant to be.

Bryce.

rpalmer

Bryce,

You can still use the /EXP pin on the expansion port provided that the Disk Interface is not connected.

rpalmer

Bryce

On a 464 yes, but on a 6128 the "interface" is always connected.

Bryce.

gerald

And on a 6128, if you open the link, your cpc will try to boot from a CPM floppy. No more basic  :o

rpalmer

Quote from: gerald on 16:33, 13 March 18
And on a 6128, if you open the link, your cpc will try to boot from a CPM floppy. No more basic  :o

The reason is the DISC ROM initialisation. If the current ROM is detected as zero, it boots CP/M; otherwise the initialisation continues and returns to BASIC.
See below (I disassembled the DISC ROM from KDS and added comments over a decade ago):

LC1BC
;***************************************************************
;  INITIALISATION OF DISC ROM
;  INPUT
;      C FLAG IF ROM NOT IN SLOT 7
;  CALLED BY
;      NONE.
;
;***************************************************************
       JR   C,LC1C4
       CALL &B912       ;CHECK CURRENT ROM
       OR   A           ;ROM ZERO ?
       JR   Z,LC1DC     ;YES, REBOOT CP/M
LC1C4  PUSH IY          ;NO
       PUSH DE
       LD   DE,&FB00    ;-&500 (TWO'S COMPLEMENT)
       ADD  HL,DE       ;SIZE OF WORK SPACE NEEDED FOR DISC ROM
       PUSH HL
       INC  HL
       PUSH HL
       POP  IY
       CALL LC5DD       ;SETUP DPB'S AND SEND SPECIFY COMMAND
       CALL LCCA0       ;SETUP AMSDOS CALLS TO DISC ROM
       POP  HL
       POP  DE
       POP  IY
       SCF
       RET
;
LC1DC
; PURPOSE  BOOT CP/M SYSTEM.
; INPUT
;      NONE
; CALLED BY
;      LC1BC - COMMON ENTRY POINT FOR AUTO BOOT and |CPM
;
       LD   SP,&C000
       LD   IY,&AC48    ;Chain Address to next ticker block
       LD   DE,&AD33    ;ADDRESS OF AREA TO CLEAR
       LD   BC,&A5      ;165 BYTES
       CALL LCAAF       ;CLEAR MEMORY FROM &AD33 TO &ADD8
       LD   HL,&AD41
       DEC  (HL)
       LD   A,&81
       LD   (03),A
       XOR  A
       LD   (04),A
       LD   HL,LC033    ;START OF CONTROL-? JUMP ENTRIES
       LD   DE,&BE80    ;LOCATION TO STORE JUMP TABLE
       LD   BC,&3F      ;SIZE = 63 BYTES
       LDIR             ;DOWNLOAD RSX JUMP TABLE
       CALL LC0C0
       CALL LC5DD       ;SETUP DPB'S AND SEND FDC SPECIFY COMMAND
LC20A  LD   C,&41       ;SECTOR &41
       LD   DE,00       ;(REG E) DRIVE A, (REG D) TRACK = 0
       LD   HL,&0100    ;LOAD ADDRESS
       CALL LC666       ;READ IN BOOT SECTOR
       CALL C,LC2AC     ;CHECK SECTOR CONTENTS
       JR   NC,LC224    ;READ FAILED OR EMPTY BOOT SECTOR
       EX   DE,HL       ;PUT LOAD ADDRESS INTO REG DE
       LD   BC,LC17F    ;ADDRESS OF JUMP VECTOR
       LD   SP,&AD33    ;SET STACK POINTER
       JP   LC177       ;SETUP INTERRUPT ADDRESS & JUMP TO &100
LC224  LD   A,&0F       ;CP/M BOOT FAILED ALERT NUMBER
       CALL LCAB8       ;DISPLAY ALERT
       JR   LC20A       ;TRY AGAIN
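
The branch at the top of LC1BC can be paraphrased in a few lines of Python. This is only a sketch of the control flow in the listing above: the function names and return strings are made up for illustration, and `current_rom` stands in for whatever CALL &B912 leaves in A. The second helper just checks the 16-bit arithmetic behind `LD DE,&FB00` / `ADD HL,DE`.

```python
def disc_rom_init_path(carry_set: bool, current_rom: int) -> str:
    """Model the two conditional jumps at the start of LC1BC.

    Names and return values are hypothetical labels, not ROM symbols.
    """
    if carry_set:
        # JR C,LC1C4: skip the ROM check and do the normal initialisation.
        return "init_amsdos"
    if current_rom == 0:
        # OR A / JR Z,LC1DC: current ROM is zero, so boot CP/M.
        return "boot_cpm"
    # Fall through to LC1C4.
    return "init_amsdos"


def as_signed16(value: int) -> int:
    """Interpret a 16-bit value as a two's complement signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


if __name__ == "__main__":
    print(disc_rom_init_path(False, 0))
    print(hex(as_signed16(0xFB00)))
```

Adding &FB00 to HL is the same as subtracting &500, which is how the routine reserves its workspace below the address passed in HL.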

doragasu

At last I recovered access to my account  ;D

Thanks for the replies. Finally I found the problem and got the ROMBA working (at least as a simple Rombox).

Now I have Augusto working on a test program for the DMA modes  ;)

doragasu

Guess what, DMA is working!!! Both DMA modes in fact:
- We can play digital audio with approximately 0% CPU overhead. The limits are the bits per sample ( 8 ) and the maximum sample length (64 KiB). Sampling rate can be configured on the fly, and ranges from 3906 Hz to 50 kHz (and beyond).
- We can transfer data from external ROM to internal CPC RAM at 1 byte per microsecond. While doing DMA the Z80 is stopped, but this is much faster than doing the copy using software routines.
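
To put those numbers in perspective, here is a small back-of-the-envelope calculation. It only uses the figures quoted above (8-bit samples, 64 KiB maximum length, 3906 Hz to 50 kHz, 1 byte per microsecond); the function names are invented for this sketch.

```python
MAX_SAMPLE_BYTES = 64 * 1024   # maximum sample length quoted above
DMA_US_PER_BYTE = 1.0          # ROM to RAM rate quoted above


def sample_duration_s(length_bytes: int, rate_hz: float) -> float:
    """Playback time of an 8-bit sample (one byte per sample)."""
    return length_bytes / rate_hz


def dma_copy_time_ms(length_bytes: int) -> float:
    """Time the Z80 is stopped while the DMA engine copies a block."""
    return length_bytes * DMA_US_PER_BYTE / 1000.0


if __name__ == "__main__":
    for rate in (3906, 50000):
        secs = sample_duration_s(MAX_SAMPLE_BYTES, rate)
        print(f"{rate} Hz: {secs:.2f} s for a full 64 KiB sample")
    print(f"16 KiB copy: {dma_copy_time_ms(16 * 1024):.3f} ms")
```

So a full-length sample lasts roughly 17 seconds at the lowest rate and a bit over a second at 50 kHz, while a full 16 KiB screen copy stops the CPU for about 16 ms.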

Too bad the CPLD is not bigger, because on a bigger one, I could try making a 2D DMA engine, and THAT could be really useful to speed up rendering sprites!

doragasu

Quick and dirty DMA test ROM by Augusto Ruiz. The ROM does the following:

1. Display the CPCTelera logo while playing "Hadouken" audio sample (using DMA).
2. Copy the background (16 KiB) to the video memory 50 times using a SW routine, then scroll it a bit.
3. Copy the background (16 KiB) to the video memory 50 times using DMA, then scroll it a bit.
4. Print the screen green.


https://www.youtube.com/watch?v=Yejb4n5Q0BQ

Using DMA is about 5 times faster!  :o 8)
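
A rough sanity check of that figure, as a sketch only. The DMA rate (1 byte per microsecond), the block size (16 KiB) and the 50 repetitions come from the posts above. The cost of the software routine is not given anywhere in the thread, so the code works backwards from the reported speed-up instead of assuming a particular copy loop.

```python
COPIES = 50
SCREEN_BYTES = 16 * 1024
DMA_US_PER_BYTE = 1.0      # from the thread
REPORTED_SPEEDUP = 5.0     # "about 5 times faster", an approximate figure


def total_seconds(us_per_byte: float) -> float:
    """Time for all 50 screen copies at a given per-byte cost."""
    return COPIES * SCREEN_BYTES * us_per_byte / 1_000_000


def implied_software_us_per_byte(speedup: float) -> float:
    """Per-byte cost the software routine would need for that speed-up."""
    return speedup * DMA_US_PER_BYTE


if __name__ == "__main__":
    sw_cost = implied_software_us_per_byte(REPORTED_SPEEDUP)
    print(f"DMA:      {total_seconds(DMA_US_PER_BYTE):.4f} s")
    print(f"Software: {total_seconds(sw_cost):.4f} s "
          f"(implied {sw_cost:.0f} us per byte)")
```

In other words, the DMA pass should take a bit under a second, and a software routine that is five times slower would be spending about 5 microseconds on each byte.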

GUNHED

Quite nice!  :) :) :)


And everybody else always told me that DMA wouldn't be possible on CPC.  ;)
http://futureos.de --> Get the revolutionary FutureOS (Update: 2023.11.30)
http://futureos.cpc-live.com/files/LambdaSpeak_RSX_by_TFM.zip --> Get the RSX-ROM for LambdaSpeak :-) (Updated: 2021.12.26)

Munchausen

This is truly amazing work!


I wonder about the potential for it to operate along side the M4, perhaps they can be mapped separately?

Bryce

Quote from: GUNHED on 23:29, 18 April 18
Quite nice!  :) :) :)

And everybody else always told me that DMA wouldn't be possible on CPC.  ;)

And everybody else was correct. The "DMA" that Doragasu is showing isn't true DMA, because the CPU is being halted while the transfers are happening. In true DMA the CPU would continue to execute commands while the RAM is being accessed by others.

But I'm still impressed with this project, very cool indeed.

Bryce.

rpalmer

Quote from: Bryce on 07:44, 19 April 18
And everybody else was correct. The "DMA" that Doragasu is showing isn't true DMA, because the CPU is being halted while the transfers are happening. In true DMA the CPU would continue to execute commands while the RAM is being accessed by others.

But I'm still impressed with this project, very cool indeed.

Bryce.

Sorry Bryce, but a DMA can "suspend" a Z80 to transfer data (see attached Product Specification for how a Z80 DMA can operate). Doragasu's DMA engine would be operating in something like "Byte-at-a-time" mode (as defined in the attached PDF).

rpalmer

Bryce

Ok, that's slightly closer to what I would consider "real DMA". I thought he was loading the entire 16K in a single halt. However, the CPU is still not getting to do a lot while the transfer is occurring, is it?

Bryce.

tjohnson

I don't really understand what I'm looking at, the logo moves stops then the sound plays then the screen changes and wobbles, is the dma taking place to transfer the sound while the logo is moving?



Munchausen

Quote from: tjohnson on 07:53, 20 April 18
I don't really understand what I'm looking at, the logo moves stops then the sound plays then the screen changes and wobbles, is the dma taking place to transfer the sound while the logo is moving?


It's just a quick timing test.


After the logo, an unchanging screen is displayed (with red at top and bottom). Actually, the screen is displayed 50 times, copying from ROM, but it is just the same screen, so it appears that nothing has changed. Then the screen wobbles, and afterwards the same screen is displayed again. This time, the screen is again displayed 50 times, but now copying from ROM using DMA. Finally, the display goes green. Notice how much shorter the time to display the screen 50 times is with DMA than without.


I guess that in principle, this can be applied to any data being copied from ROM to RAM. I didn't really follow why it can't be used for sprites? Is it because it would have to be copied to a non-linear set of locations in video RAM? I guess that you could set up a number of small DMA transfers (one for each scan line of the sprite?) I guess it is not useful for small sprites because the setup time overhead would dominate the time taken for the transfer so it would be pointless?

doragasu

@Bryce Your initial guess is correct: I request the bus to the CPU, so the CPU is completely halted while the 16 KB are transferred. I could try not requesting the bus, and then inserting wait states (to pause Z80 while DMA does transfers), and also lowering the DMA transfer rate (e.g. 1 write each 2 us instead of 1 write per us) to let the CPU do things in parallel. But I am not sure that would be more useful than what I have running right now, because it would imply doubling the time to do a DMA transfer, while heavily slowing CPU speed (50%?) during the transfer. BTW I disagree when you say this is not DMA. It is not a requirement for DMA that the CPU runs in parallel (this is just a desirable feature), and in fact many old systems halt the CPU for doing ROM to RAM DMA (e.g.: the Gameboy Advance and the Genesis/Megadrive).
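
The trade-off described in that reply can be checked with a little arithmetic. This is a sketch using the figures from the post (1 us per byte with the CPU stopped, versus 1 write every 2 us with the CPU at roughly 50%); the 50% is the poster's own guess, and the function name is made up here.

```python
def cpu_time_available_us(window_us: float, transfer_bytes: int,
                          us_per_byte: float,
                          cpu_fraction_during_dma: float) -> float:
    """CPU time left in a fixed window that contains one DMA transfer."""
    dma_us = transfer_bytes * us_per_byte
    if dma_us > window_us:
        raise ValueError("transfer does not fit in the window")
    # Partial speed while the DMA runs, full speed for the rest of the window.
    return dma_us * cpu_fraction_during_dma + (window_us - dma_us)


if __name__ == "__main__":
    block = 16 * 1024
    window = 2.0 * block  # long enough for the slower, interleaved transfer
    burst = cpu_time_available_us(window, block, 1.0, 0.0)
    shared = cpu_time_available_us(window, block, 2.0, 0.5)
    print(f"burst:       {burst:.0f} us of CPU time")
    print(f"interleaved: {shared:.0f} us of CPU time")
```

Under those assumptions both schemes leave the CPU the same amount of useful time over the window, and the burst transfer delivers the data in half the time, which supports the point that interleaving would not obviously be more useful.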

@tjohnson What is happening is what @Munchausen has explained, with the addition of the "hadouken" audio sample play. The DMA can also be used to play digital audio samples, this time without halting the CPU (let's call this "Audio DMA"). When you start an Audio DMA transfer, samples are read from the specified ROM address and written directly to the "DAC" without disturbing the CPU operation (in fact I synchronize reads so the CPU can continue reading from ROM while the DMA engine also reads audio samples from the ROM!), and at the programmed sampling rate. Basically this allows playing audio samples with almost 0% CPU usage (just the writes to the DMA registers to start the operation).

@Munchausen About moving sprites, basically it is what you have already written: for this to be useful, the DMA engine needs to be aware of the screen layout, to be able to copy "square/rectangle" rom regions. You can program transfers for each line, but as starting a DMA transfer requires 6 writes to I/O region... Also for moving sprites, the DMA engine should have some masking capabilities (e.g. if the pixel colour is 0xF, skip write). I could implement these features on a bigger CPLD/FPGA, but I would not like to increase the cost of the cartridge.
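
The per-line setup cost can be illustrated with a quick break-even estimate. Only the 6 I/O writes per transfer and the 1 us per byte DMA rate come from this thread. The cost of one I/O write and the cost of a software copy per byte are ASSUMPTIONS chosen for illustration, so treat the resulting width as an order of magnitude and not a measurement.

```python
IO_WRITES_PER_TRANSFER = 6    # from the post above
DMA_US_PER_BYTE = 1.0         # from the thread
IO_WRITE_US = 6.0             # ASSUMPTION: load a register plus one OUT
SOFTWARE_US_PER_BYTE = 5.0    # ASSUMPTION: a reasonably fast copy loop


def dma_line_us(width_bytes: int) -> float:
    """One sprite line via DMA: setup writes plus the transfer itself."""
    return IO_WRITES_PER_TRANSFER * IO_WRITE_US + width_bytes * DMA_US_PER_BYTE


def software_line_us(width_bytes: int) -> float:
    """One sprite line copied by the CPU."""
    return width_bytes * SOFTWARE_US_PER_BYTE


def break_even_width(max_width: int = 256) -> int:
    """Smallest line width (in bytes) where DMA beats the software copy."""
    for width in range(1, max_width + 1):
        if dma_line_us(width) < software_line_us(width):
            return width
    raise ValueError("DMA never wins within max_width")


if __name__ == "__main__":
    print("break-even width in bytes:", break_even_width())
```

With these assumed costs a line has to be around ten bytes wide before a per-line DMA transfer pays off, and that is before any masking is considered, so small sprites would gain little.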



doragasu

Quote from: Munchausen on 23:40, 18 April 18
I wonder about the potential for it to operate along side the M4, perhaps they can be mapped separately?

I do not know about M4 internals, but unfortunately I don't think they can "cooperate" (e.g. use DMA to transfer data from M4 memory to internal CPC RAM) because current implementation only allows reading from the ROM embedded inside the cartridge.

Munchausen

Quote from: doragasu on 18:04, 22 April 18
@Munchausen About moving sprites, basically it is what you have already written: for this to be useful, the DMA engine needs to be aware of the screen layout, to be able to copy "square/rectangle" rom regions. You can program transfers for each line, but as starting a DMA transfer requires 6 writes to I/O region... Also for moving sprites, the DMA engine should have some masking capabilities (e.g. if the pixel colour is 0xF, skip write). I could implement these features on a bigger CPLD/FPGA, but I would not like to increase the cost of the cartridge.


Thanks for the explanation. I'm quite amazed how well this works. I don't know what the extra cost would be, but it would be very cool if you could DMA whole sprites, with masking!


Quote from: doragasu
I do not know about M4 internals, but unfortunately I don't think they can "cooperate" (e.g. use DMA to transfer data from M4 memory to internal CPC RAM) because current implementation only allows reading from the ROM embedded inside the cartridge.


I was thinking more that if you could have both connected at the same time, it would be possible to download things using the M4 and copy them into your DMA ROM expansion. But the M4 already maps 16 ROMs, so is it possible to perhaps map your expansion to roms 17-32?


I guess the next question would be, if you can copy to the AY with the only cost being the setup overhead, can you use it to do DMA to other peripherals? Though I'm not sure there is a really good use case.


Duke

Quote from: Munchausen on 22:58, 22 April 18
I was thinking more that if you could have both connected at the same time, it would be possible to download things using the M4 and copy them into your DMA ROM expansion. But the M4 already maps 16 ROMs, so is it possible to perhaps map your expansion to roms 17-32?
Actually M4 maps 0 to 31, but only if they are used, unused rom slots, can be used by any other hardware (no romdis or other action is taken by M4). So there should be no problems as long as it doesn't clash with the M4 rom number itself (default 6).

doragasu

I have currently mapped ROMs 0 to 511 on $DF00 and $DF01 IO addresses. Changing the mapping is just a matter of modifying the mux in the VHDL code, very easy. But you have to be careful with modifications because the CPLD is almost full (96% used so far) and I have yet to add a bit of code to allow writing to the Flash from the CPC itself. If the new mapping requires more logic, it might not fit.
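
ROM numbers 0 to 511 need 9 bits, which is presumably why two I/O addresses are involved. The sketch below shows one way such a selection could be split into two byte writes. The split itself (low 8 bits to $DF00, ninth bit to $DF01) is an ASSUMPTION made for illustration; the post does not say how the bits are distributed, and the function name is hypothetical.

```python
ROM_SELECT_LOW_PORT = 0xDF00    # address from the post
ROM_SELECT_HIGH_PORT = 0xDF01   # address from the post
MAX_ROM = 511


def rom_select_writes(rom_number: int) -> list[tuple[int, int]]:
    """Return (port, value) pairs for selecting a ROM.

    ASSUMPTION: low byte goes to $DF00 and bit 8 goes to $DF01.
    """
    if not 0 <= rom_number <= MAX_ROM:
        raise ValueError("ROM number must be in 0..511")
    return [
        (ROM_SELECT_LOW_PORT, rom_number & 0xFF),
        (ROM_SELECT_HIGH_PORT, rom_number >> 8),
    ]


if __name__ == "__main__":
    for port, value in rom_select_writes(300):
        print(f"OUT &{port:04X}, &{value:02X}")
```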

Currently I do not support DMA to the IO range. I'm starting to repeat myself a lot, but OK, I could implement it on a bigger CPLD/FPGA...

For the CPLD, currently I am using a Lattice LCMXO256C. It has 128 macrocells and costs 2 Eur. I could swap it for a LCMXO640C, that has 320 macrocells, and I suppose all the features we are discussing should fit inside it. But price rises from 2 Eur to 7.5 Eur. For that price maybe it would be better using a small FPGA instead of a CPLD. But that would require redesigning the PCB.

Currently the DMA is more of a proof of concept than something really useful. It happened just because a lot of the CPLD was unused and I wanted to do something with all that logic. But if this attracts interest, I might consider designing a second iteration of the cart with a simple sprite engine  :D

Munchausen

I for one would love it. DMA to any IO device would be awesome!


I started to wonder the other day about what can be achieved if the RAM is exchanged for dual port SRAM...

GUNHED

This project is getting more and more amazing. Will be fun to code a game for it using 1 us transfers.


First thing I would like to do is a video player ;-)


LambdaMikel

... finally Captain Future Video in HiRes TrueColor and 50 FPS  :D
Anybody signed up for doing the DIY 4 GB RAM extension yet?
