Interesting walkthrough video: coding a PET to play samples at 60 kHz

Started by zhulien, 18:18, 09 May 25


zhulien



Can a CPC do that as fast? If not, is the limit how quickly the CPC can feed the sound chip?

lightforce6128

A scanline takes 64 NOPs, there are 312 scanlines on one screen, and there are 50 screens in one second. Therefore 64 x 312 x 50 = 998'400 NOPs are available per second. To reach 60 kHz, each sample can use 998'400 / 60'000 = 16.64 NOPs. With some compromises a sample can be output in fewer than 16 NOPs.
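The budget calculation above can be redone for other sample rates. The following is only a small sketch of that arithmetic; the function names are made up for illustration.

```python
# Timing figures as stated in the post above.
NOPS_PER_SCANLINE = 64
SCANLINES_PER_FRAME = 312
FRAMES_PER_SECOND = 50


def nops_per_second() -> int:
    """Total NOP slots the CPU gets in one second."""
    return NOPS_PER_SCANLINE * SCANLINES_PER_FRAME * FRAMES_PER_SECOND


def nops_per_sample(sample_rate_hz: float) -> float:
    """NOP budget for outputting one sample at the given rate."""
    return nops_per_second() / sample_rate_hz


print(nops_per_second())        # 998400
print(nops_per_sample(60_000))  # 16.64
```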

If the PSG is initialized, then the following (unrolled) code sends the sample data:

LD C,#80
LD HL,sample_data
REPEAT block_length
    LD B,#F4 : OUTI      ;;  7
    LD B,#F6 : OUT (C),C ;;  6
    ;;                   ;; --
    ;;                   ;; 13
REND

Some drawbacks of this approach:
  • The sample data is not compressed at all, not even in the simplest way, so it needs a lot of memory.
  • The unrolled code also needs a lot of memory.
  • There will be not one but multiple blocks. Switching from one block to the next creates a short, possibly audible delay.
  • The correct protocol requires closing the data transfer from PPI to PSG with an 'OUT (C),0'. It seems this can be left out here (because we always send data to the same register), but this might depend on the chip version.
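To put rough numbers on the first two drawbacks, here is a sketch of the memory cost. It counts 2 bytes for each of the four instructions per sample in the unrolled listing (LD B,n / OUTI / LD B,n / OUT (C),C) and one byte of data per sample. The block length of 256 is only an assumed example value, and the function names are hypothetical.

```python
# Each unrolled iteration: LD B,n (2) + OUTI (2) + LD B,n (2) + OUT (C),C (2).
CODE_BYTES_PER_SAMPLE = 2 + 2 + 2 + 2


def unrolled_code_bytes(block_length: int) -> int:
    """Size of one unrolled block of output code."""
    return block_length * CODE_BYTES_PER_SAMPLE


def sample_data_bytes(sample_rate_hz: int, seconds: float) -> int:
    """Uncompressed sample data, one byte per sample."""
    return round(sample_rate_hz * seconds)


print(unrolled_code_bytes(256))        # 2048 (block length is an assumed example)
print(sample_data_bytes(60_000, 1.0))  # 60000
```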

If the Digiblaster, for example, is used instead of the internal PSG, there is no intermediate chip (Z80 -> PPI -> PSG), which makes the output code smaller and faster and also provides linear 8-bit instead of logarithmic 4-bit sound.

andycadley

The fastest way is to use a Plus: it can do three register writes per scanline via DMA, and then you have the CPU to augment that (although the shortcut above might not work with the Plus PPI, but it'll still be faster).

Orchestrating all of that, plus the memory requirements, would make it mostly impractical in real terms though.

eto

A second of sampled sound won't be very impressive anyway.

McArti0

CPC 6128, Whole 6128 and Only 6128, with .....
NewPAL v3 for use all 128kB RAM by CRTC as VRAM
One chip driver for 512kB(to640) extRAM 6128
TYPICAL :) TV Funai 22FL532/10 with VGA-RGB-in.

Longshot

Quote from: andycadley on Yesterday at 07:55: The fastest way is to use a Plus: it can do three register writes per scanline via DMA, and then you have the CPU to augment that (although the shortcut above might not work with the Plus PPI, but it'll still be faster).
This is of little interest, but to assess the technical limit, we can establish that on the Plus, it takes 8 µsec to modify a register. (The three DMAs do not operate in parallel with each other).

By generating an Hsync every 8 µsec, we can achieve a frequency of 125.8 kHz (or every 24 µsec with three active DMAs).

The ASIC must read one word per register write at each Hsync. Without looping, all usable main RAM holds 32,768 such words, which gives 32,768 x 8 = 262,144 µsec (approximately 13 frames = 0.26 sec  :-\ ).
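The estimate above can be checked with a few lines. This sketch assumes 64 KB of usable main RAM (which is what 32,768 two-byte words implies) and a frame of 312 scanlines of 64 µsec each. The constant names are made up for illustration.

```python
MAIN_RAM_BYTES = 64 * 1024   # assumed: 32,768 words of 2 bytes each
BYTES_PER_DMA_WORD = 2
US_PER_REGISTER_WRITE = 8    # figure given in the post above
US_PER_FRAME = 64 * 312      # 64 usec per scanline, 312 scanlines

dma_words = MAIN_RAM_BYTES // BYTES_PER_DMA_WORD
duration_us = dma_words * US_PER_REGISTER_WRITE
duration_frames = duration_us / US_PER_FRAME

print(dma_words)                  # 32768
print(duration_us)                # 262144
print(round(duration_frames, 2))  # 13.13
```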

The DMA REPEAT+LOOP instructions, however, allow looping with a loss of 8 µsec per loop.
Rhaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!!

lightforce6128

Quote from: eto on Yesterday at 15:29: A second of sampled sound won't be very impressive anyway.


This is one of the biggest problems with samples on an 8-bit system. (The other one is the logarithmic volume that only fits well with square signals and produces strange artifact noises for any other signal.)
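To illustrate the logarithmic volume problem, here is a rough sketch. It assumes that each of the 16 volume levels is about 3 dB (a factor of the square root of 2 in amplitude) louder than the one below and that level 0 is silent. Real chips deviate from this idealized curve, and the function names are hypothetical.

```python
def ay_level_amplitude(level: int) -> float:
    """Relative amplitude (0.0 to 1.0) of a 4-bit volume level.

    Assumption: about 3 dB per step, level 0 treated as silence.
    """
    if level == 0:
        return 0.0
    return 2.0 ** ((level - 15) / 2)


def nearest_ay_level(amplitude: float) -> int:
    """Pick the volume level whose amplitude is closest to a linear value."""
    return min(range(16), key=lambda lv: abs(ay_level_amplitude(lv) - amplitude))


# Under this assumption the upper half of the linear range
# is covered by only three levels (13, 14, 15).
print(nearest_ay_level(1.0))   # 15
print(nearest_ay_level(0.5))   # 13
print(nearest_ay_level(0.25))  # 11
```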

In another thread I asked how additional memory can be used. If, for example, 4 megabytes are used instead of 64 kilobytes, the duration rises from about a second to about a minute, still not enough for a normal song. If 30 kHz is used instead of 60 kHz, we reach about 2 minutes. If two nibbles are packed into one byte, we reach about 4 minutes, finally enough for a (short) song.
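The durations above follow from a simple division. A sketch of it, with a hypothetical helper name:

```python
def playback_seconds(memory_bytes: int, sample_rate_hz: int,
                     samples_per_byte: int = 1) -> float:
    """Playing time of uncompressed samples filling the given memory."""
    return memory_bytes * samples_per_byte / sample_rate_hz


KB = 1024
MB = 1024 * KB

print(round(playback_seconds(64 * KB, 60_000), 1))    # 1.1 seconds
print(round(playback_seconds(4 * MB, 60_000), 1))     # 69.9 seconds
print(round(playback_seconds(4 * MB, 30_000), 1))     # 139.8 seconds
print(round(playback_seconds(4 * MB, 30_000, 2), 1))  # 279.6 seconds
```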

lightforce6128

Quote from: McArti0 on Yesterday at 16:39
Quote from: lightforce6128 on Yesterday at 04:17: LD B,#F6 : OUT (C),C ;;  6
this is not necessary. INC B is enough

The OUTI instruction is tricky. I always get it wrong, and did so in this example as well. Register B is decremented, not incremented, and it is decremented before the data is transferred to the device. This means we have to start with the value #F5, which will output to #F4. From there, two INC instructions would be necessary to reach the required #F6. But at least some time can be saved by using register DE:

LD C,#80
LD DE,#F5F6
LD HL,sample_data
REPEAT block_length
    LD B,D : OUTI      ;;  6
    LD B,E : OUT (C),C ;;  5
    OUT (C),0          ;;  4
    ;;                 ;; --
    ;;                 ;; 15
REND

With this we get a bit of free time to send the missing 0 that completes the transfer, which should work on all chip versions and machines.
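The OUTI behaviour described above (B is decremented first, then the byte at HL goes to port BC) can be modelled in a few lines. This is only a toy model for checking port addresses, not an emulator. Flags are ignored and the function name is made up.

```python
def outi(b: int, c: int, hl: int, memory: bytes):
    """Toy model of Z80 OUTI: decrement B, then write (HL) to port BC.

    Returns the new B, the new HL, the port address used and the byte sent.
    """
    b = (b - 1) & 0xFF           # B is decremented before the port access
    port = (b << 8) | c          # the full 16-bit BC appears on the address bus
    value = memory[hl]
    hl = (hl + 1) & 0xFFFF
    return b, hl, port, value


sample_data = bytes([0x0A])

# Starting with B = #F5 addresses port #F4xx, as intended.
print(hex(outi(0xF5, 0x80, 0, sample_data)[2]))  # 0xf480

# Starting with B = #F4 (as in the first listing) ends up at #F3xx.
print(hex(outi(0xF4, 0x80, 0, sample_data)[2]))  # 0xf380
```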

McArti0


lightforce6128

Quote from: McArti0 on Today at 00:59: No. Writing to register F6 is completely unnecessary.

Thank you very much for this information. From an earlier experiment I remembered that sound was only there if at least 'LD BC,#F680 : OUT (C),C' was used after sending a sample. But this was a wrong interpretation. As you said, this is not necessary. It is enough (and required) to do this once at the beginning of the sample loop. With this we get:

LD B,#F5
LD HL,sample_data
REPEAT block_length
    OUTI  ;;  5
    INC B ;;  1
    ;;    ;; --
    ;;    ;;  6
REND

We can send samples at 166 kHz. Should be enough ...
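For comparison, the sample rates of the three loop variants in this thread (13, 15 and 6 NOPs per sample) can be derived from the 998'400 NOPs per second. A small sketch, with hypothetical names:

```python
NOPS_PER_SECOND = 64 * 312 * 50  # 998400


def sample_rate_hz(nops_per_sample: int) -> float:
    """Sample rate reached when one sample costs the given number of NOPs."""
    return NOPS_PER_SECOND / nops_per_sample


for label, nops in [("first listing", 13),
                    ("with OUT (C),0", 15),
                    ("OUTI + INC B", 6)]:
    print(f"{label}: {sample_rate_hz(nops) / 1000:.2f} kHz")
# first listing: 76.80 kHz
# with OUT (C),0: 66.56 kHz
# OUTI + INC B: 166.40 kHz
```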

In particular, this frees time to replace the long unrolled code with a normal loop. A minimum of decompression (4-bit nibbles instead of bytes) should also be possible.
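The nibble idea could be prepared on a PC before the data goes to the CPC. The sketch below packs two 4-bit samples into one byte and unpacks them again. Putting the first sample into the high nibble is an arbitrary assumption, as are the function names. The actual Z80 unpacking code is not shown here.

```python
def pack_nibbles(samples: list[int]) -> bytes:
    """Pack 4-bit samples two per byte, first sample in the high nibble."""
    packed = bytearray()
    for i in range(0, len(samples), 2):
        high = samples[i] & 0x0F
        # Pad with 0 if the number of samples is odd.
        low = samples[i + 1] & 0x0F if i + 1 < len(samples) else 0
        packed.append((high << 4) | low)
    return bytes(packed)


def unpack_nibbles(packed: bytes) -> list[int]:
    """Reverse of pack_nibbles: high nibble first, then low nibble."""
    samples = []
    for byte in packed:
        samples.append(byte >> 4)
        samples.append(byte & 0x0F)
    return samples


packed = pack_nibbles([1, 15, 8, 0])
print(packed.hex())            # 1f80
print(unpack_nibbles(packed))  # [1, 15, 8, 0]
```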
