Interesting walkthrough video: coding a PET to play samples at 60 kHz

zhulien · 18:18, 09 May 25

Can a CPC do it that fast? If not, is the limit how fast the CPC can feed the sound chip?

lightforce6128 · 04:17, 10 May 25

A scanline takes 64 NOPs, there are 312 scanlines on one screen, and there are 50 screens in one second. Therefore 64 x 312 x 50 = 998'400 NOPs are available per second. To reach 60 kHz, each sample may use 998'400 / 60'000 = 16.64 NOPs. With some compromises, a sample can be output in fewer than 16 NOPs.
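The arithmetic above can be checked with a few lines of host-side Python. This is only a sketch that takes the post's assumptions as given: 1 NOP = 1 µs, 312 scanlines of 64 NOPs, and exactly 50 frames per second. (Strictly, 312 x 64 µs = 19,968 µs per frame, i.e. about 50.08 Hz, so the real budget is 1,000,000 NOPs per second; the difference hardly matters here.)

```python
# Timing budget sketch. Assumptions (from the post): 1 NOP = 1 us,
# 312 scanlines of 64 NOPs per frame, and exactly 50 frames per second.
NOPS_PER_LINE = 64
LINES_PER_FRAME = 312
FRAMES_PER_SECOND = 50


def nops_per_second() -> int:
    """Total NOP-sized time slots available per second."""
    return NOPS_PER_LINE * LINES_PER_FRAME * FRAMES_PER_SECOND


def nops_per_sample(sample_rate_hz: float) -> float:
    """NOPs available to output one sample at the given sample rate."""
    return nops_per_second() / sample_rate_hz


print(nops_per_second())                  # 998400
print(round(nops_per_sample(60_000), 2))  # 16.64
```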

Once the PSG is initialized, the following (unrolled) code sends the sample data:

LD C,#80
LD HL,sample_data
REPEAT block_length
    LD B,#F4 : OUTI      ;;  7
    LD B,#F6 : OUT (C),C ;;  6
    ;;                   ;; --
    ;;                   ;; 13
REND

Some drawbacks with this approach:

This applies no compression at all, not even the simplest, to the sample data, so it needs a lot of memory.
The unrolled code also needs a lot of memory.
There will be not one but several blocks. Switching from one block to the next creates a short, possibly audible delay.
The correct protocol requires closing the data transfer from PPI to PSG with an 'OUT (C),0'. It seems this can be left out here (because we always send data to the same register), but this might depend on the chip version.

If, for example, the Digiblaster is used instead of the internal PSG, there is no intermediate chip (Z80 -> PPI -> PSG), which makes the output code smaller and faster and also gives linear 8-bit instead of logarithmic 4-bit sound.
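To illustrate the difference between linear 8-bit and logarithmic 4-bit output, here is a hedged Python sketch that maps a linear 8-bit sample to the nearest AY volume level. The volume curve is an assumption, an idealised 3 dB per step with level 0 as silence. Real chips deviate from it.

```python
# Idealised AY volume curve (assumption): level 0 = silence, levels 1..15
# about 3 dB apart, normalised so that level 15 = 1.0.
STEP_DB = 3.0
AY_LEVELS = [0.0] + [10 ** ((n - 15) * STEP_DB / 20) for n in range(1, 16)]


def linear8_to_ay(sample: int) -> int:
    """Map an unsigned linear 8-bit sample (0..255) to the nearest AY volume (0..15)."""
    if not 0 <= sample <= 255:
        raise ValueError("sample must be in 0..255")
    amplitude = sample / 255
    return min(range(16), key=lambda n: abs(AY_LEVELS[n] - amplitude))


# The whole upper half of the 8-bit range collapses onto only a few AY levels.
upper_half = {linear8_to_ay(s) for s in range(128, 256)}
print(sorted(upper_half))  # [13, 14, 15]
```

Under this assumed curve, 128 different 8-bit values end up on just three volume levels, while the quiet end gets comparatively fine steps.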

andycadley · 07:55, 10 May 25

The fastest way is to use a Plus: it can do three register writes per scanline via DMA, and then you have the CPU to augment that (the shortcut above might not work with the Plus PPI, but it'll still be faster).

Orchestrating all of that, plus the memory requirements, would make it mostly impractical in real terms though.

eto · 15:29, 10 May 25

A second of sampled sound won't be very impressive anyway.

McArti0 · 16:39, 10 May 25

Quote from: lightforce6128 on 04:17, 10 May 25
LD B,#F6 : OUT (C),C ;; 6

this is not necessary. INC B is enough

Longshot · 18:03, 10 May 25

Quote from: andycadley on 07:55, 10 May 25
The fastest way is to use a Plus, it can do three register writes per scanline via DMA, and then you have the CPU to augment that (the shortcut above might not work with the Plus PPI, but it'll still be faster).

This is of little interest, but to assess the technical limit, we can establish that on the Plus it takes 8 µsec to modify a register. (The three DMA channels do not operate in parallel with each other.)

By generating an Hsync every 8 µsec, we can achieve a frequency of 125.8 kHz (or every 24 µsec with three active DMAs).

The ASIC must read one word per register write at each Hsync. Without looping, all usable main RAM holds 32,768 such words, which gives 32,768 x 8 = 262,144 µsec (approximately 13 frames = 0.26 sec).
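The figures above can be reproduced with a short calculation. This Python sketch only plugs in the post's assumptions as parameters: one register write per 8 µs, one 16-bit DMA word per write, 64 KB of RAM holding nothing but DMA words, and 20 ms per frame. It does not model the ASIC itself.

```python
# Back-of-envelope check of the Plus DMA figures (assumptions from the post).
US_PER_WRITE = 8          # one PSG register write per Hsync, Hsync every 8 us
RAM_BYTES = 64 * 1024     # all usable main RAM filled with DMA words
BYTES_PER_WORD = 2        # one 16-bit DMA word per register write
FRAME_US = 20_000         # one 50 Hz frame

words = RAM_BYTES // BYTES_PER_WORD
total_us = words * US_PER_WRITE

print(words)                           # 32768
print(total_us)                        # 262144
print(round(total_us / FRAME_US, 1))   # 13.1 (frames)
print(round(total_us / 1e6, 2))        # 0.26 (seconds)
print(1e6 / US_PER_WRITE)              # 125000.0 (writes per second)
```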

The DMA REPEAT+LOOP instructions, however, allow looping with a loss of 8 µsec per loop.

lightforce6128 · 23:11, 10 May 25

Quote from: eto on 15:29, 10 May 25
A second of sampled sound won't be very impressive anyway.

This is one of the biggest problems with samples on an 8-bit system. (The other one is the logarithmic volume, which only suits square waves and produces strange artifact noise with any other signal.)

In another thread I asked how additional memory can be used. If, e.g., 4 megabytes are used instead of 64 kilobytes, the duration rises from about a second to about a minute, still not enough for a normal song. At 30 kHz instead of 60 kHz, we reach 2 minutes. If two nibbles are packed into one byte, we reach 4 minutes, finally enough for a (short) song.
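The durations above follow from a single division. This is a Python sketch under the post's simplifications: every byte holds sample data (nothing is reserved for code, screen or firmware), and there is one sample per byte unless two nibbles are packed.

```python
# Playback duration for a given amount of sample memory (sketch).
# Assumptions: every byte holds sample data (no room reserved for code,
# screen or firmware), one sample per byte unless two nibbles are packed.
KB = 1024
MB = 1024 * KB


def duration_seconds(memory_bytes: int, rate_hz: int, samples_per_byte: int = 1) -> float:
    return memory_bytes * samples_per_byte / rate_hz


print(round(duration_seconds(64 * KB, 60_000), 2))         # 1.09 s
print(round(duration_seconds(4 * MB, 60_000) / 60, 2))     # 1.17 min
print(round(duration_seconds(4 * MB, 30_000) / 60, 2))     # 2.33 min
print(round(duration_seconds(4 * MB, 30_000, 2) / 60, 2))  # 4.66 min
```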

lightforce6128 · 23:20, 10 May 25

Quote from: McArti0 on 16:39, 10 May 25
Quote from: lightforce6128 on 04:17, 10 May 25
LD B,#F6 : OUT (C),C ;; 6
this is not necessary. INC B is enough

The OUTI instruction is tricky. I always get it wrong, and did so in this example too. Register B is decremented, not incremented, and it is decremented before the data is transferred to the device. This means we have to start with #F5, which will output to #F4. From there, two INC instructions would be necessary to reach the required #F6. But at least some time can be saved by using register pair DE:

LD C,#80
LD DE,#F5F6
LD HL,sample_data
REPEAT block_length
    LD B,D : OUTI      ;;  6
    LD B,E : OUT (C),C ;;  5
    OUT (C),0          ;;  4
    ;;                 ;; --
    ;;                 ;; 15
REND

This leaves a bit of spare time to send the missing 0 that completes the transfer, which should work on all chip versions and machines.
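The OUTI pitfall can be made concrete with a tiny register model. This Python sketch is a simplification: it tracks only B, C, HL and the port writes (no flags, no timing), and the helper names are hypothetical. It replays the DE-based loop and also shows where the first version's OUTI would have written.

```python
# Minimal model of OUTI and OUT (C),r: only B, C, HL and the port writes are
# tracked (flags and timing are ignored). OUTI decrements B FIRST, then sends
# (HL) to port BC, then increments HL.
def outi(regs, memory, writes):
    regs["B"] = (regs["B"] - 1) & 0xFF
    writes.append(((regs["B"] << 8) | regs["C"], memory[regs["HL"]]))
    regs["HL"] = (regs["HL"] + 1) & 0xFFFF


def out_c(regs, value, writes):
    writes.append(((regs["B"] << 8) | regs["C"], value))


def run_de_loop(memory, count):
    """Replay LD B,D : OUTI / LD B,E : OUT (C),C / OUT (C),0 for `count` samples."""
    regs = {"B": 0, "C": 0x80, "D": 0xF5, "E": 0xF6, "HL": 0}
    writes = []
    for _ in range(count):
        regs["B"] = regs["D"]             # LD B,D
        outi(regs, memory, writes)        # sample byte -> #F4xx (PPI port A)
        regs["B"] = regs["E"]             # LD B,E
        out_c(regs, regs["C"], writes)    # #80 -> #F6xx (PSG write)
        out_c(regs, 0, writes)            # #00 -> #F6xx (PSG inactive)
    return writes


for port, value in run_de_loop(bytes([3, 7]), 2):
    print(f"#{port >> 8:02X}xx <- #{value:02X}")

# With B loaded as #F4 (the first version), OUTI ends up on #F3xx instead.
regs = {"B": 0xF4, "C": 0x80, "HL": 0}
wrong = []
outi(regs, bytes([3]), wrong)
print(f"#{wrong[0][0] >> 8:02X}xx")  # #F3xx
```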

McArti0 · 00:59, 11 May 25

Quote from: lightforce6128 on 23:20, 10 May 25
Quote from: McArti0 on 16:39, 10 May 25
Quote from: lightforce6128 on 04:17, 10 May 25
LD B,#F6 : OUT (C),C ;; 6
this is not necessary. INC B is enough

required #F6.

No. Writing to register F6 is completely unnecessary.

lightforce6128 · 01:24, 11 May 25

Quote from: McArti0 on 00:59, 11 May 25
No. Writing to register F6 is completely unnecessary.

Thank you very much for this information. From an earlier experiment I remembered that there was only sound if at least 'LD BC,#F680 : OUT (C),C' was executed after sending a sample. But that was a wrong interpretation. As you said, this is not necessary: it is enough (and required) to do it once at the beginning of the sample loop. With this we get:

LD B,#F5
LD HL,sample_data
REPEAT block_length
    OUTI  ;;  5
    INC B ;;  1
    ;;    ;; --
    ;;    ;;  6
REND

We can send samples at 166 kHz. Should be enough ...
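For comparison, this small Python sketch turns the NOP counts of the three unrolled loop bodies posted so far into sample rates. It uses the 998,400 NOPs per second from the first post and ignores the cost of switching between blocks or of a real loop.

```python
# Sample rate reached by each unrolled loop body posted in this thread,
# assuming 998,400 NOPs per second (first post) and ignoring block switching.
NOPS_PER_SECOND = 998_400

LOOP_BODIES = {
    "LD B,#F4 : OUTI / LD B,#F6 : OUT (C),C": 13,
    "LD B,D : OUTI / LD B,E : OUT (C),C / OUT (C),0": 15,
    "OUTI / INC B": 6,
}


def rate_khz(nops_per_sample: int) -> float:
    return NOPS_PER_SECOND / nops_per_sample / 1000


for body, nops in LOOP_BODIES.items():
    print(f"{nops:2d} NOPs -> {rate_khz(nops):6.1f} kHz   {body}")
```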

Above all, this frees enough time to replace the long unrolled code with a normal loop. A minimal decompression step (4-bit nibbles instead of bytes) should also be possible.
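A possible nibble layout is sketched below in Python. The layout is an assumption (first sample in the high nibble). Note that a packed byte cannot be sent to a volume register as-is: bit 4 of the AY volume registers selects envelope mode, so on the Z80 each nibble has to be shifted or masked first, which costs NOPs per sample.

```python
# Two 4-bit samples per byte (sketch). Assumed layout: first sample in the
# high nibble, second in the low nibble; an odd-length input is padded with 0.
def pack_nibbles(samples):
    samples = list(samples)
    if any(not 0 <= s <= 15 for s in samples):
        raise ValueError("samples must be 4-bit values (0..15)")
    if len(samples) % 2:
        samples.append(0)
    return bytes((samples[i] << 4) | samples[i + 1] for i in range(0, len(samples), 2))


def unpack_nibbles(data):
    out = []
    for byte in data:
        out.append(byte >> 4)     # high nibble: needs shifting down on the Z80
        out.append(byte & 0x0F)   # low nibble: masked so bits 4-7 (bit 4 = envelope) stay clear
    return out


packed = pack_nibbles([3, 7, 15, 0, 9])
print(packed.hex())            # 37f090
print(unpack_nibbles(packed))  # [3, 7, 15, 0, 9, 0]
```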

BSC · 20:08, 11 May 25

Quote from: lightforce6128 on 01:24, 11 May 25
We can send samples at 166 kHz. Should be enough ...

What is the use of sending samples at such a high frequency? For audio that is supposed to be consumed by humans (and not dogs or bats, for example) 44 kHz is sufficient. See https://en.wikipedia.org/wiki/Nyquist_frequency. Everything higher than that is a waste of computing power and would also need audio data sampled at this crazy high rate.

Is this supposed to be some kind of competition for the retro-computer sample-replay world-record?
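As a quick illustration of the Nyquist argument: a sample rate fs can only represent frequencies up to fs/2, and content above that folds back. Since human hearing ends around 20 kHz, about 44 kHz is enough as long as the source was low-pass filtered before sampling. The Python sketch below computes where an ideal pure tone ends up. It is an idealised model, not a model of the CPC output stage.

```python
# Frequency folding (aliasing) sketch: a sampled tone above fs/2 reappears
# at a lower frequency, mirrored around multiples of the sample rate.
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    f = f_hz % fs_hz
    return min(f, fs_hz - f)


print(alias_frequency(15_000, 44_100))   # 15000: below fs/2, unchanged
print(alias_frequency(30_000, 44_100))   # 14100: folded back into the audible band
print(alias_frequency(30_000, 166_400))  # 30000: plenty of headroom at 166 kHz
```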

McArti0 · 21:52, 11 May 25

you didn't take dithering into account

GUNHED · 22:35, 11 May 25

Well, using Amdrum, LambdaSpeaks Amdrum Emulation or the CPC-Booster+ (also Digiblaster) can actually allow more than 60 kHz.

The upper limit should be somewhere around 240 kHz.

Prodatron · 11:10, 12 May 25

Quote from: McArti0 on 00:59, 11 May 25
No. Writing to register F6 is completely unnecessary.

Really? I mean, everyone in the history of the CPC has always used 3 OUTs for sending one "sample" to the PSG (see e.g. the current Arkos Tracker MOD player).

See here:
https://www.cpcwiki.eu/index.php/How_to_access_the_PSG_via_PPI#Writing_to_a_PSG_register

This part, which is using 3 OUTs...

ld b,&f4            ; setup register data on PPI port A
out (c),a           ;

ld bc,&f680         ; Tell PSG to write data on PPI port A into selected register
out (c),c           ;

ld bc,&f600         ; Put PSG into inactive state
out (c),c           ;

...is done in every sample player, see e.g. AT MOD player:

                ld b,e
                out (c),a      ;#f400 + value.
                ld b,#f6
                out (c),a      ;#f680
                out (c),0

Same for lightforce's first example.
Would be funny, if we were all wrong in the past.

andycadley · 12:38, 12 May 25

I have a sneaking suspicion you may not get audio on a Plus machine if you don't, but someone would have to try it to be sure. Shortcutting what you were supposed to do with the PPI was the top cause of incompatibilities.

MaV · 16:28, 12 May 25

Quote from: BSC on 20:08, 11 May 25
For audio that is supposed to be consumed by humans (and not dogs or bats, for example) 44 kHz is sufficient.

You can even lower the value considerably if you take the age of your audience into account.

Targhan · 20:15, 12 May 25

Quote from: andycadley on 12:38, 12 May 25
I have a sneaking suspicion you may not get audio on a Plus machine if you don't,

Yup, you're right!

By the way, we have a cow sound running at 44 kHz in the "CPC Meuuuuhting" party demo with SuperSylvestre (can't find the link, sorry). Well, it's not as impressive as you may think. We could have used a higher frequency, but you wouldn't hear the difference, as BSC mentioned. It was just made for fun.

McArti0 · 22:24, 12 May 25

Quote from: Prodatron on 11:10, 12 May 25
Quote from: McArti0 on 00:59, 11 May 25
No. Writing to register F6 is completely unnecessary.
Really?
Would be funny, if we were all wrong in the past.

org #4000

di
call init

ld d,#F5            ;high byte for IN A,(n): PPI port B (bit 0 = VSYNC)
ld bc,#F400

loop

  inc c
  out (c),c         ;only PA is written: the value goes straight into R0

vsync1
ld a,d : in a,(0) : rra : jr nc,vsync1   ;wait for VSYNC to start
vsync2
ld a,d : in a,(0) : rra : jr c,vsync2    ;wait for VSYNC to end

jr loop

ret

init
ld bc,#F782 ;PPI control: PA out, PB in, PC out
OUT (c),c

ld bc,#F407 ;PA has R sel (R7, mixer)
OUT (c),c

ld bc,#F6C0 ;PC move PA to sel R
OUT (c),c

ld bc,#F600 ;PC dis AY
OUT (c),c

ld bc,#F43e ;PA set (#3E: tone A on, tones B/C and noise off)
OUT (c),c

ld bc,#F680 ;PC open AY and connect PA with Rsel
OUT (c),c

ld bc,#F600 ;PC dis AY
OUT (c),c

; R8

ld bc,#F408 ;PA has R sel (vol A channel)
OUT (c),c

ld bc,#F6C0 ;PC move PA to sel R
OUT (c),c

ld bc,#F600 ;PC dis AY
OUT (c),c

ld bc,#F409 ;PA set (volume A = 9)
OUT (c),c

ld bc,#F680 ;PC open AY and connect PA with Rsel
OUT (c),c

ld bc,#F600 ;PC dis AY
OUT (c),c

; R0

ld bc,#F400 ;PA has R sel (R0, tone A period low byte)
OUT (c),c

ld bc,#F6C0 ;PC move PA to sel R
OUT (c),c

ld bc,#F600 ;PC dis AY
OUT (c),c

ld bc,#F400 ;PA set 0
OUT (c),c

ld bc,#F680 ;PC open AY and connect PA with Rsel
OUT (c),c
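
;PSG is left in write mode with R0 selected, so from now on
;every value written to port #F4xx (PA) goes straight into R0
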
ret

MaV · 22:56, 12 May 25

Quote from: Targhan on 20:15, 12 May 25
"CPC Meuuuuhting"

That one?

lightforce6128 · 23:41, 12 May 25

Quote from: lightforce6128 on 01:24, 11 May 25
We can send samples at 166 kHz. Should be enough ...

Quote from: BSC on 20:08, 11 May 25
Is this supposed to be some kind of competition for the retro-computer sample-replay world-record?

Quote from: McArti0 on 21:52, 11 May 25
you didn't take dithering into account

Dithering? ... Yes that is exactly what I had in mind: Dithering ...

No

In fact I remembered a glorious article about high-end sound with the C64 at 48 kHz (and a big bunch of memory). So yes, I had some kind of competition in mind. Also you are absolutely right: Everything above 44 kHz will not be audible.

Which brings us back to dithering: the idea of dithering is to add noise to a low-resolution signal (e.g. 1 bit or 4 bit) so that individual samples are pushed sometimes below and sometimes above the quantization levels. On average, this greatly increases the effective resolution and lowers or even removes the noise artifacts caused by the low resolution.

But the dithering noise itself is added to the signal. Although it is a neutral sound, it is still clearly audible unless you push it above about 16 kHz. At 44 kHz there is not much room to do this, but at 166 kHz there is. With this, the audio resolution can be increased, noise artifacts can be reduced, and the dithering noise can perhaps be lowered until it becomes inaudible.
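The basic effect of dithering can be shown numerically. This Python sketch uses plain random TPDF (triangular) dither of +/-1 LSB on a constant value. It illustrates only the averaging effect described above. It does not show the step of pushing the noise above the audible band (noise shaping).

```python
import random


# Random (TPDF) dither sketch: a constant 3.3 quantised to 4-bit levels.
# Without dither every sample becomes 3; with +/-1 LSB triangular noise the
# output jumps between neighbouring levels and its average tracks 3.3.
def quantise(x: float) -> int:
    return max(0, min(15, round(x)))


def dithered(x: float, rng: random.Random) -> int:
    noise = rng.random() - rng.random()   # triangular distribution on (-1, 1)
    return quantise(x + noise)


rng = random.Random(1)                    # fixed seed for a repeatable demo
target = 3.3
plain = [quantise(target) for _ in range(10_000)]
dith = [dithered(target, rng) for _ in range(10_000)]

print(sum(plain) / len(plain))            # 3.0
print(round(sum(dith) / len(dith), 2))    # close to 3.3
```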

However, professional equipment uses frequencies of more than 1 MHz for this trick.

lightforce6128 · 00:55, 13 May 25

In our Wiki I found this additional information about the sound chip AY-3-8912:

1) There are four different known versions of the chip used for the CPC. Maybe these versions behave differently, maybe not.

2) Internally the chip runs at 125 kHz. This means that even if its registers are updated at 166 kHz, it will still only update the volume at 125 kHz.

3) There are no additional control lines on the chip to acknowledge or synchronize with anything. This means: if the chip is programmed to read in register values written by the CPU through the PPI (BDIR=1, BC1=0), it will do so non-stop. The PPI remembers the last written value and keeps it available (latch). What is not clear is whether the PSG stops normal operation while registers are being written. From my observation, it does not care and simply continues its normal work. This means one can change register values (e.g. the volume level) during operation without changing the mode, which saves two OUT commands per sample.

4) Only the envelope shape register (#0D) somehow recognizes writing and restarts the envelope.
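Point 3 can be illustrated with a toy model. The following Python sketch rests on an assumption and is not an emulation. It assumes that PPI port C bits 7 and 6 drive BDIR and BC1 (#C0 = latch register number, #80 = write, #00 = inactive) and that, in write mode, the PSG keeps copying PPI port A into the selected register. Under that assumption, once the PSG is in write mode, only port A has to be written per sample. The class and method names are hypothetical.

```python
# Toy model of the PPI -> AY path (assumption: idealised, not cycle-accurate;
# chip revisions and the Plus may behave differently).
# PPI port C bits 7/6 drive BDIR/BC1; while the PSG is in "write" mode it keeps
# copying PPI port A into the selected register, as observed in point 3.
BUS_MODES = {0x00: "inactive", 0x40: "read", 0x80: "write", 0xC0: "select"}


class ToyPSG:
    def __init__(self):
        self.regs = [0] * 16
        self.selected = 0
        self.port_a = 0
        self.mode = "inactive"

    def out(self, port_high, value):
        """Simulate OUT to #F4xx (PPI port A) or #F6xx (PPI port C)."""
        if port_high == 0xF4:
            self.port_a = value
        elif port_high == 0xF6:
            self.mode = BUS_MODES[value & 0xC0]
        self._bus_cycle()

    def _bus_cycle(self):
        if self.mode == "select":
            self.selected = self.port_a & 0x0F   # upper address bits ignored here
        elif self.mode == "write":
            self.regs[self.selected] = self.port_a


psg = ToyPSG()
psg.out(0xF4, 8)       # PA = register number 8 (volume A)
psg.out(0xF6, 0xC0)    # BDIR=1, BC1=1: latch register number
psg.out(0xF6, 0x00)    # inactive
psg.out(0xF4, 3)       # first sample on PA
psg.out(0xF6, 0x80)    # BDIR=1, BC1=0: write mode, entered only once
for sample in (9, 15):
    psg.out(0xF4, sample)  # from now on only port A is written
    print(psg.regs[8])
```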

McArti0 · 06:13, 13 May 25

OUTI ;5
INC B ;1
NOP ;1
NOP ;1

1000/8=125kHz

McArti0 · 07:42, 13 May 25

Quote from: lightforce6128 on 23:41, 12 May 25
The idea of dithering is to mix a signal with ...

In other words, it is local PWM.

Samples 33333333444444444
Can be replaced with...
333334343434344444
This means that
33333 /3.5, 3.5, 3.5, 3.5/ 44444
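The "local PWM" pattern above is what a first-order error-feedback quantiser produces. This is a simple, deterministic form of noise shaping, not random dithering. The Python sketch below is one way to get that pattern and is not necessarily McArti0's method: each sample's rounding error is carried into the next one, so a run of 3.5 turns into alternating 4 and 3.

```python
# First-order error-feedback quantiser (simple noise shaping). This is one
# way to get the "local PWM" pattern, not necessarily the poster's method.
# The rounding error of each sample is carried into the next one.
def error_feedback(samples, lo=0, hi=15):
    out, err = [], 0.0
    for x in samples:
        v = x + err
        q = max(lo, min(hi, int(v + 0.5)))   # round half up
        err = v - q
        out.append(q)
    return out


signal = [3, 3, 3, 3.5, 3.5, 3.5, 3.5, 4, 4, 4]
print(error_feedback(signal))   # [3, 3, 3, 4, 3, 4, 3, 4, 4, 4]
```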

Targhan · 10:52, 13 May 25

Quote from: MaV on 22:56, 12 May 25
That one?

Probably the right meeting, but not this demo. Ours was subtitled "the vache from hell". Couldn't find it on CPC Power...

McArti0 · 11:22, 13 May 25

Quote from: Targhan on 20:15, 12 May 25
Quote from: andycadley on 12:38, 12 May 25
I have a sneaking suspicion you may not get audio on a Plus machine if you don't,
Yup, you're right!

Does any emulator show this?
It would be a very absurd design. Blocking the F4 port to switch something in F6? After all, the AY works the same way.