Just chiming in with what I’ve found practical. My (far from optimised) player sends 4k samples to the 3 channels and keeps 2 frames' worth of DMA list per channel in RAM. It buffers in new lists every frame, and that takes 15-20% of frame time in total across the 3 channels. I can live with that processing time, and the hit on RAM is minimal.
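For a rough sense of why the RAM cost stays small, here's a back-of-envelope sketch. Only the 3 channels and the 2 buffered frames come from the numbers above; the rest are assumptions for illustration: "4k" read as roughly 4 kHz, a 50 Hz frame rate, and 4 bytes of list per sample (two 16-bit entries). The names `dma_list_bytes` and `next_fill_half` are made up for this example, not taken from the actual player.

```python
# Rough RAM estimate for double-buffered per-channel DMA lists.
# Figures marked "assumption" are illustrative, not measured values.

CHANNELS = 3            # from the post
FRAMES_BUFFERED = 2     # from the post: two frames' worth of list per channel
SAMPLE_RATE_HZ = 4000   # assumption: "4k" read as roughly 4 kHz
FRAME_RATE_HZ = 50      # assumption: 50 Hz display
BYTES_PER_SAMPLE = 4    # assumption: two 16-bit list entries per sample


def dma_list_bytes(channels=CHANNELS,
                   frames_buffered=FRAMES_BUFFERED,
                   sample_rate_hz=SAMPLE_RATE_HZ,
                   frame_rate_hz=FRAME_RATE_HZ,
                   bytes_per_sample=BYTES_PER_SAMPLE):
    """Total bytes of RAM taken by the buffered lists for all channels."""
    samples_per_frame = sample_rate_hz // frame_rate_hz
    return channels * frames_buffered * samples_per_frame * bytes_per_sample


def next_fill_half(frame_counter):
    """Index (0 or 1) of the buffer half to refill while the other half plays."""
    return (frame_counter + 1) % 2


if __name__ == "__main__":
    print(dma_list_bytes(), "bytes of DMA list in RAM")
    for frame in range(4):
        print("frame", frame, "-> refill half", next_fill_half(frame))
```

Under those assumptions the lists come to 1920 bytes for all three channels. Change the constants to match your own player and the figure scales linearly.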
The big hit is on storage, but like Ast says, if you use loops, have good compression and a decent unpack routine etc. and, most importantly, are using a 512k cart, it’s feasible to have all your sfx as samples.
Sound quality of the 4k samples really isn’t too bad!