News:

Printed Amstrad Addict magazine announced, check it out here!

Main Menu
avatar_roudoudou

assembly LZ4 decrunch

Started by roudoudou, 08:57, 06 September 16

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

roudoudou

Hi there
I wrote a LZ4 decruncher in Z80 assembly last night


LZ4 page & description of the format:
GitHub - Cyan4973/lz4: Extremely Fast Compression algorithm
RealTime Data Compression: LZ4 explained


This version is compliant with the original LZ4 format but i think it might be interresting to make an adaptation for short length data like tiles in a game.
The purpose of this cruncher is decompression speed and good compression
++




;


; Original algorithm by Yann Collet
;; [url=http://cyan4973.github.io/lz4/]LZ4 - Extremely fast compression[/url]
;   This source is based on LZ4 explained
;;   [url=http://fastcompression.blogspot.fr/2011/05/lz4-explained.html]RealTime Data Compression: LZ4 explained[/url]
;
;   Z80 assembly version by edouard.berge (at) gmail (dot) com
;
;

;
;   As the Z80 cannot adress more than 64K in a row
;   the decrunching routine DO NOT handle length
;   bigger than 65535. You've been warned!
;
;
; hl' compressed data adress
; de' negative lenght of outputed data (set -256 to output 256 bytes)
; de  output adress of data
;


LZ4_decrunch





nextsequence
exx
ld a,(hl)
inc hl
ld (lzunpacklength+1),a
exx
and #F0
jr z,lzunpack ; no litteral bytes
srl a
rrca
rrca
rrca
ld b,0
ld c,a
cp 15 ; more bytes for length?
call z,getbytelength
push bc
exx
pop bc
push hl
ex hl,de
add hl,bc ; increase counter
ex hl,de
add hl,bc ; jump over litterals
exx
pop hl
ldir


lzunpack
exx
ld a,d
or e
ret z ; cause last byte is ALWAYS litterals!


; read 2 bytes offset
ld c,(hl)
inc hl
ld b,(hl)
inc hl
push bc
exx
pop hl
push de
ex hl,de
sbc hl,de
pop de


lzunpacklength ld a,#12
and #F
add 4
ld b,0
ld c,a
cp 19 ; more bytes for length?
call z,getbytelength
push bc
ldir


; update counter
exx
ex hl,de
pop bc
add hl,bc
ex hl,de
exx
jr nextsequence


; get additionnal length subroutine
getbytelength
exx
ld a,(hl)
inc hl
exx
cp 255
jr nz,mediumlength
inc b
dec bc
jr getbytelength


mediumlength
add a,c
ld c,a
ld a,b
adc a,0
ld b,a
; bc=length
ret
My pronouns are RASM and ACE

Docent

#1
Quote from: roudoudou on 08:57, 06 September 16
Hi there
I wrote a LZ4 decruncher in Z80 assembly last night


LZ4 page & description of the format:
GitHub - Cyan4973/lz4: Extremely Fast Compression algorithm
RealTime Data Compression: LZ4 explained


This version is compliant with the original LZ4 format but i think it might be interresting to make an adaptation for short length data like tiles in a game.
The purpose of this cruncher is decompression speed and good compression
++




;


; Original algorithm by Yann Collet
;; [url=http://cyan4973.github.io/lz4/]LZ4 - Extremely fast compression[/url]
;   This source is based on LZ4 explained
;;   [url=http://fastcompression.blogspot.fr/2011/05/lz4-explained.html]RealTime Data Compression: LZ4 explained[/url]
;
;   Z80 assembly version by edouard.berge (at) gmail (dot) com
;
;

;
;   As the Z80 cannot adress more than 64K in a row
;   the decrunching routine DO NOT handle length
;   bigger than 65535. You've been warned!
;
;
; hl' compressed data adress
; de' negative lenght of outputed data (set -256 to output 256 bytes)
; de  output adress of data
;


LZ4_decrunch





nextsequence
exx
ld a,(hl)
inc hl
ld (lzunpacklength+1),a
exx
and #F0
jr z,lzunpack ; no litteral bytes
srl a
rrca
rrca
rrca
ld b,0
ld c,a
cp 15 ; more bytes for length?
call z,getbytelength
push bc
exx
pop bc
push hl
ex hl,de
add hl,bc ; increase counter
ex hl,de
add hl,bc ; jump over litterals
exx
pop hl
ldir


lzunpack
exx
ld a,d
or e
ret z ; cause last byte is ALWAYS litterals!


; read 2 bytes offset
ld c,(hl)
inc hl
ld b,(hl)
inc hl
push bc
exx
pop hl
push de
ex hl,de
sbc hl,de
pop de


lzunpacklength ld a,#12
and #F
add 4
ld b,0
ld c,a
cp 19 ; more bytes for length?
call z,getbytelength
push bc
ldir


; update counter
exx
ex hl,de
pop bc
add hl,bc
ex hl,de
exx
jr nextsequence


; get additionnal length subroutine
getbytelength
exx
ld a,(hl)
inc hl
exx
cp 255
jr nz,mediumlength
inc b
dec bc
jr getbytelength


mediumlength
add a,c
ld c,a
ld a,b
adc a,0
ld b,a
; bc=length
ret


Nice work!

I believe the main goal for lz4 is decompression speed, not the compression ratio - there are a few compressors that compress slightly better than lz4 like pucrunch, exomizer or bitbuster.

I did an lz4 decompressor z80 implementation some time ago.
If someone is interested here're its features:
- decompresses files packed by lz4 command line packer and raw lz4 compressed data.
- supports legacy and the newest lz4 file format (except framing)
- fully relocatable code
- total size of 219 bytes size with raw decompression routine below 110 bytes
- cpc firmware friendly (no alternate register set used)
- ROM friendly - no self modifying code or additional memory required 
- simple interface - only start of file in memory and target memory destination required.

and here's the download link: Index of /download/z80

I did quick speed comparison on a Cybernoid II title screen packed with lz4 and my implementation decompresses 10% faster than yours (969564 cycles against 1057880), so there is some place for optimization in your code :) For example, you could speed it up by replacing call z, getbytelength with actual called routine.

roudoudou

nice work too!


there is a little algorithm difference between our respective code (and a part of slowdown is due to this)


i provide the size of outputed data to mine, you provide the compressed size to yours


cause i was already thinking about many independant compressed tiles ;)


i will share your source URL on Yann Collet website!





My pronouns are RASM and ACE

Docent

Quote from: roudoudou on 22:29, 06 September 16
nice work too!

there is a little algorithm difference between our respective code (and a part of slowdown is due to this)

i provide the size of outputed data to mine, you provide the compressed size to yours

cause i was already thinking about many independant compressed tiles ;)

i will share your source URL on Yann Collet website!

Thanks,  actually I spoke with Yann about it a few years ago and it seems that in the end I forgot to send him the source :)
Anyway, I took my code and ran through the profiler and I managed to optimize it a bit more - now it decompresses the same screen I mentioned earlier in 799484 cycles. The following routine, copying 16k of ram takes 787016 cycles so the decompression is pretty fast I'd say..

   ld hl, datatocopy
   ld de, #c000
   ld bc, #4000
   dec bc
   ld a,b
   ld b,c
   ld c,a
   inc b
   inc c
loop:
   ld a,(hl)
   ld (de),a
   inc hl
   inc de
   djnz loop
   dec c
   jr nz,loop

Of course simple ldir at 393256 cycles cant be beaten, but I'm pretty happy with the result... Moreover, I cant see any place for optimization :) Later today I'll clean up the source and put it under the link I posted earlier.
As for tile compression, please remember that block with the size below 13 bytes cannot be compressed, so lz4 might not fit for tile compression especially for smaller one, like 8x8 etc.

roudoudou

#4
Quote from: Docent on 16:58, 10 September 16
As for tile compression, please remember that block with the size below 13 bytes cannot be compressed, so lz4 might not fit for tile compression especially for smaller one, like 8x8 etc.


i'm currently working on a modified version of LZ4, it's pretty good! A uniform tile fit in 5 bytes  ;D


i removed some limitations (4 bytes always literals is a nonsense on Z80 since we do not have enough memory to perform those kind of hash, 5 last bytes always literals has sense only if you want to decrunch in place, which is absolutely NOT the goal of a tile compressor)


as i want to manage many small datas, i also changed 16bits offset to 8bits offset


So it is not LZ4 anymore, welcome to LZ48!


I'll need another topic!


My pronouns are RASM and ACE

Powered by SMFPacks Menu Editor Mod