Author Topic: Help - algorith for optimal sequence of codes (drawing compressed sprites)  (Read 3128 times)

0 Members and 1 Guest are viewing this topic.

Offline ssr86

  • CPC664
  • ***
  • Posts: 120
  • Country: pl
    • Awards
Does anyone here know an algorithm for finding the optimal sequence of codes for drawing a compressed sprite?
I'm trying to write a fairly optimal compressed sprite converter (in java). I have a basic one but I would like to optimize it. My problem is finding the optimal sequence of grouped codes. In grouped I mean:

for example the compressed sprite (at least potentially) can consist of codes O1-O8 and Ok (Opaque pixels, different value each), o1-o8 and ok (opaque pixels, same value each), m1,mk, Mk (masked) and T1-T4, Tk (transparent), eol (end of line). Each takes it's time and fetching next code takes also some time... How can I find the optimal sequence of form T4-m1-O2-o5-eol etc.?

Maybe you know what similar optimization problems are called?


like
0
No reactions

Offline arnoldemu

  • Supporter
  • 6128 Plus
  • *
  • Posts: 5.336
  • Country: gb
    • Unofficial Amstrad WWW Resource
    • Awards
Does anyone here know an algorithm for finding the optimal sequence of codes for drawing a compressed sprite?
I'm trying to write a fairly optimal compressed sprite converter (in java). I have a basic one but I would like to optimize it. My problem is finding the optimal sequence of grouped codes. In grouped I mean:

for example the compressed sprite (at least potentially) can consist of codes O1-O8 and Ok (Opaque pixels, different value each), o1-o8 and ok (opaque pixels, same value each), m1,mk, Mk (masked) and T1-T4, Tk (transparent), eol (end of line). Each takes it's time and fetching next code takes also some time... How can I find the optimal sequence of form T4-m1-O2-o5-eol etc.?

Maybe you know what similar optimization problems are called?
maybe you need to try each combination?

Give each operation a "weight" of how much time it takes.

Work it out and sum the weights, then go back and repeat the line with different choices. Find the smallest weight.
like
0
No reactions
My games. My Games
My website with coding examples: Unofficial Amstrad WWW Resource

Offline ssr86

  • CPC664
  • ***
  • Posts: 120
  • Country: pl
    • Awards
I think that with the "brute force" method you proposed it would take too much time to generate the sequence...

First I translate the sprite in a sequence of the form:
ttttmoOOoooOOmMe
where:
t - transparent byte
m - masked byte (different value than the previous byte)
M - masked byte (the same value as the previous byte)
o - opaque byte (different value than previous byte)
O - opaque byte (same value as the previous byte)

then I'm grouping that sequence into:
t4-m1-O3-o2-O3-M2-e
this grouping is optimal for compiled sprite
but for a compressed sprite, the optimal grouping would be:
t4-m1-o8-M2-e (if Iremember correctly...and that because the overhead of fetching the next code...)
but that when all the possible codes are choosable...
What if I can't use o8 - I don't have that code implemented (because I wanted the compressed sprite drawing codes take less mamory...)

One assumption would be to always have o1 and ok choosable...

With the brute force method, how many sequences there would be to check?
The problem here is with the oOOoooOO part...so

o1
from that I can (for a full alphabet):
1. add another o1 -> o1o1
2. group with previous as different values-> o2
2'. group with previous if same values -> O2 [possible here]
3. group as o(k=2)
3'. group as O(k=2) if possible

from 1.:
o1o1 can branch to
o1o1o1 [1]
o1O2 [2 - 2']
o1O(k=2) [3 - 3']

from 2':
O2 can branch to
O2O1 [1]
O3 [2-2']
O(k=3) [3-3']

and so on...
...
and later for example O3 can branch to O3o1, o4, o(k=4)
O3o1 to O3o1o1, O3o2, O3o(k=2)...
So there would be like 3^(length of the {o,O} sequence) sequences to check for every such subsequence of the same type of pixel bytes... So 3^[sprite's width in bytes] for one line max....So the algorithm is of number_of_lines*3^[width in bytes] complexity...(if I'm not wrong)...
... Maybe I'll try to implement that and see how long it takes...I haven't done that because I thought about the hours of waiting for the code to generate something for me to test and so on...

However, isn't there a more efficient way to do it?...
I thought about dinamical programming, but I never done that and I don't see a recursive dependence here...

From what I remember the "greedy algorithm" finds the optimal sequence but only in the case of a full "alphabet" (every code is choosable - o1...o8,ok,O2...O8,ok,m1-...). Although I'm not sure about even that...


Sorry for my chaotic post...
Really messy...
« Last Edit: 19:47, 18 December 13 by ssr86 »
like
0
No reactions

Offline ralferoo

  • Supporter
  • 6128 Plus
  • *
  • Posts: 1.035
  • Country: gb
    • Awards
Does anyone here know an algorithm for finding the optimal sequence of codes for drawing a compressed sprite?
Optimal for what?

Execution time? In that case generate instructions to write the data, e.g. LD BC,#1234: LD (HL),C: INC L: INC L: LD (HL),B... etc

Storage size? In that case, there are many approaches, the quickest is run length encoding, you could have a bitmask saying which bytes should be copied. Have alternating lengths of copy/transparent bytes. There are as many different ways to do this as you can think of. It all depends on your data and whether you're optimising for speed or space.

Generally, as soon as you start doing any kind of conditional code, you're going to be slower than otherwise as it can take longer to decide to skip a byte than to write it to memory!
Quote
Maybe you know what similar optimization problems are called?
Speed optimisation for the first. Compression for any of the later ones.
like
0
No reactions

Offline ssr86

  • CPC664
  • ***
  • Posts: 120
  • Country: pl
    • Awards
I have already implemented a "compiled sprites" and "standard sprites" generator. I have problems with writing an optimal (with the method I described which maybe isn't the best one but I could only think of something like that) one for "compressed sprites". I use terms "compiled sprites" and "compressed sprites" from arnoldemu examples.

I'm going for speed optimization. However compiled sprites take too much space for large ones, so I'm trying to write an "optimal" generator for compressed sprites to see the gain in speed (compressed sprites as in arnolemu examples are faster than "standard" ones) vs memory.

For similar optimization problems I meant something like "coin change-making", "the knacksack problem" or "finding longest common subsequence" for example... I asked that because I thought that maybe it's a common problem and someone already wrote papers about it and there's a solution algorithm...
« Last Edit: 18:21, 19 December 13 by ssr86 »
like
0
No reactions

Offline Grim

  • CPC6128
  • ****
  • Posts: 202
  • Country: gp
  • La pak ba, bèf ka pasé
    • THERE IS NO GAME
    • Awards
Rather than compiling/compressing entire sprite separately, why not breaking the problem to the line level? Bear with me, theoretical gibberish mode ON!

A byte of a sprite can be:
  • (M)asked: contains some pixels that require masking with the background
  • (O)paque: fully filled with pixels (no masking necessary)
  • (U)nused: contains no pixels or mask (empty)

For sprites of the same width, displayed line by line (which is faster than dealing with columns as long as no clipping is involved): For every line of every sprite you have, you build it's format description (eg. a string such as "UUMOOMOM") and store it into a temporary table if it's not already in there (so that your table contains only distinct line-formats).

With 3 possible states for a byte, M, O or U, we can calculate that the theoretic maximum number of distinct line-formats is:

3^<line-width in byte>

So for a 8 bytes sprite-width (quite large by CPC standards!), we can get 6561 possible line-formats. Doh! But do not worry, in practice, you will hopefully never find that much different line-formats in your sprites (unless your pixel artist pixelate at random :).

Let's say we end up with this table:

array(
   'UUMUUUMUU',
   'UMOMUMOMU',
   'MOOOMOOOM',
   'MOOOOOOOM',
   'UMMOOOOMU',
   'UUMOOOMUU',
   'UUUMOMUUU',
   'UUUUMUUUU'
)


There is 8 distinct line-formats for all our sprites. Now we need to generate one custom line drawing routine for each line-format in the table and also encode the sprite data accordingly.

The generated drawing routine for, say, the second line-format ("UMOMUMOMU") could be something like this:
Code: [Select]
draw_UMOMUMOMU:
  ; U
  inc de ; We simply move to the next byte on the line
 
  ; M
  ld a,(de) ; read screen (background)
  and (hl)  ; apply sprite-mask
  inc hl
  or (hl)   ; merge sprite-pixel
  inc hl
  ld (de),a ; write screen
  inc de
 
  ; O
  ldi ; Raw byte copy from sprite data to screen
 
  ; M
  ld a,(de)
  and (hl)
  inc hl
  or (hl)
  inc hl
  ld (de),a
  inc de
 
  ; U
  inc de
 
  ; M
  ld a,(de)
  and (hl)
  inc hl
  or (hl)
  inc hl
  ld (de),a
  inc de
 
  ; O
  ldi
 
  ; M
  ld a,(de)
  and (hl)
  inc hl
  or (hl)
  inc hl
  ld (de),a
  inc de
 
  ; U
  ;All Us at the end of the line can be ignored
 
  ; Some sort of "BC26" calculation here
 
  ret

So for that particular line-format, there is only 10 bytes worth of encoded sprite-data (pixel+mask). I spare you all the usual possible speed optimizations here (eg. aligned sprite data to use inc l instead of inc hl, etc).

Now that the drawing routine is defined, let's see the sprite data encoding. No big deal, we just store the mask and pixel data according to the line-format:
  • (U): No data, move on.
  • (M): We must store two bytes, one for the mask, one for the pixel data.
  • (O): We only store one byte for the pixel data.

And, nearly there, we must also generate, for each sprite, the list of line-drawing routines to use to display it:

Code: [Select]
structure_sprite1:
dw data_sprite1
dw draw_UUMUUUMUU
dw draw_UMOMUMOMU
dw draw_MOOOMOOOM
dw draw_MOOOOOOOM
dw draw_UMMOOOOMU
dw draw_UUMOOOMUU
dw draw_UUUMOMUUU
dw draw_UUUUMUUUU
dw draw_end

Assuming we can abuse the stack (ie. interrupts disabled), we can display a sprite with something like this:

Code: [Select]
draw_sprite1
; save program stack
ld (draw_end+1),sp
; set stack on sprite structure
ld sp,structure_sprite1
pop hl ; get the pointer on sprite data
ret ; Run line-drawing routines
; restore program stack
draw_end ld sp,0
ret

Aye! All in all, it **should be** obviously a bit slower than fully compiled sprite, but hopefully use less space (because all sprites will share their line drawing routines with each others whenever possible, but it all really depends on the sprite graphics in the end).


The method can be improved a little bit. When building the line-format table, for each (O) or (M) bytes, we could also look for the special case where the pixel and/or mask data stay the same in every sprites, so that they could be in-lined in the drawing routine rather than stored multiple time as sprite-data.

For lo-res sprites (mode 0), the mask only has two possible values (because we only have two pixels per byte at that resolution). So we could extend the line-format with two mask types, ML (Mask Left Pixel) and MR (Mask Right Pixel), so that we can generate a drawing routine with the two possible mask values in-lined, hence reducing the output to sprite-data. But, by doing so, the number of generated drawing routines might increase significantly, which can undermine all our efforts.

Holy. Fracking. Sh*t! MacDeath style post!!  :D
« Last Edit: 22:12, 19 December 13 by Grim »
like
0
No reactions

Offline ssr86

  • CPC664
  • ***
  • Posts: 120
  • Country: pl
    • Awards
Thank you, I like the idea very much.
I'll try to implement it and compare the needed time and memory to other methods that I have for now...

But maybe instead of whole lines, take k bytes, where k is a divider of sprite's line length? (my test sprite is 40 lines 16 bytes each)...

However, I still have to find the optimal code sequence for the previous ("byte-level"?) method......
like
0
No reactions

Offline Grim

  • CPC6128
  • ****
  • Posts: 202
  • Country: gp
  • La pak ba, bèf ka pasé
    • THERE IS NO GAME
    • Awards
Not taking whole line will complicate (read: slow down) things quite a bit because of the "bc26" routine (to calculate the screen address of the next line).

Oh ! Yes, about an optimal encoding for compressed sprite. Well, from what I gather, it looks very much like a job for a shortest path algorithm (Dijkstra, A*, BFS, ... any will do) applied to a weighted graph of all the possible (or at least, reasonable) encoding choices you encounter.

The most important part will be to devise a *good* weight function in order to yield the overall optimal path/list of "good" choices. The weight function takes as many inputs as you can provide to come up with a weight value that will indicate how good or bad a particular choice is, eg. taking into account the decoding speed or encoding output size or both.

There must be existing libs in Java to do all of these things without hassle already (well, except for the magic calculations of the weight function that is).
« Last Edit: 01:37, 20 December 13 by Grim »
like
0
No reactions

Offline ssr86

  • CPC664
  • ***
  • Posts: 120
  • Country: pl
    • Awards
Quote
Not taking whole line will complicate (read: slow down) things quite a bit because of the "bc26" routine (to calculate the screen address of the next line).
Yeah, I think speedwise k=length of line is the optimal, but for wider sprites it would take too much memory to have all the line drawing routines, so speed/memory ratio-wise a smaller k would be more "optimal".
As for the next line, for now I use a lookup table with even line addresses, so I don't have to calculate, but only (more or less) pop the address from a table. But I plan to implement also other methods... So in this case the next line calculation is not a problem I think...
Quote
Oh ! Yes, about an optimal encoding for compressed sprite. Well, from what I gather, it looks very much like a job for a shortest path algorithm (Dijkstra, A*, BFS, ... any will do) applied to a weighted graph of all the possible (or at least, reasonable) encoding choices you encounter.
I checked and the greedy algorithms work only in the case of a full choice "alphabet" (o1,o2,o3,...,ok)...When you can only use for example o1,o3,o5,ok then it doesn't produce the optimal grouping sequence...
Quote
The most important part will be to devise a *good* weight function in order to yield the overall optimal path/list of "good" choices. The weight function takes as many inputs as you can provide to come up with a weight value that will indicate how good or bad a particular choice is, eg. taking into account the decoding speed or encoding output size or both.

There must be existing libs in Java to do all of these things without hassle already (well, except for the magic calculations of the weight function that is).
I'll try to find something.
like
0
No reactions

Offline arnoldemu

  • Supporter
  • 6128 Plus
  • *
  • Posts: 5.336
  • Country: gb
    • Unofficial Amstrad WWW Resource
    • Awards
I was thinking of 1 line at a time.

You could also try:
- making a histogram for each method
- making a histogram of each byte
- making a histogram of each mask

Then using this you can allocate them to registers.

This information should help you to decide which method to take. Then you know the number of cycles for masking, drawing opaque, ignoring transparent and you can make a decision of which opcodes to emit, or which code paths to take.

Extending on Grim's idea which is great, knowing how many possible line drawing methods you need may also help you.
Identify each and count them up, then you can have some which are specific and some which are general.


like
0
No reactions
My games. My Games
My website with coding examples: Unofficial Amstrad WWW Resource

Offline fano

  • Supporter
  • 6128 Plus
  • *
  • Posts: 836
  • Country: fr
  • Easter Egg Programmer
    • Easter Egg
    • Awards
Rather than compiling/compressing entire sprite separately, why not breaking the problem to the line level? Bear with me, theoretical gibberish mode ON!
Hey you nearly broken my toy with this solution ! That could be a nice approach for mixing compiled/classics sprites.Just use compiled lines for mask layout and some (HL?) pointer to get pixels, a nice compromise between compiled and classic sprites.That could be used to solve horizontal clipping with compiled sprites, only vertical clipping/splitting would remain a problem.
like
0
No reactions
"NOP" is the perfect program : short , fast and (known) bug free

Follow Easter Egg products on Facebook !

Offline Grim

  • CPC6128
  • ****
  • Posts: 202
  • Country: gp
  • La pak ba, bèf ka pasé
    • THERE IS NO GAME
    • Awards
Hey you nearly broken my toy with this solution !
I thought you were past the age playing with toy, my bad :P

I checked and the greedy algorithms work only in the case of a full choice "alphabet" (o1,o2,o3,...,ok)...When you can only use for example o1,o3,o5,ok then it doesn't produce the optimal grouping sequence...
If you know that some cases (o2, o4, ...) are not possible, either discard them early while parsing the data (so you only get possible case: o1, o3, o5, ...) or put them in the graph anyway and make sure your weight function will assign them a very very bad result (so they've got no chance at all to be part of the optimal result). Why wouldn't that work?
like
0
No reactions

Offline ssr86

  • CPC664
  • ***
  • Posts: 120
  • Country: pl
    • Awards
Quote
If you know that some cases (o2, o4, ...) are not possible, either discard them early while parsing the data (so you only get possible case: o1, o3, o5, ...) or put them in the graph anyway and make sure your weight function will assign them a very very bad result (so they've got no chance at all to be part of the optimal result). Why wouldn't that work?
...
I'm too stupid to have thought of that... :'(
Yeah that should work ;D , thank you very much sir.
like
0
No reactions

Offline TotO

  • 6128 Plus
  • ******
  • Posts: 4.085
  • Country: fr
    • ?area=showdonations;u=4
    • Awards
I thought you were past the age playing with toy, my bad  :P
My fault, as I continue to provide him new sexy ones.  ;D
like
0
No reactions
"You make one mistake in your life and the internet will never let you live it down" (Keith Goodyer)

Offline ssr86

  • CPC664
  • ***
  • Posts: 120
  • Country: pl
    • Awards
The greedy algorithm doesn't produce the optimal output in this problem... :'(

Here's a counter-example:

"ooOOOOOO"

we choose from o1,o2,o3,o4,o5,o6,o7,o8,ok, O2,O3,O4,O5,O6,O7,O8,Ok

with wages:

oi = (2+2+18)*i+2,  i=1,...,8
ok = 6+(4+18+2+3)*k-1+2, k=1,2,...,104
Oi = 2+2+18*i+2, i=2,...,8
Ok = 10+(18+2+3)*k-1+2, k=1,2,...,104
fetch_next_code = 12

The greedy algorithm will produce "o8" as output with wage 178+12=190.
A better one is for example "o1O7" with wage (24+12)+(118+12)=36+130=166.

Or am I doing something wrong?
like
0
No reactions

Offline ralferoo

  • Supporter
  • 6128 Plus
  • *
  • Posts: 1.035
  • Country: gb
    • Awards
Firstly, just because it might help you if you're searching for further stuff online, what you call "wage" is usually called "cost" in English papers on this subject.

But yeah, this is the fundamental problem you need to solve whatever the compression problem, but usually the required amount of backtracking is small. Because it's usually "is a run of literals 'better' than a run of literals, some compression, another run of literals".  Your problem is still to vague to make that decision, because you're still thinking too abstractly, from what I understand (although I don't really understand your terminology).

For example, consider 8-bit run-length encoding. You have a byte that contains a "length from 0-127" and a bit that dete rmines between having n literals or the "next byte" n times. In that case, the encoding "123444567" could be (3,"123"),(3x"4"),(3,"567") or 4+2+4 bytes. The same could also be encoded as (9,"123444567") which is also 10 bytes longer. For this example, you could chose either.

However, "1234444567" looks like you'd always want to encode it as (3,"123"),(4x"4"),(3,"567") which is 10 bytes again rather than the literal which would be 10 bytes. However, depending on the data that follows, that might sway your decision. If the next block would be a literal block, it'd make sense to continue the literal block as the extra byte in the preceding literal would be smaller than the overhead to change coding, if it's a repeated block there's no advantage to using the longer literal version and so we can use the shorter form.

But, if we had a 16-bit RLE, the decision point shifts because there's an increased overhead to change encoding method. Likewise, if you're using a dynamic huffman-style there's probably less cost to changing mode.

The only thing you can really do is to work out what you're actually going to generate and again, as I said, what you're actually optimising for. If it's execution time, you'd want to consider that e.g. INC L takes 1 cycle, LD A,(DE): INC E:LD (HL),A:INC L takes 6 cycles. However, a conditional jump would take 2 or 3 cycles, a DJNZ would take 3 or 4 cycles and so a loop of INC E:DJNZ x is actually almost as slow as copying a literal. The copy loop is even slower as you're now taking 10 cycles per byte. Obviously, you've saved storage space, so if this is your only aim, do it like this. Whatever the shorter representation of data is wins.

On something like the CPC though, the speed is important. Even more so with a large sprite like you're using. So, as we said, the quickest approach is to create a function that plots all the pixels.

A better approach is a hybrid. You could create a list of functions that do a few things, e.g.
Code: [Select]
skip2: INC L: INC L: RET (5 cycles)
skip3: INC L: INC L: INC L: RET (6 cycles)
 copy2: LD A,(DE): INC E:LD (HL),A:INC L: LD A,(DE): INC E:LD (HL),A:INC L: RET (15 cycles)
copyskipcopy: LD A,(DE): INC E:LD (HL),A:INC L: LD A,(DE): INC E:LD (HL),A:INC L: INC L:LD A,(DE): INC E:LD (HL),A:INC L: LD A,(DE): INC E:LD (HL),A:INC L RET (28 cycles)
nextline: LD BC,#800-width: ADD HL,BC: RET (9 cycles)
nextline_skip1: LD BC,#801-width: ADD HL,BC: RET (9 cycles)
etc... These then could become your building blocks and you can then make a list of them. The simplest might be:
Code: [Select]
CALL copy2:CALL skip3:CALL copy2:CALL nextline (20 cycles + 15+6+16+9)
You could even be more clever, and put these on the stack, e.g.:
Code: [Select]
LD (sp_restore+1),SP ;6
DI ;1
LD SP,sprite1 ;4
RET ;3
sp_restore: LD SP,0 ;4
EI ;1
RET ;3

DEFW sp_restore
DEFW nextline, copy2, skip3, copy2 (15+6+16+9)
So, the overhead is bigger per sprite, but the per-operation cost is 5 cycles lower. As the overhead is 25 cycles, this cost is recouped after 5 blocks. The advantage with this approach is you can have special functions. e.g. storing a fixing byte value a couple of times rather than loading it from DE.

As I said before, this is something you need to work out for yourself, but that determination can only really be made once you know the sprite data you're copying - e.g. does it have a lot of gaps or blocks of static colour? Are there repeated elements? But most importantly, which do you value most, speed or space?
like
0
No reactions

Offline ssr86

  • CPC664
  • ***
  • Posts: 120
  • Country: pl
    • Awards
Quote
The only thing you can really do is to work out what you're actually going to generate and again, as I said, what you're actually optimising for. If it's execution time, you'd want to consider that e.g. INC L takes 1 cycle, LD A,(DE): INC E:LD (HL),A:INC L takes 6 cycles. However, a conditional jump would take 2 or 3 cycles, a DJNZ would take 3 or 4 cycles and so a loop of INC E:DJNZ x is actually almost as slow as copying a literal. The copy loop is even slower as you're now taking 10 cycles per byte. Obviously, you've saved storage space, so if this is your only aim, do it like this. Whatever the shorter representation of data is wins.

On something like the CPC though, the speed is important. Even more so with a large sprite like you're using. So, as we said, the quickest approach is to create a function that plots all the pixels.
I'm trying to write a tool for generating various possible sprite outputs. The user will decide if he wants to optimize for speed, memory or the speed/memory ratio.
Quote
A better approach is a hybrid. You could create a list of functions that do a few things, e.g.
That is what I think that I am doing.
For now the user can choose from skip1, skip2,skip3,skip4,skipn, masked_1-8, masked_n for different-valued bytes and same-valued bytes, the same for opaque. I haven't implemented combinations (skip-masked-opaque-skip etc) yet...
Quote
etc... These then could become your building blocks and you can then make a list of them. The simplest might be:
[...]
You could even be more clever, and put these on the stack, e.g.:
Currently (for compressed sprites) the only option is to use a jumptable of codes....
An example line genreated by the program:
Code: [Select]
db &06,&15,&AA,&55,&4E,&C2,&C0,&C1,&63,&FF,&57,&EB,&CF,&EB,&BB,&33,&FF,&15,&55,&AA,&09,&03
The jumptable looks like this:
Code: [Select]
.CprDR_JumpTable:
pop hl:ret:nop   ;; &00
jp CDR_NextLine       ;; &03
inc hl:jp (iy)       ;; &06
jp CDR_skip2       ;; &09
jp CDR_skip3       ;; &0C
jp CDR_skip4       ;; &0F
jp CDR_skipn       ;; &12
jp CDR_masked1       ;; &15
jp CDR_masked2dv       ;; &18
jp CDR_masked3dv       ;; &1B
jp CDR_masked4dv       ;; &1E
jp CDR_masked5dv       ;; &21
jp CDR_masked6dv       ;; &24
jp CDR_masked7dv       ;; &27
jp CDR_masked8dv       ;; &2A
jp CDR_maskedndv       ;; &2D
jp CDR_masked2sv       ;; &30
jp CDR_masked3sv       ;; &33
jp CDR_masked4sv       ;; &36
jp CDR_masked5sv       ;; &39
jp CDR_masked6sv       ;; &3C
jp CDR_masked7sv       ;; &3F
jp CDR_masked8sv       ;; &42
jp CDR_maskednsv       ;; &45
jp CDR_opaque1       ;; &48
jp CDR_opaque2dv       ;; &4B
jp CDR_opaque3dv       ;; &4E
jp CDR_opaque4dv       ;; &51
jp CDR_opaque5dv       ;; &54
jp CDR_opaque6dv       ;; &57
jp CDR_opaque7dv       ;; &5A
jp CDR_opaque8dv       ;; &5D
jp CDR_opaquendv       ;; &60
jp CDR_opaque2sv       ;; &63
jp CDR_opaque3sv       ;; &66
jp CDR_opaque4sv       ;; &69
jp CDR_opaque5sv       ;; &6C
jp CDR_opaque6sv       ;; &6F
jp CDR_opaque7sv       ;; &72
jp CDR_opaque8sv       ;; &75
jp CDR_opaquensv       ;; &78

I will repeat what's the method I'm currently using:

First I translate the sprite bytes to symbols t,o,O,m,M,e (when masking and transparency are on).
The letters meanings are:
t - skip (transparent byte)
m,M - masked bytes
o,O - opaque bytes
e is the end of line

For example the line
Code: [Select]
db &00,&00,&00,&FF,&FF,&C7,&C3,&FF,&C7,&FF,&FF,&DF,&82,&00,&00,&00
would be translated into: "tttoOoooooOomttte"

the upper/lower cases meaning is:
o - opaque with value different than the previous byte
O - opaque with the same value as the previous byte

The optimal grouping for compiled sprites is trivial, here: "t3-O2-o4-O2-o1-m1-e1".
I have problems with determining the optimal grouping for the compressed type.
The simple greedy algorithm don't work as I mentioned earlier.


Quote
Code: CALL copy2:CALL skip3:CALL copy2:CALL nextline (20 cycles + 15+6+16+9)
You could even be more clever, and put these on the stack, e.g.:
Code: LD (sp_restore+1),SP ;6
DI ;1
LD SP,sprite1 ;4
RET ;3
sp_restore: LD SP,0 ;4
EI ;1
RET ;3

DEFW sp_restore
DEFW nextline, copy2, skip3, copy2 (15+6+16+9)
So, the overhead is bigger per sprite, but the per-operation cost is 5 cycles lower. As the overhead is 25 cycles, this cost is recouped after 5 blocks. The advantage with this approach is you can have special functions. e.g. storing a fixing byte value a couple of times rather than loading it from DE.
Thanks for this:)

 
« Last Edit: 21:22, 22 December 13 by ssr86 »
like
0
No reactions

Offline ssr86

  • CPC664
  • ***
  • Posts: 120
  • Country: pl
    • Awards
Ok, I managed to implement the brute force algorithm (same-byte-type [ooOoOoo f.e.] strings not entire lines [tttmoOome f.e.]).
It doesn't take that much time as I feared it would take (actually it's suprisingly fast:D - at least for my test sprite) as many of the branches and possible output sequences can be trashed during the process because the resulting sequences have no chances of being optimal (their current cost is "out of bounds" of the optimal choice (optimal at the moment)).

Could try that in the first place...:/ ...but my experience with brute force algorithms was that of leaving the computer on for the entire night to let the program finish it's tasks...
Sorry for bothering you. However thank you for the few ideas (for methods) given here:)
« Last Edit: 02:26, 23 December 13 by ssr86 »
like
0
No reactions

Offline ssr86

  • CPC664
  • ***
  • Posts: 120
  • Country: pl
    • Awards
Wow, ralferoo I can't thank you enough for the hybrid method idea!
As for the speed criteria the drawing speed up was about 3000 nops (drawing sprite + restoring bg) for my test sprite (but I haven't implemented yet the stack method you proposed which is faster).
I haven't checked the change of memory usage but I'm sure the memory/speed trade off is better than for compiled sprites.
Thanks again. :D

Edit: It seems that there is a mistake in this part (the order of the addresses should be inverted).
Quote
DEFW sp_restore
DEFW nextline, copy2, skip3, copy2 (15+6+16+9)

Implemented the "stack+list_of_addresses" method and it saves about 800nops more for the test sprite (32x40 dual line mode0 pixels) .

For comparison (speed and memory (I'm not sure if the size is right) but there are still some possible optimizations):

JumpTable method - 13361 (draw sprite + save bg) + 8228 (restore bg) = 21589 nops and 664 + 167 + 3*(22+12) = 933 bytes (table, jump codes and sprite data)
List_of_calls - 11965 + 7497 = 19462 nops and 1387 + 695 = 2082 bytes (list of calls + sprite data)
List_of_calls+macros (for smaller blocks) - 11517 + 7049 nops = 18566 nops and 1436 + 642 = 2078 bytes (list of calls with macros in between and sprite data)
Stack+List_of_addresses - 10942 + 6889 = 17831 nops and 941 + 332 = 1273 bytes


 
« Last Edit: 19:12, 25 December 13 by ssr86 »
like
0
No reactions

Offline ralferoo

  • Supporter
  • 6128 Plus
  • *
  • Posts: 1.035
  • Country: gb
    • Awards
Yes, you're right. SP will increase as things are popped, so the addresses would be listed in the order you want them to run. Not sure why I swapped them round, must have been thinking about where the next mince pie was coming from... :)
like
0
No reactions