
YAZD - Yet Another Z80 Disassembler

Started by Fessor, 15:40, 17 February 16


Fessor

Searching for a reassembler that can label a disassembly, I stumbled upon this:

GitHub - toptensoftware/yazd: Yet Another Z80 Disassembler

Quote
YAZD is a simple command line disassembler for Z80 binary code files.  It's based on the disassembler in z80ex, ported to C#.
YAZD supports the following:

  • Disassembly of all Z80 instructions, as supported by z80ex.
  • Code path analysis can usually tell the difference between code and data.
  • Generates labelled assembly language listings.
  • Can also generate more detailed listing files with byte code and assembly source.
  • Can detect procedure boundaries and generate call graphs.
  • Can generate reference listings to all external addresses and I/O ports.
  • Can highlight all word literals (useful for finding other memory address references).
  • Can generate plain text or hyperlinked HTML output files.
  • Handles references to addresses not aligned with an instruction boundary (e.g. self-modifying code).
  • Data segments are listed one DB byte per line, with the ASCII character in a comment.

....

Example Output

Typical source-mode output.  Note that YAZD is normally smart enough to tell the difference between local labels and procedure boundaries, which helps divide up the listing and makes it easier to understand.
        ORG     0900h

        ; Entry Point
        ; --- START PROC L0900 ---
L0900:  LD      A,0Ah
        OUT     (0Ch),A
        LD      A,2Fh           ; '/'
        OUT     (0Dh),A
        LD      HL,0F800h
        LD      C,80h
L090D:  LD      E,C
        LD      D,03h
L0910:  XOR     A
        BIT     0,E
        JR      Z,L0917
        OR      0F0h
L0917:  BIT     1,E
        JR      Z,L091D
        OR      0Fh
L091D:  LD      B,05h
L091F:  LD      (HL),A
        INC     HL
        DJNZ    L091F
        RRC     E
        RRC     E
        DEC     D
        JR      NZ,L0910
        LD      (HL),A
        INC     HL
        INC     C
        JR      NZ,L090D
        JP      L1042

        ; --- START PROC L0932 ---
L0932:  PUSH    BC
        LD      C,A
        LD      B,A
        LD      A,12h
        OUT     (0Ch),A
        LD      A,B
        RRCA
        RRCA
        RRCA
        RRCA

I think that with minimal effort it could be adapted so that the output is more Maxam-friendly.
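To show the kind of post-processing that might be involved, here is a minimal Python sketch that rewrites YAZD-style hex literals such as 0F800h into Amstrad-style &F800. It assumes that the target assembler prefers &-prefixed hex; that is my assumption, not something YAZD or this thread states. The function name to_maxam_hex is hypothetical and is not part of YAZD. Comments after ';' are left untouched so that the ASCII annotations survive.

```python
import re

# YAZD-style hex literal: a leading decimal digit, more hex digits, an "h" suffix
# (e.g. 0Ah, 2Fh, 0F800h). The \b boundaries stop it matching inside labels like L090D.
HEX_SUFFIX = re.compile(r"\b([0-9][0-9A-Fa-f]*)[hH]\b")

def to_maxam_hex(line):
    """Hypothetical filter: rewrite NNNNh literals as &NNNN, assuming &-prefixed hex."""
    def repl(m):
        # Drop the padding zero(s) the h-suffix style needs, keeping at least one digit.
        digits = m.group(1).upper().lstrip("0") or "0"
        return "&" + digits
    # Only touch the code part; keep any ';' comment exactly as it was.
    code, sep, comment = line.partition(";")
    return HEX_SUFFIX.sub(repl, code) + sep + comment

for src in ("        LD      HL,0F800h",
            "        LD      A,2Fh           ; '/'",
            "L090D:  LD      E,C"):
    print(to_maxam_hex(src))
```

Other differences, such as directive names or label rules, would need similar small filters. Which ones Maxam actually requires is not covered here.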


Alcoholics Anonymous

#1
Quote from: Fessor on 15:40, 17 February 16
Searching for a reassembler that can label a disassembly, I stumbled upon this:

GitHub - toptensoftware/yazd: Yet Another Z80 Disassembler

I think that with minimal effort it could be adapted so that the output is more Maxam-friendly.


It seems to be a pretty good disassembler.

I gave it a quick go on a sudoku solver for CP/M:  sudoku-cpm.zip - Google Drive

The solver is written in C, and the assembly generated by the compiler is in "sudoku.opt".  Scroll down to the first translated function, "_mark", on line 647 of "sudoku.opt".  It corresponds to the disassembly in "out.html" beginning at label "l0e9f" (do a text search to find it), so you can compare the two side by side.  A little further down in the disassembly, you will see that it correctly identified the "count" array and listed it as defb.

The disassembly was generated with this:

yazd sudoku.com --addr:0x0100 --entry:0x0100 --html > out.html

yazd disassembles the complete program, which includes all the library code and drivers that become part of the compiled C program.  I had a look through that too.

The first thing the program does is make sure it's running on a Z80:


    ORG    0100h

    ; Entry Point
    ; --- START PROC L0100 ---
L0100:    sub    a
    jp    po,L0118
    ld    c,09h
    ld    de,010Dh
    call    L0299
    rst    0x00

L010D:    DB    7Ah    ; 'z'
    DB    38h    ; '8'
    DB    30h    ; '0'
    DB    20h    ; ' '
    DB    6Fh    ; 'o'
    DB    6Eh    ; 'n'
    DB    6Ch    ; 'l'
    DB    79h    ; 'y'
    DB    0Dh
    DB    0Ah
    DB    24h    ; '$'

L0118:    ld    sp,(0006h)


It correctly disassembles the embedded text string.
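The check works because "sub a" leaves the P/V flag in different states on the two CPUs. After arithmetic, the Z80 treats P/V as an overflow flag. The 8080 treats the same bit as a parity flag. A minus A is 0 with no overflow, so on a Z80 the flag is clear ("parity odd", PO) and "jp po,L0118" is taken. On an 8080, 0 has even parity, so the flag is set and execution falls through. The program then prints the string at L010D with CP/M BDOS function 9 (C=09h, DE=010Dh, apparently via L0299, which the call graph shows calling 0005h) and warm-boots with "rst 0x00". The Python sketch below models only those two flag rules. It is not an emulator, and the function names are my own.

```python
def parity_even(value):
    """8080 P flag: set when the 8-bit result has an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0

def sub_overflow(a, b):
    """Z80 P/V flag after SUB: set on signed (two's complement) overflow."""
    result = (a - b) & 0xFF
    # Overflow iff the operands' signs differ and the result's sign differs from a's.
    return bool((a ^ b) & (a ^ result) & 0x80)

def jp_po_taken(cpu, a):
    """Model 'sub a' followed by 'jp po,L0118' on the given CPU."""
    result = (a - a) & 0xFF  # always 0
    if cpu == "z80":
        flag = sub_overflow(a, a)
    elif cpu == "8080":
        flag = parity_even(result)
    else:
        raise ValueError("unknown CPU: " + cpu)
    return not flag  # PO ("parity odd") means the flag is clear

# Bytes at L010D; BDOS function 9 prints them up to the '$' terminator.
MESSAGE = bytes([0x7A, 0x38, 0x30, 0x20, 0x6F, 0x6E, 0x6C, 0x79, 0x0D, 0x0A, 0x24])

for cpu in ("z80", "8080"):
    if jp_po_taken(cpu, 0x5A):  # the initial value of A doesn't matter
        print(cpu, "-> jumps to L0118")
    else:
        print(cpu, "-> prints", repr(MESSAGE.decode("ascii").split("$")[0]), "and does rst 0")
```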

Then I spotted a few places that were not disassembled, for example:


    dec    hl
    pop    de
    ret

L0258:    DB    0E5h
    DB    5Eh    ; '^'
    DB    23h    ; '#'
    DB    56h    ; 'V'
    DB    0EBh
    DB    09h
    DB    0EBh
    DB    23h    ; '#'
    DB    7Eh    ; '~'
    DB    23h    ; '#'
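Those DB bytes are ordinary instructions. The deliberately tiny Python lookup table below covers only the single-byte opcodes that occur in this fragment, so it is not a general Z80 decoder. It turns the bytes starting at L0258 back into mnemonics:

```python
# Only the single-byte opcodes that appear in the L0258 fragment.
OPCODES = {
    0xE5: "push hl",
    0x5E: "ld e,(hl)",
    0x23: "inc hl",
    0x56: "ld d,(hl)",
    0xEB: "ex de,hl",
    0x09: "add hl,bc",
    0x7E: "ld a,(hl)",
}

def decode(data, org):
    """Decode a run of single-byte opcodes into 'ADDR  mnemonic' lines."""
    lines = []
    for offset, byte in enumerate(data):
        mnemonic = OPCODES.get(byte, f"db {byte:02X}h  ; not in table")
        lines.append(f"{org + offset:04X}  {mnemonic}")
    return lines

fragment = bytes([0xE5, 0x5E, 0x23, 0x56, 0xEB, 0x09, 0xEB, 0x23, 0x7E, 0x23])
for line in decode(fragment, 0x0258):
    print(line)
```

Read this way, the fragment loads a 16-bit value from (hl) into de and adds bc to it, which is clearly code rather than data.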


These places were driver code, or code called only by drivers.  Driver code is only called via function pointers, so static code analysis cannot identify it, which is why it is missed here.  yazd lets you add more "--entry" points to tell it that a block of code is executed and should be disassembled.  I think that with a few of these added, it would manage to disassemble the rest of the driver-related code.
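A minimal sketch of recursive-traversal disassembly shows why this code is missed. Recursive traversal is the general technique behind code path analysis; the sketch is not YAZD's actual implementation. The toy instruction model and addresses are hypothetical. The traversal follows fall-through and statically known branch targets from the entry points, and anything it never reaches stays data. A routine that is reached only through an indirect jump (a function pointer) is classified as code only once its address is supplied as an extra entry point, which is the role "--entry" plays.

```python
from collections import deque

# Hypothetical toy program: addr -> (size, kind, target).
#   "seq"  falls through to the next instruction
#   "jp"   unconditional jump to target
#   "jpc"  conditional jump: target or fall through
#   "call" call target, then fall through
#   "jpi"  indirect jump (e.g. jp (hl)): target unknown statically
#   "ret"  return: no static successor
PROGRAM = {
    0x0100: (3, "call", 0x0110),
    0x0103: (1, "jpi", None),   # dispatch through a driver function pointer
    0x0110: (1, "seq", None),
    0x0111: (1, "ret", None),
    0x0120: (1, "seq", None),   # driver routine, only reached via the pointer
    0x0121: (1, "ret", None),
}

def find_code(program, entries):
    """Return the set of instruction addresses reachable from the entry points."""
    code = set()
    work = deque(entries)
    while work:
        addr = work.popleft()
        if addr in code or addr not in program:
            continue
        code.add(addr)
        size, kind, target = program[addr]
        if kind in ("seq", "jpc", "call"):
            work.append(addr + size)       # fall-through successor
        if kind in ("jp", "jpc", "call") and target is not None:
            work.append(target)            # statically known branch target
        # "jpi" and "ret" add nothing: their targets aren't known statically.
    return code

print(sorted(f"{a:04X}h" for a in find_code(PROGRAM, [0x0100])))
print(sorted(f"{a:04X}h" for a in find_code(PROGRAM, [0x0100, 0x0120])))
```

With only the real entry point, 0120h and 0121h stay unclassified. Adding 0120h as a second entry point brings them in.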

A couple of other interesting things:


L0651:    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'
    DB    77h    ; 'w'
    DB    23h    ; '#'

    ; --- START PROC L0669 ---
L0669:    ld    (hl),a
    inc    hl
    ld    (hl),a
    inc    hl
    ld    (hl),a
    inc    hl
    ld    (hl),a
    inc    hl
    ret


This is actually a long subroutine that initializes variable-size data structures:


PUBLIC l_setmem_hl

   ; write byte to buffer pointed at by hl
   ; invoke with "call l_setmem_hl - (n*2)" to write n bytes to memory
   ;
   ; enter : hl = char *buffer
   ;          a = fill byte
   ;
   ; exit  : hl = char *buffer (one byte past last byte written)
   ;
   ; uses  : hl

   ld (hl),a
   inc hl
   ld (hl),a
   inc hl
   ld (hl),a
   inc hl
   ld (hl),a
   inc hl

   ld (hl),a
   inc hl
   ld (hl),a
   inc hl
   ld (hl),a
   inc hl
   ld (hl),a
   inc hl

   ld (hl),a
   inc hl
   ld (hl),a
   inc hl
   ld (hl),a
   inc hl
   ld (hl),a
   inc hl

   ld (hl),a
   inc hl
   ld (hl),a
   inc hl
   ld (hl),a
   inc hl
   ld (hl),a
   inc hl

l_setmem_hl:

   ret


It's intended to be called as "call l_setmem_hl - n*2", where "n" is the number of bytes you want initialized; each "ld (hl),a / inc hl" pair is two bytes long, hence the factor of 2.  Having this one subroutine for all data structure initialization saves code size.

What's interesting about the disassembly is that it shows which part of this function is actually used: the "ld (hl),a; inc hl" pairs that are never executed remain as defb.
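The entry-point arithmetic can be checked with a small Python sketch. The opcodes are the ones visible in the listing (77h for "ld (hl),a", 23h for "inc hl") plus C9h for "ret". The address 0671h for l_setmem_hl is inferred, not stated anywhere: it is L0669 plus four 2-byte pairs. The sketch emulates a call that enters n pairs before the ret. It shows that the START PROC label YAZD found, L0669, corresponds to n = 4, and that the start of the DB run, L0651, would correspond to n = 16. The helper names are hypothetical.

```python
# Opcodes of the three instructions in l_setmem_hl.
LD_HL_A = 0x77  # ld (hl),a
INC_HL = 0x23   # inc hl
RET = 0xC9      # ret

PAIRS = 16  # the source listing has 16 "ld (hl),a / inc hl" pairs
ROUTINE = bytes([LD_HL_A, INC_HL] * PAIRS + [RET])
LABEL_OFFSET = 2 * PAIRS  # l_setmem_hl labels the ret, after all the pairs

def run_setmem(n, fill, size=32):
    """Emulate 'call l_setmem_hl - n*2' with hl pointing at a zeroed buffer."""
    pc = LABEL_OFFSET - 2 * n  # each pair is 2 bytes, hence n*2
    if not 0 <= pc <= LABEL_OFFSET:
        raise ValueError("n must be between 0 and %d" % PAIRS)
    buffer = bytearray(size)
    hl = 0
    while True:
        opcode = ROUTINE[pc]
        if opcode == LD_HL_A:
            buffer[hl] = fill
        elif opcode == INC_HL:
            hl += 1
        elif opcode == RET:
            return buffer, hl  # hl ends one byte past the last byte written
        pc += 1

# Address of the ret, inferred from the disassembly: L0669 plus four 2-byte pairs.
L_SETMEM_HL = 0x0671

def call_target(n):
    """Address the program must call to fill n bytes."""
    return L_SETMEM_HL - 2 * n

buf, hl = run_setmem(4, 0xAA)
print(buf[:6].hex(), hl)           # aaaaaaaa0000 4
print(f"{call_target(4):04X}h")    # 0669h, the START PROC label YAZD found
print(f"{call_target(16):04X}h")   # 0651h, the start of the DB run
```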


Another interesting thing:


    jp    L0817

L0817:    ld    iy,(L18CA)
    jp    L081E

L081E:    call    L09E8


The linker has placed a routine immediately after the code that jumps into it, so each jp lands on the very next instruction.  This is a consequence of separating the C interface from the asm implementation: the C interface code sets up registers and then jumps into the asm implementation.  The intention is for the linker to remove these superfluous jumps automatically in the future.  In the meantime, it occurred to me that this example, and the previous one with unreferenced code, suggest that disassembling, running a tool that removes superfluous code, and then reassembling might result in smaller code.  Just a thought :)
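Here is a minimal sketch of one such clean-up pass. It works on labelled source rather than on the binary, so the reassembler recomputes all addresses afterwards. It handles only this one pattern: an unconditional "jp LABEL" whose target label starts the next non-blank line. Conditional jumps ("jp po,L0118"), indirect jumps ("jp (hl)") and jp lines that carry their own label are left alone. The regexes assume the listing style shown above, and the function name is hypothetical.

```python
import re

# Unlabelled, unconditional "jp LABEL" (leading whitespace, no comma, optional comment).
JP_RE = re.compile(r"^\s+jp\s+(\w+)\s*(;.*)?$", re.IGNORECASE)
LABEL_RE = re.compile(r"^(\w+):")

def drop_jumps_to_next(lines):
    """Remove unconditional 'jp X' when label X starts the next non-blank line."""
    out = []
    for i, line in enumerate(lines):
        m = JP_RE.match(line)
        if m:
            # Find the next non-blank line after the jump.
            nxt = next((l for l in lines[i + 1:] if l.strip()), "")
            lab = LABEL_RE.match(nxt)
            if lab and lab.group(1).lower() == m.group(1).lower():
                continue  # the jump would land on the very next instruction
        out.append(line)
    return out

source = [
    "    jp    L0817",
    "",
    "L0817:    ld    iy,(L18CA)",
    "    jp    L081E",
    "",
    "L081E:    call    L09E8",
]
print("\n".join(drop_jumps_to_next(source)))
```

The target labels are kept because other code may still refer to them; only the redundant jumps go.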

The only downside is that it's written in C#, which means it isn't easily portable.


Edit:  adding "--lst" generated a few more tidbits.

A procedure reference count:


Procedures (64):
  Proc   Length  References  Dependants
  L0137  0006             7           0
  L04C3  0005             7           0
  L080F  0144             7          15


I've listed only the three functions tied for the most references.  Cross-referencing them with the map file generated by the compiler shows that they correspond to p_list_next (return the next item in a linked list), error_mc (return -1 and set the carry flag) and printf, respectively.

The call graph shows the most deeply nested subroutines by tabbing them the furthest to the right:


Call Graph:
L0100 - Entry Point L0299 0005h - External L05B7 L0634 L02A5 L059E L15E2 L01C2 L04CC L080F L0D65 L0A8F L09E8 L0A14 L04C3 L059E...


In the above, "L15E2" corresponds to main().
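The indentation rule for the call graph can be sketched in Python: print one procedure per line, indented one tab per call level, so the most deeply nested callees end up furthest to the right. The graph below is entirely hypothetical (the names are made up), and this is not necessarily how YAZD prints its own graph. The path check stops recursive calls from being expanded forever.

```python
def render_call_graph(graph, root, depth=0, path=()):
    """Return one line per procedure, indented one tab per call level."""
    lines = ["\t" * depth + root]
    if root in path:  # recursion: mark it and don't descend again
        lines[0] += "  (recursive)"
        return lines
    for callee in graph.get(root, []):
        lines.extend(render_call_graph(graph, callee, depth + 1, path + (root,)))
    return lines

# Hypothetical caller -> callees map, for illustration only.
GRAPH = {
    "entry": ["main"],
    "main": ["printf", "solve"],
    "printf": ["putc"],
    "putc": ["bdos"],
    "solve": ["solve"],  # a recursive function
}

print("\n".join(render_call_graph(GRAPH, "entry")))
```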

I think these two pieces of information make a poor man's profiler, in that they suggest which functions are most likely to benefit the program if optimized.  Functions that are referenced only once could even be considered for inlining.
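That triage can be sketched in a few lines of Python. The sketch parses the Proc/Length/References/Dependants rows of a --lst excerpt, ranks procedures by reference count, and flags single-reference ones as inlining candidates. The three real rows above are all at 7 references, so one clearly hypothetical extra row is appended to show the inlining case. These are static reference counts (call sites), not run-time call counts, so they only hint at where a profiler would point. The function names are my own.

```python
def parse_procs(text):
    """Parse 'Proc Length References Dependants' rows from a --lst excerpt."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: L0137  0006  7  0
        if len(parts) == 4 and parts[0].startswith("L") and parts[2].isdigit():
            name, length, refs, deps = parts
            rows.append((name, length, int(refs), int(deps)))
    return rows

def triage(rows):
    """Most-referenced procedures first, plus single-reference inlining candidates."""
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    inline = [r[0] for r in rows if r[2] == 1]
    return ranked, inline

LISTING = """Procedures (64):
  Proc   Length  References  Dependants
  L0137  0006             7           0
  L04C3  0005             7           0
  L080F  0144             7          15
"""
# Hypothetical extra row (not from the real listing) with a single reference.
EXTRA = "  L1234  0010             1           0\n"

ranked, inline = triage(parse_procs(LISTING + EXTRA))
print("most referenced:", [r[0] for r in ranked[:3]])
print("inlining candidates:", inline)
```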


I'm not into disassembly, so I'm not sure how it compares with other popular disassemblers such as SkoolKit and dz80.
