
Turrican (128K)

Started by Jean-Marie, 17:58, 12 April 25


Jean-Marie

This is version 7 featuring a more forgiving Timer decremented every second. Hooray!

Koren Lesthe


dlfrsilver

Quote from: Jean-Marie on 15:31, 21 June 25
I just had a look at that Timer, and there's something amiss indeed 🤔
Looking at the Atari video, it should be decremented every second, but it's a bit faster on the CPC.
For some unknown reason, Daren White didn't rely on Interrupts to clock the Timer. This would have given the best precision, since there are 300 interrupts per second. You just need to set a value to 300, decrement it on every interrupt, and decrement the Timer when that value reaches 0.
I might have a look into that, thanks for reporting it.
org &04c8
ld a,(&dea0)        ;;counter
and &07
jp nz,l04de
ld a,&03            ;;number of digits to display
ld bc,&c50f         ;;Timer string offset in VRAM
ld de,&01           ;;decrement step
ld hl,&ded8         ;;Timer value (24 bit)
call &c622          ;;Decrement function

Hi Jean-Marie,

I have a small request: could it be possible for you to speed up the scrolling of Turrican, so that it better matches the Amiga/C64 versions?

Cheers, and hats off for this great remaster ! 


Jean-Marie

Quote from: dlfrsilver on 21:41, 22 June 25
could it be possible for you to speed up the scrolling of Turrican ?
Thank you @dlfrsilver. I did all that I could to make the game faster. For the scrolling, it used LDIR/LDDR instructions that I have unrolled, so it should be a tad faster, even if it's hardly noticeable. However, I'm not a scrolling expert, and maybe I missed obvious things.
The disassembled and (very poorly) commented code can be found in the Excel file enclosed. If someone can come up with improvements, feel free to participate.
The scrolling functions can be found in the Optimization tab at:
SCROLL UP: 124C
SCROLL DOWN: 130E
SCROLL LEFT: 1372
SCROLL RIGHT: 12B7

Although, to be honest, I'd really like to close the Turrican chapter, as I've been playing it ad nauseam since February!

lightforce6128

Quote from: Jean-Marie on Yesterday at 00:46
Quote from: dlfrsilver on 21:41, 22 June 25
could it be possible for you to speed up the scrolling of Turrican ?
For the scrolling, it used LDIR/LDDR instructions that I have unrolled, so it should be a tad faster, even if it's hardly noticeable. However, I'm not a scrolling expert, and maybe I missed obvious things.

@Jean-Marie : I am very impressed. Developing a program from scratch is one thing. But successfully redesigning an existing program (in machine code) with all its constraints and limitations, and even improving it, is a whole other level.

@dlfrsilver : If software scrolling is used for a game (as it is here), it consumes a large share of the computation time, probably more than anything else. Adapting this to hardware scrolling would require redesigning big parts of the program. One example: in Turrican, sprites are never erased. They are just overwritten by the next background. If the background were scrolled without being redrawn, each sprite would leave a trail of unerased parts.

As far as I've seen, scrolling is done in a two-step process: first, a screen-sized tile map is copied from the level map. Then all tiles from this tile map are drawn to the screen. This second step looks like this:

ORG #0DA9

;; ... setting registers, initializing loops

;; This block copies one tile of 2x8 bytes to the screen buffer.
LDI : LDI : ADD B : LD D,A : DEC E : DEC E ;;  14
LDI : LDI : ADD B : LD D,A : DEC E : DEC E ;;  14
LDI : LDI : ADD B : LD D,A : DEC E : DEC E ;;  14
LDI : LDI : ADD B : LD D,A : DEC E : DEC E ;;  14
LDI : LDI : ADD B : LD D,A : DEC E : DEC E ;;  14
LDI : LDI : ADD B : LD D,A : DEC E : DEC E ;;  14
LDI : LDI : ADD B : LD D,A : DEC E : DEC E ;;  14
LDI : LDI : SUB C : LD D,A                 ;;  12
;;                                         ;; ---
;;                                         ;; 110


So one tile needs 110 NOPs, the surrounding loop structure needs an additional 24 NOPs, and 32x16=512 tiles need to be drawn. This takes 68'608 NOPs or about 3.5 frames. With this approach, the frame rate will never be higher than 12.5 Hz. And that is still without sprites, movement, music and game logic.

Maybe somebody else sees how this can be improved.
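
For anyone who wants to check these figures, here is a small Python sketch of the arithmetic. The 19,968 NOPs per frame is the usual CPC figure (312 scanlines of 64 NOPs at 50 Hz). The function name `redraw_cost` and the assumption that a finished picture can only be shown on a whole-frame boundary are illustrative and not taken from the game code.

```python
import math

NOPS_PER_FRAME = 312 * 64   # 19,968 NOPs (1 NOP = 1 microsecond) per 50 Hz frame
FRAME_RATE_HZ = 50

def redraw_cost(nops_per_tile, loop_overhead, tiles=32 * 16):
    """Return (total NOPs, frames used, best-case screen updates per second)."""
    total = (nops_per_tile + loop_overhead) * tiles
    frames = total / NOPS_PER_FRAME
    # Assumption: the buffer flip waits for the next frame boundary,
    # so a partly used frame counts as a whole one.
    updates_per_second = FRAME_RATE_HZ / math.ceil(frames)
    return total, frames, updates_per_second

total, frames, fps = redraw_cost(nops_per_tile=110, loop_overhead=24)
print(total, round(frames, 2), fps)
```

Under that whole-frame assumption, the redraw occupies four frames, which is where the 12.5 Hz ceiling comes from.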

lightforce6128

While experimenting with the Turrican tile drawing / scrolling, I had one idea to speed it up: only draw half of the image, and draw the remaining half later. This sort of works: the drawing gets faster by one frame. But now every object is followed by a shadow object. It looks interesting, but does not really improve the gameplay. You can never be sure how many enemies are on screen.

ORG #0DA9

LD A,(#DE74) : CP A,#30            ;; Check which screen buffer should be updated.
                                   ;; This is not an address, but the value for CRTC register 12.
LD IX,#E72D : JP Z,$+6 : LD LX,#2F ;; Load address of list with screen row addresses to IX.
                                   ;; There are two entries per list row,
                                   ;; one for screen #8000 and one for #C000.

;; Update local frame counter.
LD A,0 :store_a : INC A : LD (store_a-1),A ;; 6 bytes

LD HL,(#C7F1) ;; Load address of tile map.
LD D,#60      ;; High byte of tile buffer address.

LD C,#10      ;; Number of rows.
row_loop:

    EXX
        LD E,(IX+#00)               ;; Load start address of screen row from list to DE'.
        LD D,(IX+#01)               ;;
        LD A,#04 : ADD LX : LD LX,A ;; Go to next address in list.
        LD B,#10                    ;; High byte of 2 scanline offsets in B'.
    EXX

    LD B,#20 ;; Number of chars.
    char_loop:

        XOR A : RLD           ;;  6 ;; Load upper 4 bits of tile number to A. Also shift tile number in (HL) by 4 bits.
        ADD D                 ;;  1 ;; Add tile buffer base address to get high byte of tile address in A.
        EX AF,AF' : LD A,(HL) ;;  3 ;; Load shifted tile number as low byte of tile address in A'
        INC HL                ;;  2 ;; Go to next tile in tile map.
        ;;                    ;; -- ;;
        ;;                    ;; 12 ;;

        EXX ;; 1

            LD L,A : EX AF,AF' : LD H,A ;; 3 ;; Transfer tile address from A/A' to HL.

            LD A,(store_a-1) : BIT 1,A : JR Z,even_frame ;;     4    ;;
            odd_frame:                                   ;;  ....... ;;
                LD C, #30 + 8                            ;;  2     . ;; High byte of 6 scanline offsets plus 8. This will be reduced by 8 LDIs.
                LD A,D                                   ;;  1     . ;; Store high byte of screen buffer address.
                JR registers_are_set                     ;;  3     . ;;
            even_frame:                                  ;;  .     . ;;
                LD C, #38 + 8                            ;;  .     2 ;; High byte of 7 scanline offsets plus 8. This will be reduced by 8 LDIs.
                INC L : INC L                            ;;  .     2 ;; Go to next tile line.
                LD A,D : ADD #08 : LD D,A                ;;  .     4 ;; Store high byte of next screen buffer address.
                JR registers_are_set                     ;;  .     3 ;;
            DEFS 1                                       ;; --    -- ;; Fill unused bytes.
            registers_are_set:                           ;; 10    15 ;; Average time: 12.5 NOPs

            ;; This block copies one tile of 2x8 bytes to the screen buffer.
            ;; It needs 60 NOPs. Loop overhead is additional 12+1+3+12.5+1+4 = 33.5 NOPs.
            ;; The loop is executed 32x16 = 512 times to show a single frame.
            ;; This takes  47'872 NOPs or ~2.4 frames.
            LDI : LDI : ADD B : LD D,A : DEC E : DEC E ;;  14 ;; Copy two bytes. Then go to next scanline.
            INC L : INC L                              ;;   2
            LDI : LDI : ADD B : LD D,A : DEC E : DEC E ;;  14
            INC L : INC L                              ;;   2
            LDI : LDI : ADD B : LD D,A : DEC E : DEC E ;;  14
            INC L : INC L                              ;;   2
            LDI : LDI : SUB C : LD D,A                 ;;  12 ;; Copy the last two bytes. Go back to the first scanline.
            ;;                                         ;; ---
            ;;                                         ;;  60

        EXX ;; 1

    DJNZ char_loop ;; 4|3

    ;; The tile map is a bit bigger than the screen.
    ;; Skip 4 invisible tiles at the border.
    INC HL : INC HL : INC HL : INC HL

DEC C : JP NZ,row_loop

Jean-Marie

Wow, thanks for your contribution, that's an interesting idea. I'll give it a try soon. You seem to have understood the code better than I did, bravo!
Also, I think there might be a mistake here:
LD A,(store_a-1) : BIT 1,A : JR Z,even_frame
Shouldn't it be BIT 0,A (or better: RRA), or have I misunderstood something?

dlfrsilver

Quote from: Jean-Marie on Yesterday at 00:46
Quote from: dlfrsilver on 21:41, 22 June 25
could it be possible for you to speed up the scrolling of Turrican ?
Thank you @dlfrsilver. I did all that I could to make the game faster. For the scrolling, it used LDIR/LDDR instructions that I have unrolled, so it should be a tad faster, even if it's hardly noticeable. However, I'm not a scrolling expert, and maybe I missed obvious things.
The disassembled and (very poorly) commented code can be found in the Excel file enclosed. If someone can come up with improvements, feel free to participate.
The scrolling functions can be found in the Optimization tab at:
SCROLL UP: 124C
SCROLL DOWN: 130E
SCROLL LEFT: 1372
SCROLL RIGHT: 12B7

Although, to be honest, I'd really like to close the Turrican chapter, as I've been playing it ad nauseam since February!

Thanks a lot ! :D 

I asked because some people said that it had the same scrolling speed as the original with no changes.

Egg Master

It's very interesting to see what can be done to improve scrolling. :)
Have you taken a look at the Turrican 2 display? It seems faster to me.

Jean-Marie

Quote from: Egg Master on Yesterday at 10:58
Have you taken a look at the Turrican 2 display? It seems faster to me.
It could be due to the screen size, which is smaller. A lot of games use software scrolling, so it could be interesting to have a look, although Daren White did quite a good job. I'm not sure it can be noticeably improved without changing a lot of things.

lmimmfn

I'm just curious, are the input routines at the start or end of a frame? If not at the end, could they be moved to the end to reduce any perceived lag? As in, take input at the end of the frame so the next frame is generated based on those latest inputs?
6128 for the win!!!

lightforce6128

Quote from: Jean-Marie on Yesterday at 09:59
I think there might be a mistake here:
LD A,(store_a-1) : BIT 1,A : JR Z,even_frame
Shouldn't it be BIT 0,A (or better: RRA), or have I misunderstood something?

At first I wanted to use bit 0. Then I realized that the game uses double buffering. If bit 0 is checked, the first buffer will always show the first half and the second buffer will always show the second half. But both halves need to be overwritten in each buffer. Checking bit 1 achieves this: the first buffer shows the first half, then the second buffer also shows the first half. After this, both buffers show the second half.
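
The difference between the two bits can be checked with a small Python model. It is only a sketch: it assumes that the two screen buffers strictly alternate on every redraw and that the local frame counter starts at 0. The function name `halves_per_buffer` is made up for the illustration.

```python
def halves_per_buffer(select_bit, frames=8):
    """Map each screen buffer to the set of tile halves drawn into it."""
    seen = {0: set(), 1: set()}
    for counter in range(frames):
        buffer = counter & 1                 # assumption: buffers alternate every redraw
        half = (counter >> select_bit) & 1   # half chosen by BIT select_bit of the counter
        seen[buffer].add(half)
    return seen

print(halves_per_buffer(0))  # testing bit 0
print(halves_per_buffer(1))  # testing bit 1
```

With bit 0 the chosen half is locked to the buffer index, so each buffer only ever receives one half. With bit 1 the half changes every two redraws, so both buffers receive both halves.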

Jean-Marie

Quote from: lmimmfn on Yesterday at 12:28
are the input routines at the start or end of a frame?
The keyboard is checked by the Interrupt Service Routine once per frame, during the vertical retrace, so 50 times per second. During that time, the code sets the CRTC registers for the split-screen effect, plays the music or SFX, checks the keyboard inputs and modifies the keyboard buffer accordingly, and changes ink number 15 (for the water flow effect, I think).
org &ce00
push af
push bc
push de
push hl
push ix
push iy
call &cf25               ;;set CRTC reg 12 & 13
ld a,i                         ;;interrupt counter
inc a
cp 6
jp c,lce16
xor a
lce16:
ld i,a
cp 5
jp nz,lcee1
jp @Player
DB &75,&DE
ld d,3               ;;3 channels
ld iy,&de76
lce29:              ;;SFX Player
ld a,(iy+1)
or a                   ;;cp a: turn off SFX
jp z,lceb0
ld c,(iy)
ld ixh,a
ld ixl,c
ld a,(ix)
or a
jp z,lce91
dec (ix)        
ld a,3
sub d
ld e,a
add a
ld c,(ix+1)
call &cf92
ld a,e
add a
inc a
ld c,(ix+2)
call &cf92
ld c,(ix+3)
ld a,6
call &cf92
ld a,e
add 8
ld c,(ix+4)
call &cf92
ld c,(iy+2)
ld a,(ix+3)
or a
jp z,lce74
ld c,(iy+3)
lce74:
ld a,(&de75)
and c
ld (&de75),a
ld l,(ix+1)
ld h,(ix+2)
ld c,(ix+5)
ld b,(ix+6)
add hl,bc
ld (ix+1),l
ld (ix+2),h
jr lceb0
lce91:
ld a,(&de75)
or (iy+4)
ld (&de75),a
ld a,(ix+7)
ld (ix+1),a
ld a,(ix+8)
ld (ix+2),a
ld a,(ix+9)
ld (ix),a
ld (iy+1),0
lceb0:
ld a,5
add iyl
ld iyl,a
dec d
jp nz,lce29
ld a,(&de75)
ld c,a
ld a,7                  ;;PSG Mixer register
call &cf92         ;;turn off AY channels
@SkipSFX:
call &ceec        ;;scan keyboard
ld a,(&1b2e)     ;;change ink #15
cp 2
ld bc,&4b1a
jp z,lced4
ld bc,&5502
lced4:
ld a,c
ld (&1b2e),a
ld a,b
ld bc,&7f0f
out (c),c
out (c),a
ld hl,&04C9        ;;decrement Timer count
dec (hl)
lcee1:
pop iy
pop ix
pop hl
pop de
pop bc
pop af
ei
ret
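
The divider at the top of the routine can be modelled in a few lines of Python. This is a sketch of the `ld a,i / inc a / cp 6 / ... / cp 5` sequence only. It takes the 300 interrupts per second mentioned earlier in the thread as given, and the function name `count_frame_ticks` is made up for the illustration.

```python
INTERRUPTS_PER_SECOND = 300

def count_frame_ticks(interrupts, i_register=0):
    """Count how often the ISR gets past the `cp 5 / jp nz` test."""
    ticks = 0
    for _ in range(interrupts):
        i_register += 1          # ld a,i / inc a
        if i_register >= 6:      # cp 6 / jp c,... / xor a : wrap back to 0
            i_register = 0
        if i_register == 5:      # cp 5 / jp nz,... : only this slot does the work
            ticks += 1           # sound, keyboard scan, ink change, `dec (hl)`
    return ticks

ticks_per_second = count_frame_ticks(INTERRUPTS_PER_SECOND)
print(ticks_per_second)
```

Whatever value the I register starts with, one interrupt in six reaches the sound, keyboard and `dec (hl)` part, which gives the 50 calls per second. A one-second timer step would then correspond to 50 of these decrements, although how the main loop consumes that count is not visible in this listing.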

Jean-Marie

Yeah, the sprites are rendered with a "ghost" effect, but the scrolling is flawless indeed !
That's interesting nonetheless, and could pave the way to new ideas.
Thanks a lot for taking the time to explain the inner workings of the scrolling to us, @lightforce6128

lightforce6128

To avoid the ghost effect, I also tried another approach. Each tile is stored as 16 bytes in a table starting at #6000. If the space of four tiles is combined into one 64-byte slot, the drawing code for a tile can be unrolled into that slot. Each tile then gets its own small and fast drawing routine. This reduces the calculation time by almost 40%!

The drawbacks are:
  • The tiles need to be redesigned. The number shrinks from 256 to 64. The drawing routine is fast, but less flexible. This will require some fiddling.
  • The level maps need to be redesigned to make use of the reduced number of tiles.
  • Some tiles are animated (e.g. the parts of the energy beam). These animations need to be adapted to the new tile format (or deactivated).

Just for checking the speed I created a patch that will only use 32 tiles (those without animation) and fill them with random pixels. This reduces the calculation time from 3.5 frames to 2.2 frames. Now there are no longer any ghost effects, but the landscape looks a bit chaotic.
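
To put the variants discussed so far side by side, here is a short Python sketch. It uses only the per-tile NOP counts quoted in this thread and the usual 19,968 NOPs per CPC frame. Setup code outside the tile loop is ignored, so the percentages are approximate.

```python
NOPS_PER_FRAME = 312 * 64  # 19,968 NOPs per 50 Hz frame
TILES = 32 * 16            # 512 tiles per screen

# NOPs per tile including loop overhead, as quoted in this thread
variants = {
    "original loop": 110 + 24,
    "half tile per pass": 60 + 33.5,
    "unrolled tile code": 83,
}

baseline = variants["original loop"] * TILES
for name, per_tile in variants.items():
    total = per_tile * TILES
    saving = 100 * (1 - total / baseline)
    print(f"{name:20s} {total:8.0f} NOPs  "
          f"{total / NOPS_PER_FRAME:.2f} frames  {saving:4.1f}% saved")
```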

NOLIST

MACRO DRAW_NEXT_TILE
    ;;                        ;; NOPs ;; bytes ;;
    XOR A,A : LD L,A          ;;  1   ;;  1    ;; Clear flags. Set L to zero.
    LD A,(DE) : INC DE        ;;  4   ;;  2    ;; Load next tile number.
    ;;RRA : RR L : RRA : RR L ;;  6   ;;  6    ;; Shift tile number into tile address in HL.
        AND A,7 : DEFS 4      ;;      ;;       ;;     As current workaround: Mask tile number.
    ADD A,C : LD H,A          ;;  2   ;;  2    ;;
    JP (HL)                   ;;  1   ;;  1    ;; Execute tile.
    ;;                        ;;  --  ;;  --   ;;
    ;;                        ;;  15  ;;  13   ;;
ENDM



ORG #0DA9

LD C,#60                                          ;;  2 ;; High byte of tile buffer in C.
LD DE,(#C7F1)                                     ;;  6 ;; Start address of tile map in DE.
EXX                                               ;;  1 ;;
    LD DE, 4 * #800 - 2 * -1                      ;;  3 ;; Offset to next char in DE'.
    XOR A,A : LD L,A                              ;;  2 ;; Set low byte of screen buffer to zero in L'.
    LD A,(#DE74) : RLA : RLA : XOR A,#40 : LD H,A ;;  9 ;; Convert CRTC value of visible screen to high byte of hidden screen buffer in H'.
EXX                                               ;;  1 ;;
;;                                                ;; -- ;;
;;                                                ;; 24 ;;

row_loop:
    LD B,#20                          ;;  2   ;; Number of chars in B.
    DRAW_NEXT_TILE                    ;;  -   ;; Count NOPs in (last) tile, not here.
        DEFS 127-46                   ;;  -   ;; Fill unused bytes.
    return_to_row_loop:               ;;  -   ;;
    INC DE : INC DE : INC DE : INC DE ;;  8   ;; Skip 4 invisible tiles at the border.
EXX : BIT 2,H : EXX : JR Z,row_loop   ;;  7-1 ;; If not 16 rows have been drawn, continue loop.
;;                                    ;; ---- ;;
;;                                    ;; 17-1 ;; This is executed 16 times: (16*17)-1 = 271



ORG #6000

;; Define random pixel bytes.
LET a_value = #00
LET bc_value = #0000

REPEAT 32

    ;; INFO: There are not only 32 possible tiles, but 64. But the second half
    ;; is updated with animation. Currently this is not compatible with the
    ;; below unrolled code. So only the first 32 tiles can be used.

    ;; The following is the code to draw one tile. Besides unrolled drawing commands
    ;; it also contains some loop code. Each tile automatically fetches the next one.
    ;;                                          ;; NOPs  ;; bytes ;;
    EXX                                         ;;  1    ;;  1    ;;
        LD A,a_value : LD BC,bc_value           ;;  5    ;;  5    ;; Set up often used bytes.
        LD (HL),A : INC L : LD (HL),C : SET 3,H ;;  7    ;;  5    ;; Draw two bytes.
        LD (HL),B : DEC L : LD (HL),A : SET 4,H ;;  7    ;;  5    ;; ...
        LD (HL),C : INC L : LD (HL),B : RES 3,H ;;  7    ;;  5    ;;
        LD (HL),A : DEC L : LD (HL),C : SET 5,H ;;  7    ;;  5    ;;
        LD (HL),B : INC L : LD (HL),A : SET 3,H ;;  7    ;;  5    ;;
        LD (HL),C : DEC L : LD (HL),B : RES 4,H ;;  7    ;;  5    ;;
        LD (HL),A : INC L : LD (HL),C : RES 3,H ;;  7    ;;  5    ;;
        LD (HL),B : DEC L : LD (HL),A           ;;  5    ;;  3    ;; Draw the last two bytes.
        ADD HL,DE                               ;;  3    ;;  1    ;; Go to the next char.
    EXX                                         ;;  1    ;;  1    ;;
    DJNZ $+5 : JP return_to_row_loop            ;;  4+2  ;;  5    ;; Continue or leave the loop.
    DRAW_NEXT_TILE                              ;;  15   ;;  13   ;; Draw the next tile.
    ;;                                          ;;  --   ;;  --   ;;
    ;;                                          ;;  83+2 ;;  64   ;;

    ;; Now the tiles need four times the space they needed before. But drawing is faster by 37%.
    ;; Drawing one tile with additional loop code takes 83 NOPs. All 32x16 = 512 tiles
    ;; will need 42'496 NOPs or ~2.2 frames.

    ;; Update random pixel bytes.
    LET a_value = a_value + 83 AND 255
    LET bc_value = bc_value + 7901
   
REND
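
The reason the tile count drops from 256 to 64 can be seen from the address calculation. The Python sketch below models the original lookup (`XOR A : RLD : ADD D`) and the lookup the commented-out `RRA : RR L : RRA : RR L` line in the macro would perform. The function names are made up for the illustration, and the patch as posted uses the `AND A,7` workaround instead.

```python
TILE_BASE = 0x6000

def tile_address_16(tile):
    """Original layout: 256 tiles of 16 bytes each."""
    high = (TILE_BASE >> 8) + (tile >> 4)   # upper nibble added to the base high byte
    low = (tile & 0x0F) << 4                # lower nibble shifted up by RLD
    return (high << 8) | low

def tile_address_64(tile):
    """Unrolled layout: 64 slots of 64 bytes each."""
    high = (TILE_BASE >> 8) + (tile >> 2)   # tile number shifted right twice
    low = (tile & 0x03) << 6                # the two dropped bits end up in bits 7-6 of L
    return (high << 8) | low

print(hex(tile_address_16(255)), hex(tile_address_64(63)))
```

Both layouts cover the same 4 KB from #6000 to #6FFF: 256 x 16 bytes in the first case, 64 x 64 bytes in the second.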

Jean-Marie

This looks great, thanks for sharing your knowledge! Although, yeah, redesigning the level maps would be quite heavy. I need to think about it.

kawickboy

An Amsdos release. Thanks. There is a turn disk message so ?

OneVision

@jmb11 @lightforce6128 I could give it a try with level 1 and 64 tiles max. What kind of file would be needed? A Tiled file with a .tmx and a .tsx?

Jean-Marie

Quote from: kawickboy on Today at 07:22
An Amsdos release. Thanks. There is a turn disk message so ?
Not exactly: you need to turn the disk when the border becomes white.

Axelay

Not sure if this will be any use, but a few years ago I looked at trying to improve the background character printing routine in Turrican, and produced a bit of test code with reformatted characters (which I've attached). This only improved the speed by about half a screen refresh though, and it revealed that I'd need to spend a lot more time on it than I was prepared to in order to actually integrate it, so I didn't take it any further. It would need to be used in conjunction with other improvements to be useful.

The approach was to alter the format of the character data so that only a single SET or RES is needed to move from one screen line of the character to the next, and also to 'interleave' the character data to reduce the calculation needed to find it. Between the print and locating the char address in memory, this is 20 NOPs faster per character.
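
The single SET/RES traversal can be illustrated with a short Python sketch. It replays the bit operations on the high byte of the screen address in the order they appear in the replacement code. It assumes the standard CPC layout, where the eight pixel lines of a character are #800 apart (bits 3 to 5 of the high byte), and a character row that starts with those bits clear. The names `STEPS` and `scanline_order` are made up for the illustration.

```python
# (operation, bit of D) pairs in the order they appear in the listing
STEPS = [("set", 3), ("set", 4), ("res", 3), ("set", 5),
         ("set", 3), ("res", 4), ("res", 3), ("res", 5)]

def scanline_order(high_byte=0xC0):
    """Return the pixel lines (0-7) in the order they are written."""
    order = []
    for op, bit in STEPS:
        order.append((high_byte >> 3) & 7)   # bits 3-5 select the line (line * #800)
        if op == "set":
            high_byte |= 1 << bit
        else:
            high_byte &= ~(1 << bit)
    return order, high_byte

order, final_high = scanline_order()
print(order, hex(final_high))
```

Each step changes exactly one bit, so the lines are visited in Gray-code order rather than top to bottom, and the last RES brings the pointer back to the first line. The character data has to be stored in that same order. As for the gain, 20 NOPs saved on each of 512 characters is 10,240 NOPs, roughly half of a 19,968-NOP frame.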

Here's the replacement code:

org #0dbf ; 128k v7
exx
ld e,(ix+#00)
ld d,(ix+#01)
;ld bc,#0004
;add ix,bc
exx
.l0dc0
ld b,#20
.l0dcf
;xor a
;rld
;add d
;ex af,af'
ld a,(hl)
inc hl
exx
ld h,&60
add a,a
ld l,a
jr nc,skip
set 3,h
.skip
;ex af,af'
;ld h,a
;ld a,d
ldi
ld a,(hl)
ld (de),a
set 3,d
inc h
ldd
ld a,(hl)
ld (de),a
set 4,d
inc h
ldi
ld a,(hl)
ld (de),a
res 3,d
inc h
ldd
ld a,(hl)
ld (de),a
set 5,d
inc h
ldi
ld a,(hl)
ld (de),a
set 3,d
inc h
ldd
ld a,(hl)
ld (de),a
res 4,d
inc h
ldi
ld a,(hl)
ld (de),a
res 3,d
inc h
ldd
ld a,(hl)
ld (de),a
res 5,d
inc e
inc e
;jr nz,skip2
;inc d
;skip2
exx
djnz l0dcf
exx
ld a,e
or a
jr nz,skip2
inc d
.skip2
exx
inc hl
inc hl
inc hl
inc hl
dec c
jp nz,l0dc0
defs 9


I've attached a zip that contains that source as well as the reformatted characters so you can try it in the game (the v7 posted earlier). You will need to set a breakpoint somewhere outside the screen update routine (say at &451, where it is called), then compile CharPrint_Test_v7.asm and Level1Chars_Test.asm to memory. Being half a frame faster doesn't really produce much tangible benefit. In theory, it just raises the bar at which it slows down from 10fps to 8fps. One problem revealed if you try it is that code in the game regularly rewrites the flashing characters, which corrupts some of the background chars in the new format, so that code would need to be modified for this to be any use.

The zip also contains the routine I used to re-order the character data.  It is not optimized at all, but might help explain the reformatting.
