32-character-width screen mode

Started by ervin, 15:11, 01 August 13


ervin

Hi everyone.

I'm almost certain I've seen code before that puts the CPC's screen into "spectrum mode".
I'm not talking about 32x32, but about 32x24 (but 32x25 would be better).

Can anyone help me out with some code to enable this mode?

Also (possible dumb question alert!!!) does changing to such a mode free up some memory?
Or are we still dealing with 16K worth of screen to update? (i.e. is that narrower screen still scattered through RAM from &c000 onwards?)

Thanks for any pointers.

ralferoo

Quote from: ervin on 15:11, 01 August 13
I'm almost certain I've seen code before that puts the CPC's screen into "spectrum mode".
I'm not talking about 32x32, but about 32x24 (but 32x25 would be better).

out &bc01,32    : rem 32 characters wide
out &bc02,&2a  : rem horizontal position

out &bc06,24    : rem 24 characters tall

Sykobee (Briggsy)

#2
You'll want to read up on the CRTC (CRTC - CPCWiki) and on how it generates memory addresses in the CPC (there is a handy table in "The 6845 Cathode Ray Tube Controller (CRTC)" that should be on the wiki page).

In particular, registers 1 (displayed width) and 6 (displayed height) set the size, and registers 2 and 7 adjust where the display appears on the screen (see ralferoo's post above for the values to use).

It does free up memory (as a 32x24 display uses 12KB of screen memory exactly, rather than the 16000 bytes the normal CPC screen uses), but not in a single contiguous block due to how the CPC arranges its screen memory.

A standard CPC screen has 384 bytes of the 16KB memory free, but they're free in 8 different 48 byte locations in that memory due to the screen layout (see the spare start/end here: http://www.cpcmania.com/Docs/Programming/Painting_pixels_introduction_to_video_memory.htm). You can see from this that reducing the screen height to 24 characters frees up a bit more memory (an extra 80 bytes) in each of those 8 locations.

This new mode similarly scatters the display memory throughout the 16KB block, but you only need to update 12KB of it to update what is shown on the screen. Reducing the screen width to 32 characters (64 bytes) means that each MemoryRow* in the screen display is now only 64*24 bytes (1536 bytes) rather than 80*24 bytes (1920 bytes). This gives you 8 areas of 512 bytes that you can use for whatever you want - for example graphics storage, lookup tables, etc.
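
To make the arithmetic above concrete, here is a minimal Python sketch. It is not from the thread: the helper name `free_areas` and its parameters are made up for illustration, and it assumes the default screen base of &C000, no hardware scrolling, 2 bytes per CRTC character and 8 pixel lines per character row.

```python
# Hypothetical helper, not from the thread: lists the unused tail of each
# 2K block for an unscrolled screen based at &C000 (assumed).
BLOCK_SIZE = 0x800      # one 2K block per pixel line within a character row
SCREEN_BASE = 0xC000    # default screen base

def free_areas(chars_wide, chars_tall, bytes_per_char=2, lines_per_char=8):
    """Return (bytes used per 2K block, list of (start, end) free ranges)."""
    used = chars_wide * bytes_per_char * chars_tall
    areas = []
    for line in range(lines_per_char):
        start = SCREEN_BASE + line * BLOCK_SIZE + used
        end = SCREEN_BASE + (line + 1) * BLOCK_SIZE - 1
        areas.append((start, end))
    return used, areas

for name, width, height in (("standard 40x25", 40, 25), ("32x24", 32, 24)):
    used, areas = free_areas(width, height)
    print(name, "uses", used * 8, "bytes;", BLOCK_SIZE - used, "free per block")
    for start, end in areas:
        print("  free: &%04X-&%04X" % (start, end))
```

For the 32x24 case this lists eight 512-byte gaps, the first at &C600-&C7FF, which matches the figures in the post. The sketch is only meaningful for screens that leave some of each block unused.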


That holds unless you start scrolling the screen using the CRTC. I'll leave that for someone who knows more about that aspect than me.


* erm, a MemoryRow is actually 24 character rows of 32 characters, one pixel height of each, due to screen memory layout / CRTC/ULA cleverness - I need a better name! Sorry!

TFM

Quote from: ralferoo on 16:18, 01 August 13

out &bc01,32    : rem 32 characters wide
out &bc02,&2a  : rem horizontal position

out &bc06,24    : rem 24 characters tall

That's wrong!
Try this:


10 MODE 1
20 OUT &BC01,&01
30 OUT &BD20,&20:REM 32 chars
40 OUT &BC02,&02
50 OUT &BD2A,&2A:REM H-Pos.
60 OUT &BC06,&06
70 OUT &BD19,&19:REM 25 lines


That works ;-) Have fun!
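
The paired OUTs in the listing can look odd at first. As far as I understand the CPC's I/O decoding, only the high byte of the port address matters to the CRTC: &BCxx selects a register and &BDxx writes data to it, so the low byte in each port number just mirrors the value being sent. The Python sketch below is not from the thread (`decode_crtc_writes` is a made-up name); it only decodes the six writes from the listing into register values.

```python
# Hypothetical decoder, not from the thread: interprets (port, value) pairs
# the way the CRTC is assumed to see them. Only the high byte of the port is
# examined: &BCxx selects a register, &BDxx writes to the selected register.
def decode_crtc_writes(outs):
    registers = {}
    selected = None
    for port, value in outs:
        high = port >> 8
        if high == 0xBC:
            selected = value
        elif high == 0xBD and selected is not None:
            registers[selected] = value
    return registers

listing = [
    (0xBC01, 0x01), (0xBD20, 0x20),   # R1 = 32 characters displayed per row
    (0xBC02, 0x02), (0xBD2A, 0x2A),   # R2 = 42, horizontal sync position
    (0xBC06, 0x06), (0xBD19, 0x19),   # R6 = 25 character rows displayed
]
regs = decode_crtc_writes(listing)
print(regs)
# Each CRTC character is 2 bytes wide and 8 pixel lines tall on a standard screen.
print("visible screen bytes:", regs[1] * 2 * regs[6] * 8)
```

With 25 rows instead of 24, the visible area comes to 12800 bytes rather than exactly 12K.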
TFM of FutureSoft
Also visit the CPC and Plus users favorite OS: FutureOS - The Revolution on CPC6128 and 6128Plus

ralferoo

Quote from: TFM/FS on 18:52, 01 August 13
That's wrong!
Yeah, my bad. That's what comes of writing stuff from memory without thinking first!  :-[

TFM

Ha-ha! That happens when you mix BASIC and assembler in your head. You definitely work too much.  :)

ervin

#6
Sykobee - thanks for all that information! Absolutely brilliant.

Ralferoo and TFM/FS - thanks for the code guys. Much appreciated.

[EDIT #1] Actually I reckon I might use the 32x32 mode, and have the top and bottom borders as info panels.  8)

[EDIT #2] Hmmm, that's weird.
I can write data to the first 31 character rows.
But when I write to the last character row, the CPC crashes.



In this screenshot, the last character row has not had anything written to it at all.
All the other rows have had something put into them.

Is there some important stuff in that area of RAM which must not be overwritten?

ralferoo

Quote from: ervin on 00:08, 02 August 13
[EDIT #2] Hmmm, that's weird.
I can write data to the first 31 character rows.
But when I write to the last character row, the CPC crashes.

Is there some important stuff in that area of RAM which must not be overwritten?
No, all of the default screen memory space (&C000-&FFFF) is available. The normal sized screen uses 16000 bytes of the 16384, so there's not much unused anyway. Those spare bytes aren't allocated though, because when the screen scrolls normally the start offset of the screen memory changes and it wraps around.

It's most likely your calculations are wrong and you're writing beyond &FFFF and overwriting the important stuff from &0000 onwards. Perhaps you're counting lines from 0 and trying to draw to line 32 (which is actually the 33rd line)?

In a 32 character mode, the starting screen addresses for each character line are very easy to calculate: c000 (line 0),c040 (line 1),c080,c0c0,c100,...c7c0 (line 31) and then you add a multiple of 0800 to that for each pixel line. If you consider the last pixel line of the last character row with this scheme, the first character's address is ffc0, so clearly still within the screen memory range.
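
The address scheme described above can be sketched in Python as follows. This is an illustration rather than anything posted in the thread: `screen_address` is a hypothetical helper, and it assumes an unscrolled 32x32 screen based at &C000 with 2 bytes per character.

```python
# Hypothetical helper, not from the thread. Assumes an unscrolled screen at
# &C000, 32 characters (64 bytes) wide, 32 character rows, 8 pixel lines each.
def screen_address(char_row, pixel_line, byte_column=0):
    if not (0 <= char_row < 32 and 0 <= pixel_line < 8 and 0 <= byte_column < 64):
        raise ValueError("coordinate outside the 32x32 screen")
    return 0xC000 + char_row * 0x40 + pixel_line * 0x800 + byte_column

print("&%04X" % screen_address(0, 0))        # first byte of the screen
print("&%04X" % screen_address(31, 0))       # start of the last character row
print("&%04X" % screen_address(31, 7))       # last pixel line of that row
print("&%04X" % screen_address(31, 7, 63))   # very last byte of the screen
```

The range check is there to catch exactly the kind of off-by-one suggested above: asking for character row 32 raises an error instead of silently wrapping past &FFFF.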

arnoldemu

1024 total characters, 2 bytes per character in width, gives &800 for each line.

32x32 = 1024.

So at 32x32 you're using maximum screen capacity - all 16384 bytes.

Perhaps you are overwriting the last line by a few bytes?
My games. My Games
My website with coding examples: Unofficial Amstrad WWW Resource

ervin

Thanks for the info guys. Indeed I'm no doubt just overwriting memory location &0000. Just gotta figure out where!

redbox

#10
Also, don't forget when using a 32x32 screen on the CPC that it becomes page-aligned.  If you also page-align your sprites (or whatever you want to draw) you only need to do 8-bit increments (e.g. INC L instead of INC HL).

So when you're plotting to the screen, you can simply use code like LD A,(HL) : LD (DE),A : INC L : INC E .  You can also use the SET/RES commands to calculate the screen lines.

All of this adds up to a very fast sprite routine, for example with an 8x8 sprite you would do something like this:


; HL = sprite data (page-aligned)
; DE = screen address (page-aligned)

ld a,(hl) : ld (de),a : inc l : inc e    ; line 1
ld a,(hl) : ld (de),a : inc l

set 3,d

ld a,(hl) : ld (de),a : inc l : dec e    ; line 2
ld a,(hl) : ld (de),a : inc l

set 4,d

ld a,(hl) : ld (de),a : inc l : inc e    ; line 4
ld a,(hl) : ld (de),a : inc l

res 3,d

ld a,(hl) : ld (de),a : inc l : dec e    ; line 3
ld a,(hl) : ld (de),a : inc l

set 5,d

ld a,(hl) : ld (de),a : inc l : inc e    ; line 7
ld a,(hl) : ld (de),a : inc l

set 3,d

ld a,(hl) : ld (de),a : inc l : dec e    ; line 8
ld a,(hl) : ld (de),a : inc l

res 4,d

ld a,(hl) : ld (de),a : inc l : inc e    ; line 6
ld a,(hl) : ld (de),a : inc l

res 3,d

ld a,(hl) : ld (de),a : inc l : dec e    ; line 5
ld a,(hl) : ld (de),a


Note that the screen lines are in a non-linear format so you will need to also store your sprite data in the same way.  But you can rip a sprite easily by just swapping the registers over on the routine above.
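
To see why the sprite data has to be stored out of order, the SET/RES sequence can be simulated. The Python sketch below is not from the thread (`visit_order` and `to_routine_order` are made-up names). It assumes D starts on the top pixel line of a character row, so bits 3 to 5 of D are clear, and that the sprite is 2 bytes wide as in the routine above.

```python
# Hypothetical simulation, not from the thread: applies the routine's SET/RES
# sequence to the high byte of the screen address and reports which pixel
# line (1-8) of the character row each step lands on.
def visit_order(d=0xC0):
    ops = [("set", 3), ("set", 4), ("res", 3), ("set", 5),
           ("set", 3), ("res", 4), ("res", 3)]
    order = [((d >> 3) & 7) + 1]
    for op, bit in ops:
        if op == "set":
            d |= 1 << bit
        else:
            d &= ~(1 << bit) & 0xFF
        order.append(((d >> 3) & 7) + 1)
    return order

# Reorder a sprite stored top to bottom, left to right, into the order the
# routine reads it. Every second row is drawn with DEC E, so it is reversed.
def to_routine_order(rows):
    out = []
    for i, line in enumerate(visit_order()):
        row = list(rows[line - 1])
        out.append(row[::-1] if i % 2 else row)
    return out

print(visit_order())
```

The order printed agrees with the line numbers in the routine's comments. Each step changes only one bit of D, which is why a single SET or RES is enough to move between pixel lines.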

It becomes even quicker to blank the sprite - just LD A,0 once at the beginning and then each line can simply be LD (DE),A : INC E , which is very quick.  Also, using a mask is quick if you use a look-up table - I can post an example of this if you need it.

ralferoo

Quote from: redbox on 20:18, 02 August 13

; HL = sprite data (page-aligned)
; DE = screen address (page-aligned)

ld a,(hl) : ld (de),a : inc l : inc e    ; line 1
ld a,(hl) : ld (de),a : inc l : inc e
...

If you don't mind corrupting BC, it's actually quicker to do LDI in this case (5 cycles instead of 6 per byte)...  ;D

redbox

Quote from: ralferoo on 20:38, 02 August 13
If you don't mind corrupting BC, it's actually quicker to do LDI in this case (5 cycles instead of 6 per byte)...  ;D

Interesting...!

Wouldn't work on the DEC E rows though (and you'd have to DEC E again before starting the line).

Also, this would be specifically if you are using the one sprite routine.  If you're compiling your sprites, you could pre-load B and C with the most common bytes and then LD A,B : LD (DE),A which would be quicker...?

TFM


In your code you SET and RES register E. Shouldn't it be D?



Quote from: redbox on 20:18, 02 August 13
Also, don't forget when using a 32x32 screen on the CPC that it becomes page-aligned.  If you also page-align your sprites (or whatever you want to draw) you only need to do 8-bit increments (e.g. INC L instead of INC HL).

So when you're plotting to the screen, you can simply use code like LD A,(HL) : LD (DE),A : INC L : INC E .  You can also use the SET/RES commands to calculate the screen lines.

All of this adds up to a very fast sprite routine, for example with a 8x8 sprite you would do something like this:


; HL = sprite data (page-aligned)
; DE = screen address (page-aligned)

ld a,(hl) : ld (de),a : inc l : inc e    ; line 1
ld a,(hl) : ld (de),a : inc l : inc e
ld a,(hl) : ld (de),a : inc l : inc e
ld a,(hl) : ld (de),a : inc l

set 3,e

ld a,(hl) : ld (de),a : inc l : dec e    ; line 2
ld a,(hl) : ld (de),a : inc l : dec e
ld a,(hl) : ld (de),a : inc l : dec e
ld a,(hl) : ld (de),a : inc l

set 4,e

ld a,(hl) : ld (de),a : inc l : inc e    ; line 4
ld a,(hl) : ld (de),a : inc l : inc e
ld a,(hl) : ld (de),a : inc l : inc e
ld a,(hl) : ld (de),a : inc l

res 3,e

ld a,(hl) : ld (de),a : inc l : dec e    ; line 3
ld a,(hl) : ld (de),a : inc l : dec e
ld a,(hl) : ld (de),a : inc l : dec e
ld a,(hl) : ld (de),a : inc l

set 5,e

ld a,(hl) : ld (de),a : inc l : inc e    ; line 7
ld a,(hl) : ld (de),a : inc l : inc e
ld a,(hl) : ld (de),a : inc l : inc e
ld a,(hl) : ld (de),a : inc l

set 3,e

ld a,(hl) : ld (de),a : inc l : dec e    ; line 8
ld a,(hl) : ld (de),a : inc l : dec e
ld a,(hl) : ld (de),a : inc l : dec e
ld a,(hl) : ld (de),a : inc l

res 4,e

ld a,(hl) : ld (de),a : inc l : inc e    ; line 6
ld a,(hl) : ld (de),a : inc l : inc e
ld a,(hl) : ld (de),a : inc l : inc e
ld a,(hl) : ld (de),a : inc l

res 3,e

ld a,(hl) : ld (de),a : inc l : dec e    ; line 5
ld a,(hl) : ld (de),a : inc l : dec e
ld a,(hl) : ld (de),a : inc l : dec e
ld a,(hl) : ld (de),a


Note that the screen lines are in a non-linear format so you will need to also store your sprite data in the same way.  But you can rip a sprite easily by just swapping the registers over on the routine above.

It becomes even quicker to blank the sprite - just LD A,0 once at the beginning and then each line can simply be LD (DE),A : INC E , which is very quick.  Also, using a mask is quick if you use a look-up table - I can post an example of this if you need it.

ralferoo

Quote from: redbox on 21:08, 02 August 13
Also, this would be specifically if you are using the one sprite routine.  If you're compiling your sprites, you could pre-load B and C with the most common bytes and then LD A,B : LD (DE),A which would be quicker...?
If you're going to be doing that kind of thing, you should probably not use LDI, and swap HL and DE. That way, you're always loading A from (DE), but can write to (HL) with any register.

BTW, if you want the absolute quickest method, disable interrupts and use PUSH instead. A 16-bit load and a PUSH is 7 cycles for 2 bytes, so about the quickest you'll manage. This is getting somewhat off topic for the original question now though... ;)
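
The three copy methods discussed so far can be compared with a quick tally. The Python sketch below is not from the thread. The totals (6, 5 and 7) are the ones quoted in the posts; the per-instruction figures are the commonly quoted CPC timings in NOP units (1 NOP = 1us) and should be treated as an assumption here.

```python
# Assumed per-instruction timings in CPC NOP units (1 NOP = 1 microsecond).
NOPS = {"ld a,(hl)": 2, "ld (de),a": 2, "inc l": 1, "inc e": 1,
        "ldi": 5, "ld hl,nn": 3, "push hl": 4}

# Each method: (instruction sequence, bytes written by that sequence).
methods = {
    "load/store": (["ld a,(hl)", "ld (de),a", "inc l", "inc e"], 1),
    "ldi": (["ldi"], 1),
    "ld hl,nn + push": (["ld hl,nn", "push hl"], 2),
}

def nops_per_byte(name):
    instructions, bytes_written = methods[name]
    return sum(NOPS[i] for i in instructions) / bytes_written

for name in methods:
    print("%-16s %.1f NOPs per byte" % (name, nops_per_byte(name)))
```

Note that the PUSH figure assumes the data is embedded in the code as immediate values (a compiled sprite), and that PUSH writes downwards in memory with interrupts disabled, as the post says.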

redbox

#15
@TFM - yes you're right, good spot, will correct it.

@ralferoo - yes I remember swapping the regs but am away from my source code at the moment so am relying on my memory...!  I've yet to try out the stack method but am looking forward to it ;)

ervin

Thanks again for all the info everyone! This has become a very useful thread, with some great tips all in one spot.  :)

Axelay

Quote from: redbox on 21:08, 02 August 13
Interesting...!

Wouldn't work on the DEC E rows though (and you'd have to DEC E again before starting the line).



When using LDI lists for sprites on a page aligned screen, if you are jumping to pixel lines within a character as with your example using SET and RESet, then an alternative to DEC E is to LD A,E initially, and then just LD E,A every time you use SET or RESet.


On the topic of a 12k screen though, you could always use characters set to 6 pixel lines.
org &8000

; Set CRTC.R9 to 6-1 scan lines high
ld bc,&BC09
out (c),c
ld bc,&BD05
out (c),c

; set hsync position
ld bc,&bc02
out (c),c
ld bc,&bd00+42
out (c),c

;; set display width of screen
ld bc,&bc01
out (c),c
ld bc,&bd00+32
out (c),c

; set vsync position
ld bc,&bc07
out (c),c
ld bc,&bd00+40
out (c),c

;; set display height of screen
ld bc,&bc06
out (c),c
ld bc,&bd00+32
out (c),c

;; set vertical total (R4): 52 character rows of 6 scan lines = 312 scan lines per frame
ld bc,&bc04
out (c),c
ld bc,&bd00+51
out (c),c

ret



That gives a 12k screen occupying &c000-&efff, with the single 4k block from &f000-&ffff free to use.  At 32 character rows of six pixel lines each, it is the same height as a 24 character high screen (192 pixel lines), and it is 32 characters wide.  The only downside is that if you've got sprites or background tiles that assume 8 pixel high characters, the screen addressing becomes very untidy.
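
The address range quoted above can be checked with a few lines of Python. This is a sketch, not code from the thread: `used_range` is a hypothetical helper, and it assumes an unscrolled screen based at &C000 with 2 bytes per character, where each pixel line of a character row lives in its own 2K block.

```python
# Hypothetical helper, not from the thread: first and last byte touched by an
# unscrolled screen at &C000 when characters are lines_per_char lines tall.
def used_range(chars_wide=32, chars_tall=32, lines_per_char=6, base=0xC000):
    per_block = chars_wide * 2 * chars_tall          # bytes used in each 2K block
    first = base
    last = base + (lines_per_char - 1) * 0x800 + per_block - 1
    return first, last

first, last = used_range()
print("screen: &%04X-&%04X (%d bytes)" % (first, last, last - first + 1))
print("free:   &%04X-&FFFF (%d bytes)" % (last + 1, 0xFFFF - last))
print("visible pixel lines:", 32 * 6)
```

Because a 32x32 character grid fills each 2K block completely, only whole blocks are used, which is what leaves the free memory in one contiguous piece.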

TFM

Quote from: redbox on 00:16, 03 August 13
@TFM - ....
Your code snippet is actually a perfect example. And it shows what can be done with some smart ideas on the CPC.  :)

Gryzor

Quote: I'm not talking about 32x32, but about 32x24 (but 32x25 would be better).


Better? How? Why? WHY?


:D

ervin

#20
Quote from: Gryzor on 17:23, 04 August 13

Better? How? Why? WHY?

:D

Because it are MORE BIGGER!!!  :P

Gryzor


cpcitor

Quote from: redbox on 20:18, 02 August 13

All of this adds up to a very fast sprite routine, for example with a 8x8 sprite you would do something like this:


; HL = sprite data (page-aligned)
; DE = screen address (page-aligned)

ld a,(hl) : ld (de),a : inc l : inc e    ; line 1
ld a,(hl) : ld (de),a : inc l

set 3,d



Warning: the innocent-looking set instructions are actually among the slowest instructions that the Z80 has. It's as slow as no less than 7 NOPs !

For the duration of 4 NOPs you can do this:


ld a,d
add a,8
ld d,a


Refer to the cheat sheet I compiled from Kevin Thacker's data on Craving for speed ? A visual cheat sheet to help optimizing your code to death.

Cheers,
Had a CPC since 1985, currently software dev professional, including embedded systems.

I made in 2013 the first CPC cross-dev environment that auto-installs C compiler and tools: cpc-dev-tool-chain: a portable toolchain for C/ASM development targetting CPC, later forked into CPCTelera.

redbox

Quote from: cpcitor on 23:16, 29 October 13
Warning: the innocent-looking set instructions are actually among the slowest instructions that the Z80 has. It's as slow as no less than 7 NOPs !
For the duration of 4 NOPs you can do this:

ld a,d
add a,8
ld d,a


SET takes 5us.

LD A,D : ADD A,8: LD D,A takes 7us.

Bruce Abbott

Quote from: redbox on 23:26, 30 October 13
SET takes 5us.

LD A,D : ADD A,8: LD D,A takes 7us.
According to cpcitor's timing chart SET 3,D takes 2us, while LD A,D : ADD A,8: LD D,A takes 4us.

WinAPE agrees, but I wasn't confident that it was accurate. So I wrote a little test program and ran it on a real 6128. This program executes each instruction sequence 10 million times, which takes enough time to be accurately measured with a wall clock.

The results (accurate to 0.1us):-

NOP 1.0us
SET 3,D 2.0us
LD A,D : ADD A,8: LD D,A 4.0us 

I was pleasantly surprised to find that the timing was identical when running under WinAPE.

Below is the raw data, and my source code:-

Empty loop:- 2 seconds
NOP:- 12 seconds
SET 3,D:- 22 seconds
LD A,D : ADD A,8: LD D,A:- 42 seconds

;==============================================
;    CPC machine code execution time test
;==============================================
; 20 * 250 * 200 * 10 = 10,000,000 (10 million) iterations
; Empty loop takes 2 seconds
; Therefore:- us per iteration = (seconds-2)/10
;
org #4000
write "test3.bin"
Test1:
di
ld b,10
loopa:
push bc
ld c,200
loopc:
ld b,250
loopb:
; put test code here 20 times!
djnz loopb
dec c
jr nz,loopc
pop bc
djnz loopa
finish:
ei
ret

Test2:
di
ld b,10
loopa1:
push bc
ld c,200
loopc1:
ld b,250
loopb1:

set 3,d
set 3,d
set 3,d
set 3,d
set 3,d
set 3,d
set 3,d
set 3,d
set 3,d
set 3,d

set 3,d
set 3,d
set 3,d
set 3,d
set 3,d
set 3,d
set 3,d
set 3,d
set 3,d
set 3,d

djnz loopb1
dec c
jr nz,loopc1
pop bc
djnz loopa1
ei
ret


Test3:
di
ld b,10
loopa2:
push bc
ld c,200
loopc2:
ld b,250
loopb2:
ld a,d
add a,8  ; 1
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8   ; 10
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8
ld d,a

ld a,d
add a,8    ; 20
ld d,a

djnz loopb2
dec c
jr nz,loopc2
pop bc
djnz loopa2
ei
ret
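
The conversion from wall-clock seconds to time per instruction sequence, as described in the source comments above, can be sketched in Python. The helper name is made up; the iteration count, the 2-second empty-loop overhead and the measured times are the ones given in the post.

```python
# Figures taken from the post: 10 million executions of the test sequence,
# and 2 seconds of loop overhead measured with an empty loop.
EMPTY_LOOP_SECONDS = 2
ITERATIONS = 20 * 250 * 200 * 10

def microseconds_per_iteration(seconds):
    """Hypothetical helper: subtract loop overhead, then scale to microseconds."""
    return (seconds - EMPTY_LOOP_SECONDS) * 1_000_000 / ITERATIONS

measured = {"NOP": 12, "SET 3,D": 22, "LD A,D : ADD A,8 : LD D,A": 42}
for name, seconds in measured.items():
    print("%-28s %.1f us" % (name, microseconds_per_iteration(seconds)))
```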


