Changes

Jump to: navigation, search

Programming:Integer Multiplication

3,243 bytes added, 07:29, 24 October 2015
a new fast multiplication routine added
</pre>
== FasterFastest, accurate 8bit * 8bit Unsigned with 16 KB tables ==
'''Input:''' L = Multiplier, C = Multiplicand
</pre>
16K is a lot of memory to use for tables== Fast, but I'm working on a way to reduce this to 8K while maintaining similar performance (around 26&mu;s). The idea is rather than using accurate 8bit * 8bit Unsigned with 8 KB tables for the low and high nibbles, use tables for alternate bits. So there would be 16 tables for the values #00, #01, #04, #05, #10, #11, #14, #15, #40, #41, #44, #45, #50, #51, #54, #55. The values can be shifted by 1 (rather than 4) to give values for the alternate bits using a simple ADD HL,HL.==
This routine should be easy to adapt to a 16bit * 8bit unsigned multiplication or even a 16bit * 16bit'''Input:''' L = Multiplier, using simple additions.C = Multiplicand
[[User'''Output:Executioner|Executioner]] 03:58, 8 May 2007 (CEST)''' DE = Product
'''CPC Cycles:''' 100 = 25 usec
'''Size:''' 19 bytes of code + 8 KB tables
 
16K is a lot of memory to use for tables, but I'm working on a way to reduce this to 8K while maintaining similar performance (around 26&mu;s). The idea is rather than using tables for the low and high nibbles, use tables for alternate bits. So there would be 16 tables for the values #00, #01, #04, #05, #10, #11, #14, #15, #40, #41, #44, #45, #50, #51, #54, #55. The values can be shifted by 1 (rather than 4) to give values for the alternate bits using a simple ADD HL,HL.
 
This routine should be easy to adapt to a 16bit * 8bit unsigned multiplication or even a 16bit * 16bit, using simple additions.
 
[[User:Executioner|Executioner]] 03:58, 8 May 2007 (CEST)
Ok, here is the new 8K routine, first the routine to build the tables:
'''Output''': DE = Product
'''CPC Cycles''': 104-112 (108 on average) = 26-28 (27)usec
'''Size''': 24 bytes of code and 1024 bytes for the tables
[[User:Executioner|Executioner]] 06:25, 4 April 2008 (CEST)
 
 
'''Output''': A:E = Product
 
'''CPC Cycles''': 136-172 (154 on average) = 26-28 (27) usec
 
'''Size''': 44 bytes of code and 512 bytes for the tables
 
Fast 8 bit unsigned multiplication with 16 bit result. It uses formula
<pre>
x*y = ((x+y)/2)^2 - ((x-y)/2)^2, if x+y is even
= ((x+y-1)/2)^2 - ((x-y-1)/2)^2 + y, if x+y is odd and x>=y
</pre>
 
<pre> cp b
jr nc,l1
 
ld e,a
ld a,b
ld b,e
 
l1 ld c,a
sub b
rra
ld d,a
ld a,c
add a,b
rra
ld l,a
ld h,high(sqrlo)
ld a,(hl)
ld e,l
ld l,d
jr nc,l2
 
sub (hl) ;odd
ld l,e
ld e,a
inc h ;loads high(sqrhi)
ld a,(hl)
ld l,d
sbc a,(hl)
ld d,a
 
ld a,e
add a,b
ld e,a
ld a,d
adc a,0
ret
 
l2 sub (hl) ;even
ld l,e
ld e,a
inc h
ld a,(hl)
ld l,d
sbc a,(hl)
ret
 
sqrlo ;low(x*x) should be at the page border
db 0,1,4,9,$10,$19,$24,$31,$40,$51,$64,$79,$90,$a9,$c4,$e1
db 0,$21,$44,$69,$90,$b9,$e4,$11,$40,$71,$a4,$d9,$10,$49,$84,$c1
db 0,$41,$84,$c9,$10,$59,$a4,$f1,$40,$91,$e4,$39,$90,$e9,$44,$a1
db 0,$61,$c4,$29,$90,$f9,$64,$d1,$40,$b1,$24,$99,$10,$89,4,$81
db 0,$81,4,$89,$10,$99,$24,$b1,$40,$d1,$64,$f9,$90,$29,$c4,$61
db 0,$a1,$44,$e9,$90,$39,$e4,$91,$40,$f1,$a4,$59,$10,$c9,$84,$41
db 0,$c1,$84,$49,$10,$d9,$a4,$71,$40,$11,$e4,$b9,$90,$69,$44,$21
db 0,$e1,$c4,$a9,$90,$79,$64,$51,$40,$31,$24,$19,$10,9,4,$1
db 0,1,4,9,$10,$19,$24,$31,$40,$51,$64,$79,$90,$a9,$c4,$e1
db 0,$21,$44,$69,$90,$b9,$e4,$11,$40,$71,$a4,$d9,$10,$49,$84,$c1
db 0,$41,$84,$c9,$10,$59,$a4,$f1,$40,$91,$e4,$39,$90,$e9,$44,$a1
db 0,$61,$c4,$29,$90,$f9,$64,$d1,$40,$b1,$24,$99,$10,$89,4,$81
db 0,$81,4,$89,$10,$99,$24,$b1,$40,$d1,$64,$f9,$90,$29,$c4,$61
db 0,$a1,$44,$e9,$90,$39,$e4,$91,$40,$f1,$a4,$59,$10,$c9,$84,$41
db 0,$c1,$84,$49,$10,$d9,$a4,$71,$40,$11,$e4,$b9,$90,$69,$44,$21
db 0,$e1,$c4,$a9,$90,$79,$64,$51,$40,$31,$24,$19,$10,9,4,$1
sqrhi ;high(x*x)
db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
db 1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3
db 4,4,4,4,5,5,5,5,6,6,6,7,7,7,8,8
db 9,9,9,$a,$a,$a,$b,$b,$c,$c,$d,$d,$e,$e,$f,$f
db $10,$10,$11,$11,$12,$12,$13,$13,$14,$14,$15,$15,$16,$17,$17,$18
db $19,$19,$1a,$1a,$1b,$1c,$1c,$1d,$1e,$1e,$1f,$20,$21,$21,$22,$23
db $24,$24,$25,$26,$27,$27,$28,$29,$2a,$2b,$2b,$2c,$2d,$2e,$2f,$30
db $31,$31,$32,$33,$34,$35,$36,$37,$38,$39,$3a,$3b,$3c,$3d,$3e,$3f
db $40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$4a,$4b,$4c,$4d,$4e,$4f
db $51,$52,$53,$54,$55,$56,$57,$59,$5a,$5b,$5c,$5d,$5f,$60,$61,$62
db $64,$65,$66,$67,$69,$6a,$6b,$6c,$6e,$6f,$70,$72,$73,$74,$76,$77
db $79,$7a,$7b,$7d,$7e,$7f,$81,$82,$84,$85,$87,$88,$8a,$8b,$8d,$8e
db $90,$91,$93,$94,$96,$97,$99,$9a,$9c,$9d,$9f,$a0,$a2,$a4,$a5,$a7
db $a9,$aa,$ac,$ad,$af,$b1,$b2,$b4,$b6,$b7,$b9,$bb,$bd,$be,$c0,$c2
db $c4,$c5,$c7,$c9,$cb,$cc,$ce,$d0,$d2,$d4,$d5,$d7,$d9,$db,$dd,$df
db $e1,$e2,$e4,$e6,$e8,$ea,$ec,$ee,$f0,$f2,$f4,$f6,$f8,$fa,$fc,$fe
</pre>
 
[[User:Litwr|Litwr]] 10:25, 24 October 2015 (CEST)
== 16bit * 16bit Unsigned ==
22
edits