Changes

Jump to: navigation, search

Programming:Integer Multiplication

4,158 bytes added, 17:31, 1 March 2018
/* 16bit * 16bit Unsigned -> 32 bits result */ forgotten label
'''Output:''' HL = ''Product''
'''ClocksCPC Cycles:''' 188-244 (216 on average)= 47-61 (54) usec
'''Size:''' 33 bytes
'''Output:''' A:HL = ''Product''
 
'''CPC Cycles:''' 212-300 (256 on average) = 53-75 (64) usec
 
'''Size:''' 47 bytes
<pre>
adc a,c ; ...
</pre>
 
== Fast 8bit * 8bit Unsigned (using log / antilog tables) ==
'''Output:''' DE = ''Product''
'''ClocksCPC Cycles:''' 124= 31 usec
'''Size:''' 1024 bytes (768 bytes for the tables)
</pre>
== FasterFastest, accurate 8bit * 8bit Unsigned with 16 KB tables == '''Input:''' L = Multiplier, C = Multiplicand '''Output:''' DE = Product '''CPC Cycles:''' 80 = 20 usec '''Size:''' 14 bytes of code + 16 KB tables
I'm currently working on a new routine, based on a routine by Kirk Meyer [http://www.ticalc.org/pub/86/asm/source/routines/vfmult.asm] which uses nibble multiplication tables.
ld d,a ; 20
</pre>
 
== Fast, accurate 8bit * 8bit Unsigned with 8 KB tables ==
 
'''Input:''' L = Multiplier, C = Multiplicand
 
'''Output:''' DE = Product
 
'''CPC Cycles:''' 100 = 25 usec
 
'''Size:''' 19 bytes of code + 8 KB tables
16K is a lot of memory to use for tables, but I'm working on a way to reduce this to 8K while maintaining similar performance (around 26&mu;s). The idea is rather than using tables for the low and high nibbles, use tables for alternate bits. So there would be 16 tables for the values #00, #01, #04, #05, #10, #11, #14, #15, #40, #41, #44, #45, #50, #51, #54, #55. The values can be shifted by 1 (rather than 4) to give values for the alternate bits using a simple ADD HL,HL.
== Very fast 8bit * 8bit Unsigned with only 1K of tables ==
 
'''Input:''' A = ''Multiplier'', L = ''Multiplicand''
 
'''Output:''' DE = ''Product''
 
'''CPC Cycles''': 104-112 (108 on average) = 26-28 (27) usec
 
'''Size''': 24 bytes of code and 1024 bytes for the tables
Here's a new routine I've developed which uses the formula ab = ((a + b)<sup>2</sup> - (a - b)<sup>2</sup>) / 4. It's based on a routine for the 6502 by Stephen Judd in a C= Hacking article. Because of differences between the way the 6502 does register indexing it was quite difficult to actually get this working, but it's a great compromise between speed and space since it only uses 1K of tables (as opposed to the 16K or 8K table routines above), and can still manage to do the job in a maximum of 28 microseconds.
Now for the actual multiply routine:
'''Input:''' A = ''Multiplier'', L = ''Multiplicand''
 
'''Output:''' DE = ''Product''
<pre>
ld h,umul_tab_lo / #100 ; 2
</pre>
This code could easily be converted to a macro as it's only 24 bytes. I've tried to optimise optimize it further but with no luck!
[[User:Executioner|Executioner]] 06:25, 4 April 2008 (CEST)
== Fast 8bit * 8bit Unsigned with only 512 bytes of tables == '''Input''': A = Multiplier, B = Multiplicant '''Output''': A:E = Product '''CPC Cycles''': 136-172 (154 on average) = 34-43 (38.5) usec '''Size''': 44 bytes of code and 512 bytes for the tables Fast 8 bit unsigned multiplication with 16 bit result. It uses formula<br>x*y = ((x+y)/2)<sup>2</sup> - ((x-y)/2)<sup>2</sup>, if x+y is even<br> = ((x+y-1)/2)<sup>2</sup> - ((x-y-1)/2)<sup>2</sup> + y, if x+y is odd and x>=y <pre> cp b jr nc,l1  ld e,a ld a,b ld b,e l1 ld c,a sub b rra ld d,a ld a,c add a,b rra ld l,a ld h,high(sqrlo) ld a,(hl) ld e,l ld l,d jr nc,l2  sub (hl) ;odd ld l,e ld e,a inc h ;loads high(sqrhi) ld a,(hl) ld l,d sbc a,(hl) ld d,a  ld a,e add a,b ld e,a ld a,d adc a,0 ret l2 sub (hl) ;even ld l,e ld e,a inc h ld a,(hl) ld l,d sbc a,(hl) ret sqrlo ;low(x*x) should be at the page border db 0,1,4,9,$10,$19,$24,$31,$40,$51,$64,$79,$90,$a9,$c4,$e1 db 0,$21,$44,$69,$90,$b9,$e4,$11,$40,$71,$a4,$d9,$10,$49,$84,$c1 db 0,$41,$84,$c9,$10,$59,$a4,$f1,$40,$91,$e4,$39,$90,$e9,$44,$a1 db 0,$61,$c4,$29,$90,$f9,$64,$d1,$40,$b1,$24,$99,$10,$89,4,$81 db 0,$81,4,$89,$10,$99,$24,$b1,$40,$d1,$64,$f9,$90,$29,$c4,$61 db 0,$a1,$44,$e9,$90,$39,$e4,$91,$40,$f1,$a4,$59,$10,$c9,$84,$41 db 0,$c1,$84,$49,$10,$d9,$a4,$71,$40,$11,$e4,$b9,$90,$69,$44,$21 db 0,$e1,$c4,$a9,$90,$79,$64,$51,$40,$31,$24,$19,$10,9,4,$1 db 0,1,4,9,$10,$19,$24,$31,$40,$51,$64,$79,$90,$a9,$c4,$e1 db 0,$21,$44,$69,$90,$b9,$e4,$11,$40,$71,$a4,$d9,$10,$49,$84,$c1 db 0,$41,$84,$c9,$10,$59,$a4,$f1,$40,$91,$e4,$39,$90,$e9,$44,$a1 db 0,$61,$c4,$29,$90,$f9,$64,$d1,$40,$b1,$24,$99,$10,$89,4,$81 db 0,$81,4,$89,$10,$99,$24,$b1,$40,$d1,$64,$f9,$90,$29,$c4,$61 db 0,$a1,$44,$e9,$90,$39,$e4,$91,$40,$f1,$a4,$59,$10,$c9,$84,$41 db 0,$c1,$84,$49,$10,$d9,$a4,$71,$40,$11,$e4,$b9,$90,$69,$44,$21 db 0,$e1,$c4,$a9,$90,$79,$64,$51,$40,$31,$24,$19,$10,9,4,$1sqrhi ;high(x*x) db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 db 1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,3 db 4,4,4,4,5,5,5,5,6,6,6,7,7,7,8,8 db 9,9,9,$a,$a,$a,$b,$b,$c,$c,$d,$d,$e,$e,$f,$f db $10,$10,$11,$11,$12,$12,$13,$13,$14,$14,$15,$15,$16,$17,$17,$18 db $19,$19,$1a,$1a,$1b,$1c,$1c,$1d,$1e,$1e,$1f,$20,$21,$21,$22,$23 db $24,$24,$25,$26,$27,$27,$28,$29,$2a,$2b,$2b,$2c,$2d,$2e,$2f,$30 db $31,$31,$32,$33,$34,$35,$36,$37,$38,$39,$3a,$3b,$3c,$3d,$3e,$3f db $40,$41,$42,$43,$44,$45,$46,$47,$48,$49,$4a,$4b,$4c,$4d,$4e,$4f db $51,$52,$53,$54,$55,$56,$57,$59,$5a,$5b,$5c,$5d,$5f,$60,$61,$62 db $64,$65,$66,$67,$69,$6a,$6b,$6c,$6e,$6f,$70,$72,$73,$74,$76,$77 db $79,$7a,$7b,$7d,$7e,$7f,$81,$82,$84,$85,$87,$88,$8a,$8b,$8d,$8e db $90,$91,$93,$94,$96,$97,$99,$9a,$9c,$9d,$9f,$a0,$a2,$a4,$a5,$a7 db $a9,$aa,$ac,$ad,$af,$b1,$b2,$b4,$b6,$b7,$b9,$bb,$bd,$be,$c0,$c2 db $c4,$c5,$c7,$c9,$cb,$cc,$ce,$d0,$d2,$d4,$d5,$d7,$d9,$db,$dd,$df db $e1,$e2,$e4,$e6,$e8,$ea,$ec,$ee,$f0,$f2,$f4,$f6,$f8,$fa,$fc,$fe</pre> [[User:Litwr|Litwr]] 10:25, 24 October 2015 (CEST) == 16bit * 16bit Unsigned -> 32 bits result == '''Input:''' BC = ''Multiplier'', DE = ''Multiplicand'' '''Output:''' DE,HL = ''Product''<pre>ld hl,0ld a,16muluw add hl,hl rl e rl d jr nc,muluw_cont add hl,bc jr nc,muluw_cont inc demuluw_cont dec a jr nz,muluw</pre> == 16bit * 16bit Unsigned -> 24 bits result ==
'''Input:''' BC = ''Multiplier'', DE = ''Multiplicand''
36
edits