2024-11-12 09:29:15 https://en.wikipedia.org/wiki/Division_algorithm#Large-integer_methods says: Newton's method is particularly efficient in scenarios where one must divide by the same divisor many times, since after the initial Newton inversion only one (truncated) multiplication is needed for each division.
2024-11-12 09:57:52 What's interesting is, at least on the CPU I'm targeting at the moment, a mobile Core 2 Duo, division is faster when the numerator is smaller
2024-11-12 09:58:23 You can be dividing a 128-bit numerator, but if it contains a smaller number it makes a big difference
2024-11-12 09:59:34 So clearly it shortcuts, or computes which unit to start finding the quotient on (?)
2024-11-12 10:00:05 And it seems to take at least as many clocks as there are bits in the numerator
2024-11-12 10:00:06 Ben Eater writes a division algorithm for a 6502 https://www.youtube.com/watch?v=v3-a-zqKfgA
2024-11-12 10:00:44 :P
2024-11-12 10:01:22 I link, therefore, I am! :P
2024-11-12 10:04:05 I did do all this on the Z80 when I implemented division
2024-11-12 10:05:15 The most serious version being that I actually implemented M*/, so I32*I16->I48 and then I48/I16->I32
2024-11-12 10:06:51 https://github.com/Veltas/zenv/blob/master/src/zenv.asm#L1030
2024-11-12 10:07:28 But you know, the easiest way to display decimal on the Z80 is to use decimal to start with, via the BCD instructions
2024-11-12 10:07:45 So if you've got a Z80 8-bit game with a decimal score count, they probably didn't implement division
2024-11-12 10:34:21 GeDaMo: The algorithm Ben's using is the one I use, keeping the whole thing in one stream
2024-11-12 10:35:00 I like the visualisation he's got for it
2024-11-12 10:35:17 That's what was in my head when I was working with it anyway
2024-11-12 10:35:20 i did a little bit of digging, and found that divide is about 3-5x slower than multiply
2024-11-12 10:35:25 Yeah, it's a good video
2024-11-12 10:35:53 it's not 10x slower
2024-11-12 10:36:15 On the mobile Core 2 Duo
I'm targeting there's not really a hardware divide; it seems to have to do it bit by bit
2024-11-12 10:36:47 Probably using the same algorithm as Ben Eater, but with a short-circuit for all-zero bits at the start
2024-11-12 10:36:53 And doing one step per clock
2024-11-12 10:37:47 So it's actually more like 100 times slower for the full-sized numerator, but that's an edge case
2024-11-12 10:39:30 The full 64-bit multiplication happens in two clocks, so I think it's got only a 32-bit hardware multiply and chains it together for the 64-bit multiply
2024-11-12 10:40:24 And the mixed-precision multiply is decidedly slower
2024-11-12 14:10:53 on the 6502 I kept the numbers in decimal and just used an array of powers of 10 to subtract from the number until it underflowed, then shifted to the next one
2024-11-12 14:27:30 hmm, now that's an interesting idea
2024-11-12 15:01:31 You could also use the decimal mode on the 6502, right?
2024-11-12 15:12:01 sure, if it's just a simple score, but if those numbers are related to coordinates on the screen or a memory address or something they should stay in binary
2024-11-12 15:14:32 Yeah, I don't disagree
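The Newton-inversion idea quoted at 09:29:15 can be sketched in a few lines: invert the divisor once with Newton's iteration, then each subsequent division is one truncated multiplication plus a small correction. This is a pure-Python illustration of the technique, not anything from the chat; `FRAC`, the seed, and the helper names are my own choices.

```python
FRAC = 64                      # fixed-point fractional bits (illustrative)

def newton_reciprocal(d, iters=8):
    """Approximate 2**(2*FRAC) // d via Newton's iteration x <- x*(2 - d*x)."""
    assert d > 0
    # Seed: 1/d is roughly 2**-bit_length(d), which is close enough
    # (relative error <= 1/2) for quadratic convergence.
    x = 1 << (2 * FRAC - d.bit_length())
    for _ in range(iters):
        # x <- x*(2 - d*x), with everything scaled by 2**(2*FRAC)
        e = (1 << (2 * FRAC + 1)) - d * x
        x = (x * e) >> (2 * FRAC)
    return x

def divide(n, d, r):
    """One truncated multiply per division, then fix the rare off-by-one."""
    q = (n * r) >> (2 * FRAC)
    while (q + 1) * d <= n:    # truncation can leave q slightly low
        q += 1
    while q * d > n:           # or (rarely) slightly high
        q -= 1
    return q
```

Each iteration doubles the number of correct bits, so a handful of iterations covers the whole fixed-point width; after that, dividing many numerators by the same `d` costs one big multiply each, which is the payoff the Wikipedia quote describes.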
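The powers-of-10 trick described at 14:10:53 (repeatedly subtract each power of 10 until the value would underflow, counting the subtractions, then move to the next power) looks like this in Python. A sketch of the idea only, not the original 6502 code; the table size assumes 16-bit values.

```python
POWERS = [10000, 1000, 100, 10, 1]  # enough for a 16-bit value

def to_decimal_digits(n):
    """Binary-to-decimal by repeated subtraction of powers of 10."""
    digits = []
    for p in POWERS:
        count = 0
        while n >= p:          # subtract until it would underflow
            n -= p
            count += 1
        digits.append(count)   # count is the decimal digit for this power
    return digits
```

On a CPU without divide this avoids division entirely: each digit costs at most nine subtract-and-compare steps, which is cheap on a 6502.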