2023-01-03 08:02:56 Ugh. Vacation over, back to the mill today.
2023-01-03 08:03:48 Yup
2023-01-03 08:09:05 It was a good break, though - very relaxing.
2023-01-03 08:09:20 And the Verilog work - that's been a LOT of fun.
2023-01-03 08:09:34 It's nice to knock the dust off of some old skills.
2023-01-03 08:09:48 I definitely felt rusty the first few days, but the wheels feel pretty limbered up now.
2023-01-03 08:45:40 Time to get back in the cage I suppose
2023-01-03 09:04:46 Good luck.
2023-01-03 09:04:54 I'm headed in in a little while too.
2023-01-03 09:06:17 So, now that I'm doing this one-op look ahead, returns will become "free" so to speak. There's no reason at all I can't see the return coming up on the next cycle and head on back to the fetch state without actually spending a cycle on the return op. That should include conditional returns, since the "now" op one cycle before will have its own results by the end of its cycle.
2023-01-03 09:06:56 That reasoning will also apply to unconditional and conditional "looping back".
2023-01-03 09:07:47 Same reasoning for rep microloops too.
2023-01-03 09:08:08 So this peeking ahead turns out to be a very good thing to do.
2023-01-03 09:09:02 Well, rep coming up won't send me back to the fetch cycle - it will just refresh the shifted copy of the instruction word and continue in the op state.
2023-01-03 09:10:02 So cell-level calls and jumps (tail calls) will consume a cycle, but all other control transfers won't.
2023-01-03 09:11:53 Hmmm. Maybe it's possible to do even better. If my last operation in a cell uses the RAM, then there's no avoiding a fetch cycle for the next cell. But if my last op doesn't use RAM, maybe I *can* go ahead and fetch the next cell in parallel. That's worth thinking about.
2023-01-03 09:12:27 Maybe only @ and ! words as the last cell op will carry that cost, while other stuff doesn't.
2023-01-03 09:15:06 No, no - that won't work. My earlier comment about the clocked RAMs preventing that does hold; I was thinking for a minute I might have seen a work-around for that, but I don't think so. I can present the next address to the RAM, but I still need the fetch cycle to actually DO the access.
2023-01-03 09:16:14 In fact, if the last operation of a cell is a ! word - a write - then I clock that address into the RAM at the end of the cycle and I may then require a wait state to get the next IP address applied.
2023-01-03 09:17:23 Oh, scratch that. Writes use the other port.
2023-01-03 09:17:35 God, it's hard to keep all these little nuances in mind at the same time.
2023-01-03 15:52:24 Here's something I'd not heard of before today:
2023-01-03 15:52:26 https://yosyshq.net/yosys/
2023-01-03 15:54:32 Supports both the Xilinx 7 family and the Lattice iCE40 family.
2023-01-03 16:06:41 crc: I'm taking a look at ilo.lua, I think I can speed it up significantly
2023-01-03 16:07:15 I know you've said you don't know Lua that well, I do actually know it a bit so will do what I can
2023-01-03 16:21:54 I'd be quite happy to incorporate any improvements
2023-01-03 16:50:08 So I decided to group all of the instructions that mark or possibly mark "the end of a cell" at low numbers. I'm going to shift six 0 bits in each time I shift the opcodes down, so a zero field coming up next means I'm working now on the last instruction. That means I go back to the fetch state.
2023-01-03 16:50:24 So, ret should also send me back to the fetch state. So should "me" and so on.
2023-01-03 16:50:51 By making those the first few opcodes in numerical sequence, I can get it so anytime the top four bits are zero, I know I'm going back to the fetch state. Makes it easy to recognize.
2023-01-03 18:03:50 crc: Why does load_image set ip to -1?
2023-01-03 18:08:30 Does it? I thought that was done in the i/o handler, not load_image
2023-01-03 18:12:23 I only see this done in the i/o handler
2023-01-03 18:13:53 Specifically in this case, reloading the image (and resetting the stacks) is done during execution, to reset to a clean state. Since IP is incremented at the end of each instruction bundle running, setting it to -1 in the i/o handler causes execute() to increment it back to 0, restarting execution at the beginning
2023-01-03 18:16:27 Lua might have a better way to do this. In C, I didn't think of an option other than this or using setjmp/longjmp, which I didn't want to do.
2023-01-03 18:25:23 you could return the PC value in every opcode.
2023-01-03 18:25:44 or return a PC offset.
2023-01-03 18:30:53 I'm not sure how that would be beneficial
2023-01-03 19:56:03 crc: it'd eliminate the odd -1 constant.
2023-01-03 19:56:27 move the PC control to the individual opcodes instead of execute.
2023-01-03 20:05:35 I'm not sure that'd make things better
2023-01-03 20:06:06 opcodes are part of bundles; they aren't aware of their position in a bundle. IP is incremented after the bundle finishes executing
2023-01-03 20:08:14 IP is modified by lit, call, jump, conditional call, conditional jump, return, and two I/O operations. Having every opcode track and update IP depending on its location in a bundle seems like more work than just accounting for the IP increment in the handful of instructions that actually alter it
2023-01-03 20:18:43 And on mine the opcode-level control transfers either operate within the current cell (microloop) or else jump to an address that's on top of a stack.
2023-01-03 20:18:58 The calls and jumps that have information in the code stream are at the cell level.
2023-01-03 20:19:16 never cross the code streams?
2023-01-03 20:19:36 crc: Do the I/O instructions take a port # from the code stream?
2023-01-03 20:20:07 no, i/o port numbers and parameters are passed on the stack
2023-01-03 20:20:24 Ok. Why do they modify IP?
2023-01-03 20:21:23 The i/o port for terminating the VM sets IP to end of memory; the i/o port for resetting the environment sets it to -1 so it'll increment back to 0 at the end of execute()
2023-01-03 20:21:41 Oh, I see. There's more going on than just I/O.
2023-01-03 20:23:52 yes, for these two operations
2023-01-03 20:24:28 I'm going to have to retract some of the nice features I mentioned earlier. I think seeing ret and unconditional loop jump "ahead of time" should work fine. But I also mentioned skipping over conditionals not taken. That would be great, and would give the best performance. But it opens a can of worms. First of all it is more complex. But... what if I had two in a row?
2023-01-03 20:24:44 Or three? I can't skip an "arbitrary" number.
2023-01-03 20:25:23 ret and "me" are ok because they're guaranteed to send me back to the fetch cycle. But the conditional operations will have to be given their cycle, whether they "fire" or not.
2023-01-03 20:25:50 So, not quite as fast, but it will be easier to design.
2023-01-03 20:27:11 The "real" purpose of the one-op lookahead is to clock in the address I'm going to need for any potential RAM reads on the next cycle. Anticipating ret and me will be a "bonus."
2023-01-03 20:28:00 And I may let that be something I add later.
2023-01-03 21:08:59 Ok, so it looks like the opcode sequencer will need two 32-bit registers, one 32-bit 4-1 mux, and one 32-bit 2-1 mux. Counting the register load enables, that means five control signals, so there will be some combinational logic to produce those.
2023-01-03 21:11:50 And probably 90% of that is needed to work around those clocked RAMs.
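
For reference, the IP handling described earlier in this log (IP incremented once after each bundle, the reset I/O port setting IP to -1, the terminate I/O port setting IP to end of memory) can be sketched roughly as below. This is only a toy model in Python, not ilo's actual code: the class `ToyVM`, the opcode names `push1`, `reset_once`, and `halt`, and the tuple-per-bundle memory layout are all made up for illustration. `reset_once` fires only the first time so that the demo terminates.

```python
class ToyVM:
    """Toy bundle-based VM (hypothetical; not ilo's real implementation)."""

    def __init__(self, image):
        self.image = list(image)   # pristine copy, reloaded on reset
        self.resets = 0            # demo-only counter, survives a reset
        self.trace = []            # IP values visited, for inspection
        self.load_image()

    def load_image(self):
        # Reload memory and clear the stack; IP is NOT set to -1 here.
        self.memory = list(self.image)
        self.stack = []
        self.ip = 0

    def io_reset(self):
        # The I/O handler sets IP to -1 so that the increment at the
        # end of the current bundle brings it back to 0.
        self.load_image()
        self.resets += 1
        self.ip = -1

    def io_halt(self):
        # Terminate by pointing IP at the end of memory.
        self.ip = len(self.memory)

    def step(self, op):
        if op == "push1":
            self.stack.append(1)
        elif op == "reset_once":      # assumed opcode, fires only once
            if self.resets == 0:
                self.io_reset()
        elif op == "halt":
            self.io_halt()

    def execute(self):
        while self.ip < len(self.memory):
            self.trace.append(self.ip)
            for op in self.memory[self.ip]:   # run the whole bundle
                self.step(op)
            self.ip += 1                      # one increment per bundle


program = [("push1",), ("reset_once",), ("push1", "halt")]
vm = ToyVM(program)
vm.execute()
print(vm.trace)   # [0, 1, 0, 1, 2]
print(vm.stack)   # [1, 1]
```

The point of the sketch is that opcodes never need to know where they sit in a bundle: only the handful of operations that change IP have to account for the single increment in `execute()`.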