2023-01-12 07:11:39 I enjoy thinking about how to make RAM stuff more compact 2023-01-12 07:11:41 Within reason 2023-01-12 07:23:05 Yeah, such problems can be fun to tinker with for sure. In a notebook PC there's really not a whole lot of payoff for it in something like Forth - we've basically got more RAM than we can really use effectively. But in an FPGA RAM resources are often much more limited. 2023-01-12 07:23:41 So being able to get as much code in there as possible is definitely a goal worth giving some thought. 2023-01-12 07:24:05 In this case it's really those 32-bit call addresses that are rather wasteful. 2023-01-12 07:25:45 In token forths I tend to allow 16-bit and 32-bit offsets for calls 2023-01-12 07:25:56 PowerPC allows 24-bit immediate offsets I think 2023-01-12 07:26:04 And that will really pretty much completely define the size of my code, because normally we don't write a lot of new machine code in Forth. 2023-01-12 07:26:18 Maybe I should allow that too.... can reach most programs entirely with 24 bits, even larger than 64K 2023-01-12 07:26:55 That was my first idea. I'm using those two leftover bits of a 32-bit cell to define its meaning, and there's one 2-bit value left over; I could use it to say "here are two 15-bit addresses." 2023-01-12 07:27:08 That would immediately shrink my code quite a lot. 2023-01-12 07:28:06 But unfortunately I'd have no good way to do tail optimization on those, so I'd have to use a 32-bit cell for tail-optimized items. 2023-01-12 07:28:32 I also allow 8-bit, 16-bit, and 32-bit jump offsets 2023-01-12 07:28:59 Likewise PPC has 24-bit (or 28-bit?) jump offsets 2023-01-12 07:29:09 Yeah, I've been trying to think of a nice clean way to allow several different sizes, but still have it be fairly easy for the hardware to fetch and decode. 2023-01-12 07:29:38 They would require different opcodes, or subcodes 2023-01-12 07:30:17 Right. 
2023-01-12 07:30:28 I think the simplest way is to allow small relative immediates, and indirects (i.e. take the value from the stack, or a register) 2023-01-12 07:31:08 Then for larger jumps/calls you have to load the full value as a literal and then do an indirect jump/call 2023-01-12 07:31:12 Yeah, I've already really eliminated jump targets completely from the code stream - I'm doing only backward jumps, and as I pass the target address I have an opcode to capture that value. 2023-01-12 07:31:21 Into a register, basically. 2023-01-12 07:31:35 backward jumps would work for tail calls 2023-01-12 07:31:42 There's a stack for that, so each called word has one address it can catch. 2023-01-12 07:31:47 Because we're generally just calling something that's already defined 2023-01-12 07:31:50 Yes. 2023-01-12 07:32:07 But I need an opcode to actually jump to one of those. 2023-01-12 07:32:52 The initial plan here was to use two bits to identify the function of each cell. One pattern says "five 6-bit opcodes," another says "30-bit call address," a third says "30-bit jump address." 2023-01-12 07:33:02 And the fourth could say "two 15-bit calls." 2023-01-12 07:33:06 But then I'm out of patterns. 2023-01-12 07:33:21 So I'm left with no way to do a 15-bit jump. 2023-01-12 07:33:48 To use the other jump-back mechanism requires an opcode (a "me" opcode), which would consume a whole cell. 2023-01-12 07:33:53 So a 2-bit opcode? 2023-01-12 07:33:55 I'd be better off using a 30-bit jump. 2023-01-12 07:34:07 Basically that's what it is. 2023-01-12 07:34:16 A 2-bit opcode sounds suboptimal 2023-01-12 07:34:18 At the cell level each cell has a 2-bit opcode and a 30-bit data field. 2023-01-12 07:34:39 4 or 5 bits would make more sense IMO 2023-01-12 07:34:43 Well, that's why I'm thinking more about it. To see if I can arrive at something better. 2023-01-12 07:34:51 Well, my "real" opcodes are 6 bits. 
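The four cell patterns described above can be sketched as a decode function. This is a minimal model, not the actual hardware: the tag-value assignments (0 = opcodes, 1 = call, 2 = jump, 3 = paired calls) are my own choice for illustration, since the chat doesn't pin them down.

```python
# Hypothetical decode of the 2-bit control field scheme: the top two
# bits of each 32-bit cell select how the remaining 30 bits are used.
def decode_cell(cell: int):
    """Split a 32-bit cell into its tag and payload interpretation."""
    tag = (cell >> 30) & 0b11          # 2-bit control field
    payload = cell & 0x3FFF_FFFF       # remaining 30 bits
    if tag == 0:
        # five 6-bit opcodes, packed high to low
        ops = [(payload >> shift) & 0x3F for shift in (24, 18, 12, 6, 0)]
        return ("opcodes", ops)
    elif tag == 1:
        return ("call", payload)       # 30-bit call address
    elif tag == 2:
        return ("jump", payload)       # 30-bit jump address
    else:
        # the leftover pattern: two 15-bit call addresses
        return ("call2", (payload >> 15, payload & 0x7FFF))
```

With this layout a cell of five opcodes, a 30-bit call, and a paired 15-bit call all decode from the same 32-bit fetch, which is what makes the scheme cheap for the hardware.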
2023-01-12 07:34:52 Unless you're writing a brainfuck VM :P 2023-01-12 07:35:15 The two bits just specify how I'm using the other 30 bits of each 32-bit cell. 2023-01-12 07:35:23 And one way to use it is to hold five 6-bit opcodes. 2023-01-12 07:35:48 But if I need fewer than five, then the rest of that cell is wasted in this design. 2023-01-12 07:36:28 So those are the two ways I want to do better. Not wasting any of the space because of switching from cell meaning to cell meaning, and recognizing the fact that 30-bit addresses are often too large. 2023-01-12 07:36:57 I think you'll get more room using a 4- or 5-bit opcode 2023-01-12 07:37:09 I'd like to be able to have 8- and 16-bit addresses too, and find a way to avoid wasting leftover parts of cells. 2023-01-12 07:37:28 I considered 5-bit ones, in which case a cell holds six. 2023-01-12 07:37:30 Although I can't speak for how it will affect the gate usage 2023-01-12 07:37:48 But I do have some desires about what I want my instruction set to be able to do. 2023-01-12 07:38:02 But bear in mind that 8-bit CPUs had 8-bit opcodes, so if they can do it you can. 2023-01-12 07:38:03 I'm sure I could invent a smaller instruction set. 2023-01-12 07:38:08 But it would be 'different'. 2023-01-12 07:38:51 I'm pretty happy with the current 6-bit set; it does everything I like. 2023-01-12 07:39:00 So the "machine code" part of this is pleasing to me. 2023-01-12 07:39:27 It's just the integration of the call portion of things that I'm tinkering with. 2023-01-12 07:39:42 I think a good focus is RISC-V, which has a lot of potential. Have you considered writing a Forth for RISC-V natively? 2023-01-12 07:39:59 No, that's never come under my scrutiny. 2023-01-12 07:40:11 I've mostly DONE x86, and contemplated ARM. 2023-01-12 07:41:16 Oh, you know, here's an idea. 2023-01-12 07:42:56 Well, no - that winds up not saving anything really. Never mind. 2023-01-12 07:43:11 Anyway, just being able to have two 15-bit calls helps a lot. 
2023-01-12 07:43:25 So I already see how to do some improvements. 2023-01-12 07:43:34 I'm just being kind of greedy. :-) 2023-01-12 07:43:43 And I enjoy thinking about this kind of thing. 2023-01-12 07:45:51 Most of the ideas I've thought of for getting really aggressive for this would require more complex fetching; the fetch unit might already be looking at the next RAM cell while the current one was being processed. And that pre-fetch might prove to be wrong - "this" cell might jump somewhere else. So that would require logic to recover from that, and it would probably cause a delay, and so on. 2023-01-12 07:45:56 It just gets messier. 2023-01-12 07:46:18 Have you seen how things like ARM, PPC, or RISC-V are encoded? 2023-01-12 07:46:32 Because they are also 32-bit words of instructions 2023-01-12 07:47:44 Also, some of the things I've thought of would require me to be able to jump to / return to mid-cell code locations, which is something I'm not currently needing to do. 2023-01-12 07:48:05 No, haven't really gone off and looked. 2023-01-12 07:48:11 Just puzzling over it on my own. 2023-01-12 07:48:20 But like I said, it's fun. 2023-01-12 07:48:33 I'm not in angst over it or anything. :-) 2023-01-12 07:50:41 Some aspects of it look fairly easy to deal with. My RAM will still use 32-bit cells, regardless. So I pictured a barrel shifter type decoder; it would fetch a 32-bit cell, and then start at the beginning of it. As long as it could recognize, or know, the size of the next "entity" in there, it can dispatch it and then shift over past it. Lather, rinse, repeat. 2023-01-12 07:51:23 But that raises the prospect of doing a call from somewhere in the middle of the cell, and that would require the return information to include not only the cell's address but also where to resume in that cell. 2023-01-12 07:51:54 Not impossible to do, since I'm designing the hardware, but it's something that has to be gotten right. 
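The barrel-shifter decoder described above can be modeled as a loop that peels entities off the front of a cell's payload and tracks the bit offset. Here every entity is a fixed 6-bit opcode (the simplest case, and my own simplification); the same loop works for variable sizes as long as the decoder can tell each entity's width from its leading bits.

```python
# Toy model of the barrel-shifter decode: fetch a cell, then repeatedly
# recognize the next entity at the front and shift over past it.
PAYLOAD_BITS = 30   # 32-bit cell minus the 2-bit control field
OPCODE_BITS = 6

def walk_payload(payload: int):
    """Yield (bit_offset, opcode) pairs from the front of a 30-bit payload."""
    offset = 0
    while offset + OPCODE_BITS <= PAYLOAD_BITS:
        shift = PAYLOAD_BITS - offset - OPCODE_BITS
        yield offset, (payload >> shift) & ((1 << OPCODE_BITS) - 1)
        offset += OPCODE_BITS   # "dispatch it and then shift over past it"
```

The `offset` value is exactly the extra piece of state a mid-cell call would have to save alongside the cell address.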
2023-01-12 07:52:27 One of the problems I wanted to solve was "leftover unusable parts of cells." So my very first thinking on that was to say "Ok, let's try to waste NOTHING - what does that require?" 2023-01-12 07:52:44 PDP-9 had a 4-bit opcode and 14-bit instruction data 2023-01-12 07:52:46 18-bit words 2023-01-12 07:53:05 But it would be simple to push both a cell address and a partially processed cell on calls. 2023-01-12 07:53:46 I read that article last week on Huffman-encoded code streams. 2023-01-12 07:54:08 That barrel shifter approach would handle that too - it would recognize the size of the next opcode, so it would know how much to shift by. 2023-01-12 07:54:21 That was interesting, but I don't think I'm going to pursue it. 2023-01-12 07:54:35 The opcode stream isn't where the waste is in what I've done so far. 2023-01-12 07:54:42 6-bit opcodes - already fairly compact. 2023-01-12 07:55:22 Another possibility is to scrap the 2-bit control field and just have call and jump opcodes. 2023-01-12 07:55:33 And I could have more than one, to handle multiple sizes. 2023-01-12 07:55:49 That might be the easiest way forward. 2023-01-12 08:06:08 It makes it feel a little more like a "standard design" and a little less like a "Forth machine," though - that 2-bit "layer" really kind of embraced Forth nicely. At that level, it uses the notion of a Forth definition being a "list of addresses" very overtly. 2023-01-12 08:06:48 Then the opcode layer is "how it eventually did actual operations." 2023-01-12 08:09:25 Of course there's also the possibility of 7-bit opcodes; then four would fit in a cell and there would be four control bits in each cell. That would give me quite a number of ways to interpret a cell's meaning. 2023-01-12 08:10:04 The main thing I've been internally debating the last few days is whether to have that structure - cell oriented, with fields within a cell all having the same meaning, vs. a true "stream" of code, where anything can happen at any time. 
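The "push both a cell address and a partially processed cell" idea above amounts to widening the return-stack entry. One way to sketch it: since a cell holds at most five opcode slots, three low bits are enough to record where to resume. The field widths here are my own illustration, not anything fixed in the conversation.

```python
# Hypothetical return-stack entry: cell word-address in the high bits,
# resume slot (0-4 within the cell, so 3 bits) in the low bits.
SLOT_BITS = 3

def pack_return(cell_addr: int, slot: int) -> int:
    """Combine a cell address and an intra-cell resume slot into one word."""
    assert 0 <= slot < 5
    return (cell_addr << SLOT_BITS) | slot

def unpack_return(entry: int):
    """Recover (cell_addr, slot) from a packed return-stack entry."""
    return entry >> SLOT_BITS, entry & ((1 << SLOT_BITS) - 1)
```

The cost is three address bits per return-stack entry; the payoff is that calls can originate from any slot in a cell.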
2023-01-12 18:50:06 Looks like there are somewhere around 1250 words in the out-of-the-box GForth dictionary. 2023-01-12 18:50:44 I don't know if they make them all immediately accessible on startup - you might have to "activate" groups or something. 2023-01-12 18:50:57 But I copied this: 2023-01-12 18:50:59 https://gforth.org/manual/Word-Index.html 2023-01-12 18:51:23 to vim and it had 1255 lines. I just rounded down to guesstimate the section separator lines. 2023-01-12 18:53:10 It occurred to me that in an indirect threaded Forth you don't have to have call and jump that will span your whole address space. 2023-01-12 18:53:35 You just need to span a table with the CFA/PFA info for all of your words. 2023-01-12 18:53:46 The table entries are what have to span the whole address space. 2023-01-12 18:54:00 And that's actually fairly limited and it scales in a nice way. 2023-01-12 18:54:15 If you don't have much RAM in your system, you aren't going to have many words and the table can be small. 2023-01-12 18:54:36 If you have a big table, it's because you have a lot of RAM - you have to, to support that many words. 2023-01-12 18:54:54 So in any case, the table remains modest compared to your total RAM. 2023-01-12 18:55:19 Also, in a small system it requires fewer bits to span your small table, so that makes code smaller. 2023-01-12 19:00:44 Also, we talked some recently about having absolute and relative calling ability - the relative call ability would let you use smaller code fields to access definitions "recent to you," because their code is closer. 2023-01-12 19:01:40 Well, that would work in this kind of system too - your recent definitions would be "close to you" in the table. So as long as the hardware kept up with the table entry currently being executed, you could offset back from there with small bitfields to call recently defined words. 2023-01-12 19:04:49 Say you had one-byte items in your code stream. 
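The table idea above separates the two jobs: only the table entries span the full address space, while the code stream holds small indices, either absolute or relative to the currently executing entry. A minimal sketch, with names of my own invention:

```python
# Sketch of the indirect-threaded table: small tokens in the code
# stream, full-width execution addresses only in the table.
class ThreadedImage:
    def __init__(self):
        self.table = []            # index -> full-width code address

    def define(self, code_addr: int) -> int:
        """Register a word; return the small index the code stream uses."""
        self.table.append(code_addr)
        return len(self.table) - 1

    def resolve(self, token: int) -> int:
        """Absolute form: token indexes the table directly."""
        return self.table[token]

    def resolve_relative(self, current: int, back: int) -> int:
        """Relative form: reach words defined shortly before 'current'."""
        return self.table[current - back]
```

With ~1250 words, 11 bits of absolute index already cover the whole GForth-sized dictionary, which is why the token width scales with dictionary size rather than with RAM size.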
One bit would be your absolute/relative bit, and one bit would tell you whether or not the following byte should be included in the item (so items would be one byte or two bytes). 2023-01-12 19:05:56 For one-byte items, the remaining six bits would let you target either the first 64 words in the table (your most commonly used "core" words), or the 64 definitions immediately preceding you (likely to be your "helper" words). 2023-01-12 19:06:20 You might also need a jump/call bit, so the 64 becomes 32. 2023-01-12 19:06:48 If you included the second byte, you'd have 13-14 bits to span your table with, so you could support either 8k or 16k words. 2023-01-12 19:07:57 That's just the threading engine - you'd still need opcodes to eventually manipulate your hardware. The table would tell you when it was pointing at opcodes, so different hardware would handle that business. 2023-01-12 19:08:09 I actually kind of like that. 2023-01-12 19:46:03 Anyway, that would give you support for a dictionary with 4000 or 8000 words, and a large majority of your definition cells would be one byte. 2023-01-12 21:36:23 Got a copy of "Astronomical Algorithms" by a guy named Jean Meeus delivered today. Just thumbed through - looks like a nice mixture of general calculation methods and specific astronomical stuff. 2023-01-12 21:37:01 so... you could say you were star struck? 2023-01-12 21:37:13 I suppose one could say that... :-) 2023-01-12 21:37:49 I don't know that I've ever actually had that experience; I don't usually get too worked up over celebrities. 2023-01-12 21:38:21 BTW, I'll take a moment to curse the postal service some more. STILL no tracking update on the DM41. 2023-01-12 21:38:36 I really DO think it's just going to show up at some point, without ever having shown me any motion. 2023-01-12 21:38:53 Either that or I've got a problem on my hands. 
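The one/two-byte item format sketched earlier can be made concrete. This is one possible guess at the layout, not a settled design: bit 7 = absolute(0)/relative(1), bit 6 = "a second byte follows," and the low 6 bits (plus 8 more from the second byte when present, giving 14) index the word table; the jump/call bit is omitted here for simplicity.

```python
# Hypothetical encoding of one- and two-byte code-stream items.
def encode_item(index: int, relative: bool) -> bytes:
    """Emit the shortest form that holds the table index."""
    flags = 0x80 if relative else 0
    if index < 64:                       # fits in the 6-bit one-byte form
        return bytes([flags | index])
    assert index < (1 << 14)             # 6 + 8 = 14 payload bits
    return bytes([flags | 0x40 | (index >> 8), index & 0xFF])

def decode_item(stream: bytes, pos: int):
    """Return (index, relative, next_pos) for the item at 'pos'."""
    b = stream[pos]
    relative = bool(b & 0x80)
    if b & 0x40:                         # extension bit: two-byte form
        index = ((b & 0x3F) << 8) | stream[pos + 1]
        return index, relative, pos + 2
    return b & 0x3F, relative, pos + 1
```

Since helper words and the core vocabulary dominate real definitions, most items take the one-byte form, which is where the compaction comes from.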
2023-01-12 22:27:11 KipIngram: oh cool, my first bit of FORTH code was one of his algorithms, on calculating the date of Easter 2023-01-12 22:50:30 :-) 2023-01-12 22:51:16 I've got a book upstairs somewhere by a guy named Ball, just called "Algorithms for RPN Calculators." That was some of my first learning (after I consumed the manual for my HP-41CV the weekend after I bought it). 2023-01-12 22:52:34 This new book goes through a lot of the basics. Interpolation from tables, for example, including when it's safe to interpolate using just three points, when you need five, etc. 2023-01-12 22:52:50 Curve fitting, etc. All the fun basic stuff. 2023-01-12 22:54:20 On most of these calculators, the statistics functionality generally gives you an easy way to accumulate x/y points, and as you do so it will accumulate sum of x, sum of y, sum of x^2, sum of y^2, and sum of x*y. 2023-01-12 22:55:16 Generally they stop there. That lets you go out to a certain point in those interpolation algorithms. But if you summed higher order terms - x^3, x^2*y, x*y^2, and y^3, for example - you could go to the next order in the curve fitting, interpolating, and so on. 2023-01-12 22:55:53 Instead of being limited to fitting a best straight line, you could fit a best parabola. 2023-01-12 22:57:03 There's an alternative derivation of calculus you can find online by that fringe math guy we've talked about - Norman Wildberger. 2023-01-12 22:57:32 He shows how higher derivatives get you at "best fitting conic approximations" to functions. 2023-01-12 22:58:03 I'm not sure whether he's "truly weird" or just misunderstood, but nonetheless some of the things he teaches are damn interesting and probably useful. 2023-01-12 22:58:28 And there is a pretty high "common sense factor" in his way of looking at things. 2023-01-12 22:59:19 I think I wrote an HP-41 program back in college for finding best fitting "arbitrary polynomials." 
2023-01-12 22:59:58 Turned it into a Gaussian elimination problem in the polynomial coefficients. 2023-01-12 23:13:10 So this relative call stuff. It's a little involved. At "level one" it works great. By that I mean you start making definitions, and at first you only have your machine code instructions to use so you're writing primitives. 2023-01-12 23:13:21 We'll call those level 0. 2023-01-12 23:13:37 Then level 1 is definitions that only call level 0 words. 2023-01-12 23:13:56 And so on. Of course higher up you wind up mixing use of words from various levels. 2023-01-12 23:14:37 But when you're defining level 2 words, the level 1 words you write to support it are probably near at hand. But then you write your second level 2 word, and its level 1 words are close at hand as well. 2023-01-12 23:14:53 But now that other level 2 word is far back, on the other side of all of the second word's helpers. 2023-01-12 23:15:07 so when you get to writing level 3 words, things have started to get scattered out. 2023-01-12 23:15:27 And having a way to compactly access the "32 most recent definitions" has less payoff. 2023-01-12 23:16:52 But I've got this vague thought that somehow the slots in this table also have "levels." Slots 16, 32, 48, etc. - slots on multiples of a power of 2, can be recognized. So if you had a way to relatively access the most recent, say, 32 words, *at some power of 2 level*, then you might be able to cleverly use your slots to accomplish more compactification. 2023-01-12 23:17:03 That's all I've got right now - it's still really murky in my head. 2023-01-12 23:17:16 And probably isn't something most programmers would want to think about anyway. 2023-01-12 23:17:34 But a smart compiler could probably suss it out for you. 2023-01-12 23:18:20 So here you are defining your words, and when you come to an "important one," maybe you jump ahead in the table to the next available "some power of two" slot, and then drop back and fill in the ones in between. 
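The polynomial-fit-as-Gaussian-elimination idea mentioned above can be shown end to end: accumulate the higher-order sums the calculator's statistics registers would hold, build the normal equations, and eliminate. This is a generic least-squares sketch under that description, not the actual HP-41 program.

```python
# Least-squares polynomial fit from accumulated power sums, solved by
# Gaussian elimination on the normal equations (degree 2 = "best parabola").
def polyfit_via_sums(xs, ys, degree):
    n = degree + 1
    # Accumulated sums, as a calculator's sigma registers would hold them
    sx = [sum(x**k for x in xs) for k in range(2 * degree + 1)]
    sxy = [sum((x**k) * y for x, y in zip(xs, ys)) for k in range(n)]
    # Normal equations: A[i][j] = sum x^(i+j), b[i] = sum x^i * y
    a = [[float(sx[i + j]) for j in range(n)] + [float(sxy[i])]
         for i in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (a[i][n] - sum(a[i][j] * coeffs[j]
                                   for j in range(i + 1, n))) / a[i][i]
    return coeffs   # coeffs[k] multiplies x**k
```

Note that a degree-2 fit only needs sums up through x^4 and x^2*y, which is exactly the "one more order of accumulation" the chat describes.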
2023-01-12 23:18:52 You can assign an "importance" to your definitions based on where you put them in the table, and the more important they are, the further away you can reach them from with a compact call. 2023-01-12 23:19:19 The cost is that the call has to specify which level it's going to work at, as well as the offset it's calling across. 2023-01-12 23:19:39 So whether it actually winds up paying off I have no idea. 2023-01-12 23:20:26 I guess what's on my mind in GENERAL tonight is having hardware directly support an indirect threaded structure. 2023-01-12 23:21:11 That's different from what I've been thinking about - you might remember I mentioned that some of the threading aspects had "disappeared" from the mental picture I had going a week or two ago. 2023-01-12 23:21:46 What I was designing before looked more like a code-threaded system. 2023-01-12 23:27:23 Speaking of "neat basic knowledge" (like the stuff in that algorithm book), somewhere online I found a really neat write-up on old-school dimensional analysis. 2023-01-12 23:27:58 What they called dimensional analysis when I was in school was just a pale imitation of what was in this paper - it had some really, really neat stuff on identifying dimensionless parameters in systems and that kind of thing. 2023-01-12 23:28:28 I remember thinking as I read it that it was the sort of stuff that should be in every engineer's and scientist's "basic toolkit." And it probably *was* a century ago. 2023-01-12 23:28:40 I think we've let some really worthwhile stuff "slip away" over time. 2023-01-12 23:29:11 good things fall out of copyright pretty quick 2023-01-12 23:29:12 I've got that paper around here somewhere on my computer; if I can come up with the link again I'll post it. 2023-01-12 23:30:43 Hah! 2023-01-12 23:30:45 I found it. 2023-01-12 23:30:59 Found it in my Downloads folder first and then just searched the exact title. 
2023-01-12 23:31:01 https://web.mit.edu/2.25/www/pdf/DA_unified.pdf 2023-01-12 23:31:36 I need to make time to really give that a thorough going through. 2023-01-12 23:32:31 I think there's a strong tendency these days to find software to do these basic things, and then it's easy to completely lose touch with how they're done. 2023-01-12 23:32:49 We're growing awfully dependent on our technology. 2023-01-12 23:33:11 I swear, the extent to which my kids took forever to even learn to drive around town just blew my mind. 2023-01-12 23:33:24 Totally crutched around with their phones. 2023-01-12 23:36:19 or that whole fire thing