2022-03-25 00:03:36 Heh. Just wrote a word that scans my whole dictionary and makes sure all cfas are on 32-bit alignment points.
2022-03-25 00:12:58 now "ls -t" is "^ls%space%-t" xD
2022-03-25 03:24:12 KipIngram: ITC?
2022-03-25 03:24:17 or ARM?
2022-03-25 06:51:26 dave0: BCPL also isn't C
2022-03-25 06:51:50 They don't teach you that in school
2022-03-25 06:55:38 veltas: yeah but is bcpl as interesting as forth? :-)
2022-03-25 08:53:58 dave0: No but it is actually kind of interesting
2022-03-25 09:33:23 there's a guy working on a BCPL interpreter for the 65816
2022-03-25 09:33:37 if you're interested in BCPL in this century
2022-03-25 11:04:54 Wow...
2022-03-25 11:04:56 https://www.tomshardware.com/news/nvidia-unveils-144-core-grace-cpu-superchip-claims-arm-chip-15x-faster-than-amds-epyc-rome
2022-03-25 11:05:58 finally catching up to chuck moore
2022-03-25 11:06:20 /not serious
2022-03-25 11:31:14 remexre: Not sure what you were asking - my platform is x86_64 Linux. But aspects of it use 32-bit quantities. Specifically, the stacks, the variables, etc. are 64-bit and store full addresses where relevant, but dictionary lists are composed of 32-bit offsets from the base address the system is loaded at.
2022-03-25 11:32:05 Part of the reason for that is that I first wrote this one on a Mac, and the macOS "Mach-O" executable format requires total relocatability. But another reason was to make the system smaller - it's about half the size it would be if it'd been a pure 64-bit system.
2022-03-25 11:32:44 If ITC meant "indirect threaded," yes - it is indirect threaded.
2022-03-25 11:33:32 So when the inner interpreter uses the IP to scoop the next item up from a definition, it then adds the contents of a register to that to get a 64-bit address.
2022-03-25 11:34:24 Or when the 32-bit CFA is pulled from a word header, the same register is added to that and I jump to the result.
2022-03-25 11:35:18 Well, actually that's two different registers - the list entries get offset by "base of headers" and the CFA gets offset by "base of bodies." r14 and r15 in my system.
2022-03-25 11:36:28 So it does a little more work than an "efficient as possible" indirect threaded system would do, but the difference isn't very much.
2022-03-25 11:37:07 The reason those base addresses have registers devoted to them is to make it possible to fix up an offset with a single register add instruction.
2022-03-25 11:37:34 dovar adds the body base register so that the variable address that winds up on the stack is a real address.
2022-03-25 11:38:01 yeah, I meant that; was wondering why alignment would be necessary with x86 machine code
2022-03-25 11:38:25 Ah - I see. Well, I guess it's not; it's just a little more efficient.
2022-03-25 11:38:50 And I really only needed to align the body section once - everything I add to it there is a 32-bit quantity, so it stays aligned.
2022-03-25 11:39:09 I just put an align 4 right after all the machine code and before the definition lists.
2022-03-25 11:39:50 I just wanted to avoid the processor having to read the halves of a 32-bit quantity separately.
2022-03-25 11:43:33 I also add 1-3 pad bytes after the last character of name strings, so that the CFA and PFA pointers are aligned.
2022-03-25 11:44:00 That one actually gets used a lot, so I sacrifice some bytes there.
2022-03-25 11:44:21 Names of length 1, 5, 9, etc. fit exactly; other lengths get a pad.
2022-03-25 11:45:04 The link field is 2 bytes, then there's a count byte for the name, so that leaves one byte in that 32-bit item.
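A minimal Forth sketch of the padding rule just described (the word names here are hypothetical, not the ones in this system): the header entry is a 2-byte link, a 1-byte count, then the name, padded so the CFA/PFA that follow land on a 4-byte boundary.

    : ALIGN4    ( u -- u' )    3 + -4 AND ;               \ round up to the next multiple of 4
    : NAME-PAD  ( len -- pad ) 3 +  DUP ALIGN4  SWAP - ;  \ 2 link + 1 count + name, then pad to 4

With this rule, names of length 1, 5, 9, ... need 0 pad bytes, a 2-character name needs 3, and a 4-character name needs 1, matching the "names of length 1, 5, 9, etc. fit exactly" observation above.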
2022-03-25 11:48:37 I keep three status bits in that 16-bit link offset field too, and the way it's arranged right now two of them are the low-order two bits, which I mask off before treating it as a link distance. So the link distance has to be a multiple of 4, unless I make some changes.
2022-03-25 11:49:09 I could shift and mask of course, but that was all done on purpose.
2022-03-25 11:49:43 The third status bit is the MSB, so that limits the link distance to 32k bytes, but I'm never going to need anywhere near that, so it's fine.
2022-03-25 13:19:33 So here's an interesting use case for my stack frames. That word I wrote that checks alignment. The top-level word I first defined like this:
2022-03-25 13:19:44 : check path @ @ (check) drop drop ;
2022-03-25 13:19:54 Pretty simple - it just sets up, runs a loop, and cleans up.
2022-03-25 13:20:00 But an alternative definition is this one:
2022-03-25 13:20:10 : check { path @ @ (check) 0 } ;
2022-03-25 13:20:38 I sort of like that, because it makes it more explicit that the word has no overall stack effect. 0 } pretty much means "put the stack back just like how { found it."
2022-03-25 13:21:02 It will throw away any residue left by the contents of { ... }
2022-03-25 13:21:24 And if I wanted to also throw away N incoming parameters, it would be N } instead of 0 }
2022-03-25 13:22:40 It's a limited use of the frame - only using it for cleanup, really.
2022-03-25 13:23:06 But the main functional thing is that inside there I have a nice fixed access to the stack as an array.
2022-03-25 13:23:11 If I need it.
2022-03-25 13:24:47 If I somehow changed (check) so that it left one more or one less item on the stack, that second definition wouldn't need to change - the first one would.
2022-03-25 15:19:59 KipIngram: What are the defs of { and } ?
2022-03-25 15:35:50 A processor register is used as the "frame pointer." { saves that register to the return stack and loads the current stack pointer into it. } moves the frame pointer back into the stack pointer, recovers the previous frame pointer from the return stack, and then drops from the stack the number of items specified by the top-of-stack value cached in a register.
2022-03-25 15:36:03 They're fairly short, fast primitives.
2022-03-25 15:36:17 In between, the frame pointer remains static, and can be used as a fixed reference point into the stack.
2022-03-25 15:38:36 That step "moves the frame pointer back into the stack pointer" is the key to it not mattering what condition the interior has left the stack in. Doesn't matter - that step throws away everything that's been added inside the frame.
2022-03-25 15:39:03 So if you want the frame to return a value to the outside, you make a space for it before opening the frame, and the interior stores the result there.
2022-03-25 15:39:55 It creates a sort of "unnamed local variables," or something not terribly far from it.
2022-03-25 15:41:01 I have several words - s0 s1 s2 ... - (that I think I should change to f0 f1 f2 ...) that return the address of the pertinent spot in the stack. Then I just treat it like any address.
2022-03-25 15:43:26 Chuck would say I shouldn't write code that needs values from down in the stack. I try, but sometimes I can't figure out how to avoid needing them without it getting rather ugly - this solves that problem.
2022-03-25 15:45:10 In the case I cited above, I didn't really "need" to use it - there was an easy way to do that word without the mechanism.
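A rough approximation of { and } in standard-ish Forth, using a recorded stack depth as the frame instead of a dedicated register (hypothetical names; this only reproduces the cleanup behavior described above, not the s0/s1 address words, since standard Forth can't hand out stack addresses):

    CREATE FRAMES 16 CELLS ALLOT   VARIABLE FP#   0 FP# !
    : {  ( -- )  DEPTH  FRAMES FP# @ CELLS + !  1 FP# +! ;   \ remember the depth where the frame opens
    : }  ( ... n -- )
         >R  -1 FP# +!  FRAMES FP# @ CELLS + @ >R   \ recover the depth saved by {
         BEGIN DEPTH R@ > WHILE DROP REPEAT         \ throw away any residue left inside { ... }
         R> DROP  R> 0 ?DO DROP LOOP ;              \ then drop n incoming parameters

With these, : check { path @ @ (check) 0 } ; behaves as described: whatever (check) leaves behind is discarded, and 0 } drops nothing further.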
2022-03-25 15:45:25 But I still think it's a slightly preferable definition with the frame, for the reasons I stated.
2022-03-25 15:45:59 When I first wrote them, I didn't have the drop parameter to } - I added that later.
2022-03-25 15:46:04 Just for convenience.
2022-03-25 15:47:23 I could eliminate the need for it by opening the frame *before* stacking up parameters, but then I'd have to use negative indices to access some of those, and that just seemed a little tacky.
2022-03-25 15:48:42 Or I guess I could let positive indices go in the other direction; then I'd wind up with something like
2022-03-25 15:48:59 { parm1 parm2 parm3 ...
2022-03-25 15:49:12 and s1 would point to parm1, s2 to parm2, etc.
2022-03-25 15:49:17 hmm, thinking about those f0, f1, f2 words to get a pointer into the stack
2022-03-25 15:49:23 Yes.
2022-03-25 15:49:29 I am going to rename those with f.
2022-03-25 15:49:34 That was the idea.
2022-03-25 15:50:04 If you want to, you can write words that actually do things with the addresses, like f1@ etc. A little more efficient, but it starts to add words.
2022-03-25 15:50:18 Because you'd need one for every index.
2022-03-25 15:50:29 A generic word that you give the index to as a parameter is possible too.
2022-03-25 15:50:33 Just less efficient.
2022-03-25 15:50:40 i wonder about saving any primitives in compile mode to a buffer to do a small amount of peepholing. seems like very low-hanging fruit. s0 @ 1+ s0 ! could go from a few dozen cycles to 2
2022-03-25 15:50:51 1 f instead of f1.
2022-03-25 15:51:16 That would be a "minimalist" approach in terms of word count.
2022-03-25 15:51:35 Possibly, yes.
2022-03-25 15:51:46 Forth sort of offers a lot of potential there.
2022-03-25 15:52:01 ya, would be more flexible passing the index number, though even more overhead. i guess it doesn't matter on x86 though :P
2022-03-25 15:52:36 Of course, I'd have to put that code somewhere permanent, since I'm not code threaded. Which incurs a good part of the cost of just defining the word myself.
2022-03-25 15:53:14 The peephole-created code, I mean - I'd need to "build a primitive" that I could call from my definition.
2022-03-25 15:54:11 I guess I'd need to do that either before or after I compiled the definition, because those things both go into the same RAM region.
2022-03-25 15:54:49 At any rate, it's a fair jump in compiler sophistication to start trying that.
2022-03-25 15:56:05 hmm, ya, if you're not inserting machine code into the thread
2022-03-25 15:56:57 Well, I don't have any assembler words yet, but I will, which will let me write new primitives, and that machine code will go right in line with the other definitions, in the body section.
2022-03-25 15:57:09 The machine code is the "definition" part of the primitive.
2022-03-25 15:57:47 And then I'd re-align to the four-byte boundary, since there's no guarantee the machine code will end on a mod 4 = 0 boundary.
2022-03-25 16:30:36 So, I did a little poking around earlier and discovered that, in fact, my system seems to need the headers to all be 32-bit aligned. I bumped the header pointer up one byte, and then defined a word - that immediately crashed the system. I don't know exactly why yet; I'll dig into it in a bit.
2022-03-25 16:31:30 It's got to be to do with the link or name string, because it automatically re-aligns before laying down the CFA and PFA pointers.
2022-03-25 16:48:34 and some peeps have asked me why I like to have addressing at machine-word granularity and not byte granularity
2022-03-25 17:20:58 Oh, you can't get inside the cells?
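In a system like the one described, where the frame pointer sits in a register and the data stack can be addressed directly, the f-words might look roughly like this. This is a sketch under assumptions: FP@ is a hypothetical primitive that pushes the frame pointer, the data stack grows downward, and stack cells are one CELLS wide.

    : F    ( n -- addr )  CELLS  FP@ + ;   \ n-th cell of the frame: 0 = top of stack when { ran
    : F0   ( -- addr )    0 F ;
    : F1   ( -- addr )    1 F ;
    : F1@  ( -- x )       F1 @ ;           \ fused per-index variant: a bit faster, one word per index

F itself is the generic "index as a parameter" form mentioned above (1 F instead of F1), trading one extra add per use for fewer words.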
2022-03-25 17:22:53 Well, I was overlooking the obvious. When I bump the header's next-available-byte pointer, the link back to the prior word is not a multiple of 4. So I can't properly store it in my link field. I set it correctly - to the right distance back. But then when I try to search the dictionary the next time I get lost, because I mask off the low-order two bits of that link distance.
2022-03-25 17:23:00 So, of course that doesn't work.
2022-03-25 17:24:56 If I wanted to fix that, I could move those two status bits up to the top of the word, like the other one, and accept limiting the link distance to 8k. Which would be fine, but the real point of doing that alignment is to keep the CFA/PFA pointers aligned.
2022-03-25 17:25:07 So I won't bother - I just wanted to understand it.
2022-03-25 17:28:33 sure you can get inside cells, just not with @ and !
2022-03-25 17:30:33 I see. Well, my @ and ! will "work" on misaligned addresses. And I have the usual other words to deal with smaller sizes. I guess with the way I've got my system aligned, @ and ! won't generally receive misaligned addresses.
2022-03-25 17:30:45 : C@ ( address bytenr -- byte ) swap @ swap 4 swap - 8 * >> 0xFF and ;
2022-03-25 17:31:04 similar for C!
2022-03-25 17:31:14 But they do whatever the underlying machine code does, and I presume it will fetch and store at misaligned addresses.
2022-03-25 17:31:31 Yeah, I've got @ and ! and then h, w, and c variants of each.
2022-03-25 17:32:53 most archs nowadays fetch or store whole aligned machine words, iirc, and then use a barrel shifter
2022-03-25 17:34:01 I just checked it out - @ and ! work on the eight bytes starting at whatever address I say. I presume the processor has to use two operations, each getting or putting 32 bits, to do that.
2022-03-25 17:35:09 So, I don't guarantee that my variables are 64-bit aligned.
2022-03-25 17:35:43 That would involve giving up more bytes for padding, and honestly fetching variables isn't a large enough part of overall operation for me to think it's necessary to optimize it.
2022-03-25 17:35:59 Whereas fetching 32-bit "next thing to execute" cells is a large part of that operation.
2022-03-25 17:36:19 So I could easily have variables that span two 64-bit cells.
2022-03-25 17:38:13 So, the earlier investigation revealed that I don't have to have my headers all 32-bit aligned, but I do have to have the header-to-header separations all be multiples of 4 bytes.
2022-03-25 17:39:04 The link+name is the only part that's not naturally aligned, so it's the only place I have to pad.
2022-03-25 17:39:46 In the body area, variables, constants, and : defs will all be naturally aligned to 32 bits as well - it would just be embedded strings and primitives where I'd need to post-align.
2022-03-25 17:40:22 And I don't bother on the built-in primitives; their code is all just jammed up against each other.
2022-03-25 17:40:32 I align after the last one.
2022-03-25 17:40:32 you have the link in front of the name, yes?
2022-03-25 17:40:46 Yes; link, name, cfa, pfa.
2022-03-25 17:41:32 So getting from the name to the cfa just uses the count and bumps to the next align point. Going the other direction involves scanning for the set MSB of the count byte.
2022-03-25 17:42:13 one trick I learned years ago from an MCU programmer was to have static-sized data in front of variably sized data
2022-03-25 17:42:45 so link, cfa, pfa, and then name
2022-03-25 17:43:15 Yes, it did seem easier to have that link up front.
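The two header hops just described can be sketched as follows, assuming the [2-byte link, count byte with its MSB set, name, pad, cfa, pfa] layout and 4-aligned headers (hypothetical word names): name to CFA counts past the name and rounds up to the next 4-byte boundary; CFA back to name scans backward for the byte with the MSB set.

    : NAME>CFA  ( nfa -- cfa )  COUNT 0x7F AND +  3 + -4 AND ;    \ mask the status bit, skip the name, re-align
    : CFA>NAME  ( cfa -- nfa )  BEGIN 1- DUP C@ 0x80 AND UNTIL ;  \ walk back to the count byte (MSB set)

Pad bytes and ASCII name characters are assumed to have a clear MSB, so the backward scan stops exactly at the count byte.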
I guess I could have put cfa and pfa up there too. That might be worth thinking about, but now I have a noticeable amount of code built on this structure.
2022-03-25 17:43:24 I agree though - in hindsight it seems better.
2022-03-25 17:43:35 Would have simplified some things. Like I wouldn't need that scan.
2022-03-25 17:44:02 FIG has the link after the name, and I deliberately moved it for exactly the reasons you're alluding to. Don't know why I didn't think to move everything.
2022-03-25 17:44:34 Actually, cfa, pfa, link, name would be even better, because link is just two bytes.
2022-03-25 17:44:37 something I noticed as a side effect of the split dictionary in eForth was precisely this
2022-03-25 17:45:14 Well, if I ever re-do this, I'll keep that in mind. It's a "right idea."
2022-03-25 17:46:00 the other benefit was that words that were basically a tail subsequence of other words did not need an extra copy of that code
2022-03-25 17:46:54 Oh, right. I can do that. That's why I split the headers out to start with.
2022-03-25 17:47:10 but man does that make SEE much more complicated
2022-03-25 17:47:13 : foo ...code1... : bar ...code2... ;
2022-03-25 17:47:27 foo runs code1 and code2; bar just code2.
2022-03-25 17:47:56 In my case SEE foo wouldn't even know that bar was involved.
2022-03-25 17:48:05 It would just find code1 code2
2022-03-25 17:48:28 and likewise, SEE bar would just find code2.
2022-03-25 17:49:03 But I guess if you were trying to use all the pointers to define who owned what code, I see the issue.
2022-03-25 17:49:26 SEE would look up each xt in a colon definition by searching the dictionary until a cfa matches
2022-03-25 17:49:31 I presumed SEE would scan for a "termination word," which would ordinarily be (;) or JMP .
2022-03-25 17:49:39 Right.
2022-03-25 17:50:22 But pointers to foo and bar are different values, still.
2022-03-25 17:50:30 I think I'm missing part of your point.
2022-03-25 17:50:57 might spend part of the length field of the name to indicate the termination word, or just spend a cell and have a SEE handler pointed to
2022-03-25 17:51:55 Damn it. Now I'm going to sit around wishing I had "cfa, pfa, link, name." :-(
2022-03-25 17:52:03 let's say I have a word baz that has an xt to bar in it
2022-03-25 17:52:09 Ok.
2022-03-25 17:52:50 just like : baz .... bar .... ; where .... are other xts
2022-03-25 17:52:56 Yes.
2022-03-25 17:53:25 So it will search the word list and find that bar's CFA is pointed to by that entry.
2022-03-25 17:54:05 Or, actually, wait - it will just follow that pointer, know it's a cfa, and run nfa to get to the name.
2022-03-25 17:54:09 No search needed.
2022-03-25 17:54:30 when SEE baz is enumerating through the colon definition of baz it comes across the xt of bar, looks through the dictionary until it finds cfa=xt and prints that entry
2022-03-25 17:54:54 or that
2022-03-25 17:54:55 I don't think a search is needed in my system.
2022-03-25 17:55:13 The definition cell points at the header - the CFA field of the header.
2022-03-25 17:55:30 The name is right there immediately preceding it.
2022-03-25 17:56:04 Oh, by the way, for exactly this reason I decided not to get rid of the assembly-level name strings of my "helper" words.
2022-03-25 17:56:06 note that I had to do the search in my port of eForth to a variant of a dual stack machine specified as a softcore
2022-03-25 17:56:18 They will be useful for SEE.
2022-03-25 17:56:33 And I'll still find them, even if the word is no longer in any vocabulary.
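A hedged sketch of the kind of SEE walk being discussed, written against a generic indirect-threaded layout: each cell of a colon body is taken to hold a pointer to the called word's CFA field, the scan stops at a termination word, and CFA>NAME is the backward hop sketched earlier. The terminator (;) and the cell size are placeholders; in the actual system the body cells are 32-bit offsets that get a base register added, so the fetch and the step would differ.

    : .XT       ( xt -- )   CFA>NAME COUNT 0x7F AND TYPE SPACE ;   \ xt -> header -> print its name
    : SEE-BODY  ( pfa -- )
         BEGIN  DUP @  DUP ['] (;) <>    \ next cell of the definition; stop at the terminator
         WHILE  .XT  CELL+               \ print the referenced word's name, step to the next cell
         REPEAT  2DROP ;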
2022-03-25 17:57:47 So, my headers point to code and definition bodies, but the cells in the definition body point back to headers.
2022-03-25 17:58:29 so, even if you moved the cfa and pfa around per the above, you could still get the assembly-level name?
2022-03-25 17:58:51 Yes. Say I went to [cfa, pfa, link, name].
2022-03-25 17:59:10 A reference to that word would point right at the first byte of that, where cfa is.
2022-03-25 17:59:24 So 10 + would get me to name.
2022-03-25 17:59:48 And I wouldn't need the MSB of the count byte set high, either.
2022-03-25 18:00:03 Because I'd never find myself at the far end of the name needing to scan back.
2022-03-25 18:00:33 So basically : see-cell ( xt -- ) 10 + count type ;
2022-03-25 18:00:41 in the eForth port above I couldn't do that as the code was directly threaded
2022-03-25 18:01:06 Right - I was wondering if maybe direct threading would make that difference. Hadn't quite gotten it thought through yet.
2022-03-25 18:01:18 the xt on that arch is basically a call to that address
2022-03-25 18:01:21 I'll add that to my list of advantages of indirect threading. :-)
2022-03-25 18:01:32 Yes - makes total sense.
2022-03-25 18:01:58 But, like I said, now I feel bad. I want [cfa,pfa,link,name]
2022-03-25 18:02:05 Soooo much better.
2022-03-25 18:02:33 But making that change strikes me as a fair bit of work at this point. I'm sure I'll think about it, though - maybe it's worth it.
2022-03-25 18:02:48 a 16-bit cell arch where you could not call into the first 16 addresses of memory
2022-03-25 18:05:06 Maybe it's not that hard. I could just copy my dev directory and spend a little time on it. I'd have to change my nasm macros that build my headers. And then maybe I'd just need to change the definitions of nfa, pfa, cfa, lfa, and create.
2022-03-25 18:05:34 If I've done a good job it would be just that easy - that's where that logic *should* be contained.
2022-03-25 18:07:55 I don't feel terribly confident that's what I'd find, though - very likely I've "cheated" here and there based on knowing my structure.
2022-03-25 18:08:08 And I'd have to chase all of those down.
2022-03-25 18:08:17 THAT is the part that could be potentially hard.
2022-03-25 18:08:30 I guess it would be nice to eliminate those things, though.
2022-03-25 19:17:37 KipIngram: Okay that's an interesting approach (re { and } explanation)
2022-03-25 19:18:30 I wonder if I can implement that in standard forth... hmmmm
2022-03-25 19:25:08 KipIngram: What gets printed here? : TEST 1 2 { 3 4 s1 ? } ; TEST
2022-03-25 19:25:52 Sorry, I mean : TEST 1 2 { 3 4 s1 ? 2 } ; TEST
2022-03-25 19:56:00 You probably can - what it does is pretty simple. You might have to use a variable to hold the frame pointer, though.
2022-03-25 19:56:07 So it would "work" but be less efficient.
2022-03-25 19:56:16 I felt it was worth a dedicated register.
2022-03-25 19:56:44 It really, really cuts through most "stack juggling" situations.
2022-03-25 19:56:51 You just don't need to do all those flips anymore.
2022-03-25 19:57:09 In purely standard FORTH I can't get the address of stuff on the stack
2022-03-25 19:57:15 But every normal forth allows that
2022-03-25 19:57:32 I *can* at least do something like s0@ etc
2022-03-25 19:57:55 s0! is not very efficient, s0@ is okay because I can implement it with PICK
2022-03-25 19:58:30 But.... : TEST 1 2 { 3 4 s1 ? 2 } ; TEST
2022-03-25 19:58:35 What's printed?
2022-03-25 20:06:18 Right - the standard deliberately suppressed that kind of thing.
2022-03-25 20:06:59 What's the ? do?
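The PICK-based fallback mentioned here can be done in standard Forth. This sketch assumes the FRAMES/FP# scheme from the { } sketch above, where { recorded the stack depth; FRAME-DEPTH and S@ are hypothetical names:

    : FRAME-DEPTH  ( -- d )    FRAMES FP# @ 1- CELLS + @ ;      \ depth recorded by the innermost {
    : S@           ( n -- x )  DEPTH 1- FRAME-DEPTH - + PICK ;  \ n-th frame cell: 0 = s0, 1 = s1, ...

So 1 2 { 3 4 1 S@ . 2 } prints 1, matching the answer to the TEST question given just below (reading ? as "fetch and print" the value rather than the address).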
2022-03-25 20:07:46 If it is the thing that prints (my word for that is . ) then the *address* of s1 (which is where the 1 is) would be printed.
2022-03-25 20:08:09 : test 1 2 { 3 4 s1 @ . 2 } ; test
2022-03-25 20:08:18 That would print 1, and leave the stack empty.
2022-03-25 20:08:52 From my system:
2022-03-25 20:08:55 : test 1 2 { 3 4 s1 @ . 2 } ; test 1 ok
2022-03-25 20:09:20 s0 @ . would print 2.
2022-03-25 20:09:45 I forget offhand how far down I provided words for. Up to s6 or so, I think.
2022-03-25 20:10:41 It was in writing QUERY that I most used it - I had implemented a command history, bash style, that I could access with the cursor keys.
2022-03-25 20:11:34 So QUERY calls EXPECT with that command history active, and so I have EXPECT's parameters - a buffer address and a count, plus I have where I am in the string, the cursor position on the screen, and also a pointer into the command history array, to deal with.
2022-03-25 20:11:51 I shudder to even think of trying to make all that work with normal stack words.
2022-03-25 20:14:26 Oh, and also you have the last result you got from KEY.
2022-03-25 20:15:10 I haven't implemented command history yet on this new implementation.
2022-03-25 20:27:39 I think when I do it, it will, like more or less everything string related, be supported by that ropes machinery we discussed a week and a half or so ago.
2022-03-25 20:27:50 Because the command history is essentially an array of strings.
2022-03-25 20:28:24 There's one characteristic of the bash command history I "fixed" last time and will fix again, though - in bash if you do the right manipulations you can CHANGE the history.
2022-03-25 20:28:50 If you cursor up to an old command, change it, and ISSUE it, the old line remains unchanged, but you get a new command in the history.
2022-03-25 20:29:11 But if you cursor back, make a change, and then CURSOR AWAY from that changed line, the change persists in the history.
2022-03-25 20:29:21 I find that very bothersome - the "history" should be a HISTORY.
2022-03-25 20:29:53 Once a line is issued it should never change. Maybe you use it as the basis of a new line, but the old version should still be back there, untouched.
2022-03-25 20:30:09 So in my view, once you hit "enter" on a line, it becomes part of a read-only history.
2022-03-25 20:31:29 You should be free to use those lines in the construction of new lines, but you are building a *new line*. Not editing an old one.