2023-07-17 04:40:41 Good point about over == swap if you only have two items of interest (with circular stacks). And Chuck has always seemed very very into not having much on the stack. 2023-07-17 04:42:26 But I haven't noticed my code evolving in that direction. I've gotten so my definition length is pretty similar to what Jeff Fox said Chuck did (45-ish characters). But my stack usage hasn't moved that far down. 2023-07-17 04:47:21 It occurs to me that the circular stack is a bit like just using registers in a circular manner. The stack mechanism kind of does an "automatic renaming" for you, but you could just rotate your way through a register pool on a register machine. 2023-07-17 04:52:23 A limited stack size immediately improves performance on a hardware flexible design; it opens the door to hardware stacks that operate at register speed. 2023-07-17 04:55:04 Whereas the need to keep the stack pointer in a range would hurt performance a little on a software based system. 2023-07-17 05:49:52 It raises kind of an interesting issue. With a standard Forth system with normal stacks, we *must* keep our stacks balanced - if we don't there will be problems. So any code written that way will work fine on a system with circular stacks, so long as the stacks are "deep enough." So the question that comes up is "are we going to RELY on the fact that the stacks wrap or not?" Are we actually going to 2023-07-17 05:49:54 change how we write our code based on the fact that it's now ok to be sloppy? 2023-07-17 05:51:13 I imagine that nine times out of ten such "being sloppy" would involve abandoning deep stack items. I've worked that way - that's sort of how things worked out when I programmed my HP calculator in college. 2023-07-17 05:52:34 I'm fairly torn over the whole thing, given that implementing it on a software design will impose a performance cost.
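A minimal C sketch of the circular-stack idea discussed above, not anyone's actual implementation. The names (DEPTH, cstack, push, pop, pick, swap, over) and the depth of 8 are hypothetical, chosen only for illustration. It shows the index being masked into range on every push and pop, and why over and swap leave the same top two items when only those two matter.

```c
#include <assert.h>
#include <stdint.h>

#define DEPTH 8u            /* hypothetical depth; must be a power of two */
#define MASK  (DEPTH - 1u)

typedef struct {
    int64_t  cell[DEPTH];
    unsigned sp;            /* index of the top item, always 0..DEPTH-1 */
} cstack;

/* Push wraps around instead of overflowing; the oldest item is overwritten. */
static void push(cstack *s, int64_t v) {
    s->sp = (s->sp + 1u) & MASK;
    s->cell[s->sp] = v;
}

/* Pop wraps around instead of underflowing. */
static int64_t pop(cstack *s) {
    int64_t v = s->cell[s->sp];
    s->sp = (s->sp - 1u) & MASK;
    return v;
}

/* Read the n-th item from the top (0 = top) without moving the pointer. */
static int64_t pick(const cstack *s, unsigned n) {
    assert(n < DEPTH);
    return s->cell[(s->sp - n) & MASK];
}

/* ( a b -- b a ) */
static void swap(cstack *s) {
    int64_t b = pop(s);
    int64_t a = pop(s);
    push(s, b);
    push(s, a);
}

/* ( a b -- a b a ) : top two are again a over b, plus one abandoned item below */
static void over(cstack *s) {
    push(s, pick(s, 1));
}
```

After either word the top is a and the item under it is b; over simply leaves an extra copy of a further down, which a circular stack lets you ignore.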
2023-07-17 05:53:36 I guess the way to assess it would be to try to compare the benefit of being ABLE to abandon deep items with impunity vs. the cost of the extra pointer maintenance required to circularize. 2023-07-17 06:13:24 My design already has a register that points to the running thread's RAM block, so basically I'd just need to AND the stack pointer with a mask after incrementing or decrementing it to keep it in range, and then do a [reg+reg] style access instead of just a [reg] style access. Totally eliminates the possibility of using the built-in push and pop instructions. 2023-07-17 06:13:54 The slight problem is that there are two stacks, and the other one would require [reg+reg+index], so those instructions are getting bulkier. 2023-07-17 06:19:03 Or I guess I could point the base register into the middle and let the two stacks grow away from it - use [regBase+regSP] and [regBase-regRP] 2023-07-17 06:25:07 Oh, no - [reg-reg] doesn't seem to be allowed. 2023-07-17 08:02:19 Maintain SP as regBase+index, don't do the addition on every access, do a subtraction on retrieving SP 2023-07-17 08:02:25 KipIngram: ^ 2023-07-17 08:03:43 Well, that's what I do now - my SP and RP are registers with actual addresses. But they're not circular stacks. The only way I see to circularize efficiently would be to make SP and RP offsets, which I then just mask into a small range so that the stack works circularly. 2023-07-17 08:09:39 I see what you're saying. Align them to a large value like 4096 and then clear the bits in between 4096 and whatever power of two your stack is 2023-07-17 08:10:03 it seems that forth briefly spent a day on hn's frontpage 2023-07-17 08:10:05 the sunday, even. 2023-07-17 08:10:08 thanks ratfactor. 2023-07-17 08:11:50 KipIngram: So for example if your stack is size 1024, align to 4096, and then clear the 2048 bit after each operation.
Then you can store full address and have a cheap circular operation 2023-07-17 08:13:33 Actually that doesn't work but a modification does, although not sure it's worth it 2023-07-17 08:14:34 Align to 8192, and offset the stack by 4096 in that region, and set 4096 and clear 2048 after each operation 2023-07-17 08:15:25 This guards against the overflow and underflow situations, both a 1 and 0 is needed in the address to guarantee it doesn't hit more significant bits 2023-07-17 08:16:57 A more OS-oriented solution is write a signal handler for the bordering memory and adjust the pointer on each signal... but that would be incredibly slow sometimes. 2023-07-17 08:22:59 I think the best option here, assuming it's x86, is to keep the stack pointer and return pointer in low byte registers, and leave the rest of the register zero'd 2023-07-17 08:23:18 Then you have size 256 stacks 2023-07-17 08:24:28 Sure, that works going up in address. What about coming down? 2023-07-17 08:24:44 All of the higher order bits would need to be modified. 2023-07-17 08:24:50 I mean, this can of course be done. 2023-07-17 08:25:06 I could just REPLACE all of the higher order bits each time. 2023-07-17 08:25:18 But it begins to involve several instructions to do that. 2023-07-17 08:25:27 I'm not struggling with whether it's *possible* or not. 2023-07-17 08:26:00 With the [reg+reg] approach I can maintain the pointers with one extra instruction. 2023-07-17 08:26:12 Just an AND after each increment/decrement. 2023-07-17 08:26:14 KipIngram: "What about coming down" did you see the errata I gave? 2023-07-17 08:26:37 "Align to 8192..." 2023-07-17 08:28:01 Yes, but you also noted that it's multiple instructions. You have to set and clear bits. 2023-07-17 08:28:07 Also, as I said, if you use a byte register as your index then no extra instruction is needed (although it does have the index addition to do) 2023-07-17 08:28:11 I think we're really seeing the same thing.
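A small C model of the "set 4096 and clear 2048" fix-up quoted above, as a sketch rather than a tested design. It works on plain integer addresses so no real memory is touched; the function names fixup and step are hypothetical. With the constants as quoted, the window that wraps works out to the 2048 bytes starting at base+4096 inside an 8192-aligned region: the forced 1 absorbs a borrow on underflow and the forced 0 absorbs a carry on overflow.

```c
#include <assert.h>
#include <stdint.h>

#define SET_BIT   4096u   /* forced to 1 after every pointer update */
#define CLEAR_BIT 2048u   /* forced to 0 after every pointer update */

/* Re-normalize an address after it has been stepped up or down.
 * Two bit operations: this is the "multiple instructions" cost
 * mentioned in the discussion. */
static uint64_t fixup(uint64_t addr) {
    return (addr | SET_BIT) & ~(uint64_t)CLEAR_BIT;
}

/* Move the pointer by delta bytes (may be negative), then wrap it. */
static uint64_t step(uint64_t addr, int64_t delta) {
    assert(delta > -2048 && delta < 2048);   /* single-cell steps only */
    return fixup(addr + (uint64_t)delta);
}
```

Stepping down from the lowest address in the window lands on the highest one, and stepping up from the highest lands back on the lowest, while the bits above 4096 are never disturbed.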
2023-07-17 08:28:21 I'm just chafing over too much "extra activity." 2023-07-17 08:28:34 Oh. 2023-07-17 08:28:37 Unfortunately I think this needs to be profiled 2023-07-17 08:28:37 Can you do that? 2023-07-17 08:28:41 Hang on a minute. 2023-07-17 08:28:48 Yes if you leave the higher parts zero'd 2023-07-17 08:28:54 I'm assuming x86 2023-07-17 08:31:47 KipIngram: If your stack is within 256 bytes, then you could just increment/decrement the low byte of the address 2023-07-17 08:32:09 But then I'm guessing you're down to 32 stack items, hopefully that's enough 2023-07-17 08:33:31 The [reg+reg] form seems to require that both regs be full 64-bit regs. 2023-07-17 08:34:54 So [regBase+regPointer] works fine, and it's one instruction to maintain regPointer. But that only gets me one of the two stacks. 2023-07-17 08:35:10 Assuming I only want to use one regBase. 2023-07-17 08:36:26 And yes, a byte offset is adequate; I figure if I do this at all it will be with 32-element stacks, so that's 256 bytes. 2023-07-17 08:37:07 I think it would be fairly difficult to really measure these things, because the BENEFIT of the circular stack is in a whole coding style change. 2023-07-17 08:37:26 I have no idea how much code it would save me over time to be able to just forget about items on the stack. 2023-07-17 08:37:39 I'd have to write a bunch of code that way before I'd have a feel for it. 2023-07-17 08:38:10 On the other hand, this is pretty obviously a win if you're designing a processor. 2023-07-17 08:38:19 Practically all upside. 2023-07-17 08:42:15 KipIngram: When I say use byte reg, I mean set e.g. RAX to the 256-byte aligned address, and then modify AL. So you're using RAX/EAX in address ops, but AL in stack push/pop 2023-07-17 08:42:29 For address in reg, if you want index in reg then set RAX to 0 2023-07-17 08:43:43 256-byte aligned address avoids the index addition but you're limited to size 32 stacks 2023-07-17 08:46:35 Oh, gotcha. Yes, that's a good idea. 
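The byte-register suggestion above can be modelled in C as follows. This is only an illustration of the arithmetic, assuming 8-byte cells and a 256-byte-aligned stack block as in the discussion; step_low_byte is a hypothetical name, and on real x86-64 hardware the same effect would come from an 8-bit add or sub on the low byte register with no masking step at all.

```c
#include <assert.h>
#include <stdint.h>

#define CELL 8   /* bytes per stack cell: 32 cells fill the 256-byte block */

/* Step a stack pointer by a number of cells, touching only its low byte.
 * The 8-bit arithmetic wraps modulo 256 in both directions, so the
 * pointer can never leave its 256-byte-aligned block. */
static uint64_t step_low_byte(uint64_t sp, int cells) {
    uint8_t low = (uint8_t)sp;                /* the part AL would hold  */
    low = (uint8_t)(low + cells * CELL);      /* wraps like an 8-bit add */
    return (sp & ~(uint64_t)0xFF) | low;      /* high 56 bits untouched  */
}
```

Because the wrap comes for free from the register width, one register holds a full usable address and still behaves circularly.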
2023-07-17 08:46:53 That's a very good idea - it eliminates the masking. Thanks. 2023-07-17 08:46:57 I like that a lot. 2023-07-17 08:49:26 That actually lets one register do the whole job. 2023-07-17 08:49:40 Don't need [reg+reg] anymore. 2023-07-17 08:50:02 That pretty much puts to bed all of my concerns. 2023-07-17 08:50:03 Wouldn't accept anything less :) 2023-07-17 08:50:27 Wouldn't ACCEPT but instead we EXPECT 2023-07-17 08:50:27 Heh heh. 2023-07-17 08:50:39 Thanks again - I'm happy now. 2023-07-17 08:51:07 I think that tells me what I'll be doing with rbx and rdx. 2023-07-17 08:51:38 Most of the registers don't have byte access, right? 2023-07-17 08:51:46 No I think they all do 2023-07-17 08:52:12 Except maybe rbp et al 2023-07-17 08:52:54 What's the syntax for the low byte of, say, r15? 2023-07-17 08:53:41 Looks like it's just r15l 2023-07-17 08:53:57 Oh, no - that's not right. 2023-07-17 08:54:04 It interpreted r15l as a variable name. 2023-07-17 08:54:09 defuse.ca 2023-07-17 08:55:09 Looks like r15d is the low 32 bits. 2023-07-17 08:55:16 But when you set those it clears the upper 32 bits. 2023-07-17 09:02:07 So, this would still let me write threads with smaller stacks, but I'd have to make sure that such code kept the stacks balanced. 2023-07-17 09:34:05 KipIngram: r15b I think 2023-07-17 09:34:13 Sorry probably not helpful now :) 2023-07-17 09:34:33 Yes, that looks right. 2023-07-17 09:35:31 I'm just extremely pleased with that. Do we know for sure that manipulating r15b leaves the high 56 bits alone? 2023-07-17 09:35:40 I know it does on rax..rdx 2023-07-17 09:57:09 Yes we know for sure 2023-07-17 09:57:25 All ops affect only the relevant part... except 32-bit ops in x86-64, which clear the high part 2023-07-17 13:10:25 I may be neglecting to consider something, but I think that with this F18A based design I won't need to allocate a register for designating the "system base address." 
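For reference, the partial-register behaviour confirmed in the exchange above (r15b leaves the high 56 bits alone, r15d clears the upper 32) can be written down as a tiny C model. The function names are hypothetical and this only mimics the x86-64 write rules on a plain 64-bit value; it does not touch real registers.

```c
#include <assert.h>
#include <stdint.h>

/* Writing an 8-bit sub-register (e.g. r15b) keeps bits 8..63. */
static uint64_t write_low8(uint64_t reg, uint8_t v) {
    return (reg & ~(uint64_t)0xFF) | v;
}

/* Writing a 16-bit sub-register (e.g. r15w) keeps bits 16..63. */
static uint64_t write_low16(uint64_t reg, uint16_t v) {
    return (reg & ~(uint64_t)0xFFFF) | v;
}

/* Writing a 32-bit sub-register (e.g. r15d) zero-extends:
 * the old upper 32 bits are discarded. */
static uint64_t write_low32(uint64_t reg, uint32_t v) {
    (void)reg;
    return (uint64_t)v;
}
```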
2023-07-17 13:11:13 I think I may be able to do everything pretty efficiently using IP-relative addressing. 2023-07-17 13:11:49 you know that you got only 64 cells of ram per node, yes? 2023-07-17 13:12:03 unless you are using an external sram 2023-07-17 13:12:27 Oh, I did think of something missing in that instruction set that will quickly take up a good many of my extra 32 instruction codes. The F18A has no sub-cell RAM access. Bytes, words, half cells, etc. 2023-07-17 13:13:05 Zarutian_iPad: I'm talking about a software system that is *based on* the F18A's instruction encoding, not an actual design using the product. 2023-07-17 13:40:18 https://duskos.org 2023-07-17 13:42:41 Ah, a CollapseOS sibling. 2023-07-17 14:20:32 The "Who is Dusk for" section - that totally resonates with me. Just to be able to sit down at your computer and poke around at the hardware without having to be some kind of a flipping genius - that's wonderful. 2023-07-17 14:20:58 The words they have in that section just really hit the nail on the head. 2023-07-17 14:21:15 The whole 'prepare for civilisational collapse' thing is silly 2023-07-17 14:40:49 it does prey upon the collective desire of programmers to go back to a simpler time 2023-07-17 14:41:58 The ability to "hack around" with systems in that way, though, is something I think it's important to preserve. 2023-07-17 14:43:37 But I do agree stressing over the end of civilization isn't very useful. 2023-07-17 14:44:03 it looks stupid, tbh 2023-07-17 14:44:14 why the fuck are you compiling it from scratch on every boot?
2023-07-17 14:44:34 just compile it once, and then write out the binary, and load that in 2023-07-17 14:44:40 you don't need to do it on every boot 2023-07-17 14:44:42 that'll take ages 2023-07-17 14:44:58 that's not even the most utterly fucktarded part of the whole sorry mess 2023-07-17 15:10:27 i mean, the post explains the logic of why 2023-07-17 15:31:42 KipIngram: What you said is basically the whole point of minimal systems, in my opinion. It makes the barrier for entry lower, more maintainable, etc 2023-07-17 15:32:18 But the collapse stuff is a bit of fun, it's not useful 2023-07-17 15:41:05 it is an interesting constraint 2023-07-17 15:41:19 given the whole "stages of collapse" thing 2023-07-17 15:57:00 It's artistic, I mean a lot of Forth stuff is very much an art 2023-07-17 15:57:23 constraints breed creativity. 2023-07-17 15:57:26 It's highly creative and aesthetic work, and constraints often enhance that 2023-07-17 15:57:27 Yeah 2023-07-17 15:58:31 Nobody can take that away from it. When all is said and done, art is more of what makes us what we are than the ever marching progress of engineering. When engineering meets art it's dear to me, certainly 2023-07-17 16:59:57 gordonjcp: That's part of Forth's philosophy. You don't think in terms of stored binaries - you think in terms of stored *source*. 2023-07-17 17:00:10 And it compiles super fast, so it's hardly even noticeable. 2023-07-17 17:00:23 The idea there is it's easy to CHANGE source; not so easy to change binaries. 2023-07-17 17:00:30 It's intended to be a "hands on" system. 2023-07-17 17:04:19 veltas: Totally agree re: art. 2023-07-17 17:04:25 Forth is like art. 2023-07-17 17:12:49 gordonjcp: I'm not sure you realize how fast Forth can compile. 2023-07-17 17:13:34 A long time ago - like 15 years or more - Jeff Fox reported one of Chuck's systems compiling source code at rates over 100 MB/s.
2023-07-17 17:14:01 Of course, he had pulled out all of the stops on that one, to push his compile speed as far as he could. 2023-07-17 17:14:32 I don't know for sure, but I'm guessing it was when he was doing his chip design system and I suspect he re-compiled the entire system every time he changed the source. 2023-07-17 17:14:50 It's the only reason I can think of for him to be so wound up over compile speed. 2023-07-17 17:31:00 I want to turn your own words around - IF you could recompile the whole system without even noticing the time that it takes to do that, wouldn't it be more "simple" to just do that and dispense with the need for an intermediate storage format? 2023-07-17 17:31:36 Dusk OS is intended to be a system that's extended and changed, so you have to have the source. 2023-07-17 17:31:59 But the only reason to have an intermediate binary format is if it buys you something. 2023-07-17 17:33:00 The only other argument I can think of in favor of a binary format is "Well, it's how everyone does it..." 2023-07-17 17:34:24 And of course the other reason it's done that way is because people don't want you to HAVE the source. But that reason doesn't apply in this case. 2023-07-17 17:34:35 depends if that binary format is actually machine executable or not 2023-07-17 17:34:51 Good point - often they're not. 2023-07-17 17:35:10 Even then, though, if you truly can't notice the compile time then I question the need to avoid it. 2023-07-17 17:35:34 "Edit, build, run" vs. "Edit, run." Second one looks simpler to me. 2023-07-17 17:36:05 never got why some peeps insist on a 'build' step 2023-07-17 17:36:33 specifically in js land 2023-07-17 23:24:18 does mecrisp run on the blues swan STM32L4R5 feather board? 2023-07-17 23:48:14 Zarutian_iPad: I think by and large it's just because it's what people are used to. It's "always been there" so to speak, in most people's experience.
2023-07-17 23:51:20 And let's face it - the way the bulk of the software infrastructure works, where compiles are NOT fast, it's necessary for a good user experience. 2023-07-17 23:52:14 To seriously consider dropping it, you need two things. First, you need to not be worried about sharing your source code, and second, you need for it to be fast enough to not muck the experience.