2022-11-23 08:13:58 Does anyone know if the standard x86 FPU has any notion of "knowing how full it is"? If I recall correctly it's eight deep - can it be set up to detect when it's "full" and throw some signal on overflow that could be caught so the otherwise lost data can be spilled to RAM?
2022-11-23 08:14:27 I guess it wouldn't be terribly expensive to keep up with how full it is myself.
2022-11-23 08:14:59 Even better would be if it knows how to spill and recover content itself, using some internal hardware.
2022-11-23 08:15:43 But you'd only want that behavior when it actually overflowed.
2022-11-23 08:16:46 I've always wrestled with whether to let floating point values just live in the FPU standard, or whether to keep them on the regular data stack.
2022-11-23 08:43:04 KipIngram: I think there are several floating point schemes in x86 processors these days. As I understand it, it doesn't work like the old x87 these days
2022-11-23 08:49:11 Yeah, that's how it looks.
2022-11-23 08:49:48 Guess I'll try to feel charitable toward them for not feeling trapped by backwards compatibility. Always a tough decision.
2022-11-23 08:50:40 It does look like you can mark individual registers empty - including ones in the "middle" of the stack.
2022-11-23 08:51:28 I haven't found all the details yet, but I guess if I marked, say, the fourth register down empty and then loaded a float, I'd want the top three to push up and take up that vacancy, but the ones beyond to stay put.
2022-11-23 08:51:44 No point keeping an empty value around - it's not equivalent to zero.
2022-11-23 08:52:35 But at any rate, it knows the empty/full status of each register, so it certainly is *positioned* to know if it's about to lose one off the bottom.
2022-11-23 08:52:48 It would be friendly to throw a signal on that event.
2022-11-23 08:53:35 If it also threw one whenever the deepest reg was full before you pop the stack, then you could do the rest with signal handlers.
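Editor's sketch: the "keep up with how full it is myself" idea above can be modelled in a few lines. This is a toy Python model, not x87 code; the class name `SpillingFloatStack` and its fields are hypothetical, and the only fact taken from the discussion is the eight-deep register stack. It spills the deepest value to memory when a push would otherwise lose it, and refills the bottom slot on pop.

```python
class SpillingFloatStack:
    """Toy model of an 8-deep register stack that spills to RAM."""

    def __init__(self, depth=8):
        self.depth = depth   # number of register slots (eight in the discussion)
        self.regs = []       # models the register stack; last item is the top
        self.spill = []      # models RAM holding values pushed off the bottom

    def push(self, value):
        if len(self.regs) == self.depth:
            # Stack is full: save the deepest register before it would be lost.
            self.spill.append(self.regs.pop(0))
        self.regs.append(value)

    def pop(self):
        if not self.regs:
            raise IndexError("floating point stack underflow")
        value = self.regs.pop()
        if self.spill:
            # Refill the freed bottom slot from the most recently spilled value.
            self.regs.insert(0, self.spill.pop())
        return value


fp = SpillingFloatStack()
for i in range(10):          # two more pushes than there are registers
    fp.push(float(i))
popped = [fp.pop() for _ in range(10)]
```

Here the check runs on every push and pop, which is the "always run code to check" variant rather than the signal-driven one.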
2022-11-23 08:54:49 Ideally you want that second signal only if there was spilled data to return to that empty slot, but that would require it keep up with more - the other would actually be "close enough" and would use only resources I already see it has.
2022-11-23 08:55:52 Of course, signal handling is a fair bit of overhead - it might be higher performance just to always run code to check whether you needed to handle spill.
2022-11-23 08:55:59 My guess is that would be faster.
2022-11-23 08:57:17 For that matter, the fastest might be to always treat it as though it's full and always spill/recover spill, provided it doesn't throw an exception if you try to save an empty register.
2022-11-23 08:58:54 Guess it would really depend on how often your calculations filled it up.
2022-11-23 08:59:06 If it's rare... then the signals might be the way to go.
2022-11-23 08:59:22 I did an awful lot on my four-deep stack calculator in college.
2022-11-23 08:59:39 So eight might be a wealth of resources.
2022-11-23 09:00:38 This is a fairly nice coverage of the instructions:
2022-11-23 09:00:40 https://redirect.cs.umbc.edu/courses/undergraduate/CMSC313/spring05/burt_katz/lectures/Lect12/floatingpoint.html
2022-11-23 09:02:42 One lesson I had to learn the hard way a few years ago is that a couple of those more sophisticated instructions require that the input value be between 0 and 1 (or between 1 and 2 - can't remember for sure right now). As far as I could tell if it's not, the instruction just does nothing. And that was an instruction that I decided I needed for input conversion of floats.
2022-11-23 12:29:04 Oh, hmmm. Looks like the fpu as I've referred to it above isn't actually used much anymore. Compilers don't emit those instructions unless you force them to, for example.
2022-11-23 12:29:20 Looks like these days it's SSE/SSE2 and AVX.
2022-11-23 12:29:54 So I should be looking into that instead of the above-mentioned topics.
2022-11-23 12:46:13 It looks like none of those extensions are "stack oriented."
2022-11-23 12:47:13 Nor does it look like there's any good way to treat the vector storage facilities in a stack-like way. The register values are hard coded into the instructions that use them.
2022-11-23 12:47:36 No way to move a "top of stack" around in any sort of acceptably elegant way.
2022-11-23 13:18:14 So, aside from just designating one of them (or a couple) TOS and 2OS for a floating point stack, I don't see any way to treat them as anything other than just "processor registers."
2022-11-23 13:32:33 Looks like my processor supports up to AVX2. No AVX512.
2022-11-23 13:32:52 Though I saw hints online that they may be scrubbing AVX512 anyway.
2022-11-23 13:35:37 Hmmm. Maybe they're only scrubbing it on Alder Lake processors.
2022-11-23 13:35:54 Apparently those have two different core types, and only one of them supported AVX512 anyway.
2022-11-23 13:36:17 Looks like they had trouble with the instruction scheduling in that situation.
2022-11-23 13:45:32 Yes - looks like it's still supported in higher end processors.
2022-11-23 13:45:49 Sounds a bit like Porsche not offering a turbo option on the Cayman line.
2022-11-23 13:46:03 from what little I know the AVX situation is quite a mess
2022-11-23 13:46:17 Rumor has it they're worried that the Cayman would nip at the heels of the 911 line (their flagship), and Cayman is mid-engine to boot.
2022-11-23 13:46:29 it's not AVX512 or no AVX512 but instead some features of AVX512 on some processors and not on others
2022-11-23 13:46:39 I had an aftermarket turbo put on my Cayman S, and I *absolutely* *love* *it*.
2022-11-23 13:46:42 and the ones that have more features are not necessarily supersets of the ones that have less
2022-11-23 13:47:09 Yeah, the small bit of reading I just did made it sound like something of a clusterfuck.
2022-11-23 13:47:26 but as far as being register based now, you still have access to all the old strategies of trying to make forth fast on a register machine
2022-11-23 13:47:37 Sure.
2022-11-23 13:47:49 or at least faster if not fast
2022-11-23 13:48:16 Right. I figure you'd cache at least TOS, and given the vector possibilities it might make sense in this case to cache more than that.
2022-11-23 13:49:22 ya and you might be able to get even more performance if you considered *gasp* static analysis
2022-11-23 13:49:35 What made me dig into AVX512, though, is that it supports gather loads and scatter stores; the lower extensions support only gather loads.
2022-11-23 13:49:44 :-)
2022-11-23 13:50:11 I tend to do a little of that anyway. I tend to think about problems for a while before solving them - I think I wind up with better solutions that way.
2022-11-23 13:50:41 So if I were planning some deep loop number crunching, I'd look at it hard to work out the best way of doing it.
2022-11-23 13:51:21 This thing I'm doing for work right now, to control the drive testing, got a lot simpler on thinking about it for a few days.
2022-11-23 13:51:28 what I mean is having the forth compiler work out the best way
2022-11-23 13:51:35 thats starting to sound too much like C though :P
2022-11-23 13:51:41 The very first mental picture I formed was quite a bit more involved than what I'm actually writing.
2022-11-23 13:51:57 Oh, I see.
2022-11-23 13:51:59 :-)
2022-11-23 13:52:20 That comes up a lot with Forth. Smart programmer or smart compiler?
2022-11-23 14:10:32 That sort of "enhancement" can rapidly become a really deep rabbit hole, and you don't have a "simple" compiler anymore.
2022-11-23 14:10:57 Which shouldn't stop someone from doing it if they find it fun.
2022-11-23 14:12:43 I think it's more so because making a working forth is much, much easier than optimization
2022-11-23 14:12:55 even simple optimizations can bring a huge improvement in performance
2022-11-23 14:13:02 mecrisp for example is awesome
2022-11-23 14:13:23 but not everyone is trying to program tiny embedded chips so optimizing is a waste of effort for a lot of projects
2022-11-23 14:37:49 Right.
2022-11-23 14:38:14 That's really what I meant by "rabbit hole." There are some hard tasks involved with doing that kind of optimization WELL.
2022-11-23 14:38:40 And you ideally need to be able to see things about the structure of your algorithm, and Forth really doesn't tell you a whole lot by default.
2022-11-23 14:38:50 You don't get any parse trees, etc. etc.
2022-11-23 14:40:13 The optimizer would need to know what the words MEAN to even follow the data around if you moved it amongst variables.
2022-11-23 15:01:04 not really. look at mecrisp. it does a great job and doesnt need that much to go off of
2022-11-23 15:01:54 I think some parts should be fairly straightforward. Especially peephole stuff.
2022-11-23 15:01:55 it's still 2-3x slower than C so there is room to improve if you could get at some of the info forth doesnt convey but just doing simple optimizing is a huge win
2022-11-23 15:02:13 DUP 5 + ROT 2 AND type stuff could easily be 10x faster
2022-11-23 15:02:42 I mean, just considering two primitives adjacent to one another probably offers opportunity for improvement.
2022-11-23 15:03:48 a little probably but apparently peepholing isnt that big of a win
2022-11-23 15:04:53 I just didn't mean to be sticking a stake in the ground claiming "you can't do anything without great difficulty."
2022-11-23 15:05:08 You can keep on making it more involved, though - rabbit hole.
2022-11-23 15:07:34 I only see two ways to approach it.
You'd either have to have some formal system knowledge of word functionality, and look for ways to cut out waste at that level, or else you'd need to look at the generated code (and then you'd probably need to know what the instructions did).
2022-11-23 15:08:00 Seems to me like you'll need some kind of data structure, on either words or instructions or both, defining their functional effects.
2022-11-23 15:08:16 Otherwise it's just a group of arbitrary symbols, and you have no idea what they even do.
2022-11-23 15:08:38 Knowledge has to be baked in.
2022-11-23 15:08:41 effect notation gets you halfway there, but you have no way of communicating the intended data flow or even the identities of your arguments.
2022-11-23 15:09:00 And that's just something that Forth itself doesn't require.
2022-11-23 15:09:41 Yes, those traditional effects comments would help, if you've been careful to get them right on every word.
2022-11-23 15:10:08 either that or you derive them from primitives.
2022-11-23 15:10:22 Right, in which case you have to know what the instructions do.
2022-11-23 15:10:34 that's an easier thing to do, though.
2022-11-23 15:10:45 But at least that's something the processor manufacturer could in theory provide.
2022-11-23 15:11:07 Well, I guess they do - I don't know if it's intended to be rigorous enough to be machine readable or not.
2022-11-23 15:11:12 if you have a word that uses primitives, and you know the effects of those primitives, you (should) know the effects of the word.
2022-11-23 15:11:21 Indeed.
2022-11-23 15:11:28 the problem is when you get into unbounded loops.
2022-11-23 15:11:37 and conditional results.
2022-11-23 15:11:42 Right.
2022-11-23 15:12:03 you need a programming language to verify/optimize the programming language.
2022-11-23 15:12:09 It helps that we're talking here about an optional functionality - it could just leave alone cases it couldn't cope with.
2022-11-23 15:12:20 Take the low-hanging fruit.
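Editor's sketch: the point above about deriving a word's effect from the effects of its primitives can be illustrated for straight-line code. This is a minimal Python model under stated assumptions: an effect is reduced to a pair (items consumed, items produced), which deliberately ignores the data flow and argument identity problem raised in the discussion, and the `PRIMITIVES` table, `compose` and `effect_of` are hypothetical names, not part of any Forth system.

```python
# Hypothetical effect table: word -> (items consumed, items produced).
PRIMITIVES = {
    "DUP":  (1, 2),
    "DROP": (1, 0),
    "SWAP": (2, 2),
    "ROT":  (3, 3),
    "+":    (2, 1),
    "AND":  (2, 1),
}


def compose(first, second):
    """Combine the effects of two code fragments run one after the other."""
    a_in, a_out = first
    b_in, b_out = second
    if a_out >= b_in:
        # The second fragment is fully fed by what the first one left behind.
        return (a_in, a_out - b_in + b_out)
    # Otherwise the shortfall must already be on the stack before we start.
    return (a_in + (b_in - a_out), b_out)


def effect_of(words, known=PRIMITIVES):
    """Derive the net effect of a straight-line sequence of words."""
    total = (0, 0)
    for w in words:
        if w in known:
            step = known[w]
        else:
            int(w)           # raises ValueError for anything that is not a number
            step = (0, 1)    # a literal pushes one item
        total = compose(total, step)
    return total


effect = effect_of("DUP 5 + ROT 2 AND".split())
```

For the `DUP 5 + ROT 2 AND` phrase quoted earlier this gives two items consumed and three produced. Loops and conditionals are exactly where this simple counting stops working.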
2022-11-23 15:12:22 KipIngram: Yeah you're right, direct threading wouldn't work, would have to be indirect
2022-11-23 15:12:25 yeah.
2022-11-23 15:13:00 Maybe if you were obsessed with having something you could call direct threaded, you could hack something up. But it looks like it'd be ugly.
2022-11-23 15:13:09 No it's just not a thing
2022-11-23 15:13:10 And you wouldn't really have bought anything.
2022-11-23 15:13:21 how do you even codify ?dup into an effect.
2022-11-23 15:14:11 dont use ?dup
2022-11-23 15:14:29 ( 0 -- 0 | x -- x x )
2022-11-23 15:14:44 Go left to right until you match.
2022-11-23 15:14:55 But yeah - you won't know what the value will be at compile time.
2022-11-23 15:15:20 eris: Yeah - I've gravitated away from ?dup.
2022-11-23 15:15:39 eris[m]: the same goes for any conditional word.
2022-11-23 15:16:03 KipIngram: I guess you could just simulate both cases.
2022-11-23 15:16:08 KipIngram: sorry, didnt mean to sound argumentative :)
2022-11-23 15:16:09 decision points.
2022-11-23 15:16:27 ive been thinking about optimization a lot. ill come back when I have finished my toy optimizer to show what I mean
2022-11-23 15:17:24 I was worried I sounded that way.
2022-11-23 15:17:31 So, it's cool.
2022-11-23 15:18:15 Or at least that I hadn't made my meaning very clear.
2022-11-23 15:20:15 I actually really enjoy little "formal things" like optimization. My problem is that I usually get myself in trouble with them - I start with some small little thing, and then get tempted down the primrose path, and months later I suddenly realize it's gotten all out of hand.
2022-11-23 15:20:30 My usual response to that is to just start over.
2022-11-23 15:20:43 And make some different decisions.
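Editor's sketch: the `( 0 -- 0 | x -- x x )` notation and the "simulate both cases" suggestion above can be modelled by carrying a set of possible effects instead of a single one. This is a hedged Python illustration; effects are again only (consumed, produced) counts, and the `SIMPLE` and `BRANCHING` tables and the `possible_effects` function are hypothetical names made up for the example.

```python
# Hypothetical tables: words with one effect, and words with alternatives.
SIMPLE = {"DUP": (1, 2), "DROP": (1, 0), "+": (2, 1)}
BRANCHING = {"?DUP": [(1, 1), (1, 2)]}   # ( 0 -- 0 | x -- x x )


def compose(first, second):
    """Combine the effects of two code fragments run one after the other."""
    a_in, a_out = first
    b_in, b_out = second
    if a_out >= b_in:
        return (a_in, a_out - b_in + b_out)
    return (a_in + (b_in - a_out), b_out)


def possible_effects(words):
    """Follow every alternative at each decision point."""
    outcomes = {(0, 0)}
    for w in words:
        alternatives = BRANCHING.get(w) or [SIMPLE[w]]
        outcomes = {compose(o, alt) for o in outcomes for alt in alternatives}
    return outcomes


effects = possible_effects(["?DUP", "DROP"])
# More than one surviving outcome means the depth depends on a runtime value.
depth_is_static = len(effects) == 1
```

The set of outcomes can grow at every conditional word, which is one concrete form of the "decision points" concern.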
2022-11-23 15:20:59 MrMobius: nice
2022-11-23 15:39:01 I got tco for free in node xD
2022-11-23 15:39:29 it seems my weird fake compiling method lets node perform tco on those compiled recursive words
2022-11-23 15:40:10 a word ends up being a closure tied to a list of functions and just executes them all
2022-11-23 15:40:43 a recursive word will get a function that uses the later compiled value of that word, and seems node is able to perform tco on that
2022-11-23 15:41:08 it was unexpected, but cool
2022-11-23 15:42:00 with a real forth there's no need for tco I assume
2022-11-23 15:49:04 depends on what you mean by tail calls.
2022-11-23 15:49:36 I don't see many cases of recursion in forth.
2022-11-23 15:49:44 it's a lot of iteration.
2022-11-23 15:50:44 tail call isnt a recursion operation
2022-11-23 15:51:15 its an optimisation for any function that returns the result of a function call
2022-11-23 15:51:27 yes, I am intimately familiar with the idea of a tail call.
2022-11-23 15:51:51 in forth, you can optimise word ; into a jump, rather than a call
2022-11-23 15:52:08 machineforth does this, but it also includes more focus on recursion
2022-11-23 15:52:17 recursive calls are one of the more common cases. optimizations routinely focus on common cases.
2022-11-23 15:52:40 yes
2022-11-23 15:52:44 don't "um acksually" people.
2022-11-23 15:52:55 in forth, however, recursion is not the common case
2022-11-23 15:53:04 for a tail call optimisation
2022-11-23 15:53:09 yes. which is what I said.
2022-11-23 15:55:06 kip has a whole set of words for it
2022-11-23 15:55:07 i had two
2022-11-23 15:55:07 i find that implicit recursion is a nice way to phrase things in forth
2022-11-23 17:14:04 Those words just make it easy and clean to do conditional recursion. Nothing you couldn't do in a standard system - it would just be "wordier."
2022-11-23 17:15:12 And a little slower.
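Editor's sketch: the "optimise word ; into a jump, rather than a call" remark above amounts to a small rewrite on compiled code. The Python below is a toy illustration only; the tuple-based instruction format (`CALL`, `JUMP`, `EXIT`, `LIT`) and the function name `optimise_tail_calls` are assumptions invented for the example, and it assumes no branch elsewhere targets the `EXIT` that gets removed.

```python
def optimise_tail_calls(code):
    """Replace each CALL that is immediately followed by EXIT with a JUMP."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if op[0] == "CALL" and nxt == ("EXIT",):
            # Nothing runs after the callee returns, so no return address
            # needs to be pushed: jump there and let the callee's own exit
            # return to our caller.
            out.append(("JUMP", op[1]))
            i += 2   # the EXIT can no longer be reached, so drop it too
        else:
            out.append(op)
            i += 1
    return out


# Hypothetical compiled body of  : example  foo bar ;
body = [("CALL", "foo"), ("CALL", "bar"), ("EXIT",)]
optimised = optimise_tail_calls(body)
```

Only the final call is rewritten; the call to `foo` still needs a return address because `bar` runs after it. The same rewrite applies whether or not the final call is to the word being defined.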
2022-11-23 17:15:43 I guess they're entirely analogous to
2022-11-23 17:16:03 IF THEN
2022-11-23 17:16:35 Except these are primitives.
2022-11-23 17:17:45 It probably would have been more readable for others if I'd use the word SELF in the names of those words, but I couldn't resist saving the two characters and going with ME
2022-11-23 17:27:21 decay: I tend to think of it as looping back to the beginning of the word, rather than as "recursion." I guess they're only the same, though, if nothing but return follows the recursive call. If any code at all comes after, then it has to be a call and not a jump and it's definitely recursion.
2022-11-23 17:27:38 Almost all of my use cases are not in that category, and it can be thought of as looping.
2022-11-23 17:27:47 Or, as you say, iteration.
2022-11-23 17:29:11 because the return and data stacks are decoupled, there's little benefit to recursion because you can just replace it with conditional iteration at the top level.
2022-11-23 17:30:09 for a lot of cases.
2022-11-23 17:32:13 That makes sense.
2022-11-23 19:33:39 So, earlier I talked about "slippery sloping" into excessive complexity in my Forth implementations. These days I mostly try to keep it completely basic and standard, with the idea that I can always add those things later as extensions if I want to. Where I can think of ways to do it I try to write things in a way that will make such extensions easy.
2022-11-23 20:17:07 You know, something just occurred to me. The "canonical" Forth interpreter/compiler grabs the next word from the input stream, searches for it, and then has a conditional test there in that code that will attempt to convert it to a number if the search failed. Only if that fails does it go through an error handler.
2022-11-23 20:17:52 Why do we do it that way? Why don't we just have a "not found" option that we jump to on failed searches, and have it do the number conversion attempt?
2022-11-23 20:18:09 I mean, I know it's the same process, but it would remove that conditional function from the interpreter loop.
2022-11-23 20:18:23 conditional statement, I mean.
2022-11-23 20:20:32 I guess it's not quite that easy. My FIND doesn't have the variable return format that the traditional one has. It returns CFA for found words and 0 for not found words, and I can't just "execute" zero.
2022-11-23 20:21:05 So I guess the two cases aren't really "the same" operationally.
2022-11-23 20:21:17 Which implies the need for a test and decision.
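Editor's sketch: one way to picture the "not found" idea above is to have the search return an executable fallback instead of 0, so the loop can always execute whatever comes back. This is a hedged Python model, not a description of any particular Forth; `make_interpreter`, `find`, `not_found` and the two-word dictionary are hypothetical names for illustration.

```python
def make_interpreter():
    stack = []
    state = {"token": None}   # the word currently being interpreted

    def not_found():
        # Fallback word: try to treat the pending token as a number,
        # and only report an error if that fails as well.
        try:
            stack.append(int(state["token"]))
        except ValueError:
            raise LookupError("unknown word: " + state["token"])

    dictionary = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "DUP": lambda: stack.append(stack[-1]),
    }

    def find(name):
        # Never returns 0: a failed search yields the fallback instead.
        return dictionary.get(name, not_found)

    def interpret(line):
        for token in line.split():
            state["token"] = token
            find(token)()   # no "was it found?" test in the loop itself
        return stack

    return interpret


interpret = make_interpreter()
result = interpret("2 DUP + 5 +")
```

The decision has not disappeared: it now lives inside the lookup (`dict.get` here), which matches the observation that it is "the same process" moved out of the interpreter loop.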