2023-05-06 03:22:31 i found an interesting forth: https://dacvs.neocities.org/SF/
2023-05-06 03:23:10 it mixes raw amd64 machine code in with a subroutine threaded forth
2023-05-06 03:23:29 it proudly states that it doesn't even use an assembler :-)
2023-05-06 04:00:16 dave0: Hehe. Definitely a gem. By sheer coincidence, I ran across that same site a few days ago after going through jonesforth. Have you watched any of the videos in the youtube channel pointed at by the comments? Supposedly they're really good.
2023-05-06 04:01:07 xelxebar: not seen the videos but have read jonesforth .. it is very good
2023-05-06 04:01:19 xelxebar: someone updated jonesforth to amd64
2023-05-06 04:05:42 i just googled and there were a couple of different jonesforths
2023-05-06 04:05:45 dave0: That was posted on here the other day lol, yeah it's interesting, his channel has some good stuff indeed
2023-05-06 04:06:15 I'm fed up with these forth2020 videos in my feed, they're absolutely rubbish
2023-05-06 04:06:29 veltas: oh cool
2023-05-06 04:06:44 veltas: except raw machine code is a bit excessive ;-)
2023-05-06 04:06:51 Yes I agree
2023-05-06 04:07:08 There's no point; with assembly, if I didn't have an assembler I could technically convert to machine code manually if ever required
2023-05-06 04:07:21 So in my opinion there's no dependency simplification in writing it raw
2023-05-06 04:07:50 But obviously they enjoy stuff like that, most of what I do in Forth is probably pointless anyway
2023-05-06 04:07:53 To each their own
2023-05-06 04:25:33 veltas: It's indeed a little funny to avoid an assembler but still target the linux syscall ABI. However, it's somewhat in line with software bootstrapping projects like stage0, which also begin with a small hand-assembled binary.
2023-05-06 04:26:23 That said, I also find smithforth extremely delightful. Reading ref.x86asm.net from time to time is one of my guilty pleasures :P
2023-05-06 04:26:57 Just write and record assembly code, and keep a listing printout handy for the apocalypse
2023-05-06 04:27:07 And if you have to write it again, write the assembly and hand-assemble it afterwards
2023-05-06 04:28:43 I've finished writing the initial code for ilo amd64, and have started debugging it
2023-05-06 04:33:27 Cutting out the linker is probably the bigger dependency-reduction win.
2023-05-06 04:34:27 That said, there are lots of architectural details apparent in the instruction encodings that are almost impossible to notice if you just use the mnemonics.
2023-05-06 04:34:58 Linking static code manually is trivial
2023-05-06 04:35:59 The mapping from assembly to machine code is not really 1-1. And mapping an object file to an elf executable is also full of lots of decisions.
2023-05-06 04:36:22 You can make it easy for yourself though
2023-05-06 04:37:07 We won't retain x86 in this kind of scenario anyway
2023-05-06 04:37:31 Would end up with something like 6502, that's got to be one of the easiest processors to recreate
2023-05-06 04:37:40 I'd argue that hand-coding elf files *is* making it easier for oneself when your goal is to grok x86.
2023-05-06 04:38:20 If you have other goals, like preparing for societal collapse, then sure, maybe there's a motivation misalignment :P
2023-05-06 04:38:26 x86 isn't *that* hard to understand, Intel just makes a mountain out of a molehill explaining it
2023-05-06 04:42:15 IMHO, the interesting parts are in the microarchitectural features. Predicting execution times of code blocks takes quite a lot of familiarity with the chip and how x86 translates to its semantics.
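To make the hand-encoding idea above concrete: a minimal sketch in Forth of assembling one instruction by laying down bytes with C, (the word name and register constants here are made up, and it only covers the classic registers 0-7):

  hex
  0 constant rax   3 constant rbx   \ register numbers for encoding, not addresses
  : mov-rr, ( dst src -- )          \ assemble MOV dst, src for two 64-bit registers
    48 c,                           \ REX.W prefix selects 64-bit operand size
    89 c,                           \ opcode: MOV r/m64, r64
    3 lshift or C0 or c, ;          \ ModRM byte: mod=11, reg=src, rm=dst
  rax rbx mov-rr,                   \ lays down 48 89 D8, i.e. mov rax, rbx
  decimal

This is exactly the "constants for registers, constants for operations, plus a little diddling" shape that comes up again below.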
2023-05-06 04:58:02 Yeah when I say not hard to understand I just mean translating to machine code, of course it's a very complicated beast under the hood
2023-05-06 07:33:25 The thing is that you only need a small part of the x86 instruction set to do most things. It's the whole thing that's complicated. You can see a sensible encoding for the stuff you actually need.
2023-05-06 07:33:55 I was able to write 2-3 blocks of code that did a decent job encoding all the various moves, arithmetic, and logic instructions, and that's most of what you need.
2023-05-06 07:34:43 They just kept adding new features over the years and it got more and more involved.
2023-05-06 07:35:12 I.e., the 8086 instruction set is enough to create a Forth.
2023-05-06 07:35:28 And even it has excess.
2023-05-06 07:35:55 The extension of that subset to 64-bit is fairly straightforward too.
2023-05-06 07:36:10 Yeah agreed, a subset of x86 is appropriate and easy to write an assembler for
2023-05-06 07:36:27 This site here is enough to figure it all out:
2023-05-06 07:36:33 https://defuse.ca/online-x86-assembler.htm#disassembly
2023-05-06 07:37:21 You wind up with a set of constants for the registers, a set for operations, and then you do a reasonable amount of diddling with those to make the instructions.
2023-05-06 07:38:20 Oh, that's what was missing from my "list of current interests" last night.
2023-05-06 07:38:26 Meta-compilation.
2023-05-06 07:39:32 On that front I was very interested in Liz Rather's remark that separating interpret and compile into separate words makes that cleaner and easier.
2023-05-06 07:40:11 The traditional approach crams them into one bit of code with some conditionals.
2023-05-06 07:41:18 A Rather remarkable statement
2023-05-06 07:41:32 Chuck wrote a system at some point that did it the separated way - I forget its name at the moment, but at one point in time he identified it as the one he regarded as his "best one," at least up to that time.
2023-05-06 07:42:05 :-)
2023-05-06 07:43:26 The comments of hers I've seen so far all seem fairly sensible - she seemed to know what she was talking about.
2023-05-06 07:44:14 In the same remark where she complimented that approach (separating interpret and compile), she acknowledged that Forth Inc. didn't do it that way, because it was non-ANS.
2023-05-06 07:44:57 That seemed to me an odd level for a standard to address.
2023-05-06 07:45:35 I'm not 100% on what the difference is, other than it probably affects how you can portably declare STATE-aware words
2023-05-06 07:45:48 Otherwise you have no way of doing this
2023-05-06 07:45:59 Although I don't really like STATE-aware words at all, I don't really see the point
2023-05-06 07:46:19 And they have some serious caveats
2023-05-06 07:46:20 I think you don't have STATE if you do it that way.
2023-05-06 07:46:30 Yeah I was thinking that
2023-05-06 07:47:29 : and ] just cause you to call the compiler.
2023-05-06 07:47:50 ; and [ exit from it.
2023-05-06 07:48:12 Yeah
2023-05-06 07:48:27 It's the 'obvious' way of implementing it and I honestly had assumed before that's how it was done traditionally
2023-05-06 07:48:39 But what you're saying implies this wasn't the classic approach for some reason
2023-05-06 07:48:48 And meta-compilation just adds a bunch more STATE checking, which is avoided in this separated approach.
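A loose sketch of that separated scheme (hypothetical names: NUMBER stands in for whatever numeric conversion the system has, and in a real system ; and [ would unwind out of the compiling loop rather than the loop running to end of input):

  : interpret ( -- )                       \ one loop purely for interpreting
    begin bl word dup c@ while
      find if execute else number then
    repeat drop ;
  : compile-words ( -- )                   \ a second loop purely for compiling
    begin bl word dup c@ while
      find ?dup if
        0> if execute else compile, then   \ immediate words run, others get compiled
      else
        number postpone literal            \ numbers become in-line literals
      then
    repeat drop ;

Here : would build a header and then call compile-words, ; would end the definition and return to interpret, and no STATE variable exists anywhere.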
2023-05-06 07:48:49 I mean maybe it resulted in a smaller code footprint using STATE
2023-05-06 07:49:07 Yes, I think that is likely true.
2023-05-06 07:49:11 Lots of older code prefers globals because they were lightweight
2023-05-06 07:49:13 I was thinking that too a few minutes ago.
2023-05-06 07:49:23 Weird to hear now because these days it's heavier to use globals everywhere
2023-05-06 07:50:02 Yeah, a lot has changed. I suspect I still have failed to absorb all of the ramifications of those changes.
2023-05-06 07:50:28 Consider my remark a couple of days ago where I declared that the "ret based next" direct threading model was probably the fastest way to do it.
2023-05-06 07:50:42 Someone pointed out that it might confuse branch prediction.
2023-05-06 07:50:48 Yeah that was me
2023-05-06 07:50:51 I hadn't even thought of that - I was still thinking "old school."
2023-05-06 07:50:55 Yup
2023-05-06 07:51:20 so I'm aware of a lot of the impact of new things, but have missed some of it.
2023-05-06 07:51:23 The x86 arch has been optimised by Intel for years on conventional C-generated machine code
2023-05-06 07:51:30 Right.
2023-05-06 07:51:48 So if your code looks like a C compiler's output it's probably more optimised than if it's a bit shorter but really funky
2023-05-06 07:52:24 Not that I care, I enjoy writing funky code, but I will bear this in mind in hot sections of assembly code
2023-05-06 07:53:04 I have a natural tendency to equate "fewer instructions" / "fewer bytes" with speed, and that's a risky assumption.
2023-05-06 07:53:43 Yeah push and pop are single bytes, but I would guess they're often slower than MOV'ing to a register
2023-05-06 07:53:53 Not that I've actually benchmarked this
2023-05-06 07:54:15 But if that shorter code fits something into fewer cache lines... then maybe it will actually be faster
2023-05-06 07:54:24 I didn't really "catch on" to the performance impact of cache line sharing across threads until maybe five years ago.
2023-05-06 07:54:39 The golden rule of optimisation is to guide it with actual profiling
2023-05-06 07:54:50 Yeah, there's really no other way.
2023-05-06 07:55:03 Unless you're just super super familiar with that kind of stuff.
2023-05-06 07:55:16 Well, on some processors you can calculate timing exactly
2023-05-06 07:55:22 And even if you are it's because you've done exactly that a lot in the past.
2023-05-06 07:55:25 But not modern desktop/server CPUs
2023-05-06 07:55:41 Yeah. I did a lot of such calculating back in the day.
2023-05-06 07:55:51 Even when you can calculate CPU timing on embedded systems it's harder to factor in I/O speed
2023-05-06 07:56:04 So if it involves more than just CPU work you can't
2023-05-06 07:56:42 It was just an extension of calculating propagation delay timing in digital circuits.
2023-05-06 08:00:52 So, I usually terminate primitives with a next that is a jmp instruction - I keep a register pointing at the actual code for next. Then next does a little calculating and winds up doing an indirect jump through some calculated location. So that jump - that calculated jump at the end of next. How is that jump going to interact with branch prediction? It's "the same jump," residing at the same location, over and over, but it goes to altogether different places every time.
2023-05-06 08:01:41 It's either as fast as a relative jump or slower
2023-05-06 08:01:54 I guess that's an unconditional jump, so maybe the prediction circuitry isn't interested in it at all.
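For reference, that calculated jump is a one-encoding affair; a sketch in the same hand-assembly style as above (the register choice is illustrative):

  hex
  : jmp-[rax], ( -- )
    FF c, 20 c, ;   \ FF /4 is JMP r/m64; ModRM 20 = mod 00, reg 4, rm 0,
  decimal           \ i.e. jump to the address stored in memory at [rax]

One jump site with a different target almost every time is roughly the hardest case you can hand a branch target buffer.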
2023-05-06 08:02:15 I think it should probably predict if the register's not being changed
2023-05-06 08:03:06 Yeah, that register that points to next never changes. I've thought about changing it at times - for instance to turn on profiling (different next for profiling).
2023-05-06 08:04:17 Profiling looks super easy in this "table of CFAs" approach. I just create a parallel table of counters, and next will bump the appropriate counter in addition to doing what it normally does.
2023-05-06 08:04:24 Then later I stop that and study the table.
2023-05-06 08:05:54 The alternative to changing the register to trigger that alternate process is to have a primitive that just overwrites the next code.
2023-05-06 08:06:18 Probably just writes another jump into its starting point.
2023-05-06 08:07:24 My main reason for taking an overwrite approach instead of a register change approach is that I generally have that register doing double duty - it also serves as my "base of system" pointer.
2023-05-06 08:07:31 And in that role it CAN'T be changed.
2023-05-06 08:08:01 It identifies the start of the 32-bit address space that the system lives in, within my overall 64-bit address space.
2023-05-06 08:09:01 But things may be different with these tables. It may be that I only need to know where the tables are, and no longer need to know where the whole thing is.
2023-05-06 08:09:24 If the tables have 64-bit entries.
2023-05-06 08:11:22 I'm a little torn over how to proceed from where I am. I'm contemplating a rather different architecture than what I've got. But what I've got is so so close to being ready to roll - I really just need to write a block editor.
2023-05-06 08:11:45 So I could "start over" with a clean nasm slate, or I could just push what I've got over the finish line and then use it to write the new thing.
2023-05-06 08:14:12 The payoff of the latter approach is that I wouldn't have to muss with nasm trying to get it to lay things out in memory exactly the way I wanted them.
2023-05-06 08:14:53 My block buffers are completely contiguous - I could just build a new image in block space.
2023-05-06 08:34:25 I've been thinking about using different regions of blockspace as different things
2023-05-06 08:34:41 One for a "block file" or "main disk", and one just for RAM, etc
2023-05-06 08:34:45 what do you guys do with the parameter/data stack in your error/exception handling implementations? clear the stack on fault or leave it up to the caller to figure out the right course of action?
2023-05-06 08:35:10 ABORT clears both stacks if it falls all the way through
2023-05-06 08:35:43 If you have exceptions then it restores the stack to the height it was at when CATCH was called, if it catches an exception
2023-05-06 08:35:50 Both stacks
2023-05-06 08:36:59 And I think it's assumed that stack items which were removed before the exception was thrown, and then restored, could be nonsense
2023-05-06 08:37:55 This is the standard's practice and it's how I would implement it, it's quite reasonable.
2023-05-06 08:38:14 sounds logical, thanks
2023-05-06 08:38:20 Some Forths leave the data/parameter stack untouched on ABORT so you don't lose intermediate values in a calculation if you make one mistake
2023-05-06 08:38:51 i.e. if you mistype a word or something you don't want to lose all the data on the stack; it's non-ANS but desirable for many
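In standard terms, the CATCH behaviour described above looks like this (a small usage sketch; the word names and the error code 1 are arbitrary):

  : might-fail ( n -- n+1 )
    dup 0< if 1 throw then 1+ ;   \ throw a non-zero code on bad input
  : careful ( n -- )
    ['] might-fail catch if       \ catch restores both stack depths...
      ." failed " drop            \ ...but the restored cell may be trash
    else
      ." ok: " .
    then ;

So 5 careful prints ok: 6, while -5 careful takes the error path with both stacks back at their CATCH-time depths.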
2023-05-06 08:41:28 i haven't added an ABORT yet, but i'm implementing TRY and CATCH (in this javascript toy project - so i'm not aiming for ANS or anything traditional) and i'm thinking about the potential of having a fault stack dedicated to exceptions
2023-05-06 08:42:21 I just have a single variable to point at the current CATCH context, and I put the context itself on the return stack
2023-05-06 08:42:42 Not dissimilar to how setjmp/longjmp work
2023-05-06 08:43:27 Well CATCH/THROW are exactly that, they might as well have called them SETJMP/LONGJMP but that would be a bit heretical
2023-05-06 08:43:47 Except you can specify which context to jump to with longjmp
2023-05-06 08:46:32 https://www.electronicdesign.com/technologies/embedded/digital-ics/memory/video/21263394/memorycentric-compute-can-speed-searches-for-machinelearning-applications?o_eid=2379F0905723A3W&rdx.ident[pull]=omeda|2379F0905723A3W&oly_enc_id=2379F0905723A3W
2023-05-06 08:46:52 the rough concept i have so far: TRY possibly-dodgy-code CATCH IF !> handle-fault THEN
2023-05-06 08:47:16 unjust: I restore the entire system, including the data stack, to the state it had before I began to type the line that threw the error.
2023-05-06 08:47:34 where CATCH returns TRUE or FALSE, and !> pops the last fault and pushes it onto the parameter stack (there'd be a !@ and >! as well, i guess)
2023-05-06 08:47:49 The idea was that I'll have a command history, and that leaves me ready to up-cursor to recover that line, fix the error, and hit Enter again.
2023-05-06 08:48:09 unjust: That looks good to me, but I'd allow returning any non-zero code as an error
2023-05-06 08:48:16 So you can throw different 'exceptions' if needed
2023-05-06 08:48:44 It seemed like the natural course of action in the presence of a command history.
2023-05-06 08:48:49 It's better than the ANS exceptions, which take an xt, which is a tiny bit abominable
2023-05-06 08:50:55 Unfortunately standards always compromise design for consensus
2023-05-06 08:58:40 KipIngram: gforth people also said they felt it was alright to wipe the data stack because you can just go back through command history to repeat and correct
2023-05-06 08:59:08 Because originally gforth had the volksforth behaviour of keeping the stack preserved, but originally there wasn't command history
2023-05-06 09:16:10 KipIngram: is that a single slot for a system state snapshot? or does it build upon the stack framing and the return-to-caller-caller concept you've mentioned?
2023-05-06 09:20:19 veltas: i can see how that extra degree of flexibility could be useful. i'm currently using only TRUE and FALSE because the fault stack contains (references to) javascript error objects
2023-05-06 09:20:39 Fair enough
2023-05-06 09:29:04 veltas: Yes, but my goal was to avoid repeating steps that had worked - there could be any arbitrary number of them.
2023-05-06 09:29:35 unjust: It's only done on an input-line-by-input-line basis. There is no attempt to recover "partial progress" of a single line.
2023-05-06 09:29:49 If it did that, then I'd have to figure out what part of the input line to "not repeat."
2023-05-06 09:30:43 So there's just one slot. It snapshots the system just before calling EXPECT to acquire the command line. If control goes through the error handler, it restores it - if it doesn't, then the snapshot is just abandoned and overwritten by the next one.
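A loose sketch of that one-slot scheme (assumptions: DP is the dictionary-pointer variable and SP@ / SP! exist - common, but not ANS; a real version would capture more state, such as dictionary links and user variables):

  variable snap-dp   variable snap-sp
  : snapshot ( -- )  dp @ snap-dp !   sp@ snap-sp ! ;
  : unwind   ( -- )  snap-dp @ dp !   snap-sp @ sp! ;
  \ outer loop: snapshot, then EXPECT and interpret the line;
  \ on error: unwind, with the bad line sitting in history ready to fix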
2023-05-06 09:32:03 Note that this will even cover the case where the input line loads a whole bunch of nested blocks of source. All of that is "within a typed line."
2023-05-06 09:32:28 So if a compile process fails, then everything it accomplished so far is punted and I'm set up to repeat the entire load again.
2023-05-06 09:33:04 What it doesn't snapshot is the disk buffers - those are in a separate memory region. And that could cause some oddities, if I had modified disk blocks.
2023-05-06 09:33:24 I don't have any applications of that at this point, but I could see that becoming an issue for sure. Database integrity and so on.
2023-05-06 09:34:31 But even snapshotting the buffers isn't a total solution - the input line might have FLUSHed.
2023-05-06 09:34:40 It's just a thorny issue taken as a whole.
2023-05-06 09:35:10 I suppose I could snapshot my block file, in the Linux case. But holy cow...
2023-05-06 09:35:49 In a bare metal case it's a problematic method to start with, because there's just no certainty that I have the storage resources for such a total imaging process.
2023-05-06 09:36:20 I regard it as a luxury in the case of a "notebook PC Forth," where I have tons of resources I'm not using.
2023-05-06 09:37:11 I've got another such luxury in mind, if I ever get to the point where I'm compiling enough code for compile time to matter (it never has so far).
2023-05-06 09:37:47 I might add a big, initially empty hash table, and store the results of word lookups in it. That way only the first access of a word would require a linked list search - after that I'd find it in the hash table.
2023-05-06 09:38:14 The minute I timed Gforth compilation speed I knew they had to be using some kind of a hash approach. It was too fast to possibly be a linked list search.
2023-05-06 09:38:40 Although the linked list is still there
2023-05-06 09:38:48 It's just in addition to the hash table
2023-05-06 09:38:49 Sure, you do need word order.
2023-05-06 09:39:07 You can get word order with a linked list in just the buckets
2023-05-06 09:39:08 And it would still be there in my case too - this hash table would be built gradually as words were actually used.
2023-05-06 09:39:19 Yes, you could.
2023-05-06 09:39:33 The linked list is the best way to STORE the dictionary.
2023-05-06 09:39:48 You'd like for this hash table to be big and sparse, to minimize collisions.
2023-05-06 09:41:26 gotta use those jiggabytes of ram for something
2023-05-06 09:42:21 No kidding. But if you have any plans at all for moving to a small-resource platform, then your "big resource" features had better be optional optimizations.
2023-05-06 09:43:03 What you just said is exactly how I look at it - Forth itself just really doesn't need all those resources.
2023-05-06 09:45:54 Or just don't have big-resource features if they're not necessary
2023-05-06 09:47:06 Which IMO hash tables aren't; if things slow down too much then you've either got too much code or you can use wordlists to break up the work
2023-05-06 09:48:15 Right. I've felt the error recovery snapshotting is "necessary" (which really just means "convenient"), but to date I've never implemented the hash table, because I don't compile enough code for the time it takes to matter.
2023-05-06 09:48:28 All of my compiling is effectively "instant" in terms of my reaction times anyway.
2023-05-06 09:48:46 It's just nice knowing I've got the plan if I decide I want it.
2023-05-06 09:49:07 And I might make comments in my source code as to where it would "plug in."
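One way that lazily-built table could look (hypothetical names throughout; SLOW-FIND stands in for the normal linked-list search, and a real version must also compare names on a hit, because two words can hash to the same slot):

  65536 constant #slots
  create xt-cache #slots cells allot   xt-cache #slots cells erase
  : name-hash ( c-addr u -- slot# )
    0 rot rot over + swap ?do  31 * i c@ +  loop  #slots 1- and ;
  : cached-find ( c-addr u -- xt | 0 )
    2dup name-hash cells xt-cache +   ( c-addr u slot-addr )
    dup @ if nip nip @ exit then      \ hit: no list walk at all
    >r slow-find r> over swap ! ;     \ miss: walk the list once, remember the answer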
2023-05-06 09:49:38 It's easy enough - WORD could compute the hash as it parses out each word.
2023-05-06 09:50:15 Leave it in some variable.
2023-05-06 09:50:35 I mean the compile time is 'fast enough' for each line I input on my Z80 Forth with a single wordlist
2023-05-06 09:50:50 So logically it should be fine for a modern computer to build a larger program
2023-05-06 09:51:01 Yes, I think this would only show up at all if you were LOADing some huge application.
2023-05-06 09:51:06 These days things are just so fast.
2023-05-06 09:51:18 That's what I mean, LOADing a 'huge' program
2023-05-06 09:51:46 That's why I've always wondered what Chuck was doing back in the period when he was so concerned with compile speed. I suspect he was re-compiling his entire chip design system from scratch every time, and machines were slower back then.
2023-05-06 09:51:56 firefox still takes a long while to get up off disk, even on a !potato system
2023-05-06 09:54:43 It was clearly a big deal to him, and Jeff Fox wrote a bunch of stuff around that too.
2023-05-06 09:54:57 It was clearly on their minds as a primary design goal.
2023-05-06 09:59:51 Reasonable compile times are very important for iteration and development
2023-05-06 10:00:28 For instance I won't touch zig with a bargepole because it took like 30 seconds to build hello world the last time I tried it
2023-05-06 10:00:43 And apparently rust is much slower than C++ to compile, which is already a bit slow
2023-05-06 10:01:51 they've been trying to make rust less bad, but that's according to C++ folks who are used to unreal engine taking 45 minutes to compile on a beefy compile rig
2023-05-06 10:14:45 Yeah, there are certainly slow things out there, and you're right - it does matter.
2023-05-06 10:14:53 Those delays stall the creative flow.
2023-05-06 10:15:37 somehow gamedevs make do. maybe that's why lua scripting is popular?
2023-05-06 10:16:04 I think that's exactly why they employ embedded scripting languages a lot
2023-05-06 10:16:12 And can change stuff dynamically
2023-05-06 10:16:57 or they prototype in a not-C++ language (bonus: if management wants them to ship it, they say no can do, it's written in ...)
2023-05-06 10:33:47 Ok, I think I'm wanting the five regions I listed yesterday, and I do think I could work with using registers only to point to the two tables, though a register pointing to the disk buffer region would speed BLOCK's "already resident" path up a bit. But to avoid a register pointing to the bodies, which are the natural home of code, I'd need to put NEXT in one of the table regions, at the very beginning - then the table pointer reg would do double duty as the jump target for next.
2023-05-06 10:34:20 On init I'd add the base location of the body region to all of the table items.
2023-05-06 10:34:51 And I'd have to manually set a variable pointing to the header region (and the disk buffer region if I don't use a reg for it).
2023-05-06 10:35:23 But this table approach simplifies image loading substantially, because all the cells that need adjusting are gathered together in well-defined locations and not "mixed in" with stuff that doesn't need adjusting.
2023-05-06 10:36:09 The header region will contain no addresses - it will work exclusively with table indices and relative distances to the next list entry.
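That init-time fixup is just one pass over each table; a sketch (assuming the table cells hold offsets into the body region that become absolute addresses):

  : relocate ( table len base -- )   \ add base to every cell in the table
    rot rot cells over + swap ?do
      dup i +!                       \ offset at cell i becomes offset+base
    cell +loop drop ;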
2023-05-06 10:45:38 I'd probably put next in the PFA table region, because there's already a reason to set its base below the first actual table entry (since primitives don't have PFA pointers).
2023-05-06 10:47:05 Well, maybe not - maybe that's a reason to put it in the CFA table. Otherwise I'd be forced to actually carry that empty unused space between next and the first colon definition PFA.