2024-10-22 05:50:55 Hey, I understand what the x64 does when you do a jmp qword ptr [reg] or something along those lines. It uses reg as an address, pulls a 64-bit address from that location, and jumps there.
2024-10-22 05:51:00 What does it do then?
2024-10-22 05:51:15 How does it use a dword, or whatever, smaller than 64 bits, as a jump target?
2024-10-22 05:54:35 Is it really feasible to have a jump table with less than 64-bit entries?
2024-10-22 05:55:01 Direct, I mean - I know I could take anything and calculate it into a register and then jump to that.
2024-10-22 06:00:58 I think there's like 20 different jump modes depending on what exactly is involved
2024-10-22 13:13:49 That seems typical. :-) I'm just trying to figure out if I can shrink the size of my jump table. At this point my guess is probably not, but I guess I ought to study up on it.
2024-10-22 14:32:59 I wonder if you could jump to 0x80483020 + eax*8. (Or 16?) Then short words would be entirely inside the jump table, while longer words would have to jump to their "lower halves"
2024-10-22 14:34:38 if a word didn't need its entire 8-byte jump table entry, maybe you could pack the lower half of some other word into it
2024-10-22 15:24:44 As far as I can tell all of this IS what will reduce the size of the application code. The app level won't really see any of these differences I'm discussing here - it's all just part of how the underlying engine works.
2024-10-22 15:25:06 The whole goal is to let as much of my app code as possible be encoded with one-byte operation codes.
2024-10-22 15:25:20 Including as many of my calls as possible.
2024-10-22 15:26:20 As I noted in our private chat, that will include a) some amount of the most recent code I've compiled, and b) some number of "common definitions" that get used frequently.
2024-10-22 15:26:54 My personal coding style leads to a lot of my calls being to recently defined "factored out words."
2024-10-22 15:27:23 So I'm trying to arrange to be able to, in at least most cases, call those with one byte.
2024-10-22 15:28:52 Possibly a byte in the middle of a cell full of instructions. On the F18A, any call or jump terminates processing of the current instruction cell. It "uses up" the entire remainder of the cell, whether it needs it all or not. I'm trying to do better than that. I want to be able to make a one-byte call with, say, the third byte in a cell, and when I return carry on with the rest of that cell.
2024-10-22 15:29:48 Otherwise I'll pretty quickly lose the full benefit of having this packed opcode approach to start with.
2024-10-22 15:31:22 I totally agree with you that shrinking the engine at the expense of app size wouldn't be a win. I just don't think it's an issue that comes into play here.
2024-10-22 15:31:46 This whole approach is designed to minimize app code size; I'm just dickering over the implementation details here.
2024-10-22 15:32:50 I wonder if you could make some of the bytecodes be less than a whole byte
2024-10-22 15:33:02 by shifting less than 8 bits in their embedded next
2024-10-22 15:34:10 The problem there is that you don't know what instruction you're jumping to until you get there. So how many bits do you use to produce your table entry address?
2024-10-22 15:34:28 oh, that would involve some instructions using up 2 or 4 entries in the table
2024-10-22 15:34:47 like entries 0 and 128, or 0, 64, 128, and 192
2024-10-22 15:34:52 Ah, I see. Well, yes - that does seem possible. Nice idea.
2024-10-22 15:35:09 So you'd always wind up in the right place, and then the per-instruction code would decide how many bits to shift.
2024-10-22 15:35:12 right
2024-10-22 15:35:24 I'll have to give that some thought - it might be worthwhile.
2024-10-22 15:35:25 the BASIC Stamp used a variable-bit-width tokenized "bytecode" for its BASIC
2024-10-22 15:35:41 but I always thought that would be ridiculously inefficient
2024-10-22 15:35:49 Yeah. I like it - thanks.
2024-10-22 15:35:58 but I think that with your shift-and-index-with-low-byte approach it wouldn't be
2024-10-22 15:36:09 I think you'd beat the possible inefficiencies by using those multiple table entries.
2024-10-22 15:36:11 you could just have more complex behaviors on your bytes
2024-10-22 15:36:28 True, but I don't want to overcomplicate the engine TOO much.
2024-10-22 15:36:33 This is really really simple right now.
2024-10-22 15:36:53 What xentrac just suggested doesn't add any performance overhead.
2024-10-22 15:37:24 well, except maybe more dcache misses
2024-10-22 15:37:28 NEXT would continue to work exactly like I have it now, except that the bit shift would be regarded not as part of NEXT but as part of the instruction code bits.
2024-10-22 15:37:35 Still just a single instruction, though.
2024-10-22 15:38:18 And xentrac, this would be applied to the words based on how commonly they're used.
2024-10-22 15:38:26 that indirect jump is probably two or three μops
2024-10-22 15:38:49 yeah, I was thinking that you probably have some very common words
2024-10-22 15:38:54 Yes, but it's not too different from things I've written before.
2024-10-22 15:39:03 yeah
2024-10-22 15:40:04 That was what got me enthused over this idea to start with. I've historically just automatically categorized things with a token-threaded flavor as "inefficient," but after thinking it over I started to feel like this could be just as efficient as the indirect threaded stuff I've written in the past.
2024-10-22 15:40:30 Maybe even a little better. And that will be plenty good to suit me. I'm trying to get "as good performance as I've had in the past" while being a lot more compact.
2024-10-22 15:42:29 Probably somebody has tried making next just be a RET instruction, right?
2024-10-22 15:42:45 Yes, dave0 here in channel has played with that.
2024-10-22 15:42:55 For DTC
2024-10-22 15:43:14 Right. I thought it looked really interesting. I don't think he ever finished it out, but I was always nagging him to.
2024-10-22 15:43:48 Of course, that means your instruction cells have to be a fully qualified return address, so eight bytes.
2024-10-22 15:43:50 I imagine it doesn't work very well with the branch predictors in current CPUs, but it might not be as bad as the conventional DTC approach
2024-10-22 15:44:00 yes, it wouldn't be helpful for sizecoding compos
2024-10-22 16:38:49 when I was mixing token-threaded Forth and 6502 assembly in the same file, I assigned the token that starts a word to the same byte as the software interrupt opcode, so you could jump to the same Forth word from Forth or assembly and it would work
2024-10-22 16:39:18 maybe you could play with the return instruction and save some space
2024-10-22 16:47:36 As far as engine size, you really may have to write a lot of code to make up for a larger engine. I think it's a case-by-case question how much each feature saves vs adds. I was tracking this for each feature, since it was designed to work for a specific project rather than be general purpose, and some features were a net loss
2024-10-22 17:39:34 Technically, in standard Forth, you are supposed to use .( instead of ." outside of a definition (for tedious reasons)
2024-10-22 17:39:41 what are those reasons?
2024-10-22 17:39:53 comes from https://wiki.laptop.org/go/Forth_Lesson_8
2024-10-22 17:41:24 it makes no sense for me, and less when you add a cr into .( when .( is immediate, but cr is not
2024-10-22 17:58:10 vms14: just speculation, but maybe because in some forths ."
is a compile-only word
2024-10-22 17:58:34 makes sense
2024-10-22 17:58:36 ty
2024-10-22 19:58:50 afk
2024-10-22 20:43:20 vms14: I guess if cr is in .( then someone thought that made sense; it implies .( is used only to print line messages. But .( being immediate is enough - everything inside it will execute when it runs. I guess the intent there is for .( to display stuff at compile time, whereas ." stuff won't show until runtime.
2024-10-22 20:43:58 ." is immediate too, by the way. But not because it prints anything at compile time - it just has to do some custom stuff to compile the right things to run at runtime. Has to copy the string from the input stream into the compiled definition, etc.
2024-10-22 20:45:03 It compiles a runtime word, often called (."), and then puts the string right after that. At runtime (.") runs, prints the string it finds in the code right after itself, and jumps execution over the string.
2024-10-22 20:47:32 which is why i could imagine in some forths ." might be a compile-time only word if they were too lazy to make it state-aware and just always compile (.") followed by the string
2024-10-22 22:17:35 it's not just laziness; a lot of people think state-smart words are a plague and should be eliminated, which is why .( got introduced in the first place, I think in Forth-83
2024-10-22 22:18:52 vms14: ↑
2024-10-22 22:42:22 but quit is state-smart
2024-10-22 22:43:50 well, i suppose those people might avoid it by writing another parse loop in ]
2024-10-22 22:44:06 which i was doing for a while for my cross compiler
2024-10-22 22:52:54 it's a bit of a special case though