2024-09-06 00:59:29 xentrac: I wrote mine so that BLOCK starts off like a primitive, and if the block requested turns out to be resident, it runs entirely as a primitive. Only if disk activity is going to be required does that part then transfer control over to a definition. 2024-09-06 01:00:06 That's not exactly a standard thing to do in Forth, but I rigged it anyway. I wanted the "already resident" path to be as fast as humanly possible. I went to considerable trouble to get that efficient. 2024-09-06 01:11:11 forth is whatever you want it to be 2024-09-06 01:25:44 KipIngram: that seems like a very good idea. How are you managing the block-to-buffer mapping? Is it a linear search, a hash table, a trie, or something else? 2024-09-06 01:36:36 if you're writing a native-code compiler, you could inline at least the fast path of block into the caller. with a cuckoo hash table you could probably get it down to 5-8 instructions 2024-09-06 01:36:49 or less in the case where the block number is a constant 2024-09-06 01:41:21 what is a cuckoo hash table? 2024-09-06 01:42:01 I usually go with 255 buffers, but any 2^n-1 would do. So far I've settled for a very simple hash - just block number mod 255, which I speed up using that Hacker's Delight trick of multiplying instead of dividing. The reason that feels "settling" is because there's really only one buffer a block can map to, and a really good overall system would offer a couple. I don't try to allow collisions - 2024-09-06 01:42:03 if the buffer's occupied I just boot its block. 2024-09-06 01:43:00 The reason 255 and 2^n-1 work well is that I can allocate a 1 MB block for this and that gives me 255 4kB block buffers with 4kB left over, and that will hold 255 16-byte descriptors for the buffers with just 16 bytes unused. 
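[ed: the "mod 255 by multiplying instead of dividing" trick above can be sketched in C. This is a guess at the variant meant, not KipIngram's actual code: a fixed-point reciprocal multiply, where 0x80808081 = ceil(2^39 / 255), which is exact for all 32-bit inputs. The names are made up for illustration.]

```c
#include <stdint.h>

#define NBUF 255  /* 2^8 - 1 buffers, as in the scheme described */

/* x % 255 without a divide: q = (x * 0x80808081) >> 39 equals x / 255
   for every 32-bit x (Hacker's Delight-style magic-number division),
   and the remainder follows with one multiply and subtract. */
static uint32_t mod255(uint32_t x)
{
    uint32_t q = (uint32_t)(((uint64_t)x * 0x80808081u) >> 39);
    return x - q * 255u;
}

/* Direct-mapped choice: each block number maps to exactly one slot. */
static uint32_t buffer_slot(uint32_t blockno)
{
    return mod255(blockno);
}
```

[ed: on most compilers `x % 255u` compiles to the same multiply anyway; writing it out just makes the trick visible.]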
2024-09-06 01:44:19 that's basically how very simple memory caches work 2024-09-06 01:44:23 The descriptors tell me whether the buffer is occupied, if so what block is in it, and whether it's been updated or not. 2024-09-06 01:44:27 except they use 2^n rather than 2^n - 1 2024-09-06 01:45:10 Right. 2^n would have made the modulo op trivial, but then I'd have had no space left for the descriptors unless my overall RAM block wasn't a power of 2. 2024-09-06 01:45:31 i mean you can always put the descriptors somewhere else 2024-09-06 01:49:11 Yes, that's true. 2024-09-06 01:49:19 unjust: in a cuckoo hash table, you use two hash functions, H1 and H2. Data for a key K is stored either at A[H1(K)] or A[H2(K)]. On insertion you insert into either one if the other is occupied. If they're both occupied, you must evict some item with some other key K' from its slot, which you canonically do by moving it to its other possible slot, recursively repeating the procedure 2024-09-06 01:49:51 KipIngram: it's necessary to have at least two possible buffers for a block for correctness; your direct-mapped cache scheme is not sufficient 2024-09-06 01:50:21 thanks xentrac 2024-09-06 01:50:34 because, as I understand it, it's required that block swap block 1024 cmove update should work to copy one block to another 2024-09-06 01:51:56 but with a direct-mapped cache scheme like yours, it's possible that both blocks might hash to the same slot, in which case that code will fail to copy data 2024-09-06 01:52:41 Anyway, I never really was shooting for the "slickest ever" hashing setup. I was going for something I could get to work quickly; figured I could improve it later if I wanted to. 2024-09-06 01:52:57 (that is, both calls to block will return the same buffer address, so you'll be copying the block to itself) 2024-09-06 01:53:03 I think it's a very good hashing setup! 
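[ed: xentrac's cuckoo description maps onto a short C sketch. Everything here is illustrative: toy hash functions, a toy table size, and a sentinel for empty slots; a real table would rehash when the kick chain runs too long.]

```c
#include <stdint.h>
#include <string.h>

#define SLOTS 64
#define EMPTY 0xFFFFFFFFu   /* sentinel: EMPTY itself can't be a valid key */
#define MAX_KICKS 32

static uint32_t table[SLOTS];  /* stores block numbers */

static uint32_t h1(uint32_t k) { return (k * 2654435761u) % SLOTS; }
static uint32_t h2(uint32_t k) { return ((k ^ 0x5bd1e995u) * 0x85ebca6bu) % SLOTS; }

/* A key K can only ever live at h1(K) or h2(K): two probes, no chains. */
static int cuckoo_find(uint32_t k)
{
    return table[h1(k)] == k || table[h2(k)] == k;
}

/* Insert by evicting a resident to its other possible slot, recursively,
   exactly the "canonical" procedure described above. */
static int cuckoo_insert(uint32_t k)
{
    uint32_t slot = h1(k);
    for (int i = 0; i < MAX_KICKS; i++) {
        if (table[slot] == EMPTY) { table[slot] = k; return 1; }
        uint32_t evicted = table[slot];
        table[slot] = k;
        k = evicted;
        slot = (h1(k) == slot) ? h2(k) : h1(k);  /* K's alternate slot */
    }
    return 0;  /* kick chain too long: a real table would rehash */
}
```

[ed: the 5-8 instruction figure quoted above is for the *lookup*: two hashes, two loads, two compares, which is why it inlines so well on the fast path.]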
2024-09-06 01:53:15 speaking of cmove 2024-09-06 01:53:16 Honestly the level I've coded and operated at so far almost anything would work; I haven't needed something incredible. 2024-09-06 01:53:20 except for the correctness problem, but you can fix that 2024-09-06 01:53:36 should cmove work for overlapping regions 2024-09-06 01:53:42 a la memmove vs memcpy 2024-09-06 01:53:56 cmove is defined to work bytewise from lower to higher addresses 2024-09-06 01:54:05 memmove() is move 2024-09-06 01:54:18 xentrac: Yes, strictly speaking the results of a call to BLOCK in my system are valid only up until the next BLOCK call. 2024-09-06 01:54:21 i see 2024-09-06 01:54:27 Unless I know otherwise, of course. 2024-09-06 01:54:28 there's also a cmove> which works from higher to lower 2024-09-06 01:54:46 That's one of the things that I've not ever gotten "big enough" to have be a problem. 2024-09-06 01:54:51 the reason for this is so that you don't need a separate word for memset() 2024-09-06 01:55:24 That's one of the reasons, though, I wanted the "already resident" part of BLOCK to be fast. So I could call it more or less with impunity. 2024-09-06 01:55:52 maybe I'm wrong! https://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Blocks.html says Gforth also uses a direct-mapped scheme 2024-09-06 01:56:49 As I said, though, I was just getting the system up on its legs and wanted to be able to start writing Forth source for further parts of the system. I wanted something that would work and not take too much of my time. 2024-09-06 01:57:00 There were other things I wanted to get on with. 
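[ed: the CMOVE point above is worth making concrete. Because CMOVE is specified bytewise low-to-high, an overlapping copy deliberately propagates bytes forward, so `addr addr 1+ n CMOVE` acts as a fill and no separate memset word is needed. A C model of that semantics:]

```c
#include <stddef.h>

/* CMOVE: strictly bytewise, from low addresses to high. Unlike
   memmove(), overlap is NOT "fixed up" -- and that is the point:
   copying a region onto itself shifted up by one smears the first
   byte across the whole region. */
static void cmove(const unsigned char *src, unsigned char *dst, size_t n)
{
    while (n--) *dst++ = *src++;
}
```

[ed: in Forth terms, `buf buf 1+ 7 CMOVE` after storing a byte at `buf` fills 8 bytes with it; that is `cmove(buf, buf + 1, 7)` here. CMOVE> is the mirror image, high-to-low, for copies that overlap the other way.]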
2024-09-06 01:58:16 I think I'm wrong; https://forth-standard.org/standard/block just says, "A call to BLOCK or BUFFER may render a previously-obtained block-buffer address invalid," without any restriction to exclude the immediately previous call to block 2024-09-06 01:58:21 now I'm wondering where I got that idea 2024-09-06 01:58:30 probably cuz it's fairly smart 2024-09-06 01:58:40 If I wanted to fix the problem you called out, I'd probably just allocate 2MB instead of 1MB and have two buffers for each hash bucket. 2024-09-06 01:58:55 dumb and fast 2024-09-06 01:58:58 Then when the first was found occupied the next block would go to the second, and that would work fast. 2024-09-06 01:59:35 And then my problem would be if I wanted to xor a pair of blocks into a third block and they happened to be just the wrong ones. 2024-09-06 01:59:43 so... 3MB. :-) 2024-09-06 01:59:52 or you could require people to use a temporary buffer for that 2024-09-06 01:59:58 Yeah. 2024-09-06 02:00:10 I think the more common case for things like that is something like the dot product of two long vectors 2024-09-06 02:00:20 well, no. matrix multiply, maybe 2024-09-06 02:00:37 But if I were implementing a RAID system I'd just design it so that I knew the blocks within a stripe wouldn't hash to the same bucket. 2024-09-06 02:00:59 haha 2024-09-06 02:01:59 Later on I thought a lot more about hashing, this time in connection with dictionary searches. It's pretty clear that GForth uses hashing for the dictionary. I did some timing measurements on its compile speed one night, and it was just IMPOSSIBLY fast for it to be a linked list search. 2024-09-06 02:02:17 Completely out of all reason, especially given how many words there are in the GForth dictionary. 
2024-09-06 02:02:37 https://forth-standard.org/standard/block/BLOCK says Forth-79 guaranteed strict LRU ordering for block buffer assignment, which is also what F83 uses, with four buffers 2024-09-06 02:03:13 I've got a scheme in mind for next time that will let me start with a linked list style dictionary without actually having to pay the cost of a link field (the count byte takes care of it). And then I'll be able to add hashing to it later if I choose to. 2024-09-06 02:03:34 The trick for getting the length byte to do that job is to start putting words in at the TOP of the RAM range and build downward instead of the other way around. 2024-09-06 02:03:57 The only thing that's not a fixed size is the name string, so the length byte gives you what you need to jump upward to the next (previous) entry. 2024-09-06 02:06:49 I was kind of tickled at noticing that - that you could drop an explicit link field entirely if you did it that way. 2024-09-06 02:07:38 alternatively if you've got enough ram you can just burn 32 bytes per word 2024-09-06 02:07:53 27 character word names are probably fine right? 2024-09-06 02:07:58 In my current system my CFA/PFA pointer pairs are in my headers along with the name, but next time I'm going to have them separate, and the "vocabulary" will contain ONLY the name strings and pointers into that CFA/PFA table, which will be shared system wide by all vocabularies. 2024-09-06 02:08:33 The names for each vocabulary will go in their own RAM block, and if I wanted to I could completely deallocate those blocks after I'd used those names for all I needed them for, and things would still run. 2024-09-06 02:10:01 Nothing anywhere else will point back into those. The down side of that design is that there won't be any record of what vocabulary a compiled word is in - I'd have to search more or less everything in order to decompile, so... not 100% sure of it yet. 
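[ed: the link-field-free dictionary described above can be modeled in C. The idea: headers grow downward from the top of RAM, every field except the name is fixed-size, so the count byte alone tells you how far to jump *up* to the next-older entry. Layout and sizes here are hypothetical, not KipIngram's actual system.]

```c
#include <stdint.h>
#include <string.h>

#define FIXED 8        /* fixed-size fields per header (a CFA/PFA pair, say) */
#define DICT_TOP 4096

static unsigned char dict[DICT_TOP];
static unsigned here = DICT_TOP;   /* grows DOWNWARD from the top of RAM */

/* Lay a header below the previous one: count byte, name, fixed fields.
   No link field: the count byte is enough to reach the older entry. */
static unsigned char *add_header(const char *name)
{
    unsigned len = (unsigned)strlen(name);
    here -= 1 + len + FIXED;
    unsigned char *e = dict + here;
    e[0] = (unsigned char)len;
    memcpy(e + 1, name, len);
    memset(e + 1 + len, 0, FIXED);   /* CFA/PFA would go here */
    return e;
}

/* "Follow the link": the next-older entry sits just above this one,
   at a distance computed entirely from the count byte. */
static unsigned char *prev_entry(unsigned char *e)
{
    return e + 1 + e[0] + FIXED;
}

/* Linear search from the latest (lowest) entry back up to the top. */
static unsigned char *find(const char *name)
{
    unsigned char *e = dict + here;
    unsigned len = (unsigned)strlen(name);
    while (e < dict + DICT_TOP) {
        if (e[0] == len && memcmp(e + 1, name, len) == 0) return e;
        e = prev_entry(e);
    }
    return 0;
}
```

[ed: hashing can be layered on later, as the log says, by keeping side-table pointers into this region; the walk order stays newest-first, which is what vocabulary shadowing needs.]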
2024-09-06 02:10:48 Even thinking about "decompiling" and "discarding names" in the same system is a little in the weeds though, isn't it? 2024-09-06 02:11:28 KipIngram: you have a count byte to tell you how big the word is? 2024-09-06 02:11:40 i rarely use the decompiler 2024-09-06 02:11:43 I don't think you're going to be able to implement compile, very easily with that scheme 2024-09-06 02:11:43 Yes, the name string is just a standard Forth counted string. 2024-09-06 02:12:05 oh, you're not putting the code and data fields there, just the names? 2024-09-06 02:12:06 when i do it's for debugging, and when i need to debug that kinda thing it's because i'm writing immediate words 2024-09-06 02:12:10 Well, certainly not if I've thrown away the names. 2024-09-06 02:12:31 so i wrote my own decompiler that a) gives me more information and b) doesn't segfault if it doesn't understand the word 2024-09-06 02:12:34 Right. If the code and parameter fields were there I'd never be able to discard it. 2024-09-06 02:12:45 They get accessed by the runtime stuff. 2024-09-06 02:12:53 Right. 2024-09-06 02:14:07 What if you put the length field after the characters of the name? like b l o c k 5 m o v e 4 2024-09-06 02:14:18 Another advantage of putting the CFA/PFA pairs in a table (which is what this boils down to) is that if you wanted to you then could have a definition be a list of offsets into that table instead of a list of addresses. 2024-09-06 02:14:19 then you could use a pointer to the length field as a pointer to the name 2024-09-06 02:14:35 And then you might make them 16 bits and you could still have 2^16 words. 2024-09-06 02:14:53 yes, small offsets are good 2024-09-06 02:15:24 Anyway, that's really no longer my main plan; since then I've thought of this other scheme with the vm. 2024-09-06 02:15:33 KipIngram: you've just invented token threading 2024-09-06 02:15:36 What's the vm scheme? 
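[ed: the "list of 16-bit offsets into a shared CFA table" idea above is token threading, and the inner interpreter for it is tiny. A minimal C sketch with invented primitives, just to show the dispatch shape:]

```c
#include <stdint.h>

/* Token threading: a definition is a list of 16-bit tokens, each an
   index into a shared execution-token table, instead of full addresses.
   2^16 tokens is enough for 65536 words, at half (or a quarter) the
   size per compiled cell. */
typedef void (*prim_t)(void);

static int stack[16], sp;

static void p_one(void) { stack[sp++] = 1; }
static void p_add(void) { sp--; stack[sp - 1] += stack[sp]; }

static prim_t cfa_table[65536] = { p_one, p_add };

/* The inner interpreter: token -> table -> code. */
static void run(const uint16_t *ip, unsigned n)
{
    while (n--) cfa_table[*ip++]();
}
```

[ed: as the log notes, the extra table index costs little on modern machines; the win is density, since each compiled reference shrinks from a cell-sized address to two bytes.]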
2024-09-06 02:15:38 slower but denser 2024-09-06 02:15:56 indexing a table isn't actually slower than dereferencing a pointer on modern architectures 2024-09-06 02:16:13 A byte-code vm, for compactness and portability, with a code-threaded Forth running on top of it. 2024-09-06 02:16:14 or, very little 2024-09-06 02:16:26 "code-threaded"? 2024-09-06 02:16:39 And I used to think that would be slower than the indirect threaded systems I've previously written, but after I thought about it some I'm not so sure. 2024-09-06 02:16:52 Code threaded is where you basically compile machine code rather than lists of addresses. 2024-09-06 02:16:58 Definitions are a sequence of calls. 2024-09-06 02:17:02 oh, STC 2024-09-06 02:17:08 "subroutine threaded" 2024-09-06 02:17:11 In this case, though, vm calls, not native machine calls. 2024-09-06 02:17:15 right 2024-09-06 02:17:20 Yes - that. I've never written that style of system before. 2024-09-06 02:17:39 that's kind of how StoneKnifeForth works 2024-09-06 02:17:47 xentrac: true 2024-09-06 02:17:47 (without the VM) 2024-09-06 02:18:07 i guess i've just got avr brain 2024-09-06 02:18:19 yeah, on the AVR it makes a real difference 2024-09-06 02:18:21 where it kinda has to be tokens because there's two address spaces 2024-09-06 02:18:33 but on Cortex-M0 it's the same 2024-09-06 02:18:42 no, ITC works fine with two address spaces 2024-09-06 02:18:56 the CFA points into code space, the DFA is in data space 2024-09-06 02:19:09 that's sort of why ITC was invented actually 2024-09-06 02:20:11 i'll be honest im not too familiar with what the DFA does because the forth i'm familiar with (jonesforth) doesn't have one 2024-09-06 02:20:23 you just go DOCOL and then immediately into the forth words 2024-09-06 02:22:47 ie. 
the data field is directly after the cfa in ram 2024-09-06 02:22:51 so there's no need for a pointer 2024-09-06 02:23:00 this is the original ITC paper, Dewar 01975: https://dl.acm.org/doi/pdf/10.1145/360825.360849 2024-09-06 02:23:14 in such a system you can kinda hack in DOES> but it can't easily work the same way 2024-09-06 02:23:31 mm actually i think you can 2024-09-06 02:23:32 that is the perfectly normal way to do ITC 2024-09-06 02:23:42 and yes, implementing does> is common on such a system 2024-09-06 02:23:56 because DOES> gets run before any code gets compiled to the word 2024-09-06 02:24:06 the implementation in F83 is: 2024-09-06 02:24:06 : DOES> (S -- ) COMPILE (;CODE) 232 ( CALL ) C, \ ;CODE Used for defining the run time portion of a defining 2024-09-06 02:24:10 [ [FORTH] ASSEMBLER DODOES META ] LITERAL \ word in low level code. 2024-09-06 02:24:13 HERE 2+ - , ; IMMEDIATE \ DOES> Specifies the run time of a defining word in high 2024-09-06 02:24:16 \ level Forth. 2024-09-06 02:24:59 oh wait that's the implementation for the assembler 2024-09-06 02:25:14 amby: The CFA points to machine code always, which for a definition is DOCOL. The other one points to the list of addresses that comprise the definition, or to the actual data for a variable or a constant. 2024-09-06 02:25:18 the normal implementation is: 2024-09-06 02:25:18 T: DOES> (S -- ) \ DOES> 2024-09-06 02:25:19 [FORWARD] <(;CODE)> HERE-T ( DOES-OP ) 232 C,-T \ Compile the code field for (;CODE) and a CALL instruction 2024-09-06 02:25:22 [[ ASSEMBLER DODOES ]] LITERAL HERE 2+ - ,-T T; \ to the run time for DOES, called DODOES. 2024-09-06 02:25:33 yeah i got cfa 2024-09-06 02:25:46 the forth i'm working on already has DOCON and DOVAR 2024-09-06 02:25:57 unfortunately this implementation is not very clear to me 2024-09-06 02:26:02 It gets extra interesting when you do CREATE DOES>, because to do it really cleanly you need three. 
2024-09-06 02:26:19 although CREATE doesn't automatically append DOCOL in jonesforth 2024-09-06 02:26:22 CREATE DOES> is a lot like combining a definition and a variable. 2024-09-06 02:26:31 And you need to know where both are. 2024-09-06 02:26:31 so DOES> doesn't need to go back and overwrite anything 2024-09-06 02:26:50 The definition is the DOES> code; the variable part is the data associated with the end word. 2024-09-06 02:29:08 this is the F83 definition for dodoes: 2024-09-06 02:29:09 ASSEMBLER LABEL DODOES \ DODOES 2024-09-06 02:29:12 SP RP XCHG IP PUSH SP RP XCHG IP POP \ The runtime portion of defining words. First it pushes the 2024-09-06 02:29:15 W INC W INC W PUSH NEXT \ IP onto the return stack and then it pushes the BODY address 2024-09-06 02:29:18 \ of the word being executed onto the parameter stack. 2024-09-06 02:30:45 w in F83 is a register that holds the DFA on entry to the code 2024-09-06 02:30:56 uh, so it holds the CFA, not the DFA 2024-09-06 02:31:58 the xchg/push/xchg dance is to push the interpretive instruction pointer (SI) onto the return stack 2024-09-06 02:32:06 yknow it took me a minute to realise that was postfix asm 2024-09-06 02:32:17 yeah, sorry, it can be confusing 2024-09-06 02:32:18 i've been using my own assembler enough that it just looks natural now 2024-09-06 02:32:30 no it wasn't confusing lol 2024-09-06 02:32:33 oh good 2024-09-06 02:32:35 exactly the opposite 2024-09-06 02:32:40 i just didn't realise it was postfix 2024-09-06 02:32:47 yeah, I have that happen with Spanish and English sometimes 2024-09-06 02:32:59 like someone will say something in Spanish and I'll assume that English-speakers nearby understood it 2024-09-06 02:33:17 because I didn't realize they were speaking Spanish 2024-09-06 02:33:23 yeah 2024-09-06 02:33:43 but I'm still confused about exactly how F83 implements does> in ITC 2024-09-06 02:34:01 I guess the CFA has to point to dodoes 2024-09-06 02:34:58 but I'm not clear on where the pointer to the 
code following does> is. At the beginning of the data field? 2024-09-06 02:35:28 i would assume so 2024-09-06 02:35:39 How could it get there if I've just said create , , and already put something else at the beginning of the data field? 2024-09-06 02:36:05 the way i was thinking of doing it is DODOES calls (i think) DF[0] with the address of DF[1] on the stack 2024-09-06 02:36:40 hm yeah 2024-09-06 02:36:58 like, the F83 implementation of 2constant is literally : 2constant create , , does> 2@ ; 2024-09-06 02:39:55 if i was doing it i'd go something like 2024-09-06 02:40:41 : 2CONSTANT WORD CREATE DOES , , DOES> 2@ ; 2024-09-06 02:40:56 where DOES compiles DODOES and leaves a space 2024-09-06 02:44:43 also does 2@ leave low or high word on top of stack 2024-09-06 02:45:28 mine leaves high on tos, but thinking about it you want low 2024-09-06 02:45:31 cuz little endian 2024-09-06 02:52:48 okay, now I understand how it's working 2024-09-06 02:53:35 each "class" defined by a create does> word in F83 gets a little stub of machine code generated for it 2024-09-06 02:54:49 so does> goes back and changes the CFA to point to that little stub, which it sticks into the definition of the "class" such as 2constant 2024-09-06 02:54:59 all the stub does is "call dodoes" 2024-09-06 02:55:00 ah 2024-09-06 02:55:07 clever 2024-09-06 02:56:11 so all 2constant-defined words get a CFA that points into the middle of 2constant 2024-09-06 02:57:18 instead of using the usual CFA for create-defined words 2024-09-06 02:57:50 it's sort of a departure from the pure ITC model because you have to emit machine code for each new create does> class 2024-09-06 02:58:07 which goes into your data memory 2024-09-06 03:00:47 now I'm wondering what F-PC does, because IIRC it actually puts machine code in a whole separate segment 2024-09-06 03:03:14 if dodoes had some way to find the *end* of the word's data field, you could stick the pointer to the does> code there. 
Or you could have does> go back at compile time and patch the create call to call something else that allocates extra space at the beginning of the data field for a pointer to the does> code 2024-09-06 03:03:36 similar to your does 2024-09-06 03:06:11 basically I was just playing with F83 and looking at a bunch of hex dumps 2024-09-06 03:08:42 with respect to block, even the Forth-79 standard says, "Only data within the latest block referenced by BLOCK is valid by byte address, due to sharing of the block buffers." 2024-09-06 03:09:00 it could have said "latest two blocks" but it didn't, so I think I was just comprehensively wrong 2024-09-06 03:12:19 p. 33 of https://www.complang.tuwien.ac.at/forth/fth83std/FORTH83.TXT says, "Only data within the last buffer referenced by BLOCK or BUFFER is valid." 2024-09-06 03:12:43 and unlike Forth-79, Forth-83 actually got implemented 2024-09-06 03:12:58 KipIngram: possibly of interest ↑↑ 2024-09-06 03:50:29 I think KipIngram's remark, "It gets extra interesting when you do CREATE DOES>, because to do it really cleanly you need three," is amply justified by the F83 implementation of create does> 2024-09-06 04:52:59 welcome back, dnm. irccloud glitch? 
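[ed: the DOES> mechanism the log works out above, a per-class stub whose only job is to call DODOES with the body address, can be modeled in C. This is a model, not F83's actual code: a C function pointer stands in for the machine-code stub, and a struct stands in for the header.]

```c
#include <stdint.h>

/* Model of the CREATE ... DOES> runtime. Each word has a CFA and a
   body (parameter field). DOES> patches the CFA of every word the
   defining word creates to point at per-class behavior; at run time
   that behavior receives the body address, exactly what DODOES
   pushes before running the does> code. */
typedef struct word {
    void (*cfa)(struct word *self);  /* the "stub": class behavior */
    int body[2];                     /* parameter field from CREATE , , */
} word_t;

static int result_lo, result_hi;

/* The does> part of  : 2constant create , , does> 2@ ;  -- given the
   body address, fetch both cells. */
static void do_2constant(word_t *self)
{
    result_lo = self->body[0];
    result_hi = self->body[1];
}

/* The definition-time part: CREATE a word, comma in two cells, and
   point its CFA at the class behavior. */
static word_t make_2constant(int lo, int hi)
{
    word_t w = { do_2constant, { lo, hi } };
    return w;
}
```

[ed: the departure from pure ITC noted above is visible here: `do_2constant` plays the role of the machine-code stub that gets emitted into 2constant's own definition, and every 2constant-defined word's CFA points at it.]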
2024-09-06 11:32:09 xentrac: Someone else has suggested starting PAD near end of dictionary and having words like -PAD +PAD that subtract or add 16 cells or something 2024-09-06 11:32:40 Frankly if you've got that level of control, and don't have weird optimisations like my Z80 forth, you can have the return stack share this space with dictionary 2024-09-06 11:32:56 Then you can just put it on the return stack 2024-09-06 11:33:51 In my Z80 forth I've got a few buffers defined at fixed offsets under return stack, and the return stack code assumes that it's within a page so physically can't be bigger than like 120 cells 2024-09-06 11:34:32 Basically because it's faster/smaller to increment/decrement the low address register byte rather than the whole thing 2024-09-06 11:38:11 But yes I'm very aware that the first suggestion has that limitation, but that's the thing about Forth, I am always willing to accept a simpler limited solution over a general complicated one 2024-09-06 11:39:09 For your hanoi program I would probably combine the data into one cell and have access words for playing with those bitfields 2024-09-06 11:39:15 And keep it all on the parameter stack 2024-09-06 11:39:36 If I was targeting my Z80 forth, anyway 2024-09-06 13:32:18 veltas: yeah, that makes sense. I feel like a problem Forth suffers on 8-bit machines is that it makes it hard to avoid 16-bit arithmetic 2024-09-06 13:32:33 which is usually pretty expensive on an 8-bitter 2024-09-06 14:10:39 Personally I think it's fine and 16-bit is so convenient you accept the penalty 2024-09-06 14:10:56 But there's a lot of small optimisations hidden about like the one I just mentioned to make it a bit better 2024-09-06 14:11:23 I could have added 8-bit ops etc but with Forth's overhead the benefit would be hard to notice 2024-09-06 14:11:43 To save memory a lot of my system vars are 8-bit though, accessed with C@ C! 
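[ed: veltas's suggestion of packing the hanoi state into one cell with access words for the bitfields looks like this in C. The field layout (5 bits each for disk count, source peg, destination peg) is invented for illustration:]

```c
#include <stdint.h>

/* Pack three small fields into one 16-bit "cell": bits 0-4 = disk
   count, bits 5-9 = source peg, bits 10-14 = destination peg.
   In Forth these would be access words built on AND / RSHIFT / OR. */
#define FIELD(cell, shift)  (((cell) >> (shift)) & 0x1Fu)
#define PACK(n, from, to)   ((uint16_t)((n) | ((from) << 5) | ((to) << 10)))

static unsigned disks(uint16_t c) { return FIELD(c, 0); }
static unsigned from_peg(uint16_t c) { return FIELD(c, 5); }
static unsigned to_peg(uint16_t c) { return FIELD(c, 10); }
```

[ed: the payoff is that one packed cell rides on the parameter stack through the recursion, so no PAD or return-stack buffer is needed at all.]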
2024-09-06 14:13:06 i mean 2024-09-06 14:13:37 i wouldn't expect you to save much memory there compared to the size of the header for the var 2024-09-06 14:38:08 Correct ......... so? 2024-09-06 14:38:19 Today I'm trying to see if I can convince GCC to build a better ilo 2024-09-06 14:38:25 On my lunchbreak 2024-09-06 14:38:44 It seems to inline everything quite nicely, but even if I make everything static it refuses to try and be smart with the variables 2024-09-06 14:38:54 So I'm going to move everything into one function and see what happens 2024-09-06 14:39:02 And then see if I can tidy that up if that works 2024-09-06 14:39:14 And then crc can tell me it's too ugly to see the light of day ;) 2024-09-06 14:39:53 Otherwise GCC's code is quite similar to my AMD64 manual assembly implementation 2024-09-06 14:40:18 Except it uses static vars for everything 2024-09-06 14:44:26 are you using gcc instead of just gas because you take advantage of cpp? 2024-09-06 14:46:11 or at least i'm assuming ilo is implemented in assembly and not c 2024-09-06 14:51:25 ilo's implemented in a few languages 2024-09-06 14:51:30 I'm talking about optimising the C implementation 2024-09-06 14:51:41 Or trying to figure out how to coax GCC into doing a better job with it 2024-09-06 14:52:02 But the result will probably be unacceptable to CRC because the big advantage of the .c file is it's easy to read/understand and really short 2024-09-06 14:53:12 veltas: I'm ok with having more complex / optimal implementations of ilo 2024-09-06 14:53:20 crc: This isn't necessarily representative of final approach but I'm interested to know how this performs compared to .c and manual AMD64 implementations in your benchmark https://termbin.com/xpv1 2024-09-06 14:53:42 The main change here is moving almost everything into one function 2024-09-06 14:54:16 GCC seems to do a reasonable job putting stuff in registers, can fine-tune if this has good gains, if not then I'll probably stop here 2024-09-06 
14:58:18 The code's not as clean as the manual AMD64 stuff, but if it runs as well then this is useful because in theory you can build performant code for other arch's 2024-09-06 15:00:16 Probably worth playing with moving some of the non-core operations out like I/O, and moving the arrays back into static/extern vars 2024-09-06 15:00:53 When I say "out" I mean into their own functions 2024-09-06 15:01:24 What's interesting is that GCC managed to do a good job inlining even with everything extern, it just failed to reconcile the vars into regs even with all vars static 2024-09-06 15:04:31 I'll run some tests and report back in a little while 2024-09-06 15:19:42 GCC specific but this is super relevant: https://gcc.gnu.org/onlinedocs/gcc/Global-Register-Variables.html 2024-09-06 15:19:51 I had no idea this was a thing until I started googling 2024-09-06 15:20:27 Conventional wisdom is 'register' is irrelevant now, I wonder if that's really true, would be nice to help GCC figure out which vars are best to reserve regs for 2024-09-06 15:20:49 So many things are stated as fact that really deserve more scrutiny 2024-09-06 15:21:00 yeah, the PFE README reported that that was a really important optimization to get PFE within a stone's throw of assembly-programmed Forth performance 2024-09-06 15:21:15 in like 01996 2024-09-06 15:21:16 Source? 2024-09-06 15:21:25 What is 01996? 
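[ed: the "move everything into one function" change discussed above has a simple rationale worth showing. With the interpreter in a single function, the VM state (ip, sp, tos) becomes locals, which the compiler can enregister across the whole dispatch loop instead of reloading static variables per opcode. A minimal sketch, not ilo's actual opcode set:]

```c
#include <stdint.h>

/* A tiny switch-dispatch VM in one function. Because tos/sp/ip are
   locals, GCC can keep them in machine registers for the lifetime of
   the loop -- the thing it refused to do for file-scope statics. */
enum { OP_LIT, OP_ADD, OP_HALT };

static int32_t run_vm(const int32_t *code)
{
    int32_t stack[64];
    int32_t tos = 0;            /* top of stack cached in a "register" */
    int sp = 0;
    const int32_t *ip = code;

    for (;;) {
        switch (*ip++) {
        case OP_LIT:  stack[sp++] = tos; tos = *ip++; break;
        case OP_ADD:  tos += stack[--sp];             break;
        case OP_HALT: return tos;
        }
    }
}
```

[ed: the GCC global-register-variable feature linked above is the other route: it pins a file-scope variable to a named register (`register int32_t tos asm("r15");`), which is how PFE reportedly got close to hand-written assembly.]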
2024-09-06 15:21:36 28 years ago 2024-09-06 15:22:15 Oh the year 1996 2024-09-06 15:23:10 looking at my notes, I was looking at PFE 0.9.14, which it turns out is from 01994: https://sourceforge.net/projects/pfe/files/pfe/0.9.14/ 2024-09-06 15:23:44 Writing the year like that is quite confusing 2024-09-06 15:23:54 I was trying to get it to build on current Linux, because it uses a GCC extension to C that were removed in GCC 4.0 in 02006 (generalized lvalues) 2024-09-06 15:23:58 *was 2024-09-06 15:24:21 because that version of PFE supposedly supported MS-DOG 2024-09-06 15:25:45 I'm not sure if current PFE can currently take advantage of GCC global register variables 2024-09-06 15:26:29 What is MS-DOG? 2024-09-06 15:27:03 :-) 2024-09-06 15:27:54 Their code is all prefixed with p4_, reminds me of perforce! 2024-09-06 15:29:19 veltas: "A clone of CP/M for the 8088 crufted together in 6 weeks by hacker Tim Paterson at Seattle Computer Products, who called the original QDOS (Quick and Dirty Operating System) and is said to have regretted it ever since. Microsoft licensed QDOS in order to have something to demo for IBM on time, and the rest is history." 2024-09-06 15:31:52 https://foldoc.org/MS-DOG 2024-09-06 15:36:28 veltas: http://forth.works/share/3YCgkQ8NIi.txt for results of a few quick tests 2024-09-06 15:36:53 niice! 2024-09-06 15:37:19 it closed most of the gap, and all of it on some tests 2024-09-06 15:37:25 it's noticeably closer to the x86-64 assembly, and faster in some cases 2024-09-06 15:37:54 do you have a standard deviation for the tests? 
I'm especially skeptical of the "faster" 2024-09-06 15:38:36 Seems like best thing to try would be using the global register vars feature in GCC 2024-09-06 15:38:57 xentrac: no, I should put together something for more reliable benchmarking 2024-09-06 15:39:22 crc: it's a neverending problem 2024-09-06 15:58:10 I largely haven't worried about benchmarks 2024-09-06 16:00:04 I think there's value in having a faster VM, but I wouldn't be surprised if that's never been a limiter for crc 2024-09-06 16:00:38 they can be useful for improving performance sometimes. one of the nicest ones is cachegrind, because it gives you answers that are reproducible to twelve decimal places 2024-09-06 16:01:04 what they measure is only a proxy for actual performance, though! 2024-09-06 16:01:59 but it enables you to try optimizations and see if they increase or reduce your number of instructions executed or cache misses by 0.1% 2024-09-06 16:02:18 the SQLite team used that approach to double the speed of SQLite in a recent release 2024-09-06 16:02:55 speed is always a limiter when you're using a computer 2024-09-06 16:03:24 you're always deciding to not compute things because it would take too long 2024-09-06 16:05:08 I just leave long running calculations to run in the background and go do other things :) 2024-09-06 16:06:01 If I have something that's really time critical, I'd just use assembly or C, and then move on 2024-09-06 16:06:08 right, which means you can't use those calculations in an interactive, exploratory fashion 2024-09-06 16:06:34 you have to plan them out, decide which ones to do, initiate them, and then come back later 2024-09-06 16:06:38 like punched-card days 2024-09-06 16:06:54 ACTION doesn't really mind that... 
2024-09-06 16:09:28 I think the ultimate optimisation for ilo would be a JIT VM, which is missing the point of ilo a bit 2024-09-06 16:09:36 Although maybe a simple JIT would be alright 2024-09-06 16:10:21 even a simple JIT can sometimes make a big difference 2024-09-06 16:10:21 Weirdly the main motivation for writing it in AMD64 was to make the binary smaller and disassembly more readable 2024-09-06 16:10:34 And just because i was bored 2024-09-06 16:11:11 I think it's a testament to its design that it was easy to rewrite in assembly 2024-09-06 16:11:36 I can't say the same of most other high-level Forth VM's, in C or otherwise 2024-09-06 16:13:34 For most of what I do, performance isn't a big deal. It's not really a problem for me if it takes an extra few minutes to do some tasks. Gives my hands time to take a short break, which can be helpful 2024-09-06 16:15:25 (re: jit, my son has said he wants to try this, though he's not shared any projected timeframe on actually writing it yet) 2024-09-06 16:16:47 I'm interested in reading a JIT solution 2024-09-06 16:17:06 I'd probably learn something from it 2024-09-06 16:17:08 Even a simple one 2024-09-06 16:18:35 I'll let you know once he has something ready to share 2024-09-06 16:31:14 xentrac: was that speed improvement specifically for x86-64? i could do with better performance from sqlite on arm (especially the low end targets like armv5tejl) 2024-09-06 16:54:19 unjust: that was the main platform they were measuring, but I think the results did generally improve performance on other platforms as well 2024-09-06 16:59:11 because I'm pretty sure the performance improvement was a lot more than 12% and it was a lot more recent than 02009 2024-09-06 17:00:14 aha, it was https://sqlite.org/draft/releaselog/3_8_7.html 2024-09-06 17:00:28 > Many micro-optimizations result in 20.3% more work for the same number of CPU cycles relative to the previous release. The cumulative performance increase since version 3.8.0 is 61%. 
(Measured using cachegrind on the speedtest1.c workload on Ubuntu 13.10 x64 with gcc 4.8.1 and -Os. Your performance may vary.) 2024-09-06 17:01:00 and here's a more general article on the topic covering 02009 to 02019: https://www.sqlite.org/cpu.html 2024-09-06 17:05:37 "Recent versions of SQLite use about one third as many the CPU cycles compared to older versions." 2024-09-06 17:06:46 The 3.8.0 to 3.8.7 part of the history mentioned above shows especially fast improvement, but it's been an ongoing process 2024-09-06 17:42:59 thanks xentrac 2024-09-06 17:45:37 have any of you played the trivia game in #cjeopardy? there's a bot that dishes out jeopardy style questions about C, mostly in relation to content from the ISO standard. i think it might be cool to have the same sort of game for forth 2024-09-06 17:45:51 the game play looks something like: 2024-09-06 17:46:04 unjust: Next question: 975) An end-of-file and a read error can be distinguished by use of these functions (name both, separated by "and"). 2024-09-06 17:46:14 !h 2024-09-06 17:46:23 unjust: Hint: .... .nd f..... 2024-09-06 17:46:34 !w feof and ferror 2024-09-06 17:46:44 unjust: 'feof and ferror' is correct! (12s) 2024-09-06 17:49:21 Based on Forth standard? 
2024-09-06 17:49:46 initially, yes 2024-09-06 17:50:02 Not a good idea given most people here don't care about standard, and we don't need language lawyering 2024-09-06 17:50:21 but of course you could augment that database of questions with anything of relevance 2024-09-06 17:50:21 The standard also is quite defective 2024-09-06 17:51:19 You could make it more interesting with trivia like "The original name of the DUP word" or whatever 2024-09-06 17:51:29 Well interesting is subjective lol 2024-09-06 17:52:30 Bit of a waste of time really 2024-09-06 17:53:07 hah, like your discussions aren't equally full of shit on a regular basis ;) 2024-09-06 17:54:10 it's just an idea to garner a bit more exposure to random aspects of the language that many of us might not otherwise see 2024-09-06 17:55:27 you could ask about specific forths i guess 2024-09-06 17:56:07 "how is jonesforth described by its author?" 2024-09-06 17:56:10 "sometimes minimal" 2024-09-06 18:15:43 unjust: np 2024-09-06 18:16:45 I think the standard is useful for figuring out how to get things to work in existing Forths, or write code that's portable across (some) Forths 2024-09-06 19:15:58 unjust: Yeah I agree re full of shit 2024-09-06 19:16:54 xentrac: I think starting with the standard, but with a pinch of salt and not becoming a language lawyer is a good place to start, it's where I started 2024-09-06 19:17:08 It won't do you harm, I don't believe in programming sin unlike some people 2024-09-06 19:19:01 veltas: that comment was made in jest btw, hopefully the wonky/winky smiley conveyed that - i usually enjoy reading the discussions here 2024-09-06 19:28:29 You know I think the way emoticons are interpreted is quite regional 2024-09-06 19:28:56 I couldn't tell whether it was meant seriously or not but I did agree with the comment 2024-09-06 19:29:41 Emoticons are quite sarcastic to most anglos I think, and more sincere among other europeans 2024-09-06 19:30:40 I can't remember where you're from but 
I treat most emoticons like white noise regardless 2024-09-06 19:31:45 unfortunately it's hard to convey the tone of a message in text alone, so they are occasionally useful to infer that the message shouldn't be taken entirely seriously 2024-09-06 19:32:00 or at least that's my grasp on them 2024-09-06 19:33:16 Just my opinion but I think :P is more universally understood as a "I'm pulling your leg". More so than the phrase "I'm pulling your leg", even. 2024-09-06 19:33:34 I am highly influenced by my environment though 2024-09-06 19:33:59 we all are 2024-09-06 19:34:29 ACTION suggests lojban for a culturally neutral starting point. 2024-09-06 19:37:04 .xo'onai 2024-09-06 19:43:23 ... as corrupted by internet nerd culture 2024-09-06 19:44:26 veltas: yeah, agreed 2024-09-06 19:44:50 b 2024-09-06 19:44:52 oops 2024-09-06 19:46:04 thrig: yes, but it's a *shared* corruption