2023-06-25 12:29:16 Sounds like everyone's having a good Sunday!
2023-06-25 12:39:12 So far so good. Just mulling over memory layout details for this next go I make at this.
2023-06-25 12:39:32 I've changed some of my thinking recently, and I'm just ferreting out the details.
2023-06-25 12:39:35 mulled memory with allspice and cognac
2023-06-25 12:40:12 Mostly aimed at supporting this idea of "discardable" vocabularies.
2023-06-25 12:41:47 I think a "process" will basically be an empty vocabulary attached to the Forth vocabulary with an associated thread.
2023-06-25 12:42:12 When a process dies, that thread goes away and that vocabulary gets discarded.
2023-06-25 12:44:42 The Forth vocabulary will sit at the bottom of my overall memory space, and it will be the only vocabulary that's "imageable" (launchable from a binary form). All other vocabularies will have to load their code from source.
2023-06-25 14:18:33 I have never understood whether bottom means toward address 0x0..0 or toward 0xF..F (same for top)
2023-06-25 14:19:08 I mean bottom == low address, top == high. But you make a good point - I've seen it drawn both ways.
2023-06-25 14:19:37 the end was resolved in the "Lilliput and Blefuscu" incident related by Jonathan Swift
2023-06-25 14:20:14 always prefer lo- and hi-mem instead
2023-06-25 14:21:19 lo as in low or lower, nearer 0x0..0 (because lower address number) and hi as in high or higher (ditto vice versa)
2023-06-25 14:43:19 That works fine too.
2023-06-25 14:48:12 You know, on this hash table stuff and the process of "table doubling": when you get tight on room, all of the keys in the table have to be re-hashed to move them to the other table. But it seems like a gradual process would work there. When your table gets full, go ahead and allocate the new, bigger table, which is initially empty.
Then every time you search for a word, search both tables (just
2023-06-25 14:48:14 calculate two hashes while parsing the word instead of one), and move that word to the new table. Whenever you create a new entry, create it in the new larger table.
2023-06-25 14:48:42 And alongside that, have a slow background process that gradually picks through the old table, making sure everything's moved.
2023-06-25 14:48:48 Then when all that's done you deallocate the old table.
2023-06-25 14:49:10 It would avoid having any large "pause" in normal operations while you took care of that move in one big operation.
2023-06-25 14:49:43 Even hashing into two tables is still going to be a lot faster than searching a long linked list.
2023-06-25 14:54:00 So, question for everyone. How long is the longest Forth word name you typically use?
2023-06-25 14:59:54 Best I can tell I have nothing longer than eight characters.
2023-06-25 15:01:04 I'm thinking about not using a separate string table associated with the hash table - just put the name strings directly in the fixed size hash table entries. The question of how much room to allow for that arises.
2023-06-25 15:02:48 I'm leaning toward 12 for the counted name string and 4 for the CFA offset.
2023-06-25 15:03:30 That would allow 11-byte long names, and I've never used more than 8.
2023-06-25 15:04:05 Of course that wastes a lot of space.
2023-06-25 15:04:51 KipIngram: the longest I have used is about 55 chars
2023-06-25 15:05:10 but I think 32 chars or so
2023-06-25 15:05:20 is the max generally
2023-06-25 15:05:22 My word names average 8 characters (or 5, if you do not include the namespace part). Longest in use is 30, but IIRC, almost all are 12 or less.
2023-06-25 15:05:42 Wow, what's 30 bytes long?
2023-06-25 15:06:09 note that I am often naming the words in Icelandic
2023-06-25 15:06:23 long compound words
2023-06-25 15:06:26 Oh, yeah - that would make a difference, wouldn't it?
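The gradual "table doubling" idea from 14:48 can be sketched roughly as follows. This is only an illustration under assumptions, not anyone's actual implementation: `IncrementalTable`, `name_hash`, and `migrate_step` are hypothetical names, the hash is a djb2-style stand-in, buckets are chained Python lists rather than fixed-size entries, and word names are assumed to be unique.

```python
def name_hash(name):
    """djb2-style string hash, used here only as a stand-in (assumption)."""
    h = 5381
    for byte in name.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


class IncrementalTable:
    """Hash table that moves entries to a bigger table a little at a time."""

    def __init__(self, size=4):
        self.new = [[] for _ in range(size)]  # active table, a list of buckets
        self.old = None                       # table being drained, if any
        self.count = 0                        # entries across both tables
        self.scan = 0                         # next old bucket for the background step

    @staticmethod
    def _bucket(table, name):
        return table[name_hash(name) % len(table)]

    def _grow(self):
        # The full table becomes "old"; a doubled, empty table becomes "new".
        self.old = self.new
        self.new = [[] for _ in range(2 * len(self.old))]
        self.scan = 0

    def insert(self, name, cfa):
        if self.count >= len(self.new):
            while self.old is not None:       # finish any unfinished move first
                self.migrate_step()
            self._grow()
        # New entries always go into the newer, larger table.
        self._bucket(self.new, name).append((name, cfa))
        self.count += 1

    def find(self, name):
        for key, cfa in self._bucket(self.new, name):
            if key == name:
                return cfa
        if self.old is not None:
            bucket = self._bucket(self.old, name)
            for i, (key, cfa) in enumerate(bucket):
                if key == name:
                    del bucket[i]             # move the word on first use
                    self._bucket(self.new, name).append((key, cfa))
                    return cfa
        return None

    def migrate_step(self):
        """Background work: move one old bucket; drop the old table when done."""
        if self.old is None:
            return
        for key, cfa in self.old[self.scan]:
            self._bucket(self.new, key).append((key, cfa))
        self.old[self.scan] = []
        self.scan += 1
        if self.scan == len(self.old):
            self.old = None


words = IncrementalTable(size=2)
for cfa, name in enumerate(["dup", "drop", "swap", "over", "rot"]):
    words.insert(name, cfa)
words.find("dup")             # found in the old table and moved as a side effect
while words.old is not None:
    words.migrate_step()      # the "slow background process", run to completion here
```

Each lookup costs at most two bucket probes while a move is in progress, and no single operation has to re-hash the whole table unless the new table fills before the old one is drained.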
2023-06-25 15:06:49 And I do have to remember that if I'm intending to support utf8 characters then the storage will be more.
2023-06-25 15:07:04 Most of those, though, will likely be only 1-2 visible characters.
2023-06-25 15:07:57 I don't know. The idea of dropping that many bytes on every item, even when the name is just one character, kind of offends me. The separate string table avoids that.
2023-06-25 15:08:19 My longest names are derived from databases I work on (naming includes table + field information)
2023-06-25 15:08:20 In that case the hash table would just contain a two-byte offset into the string table.
2023-06-25 15:08:45 And the string table would just be packed counted strings.
2023-06-25 15:09:07 Just used to do a final confirmation that I had indeed found the right thing (that it wasn't a collision).
2023-06-25 15:09:45 Then the hash table entries would just be six bytes.
2023-06-25 15:09:47 I prefer tries for interned strings
2023-06-25 15:11:14 The name length doesn't really matter in my two systems (other than being annoying to enter); Retro allows for unlimited (to memory limits) length names; and Konilo doesn't keep the name string.
2023-06-25 15:12:09 You just trust you won't get any collisions?
2023-06-25 15:13:04 yes. So far it hasn't been a problem. If it does become an issue, I'll either rename something or change the hash algorithm.
2023-06-25 15:13:15 Sure.
2023-06-25 15:13:28 Probably won't ever be a problem. How big is the hash?
2023-06-25 15:14:23 It's just using a djb2 hash, though restricted by the fact that I only have signed values, so overflows occur.
2023-06-25 15:16:46 Before starting to use it, I ran it against word name data from programs I've written over the last two decades, and against a few wordlist files from natural languages. There weren't any collision issues that looked troubling.
2023-06-25 15:22:30 My average name length is just under 5 chars, so 6 bytes with a count byte.
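The six-byte entry layout discussed around 15:08 to 15:09 (a two-byte offset into a table of packed counted strings, plus the four-byte CFA offset) might look something like this. It is a sketch under assumptions: `NameTable`, `make_entry`, and `entry_matches` are hypothetical names, and little-endian byte order and a one-byte count are choices made here, not something stated in the discussion.

```python
import struct

# 2-byte string-table offset + 4-byte CFA offset, no padding: 6 bytes per entry.
ENTRY = struct.Struct("<HI")


class NameTable:
    """Packed counted strings: one count byte followed by the name bytes."""

    def __init__(self):
        self.strings = bytearray()

    def intern(self, name):
        """Append a counted string and return the offset of its count byte."""
        raw = name.encode("utf-8")      # UTF-8 names take more bytes than characters
        if len(raw) > 255:
            raise ValueError("name too long for a one-byte count")
        offset = len(self.strings)
        if offset > 0xFFFF:
            raise ValueError("string table no longer fits a two-byte offset")
        self.strings.append(len(raw))
        self.strings += raw
        return offset

    def name_at(self, offset):
        count = self.strings[offset]
        return self.strings[offset + 1 : offset + 1 + count].decode("utf-8")


def make_entry(table, name, cfa_offset):
    """Build one fixed-size hash table entry for a word."""
    return ENTRY.pack(table.intern(name), cfa_offset)


def entry_matches(table, entry, name):
    """Final confirmation that a hash hit is the right word, not a collision."""
    str_offset, _cfa_offset = ENTRY.unpack(entry)
    return table.name_at(str_offset) == name


names = NameTable()
plus = make_entry(names, "+", 0x0120)       # costs 6 bytes + 2 bytes of string table
swap = make_entry(names, "swap", 0x0140)    # costs 6 bytes + 5 bytes of string table
```

With this layout a one-character name costs 8 bytes in total, against 16 for the fixed 12 + 4 entry, at the price of one extra indirection when confirming a match.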
It's all those formula-named conditional words.
2023-06-25 15:39:45 I'm not totally happy with the usual terminology around variables. VARIABLE and USER. I also see a difference between a VARIABLE in the Forth vocabulary, which will be shared by all processes, and one in a process's private vocabulary, which will just be shared by threads belonging to that process. Don't really NEED a different word for those - it would just be a matter of what the CURRENT vocabulary was
2023-06-25 15:39:47 when the thing was defined.
2023-06-25 15:40:11 But I lean toward wanting to use the word VARIABLE for a thread-private global variable, and something else for thread-shared variables.
2023-06-25 15:43:28 Also, I think I'm going to wind up with VOCABULARY (which I'll almost certainly call VOC) taking parameters. If each voc is going to get its own memory region, so that I can later discard it if I want to, then the system's going to need to know how much room to allocate for various things.
2023-06-25 15:44:03 VOC ALPHA
2023-06-25 15:45:28 So has anyone ever seen a Forth that recognizes literal suffixes, like 4kB, or is it always done with separate words like 4 kB?
2023-06-25 15:46:23 It wouldn't be very hard for me to make my NUMBER word handle those internally.
2023-06-25 15:55:37 I've only seen the suffixes done separately
2023-06-25 15:56:27 Definitely easier.
2023-06-25 16:14:39 4 kC ?
2023-06-25 16:14:47 c for cell
2023-06-25 16:15:30 :-)
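For the 4kB question, here is a rough sketch of what handling a suffix inside a NUMBER-like word could involve, as opposed to a separate `kB` word that multiplies the value on the stack. `parse_number` and `SUFFIXES` are hypothetical, and treating kB as 1024 bytes is an assumption.

```python
# Hypothetical suffix table; whether kB means 1000 or 1024 is a design choice.
SUFFIXES = {"kB": 1024, "MB": 1024 * 1024}


def parse_number(token):
    """Return the integer value of a token, or None if it is not a number."""
    scale = 1
    for suffix, factor in SUFFIXES.items():
        # Require at least one character before the suffix, so "kB" alone
        # is still looked up as an ordinary word.
        if token.endswith(suffix) and len(token) > len(suffix):
            token, scale = token[: -len(suffix)], factor
            break
    try:
        return int(token, 10) * scale
    except ValueError:
        return None


size = parse_number("4kB")    # suffix handled inside the number parser
word = parse_number("dup")    # not a number, falls through to dictionary lookup
```

The separate-word form needs no parser changes at all, which is probably why it is the one usually seen.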