2022-03-17 01:47:20 An implementation of Interaction Nets: https://github.com/cicada-lang/inet
2022-03-17 10:20:01 Oh, that looks potentially interesting.
2022-03-17 10:20:15 I'm reading the "founding paper" now.
2022-03-17 10:24:26 Looks like a possible way to express parallelism, though it might run into the issues we were discussing a couple of days ago about modern processors having difficulty with really fine-grain parallelism, due to cache-related performance impairments.
2022-03-17 19:38:14 Well, I just spent a quite fun afternoon seeking (and finding) an efficient implementation of modulo 255. My first code had idiv in it, which is just awful. I wound up with about a dozen lines with only moves, shifts, and add/subtract instructions.
2022-03-17 19:38:57 The one wart is that for exact multiples of 255 it returns 255, instead of 0. But one under returns 254, and one over returns 1, so I'll just have to catch 255 results and replace them with 0.
2022-03-17 19:39:34 It uses the idea that 1/255 = 2^-8 + 2^-16 + 2^-24 + 2^-32 ...
2022-03-17 19:40:51 That 255 residue problem arises from the fact that you have to truncate that series somewhere.
2022-03-17 19:43:43 I wanted it to work for all 32-bit unsigned values; I haven't tested it thoroughly yet, but I think it does.
2022-03-17 19:43:51 Except for the 255 cases.
2022-03-17 20:54:23 KipIngram, dividing by a constant is an interesting thing i've played around with too
2022-03-17 20:55:04 i was looking for a way to divide by 10 quickly and did a combination of shifts and adds, but eventually figured out i could leave 3-4 shifts off, add a constant, and round
2022-03-17 20:55:46 not sure how to generalize that kind of thing, but i was wondering how to make it generalizable so you could generate the minimum number of operations for a given divide constant
2022-03-17 20:59:23 Do you know about Hacker's Delight?
2022-03-17 20:59:41 https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0321842685/ref=sr_1_1?crid=37LV18Q8E0YCS&keywords=hacker%27s+delight&qid=1647565171&sprefix=hacker%27s+delight%2Caps%2C89&sr=8-1
2022-03-17 20:59:57 It's full of solid gold. There's a section in there on "dividing by multiplying."
2022-03-17 21:00:13 Has something to do with the algebra of finite integer fields.
2022-03-17 21:01:02 What I did today wasn't quite that, I don't think. I think I just did some fixed point arithmetic and then scaled it.
2022-03-17 21:01:19 Worked out because the fractional pieces was exactly what I needed to get rid of.
2022-03-17 21:01:29 s/pieces/piece/
2022-03-17 21:25:42 neat
2022-03-17 21:25:54 why mod 255, out of curiosity?
2022-03-17 21:26:36 ya, i know about divide by multiplying. C compilers use that trick on x86, for example, since it's so much faster
2022-03-17 21:26:57 i was looking for something that would work on a 6502, so was sticking to shifts and adds
2022-03-17 21:31:41 That was just an arbitrary choice - I'm allowing 255 disk buffers in this new design. 4 kB blocks. That leaves one 4 kB block from a 1 MB allocation. There I'm putting 255 16-byte "buffer descriptors," where I store resident block #, dirty bit, etc.
2022-03-17 21:31:51 The last 16 bytes of the whole 1 MB will be unused.
2022-03-17 21:32:24 What I'm looking at right now is allocating RAM in 1 MB blocks using a simple fixed page size allocator, and then allocating smaller pages in those using that ropes stuff.
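
As a concrete illustration of the modulo-255 discussion above, here is a minimal C sketch of the same idea; it is my own rendering, not the actual dozen-instruction routine. Because 256 = 1 (mod 255), the series 1/255 = 2^-8 + 2^-16 + ... amounts to summing the base-256 digits of x and folding the carries back in, using only shifts, masks, and adds. Like the routine described, it returns 255 rather than 0 for nonzero exact multiples of 255, so the caller still has to map 255 to 0.

    #include <stdint.h>

    /* Sketch only: shifts/masks/adds, no division.  Returns x mod 255,
       except that nonzero exact multiples of 255 come back as 255. */
    static uint32_t mod255(uint32_t x)
    {
        uint32_t s = (x & 0xFF) + ((x >> 8) & 0xFF)
                   + ((x >> 16) & 0xFF) + (x >> 24);   /* at most 1020 */
        s = (s & 0xFF) + (s >> 8);                      /* at most 257  */
        s = (s & 0xFF) + (s >> 8);                      /* 0..255       */
        return s;                                       /* caller maps 255 -> 0 */
    }

For example, mod255(509) gives 254, mod255(511) gives 1, and mod255(510) gives 255 (the wart, since 510 is an exact multiple), matching the behavior described above.
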
2022-03-17 21:33:28 A rope-based structure will be able to grow and shrink dynamically, in whatever size page units I've chosen for that type of data. Pages will be able to be any size, but just one size within a 1 MB block.
2022-03-17 21:36:26 hey guys
2022-03-17 21:36:35 I've got that book
2022-03-17 21:36:37 it's awesome
2022-03-17 21:36:52 that's cool. i hope it works out doing it like that
2022-03-17 21:36:56 It really is.
2022-03-17 21:37:14 I hope so too - it seems awfully promising to me, but then again I haven't actually made it and tried to drive it yet.
2022-03-17 21:37:20 No telling what I may have overlooked.
2022-03-17 21:37:48 i've been thinking about memory allocation for one project for a few years. trying to get a PC program written in C using lots of mallocs to fit onto a microcontroller with only 64k ram. i think garbage collection is the way to go at this point
2022-03-17 21:38:14 May be - down toward lower memories you really just can't afford any waste.
2022-03-17 21:38:44 ya, gotta try to be sure. it's a bit overwhelming to think about, since it's a subsystem that has to be absolutely perfect all the time. you can't have the slightest bug in the memory allocator
2022-03-17 21:38:52 Right.
2022-03-17 21:38:59 to get garbage collection to work on a small system you have to do stuff like what MicroPython does... but it still has the disadvantage that you can't really do anything realtime with it
2022-03-17 21:39:24 Yeah, GC tends to just "seize things up" for little bursts of time.
2022-03-17 21:39:29 I've got a memory allocator I've written for zeptoforth, which I wrote a whole program to aggressively fuzz with
2022-03-17 21:39:53 but by default I don't use the memory allocator except for my line editor
2022-03-17 21:39:54 Also, for a non-fixed page size system, you really really would like to be able to move things to a new spot and have them continue to run.
2022-03-17 21:40:03 That level of relocatability is a pretty tall order.
2022-03-17 21:40:15 Well, along with good performance, at least.
2022-03-17 21:40:26 luckily i wouldn't need real time for the garbage collected part of the memory. the challenging part is retrofitting a garbage collection system onto a large C codebase using malloc'd pointers. you'd have to turn indirections into double indirections and take the speed hit, but at least you would never fragment your tiny microcontroller ram
2022-03-17 21:40:28 and then it uses a small heap - there is no global heap with this allocator, and you have to pass in the heap to ALLOCATE, FREE, and RESIZE manually
2022-03-17 21:41:19 I think I will be able to take a pointer to some random byte in a data structure and "find the ropes" in the system I'm picturing right now.
2022-03-17 21:41:31 (the allocator is not task-safe, so you have to protect the heap yourself if you want to use it across tasks)
2022-03-17 21:41:41 And you need to be able to do that, because stepping to the next bit of the structure will *sometimes* involve traversing the rope structure.
2022-03-17 21:41:50 Just a small fraction of the time, but sometimes.
2022-03-17 21:42:22 I'm not planning to reference ropes via the root.
2022-03-17 21:42:43 I'm going to reference them via the first leaf page - I'm adding parent pointers to the description I read in Wikipedia.
2022-03-17 21:42:54 That way the rope structure will just be totally hidden in the background.
2022-03-17 21:43:01 Knitting pieces of a structure together.
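
A loose C sketch of the "rope with parent pointers" idea described above; the node layout and names here are my guesses for illustration, not the actual design. The point is that a structure is reached through its first leaf page, and the parent links are only consulted when a cursor steps off the end of a page.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct rope_node rope_node;

    struct rope_node {
        rope_node *parent;        /* the added parent link                   */
        rope_node *left, *right;  /* children of an interior node            */
        size_t     weight;        /* classic rope weight (bytes on the left) */
        uint8_t   *page;          /* leaf only: the data page                */
        size_t     used;          /* leaf only: bytes in use in the page     */
    };

    /* Step from one leaf page to the next in reading order, using only the
       parent pointers - the "generic utility" case that is hit when a
       pointer walks past the end of its page. */
    static rope_node *next_leaf(rope_node *leaf)
    {
        rope_node *n = leaf;
        /* climb while we are a right child (or there is no right sibling) */
        while (n->parent && (n->parent->right == n || n->parent->right == NULL))
            n = n->parent;
        if (n->parent == NULL)
            return NULL;              /* this was the last leaf */
        n = n->parent->right;
        while (n->left != NULL)       /* descend to the leftmost leaf */
            n = n->left;
        return n;
    }

A structure that fits in a single page never calls next_leaf at all, so the rope machinery stays invisible until the structure outgrows its page.
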
2022-03-17 21:43:17 So strings will be just like regular strings, unless they need to be bigger than one page.
2022-03-17 21:43:27 Won't even have any rope gear.
2022-03-17 21:44:09 And I want the rope gear for every different type of structure to be the same.
2022-03-17 21:44:57 The idea is that an element on the Forth stack, or any other 64-bit entity in the system, can be an integer, or a float, or an address, just like always. OR it can be a "pointer to a typed structure."
2022-03-17 21:45:10 If such structures become large enough, they'll acquire roping.
2022-03-17 21:46:07 If I *know* I'm dealing with a "small structure," I can navigate around just like normal, because the thing will be in one page. But when in doubt, I'll need to fall back to generic utilities for moving along the structure.
2022-03-17 21:46:25 Since for all I know I might be at the end of a page.
2022-03-17 21:46:51 So for all the strings we normally deal with in traditional Forth (tib, etc.), I can treat it the same as always.
2022-03-17 21:47:04 A long string, though, like a whole file I read in as a string, will get roped.
2022-03-17 21:47:25 I'm planning for those "pointers to structured entities" to have some type bits.
2022-03-17 21:47:34 And a pretty massive address.
2022-03-17 21:48:41 Anything outside of the RAM region the process is actually running in, I'd expect to reference using those pointers, and the memory would have come from this allocation system.
2022-03-17 21:49:03 But if I'm not using any of that, then there's no burden on the Forth system at all - no inefficiency for "regular" operation.
2022-03-17 21:49:48 I might decide to have the tib and other system strings reside in that stuff, but it won't be mandatory. And even if I do, it will be a short string residing in a single page.
2022-03-17 22:48:22 here's a trick - if you assume all addresses are 4-byte aligned, you automatically get two type bits
2022-03-17 22:50:33 and just have all addresses have the type bits be %00
2022-03-17 22:51:19 one can get away with 31-bit or 63-bit integers (like OCaml) if you have three types
2022-03-17 22:51:32 and have the lowest bit set to 1 always mean an integer
2022-03-17 22:51:59 which gets you one other type to work with
2022-03-17 22:52:15 if you really want four types, then you're stuck with 30-bit or 62-bit integers
2022-03-17 23:01:31 Yes - just so. :-)
2022-03-17 23:02:00 I use that trick to store two bits of status in the 16-bit link field of my headers. Because everything is 32-bit aligned; I pad the names with 0's if necessary to get that.
2022-03-17 23:03:02 One of those is my "immediate" bit, and one is a bit to indicate that the definition is transient - those will get "unlinked" the next time I run a word called .wipe.
2022-03-17 23:03:26 .: sets the bit, but otherwise functions just like :
2022-03-17 23:03:39 So I can have
2022-03-17 23:03:44 .: foo ... ;
2022-03-17 23:03:50 : bar ... foo ... ;
2022-03-17 23:03:53 .wipe
2022-03-17 23:04:01 and now I have bar in the dictionary, but no foo.
2022-03-17 23:04:38 In the last system I actually put those names and links in dynamic RAM, so I saved those bytes of RAM too.
2022-03-17 23:04:53 This time I didn't do that - I'll get them out of the word list, but won't save the memory.
2022-03-17 23:07:42 The 16-bit link, the count byte, and the first byte of the name string go in one cell. Lengths 2-5 take an additional cell, etc.
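
For reference, here is a small C sketch of the low-bit tagging scheme just described (the OCaml-style variant with 63-bit integers); the names are mine, not from the chat. Pointers keep their natural representation because 4-byte (or stricter) alignment guarantees their low two bits are %00, and setting bit 0 marks an integer, leaving the %10 pattern free for one more type.

    #include <assert.h>
    #include <stdint.h>

    typedef uint64_t cell;            /* one 64-bit stack element */

    static int is_fixnum(cell c)  { return (c & 1) != 0; }  /* bit 0 set: 63-bit integer */
    static int is_pointer(cell c) { return (c & 3) == 0; }  /* %00: an ordinary address  */

    static cell tag_fixnum(int64_t n) { return ((uint64_t)n << 1) | 1; }
    static int64_t untag_fixnum(cell c)
    {
        /* arithmetic right shift restores the sign (universal in practice,
           though strictly implementation-defined in C) */
        return (int64_t)c >> 1;
    }

    static cell tag_pointer(void *p)
    {
        assert(((uintptr_t)p & 3) == 0);   /* relies on the alignment guarantee */
        return (uint64_t)(uintptr_t)p;
    }
    static void *untag_pointer(cell c) { return (void *)(uintptr_t)c; }

The same alignment trick is what frees the two status bits in the 16-bit link field discussed above.
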
2022-03-17 23:07:59 The first and last bytes of the actual name characters have the high bit set; I used that to scan over names.
2022-03-17 23:08:06 Pad bytes following name chars are zero.
2022-03-17 23:08:19 Then there's a 32-bit CFA and a 32-bit PFA.
2022-03-17 23:08:33 Except primitives, which have no PFA field.
2022-03-17 23:11:17 Both the CFA and PFA fields store offsets from the base origin the system is loaded at.
2022-03-17 23:11:34 Wait - that's not true.
2022-03-17 23:11:44 The CFA pointer might point at any alignment.
2022-03-17 23:12:04 Still an offset - but not an aligned offset.
2022-03-17 23:12:21 That means I could grab two bits of status from every PFA pointer too.
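
To pull the header description together, here is a rough C sketch of the layout as I read it from the discussion above; the field names and accessors are hypothetical, but the sizes and flag positions follow the chat: a 16-bit link whose two low bits carry the immediate and transient flags (courtesy of 32-bit alignment), a count byte, the name padded with zeros to a cell boundary, then a 32-bit CFA offset and, for non-primitives, a 32-bit PFA offset.

    #include <stdint.h>

    #define FLAG_IMMEDIATE 0x0001u   /* low bits of the link field are free   */
    #define FLAG_TRANSIENT 0x0002u   /* because headers are 32-bit aligned    */
    #define LINK_MASK      0xFFFCu   /* remaining bits: link to prior header  */

    typedef struct {
        uint16_t link;    /* link plus the two flag bits above                 */
        uint8_t  count;   /* name length                                       */
        uint8_t  name[];  /* name chars; first and last have the high bit set,
                             zero-padded out to the next 32-bit cell           */
    } header;

    /* The CFA field (an offset from the load origin) begins at the next
       32-bit cell after link + count + name, so a 1-character name shares
       one cell with the link and count, and names of length 2-5 take one
       additional cell, as described above. */
    static uint32_t *cfa_field(header *h)
    {
        uintptr_t end = (uintptr_t)h + 3u + h->count;
        return (uint32_t *)((end + 3u) & ~(uintptr_t)3u);
    }

    /* Only meaningful for non-primitives, which carry a PFA field. */
    static uint32_t *pfa_field(header *h) { return cfa_field(h) + 1; }

    static int is_immediate(const header *h) { return (h->link & FLAG_IMMEDIATE) != 0; }
    static int is_transient(const header *h) { return (h->link & FLAG_TRANSIENT) != 0; }
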