2023-06-03 03:13:29 gordonjcp: Yeah I've got an intern to replace an excel monstrosity at work, which has surprisingly good code density compared to the other thing they've rewritten
2023-06-03 03:14:48 The other was an MFC horrorshow which genuinely was in the 100k region, and he just rewrote it based on spec and took it down to a few thousand lines of not-particularly-factored C++
2023-06-03 03:15:43 If you manage to write something an intern can rewrite in a couple of weeks at about 5% of the LOC then you should be hanged, drawn, and quartered
2023-06-03 04:57:36 veltas: it's sheer madness, isn't it?
2023-06-03 10:02:04 Wiggins.
2023-06-03 10:02:58 Some years back at the office we had a guy whose name I can't even remember now, but he was super-highly thought of because he was a "fast coder." But I always felt like he just sat down and started churning out code before he'd given any real thought to the problem.
2023-06-03 10:03:09 lovecraft and some of them older authors got paid by the word
2023-06-03 10:03:21 He produced a lot of code and he produced it very quickly, and that was enough to impress management.
2023-06-03 10:04:01 That was really common in those days, when a lot of writers sold their stories to pulp fiction magazines.
2023-06-03 10:12:09 When I first hired on at the little company that eventually got bought by IBM, I joined as the software engineering manager. It's odd how we find ourselves in slots that don't necessarily correspond perfectly to our "best skills," but anyway, that's why I was there. I used to start my days with a cup of coffee and a perusal of git log; I'd at least *look at* all of the code that had been committed the day
2023-06-03 10:12:11 before.
2023-06-03 10:12:34 Man, that guy's name almost came to me just now. I'll probably remember it in a few minutes.
2023-06-03 10:12:49 Michael something, I think.
2023-06-03 10:14:07 Anyway, you kind of start to see which ones actually think in terms of the effect their code's having on physical things, and which ones look at it as a more abstract thing.
2023-06-03 10:18:10 Anyway, IBM bought the place, and in the assessment of everyone that ensued I think they (correctly) recognized that my managerial style wasn't a good fit with the IBM culture. So I had to find something new to do.
2023-06-03 10:18:41 I was bummed by that initially, but they offered me this big retention incentive that made leaving a poor decision.
2023-06-03 10:18:55 And by the time that played out (three years), I just wasn't that bummed any more.
2023-06-03 10:19:56 I found myself making the same kind of money with a lot less stress, and decided that that was probably not something to take lightly.
2023-06-03 10:21:21 I usually get reminded about that whole change process when I hear that Bruce Springsteen song "Glory Days." :-)
2023-06-03 10:21:58 Aha - Renken. Michael Renken. Turns out I'm LinkedIn with him.
2023-06-03 10:22:39 Looks like he's a software engineer at HackerOne these days. That feels like it would suit him.
2023-06-03 10:23:44 One kid that worked for me briefly during that period is over at SpaceX now. I liked him - he hired on to the small company, right at the end, and it turned out that his parents were both heavyweights in IBM, and his mom was the main financial person that coordinated the money aspects of the acquisition.
2023-06-03 10:24:07 He actually left shortly after the acquisition, because the whole point of him coming in the first place had been to NOT work at IBM.
2023-06-03 10:24:46 He could have had an IBM job dropped in his lap without even trying, but he was trying to "do it on his own," and then IBM turned around and bought the company he'd chosen to join. So he left. I couldn't help feeling a good bit of respect for that.
2023-06-03 10:28:20 It was pretty funny, though.
Here the kid is, and his MOM shows up in his workplace. :-)
2023-06-03 10:29:34 The guy who owned that place left about five months after the acquisition. He had somehow presumed that he was going to pocket all that money and then go right on running the place exactly the way he always had.
2023-06-03 10:29:54 But immediately once the ink was on paper IBM started changing things, and he was most displeased.
2023-06-03 10:30:19 I had little sympathy. With the stroke of a pen he'd become the wealthiest guy I've ever known.
2023-06-03 10:31:42 He started another company and hired off a few of the best guys (which he'd almost certainly agreed not to do). But it didn't last long, and I don't know what he's done since then. Except involve himself in fairly "right leaning" politics.
2023-06-03 10:32:18 His wife (who I'd known from earlier days) made a couple of unsuccessful runs at the US House of Representatives.
2023-06-03 10:33:46 Anyway, I wouldn't have lasted there had it not been for the acquisition, because that guy was far and away the most "micro-managing" boss I'd ever had. By the time that started to become clear, though, I knew about the upcoming deal and smelled money, so you couldn't have driven me off.
2023-06-03 10:35:14 I used to joke to my wife that if he had a quality thoroughbred racehorse, he'd train it to walk along beside him and put its feet exactly where he pointed.
2023-06-03 10:37:37 Oh, say, I've been watching these lectures on data structures. Last night I was really keying in to how thoroughly serious computer scientists disapprove of data structures with "linear performance."
2023-06-03 10:37:52 I couldn't help thinking that that's exactly what Forth's dictionary yields. :-|
2023-06-03 10:38:08 It's just not how things are done in modern software.
2023-06-03 10:38:31 They want log performance at least, and constant where they can get it.
2023-06-03 10:40:59 It looks like state of the art is hashing based dictionaries, with table doubling used to manage the memory.
2023-06-03 10:41:37 When you have to do a table change that's slow, but the amortized performance is constant.
2023-06-03 10:42:26 The idea is that you start with a small hash table, and when it becomes full you make a new one twice as large (which involves re-hashing everything) and carry on.
2023-06-03 10:42:51 If you delete items such that the occupancy falls below some threshold, you go back to a smaller one.
2023-06-03 10:43:20 In a Forth system these expensive operations would only come at compile time.
2023-06-03 10:45:02 What you want is a hash function that you can iterate on, and it yields a permutation of all the slots. So if you go to add a new key and find that slot occupied, you just iterate the hash to get a "next slot." Eventually you find one that's empty, since eventually it gives you all of them.
2023-06-03 10:45:26 When you search, you have to go through the same process, and it "fails" when you come to an empty slot without finding your key.
2023-06-03 10:45:49 If you need to delete a key, you don't actually delete it yet - you just mark it with a "delete me" flag.
2023-06-03 10:46:06 Otherwise it would leave an empty slot that would terminate future searches prematurely.
2023-06-03 10:46:17 You do the actual deletion the next time you table double.
2023-06-03 10:54:35 Anyway, it has me thinking; I may choose to do something along these lines in this next system instead of the traditional approach.
2023-06-03 10:54:42 ACTION nods
2023-06-03 10:55:41 much thrashing as the table fills up to find those few empty slots
2023-06-03 10:56:12 Yeah, if you try to fill it all the way up.
2023-06-03 10:56:29 Usually they table double somewhere around 50%/60% full.
2023-06-03 10:56:49 And then table halve if it falls to 25%-30% full.
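[editor's note: a minimal sketch of the scheme described above - open addressing where you "iterate the hash" to the next slot, table doubling at around 60% load, and "delete me" tombstones that are only dropped at the next re-hash. The class and names are invented for illustration; table halving on deletion is omitted for brevity.]

```python
DELETED = object()  # the "delete me" tombstone marker

class OpenDict:
    def __init__(self, size=8):
        self.slots = [None] * size
        self.used = 0  # live entries only; tombstones not counted

    def _probe(self, key):
        # Linear probing: step 1 is coprime with any table size, so this
        # visits every slot - a permutation, as described above.
        h = hash(key) % len(self.slots)
        while True:
            yield h
            h = (h + 1) % len(self.slots)

    def put(self, key, value):
        if self.used + 1 > len(self.slots) * 0.6:   # "table double" at ~60%
            self._resize(len(self.slots) * 2)
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                # Reuse an earlier tombstone slot if we passed one.
                target = first_free if first_free is not None else i
                self.slots[target] = (key, value)
                self.used += 1
                return
            if slot is DELETED:
                if first_free is None:
                    first_free = i
            elif slot[0] == key:
                self.slots[i] = (key, value)        # overwrite existing key
                return

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                return None      # empty slot terminates the search: "fail"
            if slot is not DELETED and slot[0] == key:
                return slot[1]

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                return
            if slot is not DELETED and slot[0] == key:
                self.slots[i] = DELETED   # mark, don't empty the slot
                self.used -= 1
                return

    def _resize(self, new_size):
        old = self.slots
        self.slots = [None] * new_size
        self.used = 0
        for slot in old:
            if slot is not None and slot is not DELETED:
                self.put(slot[0], slot[1])  # re-hash; tombstones vanish here
```

Note the subtlety in put: it keeps probing past tombstones until it either finds the key (overwrite) or hits a truly empty slot, and only then reuses the first tombstone it passed - otherwise you could end up with the same key in two slots.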
2023-06-03 10:57:30 You can write down math for the thrashing, and as I said it can be shown that it's amortized constant regardless of how many items are in the thing.
2023-06-03 10:57:55 Poor choice of hash functions can lead to what they call "clustering," which can kill your constant amortized performance.
2023-06-03 10:58:04 So, you just choose a wise hash function.
2023-06-03 10:58:28 It's actually not a perfectly solved problem, but they have methods that get "close" to optimum performance.
2023-06-03 10:58:44 And like I said, it's all compile time cost - no run-time impact.
2023-06-03 10:59:38 Anyway, I think the thing to do would be to have a global string table, the purpose of which is to associate a fixed size integer with every string encountered. Then your actual hash table(s) could have fixed size entries and be stored in an array.
2023-06-03 11:00:11 I haven't decided whether forming keys from vocabulary+name vs. having a hash table for each vocabulary would be the way to go.
2023-06-03 11:00:25 that's probably a pros and cons thing.
2023-06-03 11:00:52 Separate tables per vocabulary would make you linear in vocabulary count.
2023-06-03 11:01:05 But it's not like vocabulary count gets "big."
2023-06-03 11:01:53 If you used voc+name as key and had a single table, then you would need multiple appearances of that string in the global string table.
2023-06-03 11:02:03 One for each vocabulary it appeared in.
2023-06-03 11:11:02 I don't really know for sure about that global table - that may break the constant performance and I just don't see how yet.
2023-06-03 11:12:57 Note, though, that a step that's log(name length) doesn't count against the system's performance as a function of word count.
2023-06-03 11:14:05 I figure as WORD parses out the next word it will calculate that string's hash and also walk a binary search tree on the strings. So once you have the word you have its hash and you know whether or not it's in the table.
And you know where to put it if it's not there yet.
2023-06-03 11:15:12 Then you could either iterate through separate vocabulary hash tables or else iterate through vocabularies and each time append that voc's id to the hash and use it on one big table.
2023-06-03 11:15:16 It's an iteration either way.
2023-06-03 11:15:41 So it looks like no matter how you do it there's a step that's linear in search path length.
2023-06-03 11:16:27 You want access to the string just to do a final confirmation to protect against collisions.
2023-06-03 11:17:29 thrig: the expected number of "probe iterations" is 1/(1-alpha), where alpha is the table's load factor.
2023-06-03 11:17:50 So you'd expect two on average when you're half full, ten if you're 90% full, etc.
2023-06-03 11:18:44 In a standard Forth system all this would be only compile time. In languages like Python it's pervasive - it's how almost everything is done.
2023-06-03 11:19:13 I guess that's why we can tolerate a linked list dictionary in Forth - we don't actually see any impact of that at run time.
2023-06-03 11:19:14 compiled systems might go for a perfect hash
2023-06-03 11:19:28 That only works if you have a pre-defined name list.
2023-06-03 11:19:33 Then you can search for such a thing.
2023-06-03 11:19:44 But if you're going to be adding arbitrary names there's no way to do it.
2023-06-03 11:20:21 The point, I think, is that two or three probes isn't really "thrash."
2023-06-03 11:20:35 It's just a small constant factor.
2023-06-03 11:21:03 At least that's how the computer scientists look at it, but yeah, I was just complaining the other day about how blithely they toss such factors aside. :-)
2023-06-03 11:21:22 It's still wildly better than a linear linked list.
2023-06-03 11:22:11 But we have had those "so what" conversations about this here. Forth is still faster than we are and we still don't notice those dictionary searches.
2023-06-03 11:22:21 It's "good enough" in a pragmatic sense.
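[editor's note: the 1/(1-alpha) estimate quoted above is the standard expected probe count for an unsuccessful search under uniform-hashing open addressing; a trivial helper makes the "two at half full, ten at 90%" numbers concrete.]

```python
def expected_probes(alpha: float) -> float:
    """Expected probe count for an unsuccessful search in an open-addressed
    table at load factor alpha (0 <= alpha < 1), assuming uniform hashing."""
    return 1.0 / (1.0 - alpha)

# At 50% load you expect about 2 probes; at 90% load, about 10 -
# which is why implementations double well before the table fills up.
```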
2023-06-03 11:22:39 The real reason to look for better would be precisely to open the door to putting the mechanism to heavier use.
2023-06-03 11:22:53 Python "needs to."
2023-06-03 11:22:58 Forth really doesn't.
2023-06-03 11:24:04 But we also talk from time to time about more advanced things, like local definitions and variables and dynamic typing and so on - a high performance dictionary might offer "opportunities."
2023-06-03 11:25:57 I like to think of the run-time stuff as entirely separate from the compile time support, so that you could "decapitate" the runtime and have it work. But any runtime feature that capitalized on this stuff would haul all of that back into the required runtime environment.
2023-06-03 11:37:49 Apparently in older Python versions the key used for items was their memory address. So there were games you could play with that.
2023-06-03 11:40:18 Seems it's changed these days, though.
2023-06-03 12:14:50 https://en.wikipedia.org/wiki/Oh-My-God_particle
2023-06-03 12:17:05 bears, huh
2023-06-03 12:46:26 Oh, hey - this approach I'm thinking through shares the straightforward FORGET behavior of a normal linked list system.
2023-06-03 12:47:10 It's got somewhat different "structures," but they all grow in the same kind of "expanding only" sort of way. FORGETing would just prune each one of those regions back to an earlier state.
2023-06-03 13:28:22 Well, I was thinking about how to make that string table hold only one copy of any given string, but that is a waste, I think. The normal linked list doesn't accomplish that - when you re-use a string for a new word, you get a new copy of it in the dictionary. And any method that allows me to easily check whether I've seen a string before winds up adding some kind of data structure that would take up far
2023-06-03 13:28:24 more memory than a copy of the string is apt to, unless it was a very long string.
2023-06-03 13:28:43 And all you need from this is just a way to check at the very end that your search result really is matching the word.
2023-06-03 13:29:11 So I think every time you define a word you stick that string into the table, and use that new copy's address for the final word verification.
2023-06-03 13:29:49 Then it's literally nothing but a list of strings that you never search - when you put a new item in its address goes into the hash table, and you later follow that pointer back.
2023-06-03 13:30:29 That could be either a two byte or a four byte offset; two would likely cover most needs, but four would guarantee you never ran out of space.
2023-06-03 13:34:43 I had actually thought of some of this before, in particular the idea that a specific string should lead to a permutation cycle through your table space. What I'd never heard of before the last couple of days was the idea of "table doubling." It's one of those fairly obvious things, but I'd always just figured one would start with an empty table big enough to service your ultimate needs.
2023-06-03 13:34:56 Clearly that doesn't scale well.
2023-06-03 13:36:53 It does make the selection of a hash function quite an interesting puzzle - it's got to be something that yields that permutation and still supports operating with power-of-two table sizes.
2023-06-03 13:39:22 It also raises an interesting issue of what to do when saving off a binary image. You'd like to minimize the size of that image. So maybe what you do then is look at how many items are in your hash table, and pay the cost of hashing all of them into a table that's as small as possible. That might involve a somewhat high probe count - just depends on how close your word count is to "full" for that smallest
2023-06-03 13:39:24 possible table.
2023-06-03 13:39:58 Then when you load the image you'd create a table the next size up, or maybe two sizes up, and hash everything over into that.
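[editor's note: one well-known way to get the "permutation of all slots" property with power-of-two table sizes, as puzzled over above, is double hashing with the step forced odd - any odd number is coprime with 2^k, so the probe sequence visits every slot exactly once. A sketch, with invented names:]

```python
def probe_sequence(h1: int, h2: int, size: int):
    """Yield `size` distinct slot indices for a power-of-two table.

    h1 picks the starting slot, h2 the stride. Forcing the stride odd
    makes it coprime with 2^k, so the walk is a full permutation.
    """
    assert size & (size - 1) == 0 and size > 0, "size must be a power of two"
    step = h2 | 1                 # force odd => coprime with 2^k
    slot = h1 % size
    for _ in range(size):
        yield slot
        slot = (slot + step) % size
```

CPython's dict uses a different trick (a "perturb" shift mixed into the index) for the same purpose, but the odd-stride version is the easiest to reason about.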
2023-06-03 13:41:50 It might be more sensible, though, to just save the table you have, and use it as is on reload. That makes your image somewhat bigger than it would need to be, but maybe that's just a price worth paying.
2023-06-03 15:09:29 Wow, I think I may have just identified my next "read."
2023-06-03 15:09:49 Once you've read The Dresden Files, it's hard to settle for anything less.
2023-06-03 15:10:06 But this just came highly recommended on the Dresden Reddit sub:
2023-06-03 15:10:08 https://images.squarespace-cdn.com/content/v1/563ba4c9e4b03a8b2f1ad563/3e672fdc-4033-4d97-89bf-40f0e89f61aa/HS+Reading+Order+1.jpg?format=1500w
2023-06-03 15:11:07 That's 29 novels and four short stories. :-|
2023-06-03 15:11:41 Dresden's only up to seventeen novels, and 15-20 short stories.
2023-06-03 15:12:13 Hoping for #18, *Twelve Months*, next year maybe.
2023-06-03 15:13:00 The writer's said he'll be alternating back and forth between Dresden and his other, newer series, The Cinder Spires. Spires #2 is slated to drop in November, so he's now officially working on Twelve Months.
2023-06-03 15:13:41 The guy delivered the first 15 at a pace of about one a year, and then made us wait six years before we got 16 and 17 within three months of one another.
2023-06-03 15:13:50 It was originally one novel, but the publisher got him to split it.
2023-06-03 15:14:32 Dresden is seriously good stuff, though - if you haven't tried it out you oughta.
2023-06-03 15:15:44 I've tried several other series, trying to get another similar "fix," but so far nothing else has come close.
2023-06-03 15:16:21 Probably the closest so far has been a series about a dude named Yancy Lazarus.
2023-06-03 15:16:49 I've read two of them. They're... "alright." Which means they're good, but not on Dresden level.
2023-06-03 15:17:40 Butcher (Dresden author) has just slaughtered my ability to pick up practically anything and tell myself I'm reading something of quality.
2023-06-03 15:30:36 Ok, acquired "The Long Way Down," the first item in that long list - chapter 1 passes the smoke test. I'm already "interested." And it has the same nice "first person narrative style" that Butcher uses for Dresden - that mode works really well for me somehow.
2023-06-03 15:30:57 I think it promotes you connecting with the narrating character in a closer way.
2023-06-03 15:31:05 You ride around inside his head, more or less.
2023-06-03 15:31:46 Multi-pov, like in Game of Thrones, just makes me feel yanked around. Just as I'm getting comfortable in one person's head, I get yanked into another.
2023-06-03 15:32:13 Some of the Dresden short stories are pov other characters, always using that same first person style. But the novels are all pov Dresden himself.
2023-06-03 16:15:20 Ok, so what I'm picturing here is a string zone, which is just one counted string after another, catted after one another. Then there's the hash table - entries there have two fields - an offset into that string table where I can find a copy of the word's name, and my "index" that I've already been discussing (an xt basically). The rest is the same - CFA and PFA tables I use an xt to index into, and those
2023-06-03 16:15:21 point to the right stuff in the body region.
2023-06-03 16:16:52 you are now entering the string zone doo dee doo doo doo dee doo doo
2023-06-03 16:17:41 :-)
2023-06-03 16:18:20 I don't expect FORGET to be a constant time operation, and that may become a problem, but at worst it would require an extra field in the hash table entries.
2023-06-03 16:18:51 Because the hash table won't just "pare back" - stuff is scattered around all over in it.
2023-06-03 16:19:04 So I may have to record a path through those.
2023-06-03 16:19:46 Everything else does just pare back.
2023-06-03 16:23:00 I'll need to change my startup a little, though - what I'd written previously was code that allocated the main regions using a syscall.
But I don't want to do it that way after startup - I want to use my own memory manager. And doubling the tables will mean allocating new space for those things from time to time. So I want to get those using my own stuff out of a bigger initial region.
2023-06-03 16:23:10 Probably will just bite the bullet and do it that way for everything.
2023-06-03 20:49:24 KipIngram: i'll be giving dresden files a read then
2023-06-03 21:43:01 Oh, do let me know what you think. First one is called Storm Front.
2023-06-03 21:43:20 by Jim Butcher.
2023-06-03 22:45:22 aight
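[editor's note: the "string zone" proposed at 16:15 - counted strings catted one after another, addressed by offset, with FORGET pruning the region back to an earlier state - can be sketched in a few lines. This is an illustrative model of the idea under discussion, not any particular Forth's implementation; all names are invented.]

```python
class StringZone:
    """Append-only zone of counted strings (length byte, then the bytes)."""

    def __init__(self):
        self.zone = bytearray()

    def add(self, name: str) -> int:
        """Append a counted string; return its offset, which is what the
        hash table entry would store for the final name-match check."""
        data = name.encode("ascii")
        offset = len(self.zone)
        self.zone.append(len(data))   # count byte, as in a counted string
        self.zone += data
        return offset

    def at(self, offset: int) -> str:
        """Read the counted string back - the collision-confirmation step."""
        count = self.zone[offset]
        return self.zone[offset + 1 : offset + 1 + count].decode("ascii")

    def forget(self, offset: int):
        """FORGET-style prune: discard everything from `offset` onward."""
        del self.zone[offset:]
```

Because the zone only ever grows, FORGET really is just truncation here - the hash table is the one structure that doesn't pare back this way, as noted above.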