2022-12-22 00:03:53 What I really wish is that I had a way to easily change NEXT, system wide. The NEXT address is in a register, and for a while I thought that would make it easy.
2022-12-22 00:04:06 Just point that register at a different NEXT.
2022-12-22 00:04:25 But then I realized I was ALSO using that register as "base of system" (NEXT is the very first thing).
2022-12-22 00:04:40 So I can't change it. I'd have to dedicate another register to be purely NEXT.
2022-12-22 00:04:50 And I'm almost out - I just wasn't quite willing to do that.
2022-12-22 00:05:25 But being able to change it would mean I could have the lean optimum NEXT generally, and then replace it when I wanted to use additional features, like background words, profiling, etc.
2022-12-22 00:06:52 The other way to get that optional optimality is to re-write the NEXT routine on the fly.
2022-12-22 00:09:54 Oh, the other little rub I've run into is the double returns (conditional or unconditional). Those require popping the return stack twice, IF the return is actually done. Not exactly sure what the resolution of that is going to be. It means I need the second item on the return stack to go into IP, and the third item on the return stack to go into TOR.
2022-12-22 00:10:18 And I can't read them both from RAM at once. Or, maybe I can - if the RAM is dual ported, then that may get me there.
2022-12-22 00:10:47 Yeah, I guess it will.
2022-12-22 00:12:05 So single return will do TOR to IP, top RAM item to TOR, and double return will do top RAM item to IP, next RAM item to TOR. And bump the counter by two instead of one.
2022-12-22 00:12:54 Counters, plural, that is - I guess I'll have to have two separate up-down counters for that.
2022-12-22 00:13:57 That does mean, though, that the return stack RAM won't have a general access port - won't be in data ram address space.
2022-12-22 00:15:20 So if I want to multi-thread, that RAM will have to be big enough for however many return stacks I support.
2022-12-22 00:15:27 Same for the data stack.
2022-12-22 00:15:58 And RP! will have to load both counters simultaneously.
2022-12-22 00:16:37 It can use the 1+ or 1- ALU output to get the other required value.
2022-12-22 02:01:58 Yeah, I think this dual-port RAM trick will come in handy for the data stack too. One port will address the current NOS location and will always be a read port; that's where NOS will come from to go into the ALU. And the other port will always reference the location TOS *will be* written into, if we're pushing something new onto the stack. That makes both locations handy within a single cycle.
2022-12-22 02:03:39 That's a bit different from the return stack handling; it needs a different treatment to make those double returns work out.
2022-12-22 02:06:50 I think that one will need to be able to shift from "next TOR push address" + "current NOR read address" to one down from there, just on those special instructions.
2022-12-22 02:07:15 Gonna be a little tricky, but it feels like it'll work out.
2022-12-22 02:07:51 If I only used the double returns occasionally I might give up having them be instructions and code them manually, but I actually use them fairly often.
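(A minimal Verilog sketch of the dual-ported return stack and the single/double return behavior described above. Names are hypothetical, the two up/down counters are collapsed into one pointer, and the RAM reads are shown asynchronously for brevity; since a push and a pop never land in the same cycle, the write address and one read address could share a port on a true dual-port block RAM.)

module rstack_sketch #(parameter W = 32, AW = 5) (
    input  wire         clk,
    input  wire         push,      // call or >R: old TOR spills into RAM
    input  wire         pop1,      // single return: IP <= TOR, TOR <= second item
    input  wire         pop2,      // double return: IP <= second item, TOR <= third item
    input  wire [W-1:0] tor_in,    // value replacing TOR on a push
    output reg  [W-1:0] tor,       // top of return stack, kept in a register
    output reg  [W-1:0] ip         // instruction pointer
);
    reg [W-1:0]  ram [0:(1<<AW)-1]; // second item on down; not in the data address space
    reg [AW-1:0] rp = 0;            // points at the first free cell

    // The two in-RAM items a double return needs, both visible in one cycle --
    // this is what the second RAM port buys.
    wire [W-1:0] second = ram[rp - 1];
    wire [W-1:0] third  = ram[rp - 2];

    always @(posedge clk) begin
        if (push) begin
            ram[rp] <= tor;         // write port
            tor     <= tor_in;
            rp      <= rp + 1;
        end else if (pop1) begin    // plain return / R>
            ip  <= tor;
            tor <= second;
            rp  <= rp - 1;
        end else if (pop2) begin    // double return
            ip  <= second;
            tor <= third;
            rp  <= rp - 2;          // bump by two instead of one
        end
    end
endmodule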
2022-12-22 02:09:27 Oh, hey - Digikey says my FPGA dev board should arrive tomorrow. That's fast service!
2022-12-22 02:16:57 just wanna say I enjoy your company, KipIngram.
2022-12-22 02:17:16 thank you for being here.
2022-12-22 02:26:13 Oh man - thank you; I sometimes feel like a frigging firehose. I guess on the other hand, though, it's usually not like I'm blocking someone else from talking; the bandwidth is here. :-)
2022-12-22 02:26:27 I'm glad someone's finding some positive value in it.
2022-12-22 02:26:48 Hopefully I can actually get the thing DONE this time.
2022-12-22 02:26:50 the alternative is silence, and the first-mover effect means that nobody's going to talk.
2022-12-22 02:27:10 I scroll back through your thoughts pretty often.
2022-12-22 02:27:28 you're part of a functioning and thriving community.
2022-12-22 02:27:49 I was happy to hear dave0's update too - I've found some of his tinkering over time pretty interesting.
2022-12-22 02:28:51 When he first showed up here a few years ago, he was working on a Forth (x86) that used the stack pointer as IP, and "next" was just a return instruction. :-)
2022-12-22 02:28:59 I found that fairly fascinating.
2022-12-22 02:29:00 _that_ is fascinating.
2022-12-22 02:29:06 :-)
2022-12-22 02:29:21 Which means it was direct threaded, of course.
2022-12-22 02:29:25 Would have to be.
2022-12-22 02:29:58 that's a really interesting minimization..
2022-12-22 02:30:21 I was worried about interrupts, but it turns out the x86 uses a different stack pointer / region for interrupts.
2022-12-22 02:30:28 Probably for security reasons.
2022-12-22 02:30:54 So no danger of interrupts munging his code.
2022-12-22 02:31:55 Yeah, exactly. I was quite taken with it, so he went right onto my "interesting person" list.
2022-12-22 02:32:32 Heck, though, I've even enjoyed the playing around vms14 has been doing. It's not really what I'd call Forth, but I think he's getting some good self-education value out of it.
2022-12-22 02:32:53 just the fact that people are _doing things_ makes me happy.
2022-12-22 02:33:06 For sure.
2022-12-22 02:33:09 everywhere else, I see people doing the same damn things.
2022-12-22 02:33:20 y'all are like a breath of fresh air.
2022-12-22 05:49:29 KipIngram: x86 can put interrupt return address on the stack, in some legacy modes, but yeah when you're working in userspace that's not an issue
2022-12-22 05:50:29 And in fact the System V ABI for x86-64 allows you to use the area below the stack pointer as scratch https://en.wikipedia.org/wiki/Red_zone_(computing)
2022-12-22 05:51:04 So leaf functions with a small amount of stack usage often don't need to allocate any stack at all
2022-12-22 05:58:37 But when you're not in userspace I think the return-from-interrupt address goes right on the stack, which I think is why kernel code has the red zone disabled
2022-12-22 06:36:34 i read in the intel manuals that each protection ring has its own stack.. today's OSes only use ring 0 and ring 3, and interrupts and exceptions use the ring 0 stack
2022-12-22 06:36:43 user programs are ring 3
2022-12-22 06:37:39 it's kind of nice that you can do funky things with the stack register in user mode :-)
2022-12-22 06:51:04 PowerPC has dedicated registers that receive the return address, so the stack isn't touched by any kind of interrupt
2022-12-22 06:51:14 At least newer PowerPC, not sure about old
2022-12-22 07:00:38 i would like to see forth on a RISC machine .. RISC-V forth would be interesting
2022-12-22 07:00:51 I think that's already a thing
2022-12-22 07:00:58 I'm guessing mecrisp is on ARM
2022-12-22 07:01:24 first link on google for "risc-v forth": https://github.com/theandrew168/derzforth
2022-12-22 07:01:29 I only mention PowerPC because it's the assembly/arch I've worked with most
2022-12-22 07:01:40 veltas: ah macintosh?
2022-12-22 07:01:47 No never used a mac
2022-12-22 07:02:03 My company makes PowerPC and x86 machines for use in military/aero etc
2022-12-22 07:02:10 I write firmware for said computers
2022-12-22 07:02:20 oh wow cool
2022-12-22 07:02:40 :O
2022-12-22 07:02:51 veltas: in forth? :D
2022-12-22 07:02:53 So I've worked extensively on PowerPC boot stuff for instance, I'm very familiar with the low level
2022-12-22 07:03:05 olle: No but conceptually there's no reason I couldn't
2022-12-22 07:03:11 Business reasons mean not
2022-12-22 07:03:27 Technically there are boot loaders that use forth
2022-12-22 07:04:23 Although at the moment I'm working on more embedded stuff on the boards, like a K2 SoC (ARM)
2022-12-22 07:09:16 kk
2022-12-22 09:07:28 dave0: there's also a variant of mecrisp for risc-v: https://github.com/hansfbaier/mecrisp-quintus
2022-12-22 09:32:14 I'm not sure what would make it particularly "different."
2022-12-22 09:32:50 Isn't the primary (generic) distinction between RISC and CISC that RISC only works on memory with load and store operations, and everything else is register-centric?
2022-12-22 09:37:59 So, here's another reason to have a "next on stack" register. With that, then each clock edge clocks new values (potentially at least) into the top two stack elements. Only after that clock-to-output delay can the values start to percolate through the ALU circuitry.
2022-12-22 09:38:10 Without NOS registered, that becomes a RAM access time.
2022-12-22 09:38:37 So if those times are different, then having it in a register is faster.
2022-12-22 09:48:19 Of course, it's all awfully fast, but there is a noticeable difference. For one speed grade the RAM access is around 3 ns, while the register clock to output is down around 1.
2022-12-22 09:48:57 Wow, that's fast. I haven't worked with this stuff in quite a few years; it was several times slower back then.
2022-12-22 09:49:42 3 ns sometimes
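(Pulling the registered-NOS point together with the dual-port data stack idea from earlier -- a minimal Verilog sketch, hypothetical names, behavioral only. TOS and NOS are registers, so after a clock edge they're available in roughly a register clock-to-output rather than a RAM access; the RAM holds the third item on down, with one port reading the cell that refills NOS on a pop and the other port writing the cell that the spilled value lands in on a push.)

module dstack_sketch #(parameter W = 32, AW = 5) (
    input  wire         clk,
    input  wire         tos_we,     // load a new TOS value this cycle
    input  wire         push,       // stack grows: TOS -> NOS, NOS -> RAM
    input  wire         pop,        // stack shrinks: NOS refilled from RAM
    input  wire [W-1:0] tos_next,   // ALU result, literal, fetched data, ...
    output reg  [W-1:0] tos,        // registered: fast clock-to-output
    output reg  [W-1:0] nos         // registered: feeds the ALU directly
);
    reg [W-1:0]  ram [0:(1<<AW)-1]; // third item on down
    reg [AW-1:0] sp = 0;            // number of cells buried in RAM

    wire [AW-1:0] rd_addr = sp - 1; // current third item: becomes NOS on a pop
    wire [AW-1:0] wr_addr = sp;     // where the spilled value lands if we push

    always @(posedge clk) begin
        if (tos_we) tos <= tos_next;
        if (push) begin
            ram[wr_addr] <= nos;    // write port, fed from NOS
            nos <= tos;
            sp  <= sp + 1;
        end else if (pop) begin
            nos <= ram[rd_addr];    // read port refills NOS
            sp  <= sp - 1;
        end
    end
endmodule

In this sketch a DUP would assert push with tos_we low, while a two-operand ALU word would assert pop with tos_we high and tos_next driven from the ALU.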
2022-12-22 09:53:25 KipIngram: RISC/CISC is a bit arbitrary in my mind. But yeah usually they limit storing/loading to special instructions and not as part of other operations
2022-12-22 09:53:58 Max clock frequency in the data sheet I'm looking at now is 311 MHz.
2022-12-22 09:54:01 And they also are meant to be more orthogonal
2022-12-22 09:54:19 Oh, right, yes.
2022-12-22 09:54:40 x86_64 is "more orthogonal" than the older 86 chips.
2022-12-22 09:55:16 I was tinkering one day with 16-bit mode and found that certain instructions I was using just weren't allowed in that mode. Certain things you just couldn't do with certain registers.
2022-12-22 09:55:55 Yeah I suppose another RISCism is having all general purpose registers and less special purpose registers
2022-12-22 09:56:07 it's like they bolted it together and somehow the ball of mud flies
2022-12-22 09:56:18 Wow - basic LUT combinational delay is just 0.15 ns.
2022-12-22 09:56:38 Even in modern x86 you get longer/slower instructions using some registers over others, or sometimes are forced to use certain registers
2022-12-22 09:56:50 Right.
2022-12-22 09:56:56 I think the edx:eax pair is still required for unsigned division
2022-12-22 09:57:08 And faster/shorter for multiplication and signed division
2022-12-22 09:57:13 Or something like that
2022-12-22 09:57:44 Probably has minimal if any effect on performance now
2022-12-22 10:00:44 It is.
2022-12-22 10:00:50 Something like that.
2022-12-22 10:01:56 So, looks to me like I can take the minimum clock period, subtract the register clock-to-output value, subtract the setup time required on the register inputs, and then divide what's left by the LUT delay. That gives a feel for how many levels of logic you can have before that logic delay starts to gate the speed.
2022-12-22 10:02:11 I'd have to assess routing delay too, though, but the general idea seems sound.
2022-12-22 10:02:33 And INT_MIN/-1 throws an exception in x86 (which will crash a SysV C program)
2022-12-22 10:06:09 Which of course is valid C, because it's a signed overflow, so it is allowed to raise a signal
2022-12-22 10:06:21 But confusingly I think it gets labelled a "floating point exception" or something
2022-12-22 10:07:11 KipIngram: Also I think RISCs tend to have a "word size" and do everything in that size
2022-12-22 10:07:29 That seems likely.
2022-12-22 10:07:39 https://thrig.me/tmp/not-feeling-it.png
2022-12-22 10:07:42 Whereas in x86 you can operate with 8-bit, 16-bit, 32-bit, 64-bit as you like
2022-12-22 10:07:43 The alternative is fairly expensive in logic.
2022-12-22 10:08:10 Expensive in size of instructions actually
2022-12-22 10:08:22 Yeah, that too.
2022-12-22 10:08:23 RISC can be more compact, and RISC-V is supposedly very compact
2022-12-22 10:08:51 Although PowerPC fails at this, instructions are all 4 bytes and it takes like 6 instructions to load a 64-bit immediate
2022-12-22 10:10:28 Wow. I did the calculation I just outlined, and the result was 15 layers of LUT logic between register outputs and inputs. It's not really meaningful without routing delay included, but 15 is a rather large number there.
2022-12-22 10:10:45 I may be able to have as my goal "support the maximum device clock rate."
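(Rough numbers for that calculation, using only figures quoted in this log, ignoring routing, and assuming the register setup time is small since it isn't quoted here:
  311 MHz max clock               ->  period of about 3.2 ns
  3.2 ns - ~1 ns clock-to-output  ->  about 2.2 ns left for combinational logic
  2.2 ns / 0.15 ns per LUT        ->  roughly 14-15 levels of LUT logic)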
2022-12-22 10:12:21 A dumb simulation of an instruction set in C will run around 300MHz easily on a modern computer
2022-12-22 10:12:43 Has anyone ever written anything on typical values for the fraction of time or cycles that Forth spends executing actual instructions vs. navigating definition trees?
2022-12-22 10:12:49 Obviously it would depend on coding style.
2022-12-22 10:12:57 But it seems like something interesting to study.
2022-12-22 10:13:26 What do you mean navigating def trees?
2022-12-22 10:13:29 Yeah, modern computers are pretty impressive.
2022-12-22 10:13:36 Like going up and down the high level stack?
2022-12-22 10:13:39 docol / ret.
2022-12-22 10:13:55 vs. "getting things done" primitive instructions.
2022-12-22 10:14:38 MrMobius will say this is a silly question anyway because the instructions that "get stuff done" are not doing it very efficiently
2022-12-22 10:14:46 i.e. doing it on a stack rather than in registers
2022-12-22 10:15:47 That's fair, although in a hardware Forth some of that stuff is in registers. Registers regarded as part of a stack, granted, but still regs.
2022-12-22 10:15:56 In my opinion Forth is "relatively fast" compared with non-JIT scripting languages in general, and you've got the tools to write in assembly code when it's really needed
2022-12-22 10:16:13 Yes, that's how I feel about it too.
2022-12-22 10:16:58 isn't it that a particular implementation is either fast or slow (and that it is either interpreted or compiled) -- i.e. not an intrinsic trait of the language itself?
2022-12-22 10:17:17 So, if I decide I want external RAM on this thing, slower things start happening. Timing on the chip that involves I/O pins is quite a bit slower.
2022-12-22 10:17:33 So it wouldn't make sense to have external RAM "work like" internal RAM.
2022-12-22 10:18:04 I think you'd want to treat it more like a peripheral. Load address outputs with one instruction, read in the data with another later.
2022-12-22 10:18:04 jackdaniel: Find me a slow forth implementation
2022-12-22 10:18:31 There aren't any I'm aware of, if we're talking relative to i.e. Python. Forth is always not "terrible" performance
2022-12-22 10:18:41 jackdaniel: Well, "interpreted" is slower than compiled, because interpreted stuff has to search the dictionary.
2022-12-22 10:18:57 Threaded code in general is a 'fast' approach, and it's possible because forth is untyped and does not have 'dynamic' types/tables etc
2022-12-22 10:19:02 Those would be things you'd assess separately.
2022-12-22 10:19:48 And the interpreted speed can vary wildly, depending on how that search is implemented.
2022-12-22 10:20:05 GForth uses a hash table, and is a LOT faster than mine, which searched the dictionary as a linked list.
2022-12-22 10:20:15 I'm not worried about compilation performance
2022-12-22 10:20:23 sure, my point is more that speed is not a property of the language but the implementation; and you've seemed to talk generally about the language in terms of performance
2022-12-22 10:20:28 gforth therefore probably eats more memory
2022-12-22 10:20:30 Again, find me a forth that compiles slowly versus i.e. Rust
2022-12-22 10:20:41 I'm not either, really. Most of the time the amount of code you're compiling is small enough that it's going to be more or less instant anyway.
2022-12-22 10:20:44 I keep saying i.e. when I mean e.g.
2022-12-22 10:21:19 That's why I'm rather curious about the phase Chuck apparently went through back around 2000 or so where he was rabidly focused on source code size and compile speed.
2022-12-22 10:21:29 It seems like he put a LOT of work into optimizing those.
2022-12-22 10:21:33 He must have had a reason.
2022-12-22 10:21:51 My ZX Spectrum forth compiles faster than Rust for my small programs, maybe an unfair comparison but IMO it's a massive handicap to run on an 8-bit 3.5MHz computer so it's fair
2022-12-22 10:22:02 I really do think he was wanting to recompile his whole chip design system on every edit keystroke or something like that.
2022-12-22 10:22:11 faster compile speed means faster iteration on new code, rather than waiting for an outlier like rust to get done
2022-12-22 10:22:15 And that uses list dictionary lookup
2022-12-22 10:22:28 Yes, but once it's too fast for you to notice it's enough.
2022-12-22 10:22:41 List dictionary lookup is fine when your dictionary is either not massive, or is partitioned into wordlists sensibly
2022-12-22 10:23:10 I could fairly easily bolt an on-the-fly hash table onto mine if I wanted to; sometimes I've felt it would be a reasonable way to use the wealth of RAM on most notebooks.
2022-12-22 10:23:20 But... it's just in my pocket for "if I ever need it."
2022-12-22 10:23:20 That's how gforth works internally I think
2022-12-22 10:23:41 The hash table is written in Forth, it runs without it initially I think
2022-12-22 10:23:59 gforth is structured like a classic forth internally I think, at least the copy I've got is
2022-12-22 10:23:59 Makes sense. Look the word up for real one time, and then subsequent searches are hash speed.
2022-12-22 10:24:25 Go straight to the hash result and do the name compare to confirm.
2022-12-22 10:24:30 When I say "initially" I mean during early startup, not on first lookup
2022-12-22 10:24:53 You could even skip the name compare if you "trusted" your collision rate was low enough you could ignore it.
2022-12-22 10:24:56 The hash table is in their forth source code, it's used for most of the lookups (i.e. it's INCLUDED early)
2022-12-22 10:25:24 Oh, so it's not built on the fly then. They have a pre-existing table?
2022-12-22 10:25:38 I figured I'd just start with an empty table and populate it as I did searches.
2022-12-22 10:25:40 It's built on the fly
2022-12-22 10:26:13 Don't you have a fixed size hash table? Maybe I'm thinking of mark4...
2022-12-22 10:26:13 As you said, though, I've never been too focused on that speed.
2022-12-22 10:26:35 What I was really interested in was how much time NEXT, (:), and (;) took.
2022-12-22 10:26:59 It matters when it matters, if it starts to matter use a hash table
2022-12-22 10:27:05 NEXT measured out to around 1 ns; a bit over.
2022-12-22 10:27:22 Yes - my plan exactly. If it ever matters I'll use a hash table then.
2022-12-22 10:28:01 I did some very rough scratching the other day, though, and decided that was at least 100k compiled source words, and maybe considerably more.
2022-12-22 10:28:19 That was to get a compile that exceeds that Doherty bound.
2022-12-22 10:28:39 That's a lot of compile.
2022-12-22 10:29:27 I was reading about Chuck's compile speed work, and it's always tempting to want to "follow Chuck" in this stuff. But I wound up deciding that even IF it had been worth it back in those days, it probably isn't any more.
2022-12-22 10:29:49 or maybe he got obsessed about the problem
2022-12-22 10:30:05 Possible - he's human too.
2022-12-22 10:30:14 Chuck said you should write a forth web browser
2022-12-22 10:30:15 No matter how good a person is, they're not perfect.
2022-12-22 10:30:25 Yes, he did.
2022-12-22 10:30:30 No, you specifically KipIngram
2022-12-22 10:30:30 I saw that a few days ago too.
2022-12-22 10:30:43 He told me :P
2022-12-22 10:30:49 :-)
2022-12-22 10:31:07 I wish I'd done enough things to get noticed by Chuck.
2022-12-22 10:31:16 I'm a face in the crowd, man.
2022-12-22 10:31:18 A forth web browser wouldn't be w3 standard, it would have to simplify the job
2022-12-22 10:31:29 Forth is your portable language, so needs to be built around forth
2022-12-22 10:31:34 My wife is a fairly well known person in her little "niche specialty," but I'm definitely not.
2022-12-22 10:31:48 What is her niche speciality?
2022-12-22 10:32:24 Well, very small niche. The design of offshore oil pipeline components of the type they call "flexible pipe."
2022-12-22 10:32:50 They're these "layered" structures, with plastic layers, little metal pieces that interlock with one another to provide various types of mechanical support, etc.
2022-12-22 10:33:00 Interesting
2022-12-22 10:33:04 A pipe might have seven or eight different kinds of layers.
2022-12-22 10:33:41 "Pipe" - doesn't sound too exotic. But apparently this type of pipe is a pretty specialized thing.
2022-12-22 10:33:53 What company does she work for?
2022-12-22 10:34:14 McDermott International. Sort of a "middle tier" player in the industry.
2022-12-22 10:34:31 She used to work for one called Technip; they're a big fish.
2022-12-22 10:35:00 I worked for Baker Hughes for a few months
2022-12-22 10:35:08 Anyway, my impression is that there may be only a couple dozen or so folks in the world who are top level experts in that stuff.
2022-12-22 10:35:43 As opposed to me being in "digital logic design," which is everywhere.
2022-12-22 10:36:01 Forth is probably one of the most understood languages, and one of the languages with the most implementations
2022-12-22 10:36:08 Somewhere close to lisp maybe
2022-12-22 10:36:08 and EE folks bemoan the loss of talent to software
2022-12-22 10:36:49 Well, a couple days ago I was bemoaning software methods "taking over" my specialty.
2022-12-22 10:38:15 But, given that little calculation I did earlier it looks like being too focused on "minimizing logic layers" may not have any real payoff.
2022-12-22 10:38:32 All the software has to do is get the layer count below that value imposed by the max clock frequency.
2022-12-22 10:38:42 Once it's there, further reduction brings no gains.
2022-12-22 10:39:05 So "optimal" as I defined it a day or two ago looks not strictly necessary.
2022-12-22 10:39:34 Still has value in making the design easy to understand, but no performance payoff.
2022-12-22 10:40:08 In fact, one could argue that adding complexity to the instruction set, so that more could be accomplished with single instructions, might be a win.
2022-12-22 10:40:14 Clock speed not affected, but does it affect power draw?
2022-12-22 10:40:18 If your instruction rate is going to be the same either way.
2022-12-22 10:40:28 Oh, it might.
2022-12-22 10:40:29 The greenarrays chip seemed to be quite low power
2022-12-22 10:40:31 Good point.
2022-12-22 10:40:39 It is, and that's pretty impressive to me.
2022-12-22 10:40:47 No way this stuff I'm doing will come anywhere close to that.
2022-12-22 10:41:01 Chuck claims part of the way it does that is by being fully asynchronous.
2022-12-22 10:41:21 Only necessary changes occur - most of the chip remains quiescent most of the time.
2022-12-22 10:41:42 There's no clock
2022-12-22 10:41:43 And logic of this type really only consumes significant power when you change register outputs.
2022-12-22 10:41:50 Right.
2022-12-22 10:42:05 Which means no clock distribution hassles.
2022-12-22 10:43:00 Just having a snappy clock distribution network ups your power requirement.
2022-12-22 10:43:11 You're driving that chip-wide clock net whether it's needed everywhere or not.
2022-12-22 10:43:31 More capacitance to charge / discharge.
2022-12-22 10:44:27 I suppose the forth ecosystem should bemoan effort lost to reimplementing forth lol
2022-12-22 10:44:46 But we'd be bemoaning ourselves
2022-12-22 10:48:55 Forth seems conducive to the principles of free software
2022-12-22 10:49:18 Yeah, I guess so, but Chuck strongly advocated the kind of tinkering we do.
2022-12-22 10:49:48 Exploring different and potentially better ways of doing things.
2022-12-22 10:50:05 It's one of the reasons he worried about seeing it standardized.
2022-12-22 10:50:45 And I saw a Liz Rather quote a couple of days ago; they were discussing cmForth's use of multiple word lists instead of IMMEDIATE.
2022-12-22 10:51:09 After one discussion she basically said "Yes, that's a fine scheme. But it's not standard so we no longer use it."
2022-12-22 10:51:37 In the video where he said to make a browser I think he basically said "stop working on the language, it's good enough, *make* something with it"
2022-12-22 10:52:01 Oh, he's said the opposite in some places, though.
2022-12-22 10:52:19 Though, I'm sure he meant "try to have a problem you're solving."
2022-12-22 10:52:24 I agree with that sentiment to actually make stuff
2022-12-22 10:52:28 Even if I'm a hypocrite
2022-12-22 10:52:29 Instead of just tinkering with the tool the way I do.
2022-12-22 10:52:41 He'd probably say that it should be your target problem that motivates you to change the tool.
2022-12-22 10:53:01 Me too, but I'm just as bad.
2022-12-22 10:53:33 He reduces software to stuff that manipulates the 'input' into the 'output'
2022-12-22 10:53:37 Tinkering with the tool is most of what I've done, for 40 years.
2022-12-22 10:53:53 With that view, worrying about the language has diminishing returns quickly
2022-12-22 10:54:10 Yeah - you have to know what your input and output ARE.
2022-12-22 10:56:12 If you want a web browser, you're immediately faced with doing a TCP/IP stack, and my understanding is that that's pretty hard.
2022-12-22 10:56:24 Not hard to get "something to work," but hard to handle all the special cases.
2022-12-22 10:57:06 Years ago I saw a little FPGA project that was a "simplest possible make something move across the network" gadget.
2022-12-22 10:57:19 But apparently it neglected a LOT of things that the stack calls for.
2022-12-22 10:57:55 such as?
2022-12-22 10:58:02 It's been too long.
2022-12-22 10:58:09 This was like 8-10 years ago.
2022-12-22 10:58:26 I know next to nothing about TCP/IP.
2022-12-22 11:02:56 I do think TLS is a basic requirement
2022-12-22 11:03:14 So having conventional crypto algos is necessary
2022-12-22 11:03:59 TLS is a rather complicated protocol, especially with the NSA stuffing it full of extras
2022-12-22 11:04:21 You don't need all the extras though
2022-12-22 11:04:44 Define a subset, also TLS/TCP/IP might be more complicated than necessary
2022-12-22 11:04:51 Forth's alternative should simplify it
2022-12-22 11:06:21 veltas: haha not a silly question :P
2022-12-22 11:06:41 ACTION hides
2022-12-22 11:08:07 veltas: and agreed forth is not slow compared to python or on old systems compared to basic for example. just slow compared to C and assembly in embedded
2022-12-22 11:08:55 C is so fricking slow in embedded development though
2022-12-22 11:15:16 C is slow in performance? Compared to what?
2022-12-22 11:15:45 slow in time to develop, presumably
2022-12-22 11:16:05 But as someone noted earlier, you can always figure out where you need better and code it in assembly.
2022-12-22 11:19:41 MrMobius: make one change, start the compiler, go and get coffee/tea, it is still compiling, go and do nr 2, come back and it's finally done, get the eeprom burner or in circuit programmer, burn the rom in, test the change for two to three minutes, realize that you need to change another thing and repeat
2022-12-22 11:20:11 Yeah, sometimes it is like that.
2022-12-22 11:20:29 It was eliminating that that made mecrisp-across seem so appealing to me.
2022-12-22 11:20:52 And yet it still didn't try to put a whole Forth system on your target. I thought it was quite well thought through.
2022-12-22 11:21:14 Best approach I've ever seen for dealing with tiny micros.
2022-12-22 11:21:24 hmm, we were discussing that recently in another channel. some of the forth old timers criticize C for long compiler times talking about minutes to compile KB for firmware and then switching to forth 20+ years ago. I wonder if they know a modern C compiler can compile enough to fill up the whole firmware in 1 second or so. I don't think the speed argument applies anymore
2022-12-22 11:22:03 It likely doesn't, just as any concern Chuck had 20 years ago about Forth compile speed probably isn't applicable anymore.
2022-12-22 11:23:08 The speedup we've gotten the last couple of decades involves numbers that are just so big it's hard to get a human grasp on them.
2022-12-22 11:23:32 Zarutian_iPad: maybe an enormous PC C program. I don't believe any small to medium size project takes that long in 2022. Also no one is burning eeproms other than retro hobbyists. For that matter, keeping your forth source on your PC and feeding it to the MCU over serial is even slower than C compiling
2022-12-22 11:23:49 And yet we don't have computers that are ready to use by the time your finger is off the power button.
2022-12-22 11:23:56 "small to medium size project" meaning embedded project
2022-12-22 11:24:04 MrMobius: oh it still applies, as the amount of crap put in and therefore the optimization passes required take a bloody long time
2022-12-22 11:24:32 pro digital cameras start up rather quickly
2022-12-22 11:24:42 Zarutian_iPad: sorry but absolutely not buying it. Especially if you have a make file and aren't recompiling every time
2022-12-22 11:24:49 Yes, some "appliance type" things do, that's true.
2022-12-22 11:25:01 I was talking mostly about Windows / Linux / MacOS.
2022-12-22 11:25:23 My phone takes 20-30 seconds to turn on. And for that matter takes several seconds to turn *off*.
2022-12-22 11:25:50 MrMobius: that is the issue, the whole thing needs to be recompiled every time if the firmware wide optimizations are to work
2022-12-22 11:27:03 What would make more sense to me, for a phone or other such gadget, is that you could do a full reset if you wanted to, but normal "turning off" should be instant, and turning it back on should be "instant" and put you EXACTLY back where you were when you turned off.
2022-12-22 11:27:06 Zarutian_iPad: that's a bad way of doing it. I put 0 crap in my embedded C but even hardware abstraction stuff for ARM programming is very fast to compile. what is slow for you?
2022-12-22 11:27:59 I actually don't see a need to settle that argument - my main point is that things aren't as fast as they could be / should be, given the largesse of power we've been graced with over time.
2022-12-22 11:28:45 Jevons paradox is also a thing
2022-12-22 11:30:56 Oh, that's interesting (Jevons).
2022-12-22 11:31:08 Makes total sense, but I'd never "seen it" before.
2022-12-22 11:31:25 another rendition runs along the lines of Andy giveth, Bill taketh
2022-12-22 11:31:33 :-)
2022-12-22 11:31:57 Well, I've heard that Andy lobbies Bill to take more, so there will be demand for the newer, faster hardware.
2022-12-22 11:32:02 or these days, the zero-day prone bloatware someone is chrome plating to hide the turd
2022-12-22 11:32:31 I lament all the resources consumed by "bells and whistles" type stuff.
2022-12-22 11:32:44 Do we REALLY need animating window open/close, etc?
2022-12-22 11:33:00 Or wouldn't it be just as good for the window to just "appear" / "disappear"?
2022-12-22 11:34:25 Maybe if you have twenty windows open or something it helps you register which one is the new one. But most of the time it doesn't add much.
2022-12-22 11:35:14 Of course, I get it that the window effects probably aren't WHY the capability exists - the why would have to do with application needs, like gaming, etc. Then it becomes a "why not?" question.
2022-12-22 11:36:03 I also acknowledge that it has a small "oh, nifty" factor.
2022-12-22 11:36:22 And maybe nifty helps you sell more products than your competitor.
2022-12-22 11:52:36 So, one thing that's going to be easy here is producing the selection signals for that 16-1 selector in the TOS circuit. Opcodes have six bits; luts have six inputs. So one lut can produce each of the four select signals. So that'll just take four luts.
2022-12-22 11:53:59 The only way to do better would be to be able to use the opcode bits themselves as the select signals, and that's just not going to happen - the distribution of sources needed for the various instructions isn't balanced enough to make that work.
2022-12-22 11:54:38 Need a fifth lut for the register clock enable, and maybe one for that reset signal as well (for 0).
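(A minimal Verilog sketch of that decode. The opcode assignments below are placeholders, not the real encoding; the point is just that each output is some function of all six opcode bits, so each one can map onto a single 6-input LUT. Whether the tools actually pack it that way is up to the mapper, of course.)

module tos_select_sketch (
    input  wire [5:0] op,       // six-bit opcode
    output reg  [3:0] tos_sel,  // 16-way source select: one 6-input LUT per bit
    output reg        tos_ce,   // TOS register clock enable: a fifth LUT
    output reg        tos_clr   // synchronous clear, for the "load 0" case
);
    always @* begin
        tos_sel = 4'd0;         // defaults, overridden per opcode below
        tos_ce  = 1'b1;
        tos_clr = 1'b0;
        case (op)               // placeholder encodings
            6'h00: tos_ce  = 1'b0;   // e.g. NOP: hold TOS
            6'h01: tos_clr = 1'b1;   // e.g. push a zero
            6'h02: tos_sel = 4'd1;   // e.g. TOS <= NOS (SWAP-style)
            6'h03: tos_sel = 4'd2;   // e.g. TOS <= ALU output
            // ... remaining opcodes pick the other TOS sources ...
            default: ;               // keep defaults
        endcase
    end
endmodule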
2022-12-22 11:56:43 MrMobius: Speed of compilation argument doesn't apply anymore, C does build fast for me in embedded and flashing is fast too now.
2022-12-22 12:43:08 So, Artix 7 has 11,550 of the fully capable "SLICEM" slices, and around twice that many of the lesser "SLICEL" slices. Main difference is that the SLICEL's don't have fast carry chains.
2022-12-22 12:44:29 But those are huge numbers - for 32-bit TOS I'll need eight for the register itself, eight for the second selection level, and 32 for the first selection level. So 48 for that register, and a lot less for other registers.
2022-12-22 12:44:56 Just as a rough gut guess I imagine I'll come in under 1,000 slices for the whole thing. Or at least not too far over.
2022-12-22 12:45:15 what is in each SLICE* ?
2022-12-22 12:45:46 Four six-input LUTs, four registers, and assorted other muxes and gates to implement the other features.
2022-12-22 12:46:06 Basically, though, a slice can do four six-input bits, either registered or combinational.
2022-12-22 12:46:39 each register is one bit?
2022-12-22 12:46:45 Yeah.
2022-12-22 12:47:10 Or you can do two 5-input functions, but they can't both be registered.
2022-12-22 12:47:17 SR flipflop based or?
2022-12-22 12:47:36 Two per bit - so eight 5-input outputs, so long as the inputs are shared pair-wise.
2022-12-22 12:47:46 No, looks more like a D flop to me.
2022-12-22 12:48:04 With a synchronous reset and also with a shared clock enable.
2022-12-22 12:48:40 hmm… and one bit output of the LUT?
2022-12-22 12:48:42 Setting aside the frills, the LUT makes this six-input function, and you can either send that straight out of the slice, or you can clock it into a register bit.
2022-12-22 12:48:54 One six-input function or two five-input functions.
2022-12-22 12:49:10 Two 5-input functions of the same inputs.
2022-12-22 12:49:14 And you get that 4x in the slice.
2022-12-22 12:49:49 There's a logic diagram here:
2022-12-22 12:49:51 https://cse.unl.edu/~jfalkinburg/cse_courses/2020/436/lecture/lecture32.html
2022-12-22 12:50:25 Anyway, at this point I'd guess (very roughly) that I'd be able to get at least four processors into the chip, and perhaps eight.
2022-12-22 12:50:44 Though some other capacity limit I'm not paying attention to right now might come up.
2022-12-22 12:50:55 now, I am curious what is the Switch matrix
2022-12-22 12:51:13 Yeah - they don't talk much about that.
2022-12-22 12:51:20 I guess they regard that as secret sauce.
2022-12-22 12:52:17 I've always wished I could know exactly what each bit of the config stream actually DOES.
2022-12-22 12:52:21 But they're not talkin'.
2022-12-22 12:52:32 probably a combination of crossbars, buses of various lengths, and connection crosses
2022-12-22 12:52:37 Yep.
2022-12-22 12:52:43 Probably "fairly obvious."
2022-12-22 12:52:47 If you could see it.
2022-12-22 12:53:40 The basic AND/OR grid in the old PALs was absolutely obvious, and documented.
2022-12-22 12:54:10 There was a picture of it right there in the data sheet, and you could clearly see how the programming file mapped onto it.
2022-12-22 12:54:54 there was a mini fpga by AVR that had a clear description of the bitstream and I am tempted to ‘borrow’ their switch matrix design for the BitGrid block stuff I have been musing on
2022-12-22 12:56:54 Sure, why not?
2022-12-22 12:57:57 I also always wished you could reconfigure all or part of the thing on the fly, too. But that would require that the stream be double-buffered, so you could shift a new one in without changing the current control bits. They may be using the shift register outputs directly.
2022-12-22 12:58:23 I think being able to do that would have brought all kinds of clever applications over the years.
2022-12-22 12:59:09 depends, Xilinx Spartan 6 has an Internal Configuration Port that you can access from your logic and allows for partial reconfiguration
2022-12-22 12:59:25 Oh, that's cool. I wasn't aware of that.
2022-12-22 12:59:34 Well, good for them.
2022-12-22 13:00:02 It's not as useful if you don't know what the bits do, but at least it would let you have a library of pre-compiled streams you could choose from.
2022-12-22 13:00:21 ACTION tries to recall if it was AT40K10AL ( https://www.digikey.com/en/products/detail/rochester-electronics-llc/AT40K10AL-1BQU/13513199?s=N4IgjCBcoLQdIDGUBmBDANgZwKYBoQB7KAbRAGYxyQBdAXwJgBYpRlJ1t8jTwwB2JhHp06QA ) or not
2022-12-22 13:02:09 iirc Spartan 6 also has an AES accelerator that can use keyslots that are only write accessible from the configuration and some of them are flash based
2022-12-22 13:02:21 Oh, I like that documentation already.
2022-12-22 13:02:31 Looks straightforwardly organized.
2022-12-22 13:02:59 Atmel stuff usually is, Microchip stuff is just slightly less
2022-12-22 13:03:07 And it looks a lot more "revealing" than the Xilinx stuff is.
2022-12-22 13:03:13 They're nice about sharing.
2022-12-22 13:04:35 if you are willing to buy 100K pcs Microchip can also work with you to mix and match ipcores of theirs into a chip
2022-12-22 13:05:11 So, I've got the 16 things that TOS can be on clocking it (18 if we count 0 and "no change"). By contrast, NOS next value can only come from TOS, self, or the data stack RAM port.
2022-12-22 13:05:24 I just wish they had a crossbar between the io peripherals and the pins
2022-12-22 13:05:27 Very possible I might be able to encode that choice directly into the opcodes.
2022-12-22 13:06:07 The data stack "write port" will just be hard-wired to NOS.
2022-12-22 13:06:16 No other possibility.
2022-12-22 13:06:27 I had to cross out an mcu of theirs that had all the features I wanted but there was a pin assignment conflict
2022-12-22 13:07:18 Bummer.
2022-12-22 13:07:38 I guess crossbars are fairly expensive, though.
2022-12-22 13:08:00 die space wise sure
2022-12-22 13:08:37 but this was a 14 pin chip
2022-12-22 13:08:59 and two of those went to power and ground
2022-12-22 13:11:08 Oh, so just 12 pins. That wouldn't have been AWFUL.
2022-12-22 13:12:16 But they probably looked at how many die they'd get per wafer and didn't want to take the decrease.
2022-12-22 13:12:22 and about 16 peripherals
2022-12-22 13:12:25 Corporations and their sharp pencils.
2022-12-22 13:12:44 fetch the #2 iron, Jeeves
2022-12-22 13:13:00 naah, oversight by the designer I think
2022-12-22 13:16:28 So, am I thinking about this right? To "execute" the xt in TOS. There's a valid return address on the return stack now - does it make the right thing happen to execute >r followed by "swap IP with TOR"?
2022-12-22 13:17:06 The >r makes a "spot" on the return stack for your next instruction, then the swap saves our current next instruction in tor and puts the new xt into IP.
2022-12-22 13:17:57 in my Forth for FCPU16: : EXECUTE >R ;
2022-12-22 13:18:13 it is that simple
2022-12-22 13:18:38 Oh, interesting. I was thinking of making it work as a primitive, but maybe doing it that way is better.
2022-12-22 13:19:05 The call to EXECUTE makes the extra spot on the return stack.
2022-12-22 13:19:25 Ok, I like that. Otherwise I have five sources for IP instead of just 4, and 4 is one lut layer.
2022-12-22 13:19:35 alright : EXECUTE R> DROP >R ;
2022-12-22 13:19:42 Right.
2022-12-22 13:19:58 Ok, I'm not going to worry about having exec as an opcode, then.
2022-12-22 13:20:01 Delete...
2022-12-22 13:20:37 getting the opcode number down to 16 for FCPU16 took some doing.
2022-12-22 13:22:01 Yeah, it was a bit of a "choice" for me too.
2022-12-22 13:22:18 NOP UM+ AND XOR 1LBR 1+ @ ! DUP DROP SWAP SKZ >R R> EXT EXIT if you are curious
2022-12-22 13:22:55 LBR stands for LeftBitRotate and SKZ stands for SKip if Zero
2022-12-22 13:26:32 though FCPU16 treats instructions above 0xF as a call to that address
2022-12-22 13:27:30 For me the saving "way out" was to let the rarely used sp@ and rp@ instructions put those values on the return stack instead of the data stack.
2022-12-22 13:27:48 Yes, that necessitates following them with r>, but so what? It solves the "16 problem"
2022-12-22 13:28:19 And it also means I no longer have to give up fetching the address registers - they can be handled the same way.
2022-12-22 13:28:38 Because there's tons of spare room on the "top of return stack source list."
2022-12-22 13:28:55 I just map sp and rp to one memory address.
2022-12-22 13:29:15 That works too.
2022-12-22 13:30:18 and the stacks are not deeper than 256 each
2022-12-22 13:30:29 I don't know yet how deep I'll make mine.
2022-12-22 13:30:46 I may go way down - in the F21 Chuck had them 17-18 deep.
2022-12-22 13:31:03 I really should tweak my notebook Forth to let me see how much of that space I'm actually using.
2022-12-22 13:31:37 Probably not nearly as much as I've made room for.
2022-12-22 13:32:32 And I'm considering Chuck's circular stack layout, too. I don't yet quite "see the joy" of it, but just the fact that he favors it carries weight with me.
2022-12-22 13:33:05 But it's probably one of those things where it pays off only after you adjust how you write programs, so I might have to work with it for a while to really have an "aha" moment.
2022-12-22 13:39:01 The main advantage I see right now is that it would make it impossible for a run-away stack to clobber other parts of your system.
2022-12-22 13:39:31 But these stacks won't even be in the general address space, so that's already impossible.
2022-12-22 13:40:04 I could just set it up so that returning on an empty return stack resets the thing completely.
2022-12-22 14:30:32 That : EXECUTE >R ; definition won't work in my system, because my threading operates differently.
2022-12-22 14:30:48 I mean the x86_64 system I wrote with nasm.
2022-12-22 14:30:54 But, I think it will work on this processor.
2022-12-22 14:31:19 Because in this system CFAs actually hold executable code.
2022-12-22 14:42:00 was looking at 16 bit PIC stuff just out of curiosity and there is a register to set the max stack address so you get an exception on stack overflow
2022-12-22 14:42:05 very cool feature
2022-12-22 14:42:51 also if you try to access memory with an address held in a register that hasn't been written to it throws an exception
2022-12-22 14:43:38 maybe it was added by someone wrestling with buggy code on the device
2022-12-22 14:47:47 hehe
2022-12-22 14:58:36 circular stack layout?
2022-12-22 15:56:41 decay: In some of his Forth chips, Chuck has used "rings" of registers to implement his stacks. You could continue popping from or pushing to such a ring forever; new items just overwrite the oldest items, and continuing to pop indefinitely would just repeat the last N items pushed over and over.
2022-12-22 15:57:07 add another tracking register and baby, you've got a queue going.
2022-12-22 15:57:28 And his stacks have gotten smaller over the years - I think in the GA144 he's down to eight items.
2022-12-22 15:58:24 And in some algorithms (maybe certain signal processing algorithms?) he actually capitalizes on that "eternally repeat" characteristic. Like maybe those are the coefficients of a filter of some kind, or something else like that that he's going to need over and over.
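(A minimal Verilog sketch of that ring-style stack, with hypothetical names and a GA144-flavored size of eight 18-bit cells: a small register file addressed by a wrapping pointer, so a push overwrites the oldest slot and popping forever just cycles through the last eight values pushed.)

module ring_stack_sketch #(parameter W = 18, D = 3) (
    input  wire         clk,
    input  wire         push,
    input  wire         pop,
    input  wire [W-1:0] din,
    output wire [W-1:0] tos
);
    reg [W-1:0] ring [0:(1<<D)-1];   // eight cells when D = 3
    reg [D-1:0] ptr = 0;             // wraps modulo the ring size

    wire [D-1:0] nxt = ptr + 1'b1;   // truncation gives the wrap-around

    assign tos = ring[ptr];

    always @(posedge clk) begin
        if (push) begin
            ring[nxt] <= din;        // overwrite the oldest slot
            ptr       <= nxt;
        end else if (pop) begin
            ptr <= ptr - 1'b1;       // never "underflows"; it just keeps circling
        end
    end
endmodule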
2022-12-22 19:28:50 Well, the Artix board came and my Xilinx software install can "see" it. So that all looks good.
2022-12-22 19:59:33 KipIngram: exciting! Please keep us updated on your progress
2022-12-22 21:03:48 I will. Guess first things first - just make "something work." Maybe something that interfaces the switches and LEDs on the board.
2022-12-22 21:03:59 Just so I know I'm actually building something and it's running.
2022-12-22 21:04:12 Then maybe get a serial link working with the computer.
2022-12-22 21:05:06 Then turn to the real business.
2022-12-22 21:06:32 So, apparently when you instantiate block RAMs in the Verilog, you can specify the initial contents. So that's how I'll actually get a Forth OS onto the thing. Or maybe I just do some sort of "bootloader" that way, that will take an image over the serial port and start it.
2022-12-22 21:06:55 Or... maybe an approach like that SectorForth we were looking at a few months ago.
2022-12-22 21:08:21 I found a Verilog RS-232 implementation, so I can take some pointers from that.
2022-12-22 21:09:54 I was a little worried that getting my OS to play nice with the board was going to be painful, but it just came up immediately as soon as I found the right place in the menus to do a connect attempt.
2022-12-22 22:07:13 Are you thinking you'll use verilog instead of VHDL?
2022-12-22 22:11:26 Oh, yes.
2022-12-22 22:11:54 Verilog has a lot less "mess" associated with it. I regard Verilog vs. VHDL as somewhat like C vs. Ada.
2022-12-22 22:12:52 So the six-bit opcodes work out quite nicely for the conditional specs. There are six basic conditional cases: < <= = != >= >
2022-12-22 22:13:04 So I'll encode one of those in three bits of the six bit field.
2022-12-22 22:14:00 That leaves three bits, and those will be flags. One to imply the implicit 0 argument, one to imply unsigned vs. signed, and one to imply that "dot prefix" I use to keep the deepest argument rather than dropping it.
2022-12-22 22:14:16 So I'll be able to represent all of the words in that family.
2022-12-22 22:14:29 Now, I haven't thought much about the logic that will be required yet. We'll see.
2022-12-22 22:15:29 I already see a bit of a hassle around those, because these words take an action - return or double return - rather than leaving a flag.
2022-12-22 22:16:20 So some of them, like <; or =; or whatever - any of them that take two arguments - drop both arguments. And I've already figured out that double drops are a little hard on the logic.
2022-12-22 22:16:48 It's not actually the double drop that's the problem - it's getting the next cell down, which has to go into the top of stack register.
2022-12-22 22:17:37 I may wind up having to have a couple of those instructions take two cycles.
2022-12-22 22:18:01 I'd really really like not to, though - the uniformity of one instruction, one cycle is pleasing.
2022-12-22 22:18:30 I like the idea of absolutely precise timing.
2022-12-22 22:26:33 The basic problem is that the RAM has two ports. As long as you're only pushing or popping, you can use one port to address the cell you'll need to read if you pop, and the other to address the cell you'll need to write if you push.
2022-12-22 22:26:57 But however you slice it, a double pop leads to you needing a third cell.
2022-12-22 22:27:28 Now, you don't need both of those cells at the same time, so it can probably be worked out. It's just a little more complicated.
2022-12-22 22:27:53 And of course I could write all of these in Forth, but then they'll be slow.
2022-12-22 22:30:33 I'll have to let things vary depending on what I'm doing. If I'm double popping, then I just need to address the two cells that will fill the registers. If I'm pushing, it'll need to address the right two cells for that.
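(A minimal Verilog sketch of evaluating that six-bit conditional field. The bit layout and operand ordering are guesses for illustration, not the actual encoding: three bits pick the relation, one flag substitutes an implicit 0 for the second operand, one selects unsigned vs. signed comparison, and one is the "dot prefix" keep flag.)

module cond_sketch #(parameter W = 32) (
    input  wire [5:0]   op,     // [2:0] relation, [3] implicit-0, [4] unsigned, [5] "dot" keep
    input  wire [W-1:0] tos,
    input  wire [W-1:0] nos,
    output wire         taken,  // 1 = perform the (double) return
    output wire         keep    // 1 = dot prefix: don't drop the deepest argument
);
    // Operand order follows the usual Forth convention: "a b <" tests a < b,
    // and the implicit-0 forms test TOS against literal zero.
    wire [W-1:0] lhs = op[3] ? tos       : nos;
    wire [W-1:0] rhs = op[3] ? {W{1'b0}} : tos;

    wire eq = (lhs == rhs);
    wire lt = op[4] ? (lhs < rhs)                        // unsigned compare
                    : ($signed(lhs) < $signed(rhs));     // signed compare

    reg t;
    always @* begin
        case (op[2:0])            // six relations; two codes left over
            3'd0: t = lt;         // <
            3'd1: t = lt | eq;    // <=
            3'd2: t = eq;         // =
            3'd3: t = ~eq;        // <>
            3'd4: t = ~lt;        // >=
            3'd5: t = ~(lt | eq); // >
            default: t = 1'b0;
        endcase
    end
    assign taken = t;
    assign keep  = op[5];
endmodule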