2023-09-28 00:07:27 Wow - that's a small font. 2023-09-28 00:07:39 Gonna have to crank my browser magnification up for that one. 2023-09-28 00:07:44 Getting old sucks. 2023-09-28 01:05:29 Dang. You're right. Apparently, I have it magged up to 1.7x. 2023-09-28 10:59:34 did someone try to create some kind of async Forth? 2023-09-28 10:59:45 do you think it's possible to have async words? 2023-09-28 11:06:19 You can have a simple form of coroutines 2023-09-28 11:06:40 : co 2r> swap 2>r ; 2023-09-28 11:20:58 GeDaMo, oh.. well, it involves playing with the return stack, but.. that's too advanced for me for now, could you elaborate a bit more? 2023-09-28 11:25:02 Instead of returning to the word that called co, it returns to the one before that 2023-09-28 11:25:15 You can use it to switch between two words in the middle 2023-09-28 11:26:16 I think this video uses a similar technique https://www.youtube.com/watch?v=mvrE2ZGe-rs 2023-09-28 11:28:59 Also, there have been multi-tasking, multi-user Forth systems 2023-09-28 11:29:48 GeDaMo, interesting, thanks 2023-09-28 11:31:20 my question was more about higher-level systems, e.g. async implemented outside the Forth implementation. For instance, if I have 'http.. get-url' in a sync system, this Forth word will block the system until the website is downloaded, but what would 'http.. async get-url' do? It won't block the Forth system.. but what will be on the stack? Some integer id of some pending task, or what? 
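A minimal sketch (Python, not any particular Forth) of the return-stack trick behind `: co 2r> swap 2>r ;`: a tiny threaded-code interpreter where a word is a list of entries, the return stack holds (word, index) resume points, and `co` exchanges the current resume point with the one below it. The words `main` and `ping` are made up for illustration.

```python
def say(msg):
    """Primitive that appends msg to the output log."""
    def prim(out):
        out.append(msg)
    return prim

def run(words, start):
    out, rstack = [], []
    word, i = words[start], 0
    while True:
        if i == len(word):               # end of word: implicit EXIT
            if not rstack:
                return out
            word, i = rstack.pop()
        else:
            op = word[i]
            i += 1
            if callable(op):
                op(out)
            elif op == 'co':
                # 2r> swap 2>r: swap our resume point with the one under
                # it, so we "return to the one before that".
                rstack[-1], (word, i) = (word, i), rstack[-1]
            else:
                rstack.append((word, i))  # normal call
                word, i = words[op], 0

# 'main' calls 'ping'; co bounces control back into 'main', and when
# 'main' finishes, control resumes in the middle of 'ping'.
words = {
    'main': ['ping', say('back in main')],
    'ping': [say('ping: part 1'), 'co', say('ping: part 2')],
}
```

Running `run(words, 'main')` produces the messages interleaved: part 1 of ping, then main's message, then part 2 of ping.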
2023-09-28 11:39:46 ACTION just recalled how to make an OR gate with two diodes 2023-09-28 11:44:10 the problem I had in designing a cartridge for the CommanderX16 with a 512Kibi ROM is that its ROM banks would have been split between 32-55 and 64-79 (both inclusive) 2023-09-28 11:45:32 512KibiBytes 2023-09-28 11:50:10 another CX16-related thing is I think I have figured out how to get a 640x480 pixmap to work with the VERA picture processing unit 2023-09-28 11:58:51 put one of the layers into tilemap mode with a 32x30 tilemap that is using 16x16 pixel tiles and then, using scanline interrupts, swap in different tileset pixel data 2023-09-28 14:55:11 https://vid.priv.au/watch?v=v7Mt0GYHU9A 2023-09-28 14:56:17 rendar: I have "double returns" in my language. I don't think of it in terms of coroutines, though - I just tend to build my control structures "vertically" up and down a call tree. 2023-09-28 14:56:29 (I have conditional returns and double returns) 2023-09-28 14:58:55 That video is Aaron Hsu - xelxebar brought him to my attention. I'm pretty much loving everything I hear him say. 2023-09-28 15:00:31 He wrote an APL compiler in APL 2023-09-28 15:00:57 https://github.com/Co-dfns/Co-dfns 2023-09-28 15:01:37 Yeah, that runs directly on the GPU. 2023-09-28 15:01:45 Or at least he's working in that direction. 2023-09-28 15:09:11 He mentioned in one video I watched that that compiler was 74 lines long. 2023-09-28 15:09:30 His lines are impressively long, though - each one is a "compiler pass." 2023-09-28 15:23:19 KipIngram, i see, but how do double returns resemble coroutines? can you give a practical example? 2023-09-28 18:00:52 I don't actually think they do - GeDaMo just mentioned returning to "the one before that" and those are words I might use to describe my double return. I was also scratching my head over how that would fit with coroutines. 2023-09-28 18:01:13 but I couldn't think of what else "the one before that" would mean. 
2023-09-28 18:01:20 Other than the caller's caller. 2023-09-28 18:01:57 I think coroutines are sometimes implemented by exchanging the program counter with the top item of the return stack. 2023-09-28 18:02:40 That's how Chuck's 18A implements EXECUTE as well. 2023-09-28 18:02:55 And his top return item is cached in a register, so it's really just a reg/reg swap. 2023-09-28 18:03:23 Which is a pretty cool way to do it when you think about it. 2023-09-28 18:27:48 KipIngram: Long lines? He keeps lines within 80 columns and averages less than 40 characters wide. 2023-09-28 18:28:52 Just did a quick analysis on the tokenizer (TK.aplf) and parser (PS.aplf). 2023-09-28 18:35:49 oh, well, I saw stuff that looked like wrapped lines. I mean, it spanned over eight or nine lines, and they all looked basically full. Then every now and again you'd see a short line - I assumed that was where a long line was "ending." 2023-09-28 18:36:07 I was just seeing that on some of his slides. 2023-09-28 18:37:51 Oh, I see. Definitely likely that slides got blown up. 2023-09-28 18:38:42 Hsu calls the Co-dfns architecture/style a "Linear Data Flow" model. 2023-09-28 18:39:10 So, he clearly referred to a 74-line compiler. You're saying that's 74 lines no longer than 80 characters? That was part of what "primed" me to think the lines might be long too - the prospect of getting a compiler into that few lines. 2023-09-28 18:39:23 Input literally just goes in at the top and flows straight down, line by line, to the end. 2023-09-28 18:39:43 Yeah, he stressed the fact that there were no loops, no branches, etc. 2023-09-28 18:39:54 Yup. A 500-ish character compiler :) 2023-09-28 18:43:14 Loops and branches are kind of anathema to good GPU code, since they create data dependencies that can't be vectorized. 2023-09-28 18:46:27 To be clear, that 500-char count is just the compilation pass, i.e. generating the AST. It doesn't include tokenization or code generation. 
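Going back to EXECUTE-as-a-swap for a moment: a toy model (not real hardware) of the idea that exchanging the program counter with the cached top of the return stack gives you both call and return from one operation. The register names P and R and the addresses are illustrative.

```python
# P models the program counter, R the cached top of the return stack.
# Exchanging them "calls" whatever address was in R; doing the same
# swap again acts as the return.
def ex(cpu):
    cpu['P'], cpu['R'] = cpu['R'], cpu['P']

cpu = {'P': 0x10, 'R': 0x80}   # made-up addresses
ex(cpu)                        # "call": now executing at 0x80, 0x10 saved in R
ex(cpu)                        # "return": back at 0x10
```

The appeal is that the machine needs no separate return mechanism for this path: one reg/reg swap covers both directions.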
2023-09-28 18:47:38 The code generation is a bunch of C that gets passed off to your friendly neighborhood C compiler and depends on arrayfire for the low-level GPU stuff. 2023-09-28 18:53:30 The SLOC ratio between Algol-like languages and Hsu-style APL tends to be on the order of 10X for small things and 100X for full systems. 2023-09-28 18:54:41 His modern, full-fledged PS.aplf is more like 250 SLOC (~10 kchars), counting error handling and all that. 2023-09-28 18:55:52 100X that, i.e. 25,000 lines, for a production-level compiler, is on the smaller side. 2023-09-28 18:57:27 One crack-brained idea I have for marrying Forth's implementation simplicity with APL is to write a Forth backend for Co-dfns. 2023-09-28 20:15:46 Oh, neat. I don't know exactly what my mashup is going to look like yet, but I want some of all of these things. 2023-09-28 20:16:22 Although I haven't given any serious thought to GPU operation. I recognize it as the best computing resource in most computers, though, for real number crunching. 2023-09-28 20:17:01 I guess provided you have the right kind of problem. 2023-09-28 20:35:08 I have thought about trying to learn more about this: 2023-09-28 20:35:10 https://www.khronos.org/api/spir 2023-09-28 20:35:25 and see if I could make a Forth GPU interface based on it. 2023-09-28 20:35:50 Cut the middle layers out. It seems like the closest thing to "GPU machine code" I can find. 2023-09-28 20:53:29 As a Forth person, what I really WANT to do is directly produce the binary "stuff" that guides the GPU through its actions. I have no idea if that's even possible, though, realistically. 2023-09-28 21:21:18 KipIngram: not being able to do this is one of the reasons why I don't like GPU programming 2023-09-28 21:21:27 there is sort of a portable machine language for GPUs 2023-09-28 21:21:32 but it's once again an abstraction 2023-09-28 21:31:37 Yeah. I probably should still look into it sometime, but... it just doesn't carry the appeal of CPU-oriented work. 
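The "Linear Data Flow" shape described above - input goes in at the top and flows straight down, line by line, with no loops or branches at the program level - can be sketched as a fixed pipeline of whole-array passes. These passes are made-up stand-ins, not Co-dfns' actual ones.

```python
from functools import reduce

# The "program" is just the list of passes, applied top to bottom.
# Each pass transforms the whole array at once; there is no per-element
# looping or branching at this level.
passes = [
    lambda src: src.split(),                 # "tokenize" (stand-in)
    lambda toks: [t.upper() for t in toks],  # "normalize" (stand-in)
    lambda toks: list(enumerate(toks)),      # "tag" (stand-in)
]

def run_passes(src):
    return reduce(lambda data, p: p(data), passes, src)
```

Each line of the real thing does vastly more work, but the control shape is the same: a straight chute from source to result, which is exactly what vectorizes well on a GPU.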
2023-09-28 21:31:53 I mean, you're chatting with the guy who wishes Intel would open the microcode up. 2023-09-28 21:32:01 I wanna program THE HARDWARE. 2023-09-28 21:32:45 But I suppose the microcode probably changes details with every new CPU, so I guess I get why they don't do that. 2023-09-28 21:32:57 It would just create another layer of backward-compatibility fences. 2023-09-28 21:33:32 And maybe would expose some of their IP as well. 2023-09-28 21:33:54 I also wish FPGA vendors would open their bitstream format. 2023-09-28 21:34:15 you can't really change the microcode 2023-09-28 21:34:16 And make it possible to change "pieces" of the FPGA's configuration, on the fly. 2023-09-28 21:34:29 you can only patch it a little 2023-09-28 21:34:32 Malleable FPGA hardware, anyone? :-) 2023-09-28 21:34:46 also, most parts of modern Intel CPUs are not actually microcoded 2023-09-28 21:34:52 I didn't know - I figured it was just "reloadable." 2023-09-28 21:34:57 no 2023-09-28 21:35:05 it's extremely sophisticated 2023-09-28 21:35:12 I'm sure. 2023-09-28 21:35:24 the microcode patch RAM is structured a bit like a cache 2023-09-28 21:35:36 i.e. it has tags matching addresses in the microcode ROM, and data 2023-09-28 21:35:56 on microcode fetch, the cache is consulted. A hit is loaded from RAM, a miss from ROM. 2023-09-28 21:36:06 So by loading cache lines into the patch RAM, you can patch the microcode. 2023-09-28 21:36:15 Ah, that makes sense. 2023-09-28 21:36:24 There's also a mechanism to pattern-match non-microcoded instructions and make them microcoded. 2023-09-28 21:36:36 so you can fix instructions that are not microcoded. 2023-09-28 21:36:45 This comes at a big performance penalty, though. 2023-09-28 21:36:49 Interesting. So it's designed to let them "tweak" things. 2023-09-28 21:36:53 Not totally re-write things. 2023-09-28 21:36:53 yeah 2023-09-28 21:36:56 correct 2023-09-28 21:37:02 That's sensible, actually - given their needs. 
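The patch mechanism described above, sketched in miniature: the patch RAM behaves like a small cache keyed on microcode-ROM addresses, and a fetch that hits the patch RAM wins over the ROM. The addresses and micro-op names here are invented for illustration.

```python
# Fixed microcode ROM, indexed by address (contents are made up).
MICROCODE_ROM = {0x100: 'uop-a', 0x101: 'uop-b-buggy'}

# Patch RAM: tag (ROM address) -> replacement micro-op.
patch_ram = {}

def ufetch(addr):
    # On every microcode fetch the patch RAM is consulted first;
    # a hit loads from the RAM, a miss falls through to the ROM.
    return patch_ram.get(addr, MICROCODE_ROM[addr])

# Loading one "cache line" into the patch RAM fixes that address.
patch_ram[0x101] = 'uop-b-fixed'
```

This matches the "tweak, not rewrite" design point: the patch store only needs to be as big as the handful of entries being overridden, not the whole ROM.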
2023-09-28 21:37:05 a full microcode RAM would be way too big and power-hungry. 2023-09-28 21:37:13 another thing Intel does is "chicken bits" 2023-09-28 21:37:22 every new HW feature has a configuration bit somewhere to turn it off 2023-09-28 21:37:40 Yeah. We have a big RAM in our SSDs at work for logical-to-physical page translation and they're always bellyaching over not wanting it to get any bigger. 2023-09-28 21:37:46 so when a new feature doesn't work, the BIOS can flip the chicken bit to turn the feature off and make the CPU work 2023-09-28 21:38:11 But when you have to map every 16kB page of a 100 TB logical drive, that's a pretty big RAM. 2023-09-28 21:38:16 oh yes 2023-09-28 21:38:43 Actually, it doesn't fit in RAM any more as of this latest generation - it's paged in and out of flash. 2023-09-28 21:39:23 And our older stuff used 4kB logical pages - the switch to 16kB was to reduce that RAM requirement. 2023-09-28 21:39:56 NAND flash turns out to be super interesting stuff. 2023-09-28 21:40:08 Enough complexity in the things to give them some really involved characteristics. 2023-09-28 21:40:48 makes sense 2023-09-28 21:40:56 so cache locality may now start to matter for flash. 2023-09-28 21:41:22 Does the SSD expose 16 kB sectors to the host? 2023-09-28 21:41:39 Yes - that's what the host sees. An array of 16kB logical pages. 2023-09-28 21:42:05 We have on-the-fly hardware compression, though - we can store several logical pages in one physical page. 2023-09-28 21:42:09 sometimes. 2023-09-28 21:42:27 And there's also a layer of error detection and correction in there. 2023-09-28 21:42:43 So what actually gets written to the physical flash is "code pages" that come out of the ECC encoder. 2023-09-28 21:42:56 They're around 8kB. 2023-09-28 21:43:33 So that logical table has to identify how the page you've asked for is mapped to code pages. 2023-09-28 21:43:56 It's nice when it's in just one code page. 2023-09-28 21:44:19 Which it often is if you have compressible data. 
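The map-sizing pressure mentioned above checks out with quick arithmetic, assuming 100 TB means 100 * 10**12 bytes and guessing roughly 5 bytes per map entry (neither figure is from the conversation):

```python
TB = 10**12

def l2p_entries(capacity_bytes, page_bytes):
    # One logical-to-physical map entry per logical page.
    return capacity_bytes // page_bytes

entries_16k = l2p_entries(100 * TB, 16 * 1024)  # about 6.1 billion entries
entries_4k  = l2p_entries(100 * TB, 4 * 1024)   # 4x as many with 4kB pages
ram_bytes   = entries_16k * 5                    # ~30 GB at a guessed 5 B/entry
```

Even at 16kB pages the table runs to tens of gigabytes, which is why it no longer fits in RAM and why the move from 4kB to 16kB pages (a 4x reduction in entries) was attractive.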
2023-09-28 21:44:31 But it can be spread across as many as three, worst case. 2023-09-28 21:44:51 So those all get fetched in parallel, and the FPGA stitches the pieces together and shoves it through the ECC decoder. 2023-09-28 21:45:09 goes from there to the decompressor and then to the host via NVMe. 2023-09-28 21:45:36 Well, sorry - that's out of order. 2023-09-28 21:46:03 Each code page goes through the ECC, and then what comes out of there is stitched into a logical page and decompressed. 2023-09-28 21:46:49 And all of that is done in hardware - there's no firmware involved unless some sort of a failure occurs. 2023-09-28 21:47:15 There's some firmware in the NVMe part. 2023-09-28 21:48:09 Anyway, thanks for the info on the CPU stuff - that's interesting. 2023-09-28 21:48:32 Now I don't have to feel like I'm missing out on something. :-) 2023-09-28 22:44:26 Isn't "portable machine language for GPUs" basically the same thing as an ISA? 2023-09-28 22:46:41 I don't think there's a strong technical difference between having an ISA and having hardware that abstracts away the Maxwell/QED equations for us. 2023-09-28 22:46:54 hmm... what would be in such an ISA? some way to write vertex and fragment shaders, no? 2023-09-28 22:47:54 xelxebar: you haven't seen a simple computer made up of air pressure control units, then 2023-09-28 22:48:32 no QED or Maxwell equations there 2023-09-28 22:51:05 Replaced by Laplace's equation. To get the behavior of the water/air/mechanical analogues correct, you have to mess with a tangle of differential equations either way. 2023-09-28 22:52:21 I mean, you don't *have* to. You can fiddle and tinker with things experimentally, but the point stands that the hardware is providing a kind of abstracted interface.
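The SSD read path described earlier - fetch the (up to three) code pages a logical page maps to, ECC-decode each, stitch the pieces together, then decompress - can be sketched as a pipeline. Every stage body here is a trivial placeholder; in the real drive this is all done in hardware.

```python
def ecc_decode(code_page):
    # Stand-in: pretend the ECC always passes and just yields the payload.
    return code_page['payload']

def decompress(data):
    # Identity stand-in for the hardware decompressor.
    return data

def read_logical_page(code_pages):
    # The code pages are fetched in parallel in hardware; here we just
    # decode each one, stitch the pieces, and decompress the result.
    pieces = [ecc_decode(cp) for cp in code_pages]
    return decompress(b''.join(pieces))
```

For example, a logical page spread across two code pages comes back as the concatenation of the decoded pieces, run through the decompressor.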