2022-04-18 01:52:35 Well, did a massive sudo dnf update earlier.
2022-04-18 01:52:44 Took a while - they always make me nervous.
2022-04-18 01:52:59 My first work with Linux 20 years or so ago was at a time when it wasn't always stable.
2022-04-18 06:55:03 So, I do have this item in my system (via locate):
2022-04-18 06:55:06 /usr/lib64/dri/i915_dri.so
2022-04-18 06:56:05 and the linux_hardware.org probe identifies "i915" as my graphics hardware driver.
2022-04-18 06:56:28 So maybe the piece I'm missing is *not* the lowest level piece - maybe it's some intermediate piece that I just haven't learned about yet.
2022-04-18 06:58:04 Also, lsmod reveals this:
2022-04-18 06:58:06 [kipingram@lenovo]$ lsmod | grep i915
2022-04-18 06:58:08 i915                 3067904  33
2022-04-18 06:58:10 i2c_algo_bit           16384  1 i915
2022-04-18 06:58:12 ttm                    81920  1 i915
2022-04-18 06:58:14 drm_kms_helper        315392  1 i915
2022-04-18 06:58:16 cec                    69632  2 drm_kms_helper,i915
2022-04-18 06:58:18 drm                   634880  12 drm_kms_helper,i915,ttm
2022-04-18 06:58:20 video                  57344  2 thinkpad_acpi,i915
2022-04-18 09:38:43 Ok, progress. It was a tortuous maze, but I now have the 'clinfo' command detecting 2 platforms - one provided by mesa (not promising - probably isn't using the hardware) but also one "Intel Gen OCL Driver." That one lists a lot of "profile" information.
2022-04-18 09:39:02 However, it detects zero "devices" associated with either platform.
2022-04-18 09:39:11 So I guess something ELSE is missing.
2022-04-18 09:40:08 It is seeing something, though, because I get this: 'cl_get_gt_device(): error, unknown device: 9b41' and that 9b41 is a code I saw in my travels yesterday - it's something associated with my graphics hardware.
2022-04-18 09:40:20 So it looks like the hardware is being seen, but isn't being understood.
2022-04-18 10:00:04 Ah - I think my driver version is just old.
2022-04-18 10:06:25 My hardware requires OpenCL v 3.0, and print(platform.version) in Python gives me 1.2.
Not 100% sure that's the same version, but it seems likely, particularly given the convoluted route I followed to get a *.icd file into /etc/OpenCL/vendors
2022-04-18 10:06:44 So hopefully all I need now is a newer .icd file to drop in there.
2022-04-18 10:14:44 I renamed a file to remove the other platform that I'm not interested in.
2022-04-18 10:23:44 That 9b41 string is right there in just about the first line of that "probe" I linked you to last night, in the ID string for my graphics hardware.
2022-04-18 11:14:23 All that icd file does is point to a file called libcl.so elsewhere on my system.
2022-04-18 11:14:53 That file originally came from a package called beignet that I installed via conda. But I now believe beignet is a "legacy" thing, so it's not surprising it's old.
2022-04-18 11:15:26 I tried to install a currently maintained package that announced that it provided libcl.so, and then aimed the icd file at that, but that just got me the platform error back. :-|
2022-04-18 20:02:12 I like programming with forth. But the speed is getting to me. It is better than a shell but way off from C's performance, even without context switches.
2022-04-18 20:02:32 It would probably be faster with a JIT
2022-04-18 20:02:41 but I am not sure. Any experiments to share?
2022-04-18 20:03:37 I love how flexible and easy it is to reason about errors.
2022-04-18 20:08:16 What are you measuring? Compiled speed code, or interpreted code?
2022-04-18 20:08:26 interpreted code.
2022-04-18 20:08:36 Ok - yeah - that's slow.
2022-04-18 20:08:51 how to make it compiled speed code?
2022-04-18 20:08:51 Compiled code is usually within a "factor" of C - like say 30% of C, hopefully.
2022-04-18 20:09:19 What you type is interpreted, but once a word is executed by the interpreter, then all of the stuff it contains is compiled.
2022-04-18 20:09:32 any tutorials or material to get up to speed on the compilation?
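[For context on the *.icd files discussed above: under the Khronos ICD loader scheme, each file in /etc/OpenCL/vendors is just a one-line text file naming the vendor's OpenCL implementation library, which the loader then dlopens. A hypothetical example (the filename and path here are illustrative, not taken from the conversation):]

```
/usr/lib64/beignet/libcl.so
```

So "dropping in a newer .icd file" only helps if the library it points at actually supports the device; the file itself carries no version information.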
2022-04-18 20:09:39 So if you have a bunch of nested loops in a compiled word, they run at compiled Forth speed.
2022-04-18 20:10:45 One way to think of it is that it's really Forth primitives that do the actual work.
2022-04-18 20:10:55 Definitions just specify what order to execute primitives in.
2022-04-18 20:11:39 Each primitive contains its "business code" - the job that primitive is defined as doing for you - and it also contains NEXT - an extra bit of code that connects it to the primitive that comes after it.
2022-04-18 20:11:48 NEXT isn't actually doing any work for you - it's overhead.
2022-04-18 20:12:08 Like for instance the 1- primitive usually just decrements a register. That's one instruction of "work."
2022-04-18 20:12:21 But then it will have three or four or five instructions for NEXT.
2022-04-18 20:12:28 So percentage-wise it has a lot of overhead.
2022-04-18 20:12:50 Other primitives do more instructions of work, but have the same NEXT, so the percentage is better on them.
2022-04-18 20:12:58 On my computer my NEXT takes about one nanosecond.
2022-04-18 20:13:33 On the other hand, looking a word up in the dictionary when I've typed it might take tens of microseconds.
2022-04-18 20:14:05 Well, probably not that long. I measured it the other day and shared the number here, and wrote it down, but it's not in front of me.
2022-04-18 20:14:29 one solution is to use an optimizing native code Forth
2022-04-18 20:14:39 But interpreted Forth is probably going to have "Python-like" performance - maybe a little better, maybe a little worse, depending on the Forth and the Python.
2022-04-18 20:15:03 Thinking of compiled code, work you do juggling the stack around isn't really working for you either.
2022-04-18 20:15:21 But in C, work you do moving function parameters onto the stack and results off of the stack doesn't either.
2022-04-18 20:15:32 Every system has a "way of operating" that brings some "side work" in with it.
2022-04-18 20:16:04 The usual way of optimizing a Forth application is to profile it and see where your code is spending most of its time.
2022-04-18 20:16:09 the thing is that it's silly to expect interpreted Forth to have any real speed at all
2022-04-18 20:16:15 Then you can consider writing the innermost bits of that in assembly language.
2022-04-18 20:16:24 compiled Forth is where you can get some speed
2022-04-18 20:16:36 Your assembly can do just precisely what it needs to, with no NEXT and so on.
2022-04-18 20:16:52 by optimizing, btw, I mean code that uses stuff like inlining, constant folding, etc.
2022-04-18 20:17:39 A middle ground on the way to assembly is to have a system that can take a string of primitives and turn them into one primitive.
2022-04-18 20:17:49 Then you just get one NEXT for several primitives of "work."
2022-04-18 20:18:08 you might as well just go with subroutine threading
2022-04-18 20:18:25 combined with inlining and constant folding
2022-04-18 20:18:37 https://almy.us/forthcmp.html seems too good to be true
2022-04-18 20:18:44 any free alternatives for it?
2022-04-18 20:19:32 I think subroutine threading has downsides. And I think whether or not it improves performance is processor dependent, but I'm not prepared to make a call on that for any particular processor.
2022-04-18 20:19:38 Or, at least, it used to make a difference.
2022-04-18 20:19:44 Back when I was paying attention to processors.
2022-04-18 20:19:51 It may not anymore; a lot has changed.
2022-04-18 20:20:06 eww shareware
2022-04-18 20:20:17 But anyway, joe9, if you're measuring the speed of interpreted code, you're not going to be giving Forth a fair shake at all.
2022-04-18 20:20:42 Interpreted Forth is meant to be good at what you just said you liked about it - the interactivity, the ease of development.
2022-04-18 20:21:10 if you want anything to run with any speed just put it in a colon definition
2022-04-18 20:21:14 KipIngram: i am looking for ways to get to compiling forth.
2022-04-18 20:21:25 interpreted Forth will always be slow because it's parsing text as it goes along
2022-04-18 20:21:47 but I don't get why one would want to interpret most of one's Forth code
2022-04-18 20:21:55 it is not the parsing that is slow. I can live with that.
2022-04-18 20:22:06 it is the execution speed with indirect threading.
2022-04-18 20:22:13 why use indirect threading
2022-04-18 20:22:26 flexible and can be sandboxed.
2022-04-18 20:22:41 use native code subroutine threading with inlining combined with constant folding
2022-04-18 20:22:48 native code can be sandboxed too
2022-04-18 20:22:55 just run your code in qemu
2022-04-18 20:24:00 or run it on an embedded device isolated from your development machine by a serial connection
2022-04-18 20:24:24 there's no way that indirect threading will be fast
2022-04-18 20:24:37 if you want speed, at least use direct threading
2022-04-18 20:24:46 But it does make certain things a lot easier to do. Or, more elegant to do.
2022-04-18 20:24:51 Particularly create/does.
2022-04-18 20:25:02 I don't mean any difference in how you use it - I mean how you implement it.
2022-04-18 20:25:21 I've done create does in direct threading and indirect threading, and I found the direct threading implementation "hacky."
2022-04-18 20:25:31 but it's smooth in indirect threading.
2022-04-18 20:26:25 The way I decided to look at it was that if I have something I truly care about performance on, I can optimize it the way I said.
2022-04-18 20:26:33 I implemented (create is separate and doesn't work with does> in zeptoforth) by having
2022-04-18 20:26:34 Profile it, find the parts that matter, and write them in assembler.
2022-04-18 20:26:42 How you're threading just doesn't matter.
2022-04-18 20:27:20 Same argument that's led research labs and stuff to use Python for scientific computing.
2022-04-18 20:27:34 The heavy lifting stuff is in packages that are written in optimized C. Python is the glue.
2022-04-18 20:27:41 Python is a dog performance-wise, but it doesn't matter.
2022-04-18 20:28:01 Usually it's like a percent or less of your code that really sets your performance.
2022-04-18 20:29:01 So, my notion is that the SYSTEM should be written with elegance in mind. Then you write the bits that matter with performance in mind.
2022-04-18 20:29:23 On the occasions where you're doing something where performance is critical, which isn't always.
2022-04-18 20:31:31 to me native code compiling enables things like compiling x + (where x is a value from 0 to 255) as a single instruction
2022-04-18 20:32:47 and for zeptoforth every ounce of performance matters, because an RTOS is built on top of it, and greater performance means lower latency
2022-04-18 20:53:32 Sure - in a specific example like that you're right. You're really right, period, that it's faster. My point is that that's only really "worth something" to you (in terms of computing time) in a small fraction of your code.
2022-04-18 20:54:09 Making a decision that affects your entire system architecture on that basis just isn't the way I want to approach it. I can achieve the very optimizations you're talking about where I need them, if I need them.
2022-04-18 20:54:36 But that's just my preference - your preference is better for you. :-)
2022-04-18 20:55:34 I know that I could point to at least one common processor back in the 1990 period where the call instruction was sufficiently inefficient that direct threading and even indirect threading "outran" subroutine threading.
2022-04-18 20:55:45 I can't say that was common - but there was a case I knew about.
2022-04-18 20:56:07 And that may have something to do with technology that has gone away these days.
2022-04-18 20:56:36 but then there's inlining
2022-04-18 20:56:45 I do agree with you that in "most cases" (and most could well mean practically all) code threading will be the top performer. That's not something I'm going to argue against.
2022-04-18 20:56:48 in zeptoforth at least a lot of the time it doesn't even call words
2022-04-18 20:57:00 it just inlines them in the code that uses them
2022-04-18 20:57:15 Yeah, a good code compiler will inline a lot of stuff - sometimes it's more compact that way.
2022-04-18 20:57:42 for instance in independent words being called a BL, a PUSH, and a POP instruction are executed
2022-04-18 20:57:53 three instructions that just go away with inlining
2022-04-18 20:58:01 I just have a strong appreciation for the elegant structure of indirect threading, so that's how I chose to do it. And if I get in a performance bind I know what I need to go do to keep that decision from biting me.
2022-04-18 20:58:16 my first Forth was an indirect threading Forth
2022-04-18 20:58:18 I can't do it yet - my system doesn't have a profiler or an assembler yet.
2022-04-18 20:58:21 But it will in time.
2022-04-18 20:59:00 There are some other decisions I've made in the name of "architecture rather than performance." I jump to my NEXT at the end of each primitive.
2022-04-18 20:59:27 I did put the address in a register, so it's the fastest possible jump, but it would clearly have been faster (but less compact) to inline NEXT at the end of each primitive.
2022-04-18 20:59:51 But having all my primitives pass through one code point is how I plan to implement profiling, a lot of my multitasking, and so on.
2022-04-18 21:00:04 So I get a payoff from it as well as slightly more compact code.
2022-04-18 21:00:35 I can just pop a new value into that register, and suddenly NEXT can do something new.
2022-04-18 21:01:09 What I plan to have it do is count down a register counter, and every time the register hits zero it will go do whatever task I'm interested in at the time, like see if a task change is needed or whatever.
2022-04-18 21:01:30 I'll do that like every 10k or 100k primitives or something like that, so the cost is amortized over that many.
2022-04-18 21:01:44 The "every time" extra cost will just be a decrement and a non-taken conditional jump.
2022-04-18 21:02:45 I think we just have different priorities, and that's totally ok. There are pride points in having the fastest car in town - no doubt about it.
2022-04-18 21:05:28 I spent some time a decade ago or so "paper diddling" with a Xilinx Spartan 6 design for a Forth processor. And I bent over backwards to make it as fast as possible.
2022-04-18 21:05:44 And yet all the arguments I just made a few minutes ago would have applied to that case too.
2022-04-18 21:06:36 I was really pleased with it having a "free" return instruction, in the performance sense, but then later I found out that practically everyone's FPGA Forth has that. It's just sort of an automatic payoff of how Forth works.
2022-04-18 21:06:55 back
2022-04-18 21:06:58 You can see it coming before you get to it, so it's easy to know to pop the return stack instead of read from memory.
2022-04-18 21:07:11 Ooops - sorry - I firehosed you while you were gone. :-)
2022-04-18 21:07:20 I'd do something similar, but different
2022-04-18 21:07:52 I'd modify : so that if a flag is set it inserts a call to an instrumentation word at the start of each word
2022-04-18 21:08:27 The other performance trick I played on that design was to split conditional jumps into two pieces. There were separate fetch and execute units, with a FIFO in between - the hope was that the fetch unit would "get ahead" of the execute unit, and therefore keep the execute unit pumped with primitives in spite of having to peel through some call layers.
2022-04-18 21:08:31 the instrumentation word would similarly decrement a counter, and when the counter reaches zero, restore it to its max value and write the value of the top of the rstack over serial
2022-04-18 21:08:47 But that means that the fetch unit will see the conditional jump before the execute unit does, and wouldn't know which way to go.
2022-04-18 21:09:21 So I arranged for it to be possible, if the coding situation allowed, for the jump *decision* to be calculated well before the jump POINT.
2022-04-18 21:09:44 And there was a channel that let the execute unit "post" those decisions to the fetch unit, with a tag to identify it.
2022-04-18 21:10:02 The hope was that by the time the fetch unit reached the jump point, the execute unit would already have posted the decision.
2022-04-18 21:10:16 One place where I could see that it would definitely work was processing elements of a linked list.
2022-04-18 21:10:39 Just check p->next first at the top of the loop, post the decision, and THEN go and process the list node however you needed to.
2022-04-18 21:11:06 I think I had four tag values, so you could have up to four-nested loops.
2022-04-18 21:11:51 I didn't do any conditional return stuff back then - I don't think that would have fit that model very well.
2022-04-18 21:12:32 to me that kind of thing feels like premature optimization
2022-04-18 21:13:12 yet some kinds of optimizations make sense to me
2022-04-18 21:13:43 It was the heart of the system - I wanted it to be fast. My whole point here was that I was thinking the same sort of performance-oriented thoughts you espoused a few minutes ago, in spite of the fact that my own argument against worrying too much over that applied to me back then too.
2022-04-18 21:14:11 The main thing I focused on was designing the logic so that there were only two propagation delays between clock ticks.
2022-04-18 21:14:56 I gotta go take care of dinner - back later on.
2022-04-18 21:15:04 talk to you later
2022-04-18 21:15:15 this discussion is fascinating
2022-04-18 21:16:46 one half-baked idea of mine years back was to do a dual stack machine in fpga and have partial reconfiguration of a functional unit area
2022-04-18 21:17:42 but because of the opaqueness of fpga bitstreams I parked it and did not think of it until I heard of BitGrid
2022-04-18 21:20:43 and use ideas from FlowBasedProgramming to do multi-core programming
2022-04-18 21:21:41 travisb_: RTOS? basically hard real time or?
2022-04-18 21:24:18 it's soft realtime in that it doesn't do hard deadlines unless you tell it to
2022-04-18 21:25:16 and when it does hard deadlines you can always trap them
2022-04-18 21:25:30 e.g. to do cleanup or to display an error message or simply to ignore them
2022-04-18 21:25:47 or to reboot
2022-04-18 21:26:48 it's a priority-scheduled, preemptively-multitasking RTOS
2022-04-18 21:27:14 for robotics?
2022-04-18 21:28:01 I implemented it mainly because I wanted to have my own OS, and if I was going to have my own OS on an embedded platform it might as well be an RTOS
2022-04-18 21:28:09 what kind of control regime is usually used on top? subsumption model?
2022-04-18 21:28:59 nah
2022-04-18 21:29:36 just looked up the "subsumption model" thing and apparently it is an AI framework for realtime robotics
2022-04-18 21:30:45 yeah, it is stupidly simple
2022-04-18 21:31:07 zeptoforth's RTOS is built around the priority scheduler, task queues (basically the inner workings of semaphores), semaphores, locks, queue channels, rendezvous channels, bidirectional channels (basically rendezvous channels with builtin response messages), and byte streams
2022-04-18 21:31:48 with a bit of thought you can get emergent behaviours that make a robot look at least lizard-intelligent
2022-04-18 21:32:43 yeah, I haven't done anything with robotics
2022-04-18 21:32:54 even though you could write something like that on top of zeptoforth
2022-04-18 21:33:21 I always recommend https://youtube.com/watch?v=8CXReb7f0Eo
2022-04-18 21:34:13 for robotics stuff
2022-04-18 21:34:19 I'll look at that when I get the chance
2022-04-18 21:34:31 the STC forths are very slow as per the Moving Forth paper.
2022-04-18 21:35:15 but a practical STC Forth isn't a pure subroutine-calling Forth, and btw Moving Forth doesn't really cover modern architectures
2022-04-18 21:35:39 like both zeptoforth and Mecrisp-Stellaris make heavy use of inlining and constant folding
2022-04-18 21:37:00 and Mecrisp-Stellaris at least is damn fast
2022-04-18 21:49:46 I strongly suspect STC Forths on branch-predicting processors are significantly faster than DTC Forths
2022-04-18 22:20:50 travisb_: depends on the branch prediction heuristics the unit was built with
2022-04-18 22:25:59 Does Mecrisp-Stellaris run as fast as C compiled code?
2022-04-18 22:26:05 compiled code.
2022-04-18 22:27:13 joe9: it's probably the closest you'll get to that with a FLOSS Forth
2022-04-18 22:27:29 is there any place with those figures?
2022-04-18 22:27:50 it remembers the top few items on the stack in registers, and it reassigns registers to stack positions rather than juggling them
2022-04-18 22:27:58 if you use an -ra compiler that is
2022-04-18 22:28:19 like SWAP for instance compiles to no instructions at all in many cases
2022-04-18 22:28:26 it just reassigns the two top stack registers
2022-04-18 22:29:08 joe9: I wouldn't have stats offhand
2022-04-18 22:29:42 Zarutian_HTC, well with STC a subroutine call is an unconditional branch
2022-04-18 22:30:01 ACTION envies Matthias's compiler-fu
2022-04-18 22:31:05 my own zeptoforth is probably slower because its design is centered around a TOS register combined with the remainder of the stack in RAM