2024-10-15 13:09:57 ^^ Here here. I'd say the same thing about letting Verilog do your logic synthesis for you.
2024-10-15 13:10:54 Or, rather, hear hear, I guess. :-
2024-10-15 13:10:54 :-|
2024-10-15 13:25:09 veltas: I am willing to give credit where credit is due - I think the extent to which modern C compilers have been "optimized" is quite impressive. I think that in GENERAL they produce extremely good code. But you're absolutely right - in specific targeted situations, the stuff they do can look damn odd and it's not at all rare for you to be able to do much better on your own. I guess the compiler teams would just argue that these are "corner cases," and maybe that's a valid point. I still just like having control of what's going on, though.
2024-10-15 13:25:53 In putting together a Forth, for example, the compiler just doesn't give you tight enough control over the registers to get the best result. That simple fact alone means your assembly Forth is going to kick the pants off of a C Forth.
2024-10-15 13:26:08 And we haven't even talked about the generated code itself yet.
2024-10-15 14:03:32 Compilers are very complicated, very clever, and still have a long way to go
2024-10-15 14:04:15 And it's not like they've done a bad job, but it is a 'hard' problem to optimise code, so it's a process of improvement
2024-10-15 14:04:30 It's one of those heavy problems that consumes all available CPU and dev time
2024-10-15 14:36:50 Yeah, if you need to control specific registers, that's different and the compiler doesn't handle that well, since that applies to almost no one other than emulator or VM writers
2024-10-15 14:37:51 The thing I'm pointing out is that what you see as "damn odd" may actually be optimal. There is a good C++ conference presentation I can't find the link to where the speaker kept getting odd results, so he hand-"optimized" a bunch of the compiler output, and the result of that was always slower
2024-10-15 14:39:07 So the often-repeated line about the compiler usually beating a person has more to do with the weirdness of modern architectures like x86, where no one human knows everything you need to know to do an ideal job, than with compilers or compiler writers being geniuses
2024-10-15 14:39:26 OTOH, if you're looking at output for AVR or something, then I'm sure you find cases now and then where a human would have done better
2024-10-15 15:06:16 I think there are situations where compiled code that looks suboptimal is actually better than the 'obvious' approach
2024-10-15 15:06:37 But there's also a huge amount of code that the compiler does a worse job with than a novice would
2024-10-15 15:06:50 But also assembly takes much more time to write than C
2024-10-15 15:07:07 I don't think it's two times longer, I would say it's more than that, for me anyway
2024-10-15 15:08:25 And it's less portable, less maintainable, etc. etc.
2024-10-15 15:25:26 And by portability it's not just about targets, but toolchains. What got me into Forth in the first place is that I wanted to do low-level development but escape the hard toolchain dependencies.
2024-10-15 15:40:50 One of my objections to the way things have gone is precisely that we CAN'T understand what the processor is doing. On the x86 in particular, we don't even really program the hardware anymore - it's a VM itself under the hood. It's how they've maintained backwards compatibility while still improving performance. Which is fine - I have no objection to that in itself. But I'd still like to be able to program at the very bottom layer if I wanted to.
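A minimal sketch (in C, with made-up names) of the register-control point above: in a C inner interpreter, whether ip and sp actually stay in machine registers is the compiler's decision, whereas an assembly Forth can dedicate registers to them outright.

    #include <stdio.h>

    /* Indirect-threaded inner interpreter sketch (made-up names).  In C the
       compiler decides whether ip and sp live in registers; an assembly Forth
       can simply reserve machine registers for them for the whole program. */
    typedef void (*prim)(void);

    static long stack[32];
    static int  sp = 0;          /* data-stack pointer */
    static prim *ip;             /* threaded-code "instruction pointer" */

    static void lit5(void)  { stack[sp++] = 5; }
    static void dbl(void)   { stack[sp - 1] *= 2; }
    static void print(void) { printf("%ld\n", stack[--sp]); }
    static void bye(void)   { ip = NULL; }

    /* A "colon definition" is just a list of pointers to primitives. */
    static prim thread[] = { lit5, dbl, print, bye };

    int main(void) {
        ip = thread;
        while (ip) {             /* NEXT: fetch the next primitive and run it */
            prim w = *ip++;
            w();
        }
        return 0;                /* prints 10 */
    }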
2024-10-15 15:41:11 Let me write my own vm at the level of that one if I choose.
2024-10-15 15:42:42 I used to feel quite strongly about that, but then someone pointed out to me that the bulk of the microcode in the chip isn't updatable - there's only some small resource that supports the installation of firmware "patches."
2024-10-15 15:42:56 That brought in an economic argument that is kind of hard to argue with.
2024-10-15 15:43:15 I'd previously thought that you (or Intel, rather) could just re-write the entire firmware if they wanted to.
2024-10-15 16:38:39 veltas: I would love to see examples of huge amounts of code that the compiler does a worse job on than a novice. Do you have anything specific in mind? Excluding SIMD and such, and assuming it's compiled with optimizations
2024-10-15 16:40:52 veltas: there was a study 20+ years ago (dunno if it still holds) saying programmers write about the same number of lines of code per unit time regardless of language, including assembly, so I think you're right on it taking more than twice as long
2024-10-15 16:43:48 MrMobius: The one that I encountered recently was overflow detection, assuming you're not able to use the larger type, because then it does work nicely
2024-10-15 16:45:16 That's a weak point of C specifically though, but the compiler's not much help
2024-10-15 16:45:33 GCC does have a builtin function for this, at least
2024-10-15 16:45:47 But that's a bit of a hack if we're on the subject of optimising C code
2024-10-15 16:46:26 And GCC also has 128-bit types for all 64-bit platforms, so you can just use a bigger type, and I think it generates the normal overflow-detecting assembly
2024-10-15 16:46:47 But sometimes I'm trying to write C, not GCC-flavor C (maybe that's my mistake though, lol)
2024-10-15 16:47:19 I do personally think assuming GCC or clang makes life in C much nicer
2024-10-15 16:49:09 "not GCC-flavor C" - exactly what I meant by my comment earlier
2024-10-15 16:49:12 Does anyone here know about CET (control-flow enforcement technology) on x86, and how that might work with Forth?
2024-10-15 16:56:10 a Forth may not have a stack of function call returns for that to be relevant?
2024-10-15 16:57:38 MrMobius: Also why are you excluding SIMD? That's moving the goalposts, it's one of the things I had in mind
2024-10-15 16:57:54 I know that SIMD is harder, but there are easy cases that compilers (and C itself) fall short on
2024-10-15 17:03:38 Who was it that basically had a return-oriented Forth?
2024-10-15 17:03:50 mark4 maybe?
2024-10-15 17:34:35 veltas: yeah, to me it's not the compiler's fault if the assembly it generates is slower in order to conform to C. I get the point though that a human may do better if they're not bound to the same rules
2024-10-15 17:36:05 veltas: not moving the goalposts, just clarifying. My original comment on compilers beating humans did not include SIMD
2024-10-15 19:52:52 KipIngram: I suspect veltas could probably pretty consistently beat most C compilers' code quality. It's just that *I* can't!
2024-10-15 19:54:23 In terms of assembly Forths vs. C Forths, probably a simple JIT compiler written in C will beat the best pure interpreter you can write in assembly. But of course the JIT compiler is generating machine code, so you're sort of dropping to assembly anyway
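A small sketch of the overflow-detection point veltas raised above; add_would_overflow is just a name invented for this example. Portable C has to pre-check, while the GCC/clang builtin computes the sum and the overflow condition in one step.

    #include <limits.h>
    #include <stdio.h>

    /* Portable C: check *before* adding, because signed overflow is undefined
       behaviour.  Compilers often won't turn this back into a plain add plus a
       test of the overflow flag. */
    static int add_would_overflow(int a, int b) {
        if (b > 0) return a > INT_MAX - b;
        return a < INT_MIN - b;
    }

    int main(void) {
        int a = INT_MAX, b = 1, sum = 0;

        if (add_would_overflow(a, b))
            puts("portable check: overflow");

    #if defined(__GNUC__)
        /* The builtin mentioned above: GCC/clang compute the sum and report
           overflow together, which typically compiles to the obvious add + jo. */
        if (__builtin_add_overflow(a, b, &sum))
            puts("builtin check: overflow");
    #endif
        (void)sum;
        return 0;
    }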
2024-10-15 19:54:57 Dan Bernstein did a really interesting talk about the death of optimizing compilers
2024-10-15 19:56:39 Basically his claim is that as hardware gets faster, the way we use up its performance is, to a large extent, to throw larger amounts of data at it
2024-10-15 19:57:51 "The death of optimizing compilers" http://cr.yp.to/talks/2015%2E04%2E16/slides-djb-20150416-a4.pdf
2024-10-15 19:57:55 yes, thanks
2024-10-15 19:57:58 I can't see a video of it
2024-10-15 19:58:08 I doubt there is one
2024-10-15 19:58:31 Which means that the ratio between the number of times different pieces of code get executed keeps getting larger and larger. On a 6502, at 300k instructions per second, if you're doing a week-long computation, it might run 180 billion instructions, so the largest possible ratio is that one of your instructions is run 180 billion times and another one is run once
2024-10-15 19:58:35 11 orders of magnitude
2024-10-15 19:58:53 and much more common ratios were a million to one
2024-10-15 19:59:37 but on my laptop I can run 180 billion instructions in about 7 seconds, so if I'm doing a week-long computation, the ratios can be five orders of magnitude larger
2024-10-15 20:00:22 so stuff you do for every piece of data is five orders of magnitude more significant now than it was then
2024-10-15 20:01:22 Bernstein's claim is that optimizing compilers are useful for the stuff that you run often enough that naïve, non-optimizing compilation generates code that's unpleasantly slow, but not so often that it's worth rewriting it in assembly
2024-10-15 20:01:42 and that there's less and less of that code, as a share of all our code
2024-10-15 20:03:08 How did he arrive at that conclusion? I would think 90% or more of desktop code falls in exactly that category
2024-10-15 20:03:38 Well, he outlines his reasoning in the slides GeDaMo linked above. I don't think 90% of desktop code does fall in that category, or people wouldn't have written it in JavaScript
2024-10-15 20:04:08 (he's sort of overlooking the importance of compiler optimization for making JS viable, maybe)
2024-10-15 20:04:37 An example I mentioned the other day in talking about uLisp https://news.ycombinator.com/item?id=41821317 is that traditionally Lisp compilers fetch a code pointer from memory on every function call
2024-10-15 20:05:00 so that if you want to redefine the function being called you can just change that pointer to point at the new function
2024-10-15 20:05:30 that way you don't have to make a pass over all the code in memory changing all the call instructions to that function to call the new version of the function
2024-10-15 20:06:14 but this is something that happens when you interactively redefine a function, which you can't really do more than a few times a minute, and where it probably wouldn't be annoying to wait 100 milliseconds
2024-10-15 20:06:54 iterating over the relocations for a particular symbol and pointing them all at a new function is no longer the kind of thing that takes 100 milliseconds, even if it was in 01977
2024-10-15 20:12:09 9 is uncommon in octal numbers?
2024-10-15 20:13:31 I assume xentrac is taking a long-term view of time :P
2024-10-15 20:13:37 :-)
2024-10-15 20:14:17 I figured it was to avoid a collision when someone defines a word : 1977 ...
2024-10-15 20:14:45 that does work, zelgomer
2024-10-15 20:15:08 I mean it could take 100 milliseconds if you have a million calls to that function I guess?
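The code-pointer indirection described above, sketched in C with throwaway names: redefinition is a single store through the pointer, at the cost of an extra memory fetch on every call; the alternative is direct calls plus patching every call site when a function is redefined.

    #include <stdio.h>

    /* Two versions of a "function" we might interactively redefine. */
    static int square_v1(int x) { return x * x; }
    static int square_v2(int x) { return x * x + 1; }   /* the "redefined" version */

    /* The indirection: every call fetches this pointer from memory,
       so redefinition is just one store. */
    static int (*square)(int) = square_v1;

    int main(void) {
        printf("%d\n", square(5));   /* 25 */
        square = square_v2;          /* "redefine" the function: one pointer store */
        printf("%d\n", square(5));   /* 26 */
        /* The alternative is a direct call compiled into each call site: faster
           per call, but redefining means rewriting/patching every call site. */
        return 0;
    }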
2024-10-15 20:15:19 8 is supported in some octal implementations, though Kernighan and Pike refuse to say who wrote that bug
2024-10-15 20:15:30 lol
2024-10-15 20:25:57 anyway, so I think that's a very simple example of how a performance tradeoff that was a win 50 years ago is now a loss
2024-10-15 20:26:09 specifically due to the kind of thing Bernstein is talking about
2024-10-15 21:06:43 So, a funny thing while thinking about the meta-relocatable Forth: "cell" needs to be a deferred value, like a relocation, because the cell size of the target may be different from the cell size on the build host. Which then got me thinking, maybe the whole thing should be based on lazy evaluation.
2024-10-15 21:08:03 you could do a build for a particular target
2024-10-15 21:08:06 that's what people usually do
2024-10-15 21:08:35 I don't see how that solves the problem
2024-10-15 21:09:27 when you're doing a build for a particular target, cell doesn't need to be a deferred value, because a particular target has a particular cell size
2024-10-15 21:09:27 at compilation time, "create foo 1 , 2 , 3 , foo 2 cells + @" needs to be able to produce 3
2024-10-15 21:09:54 you do need to be clear on when you're referring to target cells and when you're referring to host cells
2024-10-15 21:10:54 I think
2024-10-15 21:10:54 sure, I could have target-cell and build-cell, but the point of doing this was exactly to not have that kind of distinction
2024-10-15 21:11:09 hmmm
2024-10-15 21:11:39 yeah, that seems pretty tough
2024-10-15 21:12:28 consider: 2 cells value z create foo 1 , 2 , 3 , foo z + @
2024-10-15 21:13:48 same thing. z should return a lazy value that, when it's evaluated at the @ there, produces one value, but when z is cross-compiled it writes a different value.
2024-10-15 21:14:24 I don't think it's that difficult, I just broaden my application of deferred evaluations.
2024-10-15 21:14:40 they're no longer strictly relocations
2024-10-15 21:15:16 yeah, you have to be able to do arbitrary Turing-complete computations
2024-10-15 21:16:22 I think I already can :)
2024-10-15 21:16:23 : comp dup 4 = if 2 else dup 8 = if 4 else begin again then then ;  z comp for example
2024-10-15 21:16:56 which is an infinite loop if z is anything but 4 or 8, such as 16
2024-10-15 21:16:59 or 2
2024-10-15 21:17:17 oh...
2024-10-15 21:17:48 I think that's still fine. = would force the evaluation.
2024-10-15 21:18:05 maybe z comp value zcomp to make it available in the target
2024-10-15 21:18:47 yeah, I see what you mean. it's still possible to use it to produce and leak incorrect values.
2024-10-15 21:19:51 well damn
2024-10-15 21:49:23 I think it's probably simpler to try to keep track of which version of a given value is the host version and which is the target version
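A minimal C sketch of that last suggestion, keeping host-side and target-side values explicitly separate; all names are hypothetical, assuming a 64-bit host building a little-endian 32-bit target image, and it mirrors the "create foo 1 , 2 , 3 , foo 2 cells + @" example from the discussion.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical cross-build: host-side values (here, foo) are ordinary
       host-sized integers; any offset *inside the target image* is scaled by
       TARGET_CELL, never by sizeof on a host type. */
    static const size_t TARGET_CELL = 4;      /* the target's cell size, just data */

    static unsigned char image[1024];         /* target image under construction */
    static size_t here = 0;                   /* target dictionary pointer, in bytes */

    /* Target "," : lay down one target cell, little-endian. */
    static void t_comma(uint32_t x) {
        for (size_t i = 0; i < TARGET_CELL; i++)
            image[here++] = (unsigned char)(x >> (8 * i));
    }

    /* Target "cells" : scale by the target's cell size. */
    static size_t t_cells(size_t n) { return n * TARGET_CELL; }

    int main(void) {
        size_t foo = here;                    /* like: create foo */
        t_comma(1); t_comma(2); t_comma(3);   /* 1 , 2 , 3 , */

        uint32_t x;
        memcpy(&x, image + foo + t_cells(2), sizeof x);   /* foo 2 cells + @ */
        printf("%u\n", (unsigned)x);          /* prints 3 */
        return 0;
    }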