2025-02-08 03:26:44 Shark8: I haven't seen any. Someone did try to translate Crenshaw himself into Forth a few years back, incidentally 2025-02-08 03:29:16 Ulrich's project seems to be mostly forgotten: https://github.com/uho/preForth/forks?include=active%2Carchived%2Cinactive%2Cnetwork&page=1&period=5y&sort_by=stargazer_counts 2025-02-08 03:29:23 xentrac, Thank you; I read someone noting that (Forth translation), which could have some interesting properties: just define a word for your optimization and throw them in together on the your OPTIMIZE word, plus it gives you a chance to re-order and/or re-run optimizations very easily. 2025-02-08 03:30:03 yeah, I think Forth is probably a more flexible language than Pascal for what Crenshaw was up to 2025-02-08 03:31:41 I think part of what he was up to was showing how to do it in an inflexible language though 2025-02-08 03:31:42 Ulrich still seems to be doing Forth stuff though: https://github.com/uho?tab=repositories 2025-02-08 03:32:00 the most seedForth-like thing I know of is CollapseOS (and its more practical successor whose name I'm forgetting at the moment) 2025-02-08 03:32:03 I don't think it was "inflexible" so much as "simple" & "straightforward", and TBH since Pascal *was* designed as a teaching language, it fits well. 2025-02-08 03:32:31 ACTION learned programming with the Turbo Pascal compiler, its error-messages, and manuals. 2025-02-08 03:33:34 xentrac, Thank you for that link, BTW. 2025-02-08 03:35:14 if I want to be more charitable I could say that Pascal is clear and explicit 2025-02-08 03:35:40 I think there's an inherent tradeoff between clarity/explicitness and flexibility 2025-02-08 03:38:57 I mean, I come from the Ada world. I use it for all my personal projects, and was contemplating building a SPARK-verified SeedForth. (SPARK is Ada's verifiable subset + provers.) 2025-02-08 03:40:30 ACTION has a copy of the Forth-2012 spec, and was thinking of doing that, but SeedForth seems a bit simpler, esp considering it's pre-tokenized, and I can just use an enumeration for the words. (I think.) 2025-02-08 03:40:34 yeah, I've been hearing interesting noises from the SPARK direction 2025-02-08 03:41:52 I definitely wouldn't consider myself an expert, but I've done a bit. Enough that I built a verified Base-64 encoder/decoder. If you have questions on it, I can try to answer. 2025-02-08 03:42:24 nice! 2025-02-08 03:42:46 I keep wanting to get into formal methods, especially now that LLMs seem like they might make them more practical 2025-02-08 03:43:11 but I don't really know where to start 2025-02-08 03:43:33 the closest I've gotten is getting my OCaml and Rust code to compile ;) 2025-02-08 03:43:43 oh, and using Hypothesis in Python, which is fantastic 2025-02-08 03:44:26 OpenFirmware/OpenBoot also uses pre-tokenized Forth input, btw. That's what the OLPC XO used 2025-02-08 03:44:37 Well, I think SPARK is a pretty good spot to start: it you can get "the basics" down before ratchetting up. 2025-02-08 03:45:07 Oh, nice! (OpenBoot) -- I hadn't realized that was a tokenized Forth. 2025-02-08 03:45:16 yeah 2025-02-08 03:45:54 I only had a few chances to poke OpenBoot, mainly ressurecting a Sun Ultra45 workstation... turned out to be a bad memory-stick. 2025-02-08 03:46:47 I'm sort of struggling with the idea. OpenBoot was born out of Sun's desire to unseat Intel by making peripherals CPU-architecture-independent, which required that the firmware on them not be bigger than 8086 code 2025-02-08 03:47:01 as I understand it 2025-02-08 03:47:04 which is tough, because 8086 code is pretty fucking dense 2025-02-08 03:48:04 Honestly, OpenBoot was a great idea: let the driver "live" on the device it drives. 2025-02-08 03:48:21 yeah, that's what PC BIOS option ROMs do too 2025-02-08 03:48:31 it's just that the option ROM is written in 8086 code 2025-02-08 03:49:12 but if you're not trying to fit device drivers into tiny early 90s EPROM chips, I think all tokenization saves you is the code to parse the input into spaces and walk the dictionary for each input wod 2025-02-08 03:49:16 *word 2025-02-08 03:49:37 which is, like, not literally zero code, but pretty close 2025-02-08 03:50:52 like maybe five lines of code? 2025-02-08 03:50:52 so I'm struggling to understand the rationale 2025-02-08 03:51:34 can't you do that even if your words vary in size? 2025-02-08 03:52:10 unless you want to update the IR in place, but only in ways that don't increase the number of words in it 2025-02-08 03:53:21 Kind-of, then you run into the problem of knowing how-big something is... if it were a stream, how can you tell you've read enough bits? And if you're using array+offset for arrays, you can't have varying-length elements. 2025-02-08 03:53:44 you know you've read enough bits when you get to a space? 2025-02-08 03:53:52 (Well, technically, you CAN: you make each element of the array the maximum token-size.) 2025-02-08 03:54:47 do you know about the C++ STL taxonomy of iterators: input iterators, output iterators, forward, bidirectional, random access? 2025-02-08 03:55:23 xentrac, If you have a set-minimum/common, yes. (I mean you could also employ Huffman encoding/decoding on the stream, which would do it.) 2025-02-08 03:55:50 I don't think I'm understanding some of the context of the situation you're talking about/describing 2025-02-08 03:56:05 possibly because I've only ever written very simpleminded compilers 2025-02-08 03:56:17 xentrac, Not really: I avoided C/C++ when I could. There's *always* a better option than C or C++, IMO. 2025-02-08 03:57:05 but I did read Appel's 90s textbook on compilers the other week, and I don't remember any of the IR-manipulation algorithms in it seeming like they would benefit from representing your IR as arrays of fixed-length elements 2025-02-08 03:57:39 Ah, don't sweat it too much; I've been thinking on compilers and IRs for a few years. / I'd be happy to rewind/explain my thoughts. 2025-02-08 03:57:40 hmm, well, the iterator taxonomy is a deep and important idea, and it's unfortunately one that Stepanov didn't come across until after he left Ada for C++ 2025-02-08 03:58:46 it's applicable in almost any language, but unfortunately people tend to only know it from C++ or D (which has a further developed variation in the form of "ranges") 2025-02-08 03:59:56 Matthew Heaney ported the ideas back to Ada but I don't think it got any uptake: https://www.sigada.org/ada_letters/sept2004/charles.pdf 2025-02-08 04:00:09 Ada has iterators (forward) and reversible. I'm assuming Bidirectional corresponds to Ada's component-library's Cursor... And Random would probably be the generalized indexing. -- Right? 2025-02-08 04:01:27 xentrac, Ah. Charles-containers. They were the basis for Ada's standard containers, IIUC. 2025-02-08 04:01:49 Hmm, well, there's a subtyping relation. Input iterators are a subset of forward iterators, which are a subtype of bidi, which are a subtype of random-access. 2025-02-08 04:02:08 so if you have an algorithm that wants an input iterator, you can run it on a random-access iterator 2025-02-08 04:02:30 Ah! I see. 2025-02-08 04:02:39 That's better news than I expected, about Charles! 2025-02-08 04:03:45 Anyway what I was going to say is that most of the IR-manipulation algorithms I remember from Appel are forward-iterator algorithms, and it's easy to provide forward iterators for sequences of variable-sized things. 2025-02-08 04:04:39 so I'm trying to understand what kinds of things you want to do with the tokenized code require random access, but not variable-size insertion and deletion operations 2025-02-08 04:07:46 I don't recall if I did read Appel's book, the name sounds familiar. 2025-02-08 04:08:06 Shark, do you know dbutton by any chance? He also comes from Ada, is crazily productive with common lisp nowadays 2025-02-08 04:08:36 he's written a couple of compiler textbooks; this was the one that covers things like transformation to SSA form and SSA optimizations 2025-02-08 04:10:52 veqq, Yes! He and I have talked about a lot of stuff. He even ran an Ada contest I won making a bulletin-board (I used Ada.Containers.Multiway_Trees to represent the formatting/text nodes, so could output to HTML super-easy, and probably PS, too... but I didn't write that transformer.) 2025-02-08 04:10:52 nice 2025-02-08 04:11:33 veqq, did you have any questions? 2025-02-08 04:11:53 Amazing! Early in my career I worked on a few add systems, changing logic around etc. It's amazing how well thought out everything is if not the most ergonomic (from memory at least). 2025-02-08 04:12:12 No questions, just curious! I've talked to him a few times, used a framework he made inspired by one in Ada land 2025-02-08 04:13:28 Was it the Gnoga framework? 2025-02-08 04:14:04 Yes 2025-02-08 04:14:57 Yep, it's a good system... was kicking around the idea of using it in/as a UI for an IDE. 2025-02-08 04:15:36 Same as me. I've been fiddling around with a structure editor (letting you code with only a ...number pad) 2025-02-08 04:15:39 xentrac, (re: why you might want uniform length items to work on) -- https://github.com/OneWingedShark/Byron/blob/master/src/reader/readington.adb 2025-02-08 04:17:16 That link is the reader for my compiler: it takes a file-stream for a text-file and converts it into a Wide_Wide_String (Unicode-codepoints; 32-bit)... 2025-02-08 04:19:28 Then we turn that into a text-token: https://github.com/OneWingedShark/Byron/blob/master/src/lexer/passes/lexington-aux-p0.adb 2025-02-08 04:22:47 Split on whitespace: https://github.com/OneWingedShark/Byron/blob/master/src/lexer/passes/lexington-aux-p1.adb , etc (each of the p## functions is another pass on tokenizing, transofming into the proper tokens.) 2025-02-08 04:24:46 xentrac, Since everything is a single-size in the text-tokens, I don't have to worry about (e.g.) UTF-8 non-latin characters taking up more than a byte, and having to fix-up that... or the various ways you can compose the same character. 2025-02-08 04:26:35 veqq: sounds like fun; do you have a prototype? 2025-02-08 04:27:21 Shark8: I wouldn't worry about that much in any case, just comparing tokens by byte-sequence equality 2025-02-08 04:27:50 veqq, My problem with doing an IDE, is it'd have to be graphics-heavy for the first iteration, and I am not the best at aesthetic UI. -- But the ideal would be something like a system where Booch's graphical/iconic notation was used to show the units, and flow-charts for actual contents of the subprograms. 2025-02-08 04:29:48 in http://canonical.org/~kragen/sw/dev3/hashtrie.c my tokenizer was `char *p = buf; for (char *q = buf; *q; q++) { 2025-02-08 04:29:51 oops 2025-02-08 04:31:06 char *p = buf; for (char *q = buf; *q; q++) {char c = *q; if ('a' <= c && c <= 'z' || 'A' <= c && c <= 'Z') {if (!*p) p = q; continue; } *q = 0; if (strlen(p)) upsert(&t, p)->count++; p = q; } if (strlen(p)) upsert(&t, p)->count++; 2025-02-08 04:32:22 you could easily change the alphabetic test to `if (c != ' ')` 2025-02-08 04:32:55 and then it works fine on UTF-8 non-Latin characters as long as your users aren't malicious 2025-02-08 04:34:07 if they're malicious you might have to convert the input into Normalization Form C or something first 2025-02-08 04:35:01 Not a usable one atm. Mostly just egonomics experiments. I've been having trouble determining what symbols could plausibly be needed in a given situation 2025-02-08 04:35:01 But I just joined forces with the guy making this: https://crowdhailer.me/2025-01-02/the-evolution-of-a-structural-code-editor/ (for the EYG language) which is way cooler 2025-02-08 04:35:11 xentrac, Yeah, I wasn't making that assumption. / I actually want the "core" of the IDE to not have text/files as source: store it in a database as meaningful structure! Relegate "text" and "files" to the "interfaces with the outer world" and don't force the tooling to cater to system-dependence. 2025-02-08 04:37:09 veqq, Nice! / Exactly where I want to go, TBH. (Well, that plus verification "front-to-back".) 2025-02-08 04:37:33 veqq: I've done some ergonomics experiments with (mostly stack-based keyboard-driven) structure editors for formulas like http://canonical.org/~kragen/sw/dev3/rpn-edit#2015_1975_-_365_1_4_/_+_*_17_31_30_8_-_+_+_365_1_4_/_+ and https://asciinema.org/a/389974 and the longer https://asciinema.org/a/391001 2025-02-08 04:37:53 Shark8: do you mean like Interlisp? 2025-02-08 04:38:33 there were some early versions of Smalltalk that also decompiled the bytecode when you wanted to edit methods I think 2025-02-08 04:38:57 xentrac, Interlisp? That sounds vaguely familiar... but I can't recall. 2025-02-08 04:39:57 Shark8: there's a revival at https://github.com/interlisp/medley and friends, but you can find demo videos of InterLisp on YouTube 2025-02-08 04:40:26 xentrac, But kind of like smalltalk. Kind of like the code-editor veqq showed. 2025-02-08 04:40:27 If you're in a statement with a map, you'll probably want to use map funcs, or create something new. So you can add parentheses, an existing symbol, let you define a new symbol etc. So you can hit e.g. 5 to add an existing symbol (built in vs. self defined vs.an import) then... I need to manually categorize the options from there and I haveno idea how to order them. Huge blocker. Doing minimal thins with a single data structure kind of works tho 2025-02-08 04:41:07 Oh! Hold on. I *need* to get you my favorite paper. 2025-02-08 04:41:58 Shark8: it also sounds like you'd probably like to see Jonathan Edwards's Subtext demo videos 2025-02-08 04:42:48 https://users.ece.utexas.edu/~perry/work/papers/icsm87.pdf 2025-02-08 04:43:42 this sounds familiar but I don't remember having read it. thanks! 2025-02-08 04:44:42 ^-- Shows the design of a version-control system that, by design, doesn't suffer from "Dave broke the build", and essentially solves CI. (Add your tests as checks on the "merge up the tree" function.) 2025-02-08 04:45:23 (The history of the root-node *IS* the history of the project in a compilable state.) 2025-02-08 04:48:43 Damn, this is cool stuff 2025-02-08 04:49:43 Aegis was a later (and still available) version-tracking system based on that idea, I think? 2025-02-08 04:51:04 xentrac, I don't know; I'm unfamiliar w/ Aegis. (I'll do a web-search and take a look though.) 2025-02-08 04:51:14 Also, note the date on the paper: 1987. 2025-02-08 04:51:58 yeah, Aegis was from the late 01990s 2025-02-08 04:52:37 https://aegis.sourceforge.net/ https://better-scm.shlomifish.org/aegis/ 2025-02-08 04:55:13 Hm, so yeah. It's kinda like that, but now imagine it being database-based (with a hierarchical database), instead of text-/file-based. 2025-02-08 04:58:22 What's the advantage of being database-based instead of being text-file-based? 2025-02-08 05:00:39 I think hierarchical databases have been obsolete for 55 years now 2025-02-08 05:00:56 With a database, you don't have to worry about (e.g.) moving/restructuring files or directories impacting your project; you don't have to worry about (e.g.) screwing up your PATH or other environment variables; since it's a semantic/structured approach, tab-vs-space doesn't exist: have your display use what you want. 2025-02-08 05:03:20 xentrac, (re: heirchical databases) -- The "hierarchical" could be hoisted to the underlying design, rather than the underlying storage-technology: Imagine having the hierarchy, except the one with leaf-nodes, as representing distributed computers/systems. 2025-02-08 05:12:40 it sounds like you're talking about usability improvements, mostly, but fairly shallow ones 2025-02-08 05:16:20 I mean, my big goal is the verifiability of the IDE (compiler, prover[s], database). 2025-02-08 05:17:09 that seems like it could make sense 2025-02-08 05:17:48 And portability; which is where the interest in Forth/SeedForth comes from. Give me a second to type it out. 2025-02-08 05:23:13 (1) SPARK proved compiler, with modular backend; 2025-02-08 05:23:13 (2) SPARK proved SeedForth; 2025-02-08 05:23:13 (6) Use 4+5, to run 1+3 programs; 2025-02-08 05:23:15 (7) Implement the 30-ish SeedForth words [on another arch]; 2025-02-08 05:23:17 (8) If needed, write a [possibly] SPARK verified backend. 2025-02-08 05:24:39 Bam! Porting Ada boils down to "Implement the 30-ish SeedForth words" and a "recipe" of recompilation. 2025-02-08 05:26:26 I don't think that scenario changes very much whether the AST that is the input to step (1) comes from a conventional parser or is stored in a database 2025-02-08 05:29:54 any assurance that the output of the formally verified compiler is correct depends not only on the verification of the compiler but also the assurance that its input is correct 2025-02-08 05:31:11 you could maybe argue that this becomes easier if you don't have an error-prone tokenizer and parser in between what you're looking at and the AST 2025-02-08 05:31:12 xentrac, Sure; but that *is* talking about bootstrapping/porting... #6+IDE means you're interpreting the IDE, arguably you don't even need an OS (though it might help on some nice things). 2025-02-08 05:32:15 but I think actually tokenization and parsing is the easiest part of the whole compilation process to prove correct (by providing a proof witness for a given parse, at least, not necessarily proving that the parser parses every possible input correctly) 2025-02-08 05:33:36 and if the AST is stored in a database, you still need some kind of presentation layer that shows you what's in the database, and that inspector is exposed to the same kinds of threats to validity that a parser is 2025-02-08 05:35:03 there are plenty of cases of "schizophrenic document" security holes where different pieces of software applied to a single database yield dangerously different interpretations, as well as things like subliminal channels and other unintentional information leaks 2025-02-08 05:36:09 xentrac, not really: you can have that "protocol" operating on objects, and have the "fields" of those being type-checked/-constrained. 2025-02-08 05:36:24 that doesn't help 2025-02-08 05:36:50 ^-- E.G. Like how HTML5 got form-components that disallow improper values. 2025-02-08 05:37:05 I think the general approach of having a simple, small portability layer that makes it easy to port software to a new architecture is valid and valuable, and remains so when you formally verify the software stack 2025-02-08 05:37:46 neauoire's uxn is one implementation of this approach, and crc's konilo is another, but as you know, there are dozens 2025-02-08 05:38:44 I think the only one that is formally verified is CompCert, where the "small" portability layer is C itself 2025-02-08 05:40:08 Honestly, certifying C is way harder than it ought to be: the "simple" of C is quite the deception, and that's not counting how many footguns it promotes. 2025-02-08 05:40:54 I wonder what "The Industry" would be like if BLISS or Forth had been adopted for C's low-level roll. 2025-02-08 05:41:08 *role 2025-02-08 05:41:17 oh, I've seen XML "security" systems that verified the XML (but paid no attention to the tabs and spaces used) 2025-02-08 05:43:31 thrig, Ew. (But, XML did have DTDs, something that, I think, the JSON-craze is going to find/fuel a need for.) 2025-02-08 05:43:42 CompCert isn't written in C, so its authors aren't "certifying C" 2025-02-08 05:44:53 CompCert is written in Coq, from which OCaml code is extracted 2025-02-08 05:45:58 CompCert will happily compile incorrect C code into incorrect executable code. what the proof of CompCert shows is merely that it won't compile *correct* C code into incorrect executable code 2025-02-08 05:46:30 https://xavierleroy.org/publi/compcert-CACM.pdf is a paper on its approach from 16 years ago 2025-02-08 05:47:34 Certifying C, the language, is still (as I said) way harder than it ought to be, and that its "simple" is deceptive. 2025-02-08 05:47:46 what would it mean to "certify C, the language"? 2025-02-08 05:48:05 I can certify that C, the language, should be committed to a mental institution. 2025-02-08 05:49:23 LOL -- Good catch. But "an implementation conforming to the standard in a provable manner" would be what I meant. 2025-02-08 05:49:37 oh, I agree! 2025-02-08 05:50:51 (You *could* try to do the language-standard, perhaps with a model-checker... but, mmmm, that might drive you certifiable.) 2025-02-08 05:50:51 I'm not sure what properties you'd want to prove about the language standard. 2025-02-08 05:51:49 You could prove that it's possible to write buggy code. Maybe you could prove that the standard is consistent in the sense that (some formalization of) it doesn't contradict itself, so that it's possible to write a conforming implementation. If so, you might be able to prove that it's possible to write correct C code. 2025-02-08 05:52:10 But of much more interest is that it's *difficult*, and that's something we don't know how to formalize. 2025-02-08 05:53:15 *shrug* -- There's lots, but you could try for something like consistency, and that the described model (IIRC there's a model/theoretical CPU) works as intended. 2025-02-08 05:54:02 But yeah, I agree. 2025-02-08 06:03:16 I'm going to bed. 2025-02-08 06:37:10 xentrac, Good night. 2025-02-08 17:12:09 o/ 2025-02-08 20:07:19 hi there neauoire 2025-02-08 20:07:43 where are you these days? 2025-02-08 23:57:54 Cool - I actually managed to use the "magic" VLSI design package to lay out a cmos inverter, extracted its information to a SPICE file, and simulated it - and it worked. :-) 2025-02-08 23:58:33 Rather primitive 0.3 micron process, but it's still the first time I've ever done that.