2023-02-16 00:02:04 I think it's a form of back referencing that raises the problem in regular expressions.
2023-02-16 00:02:50 But anyway, on a system that has these particular features, you can type some specific string like 30 bytes long, and basically you'll never finish applying it to some modest size input.
2023-02-16 00:02:59 You get an exponential amount of work.
2023-02-16 00:05:46 Aha! Here you go:
2023-02-16 00:05:49 https://swtch.com/~rsc/regexp/regexp1.html
2023-02-16 00:06:25 That's an excellent paper.
2023-02-16 00:21:19 That's annoying. Hexedit won't open a directory. Says it's "not a file." But in fact it *is* a file.
2023-02-16 00:21:28 Just a special purpose one.
2023-02-16 08:59:25 You know, just came to a realization that's never occurred to me before.
2023-02-16 09:00:00 I've told people many times that the development of fusion reactors is hard because we're basically trying to hold a chunk of a star's core in a container.
2023-02-16 09:00:24 But it's far worse than that. In the core of the sun, energy is produced at a leisurely rate - a few hundred watts per cubic meter.
2023-02-16 09:00:54 For something practical we need to go far BEYOND the core of a star, since we will have to have much higher energy density in order to make the thing practical.
2023-02-16 09:01:11 So we have to do a large factor MORE than hold a piece of star core in a container.
2023-02-16 09:20:59 And we need to do it without the assistance of gravitational containment.
2023-02-16 09:21:17 I think it's fairly likely we're just not going to be able to do that, but I guess we'll see.
2023-02-16 09:21:34 If we do, we'll be doing something that *doesn't happen* in nature.
2023-02-16 09:25:38 Finally at a point where I'm happy with my New B design, might be worth writing an article about this
2023-02-16 09:25:57 Is the fusion reactor's energy not present further into the core of the Sun?
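Back to the regex point at the top of this stretch: the "exponential amount of work" is easy to reproduce. regexp1.html uses the pattern a?^n a^n (n optional a's followed by n required a's) matched against a string of n a's. The sketch below is a hypothetical matcher hand-written for that one pattern (not Perl's or any library's engine); it backtracks greedily and counts how many `a?` decisions it visits.

```python
def backtracking_steps(n: int) -> int:
    """Count the calls a greedy backtracker makes matching a?^n a^n against "a" * n.

    Hypothetical matcher for this single pattern only; not a general regex engine.
    """
    text = "a" * n
    steps = 0

    def optional(i: int, pos: int) -> bool:
        # Decide the i-th "a?" with the input positioned at text[pos].
        nonlocal steps
        steps += 1
        if i == n:
            # Every "a?" is decided; the n mandatory a's must consume exactly the rest.
            return text[pos:] == "a" * n
        # Greedy: try consuming an 'a' first, and only if that fails, try skipping it.
        if pos < len(text) and text[pos] == "a" and optional(i + 1, pos + 1):
            return True
        return optional(i + 1, pos)

    assert optional(0, 0), "the pattern does match; the question is how much work it takes"
    return steps


for n in (5, 10, 15, 18):
    print(n, backtracking_steps(n))
```

This prints 63, 2047, 65535 and 524287: the count is exactly 2^(n+1) - 1, because the only successful path (skip every optional a) is the very last one a greedy search tries, so each extra input character doubles the work.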
2023-02-16 09:27:12 The reference I saw implied that the several hundred watts per cubic meter was a limit - his whole point was to highlight how much more we have to achieve to do it artificially.
2023-02-16 09:27:33 Regarding regular expressions, I think the right way is to create a sort of DSL that you can write in colon definitions for defining the pattern
2023-02-16 09:27:47 Now, this guy could be wrong and acknowledges himself he's not an expert in the field. But he's another fellow who "answers a lot of Quora questions," and I've been very impressed with him so far.
2023-02-16 09:28:00 But... he's not an "authority."
2023-02-16 09:28:06 I don't have any reason to doubt him, it's not something that's hard to look up
2023-02-16 09:28:32 I usually make a point to read his posts anytime I see them, because they are so uniformly "competent seeming" - *I* haven't caught him in an error yet.
2023-02-16 09:28:44 On a smaller scale things are easier though
2023-02-16 09:29:14 Like we've done cold fusion at small scale, and the LHC probably smashes stuff with higher energy than anything in the Sun
2023-02-16 09:29:25 An old cyclotron might do that even
2023-02-16 09:29:54 Well, for sure - yes. The LHC reaches conditions from just a fraction of a second in the life of the universe.
2023-02-16 09:30:04 But it's certainly not an energy producing system. :-)
2023-02-16 09:30:33 The pragmatic solution has been coal and fission for years
2023-02-16 09:30:41 yes.
2023-02-16 09:30:49 And fission would cover us for quite a long time.
2023-02-16 09:30:56 so we already KNOW how to do it.
2023-02-16 09:31:02 We're just not doing it.
2023-02-16 09:31:19 I think we should build fission reactors, but I think we should only build "passively safe" ones.
2023-02-16 09:31:26 Like pebble bed designs and so on.
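On the idea of a pattern DSL written as definitions rather than as a regex string, here is a minimal sketch in Python rather than Forth. Each combinator (lit, seq, alt, star, all hypothetical names) returns a function, so a pattern is built by composing named definitions. Each pattern maps a set of start positions to the set of positions where it could end, so alternatives are tracked side by side (NFA-style) and nothing is ever backtracked.

```python
from typing import Callable, FrozenSet

# A pattern takes the text and a set of start positions and returns every position
# where it could finish. Working on sets means no backtracking is ever needed.
Pattern = Callable[[str, FrozenSet[int]], FrozenSet[int]]


def lit(s: str) -> Pattern:
    """Match the literal string s."""
    def p(text: str, starts: FrozenSet[int]) -> FrozenSet[int]:
        return frozenset(i + len(s) for i in starts if text.startswith(s, i))
    return p


def seq(*parts: Pattern) -> Pattern:
    """Match each part in turn."""
    def p(text: str, starts: FrozenSet[int]) -> FrozenSet[int]:
        for part in parts:
            starts = part(text, starts)
        return starts
    return p


def alt(*choices: Pattern) -> Pattern:
    """Match any one of the choices."""
    def p(text: str, starts: FrozenSet[int]) -> FrozenSet[int]:
        return frozenset().union(*(c(text, starts) for c in choices))
    return p


def star(inner: Pattern) -> Pattern:
    """Match inner zero or more times."""
    def p(text: str, starts: FrozenSet[int]) -> FrozenSet[int]:
        seen = set(starts)
        frontier = frozenset(starts)
        while frontier:                  # stop once no new end positions appear
            frontier = inner(text, frontier) - seen
            seen |= frontier
        return frozenset(seen)
    return p


def full_match(pattern: Pattern, text: str) -> bool:
    return len(text) in pattern(text, frozenset({0}))


# Roughly the regex (ab|c)*d, written as named "definitions".
ITEM = alt(lit("ab"), lit("c"))
WORD = seq(star(ITEM), lit("d"))

print(full_match(WORD, "ababcd"), full_match(WORD, "abx"))
```

In Forth the same shape would presumably be colon definitions that build or run the matcher; how that maps onto the stack is left open here.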
2023-02-16 09:31:27 We've spent so much money on renewables, apparently they're now more economical but I doubt they would be if fission had the same kind of funding
2023-02-16 09:31:39 you can turn off all the equipment and walk away, and won't get any disaster.
2023-02-16 09:31:49 It'll just sit there and radiate heat out into the atmosphere.
2023-02-16 09:32:03 I have heard that's the case with the plant in Ukraine that was involved in fighting
2023-02-16 09:32:22 I think the main safety issue for passively safe reactors would probably be terrorist potential.
2023-02-16 09:32:42 If someone deliberately blows the thing to kingdom come, some radiation is probably going to get out.
2023-02-16 09:32:57 From what I've heard the one in Ukraine could have a plane crash into it and it wouldn't melt down or release radioactive material
2023-02-16 09:33:20 Neat. :-)
2023-02-16 09:33:23 The Fukushima one was ancient tech and next to the sea, not a good combo for an area with tsunami risk
2023-02-16 09:33:27 too cheap to meter, but https://en.wikipedia.org/wiki/Fermi_1
2023-02-16 09:33:44 The main reason we haven't pursued passively safe designs is because the military wanted to be able to harvest weaponizable materials.
2023-02-16 09:34:06 Pragmatically I do think it's necessary
2023-02-16 09:34:22 I mean look at Ukraine, they gave up their nukes, a lot of good it did them
2023-02-16 09:34:27 Well, I think it at least has been - I do think there might be such a thing as "enough" nukes.
2023-02-16 09:34:35 But yeah, we couldn't afford not to go there.
2023-02-16 09:34:43 Because others in the world would have anyway.
2023-02-16 09:37:03 You can get rid of the nukes when there's only one country left, until then....
2023-02-16 09:37:45 Yeah.
2023-02-16 09:37:48 I agree.
2023-02-16 09:39:04 A few of the folks who historically were very anti-nuclear (for power purposes) have "come around." One guy has a couple of videos on YouTube.
I have to have some admiration for someone who can set aside old biases based on learning new things.
2023-02-16 09:39:26 And have the courage to get on a public forum and say it out loud.
2023-02-16 09:40:18 The problem is when people don't admit mistakes, but also when they fail to apply those lessons in future
2023-02-16 09:40:27 Yeah.
2023-02-16 09:40:38 It's not just about changing your mind, it's also about looking back and thinking "why did I believe this obvious rubbish?"
2023-02-16 09:41:05 And the answer to a lot of it is just "I want a consistent model in my head for people, and people are more complicated than any model I can fit in my head"
2023-02-16 09:41:54 But society makes you want a simple model so you can join a party and blame everything on the other party
2023-02-16 09:43:09 Yes. And emotions come into it too. We have a tendency to want parties we "don't like" (perhaps for perfectly valid reasons) to be "wrong about everything."
2023-02-16 09:44:05 And as you note, it's generally just more complicated than that.
2023-02-16 09:51:03 Here's the pertinent bit from that regular expression paper:
2023-02-16 09:51:07 "As far as the theoretical term is concerned, regular expressions with backreferences are not regular expressions. The power that backreferences add comes at great cost: in the worst case, the best known implementations require exponential search algorithms, like the one Perl uses."
2023-02-16 09:51:56 So yeah, back references are "cool." But they're not "easy."
2023-02-16 09:52:29 maybe you want to match an email address with a regex
2023-02-16 10:15:27 Sure.
2023-02-16 10:15:36 I think no one denies the usefulness of back references.
2023-02-16 10:16:07 This guy's main point is that when Perl is presented with a regular expression that DOESN'T require back referencing, it should switch to a superior algorithm.
2023-02-16 10:16:21 Apparently (at least at that time) it didn't.
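To see why backreferences leave regular-expression territory: the pattern below, run with Python's re module (itself a backtracking engine), accepts exactly the strings a^n b a^n with the same n on both sides. Recognizing that requires remembering an unbounded count, which no finite automaton can do, so none of the NFA/DFA techniques from the paper can handle it.

```python
import re

# (a+) captures a run of a's; \1 then demands that exact run again after the b.
SAME_RUN = re.compile(r"(a+)b\1")

for s in ["aba", "aaabaaa", "aabaaa", "aaaba"]:
    print(s, bool(SAME_RUN.fullmatch(s)))
```

The first two lines print True and the last two False. A pattern with no \1-style references describes a regular language, which is what makes the paper's suggestion possible in principle: check the pattern for backreferences and hand the ones without them to an automaton-based matcher.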
2023-02-16 10:16:28 And paid a stiff performance penalty for it.
2023-02-16 10:18:25 but then you need multiple regex engines in memory, more CPU to figure out when to use what, ...
2023-02-16 10:18:35 Yes.
2023-02-16 10:18:49 But we're not talking about a "small" performance gain here.
2023-02-16 10:18:53 But, it's muddy water.
2023-02-16 10:19:03 so much cpu is burned during regex compilation it doesn't seem relevant
2023-02-16 10:19:06 Because only some applications would get into the "pathological" territory.
2023-02-16 10:19:13 Your "typical" application is probably fine.
2023-02-16 10:19:18 you're compiling your regexes before reusing them, right?
2023-02-16 10:19:41 I think so, yes, but I don't really know how the actual real software does it.
2023-02-16 10:19:52 This paper is all about generating finite automata from regexes.
2023-02-16 10:19:59 Which sounds like "compiling" it to me.
2023-02-16 10:20:55 yeah, that's what compiling usually means in this context
2023-02-16 10:21:16 However, this paper doesn't talk about back referencing. He's discussing the highly efficient non-back referencing algorithms.
2023-02-16 10:21:23 The "original" regular expressions.
2023-02-16 10:23:19 He shows how each "piece" of a regular expression corresponds to a sub-network of an NFA. He then talks about how you connect those together. He draws little pictures with bubbles and arrows, but in fact each one of those could be a segment of machine code.
2023-02-16 10:23:39 That you then concatenate in a suitable way, with appropriate glue code.
2023-02-16 10:41:47 This is interesting. The algorithm for doing one of these basically just involves walking the graph structure. The complication is that in some states you have multiple possible next states.
The paper presents a stack-based structure for keeping up with those multiple threads of work, but I wonder if, in a concurrent process environment, you could just fork on those junctions and send one of the resulting
2023-02-16 10:41:49 processes down each path.
2023-02-16 10:42:06 Meanwhile one thread is reading the input and making it available to all of the machine executor threads.
2023-02-16 10:42:20 If a thread reaches a dead end, he'd just remove himself from play.
2023-02-16 10:42:47 So, at each step the input thread reads the next input char, announces it, and waits for all players to announce they're done with it.
2023-02-16 10:42:48 not the scottish play!
2023-02-16 10:42:54 the CPU is probably already doing this behind the scenes
2023-02-16 10:43:02 And if any thread succeeds in reaching a match state, you're done.
2023-02-16 10:43:25 That way the actual machine walking code would be written such that it was always following only one path, and no "multi-path" data structure would be needed.
2023-02-16 10:47:39 Makes me picture something like this:
2023-02-16 10:48:04 val1 ... valN N FANOUT ...
2023-02-16 10:48:34 FANOUT would have N threads running, where there had been just one before, and each one would get a single vali value.
2023-02-16 10:48:50 Then they'd all proceed with the same code, just using their "personal" val.
2023-02-16 10:49:48 The input thread would load the next input char somewhere, and then set a variable to N.
2023-02-16 10:50:34 Each machine thread would process that input and then decrement the variable. -1 variable +! would do that in an atomic way.
2023-02-16 10:50:46 The input thread would proceed when it saw that the variable reached zero.
2023-02-16 10:51:09 It will have to be told if a thread kills itself, because then it will want to set the variable to N-1, etc.
2023-02-16 11:09:01 That would be the delicate part of it - working up a good way for the input thread to always know how many worker threads were in play.
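A minimal sketch of the input-thread/worker handshake with real Python threads, under a simplifying assumption: each worker follows one fixed path (one literal word, anchored at the start of the input) instead of forking at junctions. `pending` plays the role of the variable the input thread sets to N; `live` is a thread counter that a worker decrements when it hits a dead end, so the next round starts from the right N. All names (InputBroadcaster, worker, run) are hypothetical.

```python
import threading
from typing import List, Optional


class InputBroadcaster:
    """Input side: publish one char, then wait until every live worker has handled it."""

    def __init__(self, workers: int) -> None:
        self.cond = threading.Condition()
        self.live = workers                # workers still in play
        self.pending = 0                   # workers that have not yet handled this char
        self.generation = 0                # bumped on each publish so workers spot new input
        self.char: Optional[str] = None    # None means end of input
        self.matched: List[str] = []

    def publish(self, ch: Optional[str]) -> int:
        with self.cond:
            self.char = ch
            self.pending = self.live       # "set a variable to N"
            self.generation += 1
            self.cond.notify_all()
            self.cond.wait_for(lambda: self.pending == 0)
            return self.live


def worker(b: InputBroadcaster, word: str) -> None:
    """Follow exactly one path: match the (non-empty) word one input char at a time."""
    pos = seen = 0
    while True:
        with b.cond:
            b.cond.wait_for(lambda: b.generation > seen)
            seen = b.generation
            ch = b.char
            alive = ch is not None and ch == word[pos]
            if alive:
                pos += 1
                if pos == len(word):       # reached the match state
                    b.matched.append(word)
                    alive = False
            b.pending -= 1                 # the decrement, done under the lock
            if not alive:
                b.live -= 1                # leave play for good
            b.cond.notify_all()
        if not alive:
            return


def run(text: str, words: List[str]) -> List[str]:
    b = InputBroadcaster(len(words))
    threads = [threading.Thread(target=worker, args=(b, w)) for w in words]
    for t in threads:
        t.start()
    for ch in text:
        if b.publish(ch) == 0:             # nobody left in play
            break
    b.publish(None)                        # tell any survivors the input is over
    for t in threads:
        t.join()
    return b.matched


print(sorted(run("forth", ["fort", "forge", "fo"])))
```

Here fo and fort both reach a match state and forge drops out at the 't'. Keeping `live` and `pending` under one lock is what makes "always know how many worker threads are in play" straightforward in this version; dynamic forking would add threads to `live` the same way.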
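The same one-path-per-thread idea also works with no OS threads at all: keep a list of program counters and step every one of them over each input char in lockstep. The four-instruction set below (char, split, jmp, match) is modeled on the thread-based VM in the regexp2.html paper linked later in this log, where split x, y acts as a two-way FANOUT; the tuple encoding, the hand-compiled program and the function names are assumptions of this sketch.

```python
from typing import List, Set, Tuple

# Instructions: ("char", c), ("split", x, y), ("jmp", x), ("match",)
Program = List[Tuple]


def add_thread(prog: Program, threads: List[int], seen: Set[int], pc: int) -> None:
    """Queue a thread at pc, following jmp/split at once since they consume no input."""
    if pc in seen:               # already reached this step: a duplicate would repeat work
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == "jmp":
        add_thread(prog, threads, seen, op[1])
    elif op[0] == "split":       # the two-way fork
        add_thread(prog, threads, seen, op[1])
        add_thread(prog, threads, seen, op[2])
    else:                        # char or match: the thread waits here for input
        threads.append(pc)


def run_vm(prog: Program, text: str) -> bool:
    """Full match: every live thread sees each input char exactly once."""
    current: List[int] = []
    add_thread(prog, current, set(), 0)
    for ch in text:
        nxt: List[int] = []
        seen: Set[int] = set()
        for pc in current:
            op = prog[pc]
            if op[0] == "char" and op[1] == ch:
                add_thread(prog, nxt, seen, pc + 1)
            # any other thread has hit a dead end and is simply not carried forward
        current = nxt
        if not current:
            return False
    return any(prog[pc][0] == "match" for pc in current)


# a+b+, compiled by hand
A_PLUS_B_PLUS: Program = [
    ("char", "a"),    # 0
    ("split", 0, 2),  # 1: another a, or move on
    ("char", "b"),    # 2
    ("split", 2, 4),  # 3: another b, or finish
    ("match",),       # 4
]

print(run_vm(A_PLUS_B_PLUS, "aabbb"), run_vm(A_PLUS_B_PLUS, "aba"))
```

Because a pc is queued at most once per step, the thread list never grows past the program length, so the work is bounded by roughly len(prog) x len(text) and the input is never rewound.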
2023-02-16 11:09:37 I suppose each thread could increment a thread counter when it's created and decrement it when it ends itself.
2023-02-16 11:26:08 This is neat - looks like regular expression processing is a rather perfect application for concurrent "green threads" (I mean by that threads with no environment - just their little stack region and no other resources).
2023-02-16 11:26:23 They'd just come and go as needed to handle the particular expression.
2023-02-16 11:27:01 You can do this with one line of logic, of course - if you run into a dead end you back up to the most recent branch point that's not fully explored.
2023-02-16 11:27:12 But doing it that way means processing parts of the input more than once.
2023-02-16 11:27:24 I.e., rewinding the input.
2023-02-16 11:50:14 Pretty sure the old regex stuff in ed et al. just backtracked when it needed to
2023-02-16 11:50:48 Although it might have been turned into machine code at runtime... such a step would be unnecessary today
2023-02-16 11:51:10 I think grep might have done that anyway
2023-02-16 11:52:33 Depending on the regex, backtracking shouldn't slow things down. But these days everyone wants something general-purpose. The old regex stuff isn't even a "regular expression" in the technical sense, it really is just a DSL for "try this, then try that..."
2023-02-16 12:03:23 You can have at most 1024 characters on a line in notepad.exe
2023-02-16 12:05:24 Yeah, the paper says the same time. Ken Thompson's first implementation used sections of machine code.
2023-02-16 12:05:49 The algorithm presented in the paper is in C, though, and uses a stack to handle the multiple work paths.
2023-02-16 12:05:54 With one thread.
2023-02-16 12:06:07 Same thing, not same time.
2023-02-16 12:07:36 regular expressions are one of those stumbling blocks of academia that's consumed a lot of thought and time and has produced very little
2023-02-16 12:09:50 I think they've got a few worthwhile possibilities.
But I doubt they're the panacea some seem to think they are.
2023-02-16 12:11:07 So I've been reading these Witcher stories. A major thread of those is that he gets all tangled up with this woman (a sorceress) Yennefer. All I have to say is that as far as I can tell Yennefer is a total bitch. I don't think I'd want her around.
2023-02-16 12:16:44 One of the things I like about my new B design is its fixed-point features. Because there's only one type, a word, it seems quite mathematically weak. But fixed point adds a lot out of the box without adding new types
2023-02-16 12:17:33 So, like Forth, writing something like "1.000" is equivalent to 1000; the decimal point is just sugar
2023-02-16 12:21:16 And I'm finding it interesting thinking about how I can do things like cos/sin in fixed-point
2023-02-16 12:22:06 Ok, interesting. In an NFA implementation (where you can be in more than one state at a time and the sort of thing I was describing above), there are never more states in the state machine than there are "meaningful" (i.e., non-parenthesis) characters in the regular expression.
2023-02-16 12:22:30 You can, however, convert that NFA (non-deterministic) to a DFA (deterministic) machine in which you are never in more than one state.
2023-02-16 12:22:41 Effectively, each DFA state corresponds to some set of NFA states.
2023-02-16 12:22:49 You can have more such states than chars in the expression.
2023-02-16 12:23:04 This is how you create a DFA, you can easily convert a regular expression to an NFA, and then convert the NFA to DFA
2023-02-16 12:23:13 It's also possible to begin with the NFA and "cache"/memoize DFA states found on the fly, to get the best of both worlds to some extent.
2023-02-16 12:23:23 Right.
2023-02-16 12:23:27 The DFA size can be way larger than the NFA
2023-02-16 12:23:31 yeah.
2023-02-16 12:23:36 Combinatorics.
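A small sketch of the fixed-point idea, with Python integers standing in for B's single word type. The scale of 1000 (three implied decimals) is an assumption taken from the "1.000 is 1000" example; parse_fixed, fmul and fsin are hypothetical names. sin is computed from its Taylor series entirely in scaled integers.

```python
SCALE = 1000  # assumption: three implied decimal places, so 1.000 is stored as 1000


def parse_fixed(token: str) -> int:
    """Forth-style reading: the decimal point is simply dropped, so "1.000" -> 1000.

    This trusts the writer to supply all three decimals; "1.5" would read as 15
    (i.e. 0.015), not 1500.
    """
    return int(token.replace(".", "", 1))


def fmul(a: int, b: int) -> int:
    """Fixed-point multiply: the raw product carries SCALE twice, so divide once."""
    return (a * b) // SCALE  # floor division; on a real word-size machine a*b may overflow


def fsin(x: int) -> int:
    """sin(x) via x - x^3/3! + x^5/5! - ..., all in scaled integers.

    Only sensible for roughly |x| <= pi/2; argument reduction is left out.
    """
    term = result = x
    n = 1
    while term != 0:
        # next term = -term * x^2 / ((2n)(2n+1))
        term = -fmul(fmul(term, x), x) // ((2 * n) * (2 * n + 1))
        result += term
        n += 1
    return result


print(fsin(parse_fixed("1.000")), fsin(parse_fixed("0.500")))
```

fsin(parse_fixed("1.000")) returns 840 here, against sin(1) of about 0.8415: every fmul floors, costing up to a unit in the last place. Rounding instead of flooring, a larger scale, or a CORDIC-style method are the usual ways to tighten that; cos follows the same pattern with 1 - x^2/2! + x^4/4! - ...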
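The "DFA size can be way larger" point, worked through with the standard subset construction. The NFA family below recognizes (a|b)*a(a|b)^k, meaning "the (k+1)-th character from the end is an a". It has k+2 states, while its DFA has to remember the last k+1 characters and so ends up with 2^(k+1) states. The alphabet is fixed to {a, b} and all names are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

NFA = Dict[int, Dict[str, Set[int]]]                     # state -> char -> next states
DFA = Dict[Tuple[FrozenSet[int], str], FrozenSet[int]]   # (state set, char) -> state set
ALPHABET = "ab"


def kth_from_end_is_a(k: int) -> NFA:
    """NFA for (a|b)*a(a|b)^k with states 0..k+1; state k+1 accepts."""
    nfa: NFA = {0: {"a": {0, 1}, "b": {0}}}  # state 0 loops, or guesses "this a is the one"
    for i in range(1, k + 1):
        nfa[i] = {"a": {i + 1}, "b": {i + 1}}
    nfa[k + 1] = {}
    return nfa


def subset_construction(nfa: NFA) -> Tuple[DFA, Set[FrozenSet[int]]]:
    """Each DFA state is the frozenset of NFA states the NFA could be in."""
    start = frozenset({0})
    table: DFA = {}
    states = {start}
    todo = [start]
    while todo:
        current = todo.pop()
        for ch in ALPHABET:
            nxt = frozenset(s for q in current for s in nfa[q].get(ch, ()))
            table[(current, ch)] = nxt
            if nxt not in states:
                states.add(nxt)
                todo.append(nxt)
    return table, states


def dfa_accepts(table: DFA, accept: int, text: str) -> bool:
    state = frozenset({0})
    for ch in text:
        state = table[(state, ch)]
    return accept in state


for k in range(1, 7):
    nfa = kth_from_end_is_a(k)
    _, states = subset_construction(nfa)
    print(f"k={k}: NFA {len(nfa)} states, DFA {len(states)} states")
```

Building only the table entries that the input actually reaches, and caching them as they are found, gives the on-the-fly variant mentioned above, which never pays for DFA states a given input does not visit.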
2023-02-16 12:24:15 I'm rather enamored with the notion of running the NFA on multiple threads - I'd be interested in seeing how the Forth code to do that, on this system I'm planning, turned out.
2023-02-16 12:24:35 We're talking regular expressions, lots of things people call 'regular expressions' won't fit in an NFA though
2023-02-16 12:24:55 Right - he points out that "modern" regular expressions often aren't regular expressions as historically defined.
2023-02-16 12:25:01 And that's kind of his whole point.
2023-02-16 12:25:16 The thing about NFAs is they're easy to optimise in practice because one branch is usually taken 99% of the time
2023-02-16 12:25:16 His algorithms here are for the strictly defined regular expressions.
2023-02-16 12:25:55 So generating a DFA could be a waste of everyone's time
2023-02-16 12:25:58 In a backtracking implementation you could try to make sure you took that one.
2023-02-16 12:26:06 Yeah
2023-02-16 12:26:08 And life is short
2023-02-16 12:26:16 In this multi-threaded approach you'd be launching them both, and there really wouldn't be a difference in their "priority."
2023-02-16 12:26:41 The multi-threaded approach would be slower because you have communication overhead
2023-02-16 12:26:53 Provided you can accurately predict the brances
2023-02-16 12:26:56 branches*
2023-02-16 12:27:11 Yeah, I'd just be interested in seeing if it made the code seem "simpler."
2023-02-16 12:27:20 the app level code I mean - you can't really make the work go away.
2023-02-16 12:27:29 Again remember we are dealing with matching often short sequences of text, for example regular expressions are sufficient for lexing, so 'tokens'
2023-02-16 12:27:37 But if it's packaged nicely so that parts of it are "built in features," then you wouldn't see those parts at the application level.
2023-02-16 12:27:48 How hard do you think it is to match a token?
Turns out not hard at all, none of this stuff is needed for that
2023-02-16 12:29:27 I actually think a classic regex (which also isn't a proper theoretical regex) is usually more helpful than a modern regex
2023-02-16 12:30:15 I haven't really worked a lot with either kind. The "classic" one just appeals to me because you can implement it efficiently and it never bites you.
2023-02-16 12:30:20 For example, to a classic regex, .* can't be followed by anything other than $, because it will greedily match... unless it's in a capture group that can fail and backtrack
2023-02-16 12:30:29 This definition is easier to think about conceptually
2023-02-16 12:30:30 But I can see how someone who'd gotten used to having back references would miss them.
2023-02-16 12:30:39 Whereas Perl etc I have seen confuse people no end
2023-02-16 12:31:33 Ok, a lot of my typical sed usages are along the lines of this:
2023-02-16 12:32:05 sed 's/foo.*bar/foo bar/g'
2023-02-16 12:32:16 So I certainly put stuff after .*.
2023-02-16 12:32:31 I'm just wanting to cut out everything in between foo and bar and reduce it to a space.
2023-02-16 12:32:45 Yeah I'm thinking of something else, never mind
2023-02-16 12:33:02 There is a difference I just can't remember the specifics
2023-02-16 12:33:15 .* could certainly gobble up the whole rest of the expression, but then you fail.
2023-02-16 12:33:45 But I guess if it starts by trying to take everything, and then backtracks one character at a time that's kind of slow.
2023-02-16 12:34:45 That's how it's got to do it because it wants the largest match it can get
2023-02-16 12:36:01 Is it slow? It never bothers even checking until it finds 'foo', and then if 'bar' is nearby it doesn't have that much extra work to do, comparatively
2023-02-16 12:36:15 Yeah.
2023-02-16 12:36:37 But what if there's a shit ton of input after bar?
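On the sed example and the question that follows: greedy .* does run to the end of the line and then back off until the last bar fits. A negated class such as [^b]* avoids that, but the two patterns are not interchangeable. Python's re module stands in for sed below; it is a backtracking engine, whereas POSIX sed is specified as leftmost-longest, but for the first three inputs both semantics pick the same matches.

```python
import re

line = "foo 1 bar foo 2 bar"

# Greedy: .* runs to the end of the line, then backs off until the LAST "bar" fits.
greedy = re.sub(r"foo.*bar", "foo bar", line)
# Negated class: [^b]* cannot step past a 'b', so each foo ... bar pair is replaced separately.
negated = re.sub(r"foo[^b]*bar", "foo bar", line)
# Caveat: [^b]* also refuses a 'b' that does not start "bar", so this line is left alone.
missed = re.sub(r"foo[^b]*bar", "foo bar", "foo baz bar")
# Python (like Perl) has a lazy .*?, which POSIX sed does not offer.
lazy = re.sub(r"foo.*?bar", "foo bar", "foo baz bar")

print(greedy)   # foo bar
print(negated)  # foo bar foo bar
print(missed)   # foo baz bar
print(lazy)     # foo bar
```

So the [^b]* rewrite is a speed fix only when the text between foo and bar is known to contain no b at all.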
2023-02-16 12:36:45 I mean rather if the line is short
2023-02-16 12:36:50 Won't .* try to take ALL of it initially, and then give it up one byte at a time?
2023-02-16 12:36:53 Yes
2023-02-16 12:37:01 foo[^b]*bar if you can
2023-02-16 12:37:12 Exactly
2023-02-16 12:37:33 Welcome to dumb regex algorithms, it's not that hard to optimise usually
2023-02-16 12:37:48 Truth is I've probably never used anything long enough (expression or input) to get into problem land.
2023-02-16 12:38:02 My inputs are typically a few dozen bytes long.
2023-02-16 12:38:14 At most.
2023-02-16 12:38:19 Exactly
2023-02-16 12:38:22 YAGNI
2023-02-16 12:38:46 And if you do run it and it's slow just Ctrl+C and re-run with [^b]* instead of .*
2023-02-16 12:39:08 Now we're thinking in forth terms lol
2023-02-16 12:39:13 :-)
2023-02-16 12:39:25 Yes, Forth would advocate the simplest algorithm that performs acceptably.
2023-02-16 12:39:46 "Acceptably" meaning "doesn't cause pain and irritation."
2023-02-16 12:40:42 I suppose this paper is expressing a facet of the "library mentality."
2023-02-16 12:40:49 "You need to be ready for anything..."
2023-02-16 12:41:32 And when I've *thought* about regular expressions and using them in Forth, I think I've implicitly invoked back referencing without really paying much attention to it.
2023-02-16 12:41:51 What I've usually imagined is that I have some big input text that is some "formatted information."
2023-02-16 12:42:04 I.e., it's a large number of records, and each record has the same structure.
2023-02-16 12:42:22 And I imagine the regular expression describing that structure and identifying the data fields of interest within it.
2023-02-16 12:42:51 And when I'm done I want it to have extracted those data fields, performed any conversion of interest, and have given them to me in an easy to use format.
2023-02-16 12:43:12 Maybe one record at a time - process a record, do whatever I want with the data, iterate.
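A sketch of the "describe the record, pull out the fields, convert, iterate" workflow using Python's re named groups. The record layout (station,temperature,HH:MM, one per line) and the names RECORD and records are invented for illustration. Note that capture groups only report what matched; nothing like \1 appears in the pattern, so this kind of extraction is not back referencing and is something engines that avoid backtracking can also provide.

```python
import re
from typing import Iterator, Tuple

# One record per line: station,temperature,HH:MM  (hypothetical format)
RECORD = re.compile(r"(?P<station>\w+),(?P<temp>-?\d+\.\d+),(?P<time>\d{2}:\d{2})")


def records(text: str) -> Iterator[Tuple[str, float, str]]:
    """Yield one converted record at a time, so the caller can process it and move on."""
    for m in RECORD.finditer(text):
        yield m["station"], float(m["temp"]), m["time"]


data = "KAUS,21.5,09:30\nKDFW,-3.0,10:00\n"
for station, temp, time in records(data):
    print(station, temp, time)
```

Because records is a generator, only one record's fields exist at a time, which matches the "process a record, iterate" style.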
2023-02-16 12:43:37 So I am wanting it to "extract" portions of the data and give me those values.
2023-02-16 12:57:04 But we all know even with libraries, you can't be ready for everything and the programmer needs to profile
2023-02-16 12:57:33 And the dumb algorithm still provides what's needed to work, even if a bit of optimisation is occasionally needed where it would have been otherwise unnecessary
2023-02-16 12:57:41 And the dumber the algorithm the more predictable the results
2023-02-16 12:58:06 Smarter algorithms might confuse the programmer as to why something is fast, or something is slow. With the dumb one we can all just 'eyeball' it
2023-02-16 12:59:55 Predictable results are good.
2023-02-16 13:00:18 And yes, I definitely believe in using software that you UNDERSTAND.
2023-02-16 13:00:29 Don't like black boxes.
2023-02-16 13:01:45 The methods described in this paper, though, fall into that category of things I usually enjoy tinkering with just because of how "slick" they are algorithmically. It has a theory aspect to it that I always find appealing.
2023-02-16 13:02:53 And while I'm a big fan of Forth's very simple space delimiting, the idea that at some point I might want to do *some job* that benefited from more power just isn't something I can rule out.
2023-02-16 13:03:24 For one thing, I want to be able to work nicely with vectors and matrices, and with space delimiting that starts to get ugly. You *can*, but it's not very pretty code.
2023-02-16 13:04:00 Sooner or later I'm going to want to be able to type [1, 2, 3] and have a three element vector "on the stack."
2023-02-16 13:04:42 None of this belongs in Forth at the ground floor, though.
2023-02-16 13:05:31 maybe with reader macros of some sort
2023-02-16 13:05:45 We already make an exception (strictly speaking) for numbers.
2023-02-16 13:06:00 A number in a line of Forth is something that isn't in the dictionary, and yet we want to do something specific with it.
2023-02-16 13:06:20 We have done stuff before like: CREATE ARR ,{ 1 2 3 4 5 }
2023-02-16 13:06:23 Conceptually, a string "my string" is no different - it's a literal sure as a number is. That matrix I typed up there is a literal.
2023-02-16 13:06:33 So where we draw this line is, to some extent, a matter of custom.
2023-02-16 13:06:49 We've all accepted numbers, and most of us stop calling it real Forth if it's taken much further than that.
2023-02-16 13:07:56 I mentioned yesterday the possibility of recognizing a well formed regular expression - that would be a literal too, of type regex.
2023-02-16 13:08:04 If you wanted to make it such that it was.
2023-02-16 13:08:14 What we call a literal is a rather fuzzy notion.
2023-02-16 13:09:36 Yeah but at some point of complexity I'm like ... okay why don't I just use C
2023-02-16 13:10:01 Forth is a balancing act of remaining on the lite side of that proposition
2023-02-16 13:10:51 Mind you I think types go too far too
2023-02-16 13:12:56 My favourite forth code is that which manages to 'get by' with the core words, or maybe a few more than that, and do its job quite lightly and effectively despite the syntax
2023-02-16 13:13:44 Forth syntax is a balance between machine-friendly and programmer-friendly
2023-02-16 13:13:45 Yeah, I toyed with types a little, but decided that while it might be ok to write a Forth application that had "types," that also has no place in the foundation.
2023-02-16 13:14:18 I think such an application isn't okay because it implies you're not developing it the forth way
2023-02-16 13:14:21 I do want to have a language that I can use to program the bare metal - to bring the machine up - but that I can also use to write anything I want, no matter how high level.
2023-02-16 13:14:26 I don't want to switch languages.
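A toy outer interpreter makes the "exception for numbers" concrete: each space-delimited token is first looked up in the dictionary, and only if that fails is it tried as a number. The parsing word ,{ shows the other escape hatch, a word that reads ahead in the input itself (the CREATE ARR part of the example above is left out). The class, word set and error text are a hypothetical Python sketch, not any particular Forth.

```python
from typing import Callable, Dict, List


class ToyForth:
    """Outer interpreter: dictionary first, then number conversion, else an error."""

    def __init__(self) -> None:
        self.stack: List[int] = []
        self.arrays: List[List[int]] = []
        self.tokens: List[str] = []
        self.words: Dict[str, Callable[[], None]] = {
            "+": lambda: self.stack.append(self.stack.pop() + self.stack.pop()),
            "DUP": lambda: self.stack.append(self.stack[-1]),
            ",{": self.parse_list,  # parsing word: consumes input up to }
        }

    def next_token(self) -> str:
        return self.tokens.pop(0)

    def parse_list(self) -> None:
        items = []
        while (tok := self.next_token()) != "}":
            items.append(int(tok))
        self.arrays.append(items)

    def interpret(self, line: str) -> List[int]:
        self.tokens = line.split()           # plain space delimiting, nothing more
        while self.tokens:
            tok = self.next_token()
            if tok in self.words:            # 1. is it in the dictionary?
                self.words[tok]()
                continue
            try:
                self.stack.append(int(tok))  # 2. the exception we make for numbers
            except ValueError:
                raise NameError(f"{tok} ?") from None  # 3. neither: complain
        return self.stack


print(ToyForth().interpret("2 3 + DUP"))
```

A missing } makes next_token raise IndexError here; a real system would report running off the end of the input instead.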
2023-02-16 13:14:28 Which is necessarily experimental, constant testing and iteration
2023-02-16 13:15:07 Which limits what you can do, really, because there are problems that can't be attacked that way
2023-02-16 13:16:07 Well, I shouldn't make it sound THAT open-ended. I have a set of applications in mind, which includes scientific computing, that I want to be able to do "nicely."
2023-02-16 13:16:18 I won't be trying to solve every problem in the world.
2023-02-16 13:16:42 It's a good thing to try and approach, it's where Fortran came from
2023-02-16 13:16:53 And that's one of the 'root nodes' of programming
2023-02-16 13:18:00 I probably should do some reading about Fortran. I learned it in college, and used it, but later on I read that there are specific things about it that allow higher efficiency on number crunching applications, and I'd probably like understanding that more fully.
2023-02-16 13:18:28 Probably the restrict fiasco
2023-02-16 13:18:46 To do with pointer aliasing
2023-02-16 13:19:36 If you're interested in programming history it's worth reading the FORTRAN 1 manual
2023-02-16 13:19:55 And FORTRAN 2 is mostly the same but they added subroutines and functions
2023-02-16 13:21:48 FORTRAN 1 is pretty much just expressions, branches, and I/O. You can use your own functions if they're written in assembly.
2023-02-16 13:22:28 There are two types: floating-point and integer. The types are determined by conventions like the name of the function being used
2023-02-16 13:22:53 i.e. integer variables have to start with particular letters
2023-02-16 13:23:09 Yes, I do recall some of that type selection.
2023-02-16 13:23:22 It's been a LONG time, though.
2023-02-16 13:23:30 I was never exposed to Fortran I.
2023-02-16 13:23:37 I actually quite like the idea of determining types this way, a lot more brief than what we do today
2023-02-16 13:23:47 It is.
2023-02-16 13:24:26 And it fits our math instincts anyway - we just tend to use i, j, k etc.
for integer quantities, and others for real quantities.
2023-02-16 13:24:28 The first serious high-level language and they already had a lot of stuff figured out nicely
2023-02-16 13:24:32 Yeah exactly
2023-02-16 13:24:56 In other words, they didn't invent it - we were already doing it, and they just accommodated our behavior.
2023-02-16 13:25:03 As a good language should.
2023-02-16 13:25:27 The type system is quite smart IMO
2023-02-16 13:25:55 And also the choice of how to fit algebra into computer text
2023-02-16 13:26:15 I guess that was inherited from typewriter maths though
2023-02-16 13:49:26 Ah, this paper:
2023-02-16 13:49:28 https://swtch.com/~rsc/regexp/regexp2.html
2023-02-16 13:49:45 actually discusses a thread-based approach much like what I was imagining earlier.
2023-02-16 13:49:59 Describes a little simple multi-threaded VM for matching regular expressions.
2023-02-16 13:51:11 He uses SPLIT X, Y where I had x y FANOUT, except I was allowing for more than just a two-way split.
2023-02-16 13:51:23 Dang it - SPLIT is a better name.
2023-02-16 21:22:16 You know, tools like yacc and lex and so on are fairly nice. They do a good job of "packaging" algorithms and making them accessible in a smooth way. But one of the whole points of Forth is that it just removes the need for those algorithms in the first place. I've always thought it's better to make a problem go away than it is to solve the problem.
2023-02-16 22:22:49 Well, I've been thinking more about this whole business of recognizing things and what Forth takes as acceptable inputs and so on. Trying to decide where a sharp line can be drawn. Because we do allow numbers, and in some cases we allow two categories of numbers (ints and doubles, or ints and floats - whatever). So "recognizing categories of things" doesn't seem to be where the line is. I've just about
2023-02-16 22:22:51 decided that the sharp line that can be drawn is that each space-delimited thing needs to be able to stand on its own.
Numbers are still space delimited things.
2023-02-16 22:23:39 We cheat a little with things like IF ... THEN. That kind of breaks the "no syntax" rule. But notice that it's only allowed when compiling, and the reason for that is that when you're compiling you are not interpreting, and thus the stack is available as a communication channel forward in the definition.
2023-02-16 22:24:00 So when we're compiling we can implement things like IF THEN without requiring complex parsing.
2023-02-16 22:25:49 Regarding vectors (or complex numbers, which would be the same sort of thing), I'd distinguish between [1,2,3] and [1, 2, 3]. The first one is a fully complete space-delimited entity. The second one isn't.
2023-02-16 22:27:31 So if someone wanted to write a Forth that supported complex arithmetic and wanted to write (1.4,-3.5) or 1.4-i*3.5 - I don't think I'd complain much.
2023-02-16 22:28:12 On the other hand, wanting to have literal strings is more problematic, since "my string" has that pesky space in the middle.
2023-02-16 22:28:40 You don't get the whole thing via the main space delimited separation.
2023-02-16 22:32:34 Of course, we also seem to be ok with anything of the form
2023-02-16 22:32:46 THING: <... whatever stuff we want to have here ...>
2023-02-16 22:32:56 Because now we have that word THING: to relegate the special work to.
2023-02-16 22:33:36 Most commonly THING: is (
2023-02-16 23:23:44 I guess another Forth truism is that any word, at any time, is free to consume as much input as it wishes and to do anything whatsoever with it.
2023-02-16 23:23:51 That pretty much opens the door to anything.
2023-02-16 23:24:51 Want multi-line comment blocks? Great - you've really already got it with (. The ) doesn't have to be on the same line, does it? When you load a block the newlines aren't even there.
2023-02-16 23:25:19 I sidetracked myself on that - I was going to say "great - define """ but then I realized that ( will already work.
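Why IF ... THEN needs no parser: at compile time IF emits a conditional branch whose target is not yet known and leaves that instruction's address on a stack; THEN pops the address and patches in the current position. Nothing ever looks ahead in the source. In this Python sketch a list stands in for the stack used at compile time; the instruction names (lit, 0branch, print) and functions are assumptions, though 0BRANCH is the name many traditional Forths use for this primitive.

```python
from typing import List, Tuple

Instr = Tuple  # ("lit", n), ("0branch", target), ("print",)


def compile_words(tokens: List[str]) -> List[Instr]:
    """Compile tokens; IF leaves its branch address on a compile-time stack for THEN."""
    code: List[Instr] = []
    control: List[int] = []                    # the "forward channel" at compile time
    for tok in tokens:
        if tok == "IF":
            control.append(len(code))          # remember where the branch lives...
            code.append(("0branch", None))     # ...its target is not known yet
        elif tok == "THEN":
            at = control.pop()
            code[at] = ("0branch", len(code))  # patch: jump here when the flag is 0
        elif tok == ".":
            code.append(("print",))
        else:
            code.append(("lit", int(tok)))
    if control:
        raise SyntaxError("IF without THEN")
    return code


def execute(code: List[Instr]) -> List[int]:
    stack: List[int] = []
    out: List[int] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op[0] == "lit":
            stack.append(op[1])
        elif op[0] == "print":
            out.append(stack.pop())
        elif op[0] == "0branch":
            flag = stack.pop()
            if flag == 0:
                pc = op[1]
    return out


print(execute(compile_words("1 IF 42 . THEN 7 .".split())))
```

Nesting works for free, since each THEN resolves the most recent unresolved IF.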
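The "each space-delimited thing must stand on its own" rule can be checked token by token with a chain of recognizers, each of which either converts a whole token or declines. [1,2,3] and (1.4,-3.5) pass; [1, 2, 3] never arrives as one token, so a fragment such as [1, fails on its own. The literal formats, the recognizer-chain shape and all function names are assumptions of this Python sketch.

```python
import re
from typing import Callable, List, Optional

Recognizer = Callable[[str], Optional[object]]

NUM = r"-?\d+(?:\.\d+)?"
INT_RE = re.compile(r"-?\d+")
VECTOR_RE = re.compile(rf"\[{NUM}(?:,{NUM})*\]")
COMPLEX_RE = re.compile(rf"\(({NUM}),({NUM})\)")


def rec_int(tok: str) -> Optional[object]:
    return int(tok) if INT_RE.fullmatch(tok) else None


def rec_vector(tok: str) -> Optional[object]:
    # The whole vector must be one token: "[1,2,3]" yes, "[1," no.
    return [float(x) for x in tok[1:-1].split(",")] if VECTOR_RE.fullmatch(tok) else None


def rec_complex(tok: str) -> Optional[object]:
    m = COMPLEX_RE.fullmatch(tok)
    return complex(float(m[1]), float(m[2])) if m else None


RECOGNIZERS: List[Recognizer] = [rec_int, rec_vector, rec_complex]


def recognize(tok: str) -> object:
    """Try each recognizer on a single token; the first non-None result wins."""
    for rec in RECOGNIZERS:
        value = rec(tok)
        if value is not None:
            return value
    raise ValueError(f"{tok} ?")


print([recognize(t) for t in "42 [1,2,3] (1.4,-3.5)".split()])
```

Strings with embedded spaces are exactly what this scheme cannot express, which is why they end up needing a parsing word instead.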
2023-02-16 23:26:19 I think my point is that you CAN do anything - and deciding when someone had "crossed the line" and wasing doing Forth anymore would be a rather arbitrary personal call.
2023-02-16 23:51:39 s/wasing/wasn't/
2023-02-16 23:51:59 Looks like I have Fortran on my Linux system already. Who knew...
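On the FORTRAN naming convention discussed earlier: the variable rule people remember, and which later Fortran keeps as its default implicit typing unless IMPLICIT NONE is given, is that names beginning with I through N are integers and everything else is real. A one-function Python sketch of that rule (the function name is hypothetical); FORTRAN I's separate conventions for function names are not modeled.

```python
INTEGER_INITIALS = frozenset("IJKLMN")


def implicit_type(name: str) -> str:
    """Default Fortran implicit typing: I-N start an INTEGER name, anything else is REAL."""
    return "INTEGER" if name[:1].upper() in INTEGER_INITIALS else "REAL"


for name in ("I", "KOUNT", "X", "ALPHA", "N2"):
    print(name, implicit_type(name))
```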