2022-03-22 20:23:55 <vms14       > now I've realized it's broken so time to rewrite :D
2022-03-22 21:08:20 <KipIngram   > Man, I'm having a frustrating time trying to chase down information on responding to segfaults.  First of all, virtually all of the information out there is oriented toward C programs.  No one talks much (well, almost no one) about doing the task with assembly.  But most of all, the attitude seems to be that once a segfault happens, the only thing you should even think about doing is "failing gracefully."  I
2022-03-22 21:08:23 <KipIngram   > think those two issues go together - apparently the C runtime infrastructure can't be trusted after these things happen.  But in my system a segfault usually just means I've fouled up a word definition.  Just as a toy example, say I type 0 @ and a segfault is generated.  Well, NOTHING IS WRONG.  The system is absolutely fine, and the hardware caught the mistake and headed it off before any damage could occur.
2022-03-22 21:09:11 <KipIngram   > All I want to do is basically "Hey, OS - don't worry about that; I've got it" and run through my normal error recovery routine with an appropriate error message, which sends me back to the QUIT loop with the data stack looking just like it did before I started typing that line.
2022-03-22 21:10:19 <KipIngram   > If I need to I can restore the processor stack pointer to the value it had at that time too, in case the OS has pushed a bunch of stuff on there on its way to sending me the sigsegv signal.  I don't want to "return" from the signal handler - I want it to function as a trap door through which I'll re-initialize.
2022-03-22 21:10:33 <KipIngram   > But anyway, this is not an easy thing to run down.
2022-03-22 21:11:03 <KipIngram   > A lot of SO comments just say "don't handle signals in asm."  But I find that ridiculous - anything you can do in C you can also do in asm, because at some point the C gets CONVERTED to asm.
2022-03-22 21:11:53 <KipIngram   > Anyway, having such errors crash the program is a wart that I'd really like to remove.
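
[Editor's sketch] The "trap door" idea above can be illustrated in C on a POSIX system. This is only a minimal sketch under stated assumptions, not KipIngram's implementation: the names install_segv_trapdoor, guarded_fetch and last_fault_addr are hypothetical, and a real Forth would do the equivalent in assembly around its interpreter loop. It uses sigaction with SA_SIGINFO, an alternate signal stack via sigaltstack (so the handler does not depend on the faulting stack), and sigsetjmp/siglongjmp so the handler never "returns" but lands back at a saved recovery point with the signal mask restored.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf recover_point;         /* state saved before risky code runs */
static volatile sig_atomic_t armed = 0;  /* 1 only while a recovery point is valid */
static void *volatile fault_addr = NULL; /* address that faulted, from siginfo */

static void on_segv(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    fault_addr = info->si_addr;
    if (armed)
        siglongjmp(recover_point, 1);    /* trap door: do not return */
    /* No recovery point: restore the default action and return, so the
       faulting instruction re-executes and terminates the process. */
    signal(SIGSEGV, SIG_DFL);
}

/* Install the handler on its own stack. Returns 0 on success, -1 on error. */
int install_segv_trapdoor(void)
{
    static char alt_stack[64 * 1024];    /* size is an arbitrary assumption */
    stack_t ss;
    struct sigaction sa;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Rough analogue of Forth's @ with recovery: returns 0 and stores the
   value on success, or 1 if the read faulted. */
int guarded_fetch(volatile long *addr, long *out)
{
    if (sigsetjmp(recover_point, 1) != 0) {  /* 1 = save the signal mask */
        armed = 0;
        return 1;                            /* arrived here via the handler */
    }
    armed = 1;
    *out = *addr;                            /* may raise SIGSEGV */
    armed = 0;
    return 0;
}

void *last_fault_addr(void)
{
    return fault_addr;
}
```

After install_segv_trapdoor() succeeds, a call such as guarded_fetch(bad_pointer, &value) returns 1 instead of killing the process, and the caller can print an error and reset its own state. Saving the mask in sigsetjmp matters: without it SIGSEGV would stay blocked after the first recovery, and the next fault would terminate the process.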
2022-03-22 22:38:03 <tabemann    > KipIngram: in a previous Forth of mine I added SIGSEGV recovery code that worked remarkably well, but in my latest Forth a segfault can occur anywhere in the codebase, including in the multitasker or some other interrupt handler, and there recovery is practically impossible
2022-03-22 22:41:56 <KipIngram   > I'm mostly interested in catching "sloppy coding mistakes."  Where I just leave out a dup or something like that.
2022-03-22 22:42:17 <KipIngram   > But you make a good point - I haven't really given a lot of thought to the fully-developed ramifications of this.
2022-03-22 22:43:09 <KipIngram   > Probably the worst result of crashing in situations like that is that any dirty disk buffers outstanding get lost.
2022-03-22 22:43:34 <KipIngram   > Probably just good practice to flush before running code under development.
2022-03-22 22:43:47 <tabemann    > furthermore a lot of faults in zeptoforth are not due to segfaults at all, but rather, say, a high priority task waiting forever on something, locking out the REPL while doing nothing
2022-03-22 22:44:38 <KipIngram   > I was just thinking earlier tonight how easy it would be to add a "timeout" to the system.  Not really a "time" out, but a "watchdog countdown," so to speak.
2022-03-22 22:44:56 <KipIngram   > Couple of machine instructions in next could do that.
2022-03-22 22:45:00 <tabemann    > a lot of MCUs come with watchdog timers built in
2022-03-22 22:45:10 <KipIngram   > Yes, for sure.
2022-03-22 22:45:24 <KipIngram   > If I had hands on all the hardware, there would be a lot of possibilities.
2022-03-22 22:45:48 <tabemann    > the only question with watchdog timers is where do you reset it
2022-03-22 22:46:03 <KipIngram   > Also, since my "next" is basically a jump through a register value, I could replace next just by reloading that register.
2022-03-22 22:46:16 <KipIngram   > so I can have the "performance next," a "profiling next," etc.
2022-03-22 22:46:23 <tabemann    > e.g. if you reset it in your multitasker you could have a task that effectively locks up the system, while the multitasker keeps on running like nothing happened, and thus the watchdog keeps on getting frobbed
2022-03-22 22:47:12 <KipIngram   > Also, since each task will have its own registers, they could use different nexts.
2022-03-22 22:47:31 <KipIngram   > So I could profile or countdown protect a given task, while others ran normally.
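
[Editor's sketch] The swappable "next" with a watchdog countdown can be modelled in C. This is a hedged illustration only: KipIngram's next is a jump through a register in assembly, whereas this sketch stands in a per-task function pointer for that register. All names here (task, next_fast, next_watchdog, prim_inc, prim_again, task_run) are hypothetical.

```c
#include <stddef.h>

typedef struct task task;
typedef void (*prim_fn)(task *);  /* a primitive word */
typedef int (*next_fn)(task *);   /* returns 0 when the task should stop */

struct task {
    const prim_fn *code;  /* start of the threaded code */
    const prim_fn *ip;    /* instruction pointer into it */
    long counter;         /* toy state touched by prim_inc */
    long budget;          /* watchdog countdown, used by next_watchdog only */
    int timed_out;        /* set when the countdown expires */
    next_fn next;         /* per-task "next"; swap it to change behaviour */
};

/* The "performance next": fetch, advance, execute. NULL ends the thread. */
int next_fast(task *t)
{
    prim_fn p = *t->ip;
    if (p == NULL)
        return 0;
    t->ip++;
    p(t);
    return 1;
}

/* The "watchdog next": a decrement and test in front of the normal next. */
int next_watchdog(task *t)
{
    if (--t->budget < 0) {
        t->timed_out = 1;
        return 0;
    }
    return next_fast(t);
}

void prim_inc(task *t)   { t->counter++; }       /* increment toy counter */
void prim_again(task *t) { t->ip = t->code; }    /* branch back to start */

/* Run threaded code on a task with the chosen next and countdown. */
void task_run(task *t, const prim_fn *code, next_fn next, long budget)
{
    t->code = code;
    t->ip = code;
    t->counter = 0;
    t->budget = budget;
    t->timed_out = 0;
    t->next = next;
    while (t->next(t)) {
        /* all work happens inside next */
    }
}
```

Running the endless thread { prim_inc, prim_again, NULL } under next_watchdog with some budget stops with timed_out set once that many primitives have executed, while the same task structure under next_fast pays no countdown cost. Because the choice lives in the task, one task can be countdown-protected while the others run normally.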
2022-03-22 22:50:07 <tabemann    > what one could do is make the SEGV handler put the system in some kind of "safe mode", e.g. so one could do a postmortem or erase the flash and start over
2022-03-22 22:56:26 <KipIngram   > I assume if I have such a handler it can inspect the registers of the task that resulted in the error?
2022-03-22 22:56:40 <KipIngram   > So I can know in the handler who caused the problem, right?
2022-03-22 22:57:34 <KipIngram   > If it's anything other than the main console process, then just killing the task and logging the issue might indeed be the thing to do.
2022-03-22 22:57:59 <KipIngram   > If it's a line of stuff I just typed at the keyboard, though, I'd like to get an error message and get reset, just as I am for compile errors.
2022-03-22 22:58:15 <KipIngram   > "Reset" meaning put back at the start of line state.
2022-03-22 22:59:34 <tabemann    > unless there is memory protection protecting each task from one another, the thing to remember is that there may be damage done to variables and data structures in memory that may have occurred before the segfault itself
2022-03-22 22:59:53 <tabemann    > which is likely why it segfaulted in the first place
2022-03-22 23:00:04 <tabemann    > yes, segfaults like 0 @ are easily recoverable
2022-03-22 23:00:28 <tabemann    > but segfaults are often a good sign that the system is in an undefined state
2022-03-22 23:00:34 <KipIngram   > Sure, I understand that - in a shared environment it's hard to be 100% sure about things.
2022-03-22 23:00:53 <tabemann    > if you have tasks being protected from one another, sure, killing the process and logging the error is a perfectly fine solution
2022-03-22 23:01:00 <KipIngram   > I'm mostly looking to catch the common "oops" kind of situations, mostly done by me from the keyboard, or perhaps by loading a block I'm working on.
2022-03-22 23:03:25 <tabemann    > I supposed just assuming that things are okay overall and killing the current task unless it's the REPL, where then you'd just reset it, is an okay solution
2022-03-22 23:03:29 <tabemann    > *suppose
2022-03-22 23:10:10 <KipIngram   > Right - I'm sure there are corner cases.  It's hard to think of everything on stuff like this.
2022-03-22 23:10:32 <KipIngram   > I did give some thought earlier to write-protecting the error recovery snapshot while I'm interpreting repl text.
2022-03-22 23:10:44 <KipIngram   > That would at least make it secure against being stepped on.
2022-03-22 23:11:09 <KipIngram   > I'd unlock the page, make the snapshot, lock it back up.
2022-03-22 23:11:21 <KipIngram   > And if I needed to error recover, I could read it back without unlocking it.
2022-03-22 23:11:31 <tabemann    > that's not a bad idea
2022-03-22 23:11:52 <tabemann    > if you can use memory protection you can protect your key data structures except for when you really mean to write to them
2022-03-22 23:12:06 <KipIngram   > COLD makes a fresh copy of the write-protected original system load.
2022-03-22 23:12:16 <tabemann    > so that it'll segfault on purpose when you attempt to write to them otherwise
2022-03-22 23:12:37 <KipIngram   > Yes.
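
[Editor's sketch] The unlock / snapshot / lock sequence discussed above might look like this in C on a POSIX system with mmap and mprotect. It is a sketch under assumptions, not anyone's actual code: the snapshot fields and the function names snapshot_init, snapshot_save and snapshot_get are hypothetical, and MAP_ANONYMOUS is a widely available extension rather than strict POSIX.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical recovery state; a real system would store whatever its
   error recovery routine needs to rewind to the start of the line. */
typedef struct {
    long data_sp;
    long here;
    long latest;
} snapshot;

static snapshot *snap_page = NULL;  /* one page, normally read-only */
static size_t snap_len = 0;

/* Allocate the snapshot page, initially read-only. Returns 0 or -1. */
int snapshot_init(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *p;

    if (ps <= 0)
        return -1;
    snap_len = (size_t)ps;
    p = mmap(NULL, snap_len, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    snap_page = p;
    return 0;
}

/* Unlock the page, store the snapshot, lock it back up. Returns 0 or -1. */
int snapshot_save(const snapshot *s)
{
    if (mprotect(snap_page, snap_len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    *snap_page = *s;
    return mprotect(snap_page, snap_len, PROT_READ);
}

/* Reading back needs no unlock, since the page stays readable. */
const snapshot *snapshot_get(void)
{
    return snap_page;
}
```

Between calls to snapshot_save the page is read-only, so a stray store into it raises SIGSEGV instead of silently corrupting the recovery state, which is the "segfault on purpose" behaviour tabemann describes.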