2022-02-15 05:31:45 joe9: It's something I've thought about, security. I think that you could apply memory protection practices at a smaller granularity, in a way that 'fits' FORTH a bit better
2022-02-15 05:32:15 For a desktop FORTH the answer is easy, the FORTH is a program and so just apply normal userspace program security ideas
2022-02-15 05:32:53 Anyway I'm sure you can implement good security into a FORTH OS without much pain
2022-02-15 05:32:54 veltas, I have boundary checks on fetch, store and execute.
2022-02-15 05:33:16 and syscall addresses
2022-02-15 05:34:09 other than that, I am not sure if someone can access memory in other ways..
2022-02-15 05:34:17 What is this? An OS?
2022-02-15 05:34:32 there is no code word available.
2022-02-15 05:34:51 I am replacing the userspace in inferno with forth.
2022-02-15 05:35:14 inferno uses limbo/dis vm and replaces the userspace demarcation.
2022-02-15 05:35:16 Well if I can write arbitrary machine code and execute it then it will bypass your checks
2022-02-15 05:35:28 every syscall is a function call.
2022-02-15 05:36:13 how does one write arbitrary machine code without the "code" and this is indirect-threaded code.
2022-02-15 05:36:16 The 'real' solution is to use virtual memory protection, you need hardware that supports that or basically don't bother trying to do any memory security, or only allow execution of an abstract VM
2022-02-15 05:36:34 Where's the code area? Just in the dictionary?
2022-02-15 05:37:01 With "code" word, I mean the forth "code". eforth/f83 have it.
2022-02-15 05:37:11 If it's in an address I can write to, then I can write the opcodes myself, I don't need assembly. I'm assuming EXECUTE takes the address of code to execute? Or address of a code pointer?
2022-02-15 05:37:15 it is used to program in assembler in the forth dictionary.
2022-02-15 05:37:37 I'm just saying if someone malicious wants to execute arbitrary machine code then it sounds like they can
2022-02-15 05:38:30 03:37 < veltas> I'm just saying if someone malicious wants to execute arbitrary machine code then it sounds like they can -- I am trying to figure out how?
2022-02-15 05:38:30 FORTH is too low-level to sensibly make secure, instead treat FORTH like C and use virtual memory security
2022-02-15 05:39:04 What does your EXECUTE do?
2022-02-15 05:39:57 http://okturing.com/src/13105/body
2022-02-15 05:40:13 this is plan9 assembler. So, the syntax is different.
2022-02-15 05:40:29 It is basically src, destination operand style.
2022-02-15 05:41:03 the aboveupe and belowup are the exceptions
2022-02-15 05:41:45 just the basic indirect threaded code.
2022-02-15 05:42:34 It doesn't check the CODE pointer is in range btw
2022-02-15 05:42:57 It retrieves it and just jumps to it
2022-02-15 05:43:09 you could probably replace c! with a bounds checked version
2022-02-15 05:43:23 But even if it did, I can overwrite the machine code at that pointer and then execute that
2022-02-15 05:43:34 sure, I have been thinking about that. It should be easy to do as I know the bounds where the code is.
2022-02-15 05:43:53 no, that code will be kernel read-only.
2022-02-15 05:44:01 it cannot be changed.
2022-02-15 05:44:19 That is what I am trying to get at, you need to set up the memory map properly.....
2022-02-15 05:44:31 So there will be no executable+writeable memory mapped?
2022-02-15 05:44:45 yes, the kernel executable is readonly.
2022-02-15 05:45:03 Not equivalent statements but okay
2022-02-15 05:45:13 What does EXIT do?
2022-02-15 05:46:21 http://okturing.com/src/13106/body
2022-02-15 05:46:28 Are UP and UPE the kernel bounds?
2022-02-15 05:46:43 user program memory.
2022-02-15 05:46:55 all the dictionary and user state are within these bounds.
2022-02-15 05:47:11 the only stuff outside this are the primitive code words.
2022-02-15 05:47:29 the primitive code words are provided by the kernel (r/o)
2022-02-15 05:48:13 and the kernel bindings such as read, open, etc...
2022-02-15 05:48:17 syscalls
2022-02-15 05:49:17 Potentially you're open then to attacks where the code pointer in a user dictionary word is modified to include an address from the kernel that happens to let me read/write from where I shouldn't
2022-02-15 05:49:42 yes.
2022-02-15 05:49:47 Also you are vulnerable to SPECTRE et al if everything's mapped
2022-02-15 05:49:52 I am wondering how that would work though.
2022-02-15 05:51:01 Well if you can get W to contain the address somehow (probably not hard to find a way)
2022-02-15 05:51:27 Then you can use a code pointer to the "MOVQ (W), CX" part of a NEXT expansion
2022-02-15 05:51:58 Hmm that lets you execute anything in kernel, whoops we can already do that
2022-02-15 05:52:12 What does fetch look like?
2022-02-15 05:52:17 @
2022-02-15 05:52:41 http://okturing.com/src/13107/body
2022-02-15 05:53:18 Then have a code pointer to the bit after CHECKADDRESS
2022-02-15 05:53:43 and you can read anything
2022-02-15 06:01:00 cool, did not think about that..
2022-02-15 06:01:02 thanks.
2022-02-15 06:03:09 would it be possible to protect against such by making the dictionary readonly after compilation?
2022-02-15 06:03:37 or, make the bounds checking exclude the dictionary space?
2022-02-15 06:04:16 I guess it does not help much as execute can be used to bypass that.
2022-02-15 07:05:58 joe9: In my opinion FORTH is too low-level to make 'safe' within its own model, you would need to use the same techniques any OS uses to protect memory between processes / in kernel
2022-02-15 07:06:26 You could have a FORTH *on* a VM that's 'safe', but it just seems like a terrible waste of CPU cycles to me
2022-02-15 08:23:49 veltas, dis vm uses a jump table of primitives.
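[Editor's sketch] The CHECKADDRESS bypass described above can be modelled in a few lines. This is a toy model in Python, not the Plan 9 assembler from the pastes: it assumes a primitive is a run of steps in read-only kernel code and that an unvalidated code pointer may name any step. The names `UP`/`UPE` come from the chat; `KERNEL`, `run_from`, `AFTER_CHECK` and the memory contents are hypothetical.

```python
# Toy model: memory is a flat list of cells; user memory is [UP, UPE).
UP, UPE = 16, 32                    # hypothetical user-memory bounds
memory = list(range(100, 148))      # 48 cells; cell i holds 100 + i


class BoundsError(Exception):
    pass


def check_address(state):
    # Stand-in for CHECKADDRESS: reject addresses outside user memory.
    if not (UP <= state["tos"] < UPE):
        raise BoundsError(state["tos"])


def load(state):
    # The unchecked part of fetch: replace the address with its contents.
    state["tos"] = memory[state["tos"]]


# Read-only "kernel code": fetch is two consecutive steps.
KERNEL = [check_address, load]
FETCH_ENTRY = 0     # intended entry point of fetch
AFTER_CHECK = 1     # address of the step just past the check


def run_from(code_pointer, tos):
    """Jump to code_pointer the way an unvalidated code field would allow."""
    state = {"tos": tos}
    for step in KERNEL[code_pointer:]:
        step(state)
    return state["tos"]
```

Entering at `FETCH_ENTRY` with an out-of-range address raises `BoundsError`, while entering at `AFTER_CHECK` with the same address reads the cell anyway. Making the kernel code read-only does not help here, because nothing is overwritten; only the entry point changes.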
2022-02-15 08:25:07 I guess by VM I mean something like having a set of opcodes, processing those ops manually
2022-02-15 08:25:14 You could do a tokenised FORTH
2022-02-15 08:25:26 yes.
2022-02-15 08:25:41 do you know of any benchmarks with tokenised forth?
2022-02-15 08:26:07 No but I wouldn't imagine it would be slower than e.g. Python
2022-02-15 08:27:45 I would expect something in the order of a 10 times slowdown, that might not be a huge deal depending on what you're doing and if you're willing to write certain critical paths as specific opcodes / machine code
2022-02-15 08:28:27 And I could be totally wrong to be fair, might be even faster than that
2022-02-15 08:29:02 10 times seems about right as I saw the dis vm run at that speed.
2022-02-15 08:29:56 I've written a VM that had similar properties and it ran like 20 times slower, wasn't particularly optimised though (was a table of opcodes in C)
2022-02-15 08:30:09 20 times slower than C that is
2022-02-15 08:30:35 after optimising opcode usage manually
2022-02-15 08:30:47 So that's why I say "order of 10"
2022-02-15 08:31:14 it can probably be sped up with a jit.
2022-02-15 08:31:18 interpreter.
2022-02-15 08:32:12 Well the C code was probably worse than what you'd get writing NEXT, DOCOL et al in assembly manually
2022-02-15 08:32:44 So don't even need to resort to a JIT, but yeah JIT is getting closer to full speed
2022-02-15 08:32:57 Then your VM simulator is an emulator ;)
2022-02-15 08:45:09 do you think there is any merit to making the dictionary read-only (unless using forget)? Then the execute can check against a list of primitive code words as the only entry points.
2022-02-15 08:46:39 sure, the compiler and the interpreter will take a hit.
2022-02-15 08:47:13 I mean the interpreting cycle at the prompt.
2022-02-15 08:47:39 the interpreting from compiled words should run as usual without these performance hits?
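[Editor's sketch] The idea of EXECUTE checking a code pointer against a list of primitive entry points might look roughly like this. It is a minimal Python illustration only; the addresses, the primitive names in the table, and `checked_execute` are all hypothetical and not taken from joe9's system.

```python
# Hypothetical kernel addresses of the legitimate primitive entry points.
PRIMITIVE_ENTRIES = {
    0x1000: "dup",
    0x1010: "fetch",
    0x1020: "store",
}


class BadCodePointer(Exception):
    pass


def checked_execute(code_pointer):
    """Return the primitive named by code_pointer, or refuse to jump."""
    try:
        return PRIMITIVE_ENTRIES[code_pointer]
    except KeyError:
        # Anything that is not an exact entry point (for example an
        # address part-way into a primitive) is rejected.
        raise BadCodePointer(hex(code_pointer)) from None
```

With this table, `checked_execute(0x1010)` is accepted and `checked_execute(0x1014)` is refused. The check only constrains jumps that go through it, so it presumably depends on the code fields themselves being unwritable, which is why the read-only dictionary is part of the same proposal.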
2022-02-15 09:19:59 "I've written a VM that had similar properties and it ran like 20 times slower" interesting. is this just raw primitive speed or also taking into account that stack manipulation is a lot slower than using registers?
2022-02-15 09:22:30 ie benchmarking each primitive or benchmarking a whole function or program
2022-02-15 10:18:56 MrMobius: I'm not sure what you mean, I just had it do a very contrived program like a test loop or something to see how fast it ran, probably not even representative
2022-02-15 10:19:27 This wasn't a FORTH project it was a fake CPU-type architecture I made a while ago... can't remember why lol
2022-02-15 10:19:31 Just messing around
2022-02-15 10:21:05 The reason why Python's C interpreter is so slow is because Python has a lot of high level stuff going on as well, if it was just a simple machine of integers and flat memory it would be inherently faster
2022-02-15 10:23:12 veltas, I know what you mean about a VM. what Im asking is if you made a meaningful function to test on other than just a test loop
2022-02-15 10:24:05 if youre juggling data in a loop on the stack, for example, then you will lose even more speed because all those DUP, DROP, SWAP, ROTS, etc add to overhead whereas the same program written in C would not have that
2022-02-15 10:24:43 Yeah the machine I simulated had CPU registers
2022-02-15 10:24:59 well, you may lose the speed. i think the mechanisms of x86 may help with stack thrashing so the penalty may not be as much.
was just curious what the 20x number represents exactly
2022-02-15 10:25:26 i see
2022-02-15 10:25:40 Indirect (and tokenised) execution flow would be what it represents, also code controlling it was written in C so not even optimal probably
2022-02-15 10:26:18 So I assume a tokenised forth would be at least as fast as that, if not faster
2022-02-15 10:26:40 Only one way to find out though
2022-02-15 10:28:27 this might be interesting: http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/
2022-02-15 10:30:19 Yeah that is interesting, and seems to conform to the rough model in my head of what Python's up to and why it's slow
2022-02-15 10:51:53 veltas, any opinions on locking down create into a primitive that checks cfa and then making the dictionary read-only?
2022-02-15 10:52:14 forget will also be a primitive too.
2022-02-15 10:55:00 I don't know, sounds dirty to me, I don't think there's a nice way of making this secure
2022-02-15 10:55:42 I think the tokenised option is the only one you can make absolutely bullet-proof
2022-02-15 10:57:48 any opinions on 4th?
2022-02-15 11:06:18 how about tokens for just the primitives (all cfa's) instead of the whole dictionary as tokens?
2022-02-15 11:08:55 This thing? https://thebeez.home.xs4all.nl/4tH/4tHmanual.pdf (re '4th')
2022-02-15 11:11:05 joe9: Have a token word to execute a colon word at an inline relative offset, yeah just primitives as tokens
2022-02-15 11:11:26 cool, that would work.
2022-02-15 11:11:55 I can beat or am close to C userspace now.
2022-02-15 11:12:20 So I would have like CALL16 CALL32 CALL64 to execute at 16,32,64-bit signed offsets from IP after the offset
2022-02-15 11:12:22 with Indirect threaded code. It should not be a big deal to add another asm lookup for tokens.
2022-02-15 11:12:57 Yeah the basic compiler that comes with your system is not very optimised right?
2022-02-15 11:13:21 the c compiler, you mean?
2022-02-15 11:13:25 Yes
2022-02-15 11:13:51 it is plan9 C.
I think it does some optimizations.. but, I presume not close to gcc level..
2022-02-15 11:15:08 why do you use CALL16, CALL32 or CALL64? Are they the equivalents of (branch)/(if) primitives?
2022-02-15 11:15:16 in your forth?
2022-02-15 11:15:37 That's what I'd call the tokens that call arbitrary colon definitions
2022-02-15 11:15:54 So you can have more than 256 words!
2022-02-15 11:16:34 oh, instead of docol(?)
2022-02-15 11:17:13 I think docol has a slightly different job
2022-02-15 11:18:12 docol expects the colon def content in W or something, but CALL16 for example expects a 16-bit offset at IP, and increments IP by 2
2022-02-15 11:18:23 would you mind sharing your implementation of CALL16, CALL32, CALL64? I think it is easier to understand when there is code to see.
2022-02-15 11:18:35 I've not implemented this it's in my head :P
2022-02-15 11:19:02 I've got an (optional) token forth but it's for an 8-bit CPU so there's just fixed 16-bit addresses
2022-02-15 11:19:51 Actually in my 8-bit implementation I encode the full 16-bit lookup in 2 bytes instead of 3
2022-02-15 11:20:33 By writing them in big endian order, as the dictionary's in high memory, so I use 0-127 as token codes and 128-255 starts a big endian address
2022-02-15 11:23:19 joe9: In use it would look something like this: colon emit_twice, "EMIT-TWICE"; db dup; db call16; dw emit-$; db call16; dw emit-$; db exit
2022-02-15 11:23:39 And I'd probably make macros to clean it up
2022-02-15 11:24:15 For : EMIT-TWICE DUP EMIT EMIT ;
2022-02-15 11:26:16 why not just use the NEXT macro?
2022-02-15 11:26:34 oh, I get it.
2022-02-15 11:26:51 it is just another way of jumping around..
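[Editor's sketch] Since CALL16 was never implemented in the chat, here is one possible reading of it, modelled in Python rather than assembler. The token numbers, the little-endian operand, and the choice to treat EMIT as a colon word wrapping a hypothetical `EMIT_PRIM` token are all assumptions. The offset is taken relative to the address just past the operand, following the "from IP after the offset" description; the `emit-$` notation in the example may imply a slightly different base.

```python
import struct

# Hypothetical token numbers; the chat does not fix an assignment.
DUP, EMIT_PRIM, EXIT, CALL16 = 0, 1, 2, 3


def call16(image, target):
    """Append CALL16 plus a signed 16-bit offset to target (in place)."""
    after_operand = len(image) + 3      # 1 token byte + 2 operand bytes
    image += bytes([CALL16]) + struct.pack("<h", target - after_operand)


def run(image, entry, stack):
    """Interpret the colon definition at entry; return the emitted text."""
    out, rstack, ip = [], [], entry
    while True:
        token = image[ip]
        ip += 1
        if token == DUP:
            stack.append(stack[-1])
        elif token == EMIT_PRIM:
            out.append(chr(stack.pop()))
        elif token == CALL16:
            (offset,) = struct.unpack_from("<h", image, ip)
            ip += 2                     # step over the operand
            rstack.append(ip)           # return address
            ip += offset                # always enters a colon body
        elif token == EXIT:
            if not rstack:
                return "".join(out)
            ip = rstack.pop()
        else:
            raise ValueError(f"invalid token {token} at {ip - 1}")


image = bytearray()
emit = len(image)                       # : EMIT  <primitive> ;
image += bytes([EMIT_PRIM, EXIT])
emit_twice = len(image)                 # : EMIT-TWICE DUP EMIT EMIT ;
image += bytes([DUP])
call16(image, emit)
call16(image, emit)
image += bytes([EXIT])
```

Calling `run(image, emit_twice, [65])` walks the same DUP, CALL16, CALL16, EXIT sequence as the `db`/`dw` listing above. Note that `CALL16` here never consults a code field; it simply starts interpreting tokens at the target.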
2022-02-15 11:30:33 joe9: NEXT is for the end of CODE definitions, CALL16 et al are special tokens for calling non-token words in a colon definition
2022-02-15 11:31:46 Having them be relative is important, it saves size and makes it easier to save state for position-independent executables
2022-02-15 11:32:20 if they are the goals, then it makes sense.
2022-02-15 11:32:34 Saving size is important, even on modern arch's, to generally reduce cache misses
2022-02-15 11:33:07 without any optimizations, my core word set to get the interpreter working is around 5KB.
2022-02-15 11:33:16 I have 64GB of RAM.
2022-02-15 11:33:16 Nice
2022-02-15 11:33:25 so, I am not worried about size yet.
2022-02-15 11:48:16 09:11 < veltas> joe9: Have a token word to execute a colon word at an inline relative offset, yeah just primitives as tokens -- i like this idea.
2022-02-15 11:48:27 It makes more and more sense and it is also very elegant.
2022-02-15 11:48:42 it locks down the system too.(?)..
2022-02-15 11:49:02 thanks.
2022-02-15 11:51:29 Yes it locks the system down if you're only allowed colon defs by CALLx
2022-02-15 11:52:09 It needs a little playing around with but yeah you could probably make that locked down
2022-02-15 11:56:38 09:51 < veltas> Yes it locks the system down if you're only allowed colon defs by CALLx -- could you please talk more about this? The only words in my dictionary would be colon words or primitives (no asm code words). Is this what you mean?
2022-02-15 11:58:59 I think you are referring to colon words as CALLx words(?)
2022-02-15 12:06:44 joe9: Well I'm assuming you have some kind of code pointer field?
2022-02-15 12:10:00 If CALLx assumes DOCOL and just initiates a colon definition, rather than checking the code pointer, then it's safe. If it executes the code pointer then you can execute arbitrary code.
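[Editor's sketch] The earlier point that relative offsets help with position independence can be checked with a small Python model. The blob layout, the `CALL16` token number, the little-endian operand and the helper names are hypothetical; the only claim illustrated is that the same bytes resolve to the matching callee wherever they are loaded.

```python
import struct

CALL16 = 3  # hypothetical token number


def call_target(memory, ip):
    """Resolve the CALL16 at ip: signed offset from the address past the operand."""
    assert memory[ip] == CALL16
    (offset,) = struct.unpack_from("<h", memory, ip + 1)
    return ip + 3 + offset


# A 2-byte callee followed by a call back to it, assembled once.
# The call sits at blob offset 2, its operand ends at offset 5,
# so an offset of -5 points at blob offset 0.
blob = bytes([0, 0, CALL16]) + struct.pack("<h", -5)


def load_at(base, size=64):
    """Copy the unmodified blob into fresh memory at base."""
    memory = bytearray(size)
    memory[base:base + len(blob)] = blob
    return memory
```

For any load address `base`, `call_target(load_at(base), base + 2)` comes out as `base`, with no relocation pass over the blob. An absolute 16-bit address in the operand would need patching on every move.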
2022-02-15 12:10:55 So the locked down approach might be sort of like a cross between subroutine threading and tokenised threading, where there's no code field and everything's a colon word, even CREATE words
2022-02-15 12:12:17 And then CREATE words can start with a token word that puts the address of the rest of the definition on stack and then rdropexit's, and you can convert to a DOES> word by overwriting that token
2022-02-15 12:41:12 instead of using the stack, why not make NEXT jump to a token table when the address is below the user memory?
2022-02-15 12:42:31 no, I am wrong.
2022-02-15 12:42:36 let me think this through. Thanks.
2022-02-15 12:42:45 I would want the jump to be unconditional (although indirect) for performance reasons
2022-02-15 12:42:56 I'd have a jump table for every value from 0 to 255
2022-02-15 12:43:24 Padded with the address of an "invalid token" handling word
2022-02-15 12:44:16 yes, I agree with the jump table.
2022-02-15 12:44:24 But, this would belong in the NEXT macro.
2022-02-15 12:44:31 s/this/this jump/
2022-02-15 12:44:45 then the cfa will always be a token.
2022-02-15 12:45:02 for colon words, the cfa is docol's token
2022-02-15 12:45:14 for primitives, the token corresponding to the primitive.
2022-02-15 12:46:05 Sounds good
2022-02-15 12:46:28 Yep think it through, sounds interesting what you're doing all the same
2022-02-15 15:05:18 the dictionary size comes to 21KB out of the box. not 4KB as mentioned earlier.
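[Editor's sketch] The 256-entry jump table padded with an "invalid token" handler, as proposed above, could be modelled like this in Python. The three primitives and their token numbers are hypothetical; the point shown is that every byte value has a slot, so dispatch is a single indexed call with no range check.

```python
def invalid_token(stack):
    # Every unused slot points here, so a stray byte cannot escape the table.
    raise RuntimeError("invalid token")


def op_dup(stack):
    stack.append(stack[-1])


def op_drop(stack):
    stack.pop()


def op_add(stack):
    top = stack.pop()
    stack[-1] += top


# One slot per possible byte value, padded with the invalid-token handler.
JUMP_TABLE = [invalid_token] * 256
for token, handler in enumerate([op_dup, op_drop, op_add]):
    JUMP_TABLE[token] = handler


def step(stack, token):
    """Dispatch one token: an unconditional, indirect jump through the table."""
    JUMP_TABLE[token](stack)
```

For example, starting from `stack = [2, 3]`, `step(stack, 2)` adds the two cells and `step(stack, 200)` lands in `invalid_token`. In an assembler NEXT the same shape would be a load from the table followed by an indirect jump.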