2021-12-17 02:15:30 anyone have a nicely factored utf-8 decoder they wouldn't mind sharing? I struggled enough that I gave up and wrote it in assembly lol
2021-12-17 02:16:22 (encoding is surprisingly much nicer, I suppose because it's easy to factor out "turn the low 6 bits into a continuation byte")
2021-12-17 02:18:24 i found this while going through the forth standard: http://www.forth200x.org/reference-implementations/xchar.fs
2021-12-17 02:18:49 no idea how good it is.. i'm a newbie and i can't read it :-p
2021-12-17 02:19:47 I would generally support more comments than this has lmao; I'll see if the spec has commentary
2021-12-17 02:20:09 i can sorta read it :-p but i am seriously a newbie so i can't judge its quality
2021-12-17 02:20:25 interesting to see that they use a loop for determining the character size instead of branching; I guess they don't error on >21-bit values (which are no longer considered valid as of some time ago)
2021-12-17 02:22:57 hm, looks like all these do way less error-checking than I do lol
2021-12-17 02:23:42 I suppose they might just rely on doing a pass ahead-of-time to check validity and allocate a target buffer
2021-12-17 02:24:45 i do like good error checking
2021-12-17 02:26:20 c is so full of silent corruption
2021-12-17 02:26:26 yeah, I guess separating checking if a thing is valid from parsing it helps simplify a lot
2021-12-17 02:26:47 yeah...
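A minimal sketch of the two factors mentioned above, in standard Forth: the "turn the low 6 bits into a continuation byte" step for encoding, and a branching lead-byte classifier for decoding. The word names `cont-byte` and `utf8-len` are made up for illustration and are not taken from xchar.fs. The classifier only looks at the lead byte, so it does not reject overlong forms (lead bytes 192 and 193), surrogates, or values above U+10FFFF; those checks would belong in the separate validation pass.

```forth
\ Hypothetical helper names; all literals are decimal.
decimal

\ Split off the low 6 bits of u as a continuation byte (10xxxxxx).
: cont-byte ( u -- u' c )
  dup 6 rshift          \ u' = the remaining high bits
  swap 63 and 128 or ;  \ c  = 10xxxxxx

\ Sequence length implied by a lead byte (0..255),
\ or 0 if the byte cannot start a sequence.
: utf8-len ( lead -- n )
  dup 128 <          if drop 1 exit then   \ 0xxxxxxx
  dup 224 and 192 =  if drop 2 exit then   \ 110xxxxx
  dup 240 and 224 =  if drop 3 exit then   \ 1110xxxx
      248 and 240 =  if      4 exit then   \ 11110xxx
  0 ;                                      \ continuation byte or invalid
```

For example, `233 cont-byte` leaves `3 169`: 233 is U+00E9, whose two-byte encoding is `3 192 or` (195) followed by 169.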
2021-12-17 02:27:09 i come from c
2021-12-17 02:27:33 I do program analysis stuff at $dayjob and half the job is banging my head against a table wishing people would stop writing new C in 2021 and immediately hooking it up to a socket without checking it under ASan, Valgrind, etc
2021-12-17 11:07:39 lisbeths: When talking about performance like that the only way to know which is faster is to try
2021-12-17 11:08:22 There are so many factors that you just need to profile, understanding principles of optimisation just helps avoid stuff that's always slow, or reason about things to try
2021-12-17 11:29:43 I'm not quite ready to try out running my program yet
2021-12-17 11:33:12 You can tell I'm brand new at JavaScript because I'm googling "javascript dollar operator"
2021-12-17 11:33:28 "I keep seeing the dollar operator being used everywhere, what does it do!?!?"
2021-12-17 11:34:10 lisbeths: You're guilty of what is known as 'premature optimisation' then
2021-12-17 11:34:56 guilty as charged
2021-12-17 11:43:58 I was thinking recently that the web is kind of like a modern terminal, in the progression of: paper teletype -> vt100 -> web
2021-12-17 11:44:05 With some stuff missing in-between
2021-12-17 11:44:42 And I was trying to think about the most relevant way of applying FORTH to this medium (I know there is already a lot of web FORTH stuff, including web-assembly)
2021-12-17 11:44:52 But I am just trying to think for myself about it
2021-12-17 12:58:40 I am trying to port felix forth to an amd64 system. http://okturing.com/src/12819/body
2021-12-17 12:58:56 I am not able to figure out the reason for using 64 in findnfa
2021-12-17 12:59:33 I understand that the highest bit (1<<7) of the len byte is immediate flag.
2021-12-17 12:59:49 but, I cannot figure out what the need for 64 here is.
2021-12-17 13:00:15 cell+ c@ 64 and # is the relevant portion
2021-12-17 13:00:24 cell+ to skip the link pointer,
2021-12-17 13:00:34 c@ to get the length of the dictionary entry
2021-12-17 13:00:50 64 and \ why?
2021-12-17 13:03:38 seems like that's 8 cells.
2021-12-17 13:04:05 oh
2021-12-17 13:05:07 `and` is bitwise. `64` is 0b01000000.
2021-12-17 13:05:45 if the immediate flag isn't set, the `if @` doesn't fire.
2021-12-17 13:05:50 if it is set, it does.
2021-12-17 13:07:02 `c@ 64 and` just fetches the flags byte, bitwise-ANDs it with 64 (returning nonzero if the 7th bit is set, 0 if it isn't).
2021-12-17 13:10:03 why did it not use " c@ 128 and " to check for the immediate flag?
2021-12-17 13:10:28 128 is 0b1000 0000
2021-12-17 13:10:50 so, might as well check for the last bit (immediate bit) directly?
2021-12-17 13:11:13 further down that code, they are using 128 to set the immediate bit
2021-12-17 13:11:43 0b1000 0000 (with immediate flag) + len, correct?
2021-12-17 13:11:51 maybe they're referring to a different flag then?
2021-12-17 13:11:55 not something at the topmost bit.
2021-12-17 13:12:15 yes, that seems more plausible.
2021-12-17 13:12:20 whatever they're doing, they're using the topmost bit as a flag to fetch the thing at TOS under the flags byte.
2021-12-17 13:13:35 next-to-topmost, rather.
2021-12-17 13:17:18 I think they are using that flag to hide/reveal words.
2021-12-17 13:17:54 core.f:: smudge ( -- ) current @ @ cell+ dup c@ 64 or swap c! ;
2021-12-17 13:17:57 core.f:: reveal ( -- ) current @ @ cell+ dup c@ 64 invert and swap c! ;
2021-12-17 13:18:16 thaaaat'd be it.
2021-12-17 13:18:35 why would that be needed?
2021-12-17 13:18:58 http://okturing.com/src/12820/body
2021-12-17 13:19:24 Because colon words aren't findable until their definition is finished
2021-12-17 13:19:43 In case an error occurs and to allow finding previous definitions of the same name
2021-12-17 13:20:39 \ colon word definition
2021-12-17 13:20:39 : smudge ( -- ) current @ @ cell+ dup c@ 64 or swap c! ;
2021-12-17 13:20:39 : reveal ( -- ) current @ @ cell+ dup c@ 64 invert and swap c! ;
2021-12-17 13:20:41 : recurse ( -- ) current @ @ cell+ >cfa , ; immediate
2021-12-17 13:20:46 I like those terms for it, "smudge" and "reveal"
2021-12-17 13:20:53 Nicer than what I was doing anyway
2021-12-17 13:23:41 Are you sure cell+ is skipping over link pointer?
2021-12-17 13:24:10 yes, I think so. why do you ask?
2021-12-17 13:24:18 I think it might be skipping over some reserved code used to implement `does>`
2021-12-17 13:24:27 I am debugging this code. so, could be messed up.
2021-12-17 13:24:29 too.
2021-12-17 13:24:36 The reason I think this, could be wrong, is because reading `current @ @ cell+`
2021-12-17 13:24:53 In my head `current @` is the current word being compiled, at the code field
2021-12-17 13:25:05 Then `current @ @` is address to start of code
2021-12-17 13:25:54 But I don't know jonesforth, so don't believe me, check yourself (or just ignore me if you know I'm wrong already)
2021-12-17 13:26:11 this is not jonesforth.
2021-12-17 13:26:20 thanks for the tip. something is messed up.
2021-12-17 13:26:20 Sorry, felixforth
2021-12-17 13:26:29 so, it could be that too.
2021-12-17 13:26:31 thanks.
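Putting the pieces above together, the flags byte seems to pack an immediate bit (128), a smudge bit (64) and the name length. A sketch of that layout in standard Forth, assuming the low 6 bits hold the length (the log does not confirm this) and using made-up names that are not from felix forth:

```forth
decimal
128 constant f-immediate   \ bit 7: immediate flag (from the log)
 64 constant f-smudge      \ bit 6: hidden while being defined (from the log)
 63 constant len-mask      \ ASSUMPTION: low 6 bits hold the name length

\ Tests on a flags byte value already fetched with c@.
: immediate? ( flags -- flag )  f-immediate and 0<> ;
: smudged?   ( flags -- flag )  f-smudge and 0<> ;
: name-len   ( flags -- n )     len-mask and ;

\ Set / clear the smudge bit in a flags value (not in memory).
: +smudge ( flags -- flags' )  f-smudge or ;
: -smudge ( flags -- flags' )  f-smudge invert and ;
```

With these, `5 +smudge` gives 69, `69 smudged?` is true, and `69 -smudge name-len` is 5 again, which is why a lookup that tests `64 and` can skip a word whose definition is still in progress.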
2021-12-17 13:27:01 I think in my forth `current` is called `last-def` because I thought it would be less likely to collide
2021-12-17 13:28:22 I guess I am wrong, since smudge does `current @ @ cell+ dup c@` to get the flags
2021-12-17 13:29:25 Maybe the arrangement is like this: `current @` is the definition record in a data area, and `current @ @` is the part of the definition in the text area
2021-12-17 13:30:30 In x86 you want to keep the code in a different section than variables
2021-12-17 13:30:50 joe9: ^
2021-12-17 13:31:51 In x86 you want to keep the code in a different section than variables -- why?
2021-12-17 13:31:59 in my port, I am keeping them together.
2021-12-17 13:32:25 it gets messy with different sections.
2021-12-17 13:32:51 For performance reasons, if you are writing to a cache line with code in it I think there is a performance penalty
2021-12-17 13:33:59 Because x86 has a separate instruction and data cache in L1, but is kind enough to abstract this away. So you can write over executing code, but it causes some synchronisation that might be expensive. And I would expect this to be done on a cache-line granularity
2021-12-17 13:39:02 So you could end up with code running slower just because it happens to be defined after space for some data that's in active use... not good
2021-12-17 14:36:45 ok, thanks.
2021-12-17 14:39:04 Day 3 part 2 of AoC was significantly more code, but it could be just me not thinking simpler: https://github.com/neuro-sys/advent-of-code-2021/blob/main/3-2.fs.md
2021-12-17 14:43:57 God, this is embarrassing. I'm checking other people's solutions in other languages, and they're significantly simpler. I will redo it later.
2021-12-17 15:52:13 Yes but they can't run their solution on a zx spectrum ;)
2021-12-17 15:58:59 I like your use of vectored execution
2021-12-17 16:12:45 "but they can't run their solution on a zx spectrum" Haha! Yes, absolutely worth it.
2021-12-17 16:46:20 Yeah could be that FORTH can't write it much shorter, or maybe you could do better. Just have to have another crack at it another day with a fresh mind
2021-12-17 16:46:40 Or try refactoring further somehow
2021-12-17 16:48:52 Day 2 was quite small or about the same as in other language examples. Somehow it depends on the problem I suppose. But this one I probably went a bit roundabout. I'll come back to it later again.
2021-12-17 16:56:26 Yeah it does really depend
2021-12-17 16:56:37 Day 2 was short because FORTH already has the words you need
2021-12-17 16:57:15 Whereas other languages probably have some more of the functions/constructs you need for other things that in FORTH you need to write or work around
2021-12-17 16:58:01 The FORTH environment is special though because it can be so much smaller than the alternatives in total, code + environment
2021-12-17 16:58:41 And because it's also so flexible, you can choose to write the problem in a number of different creative ways that wouldn't be available in most languages
2021-12-17 17:08:27 I was planning to create generic words and stash them into a library, and exclude it from the problem solution. That is for a fairer comparison with other languages in terms of expressiveness. Although arguably since the way one must think Forth is different, utilities from other languages can be less relevant.
2021-12-17 17:09:53 One thing I am now enjoying and focusing on is to build a vocabulary for an isolated sub problem, make use of re-defining words to hide/extend meaning, and also build the vocabulary in a way to reduce or eliminate stack juggling.
2021-12-17 17:10:30 Which is just more or less the Forth way.
2021-12-17 17:11:50 All this takes some additional effort and thinking when writing the program, but is certainly more fun and satisfying.
2021-12-17 17:13:13 Yeah I agree, was what I found doing the problems last year
2021-12-17 17:35:15 https://www.youtube.com/watch?v=-e2sUlnSlTc
2021-12-17 17:35:19 Game stuff in Forth
2021-12-17 17:45:48 Nice
2021-12-17 17:47:53 Just realised you can implement # with `um/mod`
2021-12-17 17:51:34 And `um*`
2021-12-17 17:57:23 I used # definition from eForth and Zen which uses um/mod.
2021-12-17 17:57:49 At some point I'll build more standard words of my own, but wanted quick results.
2021-12-17 17:59:09 I'm not going to judge
2021-12-17 17:59:21 I have been stealing my assembly from clang and GCC lol
2021-12-17 18:02:37 Also trying to minimize the amount of assembly code, so that I can port it to Z80 later on.
2021-12-17 18:03:27 But then I have this bootstrap/meta compiler thingy in a mixture of assembly code and hand assembled Forth to build the rest of the system from source file.
2021-12-17 18:04:14 It's time wasted to compile the Forth on every cold start, but whatever.
2021-12-17 18:06:18 I don't think that's significant waste of time, it should be a small fixed amount of time each cold start right?
2021-12-17 18:07:25 Yeah it should be quite fast. And on Z80 machine after the first time boot, I can save the region of memory (dictionary) to disk, and use it from there next time.
2021-12-17 18:08:35 The only minor issue right now is reading the file. I added `open-file` for it on Linux, but I should better use BLOCK words instead.
2021-12-17 18:08:59 And store the file in blocks (in Windows/Linux can still emulate it via a block file).
2021-12-17 18:09:23 Even though the Z80 machine has a file system (CP/M 1.2) I want to disable the whole ROM.
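On the earlier remark that `#` can be built on `um/mod`: a sketch in standard Forth of one common way to do it, not the actual eForth source. The names `ud/mod-sketch`, `my#` and `u.sketch` are made up to avoid clashing with existing words. The double-cell number is divided by BASE in two `um/mod` steps, high cell first, so the second division cannot overflow.

```forth
\ Divide an unsigned double by a single: ( ud u -- rem udquot )
: ud/mod-sketch ( ud u -- rem udquot )
  >r 0 r@ um/mod     \ high cell / u      ( lo rem-hi quot-hi )
  r> swap >r         \ stash quot-hi      ( lo rem-hi u )
  um/mod             \ (rem-hi:lo) / u    ( rem quot-lo )
  r> ;               \                    ( rem quot-lo quot-hi )

\ One digit of pictured numeric output; use between <# and #>.
: my# ( ud1 -- ud2 )
  base @ ud/mod-sketch rot       \ ( ud2 digit )
  dup 9 > if 7 + then            \ 10.. maps to A..
  [char] 0 + hold ;

\ Print an unsigned single in the current base.
: u.sketch ( u -- )
  0 <#  begin my# 2dup or 0= until  #> type ;
```

With BASE set to decimal, `1234 u.sketch` types `1234`; the loop in `u.sketch` plays the role of `#s`.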
2021-12-17 18:13:04 I guess on Z80 it's more significant
2021-12-17 18:13:16 But depends on application, on cold start I don't mind personally
2021-12-17 18:13:45 I'm used to ZX Spectrum where I load from cassette tape, slow as anything
2021-12-17 18:14:26 I used the cassette tape as a child, but now I have a gotek disk drive, and use the emulator every now and then.
2021-12-17 18:14:40 It's Amstrad CPC though, quite similar to later ZX's.
2021-12-17 21:48:12 : compare ( a1 n1 a2 n2 -- n ) rot 2dup >r >r min 0 do over i + c@ over i + c@ - signum ?dup if 2nip unloop unloop exit then loop 2drop r> r> - signum ;
2021-12-17 21:48:20 does this definition make sense to you?
2021-12-17 21:48:41 if n1 != n2, it is straight-off not equal, correct?
2021-12-17 21:48:57 why bother comparing the minimum of n?
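One likely answer: the standard `COMPARE` is a three-way comparison, not an equality test. It returns -1, 0 or 1 according to the ordering of the two strings, so the common prefix has to be compared first and the lengths only break the tie. A sketch of that behaviour in standard Forth, with made-up names `sgn` and `compare-sketch`; it is not the definition quoted above, and it keeps a single length difference on the return stack instead of both lengths.

```forth
: sgn ( n -- -1|0|1 )  dup 0< swap 0> - ;

: compare-sketch ( a1 n1 a2 n2 -- -1|0|1 )
  rot 2dup swap - >r         \ ( a1 a2 n2 n1 )  R: n1-n2
  min 0 ?do                  \ walk the common prefix only
    over i + c@  over i + c@ - sgn
    ?dup if                  \ first differing byte decides
      nip nip  unloop r> drop  exit
    then
  loop
  2drop r> sgn ;             \ equal prefix: the shorter string sorts first
```

Inside a definition, `s" ab" s" abc" compare-sketch` gives -1 (equal prefix, first string shorter), `s" abd" s" abc" compare-sketch` gives 1, and two identical strings give 0.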