From ed57e254a6bf11cd5b8a887154edbd9d8eb7d4ab Mon Sep 17 00:00:00 2001 From: Leonardo Santiago Date: Mon, 3 Jun 2024 12:51:15 -0300 Subject: fix typos --- blog.org | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'blog.org') diff --git a/blog.org b/blog.org index e783cb6..39fb4bb 100644 --- a/blog.org +++ b/blog.org @@ -53,7 +53,7 @@ the first thing one learns when they're studying formal languages (the field tha from it, it can be inferred that the grammar *must not* allow in the language any words that does not have defined semantics, and in the opposite direction, that the language should not specify semantics to any program that cannot be parsed by the rules of the grammar. both of these are required in order to make study of this grammar <-> language partnership fun, pleasing, and most importantly sound. -going beyond, formal language theory also gives you the knowledge that the execution of any program can be given as a set of grammar rules in an abstract machine (the most famous one being a turing machine). in the same way you can define a set of grammar rules to parse parenthesized arithmetic expressions using a stack automaton, you can define a set of grammar rules to model the execution of a C program, and, albeit super complex, can be modeled as a turing machine. this usually gets the name of C abstract machine, and is the basis for formal specification of behavior in the language. +going beyond, formal language theory also gives you the knowledge that the execution of any program can be given as a set of grammar rules in an abstract machine (the most famous one being a turing machine). in the same way you can define a set of grammar rules to parse parenthesized arithmetic expressions using a stack automaton, you can define a set of grammar rules to model the execution of a C program, that, albeit super complex, can be modeled as a turing machine. this usually gets the name of C abstract machine, and is the basis for formal specification of behavior in the language. and no, i'm not talking about modeling a C parser as a state machine (which probably is easier than most languages, if you ignore pre-processor stuff). i'm talking about modeling C *execution* as a language being parsed. drawing a parallel, when parsing parenthesized expressions, you pop things in and out of the stack to represent "balancedness", and in the same way, when "parsecuting" C code, you must write to memory, represent side effects, represent type casts and pointer conversions and everything *as part of the language*. @@ -71,7 +71,7 @@ framing it this way really exposes the fragility of C, because undefined behavio language designers and compiler developers are by no means dumb, and yes, they know much, much more than me about these problems. undefined behavior exists exactly because there must be parts of your code that your compiler *must* assume that aren't possible, so that it can correctly compile. for example, let's say that you inadvertently try to dereference a pointer that you have no knowledge about. the C compiler simply does not have enough information to know if it is ~NULL~, if it is still pointing to valid memory, or if the memory has been initialized, so it's approach is to simply emit code *as if* it was a valid, initialized, non-null pointer. -it is essential to realize that this is an *assumption*, and in almost most cases, the compiler does not care whether or not it actually was actually still valid, so writing to it may have a myriad of effects of different effects (none of which are the compiler's concerns). sometimes, your system might correctly intercept a read/write from invalid/null memory and raise you a signal, but that is not guaranteed. +it is essential to realize that this is an *assumption*, and in almost most cases, the compiler does not care whether or not it was actually still valid, so writing to it may have a myriad of effects of different effects (none of which are the compiler's concerns). sometimes, your system might correctly intercept a read/write from invalid/null memory and raise you a signal, but that is not guaranteed. and there are a huge number of tools to aid in finding undefined behavior in a code base, it's just that 1. they are not by any means standards of C development (not in spec and not in standard compilers) and @@ -118,8 +118,7 @@ the part that programmers usually get paid millions of dollars for is the step * it is not by chance that Yang et al. could only find measly 9 bugs after 6 CPU years of fuzzing in [[https://compcert.org/man/manual001.html][compcert]], a formally verified c compiler (written in coq), where as in gcc and clang, they [[https://users.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf#subsection.3.2][found and reported more than 300]]. all these 9 bugs where in the unverified front end of the compiler (the parser), and there were literally 0 middle end (compiler passes and AST translations) bugs found, which is unheard of. this is not by chance, they've spent many years writing proofs that all of their passes are correct, safe, and preserve the meaning of the original program. i really think software developers should strive for that kind of resilience, which i believe can only be achieved through properly valuing *correctness* . i don't think it is reasonable to expect that all software be built using coq and proving every little bit of it (due to business constraints) but i think that rust is a good enough language to start taking things more seriously. - - +** exploiting thunks for fun and profit * COMMENT Local Variables :ARCHIVE: # Local Variables: # eval: (org-hugo-auto-export-mode) -- cgit v1.2.3