#+title: lowest case
#+author: santi
#+description: a lower case only blog, purely for aesthetics
#+hugo_base_dir: .
#+hugo_auto_set_lastmod: t

* home
:PROPERTIES:
:EXPORT_HUGO_SECTION: /
:EXPORT_FILE_NAME: _index
:END:

a lower case only blog, purely for aesthetics.

here I talk about discrete mathematics (mostly computer science), including compilers, language theory, type theory, computability theory, software correctness, formal verification, and any other (entirely theoretical, almost non-applicable) nerd topic you can think of.

* about
:PROPERTIES:
:EXPORT_HUGO_SECTION: /
:EXPORT_FILE_NAME: about
:EXPORT_HUGO_CUSTOM_FRONT_MATTER: :menu main
:END:

i'm leonardo santiago, a software engineer based in brazil. my interests are in compiler design, programming tools (emacs), functional programming, proof languages and a bit of devops using nix/nixos. i'm most comfortable in nix and rust, but i know a fair share of other languages.

currently, i work as a software engineer @ [[https://mixrank.com][Mixrank]], and you can find my cv [[./static/cv.pdf][here]].

if you feel like smugly responding to any of my posts (or just want to kindly send me a message), these are my socials:
- [[https://www.linkedin.com/in/leonardo-ribeiro-santiago/][linkedin]]
- [[https://github.com/o-santi][github]]
this blog was built using emacs' excellent org-mode and [[https://github.com/gohugoio/hugo][hugo]] (with the [[https://github.com/janraasch/hugo-bearblog][bearblog theme]]), and its source code is openly available [[https://git.santi.net.br/blog][here]].

* blog
:PROPERTIES:
:EXPORT_HUGO_SECTION: /blog
:END:
** rust is not about memory safety :rust:correctness:
SCHEDULED: <2024-06-01 Sat>
:PROPERTIES:
:EXPORT_FILE_NAME: rust-is-not-about-memory-safety
:END:

most rust discussions nowadays revolve around memory safety, and how it is safer than C / C++ / zig / go / whatever language is being trashed on twitter that day. while yes, that is true - not that the bar for most of these is particularly high - what i think is the main point of the language is always glossed over: correctness. when one tries to criticize any of the aforementioned languages, one is answered with the following argument:
#+begin_quote 
your program segfaults? skill issue
#+end_quote
but i'd like to make the counter-argument that no, this is not a skill issue.
*** formal language theory
the first thing one learns when studying formal languages (the field that studies grammars, state automata, etc) is that the rules of a grammar must generate *exactly* the words you want to include in your language. this means that there's a bidirectional relationship between the grammar you describe (which directly defines the automaton that parses that language) and the words[fn:: formally, a word is a sequence of tokens over a certain alphabet that the automaton consumes. normally we think of "words" as the whole program that we're parsing.] that it parses (which are related to the semantics of the program, how it executes).

from this, it can be inferred that the grammar *must not* allow into the language any word that does not have defined semantics, and, in the opposite direction, that the language should not assign semantics to any program that cannot be parsed by the rules of the grammar. both of these are required in order to make the study of this grammar <-> language partnership fun, pleasing, and most importantly sound.

going beyond, formal language theory also tells you that the execution of any program can be given as a set of grammar rules over an abstract machine (the most famous one being the turing machine). in the same way you can define a set of grammar rules to parse parenthesized arithmetic expressions using a stack automaton, you can define a set of grammar rules to model the execution of a C program, which, albeit super complex, can be modeled as a turing machine. this usually gets the name of the C abstract machine, and is the basis for the formal specification of behavior in the language.

and no, i'm not talking about modeling a C parser as a state machine (which is probably easier than for most languages, if you ignore pre-processor stuff). i'm talking about modeling C *execution* as a language being parsed. drawing a parallel: when parsing parenthesized expressions, you pop things in and out of a stack to represent "balancedness"; in the same way, when "parsecuting" C code, you must write to memory, represent side effects, represent type casts and pointer conversions and everything *as part of the language*.

in the same way that you'd hope that a parenthesized arithmetic expression parser would recognize that ~(1 + 2) + 3)~ is an invalid expression, you'd expect that the C compiler would correctly verify that the following series of tokens is not a /well behaved/ program:
#+begin_src c
#include <stddef.h>

int foo(int *myptr) {
  ,*myptr = 5; /* writes through a possibly-NULL pointer */
  return 0;
}

int main(void) {
  foo(NULL);
  return 0;
}
#+end_src

i say /well behaved/ because i can't say /invalid/: the spec does in fact define that dereferencing a ~NULL~ pointer results in [[http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html][undefined behavior]]. and this is C's achilles heel: instead of outright banning programs like the one above (which i'd argue is the correct approach), it will happily compile them and give you garbage output.

framing it this way really exposes the fragility of C, because undefined behavior always has to be taken into account. and, by its very nature, there is no way to represent it other than as a black box: if your code ever encounters it, literally all you can say is that *the whole result of the program* is undefined - that is, it can be anything. you cannot prove properties, nor say what will happen once your program enters this state, as the C specification literally does not define it. it may come to a halt, write garbage to the screen or delete half of your files, and there's no way to predict what will come out of it, by definition. in the lucky case, it will segfault while executing and you'll be extremely pissed off, but even that is not guaranteed. this is akin to having a float expression with some deep term being ~NaN~: it eventually must evaluate to ~NaN~, and you can't draw any conclusions about the result of the expression (other than that it isn't a number).

language designers and compiler developers are by no means dumb, and yes, they know much, much more than me about these problems. undefined behavior exists exactly because there are states that your compiler *must* assume aren't possible, so that it can compile your code at all. for example, let's say that you inadvertently try to dereference a pointer the compiler has no knowledge about. the C compiler simply does not have enough information to know if it is ~NULL~, if it is still pointing to valid memory, or if that memory has been initialized, so its approach is to simply emit code *as if* it were a valid, initialized, non-null pointer.

it is essential to realize that this is an *assumption*, and in most cases the compiler does not care whether or not the pointer was actually valid, so writing to it may have a myriad of different effects (none of which are the compiler's concern). sometimes, your system might correctly intercept a read/write to invalid/null memory and raise a signal, but that is not guaranteed.

and there is a huge number of tools to aid in finding undefined behavior in a code base (one is sketched below), it's just that
1. they are by no means standard in C development (not in the spec and not in standard compilers), and
2. they are fallible, and will always let some undefined programs slip by.
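
for instance, a minimal sketch of running clang's undefined behavior sanitizer over the snippet from before (the file name =foo.c= is hypothetical):
#+begin_src sh
# build with UBSan instrumentation; the check is dynamic, so it only
# fires if the NULL write is actually executed on this particular run
clang -g -fsanitize=undefined foo.c -o foo
./foo  # reports a runtime error at the offending store, instead of silently misbehaving
#+end_src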
*** runtime exceptions are not the solution

most languages try to handle this by introducing some sort of runtime exception system, which i think is a terrible idea. while this is much, much safer than what C does, it still makes reasoning about the code extremely hard, by completely obliterating locality of reasoning. your indexing operation may still be out of bounds, and while this now has a defined outcome, it is still one of the possible outcomes of your program (whether you like it or not), and you must handle it. and, of course, no one handles all of them, for it is humanly impossible to do so in most languages, because:

1. it is hard to know when an operation can raise an exception, and under which conditions.
2. even if documented, it is never enforced that all exceptions be gracefully handled, so some random function in a dependency of a dependency may raise an error from an unexpected corner case, and you must deal with it.
   
this is a symptom of virtually all modern languages, and none of them has any good answer to it. java mandates that you report in your function's type signature the errors it may raise (which is a rare java W), but it still lets you write code with [[https://docs.oracle.com/javase/tutorial/essential/exceptions/runtime.html][unchecked exceptions]] that won't signal a compile error if ignored, which eventually will crash your minecraft game. python, ruby, php and most other languages (even [[https://www.tweag.io/blog/2020-04-16-exceptions-in-haskell/][haskell]] made this mistake) do not even attempt to signal when a function might raise an exception. javascript somehow manages to be even worse, by having horrible implicit-by-default type casts, having both ~undefined~ AND ~null~, representing strings as UTF-16, using floats as its standard numbers, implicitly inserting semicolons, and honestly the list could go on forever.

the root of all these problems is, quite literally, the same: your compiler (or interpreter) lets into your program execution states that you didn't anticipate. one of the best examples of the opposite, surprisingly enough, is regex matchers. while i concede that their syntax can be extremely confusing, they have the best property software can have: if they compile, they work exactly as intended - which i henceforth will call *correctness*. this is because regular languages' properties and their state automata have been studied to extreme depths, and it is entirely possible to write a regex implementation that is *correct* (in the same sense as above), going as far as providing formal verification of that [fn::the excellent software foundations book [[https://softwarefoundations.cis.upenn.edu/lf-current/IndProp.html][explains thoroughly]] how one might formally write one possible regex matcher, and prove that the implementation is correct].

from this definition of *correctness* we can also derive a semantically useful definition for the word bug: an unexpected outcome of the program, one that shouldn't be allowed in the language. of course, java's behavior might be defined for all inputs (for the most part; i'm sure there are problems here and there), but just because one possible outcome of a program is ~NullPointerException~ doesn't mean that it is *expected*, making it, by my definition, a bug.

*** make invalid states unrepresentable
what the regex example makes clear is that the key to correctness is to make your language tight enough to have defined and *desired* output for all possible inputs. this is not to say that it won't raise errors; much to the contrary, it must have parser errors saying that some strings aren't valid regexes. instead, it means that all errors are *predictable*, and *well defined* (in some sense).

you, as the programmer, are then in charge of ensuring that the resulting regex program actually solves the problem you have at hand. want to match 3 words of 2 digit numbers followed by a capital letter? great, regexes can do that. want to match balanced parenthesized expressions? sadly, regex is incapable of ever solving that, because that language [[https://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages#Use_of_the_lemma_to_prove_non-regularity][is not regular]], so no matter how hard you try it will never work.
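
as a small illustration, here's one possible reading of that first spec, sketched with =grep='s extended regexes (the exact pattern is my own guess at the requirements):
#+begin_src sh
# matches lines with three 2-digit numbers followed by a capital letter
echo "12 34 56 A" | grep -E '^[0-9]{2} [0-9]{2} [0-9]{2} [A-Z]$'
# no such pattern exists for balanced parentheses - that language isn't regular
#+end_src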

in a way, there's a beauty in how C sidesteps this: it defines one of the possible program outputs as being /undefined/, and it is on the programmer's behalf to ensure that the program has 0 paths to /undefined behavior/. in fact, it is probably one of the most [[https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf][well specified languages]], which is what makes it suitable for writing formally verifiable programs [fn::through the use of external tools like coq's [[https://vst.cs.princeton.edu/veric/][verifiable C series]]].

the main strength of rust, and where it differs from all mainstream languages, is that it has a very clear focus on program *correctness*. the raison d'ĂȘtre of the borrow checker is statically assuring that all references point to valid memory, such that it is literally impossible for any borrow to be null or to point to freed memory (modulo [[https://github.com/Speykious/cve-rs][implementation errors of course]]). this completely rules out this class of bugs from the language we're trying to "parse". remember the C excerpt from above, where i said that the compiler should rule the program out as invalid? well, it is literally impossible to write that sort of program in rust, because one cannot construct a ~NULL~ reference.

not only that, but rust's language features make it so, so much easier to write *correct* software: sum types (tagged unions), ~Option~ instead of ~NULL~ (which in and of itself is amazing), ~Result~ for errors (making it obligatory to handle all possible branches your program can take), a strong and powerful static type system, and ditching inheritance and classes in favor of traits.

note that i never once talked about memory safety. even in a world where C wasn't in fact full of memory vulnerabilities, rust would still be miles better, because it statically assures you that the *meaning of your program is tightly reproduced by the code you've written*. it is, by design, more correct than C, and the only way a problem can possibly happen is by side stepping rust's static checks using ~unsafe~.

it is just a happy coincidence that this leads to a language that isn't garbage collected, that is relatively lean, fast, easy to embed, has good ergonomics and that enables you to write asynchronous and multi-threaded programs. these properties have boosted rust to a very well regarded status among developers, but they aren't at all what makes a language suitable for building reliable, correct software. out of curiosity, i'd happily defend the case that [[https://coq.inria.fr/][coq]] is also one of these languages, and it holds none of those properties.

*** software engineering as a craft

finally, i think this relates to how i personally model the software development job as a whole. it starts with some problem you think you can use computers to solve, and then follows 3 clearly stratified steps:

1. define how one might solve the problem. this usually means splitting it into several possible cases and treating each and every one of them separately.
2. define an abstract machine that executes the very same steps, *making sure that it tightly adheres to your plan*
3. implement the very same machine in a language, *making sure that your implementation adheres tightly to your abstract machine*

the part that programmers usually get paid millions of dollars for is the step *1 -> 2*, which is by far the hardest and requires the most creativity and craftsmanship. what usually makes people say that [[https://www.youtube.com/watch?v=FeAMiBKi_EM][software is in decline]] is that we never learned to value executing step *3* properly. this leads to sloppy, half baked software that crashes when X Y Z happens, and we've just come to terms with software being so brittle.

it is not by chance that Yang et al. could only find a measly 9 bugs after 6 CPU-years of fuzzing [[https://compcert.org/man/manual001.html][compcert]], a formally verified C compiler (written in coq), whereas in gcc and clang they [[https://users.cs.utah.edu/~regehr/papers/pldi11-preprint.pdf#subsection.3.2][found and reported more than 300]]. all 9 of those bugs were in the unverified front end of the compiler (the parser), and there were literally 0 middle end (compiler passes and AST translations) bugs found, which is unheard of. this is no accident: they've spent many years writing proofs that all of their passes are correct, safe, and preserve the meaning of the original program.

i really think software developers should strive for that kind of resilience, which i believe can only be achieved by properly valuing *correctness*. i don't think it is reasonable to expect all software to be built using coq, proving every little bit of it (due to business constraints), but i think that rust is a good enough language to start taking things more seriously.

** self host the world  :nix:nixos:
:PROPERTIES:
:EXPORT_FILE_NAME: self-host-the-world
:END:

I've known since forever that google is not to be trusted, and that [[https://killedbygoogle.com/][they whimsically create and destroy]] products like no one else. I've also been a not-so-proud owner of a google mail account for the past 15 years, which I rely on for almost all my services.

honestly, those facts didn't bother me that much because, like everyone else, I was always under the impression that /it isn't going to happen to me, right/? that was, until june of last year, when [[https://www.theverge.com/2023/6/16/23763340/google-domains-sunset-sell-squarespace][google sunset google domains]] - which I was under the impression was the king of domain registrars. that, plus the rise of AI and LLMs, seriously made me question the hegemony of google search in the current state of affairs, and how well positioned I was for a possible google meltdown as a whole.

of course, I don't think that gmail is going anywhere soon, but it tipped me off into searching the world of self hosting. I mean, how hard could it be to host my own email, right? I wanted to find out using a home device and nixos, in order to get declarative and reproducible systems for free.

*** the raspberry pi

I managed to get my hands on a raspberry pi model 4B in december 2023, but at the time I couldn't try to get something running on it. it was only around april or may of 2024 that I actually started trying to get it working. at first, I wanted to go with a completely headless nixos setup: writing a proto-configuration akin to my [[https://git.santi.net.br/nixos][current ones]], exporting it as an sd image with my ssh key baked in, and flashing it to the pi. thus, no manual installation process would be necessary, and just inserting the card and turning the pi on would be all.
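
for reference, this is roughly what that workflow looks like (the hostname =pi= and the target device are hypothetical, and the configuration must import nixos' sd-image module):
#+begin_src sh
# build the sd image from the flake; newer nixpkgs may emit a compressed .img.zst
nix build .#nixosConfigurations.pi.config.system.build.sdImage
# flash it to the sd card (double check the device name!)
sudo dd if=result/sd-image/*.img of=/dev/sdX bs=4M status=progress
#+end_src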

sadly, it didn't work: it would turn on but never appear as a network device, and given that detecting it through the network was the only way I knew of to interact with it, this left me at a dead end. at the time, I believed that I was doing something wrong in the nixos configuration - maybe some network config mishap? maybe I forgot to turn something on? - but given that I couldn't see its video output, I just gave up and decided to buy an HDMI adapter and do it the normal way.

of course, I only bought the hdmi adapter about 3 months later, and only then did I try to install nixos manually. I went with the normal approach: downloading the bare arm image, fetching [[https://git.santi.net.br/nixos][my nixos repo]] locally and rebuilding to install everything. only by having visual confirmation did I understand that the problem wasn't with my original nixos image, but rather with the fact that the pi was shutting down after the first boot phase!

it made me realize that I had never given proper thought to buying a real power supply, as I thought that connecting it through a usb-c cable I had lying around to my computer's usb port was good enough. I was able to gracefully connect the dots and realize that it was most likely rebooting because it didn't have enough power to boot, so I switched to a 5 volt 3 amp cellphone charger I had to spare, and it finally booted correctly!

*** networking issues

after that, I figured that I'd like to be able to not only turn it on, but also connect to my raspberry pi from outside my house's network. sadly, my router's public ip changes pretty much every day, so my only real option was to use ddns + a domain name.

I bought =santi.net.br= cheaply and quickly transferred it to cloudflare, as I wanted to get some ddns action going. as I'm using the ISP-provided all-in-one [shitty] router, it's not shocking to say that trying to open the relevant ports (22, 80 and 443) in the default configuration interface had no external effect whatsoever.

I found out that there was a way to get to the "admin version" of the router's setup page, and through that I was able to get port 22 open to the public internet (even though changing it the normal way wouldn't do anything), but neither 80 nor 443 were reachable still. I even questioned whether my network was behind CGNAT, as that is very common in brazil, but my ip wasn't in one of the common ranges and I could access port 22 of my router's public ip just fine. I don't know how the ISP could be blocking it, other than the router's admin page port forwarding setup being a no-op for some specific ports.

I fought with this problem for a week, but eventually decided to give up and just set up cloudflare tunnels for ports 80 and 443, and route all the subdomains through that. cloudflare tunnels work as an outbound-only connection, using a ~cloudflared~ instance running on the pi to route the requests through. after using some stateful commands to generate credentials, the relevant piece of code to set this up in nixos is very simple:
#+begin_src nix
{
  # ...
  services.cloudflared = {
    enable = true;
    tunnels.iori = {
      default = "http_status:404";
      credentialsFile = "/var/lib/cloudflared/iori.json";
      ingress = {
        "santi.net.br" = "http://localhost:80";
        "git.santi.net.br" = "http://localhost:80";
      };
    };
  };
}
#+end_src
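
for completeness, the stateful part I ran beforehand looked roughly like this (commands from ~cloudflared~'s cli; the tunnel name matches the config above):
#+begin_src sh
cloudflared tunnel login        # authenticate against your cloudflare account
cloudflared tunnel create iori  # writes credentials to ~/.cloudflared/<uuid>.json
# move the credentials where the nixos module expects them
sudo mv ~/.cloudflared/*.json /var/lib/cloudflared/iori.json
# point the dns records at the tunnel
cloudflared tunnel route dns iori santi.net.br
cloudflared tunnel route dns iori git.santi.net.br
#+end_src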

though, I couldn't really use these tunnels to connect through ssh, and honestly I don't know why. I believe cloudflare expects you to use their [[https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/download-warp/][warp]] tool to authenticate ssh connections (besides ssh key auth?), but I thought it was too much trouble to configure yet another tool (on all my computers), so I chose to use the router's public ip + ddns with port forwarding instead. I tested pretty much all the ddns services exposed in nixpkgs, and the only one that worked reliably was =inadyn=:
#+begin_src nix
{
  # ...
  services.inadyn = {
    enable = true;
    user = "leonardo";
    group = "users";
    settings.provider."cloudflare.com" = {
      hostname = "santi.net.br";
      username = "santi.net.br";
      proxied = false;
      include = config.age.secrets.cloudflare.path;
    };
  };
}
#+end_src
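
a quick sanity check that the record actually tracks the router's current address (=ifconfig.me= is just one of many ip-echo services):
#+begin_src sh
dig +short A santi.net.br @1.1.1.1  # what the world resolves
curl -s https://ifconfig.me         # what the router currently has
#+end_src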

**** remote rebuilds

given that my computers' architecture (=x86_64-linux=) and the raspberry pi's (=aarch64-linux=) are not the same, I needed a way to either trigger rebuilds remotely, or to build locally and =nix-copy-closure= the result to the pi. building aarch64 derivations locally can be enabled by setting =boot.binfmt.emulatedSystems= (qemu user emulation rather than true cross compilation), but I don't really like that solution, as it requires enabling that flag on every computer I'd like to deploy from.
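
a sketch of that rejected route, using the same placeholders as the command below (and activating without registering a system profile, which is fine for a sketch):
#+begin_src sh
# requires boot.binfmt.emulatedSystems = [ "aarch64-linux" ] on this machine
nix build .#nixosConfigurations.<remote>.config.system.build.toplevel
nix-copy-closure --to root@<remote-host-url> ./result
ssh root@<remote-host-url> "$(readlink ./result)/bin/switch-to-configuration switch"
#+end_src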

instead, I went with the most barebones approach possible, [[https://www.haskellforall.com/2023/01/announcing-nixos-rebuild-new-deployment.html][nixos-rebuild]], by using the following command:
#+begin_src sh
nixos-rebuild switch --fast --use-remote-sudo \
              --flake .#<remote> \
              --build-host <remote-host-url> \
              --target-host <remote-host-url>
#+end_src

this works because =--fast= avoids rebuilding =nixos-rebuild= itself, and passing =--build-host= forces it to build directly on the pi, avoiding the cross compilation issue altogether. I still intend to use a proper deployment tool (I'm most inclined to [[https://github.com/serokell/deploy-rs][deploy-rs]]), but that is for the future.

*** self hosting
after setting up a way to connect to the pi from the public network, I could finally get some self hosting started. 

initially, all I did was a simple setup where I added my blog's repository as a flake input, and served the result of calling =hugo build= on it through nginx. it looked something like the following:
#+begin_src nix
let 
  blog = pkgs.stdenv.mkDerivation {
    name="hugo-blog";
    src = inputs.blog;
    buildInputs = [ pkgs.hugo ];
    buildPhase = ''
      mkdir $out 
      hugo --destination $out
    '';
  };
in {
  # ....
  networking.firewall.allowedTCPPorts = [ 80 443 ];
  services.nginx = {
    enable = true;
    virtualHosts."santi.net.br" = {
      addSSL = true;
      enableACME = true;
      root = blog;
    };
  };
  security.acme = {
    acceptTerms = true;
    certs."santi.net.br".email = "[email protected]";
  };
}
#+end_src
it sure worked fine for the first couple of weeks, and it auto-generated ssl certificates for me, which is convenient, but it had a glaring flaw: in order to change anything, I'd need to push a new commit to the blog repo, =nix flake update blog=, and then =nixos-rebuild switch= (remotely) on the pi, every single time. the whole process was unnecessarily complicated, so I sought out to set up a simpler one.
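
spelled out, every content change required this three-step dance (same placeholders as before):
#+begin_src sh
git push                  # 1. publish the new post to the blog repo
nix flake update blog     # 2. bump the blog input in the server flake
nixos-rebuild switch --fast --use-remote-sudo \
              --flake .#<remote> \
              --build-host <remote-host-url> \
              --target-host <remote-host-url>  # 3. redeploy the whole system
#+end_src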

I vaguely knew that git repos have a notion of hooks that can run before or after most commands and actions you take, but I had never implemented or tinkered with them. still, it occurred to me that if I could set up a bare git "upstream" on my pi, with a hook that runs after every push it receives, I could run =hugo build= on the source files and generate a new blog in a known path, which I could then hardwire =nginx= to serve. this way, it would be very much like the old setup that I had with github pages, except local and not depending on microsoft's ai products.

funnily enough, mere minutes after searching for this idea on the internet, I found a [[https://andreas.rammhold.de/posts/git-receive-blog-hook-deployment/][blog post]] by Andreas that did exactly that. while searching, I also figured that it would be pretty cool to have a [[https://git.zx2c4.com/cgit/][cgit instance]] exposed, tracking the repositories in this "git repos" directory, so that I could really stop relying on github while keeping the code fully open source.

the main idea is to declaratively [of course] set up a git repository pre-baked with a =post-receive= hook that calls =hugo build= with the directory we'd like =nginx= to serve. Andreas' post shows exactly how to idempotently create the git repo (a no-op after the first run) using a systemd one shot service, and the only problem remaining is, as always, managing the permissions around these directories:
1. my user, =leonardo=, has its own files and is what I use to run =nixos-rebuild= from.
2. the =git= user owns the git repositories directory.
3. the =cgit= user is responsible for running the cgit server.
4. the =nginx= user is responsible for running the nginx instance and responding to requests.

thus, I devised the following structure:
- =/server/blog= is where the hugo-generated files are going to live. the =nginx= user must be able to read it, and =git= must be able to write to it.
- =/server/git-repos= is where the git repositories will live. the =cgit= user must be able to read all of its directories, and the =git= user must be able to read and write to it.

it seems to suffice to set =git= as the owner of both of these directories, and give all users permission to read and execute files. to implement this, I used =systemd.tmpfiles.rules=. I know, there's =tmp= in the name, but rest assured: you can use them to create permanent files with the correct permissions, as long as you don't give them an age parameter:
#+begin_src nix
users.users.git = {
  description = "git user";
  isNormalUser = true;
  home = git-repo-path;
};
systemd.tmpfiles.rules = [
  "d ${blog-public-path} 0755 git users -"
  "d ${git-repo-path} 0755 git users -"
];
#+end_src
after figuring this stuff out, the rest is pretty much textbook nixos. we set up cgit with ~scanPath = git-repo-path~, plus an about-filter using =pandoc= to correctly render the org file readmes as about pages:
#+begin_src nix
services.cgit.santi = let
  org2html = pkgs.writeShellScript "org2html" ''
    ${pkgs.pandoc}/bin/pandoc \
      --from org \
      --to html5 \
      --sandbox=true \
      --html-q-tags \
      --ascii \
      --standalone \
      --wrap=auto \
      --embed-resources \
      -M document-css=false
  '';
in {
  enable = true;
  scanPath = git-repo-path;
  nginx.virtualHost = "git.santi.net.br";
  settings = {
    readme = ":README.org";
    root-title = "index";
    root-desc = "public repositories for santi.net.br";
    about-filter = toString org2html;
    source-filter = "${pkgs.cgit}/lib/cgit/filters/syntax-highlighting.py";
    enable-git-config = true;
    enable-html-cache = false;
    enable-blame = true;
    enable-log-linecount = true;
    enable-index-links = true;
    enable-index-owner = false;
    enable-commit-graph = true;
    remove-suffix = true;
  };
};

#+end_src
while the following snippet sets up a systemd one shot service to initialize the path to the blog's public files (run as the =git= user):
#+begin_src nix
systemd.services."blog-prepare-git-repo" = {
  wantedBy = [ "multi-user.target" ];
  path = [
    pkgs.git
  ];
  script = ''
    set -ex
    cd ${git-repo-path}
    chmod +rX ${blog-public-path}
    test -e blog || git init --bare blog
    ln -nsf ${post-receive} blog/hooks/post-receive
  '';
  serviceConfig = {
    Kind = "one-shot";
    User = "git";
  };
};
#+end_src
where the =post-receive= hook is very similar to the one Andreas used in his post:
#+begin_src nix
post-receive = pkgs.writeShellScript "post-receive" ''
  export PATH=${env}/bin
  set -ex
  
  GIT_DIR=$(${pkgs.git}/bin/git rev-parse --git-dir 2>/dev/null)
  if [ -z "$GIT_DIR" ]; then
          echo >&2 "fatal: post-receive: GIT_DIR not set"
          exit 1
  fi
  
  TMPDIR=$(mktemp -d)
  function cleanup() {
    rm -rf "$TMPDIR"
  }
  trap cleanup EXIT
  
  ${pkgs.git}/bin/git clone "$GIT_DIR" "$TMPDIR"
  unset GIT_DIR
  cd "$TMPDIR"
  ${pkgs.hugo}/bin/hugo --destination ${blog-public-path}
'';
#+end_src

after running it the first time, I went ahead and statefully copied the git repo from github to the pi, in order to not lose the history, but other than that it should be fine.
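
that copying amounted to something like the following sketch (the remote path resolves relative to the =git= user's home directory, which is =git-repo-path=):
#+begin_src sh
git remote add pi git@santi.net.br:blog
git push pi --all   # triggers the post-receive hook and rebuilds the blog
git push pi --tags
#+end_src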

*** next steps

sadly, I haven't got the time to actually set up email hosting yet. currently, I read my email through [[https://djcbsoftware.nl/code/mu/mu4e.html][mu4e]], using mu as a local maildir indexer and searcher. what I'd need is to host a server to receive and send email. receiving doesn't seem to have many difficulties, as it's just a normal listener, but sending apparently is a huge problem, as there seem to be a lot of measures that need to be taken for your email to actually be delivered and not flagged as spam.

besides having to set up reverse DNS lookups, you also need to mess with SPF, DMARC and DKIM, which are scary looking acronyms for boring business authentication stuff. moreover, your ip might be blacklisted, or have low reputation (what does that even mean?), and to top it off it seems like my router's port 25 is blocked forever, so I'd most likely also need to configure cloudflare tunnels for that. I'm currently avoiding all of it, but I intend to look into it in the near future.
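
the acronyms are less scary once you notice they are all just DNS TXT records; you can inspect a working domain's setup with =dig= (the DKIM selector varies per provider, and comes from the =DKIM-Signature= header of a received mail):
#+begin_src sh
dig +short TXT gmail.com                        # SPF policy lives in a plain TXT record
dig +short TXT _dmarc.gmail.com                 # DMARC policy
dig +short TXT <selector>._domainkey.gmail.com  # DKIM public key
#+end_src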

I've been meaning to experiment with [[https://gitlab.com/simple-nixos-mailserver/nixos-mailserver][nixos simple mailserver]]'s setup for a while now, but it is an "all in one" solution, and I think it might be doing much more than what I'm currently trying to achieve. if anyone has tinkered with it, I'd love to hear more about it.

* COMMENT Local Variables                                           :ARCHIVE:
# Local Variables:
# eval: (org-hugo-auto-export-mode)
# End: