Reading List
The most recent articles from a list of feeds I subscribe to.
Hosting my static sites with nginx
Hello! Recently I’ve been thinking about putting my static sites on servers that I run myself instead of using managed services like Netlify or GitHub Pages.
Originally I thought that running my own servers would require a lot of maintenance and be a huge pain, but I was chatting with Wesley about what kind of maintenance their servers require, and they convinced me that it might not be that bad.
So I decided to try out moving all my static sites to a $5/month server to see what it was like.
Everything in here is pretty standard but I wanted to write down what I did anyway because there are a surprising number of decisions and I like to see what choices other people make.
the constraint: only static sites
To keep things simple, I decided that this server would only run nginx and
only serve static sites. I have about 10 static sites right now, mostly projects for wizard zines.
I decided to use a $5/month DigitalOcean droplet, which should very easily be able to handle my existing traffic (about 3 requests per second and 100GB of bandwidth per month). Right now it’s using about 1% of its CPU. I picked DigitalOcean because it was what I’ve used before.
Also all the sites were already behind a CDN so they’re still behind the same CDN.
step 1: get a clean Git repo for each build
This was the most interesting problem so let’s talk about it first!
Building the static sites might seem pretty easy – each one of them already has a working build script.
But I have pretty bad hygiene around files on my laptop – often I have a bunch of uncommitted files that I don’t want to go onto the live site. So I wanted to start every build with a clean Git repo. I also wanted this to be fast – I’m impatient so I wanted to be able to build and deploy most of my sites in less than 10 seconds.
I handled this by hacking together a tiny build system called tinybuild. It’s basically a 4-line bash script, but with some extra command line arguments and error checking. Here are the 4 lines of bash:
docker build - -t tinybuild < Dockerfile
CONTAINER_ID=$(docker run -v "$PWD":/src -v "$PWD/deploy":/artifact -d -t tinybuild /bin/bash)
docker exec $CONTAINER_ID bash -c "git clone /src /build && cd /build && bash /src/scripts/build.sh"
docker exec $CONTAINER_ID bash -c "mv /build/public/* /artifact"
These 4 lines:
- Build a Dockerfile with all the dependencies for that build
- Clone my repo into /build in the container, so that I always start with a clean Git repo
- Run the build script (/src/scripts/build.sh)
- Copy the build artifacts into ./deploy in the local directory
Then once I have ./deploy, I can rsync the result onto the server.
It’s fast because:
- the docker build - means I don’t send any state from the repository to the Docker daemon. This matters because one of my repos is 1GB (it has a lot of PDFs in it) and sending all that to the Docker daemon takes forever
- the git clone is from the local filesystem and I have an SSD, so it’s fast even for a 1GB repo
- most of the build scripts just run hugo or cat so they’re fast. The npm build scripts take maybe 30 seconds.
apparently local git clones make hard links
A tiny interesting fact: I tried to do git clone --depth 1 to speed up my git
clone, but git gave me this warning:
warning: --depth is ignored in local clones; use file:// instead.
I think what’s going on here is that git makes hard links of all the objects to
make a local clone (which is a lot faster than copying). So I guess with the
hard links approach --depth 1 doesn’t make sense for some reason? And
file:// forces git to copy all objects instead, which is actually slower.
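You can watch this happen with a throwaway repo (a sketch, assuming git is installed and both copies are on the same filesystem):

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
# make one commit so there's something to clone
git -c user.email=you@example.com -c user.name=you commit -q --allow-empty -m init
git gc -q                 # pack the objects so there's a .pack file to inspect
git clone -q . ../clone   # a plain local clone (no file://)
# a link count of 2 means the clone's packfile is the same inode
# as the packfile in the original repo: a hard link, not a copy
links=$(stat -c %h ../clone/.git/objects/pack/*.pack)
echo "hard link count: $links"
```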
bonus: now my builds are faster than they used to be!
One nice thing about this is that my build/deploy time is less than it was on
Netlify. For jvns.ca it’s about 7 seconds to build and deploy the site
instead of about a minute previously.
running the builds on my laptop seems nice
I’m the only person who develops all of my sites, so doing all the builds in a Docker container on my computer seems to make sense. My computer is pretty fast and all the files are already right there! No giant downloads! And doing it in a Docker container keeps the build isolated.
example build scripts
Here are the build scripts for this blog (jvns.ca).
Dockerfile
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y git
RUN apt-get install -y wget python2
RUN wget https://github.com/gohugoio/hugo/releases/download/v0.40.1/hugo_0.40.1_Linux-64bit.tar.gz
RUN wget https://github.com/sass/dart-sass/releases/download/1.49.0/dart-sass-1.49.0-linux-x64.tar.gz
RUN tar -xf dart-sass-1.49.0-linux-x64.tar.gz
RUN tar -xf hugo_0.40.1_Linux-64bit.tar.gz
RUN mv hugo /usr/bin/hugo
RUN mv dart-sass/sass /usr/bin/sass
build-docker.sh:
set -eu
scripts/parse_titles.py
sass sass/:static/stylesheets/
hugo
deploy.sh:
set -eu
tinybuild -s scripts/build-docker.sh \
-l "$PWD/deploy" \
-c /build/public
rsync-showdiff ./deploy/ root@staticsites:/var/www/jvns.ca
rm -rf ./deploy
step 2: get rsync to just show me which files it updated
When I started using rsync to sync the files, it would list every single file instead of just files that had changed. I think this was because I was generating new files for every build, so the timestamps were always newer than the files on the server.
I did a bunch of Googling and figured out this incantation to get rsync to just show me files that were updated:
rsync -avc --out-format='%n' "$@" | grep --line-buffered -v '/$'
I put that in a script called rsync-showdiff so I could reuse it. There might
be a better way, but this seems to work.
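To see what that grep is doing, here it is on its own with some fake rsync output – rsync prints directory names with a trailing /, and those are the lines we drop:

```shell
# keep changed files, drop directories (names ending in /)
filtered=$(printf 'css/\ncss/site.css\nindex.html\n' | grep --line-buffered -v '/$')
echo "$filtered"
```

which prints just css/site.css and index.html.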
step 3: configuration management
All I needed to do to set up the server was:
- install nginx
- create directories in /var/www for each site, like /var/www/jvns.ca
- create an nginx configuration for each site, like /etc/nginx/sites-enabled/jvns.ca.conf
- deploy the files (with my deploy script above)
I wanted to use some kind of configuration management to do this because that’s how I’m used to managing servers. I’ve used Puppet a lot at work in the past, but I don’t really like it, so I decided to try Ansible, which I’d never used before but which seemed simpler. Here’s my current Ansible configuration, minus some of the templates it depends on.
I didn’t use any Ansible plugins because I wanted to maximize the probability that I would actually be able to run this thing in 3 years.
The most complicated thing in there is probably the reload nginx handler,
which makes sure that the configuration is still valid after I make an nginx
configuration update.
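I haven’t seen the handler itself, but an Ansible handler like that might look something like this (this is my sketch, not the actual configuration):

```yaml
handlers:
  - name: reload nginx
    # validate the config first so a bad template doesn't take nginx down
    shell: nginx -t && systemctl reload nginx
```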
step 4: replace a lambda function
I was using one Netlify lambda function on https://wizardzines.com to calculate purchasing power parity (“PPP”) discounts for countries that have a weaker currency relative to the US. Basically it gets your country using IP geolocation and then returns a discount code if there’s one for your country (for example, 70% off for India). So I needed to replace it.
I handled this by rewriting the (very small) program in Go, copying the
static binary to the server, and adding a proxy_pass for that site.
The program just looks up the country code from the geolocation HTTP header in a hashmap, so it doesn’t seem like it should cause maintenance problems.
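I don’t know the exact configuration, but the proxy_pass side of that probably looks something like this (the path and port here are invented for illustration):

```nginx
# hand the PPP endpoint off to the local Go binary
location /ppp {
    proxy_pass http://127.0.0.1:8080;
}
```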
a very simple nginx config
I used the same nginx config template for almost all my sites:
server {
    listen 80;
    listen [::]:80;
    root /var/www/{{item.dir}};
    index index.html index.htm;
    server_name {{item.server}};

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
    }
}
The {{item.dir}} is an Ansible thing.
I also added support for custom 404 pages (error_page /404.html) in the main nginx.conf.
I’ll probably add TLS support with certbot later. My CDN handles TLS to the client; I just need to make the connection between the CDN and the origin server use TLS.
Also I don’t know if there are problems with using such a simple nginx config. Maybe I’ll learn about them!
bonus: I can find 404s more easily
Another nice bonus of this setup is that it’s easier to see what’s happening with my site – I can just look at the nginx logs!
I ran grep 404 /var/log/nginx/access.log to figure out if I’d broken
anything during the migration, and I actually ended up finding a lot of
links that had been broken for many years, but that I’d just never noticed.
Netlify’s analytics has a “Top resources not found” section that shows you the most common 404s, but I don’t think there’s any way to see all 404s.
a small factor: costs
Part of my motivation for this switch was that I was getting close to the Netlify free tier’s bandwidth limit (100GB/month), and Netlify charges $20/100GB for additional bandwidth. DigitalOcean charges $1/100GB for additional bandwidth (20x less), and my droplet comes with 1TB of bandwidth. So the bandwidth pricing feels a lot more reasonable to me.
we’ll see how it goes!
All my static sites are running on my own server now. I don’t really know what this will be like to maintain – we’ll see how it goes! Maybe I’ll like it! Maybe I’ll hate it! I definitely like the faster build times, and that I can easily look at my nginx logs.
Some ways DNS can break
When I first learned about it, DNS didn’t seem like it should be THAT complicated. Like, there are DNS records, they’re stored on a server, what’s the big deal?
But with DNS, reading about how it works in a textbook doesn’t prepare you for the sheer volume of different ways DNS can break your system in practice. It’s not just caching problems!
So I asked people on Twitter for examples of DNS problems they’ve run into, especially DNS problems that didn’t initially appear to be DNS problems (the popular “it’s always DNS” meme).
I’m not going to discuss how to solve or avoid any of these problems in this post, but I’ve linked to webpages discussing the problem where I could find them.
problem: slow network requests
Your network requests are a little bit slower than expected, and it’s actually because your DNS resolver is slow for some reason. This might be because the resolver is under a lot of load, or it has a memory leak, or something else.
I’ve run into this before with my router’s DNS forwarder – all of my DNS requests were slow, and I restarted my router and that fixed the problem.
problem: DNS timeouts
A couple of people mentioned network requests taking an extra 2 or even 30 seconds because of DNS queries that were timing out. This is sort of the same as “slow requests”, but worse, because queries can take several seconds to time out.
Sophie Haskins has a great blog post Misadventures with Kube DNS about DNS timeouts with Kubernetes.
problem: ndots
A few people mentioned a specific issue where Kubernetes sets ndots:5 in its /etc/resolv.conf.
Here’s an example /etc/resolv.conf from a Kubernetes pod (taken from the article “Kubernetes pods /etc/resolv.conf ndots:5 option and why it may negatively affect your application performances”):
nameserver 100.64.0.10
search namespace.svc.cluster.local svc.cluster.local cluster.local eu-west-1.compute.internal
options ndots:5
My understanding is that if this is your /etc/resolv.conf and you look up
google.com, your application will call the C getaddrinfo function, and
getaddrinfo will:
- look up google.com.namespace.svc.cluster.local.
- look up google.com.svc.cluster.local.
- look up google.com.cluster.local.
- look up google.com.eu-west-1.compute.internal.
- look up google.com.
Basically it checks if google.com is actually a subdomain of everything on the search line.
So every time you make a DNS query, you need to wait for 4 DNS queries to fail before you can get to the actual real DNS query that succeeds.
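You can reproduce that candidate list with plain string manipulation – this doesn’t make any DNS queries, it just applies the search list the way getaddrinfo would for a name with fewer than 5 dots:

```shell
set -eu
name=google.com   # 1 dot, which is less than ndots:5, so the search list is tried first
candidates=$(
  for suffix in namespace.svc.cluster.local svc.cluster.local cluster.local eu-west-1.compute.internal; do
    echo "$name.$suffix."
  done
  echo "$name."   # the absolute name is only tried last
)
echo "$candidates"
```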
problem: it’s hard to tell what DNS resolver(s) your system is using
This isn’t a bug by itself, but when you run into a problem with DNS, often it’s related in some way to your DNS resolver. I don’t know of any foolproof way to tell what DNS resolver is being used.
A few things I know:
- on Linux, I think that most things use /etc/resolv.conf to choose a DNS resolver. There are definitely exceptions though, for example your browser might ignore /etc/resolv.conf and use a different DNS-over-HTTPS service instead.
- if you’re using UDP DNS, you can use sudo tcpdump port 53 to see where DNS requests are being sent. This doesn’t work if you’re using DNS over HTTPS or DNS over TLS though.
I also vaguely remember it being even more confusing on macOS than on Linux, though I don’t know why.
problem: DNS servers that return NXDOMAIN instead of NOERROR
Here’s a problem that I ran into once, where nginx couldn’t resolve a domain.
- I set up nginx to use a specific DNS server to resolve DNS queries
- when visiting the domain, nginx made 2 queries: one for an A record, and one for an AAAA record
- the DNS server returned an NXDOMAIN reply for the A query
- nginx decided “ok, that domain doesn’t exist”, and gave up
- the DNS server returned a successful reply for the AAAA query
- nginx ignored the AAAA record because it had already given up
The problem was that the DNS server should have returned NOERROR – that
domain did exist, it was just that there weren’t any A records for it. I
reported the bug, they fixed it, and that fixed the problem.
I’ve implemented this bug myself too, so I understand why it happens – it’s
easy to think “there aren’t any records for this query, I should return an
NXDOMAIN error”.
problem: negative DNS caching
If you visit a domain before creating a DNS record for it, the absence of the record will be cached. This is very surprising the first time you run into it – I only learned about this last year!
The TTL for the cache entry is the TTL of the domain’s SOA record – for example, for jvns.ca, it’s an hour.
problem: nginx caching DNS records forever
If you put this in your nginx config:
location / {
    proxy_pass https://some.domain.com;
}
then nginx will resolve some.domain.com once on startup and never again. This
is especially dangerous if the IP address for some.domain.com changes
infrequently, because it might keep happily working for months and then
suddenly break at 2am one day.
There are pretty well-known ways to fix this and this post isn’t about nginx so I won’t get into it, but it’s surprising the first time you run into it.
Here’s a blog post with a story of how this happened to someone with an AWS load balancer.
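For reference, the workaround I’ve usually seen cited is to put the name in a variable and add a resolver directive, which makes nginx re-resolve it (a sketch, not something from the post):

```nginx
resolver 8.8.8.8 valid=30s;   # cache resolved addresses for at most 30 seconds
location / {
    set $backend "https://some.domain.com";
    proxy_pass $backend;      # a variable here forces a fresh DNS lookup
}
```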
problem: Java caching DNS records forever
Same thing, but for Java: Apparently depending on how you configure Java, “the JVM default TTL [might be] set so that it will never refresh DNS entries until the JVM is restarted.”
I haven’t run into this myself but I asked a friend about it who writes more Java than me and they told me that it’s happened to them.
Of course, literally any software could have this problem of caching DNS records forever, but the main cases I’ve heard of in practice are nginx and Java.
problem: that entry in /etc/hosts you forgot about
Another variant on caching issues: entries in /etc/hosts that override your
usual DNS settings!
This is extra confusing because dig ignores /etc/hosts, so everything SEEMS
like it should be fine (“dig whatever.com is working!”).
problem: your email isn’t being sent / is going to spam
The way email is sent and validated is through DNS (MX records, SPF records, DKIM records), so a lot of email problems are DNS problems.
problem: internationalized domain names don’t work
You can register domain names with non-ASCII characters or emoji like https://💩.la.
The way this works with DNS is that 💩.la gets translated into xn--ls8h.la with an encoding called “punycode”.
But even though there’s a clear standard for how they should work with DNS, a lot of software doesn’t handle internationalized domain names well! There’s a fun story about this in Julian Squires’ great talk The emoji that Killed Chrome!!.
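You can see the encoding step with Python’s built-in punycode codec (the xn-- prefix is how DNS marks an encoded label):

```shell
# encode the 💩 label with punycode and add the "xn--" prefix by hand
encoded=$(python3 -c 'print("xn--" + "💩".encode("punycode").decode("ascii"))')
echo "$encoded"   # xn--ls8h
```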
problem: TCP DNS is blocked by a firewall
A couple of people mentioned that some firewalls allow UDP port 53 but not TCP port 53. But large DNS queries need to use TCP port 53, so this can cause weird intermittent problems that are hard to debug.
problem: musl doesn’t support TCP DNS
A lot of applications use libc’s getaddrinfo to make DNS queries. musl, an
alternative to glibc that’s used in Alpine Docker containers, doesn’t support
TCP DNS. This can cause problems if you make DNS queries where the response is
too big to fit inside a regular UDP DNS packet (512 bytes).
I’m still a bit fuzzy on this so I might have it wrong, but my understanding of how this can break is:
- musl’s getaddrinfo makes a DNS query
- the DNS server notices that the response is too big to fit in a single DNS response packet
- the DNS server returns an empty truncated response, expecting that the client will retry by making a TCP DNS query
- musl does not support TCP so it does not retry
A blog post about this: DNS resolution issue in Alpine Linux
problem: round robin DNS doesn’t work with getaddrinfo
One way you could approach load balancing is to use “round robin DNS”. The idea
is that every time you make a DNS query, you get a different IP address.
Apparently this works if you use gethostbyname to make DNS queries, but it
does not work if you use getaddrinfo because getaddrinfo sorts the IP
responses it receives.
So you could run into an upsetting problem if you switch from gethostbyname to getaddrinfo behind the scenes without realising that this will break your DNS load balancing.
This is especially insidious because you might not realize that you’re
switching from gethostbyname to getaddrinfo at all – if you’re not writing a
C program, those function calls are hidden inside some library. So it could be
part of a seemingly innocuous upgrade.
problem: a race condition when starting a service
A problem someone mentioned with Kubernetes DNS: they had 2 containers which started simultaneously and immediately tried to resolve each other. But the DNS lookup failed because the Kubernetes DNS change hadn’t happened yet, and then the failure was cached so it kept failing.
that’s all!
I’ve definitely missed some important DNS problems here, so I’d love to hear what I’ve missed. I’d also love links to blog posts that write up examples of these problems – I think it’s really useful to see how the problem specifically manifests in practice and how people debugged it.
How to find a domain's authoritative nameservers
Here’s a very quick “how to” post on how to find your domain’s authoritative nameserver.
I’m writing this because if you made a DNS update and it didn’t work, there are 2 options:
- Your authoritative nameserver doesn’t have the correct record
- Your authoritative nameserver does have the correct record, but an old record is cached and you need to wait for the cache to expire
To be able to tell which one is happening (do you need to make a change, or do you just need to wait?), you need to be able to find your domain’s authoritative nameserver and query it to see what records it has.
But when I looked up “how to find a domain’s authoritative nameserver” to see what advice was out there, I found a lot of different methods being mentioned, some of which can give you the wrong answer.
So let’s walk through a way to find your domain’s authoritative nameservers that’s guaranteed to always give you the correct answer. I’ll also explain why some of the other methods aren’t always accurate.
first, an easy but less accurate way
If you definitely haven’t updated your authoritative DNS server in the last
week or so, a very easy way to find it is to run dig +short ns DOMAIN
$ dig +short ns jvns.ca
art.ns.cloudflare.com.
roxy.ns.cloudflare.com.
In this case, we get the correct answer. Great!
But if you have updated your authoritative DNS server in the last few days (maybe because you just registered the domain!), that can give you an inaccurate answer. So here’s the slightly more complicated way that’s guaranteed to always give you the correct answer.
step 1: query a root nameserver
We’re going to look up the authoritative nameserver for jvns.ca in this example.
No matter what domain we’re looking up, we need to start with the root
nameservers. h.root-servers.net is one of the 13 DNS root nameservers, and dig @h.root-servers.net means “send the query to h.root-servers.net”.
$ dig @h.root-servers.net jvns.ca
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42165
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;jvns.ca. IN A
;; AUTHORITY SECTION: <------------ this is the section we're interested in
ca. 172800 IN NS c.ca-servers.ca. <------- we'll use this record
ca. 172800 IN NS j.ca-servers.ca.
ca. 172800 IN NS x.ca-servers.ca.
ca. 172800 IN NS any.ca-servers.ca.
;; ADDITIONAL SECTION:
c.ca-servers.ca. 172800 IN A 185.159.196.2
j.ca-servers.ca. 172800 IN A 198.182.167.1
x.ca-servers.ca. 172800 IN A 199.253.250.68
any.ca-servers.ca. 172800 IN A 199.4.144.2
c.ca-servers.ca. 172800 IN AAAA 2620:10a:8053::2
j.ca-servers.ca. 172800 IN AAAA 2001:500:83::1
x.ca-servers.ca. 172800 IN AAAA 2620:10a:80ba::68
any.ca-servers.ca. 172800 IN AAAA 2001:500:a7::2
;; Query time: 96 msec
;; SERVER: 198.97.190.53#53(198.97.190.53)
;; WHEN: Tue Jan 11 08:30:57 EST 2022
;; MSG SIZE rcvd: 289
The answer we’re looking for is this line in the “AUTHORITY SECTION”:
ca. 172800 IN NS c.ca-servers.ca.
It doesn’t matter which line in this section you pick, you can use any of them. I just picked the first one.
This tells us the server we need to talk to in step 2: c.ca-servers.ca.
step 2: query the .ca nameservers
Now we run dig @c.ca-servers.ca jvns.ca
$ dig @c.ca-servers.ca jvns.ca
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24920
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;jvns.ca. IN A
;; AUTHORITY SECTION: <------------ this is the section we're interested in
jvns.ca. 86400 IN NS art.ns.cloudflare.com. <---- we'll use this record
jvns.ca. 86400 IN NS roxy.ns.cloudflare.com.
;; Query time: 26 msec
;; SERVER: 185.159.196.2#53(185.159.196.2)
;; WHEN: Tue Jan 11 08:32:44 EST 2022
;; MSG SIZE rcvd: 90
Same as last time: the answer we’re looking for is this line in the “AUTHORITY SECTION”:
jvns.ca. 86400 IN NS art.ns.cloudflare.com.
Again, it doesn’t matter which line in this section you pick, you can use any of them. I just picked the first one.
success! we know the authoritative nameserver!
The authoritative nameserver for jvns.ca is art.ns.cloudflare.com.. Now you
can query art.ns.cloudflare.com. directly to see what DNS records it has
for jvns.ca.
$ dig @art.ns.cloudflare.com. jvns.ca
jvns.ca. 292 IN A 172.64.80.1
Nice, it worked.
this is exactly what’s happening behind the scenes when you make a DNS query
The reason I like this method is that it mimics what’s happening behind the
scenes when you make a DNS query. When Google’s DNS resolver 8.8.8.8 looks
up jvns.ca, the server it queries to get jvns.ca’s authoritative nameserver is
c.ca-servers.ca (or one of the other options, like j.ca-servers.ca. or x.ca-servers.ca.)
Because this method uses the exact same information source as a real DNS query, you’re guaranteed to get a correct answer every time.
Often in practice I skip step 1 because I remember that the answer for .ca
domains is c.ca-servers.ca, so I can skip straight to step 2.
this is useful to do when you’re updating your nameservers
When I update my nameservers with my domain registrar, they don’t actually update the authoritative nameserver right away. It takes a while, maybe an hour. So I like to go through these steps to check if my registrar has actually updated my authoritative nameserver yet.
other ways to get a domain’s authoritative nameserver
Here are a few other ways you can get the authoritative nameserver for a domain and why I didn’t recommend them as the main method.
dig +trace jvns.ca
This does the exact same thing, so it will always give you the right answer, but the output is confusing to read, so I’m more hesitant to recommend it.
dig ns jvns.ca
This will usually give you the right answer, but there are 2 reasons it might be wrong:
- You might get an old cached record
- The NS record you get doesn’t come from the same place as it does with the method described in this post. In this example, instead of getting an NS record from c.ca-servers.ca, dig ns jvns.ca will give you an NS record from art.ns.cloudflare.com. In practice these are usually exactly the same thing, but in some weird edge cases they might not be.
dig soa jvns.ca
You can also find nameservers in the SOA record!
$ dig SOA jvns.ca
jvns.ca. 3600 IN SOA art.ns.cloudflare.com. dns.cloudflare.com. 2267173366 10000 2400 604800 3600
^^^^^^^^^^^^^^^^^^^^^
here it is
This will usually give the right answer, but there are 2 reasons it might be wrong, similar to the NS record:
- This response comes from your authoritative nameserver. So if you’re in the middle of updating your nameserver, you might get the wrong answer because your DNS resolver sent the request to the old nameserver.
- Your authoritative nameserver could be returning a SOA record which doesn’t have the correct nameserver for some reason
whois jvns.ca
This will usually give you the right answer, but it might be an old cached version.
Here’s what this looks like on my machine for this example: (it gives us the right answer)
$ whois jvns.ca | grep 'Name Server'
Name Server: art.ns.cloudflare.com
Name Server: roxy.ns.cloudflare.com
that’s all!
I hope this helps some of you debug your DNS issues!
Why might you run your own DNS server?
One of the things that makes DNS difficult to understand is that it’s decentralized. There are thousands (maybe hundreds of thousands? I don’t know!) of authoritative nameservers, and at least 10 million resolvers. And they’re running lots of different software! All these different servers running different software means there’s a lot of inconsistency in how DNS works, which can cause all kinds of frustrating problems.
But instead of talking about the problems, I’m interested in figuring out – why is it a good thing that DNS is decentralized?
why is it good that DNS is decentralized?
One reason is scalability – the decentralized design of DNS makes it easier to scale and more resilient to failures. I find it really amazing that DNS is still scaling well even though it’s almost 40 years old. This is very important but it’s not what this post is about.
Instead, I want to talk about how the fact that it’s decentralized means that you can have control of how your DNS works. You can add more servers to the giant complicated mess of DNS servers! Servers that you control!
Yesterday I asked on Twitter why you might want to run your own DNS servers, and I got a lot of great answers that I wanted to summarize here.
you can run 2 types of DNS servers
There are 2 main types of DNS servers you can run:
- if you own a domain, you can run an authoritative nameserver for that domain
- if you have a computer (or a company with lots of computers), you can run a resolver that resolves DNS for those computers
DNS isn’t a static database
I’ve seen the “phone book” metaphor for DNS a lot, where domain names are like names and IP addresses are like phone numbers.
This is an okay mental model to start with. But the “phone book” mental model
might make you think that if you make a DNS query for google.com, you’ll
always get the same result. And that’s not true at all!
Which record you get in reply to a DNS query can depend on:
- where you are in the world (maybe you’ll get an IP address of a server that’s physically closer to you!)
- if you’re on a corporate network (where you might be able to resolve internal domain names)
- whether the domain name is considered “bad” by your DNS resolver (it might be blocked!)
- the previous DNS query (maybe the DNS resolver is doing DNS-based load balancing to give you a different IP address every time)
- whether you’re using an airport wifi captive portal (airport wifi will resolve DNS records differently before you log in, it’ll send you a special IP to redirect you)
- literally anything
A lot of the reasons you might want to control your own server are related to the fact that DNS isn’t a static database – there are a lot of choices you might want to make about how DNS queries are handled (either for your domain or for your organization).
reasons to run an authoritative nameserver
These reasons aren’t in any particular order.
For some of these you don’t necessarily have to run your own authoritative nameserver, you can just choose an authoritative nameserver service that has the features you want.
To be clear: there are lots of reasons not to run your own authoritative nameserver – I don’t run my own, and I’m not trying to convince you that you should. It takes time to maintain, your service might not be as reliable, etc.
reason: security
[There’s a] risk of an attacker gaining DNS change access through your vendor’s customer support people, who only want to be helpful. Or getting locked out from your DNS (perhaps because of the lack of that). In-house may be easier to audit and verify the contents.
reason: you like running bind/nsd
One reason several people mentioned was “I’m used to writing zone files and
running bind or nsd, it’s easier for me to just do that”.
If you like the interface of bind/nsd but don’t want to operate your own server, a couple of people mentioned that you can also get the advantages of bind by running a “hidden primary” server which stores the records, while serving all of the actual DNS queries from a “secondary” server. Here are some pages I found about configuring secondary DNS from NS1, Cloudflare, and Dyn as examples.
I don’t really know what the best authoritative DNS server to run is. I think I’ve only used nsd at work.
reason: you can use new record types
Some newer DNS record types aren’t supported by all DNS services, but if you run your own you can support any record types you want.
reason: user interface
You might not like the user interface (or API, or lack of API) of the DNS service you’re using. This is pretty related to the “you like running BIND” reason – maybe you like the zone file interface!
reason: you can fix problems yourself
There are some obvious pros and cons to being able to fix problems yourself when they arise (pro: you can fix the problem, con: you have to fix the problem).
reason: do something weird and custom
You can write a DNS server that does anything you want, it doesn’t have to just return a static set of records.
A few examples:
- Replit has a blog post about why they wrote their own authoritative DNS server to handle routing
- nip.io maps 10.0.0.1.nip.io to 10.0.0.1
- I wrote a custom DNS server for mess with dns
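nip.io is a nice example of how little such a server might need to do – the answer is computed from the query name itself, with no database. The core mapping is just string manipulation, something like this (a sketch of the idea, not nip.io’s actual code):

```shell
# 10.0.0.1.nip.io should resolve to 10.0.0.1: strip the suffix to get the IP
name=10.0.0.1.nip.io
ip=${name%.nip.io}
echo "$ip"   # 10.0.0.1
```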
reason: to save money
Authoritative nameservers seem to generally charge per million DNS queries. As an example, at a quick glance it looks like Route 53 charges about $0.50 per million queries and NS1 charges about $8 per million queries.
I don’t have the best sense for how many queries a large website’s authoritative DNS server can expect to actually need to resolve (what kinds of sites get 1 billion DNS queries to their authoritative DNS server? Probably a lot, but I don’t have experience with that.). But a few people in the replies mentioned cost as a reason.
reason: you can change your registrar
If you use a separate authoritative nameserver for your domain instead of your registrar’s nameserver, then when you move to a different registrar all you have to do to get your DNS back up is to set your authoritative DNS server to the right value. You don’t need to migrate all your DNS records, which is a huge pain!
You don’t need to run your own nameserver to do this.
reason: geo DNS
You might want to return different IP addresses for your domain depending on where the client is, to give them a server that’s close to them.
This is a service lots of authoritative nameserver services offer; you don’t need to write your own to do this.
reason: avoid denial of service attacks targeted at someone else
Many authoritative DNS servers are shared. This means that if someone attacks
the DNS server for google.com or something and you happen to be using the
same authoritative DNS server, you could be affected even though the attack
wasn’t aimed at you. For example, this DDoS attack on Dyn in 2016.
reason: keep all of your configuration in one place
One person mentioned that they like to keep all of their configuration (DNS records, let’s encrypt, nginx, etc) in the same place on one server.
wild reason: use DNS as a VPN
Apparently iodine is an authoritative DNS server that lets you tunnel your traffic over DNS, if you’re on a network that blocks most outbound traffic but still lets you make DNS queries.
reasons to run a resolver
reason: privacy
If someone can see all your DNS lookups, they have a complete list of all the domains you (or everyone in your organization) are visiting! You might prefer to keep that private.
reason: block malicious sites
If you run your own resolver, you can refuse to resolve DNS queries (by just not returning any results) for domains that you consider “bad”.
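The core of that check is simple. Here’s a sketch (with a made-up blocklist) of the test a filtering resolver does before deciding whether to forward a query:

```python
from typing import Set

# Made-up example entries, not a real blocklist
BLOCKLIST: Set[str] = {"ads.example.com", "tracker.example.net"}

def should_block(qname: str, blocklist: Set[str] = BLOCKLIST) -> bool:
    """Return True if qname or any parent domain is on the blocklist,
    so 'evil.ads.example.com' is blocked when 'ads.example.com' is."""
    name = qname.lower().rstrip(".")
    while name:
        if name in blocklist:
            return True
        # Strip the leftmost label and check the parent domain
        _, _, name = name.partition(".")
    return False
```

When this returns True, the resolver would answer with NXDOMAIN or an empty answer instead of forwarding the query upstream.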
A few examples of resolvers that you can run yourself (or just use):
- Pi-Hole blocks advertisers
- Quad9 blocks domains that do malware/phishing/spyware. Cloudflare seems to have a similar service
- I imagine there’s also corporate security software that blocks DNS queries for domains that host malware
A couple of people also made related points about why DNS’s delegation design matters here:
- DNS isn’t a static database. It’s very dynamic, and answers often depend in real time on the IP address a query came from, current load on content servers, etc. That’s hard to do unless you delegate serving those records to the entity making those decisions.
- Delegating control in DNS makes access control very simple. Everything under a zone cut is controlled by the person who controls the delegated server, so responsibility for a hostname is implicit in the DNS delegation.
reason: get dynamic proxying in nginx
Here’s a cool story from this tweet:
I wrote a DNS server into an app and then set it as nginx’s resolver so that I could get dynamic backend proxying without needing nginx to run lua. Nginx sends DNS query to app, app queries redis and responds accordingly. It worked pretty great for what I was doing.
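The nginx side of that trick relies on a real nginx behavior: when `proxy_pass` uses a variable, nginx re-resolves the hostname at request time using the configured `resolver`. A hedged sketch (the port, address, and backend name are made up):

```nginx
# Ask the custom DNS server (the app from the tweet) for backend addresses.
# 'valid' caps how long nginx caches each answer.
resolver 127.0.0.1:5353 valid=10s;

server {
    listen 80;

    location / {
        # Using a variable in proxy_pass forces a runtime DNS lookup,
        # so the app can steer traffic by changing its DNS answers.
        set $backend "app-backend.internal";
        proxy_pass http://$backend:8080;
    }
}
```

With a literal hostname instead of a variable, nginx would resolve it once at startup and never ask again.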
reason: avoid malicious resolvers
Some ISPs run DNS resolvers that do bad things like redirecting nonexistent domains to an IP they control that serves ads or a weird search page.
Using either a resolver you control or a different resolver that you trust can help you avoid that.
reason: resolve internal domains
You might have an internal network with domains (like
blah.corp.yourcompany.com) that aren’t on the public internet. Running your
own resolver for machines in the internal network makes it possible to access
those domains.
You can do the same thing on a home network, either to access local-only services or to just get local addresses for services that are on the public internet.
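As an example of what this looks like in practice, here’s a hedged sketch of Unbound configuration (the domain and address are made up) that answers an internal-only name itself and forwards everything else to a public resolver:

```
# unbound.conf fragment: serve an internal-only name locally
server:
    local-zone: "corp.yourcompany.com." static
    local-data: "blah.corp.yourcompany.com. IN A 10.0.0.5"

# forward all other queries upstream
forward-zone:
    name: "."
    forward-addr: 1.1.1.1
```

`static` means Unbound answers NXDOMAIN for anything in the zone it doesn’t have local data for, instead of leaking internal lookups upstream.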
reason: avoid your DNS queries being MITM’d
One person said:
I run a resolver on my LAN router that uses DNS over HTTPS for its upstream, so IoT and other devices that don’t support DoH or DoT don’t spray plaintext DNS outside
that’s all for now
It feels important to me to explore the “why” of DNS, because it’s such a complicated messy system and I think most people find it hard to get motivated to learn about complex topics if they don’t understand why all this complexity is useful.
Thanks to Marie and Kamal for discussing this post, and to everyone on Twitter who provided reasons.
2021: Year in review
It’s the end of the year again! This was my second full year working for myself on wizard zines.
Here are some thoughts about what I’m working towards, a bunch of things I made this year, and a few ideas and questions about 2022.
made some progress on a “mission statement”
I think the two hardest things about working for myself are working alone and having to decide what to do.
This year I spent some time trying to figure out what I’m doing with wizard zines. This is hard for me because I work in a pretty intuitive way – I have some feelings about what’s important, and even though following my feelings usually turns out well, I can’t always explain what I’m trying to do.
Here are some thoughts about what I’m trying to do:
- I’m only interested in explaining “old” established technologies (HTTP, DNS, assembly, C, etc)
- I want to help people who are learning about these “old” things for the first time today (whether they’re new to programming or just new to the thing)
- I think it’s very important to learn through experimentation and actually using the thing
- There’s a lot of hidden knowledge about these “old” tools (what are they used for? what are the common mistakes? which parts can you ignore?). I want to make that knowledge easy to find.
There’s also something which I can’t quite articulate yet, which is that computers were “simpler” in the past, and that it’s still possible in 2022 to learn programming in a “simpler” way and cut through some of the chaos and churn in modern computing.
As stated this sounds unrealistic, because you do need to use a lot of more “complicated” stuff to get things done in a real programming job. Most people can’t avoid it! But understanding the “simple” core of how computers work really gives me a lot more confidence when dealing with all of the chaos. (“I don’t know what all this nonsense is but at the end of the day it’s all just assembly instructions and memory and system calls, I can figure this out”)
So I guess I want to figure out how to help other people move towards that confidence of “ok, I know how this computer stuff works, I can figure out any weird new nonsense you throw at me”. That seems ambitious and it might be unrealistic for a collection of a few zines and blog posts and websites but it feels important to me.
educational websites!
This year I felt more excited about making websites than making zines and so I spent more time building websites than I did writing zines. I think this is because:
- the way I learn things is by doing, not by reading
- I feel really comfortable experimenting and breaking things on my computer, but I get the impression that not everyone feels the same level of comfort
- writing code is fun
I’m not going to write about each project at length here because I already wrote long blog posts about each one, but here’s a list of all of the educational websites I made this year (and one command line tool):
- mysteries.wizardzines.com (blog post), a choose-your-own-adventure debugging game
- nginx playground (blog post), where you can do nginx experiments
- mess with dns (blog post), where you can do DNS experiments. I built this one with Marie Claire LeBlanc Flanagan.
- dnspeep (blog post), a command line tool for spying on your DNS queries
- a simple dns lookup tool (blog post), a friendly website version of dig
- what happens behind the scenes when you make a DNS query, a friendly website version of dig +trace
- a sort of “linux challenges” site that I wrote about a lot but haven’t released yet
As you can tell, a lot of them are related to DNS :)
collaboration!
My biggest problem with working for myself is that it gets lonely sometimes, and in previous years I haven’t been very good at finding collaborators. This year I got to work with Marie on Mess With DNS. It was a lot more fun (and a lot easier!) than building it by myself would have been.
opened a print zine store!
I finally (after several years of struggling to figure it out) opened an online store that sells print versions of my zines in May.
Unlike my previous artisanal attempts where I did all the shipping myself (which was kind of fun but totally unsustainable), this time an amazing company called White Squirrel is handling the shipping.
Setting this up involved a lot of logistics (I just finished figuring out how to handle EU VAT for example), but now that it’s set up it doesn’t need too much ongoing attention except to answer occasional emails from people who run into shipping problems.
put all of my comics online!
I put all of my comics online in one place at https://wizardzines.com/comics. Here’s a blog post about that. I’m really happy about this and it’s amazing to be able to just link to any comic I want (like grep).
some partial zines!
I worked on a few zine ideas this year but did not finish any. Here are some bits and pieces that I wrote this year.
Some pages from a DNS zine draft:
- why updating DNS is slow
- the 4 types of DNS servers
- CNAME records
- DNS record types
- why we need DNS
- subdomains
- top-level domains
I also worked on a zine about debugging that I’ve been writing on and off since 2019. Here are some pages I wrote in this year’s attempt: (you can find more by searching for “debugging” at https://wizardzines.com/comics)
- track your progress
- guesses are often wrong
- why some bugs feel “impossible”
- make a minimal reproduction
- list what you’ve learned
- this flowchart
- “we think about debugging as a technical skill (and it absolutely is!!) but a
huge amount of it is managing your feelings so you don’t get discouraged and
being self-aware so you can recognize your incorrect assumptions” (this was just a tweet, not a comic)
I also wrote 3 pages about how things are represented in memory (hexadecimal, little-endian, bytes), but there’s a very long way to go on that project.
I got stuck writing these for the reason I usually get stuck finishing zines – I need to have a clear explanation in my mind of what the zine is about. The explanation looks something like “the reason many people struggle with TOPIC is because they don’t understand X, here’s what you need to know”. My first guess at “X” is often wrong, and the farther I am away from learning the topic for the first time myself the more wrong I am.
But I have some thoughts about how to get unstuck, we’ll see what happens!
blog posts!
here are some of the blog posts I wrote this year, by category:
“how things work”:
- The OSI model doesn’t map well to TCP/IP
- DNS “propagation” is actually caches expiring
- Quadratic algorithms are slow (and hashmaps are fast)
“how to use X tool”:
- What problems do people solve with strace?
- How to look at the stack with gdb
- Tools to Explore BGP
- Some notes on using esbuild
- Firecracker: Start a VM in less than a second
- Docker Compose: a nice way to set up a dev environment
meta-posts about learning and writing:
- Get better at programming by learning how things work
- Blog about what you’ve struggled with
- Patterns in confusing explanations
- Write good examples by starting with real code
- How to get useful answers to your questions
- Teaching by filling in knowledge gaps (this one was really helpful for me to write to articulate to myself what I’m trying to do)
debugging:
- Why bugs might feel “impossible”
- Debugging by starting a REPL at a breakpoint is fun
- Debugging a weird ‘file not found’ error
I’ve historically had a “never write anything negative” rule for this blog and I’m trying to back off that a little bit because it feels limiting. Specifically the posts on dns propagation, the osi model, and confusing explanations are a little more negative than what I would have written previously, and I think they’re better because of it.
Looking at this list, maybe I want to write more hands-on “how things work” / “how to use X tool” posts next year.
Also, I see people linking to Get your work recognized: write a brag document (from 2019) on Twitter all the time. I’m very happy it’s still helping people.
the business is doing well!
This past year I released a lot of free things and not as many things that cost money, but the business is doing well nonetheless and I’m happy about that.
All of the work in this post continues to be 100% funded by zine sales, which I’m grateful for.
things I might do in 2022
Here are some things I might do in 2022. I write these because it’s interesting to see at the end of the year which ones happened and which ones didn’t.
- keep working with other people.
- try to finish some zines
- make more websites where people can do computer experiments
- figure out where to take the computer mysteries project
- give a talk (I’ve taken 3 years off writing talks, but I have some ideas)
- try to understand what’s going on with this “mission statement”
some things I’m less likely to do, but that I’ve thought about:
- write an actual book (for example a book about computer networking)
- write some computer networking command line tools that are easier to use
- collaborate with someone to write a zine about a topic that I know almost nothing about (like databases)
- spend more (than 0) time reading papers about CS education
some questions
Here are some questions I have going into 2022.
- What’s the scope of the “zines” project? How many more do I want to write?
- What’s hard for developers about learning to use the Unix command line in 2021? What do I want to do about it?
- If I keep making these educational websites, how can/should I measure how much they’re used? (I’m thinking here about this paper by Philip Guo about python tutor’s design guidelines)
- Do any of my educational websites need to make money, or can I keep releasing them for free?