
Comics!

Here are (almost) all of the comics I've published! They're ordered approximately by popularity, most popular first.

bash tricks
### panel 1: ctrl + r search your history! I use this **constantly** to rerun commands ### panel 2: magical braces ```$ convert file.{jpg,png}``` expands to ```$ convert file.jpg file.png``` `{1..5}` expands to 1 2 3 4 5 (for i in {1..100}...) ### panel 3: !! expands to the last command run `$ sudo !!` ### panel 4: commands that start with a space don't go in your history (when `HISTCONTROL` includes `ignorespace`, which is the default on many systems). good if there's a password ### panel 5: loops ``` for i in *.png; do convert "$i" "$i.jpg"; done ``` person: for loops: easy & useful! ### panel 6: $( ) gives the output of a command ```$ touch file-$(date -I)``` creates a file named file-2018-05-25 ### panel 7: more keyboard shortcuts ctrl + a beginning of line ctrl + e end of line ctrl + l clear the screen & lots more emacs shortcuts too!
/proc
### panel 1: Every process on Linux has a PID (process ID) like 42. In /proc/42, there's a lot of VERY USEFUL information about process 42. ### panel 2: /proc/PID/cmdline command line arguments the process was started with. ### panel 3: /proc/PID/environ all of the process's environment variables ### panel 4: /proc/PID/exe symlink to the process's binary. magic: works even if the binary has been deleted! ### panel 5: /proc/PID/status Is the program running or asleep? How much memory is it using? And much more! ### panel 6: /proc/PID/fd directory with every file the process has open! Run ```$ ls -l /proc/42/fd``` to see the list of files for process 42. These symlinks are also magic & you can use them to recover deleted files ### panel 7: /proc/PID/stack the kernel's current stack for the process. Useful if it's stuck in a system call. ### panel 8: /proc/PID/maps list of the process's memory maps. Shared libraries, heap, anonymous maps, etc. ### panel 9: and more Look at ```man proc``` for more information!
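Here's a minimal Python sketch of reading two of these files. It assumes a Linux system (the `/proc` reads are skipped elsewhere), and the helper names `parse_cmdline`, `read_cmdline` and `open_files` are made up for illustration. The one non-obvious detail is that `/proc/PID/cmdline` separates arguments with NUL bytes, not spaces.

```python
import os


def parse_cmdline(raw: bytes) -> list[str]:
    """Split the raw bytes of /proc/PID/cmdline into arguments.

    Arguments are separated by NUL bytes, with one more NUL after the last.
    """
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    if not raw:
        return []
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


def read_cmdline(pid: int) -> list[str]:
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return parse_cmdline(f.read())


def open_files(pid: int) -> dict[int, str]:
    """Map each open file descriptor to what its /proc/PID/fd symlink points at."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            pass  # the fd was closed between listdir() and readlink()
    return targets


if os.path.isdir("/proc/self"):  # only on Linux
    print(read_cmdline(os.getpid()))
    print(open_files(os.getpid()))
```

Splitting on NUL (instead of whitespace) is what keeps an argument like `"my day.txt"` in one piece.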
how I got better at debugging
### Remember: the bug is happening for a logical reason. It's never magic. Really. Even when it makes no sense. ### Be confident I can fix it before: maybe this is too hard now: well I've fixed a lot of hard bugs before ### Talk to my coworkers person 1: ? person 2: ! ### know my debugging toolkit before: I want to know $THING but I don't know how to find out now: I KNOW! I'll use tcpdump! ### most importantly: I learned to like it before: oh no! a bug! now: I think I'm about to learn something (facial expression: determination)
grep
### panel 1: grep lets you search files for text ```$ grep bananas foo.txt``` Here are some of my favourite grep command line arguments! ### panel 2: -E use if you want regexps like ".+" to work. otherwise you need to use ".\+" ### panel 3: -v invert match: find all lines that don't match ### panel 4: -r recursive! Search all the files in a directory. ### panel 5: -o only print the matching part of the line (not the whole line) ### panel 6: -i case insensitive ### panel 7: -A -B -C show **c**ontext for your search ```$ grep -A 3 foo``` will show 3 lines of context **a**fter a match (-B shows lines **b**efore, -C shows both) ### panel 8: -l only show the **filenames** of the files that matched ### panel 9: -F aka fgrep: don't treat the match string as a regex eg ```$ grep -F...``` ### panel 10: -a search binaries: treat binary data like it's text instead of ignoring it! ### panel 11: grep alternatives ack ag ripgrep (better for searching code!)
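To make a few of these flags concrete, here's a toy Python model of `-v`, `-i`, `-o` and `-F`. It is only a sketch: the `grep` function name and keyword arguments are invented here, and Python's `re` syntax is closer to `grep -E` than to grep's default basic regexps.

```python
import re


def grep(pattern: str, lines: list[str], *, invert: bool = False,
         ignore_case: bool = False, only_matching: bool = False,
         fixed: bool = False) -> list[str]:
    """A toy model of grep's -v, -i, -o and -F flags (not how grep is implemented)."""
    if fixed:
        pattern = re.escape(pattern)  # -F: the pattern is a literal string, not a regex
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    out = []
    for line in lines:
        if invert:  # -v: keep the lines that DON'T match
            if not regex.search(line):
                out.append(line)
        elif only_matching:  # -o: print just the (non-empty) matching parts
            out.extend(m.group(0) for m in regex.finditer(line) if m.group(0))
        elif regex.search(line):
            out.append(line)
    return out


lines = ["I like bananas", "apples are fine", "BANANAS!!", "a.b"]
print(grep("bananas", lines, ignore_case=True))  # ['I like bananas', 'BANANAS!!']
print(grep(".", lines, fixed=True))              # ['a.b']
```

Without `fixed=True`, the pattern `"."` would match every non-empty line, which is exactly why `-F` exists.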
permissions
### panel 1: There are 3 things you can do to a file: **r**ead, **w**rite, e**x**ecute ### panel 2: ```ls -l file.txt``` shows you permissions. Here's how to interpret the output: rw- **bork** (user) can read & write; rw- **staff** (group) can read & write; r-- **ANYONE** can read ### panel 3: File permissions are 12 bits. The first 3 bits are setuid, setgid, and sticky. Then come 3 bits each for user (110 = rw-), group (110 = rw-), and all (100 = r--). For files: r = can read, w = can write, x = can execute. For directories, it's approximately: r = can list files, w = can create files, x = can cd into & access files ### panel 4: 110 in binary is 6, so rw- = 110 = 6, r-- = 100 = 4, r-- = 100 = 4. ```chmod 644 file.txt``` means change the permissions to: rw- r-- r-- Simple! ### panel 5: setuid affects executables ```$ ls -l /bin/ping``` rw**s** r-x r-x root root: the s means ping always runs as root. ```setgid``` does 3 different unrelated things for executables, directories, and regular files. person: unix! why?? unix: it's a long story
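The "each letter is one bit" arithmetic from panel 4 fits in a few lines of Python. `symbolic_to_octal` is a hypothetical helper that only understands `r`, `w`, `x` and `-` (not `s` or `t`); going the other way, the standard library's `stat.filemode` turns mode bits into the string `ls -l` prints.

```python
import stat


def symbolic_to_octal(perms: str) -> int:
    """Convert 9 permission characters like 'rw-r--r--' into a number like 0o644.

    Only handles r/w/x/- (not the s or t that setuid/setgid/sticky produce).
    """
    if len(perms) != 9:
        raise ValueError("expected 9 characters, e.g. 'rw-r--r--'")
    bits = 0
    for ch in perms:
        bits = (bits << 1) | (ch != "-")  # each letter is a 1 bit, each '-' is a 0 bit
    return bits


# rw- = 110 = 6 and r-- = 100 = 4, so rw-r--r-- is 644 in octal
print(oct(symbolic_to_octal("rw-r--r--")))  # 0o644

# mode bits -> the string ls -l shows (the leading '-' means "regular file")
print(stat.filemode(stat.S_IFREG | 0o644))                  # -rw-r--r--
print(stat.filemode(stat.S_IFREG | stat.S_ISUID | 0o755))   # -rwsr-xr-x (setuid, like ping)
```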
how to be a wizard programmer
more bash tricks
### `cd -` changes to the directory you were last in. `pushd` & `popd` let you keep a stack ### `ctrl + z` suspends (SIGTSTP) the running program ### `fg` brings a backgrounded/suspended program to the foreground ### `bg` starts a suspended program & backgrounds it (use after `ctrl + z`) ### `shellcheck` (with hearts around it) shell script linter! helps spot common mistakes ### `<( )` (process substitution) treat process output like a file (no more temp files!) eg: `$ diff <(ls) <(ls -a)` ### `fc` ("fix command") open the last command you ran in an editor, then run the edited version ### `type` tells you if something is a builtin, program, or alias. try running type on: time, ping, pushd (they're all different types)
ssh
### ssh keys An ssh key is a secret key that lets you SSH to a machine person: hello! ssh: That's on my list of authorized keys! come in! ### ssh-copy-id This script installs your SSH key on a host (over SSH) `$ ssh-copy-id user@host` (puts it in .ssh/authorized_keys etc) installing an SSH key is surprisingly finicky so this script is helpful! ### port forwarding ``` ssh user@host.com -NfL 3333:localhost:8888 ``` 3333 = local port 8888 = remote port Lets you view a remote server that's not on the internet in your browser. ### just run 1 command `$ ssh user@host uname -a` runs the command `uname -a` & exits. ### ssh-agent remembers your SSH key passphrase so you don't have to keep typing it ### ~. press `<Enter>` then `~.` to close the SSH connection. Useful if it's hanging! ### mosh ssh alternative: keeps the connection open if you disconnect + reconnect later ### .ssh/config Lets you set, per host: - username to use - SSH key to use - an alias! so you can type `$ ssh ALIAS` instead of `ssh user@verylongdomain.com`
what to talk about in 1:1s
[manager]
Each of these items is enclosed in a little thought bubble, with an image of a stick figure with short curly hair. The person is smiling in every illustration, except "what's not going well". ### what's been going well I LOVE this project! ### what's not going well I got paged 10 times last week ### team priorities how does my work fit in with company goals? ### career planning I'd like to be promoted this year ### ask for opportunities I want to work on a customer-facing project ### ask for feedback do you have any concerns about how PROJECT is going? ### brainstorm let's think about this problem! ### give feedback the team seems really unfocused recently ### ask for resources I think this training would really help me
bash brackets cheat sheet
[bash]
### shell scripts have a lot of brackets here's a cheat sheet to help you identify them all! we'll cover the details later. ### (cd ~/music; pwd) `(...)` runs commands in a subshell. ### VAR=$(cat file.txt) `$(COMMAND)` is equal to `COMMAND`'s stdout ### { cd ~/music; pwd; } `{ ...; }` groups commands. runs in the same process. ### x=(1 2 3) `x=(...)` creates an array ### x=$((2+2)) `$((...))` does arithmetic ### if [ ... ] `/usr/bin/[` is a program that evaluates statements (bash also has `[` as a builtin) ### <(COMMAND) "process substitution": an alternative to pipes ### a{.png,.svg} this expands to `a.png a.svg` it's called "brace expansion" ### if [[ ... ]] `[[` is bash syntax. it's more powerful than `[` ### ${var//search/replace} see page 21 for more about `${...}`!
the box model
### every HTML element is in a box ``` <div class="1"> <div class="2" /> <div class="3" /> </div> ``` Illustration of a larger box, labelled 1. Nested inside it are two boxes. The one on top is labelled 2, and the one below 2 is labelled 3. ### boxes have padding, borders, and a margin Illustration of a series of nested boxes. The middle box is empty. The area around the middle box is labelled "padding". The area around the padding is labelled "border". The area around the border is labelled "margin". ### width & height don't include any of those The same illustration from the previous panel, but with two lines measuring the width and height of only the middle box, not the padding, border, or margin. ### margins are allowed to overlap sometimes Illustration of two sets of nested boxes, similar to the diagrams above. One is on top of the other, and the area between the sets of boxes is shaded in green, showing that the bottom margin of the first set of boxes, and the top margin of the second set of boxes, overlap. the browser combines these top/bottom margins. look up "margin collapse" to learn more ### `box-sizing: border-box;` includes border + padding in the width/height Illustration of a series of nested boxes with a middle box surrounded by padding, border, and margin. In this version, the lines measuring width and height extend all the way to the edge of the border (but don't include the margin surrounding the border.) ### inline elements ignore other inline elements' vertical padding Illustration of two dotted line boxes stacked directly on top of one another. Each has the word "`span`" inside it. you can set vertical padding but the other span won't move
memory allocation
### your program has memory 10MB: program binary 3MB: stack 587 MB: heap the heap is what your allocator manages ### your memory allocator (malloc) is responsible for 2 things. THING 1: keep track of what memory is used/free. ### THING 2: ask the OS for more memory! malloc: oh no! I'm being asked for 40 MB and I don't have it. malloc: can I have 60 MB more? OS: here you go! ### your memory allocator's interface - malloc(size_t size): allocate size bytes of memory & return a pointer to it. - free(void *pointer): mark the memory as unused (and maybe give it back to the OS) - realloc(void *pointer, size_t size): ask for more/less memory for pointer. - calloc(size_t members, size_t size): allocate an array + initialize it to 0. ### malloc tries to reuse free space when you ask for memory your code: can I have 512 bytes of memory? malloc: YES! ### malloc isn't magic! it's just a function! you can always: - use a different malloc library like jemalloc or tcmalloc (easy!) - implement your own malloc (harder)
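To show the two jobs side by side, here's a toy first-fit allocator in Python. It's a simplified sketch, not how glibc malloc, jemalloc or tcmalloc actually work: the "heap" is just a range of integer offsets, `ToyAllocator` and its sizes are made up, and "asking the OS" is simulated by growing that range.

```python
class ToyAllocator:
    """A toy first-fit allocator over a simulated heap (offsets, not real memory).

    THING 1: track which ranges are used / free.
    THING 2: when nothing fits, "ask the OS" for more by growing the heap.
    """

    def __init__(self, initial_size: int = 64, grow_by: int = 64):
        self.heap_size = initial_size
        self.grow_by = grow_by
        self.free_list = [(0, initial_size)]  # sorted (start, size) free ranges
        self.used = {}  # start -> size of each live allocation

    def malloc(self, size: int) -> int:
        for i, (start, free_size) in enumerate(self.free_list):
            if free_size >= size:  # first fit: take the first gap that's big enough
                if free_size == size:
                    del self.free_list[i]
                else:
                    self.free_list[i] = (start + size, free_size - size)
                self.used[start] = size
                return start
        self._ask_os_for_more(size)  # nothing fits: grow, then try again
        return self.malloc(size)

    def free(self, start: int) -> None:
        size = self.used.pop(start)
        self.free_list.append((start, size))
        self._coalesce()

    def _ask_os_for_more(self, size: int) -> None:
        extra = max(size, self.grow_by)
        self.free_list.append((self.heap_size, extra))
        self.heap_size += extra
        self._coalesce()

    def _coalesce(self) -> None:
        """Merge free ranges that touch, so big requests can reuse them."""
        merged = []
        for start, size in sorted(self.free_list):
            if merged and merged[-1][0] + merged[-1][1] == start:
                prev_start, prev_size = merged[-1]
                merged[-1] = (prev_start, prev_size + size)
            else:
                merged.append((start, size))
        self.free_list = merged


heap = ToyAllocator(initial_size=64, grow_by=64)
a, b = heap.malloc(16), heap.malloc(16)  # 0, 16
heap.free(a)
c = heap.malloc(8)      # 0: reuses the space `a` left behind
d = heap.malloc(100)    # doesn't fit, so the heap grows first
print(c, d, heap.heap_size)
```

First fit is only one strategy. Real allocators use size classes, per-thread caches and more to stay fast and limit fragmentation.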
misc commands
### `rlwrap` adds history & readline shortcuts (like ctrl + r) to REPLs that don't already have them (`rl` stands for readline) `$ rlwrap python` ### `watch` rerun a command every 2 seconds ### `file` figures out what kind of file (png? pdf?) a file is ### `pv` "pipe viewer", gives you stats on data going through a pipe ### `cal` a tiny calendar (heart) ### `ts` add a timestamp in front of every input line ### `comm` find lines 2 sorted files have in common ### `ncdu` figure out what's using all your disk space ### `column` format input into columns ### `diff` diff 2 files. Run with `-U 8` for context. ### `xsel`/`xclip` copy/paste from the system clipboard (`pbcopy`/`pbpaste` on Mac)
SELECT queries start with FROM
Conceptually, every step (like `WHERE`) of a query transforms its input, like this: table `cats`: (owner 1, name daisy), (owner 1, name dragonsnap), (owner 3, name buttercup), (owner 4, name rose) → `WHERE owner = 1` → (owner 1, name daisy), (owner 1, name dragonsnap). The query's steps don't happen in the order they're written. How the query is written: SELECT ... FROM + JOIN ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT ... How you should think about it: FROM + JOIN ↓ WHERE ↓ GROUP BY ↓ HAVING ↓ SELECT ↓ ORDER BY ↓ LIMIT (In reality query execution is much more complicated than this. There are a lot of optimizations.)
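You can try the `cats` example with Python's built-in `sqlite3` module and an in-memory database. The second query leans on the ordering above: `ORDER BY` comes after `SELECT`, so it can use the alias `n` that `SELECT` defines. Treat this as a mental model, not a rule every engine enforces; some databases are more permissive than this ordering suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cats (owner INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO cats VALUES (?, ?)",
    [(1, "daisy"), (1, "dragonsnap"), (3, "buttercup"), (4, "rose")],
)

# FROM cats -> WHERE owner = 1 -> SELECT name -> ORDER BY name
owner_1 = conn.execute(
    "SELECT name FROM cats WHERE owner = 1 ORDER BY name"
).fetchall()
print(owner_1)  # [('daisy',), ('dragonsnap',)]

# ORDER BY happens after SELECT, so it can use the alias `n` defined there
counts = conn.execute(
    "SELECT owner, COUNT(*) AS n FROM cats GROUP BY owner ORDER BY n DESC, owner"
).fetchall()
print(counts)  # [(1, 2), (3, 1), (4, 1)]
```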
the most important HTTP request headers
These are the most important request headers: ### Host The domain. The only required header (in HTTP/1.1). `Host: examplecat.com` ### User-Agent name + version of your browser and OS `User-Agent: curl/7.0.2` ### Referer (yes, it's misspelled!) website that linked or included the resource `Referer: http://examplecat.com` ### Authorization eg a password or API token. For Basic auth it's base64-encoded `user:password` `Authorization: Basic YXZ` ### Cookie send cookies the server sent you earlier. keeps you logged in `Cookie: user=b0rk` ### Range lets you continue downloads ("get bytes 100-200") `Range: bytes=100-200` ### Cache-Control "max-age=60" means cached responses must be less than 60 seconds old ### If-Modified-Since only send if resource was modified after this time `If-Modified-Since: Wed, 21 Oct...` ### If-None-Match only send if the ETag doesn't match those listed `If-None-Match: "e7ddac"` ### Accept MIME type you want the response to be `Accept: image/png` ### Accept-Encoding set this to "gzip" and you'll probably get a compressed response `Accept-Encoding: gzip` ### Accept-Language set this to "fr-CA" and you might get a response in French `Accept-Language: fr-CA` ### Content-Type MIME type of request body, e.g. "application/json" ### Content-Encoding will be "gzip" if the request body is gzipped ### Connection "close" or "keep-alive". Whether to keep the TCP connection open
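Request headers are just `Name: value` lines after the request line, each ending in `\r\n`, with a blank line at the end. Here's a small Python sketch that builds one by hand, including a Basic `Authorization` header. The helper names and the `user`/`password` credentials are placeholders, and base64 is an encoding, not encryption: anyone who sees the header can decode it.

```python
import base64


def basic_auth(user: str, password: str) -> str:
    """Value for the Authorization header: 'Basic ' + base64('user:password')."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return f"Basic {token}"


def build_request(method: str, path: str, headers: dict[str, str]) -> bytes:
    """Serialize an HTTP/1.1 request head (no body) as it goes over the wire."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "/cats", {
    "Host": "examplecat.com",                       # required in HTTP/1.1
    "User-Agent": "curl/7.0.2",
    "Authorization": basic_auth("user", "password"),
    "Range": "bytes=100-200",                       # only send bytes 100-200
    "Accept-Encoding": "gzip",
})
print(request.decode())
```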
curl
### `curl` smiling stick figure with short curly hair: it's my favourite way to make HTTP requests! great for testing APIs! `$ curl wizardzines.com` ### `-H` is for header. good for POST requests to JSON APIs `-H "Content-Type: application/json"` allow compressed response: `-H "Accept-Encoding: gzip"` ### `-L` follow 3xx redirects ### `--data` `--data '{"name": "julia"}'` `--data @filename.json` (@ reads the data to send from a file) ### `-i` show response headers ### `-I` show ONLY response headers (makes a HEAD request) ### `-X POST` send a POST request instead of a GET (`-X PUT` etc works too) ### `-v` show request headers & more ### `-k` insecure: don't verify SSL certificates ### `--connect-to ::IP` (or hostname) send request to IP instead. use before changing DNS to a new IP ### `copy as cURL` Have something in your browser you want to download from the command line? In Firefox/Chrome/Safari: Developer Tools -> Network tab -> right click on the request -> copy as cURL (can have sensitive info in cookies!)
bash errors
### by default, bash will continue after errors bash, represented by a box with a smiley face: oh, was that an error? who cares, let's keep running!!! programmer, represented by a nonplussed stick figure with short curly hair: uh that is NOT what I wanted ### `set -e` stops the script on errors ``` set -e unzip fle.zip ``` (typo! script stops here!) programmer, smiling: this makes your scripts WAY more predictable ### by default, unset variables don't error `rm -r "$HOME/$SOMEPTH"` bash, happily: `$SOMEPTH` doesn't exist? no problem, i'll just use an empty string! programmer: OH NOOOO that means `rm -r "$HOME/"` ### `set -u` stops the script on unset variables ``` set -u rm -r "$HOME/$SOMEPTH" ``` bash, concerned: I've never heard of `$SOMEPTH`! STOP EVERYTHING!!! ### by default, a command failing doesn't fail the whole pipeline `curl yxqzq.ca | grep 'panda'` bash, pleased with itself: `curl` failed but `grep` succeeded so it's fine! success! ### `set -o pipefail` makes the pipe fail if any command fails you can combine `set -e`, `set -u`, and `set -o pipefail` into one command I put at the top of all my scripts: `set -euo pipefail`
xargs
### xargs takes whitespace-separated strings from stdin and converts them into command-line arguments ``` $ echo "/home /tmp" | xargs ls ``` will run `ls /home /tmp` ### this is useful when you want to run the same command on a list of files! - delete (`xargs rm`) - combine (`xargs cat`) - search (`xargs grep`) - replace (`xargs sed`) ### how to replace "foo" with "bar" in all .txt files: ``` find . -name '*.txt' | xargs sed -i s/foo/bar/g ``` ### how to lint every Python file in your Git repo: ``` git ls-files | grep '\.py$' | xargs pep8 ``` ### if there are spaces in your filenames "my day.txt" xargs will think it's 2 files: "my" and "day.txt". fix it like this: ``` find -print0 | xargs -0 COMMAND ``` ### more useful xargs options `-n 1` (max-args): makes xargs run a separate process for each input `-P` (capital P, max-procs): is the max number of parallel processes xargs will start
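The "my day.txt" problem is all about how the input gets split. This Python sketch mimics the splitting only; it's simplified (real xargs also gives quotes and backslashes special meaning), and the function names are invented for illustration.

```python
def split_like_xargs(text: str) -> list[str]:
    """Default xargs (simplified): any run of whitespace separates arguments."""
    return text.split()


def split_like_xargs_0(text: str) -> list[str]:
    """xargs -0: only NUL bytes separate arguments, so spaces are safe."""
    return [arg for arg in text.split("\0") if arg]


def batches(args: list[str], max_args: int) -> list[list[str]]:
    """Like xargs -n MAX_ARGS: each inner list becomes one command's arguments."""
    return [args[i:i + max_args] for i in range(0, len(args), max_args)]


names = ["my day.txt", "notes.txt"]
print(split_like_xargs("\n".join(names) + "\n"))     # ['my', 'day.txt', 'notes.txt']
print(split_like_xargs_0("\0".join(names) + "\0"))   # ['my day.txt', 'notes.txt']
print(batches(["a.txt", "b.txt", "c.txt"], 1))       # [['a.txt'], ['b.txt'], ['c.txt']]
```

`find -print0` puts a NUL after each filename, and since filenames can't contain NUL, `-0` can never split one in the wrong place.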
awk
### panel 1: awk is a tiny programming language for manipulating columns of data person: I only know how to do 2 things with awk but it's still useful! ### panel 2: basic awk program structure ``` BEGIN {...} CONDITION {action} CONDITION {action} ``` (do action on lines matching CONDITION) ``` END {...} ``` ### panel 3: extract a column of text with awk ```awk -F, '{print $5}'``` the comma is the column separator. the ' is a single quote! ```{print $5}``` means print the 5th column person: this is 99% of what I do with awk ### panel 4: SO MANY Unix commands print columns of text (ps! ls!) so being able to get the column you want with awk is GREAT ### panel 5: awk program example: sum the numbers in the 3rd column ```awk '{s += $3} END {print s}'``` (`{s += $3}` is the action; at the END, print the sum!) ### panel 6: awk program example: print every line over 80 characters ```length($0) > 80``` is the condition (there's an implicit ```{print}``` as the action)
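If it helps to see what those awk one-liners compute, here are rough Python equivalents. They're sketches with made-up function names, and they assume simple input; for example, `sum_column` skips values that aren't plain numbers, while awk would treat most of those as 0.

```python
from typing import Optional


def column(lines: list[str], n: int, sep: Optional[str] = None) -> list[str]:
    """Like `awk -F SEP '{print $N}'`. awk columns start at 1.

    sep=None splits on runs of whitespace, like awk's default.
    A missing column comes out as "", as it does in awk.
    """
    out = []
    for line in lines:
        fields = line.split(sep)
        out.append(fields[n - 1] if n <= len(fields) else "")
    return out


def sum_column(lines: list[str], n: int) -> float:
    """Like `awk '{s += $N} END {print s}'`."""
    total = 0.0
    for value in column(lines, n):
        try:
            total += float(value)
        except ValueError:
            pass  # simplification: skip values that aren't plain numbers
    return total


def long_lines(lines: list[str], limit: int = 80) -> list[str]:
    """Like `awk 'length($0) > 80'`: a condition with no action prints the line."""
    return [line for line in lines if len(line) > limit]


print(column(["a,b,c,d,e,f", "1,2,3,4,5,6"], 5, ","))   # ['e', '5']
print(sum_column(["  root 1 0.5", "bork 42 1.5"], 3))  # 2.0
```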
CORS
cross-origin resource sharing Cross-origin requests are not allowed by default (because of the same origin policy!) Javascript from clothes.com: POST request to api.clothes.com? Firefox (thought bubble): same origin flow chart Firefox: NOPE. api.clothes.com is a different origin from clothes.com. If you run api.clothes.com, you can allow clothes.com to make requests to it using the `Access-Control-Allow-Origin` header. Here's what happens: javascript on clothes.com: `POST /buy_thing`\ `Host: api.clothes.com` Firefox (thought bubble): That's cross-origin. I'm going to need to ask api.clothes.com if this request is allowed. Firefox: `OPTIONS /buy_thing`\ `Host: api.clothes.com` ("hey, what requests are allowed?" preflight request) api.clothes.com: `204 No Content`\ `Access-Control-Allow-Origin: clothes.com` Firefox (thought bubble): cool, the request is allowed! Firefox: `POST /buy_thing`\ `Host: api.clothes.com`\ `Referer: clothes.com/checkout` api.clothes.com: `200 OK`\ `{"thing_bought": true}` This OPTIONS request is called a "preflight" request, and it only happens for some requests, like we described in the diagram on the same-origin policy page. Most GET requests will just be sent by the browser without a preflight request first, but POST requests that send JSON need a preflight.
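Two pieces of this are easy to sketch in Python: the browser's "same origin?" check, and what a server might send back for the preflight. An origin is the (scheme, host, port) triple, so `api.clothes.com` and `clothes.com` differ. In real responses the header value is a full origin such as `https://clothes.com`. `preflight_headers` is a hypothetical server-side helper, not any framework's API.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """An origin is (scheme, host, port). The path, query and fragment don't count."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


def preflight_headers(request_origin: str, allowed_origins: set) -> dict:
    """Hypothetical helper: headers for the OPTIONS (preflight) response."""
    if request_origin not in allowed_origins:
        return {}  # no CORS headers: the browser won't send the real request
    return {
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Methods": "POST",
        # the JSON Content-Type is what triggered the preflight, so allow it
        "Access-Control-Allow-Headers": "Content-Type",
    }


print(same_origin("https://clothes.com/checkout", "https://api.clothes.com/buy_thing"))  # False
print(preflight_headers("https://clothes.com", {"https://clothes.com"}))
```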
CSS variables
### duplication is annoying Illustration of a frowning stick figure with curly hair. person, thinking: ugh, I have `color: #f79` set in 27 places and now I need to change it in 27 places ### define variables in any selector ``` body { --text-color: #f79; } ``` (applies to everything) ``` #header { --text-color: #c50; } ``` (applies to `#header` and everything inside it) ### use variables with `var()` ``` body { color: var(--text-color); } ``` (variables always start with `--`) ### do math on them with `calc()` ``` #sidebar { width: calc(var(--my-var) + 1em); } ``` ### you can change a variable's value in Javascript ``` let root = document.documentElement; root.style.setProperty('--text-color', 'black'); ``` ### changes to variables apply immediately JS, represented by a box with a smiley face: set `--text-color` to red css renderer, also represented by a box with a smiley face: ok everything using it is red now!
containers aren't magic
These 15 lines of bash will start a container running the fish shell. Try it! (download this script at bit.ly/containers-arent-magic) It only runs on Linux because these features are all Linux-only. `wget bit.ly/fish-container -O fish.tar` (# 1. download the image) `mkdir container-root; cd container-root` `tar -xf ../fish.tar` (# 2. unpack image into a directory) `cgroup_id="cgroup_$(shuf -i 1000-2000 -n 1)"` (# 3. generate random cgroup name) `cgcreate -g "cpu,cpuacct,memory:$cgroup_id"` (# 4. make a cgroup & set CPU/memory limits) `cgset -r cpu.shares=512 "$cgroup_id"` `cgset -r memory.limit_in_bytes=1000000000 \` `"$cgroup_id"` `cgexec -g "cpu,cpuacct,memory:$cgroup_id" \` (# 5. use the cgroup) `unshare -fmuipn --mount-proc \` (# 6. make and use some namespaces) `chroot "$PWD" \` (# 7. change root directory) `/bin/sh -c "` `/bin/mount -t proc proc /proc &&` (# 8. use the right /proc) `hostname container-fun-times &&` (# 9. change the hostname) `/usr/bin/fish"` (# 10. finally, start fish!)
virtual memory
### your computer has physical memory (illustration of an 8GB 204-pin SODIMM DDR3 memory stick) ### physical memory has addresses, like 0-8GB, but when your program references an address like 0x5c69a2a2, that's not a physical memory address! It's a virtual address. ### every program has its own virtual address space program 1: 0x129520 → "puppies" program 2: 0x129520 → "bananas" ### Linux keeps a mapping from virtual memory pages to physical memory pages, called the page table. a "page" is a 4KB chunk of memory (or sometimes bigger) PID -- virtual addr -- physical addr 1971 -- 0x20000 -- 0x192000 2310 -- 0x20000 -- 0x228000 2310 -- 0x21000 -- 0x9788000 ### when your program accesses a virtual address CPU: I'm accessing 0x21000 MMU "memory management unit" (hardware): I'll look that up in the page table and then access the right physical address ### every time you switch which process is running, Linux needs to switch the page table Linux: here's the address of process 2950's page table MMU: thanks, I'll use that now!
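The lookup the MMU does can be sketched in a few lines of Python using the page table from this comic. This is a deliberately flat model: real page tables are multi-level trees per process, walked in hardware and cached in the TLB. `PAGE_TABLE` and `translate` are illustrative names, and 4 KiB pages are assumed.

```python
PAGE_SIZE = 4096  # 4 KiB = 0x1000 bytes

# (pid, virtual page start) -> physical page start, from the table above
PAGE_TABLE = {
    (1971, 0x20000): 0x192000,
    (2310, 0x20000): 0x228000,
    (2310, 0x21000): 0x9788000,
}


def translate(pid: int, vaddr: int) -> int:
    """Split the address into page + offset, then look up the page."""
    page = vaddr & ~(PAGE_SIZE - 1)    # which 4 KiB page the address is in
    offset = vaddr & (PAGE_SIZE - 1)   # where inside that page
    try:
        return PAGE_TABLE[(pid, page)] + offset
    except KeyError:
        raise LookupError(f"page fault: pid {pid} has no mapping for {vaddr:#x}") from None


print(hex(translate(2310, 0x21010)))  # 0x9788010
# same virtual address, different processes, different physical memory
print(hex(translate(1971, 0x20000)), hex(translate(2310, 0x20000)))  # 0x192000 0x228000
```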
how to talk to your operating system
### WRONG: weird guy: good morning madam would you care to open a file for me operating system: what ### RIGHT: system calls! yay! (surrounded by hearts, smiley faces, and exclamation marks) happy stick figure with curly hair: ``` open("/cool.txt") connect(<my friend's computer>) ``` your programs can: - open - read - write files talk to other computers with networking: - connect - sendto - recvfrom start other programs: - execve AND MUCH MORE!!! (these are all system calls on Linux!)
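Python's `os.open`, `os.write`, `os.read` and `os.close` are thin wrappers over the corresponding system calls on Unix-like systems (on Linux, the C library may actually use `openat` under the hood), so they're a handy way to see the "RIGHT" way in action. The file name `cool.txt` is borrowed from the comic; the file lives in a throwaway temp directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cool.txt")

    # open() + write() + close(): create the file and put some bytes in it
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = os.write(fd, b"hello, operating system!\n")
    os.close(fd)

    # open() + read() + close(): read the bytes back
    fd = os.open(path, os.O_RDONLY)
    data = os.read(fd, 100)
    os.close(fd)

print(written, data)
```

Running a script like this under `strace` shows the actual system calls the process makes.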
content delivery networks
In 2004, if your website suddenly got popular, often the webserver wouldn't be able to handle all the requests. slashdot: person 1: I want cat picture! person 2: me too! person 3: me 300,000! server, on fire: <no response> web host: now you owe me $1000 for bandwidth you: how will I pay for this? A CDN (content delivery network) can make your site faster and save you money by caching your site and handling most requests itself. 20 million requests for 1 cute cat picture -> CDN (many powerful computers) -> just 1 request: hey send me that cat picture? server: here you go! Today, there are many free or cheap CDN services available, which means if your site gets popular suddenly you can easily keep it running! This is great but caching can cause problems too! I updated my site yesterday but people are still seeing the old site! (Cache-Control header) French users are seeing the English site?!? Why? (Vary header) Next, we'll explain the HTTP headers your CDN or browser uses to decide how to do caching.
inline vs block
### HTML elements default to inline or block example inline elements: `<a> <span> <strong> <i> <small> <abbr> <img> <q> <code>` example block elements: `<p> <div> <ol> <ul><li> <h1> <h6> <blockquote> <pre>` ### inline elements are laid out horizontally text text text `<a>` text text text text `<span>` text text ### block elements are laid out vertically by default `<div>` `<p>` to get a different layout, use `display: flex` or `display: grid` ### inline elements ignore width & height* Setting the width is impossible, but in some situations, you can use `line-height` to change the height `*` img is an exception to this: look up "replaced elements" for more ### display can force an element to be inline or block `display` determines 2 things: 1. whether the element itself is `inline`, `block`, `inline-block`, etc 2. how child elements are laid out (`grid`, `flex`, `table`, `default`, etc) ### display: inline-block; TRY ME! `inline-block` makes a block element be laid out horizontally like an inline element inline text more inline text inline-block inline text
ask for specific feedback
[manager]
I used to ask for feedback like this: Illustration of two stick figures, both smiling. Person 1, the employee, has short curly hair, and person 2, the manager, doesn't have hair. person 1 (speech bubble): do you have any feedback for me? person 2 (speech bubble): not right now! person 1 (thought bubble): is there something they're not telling me? person 2 (thought bubble): what specifically does she want feedback on? I've learned that I get WAY BETTER answers if I ask more specific questions! - what do you think of this design? - did I prioritize these things well? - should I be doing more or less of X? - do you have any concerns about PROJECT? - was that email clear? Bonus: asking specific questions forces me to actually think about which areas I might want to focus on.
ip
### ip (Linux only) lets you view + change network configuration. `ip OBJECT COMMAND` (`OBJECT` = addr, link, neigh, etc) (`COMMAND` = add, show, delete, etc) Here are some ways to use it! ### ip addr list shows the IP addresses of your devices. Look for something like this: ``` 2: eth0: link/ether 3c:97... inet 192.168.1.170/24 ``` ### ip route list displays the route table. `default via 192.168.1.1` (my router) `169.240.0.0/16 dev docker` `...` to see all route tables: `ip route list table all` ### change your MAC address good for cafés with time limits (devil face emoji) ``` $ ip link set wlan0 down $ ip link set wlan0 address 3c:a9:f4:d1:00:32 $ ip link set wlan0 up $ service network-manager restart ``` (or whatever you use) ### `ip link` network devices! (like eth0) ### `ip neigh` view/edit the ARP table ### `ip xfrm` is for IPsec ### `ip route get IP` what route will packets to IP take? ### `--color` (the letters of "color" are in various rainbow colours) pretty colourful output! ### `--brief` show a summary
how URLs work
`https://examplecat.com:443/cats?color=light%20gray#banana` - scheme (`https://`): Protocol to use for the request. Encrypted (`https`), insecure (`http`), or something else entirely (`ftp`). - domain (`examplecat.com`): Where to send the request. For HTTP(S) requests, the Host header gets set to this (`Host: examplecat.com`) - port (`:443`): Defaults to 80 for HTTP and 443 for HTTPS. - path (`/cats`): Path to ask the server for. The path and the query parameters are combined in the request, like: `GET /cats?color=light%20gray HTTP/1.1` - query parameters (`color=light gray`): Query parameters are usually used to ask for a different version of a page ("I want a light gray cat!"). Example: `hair=short&color=black&name=mr%20darcy`. `hair` is the name and `short` is the value; parameters are separated by `&` - URL encoding (`%20`): URLs aren't allowed to have certain special characters like spaces, @, etc. So to put them in a URL you need to percent-encode them as % + the hex representation of their ASCII value. space is %20, % is %25, etc. - fragment id (`#banana`): This isn't sent to the server at all. It's used either to jump to an HTML tag (`<a id="banana"..>`) or by Javascript on the page.
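Python's standard `urllib.parse` module splits a URL into exactly these parts and handles the percent-encoding in both directions. This sketch uses the example URL from above.

```python
from urllib.parse import parse_qs, quote, urlsplit

url = "https://examplecat.com:443/cats?color=light%20gray#banana"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # examplecat.com
print(parts.port)      # 443
print(parts.path)      # /cats
print(parts.query)     # color=light%20gray
print(parts.fragment)  # banana (never sent to the server)

# the request line combines the path and the query string (but not the fragment)
print(f"GET {parts.path}?{parts.query} HTTP/1.1")

# decoding and encoding query parameters
print(parse_qs("hair=short&color=black&name=mr%20darcy"))
# {'hair': ['short'], 'color': ['black'], 'name': ['mr darcy']}
print(quote("light gray"), quote("%"))  # light%20gray %25
```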
namespaces
[containers]
### inside a container, things look different! Illustration of a smiling stick figure with curly hair. Person: I only see 4 processes in `ps aux`, that's weird... ### why things look different: namespaces Illustration of a container, represented by a box with a smiley face Container: I'm in a different PID namespace so `ps aux` shows different processes! ### every process has 7 namespaces (newer kernels also have a `time` namespace) ``` $ lsns -p 273 NS TYPE 4026531835 cgroup 4026531836 pid 4026531837 user 4026531838 uts 4026531839 ipc 4026531840 mnt 4026532009 net ``` -p is the PID. 4026532009 is the namespace ID. you can also see a process's namespaces with: `$ ls -l /proc/273/ns` ### there's a default ("host") namespace Person: "outside a container" just means "using the default namespace" ### processes can have any combination Container: I'm using the host network namespace but my own mount namespace!
CSS units
### CSS has 2 kinds of units: absolute & relative absolute: - px - pt - pc - in - cm - mm relative: - em - rem - vw - vh - % ### `rem` the root element's font size `1rem` is the same everywhere in the document. `rem` is a good unit for setting font sizes! ### `em` the parent element's font size (in `font-size`; in other properties, `em` means the element's own font size) ``` .child { font-size: 1.5em; } ``` Illustration of a box labelled "parent". Inside it is a box labelled, in larger text, "child". An arrow is pointing to the "child" text, labelled "font size is 1.5 x parent". ### 0 is the same in all units ``` .btn { margin: 0; } ``` also, `0` is different from `none`. `border: 0` sets the border width and `border: none` sets the style ### 1 inch = 96 px on a screen, 1 CSS "inch" isn't really an inch, and 1 CSS "pixel" isn't really a screen pixel. look up "device pixel ratio" for more. ### rem & em help with accessibility ``` .modal { width: 20rem; } ``` this scales nicely if the user increases their browser's default font size
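The unit arithmetic is simple enough to write down. This Python sketch converts lengths to CSS px; the 16px default font size is an assumption (it's a common browser default that users can change), and `to_px` ignores `vw`, `vh` and `%` because those depend on the viewport or containing block.

```python
PX_PER_INCH = 96  # CSS defines 1in = 96px

# absolute units, converted to CSS px
ABSOLUTE_UNITS = {
    "px": 1.0,
    "in": PX_PER_INCH,
    "cm": PX_PER_INCH / 2.54,
    "mm": PX_PER_INCH / 25.4,
    "pt": PX_PER_INCH / 72,  # 1pt = 1/72in
    "pc": PX_PER_INCH / 6,   # 1pc = 12pt = 1/6in
}


def to_px(value: float, unit: str, root_font_px: float = 16.0,
          parent_font_px: float = 16.0) -> float:
    """Convert a CSS length to px (16px default font size is an assumption)."""
    if unit in ABSOLUTE_UNITS:
        return value * ABSOLUTE_UNITS[unit]
    if unit == "rem":  # relative to the root element's font size
        return value * root_font_px
    if unit == "em":  # in font-size: relative to the parent's font size
        return value * parent_font_px
    raise ValueError(f"unit not handled in this sketch: {unit}")


print(to_px(20, "rem"))                     # 320.0: the .modal width with defaults
print(to_px(20, "rem", root_font_px=24))    # 480.0: user bumped their default font size
print(to_px(1.5, "em", parent_font_px=20))  # 30.0
```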
centering in CSS
### center text with `text-align` ``` h2 { text-align: center; } ``` ### center block elements with `margin: auto` example HTML: ``` <div class="parent"> <div class="child"> </div> </div> ``` ### `margin: auto` only centers horizontally ``` .child { width: 400px; margin: auto; } ``` Illustration of a smaller box, labelled "child", inside a larger box. The child box is at the top of the larger (parent) box. An arrow pointing to the child box is labelled "not centered vertically!" ### vertical centering is easy with flexbox or grid A spiky box labelled "TRY ME" here's how with grid: ``` .parent { display: grid; place-items: center; } ``` and with flexbox: ``` .parent { display: flex; } .child { margin: auto; } ``` ### it's ok to use a flexbox or grid just to center one thing Illustration of a smaller box nested inside a larger box. The larger box is labelled ".parent `(display: grid)`" and the smaller box is labelled ".child (centered!)"
less
### less is a pager that means it lets you view (not edit) text files. man uses your pager (usually `less`) to display man pages ### many vim shortcuts work in less - `/` search - `n/N` next/prev match - `j/k` down / up a line - `m/'` mark/return to line - `g` (`gg` in vim) / `G` beginning / end of file ### less -r displays terminal escape codes as colours try `ls --color | less -r` with `-r`: - `a.txt` - `a.txt.gz` (red, bold) without `-r`: - `a.txt` - `ESC[0m ESC[01;31ma.txt.gz ESC[0m` (ugh) or piped in text ### q quit (smiley face) ### v (lowercase) edit file in your $EDITOR ### arrow keys, Home / End, PgUp, PgDn work in less ### F press F to keep reading from the file as it's updated (like `tail -f`) press Ctrl+C to stop reading updates ### + `+` runs a command when less starts - `less +F`: follow updates - `less +G`: start at end of file - `less +20%`: start 20% into file - `less +/foo`: search for 'foo' right away
netcat
### `nc` lets you create TCP (or UDP) connections from the command line smiling stick figure, to box with smiley face: I hand wrote this HTTP request for you! ### `nc -l PORT` start a server! this listens on `PORT` and prints everything received (some netcat versions want `nc -l -p PORT`) network connection -> `nc` -> `stdout` ### `nc IP PORT` be a client! opens a TCP connection to `IP:PORT` (to send UDP use `-u`) ### `make HTTP request by hand` ``` printf 'GET / HTTP/1.1\r\nHost: example.com\r\n\r\n' | nc example.com 80 ``` type in any weird HTTP request you want! (smiley face) ### `send files` want to send a 100GB file to someone on the same wifi network? easy! ``` receiver: nc -l 8080 > file sender: cat file | nc YOUR_IP 8080 ``` happy stick figure with short curly hair: I love this trick It works even if you're disconnected from the internet!
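Both `nc` modes are thin wrappers around TCP sockets. Here's a Python sketch of the same pair, a one-shot listener (like `nc -l`) and a client that sends bytes (like `printf ... | nc HOST PORT`), run entirely on 127.0.0.1 so it needs no internet. The function names are made up, and the port is chosen by the OS (binding to port 0) rather than fixed at 8080.

```python
import socket
import threading


def listen_once(server: socket.socket, received: list) -> None:
    """Like `nc -l PORT`: accept one connection and collect everything sent."""
    conn, _addr = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the sender closed its side of the connection
                break
            chunks.append(data)
        received.append(b"".join(chunks))


def send(host: str, port: int, payload: bytes) -> None:
    """Like `printf ... | nc HOST PORT`: open a TCP connection and send bytes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

received: list = []
listener = threading.Thread(target=listen_once, args=(server, received))
listener.start()

request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
send("127.0.0.1", port, request)
listener.join()
server.close()
print(received[0].decode())
```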
HAVING
person: every user has a different email, right? 1 query later... person, now sad: oh no This query uses `HAVING` to find all emails that are shared by more than one user: ``` SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1 ``` users: (id 1, email asdf@fake.com), (id 2, email bob@builder.com), (id 3, email asdf@fake.com) query output: (email asdf@fake.com, `COUNT(*)` 2) `HAVING` is like `WHERE`, but with 1 difference: `HAVING` filters rows AFTER grouping and `WHERE` filters rows BEFORE grouping. Because of this, you can use aggregates (like `COUNT(*)`) in a `HAVING` clause but not in a `WHERE` clause. Here's another `HAVING` example that finds months with more than $6.00 in income: ``` SELECT month FROM sales GROUP BY month HAVING SUM(price) > 6 ``` sales: (month: Jan, item: catnip, price: 5), (month: Feb, item: laser, price: 8), (month: March, item: food, price: 4), (month: March, item: food, price: 3) query output: (month: Feb), (month: March)
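Both examples can be checked with Python's `sqlite3` and an in-memory database. One small change from the queries above: the second one adds `ORDER BY month` so the output order is predictable. The last query shows the difference between the two clauses: putting an aggregate in `WHERE` is an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    (1, "asdf@fake.com"), (2, "bob@builder.com"), (3, "asdf@fake.com"),
])
conn.execute("CREATE TABLE sales (month TEXT, item TEXT, price INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("Jan", "catnip", 5), ("Feb", "laser", 8),
    ("March", "food", 4), ("March", "food", 3),
])

shared = conn.execute(
    "SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1"
).fetchall()
print(shared)  # [('asdf@fake.com', 2)]

months = conn.execute(
    "SELECT month FROM sales GROUP BY month HAVING SUM(price) > 6 ORDER BY month"
).fetchall()
print(months)  # [('Feb',), ('March',)]

where_failed = False
try:
    conn.execute("SELECT email FROM users WHERE COUNT(*) > 1 GROUP BY email")
except sqlite3.Error as err:
    where_failed = True  # aggregates aren't allowed in WHERE
    print("error:", err)
```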
sed
### sed is most often used for replacing text in a file
`$ sed s/cat/dog/g file.txt`

"cat" can be a regular expression

### change a file in place with -i
person: in GNU sed it's `-i`, in BSD sed it's `-i SUFFIX`. confuses me every time.

### Some more sed incantations...

### sed -n 12p
print 12th line

`-n` suppresses output so only what you print with `p` gets printed

### sed 5d
delete 5th line

### sed /cat/d
delete lines matching /cat/

### sed -n 5,30p
print lines 5-30

### sed s+cat/+dog/+
Use + as a regex delimiter ('+' can be any character)

person: way easier than escaping /s like s/cat\//dog\//!

### sed -n s/cat/dog/p
only print changed lines

### sed G
double space a file (good for long error lines)

### sed '/cat/a dog'
append 'dog' after lines containing 'cat'

### sed '17i panda'
insert "panda" on line 17
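If it helps to see what these incantations do line by line, here's a rough Python sketch of a few of them with the `re` module. The function names are made up for illustration, and Python's regex syntax isn't identical to sed's basic regular expressions (for example, sed's `&` and `\1` in replacements), so treat this as an analogy rather than a sed implementation.

```python
import re

def sed_substitute(lines, pattern, replacement, global_=True):
    """Like `sed 's/PATTERN/REPLACEMENT/g'` (or without the g when global_=False)."""
    count = 0 if global_ else 1  # re.sub: count=0 means "replace every match"
    return [re.sub(pattern, replacement, line, count=count) for line in lines]

def sed_print_range(lines, start, end):
    """Like `sed -n 'START,ENDp'` (sed numbers lines starting from 1)."""
    return lines[start - 1:end]

def sed_delete_matching(lines, pattern):
    """Like `sed '/PATTERN/d'`."""
    return [line for line in lines if not re.search(pattern, line)]

def sed_print_changed(lines, pattern, replacement):
    """Like `sed -n 's/PATTERN/REPLACEMENT/p'`: only lines where a substitution happened."""
    changed = []
    for line in lines:
        new_line, n = re.subn(pattern, replacement, line, count=1)
        if n:
            changed.append(new_line)
    return changed

text = ["my cat", "a cat and a cat", "no pets", "cat/dog"]
print(sed_substitute(text, "cat", "dog"))                 # s/cat/dog/g
print(sed_substitute(text, "cat", "dog", global_=False))  # s/cat/dog/
print(sed_print_range(text, 2, 3))                        # -n 2,3p
print(sed_delete_matching(text, "cat"))                   # /cat/d
print(sed_print_changed(text, "cat", "dog"))              # -n s/cat/dog/p
```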
write for one person
ngrep
[tcpdump]
### like grep for your network
(network is surrounded with glowy lines)

`$ sudo ngrep GET` will find every plaintext HTTP GET request

### ngrep syntax
```
$ ngrep [options] [regular expression] [BPF filter]
```
"regular expression" is what to search packets for. "BPF filter" uses the same format as tcpdump!

### panel 3
Illustration of a smiling stick figure with curly hair.

person: I started using `ngrep` when I was intimidated by tcpdump and I found it easier (heart)

### -d is for device
which network interface to use. same as tcpdump's `-i` (try `-d any`!)

### -W byline
prints line breaks as line breaks, not "\n". Nice when looking at HTTP requests

### -I file.pcap / -O file.pcap
read/write packets from/to a pcap file
du & df
floating point
### a double is 64 bits
```
10011011 10011011 10011011 10011011 10011011 10011011 10011011 10011011
```
(the first bit is the sign, the next 11 bits are the exponent E, and the remaining 52 bits are the fraction)

±2^(E-1023) × 1.frac

That means there are 2^64 doubles. The biggest one is just under 2^1024 (about 1.8 × 10^308).

### weird double arithmetic
2^52 + 0.2 = 2^52

(the next number after 2^52 is 2^52 + 1)

### doubles get farther apart as they get bigger
between 2^n and 2^(n+1) there are always 2^52 doubles, evenly spaced.

that means that the next double after 2^60 is 2^60 + 256 (2^60 / 2^52 = 2^8 = 256)

### JavaScript *only* has doubles (no integers!)
`> 2**53` gives `9007199254740992`

`> 2**53+1` gives `9007199254740992`

(same number! uh oh!)

### panel 5
person with short spiky hair, baffled: doubles are scary and their arithmetic is weird!

person with short curly hair, calm and reassuring: they're very logical! just understand how they work and don't use integers over 2^53 in JavaScript <3
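You can poke at these bits yourself. This Python sketch (assuming Python 3.9+ for `math.ulp`) unpacks a double into its sign, exponent, and fraction with `struct`, rebuilds it with the formula above, and checks the spacing claims. Python's `float` is the same 64-bit double that JavaScript numbers use.

```python
import math
import struct

def decompose(x: float) -> tuple[int, int, int]:
    """Split a double into (sign bit, 11-bit exponent field E, 52-bit fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # reinterpret the 8 bytes as an integer
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction

sign, e, frac = decompose(6.0)
print(sign, e, frac)  # 0 1025 2251799813685248  (6 = 1.5 × 2^2)

# Rebuild it with ±2^(E-1023) × 1.frac (this formula is for normal, non-tiny numbers)
print((-1) ** sign * 2.0 ** (e - 1023) * (1 + frac / 2**52))  # 6.0

print(2.0**52 + 0.2 == 2.0**52)   # True: the 0.2 rounds away
print(math.ulp(2.0**52))          # 1.0: gap between 2^52 and the next double
print(math.ulp(2.0**60))          # 256.0 = 2^60 / 2^52
print(float(2**53) + 1 == 2**53)  # True, just like JavaScript's 2**53 + 1
```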
cgroups
[containers]
### processes can use a lot of memory
process 1: I want 10 GB of memory

process 2: me too!

Linux: guys, I only have 16 GB total

### a cgroup is a group of processes
every process in a container is in the same cgroup

### cgroups have memory/CPU limits
Linux: you three get 500 MB of RAM to share, okay?

### use too much memory: get OOM ("out of memory") killed
process: I want 1 GB of memory

Linux: NOPE your limit was 500 MB you die now!

process, dead: oh no

### use too much CPU: get slowed down
process: I want to use ALL THE CPU!

Linux: you hit your quota for this 100ms period, you'll have to wait

### cgroups track memory & CPU usage
Linux: that cgroup is using 112.3 MB of memory right now

you can see it in `/sys/fs/cgroup`
shared libraries
### panel 1:
Most programs on Linux use a bunch of C libraries. Some popular libraries:
- openssl (for SSL!)
- sqlite (embedded db!)
- zlib (gzip!)
- libpcre (regular expressions!)
- libstdc++ (C++ standard library!)

### panel 2:
There are 2 ways to use any library:
1. Link it into your binary: your code | zlib | sqlite (big binary with lots of things!)
2. Use separate shared libraries: your code, zlib, sqlite (all different files)

### panel 3:
Programs like this (your code | zlib | sqlite in one file) are called "statically linked".

Programs like this (your code, zlib, sqlite as separate files) are called "dynamically linked".

### panel 4:
person 1: how can I tell what shared libraries a program is using?

person 2: ldd!!
```
$ ldd /usr/bin/curl
libz.so.1 => /lib/x86_64...
libresolv.so.2 => ...
libc.so.6 => ...
```
+34 more

### panel 5:
person 1: I got a "library not found" error when running my binary?!

person 2: If you know where the library is, try setting the `LD_LIBRARY_PATH` environment variable

dynamic linker: `LD_LIBRARY_PATH` tells me where to look!

### panel 6: Where the dynamic linker looks
1. `DT_RPATH` in your executable
2. `LD_LIBRARY_PATH`
3. `DT_RUNPATH` in your executable
4. `/etc/ld.so.cache` (run `ldconfig -p` to see contents)
5. `/lib`, `/usr/lib`
take on hard projects
To wrap up, let's talk about one last wizard skill: confidence.

When there's a hard project, sometimes I think: maybe someone better than me should work on this? And I imagine this magical human:
- codes really fast
- knows everything about every technology
- understands the business well
- great communicator
- has time for the project
- 20 years of experience

But in programming:
- we're changing the tech we use all the time.
- every project is different, and it's rarely obvious how to do it.
- there aren't many experts, and they certainly don't have time to do everything.

So instead, we have me:
- learns fast
- works hard
- 6 years of experience
- good at debugging

I figure "someone's gotta do this", write down a plan, and get started! A lot of the time, it turns out well. I learn something and feel a little more like a WIZARD.
anatomy of a HTTP response
### HTTP responses have:
- a status code (200 OK! 404 not found!)
- headers
- a body (HTML, an image, JSON, etc)

### Here's the HTTP response from `examplecat.com/cat.txt`:
```
HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: public, max-age=0
Content-Length: 33
Content-Type: text/plain; charset=UTF-8
Date: Mon, 09 Sep 2019 01:57:35 GMT
Etag: "ac5affa59f554a1440043537ae973790-ssl"
Strict-Transport-Security: max-age=31536000
Age: 0
Server: Netlify

[ASCII image of a cat, labelled "cat!" with a smiley face]
```
The first line, `HTTP/1.1 200 OK`, is the status line, and `200` is the status code. The lines from `Accept-Ranges` to `Server` are the headers. The cat picture is the body.

### There are a few kinds of response headers:
when the resource was sent/modified:
```
Date: Mon, 09 Sep 2019 01:57:35 GMT
Last-Modified: 3 Feb 2017 13:00:00 GMT
```
about the response body:
```
Content-Language: en-US
Content-Length: 33
Content-Type: text/plain; charset=UTF-8
Content-Encoding: gzip
```
caching:
```
ETag: "ac5affa..."
Vary: Accept-Encoding
Age: 255
Cache-Control: public, max-age=0
```
security:
```
X-Frame-Options: DENY
X-XSS-Protection: 1
Strict-Transport-Security: max-age=31536000
Content-Security-Policy: default-src https:
```
and more:
```
Connection: keep-alive
Accept-Ranges: bytes
Via: nginx
Set-Cookie: cat=darcy; HttpOnly; expires=27-Feb-2020 13:18:57 GMT;
```
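To make the status line / headers / body split concrete, here's a minimal Python sketch that pulls apart a raw response. It assumes the whole response is already in memory, ignores chunked encoding and repeated headers (like several `Set-Cookie` lines), and the tiny `meow!` body is made up.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)  # "HTTP/1.1", "200", "OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), reason, headers, body

raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Length: 5\r\n"
    b"Server: Netlify\r\n"
    b"\r\n"
    b"meow!"
)
code, reason, headers, body = parse_response(raw)
print(code, reason)               # 200 OK
print(headers["content-length"])  # 5
print(body)                       # b'meow!'
```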
anatomy of a http request
HTTP requests always have:
- a domain (like `examplecat.com`)
- a resource (like `/cat.png`)
- a method (`GET`, `POST`, or something else)
- headers (extra information for the server)

There's an optional request body. `GET` requests usually don't have a body, and `POST` requests usually do.

This is an HTTP/1.1 request for `examplecat.com/cat.png`. It's a `GET` request, which is what happens when you type a URL in your browser. It doesn't have a body.
```
GET /cat.png HTTP/1.1
Host: examplecat.com
User-Agent: Mozilla...
Cookie: .....
```
- `GET` = method (usually `GET` or `POST`)
- `/cat.png` = resource being requested
- `HTTP/1.1` = HTTP version
- `Host: examplecat.com` = header with the domain being requested
- `User-Agent: Mozilla...` = header
- `Cookie: .....` = header

Here's an example `POST` request with a JSON body:
```
POST /add_cat HTTP/1.1
Host: examplecat.com
Content-Type: application/json
Content-Length: 20

{"name": "mr darcy"}
```
- `POST` = method
- `Host: examplecat.com` = header
- `Content-Type: application/json` = header: the content type of the body
- `Content-Length: 20` = header
- `{"name": "mr darcy"}` = request body: the JSON we're sending to the server
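Since an HTTP/1.1 request is just text, you can build one yourself. This Python sketch (the `build_post` helper is hypothetical, not a library function) assembles the `POST` request above and computes `Content-Length` from the encoded body, which is where the 20 comes from: `{"name": "mr darcy"}` is 20 bytes.

```python
import json

def build_post(host: str, path: str, payload: dict) -> bytes:
    """Build a raw HTTP/1.1 POST request with a JSON body (a sketch, not a full client)."""
    body = json.dumps(payload).encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"  # length in bytes, not characters
        "\r\n"  # a blank line separates the headers from the body
    )
    return head.encode("ascii") + body

request = build_post("examplecat.com", "/add_cat", {"name": "mr darcy"})
print(request.decode())
```

The resulting bytes are exactly what you could pipe into `nc` by hand.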
why containers?
### there's a lot of container hype
Illustration of two stick figures with medium-length straight hair. One has hearts in their eyes and a speech bubble that says "containers" with hearts around it, the other one has a thought bubble that says "???"

Here are 2 problems they solve...

### problem: building software is annoying
```
$ ./configure
$ make all
ERROR: you have version 2.1.1 and you need at least 2.2.4
```

### solution: package all dependencies in a *container*
stick figure with medium length straight hair and bangs, thinking: I ran the container and the build worked RIGHT AWAY?? is that allowed??

Many CI systems use containers.

### containers have their own filesystem
This is the big reason containers are great.

host OS, represented by a box with a smiley face: I'm running Ubuntu 19.04

container, also represented by a box with a smiley face: I'm running an old CentOS distribution from 2014!

### problem: deploying software is annoying too
sad stick figure with short curly hair: ugh my website is broken because I used a Python 3.6 feature and the server only has Python 3.5

### solution: deploy a container
server: I have the exact same version of everything as in development! no more silly errors!

happy stick person with short curly hair: yay! I can get back to writing code!
dig
### dig makes DNS queries!
```
$ dig google.com
```
answers have 5 parts:
- query: `google.com`
- TTL: `22`
- class: `IN` (for "internet", ignore this)
- record type: `A`
- record value: `172.217.13.110`

### dig TYPE domain.com
this lets you choose which DNS record to query for!

types to try:
- NS
- MX
- TXT
- CNAME
- A (default)

### dig @8.8.8.8 domain
(8.8.8.8 is Google's DNS server)

`dig @server` lets you pick which DNS server to query. Useful when your system DNS is misbehaving :)

### dig +trace domain
traces how the domain gets resolved, starting at the root nameservers. If you just updated DNS, `dig +trace` should show the new record.

### dig -x 172.217.13.174
makes a reverse DNS query: find the domain name for an IP! Same as `dig ptr 174.13.217.172.in-addr.arpa`

### dig +short domain
Usually dig prints lots of output! With `+short` it just prints the DNS record.
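The reverse-lookup name is the IP's four numbers in reverse order, followed by `in-addr.arpa`. Python's standard library can build it for you; this sketch shows both the `ipaddress` call and the by-hand version for IPv4 (the helper names are made up).

```python
import ipaddress

def reverse_name(ip: str) -> str:
    """The name that `dig -x IP` asks for a PTR record."""
    return ipaddress.ip_address(ip).reverse_pointer

def reverse_name_by_hand(ipv4: str) -> str:
    """Same thing for IPv4, spelled out: reverse the octets, append in-addr.arpa."""
    return ".".join(reversed(ipv4.split("."))) + ".in-addr.arpa"

print(reverse_name("172.217.13.174"))          # 174.13.217.172.in-addr.arpa
print(reverse_name_by_hand("172.217.13.174"))  # 174.13.217.172.in-addr.arpa
```

For IPv6 addresses, `reverse_pointer` produces a name under `ip6.arpa` instead.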
ps
my rules for simple JOINs
my rules for simple `JOIN`s

Joins in SQL let you take 2 tables and combine them into one.

On the left side of the page, there is an illustration of two small tables with the words "INNER JOIN" between them. One has columns labelled a, b, c, and d, the other has columns labelled x, y, and z. On the right side of the page, there is a big table with columns a, b, c, d, x, y, and z.

Joins can get really complicated, so we'll start with the simplest way to join. Here are the rules I use for 90% of my joins:

#### Rule 1: only use LEFT JOIN and INNER JOIN
There are other kinds of joins (`RIGHT JOIN`, `CROSS JOIN`, `FULL OUTER JOIN`), but most of the time I only use `LEFT JOIN` and `INNER JOIN`.

#### Rule 2: refer to columns as `table_name.column_name`
You can leave out the table name if there's just one column with that name, but it can get confusing.

#### Rule 3: only include 1 condition in your join
Here's the syntax for a `LEFT JOIN`:
```
table1 LEFT JOIN table2
ON <any boolean condition>
```
I usually stick to a very simple condition, like this:
```
table1 LEFT JOIN table2
ON table1.some_column = table2.other_column
```

#### Rule 4: one of the joined columns should have unique values
If neither of the columns is unique, you'll get strange results like this:

owners_bad:

| name  | age |
|-------|-----|
| maher | 16  |
| maher | 32  |
| rishi | 21  |

`INNER JOIN`

cats_bad:

| owner | name       |
|-------|------------|
| maher | daisy      |
| maher | dragonsnap |
| rishi | buttercup  |

(these are "bad" versions of the "owners" and "cats" tables that don't `JOIN` well)
```
owners_bad INNER JOIN cats_bad
ON owners_bad.name = cats_bad.owner
```

| name  | name       | age |
|-------|------------|-----|
| maher | daisy      | 16  |
| maher | dragonsnap | 16  |
| maher | daisy      | 32  |
| maher | dragonsnap | 32  |
| rishi | buttercup  | 21  |
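Here's Rule 4 in action with Python's built-in SQLite, using the "bad" tables above (SQLite is an assumption; the join behaves the same way in other databases). It also includes a small hypothetical `is_unique` helper that checks a column for duplicates before you join on it. The `ORDER BY` only makes the output order match the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owners_bad (name TEXT, age INTEGER);
INSERT INTO owners_bad VALUES ('maher', 16), ('maher', 32), ('rishi', 21);
CREATE TABLE cats_bad (owner TEXT, name TEXT);
INSERT INTO cats_bad VALUES ('maher', 'daisy'), ('maher', 'dragonsnap'), ('rishi', 'buttercup');
""")

rows = conn.execute("""
    SELECT owners_bad.name, cats_bad.name, owners_bad.age
    FROM owners_bad
    INNER JOIN cats_bad ON owners_bad.name = cats_bad.owner
    ORDER BY owners_bad.name, owners_bad.age, cats_bad.name
""").fetchall()
for row in rows:
    print(row)
print(len(rows), "rows for only 3 cats")  # each 'maher' owner row pairs with each 'maher' cat

def is_unique(table: str, column: str) -> bool:
    """True if `column` has no duplicate values. Names are pasted into the SQL,
    so only call this with table/column names you trust."""
    total, distinct = conn.execute(
        f"SELECT COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    return total == distinct

print(is_unique("owners_bad", "name"), is_unique("cats_bad", "owner"))  # False False
```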
http status codes
Every HTTP response has a status code.

browser, optimistically: `GET /cat.png` (request)

server, sadly: 404 not found (404 is the status code!)

There are 50ish status codes but these are the most common ones in real life:

2xxs mean ★success★
- 200 OK

3xxs aren't errors, just redirects to somewhere else
- 301 Moved Permanently
- 302 Found: temporary redirect
- 304 Not Modified: the client already has the latest version, "redirect" to that

4xx errors are generally the client's fault: it made some kind of invalid request
- 400 Bad Request
- 403 Forbidden: API key/OAuth/something needed
- 404 Not Found: we all know this one :)
- 429 Too Many Requests: you're being rate limited

5xx errors generally mean something's wrong with the server
- 500 Internal Server Error: the server code has an error
- 503 Service Unavailable: could mean nginx (or whatever proxy) couldn't connect to the server
- 504 Gateway Timeout: the server was too slow to respond
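Python ships a table of these codes in `http.HTTPStatus`, which makes a handy lookup. The `describe` helper below is hypothetical: a small sketch that groups a code by its first digit the way the list above does. `HTTPStatus(code)` raises `ValueError` for codes it doesn't know.

```python
from http import HTTPStatus

KINDS = {2: "success", 3: "redirect", 4: "client error", 5: "server error"}

def describe(code: int) -> str:
    """Return e.g. '404 Not Found (client error)' using the stdlib's status table."""
    kind = KINDS.get(code // 100, "other")  # the first digit decides the category
    return f"{code} {HTTPStatus(code).phrase} ({kind})"

for code in (200, 301, 304, 404, 429, 503):
    print(describe(code))
```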
sort & uniq
### `sort` sorts its inputs
`$ sort names.txt`

the default sort is alphabetical

### `sort -n`: numeric sort
`sort` order (sad face):
- 12
- 15000
- 48
- 6020
- 96

`sort -n` order (happy face):
- 12
- 48
- 96
- 6020
- 15000

### `sort -h`: human sort
`sort -n` order (sad face):
- 15G
- 30M
- 45K
- 200G

`sort -h` order (happy face):
- 45K
- 30M
- 15G
- 200G

useful example: `du -sh * | sort -h`

### `uniq` removes duplicates
before:
- a
- b
- b
- a
- c
- c

after:
- a
- b
- a
- c

(notice there are still 2 'a's! `uniq` only uniquifies adjacent matching lines)

### `sort` + `uniq` = (heart)
Pipe something to `sort | uniq` and you'll get a deduplicated list of lines! `sort -u` does the same thing.

before `sort -u` (or `sort | uniq`):
- b
- a
- b
- a

after:
- a
- b

### `uniq -c` counts each line it saw
Recipe: get the top 10 most common lines in a file:
```
$ sort foo.txt | uniq -c | sort -n | tail -n 10
```
happy little stick figure with curly hair: I use this a lot!
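Here's a rough Python analogy for these commands (the helper names are made up). It reproduces the orderings above and uses `itertools.groupby`, which, like `uniq`, only groups *adjacent* equal lines. `human_key` follows how GNU coreutils describes `sort -h` (compare the K/M/G/... suffix first, then the number) and assumes each size has at most one suffix letter.

```python
from itertools import groupby

SUFFIX_ORDER = "KMGTPEZY"

def human_key(size: str):
    """Sort key like GNU `sort -h`: the suffix decides first, then the number."""
    size = size.strip()
    suffix = size[-1].upper() if size[-1].isalpha() else ""
    number = float(size[:-1] if suffix else size)
    rank = SUFFIX_ORDER.index(suffix) + 1 if suffix else 0  # no suffix < K < M < G ...
    return (rank, number)

def uniq(lines):
    """Like `uniq`: collapse runs of adjacent identical lines."""
    return [line for line, _run in groupby(lines)]

def uniq_c(lines):
    """Like `uniq -c`: (count, line) for each run of adjacent identical lines."""
    return [(len(list(run)), line) for line, run in groupby(lines)]

numbers = ["12", "15000", "48", "6020", "96"]
print(sorted(numbers))                                       # like `sort`
print(sorted(numbers, key=int))                              # like `sort -n`
print(sorted(["15G", "30M", "45K", "200G"], key=human_key))  # like `sort -h`
print(uniq(["a", "b", "b", "a", "c", "c"]))                  # ['a', 'b', 'a', 'c']
print(uniq(sorted(["b", "a", "b", "a"])))                    # like `sort | uniq`

# like `sort | uniq -c | sort -n | tail -n 10`
words = ["cat", "dog", "cat", "fish", "cat", "dog"]
print(sorted(uniq_c(sorted(words)))[-10:])  # [(1, 'fish'), (2, 'dog'), (3, 'cat')]
```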
how indexes make your queries fast
By default, if you run `SELECT * FROM cats WHERE name = 'mr darcy'` the database needs to look at every single row to find matches.

database, sad: reading 30 GB of data from disk takes like 60 seconds by itself, you know! (at 500 MB/s SSD speed)

Indexes are a tree structure that makes it faster to find rows. Here's what an index on the 'name' column might look like:

(diagram: a root node covering "a-z" with 60 children covering ranges like "aaron to ahmed", "molly to nasir", and "waseem to zahra"; "aaron to ahmed" in turn splits into smaller ranges like "aaron to abdullah" and "agnes to ahmed")

Database indexes are b-trees, and the nodes have lots of children (like 60) instead of just 2.

log<sub>60</sub>(1,000,000,000) ≈ 5.06

This means that if you have 1 billion names to look through, you'll only need to look at maybe 5 nodes in the index to find the name you're looking for (5 is a lot less than 1 billion!!!).

person 1: are you saying indexes can make my queries 1,000,000x faster?

person 2: yes! actually some queries on large tables are basically impossible (or would take weeks) without using an index!
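The "maybe 5 nodes" number is just a logarithm. This small Python sketch assumes an idealized, completely full tree where every node has exactly `fanout` children (real b-trees have partly empty nodes, so treat it as a ballpark) and counts how many levels you'd need to cover 1 billion rows. The `btree_levels` name is made up.

```python
import math

def btree_levels(n_rows: int, fanout: int) -> int:
    """Smallest number of levels L such that fanout**L >= n_rows."""
    levels, capacity = 0, 1
    while capacity < n_rows:
        capacity *= fanout  # each extra level multiplies how many rows we can reach
        levels += 1
    return levels

print(round(math.log(1_000_000_000, 60), 2))  # 5.06
print(btree_levels(1_000_000_000, 60))         # 6 (60**5 is only 777,600,000)
print(btree_levels(1_000_000_000, 2))          # 30 levels for a binary tree
```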
ss
### panel 1
two stick figures talking. the first one is bald and looks unhappy. the second one has short curly hair and is smiling.

person 1: I can't start my server because it says something is using port 8080!

person 2:
1. Use ss ("socket statistics") to find the process ID using the port
2. Kill the other process!

### *tuna, please!*
`$ ss -tunapl` (the 'a' here doesn't do anything)

This is my favourite way to use ss! It shows all the running servers.

### -n
use numeric ports (80, not http)

### -p
show PIDs using the socket

### TONS of information
-i -m -o (-i is in a spiky bubble, -m is in a cloud bubble, and -o is in a heart)

### which sockets ss shows
listening or connections (non-listening/established)?
- default: connections
- -l: listening
- -a: both

which protocols?
- default: all
- -t: TCP
- -u: UDP
- -x: unix domain sockets

### netstat
`netstat -tunapl` and `ss -tunapl` do the same thing. netstat is older and more complicated. If you're learning now I'd recommend ss!
nmap
### nmap lets you explore a network
which ports are open? what hosts are up? security people use it a lot!

### find which hosts are up
`$ nmap -sn 192.168.1.0/24`

`192.168.1.0/24` is my home network

`-sn` means "ping scan" (not `-s -n`, it's `-sn`): it just finds hosts by pinging every one, and doesn't port scan

### aggressive scan
`$ nmap -v -A scanme.nmap.org`

`-A` = aggressive: scans ports, detects server versions, even the OS

### -Pn
skip doing a ping scan and assume every host is up. good if hosts block ping (lots do)

### fast port scan
`$ nmap -sS -F 192.168.1.0/24`

just sends a SYN packet to check if each port is open. I found out which ports my printer has open!
```
80 http
443 https
515 printer
631 ipp
9100 jetdirect
```

### -F
scan fewer ports: just the most common ones

### -T4 or -T5
scan faster by timing out more quickly

### ♡ check TLS version and ciphers ♡
check if your server still supports old TLS versions
```
$ nmap --script ssl-enum-ciphers -p 443 wizardzines.com
```
list all scripts with: `$ nmap --script-help '*'`
lsof
### `lsof` stands for list open files
stick figure, distraught: somebody has that file open, WHO IS IT?

`lsof`, represented by a rectangle with a goofy face: I can tell you!

### what `lsof` tells you for each open file:
- `pid`
- file type (regular? directory? FIFO? socket?)
- file descriptor (FD column)
- user
- filename/socket address

### `-p PID`
list the files `PID` has open

### `lsof /some/dir`
list just the open files in `/some/dir`

### `-i`
list open network sockets (sockets are files!)

examples:
- `-i -n -P` (`-n` & `-P` mean "don't resolve hostnames/ports"; also `-Pni`)
- `-i :8080`
- `-i TCP`
- `-i -s TCP:LISTEN`

### find deleted files
`$ lsof | grep deleted` will show you deleted files! You can recover open deleted files from `/proc/<pid>/fd/<fd>` (`<pid>` is the process that opened the file)

### `netstat`
another way to list open sockets on Linux is `netstat -tunapl` (tuna, please!). On Mac, `netstat` has different args.
man page sections
man pages are split up into 8 sections: 1 2 3 4 5 6 7 8

`$ man 2 read` means "get me the man page for `read` from section 2".

There's both
- a program called "read"
- and a system call called "read"

So `$ man 1 read` gives you a different man page from `$ man 2 read`.

If you don't specify a section, man will look through all the sections & show the first one it finds.

### man page sections
1. programs: `$ man grep`, `$ man ls`
2. system calls: `$ man sendfile`, `$ man ptrace`
3. C functions: `$ man 3 printf`, `$ man fopen`
4. devices: `$ man null` for /dev/null docs
5. file formats: `$ man sudoers` for `/etc/sudoers`, `$ man proc` for files in `/proc`!
6. games: not super useful. `$ man sl` is my favourite from that section
7. miscellaneous: explains concepts! `$ man 7 pipe`, `$ man 7 symlink`
8. sysadmin programs: `$ man apt`, `$ man chroot`
CSS selectors
### panel 1
Illustration of a smiling stick figure with curly hair.

person: now that we have the right attitude, let's move on to how CSS actually works!

### div
matches `div` elements: `<div>`

### #welcome
`#` matches elements by `id`: `<div id="welcome">`

### .button
matches elements by `class`: `<a class="button">`

### div .button
matches every `.button` element that's a descendant of a `div`

### div.button
matches divs with class "`button`": `<div class="button">`

### div > .button
matches every `.button` element that's a direct child of a `div`

### .button, #welcome
matches both `.button` and `#welcome` elements

### a[href^="http"]
matches `a` elements with an `href` attribute starting with `http`

### a:hover
matches `a` elements that the cursor is hovering over

### :checked
matches if a checkbox or radio button is checked

### tr:nth-child(odd)
matches every other `tr` (the 1st, 3rd, 5th... child of its parent)
cookies
Cookies are a way for a server to store a little bit of information in your browser. They're set with the `Set-Cookie` response header, like this:

### first request: server sets a cookie
browser, represented by a box with a smiley face: `GET /my-cats`

server, also represented by a box with a smiley face:
```
200 OK
Set-Cookie: user=b0rk; HttpOnly

<response body>
```
(`user` is the name, `b0rk` is the value. `HttpOnly` is a cookie option; other options, like the expiry, go here too)

### every request after: browser sends the cookie back
browser:
```
GET /my-cats
Cookie: user=b0rk
```
server, thinking: oh, this is b0rk! I don't need to ask them who they are then!

Cookies are used by many websites to keep you logged in. Instead of `user=b0rk` they'll set a cookie like `sessionid=long-incomprehensible-id`. This is important because if they just set a simple cookie like `user=b0rk`, anyone could pretend to be b0rk by setting that cookie!

Designing a secure login system with cookies is quite difficult. To learn more about it, google "OWASP Session Management Cheat Sheet".
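Python's standard library can parse and produce these headers with `http.cookies.SimpleCookie`. This sketch parses the comic's `Set-Cookie` value, shows what the browser would send back, and then makes a session cookie with a random ID from `secrets`. The name `sessionid` is just the example name from the text, and a real login system needs much more than this.

```python
import secrets
from http.cookies import SimpleCookie

# Parse the value of a Set-Cookie header like the one in the comic.
jar = SimpleCookie()
jar.load("user=b0rk; HttpOnly")
morsel = jar["user"]
print(morsel.value)        # b0rk
print(morsel["httponly"])  # True

# What the browser sends back: just name=value, without the options.
cookie_header = f"Cookie: {morsel.key}={morsel.value}"
print(cookie_header)       # Cookie: user=b0rk

# A session cookie instead: an unguessable random ID rather than the username.
session = SimpleCookie()
session["sessionid"] = secrets.token_urlsafe(32)
session["sessionid"]["httponly"] = True
print(session.output())    # Set-Cookie: sessionid=<random>; HttpOnly
```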
CSS testing checklist
Finally, it's important to test your site with different browsers, screen sizes, and accessibility evaluation tools.

### browsers
- Chrome
- Safari
- Firefox
- maybe others!

### sizes
- small phone (300px wide)
- tablet (~700px)
- desktop (~1200px)

### accessibility
- colour contrast
- text size
- keyboard navigation
- works with a screen reader

### performance
- fake a slow/high latency network connection!

Illustration of a smiling stick figure with curly hair.

person: the most important thing is to know your users! Check your analytics: if 10% of your users are using IE, test your site on IE!
copy on write
### On Linux, you start new processes using the fork() or clone() system call
calling fork creates a child process that's a copy of the caller

### the cloned process has EXACTLY the same memory
- same heap
- same stack
- same memory maps

if the parent has 3GB of memory, the child will too.

### copying all that memory every time we fork would be slow and a waste of RAM
often processes call `exec` right after `fork`, which means they don't use the parent process's memory basically at all!

### so Linux lets them share physical RAM and only copies the memory when one of them tries to write
process: I'd like to change that memory

Linux: okay! I'll make you your own copy!

### Linux does this by giving both processes identical page tables
(same RAM) but it marks every page as read only.

### when a process tries to write to a shared memory address:
1. there's a page fault
2. Linux makes a copy of the page & updates the page table
3. the process continues, blissfully ignorant

process, happily: It's just like I have my own copy
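You can't see copy-on-write directly from a program (that's the point: it's invisible), but you can see its effect. This Python sketch (Unix only, since it uses `os.fork`) forks, has the child overwrite a buffer, and shows that the parent's buffer is unchanged. Python itself also writes to memory all the time (reference counts), so in practice many pages get copied, not just the one holding the buffer.

```python
import os

data = bytearray(b"original")  # allocated in the parent before fork()

read_fd, write_fd = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: it starts out sharing the parent's pages. Writing to the buffer
    # makes the kernel give the child its own private copy of that page.
    os.close(read_fd)
    data[:] = b"changed!"
    os.write(write_fd, bytes(data))
    os._exit(0)  # leave right away so the child never runs the parent's code below

os.close(write_fd)
child_saw = os.read(read_fd, 100)
os.close(read_fd)
os.waitpid(pid, 0)
print(child_saw)    # b'changed!'
print(bytes(data))  # b'original': the parent's copy was never touched
```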
the OOM killer
[linux2]
CPU scheduling
[linux2]
HTTP/2
HTTP/2 is a new version of HTTP. Here's what you need to know:

### A lot isn't changing
All the methods, status codes, request/response bodies, and headers mean exactly the same thing in HTTP/2.

before (HTTP/1.1):
```
method: GET
path: /cat.gif
headers:
- Host: examplecat.com
- User-Agent: curl
```
after (HTTP/2):
```
method: GET
path: /cat.gif
authority: examplecat.com
headers:
- User-Agent: curl
```
one change: Host header => authority

### HTTP/2 is faster
Even though the data sent is the same, the way HTTP/2 sends it is different. The main differences are:
- It's a binary format (it's harder to `tcpdump` traffic and debug)
- Headers are compressed
- Multiple requests can be sent on the same connection at a time

before (HTTP/1.1):
1. → request 1
2. ← response 1
3. → request 2
4. ← response 2

after (HTTP/2), all on one TCP connection:
1. → request 1
2. → request 2
3. ← response 2
4. ← response 1 (out of order is ok)

All these changes together mean that HTTP/2 requests often take less time than the same HTTP/1.1 requests.

### Sometimes you can switch to it easily
A lot of software (CDNs, nginx) lets clients connect with HTTP/2 even if your server still only supports HTTP/1.1.
1. Firefox to CDN: HTTP/2 request
2. CDN to your server: HTTP/1.1 request
3. your server to CDN: HTTP/1.1 response
4. CDN to Firefox: HTTP/2 response
tar
### panel 1
The tar file format combines many files into one file.

a.txt, b.txt, dir/c.txt → one tar file

tar files aren't compressed by themselves. Usually you gzip them: .tar.gz or .tgz!

### panel 2:
Usually when you use the `tar` command, you'll run some incantation. To unpack a tar.gz, use:
```
tar -xzf file.tar.gz
```
person 1: what's xzf?

person 2: let's learn!

### panel 3: -x is for extract
into the current directory by default (change with -C)

### panel 4: -c is for create
makes a new tar file!

### panel 5: -t is for list
lists the contents of a tar archive

### panel 6: -f is for file
which tar file to create or unpack

### panel 7: tar can compress / decompress
- -z: gzip format (.gz)
- -j: bzip2 format (.bz2)
- -J: xz format (.xz)
- & more! see the man page

### panel 8: putting it together
list contents of a .tar.bz2:
```
$ tar tvf file.tar.bz2
```
v = verbose

create a .tar.gz:
```
$ tar -czf file.tar.gz dir/
```
dir/ = files to go in the archive
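Here's the same create / list / extract cycle with Python's `tarfile` module, as a sketch on throwaway files in a temporary directory. The `"w:gz"` and `"r:gz"` modes play the role of `-cz` and `-xz`; the `filter="data"` argument only exists in newer Python releases, so the code checks for it first.

```python
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "dir").mkdir()
    for name in ["a.txt", "b.txt", "dir/c.txt"]:
        (root / name).write_text(f"contents of {name}\n")

    archive = root / "file.tar.gz"
    # like `tar -czf file.tar.gz a.txt b.txt dir/`
    with tarfile.open(archive, "w:gz") as tar:
        for name in ["a.txt", "b.txt", "dir"]:
            tar.add(root / name, arcname=name)  # directories are added recursively

    # like `tar -tzf file.tar.gz`
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()

    # like `tar -xzf file.tar.gz -C out`
    out = root / "out"
    out.mkdir()
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out, filter="data")  # newer Pythons: refuse unsafe paths
        else:
            tar.extractall(out)
    c_contents = (out / "dir" / "c.txt").read_text()

print(names)       # ['a.txt', 'b.txt', 'dir', 'dir/c.txt']
print(c_contents)  # contents of dir/c.txt
```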
unix domain sockets
### unix domain sockets are files
```
$ file mysock.sock
mysock.sock: socket
```
the file's permissions determine who can send data to the socket.

### they let 2 programs on the same computer communicate
Docker uses unix domain sockets, for example!

process: GET /container (HTTP request)

Docker: Here you go!

### There are 2 kinds of unix domain sockets
- `stream`: Like TCP! Lets you send a continuous stream of bytes
- `datagram`: Like UDP! Lets you send discrete chunks of data

### advantage 1
Lets you use file permissions to restrict access to HTTP/database services!

`chmod 600 secret.sock`

This is why Docker uses a unix domain socket. (lock icon)

evil process: run evil container

Linux, nonplussed: permission denied

### advantage 2
UDP sockets aren't always reliable (even on the same computer). unix domain datagram sockets ARE reliable! And they won't reorder packets!

process: I can send data and I KNOW it'll arrive

### advantage 3
You can send a file descriptor over a unix domain socket. Useful when handling untrusted input files.

process: here's a file I downloaded from sketchy.com (putting it into video decoder, a sandboxed process)
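Both the datagram behaviour and the file-descriptor passing can be tried from Python. This sketch uses `socket.socketpair` (so there's no socket file on disk) plus `socket.send_fds` / `socket.recv_fds`, which need Python 3.9+ on a Unix system. Both ends live in one process to keep it short; in real use the receiver would be a different process, like the sandboxed decoder. The file contents are made up.

```python
import os
import socket
import tempfile

# 1. Datagram unix domain sockets keep message boundaries, like UDP.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
a.send(b"first")
a.send(b"second")
first = b.recv(1024)   # one recv() returns exactly one datagram
second = b.recv(1024)
a.close()
b.close()

# 2. Passing an open file descriptor over a unix domain socket.
sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
with tempfile.TemporaryFile() as f:
    f.write(b"a file downloaded from sketchy.com")
    f.flush()
    socket.send_fds(sender, [b"x"], [f.fileno()])  # at least 1 byte of normal data goes along
    _msg, fds, _flags, _addr = socket.recv_fds(receiver, 1024, 1)
    with os.fdopen(fds[0], "rb") as passed:  # a new fd referring to the same open file
        passed.seek(0)
        payload = passed.read()
sender.close()
receiver.close()

print(first, second)  # b'first' b'second'
print(payload)        # b'a file downloaded from sketchy.com'
```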
iptables
### panel 1:
iptables lets you create rules to match network packets and accept/drop/modify them. It's used for firewalls and NAT

### tables have chains. chains have rules.
tables:
- `filter`
- `nat`
- `mangle`
- `raw`
- `security`

chains:
- `INPUT`
- `FORWARD`
- `PREROUTING`
- etc

rules: like `-s 10.0.0.0/8 -j DROP`

### `iptables-save`
This prints out all iptables rules. You can restore them with `iptables-restore`, but it's also the easiest way to view all rules.

### `-j TARGET`
Every iptables rule has a target (what to do with matching packets). Options:
- `ACCEPT`, `DROP`, `RETURN`
- the name of an iptables chain
- an extension (`man iptables-extensions`). Popular: `DNAT`, `LOG`, `MASQUERADE`

### tables have different chains
- filter: `INPUT`, `OUTPUT`, `FORWARD`
- mangle: `INPUT`, `OUTPUT`, `FORWARD`, `PREROUTING`, `POSTROUTING`
- nat: `OUTPUT`, `PREROUTING`, `POSTROUTING`

It helps to know when packets get processed by a given table/chain (e.g. locally generated packets go through the `filter` table's `OUTPUT` chain)

### you can match lots of packet attributes
- `-s`: src ip
- `-d`: dst ip
- `-p`: protocol (tcp/udp)
- `-i`: network interface
- `-m`: lots of things! (bpf rules! cgroups! ICMP type! conntrack state! more!)

For more, run `$ man iptables-extensions`
tshark
[tcpdump]
### Wireshark is an amazing graphical packet analysis tool
("Wireshark" has hearts around it)

tshark is the command line version of Wireshark. It can do 100x more things than tcpdump (heart)

### `-Y`
filter which packets are shown
```
$ tshark -Y 'http.request.method == "GET"'
```
(uses Wireshark's SUPER POWERFUL display filter language)

### `-d` is for "decode as"
tells tshark what protocol to interpret a port as. Example: 8888 is often HTTP!
```
$ tshark -d tcp.port==8888,http
```

### `-T FORMAT`
Output format. My favourites:
- json
- fields: csv/tsv (for these two you can specify which fields you want with `-e`)
- text: default summary

### `-e`
Which fields to output. Ex:
```
$ tshark -T fields -e http.request.method -e http.request.uri -e ip.dst
```
(supports WAY more protocols than HTTP)
```
GET /foo 92.183.216.34
POST /bar 10.23.38.132
```

### `-r file.pcap`
analyze packets from a file instead of the network

### `-w` (same as tcpdump)
Write captured packets to a file. If `-w file.pcap` has permission issues, try `tshark -w - > file.pcap`
EXPLAIN
Sometimes queries run slowly, and `EXPLAIN` can tell you why!

2 ways you can use `EXPLAIN` in PostgreSQL (other databases have different syntax for this):

1. Before running the query (`EXPLAIN SELECT ... FROM ...`): this calculates a query plan but doesn't run the query. I always run `EXPLAIN` on a query before running it on my production database. I won't risk overloading the database with a slow query!
2. After running the query (`EXPLAIN ANALYZE SELECT ... FROM ...`):
   person 1: why is my query so slow?
   person 2: `EXPLAIN ANALYZE` runs the query and analyzes why it was slow

Here are the `EXPLAIN ANALYZE` results from PostgreSQL for the same query run on two tables of 1,000,000 rows: one table that has an index and one that doesn't.

`EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1`

unindexed table:
```
Seq Scan on users
  Filter: (id = 1)
  Rows Removed by Filter: 999999
Planning time: 0.185 ms
Execution time: 179.412 ms
```
"Seq Scan" means it's looking at each row (slow!)

indexed table:
```
Index Only Scan using users_id_idx on users
  Index Cond: (id = 1)
  Heap Fetches: 1
Planning time: 3.411 ms
Execution time: 0.088 ms
```
the query runs about 50 times faster with an index (about 3.5 ms vs 180 ms, counting planning time)
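PostgreSQL isn't in Python's standard library, so this sketch swaps in SQLite, whose closest equivalent is `EXPLAIN QUERY PLAN`: it shows the plan without running or timing the query, so it's closer to plain `EXPLAIN` than to `EXPLAIN ANALYZE`. The table and index names are made up to mirror the example above, and the exact wording of SQLite's plan varies a little between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(10_000)])

def plan(sql: str) -> str:
    """Return SQLite's query plan; the last column of each row is a readable description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)

query = "SELECT * FROM users WHERE id = 1"
before = plan(query)  # no index yet: SQLite has to scan the whole table
conn.execute("CREATE INDEX users_id_idx ON users (id)")
after = plan(query)   # now it can jump straight to the matching rows
print(before)  # e.g. SCAN users
print(after)   # e.g. SEARCH users USING INDEX users_id_idx (id=?)
```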
cat
### cat concatenates files
`$ cat myfile.txt` prints contents of myfile.txt

`$ cat *.txt` prints all .txt files put together!

### you can use cat as an EXTREMELY BASIC text editor:
1. Run `$ cat > file.txt`
2. type the contents (don't make mistakes (smiley face))
3. press ctrl+d to finish

### cat -n
prints out the file with line numbers!
1. Once upon a midnight...
2. Over many a quaint...
3. While I nodded, nearly...

### zcat
cats a gzipped file! Actually just a 1-line shell script that runs `gzip -cd`, but easier to remember.

### tee
`tee file.txt` will write its stdin to both stdout and file.txt

`stdin` → `tee a.txt` → `stdout` and `a.txt`

### how to redirect to a file owned by root
`$ sudo echo "hi" >> x.txt`

this will open x.txt as your user, not as root, so it fails!

`$ echo "hi" | sudo tee -a x.txt`

will open x.txt as root (smiley face)
find
### find searches a directory for files
`find /tmp -type d -print`
- `/tmp`: directory to search
- `-type d`: which files
- `-print`: action to do with the files

Here are my favourite find arguments!

### -name / -iname
search the filename! (`-iname` is case insensitive) e.g. `-name '*.txt'`

### -path / -ipath
search the full path! `-path '/home/*/*.go'`

### -type [TYPE]
- f: regular file
- d: directory
- l: symlink
- and more!

### -maxdepth NUM
only descend NUM levels when searching a directory

### -size 0
find empty files! Useful to find files you created by accident

### -exec COMMAND
action: run COMMAND on every file found

### -print0
print null-separated filenames. Use with `xargs -0`!

### -delete
action: delete all files found

### locate
The locate command searches a database of every file on your system.
- good: faster than find
- bad: can get out of date

### $ sudo updatedb
updates the database
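Here's a rough Python sketch of a few of these options using `os.walk` and `fnmatch`. The `find_files` helper is hypothetical, only looks at regular files, and follows `find`'s depth rule where the starting directory's direct contents are depth 1. The directory layout it builds is made up for the demo.

```python
import fnmatch
import os
import tempfile
from pathlib import Path

def find_files(top, name=None, maxdepth=None, empty=False):
    """Rough sketch of `find TOP [-maxdepth N] -type f [-name PATTERN] [-size 0]`."""
    top = Path(top)
    results = []
    for dirpath, dirnames, filenames in os.walk(top):
        depth = len(Path(dirpath).relative_to(top).parts)  # 0 for TOP itself
        if maxdepth is not None:
            if depth + 1 >= maxdepth:
                dirnames.clear()  # like -maxdepth: don't descend any further
            if depth + 1 > maxdepth:
                continue  # files here would be deeper than maxdepth
        for filename in filenames:
            if name is not None and not fnmatch.fnmatchcase(filename, name):
                continue  # like -name: a shell-style pattern on the file's name only
            path = Path(dirpath) / filename
            if empty and path.stat().st_size != 0:
                continue  # like -size 0
            results.append(path.relative_to(top).as_posix())
    return sorted(results)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub" / "deeper").mkdir(parents=True)
    (root / "a.txt").write_text("hello")
    (root / "notes.md").write_text("hi")
    (root / "sub" / "b.txt").write_text("x")
    (root / "sub" / "empty.txt").write_text("")
    (root / "sub" / "deeper" / "c.txt").write_text("y")

    all_txt = find_files(root, name="*.txt")
    shallow = find_files(root, name="*.txt", maxdepth=1)
    two_levels = find_files(root, name="*.txt", maxdepth=2)
    empties = find_files(root, empty=True)

print(all_txt)     # ['a.txt', 'sub/b.txt', 'sub/deeper/c.txt', 'sub/empty.txt']
print(shallow)     # ['a.txt']
print(two_levels)  # ['a.txt', 'sub/b.txt', 'sub/empty.txt']
print(empties)     # ['sub/empty.txt']
```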
debugging is hard. take breaks.
[debugging]
segmentation faults
[memory linux2]
CSS isn't easy
### CSS seems simple at first
```
h2 {
  font-size: 22px;
}
```
Illustration of a smiling stick figure with curly hair.

person: ok this is easy!

### and it is easy for simple tasks
image of a page with header and text underneath

a layout like this is simple to implement!

### but website layout is not an easy problem
image of a page with a logo, header, text, sidebar, and multiple images

this needs to adjust to so many screen sizes!

### the spec can be surprising
TRY ME! CSS 2.1: setting `overflow: hidden;` on an inline-block element changes its vertical alignment

Illustration of a stick figure with curly hair, looking worried.

person: weird!

### and all browsers have bugs
Safari: I don't support flexbox for `<summary>` elements

person: ok fine

### accept that writing CSS is gonna take time
person: if I'm patient I can fix all the edge cases in my CSS and make my site look great everywhere!
page faults
### every Linux process has a page table page table: | virtual memory address | physical memory address | |--------------------------|--------------------------| | 0x19723000 | 0x1422000 | | 0x19724000 | 0x1423000 | | 0x1524000 | not in memory | | 0x1844000 | 0x4a000 read only | ### some pages are marked as either: - read only - not resident in memory when you try to access a page that's marked "not resident in memory" it triggers a ! page fault ! ### what happens during a page fault? - the MMU sends an interrupt - your program stops running - Linux kernel code to handle the page fault runs Linux, represented by a box with a smiley face: I'll fix the problem and let your program keep running ### "not resident in memory" usually means the data is on disk! virtual memory: Illustration of a bar that is about 60% filled in purple, labelled "in RAM". The remaining 40% is filled in orange and labelled "on disk" Having some virtual memory that is actually on disk is how `swap` and `mmap` work ### how swap works 1. run out of RAM Illustration where RAM bar is completely full, disk bar still has lots of room 2. Linux saves some RAM data to disk Some of the RAM bar has now been moved over to the disk bar 3. mark those pages as "not resident in memory" in the page table There are arrows between the RAM and disk bars, and the empty portion of the RAM bar is labelled "not resident" 4. When a program tries to access the memory, there's a ! page fault ! 5. Linux: time to move some data back to RAM! Illustration of virtual memory and RAM, with arrows running back and forth between them 6. if this happens a lot, your program gets VERY SLOW program, sadly: I'm always waiting for data to be moved in & out of RAM
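Here's a toy Python model of the page table above, just to show the idea of "look up the page, and fault if it's not resident". It assumes 4 KiB pages and ignores the read-only flag; `translate` and `PageFault` are hypothetical names. A real kernel uses multi-level page tables that the MMU walks in hardware.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the common x86-64 default


class PageFault(Exception):
    """Raised when a virtual page is not resident in memory."""


# virtual page address -> physical page address (None = not resident)
page_table = {
    0x19723000: 0x1422000,
    0x19724000: 0x1423000,
    0x1524000: None,  # not in memory (for example, swapped out to disk)
    0x1844000: 0x4A000,  # read only in the comic; this toy ignores permissions
}


def translate(vaddr):
    """Translate a virtual address to a physical one, like the MMU does."""
    page = vaddr & ~(PAGE_SIZE - 1)   # start of the virtual page
    offset = vaddr & (PAGE_SIZE - 1)  # position inside that page
    if page_table.get(page) is None:
        # here the kernel's page fault handler would take over
        raise PageFault(hex(vaddr))
    return page_table[page] + offset
```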
network protocols
[linux2]
bash parameter expansion
### panel 1: `${...}` is really powerful person: "it can do a lot of string operations, my favourite is search/replace" ### panel 2: `${var}` same as `$var` ### panel 3: `${#var}` length of the string or array `var` example: ``` $ x=panda $ echo ${#x} 5 ``` ### panel 4: `${var/bear/panda}` search & replace. Example: ``` $ x="I'm a bearbear!" $ echo ${x/bear/panda} # replace 1 instance of 'bear' I'm a pandabear! $ echo ${x//bear/panda} # replace every instance of 'bear' I'm a pandapanda! ``` ### panel 5: `${var:-othervar}` use a default value if `var` is unset/null Example: ``` echo ${asdf:-some default value} ``` ### panel 6: `${var:?some error}` prints "some error" and exits if `var` is null or unset ### panel 7: `${var#pattern}` and `${var%pattern}` remove the prefix/suffix `pattern` from `var`. Example: ``` $ x=motorcycle.svg $ echo "${x%.svg}" motorcycle ``` ### panel 8: `${var:offset:length}` get a substring of `var`. Example: ``` $ x='panda bear time' $ echo ${x:6:4} bear ``` ### panel 9 person: "there are LOTS more, look up 'bash parameter expansion'!"
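If you know Python better than bash, these rough Python analogues might help the expansions stick. One big difference: bash patterns are globs (so `${x%.*}` works), while the Python string methods below match literal text. Using an environment variable for `${asdf:-...}` is also just an analogy, since bash variables aren't necessarily exported. `removesuffix` needs Python 3.9+.

```python
import os

# ${#x}: length of the string
x = "panda"
length = len(x)

# ${x/bear/panda} and ${x//bear/panda}: replace the first / every match
x = "I'm a bearbear!"
first = x.replace("bear", "panda", 1)
every = x.replace("bear", "panda")

# ${asdf:-some default value}: use a default if unset *or* empty
value = os.environ.get("asdf") or "some default value"

# ${x%.svg}: remove a suffix
x = "motorcycle.svg"
stem = x.removesuffix(".svg")

# ${x:6:4}: substring starting at offset 6, with length 4
x = "panda bear time"
sub = x[6:6 + 4]
```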
COALESCE
`COALESCE` is a function that returns the first argument you give it that isn't `NULL` ``` COALESCE(NULL, 1, 2) => 1 COALESCE(NULL, NULL, NULL) => NULL COALESCE(4, NULL, 2) => 4 ``` #### 2 ways you might want to use `COALESCE` in practice: 1. Set a default value: In this table, a `NULL` discount means there's no discount, so we use `COALESCE` to set the default to 0: ``` SELECT name, price - COALESCE(discount, 0) as net_price FROM products ``` products: | name | price | discount | |----------|---------|------------| | orange | 200 | NULL | | apple | 100 | 23 | | lemon | 150 | NULL | query output: | name | net_price | |----------|-------------| | orange | 200 | | apple | 77 | | lemon | 150 | 2. Use data from 2 (or more!) different columns This query gets the best guess at a customer's state: ``` SELECT customer, COALESCE(mailing_state, billing_state, ip_address_state) AS state FROM addresses ``` (Use the mailing address if there is one: it's the most accurate. If not, try the billing address. As a last resort, use their IP address.) addresses: | customer | mailing_state | billing_state | ip_address_state | |------------|-----------------|-----------------|--------------------| | 1 | Bihar | Bihar | Bihar | | 2 | NULL | Kerala | Kerala | | 3 | NULL | NULL | Punjab | | 4 | Gujarat | Punjab | Gujarat | query output: | customer | state | |------------|------------| | 1 | Bihar | | 2 | Kerala | | 3 | Punjab | | 4 | Gujarat |
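You can try the first query yourself with Python's built-in `sqlite3` module and an in-memory database. The table is the `products` example above; it also shows why you need `COALESCE` at all: `price - NULL` is `NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price INTEGER, discount INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("orange", 200, None), ("apple", 100, 23), ("lemon", 150, None)],
)

# Without COALESCE: 200 - NULL is NULL, which Python shows as None
without = conn.execute(
    "SELECT price - discount FROM products WHERE name = 'orange'"
).fetchone()[0]

# With COALESCE: a NULL discount becomes 0
rows = conn.execute(
    "SELECT name, price - COALESCE(discount, 0) AS net_price FROM products"
).fetchall()
print(rows)
```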
single quote your strings
In some SQL implementations (like PostgreSQL), if you double quote a string it'll interpret it as a column name: smiling stick figure with short curly hair: ``` SELECT * FROM cats WHERE name = "ms piggy"; ``` postgres: ``` error: column "ms piggy" does not exist ``` person, thinking: right, I need to use single quotes Here's a table explaining what different quotes mean in different SQL databases. "Identifier" means a column name or table name. Sometimes table names have special characters like spaces in them so it's useful to be able to quote them. | | single quotes ('miss piggy') | double quotes ("miss piggy") | backticks (`miss piggy`) | |---------------|--------------------------------|----------------------------------|----------------------------| | MySQL | string | string or identifier | identifier | | PostgreSQL | string | identifier | invalid | | SQLite | string | string or identifier | identifier | | SQL Server | string | string or identifier | invalid | person: I always use single quotes for strings in SQL queries! It keeps me (and others!) from getting confused.
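Here's a small SQLite demo (via Python's `sqlite3`) of the difference: `'name'` is the string "name", but `"name"` means "the column called name". The `cats` table is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cats (name TEXT)")
conn.execute("INSERT INTO cats VALUES ('ms piggy')")

# Single quotes: a string literal, so every row just gives back the text 'name'
literal = conn.execute("SELECT 'name' FROM cats").fetchone()[0]

# Double quotes: an identifier, so this reads the column called name
column = conn.execute('SELECT "name" FROM cats').fetchone()[0]

# Comparing against a single-quoted string finds the row
matches = conn.execute(
    "SELECT count(*) FROM cats WHERE name = 'ms piggy'"
).fetchone()[0]
```

When the value comes from user input, a placeholder like `WHERE name = ?` avoids quoting questions entirely.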
mmap
### what's mmap for? person 1: I want to work with a VERY LARGE FILE but it won't fit in memory person 2: You could try mmap! (mmap = "memory map") ### load files lazily with mmap When you mmap a file, it gets mapped into your program's memory. 2 TB file: 2 TB of virtual memory but nothing is ACTUALLY read into RAM until you try to access the memory. (how it works: page faults!) ### how to mmap in Python ``` import mmap f = open("HUGE.txt", "rb") mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) ``` (this won't read the file from disk! Finishes ~instantly.) `print(mm[-1000:])` this will read only the last 1000 bytes! ### sharing big files with mmap three processes: we all want to read the same file! mmap: no problem! Even if 10 processes mmap a file, it will only be read into memory once ### dynamic linking uses mmap program: I need to use libc.so.6 (standard library) ld dynamic linker: you too eh? no problem. I always mmap, so that file is probably loaded into memory already. ### anonymous memory maps - not from a file (memory set to 0 by default) - with `MAP_SHARED`, you can use them to share memory with a subprocess!
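Here's a minimal sketch of that last panel: an anonymous `MAP_SHARED` map, shared between a parent and a child process. It's Unix-only (it uses `os.fork`), and the 3-byte message is just an example.

```python
import mmap
import os

# fileno -1 means "anonymous": not backed by any file
shared = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED)

pid = os.fork()
if pid == 0:
    # child process: write into the shared memory, then exit right away
    shared[:3] = b"hi!"
    os._exit(0)

os.waitpid(pid, 0)    # wait for the child to finish
message = shared[:3]  # the parent sees the child's write
```

With `MAP_PRIVATE` instead, the child would get its own copy-on-write pages and the parent would still see zeros.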
processes
## what's in a process? ### PID process: process #129 reporting for duty! ### USER and GROUP from offscreen: who are you running as? process: julia! ### ENVIRONMENT VARIABLES like `PATH`! you can set them with `$ env A=val ./program` ### SIGNAL HANDLERS process 1: I ignore `SIGTERM`! process 2: I shut down safely! ### WORKING DIRECTORY Relative paths (./blah) are relative to the working directory! `chdir` changes it. ### PARENT PID PID 129 -> PID 147 -> PID 1. PID 1 (`init`) is everyone's ancestor ### COMMAND LINE ARGUMENTS see them in `/proc/PID/cmdline` ### OPEN FILES every open file has an offset. process: I've read 8000 bytes of that one ### MEMORY heap! stack! shared libraries! the program's binary! mmapped files! ### THREADS sometimes one sometimes LOTS ### CAPABILITIES process 1: I have `CAP_SYS_PTRACE` process 2: well I have `CAP_SYS_ADMIN` ### NAMESPACES process: I'm in the host network namespace container process: I have my own namespace!
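A process can look at a lot of this about itself. This Python sketch (the function names are hypothetical) collects a few of the attributes above with the `os` module, and reads `/proc/PID/cmdline` on Linux, where arguments are separated by NUL bytes.

```python
import os


def describe_process():
    """Collect a few of the per-process attributes from the comic (Unix-only)."""
    return {
        "pid": os.getpid(),
        "parent_pid": os.getppid(),
        "user_id": os.getuid(),
        "group_id": os.getgid(),
        "working_directory": os.getcwd(),
        "PATH": os.environ.get("PATH"),
    }


def cmdline(pid="self"):
    """Read /proc/PID/cmdline (Linux-only): arguments are NUL-separated."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        data = f.read()
    args = data.split(b"\0")
    if args and args[-1] == b"":
        args.pop()  # the last argument is followed by a trailing NUL
    return [os.fsdecode(a) for a in args]
```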
libc
[linux2]
when debugging, your attitude matters
[debugging]
overlay filesystems
### how layers work: `mount -t overlay` smiling stick figure with curly hair: can you combine these 37 layers into one filesystem? linux, represented by Tux the penguin: yes! Just run `mount -t overlay` with the right parameters! ### `mount -t overlay` has 4 parameters - `lowerdir`: list of read-only directories - `upperdir`: directory where writes should go - `workdir`: empty directory for internal use - `target`: the merged result ### `upperdir`: where all writes go when you create, change, or delete a file, it's recorded in the upperdir. usually this starts out empty and is deleted when the container exits ### lowerdir: the layers. read only. smiling stick figure with curly hair: you can run `$ mount -t overlay` inside a container to see all the lower dirs that were combined to create its filesystem! ### here's an example! ``` $ mount -t overlay overlay -o lowerdir=/lower,upperdir=/upper,workdir=/work /merged $ ls /upper cat.txt dog.txt $ ls /lower dog.txt bird.txt $ ls /merged cat.txt dog.txt bird.txt ``` the merged version of dog.txt is the one from the upper directory
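Here's a toy model of how the merge behaves, with each layer as a Python dict of filename to contents. This is only an illustration (`merge_layers` is a made-up name): it assumes the lower layers are listed top-most first, like in `lowerdir=`, and uses `None` to stand for a deleted file (overlayfs records deletions as "whiteout" files in the upperdir).

```python
def merge_layers(lowerdirs, upperdir):
    """Toy overlay mount: each layer is a dict of filename -> contents.

    lowerdirs is listed top-most first; upperdir wins over every lower layer.
    A value of None marks a deleted file (a "whiteout").
    """
    merged = {}
    for layer in reversed(lowerdirs):  # apply the bottom layer first
        merged.update(layer)
    merged.update(upperdir)            # writes in the upperdir override all
    # drop files that a higher layer marked as deleted
    return {name: data for name, data in merged.items() if data is not None}
```

With the example above, `merge_layers([{"dog.txt": "lower dog", "bird.txt": "bird"}], {"cat.txt": "cat", "dog.txt": "upper dog"})` has all three files, and `dog.txt` comes from the upper directory.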
what's a shell?
[linux2]
debugging tips: check your assumptions
[debugging]
inodes
[linux2]
pipes
### panel 1 Sometimes you want to send the output of one process to the input of another ``` $ ls | wc -l 53 ``` (53 files!) ### a pipe is a pair of 2 magical file descriptors Illustration of a tube where the left side is labelled "IN" and the right side is labelled "OUT". There is an arrow between them. "IN" is labelled "pipe input" and "OUT" is labelled "pipe output" `stdin` -> `ls` -> IN -> OUT -> `wc` -> `stdout` ### panel 3 when `ls` does `write(IN, "hi")` `wc` can read it! `read(OUT)` -> "hi" Pipes are one way -> You can't write to OUT. ### Linux creates a buffer for each pipe `ls` -> IN [Buffer: data waiting to be read] OUT -> `wc` If data gets written to the pipe faster than it's read, the buffer will fill up. When the buffer is full, writes to IN will block (wait) until the reader reads. This is normal & ok (smiley face). ### what if your target process dies? -> `ls` -> [dead IN face] [dead OUT face] -> [dead `wc` face] -> If `wc` dies, the pipe will close, and the next time `ls` writes to it, `ls` will be sent `SIGPIPE`. By default, `SIGPIPE` terminates your process. ### named pipes `$ mkfifo my-pipe` This lets 2 unrelated processes communicate through a pipe process 1, wearing a hat: ``` f = open("./my-pipe", "w") f.write("hi!\n") f.flush() ``` process 2, with curly hair: ``` f = open("./my-pipe") f.readline() <- "hi!" ```
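You can play with a pipe's two file descriptors directly from Python with `os.pipe()`. This sketch writes to the IN end, reads from the OUT end, and then shows what happens when the reader is gone. One Python-specific detail: Python ignores `SIGPIPE` by default, so instead of being killed you get a `BrokenPipeError`.

```python
import os

# os.pipe() returns two file descriptors: (read end, write end)
read_fd, write_fd = os.pipe()

os.write(write_fd, b"hi")     # like `ls` doing write(IN, "hi")
os.close(write_fd)            # closing the write end means end-of-file

data = os.read(read_fd, 100)  # like `wc` doing read(OUT)
eof = os.read(read_fd, 100)   # b"" once the writer has closed its end
os.close(read_fd)

# What if the reader dies? Close the read end, then try to write.
r2, w2 = os.pipe()
os.close(r2)
try:
    os.write(w2, b"anyone there?")
    broke = False
except BrokenPipeError:
    broke = True              # EPIPE, since Python ignores SIGPIPE
finally:
    os.close(w2)
```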
learn one thing at a time
[debugging]
ping
### ping checks if you can reach a host and how long the host took to reply `$ ping health.gov.au` output: `... time=253ms...` Australia is 17,000 km from me. at the speed of light it's still far! ### ping works by sending an ICMP packet and waiting for a reply ping: to: health.gov.au hello! health.gov.au: I'm here! ### myth: if a host doesn't reply to ping, that means it's down Some hosts never respond to ICMP packets. This is why traceroute shows `* * *` sometimes. ping: hello! host (thinking): not listening!! ### traceroute tells you the path a packet takes to get to a destination me → my ISP → NYC → Sacramento → Australia ### example traceroute `$ traceroute health.gov.au` `1: 192.168.1.1 3ms` ← router `2:...yul.ebox.ca 12 ms` ← ISP `...` `8: NYC4.ALTER.NET 24 ms` ← here the packet crossed the USA, from NYC to Sacramento! `9: SAC1.ALTER.NET 97 ms` `...` `16: health.gov.au 253ms` ← crossing the US takes time ### mtr like traceroute, but nicer output! try it! ### last panel look up how traceroute works (using TTLs!) it's simple + cool!
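How fast *could* that ping be? A ping has to go there and back, so a lower bound is twice the distance divided by the speed of light. This quick calculation assumes the comic's 17,000 km is the one-way distance and that light in optical fiber travels at roughly 2/3 of its speed in a vacuum (a common approximation).

```python
SPEED_OF_LIGHT_KM_S = 299_792  # in a vacuum
FIBER_FACTOR = 2 / 3           # assumption: light in fiber is ~2/3 as fast

distance_km = 17_000           # one-way distance to Australia, from the comic


def min_rtt_ms(distance_km, speed_km_s):
    """Lower bound on ping's round-trip time: there and back again."""
    return 2 * distance_km / speed_km_s * 1000


vacuum = min_rtt_ms(distance_km, SPEED_OF_LIGHT_KM_S)
fiber = min_rtt_ms(distance_km, SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
print(f"vacuum: {vacuum:.0f} ms, fiber: {fiber:.0f} ms")
```

That comes out to about 113 ms in a vacuum and about 170 ms in fiber. Real routes aren't straight lines and routers add delay, which accounts for the rest of the 253 ms.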
kill
### kill doesn't just kill programs you can send ANY signal to a program with kill! `$ kill -SIGNAL PID` (name or number) ### which signal kill sends name num ``` kill => SIGTERM 15 kill -9 => SIGKILL 9 kill -KILL => SIGKILL 9 kill -HUP => SIGHUP 1 kill -STOP => SIGSTOP 19 ``` ### kill -l lists all signals. 1. HUP 2. INT 3. QUIT 4. ILL 5. TRAP 6. ABRT 7. BUS 8. FPE 9. KILL 10. USR1 11. SEGV 12. USR2 13. PIPE 14. ALRM 15. TERM 16. STKFLT 17. CHLD 18. CONT 19. STOP 20. TSTP 21. TTIN 22. TTOU 23. URG 24. XCPU 25. XFSZ 26. VTALRM 27. PROF 28. WINCH 29. POLL 30. PWR 31. SYS ### killall -SIGNAL NAME signals all processes called NAME for example: `$ killall firefox` useful flags: -w: wait for all signaled processes to die -i: ask before signalling ### pgrep prints PIDs of matching running programs `pgrep fire` matches firefox firebird NOT bash firefox.sh To search the whole command line (eg bash firefox.sh), use `pgrep -f` ### pkill same as pgrep, but signals PIDs found. Example: `$ pkill -f firefox` I use pkill more than killall these days.
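Python's `os.kill` is the same `kill` system call. This sketch starts a throwaway `sleep` process (it assumes a Unix system with a `sleep` command), sends it `SIGTERM` like a plain `kill PID` would, and lists signal names and numbers like `kill -l`.

```python
import os
import signal
import subprocess

# Every signal name -> number on this platform (like `kill -l`)
signal_numbers = {sig.name: sig.value for sig in signal.Signals}

# Start a long-running process to signal
proc = subprocess.Popen(["sleep", "60"])

# Same as `kill PID`: the default signal is SIGTERM
os.kill(proc.pid, signal.SIGTERM)

# Popen reports "killed by signal N" as a return code of -N
status = proc.wait()
```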
terminals
[linux2]
a branch is a pointer to a commit
[git]
A branch in git is a pointer to a commit SHA master → 2e9fab awesome-feature → 3bafea fix-typo → 9a9a9a Here's some proof! In your favourite git repo, run this command: ```$ cat .git/refs/heads/master``` "master" is just a text file with the commit SHA master points at! Understanding what a branch is will make it WAY EASIER to fix your branches when they're broken: you just need to figure out how to get your branch to point at the right commit again! 3 main ways to change the commit a branch points to: - ```git commit``` will point the branch at the new commit - ```git pull``` will point the branch at the same commit as the remote branch - ```git reset COMMIT_SHA``` will point the branch at ```COMMIT_SHA```
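Here's a sketch of resolving a branch to a SHA by reading git's files directly. One caveat the `cat` trick hides: git sometimes moves refs into a single `.git/packed-refs` file, so this checks there too. It assumes the classic files-based ref storage; `branch_commit` is a made-up helper, and `git rev-parse BRANCH` is the supported way to do this.

```python
import os


def branch_commit(repo, branch):
    """Return the commit SHA a branch points to, by reading git's files."""
    git_dir = os.path.join(repo, ".git")
    # 1. loose ref: .git/refs/heads/BRANCH contains just the SHA
    loose = os.path.join(git_dir, "refs", "heads", branch)
    if os.path.exists(loose):
        with open(loose) as f:
            return f.read().strip()
    # 2. packed ref: lines look like "<sha> refs/heads/<branch>"
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.exists(packed):
        with open(packed) as f:
            for line in f:
                if line.startswith(("#", "^")):  # header / peeled-tag lines
                    continue
                sha, _, ref = line.strip().partition(" ")
                if ref == f"refs/heads/{branch}":
                    return sha
    return None
```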
file buffering
### panel 1 person with short curly hair, distressed and surrounded by question marks: I printed some text but it didn't appear on the screen. why?? cheerful person with long straight hair: time to learn about flushing! ### On Linux, you write to files & terminals with the system call <3 `write` <3 process, represented by a box with a smiley face: please write "I <3 cats" to file #1 (`stdout`) Linux, also represented by a box with a smiley face: okay! ### I/O libraries don't always call `write` when you print `printf("I <3 cats");` `printf`: I'll wait for a newline before actually writing This is called buffering and it helps save on syscalls. ### 3 kinds of buffering (defaults vary by library) 1. None. This is the default for `stderr`. 2. Line buffering (write after newline). The default for terminals. 3. "Full" buffering (write in big chunks). The default for files and pipes. ### flushing (little picture of a toilet) To force your I/O library to write everything it has in its buffer right now, call `flush`! `stdio`: I'll call `write` right away! ### when it's useful to flush - when writing an interactive prompt! Python example: `print("password: ", flush=True)` - when you're writing to a pipe/socket program: no seriously, actually write to that pipe please
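You can watch "full" buffering happen in Python: write to a file, then read the file back through a second handle before and after calling `flush()`. Even though the text ends in a newline, nothing reaches the file until the flush, because files aren't line-buffered. The temporary file name is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cats.txt")

writer = open(path, "w")     # files get "full" buffering by default
writer.write("I <3 cats\n")  # sits in Python's buffer: no write() syscall yet

with open(path) as reader:
    before_flush = reader.read()  # nothing has reached the file yet

writer.flush()               # now Python calls write() for real

with open(path) as reader:
    after_flush = reader.read()
writer.close()
```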
hash functions
openssl
### openssl is a tool for doing *SSL things* (aka TLS) - inspect certificates - create CSRs - sign certificates It uses the OpenSSL library (or LibreSSL) ### inspect a certificate ``` $ openssl x509 -in FILE.crt -noout -text ``` This works for files ending in .crt or .pem! Try it out: you probably have certs in `/usr/share/ca-certificates` ### look at a website's certificate ``` $ openssl s_client -showcerts -connect google.com:443 ``` happy little stick figure with curly hair: pipe this to openssl x509 to parse! ### panel 4 certificate authority, represented by a box with a neutral expression: please upload a CSR person: a WHAT?! to get an SSL cert for your website, you need to make a file called a "certificate signing request." ### make a CSR ``` $ openssl req -new -sha256 -key FILE.key -out FILE.csr ``` make a `FILE.key` with `$ openssl genrsa -out FILE.key 2048` ### `md5/sha1/sha256/sha512` Not quite SSL but useful: `$ openssl md5 FILE` computes the md5sum of FILE. Same for other digests `$ openssl list -digest-commands` shows all supported digests.
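If you want the same digests from inside a program, Python's `hashlib` computes them too. This sketch (`file_digest` is just an illustrative name) reads the file in chunks so big files don't have to fit in memory; the hex digest should match what `openssl md5 FILE` or `openssl sha256 FILE` prints.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks, like `openssl md5 FILE` or `openssl sha256 FILE`."""
    h = hashlib.new(algorithm)  # "md5", "sha1", "sha256", "sha512", ...
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```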
bash redirects
### panel 1: unix programs have 1 input and 2 outputs When you run a command from a terminal, the input & outputs go to/from the terminal by default. Picture of a program (represented by a box with a smiley face) with 1 arrow coming in and 2 arrows out. The arrows are numbered 0, 1, and 2, and there's a comment: "each input/output has a number, its 'file descriptor'" **arrow 0 (coming into program): `<` redirects stdin** `wc < file.txt` and `cat file.txt | wc` both send `file.txt` to wc's stdin ``` wc < file.txt cat file.txt | wc ``` **arrow 1 (coming out of program): `>` redirects stdout** ``` cmd > file.txt ``` **arrow 2 (coming out of program): `2>` redirects stderr** ``` cmd 2> file.txt ``` ### panel 2: `2>&1` redirects stderr to stdout ``` cmd > file.txt 2>&1 ``` Illustration of cmd, represented by a box with a smiley face. There is one arrow, labelled "stdout(1)", leading to a box labelled "file.txt". There is a second arrow coming out of cmd, labelled "stderr(2)". Then, there's a squiggly third arrow, labelled "2>&1", that leads from "stderr(2)" to "file.txt". ### panel 3: `/dev/null` your operating system ignores all writes to `/dev/null` ``` cmd > /dev/null ``` picture of stdout going to a trash can (`/dev/null`) and stderr still going to the terminal ### panel 4: sudo doesn't affect redirects your bash shell opens a file to redirect to it, and it's running as you. So ``` $ sudo echo x > /etc/xyz ``` won't work. do this instead: ``` $ echo x | sudo tee /etc/xyz ```
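When you start a program from Python instead of bash, `subprocess` lets you set up the same redirects. This sketch runs a tiny `sh` command (it assumes a Unix system) that writes one line to stdout and one to stderr, first like `cmd > file.txt 2>&1`, then like `cmd > /dev/null` with stderr captured.

```python
import os
import subprocess
import tempfile

command = ["sh", "-c", "echo to-stdout; echo to-stderr >&2"]
path = os.path.join(tempfile.mkdtemp(), "file.txt")

# Like `cmd > file.txt 2>&1`: stdout goes to the file, stderr follows stdout
with open(path, "w") as out:
    subprocess.run(command, stdout=out, stderr=subprocess.STDOUT, check=True)

# Like `cmd > /dev/null`: stdout is thrown away, stderr is captured here
result = subprocess.run(
    command, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True
)
```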
why updating DNS is slow
[dns]
shellcheck
### shellcheck finds problems with your shell scripts `$ shellcheck my-script.sh` shellcheck: oops, you can't use in an `if [ ... ]`! ### it checks for hundreds of common shell scripting errors shellcheck: hey, that's a bash-only feature but your script starts with `#!/bin/sh` ### every shellcheck error has a number (like "SC2013") and the shellcheck wiki has a page for every error with examples! I've learned a lot from the wiki. ### it even tells you about misused commands shellcheck: hey, it looks like you're not using `grep` correctly here person: wow I'm not! thanks! ### your text editor probably has a shellcheck plugin shellcheck: I can check your shell scripts every time you save! ### basically, you should probably use it bash has too many weird edge cases for me to remember, I love that shellcheck can help me out!
if you understand a bug, you can fix it
[debugging]
sockets
### networking protocols are complicated book: TCP/IP Illustrated, Volume 1, by Stevens (600 pages) person: what if I just want to download a cat picture? ### Unix systems have an API called the "socket API" that makes it easier to make network connections Unix: you don't need to know how TCP works. I'll take care of it! ### here's what getting a cat picture with the Socket API looks like: 1. Create a socket: `fd = socket(AF_INET, SOCK_STREAM, ...)` 2. Connect to an IP/port: `connect(fd, 12.13.14.15:80)` 3. Make a request: `write(fd, "GET /cat.png HTTP/1.1 ...")` 4. Read the response: `cat_picture = read(fd, ...)` ### Every HTTP library uses sockets under the hood `$ curl awesome.com` Python: `requests.get("yay.us")` (sockets) person: oh, cool, I could write an HTTP library too if I wanted`*`. Neat! `*` SO MANY edge cases though! :) ### AF_INET? What's that? AF_INET means basically "internet socket": it lets you connect to other computers on the internet using their IP address. The main alternative is AF_UNIX ("unix domain socket") for connecting to programs on the same computer. ### 3 kinds of internet (AF_INET) sockets: 1. `SOCK_STREAM` = TCP (curl uses this) 2. `SOCK_DGRAM` = UDP (dig (DNS) uses this) 3. `SOCK_RAW` = just let me send IP packets. I will implement my own protocol. (ping uses this)
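Here are the same 4 steps with Python's `socket` module, which is a thin wrapper around the socket API. It's a minimal sketch (`http_get` is a made-up helper, not a library function): it uses HTTP/1.0 instead of HTTP/1.1 so the server closes the connection after responding, which makes "read until the connection closes" a correct way to find the end of the response. No HTTPS, redirects, or chunked encoding.

```python
import socket


def http_get(host, port, path):
    """Fetch a URL with the raw socket API (HTTP/1.0 for simplicity)."""
    # 1. create a TCP socket (AF_INET + SOCK_STREAM)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # 2. connect to an IP/port
        sock.connect((host, port))
        # 3. make a request
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 4. read the response until the server closes the connection
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

The result is the raw response: status line, headers, a blank line, and then the body.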
file descriptors
### Unix systems use integers to track open files Process, represented by a box with a smiley face: Open `foo.txt` kernel, also represented by a box with a smiley face: okay! that's file #7 for you. these integers are called file descriptors ### `lsof` (list open files) will show you a process's open files `$ lsof -p 4242` (4242 is the PID we're interested in) FD NAME ``` 0 /dev/pts/tty1 1 /dev/pts/tty1 2 pipe: 29174 3 /home/bork/awesome.txt 5 /tmp/ ``` (FD is for file descriptor) ### file descriptors can refer to: - files on disk - pipes - sockets (network connections) - terminals (like `xterm`) - devices (your speaker! `/dev/null`!) - LOTS MORE (`eventfd`, `inotify`, `signalfd`, `epoll`, etc.) little tiny smiling stick figure: not EVERYTHING on Unix is a file, but lots of things are ### When you read or write to a file/pipe/network connection you do that using a file descriptor person: connect to google.com OS: ok! fd is 5! person: write `GET / HTTP/1.1` to fd #5 OS: done! ### Let's see how some simple Python code works under the hood: Python: ``` f = open("file.txt") f.readlines() ``` Behind the scenes: Python program: open file.txt OS: ok! fd is 4 Python program: read from file #4 OS: here are the contents! ### (almost) every process has 3 standard FDs: - `stdin`: 0 - `stdout`: 1 - `stderr`: 2 "read from stdin" means "read from the file descriptor 0" (could be a pipe or file or terminal)
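Python's `os.open`, `os.read` and `os.write` expose file descriptors directly, so you can see that they really are just integers. The `open_fds` helper is a Linux-only sketch that lists a process's fds the same way `lsof` finds them: by looking in `/proc/self/fd`.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "awesome.txt")

# os.open is a thin wrapper around the open() system call:
# it returns a plain integer file descriptor
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
os.write(fd, b"hello\n")
os.close(fd)

fd = os.open(path, os.O_RDONLY)
contents = os.read(fd, 100)
os.close(fd)

# High-level file objects still have a descriptor underneath
with open(path) as f:
    underlying_fd = f.fileno()


def open_fds():
    """List this process's open fds (Linux-only).

    The listing also includes the fd os.listdir opens for /proc/self/fd itself.
    """
    return sorted(int(name) for name in os.listdir("/proc/self/fd"))
```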
HTTP response headers
### Age how many seconds response has been cached ```Age: 355``` ### Date when response was sent ```Date: Mon, 09 Sep 2019...``` ### Last-Modified when content was last modified (not always accurate) ### ETag Version of response body ```ETag: "ac5affa.."``` ### Cache-Control various caching settings ```Cache-Control: max-age=300``` ### Vary request headers that response will vary based on ### Via added by proxy servers ```Via: nginx``` ### Expires The response is stale and should be re-requested after this time. ### Connection "close" or "keep-alive" Whether to keep the TCP connection open ### Set-Cookie Sets a cookie. ```Set-Cookie: name=value; HttpOnly``` ### Access-Control-* Called CORS headers. These allow cross-origin requests. ### Content-Type MIME type of body ```Content-Type: text/plain``` ### Content-Length length of body in bytes ```Content-Length: 33``` ### Content-Language Language of body ```Content-Language: en-US``` ### Content-Encoding Whether body is compressed ```Content-Encoding: gzip``` ### Location URL to redirect to ```Location: /cat.png``` ### Accept-Ranges Whether Range request header is supported for this resource
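`Age` and `Cache-Control: max-age` work together: a cached response is fresh while its age is below `max-age`. So with the example values above (`Age: 355`, `max-age=300`) the response is already stale. This is a deliberately simplified sketch (`is_fresh` is a made-up helper, and headers are a plain dict with exact-case keys); real caches follow the HTTP caching spec, which also considers `Expires`, `Date`, `no-cache`, `s-maxage` and more.

```python
def is_fresh(headers):
    """Simplified freshness check using only Cache-Control max-age and Age."""
    max_age = None
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            max_age = int(value)
    if max_age is None:
        return False  # assumption for this sketch: no max-age means not fresh
    age = int(headers.get("Age", "0"))
    return age < max_age
```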
tcpdump
[bpf]
### tcpdump lets you view network packets being sent and received happy little stick figure: it's not the easiest to use but it's usually installed (heart) ### `-n` don't try to resolve IP addresses to hostnames or ports to service names. makes it run faster ### `-i wlan0` Which network interface to capture packets on person: I often use "`-i any`" to make sure I'm not missing any packets! ### `-w file.pcap` Write packets to a file for later analysis with tcpdump/tshark/wireshark/another tool pcap is short for "packet capture" ### `-A` print packet contents, not just headers. Nice if you want to quickly see what a few packets contain. ### `-c 1000000` Only capture a limited count of packets. person: I use it with `-w` so I don't accidentally fill up my disk!
head & tail
### head shows the first 10 lines of a file. if you pipe a program's output to head, the program will stop after head has printed 10 lines (head exits, so the program gets sent SIGPIPE the next time it writes) ### tail tail shows the last 10 lines! `tail -f FILE` will follow: print any new lines added to the end of FILE. Super useful for log files! ### -n NUM -n NUM (either head or tail) will change the # lines shown. For head, NUM can also be negative. Example: `$ head -n -5 file.txt` will print all lines except the last 5 ### -c NUM show the first/last NUM bytes of the file `$ head -c 1k` will show the first 1024 bytes ### tail --retry keep trying to open file if it's inaccessible ### tail --pid PID stop when process PID stops running (with `-f`) ### tail --follow=name Usually `tail -f` follows a file descriptor, so if the file gets renamed (for example by log rotation) it keeps reading the old file. `tail --follow=name FILENAME` will keep following the same file name, even if a new file is created with that name
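Here's a Python sketch of `head -n` and `tail -n` that works on any iterable of lines. `head` stops reading after N lines (that's why the program feeding head can be stopped early), and a negative N means "all but the last |N|". `tail` only ever keeps N lines in memory, using a `deque` with `maxlen`. The function names just mirror the commands.

```python
import itertools
from collections import deque


def head(lines, n=10):
    """Like `head -n N`. For N >= 0 it stops reading after N lines;
    a negative N means "everything except the last |N| lines"."""
    if n >= 0:
        return list(itertools.islice(lines, n))
    held = deque()
    out = []
    for line in lines:
        held.append(line)
        if len(held) > -n:  # we now know this line isn't in the last |N|
            out.append(held.popleft())
    return out


def tail(lines, n=10):
    """Like `tail -n N`: keep only the last N lines, using O(N) memory."""
    return list(deque(lines, maxlen=n))
```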
mitmproxy
### panel 1: phone: ??? server: ??? small sad stick figure, thinking: what is my phone saying about me? looks like it's encrypted ### panel 2: mitmproxy can proxy connections from your laptop or phone and let you see the contents. It even works with encrypted connections. An illustration showing a phone, a server (represented by a box with a smiley face), and mitmproxy between them. There are arrows going to and from mitmproxy and the phone and server. ### how you use it 1. install mitmproxy root CA on your laptop/phone 2. run `mitmweb` (web UI version) on computer 3. tell the program/phone to proxy through mitmproxy ### how it works phone: wizardzines.com certificate plz mitmproxy: yes I am wizardzines.com phone: sounds legit, this CA I trust [the fake mitmproxy CA you installed] says that certificate is valid ### some apps pin a cert, which makes mitmproxy not work. look up "trust killer" to get around that ### script it in Python modify requests/responses arbitrarily ### other similar tools (not all are free, though) - charles proxy - burp suite - fiddler
oh shit! I accidentally committed to the wrong branch!
1. Check out the correct branch `git checkout correct-branch` `cherry-pick` makes a new commit with the same changes as *, but a different parent 2. Add the commit you wanted to it `git cherry-pick COMMIT_ID` ↑ use '`git log wrong-branch`' to find this 3. Delete the commit from the wrong branch. ``` git checkout wrong-branch git reset --hard HEAD^ ``` be careful when running '`git reset --hard`'! always run '`git status`' first to make sure there aren't uncommitted changes and '`git stash`' to save them if there are
the same-origin policy
[cors]
async functions
socat
### `socat` lets you proxy basically any 2 things Diagram of a star, a heart, and the word "`socat`". There are arrows going to and from `socat` to the heart and star. the basic syntax: `socat THING1 THING2` ### socat supports: - tcp sockets - unix domain sockets - pipes - SSL sockets - files - processes - UDP sockets - ... and MORE! ### order doesn't matter `socat THING1 THING2` is the same as `socat THING2 THING1` ### expose a unix domain socket on port 1337 ``` socat TCP-LISTEN:1337 UNIX-CONNECT:/path ``` ### proxy from local HTTP port to remote server ``` socat TCP-LISTEN:1337 TCP:domain.com:80 ``` ### `-v` write all transferred data to `stderr` happy little stick figure: useful for debugging!
assembly
[linux2]
top
### top a live-updating summary of the top users of your system's resources: sad little stick person: who's using all my memory top: chrome, obv! let's explain some numbers in top! ### load average 3 numbers that roughly reflect demand for your CPUs on the system in the last 1, 5, and 15 minutes. if it's higher than the # of CPUs you have, that's often bad! ### memory 4 numbers: total/free/used/cached One perhaps unexpected thing: total is NOT free + used! total = free + used + cached (filesystem cache) ### % CPU confused stick figure with hair sticking out: 350%? what? this column is given as the % of a single core. If you have 4 cores, this can go up to 400%! ### RES this column is the "resident set size", aka how much RAM your process is using. SHR is how much of the RES is shared with other processes ### htop a prettier & more interactive version of top Illustration of a graph showing different users of system resources with graphs showing used and cached memory in different colours.
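The load average check from the comic is easy to script: compare the 1-minute load to the number of CPUs. This is a rough sketch of that rule of thumb (`load_report` is a made-up name); `os.getloadavg()` is Unix-only and raises `OSError` if the load average isn't available.

```python
import os


def load_report():
    """Compare the 1/5/15-minute load averages to the number of CPUs."""
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    return {
        "load": (one, five, fifteen),
        "cpus": cpus,
        # rule of thumb from the comic: load above the # of CPUs is often bad
        "overloaded": one > cpus,
    }
```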
TLS certificates
To establish an HTTPS connection to examplecat.com, the client needs proof that the server is `examplecat.com`. Browser, represented by the Firefox logo: hey I want examplecat.com Server, represented by a box with a smiley face: here's proof that I'm examplecat.com. (the proof is called a certificate.) A TLS certificate has: - a set of domains it's valid for (eg `examplecat.com`) - a start and end date (example: July 1 2019 to Oct 1 2019) - a matching secret private key that only the server has (strictly speaking it's stored separately and never sent; this is the only secret part, the rest is public) - a public key to use when encrypting - a cryptographic signature from someone trusted A box that reads "wizardzines.com, Jul 1 - Oct 1 2019, <public key> with a logo that says Let's Encrypt Approved" The trusted entity that signs the certificate is called a Certificate Authority (CA) and they're responsible for only signing certificates for a domain for that domain's owner. smiling stick figure with short spiky hair: will you sign this certificate for examplecat.com? let's encrypt, represented by a box with a smiley face: lol no I checked `examplecat.com/.well-known/acme-challenge` and you don't own that domain. When your browser connects to `examplecat.com`, it validates the certificates using a list of trusted CAs installed on your computer. These CAs are called "root certificate authorities". browser, thinking: 1. the examplecat.com certificate is signed by Let's Encrypt 2. Let's Encrypt's cert is signed by IdenTrust 3. IdenTrust is on my trusted list. 4. This is okay!
capabilities
### we think of root as being all-powerful... The following items are in spiky bubbles: - edit any file - change network config - spy on any program's memory ### ... but actually to do "root" things, a process needs the right ★capabilities★ Process, represented by a box with a smiley face: I want to modify the route table! Linux, represented by a penguin: you need CAP_NET_ADMIN! ### there are dozens of capabilities Illustration of a smiling stick figure with curly hair. Person: `$ man capabilities` explains all of them but let's go over 2 important ones! ### CAP_SYS_ADMIN lets you do a LOT of things. avoid giving this if you can! ### CAP_NET_ADMIN allow changing network settings ### by default containers have limited capabilities Process: can I call process_vm_readv? Linux: nope! you'd need CAP_SYS_PTRACE for that! ### $ getpcaps PID print capabilities that PID has ### getcap / setcap commands: get and set a file's capabilities!
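You can also see a process's capabilities yourself: the `CapEff:` line in `/proc/PID/status` is a hex bitmask, where bit N means capability number N. This sketch decodes it. The table only names the 3 capabilities from this comic (numbers from `<linux/capability.h>`); any other bit is shown as a plain number. The function names are made up.

```python
# A few capability bit numbers from <linux/capability.h>
CAPS = {12: "CAP_NET_ADMIN", 19: "CAP_SYS_PTRACE", 21: "CAP_SYS_ADMIN"}


def decode_caps(hex_mask):
    """Turn a capability bitmask (like CapEff in /proc/PID/status) into names."""
    mask = int(hex_mask, 16)
    return [CAPS.get(bit, str(bit)) for bit in range(64) if (mask >> bit) & 1]


def effective_caps(pid="self"):
    """Read a process's effective capabilities (Linux-only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return []
```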
HTTP security headers
These are headers your server can set. They ask the browser to protect your users' data against attackers in different ways: ### Content-Security-Policy (often called CSP) Only let CSS/JavaScript from domains you choose run on your website. Helps protect against cross-site-scripting (aka XSS) attacks. ### Referrer-Policy Control how much information is sent to other sites in the Referer header. Example: `Referrer-Policy: no-referrer`. (spelling is inconsistent with Referer header :( ) ### Strict-Transport-Security (often called HSTS) Require HTTPS. If you set this, the client (browser) will never request a plain HTTP version of your site again for as long as the `max-age` you set. Be careful! You can't take it back! ### Expect-CT Certificate Transparency (CT) is a system that can help find malicious SSL certificates issued for your site. This header gives the browser a URL to use to report bad certificates to you. ### X-XSS-Protection Another way to protect against XSS attacks. Not supported by all browsers; `Content-Security-Policy` is more powerful.
questions to ask about your data
It's really easy to make incorrect assumptions about the data in a table: stick figure with short curly hair, smiling: every hospital patient has a doctor right? same person, three hours later, sad: why is everyone from May 2013 missing a doctor?? ### Some questions you might want to ask: - Does this column have `NULL` or `0` or empty string values? person, thinking: some patients have `NULL` names, that's good to know - How many different values does this column have? person, thinking: huh there are 3000 extra doctors in the system who never worked at the hospital, I should filter them out - Are there duplicate values in this column? person, thinking: sometimes a doctor has 2 appointments at the same time, that shouldn't happen - Does the id column in table A always have a match in table B? person, thinking: why are there 213 doctor IDs with no match in the doctors table?! A lot of these can also be enforced by `NOT NULL` or `UNIQUE` or `FOREIGN KEY` constraints on your tables.
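Each of these questions turns into a short SQL query. Here's a sketch using SQLite through Python's `sqlite3`, with a tiny made-up `doctors` / `appointments` dataset that has one of each problem planted in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doctors (id INTEGER, name TEXT);
    CREATE TABLE appointments (patient TEXT, doctor_id INTEGER, time TEXT);
    INSERT INTO doctors VALUES (1, 'Dr. A'), (2, 'Dr. B');
    INSERT INTO appointments VALUES
        ('ana', 1, '09:00'),
        ('bo', 1, '09:00'),   -- same doctor, same time!
        (NULL, 2, '10:00'),   -- patient with no name
        ('cy', 7, '11:00');   -- doctor 7 isn't in the doctors table
""")

questions = {
    # Does this column have NULL or empty string values?
    "missing_names": "SELECT count(*) FROM appointments "
                     "WHERE patient IS NULL OR patient = ''",
    # How many different values does this column have?
    "distinct_doctors": "SELECT count(DISTINCT doctor_id) FROM appointments",
    # Are there duplicates (the same doctor booked twice at the same time)?
    "double_booked": "SELECT count(*) FROM (SELECT doctor_id, time "
                     "FROM appointments GROUP BY doctor_id, time "
                     "HAVING count(*) > 1)",
    # Does every doctor_id have a match in the doctors table?
    "unknown_doctors": "SELECT count(DISTINCT a.doctor_id) FROM appointments a "
                       "LEFT JOIN doctors d ON a.doctor_id = d.id "
                       "WHERE d.id IS NULL",
}

answers = {name: conn.execute(sql).fetchone()[0] for name, sql in questions.items()}
```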
threads
### Threads let a process do many different things at the same time process: thread 1: I'm calculating ten million digits of π! so fun! thread 2: I'm finding a REALLY BIG prime number! ### threads in the same process share memory thread 1: I'll write some digits of π to 0x129420 in memory thread 2: uh oh! that's where I was putting my prime numbers. ### and they share code calculate-pi find-big-prime-number but each thread has its own stack and they can be run by different CPUs at the same time CPU 1: π thread CPU 2: primes thread ### sharing memory can cause problems (race conditions!) at the same time: memory: 23 thread 1: I'm going to add 1 to that number! thread 2: I'm going to add 1 to that number! RESULT: 24 WRONG. Should be 25! ### why use threads instead of starting a new process? a thread takes less time to create. sharing data between threads is very easy. But it's also easier to make mistakes with threads. thread 1: you weren't supposed to CHANGE that data!
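The usual fix for the "24 instead of 25" race is a lock, so only one thread at a time does "read the number, add 1, write it back". Here's a minimal Python sketch with 2 threads sharing one counter; the thread count and number of increments are arbitrary.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # counter += 1 is really "read, add 1, write": without the lock, two
        # threads can both read 23 and both write 24
        with lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock, `counter` always ends up at 200,000.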
highlight the main ideas
ethtool
### ethtool is for people who need to manage physical networks server, represented by a sad rectangle: why no internet?? smiling stick figure with short curly hair: oops! it would help if your ethernet cable was plugged in! ### `ethtool eth0` (name of network interface) this tells you: - is it even connected? ("link detected") - speed - lots more ### `--show-offload` / `--offload` your network card can do a lot for you! Like computing checksums. This is called "offloading". This lets you see/change configured offloads. ### `--identify INTERFACE` blink the light on the ethernet port. good if you have multiple ports! and cute. (heart) ### `-S INTERFACE` show statistics like bytes sent. works for wifi interfaces too. ### `-s` change speed/duplex/other settings of an interface. `$ ethtool -s eth0 speed 100` ### `-i INTERFACE` show firmware info ### `iw dev wlan0 link` ethtool is mostly for Ethernet. To see the speed (and more) of a wireless connection, use iw.
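If you want to check "is it even connected?" from a script, one option is to parse the `Key: value` lines that `ethtool eth0` prints. This is a sketch that assumes that output layout (for example `Speed: 1000Mb/s` and `Link detected: yes`); the sample text below is made up so the parser can run without a network card, and `link_status` is a hypothetical helper name.

```python
import subprocess

def parse_ethtool(output: str) -> dict:
    """Turn `ethtool IFACE` output ("Key: value" lines) into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # skips lines without "Key: value", like "Settings for eth0:"
            info[key] = value
    return info

def link_status(interface: str) -> tuple:
    """Run ethtool and return (link detected?, speed string or None)."""
    out = subprocess.run(["ethtool", interface], capture_output=True,
                         text=True, check=True).stdout
    info = parse_ethtool(out)
    return info.get("Link detected") == "yes", info.get("Speed")

# Made-up sample in the usual ethtool layout.
sample = """Settings for eth0:
\tSpeed: 1000Mb/s
\tDuplex: Full
\tLink detected: yes
"""
info = parse_ethtool(sample)
print(info["Link detected"], info["Speed"])  # prints: yes 1000Mb/s
```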
container kernel features
### containers use these Linux kernel features "container" doesn't have a clear definition, but Docker containers use all of these features. ### pivot_root set a process's root directory to a directory with the contents of the container image ### cgroups limit memory/CPU usage for a group of processes Linux, represented by a box with a smiley face: only 500 MB of RAM for you! ### namespaces allow processes to have their own: - network - PIDs - hostname - mounts - users - + more ### capabilities security: give specific permissions ### seccomp-bpf security: prevent dangerous system calls ### overlay filesystems this is what makes layers work! Sharing layers saves disk space & helps containers start faster
what's your manager's job?
Understanding a little about your manager's job helps you work well with them! Some things your manager is responsible for: Each of these items is enclosed in a thought bubble with an illustration. ### make sure the team is doing important projects Illustration of a smiling stick figure (the manager). manager: X is a priority this quarter! ### keep projects on track Illustration of two smiling stick figures, one with medium length straight hair (the CEO) and another one with no hair (the manager). CEO: what's the status of X project? manager: [needs to answer] ### communicate with other teams Illustration of two smiling stick figures, one with curly hair (person on other team) and another one with no hair (the manager). person on other team: we're doing X manager: our teams should collaborate on that! ### help team members grow Illustration of a smiling stick figure with curly hair. person: I learned so much this year!
debugging tip: code one thing at a time
[debugging]
CSS specificity
### different rules can set the same property: which one gets chosen? ``` a:visited { color: purple; font-size: 1.2em; } ``` ``` #start-link { color: orange; } ``` ### CSS uses the "most specific" selector that matches an element In our example, the browser will use `color: orange` because IDs (like `#start-link`) are more specific than pseudoclasses (like `:visited`) ### TRY ME! CSS can mix properties from different rules it'll use this font size: ``` a:visited { color: purple; font-size: 1.2em; } ``` but use this color because `#start-link` is more specific: ``` #start-link { color: orange; } ``` ### how CSS picks the "most specific" rule a selector with element names: ``` body div span a { color: red; } ``` loses to a selector with `.classes` or `:pseudoclasses`: ``` .sidebar .link { color: orange; } ``` loses to a selector with an `#id`: ``` #header a { color: purple; } ``` loses to an inline style: ``` style="color: green;" ``` loses to an `!important` rule: ``` color: blue !important; ``` (`!important` is very hard to override, which makes life hard for your future self!)
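One way to see the "loses to" ladder is to score each selector as a tuple (IDs, classes/pseudoclasses/attributes, element names) and compare the tuples left to right, which is what Python does with tuples anyway. This is a rough sketch for simple selectors only: it ignores pseudo-elements like `::before`, `:not()`, `:is()` and friends, and inline styles and `!important` sit above all of these tuples.

```python
import re

def specificity(selector: str) -> tuple:
    """Rough (ids, classes, elements) score for a simple CSS selector."""
    ids = len(re.findall(r"#[\w-]+", selector))
    # .classes, :pseudoclasses and [attributes] all count in the middle slot
    classes = len(re.findall(r"\.[\w-]+|:[\w-]+|\[[^\]]*\]", selector))
    # element names: a word at the start, or right after a space/combinator
    elements = len(re.findall(r"(?:^|[\s>+~])([a-zA-Z][\w-]*)", selector))
    return (ids, classes, elements)

for selector in ["body div span a", ".sidebar .link", "#header a",
                 "a:visited", "#start-link"]:
    print(selector, specificity(selector))

# The highest tuple wins: #start-link (1, 0, 0) beats a:visited (0, 1, 1).
print(max(["a:visited", "#start-link"], key=specificity))
```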
tc
### tc is for "traffic control" humanoid traffic light, its hand held up as though directing traffic: packets! stop/slow down/go the other way! great for simulating network problems! ### make your internet slow ``` $ sudo tc qdisc add dev wlp3s0 root netem delay 500ms ``` (delay packets by 500ms) and fast again: ``` $ sudo tc qdisc del dev wlp3s0 root netem ``` ### netem rules netem ("network emulator") is a part of tc that lets you: - drop - duplicate - delay - corrupt packets. see the man page `$ man netem` ### make your brother's internet slow Have a Linux router? You can configure tc on it to make your brother's internet slower than yours. google "tc QoS" for a start. ### show current tc settings ``` $ tc qdisc show $ tc class show dev DEV $ tc filter show dev DEV ``` ### panel 6 smiling stick figure: `tc` can do 10 million more things! This is just the beginning!
containers = processes
### a container is a group of Linux processes Illustration of a smiling stick figure with curly hair. person: on a Mac, all your containers are actually running in a Linux virtual machine ### panel 2 person: I started 'top' in a container. Here's what that looks like in ps: - outside the container ``` $ ps aux | grep top USER PID START COMMAND root 23540 20:55 top bork 23546 20:57 top ``` - inside the container ``` $ ps aux | grep top USER PID START COMMAND root 25 20:55 top ``` (`root 23540 20:55 top` and `root 25 20:55 top` are the same process!) ### container processes can do anything a normal process can... Illustration of a smiling stick figure with curly hair, and Linux, represented by its penguin mascot person: I want my container to do X Y Z W! Linux: sure! your computer, your rules! ### but usually they have restrictions (there are drawings of locks on either side of the word "restrictions") Illustration of a container, represented by a box with a smiley face. Around it are arrows with the following labels: - different PID namespace - different root directory - cgroup memory limit - limited capabilities - not allowed to run some system calls ### the restrictions are enforced by the Linux kernel Linux: NO, you can't have more memory! person: on the next page we'll list all the kernel features that make this work!
writing code with bugs is normal
[debugging]
NULL surprises
NULL isn't equal (or not equal!) to anything in SQL (`x = NULL` and `x != NULL` are never true for any x). This results in 2 behaviours that are surprising at first: ### Surprise! `x = NULL` doesn't work fish: (name: NULL, owner: bob), (name: nemo, owner: ahmed) ``` SELECT * FROM fish WHERE name = NULL ``` no results! You need to use `x IS NULL` instead. works: `name IS NULL`, `name IS NOT NULL` doesn't work: `name = NULL`, `name != NULL` ### surprise! `name != 'betty'` doesn't match NULLs fish: (name: NULL, owner: bob), (name: nemo, owner: ahmed) ``` SELECT * FROM fish WHERE name != 'betty' ``` only nemo comes back: the row (name: NULL, owner: bob) doesn't match! To match NULLs as well, I'll often write something like `WHERE name != 'betty' OR name IS NULL` instead. ### more surprising truths More operations with NULL which might be surprising: 2 + NULL => NULL NULL * 10 => NULL CONCAT('hi', NULL) => NULL (in MySQL; PostgreSQL's `CONCAT` skips NULLs) NULL = NULL => NULL (NULL isn't even equal to itself!) 2 = NULL => NULL 2 != NULL => NULL
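You can try both surprises with Python's built-in `sqlite3` module. This sketch uses the comic's two fish; SQLite follows the same `NULL` comparison rules for these queries, and Python shows SQL `NULL` as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fish (name TEXT, owner TEXT);
INSERT INTO fish VALUES (NULL, 'bob'), ('nemo', 'ahmed');
""")

def rows(sql):
    return conn.execute(sql).fetchall()

eq_null = rows("SELECT * FROM fish WHERE name = NULL")        # -> []
is_null = rows("SELECT * FROM fish WHERE name IS NULL")       # -> [(None, 'bob')]
not_betty = rows("SELECT * FROM fish WHERE name != 'betty'")  # -> only nemo
not_betty_or_null = rows(
    "SELECT * FROM fish WHERE name != 'betty' OR name IS NULL")  # -> both rows

# Arithmetic and comparisons with NULL give NULL (None in Python).
arithmetic = conn.execute("SELECT 2 + NULL, NULL = NULL, 2 != NULL").fetchone()
print(eq_null, is_null, not_betty, not_betty_or_null, arithmetic)
```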
on surviving performance reviews
[manager]
Performance reviews can be really stressful. Illustration of two stick figures. One has no hair and is smiling, the other one has short curly hair and looks unhappy. person 1: here's the self assessment form to fill out! person 2 (thought bubble): AWESOME PLAN: procrastinate for 2 weeks and then do it at the last minute in a panic! Here's what I've been doing for the last year or so, which has helped! About a month before performance review season comes around, I'll compile a HUGE DOCUMENT with: - every project I did in the last year - the project's goals & results - cool graphs/metrics that show it was a success - what my contributions to the project were - people I've mentored (eg an intern!) - project plans & documentation I've written and send it to my manager. My manager's reaction: Illustration of a smiling stick figure with no hair. THANK YOU! Having all this information makes it really easy for me to explain why your work is so great!
binary search
understand the bug before trying to fix it
[debugging]
CSS grid areas
### panel 1 Illustration of a smiling stick figure with curly hair. person: CSS grid is a big topic, so I just want to show you one of my favourite grid features: areas! ### let's say you want to build a layout Illustration of a long rectangle, labelled "header". Underneath it are two rectangles, side by side, labelled "sidebar" and "content" ### `grid-template-areas` lets you define your layout in an almost visual way ``` grid-template-areas: "header header" "sidebar content" ``` I think of it like this: Illustration of two rectangles side-by-side, both labelled "header". Underneath them are two rectangles, side by side, labelled "sidebar" and "content" ### 1. write your HTML ``` <div class="grid"> <div class="top"></div> <div class="side"></div> <div class="main"></div> </div> ``` ### 2. define the areas ``` .grid { display: grid; grid-template-columns: 200px 800px; grid-template-areas: "header header" "sidebar content"; } ``` ### 3. set grid-area ``` .top {grid-area: header} .side {grid-area: sidebar} .main {grid-area: content} ``` result: Illustration of a long rectangle, labelled "`.top`". Underneath it are two rectangles, side by side, labelled "`.side`" and "`.main`"
HTTP caching headers
- `ETag` response header
- `If-None-Match` request header
- `If-Modified-Since` is similar to `If-None-Match`, but with `Last-Modified` instead of `ETag`

These 3 headers let the browser avoid downloading an unchanged file a second time.

### initial request:
browser, thinking: this page needs cats.css, let's request it!
browser: `GET cats.css`
server: `200 OK` `ETag: "ab23ef"` (hash of the content) <the css file>
browser, thinking: OK, I'll save version ab23ef of cats.css in case I need it later

### the next day:
browser, thinking: cats.css! I've seen that file before. I'll ask if it's changed!
browser: `GET cats.css` `If-None-Match: "ab23ef"` (from the ETag)
server: `304 Not Modified`
browser, thinking: yay, I can use the old one, the page will load faster

### Vary: response header
Sometimes the same URL can have multiple versions (Spanish, compressed or not, etc). Caches categorize the versions by request header like this:

| Accept-Language | Accept-Encoding | content |
|-----------------|-----------------|-------------------------------------|
| en-US | - | hello |
| es-ES | - | hola |
| en-US | gzip | f$xx99aef^.. (compressed gibberish) |

The `Vary` header tells the cache which request headers should be the columns of this table.

### Cache-Control: request AND response header
Used by both clients and servers to control caching behaviour. For example, `Cache-Control: max-age=99999999999` from the server asks the CDN or browser to cache the thing for a long time.
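Here's a toy version of the ETag conversation in Python, with no real network involved. It's a sketch under a few assumptions: the "server" is just a dict of files, the ETag is the first 8 hex digits of a SHA-256 hash (any stable content hash works), and it only handles a single ETag in `If-None-Match` (real servers also accept lists and `*`).

```python
import hashlib

# Hypothetical in-memory "server": URL path -> file contents.
FILES = {"/cats.css": b"body { color: orange; }"}

def etag_for(body: bytes) -> str:
    return '"' + hashlib.sha256(body).hexdigest()[:8] + '"'

def handle_get(path: str, headers: dict) -> tuple:
    """Return (status, response headers, body) for a GET request."""
    body = FILES[path]
    etag = etag_for(body)
    if headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""  # unchanged: skip sending the body
    return 200, {"ETag": etag}, body

# Day 1: no cached copy, so the browser gets the whole file and an ETag.
status1, h1, body1 = handle_get("/cats.css", {})
# The next day: the browser sends back the ETag it saved.
status2, h2, body2 = handle_get("/cats.css", {"If-None-Match": h1["ETag"]})
# The file changes, so the saved ETag no longer matches.
FILES["/cats.css"] = b"body { color: purple; }"
status3, h3, body3 = handle_get("/cats.css", {"If-None-Match": h1["ETag"]})

print(status1, status2, status3)  # prints: 200 304 200
```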
set clear expectations
[manager]
I used to often get stressed out about whether the way I was prioritizing my work was reasonable. Illustration of a stick figure with short curly hair, looking uneasy. person: I'm spending a lot of time on X and no time on Y. I hope that's okay!!!! Everything got easier once I could just: 1. come up with a plan for what to prioritize 2. tell my manager the plan and ask if it sounds good 3. trust them when they say yes Illustration of two stick figures talking. The employee has short curly hair, and the manager has no hair. employee: this quarter I'm planning to get BIG PROJECT done and spend time with my intern. I'm not planning to work on OTHER PROJECT at all. manager: sounds good! Just do X too? Setting expectations is awesome because: - I feel confident that my plans are reasonable - my manager is aware of what I'm planning and can coordinate Everybody wins!!!
position: absolute
### `position: absolute;` doesn't mean absolutely positioned on the page... ``` #star { position: absolute; top: 1em; left: 1em; } ``` doesn't always place the element at the top left of the page! ### ... it's relative to the "containing block" the "containing block" is the closest ancestor with a `position` that isn't `static`, or the initial containing block (roughly, the whole page) if there isn't one. (`position: static` is the default) Illustration of a larger box, labelled "body", with a smaller box, labelled "`#star`", nested inside it. The smaller box is off-centre within the larger box. The smaller box is labelled "this element has `position: relative` set" ### `top, bottom, left, right` will place an absolutely positioned element ``` top: 50%; bottom: 2em; right: 30px; left: -2in; ``` "`left: -2in;`" is labelled "negative works too" Illustration of two overlapping boxes. The top of the smaller one is halfway down the height of the larger one. The gap between the tops of the two boxes is labelled "50%". The smaller one extends to the left of the larger one, representing "`left: -2in;`", and its right and bottom sides are nested inside the larger one, representing "`right: 30px;`" and "`bottom: 2em;`". ### left: 0; right: 0; != width: 100%; `left: 0; right: 0;` Illustration of two boxes. The smaller box is nested within the larger box. It is the same width as the larger box, and is aligned to the top of it. This illustration is labelled "left and right borders are both 0px away from containing block". `width: 100%;` Illustration of two boxes. The smaller box is nested within the larger box, but its right edge extends past the right edge of the larger box. This illustration is labelled "width is the same as width of containing block". ### absolutely positioned elements are taken out of the normal flow Illustration of two stick figures having a conversation. Person 1: will a parent element expand to fit an absolutely positioned child? Person 2: nope!
seccomp-bpf
### all programs use system calls program, represented by a box with a smiley face: read 2000 bytes from this file Linux, represented by a box with a smiley face: here you go! ### rarely-used system calls can help an attacker - `reboot` - `request_key` - `process_vm_readv` (read memory from another process) ### seccomp-BPF lets you run a function before every system call smiling stick figure with short curly hair: run this function before every syscall that process makes Linux, represented by a box with a smiley face: okay! ### the function decides if that syscall is allowed example function: ``` if name in allowed_list { return true; } return false; ``` `return false` means the syscall doesn't happen! ### Docker blocks dozens of syscalls by default Docker, represented by a box with a smiley face: most programs don't need those system calls so I told Linux to block them for you! ### 2 ways to block scary system calls 1. limit the container's capabilities 2. set a seccomp-bpf whitelist You should do both!
shell script arguments
### panel 1: a script's arguments are in `$1`, `$2`, `$3`, etc ``` ./script.sh panda banana ``` `$1` is `"panda"` and `$2` is `"banana"` ### panel 2: arguments are great for making simple scripts Here's a 1-line `svg2png` script that I use to convert SVGs to PNGs: ``` #!/bin/bash inkscape "$1" -b white --export-png="$2" ``` I run it like this: ``` $ svg2png old.svg new.png ``` (arrow pointing to `"$2"`: "always quote your variables!") ### panel 3: get all the arguments with `"${@}"` ``` ls --color "${@}" ``` ### panel 4: you can loop over arguments ``` for i in "${@}" do echo "$i" done ``` ### panel 5: 1 line shell scripts are great person: "I can write a tiny script so I don't have to remember a long command!"
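To see why `"${@}"` and `"$2"` need quotes, this sketch runs small bash snippets from Python with real arguments (it assumes `bash` is on your `PATH`). With `bash -c SCRIPT NAME ARGS...`, `NAME` becomes `$0` and the rest become `$1`, `$2`, and so on; the filenames are made up.

```python
import subprocess

def run_bash(script: str, *args: str) -> str:
    """Run a bash snippet with $1, $2, ... set to args; return its stdout."""
    result = subprocess.run(["bash", "-c", script, "script.sh", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout

# Quoted: each argument stays whole, even with a space in it.
quoted = run_bash('for i in "${@}"; do echo "[$i]"; done', "old file.svg", "new.png")
# Unquoted: "old file.svg" gets split into two words.
unquoted = run_bash('for i in ${@}; do echo "[$i]"; done', "old file.svg", "new.png")
first = run_bash('echo "$1"', "panda", "banana")

print(quoted)    # [old file.svg] then [new.png]
print(unquoted)  # [old] then [file.svg] then [new.png]
print(first)     # panda
```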
bash if statements
### the basic syntax ``` if COMMAND then # do thing else # do other thing fi ``` (you need a new line or ; before then) ### `[` vs `[[` there are 2 commands often used in if statements: `[` and `[[` `if [ -e file.txt ]` `/usr/bin/[` (aka `test`) is a program that returns 0 if the test you pass it succeeds (bash also has a builtin version of `[`) `if [[ -e file.txt ]]` `[[` is built into bash. It lets you do tests like `[[ -e x.txt && -e y.txt ]]` that wouldn't work with a command line tool ### `if COMMAND` did `COMMAND` return 0? ### `if ! COMMAND` did `COMMAND` NOT return 0? ### `if true` `true` always returns 0 :) ### `if [ -n "$var" ]` is `$var` nonempty? ### `if [ -e file.txt ]` does `file.txt` exist? ### combine with `&&` and `||` `if [ -e file ] && [ -e file2 ]` ### `if [ -d somedir ]` does `somedir` exist, and is it a directory? ### `if [ -x script.sh ]` is `script.sh` executable? ### `man [` for more you can do a lot!
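Since `if` only looks at a command's exit status, you can watch that happen by running a few of these tests through bash from Python (assuming `bash` is installed). The `check` function name is made up for the example.

```python
import subprocess

script = r'''
check() {
  # `if COMMAND` runs COMMAND and only looks at its exit status
  if [ -n "$1" ]; then echo "nonempty"; else echo "empty"; fi
}
check ""
check "cat"
if ! false; then echo "false returned nonzero"; fi
if [[ -n "a" && -z "" ]]; then echo "both true"; fi
[ -n "" ]; echo "exit code of [ -n \"\" ]: $?"
'''

out = subprocess.run(["bash", "-c", script], capture_output=True,
                     text=True, check=True).stdout
print(out)
```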
window functions
Let's talk about an advanced SQL feature: window functions! Normally SQL expressions only let you refer to information in a single row.

`SELECT CONCAT(firstname, ' ', lastname) as full_name` (`firstname` and `lastname` are 2 columns from the same row)

person 1, bald and with a worried expression: can I refer to other rows though? Like subtract the value in the previous row?
person 2, happy with curly hair: yes, with *window functions*

Window functions are SQL expressions that let you reference values in other rows. The syntax (explained on the next page!) is:

`[expression] OVER ([window definition])`

Example: use `LAG()` to find how long since the last sale

```
SELECT item, day - LAG(day) OVER (ORDER BY day)
FROM sales
```

sales:

| item | day |
|--------|-----|
| catnip | 2 |
| laser | 40 |
| tuna | 70 |
| tuna | 72 |

query output:

| item | `day - LAG(day) OVER (ORDER BY day)` |
|--------|--------------------------------------|
| catnip | `NULL` (2 - `NULL`) |
| laser | 38 (40 - 2) |
| tuna | 30 (70 - 40) |
| tuna | 2 (72 - 70) |

They're part of `SELECT`, so they happen after `HAVING`:

`FROM + JOIN` -> `WHERE` -> `GROUP BY` -> `HAVING` -> `SELECT` -> `ORDER BY` -> `LIMIT` (arrow pointing to `SELECT`)

happy little stick figure with curly hair: window functions are here!
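Here's the `LAG()` example run against SQLite from Python. This assumes your Python's bundled SQLite is version 3.25 or newer (that's when SQLite added window functions). The `AS days_since_last_sale` alias and the final `ORDER BY day` aren't in the original query; they're there so the column has a name and the output order is guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (item TEXT, day INTEGER);
INSERT INTO sales VALUES ('catnip', 2), ('laser', 40), ('tuna', 70), ('tuna', 72);
""")

rows = conn.execute("""
  SELECT item, day - LAG(day) OVER (ORDER BY day) AS days_since_last_sale
  FROM sales
  ORDER BY day
""").fetchall()

for row in rows:
    print(row)
# ('catnip', None)   <- no previous row, so LAG(day) is NULL
# ('laser', 38)
# ('tuna', 30)
# ('tuna', 2)
```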
why the same-origin policy matters
[cors]
debugging tip: change one thing at a time
[debugging]
NULL: unknown or missing
`NULL` is a special state in SQL. It's very commonly used as a placeholder for missing data ("we don't know her address!"). What `NULL` means exactly depends on your data. For example, it's really important to know if `allergies IS NULL` means: - "no allergies" or - "we don't know if she has allergies or not" `NULL` "should" mean "unknown" but it doesn't always. smiling stick figure with curly hair: it would be easier if `NULL` always meant the same thing but it really depends on your data! ### where `NULL`s come from - There were already `NULL` values in the table - The window function `LAG()` can return `NULL` - You did a `LEFT JOIN` and some of the rows on the left didn't have a match. tiny pensive stick figure with curly hair: ooh, not every cat has an owner, so sometimes the owner name is `NULL` ### ways to handle `NULL`s - Leave them in! smiling stick figure with curly hair: I'd rather see a `NULL` and know there's missing data than get misleading results - Filter them out! `... WHERE first_name IS NOT NULL ...` - Use `COALESCE` or `CASE` to add a default value
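Here's a sketch of where a `LEFT JOIN` `NULL` comes from and two ways to handle it, using Python's `sqlite3`. The `cats` and `owners` tables and the `'(no owner)'` default are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cats (name TEXT, owner_id INTEGER);
CREATE TABLE owners (id INTEGER, name TEXT);
INSERT INTO cats VALUES ('daisy', 1), ('mr darcy', NULL);
INSERT INTO owners VALUES (1, 'ahmed');
""")

JOIN = "FROM cats LEFT JOIN owners ON cats.owner_id = owners.id"

# LEFT JOIN keeps every cat; a cat with no owner gets NULL for owners.name.
joined = conn.execute(
    f"SELECT cats.name, owners.name {JOIN} ORDER BY cats.name").fetchall()

# Filter them out.
filtered = conn.execute(
    f"SELECT cats.name, owners.name {JOIN} "
    "WHERE owners.name IS NOT NULL ORDER BY cats.name").fetchall()

# COALESCE returns its first non-NULL argument, so it acts as a default.
with_default = conn.execute(
    f"SELECT cats.name, COALESCE(owners.name, '(no owner)') {JOIN} "
    "ORDER BY cats.name").fetchall()

print(joined)        # [('daisy', 'ahmed'), ('mr darcy', None)]
print(filtered)      # [('daisy', 'ahmed')]
print(with_default)  # [('daisy', 'ahmed'), ('mr darcy', '(no owner)')]
```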
CSS isn't design
### panel 1: web design is really hard Illustration of a stick figure with short curly hair, looking pensive. person (thinking): "wow, forms are way more complicated than I thought" ### panel 2: writing CSS is also hard person (thinking): "ok, how exactly does flexbox work again?" ### panel 3: remember that they're 2 different skills person (thinking): "hmm, I have NO IDEA what I want this site to look like, maybe that's the problem and not CSS" ### panel 4: CSS is easier when you have a good design Illustration of a box with smaller boxes arrayed inside it. person (thinking, and now smiling): "I can make it look like that!" ### panel 5: usually you have to adjust the design person (thinking): "oh right, I didn't think about how that menu should look on desktop" ### panel 6: sketching a design in advance can help! Illustration of a box with text reading "title", and a grid of smaller boxes underneath. even a simple sketch can help you think!
GROUP BY
`GROUP BY` combines multiple rows into one row. Here's how it works for this table & query:

```
SELECT item, COUNT(*), MAX(price)
FROM sales
GROUP BY item
```

(`COUNT(*)` and `MAX(price)` are aggregates)

sales:

| item | price |
|--------|-------|
| catnip | 5 |
| laser | 8 |
| tuna | 4 |
| tuna | 3 |

query output:

| item | count | price |
|--------|-------|-------|
| catnip | 1 | 5 |
| laser | 1 | 8 |
| tuna | 2 | 4 |

1. Split the table into groups for each value that you grouped by:

item='catnip'

| item | price |
|--------|-------|
| catnip | 5 |

item='laser'

| item | price |
|-------|-------|
| laser | 8 |

item='tuna'

| item | price |
|------|-------|
| tuna | 4 |
| tuna | 3 |

2. Calculate the aggregates from the query for each group:

| item | price |
|--------|-------|
| catnip | 5 |

```
COUNT(*)=1 MAX(price)=5
```

| item | price |
|-------|-------|
| laser | 8 |

```
COUNT(*)=1 MAX(price)=8
```

| item | price |
|------|-------|
| tuna | 4 |
| tuna | 3 |

```
COUNT(*)=2 MAX(price)=4
```

3. Create a result set with 1 row for each group:

| item | count | price |
|--------|-------|-------|
| catnip | 1 | 5 |
| laser | 1 | 8 |
| tuna | 2 | 4 |
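The 3 steps above can be sketched in plain Python: a dict of lists does the splitting, and `len()`/`max()` stand in for `COUNT(*)`/`MAX(price)`. This is just a model of what the database does conceptually (real databases may use hashing or sorting internally); the last part checks it against SQLite.

```python
import sqlite3
from collections import defaultdict

sales = [("catnip", 5), ("laser", 8), ("tuna", 4), ("tuna", 3)]

# 1. Split the table into groups, one per value of `item`.
groups = defaultdict(list)
for item, price in sales:
    groups[item].append(price)

# 2. Calculate COUNT(*) and MAX(price) for each group, and
# 3. build a result set with 1 row per group.
by_hand = [(item, len(prices), max(prices)) for item, prices in groups.items()]

# Compare with what a real SQL engine returns for the same query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, price INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
from_sql = conn.execute(
    "SELECT item, COUNT(*), MAX(price) FROM sales GROUP BY item ORDER BY item"
).fetchall()

print(by_hand)                     # [('catnip', 1, 5), ('laser', 1, 8), ('tuna', 2, 4)]
print(sorted(by_hand) == from_sql) # True
```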
what's a header?
Every HTTP request and response has headers. Headers are a way for the browser or server to send extra information! Firefox: `Accept-Encoding: gzip` This means "I understand compressed responses" ### Headers have a name and a value. `Accept-Encoding` is the name, `gzip` is the value. ### Header names aren't case sensitive: `aCcEpT-eNcOdIng: gzip` is totally valid. ### There are a few different kinds of headers: - Describe the body: ``` Content-Type: image/png Content-Encoding: gzip Content-Length: 12345 Content-Language: es-ES ``` - Ask for a specific kind of response: ``` Accept: image/png Range: bytes=1-10 Accept-Encoding: gzip Accept-Language: es-ES ``` (Every `Accept-` header has a corresponding `Content-` header) - Manage caches: ``` ETag: "abc123" If-None-Match: "abc123" Vary: Accept-Encoding If-Modified-Since: 3 Aug 2019 13:00:00 GMT Last-Modified: 3 Feb 2018 11:00:00 GMT Expires: 27 Sep 2019 13:07:49 GMT Cache-Control: public, max-age=300 ``` - Say where the request comes from: ``` User-Agent: curl Referer: https://examplecat.com ``` - Cookies: ``` Set-Cookie: name=julia; HttpOnly (server -> client) Cookie: name=julia (client -> server) ``` and more!
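Because header names aren't case sensitive, HTTP libraries usually look them up case-insensitively. In Python's standard library, `http.client`'s `HTTPMessage` is built on `email.message.Message`, which works this way; this sketch uses `Message` directly so it runs without a network connection.

```python
from email.message import Message

headers = Message()
headers["aCcEpT-eNcOdIng"] = "gzip"
headers["Content-Type"] = "image/png"

# Lookups ignore the case of the header name.
print(headers["Accept-Encoding"])     # gzip
print(headers["content-type"])        # image/png
print("ACCEPT-ENCODING" in headers)   # True
```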
amazing debugging features
[debugging]
understand your error messages
[debugging]
HTTP request methods 1
Every HTTP request has a method. It's the first thing in the first line: `GET /cat.png HTTP/1.1` `GET` means it's a `GET` request There are 9 methods in the HTTP standard. 80% of the time you'll only use 2 (`GET` and `POST`). ### `GET` When you type a URL into your browser, that's a `GET` request. examplecat.com/cat.png client, represented by a box with a smiley face: ``` GET /cat.png Host: examplecat.com ``` server, also represented by a box with a smiley face: ``` 200 OK Content-Type: image/png <the cat picture> ``` ### `POST` When you hit submit on a form, that's (usually) a `POST` request. client: ``` POST /add_cat Content-Type: application/json {"name": "mr darcy"} ``` (`POST` requests usually have a request body) server: ``` 200 OK Content-Type: text/html <after sign up page> ``` The big difference between `GET` and `POST` is that `GET`s are never supposed to change anything on the server. ### `HEAD` Returns the same result as GET, but without the response body. client: ``` HEAD /cat.png ``` server: ``` 200 OK Content-Type: image/png ``` (no image, just headers)
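Here's how the three methods look with Python's `urllib.request`. A `Request` with no body defaults to `GET`, one with a body defaults to `POST`, and you can ask for any other method explicitly. The URLs reuse the comic's example host and paths; building the `Request` objects doesn't send anything, so this runs offline.

```python
import json
import urllib.request

get_req = urllib.request.Request("https://examplecat.com/cat.png")
post_req = urllib.request.Request(
    "https://examplecat.com/add_cat",
    data=json.dumps({"name": "mr darcy"}).encode(),  # a body makes it a POST
    headers={"Content-Type": "application/json"},
)
head_req = urllib.request.Request("https://examplecat.com/cat.png", method="HEAD")

print(get_req.get_method(), post_req.get_method(), head_req.get_method())
# prints: GET POST HEAD
# urllib.request.urlopen(head_req) would actually send the HEAD request.
```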
signals
### If you've ever used kill you've used signals person, angrily: DIE!!! process, sad: okay ### the Linux kernel sends processes signals in lots of situations - your child terminated - the timer you set expired - that pipe is closed - illegal instruction - segmentation fault ### you can send signals yourself with the kill system call or command ``` SIGINT Ctrl-C SIGTERM kill SIGKILL kill -9 SIGHUP kill -HUP ``` (various levels of "die") `SIGHUP` is often interpreted as "reload config", e.g. by nginx. ### Every signal has a default action, which is one of: - ignore - kill process - kill process AND make core dump file - stop process - resume process ### Your program can set custom handlers for almost any signal person: `SIGTERM` (terminate) process: okay! I'll clean up and then exit! exceptions: `SIGSTOP` & `SIGKILL` can't be caught or ignored dead program: got `SIGKILL`ed ### Signals can be hard to handle correctly since they can happen at ANY time process: handling a signal person: SURPRISE! another signal!
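A sketch of a custom `SIGTERM` handler with Python's `signal` module (Unix only, and it has to run in the main thread). The process sends `SIGTERM` to itself, which is what `kill PID` does from outside. The handler here just records the signal; a real one would clean up and exit. It also shows that trying to handle `SIGKILL` is refused.

```python
import os
import signal
import time

received = []

def on_sigterm(signum, frame):
    # A real program would clean up here (remove temp files, etc) and exit.
    received.append(signal.Signals(signum).name)

# Replace SIGTERM's default action ("kill process") with our handler.
signal.signal(signal.SIGTERM, on_sigterm)

os.kill(os.getpid(), signal.SIGTERM)  # like running `kill PID` on ourselves
time.sleep(0.1)  # give Python a moment to run the handler
print(received)  # ['SIGTERM']

# SIGKILL can't be caught: the OS rejects the request for a handler.
try:
    signal.signal(signal.SIGKILL, on_sigterm)
    sigkill_refused = False
except (OSError, ValueError) as err:
    sigkill_refused = True
    print("can't handle SIGKILL:", err)
```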
PID namespaces
### the same process has different PIDs in different namespaces PID in host / PID in container 23512 / 1 (PID 1 is special) 23513 / 4 23518 / 12 ### PID namespaces are in a tree Diagram showing "host PID namespace (the root)" with three arrows coming down from it, each pointing to a label that says "child". Often the tree is just 1 level deep (every child is a container) ### you can see processes in child PID namespaces Illustration of a host, represented by a box with heart eyes and a big smile. host: aw! look at all those containers running! ### if PID 1 exits, everyone gets killed Illustration of PID 1, represented by a box with a smiley face, and Linux, represented by its penguin mascot. PID 1: ok I'm done! Linux: I'm kill -9'ing everyone else in this PID namespace IMMEDIATELY ### Killing PID 1 accidentally would be bad Illustration of a container process, represented by a box with a smiley face, and Linux, represented by its penguin mascot. container process: `kill 1` Linux: do you WANT everyone to die? I'm not gonna let you do that ### rules for signaling PID 1 - from same container: only works if the process has set a signal handler - from the host: only SIGKILL and SIGSTOP are ok, or if there's a signal handler
let your bugs teach you
[debugging]
using HTTP APIs
Lots of services (Twitter! Twilio! Google!) let you use them by sending them HTTP requests. If an HTTP API doesn't come with a client library, don't be scared! You can just make the HTTP requests yourself. Here's what you need to remember: ### Set the right `Content-Type` header Often you'll be sending a POST request with a body, and that means you need a `Content-Type` header that matches the body. The 2 main options are: - `application/json` (JSON!) - `application/x-www-form-urlencoded` (same as what an HTML form does) If you don't set the `Content-Type`, your request probably won't work. Smiling stick figure with short curly hair: a common error is to try to send POST data as one content type (like JSON) when it's actually another (like application/x-www-form-urlencoded) ### Identify yourself Most HTTP APIs require a secret API key so they know who you are. Here's how that looks for the Twilio API: ``` curl https://api.twilio.com/2010-04-01/Accounts/ACCOUNT_ID/Messages.json -H "Content-Type: application/json" -u ACCOUNT_ID:AUTH_TOKEN -d '{ "from": "+15141234567", "to": "+15141234567", "body": "a text message" }' ``` (this sends a POST request) `-u ACCOUNT_ID:AUTH_TOKEN` sends the username/password in the `Authorization` header.
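The same ideas with Python's `urllib`: a body whose `Content-Type` matches how it's encoded, plus the `Authorization: Basic ...` header that `curl -u user:password` builds (base64 of `user:password`). The endpoint URL, credentials, and message fields are placeholders, not any real service's API; check your API's docs for which content type it expects. Nothing is sent until you call `urlopen`.

```python
import base64
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and credentials: replace with your API's real values.
URL = "https://api.example.com/v1/messages"
ACCOUNT_ID, AUTH_TOKEN = "ACCOUNT_ID", "AUTH_TOKEN"
message = {"to": "+15141234567", "body": "a text message"}

def basic_auth(user: str, password: str) -> str:
    """What `curl -u user:password` sends in the Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return "Basic " + token

# Option 1: a JSON body, so the Content-Type says JSON.
json_req = urllib.request.Request(
    URL,
    data=json.dumps(message).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": basic_auth(ACCOUNT_ID, AUTH_TOKEN)},
)

# Option 2: a form-encoded body, like an HTML form submits.
form_req = urllib.request.Request(
    URL,
    data=urllib.parse.urlencode(message).encode(),
    headers={"Content-Type": "application/x-www-form-urlencoded",
             "Authorization": basic_auth(ACCOUNT_ID, AUTH_TOKEN)},
)

print(form_req.data)  # b'to=%2B15141234567&body=a+text+message'
# urllib.request.urlopen(json_req) would actually send the POST.
```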
conntrack
### conntrack: not a command line tool! it's a Linux kernel system for tracking TCP/UDP connections. It's a kernel module called `nf_conntrack` ### conntrack is used for: - NAT (in a router!) - firewalls (eg only allow outbound connections) You control it with iptables rules. ### conntrack has a table of every connection Each entry contains: - src + dest IP - src + dest ports - the connection state (eg `TIME_WAIT`) ### how to enable conntrack enable: `$ sudo modprobe nf_conntrack` check if it's enabled: `$ lsmod | grep conntrack` change the table size with the sysctl `net.netfilter.nf_conntrack_max` ### if the conntrack table gets full, no new connections can start smiling rectangle: hello? (SYN packet gets dropped) sad rectangle: silence ### moral: be careful about enabling conntrack! sad stick person with curly hair: why are connections mysteriously failing? happy stick figure with medium length straight hair: maybe the conntrack table is full!
SQL: ways to count
bash functions
### panel 1: defining functions is easy ``` say_hello() { echo "hello!" } ``` and so is calling them: ``` say_hello ``` (no parentheses when calling a function!) ### panel 2: functions have exit codes ``` failing_function() { return 1 } ``` `0` is a success, everything else is a failure. A program's exit codes work the same way -- 0 is success, everything else is failure. ### panel 3: you can't return a string you can only return an exit code from 0 to 255 ### panel 4: arguments are `$1`, `$2`, `$3`, etc ``` say_hello() { echo "Hello, $1!" } say_hello "Ahmed" ``` the above code prints `Hello, Ahmed!`. Again, `say_hello "Ahmed"`, not `say_hello("Ahmed")` ### panel 5: The `local` keyword declares local variables ``` say_hello() { local x x=$(date) # this is a local variable y=$(date) # this is a global variable } ``` ### panel 6: `local x=VALUE` suppresses errors this line never fails, even if `asdf` doesn't exist: ``` local x=$(asdf) ``` but this will fail (as you would expect) -- if you have `set -e` set, it'll stop the program ``` local x x=$(asdf) # this line will fail ``` person: "I really have NO IDEA why it's like this, bash is weird sometimes"
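You can check panel 6 by printing `$?` right after each version. This sketch runs a bash snippet from Python (assuming `bash` is installed) and uses `false` instead of a missing command like `asdf`, so there's no "command not found" noise; the function names are made up.

```python
import subprocess

script = r'''
one_line()  { local x=$(false); echo "local x=\$(false) -> exit code $?"; }
two_lines() { local x; x=$(false); echo "x=\$(false) -> exit code $?"; }
one_line
two_lines
'''

out = subprocess.run(["bash", "-c", script], capture_output=True,
                     text=True, check=True).stdout
print(out)
# local x=$(false) -> exit code 0   (the exit code of `local`, not of `false`)
# x=$(false) -> exit code 1         (the failure shows up)
```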
trap
### when your script exits, sometimes you need to clean up nonplussed stick figure with short curly hair: oops, the script created a bunch of temp files I want to delete ### `trap` sets up callbacks `trap COMMAND EVENT` COMMAND: what command to run EVENT: when to run the command ### bash runs COMMAND when EVENT happens `trap "echo 'hi!!!'" INT` OS, represented by a box with a smiley face: <sends `SIGINT` signal> bash, also represented by a box with a smiley face: ok, time to print out `hi!!!` ### events you can trap - unix signals (`INT`, `TERM`, etc) - the script exiting (`EXIT`) - every line of code (`DEBUG`) - function returns (`RETURN`) ### example: kill all background processes when Ctrl+C is pressed `trap 'kill $(jobs -p)' INT` (single quotes, so `$(jobs -p)` runs when Ctrl+C is pressed, not when the trap is set) when you press `CTRL+C`, the OS sends the script a `SIGINT` signal ### example: cleanup files when the script exits ``` function cleanup() { rm -rf "$TEMPDIR" rm "$TEMPFILE" } trap cleanup EXIT ``` EXIT is a fake "signal" that triggers on exit
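Here's the cleanup example run from Python, so you can check that the `EXIT` trap really fires, even when the script exits early with an error code. It assumes `bash` and `mktemp` are available; the script prints the temp file's path first so Python can check afterwards that the file is gone.

```python
import os
import subprocess

script = r'''
TEMPFILE=$(mktemp)
echo "$TEMPFILE"
cleanup() {
  rm -f "$TEMPFILE"
  echo "cleaned up"
}
trap cleanup EXIT
echo "working with a temp file"
test -e "$TEMPFILE" && echo "temp file exists"
exit 3   # cleanup still runs, and the exit code stays 3
'''

result = subprocess.run(["bash", "-c", script], capture_output=True, text=True)
lines = result.stdout.splitlines()
print(lines[1:], result.returncode)
print("temp file still there?", os.path.exists(lines[0]))  # False
```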
container networking
figure out what your manager is great at
[manager]
Different managers are good at different things! I've worked with managers who are amazing at: Each of these items is enclosed in a thought bubble. - product design - helping people resolve conflicts - understanding the business - building remote teams - prioritizing ruthlessly - running meetings - solving tricky technical problems - organizational politics Not every manager is good at every single thing, and that's okay! I like to figure out what my manager is awesome at and lean on them for those things. (heart) Also, strengths change over time! If they're not good at something today, maybe check back in a year & see if that's changed.
container registries
### sharing container images is useful smiling stick figure with curly hair: I made an image you can use to run Redis with just one command! smiling bald stick figure: yay! ### a registry is a server that serves images images have an ID, like "1eff92" and sometimes a tag, like "18.04" or "latest" ### registries let you download just the layers you need client, represented by a box with a smiley face: I already have the Ubuntu base image, I just need `0fe223` registry, represented by a smaller box with a smiley face: here's `0fe223`! ### there are public container registries... person: I'm going to use the latest official public Redis image to test my code! ### ... and private registries developer at COMPANY, represented by a smiling bald stick figure: every time we build our web service, we upload a new image to our private registry ### be careful where your container images come from smiling stick figure: I'll just run this image from RANDOM_PERSON 2 months later: oh no! RANDOM_PERSON is mining bitcoin on my server
the 4 types of DNS servers
[dns]
subqueries
Some questions can't be answered with one simple SQL query. For example, this query finds owners who have named their dogs popular names ("boring" owners :)):

dogs:

| owner | name |
|-------|--------|
| ken | darcy |
| bob | darcy |
| bob | lassie |
| ahmed | darcy |
| sara | floof |
| sara | lassie |

```
SELECT owner FROM dogs
WHERE name IN (
  SELECT name FROM dogs
  GROUP BY name
  HAVING count(*) > 2
)
```

the subquery evaluates to ('darcy')

query output:

| owner |
|-------|
| ken |
| bob |
| ahmed |

### common table expressions
"Common table expressions" (or CTEs) let you name a query so people reading it can understand what it's for. Here's the query above rewritten using a CTE:

```
WITH popular_dog_names AS (
  SELECT name FROM dogs
  GROUP BY name
  HAVING count(*) > 2
)
SELECT owner FROM dogs
INNER JOIN popular_dog_names
ON dogs.name = popular_dog_names.name
```

### Where you can use a subquery/CTE
#### in a `FROM`
```
SELECT ..... FROM (<subquery or CTE>) GROUP BY .....
```
#### in a `WHERE`
```
SELECT ... WHERE name IN (<subquery>)
```
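A quick way to convince yourself that the subquery and the CTE version ask the same question: run both against the `dogs` table with Python's `sqlite3` and compare. The results are sorted because neither query has an `ORDER BY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dogs (owner TEXT, name TEXT)")
conn.executemany("INSERT INTO dogs VALUES (?, ?)", [
    ("ken", "darcy"), ("bob", "darcy"), ("bob", "lassie"),
    ("ahmed", "darcy"), ("sara", "floof"), ("sara", "lassie"),
])

subquery = """
SELECT owner FROM dogs
WHERE name IN (SELECT name FROM dogs GROUP BY name HAVING count(*) > 2)
"""
cte = """
WITH popular_dog_names AS (
  SELECT name FROM dogs GROUP BY name HAVING count(*) > 2
)
SELECT owner FROM dogs
INNER JOIN popular_dog_names ON dogs.name = popular_dog_names.name
"""

from_subquery = sorted(row[0] for row in conn.execute(subquery))
from_cte = sorted(row[0] for row in conn.execute(cte))
print(from_subquery, from_cte)  # ['ahmed', 'bob', 'ken'] ['ahmed', 'bob', 'ken']
```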
stacking contexts
### a z-index can push an element up/down... ``` .first { z-index: 3; } .second { z-index: 0; } ``` Illustration of two boxes. The one labelled "`.first`" is layered over top of the other one. ### TRY ME: but a higher z-index doesn't always put an element on top Illustration of a box labelled "`z-index: 0`". On top of that is a box labelled "`z-index: 10`". Another box is on top of that one. Layered over top of all of these is a box labelled "`z-index: 2`". `z-index: 2` is on top! why? ### every element is in a stacking context The same illustration as the previous panel, but a label pointing to both the "`z-index: 10`" and "`z-index: 2`" boxes says, "these 2 elements are in different stacking contexts" ### a stacking context is like a Photoshop layer Illustration of two boxes, each with three smiley faces and an "ok" button in it, one layered on top of the other. These are labelled "two 'layers'". by default, an element's children share its stacking context ### setting z-index creates a stacking context ``` #modal { z-index: 5; position: absolute; } ``` this is a common way to create a stacking context ### stacking contexts are confusing You can do a lot without understanding them at all. But if `z-index` ever isn't working the way you expect, that's the day to learn about stacking contexts (smiley face)
WHERE
`WHERE` filters the table you start with. For example, let's break down this query that finds all owners with cats named "daisy":

```
SELECT owner
FROM cats
WHERE name = 'daisy'
```

`FROM cats` pulls rows from the `cats` table (this database has tables of cats and people):

| owner | name       |
|-------|------------|
| 1     | daisy      |
| 1     | dragonsnap |
| 3     | buttercup  |
| 4     | rose       |

`WHERE name = 'daisy'`

| owner | name  |
|-------|-------|
| 1     | daisy |

`SELECT owner`

| owner |
|-------|
| 1     |

## What you can put in a WHERE:

### `expr LIKE ...`
Check if a string contains a substring! `WHERE name LIKE '%darcy%'` (% is a wildcard, like * in your shell)

### `expr IN (...)`
Check if an expression is in a list of values. `WHERE name IN ('bella', 'simba')`

### `=, !=, <, >=`
These work the way you'd guess, except when `NULL` is involved. `WHERE revenue - costs >= 0`

### `expr IS NULL`, `expr IS NOT NULL`
more about NULL on pages 15-17. = NULL (crossed out) IS NULL (circled)

### `AND, OR, NOT`
You can `AND` together as many conditions as you want.

tiny little illustration of a smiling stick figure with curly hair: If I'm using lots of ANDs, I like to write them like this:

```
(....)
AND (....)
AND (....)
```

(put all the ORs in the parentheses)
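Here is a small sketch of those `WHERE` operators, run against the comic's `cats` table with Python's built-in `sqlite3` module. The fifth row (owner 5 with a `NULL` name) is a hypothetical addition that shows the `= NULL` pitfall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cats (owner INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO cats VALUES (?, ?)",
    # The comic's 4 rows, plus one hypothetical cat whose name is NULL.
    [(1, "daisy"), (1, "dragonsnap"), (3, "buttercup"), (4, "rose"), (5, None)],
)

def owners(where: str) -> list:
    # f-string SQL is only OK for hard-coded demo snippets like these;
    # use ? placeholders for anything that comes from users.
    sql = f"SELECT owner FROM cats WHERE {where} ORDER BY owner"
    return [row[0] for row in conn.execute(sql)]

print(owners("name = 'daisy'"))             # [1]
print(owners("name LIKE '%dragon%'"))       # [1]
print(owners("name IN ('rose', 'simba')"))  # [4]
print(owners("name = NULL"))                # []  (comparing with NULL is never true)
print(owners("name IS NULL"))               # [5]
print(owners("owner >= 3 AND (name LIKE 'b%' OR name LIKE 'r%')"))  # [3, 4]
```

The last query follows the "ORs go inside the parentheses" style. The `NULL` row drops out of it because `NULL LIKE 'b%'` is `NULL`, not true.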
padding syntax
### there are 4 ways to set padding `padding: 1em;` (all sides) `padding: 1em 2em;` (first value is vertical, second is horizontal) `padding: 1em 2em 3em;` (first value is top, second is horizontal, third is bottom) `padding: 1em 2em 3em 4em;` (first value is top, second is right, third is bottom, fourth is left) ### tricks to remember the order 1. TRouBLe: Top, Right, Bottom, Left 2. it's clockwise ### you can also set padding on just 1 side ``` padding-top: 1em; padding-right: 10px; padding-bottom: 3em; padding-left: 4em; ``` ### TRY ME: differences between padding & margin - padding is "inside" an element: the background color covers the padding, you can click padding to click an element, etc. Margin is "outside". - you can center with `margin: auto`, but not with padding - margins can be negative, padding can't ### margin syntax is the same as padding `border-width` also uses the same order: top, right, bottom, left
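The 1-to-4-value rules fit in a tiny function. This is a hypothetical helper, not part of any CSS library. It expands a shorthand value into the four longhand properties. Because `margin` uses the same order, the `prop` parameter lets it expand margins too.

```python
def expand_shorthand(value: str, prop: str = "padding") -> dict:
    """Expand a 1-4 value `padding`/`margin` shorthand into longhand properties."""
    values = value.split()
    if len(values) == 1:
        top = right = bottom = left = values[0]       # all sides
    elif len(values) == 2:
        top = bottom = values[0]                      # first value: vertical
        right = left = values[1]                      # second value: horizontal
    elif len(values) == 3:
        top, bottom = values[0], values[2]
        right = left = values[1]                      # middle value: horizontal
    elif len(values) == 4:
        top, right, bottom, left = values             # clockwise from the top
    else:
        raise ValueError(f"{prop} takes 1 to 4 values, got {len(values)}")
    return {f"{prop}-top": top, f"{prop}-right": right,
            f"{prop}-bottom": bottom, f"{prop}-left": left}

print(expand_shorthand("1em 2em 3em"))
# {'padding-top': '1em', 'padding-right': '2em', 'padding-bottom': '3em', 'padding-left': '2em'}
```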
CNAME records
[dns]
### there are 2 ways to set up DNS for a website 1. set an A record with an IP `www.cats.com A 1.2.3.4` 2. set a CNAME record with a domain name `www.cats.com CNAME cats.github.io` ### CNAME records redirect every DNS record, not just the IP I like to use them whenever possible so that if my web host's IP changes, I don't need to change anything! ### what actually happens during a CNAME redirect Illustration of a conversation between a resolver, represented by a box with a smiley face holding a magnifying glass, and an authoritative nameserver, represented by a box with a smiley face wearing a crown. resolver: what's the A record for `www.cats.com`? authoritative nameserver: `www.cats.com CNAME cats.github.io` resolver (thinking): okay, I'll look up the A record for `cats.github.io`! ### rules for when you can use CNAME records 1. you can only set CNAME records on subdomains (like `www.example.com`), not root domains (like `example.com`) 2. if you have a CNAME record for a subdomain, that subdomain can't have any other records (technically you can ignore these rules, but it can cause problems: the RFCs say you shouldn't, and many DNS providers enforce these rules) ### some DNS providers have workarounds to support CNAME for root domains Look up "CNAME flattening" or "ANAME" to learn more.
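The "what actually happens during a CNAME redirect" panel can be sketched as a toy resolver. It looks in a dictionary of records and restarts the lookup whenever it finds a CNAME. The records are hypothetical, and this is not how a real resolver talks to nameservers. It only shows the redirect logic.

```python
# Hypothetical records for illustration (not real DNS data).
RECORDS = {
    ("www.cats.com", "CNAME"): "cats.github.io",
    ("cats.github.io", "A"): "1.2.3.4",
}

def resolve_a(name: str, max_hops: int = 8) -> str:
    """Look up an A record, following CNAME redirects like a resolver would."""
    for _ in range(max_hops):
        if (name, "A") in RECORDS:
            return RECORDS[(name, "A")]
        if (name, "CNAME") in RECORDS:
            name = RECORDS[(name, "CNAME")]  # restart the lookup at the target name
            continue
        raise LookupError(f"no A or CNAME record for {name}")
    raise LookupError("too many CNAME hops (possibly a loop)")

print(resolve_a("www.cats.com"))  # 1.2.3.4
```

Since `www.cats.com` never stores an IP itself, changing the A record for `cats.github.io` is enough to move both names.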
DNS record types
[dns]
talk about problems early
[manager]
### Every so often I'll start with a small problem Illustration of a stick figure with short curly hair, looking nonplussed. employee: hmm this isn't great ### and forget to talk about it until I'm REALLY MAD Illustration of a stick figure with short curly hair, looking very upset, and another stick figure, the manager, who has medium length straight hair, and looks confused, with question marks over their head. employee: THIS IS TERRIBLE manager, thinking: whoa where did that come from? ### It's way better to bring up a problem early and figure it out before it turns into a big deal! Illustration of a stick figure with short curly hair, looking nonplussed, and their manager, a stick figure with medium length straight hair, who is smiling. employee: I got paged 15 times this week, can we talk about how to improve this? manager: yes let's work on that!
how to read an error message
[debugging]
why we need DNS
[dns]
flexbox basics
### display: flex; set on a parent element to lay out its children with a flexbox layout. by default, it sets `flex-direction: row;` ### flex-direction: row; Illustration of three boxes, one with a star, one with a heart, and one with a starburst. They are side-by-side in a single row. by default, children are laid out in a single row. the other option is `flex-direction: column` ### flex-wrap: wrap; Illustration of three boxes, one with a star, one with a heart, and one with a starburst. The star and heart boxes are side-by-side, then an arrow winds around to the starburst box, which is underneath the other two, aligned to the left. will wrap instead of shrinking everything to fit on one line ### justify-content: center; Illustration of three boxes, one with a star, one with a heart, and one with a starburst. The star and heart boxes are side-by-side. The starburst box is centred underneath them. horizontally center (or vertically if you've set `flex-direction: column`) ### align-items: center; Illustration of three boxes, one with a star, one with a heart, and one with a starburst. The boxes are different heights, and are placed side-by-side in a single row, centred horizontally. vertically center (or horizontally if you've set `flex-direction: column`) ### you can nest flexboxes A box labelled `display: flex`. Inside it are two smaller boxes, side-by-side. Each is also labelled `display: flex`. One of the smaller boxes has three boxes side-by-side in it. The other smaller box has three boxes stacked on top of one another, inside it.
debugging tip: you've probably seen this bug before
[debugging]
debug by writing a test
[debugging]
subdomains
[dns]
### to make a subdomain, you just have to set a DNS record! To set up cats.yourdomain.com, create a DNS record like this in your authoritative nameservers: `cats.yourdomain.com A 1.2.3.4` `cats.yourdomain.com` is the name, `A` is the record type, `1.2.3.4` is the value ### there are 2 ways a nameserver can handle subdomains 1. Store their DNS records itself nameserver, represented by a box with a smiley face wearing a crown: here's the IP for cats.yourdomain.com! 2. Redirect to another authoritative nameserver (this happens if you set an NS record for the subdomain, it's called "delegation") nameserver: ask this other DNS server instead! ### you can create multiple levels of subdomains For example, you can make: a.b.c.d.e.f.g.example.com up to 127 levels is allowed! ### www is a common subdomain Usually www.yourdomain.com and yourdomain.com point to the exact same IP address. If you wanted to confuse people, you could make them totally different websites! ### panel 5 Illustration of a smiling stick figure with curly hair. person: I love using subdomains for my projects (like dns-lookup.jvns.ca) because they're free, I can give a subdomain a different IP, and it keeps projects separate.
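The "127 levels" limit comes from DNS size limits in RFC 1035. Each label (the part between dots) can be at most 63 bytes, and a whole name can be at most 255 bytes in its wire encoding. Each label costs 1 length byte plus its characters, and 1 more byte ends the name. A name made of 127 one-letter labels therefore uses exactly 2 × 127 + 1 = 255 bytes. The helper below is a hypothetical sketch that checks those limits, assuming plain ASCII names.

```python
def check_domain(name: str) -> int:
    """Return how many labels `name` has, enforcing RFC 1035 size limits (ASCII only)."""
    labels = name.rstrip(".").split(".")
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError(f"label {label!r} must be 1-63 characters")
    # Wire format: 1 length byte + the characters for each label,
    # plus 1 byte for the empty root label that ends every name.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    if wire_length > 255:
        raise ValueError(f"name is {wire_length} bytes on the wire (max 255)")
    return len(labels)

print(check_domain("a.b.c.d.e.f.g.example.com"))  # 9
```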
top-level domains
[dns]
how to handle intermittent bugs
[debugging]
picking a domain registrar
[dns]
authoritative nameservers
[dns]
### your domain has dns records `example.com A 1.2.3.4 300` `example.com` is the name `A` is the type of record `1.2.3.4` is the value `300` is the TTL ### these records are cached on lots of servers server 8.8.8.8, thinking: I was told example.com's IP is `1.2.3.4` but when that cache expires... ### the only source of truth is your authoritative nameserver your nameserver: I have all example.com's DNS records! ### how to get a domain's authoritative nameserver: ask its TLD nameserver person: who's the authority for example.com? .com nameserver: `b.iana-servers.net` ### here's how to look up example.com's nameserver Run this: ``` $ dig ns example.com @g.gtld-servers.net ``` `g.gtld-servers.net` is one of the .com nameservers ### you can update your nameserver on your registrar's website person: hey I want to use a different nameserver registrar: I'll tell the TLD nameservers!
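The "these records are cached... but when that cache expires" panels can be modelled as a toy cache that keeps each answer until its TTL runs out. The `DnsCache` class is hypothetical. Time is passed in explicitly as `now` (in seconds) so the example is deterministic. A real resolver would use a clock and would also count the TTL down in the answers it hands out.

```python
class DnsCache:
    """Toy resolver cache: remembers answers until their TTL runs out."""

    def __init__(self, authoritative: dict):
        self.authoritative = authoritative  # name -> (value, ttl_seconds): the source of truth
        self.cache = {}                     # name -> (value, expires_at)

    def lookup(self, name: str, now: float) -> str:
        if name in self.cache:
            value, expires_at = self.cache[name]
            if now < expires_at:
                return value                # still fresh: don't ask the nameserver
        value, ttl = self.authoritative[name]  # cache miss or expired entry
        self.cache[name] = (value, now + ttl)
        return value

records = {"example.com": ("1.2.3.4", 300)}
cache = DnsCache(records)
print(cache.lookup("example.com", now=0))    # 1.2.3.4 (asks the authoritative server)
records["example.com"] = ("5.6.7.8", 300)   # hypothetical IP change
print(cache.lookup("example.com", now=100))  # 1.2.3.4 (stale, but still within the TTL)
print(cache.lookup("example.com", now=301))  # 5.6.7.8 (TTL expired, so it asks again)
```

This is why lowering a record's TTL before changing it makes the change show up faster.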
make your code easy to debug
[debugging]
work with your manager to get promoted
[manager]
Where I work, my manager wants people on the team to get promoted. If people are being promoted, it (hopefully) means that they're growing & getting more awesome at their jobs, which makes the team's manager look good! Illustration of a smiling stick figure with short curly hair. person, thinking: huh, maybe promotions are just a normal thing we can have a conversation about? Some ways to start conversations: - can we walk through the expectations for the next level to make sure I understand them? - what areas do you think I should focus on? - if I accomplished X Y Z, do you think that would be enough to get promoted? If this is something you care about, keep checking in periodically! The person who cares the most about your career is you ♡♡
keep conversations mostly constructive
I've had periods with some managers where, every time we talk, we're talking about SOME problem: Two illustrations of the same stick figure with curly hair, looking unhappy. me: why did Y happen? me: X has been a problem for a year and it's STILL not fixed These days, I try to bring up problems that I'm interested in fixing and bring ideas for solutions when I can. Often we just talk about our work: Each item is illustrated with a smiling stick figure with curly hair saying them. - here's an idea I had... - my intern is doing awesome work! - did you see that great thing this other team did? - here's an interesting bug from this past week... - I thought of an onboarding project for the new person! Sometimes venting can be useful too, though! If there's a problem, it's often helpful to bring it up even if I don't have a solution.
debugging tip: build your mental model
[debugging]
a SHA always refers to the same code
Let's start with some fundamentals! If you understand the basics about how git works, it's WAY easier to fix mistakes. So let's explain what a git commit is! Every git commit has an id like 3f29abcd233fa, also called a SHA ("Secure Hash Algorithm"). A SHA refers to both: - the changes that were made in that commit (see them with ```git show```) - a snapshot of the code after that commit was made No matter how many weird things you do with git, checking out a SHA will always give you the exact same code. It's like saving your game so that you can go back if you die. You can check out a commit like this: ```git checkout 3f29abc``` SHAs are long but you can just use the first 6 chars. This makes it way easier to recover from mistakes! person at 10 am: ok, let's commit, that's a2992b person at 11 am: I really screwed up this file, let's go back to the version from a2992b
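Why does a SHA always mean the same code? Git names every object by hashing its contents. Here is a minimal sketch of how git computes the ID of a file's contents (a "blob") in a default SHA-1 repository. It produces the same result as `git hash-object`. Commits are hashed the same way over a small text header that includes the ID of the snapshot (tree) and the parent commit. Changing any file, or any earlier commit, therefore changes the SHA.

```python
import hashlib

def git_blob_sha(content: bytes) -> str:
    """ID git gives a file's contents: SHA-1 of a "blob <size>\\0" header plus the bytes."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_sha(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (git's well-known empty-file ID)
```

Repositories created with git's newer SHA-256 object format use the same scheme with a different hash function.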
HTTP request methods 2
### OPTIONS `OPTIONS` is mostly used for `CORS` requests. The `CORS` page has more about that. It also tells you which methods are available. ### DELETE Used in many APIs (like the Stripe API) to delete resources. box with a smiley face 1: `DELETE /v1/customers/cus_12345` ("delete this customer please!") box with a smiley face 2: `200 OK` ("deleted!") ### PUT Used in some APIs (like the S3 API) to create or update resources. `PUT /cat/1234` lets you `GET /cat/1234` later. ### PATCH Used in some APIs for partial updates to a resource ("just change this 1 field"). ### TRACE I've never seen a server that supports this, you probably don't need to know about it. ### CONNECT Different from all the others: instead of making a request to a server directly, it asks a proxy to open a connection. If you set the `HTTPS_PROXY` environment variable to a proxy server, many HTTP libraries will use this protocol to proxy your requests. client, represented by a box with a smiley face: `CONNECT test.com` `$AFO XXRTZ` (encrypted request) proxy, also represented by a box with a smiley face, thinking: ok, I'll open a connection to test.com. proxy: `$AFO XXRTZ` test.com, represented by a box with a smiley face: [is here]
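To contrast PUT, PATCH and DELETE, here is a hypothetical in-memory "server" (a plain function, not a real HTTP framework) that gives each method the meaning the comic describes. Real APIs vary. For example, a real server might answer `201 Created` when a PUT creates a new resource.

```python
from typing import Optional

store: dict = {}  # path -> resource (a dict of fields)

def handle(method: str, path: str, body: Optional[dict] = None):
    """Return (status, resource) for a request against the toy store."""
    if method == "GET":
        return (200, store[path]) if path in store else (404, None)
    if method == "PUT":                 # create the resource, or replace it entirely
        store[path] = dict(body or {})
        return 200, store[path]
    if method == "PATCH":               # change only the fields that were sent
        if path not in store:
            return 404, None
        store[path].update(body or {})
        return 200, store[path]
    if method == "DELETE":
        return (200, store.pop(path)) if path in store else (404, None)
    return 405, None                    # "method not allowed" (e.g. TRACE here)
```

Usage: `handle("PUT", "/cat/1234", {"name": "darcy", "color": "grey"})` creates the cat. `handle("PATCH", "/cat/1234", {"color": "black"})` changes only its color. After `handle("DELETE", "/cat/1234")`, a `GET` returns `(404, None)`.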
build the support system you need
The flip side of "figure out what things they're great at" is that there are always going to be things your manager can't help you with. When that happens, there are a few choices: 1. Get mad that they can't help 2. Resign yourself to not getting help with those things 3. Find help elsewhere!!! Lara Hogan (her blog is GREAT) has an amazing blog post called "When your manager isn't supporting you, build a Voltron" about building a crew of people with lots of different skills who you can ask for help! Some of her tips: - figure out what you need help with before asking. Use their time well! - focus on problem solving, not venting Illustration of a big cool robot with wings, holding a big sword. Various parts of its body are labelled with the points below. A Voltron is a robot built out of several other robots - works in a different field - awesome at communication - more experience than me bit.ly/managervoltronbingo has a useful bingo card!
receiving email at your domain
[dns]
domain privacy
[dns]
debugging tip: more assumptions to check
[debugging]
debugging tip: get specific about what the bug is
[debugging]
how to give good feedback
directories and symlinks
ipv6
what's a mac address?
inter-process communication
2fa
person 1: I have a really secure email password! person 2: that's awesome! but you know, if a hacker got my password, they STILL can't get into my email :) person 1: what? how? ### There are 3 common ways to use 2FA: #### SMS (okay!) person: I'd like to log in email: I've sent you an SMS with a code. Enter the code to finish logging in Problems: - Your phone # can get stolen (this happens in real life!) - Sometimes SMS doesn't arrive #### Google Authenticator app, aka TOTP (very good!) person: I'd like to log in phone: 12345 email: enter the code from that app on your phone! Problem: These codes can still be phished #### security key, aka U2F (the easiest to use! the most secure!) person: I'd like to log in tap YubiKey - done! These work AWESOME for Gmail! You just plug it into a USB port! Problems: - you have to buy it - not every website has support
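The authenticator app's codes come from TOTP (RFC 6238): an HMAC of a shared secret and the current 30-second time step, cut down to 6 digits. Here is a minimal sketch assuming the common defaults (HMAC-SHA1, 30-second steps, 6 digits). Real apps usually receive the secret base32-encoded, for example inside a QR code. You would decode it first with `base64.b32decode`.

```python
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 of an 8-byte counter, "dynamically truncated"."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                   # low 4 bits pick where to read
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, at: float = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP where the counter is the number of `step`s since 1970."""
    if at is None:
        at = time.time()
    return hotp(secret, int(at // step), digits)

# Test secret from the RFCs' test vectors.
print(totp(b"12345678901234567890", at=59))  # 287082
```

The sketch also shows why these codes can be phished. The code depends only on the secret and the time, so anyone who tricks you into typing it can replay it on the real site while it is still valid.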
user space vs kernel space
the senior engineer
ways i want my team to be
tcp
page table
having productive conversations when i disagree
what does an operating system do?
the stack
no feigning surprise
how to talk to your operating system
anatomy of a packet
When you load a webpage, like Facebook, it comes into your computer in many small packets. Let's see what those look like! Packets are split into a few sections (or "headers") ### ethernet/wifi `82:53:ac:99:2f:33` (MAC address) "link layer": this gets changed constantly as your packet moves between computers ### IP ("internet protocol") `FROM: 172.96.2.3 TO: 123.9.2.32` in charge of getting your packet to the right server (like an address on an envelope) ### TCP (or UDP) `sequence number: 877392` (counts bytes sent so far) `checksum: 8847` (detect corrupted data) `from: port 9979 to: port 80` in charge of preventing data corruption and helping you retry lost packets. Some video streaming uses UDP instead. UDP does not try to be reliable. ### HTTP (or whatever)

```
GET / HTTP/1.1
Host: google.com
Accept-Language: en-US
```

the actual data you're trying to send!
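The TCP section is just a fixed layout of bytes. This sketch packs and unpacks a 20-byte TCP header (no options) with Python's `struct` module, using the comic's example ports, sequence number and checksum. The checksum is supplied by hand rather than computed, and the flags and window values are arbitrary choices for illustration.

```python
import struct

# 20-byte TCP header without options, in network (big-endian) byte order:
# source port, destination port, sequence number, acknowledgement number,
# data offset byte, flags byte, window size, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIBBHHH")

def build_tcp_header(src_port: int, dst_port: int, seq: int, checksum: int,
                     ack: int = 0, flags: int = 0x18, window: int = 65535) -> bytes:
    data_offset = 5 << 4  # header length in 32-bit words (5 = 20 bytes), in the top 4 bits
    # flags=0x18 sets PSH+ACK; the checksum is passed in, not calculated here.
    return TCP_HEADER.pack(src_port, dst_port, seq, ack, data_offset, flags, window, checksum, 0)

def parse_tcp_header(raw: bytes) -> dict:
    src, dst, seq, _ack, _off, _flags, _win, checksum, _urg = TCP_HEADER.unpack(raw[:TCP_HEADER.size])
    return {"from_port": src, "to_port": dst, "seq": seq, "checksum": checksum}

raw = build_tcp_header(9979, 80, 877392, 8847)
print(len(raw), parse_tcp_header(raw))
# 20 {'from_port': 9979, 'to_port': 80, 'seq': 877392, 'checksum': 8847}
```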
networking concepts
network address translation
mutexes
man pages are awesome
rr
bpf filters
bash tips
the cap theorem
acid
## what's acid? notes from Martin Kleppmann's *amazing* "Designing Data-Intensive Applications" book. ACID is about safety guarantees for database transactions. ### Atomicity NOT about concurrent writes, that's "isolation" application: do these 5 writes atomically DB: omg there was an error in the middle, rolling them all back! ### Consistency super overloaded term. This sense of "consistency" is actually an application property, not a DB property. not linearizability not as in "eventual consistency" About preserving application invariants like "every sale gets an invoice" ### Isolation app 1: I'm selling a watch app 2: I'm selling the same watch Isolation is about preventing race conditions like this. Some isolation levels: - serializability - snapshot isolation - read committed ### Durability Durable DB: I committed your writes app: phew my data won't get lost even if the DB crashes/there's a hardware failure Perfect durability doesn't exist. Can involve: - write-ahead log (usually) - replication
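Atomicity ("do these 5 writes atomically... rolling them all back!") can be seen with SQLite through Python's built-in `sqlite3` module. SQLite is my own choice for this sketch. Using the connection as a context manager commits if the block finishes and rolls back if it raises. The `fail_at` parameter is a hypothetical way to force an error in the middle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE writes (value TEXT)")
conn.commit()

def write_all(conn, values, fail_at=None):
    """Insert every value in one transaction; `fail_at` simulates an error mid-way."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        for i, value in enumerate(values):
            if i == fail_at:
                raise RuntimeError("error in the middle!")
            conn.execute("INSERT INTO writes (value) VALUES (?)", (value,))

def count_rows(conn) -> int:
    return conn.execute("SELECT count(*) FROM writes").fetchone()[0]

try:
    write_all(conn, ["w1", "w2", "w3", "w4", "w5"], fail_at=3)
except RuntimeError:
    pass
print(count_rows(conn))  # 0: the 3 writes before the error were rolled back too
```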
the filesystem cache
computers are fast
blogging principles
how does dns work
directories and symlinks
linux tracing systems
read the source code
getting started with ftrace
vim sessions
what's slow on a computer
learning to design software
ways to build expertise
tips for reading code
scenes from design docs
love your bugs
(thanks to Allison Kaptur for teaching me this attitude! she has a great talk called "Love Your Bugs".) Debugging is a great way to learn. First, the harsh reality of bugs in your code is a good way to reveal problems with your mental model. program: error: too many open files person: I can't just open as many files as I want? Interesting! Fixing bugs is also a good way to learn to write more reliable code! person, thinking: hmm, I should put in error handling here in case that database query times out. Also, you get to solve a mystery and get immediate feedback about whether you were right or not. person 1: that's weird... person 1: oh goodness, that's a lot of errors person 1: I have an idea! person 1: [coding a fix] person 1: it works now! person 2: great work! Nobody writes great code without writing + fixing lots of bugs. So let's talk about debugging skills a bit!
let's build expertise!
learning at work
it's not too late to start learning
invest in understanding
asking good questions (part 2)
asking good questions
One of my favorite tools for learning is asking questions of all the awesome people I know! What's a good question? ### good questions: - are easy for the person to answer - get you the information you're looking for ### Here are some strategies for asking them: - state what you know person 1: so, I know when the database gets a lot of writes, the hard drive can't keep up. person 2: that's right! I don't think that was our problem, though. Look at this... This helps because: - I'm forced to think about what I know - I'm less likely to get answers that are too basic or too advanced - guess what the answer might be person 1: Do we have 5 load balancers because we get a lot of HTTP requests? person 2: actually, we just want to be sure it's ok if one goes down. Guessing the answer: - makes me think! - helps my coworker see what kind of answer I'm looking for
building confidence in kubernetes
understand your manager's goals
[manager]
Illustration of two stick figures having a conversation. The manager is smiling and has straight shoulder length hair. The employee looks confused and has short curly hair. manager: can you get metrics on X's speed? me: why? That won't help us get the code done! They might be asking for metrics because: - they're hearing complaints about X being slow (that you might not be hearing!) - without metrics, it's hard for them to have an informed conversation about those complaints (& defend you if X is actually fast!) Having regular conversations about their priorities for the team is SO USEFUL and means that I'm surprised less often. (illustration of two smiley faces) Illustration of the same two stick figures as above, but now they're both smiling. manager: performance / speed is getting more important recently! me: good to know, should I work on speeding up X?
remember your manager's only human
[manager]
Sometimes I fall into a trap where I think my manager should be able to solve EVERY problem on the team and if they're not then they're not doing their job. (the word "every" is surrounded by glowing lines for emphasis) It's helpful for me to remember that at any given time they're probably dealing with a lot! Illustration of a smiling stick figure, representing the manager, surrounded by spiky bubbles containing the following items. - hire 2 people - coordinate with other teams - make sure the intern gets an offer on time (illustration of a clock) - write 10 performance reviews - finalize plans for next quarter - make sure we have an onboarding plan for the new person - interview new manager candidate - a team member is unhappy, figure out what's going on - ... personal life (smiley face) I try to be somewhat aware of what my manager is dealing with & help out when I can. Illustration of two smiling stick figures, one with curly hair representing the employee, and one with medium length straight hair, representing the manager. employee: Here's a project I think could be a good fit for the new person! manager: good idea, thanks!
on emotional labour
[manager]
"Emotional labour" is the idea that dealing with feelings-related problems is work. Illustration of two stick figures having a conversation. The employee has short curly hair and looks angry. The manager is smiling and has no hair. employee: I'm angry that my contributions on that project weren't recognized... manager: [understanding face, doing work] Emotional labour is part of what managers are paid to do. But!! Managers aren't therapists. Illustration of a smiling stick figure, crossed out in red. manager: tell me about your father... not good 1:1 material (smiley face) When I'm upset about something, I try to be clear about why and ideally explain what I think a reasonable resolution would be. employee: can we just make sure it features in my next performance review? manager: yes definitely!
how to work well with your manager
[manager]
Most of the rest of this zine is about COMMUNICATION (The word "communication" is surrounded by hearts, smiley faces, stars, and exclamation marks) Basically your manager's job is to make sure that your team is getting work done that will help the business. This is awesome because it means that if you just communicate with them well, then you can mostly focus on programming!!! (the word "awesome" is surrounded by glowing lines and hearts) Communicating well can help you: - get awesome opportunities - solve problems - build trust - understand priorities - get promoted - get feedback (each of the above items is in a spikey bubble) To start, let's talk about 1:1s (which hopefully your manager schedules regularly).
getting a new manager
[manager]
Being assigned a new manager is a little scary. Not all of my managers have been great! Illustration of a stick figure with short curly hair, looking uncertain. person: OH NO what if my new manager is hard to work with ?!?! But! More than once I've started out thinking, Illustration of a stick figure with short curly hair, looking scared. person: who is this person they seem suspicious and ended up, a year later, at Illustration of a stick figure with short curly hair, smiling. person: wow they have helped me and the team so much, this is AMAZING so I try to assume that's where we'll end up. Some things I've found helpful: - write a document explaining my past work to them - ask them about any concerns directly - often they have great answers! - pay close attention to what they do well - tell them when they do something great
what's a branch?
You can think about a Git branch in 3 different ways: ### 1. just the commits that "branch" off this is how I usually think about branches: `armadillo` branches off `main` Illustration of a vertical black line, labelled "main". Coming off of it is a red line, labelled "armadillo". The armadillo line has two dots on it. The two dots are labelled "I think of the armadillo branch as these 2 commits" #### How this shows up in git: Git DOESN'T KNOW that `armadillo` is branched off of `main`: for all it knows, main could be branched off of `armadillo`! You need to tell it when you merge or rebase, for example: `git checkout main`, then `git merge armadillo` ### 2. every previous commit Even though git doesn't treat the `main` branch in any special way, I think of `main` differently from other branches. Illustration of a vertical red line, labelled "main", which has four dots along it. Coming off of it is a black line, labelled "armadillo". The red dots on the red line are labelled "I think of my main branch as these 4 commits" #### How this shows up in git: It's what `git log BRANCHNAME` shows you! How `git log main` works: Illustration of a vertical line with four dots along it. The dot at the top is labelled `main` (start here). The lines between the dots are labelled "parent". ### 3. just the commit at the end This is how branches are actually implemented in git. Illustration of a vertical black line, labelled "main", which has four dots along it. Coming off of it is a red line, labelled "armadillo". The final dot along "armadillo" is labelled "the latest commit on the branch" #### How this shows up in git: It's how branches are stored internally: a branch is fundamentally a name for a commit ID. `.git/refs/heads/main` (branch name) `a276f62` (ID of the latest commit on the branch)
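All three views can be rebuilt from the third one (a branch is just a name for a commit). This is a toy model with hypothetical commit IDs and no merge commits. `log` follows parents the way `git log BRANCH` does. `only_on` gives the "commits that branch off" view, similar to `git log main..armadillo`. It works in both directions, which is why git can't tell which branch "branched off" which.

```python
# Hypothetical commit graph: commit ID -> parent ID (None for the first commit).
PARENTS = {"c1": None, "c2": "c1", "c3": "c2", "c4": "c3", "a1": "c2", "a2": "a1"}
# A branch is just a name pointing at one commit (like .git/refs/heads/<name>).
BRANCHES = {"main": "c4", "armadillo": "a2"}

def log(branch: str) -> list:
    """Like `git log BRANCH`: start at the branch's commit and follow parents."""
    commit, history = BRANCHES[branch], []
    while commit is not None:
        history.append(commit)
        commit = PARENTS[commit]
    return history

def only_on(branch: str, other: str) -> list:
    """Commits reachable from `branch` but not from `other` (like `git log other..branch`)."""
    excluded = set(log(other))
    return [c for c in log(branch) if c not in excluded]

print(log("main"))                    # ['c4', 'c3', 'c2', 'c1']
print(only_on("armadillo", "main"))   # ['a2', 'a1']
print(only_on("main", "armadillo"))   # ['c4', 'c3']
```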
the current branch: HEAD
### HEAD is a tiny file containing the name of the current branch Diagram of three boxes in a row, joined by lines. One has a heart, one has a star, and one has a squiggle. The final one, with the squiggle, is labelled "`main`". `HEAD` = `main` `main` = [squiggle] ### when you commit, git updates the current branch to point at the new commit Diagram of three boxes in a row, joined by lines. One has a heart, one has a star, and one has a squiggle. The final one, with the squiggle, is labelled "`main`". `HEAD` = `main` `main` = [squiggle] Diagram of four boxes in a row, joined by lines. One has a heart, one has a star, one has a squiggle, and one has a spiral. The final one, with the spiral, is labelled "`main`". `HEAD` = `main` `main` = [spiral] ### SO MANY things in git use the current branch * `git commit` moves it forward * `git merge` merges into it * `git rebase` copies commits from it * `git push` and `git pull` sync it with a remote ### many git disasters are caused by accidentally running a command while on the wrong branch Illustration of a sad stick figure person: `git commit` person, thinking: UGH I didn't mean to do that on `main` ### I keep my current branch in my shell prompt `~/work/homepage (main) $` to me it's as important as knowing what directory I'm in ### panel 6 Illustration of a smiling stick figure with curly hair. person: I think `HEAD` is a weird name for the current branch (why not `CURRENT` or something?) but we're stuck with it
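Here is a toy model of "HEAD names a branch, and the branch names a commit". In a real repository, `.git/HEAD` usually contains a line like `ref: refs/heads/main`. The `Repo` class and its `c1`, `c2`, ... IDs are hypothetical stand-ins for real SHAs. The point is that `commit()` only ever moves whichever branch HEAD currently names, so committing "on the wrong branch" moves the wrong branch.

```python
class Repo:
    """Toy model: HEAD names a branch, and that branch names a commit."""

    def __init__(self):
        self.head = "main"              # like .git/HEAD containing "ref: refs/heads/main"
        self.branches = {"main": None}  # branch name -> latest commit ID
        self.parents = {}               # commit ID -> parent commit ID
        self._counter = 0

    def commit(self) -> str:
        self._counter += 1
        new_id = f"c{self._counter}"              # hypothetical IDs instead of SHAs
        self.parents[new_id] = self.branches[self.head]
        self.branches[self.head] = new_id         # only the current branch moves
        return new_id

    def create_branch(self, name: str) -> None:
        self.branches[name] = self.branches[self.head]  # starts at the current commit

    def checkout(self, name: str) -> None:
        if name not in self.branches:
            raise KeyError(f"no branch named {name!r}")
        self.head = name

repo = Repo()
repo.commit(); repo.commit()            # main -> c2
repo.create_branch("feature")
repo.checkout("feature")
repo.commit()                           # feature -> c3, main stays at c2
print(repo.branches)                    # {'main': 'c2', 'feature': 'c3'}
```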
rules for rebasing
### don't rebase a million tiny commits you can end up having to fix the same merge conflict 25 times and it's a nightmare. instead, do it in 2 steps: 1. squash into 1 commit with `git rebase -i` 2. `git rebase main` ### don't force push to a shared branch it's totally ok if it's your own branch that nobody else will ever have to git pull from, but if other people are using it, it makes things weird ### don't do more than one thing in a `git rebase -i` you can * combine commits * reorder commits * edit commits but don't do all of them at once! It's too confusing! ### don't rebase other people's commits I only modify my own commits ### stop a rebase if it's going badly it's MUCH easier to run `git rebase --abort` and bail out than to have to undo it later. It'll take you back to where you were before the rebase. ### you never have to rebase the only reason to rebase is to tidy up your git history. If you're not comfortable rebasing then just don't do it! You can merge or `git commit --amend` instead
remote branch caching
### the "up to date" in `git status` is misleading ``` $ git status Your branch is up to date with origin/main ``` this does NOT mean that you're up to date with the remote main branch. But why not??? ### some old version control systems only worked if you were online Illustration of a sad stick figure with short curly hair. person (thinking): my internet went out, guess I can't work ### git works offline Illustration of a smiling stick figure with short straight hair. git developer (thinking): I want to be able to code on a train with no internet git developer (thinking): NOTHING in git will use the internet except `git pull`, `git push`, and `git fetch` ### this makes `git status` weird git developer (thinking): we need to tell people if their branch is up to date... with NO INTERNET??? how? ### solution: CACHING Every remote branch has a local cache named like `origin/mybranch` (`origin` is the remote name, `mybranch` is the branch name) Git doesn't call it a cache though, it calls it a "remote tracking branch" local branch: `mybranch` cache: `origin/mybranch` (only updated on `git pull`, `git push`, `git fetch`) remote branch: `origin mybranch` (`git push origin mybranch` updates this) (git has no easy way to see when `origin/mybranch` was last updated)
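The caching idea can be sketched as a toy model. The `Clone` class is hypothetical, and the `remote` dictionary stands in for the server. Only `fetch()` reads it, just as only `git fetch`/`pull`/`push` use the network. `status()` compares the local branch with the cached `origin/<branch>`, so it can say "up to date" after someone else has pushed.

```python
class Clone:
    """Toy clone that only talks to the remote when you fetch."""

    def __init__(self, remote: dict, branch: str):
        self.remote = remote                     # stands in for the server: branch -> commit
        self.branch = branch
        self.local = {branch: remote[branch]}
        self.tracking = {f"origin/{branch}": remote[branch]}  # the cache ("remote tracking branch")

    def fetch(self) -> None:
        # The only method that reads `self.remote`, like `git fetch`.
        self.tracking[f"origin/{self.branch}"] = self.remote[self.branch]

    def status(self) -> str:
        # Compares against the cache, never the live remote.
        cached = self.tracking[f"origin/{self.branch}"]
        if self.local[self.branch] == cached:
            return f"Your branch is up to date with 'origin/{self.branch}'."
        return f"Your branch and 'origin/{self.branch}' point at different commits."

server = {"main": "c1"}
clone = Clone(server, "main")
server["main"] = "c2"        # a coworker pushes while we're offline
print(clone.status())        # still claims "up to date": the cache is stale
clone.fetch()
print(clone.status())        # now it notices the difference
```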
orphan commits
### commits in git are usually saved forever Except! Orphaned commits are deleted periodically. Illustration of a little garbage can. Commits are orphaned when you: - `git commit --amend` - `git rebase` - delete a branch that hasn't been merged ### what is an orphaned commit? it's a commit that isn't in the history of any branch. They're almost totally invisible, since Git will usually only show you commits on branches ### orphan #1: `git commit --amend` before: An illustration of a box that says `parent`, with a line to a second box that says `fix color buug` (typo!). The second box is labelled `main` branch. after: The same diagram as above, but there is now a second line coming out of the `parent` box, going to a third box that says `fix color bug`. The `fix color buug` box is now labelled "now it's an orphan!" and the `fix color bug` box is labelled "`main` branch". ### orphan #2: `git rebase` before: A box with two branches coming out of it. The top one is labelled "`main` branch". The second branch has two boxes, one with a heart, and one with a star. This branch is labelled "`feature` branch". after: A box with two branches coming out of it. The top branch consists of three boxes, one blank, one with a heart, and one with a star. The blank box is labelled "`main` branch", and the box with the star is labelled "`feature` branch". The second branch consists of two boxes, one with a heart, and one with a star. This branch is labelled "now these two are orphans!" ### orphan #3: `deleting unmerged branch` before: A box with two branches coming out of it. The first branch consists of one blank box, labelled "`main` branch". The second branch consists of two boxes, one with a heart, and one with a star. This branch is labelled "`feature` branch". after deleting `feature`: The same diagram as above, except that the second branch is now labelled "now these two are orphans!"
### how to find orphan commits the only way to find them is with `git reflog` (or by memorizing their commit ID somehow)
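Here is a small sketch of orphan #1 and its rescue, assuming bash and git. It uses a scratch repo with a hypothetical `Demo` identity. The amended-away commit vanishes from `main`'s history, but `HEAD@{1}` in the reflog still points at it.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"                          # throwaway demo repo
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo blue > color.txt
git add color.txt && git commit -q -m "fix color buug"
old=$(git rev-parse HEAD)                  # remember the soon-to-be orphan
git commit -q --amend -m "fix color bug"   # replaces it with a brand new commit

# The old commit is no longer in main's history...
if git merge-base --is-ancestor "$old" main; then echo "still on main"; else echo "orphaned"; fi
# ...but the reflog remembers where HEAD was one step ago
git reflog -n 2
git branch rescued 'HEAD@{1}'              # put the orphan on a branch again
```

Putting it back on a branch (here the hypothetical `rescued`) is what protects it from garbage collection.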
oh shit! I want to undo something from 5 commits ago!
If you made a mistake but want to keep all of the commits since then, `git revert` is your friend! `git revert` will create a reverse patch for the changes in a commit and add it as a new commit. 1. Find the commit SHA for the commit you want to undo. 2. Run: `git revert SHA` 3. Enter a commit message for the revert commit. Now all of the changes you made in that commit are undone! person: this is super useful if you push a bad commit to a shared repository and need to undo it!
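A sketch of those steps in a scratch repo, assuming bash and git. Each hypothetical commit touches its own file so the revert can't conflict, and `--no-edit` accepts git's default message instead of opening an editor for step 3.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

# six commits, each adding its own file
for i in 1 2 3 4 5 6; do
  echo "change $i" > "file$i.txt"
  git add "file$i.txt" && git commit -q -m "add file$i"
done

bad=$(git rev-parse HEAD~4)     # 1. find the SHA ("add file2")
git revert --no-edit "$bad"     # 2./3. reverse patch, committed as a new commit
git log --oneline -n 3
```

The later commits (`file3` to `file6`) are untouched, and the history only grows, which is why this is safe on a shared branch.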
oh shit! I want to split my commit into 2 commits!
1. Stash any uncommitted changes (so they don't get mixed up with the changes from the commit): `git stash` 2. Undo your most recent commit: `git reset HEAD^` (safe: this points your branch at the parent commit but doesn't change any files) 3. Use `git add` to pick and choose which files you want to commit and make your new commits! 4. Get your uncommitted changes back: `git stash pop` person: you can use `git add -p` if you want to commit some changes to a file but not others!
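The four steps above as a runnable sketch, assuming bash and git. The file names, commit messages and the work-in-progress edit are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo base > README && git add README && git commit -q -m "initial"
echo fix > bugfix.txt; echo feature > feature.txt     # hypothetical files
git add bugfix.txt feature.txt && git commit -q -m "bugfix + feature"
echo "work in progress" >> README                     # unrelated, uncommitted

git stash -q                                      # 1. set the WIP aside
git reset -q HEAD^                                # 2. undo the commit, keep the files
git add bugfix.txt && git commit -q -m "bugfix"   # 3. first new commit...
git add feature.txt && git commit -q -m "feature" #    ...and the second
git stash pop -q                                  # 4. the WIP is back
git log --format=%s
```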
oh shit! I tried to run a diff but nothing happened!
Suppose you've edited 2 files ``` $ git status On branch main Changes to be committed: modified: staged.txt Changes not staged for commit: modified: unstaged.txt ``` (`modified: staged.txt` is a staged change, added with `git add`; `modified: unstaged.txt` is an unstaged change.) Here are the 3 ways git can show you a diff for these changes: - `git diff`: unstaged changes - `git diff --staged`: staged changes - `git diff HEAD`: staged+unstaged changes A couple more diff tricks: - `git diff --stat` gives you a summary of which files were changed & the number of added/deleted lines - `git diff --check` checks for merge conflict markers & whitespace errors
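To see the three diffs side by side, here's a sketch that recreates the `staged.txt` / `unstaged.txt` situation in a scratch repo, assuming bash and git. `--name-only` just shortens the output to file names.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo v1 > staged.txt; echo v1 > unstaged.txt
git add staged.txt unstaged.txt && git commit -q -m "initial"
echo v2 > staged.txt && git add staged.txt   # staged change
echo v2 > unstaged.txt                       # unstaged change

git diff --name-only            # unstaged.txt
git diff --staged --name-only   # staged.txt
git diff HEAD --name-only       # staged.txt and unstaged.txt
git diff HEAD --stat            # summary with added/deleted line counts
```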
oh shit! I started rebasing and now I have 1000000 conflicts to fix!
This can happen when you're rebasing many commits at once. 1. Escape the rebase of doom: `git rebase --abort` 2. Find the commit where your branch diverged from main: `git merge-base main my-branch` 3. Squash all the commits in your branch together: `git rebase -i $SHA_YOU_FOUND` 4. Rebase on main: `git rebase main` person: alternatively, if you have 2 branches with many conflicting commits, you can just merge!
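A scripted version of steps 2 to 4, assuming bash and git. One swap to flag: step 3 uses `git reset --soft` plus a fresh commit instead of `git rebase -i`, because an interactive rebase needs an editor. Both squash the branch into one commit. Branch names and commits are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo base > app.txt && git add app.txt && git commit -q -m "base"
git checkout -q -b my-branch
for i in 1 2 3; do
  echo "step $i" >> feature.txt
  git add feature.txt && git commit -q -m "wip $i"
done
git checkout -q main
echo "main work" > other.txt && git add other.txt && git commit -q -m "main work"
git checkout -q my-branch

base=$(git merge-base main my-branch)   # 2. where my-branch diverged from main
git reset -q --soft "$base"             # 3. (stand-in for rebase -i) keep changes staged...
git commit -q -m "my-branch, squashed"  #    ...and record them as a single commit
git rebase -q main                      # 4. only one commit left to replay
git log --oneline
```

With one commit to replay, you fix each conflict at most once instead of once per commit.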
oh shit! I need to change the message on my last commit!
No problem! Just run: `git commit --amend` Then edit the commit message & save! `git commit --amend` will replace the old commit with a new commit with a new SHA, so you can always go back to the old version if you really need to. Person: if you run `git commit` but change your mind, you can always abort by deleting the commit message & saving + quitting. Or quit without saving!
oh shit! I have a merge conflict!
Suppose you had `main` checked out and ran `git merge feature-branch`. If that causes a merge conflict, you'll see something like this in the files with conflicts: ``` <<<<<<< HEAD if x == 0: return false ======= ``` (this is the code from `main`) ``` if y == 6: return true elif x == 0: return false >>>>>>> feature-branch ``` (this is the code from `feature-branch`) ### To resolve the conflict: 1. Edit the files to fix the conflict 2. `git add` the fixed files 3. `git diff --check`: check for more conflicts 4. `git commit` when you're done (or `git rebase --continue` if you're rebasing!) Smiling stick figure with medium length straight hair: You can use a GUI to visually resolve conflicts with `git mergetool`. Meld (meldmerge.org) is a great choice!
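A sketch that produces this exact kind of conflict and then walks through the four resolution steps, assuming bash and git. The `check.py` contents are made-up pseudo-code, and both branches edit the same first line so the merge is guaranteed to conflict.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

printf 'if x == 0:\n    return false\n' > check.py
git add check.py && git commit -q -m "base"
git checkout -q -b feature-branch
printf 'if y == 6:\n    return true\nelif x == 0:\n    return false\n' > check.py
git commit -q -am "feature change"
git checkout -q main
printf 'if x == 0 or x is None:\n    return false\n' > check.py   # same line, edited differently
git commit -q -am "main change"

git merge feature-branch || true   # stops with a CONFLICT
conflicted=$(cat check.py)         # <<<<<<< HEAD ... ======= ... >>>>>>> feature-branch
echo "$conflicted"

# 1. edit the file (here: keep both ideas)
printf 'if y == 6:\n    return true\nelif x == 0 or x is None:\n    return false\n' > check.py
git add check.py                   # 2. mark it as resolved
git diff --check                   # 3. flags leftover markers in changes not yet added
git commit -q --no-edit            # 4. finish the merge with the default message
```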
oh shit! I did something terribly wrong, does git have a magic time machine?
Yes! It's called `git reflog` and it logs every single thing you do with git so that you can always go back. Suppose you ran these git commands: ``` git checkout my-cool-branch (1) git commit -am "add cool feature" (2) git rebase master (3) ``` Here's what git reflog's output would look like. It shows the most recent actions first: ```245fc8d HEAD@{2}: rebase -i (start): checkout master``` (3) ```b623930 HEAD@{3}: commit: add cool feature``` (2) ```01d7933 HEAD@{4}: checkout: moving from master to my-cool-branch``` (1) If you really regret that rebase and want to go back, here's how: ```git reset --hard b623930``` ```git reset --hard HEAD@{3}``` (2 ways to refer to that commit before the rebase)
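A sketch of the same story in a scratch repo, assuming bash and git. The `HEAD@{N}` numbers depend on exactly what you ran, so instead of counting HEAD entries this uses the branch's own reflog: `my-cool-branch@{1}` means "where this branch pointed one change ago", which is the commit from before the rebase. File names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"; git config user.email "demo@example.com"
echo a > a.txt && git add a.txt && git commit -q -m "initial"

git checkout -q -b my-cool-branch                                   # (1)
echo cool > cool.txt && git add cool.txt && git commit -q -m "add cool feature"  # (2)
before_rebase=$(git rev-parse HEAD)
git checkout -q master
echo b > b.txt && git add b.txt && git commit -q -m "master moves on"
git checkout -q my-cool-branch
git rebase -q master                                                # (3)

git reflog -n 6                               # most recent first
git reset -q --hard 'my-cool-branch@{1}'      # undo the rebase
```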
oh shit! I committed but I want to make one small change!
1. Make your change 2. Add your files with `git add` 3. Run: `git commit --amend --no-edit` person: this usually happens to me when I forget to run tests/linters before committing! You can also add a new commit and use `git rebase -i` to squash them but this is about a million times faster.
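The three steps in a scratch repo, assuming bash and git, with a hypothetical `app.py`. Note that the commit keeps its message but gets a new ID, because its contents changed.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo 'print("hi")' > app.py
git add app.py && git commit -q -m "add app"
before=$(git rev-parse HEAD)

echo "# fix the linter complained about" >> app.py   # 1. the small change
git add app.py                                       # 2. stage it
git commit -q --amend --no-edit                      # 3. fold it in, keep the message
git log --oneline
```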
oh shit! I committed a file that should be ignored!
Did you accidentally commit a 1.5GB file along with the files you actually wanted to commit? We've all done it. 1. Remove the file from Git's index: `git rm --cached FILENAME` This is safe: it won't delete the file 2. Amend your last commit: `git commit --amend` 3. (optional) Edit your `.gitignore` so it doesn't happen again person: now your coworkers won't be stuck downloading a HUGE git commit
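A sketch of the fix, assuming bash, git and a `head` that supports `-c`. A 1 MiB file of zeros stands in for the huge one, and the file names are hypothetical. This only keeps the file out of history if you amend before pushing.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo 'int main(void) { return 0; }' > main.c
head -c 1048576 /dev/zero > huge.bin        # stand-in for the accidental big file
git add main.c huge.bin && git commit -q -m "add main.c"

git rm -q --cached huge.bin                 # 1. untrack it; the file stays on disk
git commit -q --amend --no-edit             # 2. rewrite the commit without it
echo "huge.bin" >> .gitignore               # 3. (optional) don't let it happen again
git show --stat --format=%s HEAD
```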
merge commits
### merging 2 diverged branches creates a commit `git merge mybranch` Diagram of two boxes in a row, one with a heart, and one with a star. From the star, it branches out into a branch with a hash symbol, labelled `main`. The other branch coming off of the star has a box with a spiral followed by a box with a spiky symbol. The two branches converge in a box with a diamond symbol, labelled "merge commit!". merge commits have a few surprising gotchas! ### gotcha: merging isn't symmetric normal: ``` git checkout main git merge mybranch ``` weird: ``` git checkout mybranch git merge main ``` these two result in the same code, but the merge commit's parents have a different order This comes up when you use `HEAD^`: it refers to the first parent, and usually you want that to be the commit from the main branch ### gotcha: you can keep coding during a merge If you forget you're doing a merge, it's easy to accidentally keep writing code and add a bunch of unrelated changes into the merge commit. I use my prompt to remind me. ### gotcha: git show doesn't tell you what the merge commit did It'll often just show the merge commit as "empty" even if the merge did something important (like discard changes from one side). Illustration of a tiny sad stick person with curly hair person: why ### tip: see what a merge did with `git show --remerge-diff` `git show --remerge-diff COMMIT_ID` will re-merge the parents and show you the difference between the original merge and what's actually in the merge commit
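A sketch of the "merging isn't symmetric" gotcha, assuming bash and git. It builds both merges in one scratch repo. The hypothetical `weird` branch holds the "merge main into mybranch" version, so you can compare first parents (`^1`) and trees.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b mybranch
echo feature > feature.txt && git add feature.txt && git commit -q -m "feature work"
git checkout -q main
echo fix > fix.txt && git add fix.txt && git commit -q -m "main work"

git checkout -q -b weird mybranch     # "weird" direction: merge main into mybranch
git merge -q --no-edit main
git checkout -q main                  # "normal" direction: merge mybranch into main
git merge -q --no-edit mybranch

git log -1 --format=%s 'main^1'               # first parent: main work
git log -1 --format=%s 'weird^1'              # first parent: feature work
git rev-parse 'main^{tree}' 'weird^{tree}'    # same tree ID twice: same code
```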
meet the merge
### merging is a huge thing in git But the terminology around merging is a bit confusing: - `git merge` isn't the only way to combine branches: you can also use `git rebase`! - merge conflicts (surrounded by sad faces) can happen if you do any of these: - `git merge` - `git rebase` - `git cherry-pick` - `git revert` - `git stash pop` - merge commits are only created by `git merge` Illustration of two stick figures talking, one is bald and looks unhappy, the other has curly hair and is smiling. person 1: ... and what the heck is "fast forward"? person 2: let's talk about it! ### there are 3 situations when combining branches 1. easy: no divergence ("fast-forward") Diagram of a box with a heart in it, labelled "main". Branching off it in a horizontal line are three boxes with a star, a hash symbol, and a squiggle. The squiggle box is labelled "panda". git merge moves the main branch forward to where the panda branch is, like this: Same diagram as above, except now the squiggle box is labelled "main" as well as "panda". 2. harder: diverged branches, no conflicts Diagram of two boxes in a horizontal line, one with a heart, and one with a star. Branching off of the star box are two boxes, one with a hash symbol and one with a spiral. These two boxes are labelled "editing different code". you have to decide whether to merge or rebase, but it'll succeed 3. hardest: diverged branches with merge conflicts The same diagram as above, except now the two final boxes are labelled "editing the same code", and there is a sad stick figure standing beside them. you have to decide whether to merge or rebase, AND fix a merge conflict ### git merge checks for these 3 situations in order 1. is this the "easy" situation? - if yes, fast forward! - if no, run the merge 2. is there a merge conflict? - if no, done! 3. if yes, tell you to manually resolve the conflict ### `git pull` needs to combine branches too `git pull` will ONLY fast forward (easy mode) by default. If it can't, it'll ask you to specify if you want to rebase or merge. `git pull --rebase` runs `git rebase` `git pull --no-rebase` runs `git merge`
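A sketch of situations 1 and 2, assuming bash and git, with the diagram's symbols as hypothetical file names. `git merge --ff-only` only does the "easy mode" move and refuses otherwise, which is roughly what `git pull` does by default.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo heart > heart.txt && git add heart.txt && git commit -q -m "heart"
git checkout -q -b panda
for s in star hash squiggle; do
  echo "$s" > "$s.txt" && git add "$s.txt" && git commit -q -m "$s"
done
git checkout -q main

# 1. easy: main hasn't moved, so it just jumps forward to panda (no new commit)
git merge -q --ff-only panda
if [ "$(git rev-parse main)" = "$(git rev-parse panda)" ]; then echo "fast-forwarded"; fi

# 2. harder: make the branches diverge (different files, so no conflict)
echo spiral > spiral.txt && git add spiral.txt && git commit -q -m "spiral"
git checkout -q panda
echo diamond > diamond.txt && git add diamond.txt && git commit -q -m "diamond"
git checkout -q main
git merge -q --ff-only panda 2>/dev/null || echo "can't fast-forward: diverged"
git merge -q --no-edit panda          # a real merge commit this time
```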
meet the commit
### commits never change once you've made a commit, it's set in stone: - the files in it never change - its diff never changes - its history never changes - the message/author never change ### commit hashes commits never change because their ID is calculated from their contents. Illustration of a box labelled `sha1 hash`. Going into the box are: - every file - parent(s) - message - author - timestamp Coming out of the box is an arrow labelled `3530a42`. ### you can think of commits as a pile of diffs Illustration of a stack of boxes connected with lines. Each box is labelled "diff", except for the bottom one, which is labelled "START". The top box has an arrow pointing to it that says "current". if you combine all the diffs together, you'll get the current state of the project! (not how Git works, but a VERY useful way to think about commits!) ### you can also think of commits as a pile of snapshots Illustration of a stack of boxes connected with lines. Each box is labelled "snapshot", except for the bottom one, which is labelled "START". The top box has an arrow pointing to it that says "current". this is how Git is implemented! confused bald stick figure: is git saving a NEW copy, EVERY TIME?? happy stick figure with curly hair: not quite! it has some tricks! (on the next page) ### diffs are calculated from snapshots Illustration of two boxes, one on top of the other, connected with lines. Both boxes are labelled "snapshot". the diff is the difference between a commit and its parent happy stick figure with curly hair: hey what's the diff for `353ea42`? git, represented by a box with a smiley face: let me calculate that REALLY FAST! ### things git can do with a commit - get the files in the commit (like `git checkout`) - calculate the diff from its parent (like `git show`) - merge it with another commit (like `git merge`) - look at its parents, grandparents, etc (like `git log`)
meet the branch
### theoretically you could use git without branches You could keep track of your commit IDs manually: Illustration of a smiling stick figure with medium-length straight hair. person: hmm, what was I working on? oh yes, `a38b997`! But most people use branches. ### every branch has 3 things - a name (like `main`) - a latest commit (like `2e9ffc`) - a reflog of how that branch has evolved over time (page 26) Branches also sometimes have a corresponding remote branch which they "track" ### branches are core to how git stores your work If your commits are "lost" (not on a branch) (page 13): - (sad face) git's garbage collection will eventually delete them - (sad face) they'll become incredibly difficult to find ### the only difference between the main branch and any other branch is how you treat them For example: it's common to never commit to main directly, and instead commit to other branches which you merge into main when you're done. ### all changes to a branch are recorded in its reflog The reflog records every rebase, amended commit, pull, merge, reset, commit, etc. You can look at the reflog like this: `git reflog BRANCHNAME` reflog stands for "reference log" (not re-flog) (smiley face) ### git will let you do literally anything with a branch - when you push/pull a branch, the local branch name doesn't have to match the remote branch name - you can remove commits from a branch with `git reset` Git often won't protect you from messing up your branch!
losing your work
### people are always saying: Illustration of two stick figures talking. One is bald and smiling, the second has long curly hair and is frowning. person 1: don't worry! it's impossible to lose your work in git! person 2 (thinking): my lost work says otherwise but some parts of git are MUCH safer than others ### commits on a branch / tag (lock icon) never change Illustration of a smiling stick figure with curly hair. Their speech bubble is surrounded by hearts and stars. person: you can ALWAYS use the commit ID to get your work back! ### orphan commits (lock icon) never change, except... they'll eventually get deleted by git's garbage collection (usually not for a few months though) ### branches and `HEAD` (unlocked lock icon) change ALL THE TIME (clock going backwards icon) BUT there's a history of all the changes in the reflog Tiny cute illustration of a smiling stick figure with curly hair. person: the reflog is NOT easy to use but at least it's there ### staging area (unlocked lock icon) changes ALL THE TIME (crossed out clock going backwards icon) no history (sad face) just gotta be careful ### the stash (crossed out clock going backwards icon) `git stash pop` deletes entries forever ... but you can technically get them back by using `git fsck` to search EVERY SINGLE COMMIT
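Here is a sketch of the "technically you can get them back" escape hatch for a dropped stash, assuming bash, git and awk. It follows the approach in the `git stash` docs: the dropped entry survives as a "dangling commit" until garbage collection, and `git stash apply` accepts any commit that looks like a stash entry. The file and edit are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo v1 > notes.txt && git add notes.txt && git commit -q -m "initial"
echo "precious edit" >> notes.txt
git stash -q
git stash drop -q             # gone from `git stash list`...

# ...but still in the object database as a dangling commit
lost=$(git fsck --no-reflogs 2>/dev/null | awk '/^dangling commit/ {print $3}' | head -n 1)
echo "found: $lost"
git stash apply -q "$lost"
```

In a real repo there may be several dangling commits, so you'd look at each one (e.g. with `git show`) before applying it.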
let's explore a commit
### panel 1: you can see for yourself how git is storing your files! You just need one command: `git cat-file -p` First, get a commit ID. You can get one from `git log` ### panel 2: read the commit ``` $ git cat-file -p 3530a4 tree 22b920 parent 56cfdc author Julia <julia@fake.com> 1697682215 -0500 committer Julia <julia@fake.com> 1697682215 -0500 ``` ### panel 3: read the directory ``` $ git cat-file -p 22b920 100644 blob 4fffb2 .gitignore 100644 blob e351d9 404.html 100644 blob cab416 Cargo.toml 100644 blob fe442d hello.html 040000 tree 9de29f src ``` ### panel 4: read a file ``` $ git cat-file -p fe442d <!DOCTYPE html> <html lang="en"> <body> <h1>Hello!</h1> </body> </html> ``` ### panel 5: and we're done! `fe442d` is the sha1 hash of the file's contents (with a short `blob <size>` header in front). It's called a "blob id". This is how git keeps things efficient: it only needs to make a new copy when the file changes
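A sketch that checks the blob-id claim by hand, assuming bash, git with the default SHA-1 object format, and GNU coreutils' `sha1sum` (on macOS, `shasum` plays the same role). Git hashes the header `blob <size>` plus a NUL byte, followed by the file's bytes.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)" && git init -q

printf 'hello world!!!!\n' > hello.txt
blob=$(git hash-object -w hello.txt)   # -w also stores it in .git/objects
git cat-file -t "$blob"                # blob
git cat-file -p "$blob"                # hello world!!!!

# The same ID by hand: sha1 of "blob <size>\0" followed by the content
manual=$({ printf 'blob %d\0' "$(wc -c < hello.txt)"; cat hello.txt; } | sha1sum | cut -d' ' -f1)
echo "$blob"
echo "$manual"                         # matches the line above
```

Because the ID depends only on the bytes, two files (or two commits' versions of a file) with identical contents share one blob.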
inside .git
### `HEAD` `HEAD` is a tiny file that just contains the name of your current branch `.git/HEAD` `ref: refs/heads/main` `HEAD` can also be a commit ID, that's called "detached `HEAD` state" ### branches a branch is stored as a tiny file that just contains 1 commit ID. It's stored in a folder called `refs/heads`. `7622629` - (actually 40 characters) tags are in `refs/tags`, the stash is in `refs/stash` ### commit a commit is a small file containing its parent(s), message, tree, and author `.git/objects/7622629` ``` tree c4e6559 parent 037ab87 author Julia <x@y.com> 1697682215 committer Julia <x@y.com> 1697682215 commit message goes here ``` these are compressed, the best way to see objects is with `git cat-file -p HASH` ### trees trees are small files with directory listings. The files in it are called "blobs" `.git/objects/c4e6559` ``` 100644 blob e351d93 404.html 100644 blob cab4165 hello.py 040000 tree 9de29f7 lib ``` the permissions here LOOK like unix permissions, but they're actually super restricted, only 644 and 755 are allowed ### blobs blobs are the files that contain your actual code `.git/objects/cab4165` `print("hello world!!!!")` ### reflog the reflog stores the history of every branch, tag, and `HEAD` `.git/logs/refs/heads/main` ``` 2028ee0 c1f9a4c Julia Evans <x@y.com> 1683751582 commit: no ligatures in code ``` each line of the reflog has: - before/after commit IDs - user - timestamp - log message ### remote-tracking branches remote-tracking branches store the most recently seen commit ID for a remote branch `.git/refs/remotes/origin/main` `a9bbcae` when git status says "you're up to date with `origin/main`", it's just looking at this ### .git/config .git/config is a config file for the repository.
it's where you configure your remotes `.git/config` ``` [remote "origin"] url = git@github.com:jvns/int-exposed fetch = +refs/heads/*:refs/remotes/origin/* [branch "main"] remote = origin merge = refs/heads/main ``` git has local and global settings: the local settings are here and the global ones are in `~/.gitconfig` ### hooks hooks are optional scripts that you can set up to run (eg before a commit) to do anything you want `.git/hooks/pre-commit` ``` #!/bin/bash any-commands-you-want ``` ### the staging area the staging area stores files when you're preparing to commit `.git/index` `(binary file)`
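A sketch for poking at these files in a scratch repo, assuming bash and git with the default "files" ref storage. After `git gc`, branch files can move into `.git/packed-refs`, so `cat .git/refs/heads/main` is only guaranteed to work in a fresh repo like this one. The `hello.py` contents are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"
echo 'print("hello world!!!!")' > hello.py
git add hello.py && git commit -q -m "commit message goes here"

cat .git/HEAD                   # ref: refs/heads/main
cat .git/refs/heads/main        # the branch: one 40-character commit ID
git cat-file -p HEAD            # tree, author, committer, message
git cat-file -p 'HEAD^{tree}'   # 100644 blob <id> hello.py
cat .git/logs/refs/heads/main   # reflog: old ID, new ID, user, timestamp, message
git config --local --list       # this repo's .git/config settings
```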
HEAD is the commit you have checked out
[git]
In git you always have some commit checked out. `HEAD` is a pointer to that commit and you'll see `HEAD` used a lot in this zine. Like a branch, `HEAD` is just a text file. Run `cat .git/HEAD` or `git status` to see the current `HEAD`. Examples of how to use HEAD: - show the diff for the current commit: `git show HEAD` - UNDO UNDO UNDO UNDO: reset branch to 16 commits ago `git reset --hard HEAD~16` (`HEAD~16` means 16 commits ago) - show what's changed since 6 commits ago: `git diff HEAD~6` - squash a bunch of commits together `git rebase -i HEAD~8` (this opens an editor, use "fixup" to squash commits together)
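A sketch of the `HEAD~N` examples, assuming bash and git. It makes 20 hypothetical commits with the `{1..20}` brace trick so there's something to count back through.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"
for i in {1..20}; do
  echo "$i" > counter.txt && git add counter.txt && git commit -q -m "commit $i"
done

git show --stat HEAD             # the diff for the current commit
git diff --stat HEAD~6           # what's changed since 6 commits ago
git log -1 --format=%s HEAD~16   # commit 4
git reset -q --hard HEAD~16      # UNDO: main now points at "commit 4"
```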
HEAD and heads
### panel 1: have you ever seen refs/heads/main or HEAD and wondered what they mean? here's the deal: * `head` = branch * `HEAD` = current branch (yes, these are TERRIBLE names) ### panel 2: a head in git is a branch nobody really uses the term "head" for a branch except the official git docs though ### panel 3: HEAD is the current branch for example HEAD could be set to main it's stored in .git/HEAD Unless you don't have a current branch... ### panel 4: `HEAD` can be a commit ID instead of a branch This means you have no current branch. Git calls this a "detached HEAD state" (another terrible name!) (silly picture of a stick figure whose head has fallen off) fixing this is easy though: `git checkout BRANCHNAME` ### panel 5: the current branch matters for these commands ``` git commit git rebase git merge git cherry-pick ``` these 4 will work if you have no current branch but will create commits that you have no easy way to refer to ``` git pull git push ``` these don't work at all if you're not on a branch
git mistakes you can't fix
Most mistakes you make with git can be fixed. If you've ever committed your code, you can get it back. That's what the rest of this zine is about! Here are the dangerous git commands: the ones that throw away uncommitted work. - `git reset --hard COMMIT` 1. Throws away uncommitted changes 2. Points current branch at `COMMIT` Very useful, but be careful to commit first if you don't want to lose your changes - `git clean` Deletes files that aren't tracked by Git. - `git checkout BRANCH FILE` (or directory) Replaces FILE with the version from `BRANCH`. Will overwrite uncommitted changes.
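A sketch of two habits that take the danger out of these commands, assuming bash and git: preview `git clean` with `-n` (dry run), and set uncommitted work aside before `git reset --hard`. The comic suggests committing first; `git stash` is swapped in here as another way to save the changes. File names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo tracked > tracked.txt && git add tracked.txt && git commit -q -m "initial"
echo "scratch notes" > scratch.txt   # untracked file
echo "more work" >> tracked.txt      # uncommitted change

git clean -n                 # dry run: lists scratch.txt, deletes nothing
git stash -q                 # save the uncommitted change first...
git reset -q --hard HEAD     # ...so this can't throw it away
git stash pop -q             # and it's back
```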
git discussion bingo
A grid of boxes, like a bingo card, with the following text in them: - WTF is detached HEAD state - just use magit - subversion was so much worse - rewriting history is bad - I just do not care how git works - I hate git - git is a directed acyclic graph - I only know 5 commands - just spend 15 minutes learning git's internals - content addressed storage - git's design is so elegant - you have to understand the linux kernel dev workflow - a branch is just a pointer to a commit - something about "porcelain" - subversion was better - I've used git for 10 years and I have no idea how it works - mercurial is better - git is not github - the CLI is badly designed - merge sucks, only use rebase - something about Linus Torvalds - commits are immutable snapshots - you should just read Pro Git - rebase sucks, only use merge - I just delete my git repo if I mess it up
git branches: the rules
### branches have very few rules git lets you move branches forwards, backwards, or sideways if you want Illustration of three circles in a vertical line, with an additional branch extending out of the middle circle. The top circle is labelled "`main`". The middle circle is labelled "You could move `main` here." The circle in its own branch is labelled "or here." ### all changes to a branch are recorded in its reflog You can look at the reflog like this: `git reflog BRANCHNAME` reflog stands for "reference log" ### when you delete a branch, its reflog is deleted Illustration of a sad stick figure with short curly hair, talking to a box with a smiley face representing git. person: what if I wanted to look at the history of that branch to recover something? git: too bad! ### git will eventually delete any commit that isn't on a branch/tag/etc Illustration of four circles in a vertical line. The top one is labelled "`main`". There is a branch coming off of the second-from-bottom circle, and it is labelled "will be deleted by garbage collection after ~90 days unless you put it on a branch." ### `git branch -d` won't let you delete unmerged branches Illustration of three circles in a vertical line. The top one is labelled "`main`". There is a branch coming off of the bottom circle, labelled "my branch (not merged)" to delete an unmerged branch, you need to force it with `-D` ### rules git doesn't have about branches - when you push/pull a branch, the name doesn't have to match - the main branch doesn't have any special protections in git itself (though tools like GitHub can protect it)
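A sketch of `-d` versus `-D`, assuming bash and git, with a hypothetical `my-branch`. Since the branch's reflog disappears with it, the only way back is the commit ID. Here the script saves it up front; `git branch -D` also prints it ("was …") when it deletes.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
git checkout -q -b my-branch
echo idea > idea.txt && git add idea.txt && git commit -q -m "unmerged idea"
tip=$(git rev-parse HEAD)          # note the ID before deleting
git checkout -q main

git branch -d my-branch || echo "refused: my-branch isn't merged"
git branch -D my-branch            # forced: the branch and its reflog are gone
git branch rescued "$tip"          # possible only because we kept the ID
```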
fixing diverged remotes
### ways to reconcile two diverged branches Illustration of a sequence of boxes joined with lines. The first box is a star, the second box is a heart, and then it branches out into two boxes, one with a hash symbol and one with a squiggle. Hash symbol box is labelled “local main” and squiggle box is labelled “remote main” - combine the changes from both with (1) rebase or (2) merge! - throw out your local changes (3), optionally saving them to a new branch first! - throw out the remote changes (4) to get rid of something you accidentally pushed (be REAL careful with this one) ### reasons to throw away changes - I’ll throw away local changes if I accidentally committed to `main` instead of a new branch - I’ll throw away remote changes if I want to amend a commit after pushing it, and I’m the only one working on that branch ### 1. rebase ``` git pull --rebase git push ``` Illustration of four boxes (star, heart, squiggle, hash) in a straight line, labelled “local main” and “remote main” Many people like to run `git config pull.rebase true` to make this the default when they run `git pull` ### 2. merge ``` git pull --no-rebase git push ``` Illustration of two boxes (star and heart) that then diverge into two branches (hash and squiggle) then reconvene into a fifth box, with a diamond in it, labelled “local `main`” and “remote `main`” ### 3. throw away local changes ``` git switch -c newbranch git switch main git reset --hard origin/main ``` (the first line is labelled “optional: save your changes on `main` to `newbranch` so they’re not orphaned”) Illustration of two boxes (star and heart) that then diverge into two branches (hash and squiggle), which are labelled “new branch” and “local `main`, remote `main`” respectively. ### 4. throw away remote changes (DANGER!)
`git push --force` Illustration of two boxes (star and heart) that then diverge into two branches one with a hash symbol, labelled “local `main`, remote `main`”, and one with a squiggle, whose box is a dotted line, and that’s labelled “orphan”. I ONLY do this if there's nobody else working on the branch.
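A sketch of option 1 (rebase) end to end, assuming bash and git. A throwaway bare repository plays the remote, a second clone plays a coworker, and all names and identities are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"                   # stand-in for the real remote
git clone -q "$tmp/remote.git" "$tmp/me" 2>/dev/null   # (warns: empty repository)
cd "$tmp/me"
git config user.name "Me"; git config user.email "me@example.com"
echo star > star.txt && git add star.txt && git commit -q -m "star"
git branch -M main && git push -q -u origin main

# a coworker pushes "squiggle" while we commit "hash" locally
git clone -q -b main "$tmp/remote.git" "$tmp/coworker"
(cd "$tmp/coworker" && echo squiggle > squiggle.txt && git add squiggle.txt &&
  git -c user.name=Co -c user.email=co@example.com commit -q -m "squiggle" &&
  git push -q origin main)
echo hash > hash.txt && git add hash.txt && git commit -q -m "hash"   # diverged now

git pull -q --rebase    # replay "hash" on top of "squiggle"
git push -q             # a plain push works again: no --force needed
git log --format=%s     # hash, squiggle, star
```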
every git jargon
### config ``` .git/config hook .gitconfig alias global local ``` ### history ``` log blame bisect diff ``` ### commit ``` commit checkout tree-ish show patch apply remotes restore ``` ### staging area ``` index staged cached grep add status staging area ``` ### branches ``` HEAD refs/heads/main detached HEAD state head HEAD^, HEAD~, HEAD^^ reference symbolic reference reset tag main master reflog .. ... ``` ### other features ``` stash worktree subtree submodule revert ``` ### merging ``` merge conflict rebase interactive rebase fast forward merge cherry-pick squash ours/theirs ``` ### remotes ``` upstream downstream push pull fetch clone fork remote refspec origin ```
every git command I use
getting started: git init, git clone move between branches: git branch, git checkout, git switch restore old files: git checkout, git restore preparing to commit: git status, git add, git mv, git rm, git diff, git reset combining branches: git merge, git rebase, git cherry-pick working with others: git pull, git push, git fetch, git remote making commits: git commit configuring git: git config, git remote code archaeology: git blame, git log FILENAME, git log -S SEARCH, git show, git diff trash changes: git stash, git checkout ., git reset --hard, git rebase -i git troubleshooting: git log BRANCH, git status, git diff, git reflog editing history: git rebase -i, git reset --hard
every commit has a parent
[git]
Every commit (except the first one!) has a parent commit! You can think of your git history as looking like this: current commit - c6045c - `HEAD` - "make cats blue" parent - 304db6 - `HEAD^` - "add cats" grandparent - a92eab - `HEAD^^` - "fix typo" b29aff - "initial commit" `HEAD` always refers to the current commit you have checked out, and `HEAD^` is its parent. So if you want to go look at the code from the previous commit, you can run `git checkout HEAD^` commits don't always have 1 parent. Merge commits actually have 2 parents! `git log` shows you all the ancestors of the current commit, all the way back to the initial commit
diverged branches
### when pushing/pulling, the hardest problems are caused by diverged branches sad error messages: ``` ! [rejected] main -> main (non-fast-forward) ``` `fatal: Not possible to fast-forward, aborting` `fatal: Need to specify how to reconcile divergent branches.` ### what are diverged branches? it looks like this: Diagram with two blank boxes, followed by a box with a heart in it, that then branches out into two branches, one with a hash symbol in it, labelled "local main", and one with a squiggle in it, labelled "remote main". ### there are 4 possibilities with a remote branch 1. up to date (with a heart) Illustration of three boxes in a row, labelled both "local" and "remote" 2. need to pull Illustration of four boxes in a row. The second box in the sequence is labelled "local", the fourth box is labelled "remote". 3. need to push Illustration of four boxes in a row. The second box in the sequence is labelled "remote", the fourth box is labelled "local". 4. diverged (need to decide how to solve it) (sad face) Illustration of two boxes in a row, that then branches out into two branches. One of the branches has one box, labelled "remote", and the other branch has two boxes, labelled "local". ### how to tell your branches have diverged: `git status` 1. `$ git fetch` (get the latest remote state first) 2. `$ git status` Your branch and '`origin/main`' have diverged, and have 1 and 1 different commits each, respectively. (use "`git pull`" to merge the remote branch into yours) (diverged is highlighted) ### fix diverged branches before making more commits First illustration: two boxes in a row, then branches out into two branches, each with one box. It's labelled "not so bad to resolve..." Second illustration: two boxes in a row, then branches out into two branches, but each branch has a whole bunch of boxes. Illustration of a sad stick figure with curly hair. person: oh no ### there's no one solution Illustration of a smiling stick figure with curly hair.
person: on the next page we'll talk about some options!
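If you want the "1 and 1 different commits" numbers from a script, `git rev-list --left-right --count` computes them. This is a sketch assuming bash and git. To stay self-contained, it fakes what `git fetch` would record by writing `refs/remotes/origin/main` directly with `git update-ref`, so there's no real remote here.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"

git commit -q --allow-empty -m "heart"
git commit -q --allow-empty -m "squiggle"   # pretend a coworker pushed this
git update-ref refs/remotes/origin/main HEAD   # simulated result of `git fetch`
git reset -q --hard HEAD^                   # local main never had "squiggle"...
git commit -q --allow-empty -m "hash"       # ...and made its own commit instead

# left count = only on main, right count = only on origin/main
read -r ahead behind < <(git rev-list --left-right --count main...origin/main)
echo "ahead $ahead, behind $behind"         # both non-zero: diverged
```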
detached HEAD state
### how git knows what your current branch is: .git/HEAD `.git/HEAD` is a file where git stores either: 1. a branch name: the current branch 2. a commit ID: this means you don't have a current branch. git calls this "detached HEAD state" ### by itself, .git/HEAD being a commit ID is okay Illustration of a smiling stick figure with short curly hair. person: it's a great way to look at an old version of your code! I don't do it often, but it's super useful! git does it internally during a rebase! ### the only problem is that new commits you make can get "lost" (page 13) Illustration of five dots in a vertical stack, connected by lines. The top dot is labelled "main" and the bottom dot is labelled "HEAD". There is a dotted line branching off from "HEAD". The dot at the end of the dotted line is labelled "new commit will go here. danger! it won't be on any branch!" ### ways you can end up in detached HEAD state You will end up in detached HEAD state if you checkout: - a tag `$ git checkout v1.3` - a remote-tracking branch `$ git checkout origin/main` - a commit ID `$ git checkout a3ffab9` ### if you accidentally create commits in detached HEAD state, it's SUPER easy to avoid losing them just create a new branch! `git checkout -b oops` (you can also create a branch with `git switch -c` if you prefer) ### git has a little language for referring to commits - the current commit: `HEAD` - the previous commit: `HEAD^` - 3 commits ago: `HEAD^^^` - 3 commits ago: `HEAD~3` The full documentation is at: `man gitrevisions`
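A sketch of getting into and out of detached HEAD state, assuming bash and git. The tag `v1.3` and branch `oops` mirror the examples above and are otherwise hypothetical. The last line checks that `^^^` and `~3` name the same commit.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"; git config user.email "demo@example.com"
for i in 1 2 3 4; do git commit -q --allow-empty -m "commit $i"; done
git tag v1.3 HEAD~2                       # a tag on "commit 2"

git checkout -q v1.3                      # checking out a tag detaches HEAD
cat .git/HEAD                             # a commit ID, not "ref: refs/heads/..."
git symbolic-ref -q HEAD || echo "detached HEAD: no current branch"
git commit -q --allow-empty -m "new work while detached"
git checkout -q -b oops                   # the fix: put that commit on a branch

git rev-parse 'main^^^' main~3            # the same ID, printed twice
```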
combining diverged branches
### there are 3 options for combining branches - merge - rebase - squash for example, let’s say we’re combining these 2 branches: Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of one box with a hash symbol, and branch 2, which consists of a box with a spiral, followed by a box with a squiggle. ### panel 2: git rebase Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a box with a spiral, then a box with a squiggle. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 is made up of dotted lines and labelled “lost”. git merge Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branches 1 and 2 both lead into a new box, with a diamond. git merge --squash Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a new box containing both a squiggle and a spiral. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. ### all 3 methods result in the EXACT SAME FILES some differences are: - the diff git shows you for the final commit - the commit ids - the specific flavour of suffering the method causes ### rebase pro: you can keep your git history simple: Diagram: a git history that is just a series of boxes in a straight line. pain: - harder to learn [sad face] - harder to undo [sad face] - easier to mess up [sad face] (I love rebase though!)
### merge pro: if you mess something up, the original commits are still in your branch’s history pain: when I look at histories like this I feel dread [sad face] Diagram: a complicated git history with a number of different branches. ### squash pro: have 20 messy commits? nobody needs to know! And it’s pretty simple to use. pain: “ugh, someone squashed their 3000-line branch into 1 commit” [sad face]
combining branches
### there are 3 options for combining branches * `merge` * `rebase` * `squash` for example, let's say we're combining these 2 branches: Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of one box with a hash symbol, and branch 2, which consists of a box with a spiral, followed by a box with a squiggle. ### panel 2: 1. `git rebase` Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a box with a spiral, then a box with a squiggle. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 is made up of dotted lines and labelled "orphan". 2. `git merge` Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branches 1 and 2 both lead into a new box, with a diamond. 3. `git merge --squash` Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a new box containing both a squiggle and a spiral. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 is made up of dotted lines and labelled "orphan". ### all 3 methods result in the EXACT SAME FILES some differences are: * the diff git shows you for the final commit * the specific flavour of suffering the method causes ### merge pro: if you mess something up, the original commits are still in your branch's history pain: when I look at histories like this I feel dread Diagram: a complicated git history with a number of different branches. ### rebase pro: you can keep your git history simple: Diagram: a git history that is just a series of boxes in a straight line.
pain: - harder to learn [sad face] - harder to undo [sad face] - easier to mess up [sad face] (I love rebase though!) ### squash pro: have 20 messy commits? nobody needs to know! And it's pretty simple to use. pain: "ugh, someone squashed their 3000-line branch into 1 commit"
branches have no rules
### you might expect git to enforce some rules about branches some rules you might imagine: * you can't remove commits from a branch, only add them * the `main` branch has to stay more or less in sync with `origin/main` But there are no rules. git character with demon hat: want to do something horrible to your branch? no problem! ### there are literally no rules commands that you can use to do weird stuff to a branch: * `git reset` * `git rebase` ### instead of rules, we have conventions for example: * run `git pull` often to keep your `main` up to date * if you're working with a big team, don't commit to `main` directly Illustration of the git demon talking to a nonplussed stick figure with curly hair. git demon: you've just gotta be really careful to not do the wrong thing and not mess up your branch person: um... thanks? ### our only saviour: the reflog `git reflog BRANCHNAME` will show you the history of every change to the branch, so you can always undo. The reflog is a VERY unfriendly UI, but it's always there.
TCP: how to reliably get a cat
Step 3 in our plan is "open a TCP connection!" Let's learn what this "TCP" thing even is ### When you send a packet sometimes it gets lost jvns.ca server → Cat packets → lightning bolt laptop: nope never got it ### TCP lets you send a stream of data reliably, even if packets get lost or sent in the wrong order. four butterflies, labelled TCP C, TCP D, TCP D (duplicates), TCP A, and TCP B laptop: it says "abcd"! ### how does TCP work, you ask? WELL! ### how to know what order the packets should go in: Every packet says what range of bytes it has. Like this: once upon a ti ← bytes 0-13 agical oyster ← bytes 30-42 me there was a m ← bytes 14-29 Then the client can assemble all the pieces into: "once upon a time there was a magical oyster" The position of the first byte (0, 14, 30 in our example) is called the "sequence number" ### how to deal with lost packets: When you get TCP data, you have to acknowledge it (ACK): jvns.ca server: here is part of a cat picture! that should be 28832 bytes so far! jvns.ca server (thinking): yay laptop: ACK! I have received all 28832 bytes If the server doesn't get an acknowledgement, it will retry sending the data.
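Here's a minimal Python sketch of how a receiver could use sequence numbers to reassemble the story above. It assumes each segment arrives as a `(sequence_number, data)` pair and that numbering starts at 0. Real TCP starts from a random initial sequence number and wraps around at 2^32, which this ignores. `reassemble` is a hypothetical helper, not part of any library.

```python
from __future__ import annotations


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Rebuild a byte stream from (sequence_number, data) segments.

    Simplified: sequence numbers start at 0 and never wrap, unlike real TCP.
    """
    stream = bytearray()
    # Sort by sequence number so out-of-order arrivals line up.
    for seq, data in sorted(segments):
        end = seq + len(data)
        if end <= len(stream):
            continue  # duplicate: we already have every byte in this segment
        if seq > len(stream):
            # A gap: some bytes never arrived and must be retransmitted.
            raise ValueError(f"missing bytes {len(stream)}-{seq - 1}")
        # Append only the bytes we don't have yet (handles partial overlaps).
        stream += data[len(stream) - seq:]
    return bytes(stream)


segments = [
    (30, b"agical oyster"),
    (0, b"once upon a ti"),
    (14, b"me there was a m"),
    (14, b"me there was a m"),  # a duplicate, like the second "D" butterfly
]
print(reassemble(segments))  # b'once upon a time there was a magical oyster'
```

A real TCP receiver doesn't raise on a gap. It buffers what it has, and the missing bytes are sent again because the sender never got an ACK for them.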
miscellaneous networking tools
### stunnel make an SSL proxy for an insecure server ### hping3 make any TCP packet ### wget download files ### aria2c a fancier wget ### rsync sync files over SSH or locally ### lsof what ports are being used? ### httpie like curl but friendlier ### iftop/nethogs/ntop/iptraf/nload see what's using bandwidth ### whois is this domain registered? ### ipcalc easily see what 13.21.2.3/25 means ### python3 -m http.server serve files from a directory ### nftables new version of iptables ### zenmap GUI for nmap ### p0f identify OS of hosts connecting to you ### OpenVPN, WireGuard VPNs ### tcpflow capture and assemble TCP streams ### sysctl configure Linux kernel's network stack ### ab/iperf benchmarking tools ### links a browser in your terminal ### telnet can help debug text network protocols
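If `ipcalc` isn't installed, Python's standard `ipaddress` module can answer the same question. Here's a small sketch; `describe` is just an illustrative name, and it prints only a subset of what `ipcalc` shows.

```python
from __future__ import annotations

import ipaddress


def describe(cidr: str) -> dict[str, str | int]:
    """Summarize an address/prefix, like a small subset of ipcalc's output."""
    interface = ipaddress.ip_interface(cidr)  # keeps the host part (13.21.2.3)
    network = interface.network               # the block that address lives in
    return {
        "address": str(interface.ip),
        "network": str(network),
        "netmask": str(network.netmask),
        "broadcast": str(network.broadcast_address),
        "num_addresses": network.num_addresses,
    }


print(describe("13.21.2.3/25"))
```

So `/25` means "the first 25 bits are the network part". 13.21.2.3 lives in the 128-address block that runs from 13.21.2.0 to 13.21.2.127.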
every Linux networking tool I know
### ping "are these computers even connected?" ### curl make any HTTP request you want ### httpie like curl but easier ("http get") ### wget download files ### tc on a linux router, slow down your brother's internet (and much more) ### dig/nslookup what's the IP for that domain? (DNS query) ### whois is this domain registered? ### ssh secure shell 💙 ### scp copy files over an SSH connection ### rsync copy only changed files (works over SSH) ### ngrep grep for your network ### tcpdump "show me all packets on Port 80!" ### wireshark look at those packets in a GUI ### tshark command line super powerful packet analysis ### tcpflow capture & assemble TCP streams ### ifconfig "what's my IP address?" ### route view & change the route table ### ip replaces ifconfig, route, and more! ### arp see your ARP table ### mitmproxy spy on SSL connections your programs are making ### nmap in ur network scanning ur ports ### zenmap GUI for nmap ### p0f identify OS of hosts connecting to you ### openvpn a VPN ### wireguard a newer VPN ### nc netcat! make TCP connections manually ### socat proxy a TCP socket to a unix domain socket + LOTS MORE ### telnet like SSH but insecure ### ftp/sftp copy files. sftp does it over SSH. ### netstat/ss/lsof/fuser "what ports are servers using?" ### iptables set up firewalls and NAT! ### nftables new version of iptables ### hping3 construct any TCP packet you want ### traceroute/mtr what servers are on the way to that server? ### tcptraceroute use TCP packets instead of ICMP to traceroute ### ethtool manage physical Ethernet connections + network cards. ### iw/iwconfig manage wireless network settings (see speed/frequency!) ### sysctl configure Linux kernel's network stack ### openssl do literally anything with SSL certificates.
### stunnel make an SSL proxy server for an insecure server ### iptraf/nethogs/iftop/ntop see what's using bandwidth ### ab/nload/iperf benchmarking tools ### python3 -m http.server serve files from a directory ### ipcalc easily see what 13.21.2.3/25 means ### nsenter enter a container process's network namespace
BPF cheat sheet
[tcpdump]
how kubernetes can break - networking
how kubernetes can break - etcd
why the same origin policy matters
Browsers work hard to make sure that `evil.com` can't make requests to `other-website.com`. But `evil.com` can request `other-website.com` from its own server. So what's the big deal? Here are 2 reasons it's important to prevent JavaScript code from making arbitrary requests from your browser: ### Reason 1: cookies Browsers often send your cookies with HTTP requests. You don't want `evil.com` to be able to make requests using your login cookies. They'd be logged in as you! evil.com JavaScript: Send a GET request to mail.google.com with their current login cookies. browser: I'll do it, but you can't see the response unless the server says it's okay. (the browser will actually do it!) ### Reason 2: network access You might be on a private network (for example your company's corporate network) that `evil.com` doesn't have access to, but your computer does. evil.com JavaScript: POST request to secrets.corp.company.com/send_money please. browser: No! Same origin policy! I'm not even going to make that request without checking first.
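The "origin" that the browser compares is the scheme, host, and port of a URL. Here's a simplified Python sketch of that comparison. It ignores special cases that real browsers handle (like `file:` URLs and opaque origins), and `origin` / `same_origin` are illustrative names.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Ports a browser assumes when the URL doesn't say.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, int]:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is already lowercased
    port = parts.port or DEFAULT_PORTS.get(scheme, -1)
    return scheme, host, port


def same_origin(a: str, b: str) -> bool:
    """Two URLs are same-origin only if scheme, host, AND port all match."""
    return origin(a) == origin(b)


print(same_origin("https://evil.com/page", "https://mail.google.com/inbox"))  # False
print(same_origin("https://evil.com/a", "https://evil.com:443/b"))            # True
```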
HTTPS
HTTPS: HTTP + secure Here's what your browser does when it asks for `https://examplecat.com/cat.png`: 1. Negotiate an encryption key (AES symmetric key) to use for this connection to examplecat.com. The browser and server will use the same key to encrypt/decrypt content. Simplified version of how picking the encryption key works: browser, represented by the Firefox logo: hey I want examplecat.com server, represented by a box with a smiley face: here's proof that I'm examplecat.com browser, thinking: story checks out! browser: key exchange server: key exchange browser and server, thinking: we're going to use A$29FXY2.... as the encryption key This protocol for secure communication is called TLS (previously SSL) and you can use it on any TCP connection 2. Write an HTTP request ``` GET /cat.png HTTP/1.1 Host: examplecat.com User-Agent: Mozilla/... ``` 3. Encrypt the HTTP request with AES & send it to examplecat.com browser: $Af9bbca^~gggBF server, thinking: ah, I see, ``` GET /cat.png HTTP/1.1 Host: examplecat.com ... ``` 4. Receive encrypted HTTP response server: BXF^56□gxx... browser: nice, that means ``` 200 OK Content-Type: image/png ... ```
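Python's standard `ssl` module can walk through the same four steps. This is a minimal sketch, not how a browser is implemented:

- `wrap_socket` performs step 1. It verifies the certificate and negotiates the key. The cipher may be AES or another one both sides support.
- After that, writes to the wrapped socket are encrypted and reads are decrypted for you.

`build_request` and `fetch_over_tls` are hypothetical helper names, and the User-Agent string is made up.

```python
import socket
import ssl


def build_request(host: str, path: str) -> bytes:
    """Step 2: write the HTTP request as plain text."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: tls-sketch/0.1",  # made-up user agent
        "Connection: close",  # ask the server to hang up when done, so we know when to stop reading
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_over_tls(host: str, path: str, port: int = 443) -> bytes:
    # Step 1: the TLS handshake. wrap_socket checks the server's certificate
    # ("here's proof that I'm examplecat.com") and negotiates the shared key.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            # Step 3: sendall() on a TLS socket encrypts before sending.
            tls_sock.sendall(build_request(host, path))
            # Step 4: recv() decrypts the server's encrypted response.
            chunks = []
            while chunk := tls_sock.recv(4096):
                chunks.append(chunk)
    return b"".join(chunks)
```

Calling `fetch_over_tls("examplecat.com", "/cat.png")` needs network access. It returns the raw response bytes, starting with the status line.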
HTTP redirects
Sometimes you type a URL into your browser: `examplecat.com/dog.png` but end up at a slightly different URL: `examplecat.com/cat.png` ooh, where did the cat come from? I didn't type that! ### Here's what's going on behind the scenes: browser: ``` GET /dog.png HTTP/1.1 Host: examplecat.com ``` server: ``` 301 Moved Permanently Location: /cat.png ``` browser: okay, I'll try `/cat.png` instead browser: ``` GET /cat.png HTTP/1.1 Host: examplecat.com ``` server: ``` 200 OK <rest of website here> ``` The Location header tells the browser what new URL to use. The new URL doesn't have to be on the same domain: examplecat.com/panda can redirect to pandas.com. Setting up redirects is a great thing to do if you move your site to a new domain! ### ! Warning ! `301 Moved Permanently` redirects are PERMANENT: after a browser sees one once, it'll always use `examplecat.com/cat.png` when someone types `examplecat.com/dog.png`. You can't take it back and decide not to redirect. If you're not sure you want to redirect your site for eternity, use `302 Found` to redirect instead.
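A client follows a redirect by resolving the `Location` header against the URL it just requested. This matters because `Location: /cat.png` is relative.

Here's a minimal Python sketch using the standard `urllib.parse.urljoin`. `next_url` is an illustrative name. For brevity it looks up the header name case-sensitively, even though real HTTP header names are case-insensitive.

```python
from __future__ import annotations

from urllib.parse import urljoin

# Status codes that come with a Location header pointing somewhere else.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def next_url(current_url: str, status: int, headers: dict[str, str]) -> str | None:
    """Return the URL to request next, or None if this response isn't a redirect."""
    if status not in REDIRECT_STATUSES:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    # Location may be relative ("/cat.png") or absolute ("https://pandas.com/").
    return urljoin(current_url, location)


print(next_url("http://examplecat.com/dog.png", 301, {"Location": "/cat.png"}))
# http://examplecat.com/cat.png
```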
HTTP exercises
Making HTTP requests with curl to real internet websites and trying different headers is my favourite way to play around with HTTP & learn. ### curl tips: - `-i` shows the response headers (along with the body) - `-I` shows the response headers (by sending a HEAD request) - `-H` adds a request header Try the Range header: `curl -i https://examplecat.com/cat.txt -H "Range: bytes=8-17"` Request (and print out!) a compressed response: ``` curl -i https://examplecat.com -H "Accept-Encoding: gzip" --output - ``` Get a webpage in Spanish: `curl -i https://twitter.com -H "Accept-Language: es-ES"` Get redirected to another URL: (hint: look at the `Location` header!) `curl -i http://examplecat.com` Guess what content delivery network GitHub is using: (hint: it's in a header starting with `x-`) `curl -I https://github.githubassets.com` Find out when example.com was last updated (hint: `Last-Modified`) `curl -I example.com` Get a 404 not found: `curl -i examplecat.com/bananas`
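`Range: bytes=8-17` asks for bytes 8 through 17 *inclusive*, which is 10 bytes. That trips people up because most programming languages slice with an exclusive end. Here's a minimal Python sketch of what a server does with that header. It assumes a single `START-END` range; real servers also handle forms like `bytes=100-`, `bytes=-500`, and comma-separated lists. `apply_byte_range` is an illustrative name.

```python
import re


def apply_byte_range(body: bytes, range_header: str) -> bytes:
    """Return the part of body selected by a single 'bytes=START-END' range."""
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header.strip())
    if match is None:
        raise ValueError(f"unsupported Range header: {range_header!r}")
    start, end = int(match.group(1)), int(match.group(2))
    # HTTP byte ranges include END, so add 1 for Python's exclusive slice end.
    return body[start:end + 1]


print(apply_byte_range(b"0123456789abcdefghij", "bytes=8-17"))  # b'89abcdefgh'
```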
opening a file
[linux2]
writing tip: say something surprising
writing tip: ask good questions
scenes from kubernetes
kubernetes components
command line arguments
[linux2]
### every process has command line arguments `$ ls -l /usr/bin` (`ls`, `-l`, and `/usr/bin` are arguments!) ### they're passed to the program as an array example from Python: ``` import sys print(sys.argv) ``` running `python test.py file.txt` prints `['test.py', 'file.txt']` ### arguments can be any sequence of bytes (except the null byte) `$ python program.py ♥` (emoji are totally allowed!) ### the first argument is the executable's name ``` ['ls', '-l', '/usr/bin'] ``` (by convention, `ls` is the executable name) ### the total length of the arguments is limited you can find the limits on your system with `xargs --show-limits` It's usually ~2MB ### you can decide how you parse arguments - `-flag`: single dash! - `--flag`: 2 dashes! - `♥♥flag`: weird emoji scheme that will be very annoying to use!
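Since parsing is up to the program, here's one possible hand-rolled parser in Python. It follows the common GNU-style convention (one of many choices):

- `-la` means `-l -a`.
- `--name=value` sets a long option.
- Anything else is positional.

It's a sketch: `parse_args` is a hypothetical name, and it doesn't handle the `--` end-of-options marker or flags that take a separate value. Real programs usually use a library such as Python's `argparse`.

```python
from __future__ import annotations

import sys


def parse_args(argv: list[str]) -> tuple[dict[str, str | bool], list[str]]:
    """Split argv into flags and positional arguments (GNU-style conventions)."""
    flags: dict[str, str | bool] = {}
    positional: list[str] = []
    for arg in argv[1:]:  # argv[0] is the program name, not a real argument
        if arg.startswith("--"):
            # --name or --name=value
            name, sep, value = arg[2:].partition("=")
            flags[name] = value if sep else True
        elif arg.startswith("-") and len(arg) > 1:
            # -la is shorthand for -l -a
            for letter in arg[1:]:
                flags[letter] = True
        else:
            positional.append(arg)  # includes a lone "-"
    return flags, positional


if __name__ == "__main__":
    print(parse_args(sys.argv))
```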
file locking
terminal escape codes let you change colour
IMSI catchers (fake cellphone towers)
clock_gettime
[linux2]
### programs can be slow for a lot of reasons Illustration of two programs, each represented by a box with a smiley face. program 1: I'm waiting for a database query, you? program 2: I'm using SO MUCH CPU! ### it's not obvious when a program is using CPU Illustration of a stick figure with curly hair, looking unhappy. person: my webserver took 6 seconds to respond to that request! why? ### panel 3 person: how can I tell how much CPU time was used in this part of my code? ### clock_gettime clock_gettime is a system call. With the `CLOCK_PROCESS_CPUTIME_ID` or `CLOCK_THREAD_CPUTIME_ID` clocks, it can tell you how much CPU time your process/thread has used since it started. ### how to track CPU time 1. run clock_gettime 2. do the thing (e.g. handle an HTTP request) 3. run clock_gettime 4. subtract! ### this trick works when you have 1 HTTP request per thread at a time Illustration of Ruby and node.js, each represented by a box with a smiley face. Ruby: I can use clock_gettime node.js: doesn't work for me, I have an event loop!
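Python exposes this system call as `time.clock_gettime` (Unix only). Here's a minimal sketch of the "run, do the thing, run again, subtract" recipe using the per-thread CPU clock. It also measures wall-clock time so you can see the difference between waiting and computing. `measure`, `busy`, and `waiting` are illustrative names.

```python
import time


def measure(fn, *args):
    """Return (result, cpu_seconds, wall_seconds) for fn(*args) on this thread."""
    wall_start = time.monotonic()
    cpu_start = time.clock_gettime(time.CLOCK_THREAD_CPUTIME_ID)  # 1. run clock_gettime
    result = fn(*args)                                             # 2. do the thing
    cpu_end = time.clock_gettime(time.CLOCK_THREAD_CPUTIME_ID)    # 3. run clock_gettime
    wall_end = time.monotonic()
    return result, cpu_end - cpu_start, wall_end - wall_start     # 4. subtract!


def busy(n):
    # Uses CPU the whole time.
    return sum(i * i for i in range(n))


def waiting(seconds):
    # Sleeping (like waiting on a database) uses almost no CPU.
    time.sleep(seconds)


if __name__ == "__main__":
    for name, fn, arg in [("busy", busy, 3_000_000), ("waiting", waiting, 0.5)]:
        _, cpu, wall = measure(fn, arg)
        print(f"{name}: {cpu:.3f}s CPU, {wall:.3f}s wall")
```

`time.thread_time()` is a more portable shortcut for the same per-thread CPU clock.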
ways to count rows
Here are three ways to count rows: 1. `COUNT(*)`: count all rows This counts every row, regardless of the values in the row. Often used with a `GROUP BY` to get common values, like in this "most popular names" query: ``` SELECT first_name, COUNT(*) FROM people GROUP BY first_name ORDER BY COUNT(*) DESC LIMIT 50 ``` 2. `COUNT(DISTINCT column)`: get the number of distinct values Really useful when a column has duplicate values. For example, this query finds out how many species every plant genus has: ``` SELECT genus, COUNT(DISTINCT species) FROM plants GROUP BY 1 ORDER BY 2 DESC ``` (`GROUP BY 1` means "group by the first expression in the `SELECT`") 3. `SUM(CASE WHEN expression THEN 1 ELSE 0 END)` This trick using `SUM` and `CASE` lets you count how many cats vs dogs vs other animals each owner has: ``` SELECT owner , SUM(CASE WHEN type = 'dog' then 1 else 0 end) AS num_dogs , SUM(CASE WHEN type = 'cat' then 1 else 0 end) AS num_cats , SUM(CASE WHEN type NOT IN ('dog', 'cat') then 1 else 0 end) AS num_other FROM pets GROUP BY owner ``` pets: | owner | type | |------------|-------------| | 1 | dog | | 1 | cat | | 2 | dog | | 2 | parakeet | result: | owner | num_dogs | num_cats | num_other | |----------|-------------|-------------|--------------| | 1 | 1 | 1 | 0 | | 2 | 1 | 0 | 1 |
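Here's the third query run against the `pets` table above. It uses SQLite through Python's built-in `sqlite3` module. SQLite isn't necessarily the database the comic has in mind, but this query behaves the same way there. I added `ORDER BY owner` so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (owner INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [(1, "dog"), (1, "cat"), (2, "dog"), (2, "parakeet")],
)

rows = conn.execute("""
    SELECT owner,
           SUM(CASE WHEN type = 'dog' THEN 1 ELSE 0 END) AS num_dogs,
           SUM(CASE WHEN type = 'cat' THEN 1 ELSE 0 END) AS num_cats,
           SUM(CASE WHEN type NOT IN ('dog', 'cat') THEN 1 ELSE 0 END) AS num_other
    FROM pets
    GROUP BY owner
    ORDER BY owner
""").fetchall()
print(rows)  # [(1, 1, 1, 0), (2, 1, 0, 1)]
```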
SQL example: LEFT JOIN + GROUP BY
## example: LEFT JOIN + GROUP BY This query counts how many items every client bought (including clients who didn't buy anything): ``` SELECT name, COUNT(item) AS items_bought FROM owners LEFT JOIN sales ON owners.id = sales.client GROUP BY name ORDER BY items_bought DESC ``` `FROM owners LEFT JOIN sales...` #### owners | id | name | |----|---------| | 1 | maher | | 2 | rishi | | 3 | chandra | #### sales | item | client | |--------|--------| | catnip | 1 | | laser | 1 | | tuna | 1 | | tuna | 2 | `ON owners.id=sales.client` | id | name | item | |----|---------|--------| | 1 | maher | catnip | | 1 | maher | laser | | 1 | maher | tuna | | 2 | rishi | tuna | | 3 | chandra | NULL | `GROUP BY name` (same chart as previous, except the "maher" rows are circled, as are the "rishi" and "chandra" rows) `SELECT name, COUNT(item) AS items_bought` | name | items_bought | |---------|--------------| | rishi | 1 | | chandra | 0 | | maher | 3 | (`COUNT(item)` doesn't count `NULL`s) `ORDER BY items_bought DESC` | name | items_bought | |---------|--------------| | maher | 3 | | rishi | 1 | | chandra | 0 |
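To check the walkthrough, here's the same query in SQLite via Python's `sqlite3`, plus an `INNER JOIN` version for contrast. With an inner join, chandra (who bought nothing) disappears instead of getting a count of 0. `items_bought` is an illustrative helper, and it only ever receives the two constant join keywords, so building the SQL with an f-string is safe here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owners (id INTEGER, name TEXT);
    CREATE TABLE sales (item TEXT, client INTEGER);
    INSERT INTO owners VALUES (1, 'maher'), (2, 'rishi'), (3, 'chandra');
    INSERT INTO sales VALUES ('catnip', 1), ('laser', 1), ('tuna', 1), ('tuna', 2);
""")


def items_bought(join_type: str):
    # join_type is "LEFT JOIN" or "INNER JOIN" (trusted constants only).
    return conn.execute(f"""
        SELECT name, COUNT(item) AS items_bought
        FROM owners {join_type} sales ON owners.id = sales.client
        GROUP BY name
        ORDER BY items_bought DESC
    """).fetchall()


print(items_bought("LEFT JOIN"))   # [('maher', 3), ('rishi', 1), ('chandra', 0)]
print(items_bought("INNER JOIN"))  # [('maher', 3), ('rishi', 1)]
```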
SQL example: get the time between baby feedings
This query finds the time since a baby's last feeding/diaper change. ``` SELECT event, hour, hour - LAG(hour) OVER(PARTITION BY event ORDER BY hour ASC) AS time_since_last FROM baby_log WHERE event IN ('feeding', 'diaper') ORDER BY hour ASC ``` 1. `FROM baby_log` | event | hour | |------------|------| | feeding | 1 | | cough | 1 | | diaper | 3 | | feeding | 4 | | diaper | 5 | | diaper | 5 | | feeding | 7 | | cough | 7 | 2. `WHERE event IN ('diaper', 'feeding')` | event | hour | |------------|------| | feeding | 1 | | diaper | 3 | | feeding | 4 | | diaper | 5 | | diaper | 5 | | feeding | 7 | 3. `OVER (PARTITION BY event ORDER BY hour ASC)` (this `ORDER BY` only affects the windows, not the query output) (There's a diagram of the table from step 2. There are arrows pointing to two smaller tables that break out only the lines where the event is "feeding", and only the lines where the event is "diaper", respectively) 4. `SELECT event, hour, hour - LAG(hour)` | event | hour | time_since_last | |------------|------|------------------------------------------------------------| | feeding | 1 | `NULL` (`LAG()` is `NULL` for the first row in the window) | | feeding | 4 | 3 | | feeding | 7 | 3 | | diaper | 3 | `NULL` | | diaper | 5 | 2 | | diaper | 5 | 0 | 5. `ORDER BY hour ASC` | event | hour | time_since_last | |------------|------|------------------| | feeding | 1 | `NULL` | | diaper | 3 | `NULL` | | feeding | 4 | 3 | | diaper | 5 | 2 | | diaper | 5 | 0 | | feeding | 7 | 3 |
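The same query runs in SQLite (version 3.25 or later, which added window functions) through Python's `sqlite3` module. `NULL` comes back as Python's `None`. The two `diaper` rows at hour 5 tie on `hour`, so their relative order in the output isn't guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baby_log (event TEXT, hour INTEGER)")
conn.executemany("INSERT INTO baby_log VALUES (?, ?)", [
    ("feeding", 1), ("cough", 1), ("diaper", 3), ("feeding", 4),
    ("diaper", 5), ("diaper", 5), ("feeding", 7), ("cough", 7),
])

rows = conn.execute("""
    SELECT event, hour,
           hour - LAG(hour) OVER (PARTITION BY event ORDER BY hour ASC) AS time_since_last
    FROM baby_log
    WHERE event IN ('feeding', 'diaper')
    ORDER BY hour ASC
""").fetchall()

for row in rows:
    print(row)  # e.g. ('feeding', 1, None), ('diaper', 3, None), ('feeding', 4, 3), ...
```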
SELECT
SELECT is where you pick the final columns that appear in the table the query outputs. Here's the syntax: ``` SELECT expression_1 [AS alias1], expression_2 [AS alias2], ... FROM ... ``` Some useful things you can do in a SELECT: - Combine many columns with SQL expressions A few examples: ``` CONCAT(first_name, ' ', last_name) DATE_TRUNC('month', created) ``` (This is PostgreSQL syntax for rounding a date down to the start of its month; other SQL dialects have different syntax) - Alias an expression with AS `first_name || ' ' || last_name AS full_name` is a mouthful! If you alias an expression with AS, you can use the alias later in the query (for example in `ORDER BY`) to refer to that expression. (`||` is a concatenation operator) ``` SELECT first_name || ' ' || last_name AS full_name FROM people ORDER BY full_name DESC ``` (`full_name` refers to `first_name || ' ' || last_name`) - Select all columns with SELECT * When I'm starting to figure out a query, I'll often write something like ``` SELECT * FROM some_table LIMIT 10 ``` just to quickly see what the columns in the table look like.
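Here's the alias example run in SQLite via Python's `sqlite3`, which also supports `||` for concatenation. The `people` rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("juan", "garcia"), ("lucia", "rossi"), ("chen", "wei")],
)

rows = conn.execute("""
    SELECT first_name || ' ' || last_name AS full_name
    FROM people
    ORDER BY full_name DESC  -- the alias is usable in ORDER BY
""").fetchall()
print(rows)  # [('lucia rossi',), ('juan garcia',), ('chen wei',)]
```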
OVER() assigns every row a window
A "window" is a set of rows: | name | class | grade | |---------|---------|---------| | juan | 1 | 93 | | lucia | 1 | 98 | (a window!) A window can be as big as the whole table (an empty `OVER()` is the whole table!) or as small as just one row. `OVER()` is confusing at first, so here's an example! Let's run this query that ranks students in each class by grade: ``` SELECT name, class, grade, ROW_NUMBER() OVER (PARTITION BY class ORDER BY grade DESC) AS rank_in_class FROM grades ``` Step 1: Assign every row a window. `OVER (PARTITION BY class)` means that there are 2 windows: one each for class 1 and 2. grades: | name | class | grade | |---------|---------|---------| | juan | 1 | 93 | | lucia | 1 | 98 | | raph | 2 | 88 | | chen | 2 | 90 | (Beside this table there is an illustration of two smaller popped-out tables showing the first two rows, and the second two rows respectively) Step 2: Run the function. We need to run `ROW_NUMBER()` to find each row's rank in its window: query output: | name | class | grade | rank_in_class | |---------|---------|---------|---------------| | juan | 1 | 93 | 2 | | lucia | 1 | 98 | 1 | | raph | 2 | 88 | 2 | | chen | 2 | 90 | 1 |
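Here's the ranking query run in SQLite (3.25+ for window functions) via Python's `sqlite3`. I added `ORDER BY class, rank_in_class` so the output order is predictable. Without it, the database may return rows in any order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (name TEXT, class INTEGER, grade INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [("juan", 1, 93), ("lucia", 1, 98), ("raph", 2, 88), ("chen", 2, 90)],
)

rows = conn.execute("""
    SELECT name, class, grade,
           ROW_NUMBER() OVER (PARTITION BY class ORDER BY grade DESC) AS rank_in_class
    FROM grades
    ORDER BY class, rank_in_class
""").fetchall()
for row in rows:
    print(row)
```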
ORDER BY and LIMIT
### `ORDER BY` and `LIMIT` `ORDER BY` and `LIMIT` happen at the end and affect the final output of the query. `ORDER BY` lets you sort by anything you want! The syntax is: `ORDER BY` [expression] `ASC` or `DESC` (`ASC` stands for ascending, `DESC` for descending) For example, this query sorts cats by the length of their name (shortest first): ``` SELECT * FROM cats ORDER BY LENGTH(name) ASC ``` cats: | owner | name | |-------|------------| | 1 | daisy | | 1 | dragonsnap | | 3 | buttercup | | 4 | rose | results of query: | owner | name | |-------|------------| | 4 | rose | | 1 | daisy | | 3 | buttercup | | 1 | dragonsnap | `LIMIT` lets you limit the number of rows output. The syntax is: `LIMIT` [integer] For example, this is the same as the previous query, but it limits to only the 2 cats with the shortest names: ``` SELECT * FROM cats ORDER BY LENGTH(name) ASC LIMIT 2 ``` cats: | owner | name | |-------|------------| | 1 | daisy | | 1 | dragonsnap | | 3 | buttercup | | 4 | rose | results of query: | owner | name | |-------|------------| | 4 | rose | | 1 | daisy |
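Both queries can be checked against the `cats` table in SQLite via Python's `sqlite3`. `LENGTH` works the same way there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cats (owner INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO cats VALUES (?, ?)",
    [(1, "daisy"), (1, "dragonsnap"), (3, "buttercup"), (4, "rose")],
)

sorted_cats = conn.execute("SELECT * FROM cats ORDER BY LENGTH(name) ASC").fetchall()
shortest_two = conn.execute("SELECT * FROM cats ORDER BY LENGTH(name) ASC LIMIT 2").fetchall()
print(sorted_cats)   # [(4, 'rose'), (1, 'daisy'), (3, 'buttercup'), (1, 'dragonsnap')]
print(shortest_two)  # [(4, 'rose'), (1, 'daisy')]
```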
INNER JOIN and LEFT JOIN
getting started with SELECT
A SQL database contains a bunch of tables sales: | client | item | |-------|--------| | x | x | | x | x | | x | x | clients: | id | name | |-------|--------| | x | x | | x | x | | x | x | cats: | owner | name | |-------|--------| | x | x | | x | x | | x | x | Every SELECT query takes data from those tables and outputs a table of results. ### cats: | owner | name | |-------|------------| | 1 | daisy | | 1 | dragonsnap | | 3 | buttercup | | 4 | rose | ### query: ``` SELECT * FROM cats WHERE owner = 1 ``` ### query output | owner | name | |-------|------------| | 1 | daisy | | 1 | dragonsnap | ### A few basic facts to start out: - SELECT queries have to be written in the order: `SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT` - SQL keywords aren't case sensitive: `select * from cats` is fine too. This zine will use ALL CAPS for SQL keywords like `FROM`. smiling stick figure with curly hair: there are other kinds of queries like `INSERT` / `UPDATE` / `DELETE` but this zine is just about `SELECT`
EXPLAIN your slow queries
Sometimes queries run slowly, and EXPLAIN can tell you why! 2 ways you can use EXPLAIN in PostgreSQL: (other databases have different syntax for this) 1. Before running the query (`EXPLAIN SELECT ... FROM ...`) This calculates a query plan but doesn't run the query. Smiling stick figure with long straight hair: I _always_ run `EXPLAIN` on a query before running it on my production database. I won't risk overloading the database with a slow query! 2. After running the query (`EXPLAIN ANALYZE SELECT ... FROM ...`) Smiling bald stick figure: why is my query so slow? Smiling stick figure with short curly hair: `EXPLAIN ANALYZE` runs the query and analyzes why it was slow! Here are the `EXPLAIN ANALYZE` results from PostgreSQL for the same query run on 2 tables of 1 million rows: one table that has an index and one that doesn't `EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1` unindexed table: ``` Seq Scan on users Filter: (id = 1) Rows Removed by Filter: 999999 Planning time: 0.185 ms Execution time: 179.412 ms ``` (`Seq Scan` means it's looking at each row (slow!)) indexed table: ``` Index Only Scan using users_id_idx on users Index Cond: (id = 1) Heap Fetches: 1 Planning time: 3.411 ms Execution time: 0.088 ms ``` (counting planning time, the query runs about 50 times faster with an index)
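SQLite has a similar tool, `EXPLAIN QUERY PLAN`. Like plain `EXPLAIN`, it shows the plan without running the query, and it doesn't report timings like PostgreSQL's `EXPLAIN ANALYZE`.

Here's a minimal sketch with Python's `sqlite3`, using a made-up `users` table of 10,000 rows. The exact wording of the plan varies between SQLite versions (older ones say `SCAN TABLE users`).

```python
import sqlite3


def query_plan(conn, sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable "detail" column.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")  # no index yet
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    ((i, f"user{i}") for i in range(1, 10_001)),
)
sql = "SELECT * FROM users WHERE id = 1"

plan_before = query_plan(conn, sql)  # a full scan, e.g. "SCAN users"
conn.execute("CREATE INDEX users_id_idx ON users (id)")
plan_after = query_plan(conn, sql)   # e.g. "SEARCH users USING INDEX users_id_idx (id=?)"
print(plan_before)
print(plan_after)
```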
CASE
Often I want to categorize by something that isn't a column: person 1: I want to count children/adults/teenagers but there's no column for that! person 2: no problem! just categorize people based on age! `CASE` is how to write an `if` statement in SQL. Here's the syntax: ``` CASE WHEN <condition> THEN <result> WHEN <other-condition> THEN <result> ... ELSE <result> END ``` ## example: Here's how to categorize people into age ranges! ``` SELECT first_name, age, CASE WHEN age < 13 THEN 'child' WHEN age < 20 THEN 'teenager' ELSE 'adult' END AS age_range FROM people ``` (returns the result of the first `WHEN` whose condition matches) people: | first_name | age | |---------------|--------| | ahmed | 5 | | marle | 17 | | akira | 60 | | pablo | 15 | result: | first_name | age | age_range | |---------------|--------|--------------| | ahmed | 5 | child | | marle | 17 | teenager | | akira | 60 | adult | | pablo | 15 | teenager |
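Here's the age-range query run in SQLite via Python's `sqlite3`. Pablo (15) is less than 13? No, so the first `WHEN` fails. He is less than 20, so he lands in `'teenager'` and `ELSE` is never reached.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("ahmed", 5), ("marle", 17), ("akira", 60), ("pablo", 15)],
)

rows = conn.execute("""
    SELECT first_name, age,
           CASE WHEN age < 13 THEN 'child'
                WHEN age < 20 THEN 'teenager'
                ELSE 'adult'
           END AS age_range
    FROM people
""").fetchall()
for row in rows:
    print(row)
```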
user namespaces
### user namespaces are a security feature... smiling bald stick figure: I'd like root in the container to be totally unprivileged smiling stick figure with curly hair: you want a user namespace! ### but not all container runtimes use them same user! (two arrows point to two smiley faces, one labelled "root in container", the other labelled "root on host") ### "root" doesn't always have admin access Container process, represented by a box with a smiley face: I'm root so I can do ANYTHING right? Tux: actually you have limited capabilities so mostly you can just access files owned by root! ### in a user namespace, UIDs are mapped to host UIDs process: I'm running as UID 0 Linux: Oh, that's mapped to 12345 The mapping is in `/proc/self/uid_map` ### unmapped users show up as "nobody" `$ unshare --user bash` (create user namespace) `$ ls -l /usr/bin` `.. nobody nogroup apropos` ` nobody nogroup apt` (these are "actually" owned by root but we didn't map any user) ### how to find out if you have a separate user namespace compare the results of `$ ls -l /proc/PID/ns` between a container process and a host process.
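Each line of `/proc/PID/uid_map` has three numbers: the first UID inside the namespace, the first UID outside it, and how many UIDs the range covers. Here's a minimal Python sketch of translating in both directions. The helper names are illustrative, and the `0 12345 1` map mirrors the comic's example.

```python
from __future__ import annotations


def parse_uid_map(text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map lines of the form 'ID-inside-ns ID-outside-ns length'."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            ranges.append((inside, outside, length))
    return ranges


def to_host(ranges, uid: int) -> int | None:
    """UID inside the namespace -> UID on the host (None if unmapped)."""
    for inside, outside, length in ranges:
        if inside <= uid < inside + length:
            return outside + (uid - inside)
    return None


def to_namespace(ranges, host_uid: int) -> int | None:
    """UID on the host -> UID inside the namespace (None if unmapped).

    Files owned by an unmapped host UID are reported with the kernel's
    "overflow" UID, which usually shows up as "nobody" in ls.
    """
    for inside, outside, length in ranges:
        if outside <= host_uid < outside + length:
            return inside + (host_uid - outside)
    return None


if __name__ == "__main__":
    with open("/proc/self/uid_map") as f:
        print(parse_uid_map(f.read()))  # outside any user namespace: [(0, 0, 4294967295)]
```

With the map `0 12345 1`, `to_namespace(m, 0)` returns `None`: host root isn't mapped, which is why root's files show up as "nobody".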
pivot-root
### a container image is a tarball of a filesystem (or several tarballs: 1 per layer) pensive stick figure with short curly hair: if someone sends me a tarball of their filesystem, how do I use that though? ### chroot: change a process's root directory If you chroot a process to /fake/root, then when it opens the file /usr/bin/redis it'll get /fake/root/usr/bin/redis instead. You can "run" a container just by using chroot, like this: ``` $ mkdir redis; cd redis $ tar -xf redis.tar $ chroot $PWD /usr/bin/redis # done! redis is running! ``` ### programs can break out of a chroot #### chroot: Illustration of a box labelled "whole filesystem". Inside it is another box labelled "redis container directory". All these files are still there! A root process can access them if it wants. #### pivot_root Illustration of a box labelled "redis container directory". You can unmount the old filesystem so it's impossible to access it. Containers use pivot_root instead of chroot. ### to have a "container" you need more than pivot_root pivot_root alone won't let you: - set CPU/memory limits - hide other running processes - use the same port as another process - restrict dangerous system calls
network namespaces
### network namespaces are kinda confusing Illustration of an unhappy-looking stick figure with curly hair. person: what does it MEAN for a process to have its own network?? ### namespaces usually have 2 interfaces (+ sometimes more) - the loopback interface (127.0.0.1/8, for connections inside the namespace) - another interface (for connections from outside) ### every server listens on a port and network interface(s) `0.0.0.0:8080` means "port 8080 on every network interface in my namespace" ### 127.0.0.1 stays inside your namespace Illustration of a server, represented by a box with a smiley face, and a smiling stick figure with curly hair. server, thinking: I'm listening on 127.0.0.1 person: that's fine but nobody outside your network namespace will be able to make requests to you! ### your physical network card is in the host network namespace Illustration of a rectangular box drawn with a dotted line. Inside it are: - the label "host network namespace" - 192.168.1.149, with an arrow pointing to it reading "requests from other computers" - network card ### other namespaces are connected to the host namespace with a bridge Illustration of a rectangular box drawn with a dotted line. Inside it are: - the label "host network namespace" - three boxes, each labelled "container"
layers (containers)
### different images have similar files Rails container image and Django container image, each represented by a box with a smiley face: we both use Ubuntu 18.04! ### reusing layers saves disk space Rails image - Rails app - ubuntu:18.04 Django image - Django app - ubuntu:18.04 Both have the exact same files on disk for ubuntu:18.04. ### a layer is a directory ``` $ ls 8891378eb* bin/ home/ mnt/ run/ tmp/ boot/ lib/ opt/ sbin/ usr/ dev/ lib64/ proc/ srv/ var/ etc/ media/ root/ sys/ ``` (these are the files in an ubuntu:18.04 layer) ### every layer has an ID usually the ID is a sha256 hash of the layer's contents example: `8e99fae2..` ### if a file is in 2 layers, you'll see the version from the top layer Two rectangular boxes on top of one another, each labelled `/code/file.py`. The one on top is the version you'll see in the merged image. ### by default, writes go to a temporary layer Illustration of a rectangle labelled "temp layer", with a bunch of other smaller rectangles stacked underneath it. The temp layer is labelled "these files might be deleted after the container exits." To keep your changes, write to a directory that's mounted from outside the container
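Here's a toy Python model of the "top layer wins" and "writes go to the temp layer" rules. It represents each layer as a dict of path to contents rather than a real directory. It also ignores deletions, which real overlay filesystems record with special "whiteout" files. `read_file`, `write_file`, and the sample layers are hypothetical.

```python
from __future__ import annotations


def read_file(layers: list[dict[str, str]], path: str) -> str | None:
    """Return path's contents as seen in the merged image (top layer wins).

    layers is ordered bottom-to-top.
    """
    for layer in reversed(layers):  # look at the top layer first
        if path in layer:
            return layer[path]
    return None


def write_file(layers: list[dict[str, str]], path: str, contents: str) -> None:
    """Writes only touch the top (temporary) layer; lower layers stay unchanged."""
    layers[-1][path] = contents


ubuntu = {"/etc/os-release": "Ubuntu 18.04", "/code/file.py": "print('old')"}
app = {"/code/file.py": "print('new')"}
temp = {}  # the container's temporary writable layer
layers = [ubuntu, app, temp]

print(read_file(layers, "/code/file.py"))  # print('new'), from the app layer
```

Because lower layers are never modified, many images can safely share the same `ubuntu` layer on disk.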
layers
### different images have similar files Rails container image and Django container image: we both use Ubuntu 18.04! ### reusing layers saves disk space Rails image: Rails app ubuntu:18.04 Django image: Django app ubuntu:18.04 exact same files on disk! ### a layer is a directory ``` $ ls 8891378eb* bin/ home/ mnt/ run/ tmp/ boot/ lib/ opt/ sbin/ usr/ dev/ lib64/ proc/ srv/ var/ etc/ media/ root/ sys/ ``` files in an ubuntu:18.04 layer ### every layer has an ID usually the ID is a sha256 hash of the layer's contents example: `8e99fae2..` ### if a file is in 2 layers, you'll see the version from the top layer `/code/file.py` (this is the version you'll see in the merged image) `/code/file.py` ### by default, writes go to a temporary layer temp layer (these files might be deleted after the container exits) To keep your changes, write to a directory that's mounted from outside the container
how to make a namespace
### processes use their parent's namespaces by default parent, represented by a box with a smiley face: I'm in the host network namespace! child, represented by a smaller box with a smiley face (created with 'clone' syscall): me too! ### but you can switch namespaces at any time box with a smiley face: I'm starting a container so it needs its own namespaces ### command line tools - `$ unshare --net COMMAND`: run COMMAND in a new network namespace - `$ sudo lsns`: list all namespaces - `$ nsenter -t PID --all COMMAND`: run COMMAND in the same namespaces as PID ### namespace system calls - clone: make a new process - unshare: make + use a namespace - setns: use an existing namespace ### *clone* lets you create new namespaces for a child process parent: `clone (... CLONE_NEWNET)` child: I have my own network namespace! ### each namespace type has a man page ``` $ man network_namespaces ... A physical network device can live in exactly one network namespace. ```
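On Linux, every entry in `/proc/PID/ns` is a symlink whose target has the form `net:[NUMBER]`. Two processes share a namespace exactly when those targets match. This Python sketch uses that to check that a child started without any `unshare` or `CLONE_NEW*` flags inherits its parent's network namespace. The helper names are illustrative, and inspecting another user's process may need root.

```python
import os
import subprocess
import sys


def namespace_ids(pid: str = "self") -> dict:
    """Map namespace type (net, pid, user, ...) -> its 'type:[inode]' identifier."""
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            pass  # e.g. not allowed to inspect another user's process
    return ids


def child_net_namespace() -> str:
    """Start a plain child process and report which network namespace it's in."""
    code = "import os; print(os.readlink('/proc/self/ns/net'))"
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    parent = namespace_ids()["net"]
    child = child_net_namespace()
    print(parent, child, "same namespace" if parent == child else "different namespaces")
```

Running `sudo lsns` shows the same identifiers for every namespace on the system.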
containers: the big idea: include EVERY dependency
### containers package EVERY dependency together smiling stick figure with short curly hair: to make sure this program will run on your laptop, I'm going to send you every single file you need ### a container image is a tarball of a filesystem Here's what's in a typical Rails app's container: - your app's code - libc + other system libraries - Ubuntu base OS - Ruby interpreter - Ruby gems ### how images are built 0. start with a base OS 1. install program + dependencies 2. configure it how you want 3. make a tarball of the WHOLE FILESYSTEM tiny stick figure with short curly hair: this is what `docker build` does! ### running an image 1. download the tarball 2. unpack it into a directory 3. run a program and pretend that directory is its whole filesystem ### images let you "install" programs really easily person, thinking: I can set up a Postgres test database in like 5 seconds! wow!
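Here's a toy Python version of build step 3 and running steps 1-2, using the standard `tarfile` module. It assumes you already have a directory laid out like a filesystem. Real image formats add layers and metadata on top of this. Step 3 of running (pretending the directory is the whole filesystem) needs `chroot`/`pivot_root` and root privileges, so it's left out. `build_image` and `unpack_image` are hypothetical names.

```python
import os
import tarfile
import tempfile


def build_image(rootfs_dir: str, image_path: str) -> None:
    """Build step 3: make a tarball of the WHOLE filesystem directory."""
    with tarfile.open(image_path, "w") as tar:
        tar.add(rootfs_dir, arcname=".")  # store paths relative to the filesystem root


def unpack_image(image_path: str, dest_dir: str) -> None:
    """Running steps 1-2: unpack the (already downloaded) tarball into a directory."""
    os.makedirs(dest_dir, exist_ok=True)
    # Use the safer "data" extraction filter on Python versions that have it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(image_path) as tar:
        tar.extractall(dest_dir, **extra)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rootfs = os.path.join(tmp, "rootfs")
        os.makedirs(os.path.join(rootfs, "usr", "bin"))
        with open(os.path.join(rootfs, "usr", "bin", "hello"), "w") as f:
            f.write("#!/bin/sh\necho hello\n")
        build_image(rootfs, os.path.join(tmp, "image.tar"))
        unpack_image(os.path.join(tmp, "image.tar"), os.path.join(tmp, "container"))
        print(os.listdir(os.path.join(tmp, "container", "usr", "bin")))  # ['hello']
```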
container IP addresses
### containers often get their own IP address wordpress container 1: I'm running WordPress at `172.17.2.3:8080`! wordpress container 2: I'm using 172.17.0.49:8080! ### containers use private IP addresses These are reserved for private networks (RFC 1918). Containers use them because they're not directly on the public internet. ### for a packet to get to the right place, it needs a route packet `172.16.2.3`: hi! I'm here! router, represented by a box with a nonplussed expression: I don't have any entry matching `172.16.2.3` in my route table, sorry! ### inside the same computer, you'll have the right routes same computer: ``` $ curl 172.16.2.3:8080 <html>.... ``` different computer: ``` $ curl 172.16.2.3:8080 .... no reply .... ``` ### distributing the right routes is complicated box with a smiley face: a new container started, 10.2.73.4 should go to X computer now route table, also represented by a box with a smiley face, thinking: wow these things change a lot ### cloud providers have systems to make container IPs work In AWS, this is called an "elastic network interface"
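A route table maps network prefixes to where packets should go next. When several prefixes match, the most specific one (longest prefix) wins. Here's a minimal Python sketch using the standard `ipaddress` module; the route table and next-hop names are made up for illustration. Returning `None` corresponds to the router in the comic that has no matching entry.

```python
from __future__ import annotations

import ipaddress


def lookup_route(route_table: dict[str, str], destination: str) -> str | None:
    """Return where to send a packet for destination, or None if no route matches."""
    address = ipaddress.ip_address(destination)
    best_network = None
    best_next_hop = None
    for cidr, next_hop in route_table.items():
        network = ipaddress.ip_network(cidr)
        # Keep the matching entry with the longest prefix (the most specific one).
        if address in network and (
            best_network is None or network.prefixlen > best_network.prefixlen
        ):
            best_network, best_next_hop = network, next_hop
    return best_next_hop


# A made-up route table for one computer.
routes = {
    "192.168.1.0/24": "eth0",
    "172.16.0.0/12": "docker0 (containers on this computer)",
    "10.0.0.0/8": "router",
    "10.2.73.0/24": "computer X",
}

print(lookup_route(routes, "10.2.73.4"))  # computer X (the /24 beats the /8)
print(lookup_route(routes, "8.8.8.8"))    # None: no matching entry
```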
container configuration options
### panel 1 Illustration of a smiling stick figure with curly hair. person: here are the 6 most important things you can configure when starting a container! ### map a port to the host Illustration of two boxes drawn with dotted lines. One is labelled "host", the other is labelled "container". The "host" box says "port 1234", and the "container box" is labelled "port 8080". There is a double-ended arrow pointing back and forth between the two ports. ### mount directories from the host Illustration of two boxes drawn with dotted lines. One is labelled "host", the other is labelled "container". The "host" box says "`~/code/blah`", and the "container box" says "`/src`". There is a double-ended arrow pointing back and forth between the two boxes. ### set capabilities ### add seccomp-bpf filters ### set memory and CPU limits person: only 200 MB RAM for you ### use the host network namespace Usually the default is to use a new network namespace!
the CSS inspector
### all major browsers have a CSS inspector usually you can get to it by right-clicking on an element and then "inspect element", but sometimes there are extra steps ### see overridden properties `button {` `display: inline-block;` `color: var(--orange);` (this line in strikethrough) `}` ### edit CSS properties ``` element { } ``` (lets you change this element's properties) ``` button { display: inline-block; border: 1px solid black; } ``` (this lets you change the border of every `<button>`!) ### see computed styles person, represented by a smiling stick figure: here's a website with 12000 lines of CSS, what `font-size` does this link have? browser, represented by a box with a smiley face: 12px, because of `x.css` line 436 ### look at margin & padding Box Model Illustration of a small box labelled 1261 x 26. Surrounding that box is the padding. Surrounding the padding is the border. Surrounding the border is the margin. ### and LOTS more different browsers have different tools! For example, Firefox has special tools for debugging grid/flexbox.
media queries
### media queries let you use different CSS in different situations ``` @media print { #footer { display: none; } } ``` (`print` is the media query, and the rest is the CSS to apply) ### max-width & min-width ``` @media (max-width: 500px) { /* CSS for small screens */ } @media (min-width: 950px) { /* CSS for large screens */ } ``` ### print and screen `screen` is for computer/mobile screens `print` is used when printing a webpage there are more: `tv`, `tty`, `speech`, `braille`, etc ### accessibility queries you can sometimes find out a user's preferences with media queries examples: `prefers-reduced-motion: reduce` `prefers-color-scheme: dark` ### you can combine media queries it's very common to write something like this: ``` @media screen and (max-width: 1024px) ``` ### the viewport meta tag `<meta name="viewport" content="width=device-width, initial-scale=1">` Your site will look bad on mobile if you don't add a tag like this to the `<head>` in your HTML. Look it up to learn more!
hiding elements with CSS
### there are many ways to make an element disappear Illustration of a smiling stick figure with curly hair. person: which one to use depends: do you want the empty space it left to be filled? ### TRY ME: display: none; other elements will move to fill the empty space Illustration of three boxes side-by-side, with a heart, x, and star, respectively. When the "x" box is set to `display: none;`, the heart and star boxes will now be side-by-side. ### visibility: hidden; the empty space will stay empty Illustration of three boxes side-by-side, with a heart, x, and star, respectively. When the "x" box is set to `visibility: hidden;`, the heart and star boxes will have a gap between them the size of the "x" box. ### opacity: 0; like `visibility: hidden`, but you can still click on the element & it'll still be visible to screen readers. Usually `visibility: hidden` is better. ### how to slowly fade out ``` #fade:hover { transition: all 1s ease; visibility: hidden; opacity: 0; } ``` `opacity: 0` is only there so that the transition has something to fade ### TRY ME: z-index z-index sets the order of overlapping positioned elements Illustration of two boxes, a smaller one with an "x" in it that overlaps a larger empty box. There is an arrow pointing to a second illustration where the boxes are stacked in the opposite order, so that the small box is underneath the large box.
CSS transitions
### an element's computed style can change 2 ways this can happen: 1. pseudo-classes (like `:hover`) 2. JavaScript code `el.classList.add('x')` ### new styles change the element instantly... ``` a:hover { color: red; } ``` the element will turn red right away ### unless you set the transition property ``` a { color: blue; transition: all 2s; } a:hover { color: red; } ``` ("`all 2s`" = will fade from blue to red over 2s) ### transition has 3 parts `transition: color 1s ease;` `color`: which CSS property to animate `1s`: duration `ease`: timing function ### not all property changes can be animated... `list-style-type: square;` CSS renderer, represented by a box with a smiley face: I don't know how to animate that, sorry! ### ...but there are dozens of properties that can if it's a number or color, it can probably be animated! ``` font-size: 14px; rotate: 90deg; width: 20em; ```
css specifications
### CSS has specifications CSS 2.1, represented by an image of a document with many lines of text: hello, this is how max-width works in excruciating detail ### there used to be just one specification Illustration of a smiling stick figure with curly hair. person: it's called "CSS 2" and I still like to reference it to learn the basics ### today, every CSS feature has its own specification you can find them all at https://www.w3.org/TR/CSS/ there are dozens of specs, for example: colors, flexbox, and transforms ### major browsers usually obey the spec but sometimes they have bugs Illustration of a happy little caterpillar-type bug. browser, represented by a box with a smiley face: oops, I didn't quite implement that right... ### levels CSS versions are called "levels". new levels only add new features. They don't change the behaviour of existing CSS code ### new features take time to implement https://caniuse.com (The URL is surrounded by little hearts and stars) can tell you which browser versions support a CSS feature
CSS isn't arbitrary
CSS borders
### `border` has 3 components `border: 2px solid black;` is the same as ``` border-width: 2px; border-style: solid; border-color: black; ``` ### `border-style` options - `solid` - `dotted` - `dashed` - `double` (each word is surrounded by the border it describes) + lots more (`inset`, `groove`, etc) ### `border-{side}` you can set each side's border separately: ``` border-bottom: 2px solid black; ``` ### `border-radius` border-radius lets you have rounded corners `border-radius: 10px;` `border-radius: 50%;` will make a square into a circle! ### box-shadow lets you add a shadow to any element `box-shadow: 5px 5px 8px black;` the first "5px" is the x offset, the second "5px" is the y offset, "8px" is the blur radius, and "black" is the color. ### outline `outline` is like `border`, but it doesn't change an element's size when you add it. outlines on `:focus` help with accessibility: with keyboard navigation, you need an outline to see what's focused
CSS backwards compatibility
### browsers support old HTML + CSS forever Illustration of a smiling stick figure with long hair, talking to a browser from 2020, represented by a box with a smiley face. person: I wrote this CSS in 1998 2020 browser: still works great! ### this makes CSS hard to write... Illustration of two stick figures talking. person 1: why are CSS units so weird? person 2, with grey hair: let me tell you a story from 20 years ago... ### but it means it's worth the investment Illustration of a smiling stick figure with long hair, talking to a browser, represented by a box with a smiley face. person: I spent DAYS getting this CSS to work browser: I'll make sure it keeps working forever! ### if you don't follow the standards, you're not guaranteed backwards compatibility my site broke! (oh yeah, Firefox dropped support for that experiment) ### your CSS doesn't have to support browsers from 1998 Illustration of a smiling stick figure with short curly hair. person: just test that your CSS works on the browsers that your users are using! ### newer features are often easier to use what people expect from a website has changed a LOT since 1998. Newer CSS features make responsive design easy.
browser default stylesheets
### every browser has a default stylesheet (aka "user agent stylesheet") a small sample from the Firefox default stylesheet: ``` h1 { font-size: 2em; font-weight: bold; } ``` ### different browsers have different defaults Illustration of a smiling stick figure with curly hair. person: buttons & forms have some of the biggest differences ### you can read the default stylesheet Firefox's default stylesheets are at: `resource://gre-resources/` ### every property also has a default "initial value" the initial value (defined in the spec) is what's used if no stylesheet has set anything. For example, `background-color`'s initial value is `transparent` ### a CSS property can be set in 5 ways (listed from lowest priority to highest priority) 1. the initial value 2. the browser's default stylesheet 3. the website's stylesheets and user stylesheets 4. inline styles set with HTML/JS
why I love bash
### it's SO easy to get started Here's how: 1. Make a file called `hello.sh` and put some commands in it, like `ls /tmp` 2. Run it with `bash hello.sh` ### pipes & redirects are super easy managing pipes in other languages is annoying. in bash, it's just: `cmd1 | cmd2` ### batch file operations are easy smiling stick figure with curly hair: let's convert every .png to a .jpg bash, with hearts in its eyes: I was born for this ### it's surprisingly good at concurrency smiling stick figure with curly hair: let's start 12 programs in parallel & wait for them all to finish bash: yep no problem! ### it doesn't change bash is weird and old, but the basics of how it works haven't changed in 30 years. If you learn it now, it'll be the same in 10 years. ### bash is GREAT for some tasks But it's also EXTREMELY BAD at a lot of things. I don't use bash if I need: * unit tests * math (bash barely has numbers!) * easy-to-read code ☺
subshells
### a subshell is a child shell process bash, represented by a box with a smiley face: hey, can you run this bash code for me? other bash process: sure thing! ### some ways to create a subshell 1. put code in parentheses `(...)` `(cd "$DIR"; ls)` (runs in subshell) 2. put code in `$(...)` `var=$(cat file.txt)` (runs in subshell) 3. pipe/redirect to a code block `cat x.txt | while read line...` (piping to a loop makes the loop run in a subshell) 4. and lots more, for example process substitution `<()` creates a subshell ### `cd` in a subshell doesn't `cd` in the parent shell ``` ( cd subdir/ mv x.txt y.txt ) ``` I like to do this so I don't have to remember to `cd` back at the end! ### setting a variable in a subshell doesn't update it in the main shell ``` var=3 (var=2) echo $var ``` (this prints 3, not 2) ### it's easy to create a subshell and not notice `x=$(some_function)` sad stick person: I changed directories in some_function, why didn't it work? happy stick person: it's running in a subshell!
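Here is a small runnable sketch of the panels above. It shows that a `cd` and a variable assignment inside `( ... )` don't leak into the parent shell, and that a function called through `$(...)` also runs in a subshell. `get_dir` is a hypothetical helper, and the script assumes `/tmp` exists.

```shell
#!/usr/bin/env bash
cd /tmp || exit 1
var=3

( cd /; var=2 )              # both changes happen in a child shell
echo "after (...): pwd=$PWD var=$var"

get_dir() { cd / && pwd; }   # hypothetical helper that changes directory
dir=$(get_dir)               # $(...) runs get_dir in a subshell too
echo "get_dir printed $dir, but we're still in $PWD"
```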
POSIX compatibility
### there are lots of Unix shells - dash - bash - sh - zsh - fish - csh - tcsh - ksh you can find out your user's default shell by running: `$ echo $SHELL` ### POSIX is a standard that defines how Unix shells should work sh, dash, bash, zsh, and ksh, all represented by little boxes with smiley faces: if your script sticks to POSIX, we'll all run it the same way! (mostly [smiley face]) fish: I don't care about POSIX ### some shells have extra features bash, zsh, and ksh: we have extra features that aren't in POSIX sh and dash: we keep it simple & just do what POSIX says ### on most systems, /bin/sh only supports POSIX features smiling stick figure with short curly hair: if your script has `#!/bin/sh` at the top, don't use bash-only features in it! ### some people write all their scripts to follow POSIX smiling stick figure with straight chin length hair: I only use POSIX features smiling stick figure with short curly hair, labelled "me": I use lots of bash-only features! ### this zine is about bash scripting smiling stick figure with short curly hair: most things in this zine will work in any shell, but some won't! page 15 lists some non-POSIX features
non-POSIX features
### some bash features aren't in the POSIX spec Illustration of a smiling stick figure with curly hair. Person: here are some examples! These won't work in POSIX shells like `dash` and `sh`. ### arrays POSIX shells only have one array: `$@` for arguments ### [[ $DIR = /home/* ]] POSIX alternative: match strings with `grep` ### [[ ... ]] POSIX alternative: `[ ... ]` ### diff <(./cmd1) <(./cmd2) this is called "process substitution", you can use named pipes instead ### the local keyword in POSIX shells, all variables are global ### for ((i=0; i<3; i++)) `sh` only has `for x in ...` loops, not C-style loops ### a.{png,svg} you'll have to type `a.png a.svg` ### {1..5} POSIX alternative: `$(seq 1 5)` ### $'\n' POSIX alternative: `printf '\n'` (careful: `$(printf "\n")` is an empty string, because command substitution strips trailing newlines!) ### ${var//search/replace} POSIX alternative: pipe to `sed`
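A side-by-side sketch of a few of these, bash way vs. POSIX way. For the glob match it uses a POSIX `case` statement instead of the comic's `grep`; that's a swapped-in technique, not the comic's. The newline part shows one common workaround for the `$(printf "\n")` gotcha. (`seq` is widely available but isn't itself specified by POSIX.)

```shell
#!/usr/bin/env bash
DIR=/home/bork

# [[ ... ]] glob match   vs   POSIX `case`
[[ $DIR = /home/* ]] && bash_match=yes
case $DIR in /home/*) posix_match=yes ;; esac

# {1..5}   vs   $(seq 1 5)  (unquoted so the newlines become spaces)
bash_range=$(echo {1..5})
posix_range=$(echo $(seq 1 5))

# ${var//search/replace}   vs   sed
file=a.png.png
bash_repl=${file//png/jpg}
posix_repl=$(printf '%s\n' "$file" | sed 's/png/jpg/g')

# $'\n'   vs   a newline in a variable: add an "x" so the newline isn't
# trailing (and doesn't get stripped), then remove the "x"
nl=$(printf '\nx'); nl=${nl%x}
```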
environment variables
### panel 1: every process has environment variables how to see any process's environment variables on Linux: ``` cat /proc/$PID/environ | tr '\0' '\n' ``` ### panel 2: shell scripts have 2 kinds of variables 1. environment variables 2. shell variables unlike in most languages, in shell you access both of these in the exact same way: `$VARIABLE` ### panel 3: export sets environment variables ``` export ANIMAL=panda ``` `export ANIMAL=panda` means that every child process will have `ANIMAL` set to `panda` ### panel 4: child processes inherit environment variables this is why the variables set in your `.bash_profile` work in all programs you start from the terminal. They're all child processes of your bash shell! ### panel 5: shell variables aren't inherited ``` var=panda ``` in this example, `$var` only gets set in this process, not in child processes ### panel 6: you can set environment variables when starting a program Illustration of a smiling stick figure with curly hair, talking to env, represented by a box with a smiley face. Person: `env VAR=panda ./myprogram` env: OK! I'll set `VAR` to `panda` and then start `./myprogram`
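A quick way to see panels 3 to 6 in action is to start a child `bash -c` process and ask it what it can see. Here is a minimal sketch: the exported variable shows up in the child, but the plain shell variable doesn't.

```shell
#!/usr/bin/env bash
export ANIMAL=panda   # environment variable: inherited by child processes
var=panda             # shell variable: stays in this process

# `bash -c` starts a child process
child_sees=$(bash -c 'echo "ANIMAL=${ANIMAL:-<unset>} var=${var:-<unset>}"')
echo "$child_sees"

# env sets a variable for just this one command
from_env=$(env VAR=panda bash -c 'echo "$VAR"')
echo "$from_env"
```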
bash variables
### how to set a variable - `var=value` right (no spaces!) - `var = value` wrong `var = value` will try to run the program `var` with the arguments "`=`" and "`value`" ### how to use a variable: "$var" ``` filename=blah.txt echo "$filename" ``` they're case sensitive. environment variables are traditionally all-caps, like `$HOME` ### there are no numbers, only strings ``` a=2 a="2" ``` both of these are the string "2" technically bash can do arithmetic, but I avoid it ### always use quotes around variables `$ filename="swan 1.txt"` `$ cat $filename` (wrong) bash: ok, I'll run `cat swan 1.txt` 2 files! oh no! we didn't mean that! cat: Um `swan` and `1.txt` don't exist... `$ cat "$filename"` (right!) bash: ok, I'll run `cat "swan 1.txt"` cat: "swan 1.txt"! that's a file! yay! ### ${varname} To add a suffix like "2" to a variable, you have to use `${varname}`. Here's why: `$ zoo=panda` `$ echo "$zoo2"` prints `""`, `zoo2` isn't a variable `$ echo "${zoo}2"` this prints "`panda2`" like we wanted
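To see the word splitting for yourself, a tiny function that just reports how many arguments it received makes the difference visible. `count_args` is a hypothetical helper for this sketch.

```shell
#!/usr/bin/env bash
count_args() { echo "$#"; }   # hypothetical helper: how many arguments did we get?

filename="swan 1.txt"
unquoted=$(count_args $filename)    # split into "swan" and "1.txt"
quoted=$(count_args "$filename")    # one argument: "swan 1.txt"
echo "unquoted: $unquoted args, quoted: $quoted arg"

zoo=panda
echo "[$zoo2]"     # zoo2 is a different, unset variable
echo "[${zoo}2]"   # the braces end the name, so this is zoo + "2"
```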
bash quotes
### bash has 3 kinds of quotes #### 'single quotes' ``` $ echo '$HOME\n' $HOME\n ``` #### "double quotes" ``` $ echo "$HOME\n" /home/bork\n ``` (only double quotes expand variables) #### $'...' ``` $ echo $'$HOME\n' $HOME ``` (invisible newline) (only `$'...'` expands escape sequences like `\n` or `\'`) ### you can quote multiline strings ``` $ MESSAGE="Usage: here's an explanation of how to use this script!" ``` ### here documents heredocs are a way to write strings containing quotes: expands variables: ``` $ cat <<PANDA he said: "that's $5" PANDA ``` doesn't expand: ``` $ cat <<'PANDA' he said: "that's $5" PANDA ``` ### a trick to escape any string: `!:q` make bash do it for you! ``` $ # He said "that's $5" $ !:q '# He said "that'\''s $5"' ``` (3 strings squished together) ### escaping ' and " here are a few ways to get a ' or ": ``` \' and \" "'" and '"' $'\'' "\"" ``` person: `'\''` doesn't work!
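Here's a runnable sketch of the three quote types and the two heredoc styles. It reads each one-line heredoc with `read -r` rather than `cat`, so the results land in variables you can compare. `set --` clears the positional parameters so that `$5` is guaranteed to be empty.

```shell
#!/usr/bin/env bash
set --                    # no positional parameters, so $5 is empty

single='$HOME\n'          # single quotes: nothing expands
double="$HOME\n"          # double quotes: $HOME expands, \n stays as-is
ansi=$'line1\nline2'      # $'...': \n becomes a real newline

IFS= read -r expanded <<PANDA
he said: "that's $5"
PANDA

IFS= read -r literal <<'PANDA'
he said: "that's $5"
PANDA

printf '%s\n' "$expanded" "$literal"
```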
bash pipes
### sometimes you want to send the output of one process to the input of another ``` $ ls | wc -l 53 ``` (53 files!) ### a pipe is a pair of 2 magical file descriptors ls -> stdout -> IN -> pipe -> OUT -> stdin -> wc (IN and OUT are file descriptors) ### panel 3 when ls does `write(IN, "hi")` wc can read it! `read(OUT) -> "hi"` Pipes are one way. You can't write to OUT ### the OS creates a buffer for each pipe IN -> data waiting to be read -> OUT when the buffer gets full: process, represented by a box with a smiley face: `write(IN, "...")` OS, represented by a box with a nonplussed face: it's full! I'm going to pause you until there's room again. ### named pipes you can create a file that acts like a pipe with `mkfifo` ``` $ mkfifo mypipe $ ls > mypipe & $ wc < mypipe ``` (this does the same thing as `ls | wc`) ### you can use pipes in other languages! only shell has the syntax `process1 | process2` but you can create pipes in basically any language!
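The named pipe example can be made self-contained by creating the FIFO in a temporary directory. This sketch sends the same three lines through a FIFO and through an ordinary `|` pipe and counts them both ways. The writer goes in the background because opening a FIFO for writing blocks until a reader opens it.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/mypipe"

# writer: blocks until a reader opens the FIFO, so run it in the background
printf 'one\ntwo\nthree\n' > "$tmpdir/mypipe" &
# reader
fifo_lines=$(wc -l < "$tmpdir/mypipe")
wait                      # let the background writer finish
rm -r "$tmpdir"

pipe_lines=$(printf 'one\ntwo\nthree\n' | wc -l)
echo "FIFO: $fifo_lines lines, pipe: $pipe_lines lines"
```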
bash input
### read -r var reads stdin into a variable ``` $ read -r greeting hello there! ``` (type here and press enter) ``` $ echo "$greeting" hello there! ``` ### you can also read into multiple variables ``` $ read -r name1 name2 ahmed fatima $ echo "$name2" fatima ``` ### by default, read strips whitespace `" a b c " -> "a b c"` it uses the `IFS` ("Internal Field Separator") variable to decide what to strip ### set `IFS=''` to avoid stripping whitespace ``` $ IFS='' read -r greeting    hi there! $ echo "$greeting"    hi there! ``` (`''` is the empty string) the spaces are still there! ### more `IFS` uses: loop over every line of a file by default, for loops will loop over every word of a file (not every line). Set `IFS=$'\n'` (a newline) to loop over every line instead! (don't forget to unset IFS when you're done!) ``` IFS=$'\n' for line in $(cat file.txt) do echo "$line" done ```
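A runnable sketch of the whitespace behaviour, using a throwaway temp file. It also shows the common `while IFS= read -r line` idiom for looping over lines while keeping their whitespace. That idiom is an alternative to the comic's `for` loop, not something the comic uses. Redirecting the file into the loop (instead of piping `cat` into it) keeps the loop out of a subshell, so `count` survives after it.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)
printf '  indented line\nsecond  line  \n' > "$tmp"

read -r stripped < "$tmp"       # default IFS: leading/trailing spaces removed
IFS= read -r kept < "$tmp"      # IFS= only for this read: spaces kept

count=0
while IFS= read -r line; do     # one iteration per line, whitespace intact
  count=$((count + 1))
  printf '[%s]\n' "$line"
done < "$tmp"
rm "$tmp"
```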
bash globs
### globs are a way to match strings beware: the `*` and the `?` in a glob are different than `*` and `?` in a regular expression!!! bear* matches -> bear ✓ matches -> bearable ✓ doesn't match -> bugbear x ### bash expands globs to match filenames smiling stick figure with short curly hair: `cat *.txt` bash, represented by a box with a smiley face, thinking: let's find all the `.txt` files in this directory... bash: `exec(["cat", "sun.txt", "planet.txt"])` cat, also represented by a box with a smiley face, thinking: `sun.txt` and `planet.txt`, got it (cat doesn't know that you wrote `cat *.txt`) ### there are just 3 special characters `*` matches 0+ characters `?` matches 1 character `[abc]` matches `a` or `b` or `c` person: I usually just use * in my globs ### use quotes to pass a literal '*' to a command `$ egrep 'b.*' file.txt` the regexp 'b.*' needs to be quoted so that bash won't translate it into a list of files whose names start with `b.` ### filenames starting with a dot don't match unless the glob starts with a dot, like `.bash*` person: `ls *.txt` bash: there's `.bees.txt`, but I'm not going to include that
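To try these rules without touching real files, this sketch makes a scratch directory with the comic's example filenames and captures what each glob expands to in an array. It assumes bash's default settings (`dotglob` and `nullglob` off).

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch sun.txt planet.txt .bees.txt bear bearable bugbear

txt=( *.txt )     # bash expands the glob before anything else runs
bears=( bear* )   # bugbear doesn't start with "bear", so it's excluded
dotted=( .b* )    # a glob starting with "." can match dotfiles

printf '%s\n' "${txt[@]}"
cd - > /dev/null && rm -r "$dir"
```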
bash for loops
### for loop syntax ``` for i in panda swan do echo "$i" done ``` ### the semicolons are weird in bash you can usually replace a newline with a semicolon. But not with for loops! `for i in a b; do ...; done` you need semicolons before do and done but it's a syntax error to put one after do ### looping over files is easy ``` for i in *.png do convert "$i" "${i/png/jpg}" done ``` this converts all png files to jpgs! ### for loops loop over words, not lines `for word in $(cat file.txt)` loops over every word in the file, NOT every line (see page 18 for how to change this!) ### while loop syntax ``` while COMMAND do ... done ``` like an if statement, it runs COMMAND and checks if it returns 0 (success), and it keeps looping as long as it does ### how to loop over a range of numbers 3 ways: ``` for i in $(seq 1 5) for i in {1..5} for ((i=1; i<6; i++)) ``` the second two only work in bash, not sh
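One detail worth knowing about the file loop: `${i/png/jpg}` replaces the *first* "png" anywhere in the name, not just the extension. Here is a sketch comparing it with suffix removal, `${i%.png}.jpg`, followed by the three range loops. `jpg_name` is a hypothetical helper, and the comparison only prints names; it doesn't call `convert`.

```shell
#!/usr/bin/env bash
jpg_name() { echo "${1%.png}.jpg"; }   # hypothetical helper: swap the .png suffix

for i in cat.png png_logo.png; do
  echo "$i: \${i/png/jpg} -> ${i/png/jpg}, suffix swap -> $(jpg_name "$i")"
done

# the three range loops, each printing 1 2 3 4 5
for i in $(seq 1 5); do printf '%s ' "$i"; done; echo
for i in {1..5}; do printf '%s ' "$i"; done; echo
for ((i=1; i<6; i++)); do printf '%s ' "$i"; done; echo
```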
bash debugging
### our hero: `set -x` `set -x` prints out every line of a script as it executes, with all the variables expanded! `#!/bin/bash set -x` (I usually put `set -x` at the top) ### or `bash -x` `$ bash -x script.sh` does the same thing as putting `set -x` at the top of `script.sh` ### you can stop before every line `trap read DEBUG` the `DEBUG` "signal" is triggered before every line of code ### a fancy step debugger trick put this at the start of your script to confirm every line before it runs: `trap '(read -p "[$BASH_SOURCE: $LINENO] $BASH_COMMAND")' DEBUG` - `read -p` prints a message, press enter to continue - `$BASH_SOURCE` is the script filename - `$LINENO` is the line number - `$BASH_COMMAND` is the next command that will run ### how to print better error messages this die function: `die() { echo "$1" >&2; exit 1; }` lets you exit the program and print a message if a command fails, like this: `some_command || die "oh no!"`
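Here is a sketch of the `die` pattern you can run safely. The failing command runs inside a subshell, so only that subshell exits and the script can inspect the message and exit status. The `PS4` line is an optional extra, not from the comic: `PS4` is the prefix bash prints before each `set -x` trace line, and this value adds the file name and line number.

```shell
#!/usr/bin/env bash
# Print a message to stderr and exit with status 1.
die() { echo "$1" >&2; exit 1; }

# Optional: make `set -x` traces show file and line number.
PS4='+ ${BASH_SOURCE}:${LINENO}: '

# Run the failing command in a subshell so only the subshell exits.
msg=$( (false || die "oh no!") 2>&1 )
status=$?
echo "status=$status message=$msg"
```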
bash builtins
### most bash commands are programs You can run `which` to find out which binary is being used for a program: ``` $ which ls /bin/ls ``` ### but some commands are functions inside the bash program smiling stick figure with short curly hair: `$ echo hi` bash, represented by a box with a smiley face: ooh, echo? I'll call my builtin function that does that! ### type tells you if a command is a builtin ``` $ type grep grep is /bin/grep $ type echo echo is a shell builtin $ type cd cd is a shell builtin ``` ### examples of builtins - `declare` - `type` - `source` - `alias` - `read` - `cd` - `printf` - `echo` ### a useful builtin: `alias` `alias` lets you set up shorthand commands, like: `alias gc="git commit"` `~/.bashrc` runs when bash starts, so put aliases there! ### a useful builtin: `source` `bash script.sh` runs `script.sh` in a subprocess, so you can't use its variables / functions. `source script.sh` is like pasting the contents of `script.sh` into your current shell
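`type -t` prints a single word (`builtin`, `file`, `function`, `keyword`, or `alias`), which is handy in scripts. This sketch uses it and then shows the `bash` vs `source` difference with a throwaway file. `greet` is a hypothetical function defined only for the example.

```shell
#!/usr/bin/env bash
greet() { echo hi; }     # hypothetical function

for cmd in cd echo greet if ls; do
  printf '%-6s %s\n' "$cmd" "$(type -t "$cmd")"
done

tmp=$(mktemp)
echo 'zoo=panda' > "$tmp"
bash "$tmp";   echo "after bash:   zoo=${zoo:-<unset>}"   # ran in a child process
source "$tmp"; echo "after source: zoo=${zoo:-<unset>}"   # ran in this shell
rm "$tmp"
```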
background processes
### scripts can run many processes in parallel ``` python -m http.server 8080 & curl localhost:8080 ``` & starts python in the "background", so it keeps running while `curl` runs ### wait waits for all background processes to finish ``` command1 & command2 & wait ``` this waits for both `command1` and `command2` to finish ### concurrency is easy* in bash in other languages: smiling stick figure with short curly hair, thinking: threads? how do I do that again? in bash: ``` thing1 & thing2 & wait ``` `*` (if you keep it very simple) ### background processes sometimes exit when you close your terminal you can keep them running with `nohup` or by using `tmux`/`screen`. `$ nohup ./command &` ### panel 5: person: `jobs`, `fg`, `bg`, and `disown` let you juggle many processes in the same terminal, but I almost always just use multiple terminals instead ### panel 6: - `jobs`: list the shell's background processes - `disown`: like nohup, but after the process has started - `fg` and `bg`: move a process to the foreground/background
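Plain `wait` doesn't tell you whether each job succeeded. This sketch saves each job's PID from `$!` and uses `wait PID`, which returns that job's exit status. `job` is a hypothetical stand-in for real work. Both jobs sleep at the same time, so the whole thing takes about 1 second instead of 2.

```shell
#!/usr/bin/env bash
job() { sleep "$1"; return "$2"; }   # hypothetical job: sleep, then exit with a status

start=$SECONDS
job 1 0 & pid1=$!
job 1 3 & pid2=$!

wait "$pid1"; status1=$?   # `wait PID` returns that job's exit status
wait "$pid2"; status2=$?
elapsed=$(( SECONDS - start ))
echo "statuses: $status1 $status2, took ~${elapsed}s"
```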
DNS queries
DNS queries aren't harmless
how to read dig output
glue records
how airports lie to you with DNS
DNS is distributed
SPF & DKIM records
things that can break your DNS
DNS cache levels
why DNS updates are slow: caching
### You might have heard that DNS updates need time to "propagate". What's actually happening is that there are old cached records which need to expire. ### DNS records are cached in many places - browser caches - DNS resolver caches - operating system caches google.com, represented by a box with a smiley face: my DNS records are cached on billions of devices! ### let's see what happens when you update an IP bananas.com A▾ 300 [changed to] 60 1.2.3.4 [changed to] 5.6.7.8 beware: even if you change the TTL to 60s, you still have to wait 300 seconds for the old record to expire ### 30 seconds later... (you go to bananas.com in your browser) Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass, and a browser, represented by the Firefox logo of a fox wrapped around a globe browser: hey what's the IP for bananas.com? resolver, thinking: let's check my cache for bananas.com... found it!! resolver: it's 1.2.3.4! ### 400 seconds later... (you refresh the page again) browser: hey what's the IP for bananas.com? resolver, thinking: The TTL (300s) is up, better ask for a new IP... resolver: it's 5.6.7.8! ### 12 hours later... (you check 1.2.3.4's logs to make sure all the traffic has moved over) Illustration of a stick figure with curly hair looking confused, and a rogue DNS resolver, which looks like the other resolvers except that it is wearing a burglar mask. person: that's weird, the old server is still getting a few requests... rogue DNS resolver: I don't care about your TTL! I just cache everything for 24 hours! the culprit: a rogue DNS resolver
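To make the timeline concrete, here is a toy resolver cache in bash (it needs bash 4+ for associative arrays). It is a simplified model, not how any real resolver is implemented. Timestamps are passed in explicitly so that the sketch is deterministic, and `cache_put`/`cache_get` are hypothetical names. It replays the comic: at t=30 the old answer is still cached, and at t=400 it has expired.

```shell
#!/usr/bin/env bash
declare -A cache_ip cache_expiry

cache_put() {   # cache_put NAME IP TTL NOW
  cache_ip[$1]=$2
  cache_expiry[$1]=$(( $4 + $3 ))
}

cache_get() {   # cache_get NAME NOW: prints the IP, or fails if missing/expired
  local expiry=${cache_expiry[$1]:-}
  [[ -n $expiry ]] && (( $2 < expiry )) || return 1
  echo "${cache_ip[$1]}"
}

cache_put bananas.com 1.2.3.4 300 0     # t=0: cached with the OLD TTL (300s)
cache_get bananas.com 30                # t=30: still 1.2.3.4 from the cache
cache_get bananas.com 400 || echo "t=400: expired, ask the authoritative nameserver"
cache_put bananas.com 5.6.7.8 60 400    # t=400: new answer, new TTL (60s)
cache_get bananas.com 430               # t=430: 5.6.7.8
```

The rogue resolver in the last panel behaves as if it called `cache_put` with its own TTL of 86400 no matter what the record said.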
TXT records & more
### TXT records can contain literally anything ``` examplecat.com TXT "hello! I'm an example cat!" ``` (though they're usually ASCII) ### they're often used to verify that you own your domain google, represented by a box with a smiley face: put "banana stand panda" in a TXT record to prove you own this domain! ### reasons to verify your domain - to issue SSL certificates with Let's Encrypt - to use Single Sign On (SSO) for a service - to get access to Google/Facebook's data about your domain (e.g. search data) ### they're also used for email security (SPF/DKIM/DMARC) Illustration of two smiling stick figures talking. person 1: should we create a DNS record type for SPF? person 2: nah let's just put it all in TXT records! (not a historically accurate summary of the design process for SPF records) ### TXT records can contain many strings Each string is at most 255 characters, and clients will concatenate them together. You'll see this in DKIM records, because they're usually more than 255 characters. ### some other record types CAA: restrict who can issue certificates for your domain PTR: reverse DNS: maps IP addresses to domain names (look these up with `dig -x`) SRV: holds both a hostname and a port number
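Here is a rough sketch of both directions in bash. `split_txt` chops a long value into quoted strings of at most 255 characters, the way a long DKIM key gets published. `join_txt` glues such strings back together, taking input in the `"part1" "part2"` shape that `dig +short TXT` prints. Both function names are hypothetical. They assume ASCII values (so characters equal bytes) and no escaped quotes inside the strings.

```shell
#!/usr/bin/env bash
# Join quoted TXT strings ("a" "b" -> ab). Assumes no escaped quotes.
join_txt() {
  sed -e 's/" "//g' -e 's/^"//' -e 's/"$//'
}

# Split a long ASCII value into quoted strings of at most 255 characters.
split_txt() {
  local s=$1
  local -a chunks=()
  while (( ${#s} > 255 )); do
    chunks+=("\"${s:0:255}\"")
    s=${s:255}
  done
  chunks+=("\"$s\"")
  echo "${chunks[*]}"
}

# made-up, shortened DKIM-style value
echo '"v=DKIM1; k=rsa; p=MIIBIjANBgkq" "hkiG9w0BAQEFAAOCAQ8A"' | join_txt
```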
the root nameservers
### every DNS resolver starts with a root nameserver Illustration of a conversation between a resolver, represented by a box with a smiley face holding a magnifying glass, and a root nameserver, represented by a box with a smiley face wearing a stack of crowns. resolver: what's the IP for example.com? root nameserver: You should ask a `.com` nameserver! They're at `a.gtld-servers.net, b....` ### root nameserver IP addresses almost never change `a.root-servers.net`'s IP (`198.41.0.4`) hasn't changed since 1993. DECADES ago! ### there are thousands of physical root nameservers, but only 13 IP addresses Each IP refers to multiple physical servers; you'll get the one closest to you. (this is called "anycast") There's a map at https://root-servers.org ### if they didn't exist, resolvers wouldn't know where to start resolver, distressed: I need an IP address of an initial server to query, and I can't use DNS to get that IP! ### every resolver has the root IPs hardcoded in its source code example: https://wzrd.page/bind You can query one like this: `dig @198.41.0.4 example.com` All the IPs will give you the exact same results; there are just lots of them for redundancy. Here they are! ``` a.root-servers.net 198.41.0.4 b.root-servers.net 199.9.14.201 c.root-servers.net 192.33.4.12 d.root-servers.net 199.7.91.13 e.root-servers.net 192.203.230.10 f.root-servers.net 192.5.5.241 g.root-servers.net 192.112.36.4 h.root-servers.net 198.97.190.53 i.root-servers.net 192.36.148.17 j.root-servers.net 192.58.128.30 k.root-servers.net 193.0.14.129 l.root-servers.net 199.7.83.42 m.root-servers.net 202.12.27.33 ```
the DNS hierarchy
### there are 3 main levels of authoritative DNS servers root (wearing 3 crowns): I'm in charge of EVERYTHING .com nameserver (wearing 2 crowns): I'm in charge of all domains ending in `.com` example.com nameserver (wearing 1 crown): I'm in charge of all domains ending in `example.com` ### the root nameserver delegates what's the IP for example.com? root: I am not concerned with petty details like that. Here's the address of the .com nameserver. ### the .com nameserver also delegates what's the IP for example.com? .com nameserver: I am not concerned with petty details like that either. Here's the address of the example.com nameserver ### the example.com nameserver actually answers your questions what's the IP for example.com? example.com nameserver: 93.184.216.34! ### this design lets DNS be decentralized example: for my domain `jvns.ca` root (ICANN controls this!) delegates to .ca nameserver (Canada controls this!) delegates to jvns.ca nameserver (I control this!)
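As a rough illustration of the delegation order, this bash sketch lists the zones a resolver would walk through, from the root down. It simplifies things by assuming every label is its own zone, which real DNS doesn't require. `delegation_chain` is a hypothetical name. To watch real delegations, `dig +trace jvns.ca` follows them from the root.

```shell
#!/usr/bin/env bash
# Print the zones from the root down to NAME, one per line.
# Simplification: treats every label as its own zone.
delegation_chain() {
  local name=${1%.}        # drop a trailing dot if present
  local -a labels
  local zone="" i
  IFS=. read -r -a labels <<< "$name"
  echo "."
  for (( i = ${#labels[@]} - 1; i >= 0; i-- )); do
    zone="${labels[i]}.${zone}"
    echo "$zone"
  done
}

delegation_chain jvns.ca   # root, then ca., then jvns.ca.
```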
TCP DNS
### If you manage servers, sometimes DNS just breaks for no obvious reason Illustration of a smiling stick figure with curly hair. person: TCP DNS is an uncommon but VERY annoying cause of DNS problems! Let's learn about it! ### DNS queries can use either UDP or TCP A UDP DNS response can be at most 512 bytes, or larger (often up to 4096 bytes) if the client advertises a bigger buffer with EDNS. UDP is the default. TCP can send much more data (up to 65,535 bytes per message). It's only used when UDP wouldn't work. ### large DNS responses automatically use TCP speech bubble 1: here's a UDP DNS query! speech bubble 2: sorry, my response is too big to fit in a UDP packet! ask me again with TCP! ### what's in a giant DNS response? person: I've seen responses with hundreds of internal server IP addresses (for example when using Consul) ### how not supporting TCP DNS can ruin your day 1. your server is happily making UDP DNS queries 2. one day, the responses get bigger and switch to TCP 3. oh no! the queries fail! ### 2 reasons TCP DNS might not work 1. some DNS libraries (like musl's getaddrinfo before musl 1.2.4) don't support TCP. This is why DNS sometimes breaks in Alpine Linux. 2. it could be blocked by your firewall. You should open both UDP port 53 and TCP port 53.
search domains
### panel 1: In an internal network (like in a company or school), sometimes you can connect to a machine by just typing its name, like this: `$ ping labcomputer-23` Let's talk about how that works! ### many DNS lookup functions support "local" domain names browser, represented by a box with a smiley face: where's lab23? function, represented by a rectangle with squiggly lines: where's lab23.degrassi.ca? arrow pointing to resolver (server) represented by a box with a smiley face holding a magnifying glass (the function appends a base domain `degrassi.ca` to the end) ### the base domain is called a "search domain" On Linux, search domains are configured in `/etc/resolv.conf` Example: `search degrassi.ca` this tells `getaddrinfo` to turn `lab23` into `lab23.degrassi.ca` ### getaddrinfo doesn't always use search domains It uses an option called ndots to decide. ``` search degrassi.ca options ndots:5 ``` this means "try the search domains first only if the domain name contains fewer than 5 dots" ### search domains can make DNS queries slower browser: where's `jvns.ca`? getaddrinfo, represented by a rectangle with squiggly lines: okay, first I'll try `jvns.ca.degrassi.ca` this is silly but it can happen! ### avoid search domains by putting a "." at the end Use `http://jvns.ca.` instead of `http://jvns.ca` Illustration of a smiling stick figure with curly hair. person: "local" domain names like this mostly exist inside of big institutions like universities
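Here is a simplified bash model of the order in which a glibc-style resolver library tries names, given `ndots` and a search list. `lookup_order` is a hypothetical name, and real libraries have more rules (timeouts, multiple search domains, `options` flags). The sketch only covers the basic decision. Names with a trailing dot skip the search list entirely. Names with fewer than `ndots` dots try the search domains first. Other names try the name as-is first.

```shell
#!/usr/bin/env bash
lookup_order() {   # lookup_order NAME NDOTS SEARCH_DOMAIN...
  local name=$1 ndots=$2 domain
  shift 2
  if [[ $name == *. ]]; then          # trailing dot: absolute name, no search
    echo "$name"
    return
  fi
  local dots=${name//[!.]/}           # keep only the dots, then count them
  if (( ${#dots} < ndots )); then
    for domain in "$@"; do echo "$name.$domain"; done
    echo "$name"
  else
    echo "$name"
    for domain in "$@"; do echo "$name.$domain"; done
  fi
}

lookup_order jvns.ca 5 degrassi.ca    # jvns.ca.degrassi.ca, then jvns.ca
lookup_order jvns.ca. 5 degrassi.ca   # only jvns.ca.
```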
resolvers vs authoritative nameservers
### panel 1 One reason DNS is confusing is that the DNS server you query (a resolver) is different from the DNS server where the records are stored (a network of authoritative nameservers). Beside "resolver" there is an illustration of a smiling little box holding a magnifying glass, and beside "authoritative nameserver" there is an illustration of a smiling little box with a crown. ### anytime your browser makes a DNS query, it's asking a resolver Illustration of a conversation between a browser, represented by the Firefox logo of a fox wrapped around a globe, and a resolver, represented by a smiling little box holding a magnifying glass. browser: what's the IP for `example.com`? resolver: I'll find out for you! ### anytime you update a domain's DNS records, you're updating an authoritative nameserver Illustration of a conversation between a smiling stick figure with curly hair and an authoritative nameserver, represented by a pink box with a smiley face wearing a crown. person: set the IP for example.com to 1.2.3.4 authoritative nameserver: got it! Next time someone asks, that's what I'll tell them. ### how a resolver handles queries 1. check its cache, or (if that fails) 2. find the right authoritative nameserver and ask it ### how an authoritative nameserver handles queries 1. check its database for a match 2. that's it, there's no step 2. It's the authority! (illustration of a crown) ### the terminology is really confusing Other names for resolvers: - recursive resolver - DNS recursor - public DNS server - recursive nameserver - DNS resolution service - caching-only nameserver Types of authoritative nameservers: - root nameserver - TLD nameserver (like `.com` or `.ca`)
resolvers can lie
### When a resolver gets a DNS query, it has 2 options: Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass. resolver: I could tell you what the authoritative nameservers said... or I could LIE! ### block ads / malware Illustration of a conversation between a resolver and a browser, represented by the Firefox logo of a fox wrapped around a globe browser: what's the IP for doubleclick.net? (ad domain, definitely exists) resolver: that domain doesn't exist. PiHole blocks ads this way. ### reason to lie: to show you ads (rude!) browser: what's the IP for zzz.jvns.ca? (doesn't exist) resolver: here's an IP that will show you ads! This is called "DNS hijacking". ### reason to "lie": internal domain names browser: what's the IP for corp.examplecat.com? (doesn't exist on the public internet) corporate resolver: here's an internal IP address! ### reason to lie: airport DNS resolvers sometimes lie browser: what's the IP for google.com? airport resolver: you didn't log in yet so I will lie! here is our login page's IP! ### how does your computer know which resolver to use? When you connect to a network, the router tells your computer which search domain and resolver to use (using DHCP). Illustration of a router, represented by a box with antennae and a smiley face router: `192.168.1.1 search domain: lan`
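Here's a minimal Python sketch of the "block ads / malware" lie: a resolver wrapper that answers NXDOMAIN for anything on a blocklist, including subdomains. The blocklist contents, `is_blocked`, `answer` and the `real_lookup` callback are all hypothetical; this illustrates the idea from the comic, not PiHole's actual code.

```python
BLOCKLIST = {"doubleclick.net"}  # hypothetical blocklist of ad domains


def is_blocked(domain, blocklist=BLOCKLIST):
    """True if domain, or any domain it's a subdomain of, is on the blocklist."""
    labels = domain.rstrip(".").lower().split(".")
    # checks "ads.doubleclick.net", then "doubleclick.net", then "net"
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def answer(domain, real_lookup):
    """Return (response code, IP): lie for blocked domains, otherwise tell the truth."""
    if is_blocked(domain):
        return ("NXDOMAIN", None)  # the lie: "that domain doesn't exist"
    return ("NOERROR", real_lookup(domain))


if __name__ == "__main__":
    fake_upstream = {"example.com": "93.184.216.34"}.get  # hypothetical "honest" lookup
    print(answer("ads.doubleclick.net", fake_upstream))
    print(answer("example.com", fake_upstream))
```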
NS records
### What's actually happening when the root nameserver redirects to the .com nameserver in the "life of a DNS query" comic? Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass, and a root nameserver, represented by a pink box with a smiley face, wearing a stack of three crowns resolver: what's the IP for example.com? root nameserver: I am not concerned with petty details like that. Here's the address of the .com nameserver (this is an NS record) ### The root nameserver can return two kinds of DNS records: NS records: (in the Authority section)
```
com. 172800 NS a.gtld-servers.net
com. 172800 NS b.gtld-servers.net
```
com. is the name, 172800 is the TTL, NS is the type, b.gtld-servers.net is the value. glue records: (in the Additional section)
```
a.gtld-servers.net 86400 A 192.5.6.30
b.gtld-servers.net 86400 A 192.33.14.30
```
a.gtld-servers.net is the name, 86400 is the TTL, A is the type, 192.33.14.30 is the value. ### The NS record gives you the domain name of the server to talk to next, but not its IP address. resolver: But I need the IP for `a.gtld-servers.net` to communicate with it! is there a glue record? ### 2 ways the resolver gets the IP address 1. If it sees a glue record for a.gtld-servers.net, the resolver will use that IP 2. otherwise, it'll start a whole separate DNS lookup for a.gtld-servers.net ### glue records help resolvers avoid infinite loops without a glue record for `a.gtld-servers.net`: disaster! resolver: what's the IP for `a.gtld-servers.net`? root nameserver: You should ask `a.gtld-servers.net` ### terminology note NS records are DNS records with type "NS". Also, an "A record" means "record with type A", "MX record" means "record with type MX", etc. (confusingly, this is not true for glue records: glue records have type A or AAAA. It's weird, I know.)
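The "2 ways the resolver gets the IP address" can be sketched in a few lines of Python. `split_referral` is a hypothetical helper; it assumes the Authority section has already been parsed into `(zone, nameserver)` pairs and the Additional section into a `{nameserver: IP}` dict of glue records.

```python
def split_referral(ns_records, glue):
    """Hypothetical helper: sort a referral's nameservers into "IP known" vs. "needs a lookup".

    ns_records: list of (zone, nameserver name) pairs from the Authority section
    glue:       dict of nameserver name -> IP from the Additional section
    """
    ready, needs_lookup = {}, []
    for _zone, ns_name in ns_records:
        if ns_name in glue:
            ready[ns_name] = glue[ns_name]  # way 1: use the glue record's IP
        else:
            needs_lookup.append(ns_name)  # way 2: a whole separate DNS lookup
    return ready, needs_lookup


if __name__ == "__main__":
    ns = [("com.", "a.gtld-servers.net"), ("com.", "b.gtld-servers.net")]
    print(split_referral(ns, {"a.gtld-servers.net": "192.5.6.30"}))
```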
negative caching
### Here's a problem I've had many times Illustration of a stick figure with curly hair and a distressed expression. Person's thought bubble: I set up my new domain, everything looks good, but it's not working?!?! ### I finally learned last year that my problem was "negative caching" Same person, now smiling: now I never have this problem anymore! ### resolvers cache negative results Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass, and an authoritative nameserver, represented by a box with a smiley face wearing a crown. resolver: what's the IP for `bees.jvns.ca`? authoritative nameserver: I don't have any records for that! resolver (thought bubble): `caching: no A records for bees.jvns.ca` ### the TTL for caching negative results comes from the SOA record `example.com. 3600 IN SOA ns.icann.org. noc.dns.icann.org. 2021120741 7200 3600 1209600 3600` it's the smaller of the first number and the last number (in this case 3600 seconds) ### what you need to know about SOA records 1. they control the negative caching TTL 2. you can't change them (unless you run your own authoritative nameserver) 3. how to find yours: `dig SOA yourdomain.com` ### how to avoid this problem Just make sure not to visit your domain before creating its DNS record! That's it! (if you really want more details, see RFC 2308)
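Here's a minimal Python sketch of that calculation, following RFC 2308: the negative-caching TTL is the smaller of the SOA record's own TTL and its last field (MINIMUM). `negative_ttl` is a hypothetical helper that assumes the single-line `dig`-style format shown above.

```python
def negative_ttl(soa_line):
    """Hypothetical helper: negative-caching TTL from a dig-style SOA line.

    Fields: name TTL class SOA mname rname serial refresh retry expire minimum
    """
    fields = soa_line.split()
    if len(fields) != 11 or fields[3] != "SOA":
        raise ValueError(f"not an SOA line: {soa_line!r}")
    record_ttl = int(fields[1])  # the SOA record's own TTL
    minimum = int(fields[10])    # the last number (MINIMUM)
    return min(record_ttl, minimum)


if __name__ == "__main__":
    soa = ("example.com. 3600 IN SOA ns.icann.org. noc.dns.icann.org. "
           "2021120741 7200 3600 1209600 3600")
    print(negative_ttl(soa))  # 3600
```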
MX records
### there are two important problems in email From: Kermit@frog.com To: julia@example.com 1. Make sure the message gets to the right recipient. This is what MX records are for. 2. Make sure the sender didn't lie about their From: address. This is what SPF, DKIM, and DMARC records are for. SPF/DKIM/DMARC are very complicated but we'll give a tiny incomplete summary. ### MX records tell you the mail server for a domain
```
$ dig +short MX gmail.com
5 gmail-smtp-in.l.google.com.
```
5 is the priority, gmail-smtp-in.l.google.com. is the server's domain name ### copy and paste your MX records Illustration of a smiling stick figure with curly hair. person: you're probably using an email service like Fastmail/Gmail, so just copy the records they tell you to use ### tiny guide to SPF/DKIM/DMARC records SPF: list of allowed sender IP addresses Example: `v=spf1 ip4:2.3.4.5 -all` DKIM: sender's public key Example: `v=DKIM1; k=rsa; p=MIGFMA0GCSqGSI.......` DMARC: what to do about SPF/DKIM failures Example: `v=DMARC1; p=reject; rua=mailto:dmarc@example.com`
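Lower priority numbers are preferred, so a sending server tries the MX hosts in ascending order. A minimal sketch, assuming input lines in the `dig +short MX` format shown above; `mail_servers` and the `example.com` hosts are hypothetical. It breaks ties alphabetically for simplicity, whereas RFC 5321 says senders should pick randomly among equal preferences.

```python
def mail_servers(mx_lines):
    """Hypothetical helper: order `dig +short MX` lines ("PRIORITY HOST"), most preferred first."""
    records = []
    for line in mx_lines:
        priority, host = line.split()
        records.append((int(priority), host))
    # lowest priority number first; ties fall back to alphabetical order here
    return [host for _priority, host in sorted(records)]


if __name__ == "__main__":
    print(mail_servers(["20 backup.example.com.", "10 mx.example.com."]))
```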
life of a DNS query
### 1 An illustration of a smiling stick figure with curly hair, talking to a browser, represented by the Firefox logo of a fox wrapped around a globe. person: I want to go to https://example.com browser: hmm, I don't have an IP address for example.com cached. I'll ask a resolver! ### 2 An illustration of a browser talking to a resolver, represented by a box with a smiley face holding a magnifying glass. browser: what's the IP for example.com? resolver: hmm, I'll look in my cache... ### 3 ❤ DNS cache ❤ archive.org: 207.241.224.2 jvns.ca: 172.64.80.1 resolver: nope, I don't have it cached, I need to ask the authoritative nameservers! I have the root nameserver IPs hardcoded. note: we're pretending the resolver has no .com domains cached. Normally it would use its cache to skip step 4. ### 4 An illustration of a resolver talking to a root nameserver, represented by a box with a smiley face wearing three crowns. resolver: What's the IP for example.com? root nameserver: ask a .com nameserver! It's at a.gtld-servers.net → com NS a.gtld-servers.net. ca NS a.ca-servers.net. horse NS a.nic.horse. (NS stands for "nameserver") ### 5 An illustration of a resolver talking to a .com nameserver, represented by a box with a smiley face wearing two crowns. resolver: what's the IP for example.com? .com nameserver: ask an example.com nameserver! It's at a.iana-servers.net list of DNS records: neopets.com, NS, ns-42.awsdns-05.com. → example.com, NS, a.iana-servers.net. ### 6 An illustration of a resolver talking to an example.com nameserver, represented by a box with a smiley face wearing one crown. resolver: what's the IP for example.com? example.com nameserver: it's 93.184.216.34! resolver: great, I'll tell the browser! → example.com, A, 93.184.216.34
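The steps above can be simulated offline in Python using the records from the comic. Everything here is a hypothetical toy: `NAMESERVERS` replaces the network with hardcoded dicts, and servers are identified by name rather than by IP address (a real resolver needs glue records or extra lookups for that, as the "NS records" comic explains).

```python
# Hypothetical, hardcoded "databases" for each nameserver, copied from the comic.
# A real resolver would send DNS packets over the network instead.
NAMESERVERS = {
    "root": {
        "com": ("NS", "a.gtld-servers.net"),
        "ca": ("NS", "a.ca-servers.net"),
        "horse": ("NS", "a.nic.horse"),
    },
    "a.gtld-servers.net": {
        "neopets.com": ("NS", "ns-42.awsdns-05.com"),
        "example.com": ("NS", "a.iana-servers.net"),
    },
    "a.iana-servers.net": {
        "example.com": ("A", "93.184.216.34"),
    },
}


def ask(server, domain):
    """Pretend to query `server`: return its most specific record for `domain`."""
    records = NAMESERVERS[server]
    labels = domain.split(".")
    for i in range(len(labels)):  # try "example.com", then "com"
        suffix = ".".join(labels[i:])
        if suffix in records:
            return records[suffix]
    raise LookupError(f"{server} has no records for {domain}")


def resolve(domain, cache, trace=None):
    """Follow NS referrals from the root until we get an A record."""
    if domain in cache:  # steps 2-3: check the cache first
        return cache[domain]
    server = "root"  # a real resolver has the root nameserver IPs hardcoded
    while True:
        if trace is not None:
            trace.append(server)
        rtype, value = ask(server, domain)
        if rtype == "A":  # step 6: the answer!
            cache[domain] = value
            return value
        server = value  # steps 4-5: an NS record tells us who to ask next


if __name__ == "__main__":
    cache = {"archive.org": "207.241.224.2", "jvns.ca": "172.64.80.1"}
    steps = []
    print(resolve("example.com", cache, steps), steps)
```

The second lookup of `example.com` never leaves the cache, which is why real resolvers can usually skip steps 4 and 5.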
let's meet dig
### dig is my favourite tool for investigating DNS issues I find its default output unnecessarily confusing, but it's the only standard tool I know that will give you all the details. ### tiny guide to dig's full output
```
$ dig example.com
; <<>> DiG 9.16.24 <<>> +all example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27580
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
; example.com. IN A

;; ANSWER SECTION:
example.com. 86400 IN A 93.184.216.34

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Jan 26 11:32:03 EST 2022
;; MSG SIZE rcvd: 56
```
`NOERROR` is the response code. `example.com. 86400 IN A 93.184.216.34` is the answer to our DNS query. The "." at the end means that example.com isn't a subdomain of some other domain (like it's not example.com.degrassi.ca). This might seem obvious, but DNS tools like to be unambiguous. ### panel 3: Illustration of a smiling stick figure with curly hair. person: `$ dig +noall +answer` means "Just show me the answer section of the DNS response." It's a lot less to look at! ### panel 4: `$ dig +noall +answer example.com` `example.com. 86400 IN A 93.184.216.34` example.com is the name, 86400 is the TTL, IN is the class, A is the record type, 93.184.216.34 is the content. just the answer! so much less overwhelming!
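If you're scripting around `dig +noall +answer`, each answer line splits cleanly into the five parts labelled above. A minimal sketch; `parse_answer` is a hypothetical helper, and `maxsplit=4` keeps values that contain spaces (like SOA or TXT records) in one piece.

```python
def parse_answer(line):
    """Hypothetical helper: split one `dig +noall +answer` line into its five parts."""
    name, ttl, klass, rtype, value = line.split(maxsplit=4)
    return {"name": name, "ttl": int(ttl), "class": klass, "type": rtype, "value": value}


if __name__ == "__main__":
    # dig separates the columns with tabs; str.split() handles any whitespace
    print(parse_answer("example.com.\t\t86400\tIN\tA\t93.184.216.34"))
```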
getaddrinfo
### panel 1: One weird thing about DNS is that different programs on a single computer can get different results for the same domain name. Let's talk about why! Illustration of a program, represented by a box with a smiley face, and a resolver (server), represented by a box with a smiley face holding a magnifying glass. Between them is a function, represented by a rectangle with squiggly lines on it. There are arrows going back and forth between the function and both the program and the resolver (server). The function is the problem. ### reason 1: many (but not all!!) programs use the function getaddrinfo for DNS lookups... ping, represented by a box with a smiley face: I use getaddrinfo! dig, also represented by a box with a smiley face: I don't! So if you see an error message like "`getaddrinfo: nodename or servname not provided...`", that's a DNS error. ### and not using getaddrinfo might give a different result - the program might not use `/etc/hosts` (dig doesn't) - the program might use a different DNS resolver (some browsers do this) ### reason 2: there are many different versions of `getaddrinfo`... - the one in `glibc` - the one in `musl libc` - the one in Mac OS And of course, they all behave slightly differently :) ### you can have multiple getaddrinfos on your computer at the same time For example, on a Mac, there's your system `getaddrinfo`, but you might also be running a container that's using `musl`. ### glibc and musl getaddrinfo are configured with `/etc/resolv.conf` (the `nameserver` lines are the IPs of the resolvers to use)
```
# Generated by NetworkManager
nameserver 192.168.1.1
nameserver fd13:d987:748a::1
```
On a Mac, `/etc/resolv.conf` exists, but it's not used by the system `getaddrinfo`.
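Python's `socket.getaddrinfo` is a thin wrapper around the system's `getaddrinfo`, so it's an easy way to see what "most programs" see, as opposed to what `dig` sees. A minimal sketch; `lookup` is a hypothetical helper name, and which sources get consulted (like `/etc/hosts`) depends on your system's configuration.

```python
import socket


def lookup(host):
    """Hypothetical helper: sorted IPs that the system's getaddrinfo returns for host."""
    results = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results})


if __name__ == "__main__":
    try:
        print(lookup("localhost"))
    except socket.gaierror as err:  # the same kind of "getaddrinfo: ..." DNS error
        print("lookup failed:", err)
```

Comparing `lookup("localhost")` with `dig localhost` is a quick way to see reason 1 in action: the first typically answers from `/etc/hosts`, while `dig` always asks a DNS server.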
everything in a DNS packet
I literally mean everything: I copied this verbatim from a real DNS request using Wireshark. (DNS packets are binary but we're showing a human-readable representation here) ### Let's look at the actual data being sent during a DNS query: Illustration of a browser, represented by the Firefox logo of a fox wrapped around a globe, talking to a resolver, represented by a box with a smiley face holding a magnifying glass. browser: what's the IP for example.com? resolver: 93.184.216.34! ### request `Query ID: 0x05a8` (randomly generated) `Flags: 0x0100` (these flags just mean "this is a request") `Questions: 1` `Answer records: 0` `Authority records: 0` `Additional records: 0` `Question:` `Name: example.com` `Type: A` (A is for IPv4 address. Other types: MX, CNAME, AAAA, etc.) `Class: IN` (IN stands for "INternet") ### response `Query ID: 0x05a8` (matches request ID) `Flags: 0x8580` the response code is encoded in the last 4 bits of these flags. The 3 main response codes are: - NOERROR (success!) - NXDOMAIN (doesn't exist!) - SERVFAIL (error!)
```
Questions: 1
Answer records: 1
Authority records: 0
Additional records: 0
```
(copied from request)
```
Question:
  Name: example.com
```
(domain names aren't case sensitive)
```
  Type: A
  Class: IN
Answer records:
  Name: example.com
  Type: A
  Class: IN
  TTL: 86400
  Content: 93.184.216.34
```
(the IP we asked for)
```
Authority records: (empty)
Additional records: (empty)
```
The "NS records" comic talks more about these 2 sections. Illustration of a smiling stick figure with curly hair. Person: I'm always surprised by how little is actually in a DNS packet!
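To see just how little is in there, here's a minimal Python sketch that packs the request above into its binary form with `struct`, following the RFC 1035 layout: a 12-byte header of six 16-bit big-endian numbers, then the question. The helper names are hypothetical; it uses flags `0x0100` (the "recursion desired" bit that ordinary stub queries set) and only handles type A / class IN questions.

```python
import struct

TYPE_A = 1    # A record (IPv4 address)
CLASS_IN = 1  # IN, "INternet"
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}  # the 3 main response codes


def encode_name(domain):
    """Wire format for a name: each label prefixed by its length, then a 0 byte."""
    out = b""
    for label in domain.rstrip(".").split("."):
        if not 0 < len(label) <= 63:
            raise ValueError(f"bad label {label!r} in {domain!r}")
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(query_id, domain, flags=0x0100):
    """12-byte header (ID, flags, 4 record counts) + one type A / class IN question."""
    header = struct.pack("!6H", query_id, flags, 1, 0, 0, 0)
    question = encode_name(domain) + struct.pack("!2H", TYPE_A, CLASS_IN)
    return header + question


def response_code(flags):
    """The response code lives in the last 4 bits of the flags field."""
    return RCODES.get(flags & 0xF, "OTHER")


if __name__ == "__main__":
    packet = build_query(0x05A8, "example.com")
    print(len(packet), packet.hex())
    print(response_code(0x8580))
```

The whole request is 29 bytes. To actually send it, you could write these bytes to UDP port 53 of a resolver with the `socket` module.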
DNS: cast of characters
Let's meet the cast and see how they communicate with each other! browser: where's example.com? (function call) 93.184.216.34! ↓ function: where's example.com? (DNS query) 93.184.216.34! ↓ resolver: where's example.com? (DNS query) 93.184.216.34! ↓ authoritative nameservers ### browser Your browser uses DNS to look up IP addresses every time it visits a domain, like example.com. The browser has a DNS cache. ### function Your operating system provides a function to do DNS lookups. On Linux and Mac it's getaddrinfo. Your operating system also might have a DNS cache. ### resolver The function sends requests to a server called a resolver which knows how to find the authoritative nameservers. The resolver has a DNS cache. ### authoritative nameservers The authoritative nameservers are the servers where the DNS records are actually stored. They're wearing crowns because they're In Charge.
DNS records
### When you make DNS changes for your domain, you're editing a DNS record Type: A Name (subdomain): paw Use @ for root IPv4 address: 1.2.3.4 TTL: 1 min Here's what the same record looks like with dig (we'll explain dig in "let's meet dig")
```
$ dig +noall +answer paw.examplecat.com
paw.examplecat.com. 60 IN A 1.2.3.4
```
### DNS records have 5 parts - name (eg `tail.examplecat.com`) - type (eg `CNAME`) - value (eg `tail.jvns.ca`) - TTL (eg `60`) - class (eg `IN`) Different record types have different kinds of values: `A` records have an IP address, and `CNAME` records have a domain name. ### name `paw.examplecat.com` When you create a record, you'll usually write just the subdomain (like `paw`). When you query for a record, you'll get the whole domain name (like `paw.examplecat.com`). ### TTL `60` "time to live". How long to cache the record for, in seconds. ### class `IN` "IN" stands for "INternet". You can ignore it, it's always the same. ### record type `A` "A" stands for "IPv4 Address". ### value `1.2.3.4` the IP address we asked for!
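A minimal Python sketch of the 5 parts, plus the "subdomain vs. whole name" conversion described above. `DNSRecord` and `full_name` are hypothetical names; the `@` handling assumes the convention from the form above, where `@` means the root of the domain.

```python
from dataclasses import dataclass


@dataclass
class DNSRecord:
    name: str          # full name, e.g. "paw.examplecat.com."
    rtype: str         # record type, e.g. "A" or "CNAME"
    value: str         # an IP for A records, a domain name for CNAME records
    ttl: int = 60      # how long to cache it, in seconds
    klass: str = "IN"  # "class" is a Python keyword; it's always IN anyway

    def __str__(self):
        # Same field order that dig prints: name TTL class type value
        return f"{self.name} {self.ttl} {self.klass} {self.rtype} {self.value}"


def full_name(subdomain, domain):
    """Turn what you type into a DNS provider's form into the name queries return."""
    domain = domain.rstrip(".") + "."
    if subdomain == "@":  # "@" means the root of the domain itself
        return domain
    return f"{subdomain}.{domain}"


if __name__ == "__main__":
    record = DNSRecord(full_name("paw", "examplecat.com"), "A", "1.2.3.4")
    print(record)  # paw.examplecat.com. 60 IN A 1.2.3.4
```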
dig command line arguments
Illustration of a laptop. Its keyboard just says QWERTY. ### the basics: dig @SERVER TYPE DOMAIN (SERVER and TYPE are both optional) Examples:
```
dig example.com
dig @8.8.8.8 NS example.com
dig TXT example.com
dig @8.8.8.8 example.com
```
default type: A, default server: from `/etc/resolv.conf` (on Linux) ### tip: put +noall +answer in your ~/.digrc This makes your output more readable by default, and you can always go back to the full output with `dig +all`. ### dig +noall Hide all output. Useless by itself, but `dig +noall +authority` will just show you the "Authority" section of the response. ### dig +short DOMAIN Only show the record content.
```
$ dig +short example.com
93.184.216.34
```
### dig +trace DOMAIN Traces how the domain gets resolved, starting at the root nameservers. This avoids all the caches, which is useful to make sure you set your record correctly.
a tiny DNS resolver
In the "life of a DNS query" comic, we saw how resolvers work. This code does the same thing, but it actually works.
```
# query(), get_answer(), get_glue() and get_nameserver() are helpers
# from the full program (not shown here)
def resolve(domain):
    # Start at a root nameserver
    nameserver = "198.41.0.4"
    # A "real" resolver would check its cache here
    while True:
        reply = query(domain, nameserver)
        ip = get_answer(reply)
        if ip:
            # Best case: we get an answer to our query and we're done
            return ip
        nameserver_ip = get_glue(reply)
        if nameserver_ip:
            # Second best: we get the IP address* of the nameserver to ask next
            nameserver = nameserver_ip
        else:
            # Otherwise: we get the domain name* of the nameserver to ask next
            nameserver_domain = get_nameserver(reply)
            nameserver = resolve(nameserver_domain)
```
* Actual DNS resolvers are more complicated than this, but this is the core algorithm. Smiling stick figure with curly hair: You can find the whole program at https://github.com/jvns/tiny-resolver
A & AAAA records
### there are two kinds of IP addresses: IPv4 and IPv6 Every website needs an IPv4 address. IPv6 addresses are optional. ### panel 2: A stands for IPv4 Address Example: `93.184.216.34` AAAA stands for IPv6 AAAAddress (joke, but kinda true) Example: `2606:2800:220:1:248:1893:25c8:1946` it's called AAAA (4 As) because IPv6 addresses have 4x as many bytes (16 instead of 4) ### in theory, the Internet is moving from IPv4 to IPv6 This is because there are only 4 billion IPv4 addresses (the internet has grown a LOT since the 1980s when IPv4 was designed!) ### happy eyeballs* If your domain has both an A and an AAAA record, clients will use an algorithm called "happy eyeballs" to decide whether IPv4 or IPv6 will be faster. `*` yes that is the real name ### using IPv6 isn't always easy - not all web hosts give you an IPv6 address - lots of ISPs don't support IPv6 (mine doesn't!) ### IP addresses have owners You can find any IP's owner by looking up its ASN ("Autonomous System Number"). (except local IPs like `192.168.x.x`, `127.x.x.x`, `10.x.x.x`, `172.16.x.x`)
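Python's standard `ipaddress` module can tell you which kind of address you have, and confirms the byte counts behind the AAAA name. A minimal sketch; `describe` is a hypothetical helper, while `ip_address`, `.version`, `.packed`, `.is_private` and `.is_loopback` are real `ipaddress` APIs.

```python
import ipaddress


def describe(ip):
    """Hypothetical helper: which record type an IP belongs in, and a few facts about it."""
    addr = ipaddress.ip_address(ip)  # IPv4Address or IPv6Address
    return {
        "record_type": "A" if addr.version == 4 else "AAAA",
        "bytes": len(addr.packed),     # 4 for IPv4, 16 for IPv6
        "private": addr.is_private,    # e.g. 192.168.x.x, 10.x.x.x
        "loopback": addr.is_loopback,  # 127.x.x.x or ::1
    }


if __name__ == "__main__":
    print(describe("93.184.216.34"))
    print(describe("2606:2800:220:1:248:1893:25c8:1946"))
    print(2 ** 32)  # total number of IPv4 addresses: about 4 billion
```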
write a tiny program
Does your bug involve a library you don't understand? Illustration of an unhappy stick figure with curly hair. person (thinking): UGH, `requests` is NOT working how I expected it to! I like to convert my code using that library into a tiny standalone program which has the same bug: Illustration of two programs, one represented by a big messy scribble, the second represented by three tidy lines. giant buggy program => 20 lines of buggy code I find this makes it WAY EASIER to experiment and ask for help. And if it turns out that library actually has a bug, you can use your tiny program to report it.
write a message asking for help
When I'm REALLY stuck, I'll write an email to a friend: - "Here's what I'm trying to do..." - "I did X and I expected Y to happen, but instead..." - "Could this be because....?" - "This seems impossible because..." - "I've tried A, B, and C to fix it, but...." This helps me organize my thoughts, and often by the time I finish writing, I've magically fixed the problem on my own! It has to be a specific person, so that the imaginary version of them in my mind will say useful things :)
write a failing test
If your program already has tests, adding a failing test is a great way to work on your bug! Illustration of a smiling stick figure with curly hair. person (thinking): this function should return X, but it's returning Y - it forces you to pinpoint what exactly the bug is - it's easy to tell when you've fixed it (the test passes!) - you can keep the test to make sure the bug doesn't come back
why some bugs feel "impossible"
use a debugger
A debugger is a tool for stepping through your code line by line and looking at variables. But not all debuggers are equal! Some languages' debuggers have more features than others. Your debugger might let you: - jump into a REPL to poke around (see "jump into a REPL") - watch a location in memory and stop the program any time it's modified - record your entire program's execution and time travel ("record/replay" debuggers) Illustration of a smiling stick figure with curly hair. person (thinking): I love record/replay debuggers because they make hard-to-reproduce bugs easier: I just have to reproduce the bug once
types of debugging tools
Here are some tools I've found useful: - debuggers! (most languages have one!) - profilers: `perf, pprof, py-spy` - tracers: `strace, ltrace, ftrace, BPF tools` - network spy tools: `tcpdump, wireshark, ngrep, mitmproxy` - web automation tools: `selenium, playwright` - load testers: `ab, wrk` - test frameworks: `pytest, RSpec` - linters/static analysis tools: `black, eslint, pyright` - data formatting tools: `xxd, hexdump, jq, graphviz` - dynamic analysis tools: `valgrind, asan, tsan, ubsan` - fuzzers/property testing: `hypothesis, quickcheck, Go's fuzzer` (I've never used those last two but lots of people say they're helpful.)
try out a new tool
There are TONS of great debugging tools (see "types of debugging tools"!), but often they have a steep learning curve. Some tips to get started: - get someone more experienced to show you an example of how they'd use the tool. (this is SO helpful!!!) - try it out when investigating a low stakes bug, so it's no big deal if it doesn't work out. - take notes with examples of the options you used, so you can refer to them next time.
track your progress
timebox your investigation
Sometimes I need to trick myself into getting started: Illustrations of a stick figure with short curly hair. person (thinking, looking unhappy): "UGH, I do NOT want to look at this CSS bug!!!!" Giving myself a time limit really helps: Illustration of an alarm clock person (thinking, now smiling): "Okay, I'll just see what I can figure out in 20 minutes..." You can't always solve it in 20 minutes, but this works surprisingly often! ... 15 minutes later ... person (thinking, happy): "all fixed! That wasn't so hard!"
tidy up your code
Messy code is harder to debug. Illustration of a smiling stick figure with curly hair. person (thinking): "this function is 100 lines??? who named these variables?!?!" (annotation: it was me) Doing a tiny bit of refactoring can make things easier, like: - rename variables or functions - format it with a code formatter (`go fmt`, `black`, etc.) - add comments - delete old/untrue comments Don't go overboard with the refactoring though: making too many changes can easily introduce new bugs.
tell a friend what you learned
I love to celebrate squashing a bug by telling a friend: Illustration of a smiling stick figure with curly hair. person: hey marie, did you know about this weird thing that can happen with CSS flexbox? Some possible outcomes of this: - they've seen that bug too, and teach me something else! - they learn something new! - they ask questions I hadn't thought of - they tell me about a website/tool I didn't know about - it helps solidify my knowledge!
take a break
Illustration of a steaming hot beverage. Investigating a tricky bug requires a LOT of focus. Illustration of a sad stick figure with long straight hair. person (thinking): "ugh, nothing is working..." (annotations on person): googling the same error message for the 7th time. very frustrated Instead, try one of these magical debugging techniques (even a 5 minute break can really help!): - ride your bike! - go to bed! - get a coffee! - have a shower! - eat lunch! Illustration of the same person, now happily riding their bike.
sprinkle assertions everywhere
Some languages have an `assert` keyword that you can use to crash the program if a condition fails. Assertions let you: - come up with something that should ALWAYS be true - immediately crash the program if it isn't Illustration of a program, represented by a box with an unhappy face. program (thinking): "this variable is undefined!!! STOP EVERYTHING!" This is a great way to force yourself to think about what's ALWAYS true in your program, and check if you're right. Illustration of a smiling stick figure with curly hair. person (thinking): "the radius can never be 0, right? or can it?"
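A minimal Python sketch of the radius example, using a hypothetical `circle_area` function. One caveat worth knowing: Python skips `assert` statements entirely when run with `python -O`, so assertions are a debugging aid, not input validation.

```python
import math


def circle_area(radius):
    # The thing that should ALWAYS be true: a circle's radius is positive.
    # If it isn't, crash right here instead of returning a nonsense area.
    assert radius > 0, f"radius must be positive, got {radius!r}"
    return math.pi * radius ** 2


if __name__ == "__main__":
    print(circle_area(2))
```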
shorten your feedback loop
When you're investigating a bug, you'll need to run the buggy code a million times. Illustration of a stick figure, holding their hands to their face in despair. person (thinking): ugh, I need to type all this information into the form to trigger the bug again??? this is literally the 30th time :( :( Ways to speed it up: - use a browser automation tool to fill in forms / click buttons for you! - write a unit test! - autorun your code every time you save!
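For "autorun your code every time you save", a small polling loop is enough. This is a minimal sketch using only the standard library; the `watch.py` file name and the `watch`/`mtimes` helpers are hypothetical. It checks modification times every half second rather than using OS file-change notifications, which is simpler but a bit less efficient.

```python
import os
import subprocess
import sys
import time


def mtimes(paths):
    """Map each path to its last-modified time."""
    return {path: os.path.getmtime(path) for path in paths}


def watch(paths, command, interval=0.5):
    """Run `command` now, then again every time one of `paths` is saved.

    Polls every `interval` seconds; stop it with Ctrl+C.
    """
    last = mtimes(paths)
    subprocess.run(command)
    while True:
        time.sleep(interval)
        current = mtimes(paths)
        if current != last:  # some file was saved since the last run
            last = current
            subprocess.run(command)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python watch.py my_script.py  (reruns my_script.py on every save)
    script = sys.argv[1]
    watch([script], [sys.executable, script])
```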
share your debugging stories
rule things out
Once I have a list of suspects, I can think about how to eliminate them. Illustration of a pensive stick figure with curly hair. person (thinking): "I'm really confused, but I can at least check if the server returned the right HTTP response here..." Illustration of a box that says "client", and a box that says "server", with arrows going back and forth between them. Both boxes are labelled "suspicious". person (thinking): "that response looks good! the server isn't the problem!" Illustration of a box that says "client", and a box that says "server", with arrows going back and forth between them. The client box is labelled "suspicious", with exclamation marks and question marks surrounding it, but the "server" box is labelled "ok", with a check mark and smiley faces. note: here we're assuming that was the only request being made. Otherwise this wouldn't be a safe conclusion :)
retrace the code's steps
Here's a classic (but still very effective!) way to get started: 1. find the line of code where the error happened 2. trace backwards to investigate what could have caused that error. keep asking "why?" example: - There's an error on line 58... - that's because this variable has the wrong value... - the value is set by calling this function... - that function is making an HTTP request to the API... - the API response doesn't have the format I expected! Why is that? In the corner of the page, there is an illustration of a goofy-looking bug with a long neck and curly antennae saying "Chase me!"
reread the error message
After I've read the error message, I sometimes run into one of these 3 problems: Each person is represented by a stick figure with curly hair. ### 1. misreading the message person (thinking): ok, it says the error is in file X spoiler: it actually said file Y ### 2. disregarding what the message is saying person (thinking): well, the message says X, but that's impossible... spoiler: it was possible ### 3. not actually reading it person (thinking): ok, I read it... spoiler: she did not read it
reproduce the bug
My favourite way to get information about buggy code is to run the buggy code and experiment on it. (Add print statements! Make a tiny change!) If the bug is happening on your computer every time you run your program: hooray! You've reproduced the bug! An illustration of a smiling stick figure with curly hair. person (thinking): "ok, time to debug! I've got my print statements ready to go!" But if you can't make the bug happen, you're left guessing. An illustration of a sad stick figure with curly hair. person (thinking): "what was variable X set to when the bug happened? guess there's NO WAY TO KNOW" cute illustration of a bug: the next page has tips!
reduce randomness
It's much easier to debug when your program does the exact same thing every time you run it. Illustration of a sad stick figure with curly hair. person (thinking): "the bug only happens 10% of the time, it's SO HARD to figure out if my change fixed it or not." There are a bunch of tools for controlling your program's inputs to reduce randomness, for example: - many random number generators let you set the seed so you get the same results every time. - `faketime` fakes the current time. - libraries like ruby's `vcr` can record http requests. - record/replay debuggers like `rr` record everything.
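In Python, for example, giving your code its own seeded `random.Random` instance makes the "random" choices identical on every run. A minimal sketch; `shuffled_deck` is a hypothetical function standing in for your code.

```python
import random


def shuffled_deck(seed=None):
    """Hypothetical code under investigation that relies on randomness."""
    rng = random.Random(seed)  # same seed -> same sequence of "random" numbers
    deck = list(range(52))
    rng.shuffle(deck)
    return deck


if __name__ == "__main__":
    print(shuffled_deck(seed=42) == shuffled_deck(seed=42))
```

Passing the seed (or the `Random` object) in as a parameter, instead of calling the global `random.shuffle`, also makes the code easier to test.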
read the library's code
Lots of code isn't documented. But when there are no docs, there's always the source code! It sounds intimidating at first, but a quick search of the code sometimes gets me my answer really quickly. Tips for exploring an unfamiliar library's code: - search the tests! Tests are a GREAT source of examples. - git clone it locally to make it easier to navigate. - search for your error message and trace back. - if it's a Python/JS/Ruby library, sometimes I'll edit the library's code on my computer to add print statements (just remember to take them out after!)
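In Python, the standard `inspect` module will tell you where a function's source lives and show you its code, which is a quick first step before cloning anything. A minimal sketch using `json.dumps` as the example; `show_source` is a hypothetical helper, and it only works for functions written in Python (functions implemented in C raise `TypeError`).

```python
import inspect
import json


def show_source(func, max_chars=500):
    """Hypothetical helper: (file path, start of source code) for a Python function."""
    path = inspect.getsourcefile(func)  # where the code lives on disk
    source = inspect.getsource(func)    # the function's source text
    return path, source[:max_chars]


if __name__ == "__main__":
    path, source = show_source(json.dumps)
    print(path)  # open this file in your editor to keep exploring
    print(source)
```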
read the error message
Error messages are a goldmine of information, but they can be very annoying to read: (image of an error message, represented by a stack of squiggly lines, with 2 notes pointing to it): - giant 50-line stack trace full of impenetrable jargon, often seems totally unrelated to your bug - can even be misleading, like "permission denied" sometimes means "doesn't exist" Tricks to extract information from giant error messages: - If there are many different error messages, start with the first one. Fixing it will often fix the rest. - If the end of a long error message isn't helpful, try looking at the beginning (scroll up!) - On the command line, pipe it to `less` so that you can scroll/search it (`./my_program 2>&1 | less`) Note: if you don't include `2>&1`, `less` won't show you the error messages (just the output)
read the docs
There are many ways to read the docs! - the surgical strike: Search for a specific function, find an example on the page, copy it and leave. (this is often me :)) - the question quest: You have a specific question and you'll keep skimming different pages until you find the answer. - the IDE integration: Set up your editor or IDE so that you can instantly jump to a function's documentation. - the rigorous read: Get a cup of coffee and read all of the docs cover to cover, like a book.
preserve the crime scene
One of the easiest ways to start is to save a copy of the buggy code and its inputs/outputs: An illustration of a stick figure wearing a top hat. Beside them is a bug in a mason jar. person (thinking): "don't touch anything! we need to preserve evidence!" Depending on the situation, you might want to: - make a git commit of the buggy code! (on a branch, just for you) - save the input that triggered the bug - save logs/screenshots to analyze later
one thing at a time
It's tempting to try lots of fixes at once to save time: Illustration of a smiling stick figure with curly hair. dream: I'm going to add Z, and replace X with Y, and improve C... that'll definitely fix it! Illustration of the same stick figure, now sad. reality: ... now there's a new problem AND it's still broken. If I find I've done this by accident, I'll: - undo all my changes (`git stash`!) - make a list of things to investigate, one at a time
make sure your code is running
Illustration of an unhappy stick figure with curly hair. person (thinking): NOTHING I try is helping, this is IMPOSSIBLE person (thinking): wait... nothing I try is changing anything.... is my code even being run???? If my changes have no effect at all, often it means I've made a silly mistake (like forgetting to restart the app) and my changes aren't being run! I like to check that my code is being run by printing something out (like `print("asdf")`). Or, if that's not possible, I'll introduce an error so that it crashes.
make a minimal reproduction
look at recent changes
Often when something is broken, it's because of a recent change. Usually I look at recent changes manually, but `git bisect` is an amazing tool for finding exactly which git commit caused the problem. We don't have space for a full `git bisect` tutorial here, but here's how you start using it:
```
git bisect start
git bisect bad HEAD
git bisect good 1fe9dc
```
(1fe9dc is the ID of a commit that doesn't have the bug) Then you can either mark commits as good or bad manually (`git bisect good` / `git bisect bad`), or run a script that does it automatically (`git bisect run`).
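For the "run a script" option, `git bisect run <cmd>` marks each commit based on the command's exit code: 0 means good, 125 means skip, and other codes from 1 to 127 mean bad. A minimal Python sketch of such a script; the file name `bisect_check.py` and the build/test commands are hypothetical placeholders for your project's own.

```python
import shlex
import subprocess
import sys

SKIP = 125  # tells `git bisect run` "this commit can't be tested, skip it"


def bisect_exit_code(build_returncode, test_returncode):
    """Translate build/test results into the exit code `git bisect run` expects."""
    if build_returncode != 0:
        return SKIP  # broken build: we can't tell if this commit is good or bad
    # 0 = good, 1 = bad. Normalizing to 1 matters: `git bisect run` aborts
    # on exit codes above 127 instead of treating them as "bad".
    return 0 if test_returncode == 0 else 1


def main(build_cmd, test_cmd):
    build_rc = subprocess.run(shlex.split(build_cmd)).returncode
    # only run the test if the build worked
    test_rc = subprocess.run(shlex.split(test_cmd)).returncode if build_rc == 0 else 1
    sys.exit(bisect_exit_code(build_rc, test_rc))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

You'd then run something like `git bisect run python3 bisect_check.py "make" "python3 -m pytest tests/test_bug.py"` and let git check out and test each commit for you.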
list what you've learned
learn one small thing
Bugs are a GREAT way to discover things on the edge of your knowledge. Illustrations of a stick figure with curly hair. person (thinking, looking worried): "hmm, part of the problem here is that I don't understand how position: absolute works..." Finding one small thing I don't understand and learning it is really useful (and pretty fun!) person (thinking, now smiling): "now I understand position: absolute! cool!"
know your spy tools
keep a log book
I don't usually write things down. But 2 hours into debugging, I get really confused: Illustration of a frazzled-looking stick figure with curly hair. person (thinking): wait, what did that error message I saw 2 hours ago say again exactly?? person (thinking): did I already try this??? Keeping a document with notes makes it WAY easier to stay on track. It might contain: - specific inputs I tried - error messages I saw - stack overflow URLs The log makes it easier to ask for help later if needed!
jump into a REPL
In dynamic languages (like Python / Ruby / JS), you can use a debugger to jump into an interactive console (aka "REPL") at any point in your code. Here's how to do it in Python 3.7+: 1. edit your code: add "`breakpoint()`" right after the line you want to inspect, for example after `my_var = call_some_function()` 2. rerun your code (refresh the page, whatever) 3. play around in the REPL! You can call any function you want / try out fixes! How to do it in other languages: - Ruby: `binding.pry` (from the `pry` gem) - Python (before 3.7): `import pdb; pdb.set_trace()` - JavaScript: `debugger;` (it pauses when your browser's developer tools are open)
investigate the bug together
I find investigating a bug with someone else SO MUCH more fun than doing it alone. Illustrations of two smiling stick figures, one with short curly hair, and one with longer straight hair. Debugging together lets you: - Teach each other new tools! person 1: I wish we could find out x, but that's impossible... person 2: Let's use my favourite tool, strace!!!!!! - Learn new concepts! person 2: What is this CORS thing?!?! person 1: Oh, I can explain that! - Keep each other on track person 2: Maybe the problem is Y? person 1: We already ruled that out! Right, I forgot!
inspect unreproducible bugs
When you can't reproduce a bug locally, it's tempting to just try random fixes and pray. Resist the temptation! Some ways to get information: - try to reproduce the environment where it happened - ask for screenshots / screen recordings - add more logging, deploy your code, and repeat until you understand what caused the bug - read the code VERY VERY carefully (incredibly boring but it actually does work sometimes) - do your experimentation somewhere where you can reproduce the bug (on a staging server? on someone else's computer?)
identify one small question
Debugging can feel huge and impossible. But all you have to do to make progress is: 1. come up with ONE QUESTION about the bug. 2. make sure the question is small enough that you can investigate it in ~20 minutes 3. figure out the answer to that question Illustration of a smiling stick figure with curly hair, surrounded by other question marks, which are crossed out. person (thinking): hmm, this database query is slow... well, can I find out if the query is using an index? ignore other questions for now! one at a time!
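One way to answer a small question like "is the query using an index?" is to ask the database for its query plan. Here's a minimal sketch using SQLite through Python's built-in `sqlite3` module. The `users` table, its columns, and the index name are made up for illustration, and other databases have their own `EXPLAIN` variants with different output:

```python
import sqlite3

# Hypothetical schema, just for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")


def query_plan(sql: str) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


query = "SELECT name FROM users WHERE email = 'a@example.com'"
before = query_plan(query)  # no index yet: SQLite has to scan the whole table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(query)   # now the plan should mention idx_users_email

print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions, so look for "SCAN" (reads every row) versus "USING INDEX".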
guesses are often wrong
[debugging]
find the type of bug
If the bug is totally new to you, find out if there's a name people use for that type of bug! Illustration of two stick figures. Person 1 has curly hair and looks worried, Person 2 has straight hair and is smiling. person 1: "this bug is happening intermittently, it's so weird." person 2: "that sounds like it might be a race condition..." person 1 (thinking): "oh, what's a race condition?" examples: - `terminated by signal SIGSEGV (address boundary error)` → segmentation fault - `flexbox: div doesn't fit in other div (CSS)` → item overflowing container - `nodename nor servname provided, or not known` → DNS lookup failure - `RecursionError: maximum recursion depth exceeded` → stack overflow
find related bugs
Illustration of two adorable bugs. They are holding hands and their antennae are intertwined. When you're done fixing a bug, glance around to see if there are any obvious places in your code that have the same bug. Illustration of a smiling stick figure with short curly hair. person (thinking): "I was calling function X wrong, I'll check if we're calling that function wrong anywhere else!" person (thinking): "wow, my assumption about how Y worked was TOTALLY wrong, I should go back and fix some things..."
find a version that works
If I have a bug with how I'm using a library, I like to: - find a code example in the documentation - make sure it works - slowly change it to be more like my broken code - test if it's still working after every single tiny change Illustration showing a bunch of points with arrows between them. Each point has a check mark beside it, until one that is labelled "Oh THAT'S what broke it!!!" This puts me back on solid ground: with every change I make that DOESN'T cause the bug to come back, I know that change wasn't the problem.
find a new source of info
We all know to look at the official documentation. Here are some less obvious places to look for answers: - the project's Discord, Slack, IRC channel, or mailing list - code search (search all of GitHub for how other people are using that library!) - GitHub issues (did someone else have the same problem?) - release notes (is the bug fixed in the new version?) - a book chapter (you might have a book on this topic!) - blog posts (sometimes there's an amazing explanation on the 2nd page of Google results)
explain the bug out loud
Explaining what's going wrong out loud is magic. Illustrations of two stick figures. One has curly hair, and one has short straight hair and is wearing a big t-shirt with a picture of a rubber duck. person (looking sad): "so, when I do X thing, I'm getting an error, and it doesn't make any sense because I already checked that A and B are working...." other person: huh... person (now smiling, with an exclamation mark above their head): "OH I SEE WHAT I DID WRONG" other person (also smiling): "happy to help!" People call this "rubber ducking" because the other person might as well be a rubber duck.
draw a diagram
Some ideas: ### network diagram An illustration of a network, with a cylinder labelled DB, and boxes labelled "factory", "handler", "obj", "model 1", and "model 2", with arrows amongst them showing their relationships. ### flowchart A flowchart with boxes "set flag", "run cmd", "if failed, retry", and "return result", with arrows amongst them illustrating a process. ### state diagram A diagram with boxes labelled "inventory page", "cart page", and "checkout page", with arrows amongst them labelled "cart icon", "continue shopping", "checkout", and "cancel". ### or anything else (like a data structure!) A box labelled "on | off | on | off". The first "off" is labelled "[1, 1, 1, 0, 0, 1, 1, 1, 0", and the second "off" is labelled "5 seconds".
document your quest
For very tricky bugs, writing up an explanation of what went wrong and how you figured it out is an amazing way to share knowledge and make sure you really understand it. Ways I've done this in the past: - complain about it in the internal chat! (so people can search for it!) - write a quick explanation in the commit message - write a fun blog post telling my tale of woe! - for really important work bugs, write a 5-page document with graphs explaining all the weird stuff I learned along the way
do the annoying thing
Illustrations of an unhappy-looking stick figure with short curly hair. Sometimes when I'm debugging, there are things I'll refuse to try because they take too long. person (thinking): ugh, that part of the code is so confusing, I don't want to look at it... But as I become more and more desperate, eventually I'll give in and do the annoying thing. Often it helps! person (thinking): FINE, I'll look at that code... oh, yeah, here's the bug.
do a victory lap
Once you've solved it, don't forget to celebrate! Take a break! Feel smart! Illustration of a smiling stick figure with curly hair. person (thinking): "i did it, i did it, i'm amazing" (now is not the time for humility) The best part of understanding a bug is that it makes it SO MUCH easier for you to solve similar future bugs. Illustration of a smiling stick figure with curly hair, and another figure with short spiky hair. person (thinking): I've seen something like this before, maybe the problem is X? colleague: (annotation, saying that they're awestruck at your brilliance)
delete the buggy code
Sometimes the buggy code is not worth salvaging and should be deleted entirely. Reasons you might do this: Illustration of an uneasy-looking stick figure with curly hair. - it uses a confusing library / tool person (thinking): this library isn't working, I'm going to switch to Y instead Illustration of the same person, now smiling. - you have a better idea for how to implement it person (thinking): I bet I could avoid all these problems if I took X approach instead...
debugging tip: track what you changed
[debugging]
debugging tip: slow down
[debugging]
debugging tip: ask lots of questions
[debugging]
comment out code
Commenting out code is an amazing way to quickly do experiments and figure out which part of your code is to blame. You can: - comment out a function call and replace it with a hardcoded value, to check if the function call is broken - if the error message doesn't give you a line number, comment out huge chunks of the program until the problem goes away - comment out some code and rewrite it to see if the new version is better
colours, graphs, and sounds
Instead of printing text, your program can tell you about its state by generating a picture! Or playing sounds at key moments! Some ways your programs can generate pictures or sounds: - add colours to your log lines (every letter of 'colours' is a different colour) - add red outlines around every HTML element! ("red" and "outlines" have a red outline around them) - Haskell has an option to beep at the start of every major garbage collection (there's a bell icon after "beep") - draw a chart of events over time (chart icon) - use graphviz to generate a diagram of your program's internal state (there's a picture of a little graph diagram with a -> b, a -> c)
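For the graphviz idea, your program doesn't need a graphviz library at all: it can print a description in graphviz's DOT language, and you render that with the `dot` command-line tool (for example `dot -Tpng state.dot -o state.png`). A minimal sketch, assuming your state is a hypothetical dictionary mapping each node to the nodes it points to:

```python
def to_dot(edges: dict[str, list[str]]) -> str:
    """Render a {node: [neighbours]} mapping as a graphviz DOT digraph."""
    lines = ["digraph state {"]
    for src, dsts in edges.items():
        for dst in dsts:
            lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical internal state, matching the a -> b, a -> c picture.
state = {"a": ["b", "c"]}
print(to_dot(state))
```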
brainstorm some suspects
brainstorming every possible cause I can think of helps me not get stuck on the 1 or 2 most obvious possibilities. In a box representing a sheet of paper: - could I be using the wrong version of this library? - am I passing the wrong argument to function X? - is something wrong with the server? - is the entire internet broken??? (there are two notes on the side pointing at the above text) - sometimes I find it easier to think clearly when writing by hand on paper. - no filter! even ridiculous ideas!
ask lots of questions
[debugging]
analyze the logs
If you can't reproduce a bug, sometimes you need to comb through the logs for clues. Some tips: - filter out irrelevant lines (for example with grep -v) - find 1 failed request and search for that request's ID to get all the logs for that request - build a timeline: copy and paste log lines (and your interpretations!) into a document - if you see a suspicious log line, search to make sure it doesn't also happen during normal operation - if there's a cascade of errors, find the first error that started the problems
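Here's a rough Python sketch of two of those tips: filter out the noise, then find one failed request and pull every log line for it. It assumes a made-up log format where each line contains `request_id=<id>`; the log lines and helper names are hypothetical, so adapt the regular expression to whatever your logs actually look like:

```python
import re

# Hypothetical log lines; real logs will have a different format.
LOG = """\
2024-01-01T10:00:00 INFO request_id=abc123 GET /cart
2024-01-01T10:00:00 INFO healthcheck ok
2024-01-01T10:00:01 INFO request_id=def456 GET /home
2024-01-01T10:00:02 ERROR request_id=abc123 payment service timed out
2024-01-01T10:00:02 INFO healthcheck ok
"""

REQUEST_ID = re.compile(r"request_id=(\S+)")


def noise_free(lines):
    """Drop lines we already know are irrelevant (like `grep -v healthcheck`)."""
    return [line for line in lines if "healthcheck" not in line]


def logs_for_request(lines, request_id):
    """Every line for one request, in the order it was logged."""
    return [line for line in lines
            if (m := REQUEST_ID.search(line)) and m.group(1) == request_id]


lines = noise_free(LOG.splitlines())
# Find the first failed request, then pull every line for that request.
failed = next(line for line in lines if " ERROR " in line)
failed_id = REQUEST_ID.search(failed).group(1)
for line in logs_for_request(lines, failed_id):
    print(line)
```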
add pretty printing
Sometimes you print out an object, and it just prints the class name and reference ID, like this: `MyObject<#18238120323>` Illustration of a frowning stick figure with curly hair. person (thinking): "ugh, thanks, very helpful... " Implementing a custom string representation for a class you're often printing out can save a LOT of time. The name of the method you need to implement is: - Python: `.__str__` (or `.__repr__`, which is what you see when the object is inside a list or dict) - Ruby: `.to_s` - JavaScript: `.toString` - Java: `.toString` - Go: `String()` Also, pretty-printing libraries (like `pprint` in Python or `awesome_print` in Ruby) are great for printing out arrays/hashmaps.
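Here's what that looks like in Python. `MyObject` and its fields are hypothetical. Defining `__repr__` is often enough: `print()` falls back to it when there's no `__str__`, and it's what you see for objects inside lists and dicts. `pprint` then takes care of the nested containers:

```python
from pprint import pformat


class MyObject:
    def __init__(self, name: str, size: int):
        self.name = name
        self.size = size

    def __repr__(self) -> str:
        # Used by repr(), by print() when there's no __str__,
        # and when the object is printed inside a container.
        return f"MyObject(name={self.name!r}, size={self.size})"


objs = {"small": [MyObject("a", 1)], "big": [MyObject("b", 1000), MyObject("c", 2000)]}
print(MyObject("a", 1))         # MyObject(name='a', size=1)
print(pformat(objs, width=40))  # split over several lines instead of one huge line
```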
add lots of print statements
I love to add print statements that print out 1, 2, 3, 4, 5... An illustration of a printer printing out lines of text.
```
console.log(1)
console.log(2)
console.log(3)
```
Using descriptive strings is smarter, but I usually use numbers or "wtf???" This helps me construct a timeline of which parts of my code ran and in what order: Illustration of timeline of code, with some arrows pointing at it numbered 1, 3, 2. Between 1 and 3, it says "everything is okay". Between 3 and 2 it says "the cause", with a picture of a bug, and after 2, it says "the error message" with a picture of a page of text. Often I'll discover something surprising, like "wait, 3 never got printed??? Why not???".
add a comment
Some bug fixes are a little counterintuitive. (Otherwise you would have written the code that way in the first place!) You might think: Illustration of a smiling stick figure with curly hair. person (thinking): "I'll remember why I added this code, I spent 5 hours debugging it!" (annotation: this is a trap!!!!!) Adding a comment can help future you (or your coworkers!) avoid accidentally reviving a bug later. person (thinking): ooh, I could simplify this code! Illustration of a dancing bug, singing "I'm back!"
a debugging manifesto
### 1. inspect, don't squash Try to fix the bug (crossed out, bad) Understand what happened (checkmarks, smiley faces) ### 2. Being stuck is temporary. person (thinking): I WILL NEVER FIGURE THIS OUT ... 20 minutes later... person (thinking): Wait, I haven't tried X... ### 3. Trust nobody and nothing person (thinking): This library can't be buggy... person (thinking): Or CAN IT??? (slowly growing horror) off to the side, a bug looks on, with a sneaky expression ### 4. It's probably your code person (thinking): I KNOW my code is right ... 2 hours later ... person (thinking): Ugh, my code WAS the problem?!!? ### 5. don't go it alone person 1: "WHAT IS HAPPENING?!?" person 2: "What if we try X?" ### 6. There's always a reason. A computer, illustrated by a box with a smiley face, surrounded by ones and zeros: Computers are always logical, even when it doesn't feel that way. ### 7. Build your toolkit person (thinking, holding a box labelled TOOLZ): "wow, the CSS inspector makes debugging SO much easier" ### 8. It can be an adventure. person: "You wouldn't BELIEVE the weird bug I found!" adorable weird bug, standing beside them: hi!
the gaps between floats
## title: the gaps between floats ## panel 1: floating point numbers have to fit into 32 or 64 bits This means there are only 2^64 64-bit floats, the same way there are only 2^64 64-bit integers ## panel 2: this means floating point numbers have to be spread out you can imagine them all spaced out on a number line, like this: (picture of a bunch of lines, with small gaps between them. The gaps are smaller on the left and bigger on the right) ## panel 3: the gaps start small. the next 64-bit float after 1.0 is (roughly) 1.0000000000000002. the gap between these two floats is about 0.0000000000000002, or exactly 2^-52. gaps are always a power of 2 ## panel 4: the gaps get bigger as the numbers get bigger the next 64-bit float after 1000000000000000000 (10^18) is that number plus 128. so the gap is 128, or 2^7! ## panel 5: the gaps make calculations inaccurate when you do math on floating point numbers, often you have to round the result to the nearest float usually this doesn’t make a big difference, but small mistakes can add up ## panel 6: this inaccuracy is inevitable if you want math to be fast, you have to store the numbers in a fixed number of bits, like 64 bits. So you’re always going to have accuracy issues.
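You can poke at these gaps yourself. A small Python sketch (it needs Python 3.9+ for `math.nextafter` and `math.ulp`; Python's `float` is a 64-bit float):

```python
import math

# The next float after 1.0, and the gap between them.
print(math.nextafter(1.0, math.inf))   # 1.0000000000000002
print(math.ulp(1.0) == 2 ** -52)       # True

# Bigger numbers have bigger gaps.
big = 1e18
print(math.nextafter(big, math.inf) - big)  # 128.0, which is 2**7
```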
signed vs unsigned integers
## signed vs unsigned integers ## there are 2 ways to interpret every integer unsigned: - always 0 or more - example: 8 bit unsigned ints are `0` to `255` signed: - half positive, half negative - example: 8 bit signed ints are `-128` to `127` ## negative integers are represented in a counterintuitive way You might think that this is -5: `10000101` (1 is the sign bit, and 101 in binary is 5) But actually this is -5: `11111011` this looks weird, but we'll explain why! ## integer addition wraps around for example, for 8-bit integers `255 + 1 = 0` for 16-bit integers, `65535 + 1 = 0` by "addition", we mean "what the x86 `add` instruction does" ## panel: but if `255 + 1 = 0`, you could also say `255 = -1` ## examples of bytes and their signed/unsigned ints

| byte | unsigned | signed |
|----------|----------|--------|
| `00000000` | 0 | 0 |
| `01111111` | 127 | 127 |
| `10000000` | 128 | -128 |
| `10000001` | 129 | -127 |
| `11111011` | 251 | -5 |
| `11111111` | 255 | -1 |

for bytes from 128 to 255, subtract 256 from the unsigned number to get the signed number ## this way of handling signed integers is called "two's complement" It's popular because you can use the same circuits to add signed and unsigned integers. `5 + 255` has exactly the same result as `5 + (-1)`: they're both 4!
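Here's the "subtract 256" rule as a tiny Python sketch (`to_signed_8bit` is a made-up helper name), checked against Python's built-in `int.from_bytes`, which can decode a byte as either signed or unsigned:

```python
def to_signed_8bit(unsigned: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a two's complement signed byte."""
    if not 0 <= unsigned <= 255:
        raise ValueError("not a byte")
    return unsigned - 256 if unsigned >= 128 else unsigned


for b in [0b00000000, 0b01111111, 0b10000000, 0b10000001, 0b11111011, 0b11111111]:
    signed = int.from_bytes(bytes([b]), "big", signed=True)
    assert signed == to_signed_8bit(b)
    print(f"{b:08b} unsigned={b:3} signed={signed}")

# Same addition circuit: once we keep only the low 8 bits, both sums are 4.
print((5 + 255) & 0xFF, (5 + (-1)) & 0xFF)  # 4 4
```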
science <3 floating point
## science <3 floating point ## floating point was invented to do scientific computation - weather simulations! - earthquake modeling! - orbital mechanics! ## scientists don't need unlimited precision... we only know an electron's mass to 9 decimal places anyway... 9 decimal places is already VERY precise! ## but they do need TINY numbers and GIANT numbers mass of hydrogen atom: `1.6735575 * 10^-24` grams distance to Andromeda galaxy: `2.4 * 10^22` meters ## floating point is inspired by scientific notation `1.6735575 x 10^-24` The idea in floating point is to store a number by splitting it into: - the exponent (like `-24`) - the multiplier (like `1.6735575`) - and its sign (+ or -) ## floating point isn't just used for science though For example, JavaScript's number type is floating point. Before it added `BigInt` in 2020, JavaScript didn't have integers at all! Similarly, numbers in JSON are often interpreted as floating point numbers. ## panel: people usually explain floating point as "it's scientific notation, but in binary!" That's true, but I've never found it intuitive so we're going to explain it a different way.
NaN and infinity
## NaN and infinity ## NaN stands for "not a number" It means the result of the calculation is undefined. `0/0 = NaN` `sqrt(-1) = NaN` `log(-1) = NaN` ## infinity "Infinity" just means "this number is too big for floating point to handle." There are two infinities: one positive, one negative. `2.0**1024 = inf` (`2.0**1024` means `2^1024`) `-1/0 = -inf` `inf - 10 = inf` `inf - inf = NaN` ## NaNs spread As soon as one NaN gets in, it gets everywhere `NaN * 5 = NaN` `NaN + 2 = NaN` ## NaN != NaN NaN isn't equal to anything (including itself) ## NaN and infinity: the bits A floating point value is `NaN` or `infinity` if the bits in the exponent are all 1. For example, this is a `NaN`: `01111111 11110001 00000000 00000000 00000000 00000000 00000000 00000000` It's `infinity` if the offset bits are all 0, otherwise it's `NaN`. There are 2^53 values like this (2^52 possible offsets, times 2 for the sign bit): 2 of them are `±infinity` and the other 2^53-2 are `NaN`. We usually treat `NaN` like a single value though. ## a note on byte order All of the floating point examples in this zine use a big endian byte order, because it's easier to read. But most computers use a little endian byte order. You can see this in action at `https://memory-spy.wizardzines.com`
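A quick Python sketch to check these rules. Python raises exceptions for some of the examples above (`-1/0` raises `ZeroDivisionError` and `2.0**1024` raises `OverflowError`), so this builds infinity and NaN directly. `bits` is a made-up helper that uses `struct` to show the 64 bits in big endian order, like the zine does:

```python
import math
import struct

inf = float("inf")
nan = inf - inf                # inf - inf = NaN

print(inf + 10, -inf)          # inf -inf
print(nan * 5, nan + 2)        # nan nan (NaNs spread)
print(nan == nan)              # False: NaN isn't equal to itself
print(math.isnan(nan))         # True: the reliable way to check for NaN


def bits(x: float) -> str:
    """The 64 bits of a float, big endian, split into bytes."""
    raw = struct.pack(">d", x)
    return " ".join(f"{byte:08b}" for byte in raw)


print(bits(inf))   # exponent bits all 1, offset bits all 0
print(bits(nan))   # exponent bits all 1, offset bits not all 0
```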
meet the byte
## meet the byte ## You might have heard that a computer's memory is a series of bits (0s and 1s)... `010100110101010110110111` but you only access them in groups of 8 bits - a byte! `01010011 01010101 10110111` ## 2 ways to think about a byte 1. 8 bits 2. an integer from 0 to 255 `00000000` = `0` `00000001` (8 bits!) = `1` (integer!) `00000010` = `2` `01011001` = `89` ## you can't just access 1 bit Every byte in your computer's memory has an address. If you want to fetch 1 bit, you need to fetch the whole byte at that address and then extract the bit. ## some things that are 1 byte - the boolean `true` (in C) `00000001` - the ASCII character F `01000110` - the red part of the colour `#FF00FF` `11111111` ## most things are more than one byte - integers and floats are usually 4 bytes or 8 bytes - strings are LOTS of bytes (for example, in UTF-8 a heart emoji is 3 bytes) ## bytes weren't always 8 bits In the past, people experimented with lots of different byte sizes (2, 3, 4, 5, 6, 8, and 10 bits!) But now we've standardized on 8 bits pretty much everywhere.
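Here's a Python sketch of "fetch the whole byte, then extract the bit", plus a check of a couple of the 1-byte examples. `get_bit` is a hypothetical helper name, and it counts bit 0 as the rightmost (least significant) bit:

```python
def get_bit(byte: int, position: int) -> int:
    """Extract one bit (0 = least significant) from an 8-bit value."""
    return (byte >> position) & 1


memory = bytes([0b01010011, 0b01010101, 0b10110111])  # 24 bits = 3 bytes
byte = memory[0]                   # we can only fetch a whole byte...
print(f"{byte:08b}", get_bit(byte, 0), get_bit(byte, 2))  # ...then pick out a bit

print(f"{ord('F'):08b}")           # 01000110: the ASCII character F
print(len("\u2764".encode("utf-8")))  # 3: the heart symbol U+2764 is 3 bytes in UTF-8
```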
little vs big endian
## little endian / big endian ## we write dates in two main orders 1. 2023-03-17 ("big endian") 2. 17-03-2023 ("little endian") 3. 03-17-2023 ("american") "big endian" means that the big unit (the year) is at the start ("big end first") ## similarly: computers order bytes in 2 ways Here are 2 ways your computer might represent the integer 271: 1. big endian: `00000001 00001111` 2. little endian: `00001111 00000001` How this corresponds to 271: `00000001 00001111` is 271 in binary (256 + 15) ## When you send integers over the network using Internet protocols (like IPv4, TCP, and UDP), they have to be big endian. Here's how that works: Computer A (a little endian computer) has the 16-bit integer "271" in its memory: `00001111 00000001` Computer A flips the bytes and sends it as big endian: `00000001 00001111` Computer B receives the big endian integer Computer B flips the bytes and stores it in memory as little endian: `00001111 00000001` ## a little history Before 1980, computers ordered their bytes in different ways. In 1980, the Internet started being standardized, causing a huge fight over which byte order to use on the Internet. The terms "big/little endian" come from that fight: they were introduced to computing in an article called "On Holy Wars and a Plea For Peace" which compares the byte order fight to the Big/Little Endians in Gulliver's Travels. Big endian won that fight, so most Internet protocols (IPv4, TCP, UDP, etc.) are big endian. But almost all modern computers are little endian. Some machines, like the Xbox 360, are big endian though.
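Python's `int.to_bytes` lets you pick the byte order explicitly, so you can see both layouts of 271. `sys.byteorder` reports your own machine's native order, and `struct`'s `!` prefix means "network byte order", which is big endian:

```python
import struct
import sys

n = 271
big = n.to_bytes(2, "big")
little = n.to_bytes(2, "little")
print(" ".join(f"{b:08b}" for b in big))     # 00000001 00001111
print(" ".join(f"{b:08b}" for b in little))  # 00001111 00000001

print(sys.byteorder)                # little, on almost all modern computers
print(struct.pack("!H", n) == big)  # True: network byte order is big endian
```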
integers
## integers ## panel 1: To decode bytes as integers, we need to know 3 things: 1. the integer's size (8 bit, 16 bit, 32 bit, or 64 bit) 2. is it little or big endian? 3. is it signed or unsigned? ## panel 2: how signed integers work is the hardest part to understand (I only learned how it works a couple months ago!). Just knowing that unsigned and signed integers are different will take you a long way. ## 2 bytes, 3 interpretations `254 | 0` We could interpret these 2 bytes as: 1. `254` (little endian) 2. `65024` (big endian, unsigned) 3. `-512` (big endian, signed) ## how you decode bytes depends on the context - in a program's memory, the type of the variable tells you the integer's size and if it's signed/unsigned - your CPU determines if integers are big or little endian (you don't have a choice) - for a binary network protocol (like DNS), the specification (for DNS, that's RFC 1035) will tell you how to decode the bytes ## examples of types - in Rust, an `i64` is a signed 64-bit integer - in Go, a `uint32` is an unsigned 32-bit integer - in C, a `short` is usually a signed 16-bit integer, depending on the platform
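Here are the 3 interpretations of the bytes `254 | 0` in Python, first with `int.from_bytes` and then with the `struct` module's format codes (`<` little endian, `>` big endian, `H` unsigned 16-bit, `h` signed 16-bit):

```python
import struct

data = bytes([254, 0])

print(int.from_bytes(data, "little"))               # 254
print(int.from_bytes(data, "big"))                  # 65024
print(int.from_bytes(data, "big", signed=True))     # -512

# The same three interpretations with struct.
print(struct.unpack("<H", data)[0], struct.unpack(">H", data)[0], struct.unpack(">h", data)[0])
```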
integer overflow
## integer overflow ### integers have a limited amount of space The 4 usual sizes for integers are 8 bits, 16 bits, 32 bits, and 64 bits ### the biggest 8-bit unsigned integer is 255 ... so what happens if you do 255 + 1? going above/below the limits is called overflow. the result wraps around to the other side: `255 + 1 = 0` `255 + 3 = 2` `200 * 2 = 144` `0 - 2 = 254` ### maximum numbers for different sizes

| bits | signed | unsigned |
|------|--------|----------|
| 8 | 127 | 255 |
| 16 | 32767 | 65535 |
| 32 | ~2 billion | ~4 billion |
| 64 | ~9 quintillion | ~18 quintillion |

### overflows often don't throw errors computer (thinking): "255 + 1? that number is 8 bits, so the answer is 0! that's what you wanted right?" This can cause VERY tricky bugs ### some languages where integer overflow happens Java/Kotlin C/C++ Rust Swift C# SQL R Go Dart Python (only in numpy) Some throw errors on overflow, some don't, for some it depends on various factors. Look up how it works in your language!
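Python's own integers never overflow (they just grow), so to see wrap-around in plain Python you have to simulate it. A minimal sketch that keeps only the lowest 8 bits, the way unsigned 8-bit arithmetic does; `wrap_u8` is a made-up helper name:

```python
def wrap_u8(x: int) -> int:
    """Keep only the lowest 8 bits, like an unsigned 8-bit integer would."""
    return x & 0xFF  # same as x % 256 for Python ints, including negatives


print(wrap_u8(255 + 1))  # 0
print(wrap_u8(255 + 3))  # 2
print(wrap_u8(200 * 2))  # 144
print(wrap_u8(0 - 2))    # 254

# Maximum values for each size: signed is 2**(bits-1) - 1, unsigned is 2**bits - 1.
for bits in (8, 16, 32, 64):
    print(bits, 2 ** (bits - 1) - 1, 2 ** bits - 1)
```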
how floats are printed
## how floats are printed ## computers lie when they print out floats (by rounding) For example `0.12` isn't `0.12`, it's actually (roughly): `0.119999999999999995559` is my computer LYING to me??? about NUMBERS? ## the string -> float translation If your program says: `x = 0.12` your interpreter / compiler needs to translate "`0.12`" into the float `0.119999999999999995559`. Many languages use the `strtod` ("string to double") function from libc to do that translation. ## the float -> string translation This is where the rounding comes in. Computers round to make the numbers shorter and easier to read. `1.19999999999999995559` ↪ `1.2` ## float -> string translation is actually super complicated Every floating point number needs a unique string representation. There are a bunch of academic papers about how to do this well, search "Printing floating point numbers accurately" to read more about it. ## some examples of printing floats `1.19900000000000006573` ↪ `1.199` `1.19999999000000001637` ↪ `1.19999999` `1.19999999999998996358` ↪ `1.19999999999999` `1.19999999999999995559` ↪ `1.2` ## you can also print floats in base 16 or base 2 For example, 0.1 as a 32-bit float is: base 16: `0x1.99999ap-4` (`p-4` means "times 2^-4", the same way `e-4` means "times 10^-4") base 2: `1.10011001100110011001101p-100` (here the exponent is written in binary too: `100` is 4) The base 2/base 16 representations are not rounded, but they're rarely used.
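In Python you can catch the "lie" by asking for more digits than `repr` shows, and `float.hex` gives the unrounded base 16 form. Python floats are 64-bit, so to reproduce the 32-bit `0.1` example this sketch rounds the value through `struct`'s `f` (32-bit float) format first:

```python
import struct

print(repr(1.2))            # 1.2 (the shortest string that maps back to the same float)
print(f"{1.2:.20f}")        # 1.19999999999999995559 (more of the real value)
print(float("1.2") == 1.2)  # True: string -> float -> string is consistent

# 0.1 rounded to a 32-bit float, then printed in base 16 (not rounded).
f32 = struct.unpack("f", struct.pack("f", 0.1))[0]
print(f32.hex())            # starts with 0x1.99999a and ends with p-4
print((0.1).hex())          # 0x1.999999999999ap-4 (the 64-bit version)
```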
how bitwise operations are used
### Binary formats often pack information into bytes very tightly to save space. For example, here are 2 bytes from a real TCP packet: `10000000 00010000` The first "`1000`" is the offset (4 bits). The following "`000`" is reserved (3 bits). The remaining "`0 00010000`" are the flags (9 bits). Here's how `&`, `|`, `<<`, `>>` can be used to pack/unpack data into bytes. ### bit masking Let's say we have the 2 bytes from the previous panel, and we want to extract just the flags part. Here's how to do it with `&` (bitwise and): The idea is that you put a mask "on top" of the bytes to erase bits: `x: 10000000 00010000` (number) `0x01FF: 00000001 11111111` (bit mask) `x & 0x01FF: 00000000 00010000` (how they combine) where the mask has 0s (the first 7 bits), the bits all get set to 0; where the mask has 1s (the last 9 bits), the bits stay the same ### check/set bit flags (see page 16 for more) set a bit flag with or:
```
x = x | 0b010000;
```
check a bit flag with and:
```
if ((x & 0b010000) != 0) {
    // the flag is set
}
```
(this example is in C) ### unpack/pack bits Now let's talk about the offset from the first panel. We can't do calculations on it in its packed form, so we need to unpack it. You can unpack with `>>`:
```
10000000 -> 00001000
x -> x >> 4
```
and pack with `<<`:
```
00001000 -> 10000000
x -> x << 4
```
1000 in binary is 8, which in this case is the TCP offset value.
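Here's the same unpacking and packing as a Python sketch, using the 2 example bytes. The field layout (4-bit offset, 3 reserved bits, 9 flag bits) follows the description in this comic; the `pack` helper and `FLAG` name are made up for illustration:

```python
data = bytes([0b10000000, 0b00010000])
x = int.from_bytes(data, "big")        # both bytes as one 16-bit number

offset = x >> 12                       # top 4 bits
reserved = (x >> 9) & 0b111            # next 3 bits
flags = x & 0x01FF                     # the mask keeps the bottom 9 bits

print(offset, reserved, f"{flags:09b}")   # 8 0 000010000


def pack(offset: int, reserved: int, flags: int) -> bytes:
    """Shift each field back into place and OR them together."""
    value = (offset << 12) | (reserved << 9) | flags
    return value.to_bytes(2, "big")


assert pack(offset, reserved, flags) == data

FLAG = 0b010000                        # the flag bit used in the C example
print((flags & FLAG) != 0)             # True: check a flag with &
print(f"{flags | 0b1:09b}")            # 000010001: set another flag with |
```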
hexadecimal
## panel 1: let's talk about how to write binary data
one way: binary `01111111 11111111 11111111`\
it's easy to see the bits... `1010110110101001010`\
but it's hard to read a lot of them\
another way: base 10\
`8388607`\
but I have NO IDEA how many bits that is
## panel 2: now the best way to write binary data: base 16!
It's short AND maps well to bits!\
`7fffff`\
Every hexadecimal digit represents 4 bits. So 1 byte (8 bits) is always 2 hexadecimal digits.
## panel 3: there are 16 hex digits: `0 → f`
```
| hex | decimal | binary |
| 0   | 0       | 0000   |
| 1   | 1       | 0001   |
| 2   | 2       | 0010   |
| 3   | 3       | 0011   |
| 4   | 4       | 0100   |
| 5   | 5       | 0101   |
| 6   | 6       | 0110   |
| 7   | 7       | 0111   |
| 8   | 8       | 1000   |
| 9   | 9       | 1001   |
| a   | 10      | 1010   |
| b   | 11      | 1011   |
| c   | 12      | 1100   |
| d   | 13      | 1101   |
| e   | 14      | 1110   |
| f   | 15      | 1111   |
```
## panel 4: 0x means it's hex
In many languages, the 0x prefix lets you write numbers in hexadecimal.\
For example, in C:\
`0x20 == 32` (base 16)\
`0b10100 == 20` (base 2)\
`061 == 49` (base 8)\
be careful: the 0 prefix meaning "base 8" can really trip you up!
## panel 5: things hexadecimal is used for
color codes! (e.g. `#FF00FF`)\
memory addresses!\
hashes! (like git commit IDs)\
displaying binary data! (like with `hexdump`)
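The same conversions in Python. One difference from C: Python 3 writes octal as `0o61`, and a literal like `061` is a syntax error, which sidesteps the "leading 0" trap:

```python
print(hex(0b0111_1111_1111_1111_1111_1111))  # 0x7fffff
print(int("7fffff", 16))                     # 8388607
print(0x20, 0b10100, 0o61)                   # 32 20 49

# 1 byte is always 2 hex digits.
print(bytes([255, 0, 255]).hex())            # ff00ff, like the colour #FF00FF
print(bytes.fromhex("ff00ff") == bytes([255, 0, 255]))  # True
```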
floating point: the bits
### panel 1: Floats need to fit into 64 bits. But how do we actually convert a number like 10.87 into 64 bits? First, we split the number into 3 parts: the sign, a power of 2 and an offset (The usual term is "significand", but I find that term confusing, so I'm calling it the "offset") `10.87 = + (8 + 2.87)` (8 is the biggest power of 2 that's less than 10.87) Next, we encode the sign, power of 2, and offset into bits! ### encoding the sign (1 bit) `+ is 0` `- is 1` ### floating point encoding is defined in the IEEE 754 standard since it's standardized, it works the same way on every computer! it was originally defined in 1985 ### encoding the exponent (11 bits, 2^-1023 to 2^1023) `8` ↓ `2^3 = 8` `3` ↓ add 1023 (this makes sure that the result is positive) `1026` ↓ write it in binary, in 11 bits `10000000010` ### encoding the offset (52 bits) `2.87` ↓ divide by the gap size, 2^-49 in this case (2^(exponent - 52)) `1615666366319165.44` ↓ round `1615666366319165` ↓ write it in binary, 52 bits `01011011110101110000101000` `11110101110000101000111101` ### And here's `10.87`! `01000000 00100101 10111101 01110000 10100011 11010111 00001010 00111101`
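You can check this encoding with Python's `struct` module, which gives the raw 8 bytes of a 64-bit float (`>d` means big endian double). This sketch splits the bits into the sign, exponent and offset fields described above, then rebuilds 10.87 from them:

```python
import struct

raw = struct.pack(">d", 10.87)
bits = int.from_bytes(raw, "big")
print(raw.hex())                       # 4025bd70a3d70a3d

sign = bits >> 63                      # 1 bit
exponent = (bits >> 52) & 0x7FF        # 11 bits, still with 1023 added
offset = bits & ((1 << 52) - 1)        # 52 bits

print(sign, exponent, exponent - 1023) # 0 1026 3
print(offset)                          # 1615666366319165

# Rebuild: power of 2, plus offset times the gap size 2**(power - 52).
power = exponent - 1023
value = 2.0 ** power + offset * 2.0 ** (power - 52)
print(value == 10.87)                  # True
```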
floating point representation
### the (64-bit) floating point number line Floating point numbers aren't evenly distributed. Instead, they're organized into windows: [0.25, 0.5], [0.5, 1], [1, 2], [2, 4], [4, 8], [8, 16], all the way up to [2^1023, 2^1024]. Every window has 2^52 floats in it. Illustration of a horizontal line with the windows [-2, -1], [-1, -1/2], [-1/2, -1/4], ..., [1/4, 1/2], [1/2, 1], [1, 2], [2, 4], and [4, 8] plotted on it (with smaller and smaller windows near 0). Each window has 2^52 numbers, and each window doubles in size as it moves away from zero. ### the windows go from REALLY small to REALLY big The window closest to 0 is [2^-1023, 2^-1022] This is TINY: a hydrogen atom weighs about 2^-79 grams. The biggest window is [2^1023, 2^1024]. This is HUUUGE: the farthest galaxy we know about is about 2^90 meters away. ### the gaps between floats double with every window window: [1, 2] gap: 2^-52 window: [2, 4] gap: 2^-51 window: [4, 8] gap: 2^-50 window: [8, 16] gap: 2^-49 ### why does `10000000000000000.0 + 1 = 10000000000000000.0`? - In the window [2^n, 2^(n+1)], the gap between floats is 2^(n-52) - `10000000000000000.0` is in the window [2^53, 2^54], where the gap is 2^1 (or 2) - So the next float after `10000000000000000.0` is `10000000000000002.0`
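A quick way to check the "gaps double with every window" rule and the `10000000000000000.0 + 1` example in Python (3.9+ for `math.ulp`, which returns the gap just above a float):

```python
import math

for low in (1.0, 2.0, 4.0, 8.0):
    # In the window [2**n, 2**(n+1)] the gap is 2**(n - 52).
    print(low, math.ulp(low), math.log2(math.ulp(low)))  # last column: -52.0, -51.0, -50.0, -49.0

x = 10000000000000000.0             # 1e16 is in the window [2**53, 2**54]
print(math.ulp(x))                  # 2.0
print(x + 1 == x)                   # True: x + 1 is exactly halfway, and the tie rounds back to x
print(math.nextafter(x, math.inf))  # 1.0000000000000002e+16
```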
floating point math
## floating point math let's deconstruct `0.1 + 0.2` 1. The closest 64-bit float to 0.1 is (roughly) `0.1000000000000000055511151231` 2. For 0.2, it's (roughly) `0.2000000000000000111022302462` 3. `0.1000000000000000055511151231 + 0.2000000000000000111022302462 = 0.3000000000000000166533453693` 4. Inconveniently, `0.3000000000000000166533453693` is exactly in between 2 floating point numbers: `0.2999999999999999888977` and `0.30000000000000004440892` 5. How do we pick the answer? `0.30000000000000004440892` has an even offset, so we round to that one ## losing a little precision is okay `0.1 + 0.2 = 0.30000000000000004` is usually no big deal. Do you REALLY need your answer to be accurate to 16 decimal places? Probably not! ## the more numbers you add, the more precision you lose This Go code: `var meters float32 = 0.0; for i := 0; i < 100000000; i++ { meters += 0.01 }; fmt.Println(meters)` prints out `262144`, not `1000000`, because in 32-bit floating point `262144.0 + 0.01 = 262144.0` ## adding a number to a MUCH smaller number is bad For example: `2**53 + 1.0 = 2**53` `1.0 + 2**-57 = 1.0` (try it!) ## Use scientific computing libraries if you can There are special algorithms for adding up lots of small floating point numbers without losing accuracy! For example `numpy` implements them.
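Python's standard library has one of those special summation algorithms built in: `math.fsum` keeps track of the low-order bits that ordinary addition throws away and returns the correctly rounded total. This sketch compares it with a plain loop (a loop rather than the built-in `sum`, because Python 3.12 switched `sum` of floats to compensated summation too):

```python
import math

values = [0.1] * 10

naive = 0.0
for v in values:
    naive += v                         # rounds after every single addition

print(naive)                           # 0.9999999999999999
print(math.fsum(values))               # 1.0

print(0.1 + 0.2)                       # 0.30000000000000004
print(2.0 ** 53 + 1.0 == 2.0 ** 53)    # True: 1.0 is too small to register
print(1.0 + 2.0 ** -57 == 1.0)         # True
```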
floating point is weird
## floating point is weird ## floating point 10.0 is not the same as the integer 10 10 (64-bit integer): `0x000000000000000a` 10.0 (64-bit float): `0x4024000000000000` (what's this 4024 doing???) ## computer integers work almost exactly the way you'd expect `1 + 2 - 3 = 0` but floating point numbers don't: `(0.1 + 0.2) - 0.3 = 0.0000000000000000555` ## checking for float equality is dangerous `if x == 0.3`: bad! `(0.1 + 0.2)` is not equal to `0.3`! Instead, check if x is very close to 0.3, something like this: `if abs(x - 0.3) < 0.0000001:` ## in floating point, very large integers get rounded For example: `10000000000000001.0 == 10000000000000000.0` (16 zeros) (try comparing those 2 numbers in your favourite language! they're the same!) ## (x + y) + z is not the same as x + (y + z) For example: `(9007199254740992.0 + 1.0) - 1.0 = 9007199254740991.0`, but `9007199254740992.0 + (1.0 - 1.0) = 9007199254740992.0` (the math term for this problem is "floating point addition isn't associative") ## some intuition for precision 32-bit floats have about 8 digits of precision 64-bit floats have about 16 digits of precision
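In Python, `math.isclose` is a standard-library version of the "very close" check (it uses a relative tolerance, `1e-09` by default, instead of a fixed one). This sketch also reproduces the associativity example:

```python
import math

x = 0.1 + 0.2
print(x == 0.3)                    # False
print(abs(x - 0.3) < 0.0000001)    # True: fixed (absolute) tolerance
print(math.isclose(x, 0.3))        # True: relative tolerance

a, b, c = 9007199254740992.0, 1.0, -1.0
print((a + b) + c)                 # 9007199254740991.0
print(a + (b + c))                 # 9007199254740992.0
```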
floating point alternatives
## more floating point alternatives ## there are many alternative ways to represent numbers These are all implemented in software (not hardware) so they're a lot slower, and different languages have different libraries. ## alternative 1: decimal floating point This is like regular floating point, but in base 10 instead of base 2. It's also standardized in IEEE 754. Examples: Python's `decimal` module or Java's `BigDecimal` ## alternative 2: fractions This lets you do exact calculations with fractions (1/10 + 2/10 = 3/10) Examples: Python's fractions module in the standard library, Lisps have first-class support ## alternative 3: symbolic computation For example, `sqrt(2)` instead of `1.414`. You'll see this in computer algebra systems like Mathematica, Maple, or sympy. ## alternative 4: interval arithmetic The idea is to store every number as a range so that you can precisely track your error bars. Probably the least mainstream of these alternatives. ## alternative 5: binary-coded decimal This is how floating point numbers (and integers) were stored on IBM computers in the 60s, and you can still occasionally see it today in old formats like ISO 8583 for financial transactions.
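Python ships with two of these alternatives in its standard library, so here's a tiny sketch comparing them with regular floats. Note that a `Decimal` should be built from a string: `Decimal(0.1)` would copy the inexact binary float.

```python
from decimal import Decimal
from fractions import Fraction

# Regular binary floats: 0.1 and 0.2 can't be stored exactly
print(0.1 + 0.2 == 0.3)  # False

# Decimal floating point: base 10, so 0.1 is stored exactly
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Fractions: exact rational arithmetic
print(Fraction(1, 10) + Fraction(2, 10))  # 3/10
```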
fixed point
## fixed point ## just because you see 0.23 doesn't mean it's floating point For example, in this RGBA color: `rgba(211, 7, 23, 0.23)` `0.23` isn't a float at all, it's the 8-bit integer `59`. Let's see how that works! ## fixed point numbers are integers You interpret them as the integer divided by some fixed number (like 255 or 10000) For example, that opacity should be divided by 255: `59 / 255 = 0.23ish` ## things fixed point is often used for money: `$1.23 => 123` time: `0.1 seconds => 100000 microseconds` opacity: `0.23 => 59` ## fixed point is the most common alternative to floating point It's very simple and it's pretty easy to implement! ## implementing fixed point is easy (especially if you only need to add and subtract) You just need: - an integer - some code to display it (by dividing by 255 or something) ## fixed point can help avoid accuracy issues If you try to represent the current Unix time in nanoseconds as a 64-bit float, you'll lose accuracy. But if it's a 64-bit integer, it'll be fine.
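A minimal fixed point sketch in Python, assuming money is stored as a whole number of cents. `format_dollars` is a hypothetical display helper (it only handles non-negative amounts), and the nanosecond timestamp is a made-up example value.

```python
# Fixed point money: store cents in an int, only divide when displaying
price_cents = 123  # $1.23
tax_cents = 10     # $0.10
total = price_cents + tax_cents


def format_dollars(cents: int) -> str:
    """Hypothetical display helper for non-negative amounts of cents."""
    return f"${cents // 100}.{cents % 100:02d}"


print(format_dollars(total))  # $1.33

# Opacity stored as an 8-bit integer: 59 means 59/255
alpha = 59
print(round(alpha / 255, 2))  # 0.23

# A made-up nanosecond timestamp: exact as an int, rounded as a 64-bit float
ns = 1_700_000_000_123_456_789
print(int(float(ns)) == ns)  # False
```

Near 1.7 × 10^18 the gap between 64-bit floats is 256, so the float version can be off by up to 128 nanoseconds.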
bytes
bitwise operations
### bitwise operations operate one bit at a time The results can be surprising when you write them in base 10: `8 & 3 = 0` but in binary it makes more sense: ``` 00001000 (8) & 00000011 (3) = 00000000 ``` ### & Bitwise and: the result is 1 if BOTH bits are 1 ``` 1 & 1 = 1 1 & 0 = 0 0 & 0 = 0 11 & 10 = 10 ``` ### | Bitwise or: the result is 1 if EITHER bit is a 1 ``` 1 | 1 = 1 1 | 0 = 1 0 | 0 = 0 11 | 10 = 11 ``` ### ^ Bitwise xor: the result is 1 if EXACTLY ONE bit is a 1 ``` 1 ^ 1 = 0 1 ^ 0 = 1 0 ^ 0 = 0 11 ^ 10 = 01 ``` ### ~ Bitwise not: FLIP all the bits ``` ~0 = 1 ~1 = 0 ~10 = 01 ``` ### << Left shift: add 0s to the end `1110 << 3 = 1110000` `<< n` is the same as multiplying by 2^n ### >> Right shift: chop bits off the end `01100001 >> 2 = 00011000` `>> n` is the same as dividing by 2^n (and rounding down) ### there are actually two right shifts unsigned right shift ``` 253 >> 1 = 126 11111101 -> 01111110 ``` always pad on the left with a 0 signed right shift ``` -3 >> 1 = -2 11111101 -> 11111110 ``` if the number is negative, pad on the left with 1 instead of a 0 In some languages, unsigned right shift is `>>>`. In other languages, both right shifts are `>>` and the integer's type determines which is used.
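Here's a Python sketch for checking the examples above. `bits` is a hypothetical helper that shows the low 8 bits of a number. Python ints have no fixed width and `>>` is always a signed shift, so the unsigned 8-bit shift is faked by masking with `0xFF` first.

```python
def bits(n: int, width: int = 8) -> str:
    """Show the low `width` bits of n (negative numbers appear in two's complement)."""
    return format(n & ((1 << width) - 1), f"0{width}b")


print(bits(8 & 3))            # 00000000
print(bits(0b11 ^ 0b10, 2))   # 01
print(bits(0b1110 << 3))      # 01110000
print(bits(0b01100001 >> 2))  # 00011000

print(bits(-3))               # 11111101
print(-3 >> 1)                # -2 (signed shift: pads with 1s)
print((-3 & 0xFF) >> 1)       # 126 (same bits treated as unsigned 253: pads with 0s)
```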
bit flags
## bit flags ## bit flags are a clever way to store lots of information in one integer If you have many options which are true or false, you can encode them all into an integer, with 1 bit for each option. 32 bits = 32 options! For example, some of the bit flags the open function in C uses: - nofollow - append - truncate - create - write only - read write (this is on Linux) ## where you'll see bit flags In libc, the open, socket, and mmap functions use bit flags to pass options. The TCP protocol header has a flags field which has bit flags (like SYN, ACK, and FIN). ## bit flags are used a lot in C code Here's some C code that opens a new file: `fd = open("file.txt", O_RDWR | O_CREAT, 0666);` `O_RDWR` is: `00000010` `O_CREAT` is: `01000000` `O_RDWR | O_CREAT` is: `01000010` You can check if a bit flag is set in C like this: `if (flags & O_RDWR) { ... }` ## fun example: tic tac toe! Here's a way to encode the state of a tic tac toe game in 18 bits: x positions: `100` `010` `010` O positions: `010` `001` `100`
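A Python sketch of both examples. The two flag values are copied from the comic (they're the Linux values; other operating systems use different numbers), and the board layout is an assumption: 9 bits for x followed by 9 bits for o, each read row by row from the top-left. `square_has` is a hypothetical helper.

```python
# Flag values from the comic (Linux); other systems use different numbers
O_RDWR = 0b00000010
O_CREAT = 0b01000000

flags = O_RDWR | O_CREAT
print(format(flags, "08b"))   # 01000010
print(bool(flags & O_CREAT))  # True: the O_CREAT bit is set

# Tic tac toe in 18 bits: x's 9 bits, then o's 9 bits, row by row from the top-left
X_BITS = 0b100_010_010
O_BITS = 0b010_001_100
board = (X_BITS << 9) | O_BITS


def square_has(board: int, player: str, square: int) -> bool:
    """Hypothetical helper: square is 0..8, left to right, top to bottom."""
    offset = 9 if player == "x" else 0
    return bool(board & (1 << (offset + 8 - square)))


print(square_has(board, "x", 4))  # True: x is in the center
print(square_has(board, "o", 5))  # True
```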
big integers
## big integers ## integers don't have to overflow Instead, integers can expand to use more space as they get bigger. Integers that expand are called "big integers". big integer: I'm going to use ONE THOUSAND bytes of space! ## big integer math is slower It's slower because it's implemented in software, not hardware. So a big integer addition is actually turned into lots of smaller additions. ## how big integers are represented (in Go, as of 2023) Go stores a big integer as an array of 64-bit integers. You can think of that array as the number written in base 2^64 ## some languages only have big integers Python 3 and Ruby: we'd rather have slower math and no weird overflow problems! This works because people don't do a lot of math in Ruby/Python (except with numpy, which doesn't use big integers). ## some languages offer big integers as an option Go, JavaScript, Java, and lots more. Each language has its own big integer implementation. ## when are big integers useful? - they're used in cryptography (e.g. for large key sizes) - for math on really big integers
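Here's a minimal Python sketch of the "array of 64-bit integers in base 2^64" idea and of how one big addition turns into many small additions with a carry. This is not Go's actual implementation, just the same idea; it uses Python's own big integers to check the answer.

```python
BASE = 2**64


def to_limbs(n: int) -> list:
    """Split a non-negative int into base-2**64 digits, least significant first."""
    limbs = []
    while True:
        n, digit = divmod(n, BASE)
        limbs.append(digit)
        if n == 0:
            return limbs


def add_limbs(a: list, b: list) -> list:
    """Schoolbook addition: one small addition per 64-bit digit, plus a carry."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(total % BASE)
        carry = total // BASE
    if carry:
        result.append(carry)
    return result


def from_limbs(limbs: list) -> int:
    return sum(digit * BASE**i for i, digit in enumerate(limbs))


x = 2**100 + 12345
y = 2**64 - 1
print(to_limbs(x))  # [12345, 68719476736]
print(from_limbs(add_limbs(to_limbs(x), to_limbs(y))) == x + y)  # True
```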
bases
### We usually write numbers in base 10, but you can write numbers in any base. Let's write the number 103 in 3 different bases: base 10: `103` (powers of 10) ``` 1 x 100 = 100 0 x 10 = 0 3 x 1 = 3 = 103 ``` base 2: `1100111` (powers of 2) ``` 1 x 64 = 64 1 x 32 = 32 0 x 16 = 0 0 x 8 = 0 1 x 4 = 4 1 x 2 = 2 1 x 1 = 1 64 + 32 + 4 + 2 + 1 = 103 ``` base 16: `67` (powers of 16) ``` 6 x 16 = 96 7 x 1 = 7 96 + 7 = 103 ``` ### base 2, 10, and 16 are the main bases we use on computers - base 2 is called binary - base 10 is called decimal - base 16 is called hexadecimal ### how to convert from base 10 to base 2 Let's convert 19! We'll start on the right and move left. 1. Divide by 2: 19/2 = 9 remainder 1 2. Write the remainder (1) below, and 9 on the left 3. Repeat with 9, then 4, 2, and 1, until you get to 0 answer: 10011! person: but in real life I'd just ask a computer
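The divide-and-write-the-remainder steps above, as a short Python sketch. `to_base` is a hypothetical helper that handles bases 2 to 16; Python's built-in `int(text, base)` goes the other way.

```python
def to_base(n: int, base: int) -> str:
    """Convert a non-negative int to a string in `base` (2 to 16) by repeated division."""
    digits = "0123456789abcdef"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(digits[remainder])  # remainders come out right to left
    return "".join(reversed(out))


print(to_base(19, 2))    # 10011
print(to_base(103, 2))   # 1100111
print(to_base(103, 16))  # 67
print(int("1100111", 2), int("67", 16))  # 103 103
```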
ASCII
## panel 1: a string is an array of bytes ASCII is the simplest string encoding: 1 character = 1 byte. Let's see how it works! (We usually use UTF-8, which is WAY more complicated) ## panel 2: every printable ASCII character ``` !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ``` There are no accents because it's an English encoding: the "A" in ASCII is for "American". ## panel 3: there are 128 ASCII characters Only the bytes 0 to 127 are defined. It's very limited: you can really see why we need more powerful encodings like UTF-8! ## panel 4: how bytes map to characters Here's a partial list, look up "ASCII table" for the full list. Bytes (in base 10) are on the left, characters are on the right. 33 is !, 34 is " 48 is 0, 49 is 1 65 is A, 66 is B 97 is a, 98 is b ## panel 5: a trick to translate from lowercase to uppercase In ASCII, the lowercase letters are 32 more than the uppercase letters. So you can just subtract 32!
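A small Python sketch of the byte-to-character mapping and the "subtract 32" trick. `ascii_upper` is a hypothetical helper that only touches the letters `a` to `z`; in real code you'd just call `str.upper()`.

```python
# Each ASCII character is one byte
data = "Cat!".encode("ascii")
print(list(data))  # [67, 97, 116, 33]

print(ord("A"), ord("B"), ord("a"), ord("!"))  # 65 66 97 33


def ascii_upper(s: str) -> str:
    """Hypothetical helper: uppercase a-z by subtracting 32, leave everything else alone."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


print(ascii_upper("hello, world"))  # HELLO, WORLD
```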
8 bytes, many meanings
## 8 bytes, many meanings The same bytes can mean many things. Here are 8 bytes and a bunch of things they could potentially mean. (a picture of 8 bytes: the ASCII characters for 'computer') some things they could mean: * 8 8-bit integers * 4 unsigned 16-bit integers * a 64-bit pointer * 2 IPv4 addresses * x86 machine code * 2 32-bit floating point numbers * 1 64-bit floating point number * 2 RGBA colours person: "don't worry if you don't understand all this right now! We'll explain." note on x86 machine code: this code is nonsense, but search "ascii shellcode" for x86 code which is valid ASCII.
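Here's a Python sketch that reads the same 8 bytes (`b"computer"`) in several of the ways listed above. It assumes little-endian byte order (like x86) for the multi-byte numbers; a big-endian machine would read them differently. The float values come out as meaningless numbers, which is the point.

```python
import ipaddress
import struct

data = b"computer"  # the 8 bytes from the comic

print(list(data))                          # 8 8-bit integers
print(struct.unpack("<4H", data))          # 4 unsigned 16-bit integers
print(hex(struct.unpack("<Q", data)[0]))   # a 64-bit pointer-sized value
print(ipaddress.IPv4Address(data[:4]), ipaddress.IPv4Address(data[4:]))  # 2 IPv4 addresses
print(struct.unpack("<2f", data))          # 2 32-bit floats
print(struct.unpack("<d", data))           # 1 64-bit float
print(tuple(data[:4]), tuple(data[4:]))    # 2 RGBA colours
```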
32 bits is small
## panel 1: using 32-bit integers is dangerous Let's see some examples of how it can go wrong and why it's almost always better to use 64-bit integers instead! (32-bit floats are bad too, for similar reasons) ## panel 2: 32-bit integers are at most 4 billion unsigned 32-bit ints go from 0 to 4,294,967,295 (4 billion) signed 32-bit ints go from -2,147,483,648 to 2,147,483,647 ## panel 3: times "4 billion" wasn't enough **Database primary keys**: 4 billion records really isn't that much. **IPv4 addresses**: turns out we want more than 4 billion computers on the internet. Oops. **Registers**: in the 90s, registers were 32 bits. 4 billion bytes of RAM is 4GB. We need more than that. **Unix timestamps**: 2 billion seconds after Jan 1, 1970 is Jan 19, 2038. That's going to be an exciting day. (look up "2038 problem"!) ## panel 4: 64 bits is usually big enough For example, 2^64 seconds after Jan 1, 1970 is over 100 billion years in the future: well after the death of the sun. So a 64-bit timestamp is definitely enough space. ## panel 5: be wary of using 32-bit integers by accident Systems that were designed in the 90s often have a 32-bit integer as the default. For example, in MySQL an INTEGER is 32 bits.
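A Python sketch of the 2038 problem. Python ints never overflow, so `wrap_int32` is a hypothetical helper that imitates what a signed 32-bit integer does when it goes past its maximum. Dates are computed as "1970 + a timedelta" to stay portable with negative timestamps.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1

print(EPOCH + timedelta(seconds=INT32_MAX))  # 2038-01-19 03:14:07+00:00


def wrap_int32(n: int) -> int:
    """Hypothetical helper: simulate storing n in a signed 32-bit int (wraps around)."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


overflowed = wrap_int32(INT32_MAX + 1)
print(overflowed)                             # -2147483648
print(EPOCH + timedelta(seconds=overflowed))  # 1901-12-13 20:45:52+00:00

# 2**64 seconds, measured in 365.25-day years
print(2**64 // (365.25 * 24 * 3600) > 100e9)  # True
```

So one second after 03:14:07 UTC on Jan 19, 2038, a signed 32-bit timestamp jumps back to December 1901.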
PATH
### PATH is how your shell knows where to find programs It's a list of directories that your shell searches in order. smiling stick figure: `$ python3` shell, represented by a nautilus shell: `PATH=/bin:/home/bork/bin:/usr/bin` (directories are separated by colons) shell: 1. `/bin/python3`? nope, doesn't exist 2. `/home/bork/bin/python3`? nope, doesn't exist 3. `/usr/bin/python3`? there it is!!! run that! ### how to add a program to your `PATH` 1. find the directory the program is in 2. update `PATH` in your config with that directory 3. restart your shell. For WAY TOO MUCH info about how to do this, see `https://wzrd.page/path` ### ...but which directory was the program installed in? remember how you installed it: little stick figure with curly hair, thinking: hmm, I used the Rust installer, where does that install things? ... or do a brute force search: `find / -name python3 | grep bin` (usually I put a `2>/dev/null` too) ### `PATH` ordering drama little stick figure with curly hair, thinking: ugh, no, don't run THAT `python3`, run the other one! You can prioritize a directory by adding it to the *beginning* of your `PATH` ### gotcha: not everything uses your `PATH` cron jobs usually have a very basic `PATH`, maybe just `/bin` and `/usr/bin` In a cron job I'll use the absolute path `/home/bork/bin/someprogram`
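Here's a rough Python sketch of the search the shell does. `find_in_path` is a hypothetical helper (real shells also cache lookups, handle builtins, and so on); the standard library's `shutil.which` does a similar search for real.

```python
import os
import shutil
from typing import Optional


def find_in_path(program: str, path: str) -> Optional[str]:
    """Hypothetical helper: check each PATH directory in order, return the first match."""
    for directory in path.split(os.pathsep):  # os.pathsep is ":" on Linux and macOS
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # first match wins, which is why PATH order matters
    return None  # not found anywhere: the shell would say "command not found"


print(find_in_path("python3", os.environ.get("PATH", "")))
print(shutil.which("python3"))  # the standard library's version of the same search
```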
some people who make programming easier
### the loud newbie newbie: wait, HOW does X work?? other person, thinking: I'm so glad they asked, I was wondering that too... ### the grumpy old timer new person: X is so cool! grumpy old timer: it is! let me tell you about some ways it can break though... ### the bug chronicler that bug was so gnarly, I'm going to write an EXTREMELY CLEAR description of what happened so we can all learn from it ### the documentarian person 1: here's how you do X... documentarian: I'll put those instructions in our wiki! ### the "today I learned..." I just learned this cool new tool... check out this weird bug! ### the "I've read the entire internet" person: how does X work? TAB GIRL: ah, I read about that recently... here's a link from my 200 browser tabs ### the tool builder everyone keeps getting confused by X! I'm going to fix it with CODE. ### the question answerer person 1: hey can you explain how X works? question answerer: I would LOVE to ### blank final panel ?
strace command line flags I love
### -e Overwhelmed by all the system calls you don't understand? Try `strace -e open` and it'll just show you opens. Much simpler! (newer programs often call `openat` instead of `open`, so you may need `strace -e openat`) ### -f is for follow Does your program start subprocesses? Lots do! Use `-f` to see what those are doing too. Or just always use `-f`! That's what I do. ### -p is for PID "OH NO I STARTED THE PROGRAM 6 HOURS AGO AND NOW I WANT TO STRACE IT" Do not worry! Just find your process's PID (like 747) and `strace -p 747` (tip: if the process runs as root you'll need to be root too, because SECURITY) ### -s is for strings!! Sometimes I'm looking at the output of a recvfrom and it's like: recvfrom (6, "And then the monster...") and OH NO THE SUSPENSE. `strace -s 800` will show you the first 800 characters of each string. I use it all the time! ### -o is for output! Let's get real. No matter what, strace prints too much damn output. Use `strace -o too_much_stuff.txt` and sort through it later. ### -y Have no idea which file the file descriptor "3" refers to? `-y` is a flag in newer versions of strace, and it'll show you filenames instead of just numbers! ### Putting it all together: Want to spy on an ssh session? `strace -f -o ssh.txt ssh juliabox.com` Want to see what files a Dropbox sync process is opening? (with PID: 230) `strace -f -p230 -e open`
your domain's authoritative nameservers
### when you register a domain, your registrar runs your authoritative nameservers by default your registrar, represented by a box with a smiley face wearing a crown: I'm taking care of your DNS! You can change your nameservers in your registrar's control panel. ### LOTS of services can be your authoritative nameserver your registrar: I can manage your DNS records! AWS, also represented by a box with a smiley face wearing a crown: me too! Shopify, also represented by a box with a smiley face wearing a crown: me three! Nonplussed stick figure with curly hair: ok chill I only need one of you to do it ### how to find your domain's nameservers ``` $ dig +short NS neopets.com ns-42.awsdns-05.com. ns-1191.awsdns-20.org. ``` `neopets.com` is using AWS's nameservers right now ### how to change your nameservers 1. Copy your DNS records to the new nameservers (use dig to check that it worked) 2. On your registrar's website, update your nameservers 3. Wait 48 hours 4. Delete the old DNS records (to save your future self confusion) ### why changing your nameservers is slow registrar: here's the new nameserver for example.com! .com nameserver, represented by a box with a smiley face, wearing a stack of three crowns: ok great, I've saved this record: `example.com NS newns.com 172800` updates are VERY SLOW because this TTL is 2 days ### what can go wrong if you don't delete the old records Illustration of a nonplussed stick figure with curly hair. person: I'll go to $OLD_NAMESERVER to change my DNS records! person: WHY doesn't it WORK?!?!? person: oh right, I changed this domain's nameservers last year, oops!
git cheat sheet
Illustration of a smiling stick figure with short curly hair. Person: git has 17 million options but this is how I use it!  ### getting started #### start a new repo: `git init` #### clone an existing repo: `git clone $URL` ### know where you are `git status` ### prepare to commit #### add untracked file: (or unstaged changes) `git add $FILE` #### add ALL untracked files and unstaged changes: `git add .` #### choose which parts of a file to stage: `git add -p` #### delete or move file: ``` git rm $FILE git mv $OLD $NEW ``` #### tell git to forget about a file without deleting it: `git rm --cached $FILE` #### unstage everything: `git reset HEAD` ### make commits #### make a commit: (and open a text editor to write the message) `git commit` #### make a commit: `git commit -m 'message'` #### commit all unstaged changes: `git commit -am 'message'` ### move between branches #### switch branches: `git switch $NAME` OR `git checkout $NAME` #### create a branch: `git switch -c $NAME` OR `git checkout -b $NAME` #### list branches: `git branch` #### delete a branch: `git branch -d $NAME` #### force delete a branch: `git branch -D $NAME` #### list branches by most recently committed to: ``` git branch --sort=-committerdate ``` ### look at a branch's history #### log the branch `git log main` #### show how two branches relate to each other: `git log --graph a b` #### one line log: `git log --oneline` ### code archaeology #### show who last changed each line of a file: `git blame $FILENAME` #### show every commit that modified a file: `git log $FILENAME` #### find every commit that added or removed some text: `git log -S banana` ### diff commits #### show diff between a commit and its parent: `git show $COMMIT_ID` #### show diff between a merge commit and its merged parents: `git show --remerge-diff $COMMIT_ID` #### diff two commits: `git diff $COMMIT_ID $COMMIT_ID` #### just show diff for one file: `git diff $COMMIT_ID $FILENAME` #### show a summary of a diff: ``` git diff $COMMIT_ID --stat git show $COMMIT_ID --stat ```
### diff staged/unstaged changes #### diff all staged and unstaged changes: `git diff HEAD` #### diff just staged changes: `git diff --staged` #### diff just unstaged changes: `git diff` ### configure git #### set a config option: `git config user.name 'Julia'` #### see all possible config options: `man git-config` #### set option globally: `git config --global ...` #### add an alias: `git config alias.st status` ### important git files #### local git config: `.git/config` #### global git config: `~/.gitconfig` #### list of files to ignore: `.gitignore` ### combine diverged branches #### how the branches look before: Diagram of two boxes in a row, connected by lines. The first one has a heart, the second one has a star. Branching off from the star, there is one branch with a box with a hashtag symbol, labelled "main". The second branch consists of a box with a spiral and a box with a squiggle. The second branch is labelled "banana". #### combine with rebase: ``` git switch banana git rebase main ``` Diagram of two boxes in a row, connected by lines. The first one has a heart, the second one has a star. Branching off from the star, there is one branch with a box with a hashtag symbol, labelled "main". The box with the spiral and the box with the squiggle have been added on after the box with the hashtag. The box with the squiggle is labelled "banana". The second branch, with the box with a spiral and the box with a squiggle, is drawn with dotted lines and labelled "lost". #### combine with merge: ``` git switch main git merge banana git commit ``` This diagram is like the "before" diagram, except now the two branches converge into a new box, with a diamond in it, labelled "main". 
#### combine with squash merge: ``` git switch main git merge --squash banana git commit ``` This diagram is like the "before" diagram, except now, in the first of the two branches, after the hashtag symbol, there is a new box with both a spiral and a squiggle in it, labelled "main". ### bring a branch up to date with another branch (aka "fast-forward merge") ``` git switch main git merge banana ``` (diagram: before, `main` is behind `banana` on the same line of commits; after, `main` and `banana` point to the same commit) ### copy one commit onto another branch `git cherry-pick $COMMIT_ID` (diagram: the commit gets copied onto the end of the current branch) ### add a remote `git remote add $NAME $URL` ### push your changes #### push the main branch to the remote origin: `git push origin main` #### push a branch to the remote origin that you've never pushed before: `git push -u origin $NAME` #### push the current branch to its remote "tracking branch": `git push` #### force push: `git push --force-with-lease` #### push tags: `git push --tags` ### pull changes #### fetch changes: (but don't change any of your local branches) `git fetch origin main` #### fetch changes and then merge them into your current branch: `git pull origin main` OR `git pull` #### fetch changes and then rebase your current branch: `git pull --rebase` #### fetch all branches: `git fetch --all` ### ways to refer to a commit every time we say $COMMIT_ID, you can use any of these: * a branch (`main`) * a tag (`v0.1`) * a commit ID (`3e887ab`) * a remote branch (`origin/main`) * current commit (`HEAD`) * 3 commits ago (`HEAD^^^`) * 3 commits ago (`HEAD~3`)
the floating point number line
### the (64-bit) floating point number line Floating point numbers aren't evenly distributed. Instead, they're organized into windows: [0.25, 0.5], [0.5, 1], [1, 2], [2, 4], [4, 8], [8, 16], all the way up to [2^1023, 2^1024]. Every window has 2^52 floats in it. - between -2 and -1 - between -1 and -1/2 - between -1/2 and -1/4 - between -1/4 and 0 - between 0 and 1/4 - between 1/4 and 1/2 - between 1/2 and 1 - between 1 and 2 ### the windows go from REALLY small to REALLY big The window closest to 0 is [2^-1023, 2^-1022]. This is TINY: a hydrogen atom weighs about 2^-79 grams. The biggest window is [2^1023, 2^1024]. This is HUUUGE: the farthest galaxy we know about is about 2^90 meters away. ### the gaps between floats double with every window - window: [1, 2] gap: 2^-52 - window: [2, 4] gap: 2^-51 - window: [4, 8] gap: 2^-50 - window: [8, 16] gap: 2^-49 ### why does `10000000000000000.0 + 1 = 10000000000000000.0`? - In the window [2^n, 2^(n+1)], the gap between floats is 2^(n-52) - `10000000000000000.0` is in the window [2^53, 2^54], where the gap is 2^1 (or 2) - So the next float after `10000000000000000.0` is `10000000000000002.0`
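A Python sketch for exploring the number line (needs Python 3.9+ for `math.ulp` and `math.nextafter`). `float_bits` is a hypothetical helper: for positive floats, the raw bit patterns are in the same order as the numbers, so subtracting two bit patterns counts how many floats lie between them.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Hypothetical helper: reinterpret a 64-bit float's bytes as a 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


# Every window has 2**52 floats in it
print(float_bits(2.0) - float_bits(1.0) == 2**52)            # True
print(float_bits(2.0**100) - float_bits(2.0**99) == 2**52)  # True

# math.ulp(x) is the gap between positive x and the next float up
for low in [1.0, 2.0, 4.0, 8.0]:
    print(f"window [{low:g}, {2 * low:g}]: gap = 2**{int(math.log2(math.ulp(low)))}")

print(math.ulp(1e16))                  # 2.0
print(math.nextafter(1e16, math.inf))  # 1.0000000000000002e+16
print(1e16 + 1 == 1e16)                # True
```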
scenes from distributed systems
git references
### git often uses the term "reference" in error messages ``` $ git switch asdf fatal: invalid reference: asdf $ git push To github.com:jvns/int-exposed ! [rejected] main -> main error: failed to push some refs to 'github.com:jvns/int-exposed' ``` "ref" and "reference" mean the same thing Illustration of a tiny worried-looking stick person with a thought bubble reading "!" ### "reference" often just means "branch" in those two error messages, you can replace "reference" with "branch" in my experience, it's: 96% "branch" 3% "tag" 3% "HEAD" 0.01% something else ### it's an umbrella term Illustration of git, represented by a box with a smiley face git, thinking: "well, I COULD check if the thing we failed to push is a branch or tag or what, and customize the error message based on that...." git, thinking: "seems complicated, let's just print out "reference"" sad person: "why?" ### reference: the definition References are files: either `.git/HEAD` or files in `.git/refs`. There are 5 main types. Here's a list of every type of git reference that I have ever used: - HEAD: `.git/HEAD` - branches: `.git/refs/heads/BRANCH` - tags: `.git/refs/tags/TAG` - remote-tracking branches: `.git/refs/remotes/REMOTE/BRANCH` - stash: `.git/refs/stash` all of these files contain a commit ID, but the way that commit ID is used depends on what type of reference it is (examples of more obscure references are `.git/FETCH_HEAD` and `.git/refs/notes/...` but I've never needed to think about those and your repository probably doesn't even have notes) ### git's garbage collection starts with references the algorithm is: 1. find all references, and every commit in every reference's reflog 2. find every commit in the history of any of those commits 3. delete every commit that wasn't found
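Here's a toy Python model of that 3-step garbage collection algorithm. Everything in it is hypothetical: commits are a dict from commit ID to parent IDs, and references are a dict from name to commit ID. Real `git gc` works on the object database and also keeps recently created unreachable objects around for a grace period, so treat this as a sketch of the idea only.

```python
# Hypothetical toy repository: commit ID -> list of parent commit IDs
commits = {
    "a1": [],
    "b2": ["a1"],
    "c3": ["b2"],
    "d4": ["b2"],  # was on a branch that got deleted
    "e5": ["c3"],
}
# Hypothetical references, plus a commit that only shows up in a reflog
references = {"HEAD": "e5", "refs/heads/main": "e5", "refs/tags/v0.1": "b2"}
reflog_commits = ["c3"]


def reachable(starts, commits):
    """Steps 1 and 2: start from every reference and reflog entry, walk the parents."""
    found = set()
    stack = list(starts)
    while stack:
        commit = stack.pop()
        if commit in found:
            continue
        found.add(commit)
        stack.extend(commits[commit])
    return found


keep = reachable(list(references.values()) + reflog_commits, commits)
garbage = sorted(set(commits) - keep)  # step 3: everything else gets deleted
print(garbage)  # ['d4']
```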
knowing where you are in git
### many git disasters are caused by accidentally running a command while on the wrong branch... Illustration of a stick figure with a neutral expression. person: `git commit` person, thinking: UGH I didn't mean to do that on `main` ### ... or by forgetting you're in the middle of a multistep operation smiling stick figure with curly hair: la la la just writing code same person, now distressed and surrounded by exclamation marks: OMG I FORGOT I WAS IN THE MIDDLE OF A MERGE CONFLICT ### I always keep track of 2 things 1. am I on a branch, or am I in detached `HEAD` state? 2. am I in the middle of some kind of multistep operation? (`rebase`, `merge`, `bisect`, etc) ### I keep my current branch in my shell prompt `~/work/homepage (main) $` to me it's as important as knowing what directory I'm in git comes with a script to do this in bash/zsh called `git-prompt.sh` ### decoder ring for the default git shell prompt `(main)` on a branch, everything is normal `((2e832b3...))` `((v1.0.13))` the double brackets (( )) mean `detached HEAD state`. this prompt can only happen if you explicitly `git checkout` a commit/tag/remote-tracking branch `(main|CHERRY-PICK)` `(main|REBASE 1/1)` `(main|MERGING)` `(main|BISECTING)` in the middle of a cherry-pick/rebase/merge/bisect
learning on my own
submodules
### panel 1 Illustration of a smiling stick figure with curly hair. person: I find submodules confusing and I avoid them if possible, but here's what I've learned from other people's writing on submodules (especially Dmitry Mazin's great "Demystifying Git Submodules" post) ### submodules let you store another git repository as a subdirectory ``` git submodule add https://github.com/jvns/myrepo ./myrepo ``` (`https://github.com/jvns/myrepo` is the remote URL, `./myrepo` is the local path) Git will store the commit ID and URL of the submodule ### gotcha: cloning a repository doesn't download its submodules To get the submodules, you can run this after cloning the repository: `git submodule update --init` ### gotcha: git pull and git checkout don't update submodules To actually update them, you have to run: `git submodule update` every single time you switch branches or pull ### gotcha: git submodule update puts the submodule in detached HEAD state It might not be a big deal if you're only using the submodule in a read-only way, but it seems like it could get weird if you're editing it ### some submodule config options automatically update submodules after a pull/checkout: `submodule.recurse true` show which commits were added/removed in `git diff`/`git status`: ``` status.submoduleSummary true diff.submodule log ```
merge conflict tips
### use `diff3` or `zdiff3` to see the original version of the code `git config --global merge.conflictstyle diff3` This will add an extra section in the middle of your merge conflicts ### if you get confused, merge (or cherry-pick) 1 commit at a time This can make the conflicts smaller and easier to resolve! `git-imerge` is a tool to make this easier, though I haven't tried it ### use rerere to remember how you resolved a conflict during a rebase `git config --global rerere.enabled true` This means you won't have to resolve the exact same conflict over and over again ### `git checkout --ours/theirs` can take all changes from one side For example `git checkout --ours file.txt` will take the version of file.txt from the "ours" side of the merge (though upsettingly the meaning of "ours" and "theirs" depends on whether you merged or rebased) ### if you can't tell which code comes from which branch, looking on the web can help Illustration of an uncertain-looking stick figure with short curly hair. person (thinking): I'll just go to GitLab and see what `file.txt` looks like on the main branch ### `git merge-tree` can check for merge conflicts without actually merging the branches ``` $ git merge-tree --write-tree main mybranch ... Auto-merging file.py CONFLICT (content): Merge conflict in file.py ```
interactive rebase
### git rebase -i lets you garden your commits I use it like this: 1. make commits chaotically, `git commit -am 'wip'` 2. clean up with `git rebase -i` before sending them off for code review ### interactive rebase's UI is a text file when you run `git rebase -i main`, it'll open a text editor with something like this in it: ``` pick 399990 add some padding pick fb59d8 french translation pick 617b19 sort titles pick 31b81f hashchange ``` ### deleting commits You can delete a commit just by deleting that line in the text editor! (same as previous panel but the "french translation" line is scribbled out) ### combine commits with fixup Here's how to combine all 4 commits into 1 commit: (`f` stands for `fixup`) ``` pick 399990 add some padding f fb59d8 french translation f 617b19 sort titles f 31b81f hashchange ``` ### check that the tests pass with exec You can run make test on every intermediate commit to make sure your tests pass like this: ``` git rebase -i --exec "make test" main ``` (you can also use this to format every commit's code!) ### some other tips * `reword` lets you edit a commit message * If something goes very wrong, I try to run `git rebase --abort` ASAP, because undoing rebases is annoying
git worktree
### git worktree lets you have 2 branches checked out at the same time Illustration of a smiling stick figure with curly hair, and a git worktree, represented by a box with a smiley face person: ugh, I want to take a look at this other branch, but I have all these uncommitted changes... git worktree: I can help! ### creating a worktree You can check out a branch into a new directory like this: `git worktree add ~/my/repo mybranch` (`~/my/repo` is the directory, `mybranch` is the branch) Then you can run any normal git commands in the new directory: ``` $ cd ~/my/repo $ git pull ``` ### two worktrees can't have the same branch checked out Here's what happens if you try: ``` $ git checkout main fatal: main is already checked out at /home/bork/work/homepage ``` ### it's way faster (and uses less space!) than cloning the repository again Because worktrees share a .git directory, it just needs to check out the files from the branch you want to use! ### other worktree commands List all worktrees: `$ git worktree list` Delete a worktree: `$ git worktree remove ~/my/repo` ### sometimes I use worktrees to keep my .git directory and its checkout separate this lets me put the checkout in Dropbox but not the .git directory: ``` $ git clone --bare git@github.com:jvns/myrepo $ cd myrepo.git $ git worktree add ~/Dropbox/myrepo main ``` (`~/Dropbox/myrepo` is the directory, `main` is the branch)
git add -p
### `git add -p` lets you stage some changes and not others I use this if I want to commit my real changes, but not the random debugging code I added. (this is one of the tasks GUIs and IDEs are best at, but I always use `git add -p` anyway) ### what the interface looks like ``` --- a/package.json +++ b/package.json @@ -1,7 +1,7 @@ "name": "homepage", - "version": "1.0.0", + "version": "1.0.1", "devDependencies": { - "dart-sass": "^1.25.0" + "dart-sass": "^1.26.0", (1/1) Stage this hunk [y,n,q,a,d, s,e,?]? ``` package.json is the filename lines 4-9 are the diff `[y,n,q,a,d, s,e,?]` is your choice ### y(es)/n(o)/q(uit) y means "stage this change" n means "don't" q quits, keeping what you did so far. pretty straightforward. ### how to check your work `git diff --cached` will show your staged changes ### s: split into two parts s will split a diff into smaller diffs you can say y or n to individually, like this: ``` +++ b/package.json @@ -1,7 +1,7 @@ - "version": "1.0.0", + "version": "1.0.1", "devDependencies": { ``` BUT! This only works if there's a newline between the two parts. ### how to split a diff if there's no newline You can use the e ("edit") option to edit the diff manually: - to remove a - line, replace "-" with a space - to remove a + line, delete the whole line version 1: ``` "name": "homepage", - "version": "1.0.0", - "devDependencies": { "version": "1.0.1", + "devDependenciezzz' ``` version 2: ``` "name": "homepage", - "version": "1.0.0", + "version": "1.0.1", [space] "devDependencies": [space] ``` (or you can just say 'n' and edit your code! that's what I do!)
folder gotchas
## panel 1: `ls ..` and `cd ..` refer to different folders if you `cd` to a symlinked folder `~/Dropbox -> ~/Library/CloudStorage/Dropbox` ``` cd ~ cd Dropbox ls .. cd .. ``` * `ls ..` lists `~/Library/CloudStorage` * `cd ..` moves to `~` this is because `ls` is a program and `cd` is run by the shell. The shell handles `..` differently from other programs. ## panel 2: `ls ~/Dropbox` will list the contents of the folder This is annoying if you just want to look at its permissions, or where it links to. To fix this: ``` ls -d ~/Dropbox ``` ## panel 3: deleting a folder and recreating it with the exact same name makes everything weird everything you do in the folder will fail with weird errors like: ``` $ touch newfile touch: newfile: no such file or directory ``` how to fix it: ``` cd . ``` ## panel 4: on Mac OS, these are not the same: `cp -R a/ b` and `cp -R a b` * `cp -R a/ b` merges the contents of `a` into `b` * `cp -R a b` copies the whole folder into `b/a` ## panel 5: tip: `cd -` switches to the folder you were previously in ## panel 6: notes on `mv file.txt dest` * if `dest` is a file: renames `file.txt` * if `dest` is a folder: moves `file.txt` to that folder
oh shit! I committed something to main that should have been on a brand new branch!
1. Make sure you have main checked out: `git checkout main`
2. Create the new branch: `git branch my-new-branch`
3. Remove the unwanted commit from main (careful! `--hard` also throws away any uncommitted changes, so check `git status` first):

   ```
   git status
   git reset --hard HEAD~
   ```

4. Check out the new branch! `git checkout my-new-branch`

Smiling stick figure with medium length straight hair: `git branch` and `git checkout -b` both create a new branch. The difference is that `git checkout -b` also checks out the branch.
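To try the four steps without risking a real repo, here's a minimal bash sketch that makes two empty commits in a throwaway repository and then moves the second one onto `my-new-branch`. The commit messages and the demo identity are made up.

```shell
#!/usr/bin/env bash
# Sketch: undo an accidental commit on main by moving it to a new branch.
oops_demo() (
  repo=$(mktemp -d)
  cd "$repo" || exit 1
  git init -q
  git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called "main"
  git config user.name "Demo"
  git config user.email "demo@example.com"

  git commit -q --allow-empty -m "good commit on main"
  git commit -q --allow-empty -m "oops, this belongs on a branch"

  git checkout -q main            # 1. make sure main is checked out
  git branch my-new-branch        # 2. the new branch points at the oops commit
  git reset -q --hard HEAD~       # 3. move main back one commit (careful!)
  git checkout -q my-new-branch   # 4. switch to the new branch

  echo "main: $(git log -1 --format=%s main)"
  echo "my-new-branch: $(git log -1 --format=%s my-new-branch)"
)
```

After `oops_demo`, `main` ends at "good commit on main" and `my-new-branch` ends at the oops commit.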
what's HTTP?
HTTP (Hypertext Transfer Protocol) is the protocol that's used when you visit any website in your browser.

Firefox, to server: HTTP request - cat picture please

server, to Firefox: HTTP response - cat.gif

The exciting thing about HTTP is that even though it's used for literally every website, HTTP requests and responses are easy to look at and understand:

server: here's an HTTP response!

person: that response has the wrong Content-Type header, that's why the website isn't working!

Example of what an HTTP request and response might look like:

### request

request line: `GET / HTTP/1.1`

headers:

```
Host: examplecat.com
User-Agent: curl
Accept: */*
```

### response

status: `HTTP/1.1 200 OK`

headers:

```
Cache-Control: max-age=604800
Content-Type: text/html
Etag: "1541025663+ident"
Server: ECS (nyb/1D0B)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1270
```

body:

```
<!doctype html>
<title>Example Cat</title>
...
```

that text is a lot to understand, so let's get started learning what it all means!
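Since an HTTP/1.1 request really is just text, you can build one by hand. This is a minimal bash sketch; `http_request` and `status_code` are hypothetical helper names, and the `Connection: close` header is an addition that isn't in the comic's request.

```shell
#!/usr/bin/env bash
# Sketch: print the comic's request as raw bytes.
# HTTP/1.1 lines end in CRLF ("\r\n") and the headers end with an empty line.
http_request() {
  local host=$1
  printf 'GET / HTTP/1.1\r\n'
  printf 'Host: %s\r\n' "$host"
  printf 'User-Agent: curl\r\n'
  printf 'Accept: */*\r\n'
  printf 'Connection: close\r\n'    # not in the comic: ask the server to hang up after replying
  printf '\r\n'
}

# Pull the status code out of a status line like "HTTP/1.1 200 OK"
status_code() {
  local version code rest
  read -r version code rest <<< "${1%$'\r'}"   # drop a trailing carriage return, then split
  echo "$code"
}
```

If you have `nc` installed and the server accepts plain HTTP on port 80, something like `http_request examplecat.com | nc examplecat.com 80` would send it and print the raw response.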
types of terminal programs
### knowing what type of program you're in really helps

stick person with curly hair, thinking: why doesn't `Ctrl+C` quit?? Oh, I'm in a REPL, I should use `Ctrl+D` instead.

### 1. REPLs (`sqlite`, `ipython`, `bash`)

- you can probably use basic `readline` shortcuts to edit text
- `Ctrl+D` usually quits
- REPL stands for Read code, Evaluate it, Print the output, Loop (repeat)

### 2. full screen programs (`top`, `ncdu`)

- `q` might quit
- `?` might open the help
- gotcha: if mouse reporting is on, you can't select text without pressing `Shift`

### 3. noninteractive programs (`grep`, `find`)

- `Ctrl+C` usually quits
- gotcha: you can get "stuck" waiting for input on stdin if you forget to specify an input (like if you run `cat` by itself)

### programs that play by their own rules

`vim` doesn't act like any other program. usually I avoid these unless (like with `vim`) I've made a special effort to learn them.

### `Ctrl+C` doesn't always quit

REPLs and full-screen programs often use `Ctrl+C` to mean "stop the current operation" instead of "quit the program"
the terminal: cast of characters
The "terminal" is actually a bunch of components that work together. Let's imagine that you're running `python3 blah.py`.

Illustration of a flow chart. It begins with a smiling stick figure with curly hair, labelled "you". Arrows leading away from "you" are labelled "keyboard shortcuts", "type", and "click". The arrows lead to a little character with a winky cursor face, labelled "terminal emulator: xterm, iTerm, GNOME terminal". In the middle of the diagram are two boxes, PTMX and TTY. Between the two of them is the OS terminal driver ("Linux, Mac OS"). There are arrows labelled "bytes" going between the terminal emulator and the OS terminal driver, and between the OS terminal driver and the programs on the far right. The programs ("cat, vim, top, bash") are represented by two boxes with smiley faces, labelled "shell" and "python".

### the terminal emulator

your terminal emulator is a translator:

- it translates all your typing/clicks into bytes
- and it takes all the bytes the program sends and displays them on the screen

### the terminal driver

the terminal driver is part of your operating system. It's in charge of sending signals to Python when you press `Ctrl+C`, and some other more obscure things. (more on page 24)

### the shell

the shell is a special program which you use to start all other programs

The shell doesn't do much after a program has started. Programs get a copy of the shell's current directory, environment variables, input/outputs, etc. and then they're on their own.
terminal escape codes
### a program's input and outputs are streams of bytes

everything you type goes into standard input (almost)

Illustration of a program, represented by a big box with a smiley face. There is an "in" arrow going into it, and "1 out" and "2 ERR" arrows coming out of it.

all the output you see comes from either standard output or standard error

your terminal emulator can only communicate with programs by reading/writing bytes

### some inputs/outputs are text and some are special instructions

in: mouse position, ctrl+left arrow

out: make text green, make cursor invisible

### these special instructions are called "escape codes"

they're called "escape codes" because they all start with the ESC character

five ways people print out `ESC`:

- `\033`
- `^[`
- `ESC`
- `\e`
- `\x1b`

### example: how colours get set

program: “`^[[31m`”

Terminal emulator, represented by a box with little arms and legs and a cute cursor winking face: “ok, I'll make text red from now on!!”

(there are also codes for bold, underline, background colour)

### programs can easily "break" your terminal by printing escape codes

program, represented by a frowning rectangle: “oops I made your cursor disappear”

It's easy to fix though: run `reset` to print a special escape code that resets everything
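Here's a minimal bash sketch you can use to check that the different spellings of ESC are all the same byte (hex `1b`, decimal 27), and to print a couple of escape codes on purpose. `show_bytes`, `red`, `hide_cursor` and `show_cursor` are made-up helper names; note that `\e` is a bash `printf` extension, while `\033` is the most portable spelling.

```shell
#!/usr/bin/env bash
# Print each byte of stdin as two-digit hex, e.g. "1b 5b 33 31 6d"
show_bytes() {
  od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

red()         { printf '\033[31m%s\033[0m\n' "$*"; }  # 31 = red text, 0 = reset everything
hide_cursor() { printf '\033[?25l'; }                 # "oops I made your cursor disappear"
show_cursor() { printf '\033[?25h'; }                 # ...and bring it back
```

For example, `printf '\033[31m' | show_bytes` prints `1b 5b 33 31 6d`: ESC, then the plain characters `[31m`.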
stdin, stdout, stderr
### all terminal programs have 1 input and 2 outputs

they're numbered: stdin is "0", stdout is "1", stderr is "2".

Illustration of a program, represented by a box with a smiley face. There is an arrow labelled "0 IN" going into it, and arrows labelled "1 OUT" and "2 ERR" coming out of it.

(the numbers are called "file descriptors")

### 3 things you can set the inputs/outputs to

1. the TTY (so output is displayed in your terminal emulator)
2. a file
3. a pipe (to write output to the input of another program)

### your shell is in charge of setting up stdin/stdout/stderr

tiny smiling stick figure: “python3 script.py > out.txt”

shell: “ok, I'll set stdout to out.txt for that program”

### when you redirect, the shell opens the file *before* the program starts

`sudo echo blah > file.txt`

shell, thinking: first I'll open `file.txt`... THEN I'll run `sudo echo blah`

this is why `file.txt` isn't opened as root!

### on 2>&1

`2>&1` redirects stderr to stdout

The same illustration from the first panel, but with the addition of an arrow coming out from "2 ERR" and going into "1 OUT".

you could also do `echo "oops" 1>&2` if you want to write a message to stderr in a script.

### gotcha: programs often buffer stdout but not stderr

when a program writes text to stdout, it'll often:

1. check if stdout is a TTY (using the `isatty` function)
2. if not, "buffer" the writes until there's a few KB of data to write, for performance reasons (this is the default in libc)
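A minimal bash sketch of these ideas from a script's point of view: writing to stderr with `1>&2`, and the shell's version of `isatty`, which is `[ -t 1 ]` ("is file descriptor 1 a terminal?"). The function names are made up.

```shell
#!/usr/bin/env bash
warn() { echo "warning: $*" 1>&2; }   # write a message to stderr (fd 2)

noisy() {
  echo "to stdout"
  warn "to stderr"
}

describe_stdout() {
  # like calling isatty(1) in C
  if [ -t 1 ]; then echo "stdout is a TTY"; else echo "stdout is NOT a TTY"; fi
}
```

Try `noisy > out.txt` (the warning still shows up in your terminal), `noisy > out.txt 2>&1` (both lines go to the file), and `describe_stdout | cat` (stdout is now a pipe, so it says NOT a TTY).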
shell history
### your shell has a history of the commands you ran

some ways to access history:

* press the up arrow
* run `history`
* search it with `Ctrl-R` (in `bash`/`zsh`)
* use `!33` to rerun line 33 from `history` (bash/zsh)

### how long does your shell store history for?

(sad face) in bash, the default is 500 commands (not enough!)

(happy face) in fish, the default is 256,000 commands

if you're using bash, you might want to set `HISTSIZE` and `HISTFILESIZE` to store more history

in zsh, it's `HISTSIZE` and `SAVEHIST`

### when does your shell save history?

by default, bash and zsh only save history to a file when you exit the shell

fish saves the history continuously

### where is history stored?

- bash: `~/.bash_history`
- zsh: run `echo $HISTFILE`
- fish: mine is in `~/.local/share/fish/fish_history`

smiling stick figure with curly hair (thinking): “sometimes I copy over my shell history when setting up a new computer!”

### `history` doesn't include everything

usually it includes:

- the contents of the history file when the shell *started*
- the commands you ran in this shell session

if I want to use the history from another terminal tab, I'll open a new tab

### a useful history tool: atuin

atuin lets you:

* save unlimited history
* search history more easily
* save commands as soon as you run them
* sync your history (optionally)
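If you'd like bash to keep more history and save it as you go, here's a minimal sketch of lines you could add to `~/.bashrc`. The 100000 sizes are arbitrary picks, not numbers from the comic; `histappend` and `history -a` are standard bash features.

```shell
# ~/.bashrc sketch: keep much more history, and save it right away
HISTSIZE=100000         # commands kept in memory for this shell session
HISTFILESIZE=100000     # lines kept in ~/.bash_history
shopt -s histappend     # append to the history file instead of overwriting it on exit
# run `history -a` (append new commands to the file) before every prompt
PROMPT_COMMAND="history -a${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
```

In zsh the rough equivalent is setting `HISTSIZE` and `SAVEHIST` in `~/.zshrc`, plus `setopt INC_APPEND_HISTORY` to write commands as soon as they run.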
quitting in the terminal
### quitting a terminal program isn't always easy

Illustration of a stick figure with short curly hair. They look distressed and have an exclamation mark above their head.

person (thinking): "I pressed `Ctrl-C` 17 times and NOTHING HAPPENED"

### ways to quit

- `Ctrl-C` - the default
- `Ctrl-D` - if you're at a prompt in a REPL (`>>>`)
- `q` - if it's a full screen program
- `Ctrl-\` - sometimes works if `Ctrl-C` doesn't (it sends `SIGQUIT`)
- `kill -9` - the last resort

### how `Ctrl-D` works

programs that read input will usually have some code like this:

```
text = read_line()
if (text == EOF) {
    exit()
}
```

`Ctrl-D` is how you send an EOF to the program ("I'm done!")

important: `Ctrl-D` ONLY works if you press it on an empty line

### how `Ctrl-C` works `*`

person, smiling: "`Ctrl-C`"

terminal emulator, represented by a box with a dollar sign: "ok, C is the 3rd letter of the alphabet, I'll write 3 to the tty"

OS terminal driver, represented by a box labelled "OS": "ah, a 3, that means I should send the `SIGINT` signal to the current program"

program, represented by a box with a smiley face: "ooh, a `SIGINT`, I will [shut down gracefully, immediately exit, ignore it, stop a subtask, etc]"

`*` unless your program is in "raw mode", we'll talk about that later

### some programs have weird quitting incantations

for example every text editor (vim, nano, emacs, etc) has its own completely unique way to quit
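Here's a minimal bash sketch of both quitting paths from the program's side. `read_until_eof` is the `read_line()`/EOF loop (bash's `read` fails once it hits EOF, which is what `Ctrl-D` on an empty line produces), and `graceful_worker` installs a `SIGINT` handler that "shuts down gracefully". Both names are made up.

```shell
#!/usr/bin/env bash
read_until_eof() {
  local line count=0
  while IFS= read -r line; do    # `read` returns non-zero at EOF
    count=$((count + 1))
  done
  echo "got EOF after $count lines, exiting"
}

# Runs in a subshell so the trap doesn't stick around in your shell.
graceful_worker() (
  trap 'echo "got SIGINT, cleaning up"; exit 130' INT   # 130 = 128 + SIGINT's number (2)
  echo "working... press Ctrl-C to stop"
  while true; do sleep 1; done
)
```

Run `read_until_eof` in a terminal, type a few lines, then press `Ctrl-D` on an empty line. Run `graceful_worker` and press `Ctrl-C` to see the handler fire.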
PATH tips
### add a directory to your PATH

at the end: `export PATH=$PATH:/my/dir`

at the beginning: `export PATH=/my/dir/:$PATH`

in fish: `set PATH $PATH /my/dir` (illustration of a little fish with a heart-shaped tail)

### your shell's config file

bash: `~/.bashrc` or `~/.bash_profile` (exactly which one is a bit of a rabbit hole sadly)

zsh: `~/.zshrc`

fish: `~/.config/fish/config.fish` (illustration of a little fish with a heart-shaped tail)

### show what your shell is actually going to do when you run the program

`type python3`

instead of running what's in `PATH`, sometimes it'll run a builtin or alias or cached entry

### show the first match on your PATH for a program

`which python3` (but in zsh `which` acts like `type`)

### show ALL matches on your PATH for a program, in order

`which -a python3`

### look at your PATH

`echo $PATH`

### show each entry on its own line

`echo $PATH | tr ':' '\n'`

### clear the PATH cache (bash/zsh)

`hash -r`

why you might need to do this: bash and zsh cache `PATH` lookups, so sometimes updating your `PATH` doesn't work properly
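If your config file gets sourced more than once, `export PATH=/my/dir:$PATH` can pile up duplicate entries. Here's a minimal bash sketch that only adds a directory if it's missing; `path_has`, `path_prepend`, `path_append` and `show_path` are made-up helper names, not standard commands.

```shell
#!/usr/bin/env bash
path_has() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;    # already somewhere in PATH
    *) return 1 ;;
  esac
}
path_prepend() { path_has "$1" || PATH="$1:$PATH"; }   # wins over everything after it
path_append()  { path_has "$1" || PATH="$PATH:$1"; }   # only used if nothing earlier matches
show_path()    { printf '%s\n' "$PATH" | tr ':' '\n'; } # one entry per line
```

`PATH` is normally already exported, so changing it like this is enough for programs you start afterwards. In bash/zsh, run `hash -r` if the shell keeps finding an old location.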
PATH and finding programs
### PATH is how your shell knows where to find programs

Illustration of a smiling stick figure with curly hair, and shell, represented by a box with a smiley face.

person: run `python3`

PATH is

```
/bin
/home/bork/bin
/usr/bin
```

shell, thinking:

- `/bin/python3`? nope, doesn't exist
- `/home/bork/bin/python3`? nope, doesn't exist
- `/usr/bin/python3`? there it is!!! I'll run that!

### how to add a program to your PATH

1. find the folder the program is in
2. update your shell config to add it to your `PATH`
3. restart your shell, for example by opening a new terminal tab

### ...but how do you find the folder?

* think about how you installed it. person (thinking): hmm, I used the Rust installer, where does that install things?
* a brute force search: `find / -name python3 | grep bin`

### `PATH` ordering drama

person (thinking): ugh, no, don't run THAT `python3`, run the other one!

You can prioritize a folder by adding it to the beginning of your `PATH`

### gotcha: not everything uses your shell's `PATH`

cron jobs usually have a very basic `PATH`, maybe just `/bin` and `/usr/bin`

In a cron job I'll use the absolute path, like: `/home/bork/bin/someprogram`
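The shell's "nope, nope, there it is!" search can be written out in a few lines of bash. This is a simplified sketch (`find_in_path` is a made-up name): real shells also check aliases, functions, builtins and their lookup cache before they walk `PATH`.

```shell
#!/usr/bin/env bash
# Try each PATH entry in order; the first executable file with that name wins.
find_in_path() {
  local name=$1 dir
  while IFS= read -r -d ':' dir; do
    [ -n "$dir" ] || dir=.                  # an empty PATH entry means "current directory"
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      echo "$dir/$name"                     # there it is!!!
      return 0
    fi
    echo "$dir/$name? nope" >&2
  done <<< "$PATH:"                         # trailing ":" so the last entry gets read too
  return 1
}
```

`find_in_path python3` prints the "nope" lines on stderr and the winning path on stdout, which should normally match what `command -v python3` says.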
more filename tips (in the shell)
### handle filenames starting with a dash with `--` or `./`

`mv -- -file.txt dest`

`mv ./-file.txt dest`

(otherwise `mv` thinks `-file.txt` is an invalid option)

### match all filenames ending in `.png`

`rm *.png`

(`*.png` is called a "glob" and it's handled by the shell so you can use it with any program!)

### match `.png` files in any subdirectory

`rm **/*.png`

(works in zsh/fish; in bash 4+ you need to run `shopt -s globstar` first)

### match filenames starting with a dot

`ls .*`

(these aren't included in `*` by default)

### `*` gotcha: regular expressions

if you want to pass a regexp with a `*` to `grep`, you need to quote it, otherwise it will be treated as a glob:

`grep '\s*test' file.txt`

### you can drag files from your GUI file manager to escape the filename

This only works if your terminal emulator supports it.

### GNU ls will quote filenames with spaces in the name

```
$ ls
"julia's file.txt"
```

(properly quoted!)

you can check if you have this feature by running: `ls --quoting-style=shell`
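Here's a minimal bash sketch for trying these filename tricks safely in a throwaway directory. The file names are made up, and `shopt -s globstar` needs bash 4 or newer (the bash that ships with macOS is older).

```shell
#!/usr/bin/env bash
filename_demo() (
  cd "$(mktemp -d)" || exit 1
  touch -- -file.txt a.png b.png .hidden.png   # `--` means "no more options after this"
  mkdir -p sub/deeper && touch sub/deeper/c.png

  mv -- -file.txt renamed.txt
  echo "*.png matches: $(echo *.png)"          # .hidden.png is NOT included
  echo ".*.png matches: $(echo .*.png)"        # dotfiles need a pattern starting with a dot
  shopt -s globstar                            # make ** match any number of subdirectories
  echo "**/*.png matches: $(echo **/*.png)"
)
```

The `**/*.png` line should list `a.png`, `b.png` and `sub/deeper/c.png`.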
meet the program
Terminal programs have a lot of hidden rules and conventions (like some programs don't quit when you press `Ctrl+C`!), and knowing them makes your life WAY easier. Here are some of the questions I ask myself when starting a program!

- is it a REPL? (`Ctrl+D` probably quits) (I can probably use basic `readline` shortcuts)
  - is it using canonical mode? (arrow keys don't work when entering text, try `rlwrap`)
- am I actually in another program it started? (a pager like `less` or a text editor like `vim`)
- is it noninteractive? (quit with `Ctrl+C`)
  - is it stuck because it's waiting for input on stdin and I forgot to pipe to it?
- is it a shell?
  - is this a minimal shell like `dash`? (if so, can I use `bash` instead?)
- is it full screen? (`q` might quit)
  - mouse reporting: is it on? (selecting text doesn't work, but you might be able to click to navigate)
- am I in some kind of special environment? (a server? tmux/screen? a container? a virtualenv? what environment variables are set?)
line editing
### editing text you typed in seems so basic:

`>>> print("helo")` oops, forgot an l!

but there's actually no standard system

### programs need to implement even the most basic things

Illustration of a little smiling stick figure with curly hair.

person: "left arrow"

program, represented by a box with a smiley face: "ok I will move the cursor to the left"

often programs will use the readline library for this

### option 1: NOTHING

person (angry): "even the ARROW KEYS don't work???"

program (blissfully content): arrow keys? what's that?

* Only `Ctrl-W`, `Ctrl-U` and backspace work
* Examples: `cat`, `nc`, `git`
* You're probably in this situation if you press the left arrow key and it prints `^[[D`
* You can often add readline shortcuts with `rlwrap`, like this: `$ rlwrap nc`

### option 2: READLINE

person (neutral): "it's a little awkward but at least I can use those weird keyboard shortcuts from emacs!"

* LOTS of keyboard shortcuts: `Ctrl-A`, `Ctrl-E`, arrow keys, many more
* You can use `Ctrl-R` to look at history
* Examples: `bash`, `irb`, `psql`
* If you press `Ctrl-R` and you see "reverse-i-search", you're probably using readline
* Configurable with the `~/.inputrc` config file

### option 3: CUSTOM

person (smiling): "wow, I can type a multiline command without it being a total disaster?? amazing!"

* The keyboard shortcuts are probably influenced by readline
* Examples: `fish`, `zsh`, `ipython`
* usually you only see custom implementations in bigger projects
filename tips (in the shell)
### your shell can help you type weird filenames

person: "ugh how do I escape that filename again?"

shell: "I can handle it! Just use `Tab`!"

### cycle through matching filenames

`rm f<Tab><Tab><Tab><Tab>`

(doesn't work in bash unless you configure it)

### configure bash to cycle through matching filenames

Add this to your `~/.inputrc`:

```
set show-all-if-ambiguous on
set menu-complete-display-prefix on
TAB: menu-complete
```

### tab complete from the middle of a filename

`ls *thing*<Tab>`

(or in fish just `ls thing<Tab>`)

### tab completion can go wrong

programs can change how tab completion works with plugins called "completions"

this is usually GREAT (`git add <Tab>` only completes modified files!) but sometimes it's buggy

### quote filenames with spaces

`cat "Julia Evans.txt"`

(if you don't do this you get weird "file not found" errors for `Julia` and `Evans.txt`)

### tab completion works inside quoted strings

`cat "File N<Tab>`
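To see why the unquoted `Julia Evans.txt` turns into two "file not found" errors, here's a minimal bash sketch. `count_args` is a made-up helper that just reports how many arguments it received, and `printf '%q'` is bash's own way of escaping a string the way you'd type it.

```shell
#!/usr/bin/env bash
count_args() { echo "$#"; }

quoting_demo() {
  local name="Julia Evans.txt"
  echo "unquoted: $(count_args $name)"       # split into "Julia" and "Evans.txt"
  echo "quoted: $(count_args "$name")"       # one argument, like typing "Julia Evans.txt"
  echo "escaped: $(printf '%q' "$name")"     # how bash would escape it
}
```

`quoting_demo` prints `unquoted: 2`, `quoted: 1` and `escaped: Julia\ Evans.txt`.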
every core unix program I use
### basic file stuff

- `touch` - create file
- `mkdir` - create directory
- `cp` - copy
- `mv` - move
- `rm` - delete
- `ln` - create symlink
- `ls` - list directory

### how big is it

- `wc` - word count
- `du` - disk space used by files
- `df` - filesystem usage

### slice & dice files

- `sed` - replace regex
- `tr` - replace character
- `grep` - search file
- `cut` - get column
- `awk` - get column (+ more)
- `sort` - sort lines
- `uniq` - unique lines
- `head` - first 10 lines
- `tail` - last 10 lines

### filesystems

- `mount` - mount a filesystem
- `umount` - unmount
- `dd` - copy data to a disk

### manage processes

- `ps` - list processes
- `lsof` - list open files
- `kill` - send a signal
- `pkill` - fancy `kill`
- `top` - who's using CPU?
- `uptime` - time since reboot

### permissions

- `chown` - change owner
- `chmod` - change permissions

### time stuff

- `time` - measure runtime
- `date` - current time
- `sleep` - wait X seconds
- `cal` - cute calendar

### useful with pipes

- `less` - scroll text
- `cat` - print file contents
- `tee` - stdin -> file + stdout
- `xargs` - run cmd for each line
- `find` - find files by name

### compression

- `tar` - make/extract tar files
- `gzip` - compress with gzip
- `gunzip` - decompress
- `zip` - compress with zip
- `unzip` - decompress

### & more

- `which` - find cmd on PATH
- `man` - read man page
- `crontab` - edit crontab
- `md5sum` - calculate md5sum
- `diff` - diff files
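Several of the "slice & dice" tools chain together nicely. Here's a minimal bash sketch (`top_words` is a made-up name) that counts the most common words in whatever you pipe into it.

```shell
#!/usr/bin/env bash
top_words() {
  tr -s '[:space:]' '\n' |         # one word per line
    tr '[:upper:]' '[:lower:]' |   # "The" and "the" count as the same word
    sort |                         # uniq only merges *adjacent* duplicates, so sort first
    uniq -c |                      # prefix each word with how many times it appeared
    sort -rn |                     # biggest counts first
    head -n "${1:-3}"              # keep the top N (default 3)
}
```

For example, `echo 'the cat and the hat and the bat' | top_words 2` shows `3 the` and then `2 and` (with some padding from `uniq -c`).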
copy and paste in the terminal
### multiline

It's SO scary when you paste a bunch of commands by accident and then it runs them all.

fish, zsh, and newer bash versions protect you from this: you have to press `Enter` before running the thing you pasted. This is called "bracketed paste"

### problem: copying with the mouse can go wrong

- copying 400 lines of text by dragging is nobody's idea of a good time
- sometimes extra whitespace that you didn't want gets added at the end of lines

### panel 3

smiling stick figure with short curly hair: “copying a LOT of text is way easier if you don't use the mouse! Here are 2 tricks for copying without the mouse.”

### copy trick 1: pbcopy

macOS comes with two programs that can copy from stdin / paste to stdout, like this:

`cat main.go | pbcopy`

They're SO useful and on Linux I like to write my own versions of `pbcopy`/`pbpaste` using `xsel` or `xclip`

### pbcopy over SSH

you can even implement pbcopy over SSH (yes really!) with this bash one-liner. It uses an escape code called "OSC 52".

```
printf "\033]52;c;%s\007" "$(base64 | tr -d '\n')"
```

### copy trick 2: syncing the vim clipboard

I use vim as a terminal text editor, and I find it's WAY easier if I sync my system clipboard with the vim clipboard like this:

`set clipboard=unnamed`

`tmux` can also copy to your system clipboard.
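The OSC 52 one-liner is easier to reuse as a function. A minimal bash sketch, assuming your terminal emulator supports OSC 52 (many do, and some need it switched on in their settings); `osc52_copy` is a made-up name.

```shell
#!/usr/bin/env bash
# ESC ] 52 ; c ; <base64 of the text> BEL   ("c" means "the clipboard")
osc52_copy() {
  printf "\033]52;c;%s\007" "$(base64 | tr -d '\n')"   # tr removes base64's line wrapping
}
```

Use it like `pbcopy`: `cat main.go | osc52_copy`. It works over SSH because it's just bytes sent to your terminal emulator, which does the actual copying.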
canonical mode
### panel 1

We said earlier that every program has to implement text editing (on page 21)

This is not 100% true! The TTY driver technically has a very limited text editing system called "canonical mode" that hasn't changed since the 80s

### what using canonical mode feels like

stressed-out stick figure with curly hair, surrounded by question marks: “I pressed an arrow key and it just printed out `^[[D`???”

terminal driver, represented by a square smiley face: “what's an arrow key?”

### how canonical mode works

1. you type in text (`helloo<Backspace><Enter>`)
2. the TTY driver lets you edit the text until you press `<Enter>`
3. the TTY driver sends the line of text to the program

### canonical mode is incredibly limited

The only ways it lets you edit text are:

- backspace
- `Ctrl+W` (delete word)
- `Ctrl+U` (delete line)

The good thing is those 3 things almost always work.

### interactive programs almost never use canonical mode...

bash, represented by a box with a smiley face, thinking: I want my users to be able to use their arrow keys! this isn't the 80s!

You can try out canonical mode by running `cat` and typing.

### instead, programs receive bytes as soon as you type them

bash, thinking: okay, `^[[D`, that means "left arrow", I'll tell the terminal emulator to move the cursor...

(usually by using a library like `readline`)
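Here's a minimal bash sketch of what "receive bytes as soon as you type them" looks like: `read -n 1` makes bash read a single character instead of waiting for Enter (when stdin is a terminal, bash takes it out of canonical mode for that read). `read_key` is a made-up name, and it only knows the `ESC [ A/B/C/D` arrow sequences; some terminals send other variants.

```shell
#!/usr/bin/env bash
read_key() {
  local c rest
  IFS= read -rsn1 c || return 1       # one character, no echo, no Enter needed
  if [ "$c" = $'\033' ]; then
    IFS= read -rsn2 rest              # arrow keys send ESC, then "[A", "[B", "[C" or "[D"
    case "$rest" in
      '[A') echo "up arrow" ;;
      '[B') echo "down arrow" ;;
      '[C') echo "right arrow" ;;
      '[D') echo "left arrow" ;;
      *) echo "some other escape code" ;;
    esac
  else
    echo "the key '$c'"
  fi
}
```

Run `read_key` and press the left arrow: instead of `^[[D` appearing on screen, the function decodes it and prints `left arrow`.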
stty
### your TTY driver has configuration

you can see how it's configured by running:

`stty -a`

for example it prints out the current window size!

### `Ctrl+S`

by default, pressing `Ctrl+S` will freeze your terminal (and `Ctrl+Q` will unfreeze it)

I have never wanted this in my life, you can turn it off with `stty -ixon`

(fish turns it off by default)

### fun fact: changing `Ctrl+C`

technically you can use `stty` to set a different keyboard shortcut for `Ctrl+C`, like "`u`":

`stty intr u`

this is extremely chaotic and I can't imagine a reason that I would ever do this though

### programs have to configure the TTY driver to get friendly features

developer: I want arrow keys to work in my program!

other person: better tell the TTY driver to turn off canonical mode! (more on the next page)

### the TTY driver's settings are called "termios settings"

for the gnarly details: `man termios`

but if you're writing a terminal program, libraries like `readline` or `ncurses` will handle setting up the TTY driver

### panel 6

smiling stick figure with short curly hair: I've only needed to use `stty` once in the last 20 years and I mostly don't understand its output but I think it's a fun view into terminal internals!
meet the TTY driver
### the TTY driver is the most obscure part of the system

You almost never need to think about it, but when I've wanted to do something weird (like put a terminal in a web browser) understanding the driver is SO USEFUL

### when you start your terminal emulator, it asks the OS to create a "pseudoterminal pair"

which is a pair of two files

terminal emulator <-> PTMX <-> TTY driver <-> TTY <-> program

### a "TTY" is the program's side of the pair

programs use it to:

- communicate with the terminal emulator by reading/writing bytes
- configure the TTY driver (more on the next page!)

Run `tty` to see the current TTY!

### the TTY driver is why `Ctrl+C` does the same thing relatively consistently

TTY driver: you press `Ctrl+C`, I send a signal! well, unless the program tells me it wants the raw bytes!

### some things the TTY driver is in charge of

(you might think "these are unrelated" and you'd be right)

- storing the terminal window's size
- sending a `SIGHUP` signal when you close your terminal
- a basic mode for entering text called "canonical mode"
- pausing the output and confusing you when you press `Ctrl+S`
- tracking which process is in the "foreground" and sending what you type there
keyboard shortcuts
### editing text ([almost] always works)

- `Backspace`
- `Ctrl + W`: delete previous word
- `Ctrl + U`: delete line (except in text editors)

### quitting

- `Ctrl + C`: quit (`SIGINT`)
- `Ctrl + Z`: stop process (`SIGTSTP`) (resume it with `fg` or `bg`, or kill it with `kill`)
- `Ctrl + D`: quit (in a REPL) (more on page 20)
- `q`: quit (in some full screen programs)
- `Enter` then `~` then `.`: exit a frozen SSH session

or the nuclear option:

```
$ ps aux | grep THING
bork  7213 ... THING
$ kill -9 7213
```

### editing text (these often work in a readline-like situation)

- arrow keys
- `Ctrl + A` or `Home`: beginning of line
- `Ctrl + E` or `End`: end of line
- `Ctrl + arrow keys`: left/right a word (or sometimes `Alt + arrow keys` or `Option + arrow keys` or `Alt+b` / `Alt+f`)
- `Ctrl + K`: delete line forward
- `Ctrl + Y`: paste (from `Ctrl+K` or `Ctrl+U`)
- `Ctrl + H`: might work if `Backspace` doesn't

also many shells have a "vi mode" if that's your jam

### other useful stuff

- `Ctrl + L`: clear screen
- `Ctrl + R`: search history
- `Ctrl + Q`: unfreeze screen (that you froze with `Ctrl+S`, more on page 25)

### copy and paste

in your terminal emulator, it's usually: `Ctrl + Shift + C/V` or `Cmd + C/V`

### mouse stuff that might work

- `Option + click` or `Alt + click`: place cursor
- scroll wheel: scroll
editing text in a REPL
### editing text in a REPL doesn't always work

little stick figure, horrified: when I press my ARROW KEYS it just prints out `^[[D`??? what?

### this is because every program has to implement text editing itself

terminal program author, represented by a nonplussed looking stick figure with medium length straight hair, thinking: but I just want arrow keys to work?? shouldn't that be automatic?

unix, represented by a box with a smiley face: NOPE you gotta do it

### you do get a few things automatically

- backspace (occasionally backspace won't work and you have to use `Ctrl+H` instead)
- `Ctrl+W` (delete word)
- `Ctrl+U` (delete line)

(see page 25 for what "automatically" means)

### REPLs mostly have the same keyboard shortcuts

there's a very popular library called "readline", and everyone either uses it or imitates how it works

for example `Ctrl+A` ("go to beginning of line") comes from readline

### `rlwrap` adds readline keyboard shortcuts

for example on my machine the dash shell doesn't use readline but you can make it better by running:

`rlwrap dash`

### built in programs on Mac don't use `readline` (for example `sqlite3`)

this is probably because `readline` is GPL licensed

They use `libedit` which is worse. I like to install a sqlite version with `readline` support and use that instead.
TERM
### different terminal emulators use different escape codes

terminal emulator 1: if you print out `ESC[2J` I'll clear the screen!

terminal emulator 2: for me it's `ESC[HESC[J`!

### your system has a database called "terminfo" with escape codes in it

how it plays out when you press `Ctrl+L` to clear the screen:

program, with a little heart over it, thinking: ah, she wants to clear the screen! I'll look up how to do that in the terminfo database...

(on my machine, the database is in `/usr/share/terminfo`)

program: `ESC[HESC[J`

terminal emulator, thinking: ok, clearing the screen!

### how programs know what terminal you're using: `TERM`

your terminal emulator sets the `TERM` environment variable when it starts

fun fact: terminal emulators often say they're "`xterm-256color`" even if they're not

### this can break when SSHing into an old system with a new terminal emulator

(in a VERY annoying way)

happy little stick figure: I am using ghostty

program, with a little heart over it: NOPE never heard of it

### some ways to fix `TERM`

- install the terminfo file for your terminal emulator on the system
- use a different terminal emulator
- just set `TERM=xterm-256color`, it'll often sort of work
the mouse
### when you click in the terminal, it can either be handled by

your terminal emulator (represented by a box with a winky cursor face and little arms and legs) (good if you want to copy text)

or the program (represented by a box with a smiley face) (lots of programs have mouse support!)

### programs can tell the terminal emulator to let them handle the mouse

program: if there's a mouse click, send me escape codes to tell me where it was!

terminal emulator: okay! I'll disable all my usual mouse functions like "selecting text"!

this is called "mouse reporting"

### some programs that have mouse support

- tmux: resize a pane! right-click for a menu!
- htop: click to sort columns!
- micro: text editor with good mouse support
- vim: click on the tab bar!
- and lots more! (`lazygit`, `mc`, `zellij`, `btop`...)

### how to force the terminal emulator to handle the mouse

hold `Shift`\* while you click

unhappy stick figure with short curly hair, thinking: ugh no I don't want to focus that pane, I want to COPY SOME TEXT!!!

\* could be something else too, it depends on your terminal emulator

### the scroll wheel

In some programs (like `less`) the scroll wheel does the same thing as pressing the up/down arrow keys really fast

terminal emulator: UP UP UP UP UP UP UP UP UP UP UP UP

in other programs (like `lazygit`) it uses "mouse reporting" to report where your mouse was when you scrolled

### other mouse features your terminal emulator might have

- `Shift+click` (or something) to open a link in a browser
- `Alt+click` (or maybe `Option`) to move the cursor when editing a command in your shell
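Mouse reporting is switched on and off with escape codes too. Here's a minimal bash sketch using the common xterm-style codes: `?1000` asks for click reports and `?1006` asks for them in the "SGR" format. `mouse_on` and `mouse_off` are made-up names; running `mouse_off` (or `reset`) can rescue a terminal where a crashed program left mouse reporting on.

```shell
#!/usr/bin/env bash
# "h" turns a mode on, "l" turns it off
mouse_on()  { printf '\033[?1000h\033[?1006h'; }   # after this, clicks arrive as escape codes
mouse_off() { printf '\033[?1006l\033[?1000l'; }   # give the mouse back to the terminal emulator
```

If you run `mouse_on` at a shell prompt and click, you'll probably see escape-code junk typed into the prompt, because the shell isn't expecting mouse reports. `mouse_off` stops it.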
colours in the terminal
### your terminal emulator has 16 configurable colours

| colour | normal | bright |
|--------|--------|--------|
| black  | 0      | 8      |
| red    | 1      | 9      |
| green  | 2      | 10     |
| yellow | 3      | 11     |
| blue   | 4      | 12     |
| purple | 5      | 13     |
| cyan   | 6      | 14     |
| white  | 7      | 15     |

### these are called "ANSI colours"

you can configure them in your terminal emulator's settings OR run a script that prints escape codes to magically set up your colours: `https://wzrd.page/scripts` (my favourite way!)

### programs can use ANSI colours by printing an escape code

`echo -e "\033[34m blue text"`

`3` means "normal fg colour", `4` means "blue"

### the default ANSI colours often have bad contrast

`ls --color` often displays directories in ANSI "blue" which can look like this:

[bar of illegibly dark text against a dark background, which says "can you read this?"]

ANSI "yellow" on white also often has bad contrast

### "minimum contrast"

Picking ANSI colours which always have good contrast is impossible. the only real solution is to use a terminal emulator which has a "minimum contrast" feature (like iTerm or kitty) which will fix contrast issues

### usually if a program is writing to a pipe, it'll disable colours

`$ grep blah file.txt | less`

grep, represented by a box with a smiley face: better turn off colours so that I don't accidentally show someone `^[[34mtext here`
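Here's a minimal bash sketch that prints all 16 ANSI colours and, like `grep`, only colours its output when stdout is a terminal. `ansi_palette` and `maybe_blue` are made-up names. It uses the common codes 30-37 for the normal colours and 90-97 for the bright ones, and resets with `\033[0m` so the colour doesn't leak into the rest of your terminal.

```shell
#!/usr/bin/env bash
ansi_palette() {
  local i
  for i in 0 1 2 3 4 5 6 7; do
    printf '\033[%dm colour %d \033[0m' $((30 + i)) "$i"          # normal colours 0-7
    printf '\033[%dm colour %d \033[0m\n' $((90 + i)) $((8 + i))  # bright colours 8-15
  done
}

maybe_blue() {
  if [ -t 1 ]; then
    printf '\033[34m%s\033[0m\n' "$*"   # 3 = normal fg colour, 4 = blue
  else
    printf '%s\n' "$*"                  # writing to a pipe or file: no escape codes
  fi
}
```

`ansi_palette` is a quick way to check what your current colour scheme looks like. `maybe_blue hello | cat` prints plain `hello`, just like `grep` does when piped to `less`.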
meet the terminal emulator
### your terminal emulator has two main jobs

1. turn your actions (typing & clicking) into bytes and send them
2. receive bytes and display them visually

Illustration of a terminal emulator, with a winking cursor face, and a program, represented by a box with a smiley face. The program has a heart above it, and there are arrows going back and forth between them labelled "bytes"

### a little bit of history

it's called an "emulator" because in the 80s a "terminal" was a separate machine from the computer

Illustration of a bulky old monitor, with a keyboard attached with a spiral cord, and a wire running to a panel of buttons and displays, labelled "mainframe". There are arrows going back and forth between them labelled "bytes"

We still use the same 80s protocol!

### what are these "bytes"?

the bytes are either:

- text (like `cat blah.txt`)
- escape codes (for example to tell the terminal what colour to display the text in)
- control characters (for example `Ctrl+C` is the byte `3`)

### it's in charge of copy and paste

your terminal emulator lets you select text and copy/paste it (usually with `Ctrl+Shift+C` (Linux) or `Cmd+C` (Mac))

(copy & paste tips on page 18!)

### it manages colours and fonts!

some terminal emulators come with a big theme library of different colour schemes!

if yours doesn't, this site has colour schemes for many terminal emulators: `iterm2colorschemes.com`

### fun fact: how `Ctrl-X` gets translated to bytes

```
Ctrl-A => 1
Ctrl-B => 2
...
Ctrl-Z => 26
```

`Ctrl` is the only modifier key I trust in the terminal, all of the others can work differently depending on the situation
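The `Ctrl-A => 1 ... Ctrl-Z => 26` table follows a simple rule: take the key's ASCII code and keep only its lowest 5 bits. A minimal bash sketch, with `ctrl_byte` as a made-up helper name:

```shell
#!/usr/bin/env bash
ctrl_byte() {
  local code
  code=$(printf '%d' "'$1")    # ASCII code of the key: 'C' -> 67, 'c' -> 99
  echo $(( code & 31 ))        # keep the low 5 bits: 67 & 31 = 3
}
```

`ctrl_byte C` and `ctrl_byte c` both print `3`, and `ctrl_byte Z` prints `26`. The same rule gives `Ctrl-[` => 27, which is ESC, the byte every escape code starts with.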
redirects
### redirect to a file: `cmd > file.txt`

terminal emulator into program, program out to file.txt, error out to terminal emulator

### append to a file: `cmd >> file.txt`

terminal emulator into program, program out to file.txt (append mode), error out to terminal emulator

### send a file to stdin: `cmd < file.txt`

file.txt into program, program out and err to terminal emulator

### redirect stderr to a file: `cmd 2> file.txt`

terminal emulator into program, program out to terminal emulator, err out to file.txt

### redirect stdout AND stderr: `cmd > file.txt 2>&1`

terminal emulator into program, out and err to file.txt

### pipe stdout: `cmd1 | cmd2`

terminal emulator into program 1, program 1's out to program 2 via pipe, program 1's err to terminal emulator, program 2's out and err to terminal emulator

### pipe stdout AND stderr: `cmd1 2>&1 | cmd2`

terminal emulator into program 1, program 1's out and err to program 2 via pipe, program 2's out and err to terminal emulator

### three gotchas

1. `cmd file.txt > file.txt` will delete the contents of `file.txt`. some people use `set -o noclobber` (in bash/zsh) to avoid this but I just have "never read from and redirect to the same file" seared into my memory.
2. `sudo echo blah > /root/file.txt` doesn't write to `/root/file.txt` as root. Instead, do: `echo blah | sudo tee /root/file.txt` or `sudo sh -c 'echo blah > /root/file.txt'`
3. `cmd 2>&1 > file.txt` doesn't write both stdout and stderr to `file.txt`. Instead, do: `cmd > file.txt 2>&1`

### `cat` vs `<`

I almost always prefer to do: `cat file.txt | cmd` instead of `cmd < file.txt`

it works fine & it feels better to me. using `cat` can be slower if it's a GIANT file though

### `&>` and `&|`

some shells support `&>` and `&|` to redirect/pipe both stdout and stderr (also some shells use `|&` instead of `&|`)
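Here's a minimal bash sketch that reproduces gotchas 1 and 3 in a throwaway directory. `both` is a made-up command that writes one line to stdout and one to stderr, and the file names are arbitrary.

```shell
#!/usr/bin/env bash
both() { echo "out"; echo "err" >&2; }

redirect_demo() (
  cd "$(mktemp -d)" || exit 1

  # gotcha 3: redirects are applied left to right
  both > right.txt 2>&1      # stdout -> file, THEN stderr -> wherever stdout points (the file)
  both 2>&1 > wrong.txt      # stderr -> our stdout first, THEN stdout -> file ("err" escapes)
  echo "right.txt: $(paste -sd ' ' right.txt)"
  echo "wrong.txt: $(paste -sd ' ' wrong.txt)"

  # gotcha 1: the shell truncates data.txt before cat even starts
  printf 'hello\n' > data.txt
  cat data.txt > data.txt 2>/dev/null
  echo "data.txt is now $(wc -c < data.txt | tr -d ' ') bytes"

  # noclobber makes the shell refuse to overwrite an existing file with >
  printf 'hello\n' > data.txt
  set -o noclobber
  if ! (cat data.txt > data.txt) 2>/dev/null; then
    echo "noclobber refused to overwrite data.txt"
  fi
)
```

Running `redirect_demo` shows a stray `err` line (the one that escaped `wrong.txt`), then `right.txt: out err`, `wrong.txt: out`, `data.txt is now 0 bytes`, and the noclobber message.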
job control
### your shell lets you run many programs ("jobs") in the same terminal tab programs can either be: - foreground - background - stopped (which is more like "paused") ### `&` runs a program in the background for example I like to convert 100 files in parallel like this: ``` for i in `seq 1 100` do convert $i.png $i.jpg & done ``` ### `jobs` lists backgrounded & stopped jobs ``` $ jobs [1] Running python blah.py & [2] Stopped vim ``` use the numbers to bring them to the foreground or background (like `fg %2`), kill them (`kill %2`), or disown them ### when you close a terminal tab all jobs are sent a `SIGHUP` signal, which usually kills them. You can stop this with `disown` or by starting the program with `nohup`: `disown %1` (job number goes here) `nohup my_program &` ### a trick to kill programs if `Ctrl+C` doesn't work 1. press `Ctrl+Z` to stop the program 2. run `kill %1` to kill it (or `kill -9 %1` if you're feeling extra murderous) ### a little flowchart Three boxes, labelled "running in foreground", "stopped", and "running in background" `Ctrl+Z` goes from "running in foreground" to "stopped" `fg` goes from "stopped" to "running in foreground" `fg` goes from "running in background" to "running in foreground" `bg` goes from "stopped" to "running in background"
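Here is a runnable version of the "convert files in parallel" idea. `sleep` and `touch` stand in for `convert`, so it works without ImageMagick. It also uses `wait`, a bash builtin the comic doesn't mention, which blocks until every background job has finished. The three one-second jobs should take roughly one second in total instead of three.

```bash
#!/usr/bin/env bash
# Sketch: start several background jobs with & and wait for all of them.
# sleep + touch stand in for a slow command like `convert`.
out_dir=$(mktemp -d)

for i in 1 2 3; do
  ( sleep 1; touch "$out_dir/$i.done" ) &   # & puts each one in the background
done

jobs    # list the background jobs this shell started
wait    # block until every background job has exited
ls "$out_dir"   # all three .done files exist by now
```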
meet the shell
### the shell starts programs when you run a program in the terminal, you're actually asking your shell to start it for you it turns out that starting programs is a surprisingly complicated job! ### the 3 most popular shells there are LOTS of shells but 95% of people use - bash (default on Linux) - zsh (default on Mac (in 2025)) - fish (aims to be more user friendly) ### fish: the friendly interactive shell ASCII illustration of a fish I love how fish has friendly defaults that I can use without configuring it this is (mostly) not a fish propaganda zine though Little illustration of a smiling stick figure holding up a sign that says "fish 4eva", labelled "me" ### bash and zsh are both "POSIX shells" this means they follow a standard for how Unix shells should behave, but there are differences. I'll mention when something varies between shells! ### where to find your shell's config file bash: `~/.bashrc` or `~/.bash_profile` (which one is a rabbit hole, huge flow chart at `wzrd.page/bashrc`) zsh: `~/.zshrc` fish: `~/.config/fish/config.fish` ### `.bashrc` vs `.bash_profile` here's a trick to figure out whether bash is using `.bashrc` or `.bash_profile` (or both!) Add: ``` echo "this is .bashrc" echo "this is .bash_profile" ``` (the first line to `.bashrc`, the second to `.bash_profile`), open a new terminal tab, and see what it prints out!
less
### many programs use `less` without telling you `less` lets you scroll through text, so programs will use `less` by default any time they want to display a lot of text `git`, represented by a box with a smiley face: I want to display a huge diff... I'll show it in `less`! `man`, also represented by a box with a smiley face: I need to display a man page... I'll use `less`! it's called `less` because it's an improved version of `more` ### how to know you're in `less` if it's suddenly full screen and there's this little colon in the bottom left, it might be `less` ### a few `less` tips - quit: `q` - help: `h` - scroll: arrow keys/spacebar/mouse wheel - search: `/banana ENTER` - next/prev match: `n`/`N` - go to start/end: `g`/`G` also piping to `less -R` will interpret escape codes like colours ### how to tell a program not to use `less` you can set the `PAGER` environment variable to something else to make programs use that instead. I've never had any reason to set `PAGER` though ### programs will also drop you into `vim` sometimes: the default text editor is `vim`. If you don't like `vim` you can set the `EDITOR` environment variable `export EDITOR=micro` (your favourite editor here)
inside the commit
### you can see for yourself how git is storing your files! You just need one command: `git cat-file -p` First, get a commit ID. You can get one from `git log` ### 1. read the commit ``` $ git cat-file -p 3530a4 tree 22b920 parent 56cfdc author Julia 1697682215 -0500 committer Julia 1697682215 -0500 commit message goes here ``` `22b920` is the directory ID. I just use `git cat-file` for fun and learning, never to get things done ### 2. read the directory ``` $ git cat-file -p 22b920 100644 blob 4fffb2 .gitignore 100644 blob e351d9 404.html 100644 blob cab416 Cargo.toml 100644 blob fe442d hello.html 040000 tree 9de29f src ``` (`fe442d` is a file ID) (IDs are actually 40 characters) ### 3. read a file ``` $ git cat-file -p fe442d <!DOCTYPE html> <html lang="en"> <body> <h1>Hello!</h1> </body> </html> ``` ### 4. and we're done! `fe442d` is the SHA-1 hash of the file's contents (plus a short `blob <size>` header). It's called a "blob ID". Commit and tree IDs are hashes too. Using a hash to identify each file is how git avoids duplication: if the file's contents don't change, the hash won't change, so git doesn't need to store a new version!
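You can check the "hash of the contents" idea yourself. Git actually hashes a small header, `blob <size in bytes>` followed by a NUL byte, in front of the file's contents. This bash sketch rebuilds that by hand. It assumes GNU `sha1sum` is available (on a Mac, `shasum` is the usual substitute), and `blob_id` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: compute a git blob ID by hand.
# git hashes "blob <size>\0" followed by the file's bytes.
blob_id() {
  local size
  size=$(( $(wc -c < "$1") ))   # byte count; $(( )) strips any padding wc adds
  { printf 'blob %d\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp)
printf 'hello\n' > "$tmp"
blob_id "$tmp"   # ce013625030ba8dba906f756967f9e9ca394464a
```

If git is installed, `git hash-object FILE` should print the same ID for the same file. Changing even one byte gives a completely different ID.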
lost commits
### commits in git are usually saved forever But even if git still has your commits, they're not always easy to find. Some ways commits get "lost": - `git commit --amend` - `git rebase` - deleting an unmerged branch - `git stash drop` ### the three levels of losing commits - annoying: the commit isn't in the history of any branch/tag, but it's relatively easy to find - nightmare: you need to search every single commit to find it - disaster: it's been deleted ### how commits can get lost: git commit --amend before: Diagram of two boxes side by side, labelled "main branch". The one on the left is labelled "`parent`". The one on the right is labelled "`fix color buug`" (typo!). after: The same diagram as above, but the initial two boxes are now labelled "Now it's "lost"!". Also branching off of "`parent`" is a third box, labelled "`fix color bug`". That branch is now labelled "main branch". ### how commits can get lost: git rebase before: Two boxes side-by-side, connected by a line. These are labelled "`main branch`". Also branching off of the leftmost box are two further boxes, one labelled with a heart, and one with a star. These are labelled "`feature branch`". after: An initial box with two lines of boxes coming off of it. The topmost line of boxes is a blank box, followed by a heart, then a star. The blank box is labelled "`main branch`". The heart and star boxes are labelled "`feature branch`". The lower line of boxes has a heart and a star, is highlighted in red, and is labelled "now these two are "lost"!". ### how commits can get lost: git stash drop before: Three boxes in a horizontal row. The left two boxes are blank. The middle box is labelled "`main branch`". The rightmost box has a star, and is labelled "`stashed commit`". after: The same diagram as above, but now the rightmost box is labelled "now it's "lost"!". stash is the only way I've seen the "nightmare" situation happen.
### you can find lost commits I find it very comforting to know that git keeps my lost commits around. How to find them: - annoying: use the reflog (page 26) - nightmare: use `git fsck` - disaster: impossible (but this has never happened to me)
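This throwaway-repo sketch shows the "nightmare" case. A stash entry is dropped, so no branch, tag, or reflog points at it anymore, yet `git fsck` still lists it as a "dangling commit". The repo, file name, and `demo` identity are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: "lose" a stash with `git stash drop`, then find it with `git fsck`.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)" && git init -q
echo "v1" > notes.txt
git add notes.txt && git commit -q -m "first commit"

echo "v2 (work in progress)" > notes.txt
git stash push -q                     # a stash entry is secretly a commit
lost=$(git rev-parse 'stash@{0}')     # note its ID so we can check the result
git stash drop -q                     # no branch, tag, or reflog points to it now

# fsck reports objects that nothing points to; the dropped stash is a "dangling commit"
git fsck --no-progress 2>/dev/null | grep 'dangling commit'
echo "the lost stash was $lost"
```

Once you've spotted the right ID, `git stash apply <ID>` or `git branch rescued <ID>` should bring it back (`rescued` is any name you like). The `git stash` man page describes this same `git fsck` recovery for dropped stashes.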
meet the remote
### any repository you're pushing to / pulling from is called a "remote" remotes can be: - hosted by GitHub/GitLab/etc. - on your own server - just a folder on your computer ### git push syntax (same for git pull) `git push origin main` "`origin`" is the remote name, "`main`" is the remote branch. The default name for a remote is `origin`, but you can name it anything ### tip! I like to configure `push.autoSetupRemote true` to automatically set up tracking the first time I push a new branch ### remotes are where the drama happens Smiling stick figure with short curly hair: I spent 3 hours working on `cats.py` person: git pull git, represented by a box with a smiley face: fun fact! your coworker totally rewrote that file! ### example: I use 2 remotes when contributing to open source projects Diagram of a box labelled "local repo". Local repo has an arrow labelled "push to here", pointing to a box labelled "My personal GitHub fork". That box has an arrow labelled "pull request", pointing to a box labelled "main project repo name: "origin"". That box has an arrow labelled "pull from here", pointing back to the "local repo" box. ### remotes are configured in `.git/config` every remote has a name and URL ``` [remote "origin"] url = git@github.com:jvns/myrepo [branch "main"] remote = origin merge = refs/heads/main ``` "`origin`" is the name, "`git@github.com:jvns/myrepo`" is the URL. This sets up "tracking" between local `main` and remote `main` on `origin`, so that git knows where to push to and pull from when you run `git push` or `git pull` ### protocols Git has 3 main protocols for remotes. The protocol is embedded in the URL. - HTTP (I use this if I only want to pull) `https://github.com/jvns/myrepo` - SSH (I use this if I need to push) `git@github.com:jvns/myrepo` - local `file:///home/bork/myrepo`
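This bash sketch sets up the "just a folder on your computer" kind of remote. It also shows the `[remote "origin"]` and `[branch "main"]` entries that `git push -u` writes to `.git/config`. It runs in temp directories, and the `demo` identity and paths are placeholders. It uses `git push -u` (a standard flag) rather than the `push.autoSetupRemote` setting from the tip.

```bash
#!/usr/bin/env bash
# Sketch: a remote that is "just a folder on your computer".
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

remote_dir=$(mktemp -d)
git init -q --bare "$remote_dir"        # bare repo: git data only, no working files

cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/main   # call the branch main regardless of local defaults
echo hi > readme.txt && git add readme.txt && git commit -q -m "first"

git remote add origin "$remote_dir"     # name: origin, URL: a plain local path
git push -q -u origin main              # -u also sets up tracking for main

git remote -v                           # origin <path> (fetch) and (push)
git config --get remote.origin.url      # the url from [remote "origin"]
git config --get branch.main.remote     # origin
git config --get branch.main.merge      # refs/heads/main
```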
the reflog
### a reflog is a log of commit IDs I use the reflog to find "lost" commits: it contains every commit ID that the branch/tag/HEAD has ever pointed to. ### some differences between `git log main` and `git reflog main` - reflog entries older than 90 days might get deleted by `git gc` - the reflog can show you where your branch was before a rebase. `git log` can't - the reflog isn't shared between repositories. `git log` is. - if I'm looking at the reflog, I'm having a bad day ### which reflog to use? The main two I use are: - `git reflog` - every single commit you've ever had checked out - has everything but very noisy - it's the reflog for `HEAD` - `git reflog BRANCH` - just the history for that branch, might be less noisy ### how to use the reflog 1. run `git reflog` 2. sadly stare at output until you find a log message that looks right 3. look at the commit ``` git show $COMMIT_ID git log $COMMIT_ID ``` 4. repeat until you find the thing 5. use something like `git reset --hard $COMMIT_ID` or `git branch $NAME $COMMIT_ID` to put the commit on a branch ### the reflog kind of sucks - (sad face) if you delete a branch, git deletes its reflog - (sad face) if you drop a stash entry, you can't use the reflog to get it back - (sad face) reflog entries don't correspond exactly to git commands you ran But it's the best we have. ### `git fsck`: the last resort If a commit isn't in the reflog (for example if you "lost" it with `git stash drop`), there's still hope! You can use `git fsck` to list every commit ID that's unreferenced. I've never done this though: I try to avoid getting into this situation.
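This bash sketch runs steps 1 to 5 above in a throwaway repo. `git reset --hard` "loses" a commit, the reflog still has it, and `git branch` puts it back on a branch. `HEAD@{1}` is git's syntax for "where HEAD pointed one move ago". The branch name `rescued` and the `demo` identity are made up for the example.

```bash
#!/usr/bin/env bash
# Sketch: throw away a commit with `git reset --hard`, then rescue it from the reflog.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)" && git init -q
echo one > file.txt && git add file.txt && git commit -q -m "one"
echo two > file.txt && git commit -q -am "two"

git reset -q --hard HEAD^      # oops: "two" is no longer on any branch
git log --oneline              # only shows "one"
git reflog -n 3                # ...but the reflog still remembers "two"

# HEAD@{1} is where HEAD was one move ago, i.e. the "two" commit
git branch rescued 'HEAD@{1}'
git log --oneline rescued      # "two" is back, on its own branch
```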
reset
### git has no undo there's no - unadd - uncommit - unmerge - unrebase instead, git has a single dangerous command for undoing: `git reset` ### most git commands move the current branch forwards - `git commit` Illustration of three boxes in a row, connected by lines. There is an arrow pointing from the second box to the third box. - `git merge` Illustration of two boxes in a row, connected by lines. From the second box, two lines diverge to two other boxes, and from those two, lines converge back into a final box. There is an arrow pointing from one of the diverged boxes into the final merged box. - `git pull` Illustration of five boxes in a row, connected by lines. There is an arrow pointing from the second box to the fifth box. (though rebase is a sideways move) ### git reset can move the current branch anywhere - backwards! - forwards! - "sideways"! Illustration of five boxes, connected with lines into two branches, with arrows pointing in all directions amongst them. this makes it possible to undo, but you can also really mess up your branch ### how git reset works `git reset HEAD^` 1. finds the commit ID corresponding to HEAD^ (for example a2b3c4) 2. forces your current branch to point to a2b3c4 3. unstages all changes ### `--hard`: the danger option `git reset $COMMIT_ID` Keeps all the files in your working directory exactly the same. `git reset --hard $COMMIT_ID` Throws away all your uncommitted changes. Useful but dangerous. ### problems `reset` can cause - (sad face) it's easy to "lose" commits, especially if you move a branch backwards - (sad face) if you use `--hard`, you can permanently lose your uncommitted changes
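This throwaway-repo sketch contrasts the two flavours of `git reset` described above. Plain `git reset` (which is `--mixed` by default) moves the branch and keeps your files, while `--hard` also overwrites them. The file name and `demo` identity are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: what `git reset` and `git reset --hard` do to your working directory.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)" && git init -q
echo v1 > app.txt && git add app.txt && git commit -q -m "v1"
echo v2 > app.txt && git commit -q -am "v2"

# plain reset: the branch moves back to "v1", but the file still says v2,
# so the v2 change now shows up as an unstaged change
git reset -q HEAD^
cat app.txt            # v2
git status --short     # " M app.txt"

# --hard: the working directory is overwritten too, and the v2 edit is gone from app.txt
# (the "v2" commit itself is still findable in the reflog)
git reset -q --hard HEAD
cat app.txt            # v1
```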
the staging area
### git has a 2-stage commit process 1. tell git what you want to stage (`git add`, `git rm`, `git mv`, etc.) 2. make the commit with `git commit` Diagram showing two boxes, labelled "untracked files" and "unstaged changes". They converge into a box labelled "stage" via `git add`. From there, `git commit` moves them into a box labelled "committed", which has a heart and smiley face beside it. ### git uses 3 terms interchangeably for the staging area 1. staged (like `--staged`) 2. cache (like `--cached`) 3. index (like `--keep-index`) it's total chaos but they're all the same thing tiny illustration of a sad stick figure with curly hair: why ### tip: you can use `git add -p` to commit only certain parts of a file person: I only want to commit my actual changes, not all the random debugging code I put in ### gotcha: `git diff` only shows unstaged changes You can use: - `git diff HEAD` to see ALL changes you haven't committed yet - `git diff --cached` to see staged changes ### gotcha: `git commit -a` doesn't automatically add new files person: I CONSTANTLY forget to add new files and then get confused about why they didn't get committed
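This throwaway-repo sketch shows both gotchas: which changes each `git diff` variant shows, and `git commit -a` skipping a brand-new file. File names and the `demo` identity are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: staged vs unstaged changes, and `git commit -a` ignoring new files.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)" && git init -q
echo "line 1" > a.txt && git add a.txt && git commit -q -m "add a.txt"

echo "line 2 (staged)" >> a.txt && git add a.txt   # stage one change...
echo "line 3 (unstaged)" >> a.txt                  # ...and leave another unstaged
echo "new file" > b.txt                            # plus a brand-new, untracked file

git diff             # only "line 3": unstaged changes
git diff --cached    # only "line 2": staged changes
git diff HEAD        # lines 2 and 3: everything not committed yet

git commit -q -a -m "commit -a"   # -a stages changes to files git already tracks...
git status --short                # ...but b.txt is still "?? b.txt" (untracked)
```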
the diff algorithm
### git is CONSTANTLY showing you diffs smiling stick figure with short curly hair: `git show COMMIT_ID` git, represented by a box with a smiley face: here's the diff! and it makes it seem like git thinks in terms of diffs ### have you ever noticed your git diffs don't make sense? git: `deleted...` `added...` person: but I didn't DELETE that file, I MOVED it ### in git, moving a file is the same as deleting the old one and adding the new one ``` git mv old.py new.py ``` is the same as ``` cp old.py new.py git rm old.py git add new.py ``` ### git is just guessing about your intentions person: ``` git mv old.py new.py git commit ``` git: well the OLD version has `old.py` and the NEW version has `new.py` and they have the same contents... so I guess you moved it ### diff is an algorithm the algorithm: - takes 2 versions of the code - compares them - tries to summarize it in a human readable way (but it doesn't always do a great job) ### git has many diff algorithms person: I've been trying out `histogram` because I don't like how the default algorithm displays the diff when I rearrange code how to try it out: `git diff --histogram`
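This throwaway-repo sketch shows that "git is just guessing". The commit only records that `old.py` disappeared and `new.py` appeared. Whether you see a rename depends on whether rename detection (`-M`) is on when the diff is displayed. File names and the `demo` identity are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: a rename is a delete + an add; diff decides whether to call it a rename.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)" && git init -q
printf 'def hello():\n    print("hi")\n' > old.py
git add old.py && git commit -q -m "add old.py"

git mv old.py new.py && git commit -q -m "move old.py"

# rename detection off: an added file and a deleted file
git diff --name-status --no-renames HEAD^ HEAD   # A new.py, then D old.py
# rename detection on (-M): same contents, so git guesses it was moved
git diff --name-status -M HEAD^ HEAD             # R100 old.py new.py
```

To make `histogram` your default diff algorithm instead of passing `--histogram` each time, setting git's `diff.algorithm` option (for example `git config --global diff.algorithm histogram`) should do it.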
diverged remote branches
### when pushing/pulling, the hardest problems are caused by diverged branches ``` ! [rejected] main -> main (non-fast-forward) ``` `fatal: Not possible to fast-forward, aborting.` `fatal: Need to specify how to reconcile divergent branches.` (each of these three messages is in a spiky bubble, and they are all surrounded by numerous sad faces.) ### what are diverged branches? both sides have commits that the other doesn't, like this: An illustration of two boxes in a row, connected by a line. The first one has a star, the second has a heart. Branching out from the heart are a box with a hash symbol, labelled "`local main`", and a box with a squiggle, labelled "`remote main`". I like to fix my diverged branches before making more commits. ### there are 4 possibilities with a remote branch 1. up to date Illustration of three boxes in a row, connected by lines. The final box is labelled both "local" and "remote". 2. need to pull Illustration of four boxes in a row, connected by lines. The second box is labelled "local" and the fourth one is labelled "remote". 3. need to push Illustration of four boxes in a row, connected by lines. The second box is labelled "remote" and the fourth one is labelled "local". 4. DIVERGED (need to decide how to solve it) Illustration of two boxes in a row, connected by lines. Diverging from the second box are two branches. One has one box in it and is labelled "remote". The other one has two boxes and is labelled "local". Illustration of a smiling stick figure with short curly hair. person: when I have a diverged branch, I usually just run `git pull --rebase` and move on. On the next page we'll talk about some other options though! ### how to tell if your branches have diverged: git status `$ git fetch` (get the latest remote state first) `$ git status` Your branch and '`origin/main`' have diverged, and have 1 and 1 different commits each, respectively.
(use "`git pull`" to merge the remote branch into yours) ### git fetch and git pull `git fetch` just fetches the latest commits from the remote branch. `git pull origin main` has 2 parts: - run `git fetch origin main` - run `git merge origin/main` (or sometimes rebase) (More about how to tell `git pull` to merge/rebase on page 16!)
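This self-contained bash sketch builds the "DIVERGED" situation on purpose. A bare repo in a temp folder plays `origin`, a second clone plays a coworker, and each side makes one commit. It then uses `git rev-list --left-right --count` (not in the comic, but a standard git command) to count the commits on each side, and `git pull --rebase` to fix it. Paths and the `demo` identity are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: make local main and origin/main diverge, spot it, then fix it with a rebase.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

remote=$(mktemp -d) && git init -q --bare "$remote"
git -C "$remote" symbolic-ref HEAD refs/heads/main   # so clones check out main

mine=$(mktemp -d) && cd "$mine"
git init -q && git symbolic-ref HEAD refs/heads/main
echo base > base.txt && git add base.txt && git commit -q -m "base"
git remote add origin "$remote" && git push -q -u origin main

# the coworker pushes a commit...
theirs=$(mktemp -d) && git clone -q "$remote" "$theirs"
( cd "$theirs" && echo theirs > theirs.txt && git add theirs.txt \
    && git commit -q -m "their commit" && git push -q )

# ...while I commit locally without pulling first
echo mine > mine.txt && git add mine.txt && git commit -q -m "my commit"

git fetch -q
git status --short --branch                            # ## main...origin/main [ahead 1, behind 1]
git rev-list --left-right --count main...origin/main   # 1<TAB>1: one commit on each side

git pull -q --rebase                                   # replay "my commit" on top of "their commit"
git rev-list --left-right --count main...origin/main   # 1<TAB>0: now it's just "need to push"
```

`git rev-list --left-right --count A...B` prints how many commits are only on the left side and how many are only on the right side. Two non-zero numbers means the branches have diverged.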
unix permissions
### There are 3 things you can do to a file **r**ead **w**rite e**x**ecute ### `ls -l file.txt` shows you permissions. Here's how to interpret the output: `rw- rw- r-- bork staff` the first `rw-` means bork (user) can read & write the second `rw-` means staff (group) can read & write `r--` means ANYONE can read ### File permissions are 12 bits The first bit is `setuid`, the second bit is `setgid`, the third bit is `sticky`, and the other 9 are read/write/execute: `110` (user) `110` (group) `100` (all) For files: r = can read w = can write x = can execute For directories, it's approximately: r = can list files w = can create files x = can cd into & access files ### 110 in binary is 6 So `rw-` = 110 = 6 `r--` = 100 = 4 `r--` = 100 = 4 `chmod 644 file.txt` means change the permissions to `rw- r-- r--`: simple! ### setuid affects executables `$ ls -l /bin/ping` `rws r-x r-x root root` (the s means ping always runs as root) `setgid` does 3 different unrelated things for executables, directories, and regular files. person: unix why! unix, cheerfully: it's a long story
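This bash sketch shows the "110 in binary is 6" arithmetic. Each group of three letters becomes one digit, with r=4, w=2, x=1. `perm_to_octal` is a hypothetical helper, and it only handles the 9 basic bits (no setuid/setgid/sticky letters like `s` or `t`).

```bash
#!/usr/bin/env bash
# Sketch: turn an ls -l style string like "rw-r--r--" into chmod's octal digits.
# Each group of 3 letters is 3 bits: r = 4, w = 2, x = 1.
perm_to_octal() {
  local perms=$1 result="" i digit
  for (( i = 0; i < 9; i += 3 )); do
    digit=0
    [[ ${perms:i:1} == r ]] && (( digit += 4 ))
    [[ ${perms:i+1:1} == w ]] && (( digit += 2 ))
    [[ ${perms:i+2:1} == x ]] && (( digit += 1 ))
    result+=$digit
  done
  echo "$result"
}

perm_to_octal rw-r--r--   # 644
perm_to_octal rw-rw-r--   # 664
perm_to_octal rwxr-xr-x   # 755
```

On Linux, GNU `stat -c %a file.txt` should print the same number for a real file.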
disk usage
### `du` tells you how much disk space files/directories take up `-s`: summary: total size of all files in a directory `-h`: human readable sizes ### `df` tells you how much free space each partition has. `-h` for human-readable sizes | `Filesystem` | `Size` | `Used` | `Avail` | `Use%` | `Mounted on` | |---------------|---------|---------|----------|---------|---------------| | `/dev/sda3` | `18G` | `G` | `2.5G` | `86%` | `/` | | `udev` | `483M` | `4.0K` | `483M` | `1%` | `/dev` | | `tmpfs` | `99M` | `1.4M` | `97M` | `2%` | `/run` | | `/dev/sda4` | `167G` | `157G` | `9.9G` | `95%` | `/home` | ### `df -i` instead of % disk free, report how many inodes are used/free on each partition happy little stick person: running out of inodes is VERY ANNOYING. You can't create new files! ### `ncdu` see what's using disk space in an interactive way | 17.5 GiB | [#####] | /music | |---------- |--------- |---------- | | 3.2 GiB | [## ] | /photos | | 5.7 MiB | [ ] | /code | | 2.0 MiB | [ ] | file.pdf | ### `iostat` get statistics about disk reads/writes `# iostat 5` (report every 5 seconds) | Device: | kB_read/s | kB_wrtn/s | |--------- |----------- |----------- | | sda | 2190.21 | 652.87 | | sdb | 6.00 | 0.00 |
ps
### `ps` `ps` shows which processes are running I usually run `ps` like this: `$ ps aux` `u` means include username column, `a`+`x` together show all processes (`ps -ef` works too) ### `w` is for wide. `ps auxwww` will show all the command line args for each process ### `e` is for environment. `ps auxe` will show the environment vars! ### wchan you can choose which columns to show with `ps` (`ps -eo...`) One cool column is '`wchan`', which tells you the name of the kernel function if the process is sleeping. try it: `$ ps -eo user,pid,wchan,cmd` ### process state Here's what the letters in `ps`'s STAT column mean: - `R`: running - `S/D`: asleep - `Z`: zombie - `l`: multithreaded - `+`: in the foreground ### `f` is for "forest" :) `ps auxf` will show you an ASCII art process tree! `pstree` can display a process tree, too. ### `ps` has 3 different sets of command line arguments (broken heart) 1. UNIX (1 dash) 2. BSD (no dash) 3. GNU (2 dashes) you can write monstrosities like: `$ ps f -f` `f` is "forest" (BSD) `-f` is "full format" (UNIX)
tc
### `tc` is for "traffic control" humanoid traffic light, hand raised as if directing traffic: packets! stop/slow down/go the other way! ### make your internet slow ``` $ sudo tc qdisc add dev wlp3s0 root netem delay 500ms ``` (delay packets by 500 ms) and fast again ``` $ sudo tc qdisc del dev wlp3s0 root netem ``` ### `netem` rules `netem` ("network emulator") is a part of `tc` that lets you: - drop - duplicate - delay - corrupt packets. See the man page: `$ man netem` ### make your brother's internet slow Have a Linux router? You can configure `tc` on it to make your brother's internet slower than yours. google: "tc QoS" for a start. ### show current `tc` settings ``` $ tc qdisc show $ tc class show dev DEV $ tc filter show dev DEV ``` ### panel 6 smiling stick figure with short curly hair: `tc` can do 10 million more things! this is just the beginning!
ip
### `ip` (Linux only) lets you view + change network configuration `$ ip OBJECT COMMAND` (`OBJECT`: addr, link, neigh, etc. `COMMAND`: add, show, delete, etc.) ### `ip addr list` shows ip addresses of your devices. Look for something like this: ``` 2: eth0: link/ether 3c:97... inet 192.168.170/24 ``` ### `ip route list` displays the route table. ``` default via 192.168.1.1 169.240.0.0/16 dev docker0 ``` (`192.168.1.1` is my router) to see all route tables: `$ ip route list table all` ### change your MAC address good for cafes with time limits (little devil face) ``` $ ip link set wlan0 down $ ip link set wlan0 address 3c:a9:f4:d1:00:32 $ ip link set wlan0 up $ service network-manager restart ``` (or whatever you use) ### `ip link` network devices! (like `eth0`) ### `ip neigh` view/edit the ARP table ### `ip xfrm` is for IPsec ### `ip route get $IP` what route will packets with `$IP` take? ### `--color` pretty colourful output! ### `--brief` show a summary
linux system calls
### The Linux kernel has code to do a lot of things - read from a hard drive - make network connections - create new processes - kill processes - change file permissions - keyboard drivers ### Your program doesn't know how to do those things program, blithely: TCP? dude I have no idea how that works. program: NO, I do not know how the ext4 filesystem is implemented. I just want to read some files! ### Programs ask Linux to do work using system calls program: please write to this file (switch to running kernel code) Linux: done! I wrote 1097 bytes! (program resumes) ### Every program uses system calls Python program: I use the 'open' syscall to open files Java program: me too! C program: me three! ### And every system call has a number (e.g. `chmod` is 90 on x86_64) so what's actually going on when you change a file's permissions is: program: run syscall #90 with these arguments Linux: ok! ### You can see which system calls a program is using with strace `$ strace ls /tmp` will show you every system call `ls` uses! it's really fun! warning: strace has high overhead so don't run it on your production database
how diffie hellman key exchange works
### diffie hellman key exchange is a system for establishing a secret key in the open. Illustration of two stick figures sending messages to each other person 1: ... person 2: ... everyone can read these messages, but nobody knows their secret key! ### diffie hellman key exchange requires a ~~~ magic function ~~~ f(s, a) = s⊙a. You put 2 numbers into f and get a result (we'll call it s⊙a). There are two rules this function has to follow: 1. It's commutative: (s⊙a)⊙b is always the same as (s⊙b)⊙a 2. It's hard to undo: if you know s⊙a and s, you can't easily "divide" to figure out what "a" was ### Finding a magic function that works this way requires a lot of math... two examples: 1. elliptic curve multiplication (where s⊙x means "add the point s to itself x times") 2. modular arithmetic (where s⊙x = s^x mod q). But you don't need to understand the math to get the basic idea. ### how diffie hellman works 1. Choose s (some cryptographers choose this and tell everyone "hey this is what we're using guys") 2. Each person picks a random number. Left person picks a, right person picks b. 3. "Multiply" s by the number and send it. left person sends: s⊙a right person sends: s⊙b Nobody can figure out a and b because of Rule 2! 4. "Multiply" the number the other person sent by your own secret number. left person calculates: (s⊙b)⊙a right person calculates: (s⊙a)⊙b These two numbers are the same because of Rule 1! 5. We're done! (s⊙b)⊙a is the secret key!
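The five steps above can be written as a toy bash sketch that uses the modular-arithmetic magic function `s⊙x = s^x mod q`. The numbers (`q=23`, `s=5`, secrets 4 and 3) are tiny textbook values picked so you can check the math by hand. Real Diffie-Hellman uses enormous numbers and a proper crypto library, never shell arithmetic. `magic` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Toy Diffie-Hellman: magic s x = s^x mod q, with tiny numbers for readability.
q=23   # public modulus
s=5    # public starting number everyone agrees on (step 1)

# computes base^exp mod q by repeated multiplication (fine for tiny exponents)
magic() {
  local base=$1 exp=$2 result=1 i
  for (( i = 0; i < exp; i++ )); do
    result=$(( result * base % q ))
  done
  echo "$result"
}

a=4    # left person's secret, never sent (step 2)
b=3    # right person's secret, never sent (step 2)

left_sends=$(magic "$s" "$a")            # s⊙a = 5^4 mod 23 = 4  (step 3)
right_sends=$(magic "$s" "$b")           # s⊙b = 5^3 mod 23 = 10 (step 3)

left_key=$(magic "$right_sends" "$a")    # (s⊙b)⊙a = 10^4 mod 23 (step 4)
right_key=$(magic "$left_sends" "$b")    # (s⊙a)⊙b = 4^3 mod 23  (step 4)

echo "left key: $left_key, right key: $right_key"   # both are 18 (step 5)
```

An eavesdropper sees `q`, `s`, 4 and 10, but has to undo the exponent (Rule 2) to get 18. With numbers this small that is easy by brute force, which is exactly why real keys are huge.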