bash tricks

### panel 1: ctrl + r search your history! I use this **constantly** to rerun commands ### panel 2: magical braces ```$ convert file.{jpg,png}``` expands to ```$ convert file.jpg file.png``` `{1..5}` expands to 1 2 3 4 5 (for i in {1..100}...) ### panel 3: !! expands to the last command run `$ sudo !!` ### panel 4: commands that start with a space don't go in your history (as long as `HISTCONTROL` includes `ignorespace` or `ignoreboth`). good if there's a password ### panel 5: loops ``` for i in *.png; do convert "$i" "$i.jpg"; done ``` person: for loops: easy & useful! ### panel 6: $( ) gives the output of a command ```$ touch file-$(date -I)``` creates a file named file-2018-05-25 ### panel 7: more keyboard shortcuts ctrl + a beginning of line ctrl + e end of line ctrl + l clear the screen & lots more emacs shortcuts too!

/proc

### panel 1: Every process on Linux has a PID (process ID) like 42. In /proc/42, there's a lot of VERY USEFUL information about process 42. ### panel 2: /proc/PID/cmdline command line arguments the process was started with. ### panel 3: /proc/PID/environ all of the process's environment variables ### panel 4: /proc/PID/exe Symlink to the process's binary. magic: works even if the binary has been deleted! ### panel 5: /proc/PID/status Is the program running or asleep? How much memory is it using? And much more! ### panel 6: /proc/PID/fd Directory with every file the process has open! Run ```$ ls -l /proc/42/fd``` to see the list of files for process 42. These symlinks are also magic & you can use them to recover deleted files ### panel 7: /proc/PID/stack The kernel's current stack for the process. Useful if it's stuck in a system call. ### panel 8: /proc/PID/maps List of process's memory maps. Shared libraries, heap, anonymous maps, etc. ### panel 9: and more Look at ```man proc``` for more information!
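
The cmdline and environ files are just NUL-separated bytes, so they're easy to read from a script. Here's a minimal Python sketch, assuming Linux. `parse_nul_separated` and `read_cmdline` are names made up for this example, not part of any library:

```python
import os

def parse_nul_separated(raw: bytes) -> list[str]:
    """Split the NUL-separated contents of /proc/PID/cmdline or /proc/PID/environ."""
    return [part.decode(errors="replace") for part in raw.split(b"\0") if part]

def read_cmdline(pid="self"):
    # /proc/self always refers to the process doing the reading
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return parse_nul_separated(f.read())

if __name__ == "__main__":
    if os.path.exists("/proc/self/cmdline"):  # /proc only exists on Linux
        print(read_cmdline())
```

The same helper works on /proc/PID/environ, where each entry looks like `NAME=value`.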

how I got better at debugging

### Remember: the bug is happening for a logical reason. It's never magic. Really. Even when it makes no sense. ### Be confident I can fix it before: maybe this is too hard now: well I've fixed a lot of hard bugs before ### Talk to my coworkers person 1: ? person 2: ! ### know my debugging toolkit before: I want to know $THING but I don't know how to find out now: I KNOW! I'll use tcpdump! ### most importantly: I learned to like it before: oh no! a bug! now: I think I'm about to learn something (facial expression: determination)

grep

### panel 1: grep lets you search files for text ```$ grep bananas foo.txt``` Here are some of my favourite grep command line arguments! ### panel 2: -E Use if you want regexps like ".+" to work. otherwise you need to use ".\+" ### panel 3: -v invert match: find all lines that don't match ### panel 4: -r recursive! Search all the files in a directory. ### panel 5: -o only print the matching part of the line (not the whole line) ### panel 6: -i case insensitive ### panel 7: -A -B -C Show **c**ontext for your search ```$ grep -A 3 foo``` will show 3 lines of context **a**fter a match ### panel 8: -l only show the **filenames** of the files that matched ### panel 9: -F aka fgrep don't treat the match string as a regex eg ```$ grep -F...``` ### panel 10: -a search binaries: treat binary data like it's text instead of ignoring it! ### panel 11: grep alternatives ack ag ripgrep (better for searching code!)

permissions

### panel 1: There are 3 things you can do to a file. **r**ead **w**rite e**x**ecute ### panel 2: `ls -l file.txt` shows you permissions. Here's how to interpret the output: rw- **bork** (user) can read & write rw- **staff** (group) can read & write r-- **ANYONE** can read ### panel 3: File permissions are 12 bits First bit: setuid Second bit: setgid Third bit: sticky Then 3 bits (rwx) each for: user 110 group 110 all 100 For files: r = can read w = can write x = can execute For directories, it's approximately: r = can list files w = can create files x = can cd into & access files ### panel 4: 110 in binary is 6 so rw- = 110 = 6 r-- = 100 = 4 r-- = 100 = 4 ```chmod 644 file.txt``` means change the permissions to: rw- r-- r-- Simple! ### panel 5: setuid affects executables ```$ ls -l /bin/ping``` rw**s** r-x r-x root root the s means ping always runs as root ```setgid``` does 3 different unrelated things for executables, directories, and regular files. person: unix! why?? unix: it's a long story
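
To check the binary-to-octal arithmetic, here's a small Python sketch that turns permission bits into the `rw-` form that `ls -l` prints. `describe` is a helper name invented for this example; `stat.filemode` and `stat.S_IFREG` are from Python's standard library:

```python
import stat

def describe(mode: int) -> str:
    """Turn permission bits (like 0o644) into the rw-r--r-- form that ls -l prints."""
    # stat.filemode expects the file-type bits too; S_IFREG marks a regular file.
    # [1:] drops the leading file-type character ("-" for a regular file).
    return stat.filemode(stat.S_IFREG | mode)[1:]

print(describe(0o644))         # the chmod 644 example
print(describe(0b110100100))   # same bits, written in binary
print(describe(0o4755))        # setuid bit set, like the ping example
```

The leading 4 in `0o4755` is the setuid bit, which is why the user's `x` shows up as `s`.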

how to be a wizard programmer

more bash tricks

ssh

### ssh keys An ssh key is a secret key that lets you SSH to a machine person: hello! ssh: That's on my list of authorized keys! come in! ### ssh-copy-id This script installs your SSH key on a host (over SSH) `$ ssh-copy-id user@host` (puts it in .ssh/authorized_keys etc) installing an SSH key is surprisingly finicky so this script is helpful! ### port forwarding ``` ssh user@host.com -NfL 3333:localhost:8888 ``` 3333 = local port 8888 = remote port Lets you view a remote server that's not on the internet in your browser. ### just run 1 command `$ ssh user@host uname -a` runs the command `uname -a` & exits. ### ssh-agent remembers your SSH key passphrase so you don't have to keep typing it ### ~. <Enter> ~. closes the SSH connection. Useful if it's hanging! ### mosh ssh alternative: keeps the connection open if you disconnect + reconnect later ### .ssh/config Lets you set, per host: - Username to use. - SSH key to use - an alias! so you can type `$ ssh ALIAS` instead of `ssh user@verylongdomain.com`

what to talk about in 1:1s

Each of these items is enclosed in a little thought bubble, with an image of a stick figure with short curly hair. The person is smiling in every illustration, except "what's not going well". ### what's been going well I LOVE this project! ### what's not going well I got paged 10 times last week ### team priorities how does my work fit in with company goals? ### career planning I'd like to be promoted this year ### ask for opportunities I want to work on a customer-facing project ### ask for feedback do you have any concerns about how PROJECT is going? ### brainstorm let's think about this problem! ### give feedback the team seems really unfocused recently ### ask for resources I think this training would really help me

bash brackets cheat sheet

### shell scripts have a lot of brackets here's a cheat sheet to help you identify them all! we'll cover the details later. ### (cd ~/music; pwd) `(...)` runs commands in a subshell. ### VAR=$(cat file.txt) `$(COMMAND)` is equal to `COMMAND`'s stdout ### { cd ~/music; pwd; } `{ ...; }` groups commands. runs in the same process. ### x=(1 2 3) `x=(...)` creates an array ### x=$((2+2)) `$(())` does arithmetic ### if [ ... ] `/usr/bin/[` is a program that evaluates statements ### <(COMMAND) "process substitution": an alternative to pipes ### a{.png,.svg} this expands to `a.png a.svg` it's called "brace expansion" ### if [[ ... ]] `[[` is bash syntax. it's more powerful than `[` ### ${var//search/replace} see page 21 for more about `${...}`!

the box model

### every HTML element is in a box ``` <div class="1"> <div class="2" /> <div class="3" /> </div> ``` Illustration of a larger box, labelled 1. Nested inside it are two boxes. The one on top is labelled 2, and the one below 2 is labelled 3. ### boxes have padding, borders, and a margin Illustration of a series of nested boxes. The middle box is empty. The area around the middle box is labelled "padding". The area around the padding is labelled "border". The area around the border is labelled "margin". ### width & height don't include any of those The same illustration from the previous panel, but with two lines measuring the width and height of only the middle box, not the padding, border, or margin. ### margins are allowed to overlap sometimes Illustration of two sets of nested boxes, similar to the diagrams above. One is on top of the other, and the area between the sets of boxes is shaded in green, showing that the bottom margin of the first set of boxes, and the top margin of the second set of boxes, overlap. the browser combines these top/bottom margins. look up "margin collapse" to learn more ### `box-sizing: border-box;` includes border + padding in the width/height Illustration of a series of nested boxes with a middle box surrounded by padding, border, and margin. In this version, the lines measuring width and height extend all the way to the edge of the border (but don't include the margin surrounding the border.) ### inline elements ignore other inline elements' vertical padding Illustration of two dotted line boxes stacked directly on top of one another. Each has the word "`span`" inside it. you can set vertical padding but the other span won't move

memory allocation

### your program has memory 10MB: program binary 3MB: stack 587 MB: heap the heap is what your allocator manages ### Your memory allocator (malloc) is responsible for 2 things. THING 1: keep track of what memory is used/free. ### THING 2: Ask the OS for more memory! malloc: oh no! I'm being asked for 40 MB and I don't have it. malloc: can I have 60 MB more? OS: here you go! ### your memory allocator's interface - malloc(size_t size): allocate size bytes of memory & return a pointer to it. - free(void* pointer): mark the memory as unused (and maybe give back to the OS) - realloc(void* pointer, size_t size): ask for more/less memory for pointer. - calloc(size_t members, size_t size): allocate array + initialize to 0. ### malloc tries to fill in unused space when you ask for memory your code: can I have 512 bytes of memory? malloc: YES! ### malloc isn't magic! it's just a function! you can always: - use a different malloc library like jemalloc or tcmalloc (easy!) - implement your own malloc (harder)

misc commands

SELECT queries start with FROM

Conceptually, every step (like "`WHERE`") of a query transforms its input, like this: cats owner: 1 name: daisy owner: 1 name: dragonsnap owner: 3 name: buttercup owner: 4 name: rose `WHERE owner = 1` owner: 1 name: daisy owner: 1 name: dragonsnap The query's steps don't happen in the order they're written: how the query is written: SELECT ... FROM + JOIN WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT ... how you should think about it: FROM + JOIN ↓ WHERE ↓ GROUP BY ↓ HAVING ↓ SELECT ↓ ORDER BY ↓ LIMIT (In reality query execution is much more complicated than this. There are a lot of optimizations.)
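
Here's a rough Python sketch of that mental model, applying the steps one at a time to the cats table from the example. The query it imitates (`SELECT name FROM cats WHERE owner = 1 ORDER BY name DESC LIMIT 1`) and the `run_query` function are made up for illustration. A real database would plan this very differently:

```python
cats = [
    {"owner": 1, "name": "daisy"},
    {"owner": 1, "name": "dragonsnap"},
    {"owner": 3, "name": "buttercup"},
    {"owner": 4, "name": "rose"},
]

def run_query(rows):
    # FROM cats
    step = list(rows)
    # WHERE owner = 1
    step = [r for r in step if r["owner"] == 1]
    # SELECT name (happens after filtering, even though it's written first)
    step = [r["name"] for r in step]
    # ORDER BY name DESC
    step = sorted(step, reverse=True)
    # LIMIT 1
    return step[:1]

print(run_query(cats))
```

Each step only sees the output of the step before it, which is the whole point of the reordered list.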

the most important HTTP request headers

curl

bash errors

### by default, bash will continue after errors bash, represented by a box with a smiley face: oh, was that an error? who cares, let's keep running!!! programmer, represented by a nonplussed stick figure with short curly hair: uh that is NOT what I wanted ### `set -e` stops the script on errors ``` set -e; unzip fle.zip ``` (typo! script stops here!) programmer, smiling: this makes your scripts WAY more predictable ### by default, unset variables don't error `rm -r "$HOME/$SOMEPTH"` bash, happily: `$SOMEPTH` doesn't exist? no problem, i'll just use an empty string! programmer: OH NOOOO that means `rm -rf $HOME` ### `set -u` stops the script on unset variables ``` set -u; rm -r "$HOME/$SOMEPTH" ``` bash, concerned: I've never heard of `$SOMEPTH`! STOP EVERYTHING!!! ### by default, a command failing doesn't fail the whole pipeline `curl yxqzq.ca | grep 'panda'` bash, pleased with itself: `curl` failed but `grep` succeeded so it's fine! success! ### `set -o pipefail` makes the pipe fail if any command fails you can combine `set -e`, `set -u`, and `set -o pipefail` into one command I put at the top of all my scripts: `set -euo pipefail`

xargs

### xargs takes whitespace-separated strings from stdin and converts them into command-line arguments ``` $ echo "/home /tmp" | xargs ls ``` will run `ls /home /tmp` ### this is useful when you want to run the same command on a list of files! - delete (`xargs rm`) - combine (`xargs cat`) - search (`xargs grep`) - replace (`xargs sed`) ### how to replace "foo" with "bar" in all .txt files: ``` find . -name '*.txt' | xargs sed -i s/foo/bar/g ``` ### how to lint every Python file in your Git repo: ``` git ls-files | grep .py | xargs pep8 ``` ### if there are spaces in your filenames "my day.txt" xargs will think it's 2 files "my" and "day.txt" fix it like this: ``` find . -print0 | xargs -0 COMMAND ``` ### more useful xargs options `-n 1` (max-args): makes xargs run a separate process for each input `-P` (capital P, max-procs): is the max number of parallel processes xargs will start
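
To see why the spaces-in-filenames problem happens, here's a simplified Python sketch of the two ways of splitting input. Both function names are invented for this example, and real xargs also handles quotes and backslashes, which this ignores:

```python
def split_like_xargs(stdin_text: str) -> list[str]:
    # default xargs behaviour, roughly: split on any whitespace
    return stdin_text.split()

def split_like_xargs_0(stdin_bytes: bytes) -> list[bytes]:
    # xargs -0: split only on NUL bytes, which filenames can't contain
    return [name for name in stdin_bytes.split(b"\0") if name]

print(split_like_xargs("my day.txt\n"))                # 2 arguments: oops
print(split_like_xargs_0(b"my day.txt\0notes.txt\0"))  # 2 filenames, spaces intact
```

`find -print0` writes the NUL-separated form, which is why it pairs with `xargs -0`.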

awk

### panel 1: awk is a tiny programming language for manipulating columns of data person: I only know how to do 2 things with awk but it's still useful! ### panel 2: basic awk program structure ``` BEGIN {...} CONDITION {action} CONDITION {action} ``` (do action on lines matching CONDITION) ``` END {...} ``` ### panel 3: extract a column of text with awk ```awk -F, '{print $5}'``` the comma is the column separator the ' is a single quote! ```{print $5}``` means print the 5th column person: this is 99% of what I do with awk ### panel 4: SO MANY Unix commands print columns of text (ps! ls!) so being able to get the column you want with awk is GREAT ### panel 5: awk program example sum the numbers in the 3rd column ```awk '{s += $3};``` (action) ``` END {print s}'``` (at the end, print the sum!) ### panel 6: awk program example print every line over 80 characters ```length($0) > 80``` "length($0) > 80" is the condition (there's an implicit ```{print}``` as the action)
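
For comparison, here's roughly what the two awk program examples do, written out in Python. This is a sketch rather than a faithful awk emulation: the function names are made up, and unlike awk it assumes the 3rd column is always numeric:

```python
def sum_third_column(lines):
    """Roughly: awk '{s += $3}; END {print s}'"""
    total = 0.0
    for line in lines:
        fields = line.split()            # awk splits on whitespace by default
        if len(fields) >= 3:
            total += float(fields[2])    # $3 is the third field (index 2 here)
    return total

def long_lines(lines, limit=80):
    """Roughly: awk 'length($0) > 80'"""
    return [line for line in lines if len(line) > limit]

print(sum_third_column(["a b 1", "c d 2.5", "too short"]))
print(len(long_lines(["x" * 81, "y" * 80])))
```

The awk versions are one line each, which is a good argument for learning those 2 things.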

CORS

cross-origin resource sharing Cross-origin requests are not allowed by default: (because of the same origin policy!) Javascript from clothes.com: POST request to api.clothes.com? Firefox (thought bubble): same origin flow chart Firefox: NOPE. api.clothes.com is a different origin from clothes.com If you run api.clothes.com, you can allow clothes.com to make requests to it using the ```Access-Control-Allow-Origin``` header. Here's what happens: javascript on clothes.com: ```POST /buy_thing``` ```Host: api.clothes.com``` Firefox (thought bubble): That's cross-origin. I'm going to need to ask api.clothes.com if this request is allowed. Firefox: ```OPTIONS /buy_thing``` ```Host: api.clothes.com``` ("hey, what requests are allowed?" preflight request) api.clothes.com: ```204 No Content``` ```Access-Control-Allow-Origin: clothes.com``` Firefox (thought bubble): cool, the request is allowed! Firefox: ```POST /buy_thing``` ```Host: api.clothes.com``` ```Referer: clothes.com/checkout``` api.clothes.com: ```200 OK``` ```{"thing_bought": true}``` This OPTIONS request is called a "preflight" request, and it only happens for some requests, like we described in the diagram on the same-origin policy page. Most GET requests will just be sent by the browser without a preflight request first, but POST requests that send JSON need a preflight.

CSS variables

### duplication is annoying Illustration of a frowning stick figure with curly hair. person, thinking: ugh, I have `color: #f79` set in 27 places and now I need to change it in 27 places ### define variables in any selector ``` body { --text-color: #f79; } ``` (applies to everything) ``` #header { --text-color: #c50; } ``` (applies to children of `#header`) ### use variables with `var()` ``` body { color: var(--text-color); } ``` (variables always start with `--`) ### do math on them with `calc()` ``` #sidebar { width: calc( var(--my-var) + 1em ); } ``` ### you can change a variable's value in Javascript ``` let root = document.documentElement; root.style.setProperty( '--text-color', 'black'); ``` ### changes to variables apply immediately JS, represented by a box with a smiley face: set `--text-color` to red css renderer, also represented by a box with a smiley face: ok everything using it is red now!

containers aren't magic

These 15 lines of bash will start a container running the fish shell. Try it! (download this script at bit.ly/containers-arent-magic) It only runs on Linux because these features are all Linux-only. `wget bit.ly/fish-container -O fish.tar` (# 1. download the image) `mkdir container-root; cd container-root` `tar -xf ../fish.tar` (# 2. unpack image into a directory) `cgroup_id="cgroup_$(shuf -i 1000-2000 -n 1)"` (# 3. generate random cgroup name) `cgcreate -g "cpu,cpuacct,memory:$cgroup_id"` (# 4. make a cgroup & set CPU/memory limits) `cgset -r cpu.shares=512 "$cgroup_id"` `cgset -r memory.limit_in_bytes=1000000000 \` `"$cgroup_id"` `cgexec -g "cpu,cpuacct,memory:$cgroup_id" \` (# 5. use the cgroup) `unshare -fmuipn --mount-proc \` (# 6. make and use some namespaces) `chroot "$PWD" \` (# 7. change root directory) `/bin/sh -c "` `/bin/mount -t proc proc /proc &&` (# 8. use the right /proc) `hostname container-fun-times &&` (# 9. change the hostname) `/usr/bin/fish"` (# 10. finally, start fish!)

virtual memory

### your computer has physical memory Illustration of a memory stick labelled 8GB 204-PIN SODIMM DDR3. ### physical memory has addresses, like 0-8GB but when your program references an address like 0x5c69a2a2, that's not a physical memory address! It's a virtual address. ### every program has its own virtual address space program 1: 0x129520 → "puppies" program 2: 0x129520 → "bananas" ### Linux keeps a mapping from virtual memory pages to physical memory pages called the page table a "page" is a 4KB chunk of memory (or sometimes bigger) PID -- virtual addr -- physical addr 1971 -- 0x20000 -- 0x192000 2310 -- 0x20000 -- 0x228000 2310 -- 0x21000 -- 0x9788000 ### when your program accesses a virtual address CPU: I'm accessing 0x21000 MMU "memory management unit" (hardware): I'll look that up in the page table and then access the right physical address ### every time you switch which process is running, Linux needs to switch the page table Linux: here's the address of process 2950's page table MMU: thanks, I'll use that now!
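
Here's a toy Python model of that lookup, using the three rows from the page table example. It's only a sketch: `translate` and the flat dictionary are invented for illustration, and real page tables are per-process, multi-level structures walked by hardware:

```python
PAGE_SIZE = 4096  # 4 KB pages

# (pid, start of virtual page) -> start of physical page
page_table = {
    (1971, 0x20000): 0x192000,
    (2310, 0x20000): 0x228000,
    (2310, 0x21000): 0x9788000,
}

def translate(pid: int, vaddr: int) -> int:
    offset = vaddr % PAGE_SIZE          # position inside the page
    vpage = vaddr - offset              # start of the virtual page
    ppage = page_table[(pid, vpage)]    # a KeyError here is a bit like a page fault
    return ppage + offset

print(hex(translate(2310, 0x21004)))
print(hex(translate(1971, 0x20000)), hex(translate(2310, 0x20000)))
```

The last line shows the same virtual address landing on different physical addresses for 2 different processes.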

system calls

content delivery networks

In 2004, if your website suddenly got popular, often the webserver wouldn't be able to handle all the requests. slashdot: person 1: I want cat picture! person 2: me too! person 3: me 300,000! server, on fire: <no response> web host: now you owe me $1000 for bandwidth you: how will I pay for this? A CDN (content delivery network) can make your site faster and save you money by caching your site and handling most requests itself. 20 million requests for 1 cute cat picture -> CDN (many powerful computers) -> just 1 request: hey send me that cat picture? server: here you go! Today, there are many free or cheap CDN services available, which means if your site gets popular suddenly you can easily keep it running! This is great but caching can cause problems too! I updated my site yesterday but people are still seeing the old site! (Cache-Control header) French users are seeing the English site?!? Why? (Vary header) Next, we'll explain the HTTP headers your CDN or browser uses to decide how to do caching.

inline vs block

### HTML elements default to inline or block example inline elements: `<a> <span> <strong> <i> <small> <abbr> <img> <q> <code>` example block elements: `<p> <div> <ol> <ul><li> <h1> <h6> <blockquote> <pre>` ### inline elements are laid out horizontally text text text `<a>` text text text text `<span>` text text ### block elements are laid out vertically by default `<div>` `<p>` to get a different layout, use `display: flex` or `display: grid` ### inline elements ignore width & height* Setting the width is impossible, but in some situations, you can use `line-height` to change the height `*` img is an exception to this: look up "replaced elements" for more ### display can force an element to be inline or block `display` determines 2 things: 1. whether the element itself is `inline`, `block`, `inline-block`, etc 2. how child elements are laid out (`grid`, `flex`, `table`, `default`, etc) ### display: inline-block; TRY ME! `inline-block` makes a block element be laid out horizontally like an inline element inline text more inline text inline-block inline text

ask for specific feedback

I used to ask for feedback like this: Illustration of two stick figures, both smiling. Person 1, the employee, has short curly hair, and person 2, the manager, doesn't have hair. person 1 (speech bubble): do you have any feedback for me? person 2 (speech bubble): not right now! person 1 (thought bubble): is there something they're not telling me? person 2 (thought bubble): what specifically does she want feedback on? I've learned that I get WAY BETTER answers if I ask more specific questions! - what do you think of this design? - did I prioritize these things well? - should I be doing more or less of X? - do you have any concerns about PROJECT? - was that email clear? Bonus: asking specific questions forces me to actually think about which areas I might want to focus on.

### ip (Linux only) lets you view + change network configuration. `ip OBJECT COMMAND` (`OBJECT` = addr, link, neigh, etc) (`COMMAND` = add, show, delete, etc) Here are some ways to use it! ### ip addr list shows the ip addresses of your devices. Look for something like this: ``` 2: eth0: link/ether 3c:97... inet 192.168.1.170/24 ``` ### ip route list displays the route table. `default via 192.168.1.1` (my router) `169.240.0.0/16 dev docker` `...` to see all route tables: `ip route list table all` ### change your MAC address good for cafés with time limits (devil face emoji) ``` $ ip link set wlan0 down $ ip link set wlan0 address 3c:a9:f4:d1:00:32 $ ip link set wlan0 up $ service network-manager restart ``` (or whatever you use) ### `ip link` network devices! (like eth0) ### `ip neigh` view/edit the ARP table ### `ip xfrm` is for IPsec ### `ip route get IP` what route will packets with $IP take? ### `--color` (the letters of "color" are in various rainbow colours) pretty colourful output! ### `--brief` show a summary

what's HTTP?

how URLs work

`https://examplecat.com:443/cats?color=light%20gray#banana` - scheme (`https://`): Protocol to use for the request. Encrypted (`https`), insecure (`http`), or something else entirely (`ftp`). - domain (`examplecat.com`): Where to send the request. For HTTP(S) requests, the Host header gets set to this (`Host: examplecat.com`) - port (`:443`): Defaults to 80 for HTTP and 443 for HTTPS. - path (`/cats`): Path to ask the server for. The path and the query parameters are combined in the request, like: `GET /cats?color=light%20gray HTTP/1.1` - query parameters (`color=light gray`): Query parameters are usually used to ask for a different version of a page ("I want a light gray cat!"). Example: `hair=short&color=black&name=mr%20darcy`. `hair` is the name, `short` is the value, and name=value pairs are separated by & - URL encoding (`%20`): URLs aren't allowed to have certain special characters like spaces, @, etc. So to put them in a URL you need to percent encode them as % + hex representation of ASCII value. space is %20, % is %25, etc. - fragment id (`#banana`): This isn't sent to the server at all. It's used either to jump to an HTML tag (`<a id="banana"..>`) or by Javascript on the page.
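
You can take the example URL apart with Python's standard `urllib.parse` module to see each piece. This is just a quick sketch; the variable names are made up:

```python
from urllib.parse import urlsplit, parse_qs, quote

url = "https://examplecat.com:443/cats?color=light%20gray#banana"
parts = urlsplit(url)

print(parts.scheme)    # scheme
print(parts.hostname)  # domain
print(parts.port)      # port
print(parts.path)      # path
print(parts.query)     # query string, still percent-encoded
print(parts.fragment)  # fragment id, never sent to the server

params = parse_qs(parts.query)   # decodes %20 back into a space
encoded = quote("light gray")    # percent-encodes the space again
print(params, encoded)
```

`parse_qs` returns a list for each name because the same query parameter is allowed to appear more than once.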

namespaces

### inside a container, things look different Illustration of a smiling stick figure with curly hair. Person: I only see 4 processes in `ps aux`, that's weird... ### why things look different: namespaces Illustration of a container, represented by a box with a smiley face Container: I'm in a different PID namespace so `ps aux` shows different processes! ### every process has 7 namespaces ``` $ lsns -p 273 NS TYPE 4026531835 cgroup 4026531836 pid 4026531837 user 4026531838 uts 4026531839 ipc 4026531840 mnt 4026532009 net ``` -p is the PID 4026532009 is the namespace ID you can also see a process's namespace with: `$ ls -l /proc/273/ns` ### there's a default ("host") namespace Person: "outside a container" just means "using the default namespace" ### processes can have any combination Container: I'm using the host network namespace but my own mount namespace!

CSS units

### CSS has 2 kinds of units: absolute & relative absolute: - px - pt - pc - in - cm - mm relative - em - rem - vw - vh - % ### `rem` the root element's font size `1rem` is the same everywhere in the document. `rem` is a good unit for setting font sizes! ### `em` the parent element's font size ``` .child { font-size: 1.5em; } ``` Illustration of a box labelled "parent". Inside it is a box labelled, in larger text, "child". An arrow is pointing to the "child" text, labelled "font size is 1.5 x parent". ### 0 is the same in all units ``` .btn { margin: 0; } ``` also, `0` is different from `none`. `border: 0` sets the border width and `border: none` sets the style ### 1 inch = 96 px on a screen, 1 CSS "inch" isn't really an inch, and 1 CSS "pixel" isn't really a screen pixel. look up "device pixel ratio" for more. ### rem & em help with accessibility ``` .modal { width: 20rem; } ``` this scales nicely if the user increases their browser's default font size

centering in CSS

### center text with `text-align` ``` h2 { text-align: center; } ``` ### center block elements with `margin: auto` example HTML: ``` <div class="parent"> <div class="child"> </div> </div> ``` ### `margin: auto` only centers horizontally ``` .child { width: 400px; margin: auto; } ``` Illustration of a smaller box, labelled "child", inside a larger box. The child box is at the top of the larger (parent) box. An arrow pointing to the child box is labelled "not centered vertically!" ### vertical centering is easy with flexbox or grid A spiky box labelled "TRY ME" here's how with grid: ``` .parent { display: grid; place-items: center; } ``` and with flexbox: ``` .parent { display: flex; } .child { margin: auto; } ``` ### it's ok to use a flexbox or grid just to center one thing Illustration of a smaller box nested inside a larger box. The larger box is labelled ".parent `(display: grid)`" and the smaller box is labelled ".child (centered!)"

less

### less is a pager that means it lets you view (not edit) text files or piped in text. man uses your pager (usually `less`) to display man pages ### many vim shortcuts work in less - `/` search - `n/N` next/prev match - `j/k` down / up a line - `m/'` mark/return to line - `g`(`gg` in vim)/`G` beginning /end of file ### less -r displays bash escape codes as colours try `ls --color | less -r` with `-r`: - `a.txt` - `a.txt.gz` (red, bold) without `-r` - `a.txt` ESC[0m ESC[01;31ma.txt.gz ESC[0m (ugh) ### q quit (smiley face) ### v (lowercase) edit file in your $EDITOR ### arrow keys, Home / End, PgUp, PgDn work in less ### F press F to keep reading from the file as it's updated (like `tail -f`) press Ctrl+C to stop reading updates ### + `+` runs a command when less starts - `less +F` : follow updates - `less +G`: start at end of file - `less +20%`: start 20% into file - `less +/foo`: search for 'foo' right away

netcat

HAVING

person: every user has a different email right? 1 query later... person, now sad: oh no This query uses `HAVING` to find all emails that are shared by more than one user: ``` SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1 ``` users: id 1, email asdf@fake.com id 2, email bob@builder.com id 3, email asdf@fake.com query output: email asdf@fake.com, `COUNT`(*) 2 `HAVING` is like `WHERE`, but with 1 difference: `HAVING` filters rows AFTER grouping and `WHERE` filters rows BEFORE grouping. Because of this, you can use aggregates (like `COUNT` (*)) in a `HAVING` clause but not with `WHERE`. Here's another `HAVING` example that finds months with more than $6.00 in income: ``` SELECT month FROM sales GROUP BY month HAVING SUM(price) > 6 ``` sales: month: Jan. item: catnip price: 5 month: Feb item: laser price: 8 month: March item: food price: 4 month: March item: food price: 3 query output: month: Feb month: March
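
If you want to try the duplicate-email query yourself, here's a small sketch using Python's built-in `sqlite3` module and the same 3 users as the example. The table layout is assumed from the example rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "asdf@fake.com"), (2, "bob@builder.com"), (3, "asdf@fake.com")],
)

# HAVING filters the groups after GROUP BY has built them
duplicates = conn.execute(
    "SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)
```

Moving the `COUNT(*) > 1` condition into a `WHERE` clause makes SQLite reject the query, because the groups don't exist yet at that step.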

sed

### sed is most often used for replacing text in a file `$ sed s/cat/dog/g file.txt` "cat" can be a regular expression ### change a file in place with -i person: in GNU sed it's -i, in BSD sed, -i SUFFIX. confuses me every time. ### Some more sed incantations... ### sed -n 12p print 12th line -n suppresses output so only what you print with 'p' gets printed ### sed 5d delete 5th line ### sed /cat/d delete lines matching /cat/ ### sed -n 5,30p print lines 5-30 ### sed s+cat/+dog/+ ('+' can be any character) Use + as a regex delimiter person: way easier than escaping /s like s/cat\//dog\//! ### sed -n s/cat/dog/p only print changed lines. ### sed G double space a file (good for long error lines) ### sed '/cat/a dog' append 'dog' after lines containing 'cat' ### sed '17i panda' insert "panda" on line 17

write for one person

ngrep

### like grep for your network (network is surrounded with glowy lines) `$ sudo ngrep GET` will find every plaintext HTTP GET request ### ngrep syntax ``` $ ngrep [options] [regular expression] [BPF filter] ``` ("regular expression" is what to search packets for) "BPF filter" uses the same format as tcpdump! ### panel 3 Illustration of a smiling stick figure with curly hair. person: I started using `ngrep` when I was intimidated by tcpdump and I found it easier (heart) ### -d is for device which network interface to use. same as tcpdump's `-i` (try `-d any`!) ### -W byline prints line breaks as line breaks, not "\n". Nice when looking at HTTP requests ### -I file.pcap -O file.pcap read/write packets from/to a pcap file

du & df

floating point

cgroups

### processes can use a lot of memory process 1: I want 10 GB of memory process 2: me too! Linux: guys, I only have 16 GB total ### a cgroup is a group of processes every process in a container is in the same cgroup ### cgroups have memory/CPU limits Linux: you three get 500 MB of RAM to share, okay? ### use too much memory: get OOM ("out of memory") killed process: I want 1 GB of memory Linux: NOPE your limit was 500 MB you die now! process, dead: oh no ### use too much CPU: get slowed down process: I want to use ALL THE CPU! Linux: you hit your quota for this 100ms period, you'll have to wait ### cgroups track memory & CPU usage Linux: that cgroup is using 112.3 MB of memory right now you can see it in `/sys/fs/cgroup`

shared libraries

### panel 1: Most programs on Linux use a bunch of C libraries. Some popular libraries: openssl (for SSL!) sqlite (embedded db!) zlib (gzip!) libpcre (regular expressions!) libstdc++ (C++ standard library!) ### panel 2: There are 2 ways to use any library: 1. Link it into your binary your code | zlib | sqlite (big binary with lots of things!) and 2. Use separate shared libraries your code zlib sqlite (all different files) ### panel 3: Programs like this your code | zlib | sqlite are called "statically linked" programs like this your code zlib sqlite are called "dynamically linked" ### panel 4: person 1: how can I tell what shared libraries a program is using? person 2: ldd!! ```$ ldd /usr/bin/curl libz.so.1 => /lib/x86-64... libresolv.so.2 =>.... libc.so.6 =>... ``` +34 more ### panel 5: person 1: I got a "library not found" error when running my binary?! person 2: If you know where the library is, try setting the ```LD_LIBRARY_PATH``` environment variable dynamic linker: ```LD_LIBRARY_PATH``` tells me where to look! ### panel 6: Where the dynamic linker looks 1. ```DT_RPATH``` in your executable 2. ```LD_LIBRARY_PATH``` 3. ```DT_RUNPATH``` in executable 4. ```/etc/ld.so.cache``` (run ```ldconfig -p``` to see contents) 5. ```/lib, /usr/lib```

take on hard projects

To wrap up, let's talk about one last wizard skill: confidence. When there's a hard project, sometimes I think: maybe someone better than me should work on this? and I imagine this magical human: - codes really fast - knows everything about every technology - understands the business well - great communicator - has time for the project - 20 years of experience But in programming: - we're changing the tech we use all the time. - every project is different, and it's rarely obvious how to do it. - there aren't many experts, and they certainly don't have time to do everything. So instead, we have me: - learns fast - works hard - 6 years of experience - good at debugging I figure "someone's gotta do this", write down a plan, and get started! A lot of the time, it turns out well. I learn something and feel a little more like a WIZARD.

anatomy of a http request

HTTP requests always have: - a domain (like `examplecat.com`) - a resource (like `/cat.png`) - a method (`GET`, `POST`, or something else) - headers (extra information for the server) There's an optional request body. `GET` requests usually don't have a body, and `POST` requests usually do. This is an HTTP/1.1 request for `examplecat.com/cat.png`. It's a `GET` request, which is what happens when you type a URL in your browser. It doesn't have a body. ``` GET /cat.png HTTP/1.1 Host: examplecat.com User-Agent: Mozilla... Cookie: ..... ``` `GET` = method (usually GET or POST) `/cat.png` = resource being requested `HTTP/1.1` = HTTP version `Host: examplecat.com` = domain being requested (a header) `User-Agent: Mozilla...` = header `Cookie: .....` = header Here's an example POST request with a JSON body: ``` POST /add_cat HTTP/1.1 Host: examplecat.com Content-Type: application/json Content-Length: 20 {"name": "mr darcy"} ``` `POST` = method `Host: examplecat.com` = header `Content-Type: application/json` = content type of body (a header) `Content-Length: 20` = header `{"name": "mr darcy"}` = request body: the JSON we're sending to the server
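
Here's a rough sketch of how the POST request above could be assembled by hand in Python. The `build_post_request` helper is a made-up name for illustration (real code would normally use an HTTP library), but it shows where each piece goes and why `Content-Length` is 20.

```python
import json


def build_post_request(host, path, payload):
    """Assemble a raw HTTP/1.1 POST request with a JSON body (illustrative helper)."""
    body = json.dumps(payload)
    body_bytes = body.encode("utf-8")
    lines = [
        f"POST {path} HTTP/1.1",                  # method, resource, HTTP version
        f"Host: {host}",                          # the domain goes in a header
        "Content-Type: application/json",         # content type of the body
        f"Content-Length: {len(body_bytes)}",     # body length in bytes
        "",                                       # blank line: end of headers
        body,                                     # the request body
    ]
    return "\r\n".join(lines)


request = build_post_request("examplecat.com", "/add_cat", {"name": "mr darcy"})
print(request)
```

The body is counted in bytes after encoding, which is what `Content-Length` measures.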

anatomy of a HTTP response

### HTTP responses have: - a status code (200 OK! 404 not found!) - headers - a body (HTML, an image, JSON, etc) ### Here's the HTTP response from `examplecat.com/cat.txt`: ``` HTTP/1.1 200 OK Accept-Ranges: bytes Cache-Control: public, max-age=0 Content-Length: 33 Content-Type: text/plain; charset=UTF-8 Date: Mon, 09 Sep 2019 01:57:35 GMT Etag: "ac5affa59f554a1440043537ae973790-ssl" Strict-Transport-Security: max-age=31536000 Age: 0 Server: Netlify [ASCII image of a cat, labelled "cat!" with a smiley face] ``` The first line, `HTTP/1.1 200 OK`, has the status code: "200" is the status. The lines from `Accept-Ranges` to `Server` are the headers. The cat picture is the body. ### There are a few kinds of response headers: - when the resource was sent/modified: ``` Date: Mon, 09 Sep 2019 01:57:35 GMT Last-Modified: 3 Feb 2017 13:00:00 GMT ``` - about the response body: ``` Content-Language: en-US Content-Length: 33 Content-Type: text/plain; charset=UTF-8 Content-Encoding: gzip ``` - caching: ``` ETag: "ac5affa..." Vary: Accept-Encoding Age: 255 Cache-Control: public, max-age=0 ``` - security: (see page 25) ``` X-Frame-Options: DENY X-XSS-Protection: 1 Strict-Transport-Security: max-age=31536000 Content-Security-Policy: default-src https: ``` - and more: ``` Connection: keep-alive Accept-Ranges: bytes Via: nginx Set-Cookie: cat=darcy; HttpOnly; expires=27-Feb-2020 13:18:57 GMT; ```
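
To make the three parts concrete, here's a small sketch that splits a raw response into status code, headers, and body. `parse_response` is a hypothetical helper written for this example; it ignores lots of real-world details (repeated headers, chunked bodies, binary data), so treat it as a picture of the structure, not a parser to ship.

```python
def parse_response(raw):
    """Split a raw HTTP/1.1 response string into (status code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")        # blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), reason, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain; charset=UTF-8\r\n"
    "Server: Netlify\r\n"
    "\r\n"
    "(cat picture goes here)"
)
code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"])
```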

why containers?

dig

my rules for simple JOINs

http status codes

Every HTTP response has a status code. browser, optimistically: `GET /cat.png` (request) server, sadly: 404 not found (404 is the status code!) There are 50ish status codes but these are the most common ones in real life: 2xxs mean ★Success★ - 200 OK 3xxs aren't errors, just redirects to somewhere else - 301 Moved Permanently - 302 Found: temporary redirect - 304 Not Modified: the client already has the latest version, "redirect" to that 4xx errors are generally the client's fault: it made some kind of invalid request - 400 Bad Request - 403 Forbidden: API key/OAuth/something needed - 404 Not Found: we all know this one :) - 429 Too Many Requests: you're being rate limited 5xx errors generally mean something's wrong with the server. - 500 Internal Server Error: the server code has an error - 503 Service Unavailable: could mean nginx (or whatever proxy) couldn't connect to the server - 504 Gateway Timeout: the server was too slow to respond

sort & uniq

how indexes make your queries fast

By default, if you run `SELECT * FROM cats WHERE name = 'mr darcy'` the database needs to look at every single row to find matches. database, sad: reading 30 GB of data from disk takes like 60 seconds by itself, you know! (at 500 MB/s SSD speed) Indexes are a tree structure that makes it faster to find rows. Here's what an index on the 'name' column might look like. root node: a-z. its 60 children: aaron to ahmed, ..., molly to nasir, ..., waseem to zahra. children of "aaron to ahmed": aaron to abdullah, ..., agnes to ahmed. database indexes are b-trees and the nodes have lots of children (like 60) instead of just 2. log<sub>60</sub>(1,000,000,000) ≈ 5.06 This means that if you have 1 billion names to look through, you'll only need to look at maybe 5 nodes in the index to find the name you're looking for (5 is a lot less than 1 billion!!!). person 1: are you saying indexes can make my queries 1,000,000x faster? person 2: yes! actually some queries on large tables are basically impossible (or would take weeks) without using an index!
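
The arithmetic above is easy to check yourself. This sketch just redoes the two back-of-the-envelope calculations from this page; `index_depth` is a name I made up, and 60 children per node is the page's rough figure, not a property of any particular database.

```python
import math


def index_depth(num_rows, children_per_node=60):
    """Roughly how many levels a b-tree needs to hold num_rows entries."""
    return math.log(num_rows) / math.log(children_per_node)


depth = index_depth(1_000_000_000)
print(round(depth, 2))            # levels of the tree to walk for 1 billion names

# full table scan: 30 GB read at 500 MB/s
full_scan_seconds = 30_000 / 500
print(full_scan_seconds)
```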

### panel 1 two stick figures talking. the first one is bald and looks unhappy. the second one has short curly hair and is smiling. person 1: I can't start my server because it says something is using port 8080! person 2: 1. Use ss ("socket statistics") to find the process ID using the port 2. Kill the other process! ### * tuna, please! * `$ ss -tunapl` (the 'a' here doesn't do anything) This is my favourite way to use ss! It shows all the running servers. ### -n use numeric ports (80 not http) ### -p show PIDs using the socket ### TONS of information -i -m -o (-i is in a spiky bubble, -m is in a cloud bubble, and -o is in a heart) ### which sockets ss shows listening or connections (non-listening/established)? default: connections -l: listening -a: both which protocols? default: all -t: TCP -u: UDP -x: unix domain sockets ### netstat `netstat -tunapl` and `ss -tunapl` do the same thing. netstat is older and more complicated. If you're learning now I'd recommend ss!

nmap

### nmap lets you explore a network which ports are open? what hosts are up? security people use it a lot! ### find which hosts are up `$ nmap -sn 192.168.1.0/24` `192.168.1.0/24` is my home network. `-sn` means "ping scan" (not `-s -n`, it's `-sn`): it just finds hosts by pinging every one, doesn't port scan ### aggressive scan `$ nmap -v -A scanme.nmap.org` `-A` = aggressive: ports, server version, even OS ### -Pn skip doing a ping scan and assume every host is up. good if hosts block ping (lots do) ### fast port scan `$ nmap -sS -F 192.168.1.0/24` just sends a SYN packet to check if each port is open. I found out which ports my printer has open! ``` 80 http 443 https 515 printer 631 ipp 9100 jetdirect ``` ### -F scan fewer ports: just the most common ones ### -T4 or -T5 scan faster by timing out more quickly ### ♡ check TLS version and ciphers ♡ check if your server still supports old TLS versions ``` $ nmap --script ssl-enum-ciphers -p 443 wizardzines.com ``` list all scripts with: `$ nmap --script-help '*'`

lsof

man page sections

man pages are split up into 8 sections (1 2 3 4 5 6 7 8). `$ man 2 read` means "get me the man page for `read` from section 2". There's both - a program called "read" - and a system call called "read" So `$ man 1 read` gives you a different man page from `$ man 2 read` If you don't specify a section, man will look through all the sections & show the first one it finds. ### man page sections 1. programs `$ man grep` `$ man ls` 2. system calls `$ man sendfile` `$ man ptrace` 3. C functions `$ man printf` `$ man fopen` 4. devices `$ man null` for /dev/null docs 5. file formats `$ man sudoers` for `/etc/sudoers` `$ man proc` files in `/proc`! 6. games not super useful. `$ man sl` is my favourite from that section 7. miscellaneous explains concepts! `$ man 7 pipe` `$ man 7 symlink` 8. sysadmin programs `$ man apt` `$ man chroot`

CSS selectors

### panel 1 Illustration of a smiling stick figure with curly hair. person: now that we have the right attitude, let's move on to how CSS actually works! ### div matches `div` elements `<div>` ### #welcome `#` matches elements by `id` `<div id="welcome">` ### .button matches elements by `class` `<a class="button">` ### div .button match every `.button` element that's a descendant of a `div` ### div.button match divs with class "`button`" `<div class="button">` ### div > .button match every `.button` element that's a direct child of a `div` ### .button, #welcome matches both `.button` and `#welcome` elements ### a[href^="http"] match `a` elements with a `href` attribute starting with `http` ### a:hover matches `a` elements that the cursor is hovering over ### :checked matches if a checkbox or radio button is checked ### tr:nth-child(odd) match every other `tr` child of a parent element

Cookies are a way for a server to store a little bit of information in your browser. They're set with the `Set-Cookie` response header, like this: ### first request: server sets a cookie browser, represented by a box with a smiley face: `GET /my-cats` server, also represented by a box with a smiley face: ``` 200 OK Set-Cookie: user=b0rk; HttpOnly <response body> ``` (`user` is the name, `b0rk` is the value. `HttpOnly` is a cookie option (expiry goes here)) ### Every request after: browser sends the cookie back browser: ``` GET /my-cats Cookie: user=b0rk ``` server, thinking: oh, this is b0rk! I don't need to ask them who they are then! Cookies are used by many websites to keep you logged in. Instead of `user=b0rk` they'll set a cookie like `sessionid=long-incomprehensible-id`. This is important because if they just set a simple cookie like `user=b0rk`, anyone could pretend to be b0rk by setting that cookie! Designing a secure login system with cookies is quite difficult. To learn more about it, google "OWASP Session Management Cheat Sheet".
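
Here's a small sketch of both halves of that exchange using Python's standard `http.cookies` module: the server building a `Set-Cookie` header, and the server later parsing the `Cookie` header the browser sends back. It reuses the `user=b0rk` example from above, which (as the page says) is not how a real login system should identify people.

```python
from http.cookies import SimpleCookie

# First request: the server builds a Set-Cookie response header.
response_cookie = SimpleCookie()
response_cookie["user"] = "b0rk"
response_cookie["user"]["httponly"] = True      # the HttpOnly option
set_cookie_header = response_cookie.output()
print(set_cookie_header)

# Every request after: the browser sends "Cookie: user=b0rk" and the server parses it.
request_cookie = SimpleCookie()
request_cookie.load("user=b0rk")
user = request_cookie["user"].value
print(user)
```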

CSS testing checklist

Finally, it's important to test your site with different browsers, screen sizes, and accessibility evaluation tools. ### browsers - Chrome - Safari - Firefox - maybe others! ### sizes - small phone (300px wide) - tablet (~700px) - desktop (~1200px) ### accessibility - colour contrast - text size - keyboard navigation - works with a screen reader ### performance - fake a slow/high latency network connection! Illustration of a smiling stick figure with curly hair. person: the most important thing is to know your users! Check your analytics: if 10% of your users are using IE, test your site on IE!

copy on write

### On Linux, you start new processes using the fork() or clone() system call. calling fork creates a child process that's a copy of the caller ### the cloned process has EXACTLY the same memory. - same heap - same stack - same memory maps if the parent has 3GB of memory, the child will too. ### copying all that memory every time we fork would be slow and a waste of RAM often processes call `exec` right after `fork`, which means they don't use the parent process's memory basically at all! ### so Linux lets them share physical RAM and only copies the memory when one of them tries to write process: I'd like to change that memory Linux: okay! I'll make you your own copy! ### Linux does this by giving both the processes identical page tables (same RAM), but it marks every page as read only. ### when a process tries to write to a shared memory address: 1. there's a page fault 2. Linux makes a copy of the page & updates the page table 3. the process continues, blissfully ignorant process, happily: It's just like I have my own copy

the OOM killer

CPU scheduling

HTTP/2

HTTP/2 is a new version of HTTP. Here's what you need to know: ### A lot isn't changing All the methods, status codes, request/response bodies, and headers mean exactly the same thing in HTTP/2. before (HTTP/1.1): ``` method: GET path: /cat.gif headers: - Host: examplecat.com - User-Agent: curl ``` after (HTTP/2): ``` method: GET path: /cat.gif authority: examplecat.com headers: - User-Agent: curl ``` one change: Host header => authority ### HTTP/2 is faster Even though the data sent is the same, the way HTTP/2 sends it is different. The main differences are: - It's a binary format (it's harder to ```tcpdump``` traffic and debug) - Headers are compressed - Multiple requests can be sent on the same connection at a time before (HTTP/1.1): → request 1 response 1 ← → request 2 response 2 ← after (HTTP/2): → request 1 → request 2 response 2 ← response 1 ← (out of order is ok) (one TCP connection) All these changes together mean that HTTP/2 requests often take less time than the same HTTP/1.1 requests. ### Sometimes you can switch to it easily A lot of software (CDNs, nginx) lets clients connect with HTTP/2 even if your server still only supports HTTP/1.1. 1. Firefox to CDN: HTTP/2 request 2. CDN to your server: HTTP/1.1 request 3. your server to CDN: HTTP/1.1 response 4. CDN to Firefox: HTTP/2 response

tar

### panel 1 The tar file format combines many files into one file. a.txt b.txt dir/c.txt tar files aren't compressed by themselves. Usually you gzip them: .tar.gz or .tgz! ### panel 2: Usually when you use the 'tar' command, you'll run some incantation. To unpack a tar.gz, use: ```tar -xzf file.tar.gz``` person 1: what's xzf? person 2: let's learn! ### panel 3: -x is for extract into the current directory by default (change with -C) ### panel 4: -c is for create makes a new tar file! ### panel 5: -t is for list lists the contents of a tar archive ### panel 6: -f is for file which tar file to create or unpack ### panel 7: tar can compress / decompress -z gzip format (.gz) -j bzip2 format (.bz2) -J xz format (.xz) & more! see the man page ### panel 8: putting it together list contents of a .tar.bz2: ```$ tar tvf file.tar.bz2 ``` v = verbose create a .tar.gz: ```$ tar -czf file.tar.gz dir/``` dir/ = files to go in the archive
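
If you'd rather see the same three operations (create, list, extract) from code, here's a sketch using Python's standard `tarfile` module. The file and directory names are made up and everything happens in a temporary directory. The comments show roughly which `tar` incantation each step corresponds to.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "dir")
os.makedirs(src)
with open(os.path.join(src, "c.txt"), "w") as f:
    f.write("hello\n")

archive = os.path.join(workdir, "file.tar.gz")

# roughly: tar -czf file.tar.gz dir/
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src, arcname="dir")

# roughly: tar -tzf file.tar.gz
with tarfile.open(archive, "r:gz") as tar:
    names = tar.getnames()
print(names)

# roughly: tar -xzf file.tar.gz -C out/
out = os.path.join(workdir, "out")
os.makedirs(out)
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(out)
```

Only extract archives you trust: a tar file can contain paths that point outside the target directory.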

Sometimes queries run slowly, and `EXPLAIN` can tell you why! 2 ways you can use `EXPLAIN` in PostgreSQL: (other databases have different syntax for this) 1. Before running the query (`EXPLAIN SELECT ... FROM ...`) This calculates a query plan but doesn't run the query. I always run EXPLAIN on a query before running it on my production database. I won't risk overloading the database with a slow query! 2. After running the query (`EXPLAIN ANALYZE SELECT ... FROM ...`) person 1: why is my query so slow? person 2: `EXPLAIN ANALYZE` runs the query and analyzes why it was slow Here are the EXPLAIN ANALYZE results from PostgreSQL for the same query run on two tables of 1,000,000 rows: one table that has an index and one that doesn't `EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1` unindexed table: ``` Seq Scan on users Filter: (id = 1) Rows Removed by Filter: 999999 Planning time: 0.185 ms Execution time: 179.412 ms ``` "Seq Scan" means it's looking at each row (slow!) indexed table: ``` Index Only Scan using users_id_idx on users Index Cond: (id = 1) Heap Fetches: 1 Planning time: 3.411 ms Execution time: 0.088 ms ``` the query runs about 50 times faster with an index (counting planning time)
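
You can try the same experiment without a PostgreSQL server. This sketch swaps in SQLite (via Python's built-in `sqlite3` module), which has its own syntax, `EXPLAIN QUERY PLAN`, and prints different plan text than PostgreSQL. The table contents and the `query_plan` helper are made up for the example.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's plan description for a query (the last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1000)],
)

sql = "SELECT * FROM users WHERE id = 1"
plan_without_index = query_plan(conn, sql)   # a full scan of the table

conn.execute("CREATE INDEX users_id_idx ON users (id)")
plan_with_index = query_plan(conn, sql)      # a search using the new index

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan varies between SQLite versions, but the first plan should mention a scan and the second should mention `users_id_idx`.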

cat

### cat concatenates files `$ cat myfile.txt` prints contents of myfile.txt `$ cat *.txt` prints all .txt files put together! ### you can use cat as an EXTREMELY BASIC text editor: 1. Run `$ cat > file.txt` 2. type the contents (don't make mistakes (smiley face)) 3. press ctrl+d to finish ### cat -n prints out the file with line numbers! 1. Once upon a midnight.. 2. Over many a quaint. 3. While I nodded, nearly ### zcat cats a gzipped file! Actually just a 1-line shell script that runs `gzip -cd`, but easier to remember. ### tee `tee file.txt` will write its stdin to both stdout and file.txt `stdin` > `tee a.txt` > `stdout` and `a.txt` ### how to redirect to a file owned by root `$ sudo echo "hi" >> x.txt` this will open x.txt as your user, not as root, so it fails! `$ echo "hi" | sudo tee -a x.txt` will open x.txt as root (smiley face)

find

### find searches a directory for files `find /tmp -type d -print` `/tmp`: directory to search `-type d`: which files `-print`: action to do with the files These are my favourite find arguments! ### -name/-iname the filename! (`-iname` is case insensitive) eg `-name '*.txt'` ### -path/-ipath search the full path! `-path '/home/*/*.go'` ### -type [TYPE] f: regular file d: directory l: symlink and more! ### -maxdepth NUM only descend NUM levels when searching a directory. ### -size 0 find empty files! Useful to find files you created by accident ### -exec COMMAND action: run COMMAND on every file found ### -print0 print null-separated filenames. Use with `xargs -0`! ### -delete action: delete all files found ### locate The locate command searches a database of every file on your system. good: faster than find bad: can get out of date ### $ sudo updatedb updates the database

debugging is hard. take breaks.

segmentation faults

CSS isn't easy

### CSS seems simple at first ``` h2 { font-size: 22px; } ``` Illustration of a smiling stick figure with curly hair. person: ok this is easy! ### and it is easy for simple tasks image of a page with header and text underneath a layout like this is simple to implement! ### but website layout is not an easy problem image of a page with a logo, header, text, sidebar, and multiple images this needs to adjust to so many screen sizes! ### the spec can be surprising TRY ME! CSS 2.1: setting `overflow: hidden;` on an inline-block element changes its vertical alignment Illustration of a stick figure with curly hair, looking worried. person: weird! ### and all browsers have bugs Safari: I don't support flexbox for `<summary>` elements person: ok fine ### accept that writing CSS is gonna take time person: if I'm patient I can fix all the edge cases in my CSS and make my site look great everywhere!

page faults

network protocols

bash parameter expansion

### panel 1: `${...}` is really powerful person: "it can do a lot of string operations, my favourite is search/replace" ### panel 2: `${var}` same as `$var` ### panel 3: `${#var}` length of the string or array `var` example: ``` $ x=panda $ echo ${#x} 5 ``` ### panel 4: `${var/bear/panda}` search & replace. Example: ``` $ x="I'm a bearbear!" $ echo ${x/bear/panda} # replace 1 instance of 'bear' I'm a pandabear! $ echo ${x//bear/panda} # replace every instance of 'bear' I'm a pandapanda! ``` ### panel 5: `${var:-othervar}` use a default value if `var` is unset/null Example: ``` echo ${asdf:-some default value} ``` ### panel 6: `${var:?some error}` prints "some error" and exits if `var` is null or unset ### panel 7: `${var#pattern}` and `${var%pattern}` remove the prefix/suffix `pattern` from `var`. Example: ``` $ x=motorcycle.svg $ echo "${x%.svg}" motorcycle ``` ### panel 8: `${var:offset:length}` get a substring of `var`. Example: ``` $ x='panda bear time' $ echo ${x:6:4} bear ``` ### panel 9 person: "there are LOTS more, look up 'bash parameter expansion'!"

COALESCE

single quote your strings

mmap

### what's mmap for? person 1: I want to work with a VERY LARGE FILE but it won't fit in memory person 2: You could try mmap! (mmap = "memory map") ### load files lazily with mmap When you mmap a file, it gets mapped into your program's memory. 2 TB file: 2 TB of virtual memory but nothing is ACTUALLY read into RAM until you try to access the memory. (how it works: page faults!) ### how to mmap in Python ``` import mmap f = open("HUGE.txt", "rb") mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) ``` (this won't read the file from disk! Finishes ~instantly.) `print(mm[-1000:])` this will read only the last 1000 bytes! ### sharing big files with mmap three processes: we all want to read the same file! mmap: no problem! Even if 10 processes mmap a file, it will only be read into memory once ### dynamic linking uses mmap program: I need to use libc.so.6 (standard library) ld dynamic linker: you too eh? no problem. I always mmap, so that file is probably loaded into memory already. ### anonymous memory maps - not from a file (memory set to 0 by default) - with `MAP_SHARED`, you can use them to share memory with a subprocess!
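
Here's a version of the Python example you can actually run without owning a 2 TB file. It writes a small temporary file (the contents and size are made up for the demo), maps it read-only, and slices the end of it the same way the example above does.

```python
import mmap
import os
import tempfile

# Make a stand-in for HUGE.txt: 100,000 filler bytes plus a recognisable ending.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"x" * 100_000 + b"THE END")

with open(path, "rb") as f:
    # Length 0 means "map the whole file"; ACCESS_READ makes the mapping read-only.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    size = len(mm)       # the mapping is as big as the file
    tail = mm[-7:]       # slicing touches only the pages that hold these bytes
    mm.close()

os.remove(path)
print(size, tail)
```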

processes

libc

when debugging, your attitude matters

overlay filesystems

what's a shell?

debugging tips: check your assumptions

inodes

pipes

learn one thing at a time

ping

### ping checks if you can reach a host and how long the host took to reply `$ ping health.gov.au` output: `... time=253ms...` Australia is 17,000 km from me. at the speed of light it's still far! ### ping works by sending an ICMP packet and waiting for a reply ping: to: health.gov.au hello! health.gov.au: I'm here! ### myth: if a host doesn't reply to ping, that means it's down Some hosts never respond to ICMP packets. This is why traceroute shows "..." sometimes. ping: hello! host (thinking): not listening!! ### traceroute tells you the path a packet takes to get to a destination me → my ISP → NYC → Sacramento → Australia ### example traceroute `$ traceroute health.gov.au` `1: 192.168.1.1 3ms` ← router `2:...yul.ebox.ca 12 ms` ← ISP `...` `8: NYC4.ALTER.NET 24 ms` `9: SAC1.ALTER.NET 97 ms` ← here the packet crossed the USA, from NYC → Sacramento! (crossing the US takes time) `...` `16: health.gov.au 253ms` ### mtr like traceroute, but nicer output! try it! ### last panel look up how traceroute works (using TTLs!) it's simple + cool!

kill

### kill doesn't just kill programs you can send ANY signal to a program with kill! `$ kill -SIGNAL PID` (name or number) ### which signal kill sends name num ``` kill => SIGTERM 15 kill -9 => SIGKILL 9 kill -KILL => SIGKILL 9 kill -HUP => SIGHUP 1 kill -STOP => SIGSTOP 19 ``` ### kill -l lists all signals. 1. HUP 2. INT 3. QUIT 4. ILL 5. TRAP 6. ABRT 7. BUS 8. FPE 9. KILL 10. USR1 11. SEGV 12. USR2 13. PIPE 14. ALRM 15. TERM 16. STKFLT 17. CHLD 18. CONT 19. STOP 20. TSTP 21. TTIN 22. TTOU 23. URG 24. XCPU 25. XFSZ 26. VTALRM 27. PROF 28. WINCH 29. POLL 30. PWR 31. SYS ### killall -SIGNAL NAME signals all processes called NAME for example: `$ killall firefox` useful flags: -w: wait for all signaled processes to die -i: ask before signalling ### pgrep prints PIDs of matching running programs pgrep fire matches firefox firebird NOT bash firefox.sh To search the whole command line (eg bash firefox.sh), use `pgrep -f` ### pkill same as pgrep, but signals PIDs found. Example: `$ pkill -f firefox` I use pkill more than killall these days.
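
The same thing `kill` does is available from most languages. Here's a sketch in Python, assuming a Unix-like system: it starts a throwaway child process (a made-up `sleep` stand-in), sends it SIGTERM the way a plain `kill PID` would, and looks at how the child ended.

```python
import os
import signal
import subprocess
import sys

# Start a child process that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Same as `kill PID` on the command line: send SIGTERM (signal 15).
os.kill(child.pid, signal.SIGTERM)

# On Unix, a negative return code means "killed by that signal number".
exit_code = child.wait(timeout=10)
print(exit_code, signal.Signals(-exit_code).name)
```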

terminals

a branch is a pointer to a commit

A branch in git is a pointer to a commit SHA master → 2e9fab awesome-feature → 3bafea fix-typo → 9a9a9a Here's some proof! In your favourite git repo, run this command: ```$ cat .git/refs/heads/master``` "master" is just a text file with the commit SHA master points at! Understanding what a branch is will make it WAY EASIER to fix your branches when they're broken: you just need to figure out how to get your branch to point at the right commit again! 3 main ways to change the commit a branch points to: - ```git commit``` will point the branch at the new commit - ```git pull``` will point the branch at the same commit as the remote branch - ```git reset COMMIT_SHA``` will point the branch at ```COMMIT_SHA```
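
To underline how little magic is involved, here's a sketch that reads a branch the same way `cat` does. It builds a fake `.git/refs/heads` layout in a temporary directory with a made-up SHA, so `read_branch` and the directory contents are illustrative only. In a real repo git sometimes stores refs in `.git/packed-refs` instead, so the per-branch file isn't always there.

```python
import os
import tempfile


def read_branch(repo_dir, branch):
    """Return the commit SHA stored in .git/refs/heads/<branch> (illustrative helper)."""
    ref_path = os.path.join(repo_dir, ".git", "refs", "heads", branch)
    with open(ref_path) as f:
        return f.read().strip()


# Fake repo layout with a made-up 40-character SHA.
repo = tempfile.mkdtemp()
heads = os.path.join(repo, ".git", "refs", "heads")
os.makedirs(heads)
fake_sha = "2e9fab" + "0" * 34
with open(os.path.join(heads, "master"), "w") as f:
    f.write(fake_sha + "\n")

print(read_branch(repo, "master"))
```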

### panel 1: unix programs have 1 input and 2 outputs When you run a command from a terminal, the input & outputs go to/from the terminal by default. Picture of a program (represented by a box with a smiley face) with 1 arrow coming in and 2 arrows out. The arrows are numbered 0, 1, and 2, and there's a comment: "each input/output has a number, its "file descriptor"" **arrow 0 (coming into program): `<` redirects stdin** `wc < file.txt` and `cat file.txt | wc` both send `file.txt` to wc's stdin ``` wc < file.txt cat file.txt | wc ``` **arrow 1 (coming out of program): `>` redirects stdout** ``` cmd > file.txt ``` **arrow 2 (coming out of program): `2>` redirects stderr** ``` cmd 2> file.txt ``` ### panel 2: `2>&1` redirects stderr to stdout ``` cmd > file.txt 2>&1 ``` Illustration of cmd, represented by a box with a smiley face. There is one arrow, labelled "stdout(1)", leading to a box labelled "file.txt". There is a second arrow coming out of cmd, labelled "stderr(2)". Then, there's a squiggly third arrow, labelled "2>&1", that leads from "stderr(2)" to "file.txt". ### panel 3: `/dev/null` your operating system ignores all writes to `/dev/null` ``` cmd > /dev/null ``` picture of stdout going to a trash can (`/dev/null`) and stderr still going to the terminal ### panel 4: sudo doesn't affect redirects your bash shell opens a file to redirect to it, and it's running as you. So ``` $ sudo echo x > /etc/xyz ``` won't work. do this instead: ``` $ echo x | sudo tee /etc/xyz ```
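
The same three redirects show up when you launch a program from code. Here's a sketch with Python's `subprocess` module; `noisy` is a made-up stand-in for `cmd` that prints one line to stdout and one to stderr.

```python
import subprocess
import sys

# A stand-in for `cmd`: writes "out" to stdout (fd 1) and "err" to stderr (fd 2).
noisy = [sys.executable, "-c", "import sys; print('out'); print('err', file=sys.stderr)"]

# Like `cmd > a.txt 2> b.txt`: the two outputs are captured separately.
separate = subprocess.run(noisy, capture_output=True, text=True)

# Like `cmd > file.txt 2>&1`: stderr is sent to the same place as stdout.
merged = subprocess.run(noisy, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)

# Like `cmd > /dev/null`: stdout is thrown away, stderr still comes through.
silenced = subprocess.run(noisy, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True)

print(repr(separate.stdout), repr(separate.stderr))
print(repr(merged.stdout))
print(repr(silenced.stderr))
```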

why updating DNS is slow

shellcheck

### shellcheck finds problems with your shell scripts `$ shellcheck my-script.sh` shellcheck: oops, you can't use in an `if [ ... ]`! ### it checks for hundreds of common shell scripting errors shellcheck: hey, that's a bash-only feature but your script starts with `#!/bin/sh` ### every shellcheck error has a number (like "SC2013") and the shellcheck wiki has a page for every error with examples! I've learned a lot from the wiki. ### it even tells you about misused commands shellcheck: hey, it looks like you're not using `grep` correctly here person: wow I'm not! thanks! ### your text editor probably has a shellcheck plugin shellcheck: I can check your shell scripts every time you save! ### basically, you should probably use it bash has too many weird edge cases for me to remember. I love that shellcheck can help me out!

if you understand a bug, you can fix it

sockets

### networking protocols are complicated book: TCP/IP Illustrated, Volume 1, by Stevens (600 pages) person: what if I just want to download a cat picture? ### Unix systems have an API called the "socket API" that makes it easier to make network connections Unix: you don't need to know how TCP works. I'll take care of it! ### here's what getting a cat picture with the Socket API looks like: 1. Create a socket: `fd = socket(AF_INET, SOCK_STREAM...)` 2. Connect to an IP/port: `connect(fd, 12.13.14.15:80)` 3. Make a request: `write(fd, "GET /cat.png HTTP/1.1...)` 4. Read the response: `cat_picture = read(fd...)` ### Every HTTP library uses sockets under the hood `$ curl awesome.com` Python: `requests.get("yay.us")` (sockets) person: oh, cool, I could write an HTTP library too if I wanted`*`. Neat! `*` SO MANY edge cases though! :) ### AF_INET? What's that? AF_INET means basically "internet socket": it lets you connect to other computers on the internet using their IP address. The main alternative is AF_UNIX ("unix domain socket") for connecting to programs on the same computer. ### 3 kinds of internet (AF_INET) sockets: 1. `SOCK_STREAM` = TCP (curl uses this) 2. `SOCK_DGRAM` = UDP (dig (DNS) uses this) 3. `SOCK_RAW` = just let me send IP packets. I will implement my own protocol. (ping uses this)
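
Here are the four steps above written out with Python's `socket` module. So that it runs without touching the internet, the sketch also starts a tiny pretend server on 127.0.0.1 that answers one request with the word "meow". That server, its canned response, and the `serve_once` helper are all made up for the demo.

```python
import socket
import threading


def serve_once(server):
    """Pretend web server: accept one connection and send a canned response."""
    conn, _ = server.accept()
    with conn:
        conn.recv(1024)  # read the (tiny) request
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\nmeow")


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

# 1. Create a socket
fd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# 2. Connect to an IP/port
fd.connect(("127.0.0.1", port))
# 3. Make a request
fd.sendall(b"GET /cat.png HTTP/1.1\r\nHost: localhost\r\n\r\n")
# 4. Read the response (until the server closes the connection)
chunks = []
while True:
    data = fd.recv(4096)
    if not data:
        break
    chunks.append(data)
response = b"".join(chunks)

fd.close()
server.close()
print(response)
```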

file descriptors

### Unix systems use integers to track open files Process, represented by a box with a smiley face: Open `foo.txt` kernel, also represented by a box with a smiley face: okay! that's file #7 for you. these integers are called file descriptors ### `lsof` (list open files) will show you a process's open files `$ lsof -p 4242` (4242 is the PID we're interested in) FD NAME ``` 0 /dev/pts/tty1 1 /dev/pts/tty1 2 pipe: 29174 3 /home/bork/awesome.txt 5 /tmp/ ``` (FD is for file descriptor) ### file descriptors can refer to: - files on disk - pipes - sockets (network connections) - terminals (like `xterm`) - devices (your speaker! `/dev/null`!) - LOTS MORE (`eventfd`, `inotify`, `signalfd`, `epoll`, etc.) little tiny smiling stick figure: not EVERYTHING on Unix is a file, but lots of things are ### When you read or write to a file/pipe/network connection you do that using a file descriptor person: connect to google.com OS: ok! fd is 5! person: write `GET / HTTP/1.1` to fd #5 OS: done! ### Let's see how some simple Python code works under the hood: Python: ``` f = open("file.txt") f.readlines() ``` Behind the scenes: Python program: open file.txt OS: ok! fd is 4 Python program: read from file #4 OS: here are the contents! ### (almost) every process has 3 standard FDs: - `stdin`: 0 - `stdout`: 1 - `stderr`: 2 "read from stdin" means "read from the file descriptor 0" (could be a pipe or file or terminal)
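
Python lets you work with the raw integers directly through the `os` module, which makes the "behind the scenes" conversation visible. This is a small sketch using a temporary file (the file name and contents are made up). The exact descriptor number you get depends on what else the process has open.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("hello\n")

# "open file.txt" -> the kernel hands back a small integer
fd = os.open(path, os.O_RDONLY)
print("fd is", fd)

# "read from file #fd" -> bytes come back
contents = os.read(fd, 100)
os.close(fd)

# Regular Python file objects wrap a file descriptor too.
with open(path) as f:
    python_fd = f.fileno()

print(contents, python_fd)
```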

HTTP response headers

### Age how many seconds response has been cached ```Age: 355``` ### Date when response was sent ```Date: Mon, 09 Sep 2019...``` ### Last-Modified when content was last modified (not always accurate) ### ETag Version of response body ```Etag: "ac5affa.."``` ### Cache-Control various caching settings ```Cache-Control: max-age=300``` ### Vary request headers that response will vary based on ### Via added by proxy servers ```Via: nginx``` ### Expires The response is stale and should be re-requested after this time. ### Connection "close" or "keep-alive" Whether to keep the TCP connection open ### Set-Cookie Sets a cookie. ```Set-Cookie: name=value; HttpOnly``` ### Access-Control-* Called CORS headers. These allow cross-origin requests. ### Content-Type MIME type of body ```Content-Type: text/plain``` ### Content-Length length of body in bytes ```Content-Length: 33``` ### Content-Language Language of body ```Content-Language: en-US``` ### Content-Encoding Whether body is compressed ```Content-Encoding: gzip``` ### Location URL to redirect to ```Location: /cat.png``` ### Accept-Ranges Whether Range request header is supported for this resource

tcpdump

head & tail

### head shows you the first 10 lines of a file. if you pipe a program's output to head, the program will stop after printing 10 lines (it gets sent SIGPIPE) ### tail tail shows the last 10 lines! `tail -f FILE` will follow: print any new lines added to the end of FILE. Super useful for log files! ### -n NUM -n NUM (either head or tail) will change the # lines shown NUM can also be negative. Example: `$ head -n -5 file.txt` will print all lines except the last 5 ### -c NUM show the first/last NUM bytes of the file `$ head -c 1k` will show the first 1024 bytes ### tail --retry keep trying to open file if it's inaccessible ### tail --pid PID stop when process PID stops running (with `-f`) ### tail --follow=name Usually `tail -f` will follow a file descriptor. `tail --follow=name FILENAME` will keep following the same file name, even if the file descriptor changes
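
If the negative-number behaviour is hard to picture, here's a sketch of `head -n` and `tail -n` as plain Python list slicing. The function names mirror the commands but these are illustrations, not how the real tools are implemented (the real ones stream the file instead of holding every line in memory).

```python
def head(lines, n=10):
    """Like `head -n N`: first N lines, or all but the last |N| lines if N is negative."""
    return lines[:n]


def tail(lines, n=10):
    """Like `tail -n N`: the last N lines."""
    return lines[-n:] if n > 0 else []


lines = [f"line {i}" for i in range(1, 21)]   # a made-up 20-line "file"

print(head(lines, 3))        # the first 3 lines
print(len(head(lines, -5)))  # everything except the last 5 lines
print(tail(lines, 2))        # the last 2 lines
```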

mitmproxy

oh shit! I accidentally committed to the wrong branch!

1. Check out the correct branch `git checkout correct-branch` 2. Add the commit you wanted to it `git cherry-pick COMMIT_ID` (use '`git log wrong-branch`' to find this) `cherry-pick` makes a new commit with the same changes as the original commit, but a different parent 3. Delete the commit from the wrong branch. ``` git checkout wrong-branch git reset --hard HEAD^ ``` be careful when running '`git reset --hard`'! always run '`git status`' first to make sure there aren't uncommitted changes and '`git stash`' to save them if there are

the same-origin policy

### we think of root as being all-powerful... The following items are in spiky bubbles: - edit any file - change network config - spy on any program's memory ### ... but actually to do "root" things, a process needs the right ★capabilities★ Process, represented by a box with a smiley face: I want to modify the route table! Linux, represented by a penguin: you need CAP_NET_ADMIN! ### there are dozens of capabilities Illustration of a smiling stick figure with curly hair. Person: `$ man capabilities` explains all of them but let's go over 2 important ones! ### CAP_SYS_ADMIN lets you do a LOT of things. avoid giving this if you can! ### CAP_NET_ADMIN allow changing network settings ### by default containers have limited capabilities Process: can I call process_vm_readv? Linux: nope! you'd need CAP_SYS_PTRACE for that! ### $ getpcaps PID print capabilities that PID has ### getcap / setcap system calls: get and set capabilities!

HTTP security headers

questions to ask about your data

threads

### Threads let a process do many different things at the same time process: thread 1: I'm calculating ten million digits of π! so fun! thread 2: I'm finding a REALLY BIG prime number! ### threads in the same process share memory thread 1: I'll write some digits of π to 0x129420 in memory thread 2: uh oh! that's where I was putting my prime numbers. ### and they share code calculate-pi find-big-prime-number but each thread has its own stack and they can be run by different CPUs at the same time CPU 1: π thread CPU 2: primes thread ### sharing memory can cause problems (race conditions!) at the same time: memory: 23 thread 1: I'm going to add 1 to that number! thread 2: I'm going to add 1 to that number! RESULT: 24 WRONG. Should be 25! ### why use threads instead of starting a new process? a thread takes less time to create. sharing data between threads is very easy. But it's also easier to make mistakes with threads. thread 1: you weren't supposed to CHANGE that data!
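
One common way to avoid the "add 1 at the same time" problem is a lock, so only one thread can do the read-add-write at a time. Here's a sketch with Python's `threading` module. The function name and the counts are made up, and a lock is just one option among several.

```python
import threading


def count_with_threads(num_threads=2, increments=10_000):
    """Have several threads add 1 to a shared number, protected by a lock."""
    shared = {"value": 0}      # memory that all the threads share
    lock = threading.Lock()

    def add_many():
        for _ in range(increments):
            with lock:         # only one thread at a time may read, add and write
                shared["value"] += 1

    threads = [threading.Thread(target=add_many) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]


total = count_with_threads()
print(total)
```

Without the lock, two threads can both read the old value before either writes the new one, which is exactly the 24-instead-of-25 situation above.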

highlight the main ideas

ethtool

container kernel features

what's your manager's job?

Understanding a little about your manager's job helps you work well with them! Some things your manager is responsible for: Each of these items is enclosed in a thought bubble with an illustration. ### make sure the team is doing important projects Illustration of a smiling stick figure (the manager). manager: X is a priority this quarter! ### keep projects on track Illustration of two smiling stick figures, one with medium length straight hair (the CEO) and another one with no hair (the manager). CEO: what's the status of x project? manager: [needs to answer] ### communicate with other teams Illustration of two smiling stick figures, one with curly hair (person on other team) and another one with no hair (the manager). person on other team: we're doing x manager: our teams should collaborate on that! ### help team members grow Illustration of a smiling stick figure with curly hair. person: I learned so much this year!

debugging tip: code one thing at a time

CSS specificity

### different rules can set the same property which one gets chosen? ``` a:visited { color: purple; font-size: 1.2em; } ``` ``` #start-link { color: orange; } ``` ### CSS uses the "most specific" selector that matches an element In our example, the browser will use `color: orange` because IDs (like `#start-link`) are more specific than pseudoclasses (like `:visited`) ### TRY ME! CSS can mix properties from different rules it'll use this font size: ``` a:visited { color: purple; font-size: 1.2em; } ``` but use this color because `#start-link` is more specific: ``` #start-link { color: orange; } ``` ### how CSS picks the "most specific" rule a selector with element names: ``` body div span a { color: red; } ``` loses to a selector with `.classes` or `:pseudoclasses`: ``` .sidebar .link { color: orange; } ``` loses to a selector with an `#id`: ``` #header a { color: purple; } ``` loses to an inline style: ``` style="color: green;" ``` loses to an `!important` rule: ``` color: blue !important; ``` (`!important` is very hard to override, which makes life hard for your future self!)

containers = processes

### a container is a group of Linux processes Illustration of a smiling stick figure with curly hair. person: on a Mac, all your containers are actually running in a Linux virtual machine ### panel 2 person: I started 'top' in a container. Here's what that looks like in ps: - outside the container ``` $ ps aux | grep top USER PID START COMMAND root 23540 20:55 top bork 23546 20:57 top ``` - inside the container ``` $ ps aux | grep top USER PID START COMMAND root 25 20:55 top ``` (`root 23540 20:55 top` and `root 25 20:55 top` are the same process!) ### container processes can do anything a normal process can... Illustration of a smiling stick figure with curly hair, and Linux, represented by its penguin mascot person: I want my container to do X Y Z W! Linux: sure! your computer, your rules! ### but usually they have restrictions (there are drawings of locks on either side of the word "restrictions") Illustration of a container, represented by a box with a smiley face. Around it are arrows with the following labels: - different PID namespace - different root directory - cgroup memory limit - limited capabilities - not allowed to run some system calls ### the restrictions are enforced by the Linux kernel Linux: NO, you can't have more memory! person: on the next page we'll list all the kernel features that make this work!

writing code with bugs is normal

NULL surprises

NULL isn't equal (or not equal!) to anything in SQL (`x = NULL` and `x != NULL` are never true for any x). This results in 2 behaviours that are surprising at first: ### surprise! `x = NULL` doesn't work fish name: NULL owner: bob name: nemo owner: ahmed ``` SELECT * FROM fish WHERE name = NULL ``` no results! You need to use `x IS NULL` instead. works: `name IS NULL`, `name IS NOT NULL` doesn't work: `name = NULL`, `name != NULL` ### surprise! `name != 'betty'` doesn't match NULLs fish name: NULL owner: bob name: nemo owner: ahmed ``` SELECT * FROM fish WHERE name != 'betty' ``` not matched: name: NULL owner: bob To match NULLs as well, I'll often write something like `WHERE name != 'betty' OR name IS NULL` instead. ### more surprising truths More operations with NULL which might be surprising: 2 + NULL => NULL NULL * 10 => NULL CONCAT('hi', NULL) => NULL NULL = NULL => NULL (NULL isn't even equal to itself!) 2 = NULL => NULL 2 != NULL => NULL

on surviving performance reviews

Performance reviews can be really stressful. Illustration of two stick figures. One has no hair and is smiling, the other one has short curly hair and looks unhappy. person 1: here's the self assessment form to fill out! person 2 (thought bubble): AWESOME PLAN: procrastinate for 2 weeks and then do it at the last minute in a panic! Here's what I've been doing for the last year or so, which has helped! About a month before performance review season comes around, I'll compile a HUGE DOCUMENT with: - every project I did in the last year - the project's goals & results - cool graphs/metrics that show it was a success - what my contributions to the project were - people I've mentored (eg an intern!) - project plans & documentation I've written and send it to my manager. My manager's reaction: Illustration of a smiling stick figure with no hair. THANK YOU! Having all this information makes it really easy for me to explain why your work is so great!

binary search

understand the bug before trying to fix it

CSS grid areas

### panel 1 Illustration of a smiling stick figure with curly hair. person: CSS grid is a big topic, so I just want to show you one of my favourite grid features: areas! ### let's say you want to build a layout Illustration of a long rectangle, labelled "header". Underneath it are two rectangles, side by side, labelled "sidebar" and "content" ### `grid-template-areas` lets you define your layout in an almost visual way ``` grid-template-areas: "header header" "sidebar content" ``` I think of it like this: Illustration of two rectangles side-by-side, both labelled "header". Underneath them are two rectangles, side by side, labelled "sidebar" and "content" ### 1. write your HTML ``` <div class="grid"> <div class="top"></div> <div class="side"></div> <div class="main"></div> </div> ``` ### 2. define the areas ``` .grid { display: grid; grid-template-columns: 200px 800px; grid-template-areas: "header header" "sidebar content"; } ``` ### 3. set grid-area ``` .top {grid-area: header} .side {grid-area: sidebar} .main {grid-area: content} ``` result: Illustration of a long rectangle, labelled "`.top`". Underneath it are two rectangles, side by side, labelled "`.side`" and "`.main`"

HTTP caching headers

set clear expectations

I used to often get stressed out about whether the way I was prioritizing my work was reasonable. Illustration of a stick figure with short curly hair, looking uneasy. person: I'm spending a lot of time on X and no time on Y. I hope that's okay!!!! Everything got easier once I could just: 1. come up with a plan for what to prioritize 2. tell my manager the plan and ask if it sounds good 3. trust them when they say yes Illustration of two stick figures talking. The employee has short curly hair, and the manager has no hair. employee: this quarter I'm planning to get BIG PROJECT done and spend time with my intern. I'm not planning to work on OTHER PROJECT at all. manager: sounds good! Just do X too? Setting expectations is awesome because: - I feel confident that my plans are reasonable - my manager is aware of what I'm planning and can coordinate Everybody wins!!!

position: absolute

### `position: absolute;` doesn't mean absolutely positioned on the page... ``` #star { position: absolute; top: 1em; left: 1em; } ``` doesn't always place the element at the top left of the page! ### ... it's relative to the "containing block" the "containing block" is the closest ancestor with a `position` that isn't `static`, or the body if there isn't one. (`position: static` is the default) Illustration of a larger box, labelled "body", with a smaller box, labelled "`#star`", nested inside it. The smaller box is off-centre within the larger box. The smaller box is labelled "this element has `position: relative` set" ### `top, bottom, left, right` will place an absolutely positioned element ``` top: 50%; bottom: 2em; right: 30px; left: -2in; ``` "`left: -2in;`" is labelled "negative works too" Illustration of two overlapping boxes. The top of the smaller one is halfway down the height of the larger one. The gap between the tops of the two boxes is labelled "50%". The smaller one extends to the left of the larger one, representing "`left: -2in;`", and its right and bottom sides are nested inside the larger one, representing "`right: 30px;`" and "`bottom: 2em;`". ### left: 0; right: 0; != width: 100%; `left: 0; right: 0;` Illustration of two boxes. The smaller box is nested within the larger box. It is the same width as the larger box, and is aligned to the top of it. This illustration is labelled "left and right borders are both 0px away from containing block". `width: 100%;` Illustration of two boxes. The smaller box is nested within the larger box, but its right edge extends past the right edge of the larger box. This illustration is labelled "width is the same as width of containing block". ### absolutely positioned elements are taken out of the normal flow Illustration of two stick figures having a conversation. Person 1: will a parent element expand to fit an absolutely positioned child? Person 2: nope!

seccomp-bpf

shell script arguments

### panel 1: a script's arguments are in `$1`, `$2`, `$3`, etc ``` ./script.sh panda banana ``` `$1` is `"panda"` and `$2` is `"banana"` ### panel 2: arguments are great for making simple scripts Here's a 1-line `svg2png` script that I use to convert SVGs to PNGs: ``` #!/bin/bash inkscape "$1" -b white --export-png="$2" ``` I run it like this: ``` $ svg2png old.svg new.png ``` (arrow pointing to `"$2"`: "always quote your variables!") ### panel 3: get all the arguments with `"${@}"` ``` ls --color "${@}" ``` ### panel 4: you can loop over arguments ``` for i in "${@}" do echo "$i" done ``` ### panel 5: 1 line shell scripts are great person: "I can write a tiny script so I don't have to remember a long command!"
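
The "always quote your variables!" advice is easiest to see with an argument that contains a space. This is a minimal sketch; `count_args`, `forward_quoted` and `forward_unquoted` are hypothetical helper names made up for the illustration.

```bash
#!/bin/bash
# count_args (hypothetical helper): print how many arguments it received.
count_args() {
  echo "$#"
}

# Forward the arguments with "${@}": each argument stays intact.
forward_quoted() {
  count_args "${@}"
}

# Forward the arguments with a bare $@: bash re-splits them on whitespace,
# so an argument containing a space turns into two arguments.
forward_unquoted() {
  count_args $@
}

forward_quoted "my file.txt" banana     # prints 2
forward_unquoted "my file.txt" banana   # prints 3
```

The same thing happens with `"$1"` vs `$1`, which is why the `svg2png` script quotes both of its arguments.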

bash if statements

### the basic syntax ``` if COMMAND then # do thing else # do other thing fi ``` (you need a new line or ; before then) ### `[` vs `[[` there are 2 commands often used in if statements: `[` and `[[` `if [ -e file.txt ]` `/usr/bin/[` (aka `test`) is a program that returns 0 if the test you pass it succeeds `if [[ -e file.txt ]]` `[[` is built into bash. It lets you do tests like `[[ -e x.txt && -e y.txt ]]` that wouldn't work with a command line tool ### `if COMMAND` did `COMMAND` return 0? ### `if ! COMMAND` did `COMMAND` NOT return 0? ### `if true` `true` always returns 0 :) ### `if [ -n "$var" ]` is `$var` nonempty? ### `if [ -e file.txt ]` does `file.txt` exist? ### combine with `&&` and `||` `if [ -e file ] && [ -e file2 ]` ### `if [ -d somedir ]` is `somedir` a directory? ### `if [ -x script.sh ]` is `script.sh` executable? ### `man [` for more you can do a lot!
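
Here's a small sketch that puts several of these tests together in one `if` / `elif` chain. `describe_path` is a hypothetical function name; the tests themselves (`-z`, `-d`, `-e`, `-x`) are standard bash conditionals.

```bash
#!/bin/bash
# describe_path (hypothetical helper): report what kind of thing a path is.
describe_path() {
  local path="$1"
  if [[ -z "$path" ]]; then
    # -z is the opposite of -n: true when the string is empty
    echo "no path given"
  elif [[ -d "$path" ]]; then
    echo "directory"
  elif [[ -e "$path" && -x "$path" ]]; then
    # && inside [[ ]] combines two tests in one command
    echo "executable file"
  elif [ -e "$path" ]; then
    echo "file"
  else
    echo "missing"
  fi
}

describe_path /     # prints "directory"
describe_path ""    # prints "no path given"
```

Every branch is deciding on an exit code: `[[ ... ]]` and `[ ... ]` return 0 when the test passes, which is all `if` looks at.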

window functions

why the same-origin policy matters

debugging tip: change one thing at a time

NULL: unknown or missing

CSS isn't design

### panel 1: web design is really hard Illustration of a stick figure with short curly hair, looking pensive. person (thinking): "wow, forms are way more complicated than I thought" ### panel 2: writing CSS is also hard person (thinking): "ok, how exactly does flexbox work again?" ### panel 3: remember that they're 2 different skills person (thinking): "hmm, I have NO IDEA what I want this site to look like, maybe that's the problem and not CSS" ### panel 4: CSS is easier when you have a good design Illustration of a box with smaller boxes arrayed inside it. person (thinking, and now smiling): "I can make it look like that!" ### panel 5: usually you have to adjust the design person (thinking): "oh right, I didn't think about how that menu should look on desktop" ### panel 6: sketching a design in advance can help! Illustration of a box with text reading "title", and a grid of smaller boxes underneath. even a simple sketch can help you think!

oh shit! I committed something to main that should have been on a brand new branch!

1. Make sure you have main checked out: `git checkout main` 2. Create the new branch: `git branch my-new-branch` 3. Remove the unwanted commit from main: ``` git status git reset --hard HEAD~ ``` (careful!) 4. Check out the new branch! `git checkout my-new-branch` Smiling stick figure with medium length straight hair: `git branch` and `git checkout -b` both create a new branch. The difference is `git checkout -b` also checks out the branch

GROUP BY

what's a header?

amazing debugging features

understand your error messages

HTTP request methods 1

Every HTTP request has a method. It's the first thing in the first line: `GET /cat.png HTTP/1.1` `GET` means it's a `GET` request There are 9 methods in the HTTP standard. 80% of the time you'll only use 2 (`GET` and `POST`). ### `GET` When you type a URL into your browser, that's a `GET` request. examplecat.com/cat.png client, represented by a box with a smiley face: ``` GET /cat.png Host: examplecat.com ``` server, also represented by a box with a smiley face: ``` 200 OK Content-Type: image/png <the cat picture> ``` ### `POST` When you hit submit on a form, that's (usually) a `POST` request. client: ``` POST /add_cat Content-Type: application/json {"name": "mr darcy"} ``` (`POST` requests usually have a request body) server: ``` 200 OK Content-Type: text/html <after sign up page> ``` The big difference between `GET` and `POST` is that `GET`s are never supposed to change anything on the server. ### `HEAD` Returns the same result as GET, but without the response body. client: ``` HEAD /cat.png ``` server: ``` 200 OK Content-Type: image/png ``` (no image, just headers)
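
Since the method is just the first word of the request line, pulling it out takes very little code. This is a rough sketch, not a real HTTP parser; `request_method` is a hypothetical helper that only looks at one already-extracted request line.

```bash
#!/bin/bash
# request_method (hypothetical helper): print the method from an HTTP
# request line of the form "METHOD /path HTTP/version".
request_method() {
  local method path version
  # read splits the line on whitespace into its three parts
  read -r method path version <<< "$1"
  echo "$method"
}

request_method "GET /cat.png HTTP/1.1"    # prints GET
request_method "POST /add_cat HTTP/1.1"   # prints POST
```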

signals

### If you've ever used kill you've used signals person, angrily: DIE!!! process, sad: okay ### the Linux kernel sends processes signals in lots of situations - your child terminated - the timer you set expired - that pipe is closed - illegal instruction - segmentation fault ### you can send signals yourself with the kill system call or command ``` SIGINT Ctrl-C SIGTERM kill SIGKILL kill -9 SIGHUP kill -HUP ``` (various levels of "die") `SIGHUP` is often interpreted as "reload config", e.g. by nginx. ### Every signal has a default action, which is one of: - ignore - kill process - kill process AND make core dump file - stop process - resume process ### Your program can set custom handlers for almost any signal person: `SIGTERM` (terminate) process: okay! I'll clean up and then exit! exceptions: `SIGSTOP` & `SIGKILL` can't be ignored dead program: got `SIGKILL`ed ### Signals can be hard to handle correctly since they can happen at ANY time process: handling a signal person: SURPRISE! another signal!
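
In bash, a custom signal handler is set with the `trap` builtin. Here's a minimal sketch of the "clean up and then exit" idea: a child bash process installs a `SIGTERM` handler and then sends `SIGTERM` to itself. `run_with_handler` is a hypothetical name, and the sketch assumes `bash` is on your `PATH`.

```bash
#!/bin/bash
# run_with_handler (hypothetical demo): start a child bash that handles
# SIGTERM itself instead of taking the default action (being killed).
run_with_handler() {
  bash -c '
    # run this handler when SIGTERM arrives, then exit successfully
    trap "echo got SIGTERM, cleaning up; exit 0" TERM
    # $$ is the PID of this child bash, so it is signalling itself
    kill -TERM $$
    # the handler exits first, so this line is not expected to run
    echo "still running"
  '
}

run_with_handler
```

Swapping `TERM` for `KILL` in the `trap` line wouldn't help: `SIGKILL` can't be caught, so `kill -9` always kills the process.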

PID namespaces

### the same process has different PIDs in different namespaces PID in host / PID in container 23512 / 1 (PID 1 is special) 23513 / 4 23518 / 12 ### PID namespaces are in a tree Diagram showing "host PID namespace (the root)" with three arrows coming down from it, each pointing to a label that says "child". Often the tree is just 1 level deep (every child is a container) ### you can see processes in child PID namespaces Illustration of a host, represented by a box with heart eyes and a big smile. host: aw! look at all those containers running! ### if PID 1 exits, everyone gets killed Illustration of PID 1, represented by a box with a smiley face, and Linux, represented by its penguin mascot. PID 1: ok I'm done! Linux: I'm kill -9'ing everyone else in this PID namespace IMMEDIATELY ### Killing PID 1 accidentally would be bad Illustration of a container process, represented by a box with a smiley face, and Linux, represented by its penguin mascot. container process: `kill 1` Linux: do you WANT everyone to die? I'm not gonna let you do that ### rules for signaling PID 1 - from same container: only works if the process has set a signal handler - from the host: only SIGKILL and SIGSTOP are ok, or if there's a signal handler

let your bugs teach you

### panel 1: defining functions is easy ``` say_hello() { echo "hello!" } ``` and so is calling them: ``` say_hello ``` (no parentheses when calling a function!) ### panel 2: functions have exit codes ``` failing_function () { return 1 } ``` `0` is a success, everything else is a failure. A program's exit codes work the same way -- 0 is success, everything else is failure. ### panel 3: you can't return a string you can only return an exit code from 0 to 255 ### panel 4: arguments are `$1`, `$2`, `$3`, etc ``` say_hello() { echo "Hello, $1!" } say_hello "Ahmed" ``` the above code prints `Hello, Ahmed!`. Again, `say_hello "Ahmed"`, not `say_hello("Ahmed")` ### panel 5: The `local` keyword declares local variables ``` say_hello() { local x x=$(date) # this is a local variable y=$(date) # this is a global variable } ``` ### panel 6: `local x=VALUE` suppresses errors this line never fails, even if `asdf` doesn't exist: ``` local x=$(asdf) ``` but this will fail (as you would expect) -- if you have `set -e` set, it'll stop the program ``` local x x=$(asdf) # this line will fail ``` person: "I really have NO IDEA why it's like this, bash is weird sometimes"
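
Two of these panels are easier to believe once you run them. The sketch below shows the usual workaround for "you can't return a string" (print it and capture it with `$( )`), and what `local x=$(...)` does to a failing command. The function names are hypothetical, and `false` stands in for any command that fails.

```bash
#!/bin/bash
# make_greeting (hypothetical): "return" a string by printing it;
# the caller captures the output with $( ).
make_greeting() {
  echo "Hello, $1!"
}

# With `local x=$(...)`, the `if` sees the exit code of `local` itself,
# which succeeds, so the failure inside $( ) is hidden.
masked_failure() {
  if local x=$(false); then echo "no error seen"; else echo "error seen"; fi
}

# Declaring first and assigning second exposes the exit code of `false`.
visible_failure() {
  local x
  if x=$(false); then echo "no error seen"; else echo "error seen"; fi
}

greeting=$(make_greeting "Ahmed")
echo "$greeting"    # prints "Hello, Ahmed!"
masked_failure      # prints "no error seen"
visible_failure     # prints "error seen"
```

The `if` wrappers are only there so the demo keeps running to the end even under `set -e`.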

trap

container networking

figure out what your manager is great at

Different managers are good at different things! I've worked with managers who are amazing at: Each of these items is enclosed in a thought bubble. - product design - helping people resolve conflicts - understanding the business - building remote teams - prioritizing ruthlessly - running meetings - solving tricky technical problems - organizational politics Not every manager is good at every single thing, and that's okay! I like to figure out what my manager is awesome at and lean on them for those things. (heart) Also, strengths change over time! If they're not good at something today, maybe check back in a year & see if that's changed.

container registries

the 4 types of DNS servers

subqueries

stacking contexts

### a z-index can push an element up/down... ``` .first { z-index: 3; } .second { z-index: 0; } ``` Illustration of two boxes. The one labelled "`.first`" is layered over top of the other one. ### TRY ME: but a higher z-index doesn't always put an element on top Illustration of a box labelled "`z-index: 0`". On top of that is a box labelled "`z-index: 10`". Another box is on top of that one. Layered over top of all of these is a box labelled "`z-index: 2`". `z-index: 2` is on top! why? ### every element is in a stacking context The same illustration as the previous panel, but a label pointing to both the "`z-index: 10`" and "`z-index: 2`" boxes says, "these 2 elements are in different stacking contexts" ### a stacking context is like a Photoshop layer Illustration of two boxes, each with three smiley faces and an "ok" button in it, one layered on top of the other. These are labelled "two 'layers'". by default, an element's children share its stacking context ### setting z-index creates a stacking context ``` #modal { z-index: 5; position: absolute; } ``` this is a common way to create a stacking context ### stacking contexts are confusing You can do a lot without understanding them at all. But if `z-index` ever isn't working the way you expect, that's the day to learn about stacking contexts (smiley face)

padding syntax

### there are 4 ways to set padding `padding: 1em;` (all sides) `padding: 1em 2em;` (first value is vertical, second is horizontal) `padding: 1em 2em 3em;` (first value is top, second is horizontal, third is bottom) `padding: 1em 2em 3em 4em;` (first value is top, second is right, third is bottom, fourth is left) ### tricks to remember the order 1. trouble: top right bottom left 2. it's clockwise ### you can also set padding on just 1 side ``` padding-top: 1em; padding-right: 10px; padding-bottom: 3em; padding-left: 4em; ``` ### TRY ME: differences between padding & margin - padding is "inside" an element: the background color covers the padding, you can click padding to click an element, etc. Margin is "outside". - you can center with margin: auto, but not with padding - margins can be negative, padding can't ### margin syntax is the same as padding `border-width` also uses the same order: top, right, bottom, left

WHERE

CNAME records

### there are 2 ways to set up DNS for a website 1. set an A record with an IP `www.cats.com A 1.2.3.4` 2. set a CNAME record with a domain name `www.cats.com CNAME cats.github.io` ### CNAME records redirect every DNS record, not just the IP I like to use them whenever possible so that if my web host's IP changes, I don't need to change anything! ### what actually happens during a CNAME redirect Illustration of a conversation between a resolver, represented by a box with a smiley face holding a magnifying glass, and an authoritative nameserver, represented by a box with a smiley face wearing a crown. resolver: what's the A record for `www.cats.com`? authoritative nameserver: `www.cats.com CNAME cats.github.io` resolver (thinking): okay, I'll look up the A record for `cats.github.io`! ### rules for when you can use CNAME records 1. you can only set CNAME records on subdomains (like `www.example.com`), not root domains (like `example.com`) 2. if you have a CNAME record for a subdomain, that subdomain can't have any other records (technically you can ignore these rules, but it can cause problems, the RFCs say you shouldn't, and many DNS providers enforce these rules) ### some DNS providers have workarounds to support CNAME for root domains Look up "CNAME flattening" or "ANAME" to learn more.

DNS record types

talk about problems early

### Every so often I'll start with a small problem Illustration of a stick figure with short curly hair, looking nonplussed. employee: hmm this isn't great ### and forget to talk about it until I'm REALLY MAD Illustration of a stick figure with short curly hair, looking very upset, and another stick figure, the manager, who has medium length straight hair, and looks confused, with question marks over their head. employee: THIS IS TERRIBLE manager, thinking: whoa where did that come from? ### It's way better to bring up a problem early and figure it out before it turns into a big deal! Illustration of a stick figure with short curly hair, looking nonplussed, and their manager, a stick figure with medium length straight hair, who is smiling. employee: I got paged 15 times this week, can we talk about how to improve this? manager: yes let's work on that!

how to read an error message

why we need DNS

flexbox basics

### display: flex; set on a parent element to lay out its children with a flexbox layout. by default, it sets `flex-direction: row;` ### flex-direction: row; Illustration of three boxes, one with a star, one with a heart, and one with a starburst. They are side-by-side in a single row. by default, children are laid out in a single row. the other option is `flex-direction: column` ### flex-wrap: wrap; Illustration of three boxes, one with a star, one with a heart, and one with a starburst. The star and heart boxes are side-by-side, then an arrow winds around to the starburst box, which is underneath the other two, aligned to the left. will wrap instead of shrinking everything to fit on one line ### justify-content: center; Illustration of three boxes, one with a star, one with a heart, and one with a starburst. The star and heart boxes are side-by-side. The starburst box is centred underneath them. horizontally center (or vertically if you've set `flex-direction: column`) ### align-items: center; Illustration of three boxes, one with a star, one with a heart, and one with a starburst. The boxes are different heights, and are placed side-by-side in a single row, centred horizontally. vertically center (or horizontally if you've set `flex-direction: column`) ### you can nest flexboxes A box labelled `display: flex`. Inside it are two smaller boxes, side-by-side. Each is also labelled `display: flex`. One of the smaller boxes has three boxes side-by-side in it. The other smaller box has three boxes stacked on top of one another, inside it.

debugging tip: you've probably seen this bug before

debug by writing a test

subdomains

### to make a subdomain, you just have to set a DNS record! To set up cats.yourdomain.com, create a DNS record like this in your authoritative nameservers: cats.yourdomain.com A 1.2.3.4 cats.yourdomain.com is the name A is the record type 1.2.3.4 is the value ### there are 2 ways a nameserver can handle subdomains 1. Store their DNS records itself nameserver, represented by a box with a smiley face wearing a crown: here's the IP for cats.yourdomain.com! 2. Redirect to another authoritative nameserver (this happens if you set an NS record for the subdomain, it's called "delegation") nameserver: ask this other DNS server instead! ### you can create multiple levels of subdomains For example, you can make: a.b.c.d.e.f.g.example.com up to 127 levels are allowed! ### www is a common subdomain Usually www.yourdomain.com and yourdomain.com point to the exact same IP address. If you wanted to confuse people, you could make them totally different websites! ### panel 5 Illustration of a smiling stick figure with curly hair. person: I love using subdomains for my projects (like dns-lookup.jvns.ca) because they're free, I can give a subdomain a different IP, and it keeps projects separate.

top-level domains

how to handle intermittent bugs

picking a domain registrar

authoritative nameservers

make your code easy to debug

work with your manager to get promoted

Where I work, my manager wants people on the team to get promoted. If people are being promoted, it (hopefully) means that they're growing & getting more awesome at their jobs, which makes the team's manager look good! Illustration of a smiling stick figure with short curly hair. person, thinking: huh, maybe promotions are just a normal thing we can have a conversation about? Some ways to start conversations: - can we walk through the expectations for the next level to make sure I understand them? - what areas do you think I should focus on? - if I accomplished X Y Z, do you think that would be enough to get promoted? If this is something you care about, keep checking in periodically! The person who cares the most about your career is you ♡♡

keep conversations mostly constructive

I've had periods with some managers where, every time we talk, we're talking about SOME problem: Two illustrations of the same stick figure with curly hair, looking unhappy. me: why did y happen? me: X has been a problem for a year and it's STILL not fixed These days, I try to bring up problems that I'm interested in fixing and bring ideas for solutions when I can. Often we just talk about our work: Each item is illustrated with a smiling stick figure with curly hair saying them. - here's an idea I had... - my intern is doing awesome work! - did you see that great thing this other team did? - here's an interesting bug from this past week... - I thought of an onboarding project for the new person! Sometimes venting can be useful too, though! If there's a problem, it's often helpful to bring it up even if I don't have a solution.

debugging tip: build your mental model

a SHA always refers to the same code

Let's start with some fundamentals! If you understand the basics about how git works, it's WAY easier to fix mistakes. So let's explain what a git commit is! Every git commit has an id like 3f29abcd233fa, also called a SHA ("Secure Hash Algorithm"). A SHA refers to both: - the changes that were made in that commit (see them with ```git show```) - a snapshot of the code after that commit was made No matter how many weird things you do with git, checking out a SHA will always give you the exact same code. It's like saving your game so that you can go back if you die. You can check out a commit like this: ```git checkout 3f29abc``` SHAs are long but you can just use the first 6 chars. This makes it way easier to recover from mistakes! person at 10 am: ok, let's commit, that's a2992b person at 11 am: I really screwed up this file, let's go back to the version from a2992b

HTTP request methods 2

### OPTIONS `OPTIONS` is mostly used for `CORS` requests. The `CORS` page has more about that. It also tells you which methods are available. ### DELETE Used in many APIs (like the Stripe API) to delete resources. box with a smiley face 1: `DELETE /v1/customers/cus_12345` ("delete this customer please!") box with a smiley face 2: `200 OK` ("deleted!") ### PUT Used in some APIs (like the S3 API) to create or update resources. `PUT /cat/1234` lets you `GET /cat/1234` later. ### PATCH Used in some APIs for partial updates to a resource ("just change this 1 field"). ### TRACE I've never seen a server that supports this, you probably don't need to know about it. ### CONNECT Different from all the others: instead of making a request to a server directly, it asks for a proxy to open a connection. If you set the `HTTPS_PROXY` environment variable to a proxy server, many HTTP libraries will use this protocol to proxy your requests. client, represented by a box with a smiley face: `CONNECT test.com` `$AFO XXRTZ` (encrypted request) proxy, also represented by a box with a smiley face, thinking: ok, I'll open a connection to test.com. proxy: `$AFO XXRTZ` test.com, represented by a box with a smiley face: [is here]

build the support system you need

The flip side of "figure out what things they're great at" is that there are always going to be things your manager can't help you with. When that happens, there are a few choices: 1. Get mad that they can't help 2. Resign yourself to not getting help with those things 3. Find help elsewhere!!! Lara Hogan (her blog is GREAT) has an amazing blog post called "When your manager isn't supporting you, build a Voltron" about building a crew of people with lots of different skills who you can ask for help! Some of her tips: - figure out what you need help with before asking. Use their time well! - focus on problem solving, not venting Illustration of a big cool robot with wings, holding a big sword. Various parts of its body are labelled with the points below. A Voltron is a robot built out of several other robots - works in a different field - awesome at communication - more experience than me bit.ly/managervoltronbingo has a useful bingo card!

receiving email at your domain

domain privacy

debugging tip: more assumptions to check

debugging tip: get specific about what the bug is

how to give good feedback

directories and symlinks

ipv6

what's a mac address?

inter-process communication

2fa

user space vs kernel space

the senior engineer

ways i want my team to be

tcp

page table

having productive conversations when i disagree

what does an operating system do?

the stack

no feigning surprise

how to talk to your operating system

anatomy of a packet

networking concepts

network address translation

mutexes

man pages are awesome

directories and symlinks

linux tracing systems

getting started with ftrace

vim sessions

what's slow on a computer

learning to design software

building confidence in kubernetes

how kubernetes can break - etcd

writing tip: say something surprising

writing tip: ask good questions

why I love bash

ways to build expertise

user namespaces

understand your manager's goals

Illustration of two stick figures having a conversation. The manager is smiling and has straight shoulder length hair. The employee looks confused and has short curly hair. manager: can you get metrics on X's speed? me: why? That won't help us get the code done! They might be asking for metrics because: - they're hearing complaints about X being slow (that you might not be hearing!) - without metrics, it's hard for them to have an informed conversation about those complaints (& defend you if X is actually fast!) Having regular conversations about their priorities for the team is SO USEFUL and means that I'm surprised less often. (illustration of two smiley faces) Illustration of the same two stick figures as above, but now they're both smiling. manager: performance / speed is getting more important recently! me: good to know, should I work on speeding up X?

tips for reading code

the CSS inspector

### all major browsers have a CSS inspector usually you can get to it by right clicking on an element and then "inspect element", but sometimes there are extra steps ### see overridden properties `button {` `display: inline-block;` `color: var(--orange);` (this line in strikethrough) `}` ### edit CSS properties ``` element { } ``` (lets you change this element's properties) ``` button { display: inline-block; border: 1px solid black; } ``` (this lets you change the border of every `<button>`!) ### see computed styles person, represented by a smiling stick figure: here's a website with 12000 lines of CSS, what `font-size` does this link have? browser, represented by a box with a smiley face: 12px, because of `x.css` line 436 ### look at margin & padding Box Model Illustration of a small box labelled 1261 x 26. On the outside of that box is the word "padding". Surrounding the padding is the border. Surrounding the border is the margin. ### and LOTS more different browsers have different tools! For example, Firefox has special tools for debugging grid/flexbox.

subshells

share your debugging stories

scenes from kubernetes

scenes from design docs

remember your manager's only human

Sometimes I fall into a trap where I think my manager should be able to solve EVERY problem on the team and if they're not then they're not doing their job. (the word "every" is surrounded by glowing lines for emphasis) It's helpful for me to remember that at any given time they're probably dealing with a lot! Illustration of a smiling stick figure, representing the manager, surrounded by spiky bubbles containing the following items. - hire 2 people - coordinate with other teams - make sure the intern gets an offer on time (illustration of a clock) - write 10 performance reviews - finalize plans for next quarter - make sure we have an onboarding plan for the new person - interview new manager candidate - a team member is unhappy, figure out what's going on - ... personal life (smiley face) I try to be somewhat aware of what my manager is dealing with & help out when I can. Illustration of two smiling stick figures, one with curly hair representing the employee, and one with medium length straight hair, representing the manager. employee: Here's a project I think could be a good fit for the new person! manager: good idea, thanks!

"Emotional labour" is the idea that dealing with feelings-related problems is work. Illustration of two stick figures having a conversation. The employee has short curly hair and looks angry. The manager is smiling and has no hair. I'm angry that my contributions on that project weren't recognized... manager: [understanding face, doing work] Emotional labour is part of what managers are paid to do. But!! Managers aren't therapists. Illustration of a smiling stick figure, crossed out in red. manager: tell me about your father... not good 1:1 material (smiley face) When I'm upset about something, I try to be clear about why and ideally explain what I think a reasonable resolution would be. employee: can we just make sure it features in my next performance review? manager: yes definitely!

oh shit! I want to undo something from 5 commits ago!

oh shit! I want to split my commit into 2 commits!

oh shit! I tried to run a diff but nothing happened!

oh shit! I tried to commit a file that should be ignored!

oh shit! I started rebasing and now I have 1000000 conflicts to fix!

oh shit! I need to change the message on my last commit!

oh shit! I have a merge conflict!

Suppose you had `main` checked out and ran `git merge feature-branch`. If that causes a merge conflict, you'll see something like this in the files with conflicts: ``` <<<<<<< HEAD if x == 0: return false ======= ``` (this is the code from `main`) ``` if y == 6: return true elif x == 0: return false feature-branch >>> d34367 ``` (this is the code from `feature-branch`) ### To resolve the conflict: 1. Edit the files to fix the conflict 2. `git add` the fixed files 3. `git diff --check`: check for more conflicts. 4. `git commit` when you're done. (or `git rebase --continue` if you're rebasing!) Smiling stick figure with medium length straight hair: You can use a GUI to visually resolve conflicts with `git mergetool`. Meld (meldmerge.org) is a great choice!

oh shit! I did something terribly wrong, does git have a magic time machine?

Yes! It's called `git reflog` and it logs every single thing you do with git so that you can always go back. Suppose you ran these git commands: ``` git checkout my-cool-branch (1) git commit -am "add cool feature" (2) git rebase master (3) ``` Here's what `git reflog`'s output would look like. It shows the most recent actions first: ```245fc8d HEAD@{2} rebase -i (start):``` (3) checkout master ```b623930 HEAD@{3} commit:``` (2) add cool feature ```01d7933 HEAD@{4} checkout:``` (1) moving from master to my-cool-branch If you really regret that rebase and want to go back, here's how: ```git reset --hard b623930``` ```git reset --hard HEAD@{3} ``` 2 ways to refer to that commit before the rebase

oh shit! I committed but I want to make one small change!

non-POSIX features

### some bash features aren't in the POSIX spec Illustration of a smiling stick figure with curly hair. Person: here are some examples! These won't work in POSIX shells like `dash` and `sh`. ### arrays POSIX shells only have one array: `$@` for arguments ### [[ $DIR = /home/* ]] POSIX alternative: match strings with `grep` ### [[ ... ]] POSIX alternative: `[ ... ]` ### diff <(./cmd1) <(./cmd2) this is called "process substitution", you can use named pipes instead ### the local keyword in POSIX shells, all variables are global ### for ((i=0; i<3; i++)) `sh` only has `for x in ...` loops, not C-style loops ### a.{png,svg} you'll have to type `a.png a.svg` ### {1..5} POSIX alternative: `$(seq 1 5)` ### $'\n' POSIX alternative: `$(printf "\n")` ### ${var//search/replace} POSIX alternative: pipe to `sed`

network namespaces

### network namespaces are kinda confusing Illustration of an unhappy-looking stick figure with curly hair. person: what does it MEAN for a process to have its own network?? ### namespaces usually have 2 interfaces (+ sometimes more) - the loopback interface (127.0.0.1/8, for connections inside the namespace) - another interface (for connections from outside) ### every server listens on a port and network interface(s) `0.0.0.0:8080` means "port 8080 on every network interface in my namespace" ### 127.0.0.1 stays inside your namespace Illustration of a server, represented by a box with a smiley face, and a smiling stick figure with curly hair. server, thinking: I'm listening on 127.0.0.1 person: that's fine but nobody outside your network namespace will be able to make requests to you! ### your physical network card is in the host network namespace Illustration of a rectangular box drawn with a dotted line. Inside it are: - the label "host network namespace" - 192.168.1.149, with an arrow pointing to it reading "requests from other computers" - network card ### other namespaces are connected to the host namespace with a bridge Illustration of a rectangular box drawn with a dotted line. Inside it are: - the label "host network namespace" - three boxes, each labelled "container"

miscellaneous networking tools

media queries

### media queries let you use different CSS in different situations ``` @media print { #footer { display: none; } } ``` (`print` is the media query, and the rest is the CSS to apply) ### max-width & min-width ``` @media (max-width: 500px) { /* CSS for small screens */ } @media (min-width: 950px) { /* CSS for large screens */ } ``` ### print and screen `screen` is for computer/mobile screens `print` is used when printing a webpage there are more: `tv`, `tty`, `speech`, `braille`, etc ### accessibility queries you can sometimes find out a user's preferences with media queries examples: `prefers-reduced-motion: reduce` `prefers-color-scheme: dark` ### you can combine media queries it's very common to write something like this: ``` @media screen and (max-width: 1024px) ``` ### the viewport meta tag `<meta name="viewport" content="width=device-width, initial-scale=1">` Your site will look bad on mobile if you don't add a tag like this to the `<head>` in your HTML. Look it up to learn more!

love your bugs

(thanks to Allison Kaptur for teaching me this attitude! she has a great talk called "Love Your Bugs".) Debugging is a great way to learn. First, the harsh reality of bugs in your code is a good way to reveal problems with your mental model. program: error: too many open files person: I can't just open as many files as I want? Interesting! Fixing bugs is also a good way to learn to write more reliable code! person, thinking: hmm, I should put in error handling here in case that database query times out. Also, you get to solve a mystery and get immediate feedback about whether you were right or not. person 1: that's weird... person 1: oh goodness, that's a lot of errors person 1: I have an idea! person 1: [coding a fix] person 1: it works now! person 2: great work! Nobody writes great code without writing + fixing lots of bugs. So let's talk about debugging skills a bit!

let's build expertise!

learning on my own

learning at work

layers

### different images have similar files Rails container image and Django container image: we both use Ubuntu 18.04! ### reusing layers saves disk space Rails image: Rails app ubuntu:18.04 Django image: Django app ubuntu:18.04 exact same files on disk! ### a layer is a directory ``` $ ls 8891378eb* bin/ boot/ dev/ etc/ home/ lib/ lib64/ media/ mnt/ opt/ proc/ root/ run/ sbin/ srv/ sys/ tmp/ usr/ var/ ``` files in an ubuntu:18.04 layer ### every layer has an ID usually the ID is a sha256 hash of the layer's contents example: `8e99fae2..` ### if a file is in 2 layers, you'll see the version from the top layer `/code/file.py` (this is the version you'll see in the merged image) `/code/file.py` ### by default, writes go to a temporary layer temp layer (these files might be deleted after the container exits) To keep your changes, write to a directory that's mounted from outside the container

kubernetes components

know your spy tools

it's not too late to start learning

invest in understanding

how to work well with your manager

Most of the rest of this zine is about COMMUNICATION (The word "communication" is surrounded by hearts, smiley faces, stars, and exclamation marks) Basically your manager's job is to make sure that your team is getting work done that will help the business. This is awesome because it means that if you just communicate with them well, then you can mostly focus on programming!!! (the word "awesome" is surrounded by glowing lines and hearts) Communicating well can help you: - get awesome opportunities - solve problems - build trust - understand priorities - get promoted - get feedback (each of the above items is in a spikey bubble) To start, let's talk about 1:1s (which hopefully your manager schedules regularly).

how to make a namespace

hiding elements with CSS

### there are many ways to make an element disappear Illustration of a smiling stick figure with curly hair. person: which one to use depends: do you want the empty space it left to be filled? ### TRY ME: display: none; other elements will move to fill the empty space Illustration of three boxes side-by-side, with a heart, x, and star, respectively. When the "x" box is set to `display: none;`, the heart and star boxes will now be side-by-side. ### visibility: hidden; the empty space will stay empty Illustration of three boxes side-by-side, with a heart, x, and star, respectively. When the "x" box is set to `visibility: hidden;`, the heart and star boxes will have a gap between them the size of the "x" box. ### opacity: 0; like `visibility: hidden`, but you can still click on the element & it'll still be visible to screen readers. Usually `visibility: hidden` is better. ### how to slowly fade out ``` #fade:hover { transition: all 1s ease; visibility: hidden; opacity: 0; } ``` set the opacity just so that the transition works ### TRY ME: z-index z-index sets the order of overlapping positioned elements Illustration of two boxes, a smaller one with an "x" in it, that is overlapped over a larger empty box. There is an arrow pointing to a second illustration where the boxes are stacked in the opposite order, so that the small box is underneath of the large box.

git mistakes you can't fix

Most mistakes you make with git can be fixed. If you've ever committed your code, you can get it back. That's what the rest of this zine is about! Here are the dangerous git commands: the ones that throw away uncommitted work. - `git reset --hard COMMIT` 1. Throws away uncommitted changes 2. Points current branch at `COMMIT` Very useful, but be careful to commit first if you don't want to lose your changes - `git clean` Deletes files that aren't tracked by Git. - `git checkout BRANCH FILE` (or directory) Replaces FILE with the version from `BRANCH`. Will overwrite uncommitted changes.

getting started with SELECT

getting a new manager

Being assigned a new manager is a little scary. Not all of my managers have been great! Illustration of a stick figure with short curly hair, looking uncertain. person: OH NO what if my new manager is hard to work with ?!?! But! More than once I've started out thinking, Illustration of a stick figure with short curly hair, looking scared. person: who is this person they seem suspicious and ended up, a year later, at Illustration of a stick figure with short curly hair, smiling. person: wow they have helped me and the team so much, this is AMAZING so I try to assume that's where we'll end up. Some things I've found helpful: - write a document explaining my past work to them - ask them about any concerns directly - often they have great answers! - pay close attention to what they do well - tell them when they do something great

every commit has a parent

Every commit (except the first one!) has a parent commit! You can think of your git history as looking like this: current commit - c6045c - `HEAD` - "make cats blue" parent - 304db6 - `HEAD^` - "add cats" grandparent - a92eab - `HEAD^^` - "fix typo" b29aff - "initial commit" `HEAD` always refers to the current commit you have checked out, and `HEAD^` is its parent. So if you want to go look at the code from the previous commit, you can run `git checkout HEAD^` commits don't always have 1 parent. Merge commits actually have 2 parents! `git log` shows you all the ancestors of the current commit, all the way back to the initial commit

debugging tip: track what you changed

debugging tip: slow down

debugging tip: ask lots of questions

containers: the big idea: include EVERY dependency

container configuration options

### panel 1 Illustration of a smiling stick figure with curly hair. person: here are the 6 most important things you can configure when starting a container! ### map a port to the host Illustration of two boxes drawn with dotted lines. One is labelled "host", the other is labelled "container". The "host" box says "port 1234", and the "container box" is labelled "port 8080". There is a double-ended arrow pointing back and forth between the two ports. ### mount directories from the host Illustration of two boxes drawn with dotted lines. One is labelled "host", the other is labelled "container". The "host" box says "`~/code/blah`", and the "container box" says "`/src`". There is a double-ended arrow pointing back and forth between the two boxes. ### set capabilities ### add seccomp-bpf filters ### set memory and CPU limits person: only 200 MB RAM for you ### use the host network namespace Usually the default is to use a new network namespace!

command line arguments

### every process has command line arguments `$ ls -l /usr/bin` (`ls`, `-l`, and `/usr/bin` are arguments!) ### they're passed to the program as an array example from Python: ``` import sys print(sys.argv) ``` `['test.py', 'file.txt']` ### arguments can be any sequence of bytes `$ python program.py ♥` (emoji are totally allowed!) ### the first argument is the executable's name ``` ['ls', '-l', '/usr/bin'] ``` (`ls` is the executable name) ### the total length of the arguments is limited you can find the limits on your system with `xargs --show-limits` It's usually ~2MB ### you can decide how you parse arguments - `-flag`: single dash! - `--flag`: 2 dashes! - `♥♥flag`: weird emoji scheme that will be very annoying to use!
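
To see the argument array from the other side, here's a small sketch that starts a child process with a list of arguments and reads back what the child received. The child program (`CHILD_CODE`) and the argument values are made up for illustration.

```python
import json
import subprocess
import sys

# Hypothetical child program: print the argument array it was started with.
CHILD_CODE = "import json, sys; print(json.dumps(sys.argv))"

# Each list element becomes exactly one argument. No shell is involved,
# so spaces and emoji reach the child untouched.
args = ["file.txt", "two words", "--flag", "♥"]
result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE, *args],
    capture_output=True,
    text=True,
    check=True,
)
child_argv = json.loads(result.stdout)

# With `python -c`, the first element is "-c" rather than a script name.
print(child_argv)
```

Passing a list (instead of one big string) is what keeps `"two words"` as a single argument: nothing splits it on the space.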

clock_gettime

### programs can be slow for a lot of reasons Illustration of two programs, each represented by a box with a smiley face. program 1: I'm waiting for a database query, you? program 2: I'm using SO MUCH CPU! ### it's not obvious when a program is using CPU Illustration of a stick figure with curly hair, looking unhappy. person: my webserver took 6 seconds to respond to that request! why? ### panel 3 person: how can I tell how much CPU time was used in this part of my code? ### clock_gettime `clock_gettime` is a system call. It can tell you how much CPU time your process/thread used since it started. ### how to track CPU time 1. run `clock_gettime` 2. do the thing (eg handle a HTTP request) 3. run `clock_gettime` 4. subtract! ### this trick works when You have 1 HTTP request per thread at a time Illustration of Ruby and node.js, each represented by a box with a smiley face. Ruby: I can use `clock_gettime` node.js: doesn't work for me, I have an event loop!
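
Here's a rough sketch of the "run it, do the thing, run it again, subtract" trick, using Python's `time.clock_gettime` wrapper (Unix only). `handle_request` is a made-up stand-in for real work, and the sleep is pretending to be a database query.

```python
import time

def cpu_seconds():
    # CPU time used so far by the current thread (Unix only).
    return time.clock_gettime(time.CLOCK_THREAD_CPUTIME_ID)

def handle_request():
    # Hypothetical request handler: burn a little CPU, then wait.
    total = sum(i * i for i in range(200_000))
    time.sleep(0.2)  # waiting (like on a database query) uses almost no CPU
    return total

wall_start = time.monotonic()
cpu_start = cpu_seconds()               # 1. run clock_gettime
handle_request()                        # 2. do the thing
cpu_used = cpu_seconds() - cpu_start    # 3. run it again, 4. subtract!
wall_used = time.monotonic() - wall_start

print(f"wall time: {wall_used:.3f}s, CPU time: {cpu_used:.3f}s")
```

The wall time includes the 0.2s of waiting, but the CPU time doesn't, which is how you can tell "slow because of CPU" apart from "slow because of waiting".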

browser default stylesheets

### every browser has a default stylesheet (aka "user agent stylesheet") a small sample from the Firefox default stylesheet: ``` h1 { font-size: 2em; font-weight: bold; } ``` ### different browsers have different defaults Illustration of a smiling stick figure with curly hair. person: buttons & forms have some of the biggest differences ### you can read the default stylesheet Firefox's default stylesheets are at: `resource://gre-resources/` ### every property also has a default "initial value" the initial value (defined in the spec) is what's used if no stylesheet has set anything. For example, `background-color`'s initial value is `transparent` ### a CSS property can be set in 5 ways (listed from lowest priority to highest priority) 1. the initial value 2. the browser's default stylesheet 3. the website's stylesheets and user stylesheets 4. inline styles set with HTML/JS

bash variables

### how to set a variable - `var=value` right (no spaces!) - `var = value` wrong `var = value` will try to run the program `var` with the arguments "`=`" and "`value`" ### how to use a variable: "$var" ``` filename=blah.txt echo "$filename" ``` they're case sensitive. environment variables are traditionally all-caps, like `$HOME` ### there are no numbers, only strings ``` a=2 a="2" ``` both of these are the string "2" technically bash can do arithmetic, but I avoid it ### always use quotes around variables `$ filename="swan 1.txt"` `$ cat $filename` (wrong) bash: ok, I'll run `cat swan 1.txt` 2 files! oh no! we didn't mean that! cat: Um `swan` and `1.txt` don't exist... `$ cat "$filename"` (right!) bash: ok, I'll run `cat "swan 1.txt"` cat: "`swan 1.txt`"! that's a file! yay! ### ${varname} To add a suffix to a variable like "2", you have to use `${varname}`. Here's why: `$ zoo=panda` `$ echo "$zoo2"` prints `""`, `zoo2` isn't a variable `$ echo "${zoo}2"` this prints "`panda2`" like we wanted

asking good questions (part 2)

asking good questions

One of my favorite tools for learning is asking questions of all the awesome people I know! what's a good question? ### good questions: - are easy for the person to answer - get you the information you're looking for ### Here are some strategies for asking them: - state what you know person 1: so, I know when the database gets a lot of writes, the hard drive can't keep up. person 2: that's right! I don't think that was our problem, though. Look at this... This helps because: - I'm forced to think about what I know - I'm less likely to get answers that are too basic or too advanced - guess what the answer might be person 1: Do we have 5 load balancers because we get a lot of HTTP requests? person 2: actually, we just want to be sure it's ok if one goes down. Guessing the answer: - makes me think! - helps my coworker see what kind of answer I'm looking for

SQL example: get the time between baby feedings

SQL example: LEFT JOIN + GROUP BY

SELECT

POSIX compatibility

OVER() assigns every row a window

ORDER BY and LIMIT

INNER JOIN and LEFT JOIN

HTTPS

HTTP redirects

Sometimes you type a URL into your browser: `examplecat.com/dog.png` but end up at a slightly different URL: `examplecat.com/cat.png` ooh, where did the cat come from? I didn't type that! ### Here's what's going on behind the scenes: browser: ``` GET /dog.png HTTP/1.1 Host: examplecat.com ``` server: ``` 301 Moved Permanently Location: /cat.png ``` browser: okay, I'll try `/cat.png` instead browser: ``` GET /cat.png HTTP/1.1 Host: examplecat.com ``` server: ``` 200 OK <rest of website here> ``` The Location header tells the browser what new URL to use. The new URL doesn't have to be on the same domain: examplecat.com/panda can redirect to pandas.com. Setting up redirects is a great thing to do if you move your site to a new domain! ### ! Warning ! `301 Moved Permanently` redirects are PERMANENT: after a browser sees one once, it'll always use `examplecat.com/cat.png` when someone types `examplecat.com/dog.png` forever. You can't take it back and decide not to redirect. If you're not sure you want to redirect your site for eternity, use `302 Found` to redirect instead.

HTTP exercises

HEAD is the commit you have checked out

CSS transitions

### an element's computed style can change 2 ways this can happen: 1. pseudo-classes (like `:hover`) 2. Javascript code `el.classList.add('x')` ### new styles change the element instantly... ``` a:hover { color: red; } ``` the element will turn red right away ### unless you set the transition property ``` a { color: blue; transition: all 2s; } a:hover { color: red; } ``` ("`all 2s`" = will fade from blue to red over 2s) ### transition has 3 parts `transition: color 1s ease;` `color`: which CSS properties to animate `1s`: duration `ease`: timing function ### not all property changes can be animated.... `list-style-type: square;` CSS renderer, represented by a box with a smiley face: I don't know how to animate that, sorry! ### ...but there are dozens of properties that can if it's a number or color, it can probably be animated! ``` font-size: 14px; rotate: 90deg; width: 20em; ```

CSS isn't arbitrary

CSS borders

### `border` has 3 components `border: 2px solid black;` is the same as ``` border-width: 2px; border-style: solid; border-color: black; ``` ### `border-style` options - `solid` - `dotted` - `dashed` - `double` (each word is surrounded by the border it describes) + lots more (`inset`, `groove`, etc) ### `border-{side}` you can set each side's border separately: ``` border-bottom: 2px solid black; ``` ### `border-radius` border-radius lets you have rounded corners `border-radius: 10px;` `border-radius: 50%;` will make a square into a circle! ### box-shadow lets you add a shadow to any element `box-shadow: 5px 5px 8px black;` the first "5px" is the x offset, the second "5px" is the y offset, "8px" is the blur radius, and "black" is the color. ### outline `outline` is like `border`, but it doesn't change an element's size when you add it outlines on `:hover`/`:active` help with accessibility: with keyboard navigation, you need an outline to see what's focused

CSS backwards compatibility

### browsers support old HTML + CSS forever Illustration of a smiling stick figure with long hair, talking to a browser from 2020, represented by a box with a smiley face. person: I wrote this CSS in 1998 2020 browser: still works great! ### this makes CSS hard to write... Illustration of two stick figures talking person 1: why are CSS units so weird? person 2, with grey hair: let me tell you a story from 20 years ago... ### but it means it's worth the investment Illustration of a smiling stick figure with long hair, talking to a browser, represented by a box with a smiley face. person: I spent DAYS getting this CSS to work browser: I'll make sure it keeps working forever! ### if you don't follow the standards, you're not guaranteed backwards compatibility my site broke! (oh yeah, Firefox dropped support for that experiment) ### your CSS doesn't have to support browsers from 1998 Illustration of a smiling stick figure with short curly hair. person: just test that your CSS works on the browsers that your users are using! ### newer features are often easier to use what people expect from a website has changed a LOT since 1998. Newer CSS features make responsive design easy

CASE

BPF cheat sheet

8 bytes, many meanings

## 8 bytes, many meanings The same bytes can mean many things. Here are 8 bytes and a bunch of things they could potentially mean a picture of 8 bytes: the ASCII characters for 'computer' some things they could mean: * 8 8-bit integers * 4 unsigned 16-bit integers * a 64-bit pointer * 2 IPv4 addresses * x86 machine code * 2 32-bit floating point numbers * 1 64-bit floating point number * 2 RGBA colours person: "don't worry if you don't understand all this right now! We'll explain." note on x86 machine code: this code is nonsense, but search "ascii shellcode" for x86 code which is valid ASCII.
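
Here's a small sketch that reads the same 8 bytes a few of those ways with Python's `struct` module. The little-endian byte order (`<`) is an assumption made for the example, not something the panel specifies.

```python
import struct

data = b"computer"  # the same 8 bytes, read several different ways

eight_u8 = list(data)                       # 8 8-bit integers
four_u16 = struct.unpack("<4H", data)       # 4 unsigned 16-bit integers
pointer = struct.unpack("<Q", data)[0]      # one 64-bit integer / pointer
two_f32 = struct.unpack("<2f", data)        # 2 32-bit floats
one_f64 = struct.unpack("<d", data)[0]      # 1 64-bit float

# 2 IPv4 addresses: 4 bytes each, written as dotted decimal
two_ipv4 = [".".join(str(b) for b in data[i:i + 4]) for i in (0, 4)]

print(eight_u8)   # [99, 111, 109, 112, 117, 116, 101, 114]
print(two_ipv4)   # ['99.111.109.112', '117.116.101.114']
print(four_u16, hex(pointer), two_f32, one_f64)
```

None of these readings is more "correct" than the others: the bytes don't know what they are, the code reading them decides.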

css specifications

### CSS has specifications CSS 2.1, represented by an image of a document with many lines of text: hello, this is how max-width works in excruciating detail ### there used to be just one specification Illustration of a smiling stick figure with curly hair. person: it's called "CSS 2" and I still like to reference it to learn the basics ### today, every CSS feature has its own specification you can find them all at https://www.w3.org/TR/CSS/ there are dozens of specs, for example: colors, flexbox, and transforms ### major browsers usually obey the spec but sometimes they have bugs Illustration of a happy little caterpillar-type bug. browser, represented by a box with a smiley face: oops, I didn't quite implement that right... ### levels CSS versions are called "levels". new levels only add new features. They don't change the behaviour of existing CSS code ### new features take time to implement https://caniuse.com (The URL is surrounded by little hearts and stars) can tell you which browser versions support a CSS feature

file locking

terminal escape codes

IMSI catchers (fake cellphone towers)

ASCII

## panel 1: a string is an array of bytes ASCII is the simplest string encoding: 1 character = 1 byte. Let's see how it works! (We usually use UTF-8, which is WAY more complicated) ## panel 2: every printable ASCII character ``` !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ``` There are no accents because it's an English encoding: the "A" in ASCII is for "American". ## panel 3: there are 128 ASCII characters Only the bytes 0 to 127 are defined. It's very limited: you can really see why we need more powerful encodings like UTF-8! ## panel 4: how bytes map to characters Here's a partial list, look up "ASCII table" for the full list. Bytes (in base 10) are on the left, characters are on the right. 33 is !, 34 is " 48 is 0, 49 is 1 65 is A, 66 is B 97 is a, 98 is b ## panel 5: a trick to translate from lowercase to uppercase In ASCII, the lowercase letters are 32 more than the uppercase letters. So you can just subtract 32!
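
Here's a quick sketch of the byte-to-character mapping and the "subtract 32" trick. The function name `to_upper_ascii` is made up for this example.

```python
text = "Hello!"
encoded = text.encode("ascii")      # 1 character = 1 byte
byte_values = list(encoded)
print(byte_values)                  # [72, 101, 108, 108, 111, 33]

def to_upper_ascii(data: bytes) -> bytes:
    # Lowercase letters are bytes 97..122; subtracting 32 gives 65..90.
    # Every other byte is left alone.
    return bytes(b - 32 if ord("a") <= b <= ord("z") else b for b in data)

print(to_upper_ascii(encoded))      # b'HELLO!'
print(ord("A"), ord("a"))           # 65 97
```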

environment variables

### panel 1: every process has environment variables how to see any process's environment variables on Linux: ``` cat /proc/$PID/environ | tr '\0' '\n' ``` ### panel 2: shell scripts have 2 kinds of variables 1. environment variables 2. shell variables unlike in most languages, in shell you access both of these in the exact same way: `$VARIABLE` ### panel 3: export sets environment variables ``` export ANIMAL=panda ``` `export ANIMAL=panda` means that every child process will have `ANIMAL` set to `panda` ### panel 4: child processes inherit environment variables this is why the variables set in your `.bash_profile` work in all programs you start from the terminal. They're all child processes of your bash shell! ### panel 5: shell variables aren't inherited ``` var=panda ``` in this example, `$var` only gets set in this process, not in child processes ### panel 6: you can set environment variables when starting a program Illustration of a smiling stick figure with curly hair, talking to env, represented by a box with a smiley face. Person: `env VAR=panda ./myprogram` env: OK! I'll set `VAR` to `panda` and then start `./myprogram`
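
The same inheritance rules apply to any process, not just shells. Here's a sketch in Python: the child program (`CHILD_CODE`) and the value `otter` are invented for the example.

```python
import os
import subprocess
import sys

# Hypothetical child program: report what ANIMAL is set to, if anything.
CHILD_CODE = "import os; print(os.environ.get('ANIMAL', 'unset'))"

def run_child(env=None):
    # env=None means "inherit this process's environment".
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

os.environ.pop("ANIMAL", None)
before = run_child()                 # ANIMAL isn't set yet

os.environ["ANIMAL"] = "panda"       # roughly `export ANIMAL=panda`
inherited = run_child()              # child processes inherit it

# roughly `env ANIMAL=otter ./myprogram`: set it for one child only
overridden = run_child({**os.environ, "ANIMAL": "otter"})

print(before, inherited, overridden)  # unset panda otter
```

The override only affects that one child: this process still has `ANIMAL=panda` afterwards.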

bash quotes

bash builtins

background processes

list what you've learned

why some bugs feel "impossible"

track your progress

make a minimal reproduction

ask lots of questions

guesses are often wrong

bytes

the gaps between floats

## title: the gaps between floats ## panel 1: floating point numbers have to fit into 32 or 64 bits This means there are only 2^64 64-bit floats, the same way there are only 2^64 64-bit integers ## panel 2: this means floating point numbers have to be spread out you can imagine them all spaced out on a number line, like this: (picture of a bunch of lines, with small gaps between them. The gaps are smaller on the left and bigger on the right) ## panel 3: the gaps start small. the next 64-bit float after 1.0 is 1 point (lots of 0s) 2 the gap between these two floats is 0 point (lots of 0s) 2, or 2^-52 gaps are always a power of 2 ## panel 4: the gaps get bigger as the numbers get bigger the next 64-bit float after 100000000000000000000 is that number plus 16384. so the gap is 16384, or 2^14! ## panel 5: the gaps make calculations inaccurate when you do math on floating point numbers, often you have to round the result to the nearest float usually this doesn’t make a big difference, but small mistakes can add up ## panel 6: this inaccuracy is inevitable if you want math to be fast, you have to store the numbers in a fixed number of bits, like 64 bits. So you’re always going to have accuracy issues.
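
You can measure the gaps yourself. This sketch assumes Python 3.9 or newer (that's when `math.ulp` and `math.nextafter` were added), and uses 10^20 as an example of a big number.

```python
import math

# The gap right after 1.0
gap_at_one = math.ulp(1.0)
next_after_one = math.nextafter(1.0, math.inf)
print(gap_at_one == 2 ** -52)                  # True
print(next_after_one - 1.0 == gap_at_one)      # True

# The gap around a much bigger number
big = 1e20
gap_at_big = math.ulp(big)
print(gap_at_big)                              # 16384.0

# Adding something much smaller than the gap gets rounded away entirely
print(big + 1.0 == big)                        # True

# Rounding to the nearest float is why this classic check fails
print(0.1 + 0.2 == 0.3)                        # False
```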

signed vs unsigned integers

## signed vs unsigned integers ## there are 2 ways to interpret every integer unsigned: - always 0 or more - example: 8 bit unsigned ints are `0` to `255` signed: - half positive, half negative - example: 8 bit signed ints are `-128` to `127` ## negative integers are represented in a counterintuitive way You might think that this is -5: `10000101` (1 is the sign bit, and 101 in binary is 5) But actually this is -5: `11111011` this looks weird, but we'll explain why! ## integer addition wraps around for example, for 8-bit integers `255 + 1 = 0` for 16-bit integers, `65535 + 1 = 0` by "addition", we mean "what the x86 `add` instruction does" ## panel: but if `255 + 1 = 0`, you could also say `255 = -1` ## examples of bytes and their signed/unsigned ints | byte | unsigned | signed | |----------|----------|--------| | `00000000` | 0 | 0 | | `01111111` | 127 | 127 | | `10000000` | 128 | -128 | | `10000001` | 129 | -127 | | `11111011` | 251 | -5 | | `11111111` | 255 | -1 | subtract 256 from unsigned numbers (128 and up) to get the signed numbers ## this way of handling signed integers is called "two's complement" It's popular because you can use the same circuits to add signed and unsigned integers. `5 + 255` has exactly the same result as `5 + (-1)`: they're both 4!
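
Here's a sketch of the "subtract 256" rule and the wraparound addition for 8-bit integers. The helper names `as_signed` and `add_u8` are made up for this example.

```python
def as_signed(byte: int) -> int:
    # Bytes 128..255 have the top bit set: subtract 256 to get the signed value.
    return byte - 256 if byte >= 128 else byte

def add_u8(a: int, b: int) -> int:
    # 8-bit addition wraps around: only the low 8 bits are kept.
    return (a + b) % 256

for bits in ["00000000", "01111111", "10000000", "10000001", "11111011", "11111111"]:
    unsigned = int(bits, 2)
    print(bits, unsigned, as_signed(unsigned))

# Python's built-in conversion agrees with the rule above
print(int.from_bytes(bytes([0b11111011]), "big", signed=True))  # -5

print(add_u8(255, 1))   # 0
print(add_u8(5, 255))   # 4, the same as 5 + (-1)
```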

science <3 floating point

## science <3 floating point ## floating point was invented to do scientific computation - weather simulations! - earthquake modeling! - orbital mechanics! ## scientists don't need unlimited precision... we only know an electron's mass to 9 decimal places anyway... 9 decimal places is already VERY precise! ## but they do need TINY numbers and GIANT numbers mass of hydrogen atom: `1.6735575 * 10^-24` grams distance to Andromeda galaxy: `2.4 * 10^22` meters ## floating point is inspired by scientific notation `1.6735575 x 10^-24` The idea in floating point is to store a number by splitting it into: - the exponent (like `-24`) - the multiplier (like `1.6735575`) - and its sign (+ or -) ## floating point isn't just used for science though For example, Javascript's number type is floating point. Before it added `BigInt` in 2020, Javascript didn't have integers at all! Similarly, numbers in JSON are often interpreted as floating point numbers. ## panel: people usually explain floating point as "it's scientific notation, but in binary!" That's true, but I've never found it intuitive so we're going to explain it a different way.
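
If you want to peek at the sign / exponent / multiplier split, here's a sketch that pulls the three parts out of a 64-bit float. It assumes the usual IEEE 754 layout (1 sign bit, 11 exponent bits, 52 fraction bits) and only handles "normal" numbers, not zero, infinity, or NaN. `split_float` and `rebuild` are names invented for the example.

```python
import struct

def split_float(x: float):
    # Reinterpret the 8 bytes of a 64-bit float as one 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                          # 1 bit: 0 is +, 1 is -
    exponent = ((bits >> 52) & 0x7FF) - 1023   # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)          # 52 bits of the multiplier
    return sign, exponent, fraction

def rebuild(sign: int, exponent: int, fraction: int) -> float:
    # The multiplier is 1.something in binary, times a power of 2.
    return (-1) ** sign * (1 + fraction / 2 ** 52) * 2.0 ** exponent

print(split_float(1.0))    # (0, 0, 0)
print(split_float(-2.0))   # (1, 1, 0)

hydrogen = 1.6735575e-24
print(rebuild(*split_float(hydrogen)) == hydrogen)  # True
```

The big difference from scientific notation is that the exponent is a power of 2 instead of a power of 10.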

meet the byte

## meet the byte ## You might have heard that a computer's memory is a series of bits (0s and 1s)... `010100110101010110110111` but you only access them in groups of 8 bits - a byte! `01010011 01010101 10110111` ## 2 ways to think about a byte 1. 8 bits 2. an integer from 0 to 255 `00000000` = `0` `00000001` (8 bits!) = `1` (integer!) `00000010` = `2` `01011001` = `89` ## you can't just access 1 bit Every byte in your computer's memory has an address. If you want to fetch 1 bit, you need to fetch the whole byte at that address and then extract the bit. ## some things that are 1 byte - the boolean `true` (in C) `00000001` - the ASCII character F `01000110` - the red part of the colour `#FF00FF` `11111111` ## most things are more than one byte - integers and floats are usually 4 bytes or 8 bytes - strings are LOTS of bytes (for example, in UTF-8 a heart emoji is 3 bytes) ## bytes weren't always 8 bits In the past, people experimented with lots of different byte sizes (2, 3, 4, 5, 6, 8, and 10 bits!) But now we've standardized on 8 bits pretty much everywhere.

little vs big endian

## little endian / big endian ## we write dates in 3 main orders 1. 2023-03-17 ("big endian") 2. 17-03-2023 ("little endian") 3. 03-17-2023 ("american") "big endian" means that the big unit (the year) is at the start ("big end first") ## similarly: computers order bytes in 2 ways Here are 2 ways your computer might represent the integer 271: 1. big endian: `00000001 00001111` 2. little endian: `00001111 00000001` How this corresponds to 271: `00000001 00001111` is 271 in binary ## When you send integers on a computer network, they have to be big endian. Here's how that works: Computer A has the 16-bit integer "271" in its memory: `00001111 00000001` Computer A flips the bytes and sends it as big endian: `00000001 00001111` Computer B receives the big endian integer Computer B flips the bytes and stores it in memory as little endian: `00001111 00000001` ## a little history Before 1980, computers ordered their bytes in different ways. In 1980, the Internet started being standardized, causing a huge fight over which byte order to use on the Internet. The terms "big/little endian" come from that fight: they were coined in an article called "On Holy Wars and a Plea For Peace" which compares the byte order fight to the Big/Little Endians in Gulliver's Travels. Big endian won that fight, so most Internet protocols (IPv4, TCP, UDP, etc.) are big endian. But almost all modern computers are little endian. Some machines, like the Xbox 360, are big endian though.
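You can try the 271 example yourself. This is a minimal Python sketch using the built-in `int.to_bytes` and `int.from_bytes`; the `bits` helper is just a made-up name for printing bytes in binary.

```python
def bits(data: bytes) -> str:
    """Show each byte as 8 binary digits."""
    return " ".join(f"{b:08b}" for b in data)


n = 271
big = n.to_bytes(2, byteorder="big")        # what goes on the network
little = n.to_bytes(2, byteorder="little")  # what most CPUs keep in memory

print("big endian:   ", bits(big))
print("little endian:", bits(little))

# "flipping the bytes" is literally reversing them
assert big == little[::-1]

# decoding only works if you pick the same byte order that was used to encode
print(int.from_bytes(big, byteorder="big"))     # 271
print(int.from_bytes(big, byteorder="little"))  # 3841, the wrong answer
```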

integers

## integers ## panel 1: To decode bytes as integers, we need to know 3 things: 1. the integer's size (8 bit, 16 bit, 32 bit, or 64 bit) 2. is it little or big endian? 3. is it signed or unsigned? ## panel 2: how signed integers work is the hardest part to understand (I only learned how it works a couple months ago!). Just knowing that unsigned and signed integers are different will take you a long way. ## 2 bytes, 3 interpretations `254 | 0 ` We could interpret these 2 bytes as: 1. `254` (little endian) 2. `65024` (big endian, unsigned) 3. `-512` (big endian, signed) ## how you decode bytes depends on the context - in a program's memory, the type of the variable tells you the integer's size and if it's signed/unsigned - your CPU determines if integers are big or little endian (you don't have a choice) - for a binary network protocol (like DNS), the specification (for DNS, that's RFC 1035) will tell you how to decode the bytes ## examples of types - in Rust, an `i64` is a signed 64-bit integer - in Go, a `uint32` is an unsigned 32-bit integer - in C, a `short` is usually a signed 16-bit integer, depending on the platform
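The "2 bytes, 3 interpretations" panel can be checked with Python's `struct` module, which makes you spell out all 3 choices (size, byte order, signedness) in a format string. A quick sketch:

```python
import struct

data = bytes([254, 0])

# "<" = little endian, ">" = big endian
# "H" = unsigned 16-bit, "h" = signed 16-bit
little_unsigned = struct.unpack("<H", data)[0]
big_unsigned = struct.unpack(">H", data)[0]
big_signed = struct.unpack(">h", data)[0]

print(little_unsigned)  # 254
print(big_unsigned)     # 65024
print(big_signed)       # -512
```

Same 2 bytes, 3 different integers: the bytes don't say which one is "right", only the context does.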

integer overflow

## integer overflow ### integers have a limited amount of space The 4 usual sizes for integers are 8 bits, 16 bits, 32 bits, and 64 bits ### the biggest 8-bit unsigned integer is 255 ... so what happens if you do 255 + 1? going above/below the limits is called overflow the result wraps around to the other side 255 + 1 = 0 255 + 3 = 2 200 * 2 = 144 0 - 2 = 254 ### maximum numbers for different sizes bits: signed unsigned 8: 127 255 16: 32767 65535 32: ~2 billion ~4 billion 64: ~9 quintillion ~18 quintillion ### overflows often don't throw errors computer (thinking): "255 + 1? that number is 8 bits, so the answer is 0! that's what you wanted right?" This can cause VERY tricky bugs ### some languages where integer overflow happens Java/Kotlin C/C++ Rust Swift C# SQL R Go Dart Python (only in numpy) Some throw errors on overflow, some don't, for some it depends on various factors. Look up how it works in your language!
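Plain Python integers never overflow, so to see the wraparound here we have to fake an 8-bit integer by keeping only the low 8 bits. This is a sketch: `wrap_u8` and `max_values` are made-up helper names.

```python
def wrap_u8(value: int) -> int:
    """Pretend the result has to fit in an unsigned 8-bit integer."""
    return value & 0xFF  # keep the low 8 bits, drop the rest


print(wrap_u8(255 + 1))  # 0
print(wrap_u8(255 + 3))  # 2
print(wrap_u8(200 * 2))  # 144
print(wrap_u8(0 - 2))    # 254


def max_values(bits: int) -> tuple[int, int]:
    """Return (biggest signed, biggest unsigned) for an integer size."""
    return (1 << (bits - 1)) - 1, (1 << bits) - 1


for size in (8, 16, 32, 64):
    signed_max, unsigned_max = max_values(size)
    print(f"{size:2d} bits: signed max {signed_max}, unsigned max {unsigned_max}")
```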

how floats are printed

## how floats are printed ## computers lie when they print out floats (by rounding) For example `0.12` isn't `0.12`, it's actually (roughly): `0.119999999999999995559` is my computer LYING to me??? about NUMBERS? ## the string -> float translation If your program says: `x = 0.12` your interpreter / compiler needs to translate "`0.12`" into the float `0.119999999999999995559`. Most languages will use the `strtod` ("string to double") function from libc to do that translation. ## the float -> string translation This is where the rounding comes in. Computers round to make the numbers shorter and easier to read. `1.19999999999999995559` ↪ `1.2` ## float -> string translation is actually super complicated Every floating point number needs a unique string representation. There are a bunch of academic papers about how to do this well, search "Printing floating point numbers accurately" to read more about it. ## some examples of printing floats `1.19900000000000006573` ↪ `1.199` `1.19999999000000001637` ↪ `1.19999999` `1.19999999999998996358` ↪ `1.19999999999999` `1.19999999999999995559` ↪ `1.2` ## you can also print floats in base 16 or base 2 For example, 0.1 as a 32-bit float is: base 16: `0x1.99999ap-4` (`p-4` is the base 16 version of `e-4`) base 2: `1.10011001100110011001101p-100` The base 2/base 16 representations are not rounded, but they're rarely used.
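To catch your computer in the act, here's a short Python sketch. It leans on 2 standard library facts: `Decimal(x)` converts a float to its exact decimal value, and `float.hex()` prints the unrounded base 16 form (of the 64-bit float, so it has more digits than the 32-bit example above).

```python
from decimal import Decimal

x = 0.12

short = repr(x)     # the rounded string you normally see
exact = Decimal(x)  # the exact value of the float that's actually stored

print(short)  # 0.12
print(exact)  # 0.1199999999999999955591...

# the short string is still "unique": it converts back to the same float
assert float(short) == x

# base 16, not rounded
print((0.1).hex())  # 0x1.999999999999ap-4
```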

how bitwise operations are used

### Binary formats often pack information into bytes very tightly to save space. For example, here are 2 bytes from a real TCP packet: `10000000 00010000` The first "`1000`" is the offset (4 bits) The following "`000`" is reserved (3 bits) The remaining "`000010000`" are the flags (9 bits) Here's how `&`, `|`, `<<`, `>>` can be used to pack/unpack data into bytes. ### bit masking Let's say we have the 2 bytes from the previous panel, and we want to extract just the flags part. Here's how to do it with `&` (bitwise and): The idea is that you put a mask "on top" of the bytes to erase bits: `x: 10000000 00010000` (number) `0x01FF: 00000001 11111111` (bit mask) `x & 0x01FF: 00000000 00010000` (how they combine) the first 7 bits all get set to 0 the last 9 bits stay the same ### check/set bit flags (see page 16 for more) set a bit flag with or: ``` x = x | 0b010000; ``` check a bit flag with and: ``` if ((x & 0b010000) != 0) { ... } ``` (this example is in C) ### unpack/pack bits Now let's talk about the offset from the first panel. We can't do calculations on it in the packed form, so we need to unpack it. You can unpack with >>: ``` 10000000 -> 00001000 x -> x >> 4 ``` and pack with <<: ``` 00001000 -> 10000000 x -> x << 4 ``` 1000 in binary is 8, which in this case is the TCP offset value.
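Here's the whole unpack/pack round trip on those 2 TCP bytes, as a Python sketch. One difference from the panels: this treats the 2 bytes as a single 16-bit number, so the offset is shifted by 12 (4 bits within its byte, plus the 8 bits of the second byte) instead of 4. The variable names are just for illustration; `ACK` is the TCP ACK flag bit.

```python
x = 0b10000000_00010000  # the 2 bytes, as one 16-bit integer

# unpack
flags = x & 0x01FF          # mask: keep the last 9 bits
reserved = (x >> 9) & 0b111  # shift the flags away, keep 3 bits
offset = x >> 12             # shift everything else away

print(f"offset={offset} reserved={reserved} flags={flags:09b}")

# check a bit flag with &
ACK = 0b000010000
if flags & ACK != 0:
    print("ACK is set")

# set a bit flag with | (here: the lowest flag bit)
flags_with_extra_bit = flags | 0b000000001

# pack it all back together with << and |
packed = (offset << 12) | (reserved << 9) | flags
assert packed == x
```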

hexadecimal

## panel 1: let's talk about how to write binary data one way: binary `01111111 11111111 11111111`\ it's easy to see the bits... `1010110110101001010`\ but it's hard to read a lot of them another way: base 10\ `8388607`\ but I have NO IDEA how many bits that is ## panel 2: now the best way to write binary data: base 16! It's short AND maps well to bits!\ `7fffff`\ Every hexadecimal digit represents 4 bits. So 1 byte (8 bits) is always 2 hexadecimal digits. ## panel 3: there are 16 hex digits: `0 → f` ``` | hex | decimal | binary | | 0 | 0 | 0000 | | 1 | 1 | 0001 | | 2 | 2 | 0010 | | 3 | 3 | 0011 | | 4 | 4 | 0100 | | 5 | 5 | 0101 | | 6 | 6 | 0110 | | 7 | 7 | 0111 | | 8 | 8 | 1000 | | 9 | 9 | 1001 | | a | 10 | 1010 | | b | 11 | 1011 | | c | 12 | 1100 | | d | 13 | 1101 | | e | 14 | 1110 | | f | 15 | 1111 | ``` ## panel 4: 0x means it's hex In many languages, the 0x prefix lets you write numbers in hexadecimal.\ For example, in C:\ 0x20 == 32 (base 16)\ 0b10100 == 20 (base 2)\ 061 == 49 (base 8)\ be careful: the 0 prefix meaning "base 8" can really trip you up! ## panel 5: things hexadecimal is used for color codes! (e.g. `#FF00FF`)\ memory addresses!\ hashes! (like git commit IDs)\ displaying binary data! (like with `hexdump`)
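The same 3 bytes in all 3 notations, as a Python sketch. Note that Python writes base 8 as `0o61` instead of C's `061`, exactly because the bare `0` prefix trips people up.

```python
data = bytes([0b01111111, 0b11111111, 0b11111111])

as_hex = data.hex()                      # '7fffff'
as_int = int.from_bytes(data, "big")     # 8388607
as_bits = " ".join(f"{b:08b}" for b in data)

print(as_hex, as_int, as_bits)

# every byte is exactly 2 hex digits
assert len(as_hex) == 2 * len(data)

# prefixes for writing numbers in other bases
print(0x20, 0b10100, 0o61)  # 32 20 49
```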

floating point: the bits

### panel 1: Floats need to fit into 64 bits. But how do we actually convert a number like 10.87 into 64 bits? First, we split the number into 3 parts: the sign, a power of 2 and an offset (The usual term is "significand", but I find that term confusing, so I'm calling it "offset") `10.87 = + (8 + 2.87) ` (8 is the biggest power of 2 that's less than 10.87) Next, we encode the sign, power of 2, and offset into bits! ### encoding the sign (1 bit) `+ is 0` `- is 1` ### floating point encoding is defined in the IEEE 754 standard since it's standardized, it works the same way on every computer! it was originally defined in 1985 ### encoding the exponent (11 bits, 2^-1023 to 2^1023) `8` ↓ `2^3 = 8` `3` ↓ add 1023 (this makes sure that the result is positive) `1026` ↓ write it in binary, in 11 bits `10000000010` ### encoding the offset (52 bits) `2.87` ↓ divide by the gap size, 2^-49 in this case (2^(exponent-52)) `1615666366319165.3 ` ↓ round `1615666366319165` ↓ write it in binary, 52 bits `01011011110101110000101000 ` `11110101110000101000111101` ### And here's `10.87`! `01000000 00100101 10111101 01110000 10100011 11010111 00001010 00111101`
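You can pull the sign, exponent, and offset back out of a float with Python's `struct` module. A minimal sketch: `float_bits` and `split_fields` are made-up helper names, and `">d"` / `">Q"` mean "big endian 64-bit float" and "big endian unsigned 64-bit integer".

```python
import struct


def float_bits(x: float) -> str:
    """The 64 bits of a float, one byte at a time (big endian)."""
    return " ".join(f"{b:08b}" for b in struct.pack(">d", x))


def split_fields(x: float) -> tuple[int, int, int]:
    """Return (sign bit, 11-bit exponent field, 52-bit offset field)."""
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = n >> 63
    exponent = (n >> 52) & 0x7FF     # still has the 1023 added
    offset = n & ((1 << 52) - 1)
    return sign, exponent, offset


print(float_bits(10.87))

sign, exponent, offset = split_fields(10.87)
print(sign, exponent, exponent - 1023, offset)

# put it back together: power of 2, plus offset times the gap size
power = exponent - 1023
assert 2.0**power + offset * 2.0 ** (power - 52) == 10.87
```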

floating point representation

### the (64-bit) floating point number line Floating point numbers aren't evenly distributed. Instead, they're organized into windows: [0.25, 0.5], [0.5, 1], [1,2], [2,4], [4,8], [8,16], all the way up to [2^1023, 2^1024]. Every window has 2^52 floats in it. The windows [-2, -1], [-1, -1/2], [-1/2, -1/4], [-1/4, 0], [0, 1/4], [1/4, 1/2], [1/2, 1], and [1, 2], each have 2^52 numbers. [2, 4] has 2^52 numbers. [4, 8] has 2^52 numbers. Illustration of a horizontal line, with the windows plotted out on it, showing that each window doubles in size as it moves away from zero. ### the windows go from REALLY small to REALLY big The window closest to 0 is [2^-1023, 2^-1022] This is TINY: a hydrogen atom weighs about 2^-76 grams. The biggest window is [2^1023, 2^1024]. This is HUUUGE: the farthest galaxy we know about is about 2^90 meters away. ### the gaps between floats double with every window window: [1, 2] gap: 2^-52 window: [2, 4] gap: 2^-51 window: [4, 8] gap: 2^-50 window: [8, 16] gap: 2^-49 ### why does `10000000000000000.0 + 1 = 10000000000000000.0`? - In the window [2^n, 2^(n+1)], the gap between floats is 2^(n-52) - `10000000000000000.0` is in the window [2^53, 2^54], where the gap is 2^1 (or 2) - So the next float after `10000000000000000.0` is `10000000000000002.0`
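Python can tell you the gap directly. This sketch assumes Python 3.9 or newer, where `math.ulp(x)` gives the gap to the next float above a positive `x` and `math.nextafter` gives the next float itself.

```python
import math

# the gap doubles with every window
for start in (1.0, 2.0, 4.0, 8.0):
    gap = math.ulp(start)
    print(f"window starting at {start}: gap = 2^{int(math.log2(gap))}")

big = 10000000000000000.0
print(math.ulp(big))                  # 2.0
print(math.nextafter(big, math.inf))  # 1.0000000000000002e+16
print(big + 1 == big)                 # True: +1 lands halfway and gets rounded away
```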

floating point math

## floating point math let's deconstruct `0.1 + 0.2` 1. The closest 64-bit float to 0.1 is (roughly) `0.1000000000000000055511151231` 2. For 0.2, it's (roughly) `0.2000000000000000111022302462` 3. `0.1000000000000000055511151231 + 0.2000000000000000111022302462 = 0.3000000000000000166533453693` 4. Inconveniently, `0.3000000000000000166533453693` is exactly in between 2 floating point numbers: `0.2999999999999999888977` and `0.30000000000000004440892` 5. How do we pick the answer? `0.30000000000000004440892` has an even offset, so we round to that one ## losing a little precision is okay `0.1 + 0.2 = 0.30000000000000004` is usually no big deal. Do you REALLY need your answer to be accurate to 16 decimal places? Probably not! ## the more numbers you add, the more precision you lose This Go code: `var meters float32 = 0.0` `for i := 0; i < 100000000; i++ { meters += 0.01 }` `fmt.Println(meters)` prints out `262144`, not `1000000` because `262144.0 + 0.01 = 262144.0` ## adding a number to a MUCH smaller number is bad For example: `2**53 + 1.0 = 2**53` `1.0 + 2**-57 = 1.0` (try it!) ## Use scientific computing libraries if you can There are special algorithms for adding up lots of small floating point numbers without losing accuracy! For example `numpy` implements them.
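Here's a Python sketch of these panels. Python floats are 64-bit, so the `f32` helper (a made-up name) fakes a 32-bit float by round-tripping through `struct`; that's an assumption standing in for Go's `float32`. The "special algorithm" shown is `math.fsum` from the standard library, used here instead of `numpy` so the example has no dependencies.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a 64-bit float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


print(0.1 + 0.2)  # 0.30000000000000004

# why the Go loop gets stuck: at 262144 the gap between 32-bit floats
# is bigger than 2 * 0.01, so adding 0.01 rounds straight back down
stuck = f32(262144.0 + f32(0.01))
print(stuck)  # 262144.0

# adding a number to a MUCH smaller number
print(2.0**53 + 1.0 == 2.0**53)  # True
print(1.0 + 2.0**-57 == 1.0)     # True

# naive summing loses precision, fsum tracks the lost bits
values = [0.1] * 10
print(sum(values))        # 0.9999999999999999
print(math.fsum(values))  # 1.0
```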

floating point is weird

## floating point is weird ## floating point 10.0 is not the same as the integer 10 10 (64-bit integer): `0x000000000000000a` 10.0 (64-bit float): `0x4024000000000000` (what's this 4024 doing???) ## computer integers work almost exactly the way you'd expect `1 + 2 - 3 = 0` but floating point numbers don't: ` (0.1 + 0.2) - 0.3 = 0.0000000000000000555` ## checking for float equality is dangerous `if x == 0.3`: bad! `(0.1 + 0.2)` is not equal to `0.3`! Instead, check if x is very close to 0.3, something like this: `if abs(x - 0.3) < 0.0000001:` ## in floating point, very large integers get rounded For example: `10000000000000001.0 == 10000000000000000.0` (16 zeros) (try comparing those 2 numbers in your favourite language! they're the same!) ## (x + y) + z is not the same as x + (y + z) For example: `(9007199254740992.0 + 1.0) - 1.0 = 9007199254740991.0` (the math term for this problem is "floating point addition isn't associative") ## some intuition for precision 32-bit floats have about 8 digits of precision 64-bit floats have about 16 digits of precision
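All of these are easy to try in Python. A short sketch: `math.isclose` is the standard library's version of the "very close" check, and its default tolerance is relative rather than the fixed `0.0000001` used above.

```python
import math

x = 0.1 + 0.2

print(x == 0.3)                # False: bad!
print(abs(x - 0.3) < 0.0000001)  # True
print(math.isclose(x, 0.3))    # True

# very large integers get rounded
print(10000000000000001.0 == 10000000000000000.0)  # True

# addition isn't associative
left = (9007199254740992.0 + 1.0) - 1.0
right = 9007199254740992.0 + (1.0 - 1.0)
print(left, right, left == right)
```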

floating point alternatives

## more floating point alternatives ## there are many alternative ways to represent numbers These are all implemented in software (not hardware) so they're a lot slower, and different languages have different libraries. ## alternative 1: decimal floating point This is like regular floating point, but in base 10 instead of base 2. It's also standardized in IEEE 754. Examples: Python's `decimal` module or Java's `BigDecimal` ## alternative 2: fractions This lets you do exact calculations with fractions (1/10 + 2/10 = 3/10) Examples: Python's fractions module in the standard library, Lisps have first-class support ## alternative 3: symbolic computation For example, `sqrt(2)` instead of `1.414`. You'll see this in computer algebra systems like Mathematica, Maple, or sympy. ## alternative 4: interval arithmetic The idea is to store every number as a range so that you can precisely track your error bars. Probably the least mainstream of these alternatives. ## alternative 5: binary-coded decimal This is how floating point numbers (and integers) were stored on IBM computers in the 60s, and you can still occasionally see it today in old formats like ISO 8583 for financial transactions.
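Alternatives 1 and 2 are both in Python's standard library, so here's a quick sketch of what "exact" looks like with each. (Pass strings to `Decimal`: `Decimal(0.1)` would faithfully copy the float's error.)

```python
from decimal import Decimal
from fractions import Fraction

# regular floating point
print(0.1 + 0.2 == 0.3)  # False

# alternative 1: decimal floating point
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# alternative 2: fractions
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
print(Fraction(1, 3) + Fraction(1, 6))  # 1/2
```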

fixed point

## fixed point ## just because you see 0.23, doesn't mean it's floating point For example, in this RGBA color: `rgba(211, 7, 23, 0.23)` `0.23` isn't a float at all, it's the 8-bit integer `59`. Let's see how that works! ## fixed point numbers are integers You interpret them as the integer divided by some fixed number (like 255 or 10000) For example, that opacity should be divided by 255 `59 / 255 = 0.23ish` ## things fixed point is often used for money: `$1.23 => 123` time: `0.1 seconds => 100000 microseconds` opacity: `0.23 => 59` ## fixed point is the most common alternative to floating point It's very simple and it's pretty easy to implement! ## implementing fixed point is easy (especially if you only need to add and subtract) You just need: - an integer - some code to display it (by dividing by 255 or something) ## fixed point can help avoid accuracy issues If you try to represent the current Unix epoch in nanoseconds as a 64-bit float, you'll lose accuracy. But if it's a 64-bit integer, it'll be fine.
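A tiny Python sketch of the 3 examples (opacity, money, and the nanosecond timestamp). The function names are made up for illustration, `to_cents` assumes plain non-negative input like `"1.23"`, and the timestamp is an arbitrary example value.

```python
def opacity_to_byte(opacity: float) -> int:
    """0.23 => 59: store the opacity as an integer out of 255."""
    return round(opacity * 255)


def to_cents(dollars: str) -> int:
    """'1.23' => 123, without ever going through a float."""
    whole, _, frac = dollars.partition(".")
    return int(whole) * 100 + int(frac.ljust(2, "0")[:2])


def format_cents(cents: int) -> str:
    """123 => '$1.23': the 'some code to display it' part."""
    return f"${cents // 100}.{cents % 100:02d}"


print(opacity_to_byte(0.23))                                # 59
print(format_cents(to_cents("0.10") + to_cents("0.20")))    # $0.30, exactly

# a nanosecond Unix timestamp doesn't survive a trip through a 64-bit float
nanoseconds = 1700000000123456789
print(int(float(nanoseconds)) == nanoseconds)  # False
```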

bit flags

## bit flags ## bit flags are a clever way to store lots of information in one integer If you have many options which are true or false, you can encode them all into an integer, with 1 bit for each option. 32 bits = 32 options! For example, some of the bit flags the open function in C uses: - nofollow - append - truncate - create - write only - read write (this is on Linux) ## where you'll see bit flags In libc, the open, socket, and mmap functions use bit flags to pass options. The TCP and DNS protocol headers both have a flags field which has bit flags. ## bit flags are used a lot in C code Here's some C code that opens a new file: `fd = open("file.txt", O_RDWR | O_CREAT, 0666);` `O_RDWR` is: `00000010` `O_CREAT` is: `01000000` `O_RDWR | O_CREAT` is: `01000010` You can check if a bit flag is set in C like this: `if (flags & O_RDWR) { ... }` ## fun example: tic tac toe! Here's a way to encode the state of a tic tac toe game in 18 bits: x positions: `100` `010` `010` O positions: `010` `001` `100`
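Here's the same idea as a Python sketch. The `O_RDWR` and `O_CREAT` values are hardcoded from the panel above (they're the Linux values; use `os.O_RDWR` and friends in real code, since they differ between platforms). The tic tac toe packing order (x in the high 9 bits, O in the low 9) is an assumption made for this example.

```python
# values copied from the panel (Linux); hardcoded only for illustration
O_RDWR = 0b00000010
O_CREAT = 0b01000000

flags = O_RDWR | O_CREAT  # set both flags
print(f"{flags:08b}")      # 01000010

if flags & O_CREAT:
    print("create is set")

flags_without_create = flags & ~O_CREAT  # clear a flag with & and ~
print(f"{flags_without_create:08b}")      # 00000010

# tic tac toe: 9 bits for x, 9 bits for O
x_positions = 0b100_010_010
o_positions = 0b010_001_100

assert x_positions & o_positions == 0  # nobody shares a square
board = (x_positions << 9) | o_positions
print(f"{board:018b}")
```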

big integers

## big integers ## integers don't have to overflow Instead, integers can expand to use more space as they get bigger. Integers that expand are called "big integers". big integer: I'm going to use ONE THOUSAND bytes of space! ## big integer math is slower It's slower because it's implemented in software, not hardware. So a big integer addition is actually turned into lots of smaller additions. ## how big integers are represented (in Go, as of 2023) You can think of this array of 64-bit integers as being the number written in base 2^64 ## some languages only have big integers Python 3 and Ruby: we'd rather have slower math and no weird overflow problems! This works because people don't do a lot of math in Ruby/Python (except with numpy, which doesn't use big integers). ## some languages offer big integers as an option Go, Javascript, Java, and lots more. Each language has its own big integer implementation. ## when are big integers useful? - they're used in cryptography (e.g. for large key sizes) - for math on really big integers
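To make "the number written in base 2^64" concrete, here's a Python sketch that splits a big number into 64-bit pieces and puts it back together. The helper names and the choice to store the smallest piece first are assumptions for this example, not a description of any particular language's internals.

```python
BASE = 1 << 64  # each array element is one "digit" in base 2^64


def to_limbs(n: int) -> list[int]:
    """Split a non-negative integer into 64-bit pieces, smallest first."""
    limbs = []
    while True:
        n, limb = divmod(n, BASE)
        limbs.append(limb)
        if n == 0:
            return limbs


def from_limbs(limbs: list[int]) -> int:
    """Rebuild the integer: limbs[i] is worth limbs[i] * BASE**i."""
    return sum(limb * BASE**i for i, limb in enumerate(limbs))


print(to_limbs(5))          # [5]: fits in one 64-bit integer
print(to_limbs(2**64))      # [0, 1]: like "10" in base 10
print(len(to_limbs(2**8000 - 1)))  # 125 pieces = ONE THOUSAND bytes
```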

bases

### We usually write numbers in base 10, but you can write numbers in any base. Let's write the number 103 in 3 different bases: base 10: `103` (powers of 10) ``` 1 x 100 = 100 0 x 10 = 0 3 x 1 = 3 100 + 0 + 3 = 103 ``` base 2: `1100111` (powers of 2) ``` 1 x 64 = 64 1 x 32 = 32 0 x 16 = 0 0 x 8 = 0 1 x 4 = 4 1 x 2 = 2 1 x 1 = 1 64 + 32 + 0 + 0 + 4 + 2 + 1 = 103 ``` base 16: `67` (powers of 16) ``` 6 x 16 = 96 7 x 1 = 7 96 + 7 = 103 ``` ### base 2, 10, and 16 are the main bases we use on computers - base 2 is called binary - base 10 is called decimal - base 16 is called hexadecimal ### how to convert from base 10 to base 2 Let's convert 19! We'll start on the right and move left. 1. Divide by 2: 19/2 = 9 remainder 1 2. Write the remainder (1) below, and 9 on the left 3. Repeat answer: 10011! person: but in real life I'd just ask a computer
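The divide-by-2 steps above can be sketched in Python like this (`to_base2` is a made-up name), followed by the "just ask a computer" version using built-ins.

```python
def to_base2(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division."""
    digits = ""
    while n > 0:
        n, remainder = divmod(n, 2)     # 19 / 2 = 9 remainder 1
        digits = str(remainder) + digits  # write remainders right to left
    return digits or "0"


print(to_base2(19))   # 10011
print(to_base2(103))  # 1100111

# in real life: ask a computer
print(bin(103), hex(103))   # 0b1100111 0x67
print(int("1100111", 2), int("67", 16))  # 103 103
```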

NaN and infinity

## NaN and infinity ## NaN stands for "not a number" It means the result of the calculation is undefined. `0/0 = NaN` `sqrt(-1) = NaN` `log(-1) = NaN` ## infinity "Infinity" just means "this number is too big for floating point to handle." There are two infinities: one positive, one negative. `2.0**1024 = inf` (`2.0**1024` means `2^1024`) `-1/0 = -inf` `inf + 10 = inf` `inf - inf = NaN` ## NaNs spread As soon as one NaN gets in, it gets everywhere `NaN * 5 = NaN` `NaN + 2 = NaN` ## NaN != NaN NaN isn't equal to anything (including itself) ## NaN and infinity: the bits A floating point value is `NaN` or `infinity` if the bits in the exponent are all 1. For example, this is a `NaN`: `01111111 11110001 00000000 00000000 00000000 00000000 00000000 00000000` It's `infinity` if the offset bits are all 0, otherwise it's `NaN`. There are 2^53 values like this: 2 of them are `±infinity` and the other 2^53-2 are `NaN`. We usually treat `NaN` like a single value though. ## a note on byte order All of the floating point examples in this zine use a big endian byte order, because it's easier to read. But most computers use a little endian byte order. You can see this in action at `https://memory-spy.wizardzines.com`
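Here's a Python sketch of the "exponent bits are all 1" rule. One Python-specific caveat: Python raises an exception for some of the calculations above (`0/0`, `-1/0`, `math.sqrt(-1)`, `2.0**1024`) instead of returning NaN or infinity, so this builds them with `float("nan")` and `float("inf")`. `classify` is a made-up helper name.

```python
import math
import struct

nan = float("nan")
inf = float("inf")

print(nan == nan)          # False: NaN isn't equal to anything
print(math.isnan(nan * 5)) # True: NaNs spread
print(inf + 10 == inf)     # True
print(math.isnan(inf - inf))  # True
print(1e308 * 10)          # inf: too big for floating point to handle


def classify(x: float) -> str:
    """Look at the exponent and offset bits to decide what x is."""
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    exponent = (n >> 52) & 0x7FF
    offset = n & ((1 << 52) - 1)
    if exponent != 0x7FF:       # exponent bits aren't all 1
        return "finite"
    return "infinity" if offset == 0 else "NaN"


# the example bytes from the panel above
example = bytes([0b01111111, 0b11110001, 0, 0, 0, 0, 0, 0])
(value,) = struct.unpack(">d", example)
print(classify(value), classify(inf), classify(-inf), classify(1.5))
```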

DNS: cast of characters

Let's meet the cast and see how they communicate with each other! browser: where's example.com? (function call) 93.184.216.34! ↓ function: where's example.com? (DNS query) 93.184.216.34! ↓ resolver: where's example.com? (DNS query) 93.184.216.34! ↓ authoritative nameservers ### browser Your browser uses DNS to look up IP addresses every time it visits a domain, like example.com. The browser has a DNS cache. ### function Your operating system provides a function to do DNS lookups. On Linux and Mac it's getaddrinfo. Your operating system also might have a DNS cache. ### resolver The function sends requests to a server called a resolver which knows how to find the authoritative nameservers. The resolver has a DNS cache. ### authoritative nameservers The authoritative nameservers are the servers where the DNS records are actually stored. They're wearing crowns because they're In Charge.

life of a DNS query

### 1 An illustration of a smiling stick figure with curly hair, talking to a browser, represented by the Firefox logo of a fox wrapped around a globe. person: I want to go to https://example.com browser: hmm, I don't have an IP address for example.com cached. I'll ask a resolver! ### 2 An illustration of a browser talking to a resolver, represented by a box with a smiley face holding a magnifying glass. browser: what's the IP for example.com? resolver: hmm, I'll look in my cache... ### 3 ❤ DNS cache ❤ archive.org: 207.241.224.2 jvns.ca: 172.64.80.1 resolver: nope, I don't have it cached, I need to ask the authoritative nameservers! I have the root nameserver IPs hardcoded. note: we're pretending the resolver has no .com domains cached. Normally it would use its cache to skip step 4. ### 4 An illustration of a resolver talking to a root nameserver, represented by a box with a smiley face wearing three crowns. resolver: What's the IP for example.com? root nameserver: ask a .com nameserver! It's at a.gtld-servers.net → com NS a.gtld-servers.net. ca NS a.ca-servers.net. horse NS a.nic.horse. (NS stands for "nameserver") ### 5 An illustration of a resolver talking to a .com nameserver, represented by a box with a smiley face wearing two crowns. resolver: what's the IP for example.com? .com nameserver: ask an example.com nameserver! It's at a.iana-servers.net list of DNS records: neopets.com, NS, ns-42.awsdns-05.com. → example.com, NS, a.iana-servers.net. ### 6 An illustration of a resolver talking to an example.com nameserver, represented by a box with a smiley face wearing one crown. resolver: what's the IP for example.com? example.com nameserver: it's 93.184.216.34! resolver: great, I'll tell the browser! → example.com, A, 93.184.216.34

DNS queries

DNS queries aren't harmless

how to read dig output

everything in a DNS packet

I literally mean everything, I copied this verbatim from a real DNS request using Wireshark. (DNS packets are binary but we're showing a human-readable representation here) ### Let's look at the actual data being sent during a DNS query: Illustration of a browser, represented by the Firefox logo of a fox wrapped around a globe, talking to a resolver, represented by a box with a smiley face holding a magnifying glass. browser: what's the IP for example.com? resolver: 93.184.216.34! ### request `Query ID: 0x05a8` (randomly generated) `Flags: 0x0100` (these flags just mean "this is a request") `Questions: 1` `Answer records: 0` `Authority records: 0` `Additional records: 0` `Question:` `Name: example.com` `Type: A` (A is for IPv4 address. other types: MX, CNAME, AAAA, etc) `Class: IN` (IN stands for "INternet") ### response `Query ID: 0x05a8` (matches request ID) `Flags: 0x8580` the response code is encoded in the last 4 bits of these flags. The 3 main response codes are: - NOERROR (success!) - NXDOMAIN (doesn't exist!) - SERVFAIL (error!) ``` Questions: 1 Answer records: 1 Authority records: 0 Additional records: 0 ``` (copied from request) ``` Question: Name: example.com ``` (domain names aren't case sensitive) ``` Type: A Class: IN Answer records: Name: example.com Type: A Class: IN TTL: 86400 Content: 93.184.216.34 ``` (the IP we asked for) ``` Authority records: (empty) Additional records: (empty) ``` page 12 ("NS records") talks more about these 2 sections Illustration of a smiling stick figure with curly hair. Person: I'm always surprised by how little is actually in a DNS packet!
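Since the real packet is binary, here's a Python sketch of what the first 12 bytes of that response look like and how the response code comes out of the flags. The layout (six 16-bit big endian integers, then names as length-prefixed labels) follows RFC 1035; the function names and the `RCODES` table are made up for this example and only cover the 3 codes mentioned above.

```python
import struct

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def parse_header(packet: bytes) -> dict:
    """Decode the 12-byte DNS header: six big endian 16-bit integers."""
    query_id, flags, questions, answers, authority, additional = struct.unpack(
        ">HHHHHH", packet[:12]
    )
    return {
        "query_id": query_id,
        "is_response": bool(flags >> 15),  # the very first flag bit
        "rcode": RCODES.get(flags & 0xF, "other"),  # the last 4 bits
        "questions": questions,
        "answers": answers,
        "authority": authority,
        "additional": additional,
    }


def encode_name(name: str) -> bytes:
    """example.com => 7 'example' 3 'com' 0 (each label gets a length byte)."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


# the response header from this page, packed back into bytes
header = struct.pack(">HHHHHH", 0x05A8, 0x8580, 1, 1, 0, 0)
print(header.hex())
print(parse_header(header))
print(encode_name("example.com"))
```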

glue records

how airports lie to you with DNS

DNS is distributed

SPF & DKIM records

things that can break your DNS

the root nameservers

DNS cache levels

resolvers vs authoritative nameservers

negative caching

### Here's a problem I've had many times Illustration of a stick figure with curly hair and a distressed expression. Person's thought bubble: I set up my new domain, everything looks good, but it's not working?!?! ### I finally learned last year that my problem was "negative caching" Same person, now smiling: now I never have this problem anymore! ### resolvers cache negative results Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass, and an authoritative nameserver, represented by a box with a smiley face wearing a crown. resolver: what's the IP for `bees.jvns.ca`? authoritative nameserver: I don't have any records for that! resolver (thought bubble) `caching: no A records for bees.jvns.ca` ### the TTL for caching negative results comes from the SOA record `example.com. 3600 IN SOA ns.icann.org. noc.dns.icann.org. 2021120741 7200 3600 1209600 3600` it's the smaller of the first number and the last number (in this case 3600 seconds) ### what you need to know about SOA records 1. they control the negative caching TTL 2. you can't change them (unless you run your own authoritative nameserver) 3. how to find yours: `dig SOA yourdomain.com` ### how to avoid this problem Just make sure not to visit your domain before creating its DNS record! That's it! (if you really want more details, see RFC 2308)
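The "smaller of the first number and the last number" rule, as a Python sketch. `negative_ttl` is a made-up helper that assumes the record is written on one line in the same format as the example above (like in `dig` output).

```python
def negative_ttl(soa_line: str) -> int:
    """How long resolvers may cache 'this record doesn't exist'."""
    fields = soa_line.split()
    record_ttl = int(fields[1])  # the first number: the SOA record's own TTL
    minimum = int(fields[-1])    # the last number in the SOA record
    return min(record_ttl, minimum)


soa = "example.com. 3600 IN SOA ns.icann.org. noc.dns.icann.org. 2021120741 7200 3600 1209600 3600"
print(negative_ttl(soa))  # 3600 seconds, so wait up to an hour
```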

the DNS hierarchy

### there are 3 main levels of authoritative DNS servers root (wearing 3 crowns): I'm in charge of EVERYTHING .com nameserver (wearing 2 crowns): I'm in charge of all domains ending in `.com` example.com nameserver (wearing 1 crown): I'm in charge of all domains ending in `example.com` ### the root nameserver delegates what's the IP for example.com? root: I am not concerned with petty details like that. Here's the address of the .com nameserver. ### the .com nameserver also delegates what's the IP for example.com? .com nameserver: I am not concerned with petty details like that either. Here's the address of the example.com nameserver ### the example.com nameserver actually answers your questions what's the IP for example.com? example.com nameserver: 93.184.216.34! ### this design lets DNS be decentralized example: for my domain `jvns.ca` root (ICANN controls this!) delegates to .ca nameserver (Canada controls this!) delegates to jvns.ca nameserver (I control this!)

bitwise operations

### bitwise operations operate one bit at a time The results can be surprising when you write them in base 10: `8 & 3 = 0` but in binary it makes more sense: ``` 00001000 (8) & 00000011 (3) = 00000000 ``` ### & Bitwise and: the result is 1 if BOTH bits are 1 ``` 1 & 1 = 1 1 & 0 = 0 0 & 0 = 0 11 & 10 = 10 ``` ### | Bitwise or: the result is 1 if EITHER bit is a 1 ``` 1 | 1 = 1 1 | 0 = 1 0 | 0 = 0 11 | 10 = 11 ``` ### ^ Bitwise xor: the result is 1 if EXACTLY ONE bit is a 1 ``` 1 ^ 1 = 0 1 ^ 0 = 1 0 ^ 0 = 0 11 ^ 10 = 01 ``` ### ~ Bitwise not: FLIP all the bits ``` ~0 = 1 ~1 = 0 ~10 = 01 ``` ### << Left shift: add 0s to the end `1110 << 3 = 1110000` `<< n` is the same as multiplying by 2^n ### >> Right shift: chop bits off the end `01100001 >> 2 = 00011000` `>> n` is the same as dividing by 2^n ### there are actually two right shifts unsigned right shift ``` 253 >> 1 = 126 11111101 -> 01111110 ``` always pad on the left with a 0 signed right shift ``` -3 >> 1 = -2 11111101 -> 11111110 ``` if the number is negative, pad on the left with 1 instead of a 0 In some languages, unsigned right shift is >>>. In other languages, both right shifts are >> and the integer's type determines which is used.
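All of these operators exist in Python, so here's a sketch you can paste into a REPL. Two Python quirks: its integers aren't a fixed size, so `~` needs a mask to show only the bits you care about, and `>>` on a negative number is always the signed shift (there's no `>>>`), so the unsigned shift is done by masking to 8 bits first.

```python
print(8 & 3)  # 0

print(f"{0b11 & 0b10:02b}")  # 10
print(f"{0b11 | 0b10:02b}")  # 11
print(f"{0b11 ^ 0b10:02b}")  # 01

# ~ flips ALL the bits, so mask off the 2 we care about
print(f"{~0b10 & 0b11:02b}")  # 01

print(f"{0b1110 << 3:b}")        # 1110000
print(f"{0b01100001 >> 2:08b}")  # 00011000

# the two right shifts, on the byte 11111101
print(-3 >> 1)           # -2: signed, pads with 1s
print((-3 & 0xFF) >> 1)  # 126: unsigned, pads with 0s
```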

32 bits is small

## panel 1: using 32-bit integers is dangerous Let's see some examples of how it can go wrong and why it's almost always better to use 64-bit integers instead! (32-bit floats are bad too, for similar reasons) ## panel 2: 32 bit integers are at most 4 billion unsigned 32-bit ints go from 0 to 4,294,967,295 (4 billion) signed 32-bit ints go from -2,147,483,648 to 2,147,483,647 ## panel 3: times "4 billion" wasn't enough **Database primary keys**: 4 billion records really isn't that much. **IPv4 addresses**: turns out we want more than 4 billion computers on the internet. Oops. **Registers**: in the 90s, registers were 32 bits. 4 billion bytes of RAM is 4GB. We need more than that. **Unix timestamps**: 2 billion seconds after Jan 1, 1970 is Jan 19, 2038. That's going to be an exciting day. (look up "2038 problem"!) ## panel 4: 64 bits is usually big enough For example, 2^64 seconds after Jan 1, 1970 is over 100 billion years in the future: well after the death of the sun. So a 64-bit timestamp is definitely enough space. ## panel 5: be wary of using 32-bit integers by accident Systems that were designed in the 90s often have a 32-bit integer as the default. For example, in MySQL an INTEGER is 32 bits.
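You can work out the exciting day yourself. A Python sketch with the standard `datetime` module, assuming timestamps are signed 32-bit seconds since Jan 1, 1970 UTC; the years calculation uses 365.25 days per year as a rough figure.

```python
from datetime import datetime, timedelta, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

max_signed_32 = 2**31 - 1
last_moment = epoch + timedelta(seconds=max_signed_32)
print(max_signed_32)  # 2147483647
print(last_moment)    # 2038-01-19 03:14:07+00:00

# 64 bits: datetime can't even represent dates this far away,
# so just count the years
seconds_per_year = 365.25 * 24 * 60 * 60
years = 2**64 / seconds_per_year
print(f"{years:.2e} years")
```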

PATH

### PATH is what causes "command not found" errors ``` $ ffmpeg zsh: command not found: ffmpeg ``` sad person (thinking): "but why?? I just installed ffmpeg!" ### your shell has a list of directories it looks for commands in bash (thinking): "I'll check `/usr/local/bin` then `/usr/bin` then `/opt/local/bin` then ..." That list is in an environment variable called `PATH` ### how to check which shell you're using To change your `PATH`, you need to know which shell you're using. To check, run a nonexistent command: ``` $ zxasdfasfaewfhrasda zsh: command not found: ... ``` tada! your shell is zsh! ### how to see your current PATH ``` $ echo $PATH ``` Or to format it more nicely in bash or zsh: ``` $ echo $PATH | tr ':' '\n' ``` ### how to fix your PATH 1. figure out what directory you need to add (try `find / -name ffmpeg` if you can't figure it out) 2. edit your shell config 3. open a new terminal (very important!) ### how to edit your shell config You'll need to add one line. The file you edit depends on your shell: for bash: add `export PATH=$PATH:/some/dir` to `~/.bashrc` for zsh: add `export PATH=$PATH:/some/dir` to `~/.zshrc` for fish: add `set PATH $PATH /some/dir` to `~/.config/fish/config.fish`

some people who make programming easier

### the loud newbie newbie: wait, HOW does X work?? other person, thinking: I'm so glad they asked, I was wondering that too... ### the grumpy old timer new person: X is so cool! grumpy old timer: it is! let me tell you about some ways it can break though.... ### the bug chronicler that bug was so gnarly, I'm going to write an EXTREMELY CLEAR description of what happened so we can all learn from it ### the documentarian person 1: here's how you do X... documentarian: I'll put those instructions in our wiki! ### the "today I learned..." I just learned this cool new tool... check out this weird bug! ### the "I've read the entire internet" person: how does X work? TAB GIRL: ah, I read about that recently... here's a link from my 200 browser tabs ### the tool builder everyone keeps getting confused by X! I'm going to fix it with CODE. ### the question answerer person 1: hey can you explain how X works? question answerer: I would LOVE to ### blank final panel ?

TCP: how to reliably get a cat

Step 3 in our plan is "open a TCP connection!" Let's learn what this "TCP" thing even is ### When you send a packet sometimes it gets lost jvns.ca server → Cat packets → lightning bolt laptop: nope never got it ### TCP lets you send a stream of data reliably, even if packets get lost or sent in the wrong order. four butterflies, labelled TCP C, TCP D, TCP D (duplicates), TCP A, and TCP B laptop: it says "abcd"! ### how does TCP work, you ask? WELL! ### how to know what order the packets should go in: Every packet says what range of bytes it has. Like this: once upon a ti ← bytes 0-13 agical oyster ← bytes 30-42 me there was a m ← bytes 14-29 Then the client can assemble all the pieces into: "once upon a time there was a magical oyster" The position of the first byte (0,14,30 in our example) is called the "sequence number" ### how to deal with lost packets: When you get TCP data, you have to acknowledge it (ACK): jvns.ca server: here is part of a cat picture! that should be 28832 bytes so far! jvns.ca server (thinking): yay laptop: ACK! I have received all 28832 bytes If the server doesn't get an acknowledgement, it will retry sending the data.
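Here's a toy Python sketch of the reassembly idea: sort the pieces by sequence number, ignore duplicates, and refuse to continue past a gap. This is only an illustration of the concept, not how a real TCP stack is written, and `reassemble` is a made-up name.

```python
def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Put (sequence number, data) pieces back in order."""
    data = bytearray()
    for seq, payload in sorted(segments):
        if seq > len(data):
            # a piece is missing: real TCP would wait for a retransmission
            raise ValueError(f"missing bytes {len(data)}-{seq - 1}")
        # duplicates just overwrite bytes we already have
        data[seq : seq + len(payload)] = payload
    return bytes(data)


segments = [
    (0, b"once upon a ti"),
    (30, b"agical oyster"),     # arrived out of order
    (14, b"me there was a m"),
    (30, b"agical oyster"),     # arrived twice
]

story = reassemble(segments)
print(story.decode())
print(f"ACK! I have received all {len(story)} bytes")
```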

strace command line flags I love

### -e overwhelmed by all the system calls you don't understand? Try `strace -e open` and it'll just show you opens. much simpler! ### -f is for follow Does your program start subprocesses? lots do! Use `-f` to see what those are doing too. Or just always use `-f`! That's what I do. ### -p is for PID "OH NO I STARTED THE PROGRAM 6 HOURS AGO AND NOW I WANT TO STRACE IT" Do not worry! Just find your process's PID (like 747) and `strace -p 747` (tip: if the process runs as root you'll need to be root, too because SECURITY) ### -s is for strings!! Sometimes I'm looking at the output of a recvfrom and it's like: recvfrom(6, "And then the monster...") and OH NO THE SUSPENSE. `strace -s 800` will show you the first 800 characters of each string. I use it all the time! ### -o is for output! Let's get real. No matter what, strace prints too much damn output. Use `strace -o too_much_stuff.txt` and sort through it later. ### -y Have no idea which file the file descriptor "3" refers to? `-y` is a flag in newer versions of strace, and it'll show you filenames instead of just numbers! ### Putting it all together: Want to spy on an ssh session? `strace -f -o ssh.txt ssh juliabox.com` Want to see what files a Dropbox sync process is opening? (with PID: 230) `strace -f -p 230 -e open`

every Linux networking tool I know

### ping "are these computers even connected?" ### curl make any HTTP request you want ### httpie like curl but easier ("http get") ### wget download files ### tc on a linux router, slow down your brother's internet (and much more) ### dig/nslookup what's the IP for that domain? (DNS query) ### whois is this domain registered? ### ssh secure shell 💙 ### scp copy files over an SSH connection ### rsync copy only changed files (works over SSH) ### ngrep grep for your network ### tcpdump "show me all packets on port 80!" ### wireshark look at those packets in a GUI ### tshark command line super powerful packet analysis ### tcpflow capture & assemble TCP streams ### ifconfig "what's my IP address?" ### route view & change the route table ### ip replaces ifconfig, route, and more! ### arp see your ARP table ### mitmproxy spy on SSL connections your programs are making ### nmap in ur network scanning ur ports ### zenmap GUI for nmap ### p0f identify OS of hosts connecting to you ### openvpn a VPN ### wireguard a newer VPN ### nc netcat! make TCP connections manually ### socat proxy a TCP socket to a unix domain socket + LOTS MORE ### telnet like SSH but insecure ### ftp/sftp copy files. sftp does it over SSH. ### netstat/ss/lsof/fuser "what ports are servers using?" ### iptables set up firewalls and NAT! ### nftables new version of iptables ### hping3 construct any TCP packet you want ### traceroute/mtr what servers are on the way to that server? ### tcptraceroute use TCP packets instead of ICMP to traceroute ### ethtool manage physical Ethernet connections + network cards. ### iw/iwconfig manage wireless network settings (see speed/frequency!) ### sysctl configure Linux kernel's network stack ### openssl do literally anything with SSL certificates.
### stunnel make an SSL proxy server for an insecure server ### iptraf/nethogs/iftop/ntop see what's using bandwidth ### ab/nload/iperf benchmarking tools ### python3 -m http.server serve files from a directory ### ipcalc easily see what 13.21.2.3/25 means ### nsenter enter a container process's network namespace
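
If you don't have `ipcalc` handy, here's a rough equivalent using Python's standard `ipaddress` module. It's just a sketch to show what 13.21.2.3/25 means, not a replacement for the tool.

```python
import ipaddress

iface = ipaddress.ip_interface("13.21.2.3/25")
network = iface.network

print(network)                    # 13.21.2.0/25
print(network.netmask)            # 255.255.255.128
print(network.num_addresses)      # 128
print(network.broadcast_address)  # 13.21.2.127
```

So a /25 is half of a /24: the first 25 bits are the network, and the remaining 7 bits give you 128 addresses.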

a debugging manifesto

### 1. inspect, don't squash Try to fix the bug (crossed out, bad) Understand what happened (checkmarks, smiley faces) ### 2. Being stuck is temporary. person (thinking): I WILL NEVER FIGURE THIS OUT ... 20 minutes later... person (thinking): Wait, I haven't tried X... ### 3. Trust nobody and nothing person (thinking): This library can't be buggy... person (thinking): Or CAN IT??? (slowly growing horror) off to the side, a bug looks on, with a sneaky expression ### 4. It's probably your code person (thinking): I KNOW my code is right ... 2 hours later ... person (thinking): Ugh, my code WAS the problem?!!? ### 5. don't go it alone person 1: "WHAT IS HAPPENING?!?" person 2: "What if we try X?" ### 6. There's always a reason. A computer, illustrated by a box with a smiley face, surrounded by ones and zeros: Computers are always logical, even when it doesn't feel that way. ### 7. Build your toolkit person (thinking, holding a box labelled TOOLZ): "wow, the CSS inspector makes debugging SO much easier" ### 8. It can be an adventure. person: "You wouldn't BELIEVE the weird bug I found!" adorable weird bug, standing beside them: hi!

preserve the crime scene

One of the easiest ways to start is to save a copy of the buggy code and its inputs/outputs: An illustration of a stick figure wearing a top hat. Beside them is a bug in a mason jar. person (thinking): "don't touch anything! we need to preserve evidence!" Depending on the situation, you might want to: - make a git commit of the buggy code! (on a branch, just for you) - save the input that triggered the bug - save logs/screenshots to analyze later

read the error message

Error messages are a goldmine of information, but they can be very annoying to read: (image of an error message, represented by a stack of squiggly lines, with 2 notes pointing to it): - giant 50 line stack trace full of impenetrable jargon, often seems totally unrelated to your bug - can even be misleading, like "permission denied" sometimes means "doesn't exist" Tricks to extract information from giant error messages: - If there are many different error messages, start with the first one. Fixing it will often fix the rest. - If the end of a long error message isn't helpful, try looking at the beginning (scroll up!) - On the command line, pipe it to `less` so that you can scroll/search it (`./my_program 2>&1 | less`) Note: if you don't include `2>&1`, `less` won't show you the error messages (just the output)

reread the error message

After I've read the error message, I sometimes run into one of these 3 problems: Each person is represented by a stick figure with curly hair. ### 1. misreading the message person (thinking) ok, it says the error is in file X spoiler: it actually said file Y ### 2. disregarding what the message is saying person (thinking): well, the message says X, but that's impossible... spoiler: it was possible ### 3. not actually reading it person (thinking): ok, I read it... spoiler: she did not read it

reproduce the bug

My favourite way to get information about buggy code is to run the buggy code and experiment on it. (Add print statements! Make a tiny change!) If the bug is happening on your computer every time you run your program: hooray! You've reproduced the bug! An illustration of a smiling stick figure with curly hair. person (thinking): "ok, time to debug! I've got my print statements ready to go!" But if you can't make the bug happen, you're left guessing. An illustration of a sad stick figure with curly hair. person (thinking): "what was variable X set to when the bug happened? guess there's NO WAY TO KNOW" cute illustration of a bug: the next page has tips!

inspect unreproducible bugs

When you can't reproduce a bug locally, it's tempting to just try random fixes and pray. Resist the temptation! Some ways to get information: - try to reproduce the environment where it happened - ask for screenshots / screen recordings - add more logging, deploy your code, and repeat until you understand what caused the bug - read the code VERY VERY carefully (incredibly boring but it actually does work sometimes) - do your experimentation somewhere where you can reproduce the bug (on a staging server? on someone else's computer?)

identify one small question

Debugging can feel huge and impossible. But all you have to do to make progress is: 1. come up with ONE QUESTION about the bug. 2. make sure the question is small enough that you can investigate it in ~20 minutes 3. figure out the answer to that question Illustration of a smiling stick figure with curly hair, surrounded by other question marks, which are crossed out. person (thinking): hmm, this database query is slow... well, can I find out if the query is using an index? ignore all these other questions for now! one at a time!

retrace the code's steps

Here's a classic (but still very effective!) way to get started: 1. find the line of code where the error happened 2. trace backwards to investigate what could have caused that error. keep asking "why?" example: - There's an error on line 58... - that's because this variable has the wrong value... - the value is set by calling this function... - that function is making an HTTP request to the API... - the API response doesn't have the format I expected! Why is that? In the corner of the page, there is an illustration of a goofy-looking bug with a long neck and curly antennae saying "Chase me!"

write a failing test

If your program already has tests, adding a failing test is a great way to work on your bug! Illustration of a smiling stick figure with curly hair. person (thinking): this function should return X, but it's returning Y - it forces you to pinpoint what exactly the bug is - it's easy to tell when you've fixed it (the test passes!) - you can keep the test to make sure the bug doesn't come back

brainstorm some suspects

brainstorming every possible cause I can think of helps me not get stuck on the 1 or 2 most obvious possibilities. In a box representing a sheet of paper: - could I be using the wrong version of this library? - am I passing the wrong argument to function X? - is something wrong with the server? - is the entire internet broken??? (there are two notes on the side pointing at the above text) - sometimes I find it easier to think clearly when writing by hand on paper. - no filter! even ridiculous ideas!

rule things out

Once I have a list of suspects, I can think about how to eliminate them. Illustration of a pensive stick figure with curly hair. person (thinking): "I'm really confused, but I can at least check if the server returned the right HTTP response here.." Illustration of a box that says "client", and a box that says "server", with arrows going back and forth between them. Both boxes are labelled "suspicious". person (thinking): "that response looks good! the server isn't the problem!" Illustration of a box that says "client", and a box that says "server", with arrows going back and forth between them. The client box is labelled "suspicious", with exclamation marks and question marks surrounding it, but the "server" box is labelled "ok", with a check mark and smiley faces. note: here we're assuming that was the only request being made. Otherwise this wouldn't be a safe conclusion :)

keep a log book

I don't usually write things down. But 2 hours into debugging, I get really confused: Illustration of a frazzled-looking stick figure with curly hair. person (thinking): wait, what did that error message I saw 2 hours ago say again exactly?? person (thinking): did I already try this??? Keeping a document with notes makes it WAY easier to stay on track. It might contain: - specific inputs I tried - error messages I saw - stack overflow URLs The log makes it easier to ask for help later if needed!

draw a diagram

Some ideas: ### network diagram An illustration of a network, with a cylinder labelled DB, and boxes labelled "factory", "handler", "obj", "model 1", and "model 2", with arrows amongst them showing their relationships. ### flowchart A flowchart with boxes "set flag", "run cmd", "if failed, retry", and "return result", with arrows amongst them illustrating a process. ### state diagram A diagram with boxes labelled "inventory page", "cart page", and "checkout page", with arrows amongst them labelled "cart icon", "continue shopping", "checkout", and "cancel". ### or anything else (like a data structure!) A box labelled "on | off | on | off". The first "off" is labelled "[1, 1, 1, 0, 0, 1, 1, 1, 0", and the second "off" is labelled "5 seconds".

add lots of print statements

I love to add print statements that print out 1, 2, 3, 4, 5... An illustration of a printer printing out lines of text. ``` console.log(1) console.log(2) console.log(3) ``` Using descriptive strings is smarter, but I usually use numbers or "wtf???" This helps me construct a timeline of which parts of my code ran and in what order: Illustration of timeline of code, with some arrows pointing at it numbered 1, 3, 2. Between 1 and 3, it says "everything is okay". Between 3 and 2 it says "the cause", with a picture of a bug, and after 2, it says "the error message" with a picture of a page of text. Often I'll discover something surprising, like "wait, 3, never got printed??? Why not???".

use a debugger

A debugger is a tool for stepping through your code line by line and looking at variables. But not all debuggers are equal! Some languages' debuggers have more features than others. Your debugger might let you: - jump into a REPL to poke around (see page 25) - watch a location in memory and stop the program any time it's modified - "record replay" debuggers let you record your entire program's execution and time travel Illustration of a smiling stick figure with curly hair. person (thinking): I love record/replay debuggers because they make hard-to-reproduce bugs easier: I just have to reproduce the bug once

jump into a REPL

In dynamic languages (like Python / Ruby / JS), you can use a debugger to jump into an interactive console (aka "REPL") at any point in your code. Here's how to do it in Python 3: 1. edit your code: `my_var = call_some_function()` and then, on the next line, `breakpoint()` (add "`breakpoint()`"!) 2. rerun your code (refresh the page, whatever) 3. play around in the REPL! You can call any function you want / try out fixes! How to do it in other languages: - Ruby: `binding.pry` - Python (before 3.7): `import pdb; pdb.set_trace()` - Javascript: `debugger;`

find a version that works

If I have a bug with how I'm using a library, I like to: - find a code example in the documentation - make sure it works - slowly change it to be more like my broken code - test if it's still working after every single tiny change Illustration showing a bunch of points with arrows between them. Each point has a check mark beside it, until one that is labelled "Oh THAT'S what broke it!!!" This puts me back on solid ground: with every change I make that DOESN'T cause the bug to come back, I know that change wasn't the problem.

look at recent changes

Often when something is broken, it's because of a recent change. Usually I look at recent changes manually, but git bisect is an amazing tool for finding exactly which git commit caused the problem. We don't have space for a full `git bisect` tutorial here, but here's how you start using it:
```
git bisect start
git bisect bad HEAD
git bisect good 1fe9dc
```
(1fe9dc is the ID of a commit that doesn't have the bug) Then you can either tag buggy commits manually or run a script that does it automatically.
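
To get a feel for why bisecting is so fast, here's a sketch of the underlying idea: a binary search for the first bad commit. This is NOT how git implements it (git works on the commit graph, not a flat list), and every commit ID except 1fe9dc is made up.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit.

    commits is ordered oldest -> newest; the first one is known to be
    good and the last one is known to be bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(commits[middle]):
            bad = middle    # the bug was introduced at or before middle
        else:
            good = middle   # the bug was introduced after middle
    return commits[bad]


commits = ["1fe9dc", "a11111", "b22222", "c33333", "d44444", "e55555"]
broken_since = commits.index("c33333")  # pretend the bug arrived here
print(first_bad(commits, lambda c: commits.index(c) >= broken_since))
# c33333
```

Each check cuts the list of suspects in half, so even 1000 commits only need about 10 checks.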

sprinkle assertions everywhere

Some languages have an `assert` keyword that you can use to crash the program if a condition fails. Assertions let you: - come up with something that should ALWAYS be true - immediately crash the program if it isn't Illustration of a program, represented by a box with an unhappy face. program (thinking): "this variable is undefined!!! STOP EVERYTHING!" This is a great way to force yourself to think about what's ALWAYS true in your program, and check if you're right. Illustration of a smiling stick figure with curly hair. person (thinking): "the radius can never be 0, right? or can it?"
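
Here's what the radius example might look like in Python. `pixels_per_unit` is an invented function, just to have something to assert about.

```python
def pixels_per_unit(width_px, radius):
    # Something that should ALWAYS be true. If it isn't, crash right here
    # with a clear message instead of a confusing error later on.
    assert radius != 0, "the radius should never be 0"
    return width_px / (2 * radius)


print(pixels_per_unit(100, 5))
# 10.0
```

One thing to know: Python skips `assert` statements entirely when it's run with the `-O` flag, so they're for catching bugs while developing, not for validating user input.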

analyze the logs

If you can't reproduce a bug, sometimes you need to comb through the logs for clues. Some tips: - filter out irrelevant lines (for example with grep -v) - find 1 failed request and search for that request's ID to get all the logs for that request - build a timeline: copy and paste log lines (and your interpretations!) into a document - if you see a suspicious log line, search to make sure it doesn't also happen during normal operation - if there's a cascade of errors, find the first error that started the problems
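
The first two tips can be sketched in a few lines of Python. The log format and the `req=...` IDs here are invented, so treat this as a starting point to adapt to your own logs.

```python
def logs_for_request(lines, request_id, ignore=("healthcheck",)):
    """Keep lines for one request, skipping noisy ones (like grep -v)."""
    return [
        line
        for line in lines
        if request_id in line and not any(word in line for word in ignore)
    ]


log_lines = [
    "12:00:01 req=a1 GET /cart",
    "12:00:01 req=b2 GET /healthcheck",
    "12:00:02 req=a1 db query took 4000ms",
    "12:00:02 req=c3 GET /checkout",
    "12:00:03 req=a1 ERROR timeout",
]
for line in logs_for_request(log_lines, "req=a1"):
    print(line)
```

What comes out is already the beginning of a timeline for that one failed request.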

read the docs

There are many ways to read the docs! - the surgical strike: Search for a specific function, find an example on the page, copy it and leave. (this is often me :)) - the question quest: You have a specific question and you'll keep skimming different pages until you find the answer. - the IDE integration: Set up your editor or IDE so that you can instantly jump to a function's documentation. - the rigorous read: Get a cup of coffee and read all of the docs cover to cover, like a book.

comment out code

Commenting out code is an amazing way to quickly do experiments and figure out which part of your code is to blame. You can: - comment out a function call and replace it with a hardcoded value, to check if the function call is broken - if the error message doesn't give you a line number, comment out huge chunks of the program until the problem goes away - comment out some code and rewrite it to see if the new version is better

learn one small thing

Bugs are a GREAT way to discover things on the edge of your knowledge. Illustrations of a stick figure with curly hair. person (thinking, looking worried): "hmm, part of the problem here is that I don't understand how position: absolute works..." Finding one small thing I don't understand and learning it is really useful (and pretty fun!) person (thinking, now smiling): "now I understand position: absolute! cool!"

find the type of bug

If the bug is totally new to you, find out if there's a name people use for that type of bug! Illustration of two stick figures. Person 1 has curly hair and looks worried, Person 2 has straight hair and is smiling. person 1: "this bug is happening intermittently, it's so weird." person 2: "that sounds like it might be a race condition..." person 1 (thinking): "oh, what's a race condition?" examples: - `terminated by signal SIGSEGV (address boundary error)` segmentation fault - `flexbox: div doesn't fit in other div (CSS)` item overflowing container - `nodename nor servname provided, or not known` DNS lookup failure - `RecursionError: maximum recursion depth exceeded` stack overflow

tidy up your code

Messy code is harder to debug. Illustration of a smiling stick figure with curly hair. person (thinking): "this function is 100 lines??? who named these variables?!?!" (annotation: it was me) Doing a tiny bit of refactoring can make things easier, like: - rename variables or functions - format it with a code formatter (`go fmt`, `black`, etc.) - add comments - delete old/untrue comments Don't go overboard with the refactoring though: making too many changes can easily introduce new bugs.

read the library's code

Lots of code isn't documented. But when there are no docs, there's always the source code! It sounds intimidating at first, but a quick search of the code sometimes gets me my answer really quickly. Tips for exploring an unfamiliar library's code: - search the tests! Tests are a GREAT source of examples. - git clone it locally to make it easier to navigate. - search for your error message and trace back. - if it's a Python/JS/Ruby library, sometimes I'll edit the library's code on my computer to add print statements (just remember to take them out after!)

find a new source of info

We all know to look at the official documentation. Here are some less obvious places to look for answers: - the project's Discord, Slack, IRC channel, or mailing list - code search (search all of GitHub for how other people are using that library!) - GitHub issues (did someone else have the same problem?) - release notes (is the bug fixed in the new version?) - a book chapter (you might have a book on this topic!) - blog posts (sometimes there's an amazing explanation on the 2nd page of Google results)

write a tiny program

Does your bug involve a library you don't understand? Illustration of an unhappy stick figure with curly hair. person (thinking): UGH, `requests` is NOT working how I expected it to! I like to convert my code using that library into a tiny standalone program which has the same bug: Illustration of two programs, one represented by a big messy scribble, the second represented by three tidy lines. giant buggy program => 20 lines of buggy code I find this makes it WAY EASIER to experiment and ask for help. And if it turns out that library actually has a bug, you can use your tiny program to report it.

take a break

Illustration of a steaming hot beverage. Investigating a tricky bug requires a LOT of focus. Illustration of a sad stick figure with long straight hair. person (thinking): "ugh, nothing is working..." (annotations on person): googling the same error message for the 7th time. very frustrated Instead, try one of these magical debugging techniques (even a 5 minute break can really help!): - ride your bike! - go to bed! - get a coffee! - have a shower! - eat lunch! Illustration of the same person, now happily riding their bike.

reduce randomness

It's much easier to debug when your program does the exact same thing every time you run it. Illustration of a sad stick figure with curly hair. person (thinking): "the bug only happens 10% of the time, it's SO HARD to figure out if my change fixed it or not." There are a bunch of tools for controlling your program's inputs to reduce randomness, for example: - many random number generators let you set the seed so you get the same results every time. - `faketime` fakes the current time. - libraries like ruby's `vcr` can record http requests. - record/replay debuggers like `rr` record everything.
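
For example, here's a sketch of the "set the seed" trick with Python's `random` module. `flaky_shuffle` is a made-up function standing in for whatever random thing your program does.

```python
import random


def flaky_shuffle(items, seed=None):
    # Use our own generator so the seed doesn't affect the rest of the program.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


first = flaky_shuffle([1, 2, 3, 4, 5], seed=42)
second = flaky_shuffle([1, 2, 3, 4, 5], seed=42)
print(first == second)
# True
```

With `seed=None` you get the usual unpredictable behaviour, so you can turn the seed on only while you're debugging.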

one thing at a time

It's tempting to try lots of fixes at once to save time: Illustration of a smiling stick figure with curly hair. dream: I'm going to add Z, and replace X with Y, and improve C-- that'll definitely fix it! Illustration of the same stick figure, now sad. reality: ... now there's a new problem AND it's still broken If I find I've done this by accident, I'll: - undo all my changes (`git stash`!) - make a list of things to investigate, one at a time

delete the buggy code

Sometimes the buggy code is not worth salvaging and should be deleted entirely. Reasons you might do this: Illustration of an uneasy-looking stick figure with curly hair. - it uses a confusing library / tool person (thinking): this library isn't working, I'm going to switch to Y instead Illustration of the same person, now smiling. - you have a better idea for how to implement it person (thinking): I bet I could avoid all these problems if I took X approach instead...

investigate the bug together

I find investigating a bug with someone else SO MUCH more fun than doing it alone. Illustrations of two smiling stick figures, one with short curly hair, and one with longer straight hair. Debugging together lets you: - Teach each other new tools! person 1: I wish we could find out X, but that's impossible... person 2: Let's use my favourite tool, strace!!!!!! - Learn new concepts! person 2: What is this CORS thing?!?! person 1: Oh, I can explain that! - Keep each other on track person 2: Maybe the problem is Y? person 1: We already ruled that out! person 2: Right, I forgot!

timebox your investigation

Sometimes I need to trick myself into getting started: Illustrations of a stick figure with short curly hair. person (thinking, looking unhappy): "UGH, I do NOT want to look at this CSS bug!!!!" Giving myself a time limit really helps: Illustration of an alarm clock person (thinking, now smiling): "Okay, I'll just see what I can figure out in 20 minutes..." You can't always solve it in 20 minutes, but this works surprisingly often! ... 15 minutes later ... person (thinking, happy): "all fixed! That wasn't so hard!"

write a message asking for help

When I'm REALLY stuck, I'll write an email to a friend: - "Here's what I'm trying to do..." - "I did X and I expected Y to happen, but instead..." - "Could this be because....?" - "This seems impossible because..." - "I've tried A, B, and C to fix it, but...." This helps me organize my thoughts, and often by the time I finish writing, I've magically fixed the problem on my own! It has to be a specific person, so that the imaginary version of them in my mind will say useful things :)

explain the bug out loud

Explaining what's going wrong out loud is magic. Illustrations of two stick figures. One has curly hair, and one has short straight hair and is wearing a big t-shirt with a picture of a rubber duck. person (looking sad): "so, when I do X thing, I'm getting an error, and it doesn't make any sense because I already checked that A and B are working...." other person: huh... person (now smiling, with an exclamation mark above their head): "OH I SEE WHAT I DID WRONG" other person (also smiling): "happy to help!" People call this "rubber ducking" because the other person might as well be a rubber duck.

try out a new tool

There are TONS of great debugging tools (listed on the next page!), but often they have a steep learning curve. Some tips to get started: - get someone more experienced to show you an example of how they'd use the tool. (this is SO helpful!!!) - try it out when investigating a low stakes bug, so it's no big deal if it doesn't work out. - take notes with examples of the options you used, so you can refer to them next time.

make sure your code is running

Illustration of an unhappy stick figure with curly hair. person (thinking): NOTHING I try is helping, this is IMPOSSIBLE person (thinking): wait... nothing I try is changing anything.... is my code even being run???? If my changes have no effect at all, often it means I've made a silly mistake (like forgetting to restart the app) and my changes aren't being run! I like to check that my code is being run by printing something out (like `print("asdf")`). Or, if that's not possible, I'll introduce an error so that it crashes.

do the annoying thing

Illustrations of an unhappy-looking stick figure with short curly hair. Sometimes when I'm debugging, there are things I'll refuse to try because they take too long. person (thinking): ugh, that part of the code is so confusing, I don't want to look at it... But as I become more and more desperate, eventually I'll give in and do the annoying thing. Often it helps! person (thinking): FINE, I'll look at that code... oh, yeah, here's the bug.

types of debugging tools

Here are some tools I've found useful: - debuggers! (most languages have one!) - profilers: `perf, pprof, py-spy` - tracers: `strace, ltrace, ftrace, BPF tools` - network spy tools: `tcpdump, wireshark, ngrep, mitmproxy` - web automation tools: `selenium, playwright` - load testers: `ab, wrk` - test frameworks: `pytest, RSpec` - linters/static analysis tools: `black, eslint, pyright` - data formatting tools: `xxd, hexdump, jq, graphviz` - dynamic analysis tools: `valgrind, asan, tsan, ubsan` - fuzzers/property testing: `hypothesis, quickcheck, Go's fuzzer` (I've never used those last two but lots of people say they're helpful.)

shorten your feedback loop

when you're investigating a bug, you'll need to run the buggy code a million times. Illustration of a stick figure, holding their hands to their face in despair. person (thinking): ugh, i need to type all this information into the form to trigger the bug again??? this is literally the 30th time :( :( ways to speed it up: - use a browser automation tool to fill in forms / click buttons for you! - write a unit test! - autorun your code every time you save!

colours, graphs, and sounds

Instead of printing text, your program can tell you about its state by generating a picture! Or playing sounds at key moments! Some ways your programs can generate pictures or sounds: - add colours to your log lines (every letter of 'colours' is a different colour) - add red outlines around every HTML element! ("red" and "outlines" have a red outline around them) - Haskell has an option to beep at the start of every major garbage collection (there's a bell icon after "beep") - draw a chart of events over time (chart icon) - use graphviz to generate a diagram of your program's internal state (there's a picture of a little graph diagram with a -> b, a -> c)
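
As a sketch of the first idea, here's one way to colour log lines using ANSI escape codes, which most terminals understand. The log levels and the `colourize` helper are just for illustration.

```python
# ANSI escape codes: 31 = red, 33 = yellow, 32 = green, 0 = reset
COLOURS = {"ERROR": "\033[31m", "WARN": "\033[33m", "INFO": "\033[32m"}
RESET = "\033[0m"


def colourize(level, message):
    colour = COLOURS.get(level, "")
    # Only add the reset code if we actually added a colour.
    return f"{colour}{level}: {message}{RESET if colour else ''}"


print(colourize("INFO", "server started"))
print(colourize("WARN", "cache miss"))
print(colourize("ERROR", "disk full"))
```

With this, the red ERROR lines jump out when you're scrolling through hundreds of log lines.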

add pretty printing

Sometimes you print out an object, and it just prints the class name and reference ID, like this: `MyObject<#18238120323>` Illustration of a frowning stick figure with curly hair. person (thinking): "ugh, thanks, very helpful... " Implementing a custom string representation for a class you're often printing out can save a LOT of time. The name of the method you need to implement is: - Python: `.__str__ ` - Ruby: `.to_s` - JavaScript: `.toString` - Java: `.toString` - Go: `String()` Also, pretty-printing libraries (like `pprint` in Python or `awesome_print` in Ruby) are great for printing out arrays/hashmaps.
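
Here's a small Python sketch with an invented `Point` class. One extra detail worth knowing in Python: `print(obj)` uses `__str__`, but objects inside a list or dict are shown using `__repr__`, so it often pays to define both.

```python
from pprint import pprint


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # used by print(point) and str(point)
        return f"({self.x}, {self.y})"

    def __repr__(self):
        # used when the object is inside a list/dict, and in the REPL
        return f"Point(x={self.x}, y={self.y})"


p = Point(1, 2)
print(p)         # (1, 2)
print([p])       # [Point(x=1, y=2)]
pprint({"corners": [Point(0, 0), Point(3, 4)]})
```

Without those two methods you'd get something like `<__main__.Point object at 0x...>` everywhere instead.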

document your quest

For very tricky bugs, writing up an explanation of what went wrong and how you figured it out is an amazing way to share knowledge and make sure you really understand it. Ways I've done this in the past: - complain about it in the internal chat! (so people can search for it!) - write a quick explanation in the commit message - write a fun blog post telling my tale of woe! - for really important work bugs, write a 5-page document with graphs explaining all the weird stuff I learned along the way

tell a friend what you learned

I love to celebrate squashing a bug by telling a friend: Illustration of a smiling stick figure with curly hair. person: hey marie, did you know about this weird thing that can happen with CSS flexbox? Some possible outcomes of this: - they've seen that bug too, and teach me something else! - they learn something new! - they ask questions I hadn't thought of - they tell me about a website/tool I didn't know about - it helps solidify my knowledge!

find related bugs

Illustration of two adorable bugs. They are holding hands and their antennae are intertwined. When you're done fixing a bug, glance around to see if there are any obvious places in your code that have the same bug. Illustration of a smiling stick figure with short curly hair. person (thinking): "I was calling function X wrong, I'll check if we're calling that function wrong anywhere else!" person (thinking): "wow, my assumption about how Y worked was TOTALLY wrong, I should go back and fix some things..."

do a victory lap

Once you've solved it, don't forget to celebrate! Take a break! Feel smart! Illustration of a smiling stick figure with curly hair. person (thinking): "i did it, i did it, i'm amazing" (now is not the time for humility) The best part of understanding a bug is that it makes it SO MUCH easier for you to solve similar future bugs. Illustration of a smiling stick figure with curly hair, and another figure with short spiky hair. person (thinking): I've seen something like this before, maybe the problem is X? colleague: (annotation, saying that they're awestruck at your brilliance)

add a comment

Some bug fixes are a little counterintuitive. Otherwise you would have written the code that way in the first place! You might think: Illustration of a smiling stick figure with curly hair. person (thinking): "I'll remember why I added this code, I spent 5 hours debugging it!" (annotation: this is a trap!!!!!) Adding a comment can help future you (or your coworkers!) avoid accidentally reviving a bug later. person (thinking): ooh, I could simplify this code! Illustration of a dancing bug, singing "I'm back!"

a tiny DNS resolver

On page 5 (life of a DNS query), we saw how resolvers work. This code does the same thing, but it actually works.
```
# query, get_answer, get_glue and get_nameserver are helpers that aren't shown here
def resolve(domain):
    # Start at a root nameserver
    nameserver = "198.41.0.4"
    # A "real" resolver would check its cache here
    while True:
        reply = query(domain, nameserver)
        ip = get_answer(reply)
        if ip:
            # Best case: we get an answer to our query and we're done
            return ip
        nameserver_ip = get_glue(reply)
        if nameserver_ip:
            # Second best: we get the IP address* of the nameserver to ask next
            nameserver = nameserver_ip
        else:
            # Otherwise: we get the domain name* of the nameserver to ask next
            nameserver_domain = get_nameserver(reply)
            nameserver = resolve(nameserver_domain)
```
* Actual DNS resolvers are more complicated than this, but this is the core algorithm.
Smiling stick figure with curly hair: You can find the whole program at https://github.com/jvns/tiny-resolver

every git jargon

### config ``` .git/config hook .gitconfig alias global local ``` ### history ``` log blame bisect diff ``` ### commit ``` commit checkout tree-ish show patch apply remotes restore ``` ### staging area ``` index staged cached grep add status staging area ``` ### branches ``` HEAD refs/heads/main detached HEAD state head HEAD^, HEAD~, HEAD^^ reference symbolic reference reset tag main master reflog .. ... ``` ### other features ``` stash worktree subtree submodule revert ``` ### merging ``` merge conflict rebase interactive rebase fast forward merge cherry-pick squash ours/theirs ``` ### remotes ``` upstream downstream push pull fetch clone fork remote refspec origin ```

meet the merge

### panel 1: combining different versions of files is core to git Illustration showing two boxes, each with three symbols, being added together. (a, b, y) + (x, b, c) = ??? it's very hard ### panel 2: to merge files, you need to know what the original was picture: original is (a, b, c) one side changed a -> x so it's (x, b, c) the other side changed c -> y so it's (a, b, y) ### panel 3: git merges by combining all changes merge machine with everything from panel 1 in a thought bubble: (a, b, c) (x, b, c) (a, b, y) result is (x, b, y) ### panel 4: if both changed the same line it's a merge conflict merge machine with a thought bubble showing: (a, b, c) (x, b, c) (z, b, y) result is (a/z, b, y) The result has red question marks around it because the first position has two values in conflict. There are also red sad faces and x's around the illustration. ### panel 5: git figures out the original version by looking at commit history Illustration showing a path with a starting point labelled "original", with a note that this is called the "merge base". Two paths, labelled v1 and v2, diverge from it. ### panel 6: `cherry-pick`, `revert`, `rebase`, and `merge` all need to combine files Illustration of a smiling stick figure with curly hair. person: "they all use the same merge algorithm, using some clever tricks! we'll talk about that next."
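Here's a rough sketch of the merge idea from the panels above, in Python. It's not git's actual algorithm (git works on diff hunks, and this toy assumes all three versions have the same number of lines); `merge3` is a made-up name for illustration.

```python
def merge3(base, ours, theirs):
    """Toy three-way merge: combine two edited versions using the original.

    Assumes all three lists have the same length (real merges don't).
    Returns (merged, conflicts), where conflicts lists the conflicting positions.
    """
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:
            # Both sides agree (neither changed it, or both made the same change)
            merged.append(o)
        elif o == b:
            # Only "theirs" changed this line
            merged.append(t)
        elif t == b:
            # Only "ours" changed this line
            merged.append(o)
        else:
            # Both sides changed it differently: that's a merge conflict
            merged.append((o, t))
            conflicts.append(i)
    return merged, conflicts


clean = merge3(["a", "b", "c"], ["x", "b", "c"], ["a", "b", "y"])
conflicted = merge3(["a", "b", "c"], ["x", "b", "c"], ["z", "b", "y"])
print(clean)
print(conflicted)
```

Without the original `(a, b, c)` there would be no way to tell which side changed what, which is why the merge base matters so much.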

let's explore a commit

### panel 1: you can see for yourself how git is storing your files! You just need one command: `git cat-file -p` First, get a commit ID. You can get one from `git log` ### panel 2: read the commit ``` $ git cat-file -p 3530a4 tree 22b920 parent 56cfdc author Julia <julia@fake.com> 1697682215 -0500 committer Julia <julia@fake.com> 1697682215 -0500 ``` ### panel 3: read the directory ``` $ git cat-file -p 22b920 100644 blob 4fffb2 .gitignore 100644 blob e351d9 404.html 100644 blob cab416 Cargo.toml 100644 blob fe442d hello.html 040000 tree 9de29f src ``` ### panel 4: read a file ``` $ git cat-file -p fe442d <!DOCTYPE html> <html lang="en"> <body> <h1>Hello!</h1> </body> </html> ``` ### panel 5: and we're done! `fe442d` is the sha1 hash of the contents of the file. It's called a "blob id". this is how git keeps things efficient: it only needs to make a new copy when the file changes
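If you want to compute a blob id yourself, here's a small sketch. One detail the panel skips: git hashes a short header (`blob`, the size, and a zero byte) followed by the file's contents, not the bare contents. `blob_id` is a made-up helper name, and this assumes a repository using SHA-1 (the default).

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the id git would give a file with these contents (SHA-1 repos)."""
    # git prepends "blob <size in bytes>\0" before hashing
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b"hello\n"))
```

Two files with identical contents get the same blob id, so git only has to store them once.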

git discussion bingo

A grid of boxes, like a bingo card, with the following text in them: - WTF is detached HEAD state - just use magit - subversion was so much worse - rewriting history is bad - I just do not care how git works - I hate git - git is a directed acyclic graph - I only know 5 commands - just spend 15 minutes learning git's internals - content addressed storage - git's design is so elegant - you have to understand the linux kernel dev workflow - a branch is just a pointer to a commit - something about "porcelain" - subversion was better - I've used git for 10 years and I have no idea how it works - mercurial is better - git is not github - the CLI is badly designed - merge sucks, only use rebase - something about Linus Torvalds - commits are immutable snapshots - you should just read Pro Git - rebase sucks, only use merge - I just delete my git repo if I mess it up

why DNS updates are slow: caching

### You might have heard that DNS updates need time to "propagate". What's actually happening is that there are old cached records which need to expire. ### DNS records are cached in many places - browser caches - DNS resolver caches - operating system caches google.com, represented by a box with a smiley face: my DNS records are cached on billions of devices! ### let's see what happens when you update an IP bananas.com A▾ 300 [changed to] 60 1.2.3.4 [changed to] 5.6.7.8 beware: even if you change the TTL to 60s, you still have to wait 300 seconds for the old record to expire ### 30 seconds later... (you go to bananas.com in your browser) Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass, and a browser, represented by the Firefox logo of a fox wrapped around a globe browser: hey what's the IP for bananas.com? resolver, thinking: let's check my cache for bananas.com... found it!! resolver: it's 1.2.3.4! ### 400 seconds later... (you refresh the page again) browser: hey what's the IP for bananas.com? resolver, thinking: The TTL (300s) is up, better ask for a new IP... resolver: it's 5.6.7.8! ### 12 hours later... (you check 1.2.3.4's logs to make sure all the traffic has moved over) Illustration of a stick figure with curly hair looking confused, and a rogue DNS resolver, which looks like the other resolvers except that it is wearing a burglar mask. person: that's weird, the old server is still getting a few requests... rogue DNS resolver: I don't care about your TTL! I just cache everything for 24 hours! the culprit: a rogue DNS resolver
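To make the timeline above concrete, here's a toy resolver cache in Python. It's only a sketch: `ResolverCache` and the `authoritative` dict are invented for this example, time is just a number of seconds, and it assumes the resolver cached the old record right before the update.

```python
class ResolverCache:
    """Toy cache: remembers each answer until its TTL runs out."""

    def __init__(self):
        self._entries = {}  # name -> (ip, time the entry expires)

    def lookup(self, name, now, authoritative):
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            # Still cached: the authoritative server is never asked
            return entry[0]
        # Expired (or never seen): ask for a fresh record and cache it
        ip, ttl = authoritative[name]
        self._entries[name] = (ip, now + ttl)
        return ip


cache = ResolverCache()
authoritative = {"bananas.com": ("1.2.3.4", 300)}
before = cache.lookup("bananas.com", 0, authoritative)

# Now we update the record: new IP, new TTL
authoritative["bananas.com"] = ("5.6.7.8", 60)
at_30s = cache.lookup("bananas.com", 30, authoritative)
at_400s = cache.lookup("bananas.com", 400, authoritative)
print(before, at_30s, at_400s)
```

The cache keeps using the old TTL (300s) for the old record, because the expiry time was decided when the record was cached.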

getaddrinfo

### panel 1: One weird thing about DNS is that different programs on a single computer can get different results for the same domain name. Let's talk about why! Illustration of a program, represented by a box with a smiley face, and a resolver (server), represented by a box with a smiley face holding a magnifying glass. Between them is a function, represented by a rectangle with squiggly lines on it. There are arrows going back and forth between the function and both the program and the resolver (server). The function is the problem. ### reason 1: many (but not all!!) programs use the function getaddrinfo for DNS lookups... ping, represented by a box with a smiley face: I use getaddrinfo! dig, also represented by a box with a smiley face: I don't! So if you see an error message like "`getaddrinfo: nodename or servname not provided...`", that's a DNS error. ### and not using getaddrinfo might give a different result - the program might not use `/etc/hosts` (dig doesn't) - the program might use a different DNS resolver (some browsers do this) ### reason 2: there are many different versions of `getaddrinfo`... - the one in `glibc` - the one in `musl libc` - the one in Mac OS And of course, they all behave slightly differently :) ### you can have multiple getaddrinfos on your computer at the same time For example on a Mac, there's your system `getaddrinfo`, but you might also be running a container that's using `musl`. ### glibc and musl getaddrinfo are configured with `/etc/resolv.conf` IP of resolver to use ``` # Generated by NetworkManager nameserver 192.168.1.1 nameserver fd13:d987:748a::1 ``` On a Mac, `/etc/resolv.conf` exists, but it's not used by the system `getaddrinfo`.
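As a rough illustration of what "configured with `/etc/resolv.conf`" means, here's a sketch that pulls the resolver IPs out of a file like the one above. `parse_nameservers` is a hypothetical helper, not what glibc or musl actually run, and it ignores every option except `nameserver`.

```python
def parse_nameservers(resolv_conf_text):
    """Return the resolver IPs listed on `nameserver` lines, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (comments start with # or ;)
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


example = """# Generated by NetworkManager
nameserver 192.168.1.1
nameserver fd13:d987:748a::1
"""
servers = parse_nameservers(example)
print(servers)
```

On a Linux machine you could try it on the real thing with `parse_nameservers(open("/etc/resolv.conf").read())`.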

resolvers can lie

### When a resolver gets a DNS query, it has 2 options: Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass. resolver: I could tell you what the authoritative nameservers said... or I could LIE! ### reason to lie: block ads / malware Illustration of a conversation between a resolver and a browser, represented by the Firefox logo of a fox wrapped around a globe browser: what's the IP for doubleclick.net? (ad domain, definitely exists) resolver: that domain doesn't exist PiHole blocks ads this way. ### reason to lie: to show you ads (rude!) browser: what's the IP for zzz.jvns.ca? (doesn't exist) resolver: here's an IP that will show you ads! This is called "DNS hijacking". ### reason to "lie": internal domain names browser: what's the IP for corp.examplecat.com? (doesn't exist on the public internet) corporate resolver: here's an internal IP address! ### reason to lie: airport DNS resolvers sometimes lie browser: what's the IP for google.com? airport resolver: you didn't log in yet so I will lie! here is our login page's IP! ### how does your computer know which resolver to use? When you connect to a network, the router tells your computer which search domain and resolver to use (using DHCP). Illustration of a router, represented by a box with antennae and a smiley face router: `192.168.1.1 search domain: lan`

dig command line arguments

Illustration of a laptop. Its keyboard just says QWERTY. ### the basics: dig @SERVER TYPE DOMAIN (SERVER and TYPE are both optional) Examples: ``` dig example.com dig @8.8.8.8 NS example.com dig TXT example.com dig @8.8.8.8 example.com ``` default type: A default server: from `/etc/resolv.conf` (on Linux) ### tip: put +noall +answer in your ~/.digrc This makes your output more readable by default, and you can always go back to the full output with `dig +all`. ### dig +noall Hide all output. Useless by itself, but `dig +noall +authority` will just show you the "Authority" section of the response. ### dig +short DOMAIN Only show the record content. `$ dig +short example.com 93.184.216.34` ### dig +trace DOMAIN Traces how the domain gets resolved, starting at the root nameservers. This avoids all the caches, which is useful to make sure you set your record correctly.

NS records

### What's actually happening when the root nameserver redirects to the .com nameserver, on page 6? Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass, and a root nameserver, represented by a pink box with a smiley face, wearing a stack of three crowns resolver: what's the IP for example.com? root nameserver: I am not concerned with petty details like that. Here's the address of the .com nameserver (this is an NS record) ### The root nameserver can return two kinds of DNS records: NS records: (in the Authority section) ``` com. 172800 NS a.gtld-servers.net com. 172800 NS b.gtld-servers.net ``` com. is the name 172800 is the TTL NS is the type b.gtld-servers.net is the value glue records: (in the Additional section) ``` a.gtld-servers.net 86400 A 192.5.6.30 b.gtld-servers.net 86400 A 192.33.14.30 ``` a.gtld-servers.net is the name 86400 is the TTL A is the type 192.33.14.30 is the value ### The NS record gives you the domain name of the server to talk to next, but not its IP address. resolver: But I need the IP for `a.gtld-servers.net` to communicate with it! is there a glue record? ### 2 ways the resolver gets the IP address 1. If it sees a glue record for a.gtld-servers.net, the resolver will use that IP 2. otherwise, it'll start a whole separate DNS lookup for a.gtld-servers.net ### glue records help resolvers avoid infinite loops without a glue record for `a.gtld-servers.net`: disaster! resolver: what's the IP for `a.gtld-servers.net`? root nameserver: You should ask `a.gtld-servers.net` ### terminology note NS records are DNS records with type "NS". Also, an "A record" means "record with type A", "MX record" means "record with type MX", etc. (confusingly, this is not true for glue records, glue records have type A or AAAA. It's weird, I know.)
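The "2 ways the resolver gets the IP address" can be sketched in a few lines of Python. This is a simplified illustration: the `(name, ttl, type, value)` tuples and the `pick_next_nameserver` function are made up for this example, and a real resolver deals with many more cases.

```python
def pick_next_nameserver(authority, additional):
    """Choose which nameserver to ask next.

    Both arguments are lists of (name, ttl, type, value) tuples.
    Returns (nameserver_name, ip), where ip is None if there was no glue
    record and a separate lookup for the nameserver's name is needed.
    """
    # Collect glue records: A/AAAA records from the Additional section
    glue = {}
    for name, _ttl, rtype, value in additional:
        if rtype in ("A", "AAAA"):
            glue.setdefault(name, value)

    ns_names = [value for _name, _ttl, rtype, value in authority if rtype == "NS"]
    for ns in ns_names:
        if ns in glue:
            return ns, glue[ns]  # way 1: use the glue record's IP
    if ns_names:
        return ns_names[0], None  # way 2: start a separate DNS lookup
    return None, None


authority = [
    ("com.", 172800, "NS", "a.gtld-servers.net"),
    ("com.", 172800, "NS", "b.gtld-servers.net"),
]
additional = [
    ("a.gtld-servers.net", 86400, "A", "192.5.6.30"),
    ("b.gtld-servers.net", 86400, "A", "192.33.14.30"),
]
with_glue = pick_next_nameserver(authority, additional)
without_glue = pick_next_nameserver(authority, [])
print(with_glue, without_glue)
```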

let's meet dig

### dig is my favourite tool for investigating DNS issues I find its default output unnecessarily confusing, but it's the only standard tool I know that will give you all the details. ### tiny guide to dig's full output ``` $ dig example.com ; <<>> DiG 9.16.24 <<>> +all example.com ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27580 ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 1232 ;; QUESTION SECTION: ; example.com. IN A ;; ANSWER SECTION: example.com. 86400 IN A 93.184.216.34 ;; Query time: 0 msec ;; SERVER: 127.0.0.1#53(127.0.0.1) ;; WHEN: Wed Jan 26 11:32:03 EST 2022 ;; MSG SIZE rcvd: 56 ``` `NOERROR` is the response code `example.com. 86400 IN A 93.184.216.34` is the answer to our DNS query. The "." at the end means that example.com isn't a subdomain of some other domain (like it's not example.com.degrassi.ca). This might seem obvious, but DNS tools like to be unambiguous. ### panel 3: Illustration of a smiling stick figure with curly hair. person: `$ dig +noall +answer` means "Just show me the answer section of the DNS response." It's a lot less to look at! ### panel 4: `$ dig +noall +answer example.com` `example.com. 86400 IN A 93.184.216.34` example.com is the name 86400 is the TTL IN is the class A is the record type 93.184.216.34 is the content just the answer! so much less overwhelming!

A & AAAA records

### there are two kinds of IP addresses: IPv4 and IPv6 Every website needs an IPv4 address. IPv6 addresses are optional. ### panel 2: A stands for IPv4 Address Example: `93.184.216.34` AAAA stands for IPv6 AAAAddress (joke, but kinda true) Example: `2606:2800:220:1:248:1893:25c8:1946` it's called AAAA (4 As) because IPv6 addresses have 4x as many bytes ### in theory, the Internet is moving from IPv4 to IPv6 This is because there are only 4 billion IPv4 addresses (the internet has grown a LOT since the 1980s when IPv4 was designed!) ### happy eyeballs* If your domain has both an A and an AAAA record, clients will use an algorithm called "happy eyeballs" to decide whether IPv4 or IPv6 will be faster. `*` yes that is the real name ### using IPv6 isn't always easy - not all web hosts give you an IPv6 address - lots of ISPs don't support IPv6 (mine doesn't!) ### IP addresses have owners You can find any IP's owner by looking up its ASN ("Autonomous System Number"). (except local IPs like `192.168.x.x`, `127.x.x.x`, `10.x.x.x`, `172.16.x.x`)
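You can check a few of these facts yourself with Python's standard `ipaddress` module. This is just a sketch; `is_local` is a helper name invented here, and it treats "local" as "private or loopback".

```python
import ipaddress

v4 = ipaddress.ip_address("93.184.216.34")
v6 = ipaddress.ip_address("2606:2800:220:1:248:1893:25c8:1946")

# IPv4 addresses are 4 bytes, IPv6 addresses are 16 bytes (4x as many)
print(len(v4.packed), len(v6.packed))

# 32 bits of address space is where "only 4 billion" comes from
print(2 ** 32)


def is_local(address: str) -> bool:
    """True for addresses that only make sense on a local network or machine."""
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_loopback


for address in ["192.168.1.1", "127.0.0.1", "10.0.0.1", "172.16.0.1", "93.184.216.34"]:
    print(address, is_local(address))
```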

TCP DNS

### If you manage servers, sometimes DNS just breaks for no obvious reason Illustration of a smiling stick figure with curly hair. person: TCP DNS is an uncommon but VERY annoying cause of DNS problems! Let's learn about it! ### DNS queries can use either UDP or TCP A UDP DNS response has to be less than 4096 bytes. UDP is the default. TCP can send an unlimited amount of data. It's only used when UDP wouldn't work. ### large DNS responses automatically use TCP speech bubble 1: here's a UDP DNS query! speech bubble 2: sorry, my response is too big to fit in a UDP packet! get the rest with TCP! ### what's in a giant DNS response? person: I've seen responses with hundreds of internal server IP addresses (for example when using Consul) ### how not supporting TCP DNS can ruin your day 1. your server is happily making UDP DNS queries 2. one day, the responses get bigger and switch to TCP 3. oh no! the queries fail! ### 2 reasons TCP DNS might not work 1. some DNS libraries (like musl's getaddrinfo) don't support TCP. This is why DNS sometimes breaks in Alpine Linux. 2. it could be blocked by your firewall. You should open both UDP port 53 and TCP port 53.
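The "get the rest with TCP!" message is really a single bit in the DNS response header, called TC (truncated). Here's a sketch of how a client could check it; `is_truncated` is a made-up helper, and the sample headers are built by hand rather than captured from a real server.

```python
import struct

TC_FLAG = 0x0200  # the "truncated" bit inside the 16-bit flags field


def is_truncated(message: bytes) -> bool:
    """True if a DNS response says "I didn't fit, retry over TCP"."""
    if len(message) < 12:
        raise ValueError("a DNS message starts with a 12-byte header")
    # The header starts with a 16-bit query ID, then 16 bits of flags
    _query_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


# Hand-built headers: ID, flags, then four 16-bit section counts
normal = struct.pack("!HHHHHH", 27580, 0x8180, 1, 1, 0, 0)
too_big = struct.pack("!HHHHHH", 27580, 0x8180 | TC_FLAG, 1, 0, 0, 0)
print(is_truncated(normal), is_truncated(too_big))
```

A client that sees this bit set is supposed to repeat the query over TCP, which is exactly the step that fails if TCP port 53 is blocked or the library doesn't support it.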

MX records

### there are two important problems in email From: Kermit@frog.com To: julia@example.com 1. Make sure the message gets to the right recipient. This is what MX records are for. 2. Make sure the sender didn't lie about their From: address. This is what SPF, DKIM, and DMARC records are for. SPF/DKIM/DMARC are very complicated but we'll give a tiny incomplete summary. ### MX records tell you the mail server for a domain ``` $ dig +short MX gmail.com 5 gmail-smtp-in.l.google.com. ``` `5` is the priority `gmail-smtp-in.l.google.com.` is the server's domain name ### copy and paste your MX records Illustration of a smiling stick figure with curly hair. person: you're probably using an email service like Fastmail/Gmail, so just copy the records they tell you to use ### tiny guide to SPF/DKIM/DMARC records SPF: list of allowed sender IP addresses Example: `v=spf1 ip4:2.3.4.5 -all` DKIM: sender's public key Example: `v=DKIM1; k=rsa; p=MIGFMA0GCSqGSI.......` DMARC: what to do about SPF/DKIM failures Example: `v=DMARC1; p=reject; rua=mailto:dmarc@example.com`
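If a domain has several MX records, the priority decides the order: mail servers try the lowest number first. Here's a small sketch that parses `dig +short MX` style output and sorts it. `parse_mx` is a made-up helper and the second record in the sample is invented for illustration.

```python
def parse_mx(dig_short_output):
    """Turn lines like "5 mail.example.com." into (priority, server) pairs,
    sorted so the server to try first comes first (lowest number wins)."""
    records = []
    for line in dig_short_output.strip().splitlines():
        priority, server = line.split()
        records.append((int(priority), server))
    return sorted(records)


# The first line is from the page above; the backup server is hypothetical
sample = """20 backup-mail.example.com.
5 gmail-smtp-in.l.google.com.
"""
mx_records = parse_mx(sample)
print(mx_records)
```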

TXT records & more

### TXT records can contain literally anything ``` examplecat.com TXT "hello! I'm an example cat!" ``` (though they're usually ASCII) ### they're often used to verify that you own your domain google, represented by a box with a smiley face: put "banana stand panda" in a TXT record to prove you own this domain! ### reasons to verify your domain - to issue SSL certificates with Let's Encrypt - to use Single Sign On (SSO) for a service - to get access to Google/Facebook's data about your domain (eg search data) ### they're also used for email security (SPF/DKIM/DMARC) Illustration of two smiling stick figures talking. person 1: should we create a DNS record type for SPF? person 2: nah let's just put it all in TXT records! (not a historically accurate summary of the design process for SPF records) ### TXT records can contain many strings Each string is at most 255 characters, and clients will concatenate them together. You'll see this in DKIM records, because they're usually more than 255 characters. ### some other record types CAA: restrict who can issue certificates for your domain PTR: reverse DNS, maps IP addresses to domain names (look these up with `dig -x`) SRV: holds both a server's domain name and a port number
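Here's a sketch of the "many strings" trick for long values like DKIM keys. Each string's length is stored in a single byte, so 255 bytes is the most one string can hold. `split_txt` and `join_txt` are hypothetical helper names, and the fake key is just filler.

```python
def split_txt(value: bytes, limit: int = 255):
    """Split a long TXT value into strings of at most `limit` bytes."""
    return [value[i:i + limit] for i in range(0, len(value), limit)]


def join_txt(strings):
    """What a client does when reading the record: glue the strings together."""
    return b"".join(strings)


fake_dkim = b"v=DKIM1; k=rsa; p=" + b"A" * 600  # filler, not a real key
chunks = split_txt(fake_dkim)
print([len(chunk) for chunk in chunks])
print(join_txt(chunks) == fake_dkim)
```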

search domains

### panel 1: In an internal network (like in a company or school), sometimes you can connect to a machine by just typing its name, like this: `$ ping labcomputer-23` Let's talk about how that works! ### many DNS lookup functions support "local" domain names browser, represented by a box with a smiley face: where's lab23? function, represented by a rectangle with squiggly lines: where's lab23.degrassi.ca? arrow pointing to resolver (server) represented by a box with a smiley face holding a magnifying glass (the function appends a base domain `degrassi.ca` to the end) ### the base domain is called a "search domain" On Linux, search domains are configured in `/etc/resolv.conf` Example: `search degrassi.ca` this tells `getaddrinfo` to turn `lab23` into `lab23.degrassi.ca` ### getaddrinfo doesn't always use search domains It uses an option called ndots to decide. ``` search degrassi.ca options ndots:5 ``` this means "only use search domains if the domain name contains less than 5 dots" ### search domains can make DNS queries slower browser: where's `jvns.ca`? getaddrinfo, represented by a rectangle with squiggly lines: okay, first I'll try `jvns.ca.degrassi.ca` this is silly but it can happen! ### avoid search domains by putting a "." at the end Use `http://jvns.ca.` instead of `http://jvns.ca` Illustration of a smiling stick figure with curly hair. person: "local" domain names like this mostly exist inside of big institutions like universities
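To see how `ndots`, search domains, and the trailing "." fit together, here's a toy version of the decision. It follows the simplified rule on this page; real `getaddrinfo` implementations have more cases (and differ from each other), and `candidate_names` is a name invented for this sketch.

```python
def candidate_names(name, search_domains, ndots=1):
    """List the fully qualified names to try, in order."""
    if name.endswith("."):
        # A trailing "." means "this is already the whole name"
        return [name]
    if name.count(".") < ndots:
        # Not many dots: try each search domain first, then the name itself
        return [f"{name}.{domain}." for domain in search_domains] + [name + "."]
    return [name + "."]


local = candidate_names("lab23", ["degrassi.ca"])
slow = candidate_names("jvns.ca", ["degrassi.ca"], ndots=5)
fast = candidate_names("jvns.ca.", ["degrassi.ca"], ndots=5)
print(local, slow, fast)
```

With `ndots:5`, `jvns.ca` gets an extra pointless lookup for `jvns.ca.degrassi.ca.` first, and adding the trailing "." skips it.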

DNS records

### When you make DNS changes for your domain, you're editing a DNS record Type: A Name (subdomain): paw Use @ for root IPv4 address: 1.2.3.4 TTL: 1 min Here's what the same record looks like with dig (we'll explain dig on page 18) ``` $ dig +noall +answer paw.examplecat.com paw.examplecat.com. 60 IN A 1.2.3.4 ``` ### DNS records have 5 parts - name (eg `tail.examplecat.com`) - type (eg `CNAME`) - value (eg `tail.jvns.ca`) - TTL (eg `60`) - class (eg `IN`) different record types have different kinds of values: `A` records have an IP address, and `CNAME` records have a domain name. ### name `paw.examplecat.com` When you create a record, you'll usually write just the subdomain (like `paw`). When you query for a record, you'll get the whole domain name (like `paw.examplecat.com`). ### TTL `60` "time to live". How long to cache the record for, in seconds. ### class `IN` "IN" stands for "INternet". You can ignore it, it's always the same. ### record type `A` "A" stands for "IPv4 Address". ### value `1.2.3.4` the IP address we asked for!
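The 5 parts map neatly onto one line of dig's answer output. Here's a sketch that splits a line like that into named fields; `Record` and `parse_record` are names made up for this example, and it assumes the usual `name TTL class type value` layout.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int
    record_class: str
    type: str
    value: str


def parse_record(line: str) -> Record:
    """Parse one line from dig's answer section into its 5 parts."""
    # maxsplit=4 keeps values that contain spaces (like TXT records) in one piece
    name, ttl, record_class, rtype, value = line.split(maxsplit=4)
    return Record(name, int(ttl), record_class, rtype, value)


record = parse_record("paw.examplecat.com. 60 IN A 1.2.3.4")
print(record)
```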

your domain's authoritative nameservers

### when you register a domain, your registrar runs your authoritative nameservers by default your registrar, represented by a box with a smiley face wearing a crown: I'm taking care of your DNS! You can change your nameservers in your registrar's control panel. ### LOTS of services can be your authoritative nameserver your registrar: I can manage your DNS records! AWS, also represented by a box with a smiley face wearing a crown: me too! shopify, also also represented by a box with a smiley face wearing a crown: me three! Nonplussed stick figure with curly hair: ok chill I only need one of you to do it ### how to find your domain's nameservers ``` $ dig +short NS neopets.com ns-42.awsdns-05.com. ns-1191.awsdns-20.org. ``` `neopets.com` is using AWS's nameservers right now ### how to change your nameservers 1. Copy your DNS records to the new nameservers (use dig to check that it worked) 2. On your registrar's website, update your nameservers 3. Wait 48 hours 4. Delete the old DNS records (to save your future self confusion) ### why changing your nameservers is slow registrar: here's the new nameserver for example.com! .com nameserver, represented by a box with a smiley face, wearing a stack of three crowns: ok great, I've saved this record: `example.com NS newns.com 172800` updates are VERY SLOW because this TTL is 2 days ### what can go wrong if you don't delete the old records Illustration of a nonplussed stick figure with curly hair. person: I'll go to $OLD_NAMESERVER to change my DNS records! person: WHY doesn't it WORK?!?!? person: oh right, I changed this domain's nameservers last year, oops!

rules for rebasing

### don't rebase a million tiny commits you can end up having to fix the same merge conflict 25 times and it's a nightmare. instead, do it in 2 steps: 1. squash into 1 commit with `git rebase -i` 2. `git rebase main` ### don't force push to a shared branch it's totally ok if it's your own branch that nobody else will ever have to git pull from, but if other people are using it, it makes things weird ### don't do more than one thing in a `git rebase -i` you can * combine commits * reorder commits * edit commits but don't do all of them at once! It's too confusing! ### don't rebase other people's commits I only modify my own commits ### stop a rebase if it's going badly it's MUCH easier to run `git rebase --abort` and bail out than to have to undo it later. It'll take you back to where you were before the rebase. ### you never have to rebase the only reason to rebase is to tidy up your git history, if you're not comfortable rebasing then just don't do it! You can merge or `git commit --amend` instead

every git command I use

getting started: git init, git clone move between branches: git branch, git checkout, git switch restore old files: git checkout, git restore preparing to commit: git status, git add, git mv, git rm, git diff, git reset combining branches: git merge, git rebase, git cherry-pick working with others: git pull, git push, git fetch, git remote making commits: git commit configuring git: git config, git remote code archaeology: git blame, git log FILENAME, git log -S SEARCH, git show, git diff trash changes: git stash, git checkout ., git reset --hard, git rebase -i git troubleshooting: git log BRANCH, git status, git diff, git reflog editing history: git rebase -i, git reset --hard

meet the branch

You can think about a Git branch in 3 different ways. Each of the three ways is illustrated with a diagram of a vertical line divided up into four nodes, labelled "main". A diagonal line with three nodes is coming off the second node from the bottom, labelled "branch". ### way 1: just the commits that "branch" off In this diagram, the two nodes that are on the branch, but not on the main, are labelled "these two". This is what you're probably thinking about when you `merge` or `rebase` a branch into another one. Git doesn't keep track of which branch another branch is "based" on though: that's why you have to run `git merge main` (you have to tell it which base branch to merge with!) You can see these commits with: ``` git log main..BRANCHNAME ``` ### way 2: every previous commit In this diagram, the same two nodes are indicated as in way 1, plus the node on main that the branch comes out of, and the node on main before the branch. This is what `git log BRANCHNAME` shows you. When we say a commit is "on" a branch, we mean that it's somewhere in the history for that branch. ### way 3: just the commit at the end In this diagram, the one node on the branch farthest from where main and branch diverge is labelled "this one". This is how git represents a branch internally. You can run: ``` cat .git/refs/heads/BRANCHNAME ``` to see the commit ID for the branch. That commit's parent (and grandparents, great-grandparents, etc) determine the branch's history.
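Here's a tiny model of the diagram, to show how the 3 ways relate. It's only a sketch: the commit names are made up, every commit has exactly one parent (real merge commits have more), and `history` and `only_on` are invented helper names.

```python
# Each commit points at its parent. main is A-B-C-D, and "branch" forks off at B.
parents = {"A": None, "B": "A", "C": "B", "D": "C", "X": "B", "Y": "X"}

# way 3: a branch is just the commit at the end
branches = {"main": "D", "branch": "Y"}


def history(commit):
    """way 2: the commit plus every previous commit (like `git log`)."""
    commits = []
    while commit is not None:
        commits.append(commit)
        commit = parents[commit]
    return commits


def only_on(branch, base):
    """way 1: commits that "branch" off (like `git log base..branch`)."""
    base_history = set(history(branches[base]))
    return [c for c in history(branches[branch]) if c not in base_history]


way_1 = only_on("branch", "main")
way_2 = history(branches["branch"])
way_3 = branches["branch"]
print(way_1, way_2, way_3)
```

Ways 1 and 2 are both computed from way 3 by following parents, which is why git only needs to store one commit ID per branch.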

HEAD and heads

### panel 1: have you ever seen refs/heads/main or HEAD and wondered what they mean? here's the deal: * `head` = branch * `HEAD` = current branch (yes, these are TERRIBLE names) ### panel 2: a head in git is a branch nobody really uses the term "head" for a branch except the official git docs though ### panel 3: HEAD is the current branch for example HEAD could be set to main it's stored in .git/HEAD Unless you don't have a current branch... ### panel 4: `HEAD` can be a commit ID instead of a branch This means you have no current branch. Git calls this a "detached HEAD state" (another terrible name!) (silly picture of a stick figure whose head has fallen off) fixing this is easy though: `git checkout BRANCHNAME` ### panel 5: the current branch matters for these commands ``` git commit git rebase git merge git cherry-pick ``` these 4 will work if you have no current branch but will create commits that you have no easy way to refer to ``` git pull git push ``` these don't work at all if you're not on a branch
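The two things `.git/HEAD` can contain are easy to tell apart. Here's a sketch that reads the file's text and says which case you're in; `describe_head` is a made-up function, and the commit ID in the example is invented.

```python
def describe_head(head_contents: str):
    """Say whether HEAD is a branch or a bare commit ID.

    Returns ("branch", name) or ("detached", commit_id).
    """
    text = head_contents.strip()
    prefix = "ref: refs/heads/"
    if text.startswith(prefix):
        # HEAD points at a branch: that's your current branch
        return ("branch", text[len(prefix):])
    # Otherwise HEAD is a commit ID: no current branch
    return ("detached", text)


on_main = describe_head("ref: refs/heads/main\n")
detached = describe_head("3e887ab0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6\n")  # invented ID
print(on_main, detached)
```

In a real repository you could pass it `open(".git/HEAD").read()`.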

branches have no rules

### you might expect git to enforce some rules about branches some rules you might imagine: * you can't remove commits from a branch, only add them * the `main` branch has to stay more or less in sync with `origin/main` But there are no rules. git character with demon hat: want to do something horrible to your branch? no problem! ### there are literally no rules commands that you can use to do weird stuff to a branch: * `git reset` * `git rebase` ### instead of rules, we have conventions for example: * run `git pull` often to keep your `main` up to date * if you're working with a big team, don't commit to `main` directly Illustration of the git demon talking to a nonplussed stick figure with curly hair. git demon: you've just gotta be really careful to not do the wrong thing and not mess up your branch person: um... thanks? ### our only saviour: the reflog `git reflog BRANCHNAME` will show you the history of every change to the branch, so you can always undo the reflog is a VERY unfriendly UI, but it's always there.

git cheat sheet

Illustration of a smiling stick figure with short curly hair. Person: git has 17 million options but this is how I use it! ### getting started #### start a new repo: `git init` #### clone an existing repo: `git clone $URL` ### know where you are `git status` ### prepare to commit #### add untracked file: (or unstaged changes) `git add $FILE` #### add ALL untracked files and unstaged changes: `git add .` #### choose which parts of a file to stage: `git add -p` #### delete or move file: ``` git rm $FILE git mv $OLD $NEW ``` #### tell git to forget about a file without deleting it: `git rm --cached $FILE` #### unstage everything: `git reset HEAD` ### make commits #### make a commit: (and open a text editor to write the message) `git commit` #### make a commit: `git commit -m 'message'` #### commit all unstaged changes: `git commit -am 'message'` ### move between branches #### switch branches: `git switch $NAME` OR `git checkout $NAME` #### create a branch: `git switch -c $NAME` OR `git checkout -b $NAME` #### list branches: `git branch` #### delete a branch: `git branch -d $NAME` #### force delete a branch: `git branch -D $NAME` #### list branches by most recently committed to: ``` git branch --sort=-committerdate ``` ### look at a branch's history #### log the branch `git log main` #### show how two branches relate to each other: `git log --graph a b` #### one line log: `git log --oneline` ### code archaeology #### show who last changed each line of a file: `git blame $FILENAME` #### show every commit that modified a file: `git log $FILENAME` #### find every commit that added or removed some text: `git log -S banana` ### diff commits #### show diff between a commit and its parent: `git show $COMMIT_ID` #### show diff between a merge commit and its merged parents: `git show --remerge-diff $COMMIT_ID` #### diff two commits: `git diff $COMMIT_ID $COMMIT_ID` #### just show diff for one file: `git diff $COMMIT_ID $FILENAME` #### show a summary of a diff: `git diff $COMMIT_ID --stat 
git show $COMMIT_ID --stat` ### diff staged/unstaged changes #### diff all staged and unstaged changes: `git diff HEAD` #### diff just staged changes: `git diff --staged` #### diff just unstaged changes: `git diff` ### configure git #### set a config option: `git config user.name 'Julia'` #### see all possible config options: `man git-config` #### set option globally: `git config --global ...` #### add an alias: `git config alias.st status` ### important git files #### local git config: `.git/config` #### global git config: `~/.gitconfig` #### list of files to ignore: `.gitignore` ### combine diverged branches #### how the branches look before: Diagram of two boxes in a row, connected by lines. The first one has a heart, the second one has a star. Branching off from the star, there is one branch with a box with a hashtag symbol, labelled "main". The second branch consists of a box with a spiral and a box with a squiggle. The second branch is labelled "banana". #### combine with rebase: ``` git switch banana git rebase main ``` Diagram of two boxes in a row, connected by lines. The first one has a heart, the second one has a star. Branching off from the star, there is one branch with a box with a hashtag symbol, labelled "main". The box with the spiral and the box with the squiggle have been added on after the box with the hashtag. The box with the squiggle is labelled "banana". The second branch, with the box with a spiral and the box with a squiggle, are drawn with dotted lines and labelled "lost". #### combine with merge: ``` git switch main git merge banana git commit ``` This diagram is like the "before" diagram, except now the two branches converge into a new box, with a diamond in it, labelled "main". 
#### combine with squash merge: ``` git switch main git merge --squash banana git commit ``` This diagram is like the "before" diagram, except now, in the first of the two branches, after the hashtag symbol, there is a new box with both a spiral and a squiggle in it, labelled "main". ### bring a branch up to date with another branch (aka "fast-forward merge") Diagram (before): `main` is behind `banana` on the same line of history. ``` git switch main git merge banana ``` Diagram (after): `main` and `banana` point to the same commit. ### copy one commit onto another branch Diagram (before): the `main` and `banana` branches. `git cherry-pick $COMMIT_ID` Diagram (after): the same two branches, with a copy of the picked commit added to the end of the current branch. ### add a remote `git remote add $NAME $URL` ### push your changes #### push the main branch to the remote origin: `git push origin main` #### push a branch to the remote origin that you've never pushed before: `git push -u origin $NAME` #### push the current branch to its remote "tracking branch": `git push` #### force push: `git push --force-with-lease` #### push tags: `git push --tags` ### pull changes #### fetch changes: (but don't change any of your local branches) `git fetch origin main` #### fetch changes and then merge them into your current branch: `git pull origin main` OR `git pull` #### fetch changes and then rebase your current branch: `git pull --rebase` #### fetch all branches: `git fetch --all` ### ways to refer to a commit every time we say $COMMIT_ID, you can use any of these: * a branch (`main`) * a tag (`v0.1`) * a commit ID (`3e887ab`) * a remote branch (`origin/main`) * current commit (`HEAD`) * 3 commits ago (`HEAD^^^`) * 3 commits ago (`HEAD~3`)

combining diverged branches

### there are 3 options for combining branches - merge - rebase - squash for example, let’s say we’re combining these 2 branches: Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of one box with a hash symbol, and branch 2, which consists of a branch with a spiral, followed by a branch with a squiggle. ### panel 2: git rebase Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a branch with a spiral, then a box with a squiggle. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 is made up of dotted lines and labelled “lost”. git merge Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branches 1 and 2 both lead into a new box, with a diamond. git merge --squash Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a new box containing both a squiggle and a spiral. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 has a box with a spiral, followed by a branch with a squiggle. ### all 3 methods result in the EXACT SAME FILES some differences are: - the diff git shows you for the final commit - the commit ids - the specific flavour of suffering the method causes ### rebase pro: you can keep your git history simple: Diagram: a git history that is just a series of boxes in a straight line. pain: - harder to learn [sad face] - harder to undo [sad face] - easier to mess up [sad face] (I love rebase though!) 
### merge pro: if you mess something up, the original commits are still in your branch’s history pain: when I look at histories like this I feel dread [sad face] Diagram: a complicated git history with a number of different branches. ### squash pro: have 20 messy commits? nobody needs to know! And it’s pretty simple to use. pain: “ugh, someone squashed their 3000-line branch into 1 commit” [sad face]

the floating point number line

### the (64-bit) floating point number line Floating point numbers aren't evenly distributed. Instead, they're organized into windows: [0.25, 0.5], [0.5, 1], [1, 2], [2, 4], [4, 8], [8, 16], all the way up to [2^1023, 2^1024]. Every window has 2^52 floats in it. - between -2 and -1 - between -1 and -1/2 - between -1/2 and -1/4 - between -1/4 and 0 - between 0 and 1/4 - between 1/4 and 1/2 - between 1/2 and 1 - between 1 and 2 ### the windows go from REALLY small to REALLY big The window closest to 0 is [2^-1023, 2^-1022]. This is TINY: a hydrogen atom weighs about 2^-76 grams. The biggest window is [2^1023, 2^1024]. This is HUUUGE: the farthest galaxy we know about is about 2^90 meters away. ### the gaps between floats double with every window - window: [1, 2] gap: 2^-52 - window: [2, 4] gap: 2^-51 - window: [4, 8] gap: 2^-50 - window: [8, 16] gap: 2^-49 ### why does `10000000000000000.0 + 1 = 10000000000000000.0`? - In the window [2^n, 2^(n+1)], the gap between floats is 2^(n-52) - `10000000000000000.0` is in the window [2^53, 2^54], where the gap is 2^1 (or 2) - So the next float after `10000000000000000.0` is `10000000000000002.0`
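You can watch the gap swallow a `+ 1` from the command line. This is a minimal sketch that assumes your `awk` does its arithmetic in 64-bit floats (the common implementations do); the variable names are made up.

```shell
# awk does its arithmetic in 64-bit floats, so it's a handy calculator here
big_plus_one=$(awk 'BEGIN { printf "%.1f", 10000000000000000.0 + 1 }')
big_plus_two=$(awk 'BEGIN { printf "%.1f", 10000000000000000.0 + 2 }')

echo "$big_plus_one"   # the +1 falls inside the gap and gets rounded away
echo "$big_plus_two"   # the +2 lands exactly on the next float
```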

scenes from distributed systems

remote branch caching

### the "up to date" in `git status` is misleading ``` $ git status Your branch is up to date with origin/main ``` this does NOT mean that you're up to date with the remote main branch. But why not??? ### some old version control systems only worked if you were online Illustration of a sad stick figure with short curly hair. person (thinking): my internet went out, guess I can't work ### git works offline Illustration of a smiling stick figure with short straight hair. git developer (thinking): I want to be able to code on a train with no internet git developer (thinking): NOTHING in git will use the internet except `git pull`, `git push`, and `git fetch` ### this makes `git status` weird git developer (thinking): we need to tell people if their branch is up to date... with NO INTERNET??? how? ### solution: CACHING Every remote branch has a local cache named like `origin/mybranch` (`origin` is the remote name, `mybranch` is the branch name) Git doesn't call it a cache though, it calls it a "remote tracking branch" local branch: `mybranch` cache: `origin/mybranch` (only updated on `git pull`, `git push`, `git fetch`) remote branch: `origin mybranch` (`git push origin mybranch` updates this) (git has no easy way to see when `origin/mybranch` was last updated)
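Here's a sketch of the stale cache in action, with no internet needed: a bare repository in a temp directory stands in for the remote, and `alice` and `bob` are two made-up clones of it. Bob's `origin/main` only notices Alice's new commit after `git fetch`.

```shell
# everything lives in a temp dir; "upstream.git" stands in for the remote
cd "$(mktemp -d)"
git init -q --bare upstream.git
git -C upstream.git symbolic-ref HEAD refs/heads/main

git clone -q upstream.git alice 2>/dev/null   # (git warns that the repo is empty)
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name alice && git config user.email alice@example.com
git commit -q --allow-empty -m "first"
git push -q -u origin main
cd ..

git clone -q upstream.git bob                 # bob's cache is fresh right now

cd alice
git commit -q --allow-empty -m "second"
git push -q origin main                       # the remote moves, bob's cache doesn't
cd ../bob

behind_before_fetch=$(git rev-list --count main..origin/main)
git status -sb                                # still looks up to date
git fetch -q
behind_after_fetch=$(git rev-list --count main..origin/main)
git status -sb                                # now it admits we're behind
echo "behind by $behind_before_fetch before fetch, $behind_after_fetch after"
```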

orphan commits

### commits in git are usually saved forever Except! Orphaned commits are deleted periodically. Illustration of a little garbage can. Commits are orphaned when you: - `git commit --amend` - `git rebase` - delete a branch that hasn't been merged ### what is an orphaned commit? it's a commit that isn't in the history of any branch they're almost totally invisible, since Git will usually only show you commits on branches ### orphan #1: `git commit --amend` before: An illustration for a box that says `parent`, with a line to a second box that says `fix color buug` (typo!). The second box is labelled `main` branch. after: The same diagram as above, but there is now a second line coming out of the `parent` box, going to a third box that says `fix color bug`. The `fix color buug` box is now labelled "now it's an orphan!" and the `fix color bug` box is labelled "`main` branch". ### orphan #2: `git rebase` before: A box with two branches coming out of it. The top one is labelled "`main` branch". The second branch has two boxes, one with a heart, and one with a star. This branch is labelled "`feature` branch". after: A box with two branches coming out of it. The top branch consists of three boxes, one blank, one with a heart, and one with a star. The blank box is labelled "`main` branch", and the box with the star is labelled "`feature` branch". The second branch consists of two boxes, one with a heart, and one with a star. This branch is labelled "now these two are orphans!" ### orphan #3: `deleting unmerged branch` before: A box with two branches coming out of it. The first branch consists of one blank box, labelled "`main` branch". The second branch consists of two boxes, one with a heart, and one with a star. This branch is labelled "`feature` branch". after deleting `feature`: The same diagram as above, except that the second branch is now labelled "now these two are orphans!" 
### how to find orphan commits the only way to find them is with `git reflog` (or by memorizing their commit ID somehow)
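A sketch of orphan #1, reusing the commit messages from the diagram (the `demo` repository is made up). The amended-away commit disappears from the branch's history, but the reflog still has its ID and the commit itself is still readable.

```shell
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.name demo && git config user.email demo@example.com
git commit -q --allow-empty -m "parent"
git commit -q --allow-empty -m "fix color buug"
orphan_id=$(git rev-parse HEAD)

git commit -q --amend --allow-empty -m "fix color bug"

# the old commit is no longer in the branch's history...
git log --format=%s
commits_on_branch=$(git rev-list --count HEAD)

# ...but the reflog remembers it, and the commit is still there
git reflog
previous_head=$(git rev-parse 'HEAD@{1}')
recovered_subject=$(git log -1 --format=%s "$orphan_id")
echo "$recovered_subject"
```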

inside .git

### `HEAD` `HEAD` is a tiny file that just contains the name of your current branch `.git/HEAD` `ref: refs/heads/main` `HEAD` can also be a commit ID, that's called "detached `HEAD` state" ### branches a branch is stored as a tiny file that just contains 1 commit ID. It's stored in a folder called `refs/heads`. `7622629` - (actually 40 characters) tags are in `refs/tags`, the stash is in `refs/stash` ### commit a commit is a small file containing its parent(s), message, tree, and author `.git/objects/7622629` ``` tree c4e6559 parent 037ab87 author Julia <x@y.com> 1697682215 committer Julia <x@y.com> 1697682215 commit message goes here ``` these are compressed, the best way to see objects is with `git cat-file -p HASH` ### trees trees are small files with directory listings. The files in it are called "blobs" `.git/objects/c4e6559` ``` 100644 blob e351d93 404.html 100644 blob cab4165 hello.py 040000 tree 9de29f7 lib ``` the permissions here LOOK like unix permissions, but they're actually super restricted, only 644 and 755 are allowed ### blobs blobs are the files that contain your actual code `.git/objects/cab4165` `print("hello world!!!!")` ### reflog the reflog stores the history of every branch, tag, and `HEAD` `.git/logs/refs/heads/main` ``` 2028ee0 c1f9a4c Julia Evans <x@y.com> 1683751582 commit: no ligatures in code ``` each line of the reflog has: - before/after commit IDs - user + - timestamp - log message ### remote-tracking branches remote-tracking branches store the most recently seen commit ID for a remote branch `.git/refs/remotes/origin/main` `a9bbcae` when git status says "you're up to date with `origin/main`", it's just looking at this ### .git/config .git/config is a config file for the repository. 
it's where you configure your remotes `.git/config` ``` [remote "origin"] url = git@github.com:jvns/int-exposed fetch = +refs/heads/*:refs/remotes/origin/* [branch "main"] remote = origin merge = refs/heads/main ``` git has local and global settings, the local settings are here and the global ones are in `~/.gitconfig` ### hooks hooks are optional scripts that you can set up to run (eg before a commit) to do anything you want `.git/hooks/pre-commit` ``` #!/bin/bash any-commands-you-want ``` ### the staging area the staging area stores files when you're preparing to commit `.git/index` `(binary file)`
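If you want to poke at these files yourself, here's a sketch using a throwaway repository (the `demo` name is made up, `hello.py` is borrowed from the tree listing above). It assumes git's default file-based storage, where a freshly committed branch is a plain file under `.git/refs/heads`.

```shell
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name demo && git config user.email demo@example.com
echo 'print("hello world!!!!")' > hello.py
git add hello.py
git commit -q -m "add hello.py"

head_file=$(cat .git/HEAD)                # the name of the current branch
branch_file=$(cat .git/refs/heads/main)   # one commit ID, all 40 characters
echo "$head_file"
echo "$branch_file"

git cat-file -p HEAD                      # the commit: tree, author, committer, message
git cat-file -p 'HEAD^{tree}'             # the tree: one line per file
blob_contents=$(git cat-file -p HEAD:hello.py)   # the blob: the file itself
echo "$blob_contents"
```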

combining branches

### there are 3 options for combining branches * `merge` * `rebase` * `squash` for example, let's say we're combining these 2 branches: Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of one box with a hash symbol, and branch 2, which consists of a branch with a spiral, followed by a branch with a squiggle. ### panel 2: 1. `git rebase` Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a branch with a spiral, then a box with a squiggle. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 is made up of dotted lines and labelled "orphan". 2. `git merge` Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branches 1 and 2 both lead into a new box, with a diamond. 3. `git merge --squash` Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a new box containing both a squiggle and a spiral. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 is made up of dotted lines and labelled "orphan". ### all 3 methods result in the EXACT SAME FILES some differences are: * the diff git shows you for the final commit * the specific flavour of suffering the method causes ### merge pro: if you mess something up, the original commits are still in your branch's history pain: when I look at histories like this I feel dread Diagram: a complicated git history with a number of different branches. ### rebase pro: you can keep your git history simple: Diagram: a git history that is just a series of boxes in a straight line. 
pain: - harder to learn [sad face] - harder to undo [sad face] - easier to mess up [sad face] (I love rebase though!) ### squash pro: have 20 messy commits? nobody needs to know! And it's pretty simple to use. pain: "ugh, someone squashed their 3000-line branch into 1 commit"
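The "EXACT SAME FILES" claim is easy to check. This sketch rebuilds the star / hash / spiral / squiggle history in a throwaway repository (the file names and the `try-*` branch names are made up), combines the branches all 3 ways, and compares the tree ID of each result. Same tree ID means same files.

```shell
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name demo && git config user.email demo@example.com

echo star > star.txt && git add . && git commit -q -m "star"
git branch banana
echo hash > hash.txt && git add . && git commit -q -m "hash"          # on main
git checkout -q banana
echo spiral > spiral.txt && git add . && git commit -q -m "spiral"
echo squiggle > squiggle.txt && git add . && git commit -q -m "squiggle"

# 1. merge
git checkout -q -b try-merge main
git merge -q --no-edit banana
merge_tree=$(git rev-parse 'HEAD^{tree}')

# 2. rebase (on a copy of banana, so the original commits stay on a branch)
git checkout -q -b try-rebase banana
git rebase -q main
rebase_tree=$(git rev-parse 'HEAD^{tree}')

# 3. squash
git checkout -q -b try-squash main
git merge -q --squash banana
git commit -q -m "squashed banana"
squash_tree=$(git rev-parse 'HEAD^{tree}')

echo "merge:  $merge_tree"
echo "rebase: $rebase_tree"
echo "squash: $squash_tree"
```

The commit IDs at the tip of each `try-*` branch are all different, it's only the files that match.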

diverged branches

### when pushing/pulling, the hardest problems are caused by diverged branches sad error messages: ``` ! [rejected] main -> main ``` (non `fast-forward`) `fatal: Not possible to fast-forward, aborting` `fatal: Need to specify how to reconcile divergent branches.` ### what are diverged branches it looks like this: Diagram with two blank boxes, followed by a box with a heart in it, that then branches out into two branches, one with a hash symbol in it, labelled "local main", and one with a squiggle in it, labelled "remote main". ### there are 4 possibilities with a remote branch 1. up to date (with a heart) Illustration of three boxes in a row, labelled both "local" and "remote" 2. need to pull Illustration of four boxes in a row. The second box in the sequence is labelled "local", the fourth branch is labelled "remote". 3. need to push Illustration of four boxes in a row. The second box in the sequence is labelled "remote", the fourth branch is labelled "local". 4. diverged (need to decide how to solve it) (sad face) Illustration of two boxes in a row, that then branches out into two branches. One of the branches has one box, labelled "remote", and the other branch has two boxes, labelled "local". ### how to tell your branches have diverged: `git status` 1. `$ git fetch` (get the latest remote state first) 2. `$ git status` Your branch and '`origin/main`' have diverged, and have 1 and 1 different commits each, respectively. (use "`git pull`" to merge the remote branch into yours) (diverged is highlighted) ### fix diverged branches before making more commits First illustration: two boxes in a row, then branches out into two branches, each with one box. It's labelled "not so bad to resolve..." Second illustration: two boxes in a row, then branches out into two branches, but each branch has a whole bunch of boxes. Illustration of a sad stick figure with curly hair. person: oh no ### there's no one solution Illustration of a smiling stick figure with curly hair. 
person: on the next page we'll talk about some options!

fixing diverged branches

### ways to reconcile two diverged branches Illustration of a sequence of boxes joined with lines. The first box is a star, the second box is a heart, and then it branches out into two boxes, one with a hash symbol and one with a squiggle. Hash symbol box is labelled "local main" and squiggle box is labelled "remote main" - combine the changes from both with (1) rebase or (2) merge! - throw out your local changes (3) after breaking your local branch! - throw out the remote changes (4) to get rid of something you accidentally pushed (be REAL careful with this one) ### 1. rebase ``` git pull --rebase git push ``` Illustration of four boxes (star, heart, squiggle, hash) in a straight line, labelled "local main" and "remote main" Illustration of a tiny little smiling stick figure with puffy hair in the corner of the panel. person: this one's my favourite! ### 2. merge ``` git pull --no-rebase git push ``` Illustration of two boxes (star and heart) that then diverge into two branches (hash and squiggle) then reconvene into a fifth box, with a diamond in it, labelled "local main" and "remote main" ### 3. throw away local changes ``` git switch -c newbranch git switch main git reset --hard origin/main ``` (the first line is labelled "optional: save your changes on `main` to `newbranch` so they're not orphaned) Illustration of two boxes (star and heart) that then diverge into two branches (hash and squiggle), which are labelled "new branch" and "local main, remote main" respectively. ### 4. throw away remote changes (DANGER!) ``` git push --force ``` Illustration of two boxes (star and heart) that then diverge into two branches one with a hash symbol, labelled "local main, remote main", and one with a squiggle, whose box is a dotted line, and that's labelled "orphan". 
(`--force` is always dangerous, `--force-with-lease` is a little safer) ### reasons to throw away changes - I'll throw away local changes if I accidentally committed to `main` instead of a new branch - I'll throw away remote changes if I want to amend a commit after pushing it, and I'm the only one working on that branch

losing your work

### people are always saying: Illustration of two stick figures talking. One is bald and smiling, the second has long curly hair and is frowning. person 1: don't worry! it's impossible to lose your work in git! person 2 (thinking): my lost work says otherwise but some parts of git are MUCH safer than others ### commits on a branch / tag (lock icon) never change Illustration of a smiling stick figure with curly hair. Their speech bubble is surrounded by hearts and stars. person: you can ALWAYS use the commit ID to get your work back! ### orphan commits (lock icon) never change, except... they'll eventually get deleted by git's garbage collection (usually not for a few months though) ### branches and `HEAD` (unlocked lock icon) change ALL THE TIME (clock going backwards icon) BUT there's a history of all the changes in the reflog Tiny cute illustration of a smiling stick figure with curly hair. person: the reflog is NOT easy to use but at least it's there ### staging area (unlocked lock icon) changes ALL THE TIME (crossed out clock going backwards icon) no history (sad face) just gotta be careful ### the stash (crossed out clock going backwards icon) `git stash pop` deletes entries forever ... but you can technically get them back by using `git fsck` to search EVERY SINGLE COMMIT

the current branch: HEAD

### HEAD is a tiny file containing the name of the current branch Diagram of three boxes in a row, joined by lines. One has a heart, one has a star, and one has a squiggle. The final one, with the squiggle, is labelled "`main`". `HEAD` = `main` `main` = [squiggle] ### when you commit, git updates the current branch to point at the new commit Diagram of three boxes in a row, joined by lines. One has a heart, one has a star, and one has a squiggle. The final one, with the squiggle, is labelled "`main`". `HEAD` = `main` `main` = [squiggle] Diagram of four boxes in a row, joined by lines. One has a heart, one has a star, one has a squiggle, and one has a spiral. The final one, with the spiral, is labelled "`main`". `HEAD` = `main` `main` = [spiral] ### SO MANY things in git use the current branch * `git commit` moves it forward * `git merge` merges into it * `git rebase` copies commits from it * `git push` and `git pull` sync it with a remote ### many git disasters are caused by accidentally running a command while on the wrong branch Illustration of a sad stick figure person: `git commit` person, thinking: UGH I didn't mean to do that on `main` ### I keep my current branch in my shell prompt `~/work/homepage (main) $` to me it's as important as knowing what directory I'm in ### panel 6 Illustration of a smiling stick figure with curly hair. person: I think `HEAD` is a weird name for the current branch (why not `CURRENT` or something?) but we're stuck with it

detached HEAD state

### `HEAD` isn't always a branch it can be a commit instead `git checkout a3ffab9` (a3ffab9 isn't a branch!) git calls this "detached `HEAD` state" ### by itself, HEAD being a commit ID is okay Illustration of a smiling stick figure with curly hair. person: it's a great way to look at an old version of your code! I do it all the time! ### the only problem is that new commits you make will be orphaned Diagram of a series of circles connected by lines, labelled "main". The first circle is labelled `HEAD`. There is a dotted line branching off `HEAD` to an additional circle. The additional circle is labelled "new commit will go here, danger! it won't be on any branch!" ### some ways `HEAD` can become a commit ID `git checkout a3ffab3` (`a3ffab3` is the commit id) `git checkout origin/main` (`origin/main` is the "remote-tracking branch") `git checkout v1.3` (`v1.3` is a tag) ### if you accidentally create some orphaned commits, it's SUPER easy to fix just create a new branch! `git switch -c newbranch` panel 6: my shell prompt tells me if `HEAD` is a commit `~/work (d63b29) $` `d63b29` tells me to avoid creating new commits (no `git commit`, `git merge`, or `git rebase`)
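Here's a sketch of the whole accident and rescue in a throwaway repository (the names are made up): check out a commit ID, commit by mistake while detached, then put the stray commit on a branch with `git switch -c`.

```shell
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name demo && git config user.email demo@example.com
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"

git checkout -q "$(git rev-parse 'HEAD~1')"   # HEAD is now a commit ID
if git symbolic-ref -q HEAD >/dev/null; then
  state="on a branch"
else
  state="detached"
fi
echo "$state"

git commit -q --allow-empty -m "oops, made this while detached"
stray_id=$(git rev-parse HEAD)

git switch -q -c newbranch                    # the fix: create a new branch
rescued_id=$(git rev-parse newbranch)
current_branch=$(git symbolic-ref --short HEAD)
echo "$current_branch now points at $rescued_id"
```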

git branches: the rules

### branches have very few rules git lets you move branches forwards, backwards, or sideways if you want Illustration of three circles in a vertical line, with an additional branch extending out of the middle circle. The top circle is labelled "`main`". The middle circle is labelled "You could move `main` here." The circle in its own branch is labelled "or here." ### all changes to a branch are recorded in its reflog You can look at the reflog like this: `git reflog BRANCHNAME` reflog stands for "reference log" ### when you delete a branch, its reflog is deleted Illustration of a sad stick figure with short curly hair, talking to a box with a smiley face representing git. person: what if I wanted to look at the history of that branch to recover something? git: too bad! ### git will eventually delete any commit that isn't on a branch/tag/etc Illustration of four circles in a vertical line. The top one is labelled "`main`". There is a branch coming off of the second-from-bottom circle, and it is labelled "will be deleted by garbage collection after ~90 days unless you put it on a branch." ### `git branch -d` won't let you delete unmerged branches Illustration of three circles in a vertical line. The top one is labelled "`main`". There is a branch coming off of the bottom circle, labelled "my branch (not merged)" to delete an unmerged branch, you need to force it with `-D` ### rules git doesn't have about branches - when you push/pull a branch, the name doesn't have to match - the main branch doesn't have any special protections in git itself (though tools like GitHub can protect it)

git references

### git often uses the term "reference" in error messages ``` $ git switch asdf fatal: invalid reference: asdf $ git push To github.com:jvns/int-exposed ! [rejected] main -> main error: failed to push some refs to 'github.com:jvns/int-exposed' ``` "ref" and "reference" mean the same thing Illustration of a tiny worried-looking stick person with a thought bubble reading "!" ### "reference" often just means "branch" in those two error messages, you can replace "reference" with "branch" in my experience, it's: 96% "branch" 3% "tag" 3% "HEAD" 0.01% something else ### it's an umbrella term Illustration of git, represented by a box with a smiley face git, thinking: "well, I COULD check if the thing we failed to push is a branch or tag or what, and customize the error message based on that...." git, thinking: "seems complicated, let's just print out "reference"" sad person: "why?" ### reference: the definition References are files: either `.git/HEAD` or files in `.git/refs`. There are 5 main types. Here's a list of every type of git reference that I have ever used: - HEAD: `.git/HEAD` - branches: `.git/refs/heads/BRANCH` - tags: `.git/refs/tags/TAG` - remote-tracking branches: `.git/refs/remotes/REMOTE/BRANCH` - stash: `.git/refs/stash` all of these files contain a commit ID, but the way that commit ID is used depends on what type of reference it is (examples of more obscure references are `.git/FETCH_HEAD` and `.git/refs/notes/...` but I've never needed to think about those and your repository probably doesn't even have notes) ### git's garbage collection starts with references the algorithm is: 1. find all references, and every commit in every reference's reflog 2. find every commit in the history of any of those commits 3. delete every commit that wasn't found
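To see the references in a repository, `git for-each-ref` lists everything under `.git/refs`. A small sketch, using a throwaway repository with one made-up branch (`feature`) and one tag:

```shell
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name demo && git config user.email demo@example.com
git commit -q --allow-empty -m "first"
git tag v0.1
git branch feature

# every reference under .git/refs, with the commit ID it contains
git for-each-ref --format='%(objectname:short) %(refname)'
ref_names=$(git for-each-ref --format='%(refname)')

# HEAD is a reference too, it just lives outside .git/refs
head_target=$(git symbolic-ref HEAD)
echo "HEAD -> $head_target"
```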

knowing where you are in git

### many git disasters are caused by accidentally running a command while on the wrong branch... Illustration of a stick figure with a neutral expression. person: `git commit` person, thinking: UGH I didn't mean to do that on `main` ### ... or by forgetting you're in the middle of a multistep operation smiling stick figure with curly hair: la la la just writing code same person, now distressed and surrounded by exclamation marks: OMG I FORGOT I WAS IN THE MIDDLE OF A MERGE CONFLICT ### I always keep track of 2 things 1. am I on a branch, or am I in detached `HEAD` state? 2. am I in the middle of some kind of multistep operation? (`rebase`, `merge`, `bisect`, etc) ### I keep my current branch in my shell prompt `~/work/homepage (main) $` to me it's as important as knowing what directory I'm in git comes with a script to do this in bash/zsh called `git-prompt.sh` ### decoder ring for the default git shell prompt `(main)` on a branch, everything is normal `((2e832b3...))` `((v1.0.13))` the double brackets (( )) mean `detached HEAD state`. this prompt can only happen if you explicitly `git checkout` a commit/tag/remote-tracking branch `(main|CHERRY-PICK)` `(main|REBASE 1/1)` `(main|MERGING)` `(main|BISECTING)` in the middle of a cherry-pick/rebase/merge/bisect
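If you can't use `git-prompt.sh`, the first half of the decoder ring is simple to imitate. This is a rough sketch, not how `git-prompt.sh` is implemented: `where_am_i` is a made-up helper that only answers question 1 (branch or detached `HEAD`?) and knows nothing about rebases, merges, or bisects.

```shell
# hypothetical helper: prints "(main)" on a branch, "((2e832b3...))" when detached
where_am_i() {
  if branch_name=$(git symbolic-ref --short -q HEAD); then
    echo "($branch_name)"
  else
    echo "(($(git rev-parse --short HEAD)...))"
  fi
}

# try it in a throwaway repo
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name demo && git config user.email demo@example.com
git commit -q --allow-empty -m "first"

on_branch=$(where_am_i)
git checkout -q "$(git rev-parse HEAD)"   # check out the commit ID directly
detached=$(where_am_i)
echo "$on_branch"
echo "$detached"
```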

merge commits

### merging 2 diverged branches creates a commit `git merge mybranch` Diagram of two boxes in a row, one with a heart, and one with a star. From the star, it branches out into a branch with a hash symbol, labelled `main`. The other branch coming off of the star has a box with a spiral followed by a box with a spiky symbol. The two branches converge in a box with a diamond symbol, labelled "merge commit!". merge commits have a few surprising gotchas! ### gotcha: merging isn't symmetric normal: ``` git checkout main git merge mybranch ``` weird: ``` git checkout mybranch git merge main ``` these two result in the same code, but the merge commit's parents have a different order This comes up when you use `HEAD^`: it refers to the first parent, and usually you want that to be the commit from the main branch ### gotcha: you can keep coding during a merge If you forget you're doing a merge, it's easy to accidentally keep writing code and add a bunch of unrelated changes into the merge commit. I use my prompt to remind me. ### gotcha: git show doesn't tell you what the merge commit did It'll often just show the merge commit as "empty" even if the merge did something important (like discard changes from one side). Illustration of a tiny sad stick person with curly hair person: why ### tip: see what a merge did with `git show --remerge-diff` `git show --remerge-diff COMMIT_ID` will re-merge the parents and show you the difference between the original merge and what's actually in the merge commit

interactive rebase

### git rebase -i lets you garden your commits I use it like this: 1. make commits chaotically, `git commit -am 'wip'` 2. clean up with `git rebase -i` before sending them off for code review ### interactive rebase's UI is a text file when you run `git rebase -i main`, it'll open a text editor with something like this in it: ``` pick 399990 add some padding pick fb59d8 french translation pick 617b19 sort titles pick 31b81f hashchange ``` ### deleting commits You can delete a commit just by deleting that line in the text editor! (same as previous panel but the "french translation" line is scribbled out) ### combine commits with fixup Here's how to combine all 4 commits into 1 commit: (`f` stands for `fixup`) ``` pick 399990 add some padding f fb59d8 french translation f 617b19 sort titles f 31b81f hashchange ``` ### check that the tests pass with exec You can run make test on every intermediate commit to make sure your tests pass like this: ``` git rebase -i --exec "make test" main ``` (you can also use this to format every commit's code!) ### some other tips * `reword` lets you edit a commit message * If something goes very wrong, I try to run `git rebase --abort` ASAP, because undoing rebases is annoying

submodules

### panel 1 Illustration of a smiling stick figure with curly hair. person: I find submodules confusing and I avoid them if possible, but here's what I've learned from other people's writing on submodules (especially Dmitry Mazin's great "Demystifying Git Submodules" post) ### submodules let you store another git repository as a subdirectory ``` git submodule add https://github.com/jvns/myrepo ./myrepo ``` (`https://github.com/jvns/myrepo` is the remote, `./myrepo` is the local path) Git will store the commit ID and URL of the submodule ### gotcha: cloning a repository doesn't download its submodules To get the submodules, you can run this after cloning the repository: `git submodule update --init` ### gotcha: git pull and git checkout don't update submodules To actually update them, you have to run: `git submodule update` every single time you switch branches or pull ### gotcha: git submodule update puts the submodule in detached HEAD state might not be a big deal if you're only using the submodule in a read-only way, but seems like it could get weird if you're editing it ### some submodule config options automatically update submodules after a pull/checkout: `submodule.recurse true` show which commits were added/removed in `git diff/git status`: ``` status.submoduleSummary true diff.submodule log ```

git worktree

### git worktree lets you have 2 branches checked out at the same time Illustration of a smiling stick figure with curly hair, and a git worktree, represented by a box with a smiley face person: ugh, I want to take a look at this other branch, but I have all these uncommitted changes... git worktree: i can help! ### creating a worktree You can check out a branch into a new directory like this: `git worktree add ~/my/repo mybranch` (`~/my/repo` is the directory, `mybranch` is the branch) Then you can run any normal git commands in the new directory: ``` $ cd ~/my/repo $ git pull ``` ### two worktrees can't have the same branch checked out Here's what happens if you try: ``` $ git checkout main fatal: main is already checked out at /home/bork/work/homepage ``` ### it's way faster (and uses less space!) than cloning the repository again Because worktrees share a .git directory, it just needs to check out the files from the branch you want to use! ### other worktree commands List all worktrees: `$ git worktree list` Delete a worktree: `$ git worktree remove ~/my/repo` ### sometimes I use worktrees to keep my .git directory and its checkout separate this lets me put the checkout in Dropbox but not the .git directory: ``` $ git clone --bare git@github.com:jvns/myrepo $ cd myrepo.git $ git worktree add ~/Dropbox/myrepo main ``` (`~/Dropbox/myrepo` is the directory, `main` is the branch)
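A sketch of the worktree lifecycle in a temp directory: add one, list them, get refused when checking out the same branch twice, then remove it. The `repo` and `mybranch-checkout` directory names are made up.

```shell
cd "$(mktemp -d)"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/main
git config user.name demo && git config user.email demo@example.com
git commit -q --allow-empty -m "first"
git branch mybranch

# check out mybranch into a second directory next to the repo
git worktree add ../mybranch-checkout mybranch >/dev/null 2>&1
git worktree list
worktree_count=$(git worktree list | wc -l | tr -d ' ')
other_branch=$(git -C ../mybranch-checkout symbolic-ref --short HEAD)

# two worktrees can't have the same branch checked out
if git checkout -q mybranch 2>/dev/null; then
  double_checkout=allowed
else
  double_checkout=refused
fi
echo "second checkout of mybranch: $double_checkout"

git worktree remove ../mybranch-checkout
remaining=$(git worktree list | wc -l | tr -d ' ')
```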

git add -p

### `git add -p` lets you stage some changes and not others I use this if I want to commit my real changes, but not the random debugging code I added. (this is one of the tasks GUIs and IDEs are best at, but I always use `git add -p` anyway) ### what the interface looks like ``` --- a/package.json +++ b/package.json @@ -1,7 +1,7 @@ "name": "homepage", - "version": "1.0.0", + "version": "1.0.1", "devDependencies": { - "dart-sass": "^1.25.0" + "dart-sass": "^1.26.0", (1/1) Stage this hunk [y,n,q,a,d, s,e,?]? ``` package.json is the filename lines 4-9 are the diff `[y,n,q,a,d, s,e,?]` is your choice ### y(es)/n(o)/q(uit) y means "stage this change" n means "don't" q quits, keeping what you did so far. pretty straightforward. ### how to check your work `git diff --cached` will show your staged changes ### s: split into two parts s will split a diff into smaller diffs you can say y or n to individually, like this: ``` +++ b/package.json @@ -1,7 +1,7 @@ - "version": "1.0.0", + "version": "1.0.1", "devDependencies": { ``` BUT! This only works if there's a newline between the two parts. ### how to split a diff if there's no newline You can use the e ("edit") option to edit the diff manually: - to remove a - line, replace "-" with a space - to remove a + line, delete the whole line version 1: ``` "name": "homepage", - "version": "1.0.0", - "devDependencies": { "version": "1.0.1", + "devDependenciezzz' ``` version 2: ``` "name": "homepage", - "version": "1.0.0", + "version": "1.0.1", [space] "devDependencies": [space] ``` (or you can just say 'n' and edit your code! that's what I do!)

merge conflict tips

### use `diff3` or `zdiff3` to see the original version of the code `git config --global merge.conflictstyle diff3` This will add an extra section in the middle of your merge conflicts ### if you get confused, merge (or cherry-pick) 1 commit at a time This can make the conflicts smaller and easier to resolve! `git-imerge` is a tool to make this easier, though I haven't tried it ### use rerere to remember how you resolved a conflict during a rebase `git config --global rerere.enabled true` This means you won't have to resolve the exact same conflict over and over again ### `git checkout --ours/theirs` can take all changes from one side For example `git checkout --ours file.txt` will take the version of file.txt from the "ours" side of the merge (though upsettingly the meaning of "ours" and "theirs" depends on whether you merged or rebased) ### if you can't tell which code comes from which branch, looking on the web can help Illustration of an uncertain-looking stick figure with short curly hair. person (thinking): I'll just go to GitLab and see what `file.txt` looks like on the main branch ### `git merge-tree` can check for merge conflicts without actually merging the branches ``` $ git merge-tree --write-tree main mybranch ... Auto-merging file.py CONFLICT (content): Merge conflict in file.py ```
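To see what the extra `diff3` section looks like, this sketch manufactures a tiny conflict in a throwaway repository (the file contents are made up, and the config is set for this repository only, not `--global`). The middle section, after the `|||||||` marker, is the original version of the line.

```shell
cd "$(mktemp -d)"
git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/main
git config user.name demo && git config user.email demo@example.com
git config merge.conflictstyle diff3      # local to this repo

echo "color = red" > file.txt
git add file.txt && git commit -q -m "original"
git branch mybranch
echo "color = blue" > file.txt && git commit -q -am "main says blue"
git checkout -q mybranch
echo "color = green" > file.txt && git commit -q -am "mybranch says green"
git checkout -q main

git merge mybranch >/dev/null 2>&1 || echo "conflict, as planned"
cat file.txt                               # 3 sections: ours, original, theirs
original_markers=$(grep -c '^|||||||' file.txt)

# take everything from the "ours" side (main, since we're merging)
git checkout --ours file.txt 2>/dev/null
ours_version=$(cat file.txt)
echo "$ours_version"
```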

PATH tips

### add a directory to your PATH at the end: `export PATH=$PATH:/my/dir` at the beginning: `export PATH=/my/dir:$PATH` in fish: `set PATH $PATH /my/dir` (illustration of a little fish with a heart-shaped tail) ### your shell's config file bash: `~/.bashrc` or `~/.bash_profile` (exactly which one is a bit of a rabbit hole sadly) zsh: `~/.zshrc` fish: `~/.config/fish/config.fish` (illustration of a little fish with a heart-shaped tail) ### show what your shell is actually going to do when you run the program `type python3` instead of running what's in `PATH`, sometimes it'll run a builtin or alias or cached entry ### show the first match on your PATH for a program `which python3` (but in zsh `which` acts like `type`) ### show ALL matches on your PATH for a program, in order `which -a python3` ### look at your PATH `echo $PATH` ### show each entry on its own line `echo $PATH | tr ':' '\n'` ### clear the PATH cache (bash/zsh) `hash -r` why you might need to do this: bash and zsh cache `PATH` lookups, so sometimes updating your `PATH` doesn't work properly
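
To see a few of these tips working together, here's a minimal sketch that puts a fake program at the front of the `PATH`. The temp directory and the `path-demo-tool` program are made up for the example; nothing here touches your real shell config.

```shell
# A made-up program in a made-up directory
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from demo"\n' > "$demo_dir/path-demo-tool"
chmod +x "$demo_dir/path-demo-tool"

# Add the directory at the beginning so it wins over everything else
export PATH="$demo_dir:$PATH"

# Clear the lookup cache in case the shell remembered an older location
hash -r

# Where will the shell find it? (command -v is a portable cousin of `type`)
found=$(command -v path-demo-tool)
echo "found: $found"

# First PATH entry, using the one-entry-per-line trick
first_entry=$(echo "$PATH" | tr ':' '\n' | head -n 1)
echo "first entry: $first_entry"

output=$(path-demo-tool)
echo "$output"
```

The change only lasts for the current shell session. To make it stick, the `export` line would go in your shell's config file.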

PATH and finding programs

### PATH is how your shell knows where to find programs Illustration of a smiling stick figure with curly hair, and shell, represented by a box with a smiley face. person: run `python3` PATH is ``` /bin /home/bork/bin /usr/bin ``` shell, thinking: `/bin/python3`? nope, doesn't exist `/home/bork/bin/python3`? nope, doesn't exist `/usr/bin/python3`? there it is!!! I'll run that! ### how to add a program to your PATH 1. find the folder the program is in 2. update your shell config to add it to your `PATH` 3. restart your shell, for example by opening a new terminal tab ### ...but how do you find the folder * think about how you installed it person (thinking): hmm, I used the Rust installer, where does that install things? * a brute force search `find / -name python3 | grep bin` ### `PATH` ordering drama person (thinking): ugh, no, don't run THAT `python3`, run the other one! You can prioritize a folder by adding it to the beginning of your `PATH` ### gotcha: not everything uses your shell's `PATH` cron jobs usually have a very basic `PATH`, maybe just `/bin` and `/usr/bin` In a cron job I'll use the absolute path, like: `/home/bork/bin/someprogram`
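
The shell's "nope, nope, there it is!" search can be sketched as a small function. This is only an approximation for illustration: `find_in_path` is a made-up name, and a real shell also checks aliases, functions, builtins, and its lookup cache before it ever walks the `PATH`.

```shell
# Walk PATH left to right and print the first executable match
find_in_path() {
  local program=$1
  local dir
  local IFS=:                  # split $PATH on colons inside this function only
  for dir in $PATH; do
    [ -z "$dir" ] && dir=.     # an empty PATH entry means the current directory
    if [ -x "$dir/$program" ] && [ ! -d "$dir/$program" ]; then
      echo "$dir/$program"
      return 0
    fi
  done
  return 1                     # not found in any PATH entry
}

find_in_path python3 || echo "python3 is not on the PATH"
```

Because the loop stops at the first match, a folder earlier in the `PATH` always wins, which is exactly where the ordering drama comes from.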

quitting in the terminal

### quitting a terminal program isn't always easy Illustration of a stick figure with short curly hair. They look distressed and have an exclamation mark above their head. person (thinking): "I pressed `Ctrl-C` 17 times and NOTHING HAPPENED" ### ways to quit - `Ctrl-C` - the default - `Ctrl-D` - if you're at a prompt in a REPL (`>>>`) - `q` - if it's a full screen program - `Ctrl-\` - sometimes works if `Ctrl-C` doesn't - `kill -9` - the last resort ### how `Ctrl-D` works programs that read input will usually have some code like this: ``` text = read_line() if (text == EOF) { exit() } ``` `Ctrl-D` is how you send an EOF to the program ("I'm done!") important: `Ctrl-D` ONLY works if you press it on an empty line ### how `Ctrl-C` works* person, smiling: "`Ctrl-C`" terminal emulator, represented by a box with a dollar sign: "ok, C is the 3rd letter of the alphabet, I'll write 3 to the tty" OS terminal driver, represented by a box labelled "OS": ah, a 3, that means I should send the `SIGINT` signal to the current program program, represented by a box with a smiley face: ooh, a `SIGINT`, I will [shut down gracefully, immediately exit, ignore it, stop a subtask, etc] `*` unless your program is in "raw mode", we'll talk about that later ### some programs have weird quitting incantations for example every text editor (vim, nano, emacs, etc) has its own completely unique way to quit
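
Both mechanisms can be poked at from a script. This is a sketch, not how any particular program does it: the function name is made up, the "typing" is faked with `printf`, and the `SIGINT` is sent with `kill -INT` instead of a real `Ctrl-C` keypress.

```shell
# Ctrl-D: keep reading lines until `read` reports EOF, then say how many we got
count_lines_until_eof() {
  count=0
  while IFS= read -r line; do
    count=$((count + 1))
  done
  echo "$count"
}

# Pretend someone typed two lines and then pressed Ctrl-D on an empty line
eof_result=$(printf 'first\nsecond\n' | count_lines_until_eof)
echo "read $eof_result lines before EOF"

# Ctrl-C: a child shell that chooses what to do with SIGINT.
# kill -INT sends the same signal the terminal driver sends for Ctrl-C.
sigint_result=$(sh -c 'trap "echo caught SIGINT; exit 0" INT; kill -INT $$; sleep 1; echo "not interrupted"')
echo "$sigint_result"
```

The `trap` line is the "I will [shut down gracefully, ignore it, etc]" part: the program gets to decide, which is why `Ctrl-C` doesn't always quit.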

line editing

### editing text you typed in seems so basic: `>>> print("helo")` oops, forgot an l! but there's actually no standard system ### programs need to implement even the most basic things Illustration of a little smiling stick figure with curly hair. person: "left arrow" program, represented by a box with a smiley face: "ok I will move the cursor to the left" often programs will use the readline library for this ### option 1: NOTHING person (angry): "even the ARROW KEYS don't work???" program (blissfully content): arrow keys? what's that? * Only `Ctrl-W`, `Ctrl-U`, and backspace work * Examples: `cat`, `nc`, `git` * You're probably in this situation if you press the left arrow key and it prints `^[[D` * You can often add readline shortcuts with `rlwrap`, like this: `$ rlwrap nc` ### option 2: READLINE person (neutral): "it's a little awkward but at least I can use those weird keyboard shortcuts from emacs!" * LOTS of keyboard shortcuts: `Ctrl-A`, `Ctrl-E`, arrow keys, many more * You can use `Ctrl-R` to look at history * Examples: `bash`, `irb`, `psql` * If you press `Ctrl-R` and you see "reverse-i-search", you're probably using readline * Configurable with the `~/.inputrc` config file ### option 3: CUSTOM person (smiling): "wow, I can type a multiline command without it being a total disaster?? amazing!" * The keyboard shortcuts are probably influenced by readline * Examples: `fish`, `zsh`, `ipython` * usually you only see custom implementations in bigger projects
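
That mysterious `^[[D` is just the bytes the left arrow key typically sends: an escape character (shown as `^[`), then `[`, then `D`. Here's a tiny sketch that prints those bytes in hex, assuming the common escape sequence (some terminal modes send a slightly different one).

```shell
# ESC [ D is what many terminals send for the left arrow key
left_arrow_bytes=$(printf '\033[D' | od -An -tx1 | tr -d ' \n')
echo "left arrow bytes (hex): $left_arrow_bytes"
```

A program in the "NOTHING" category just receives those three bytes as ordinary input, which is why they get echoed back at you instead of moving the cursor.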

folder gotchas

### panel 1: `ls ..` and `cd ..` refer to different folders if you `cd` to a symlinked folder `~/Dropbox -> ~/Library/CloudStorage/Dropbox` ``` cd ~ cd Dropbox ls .. cd .. ``` * `ls ..` lists `~/Library/CloudStorage` * `cd ..` moves to `~` this is because `ls` is a program and `cd` is run by the shell. The shell handles `..` differently from other programs. ### panel 2: `ls ~/Dropbox` will list the contents of the folder this is annoying if you just want to look at its permissions, or where it links to. to fix this: ``` ls -d ~/Dropbox ``` ### panel 3: deleting a folder and recreating it with the exact same name makes everything weird everything you do in the folder will fail with weird errors like: ``` $ touch newfile touch: newfile: no such file or directory ``` how to fix it: ``` cd . ``` ### panel 4: on macOS, these are not the same: `cp -R a/ b` and `cp -R a b` * `cp -R a/ b` merges the contents of `a` into `b` * `cp -R a b` copies the whole folder into `b/a` ### panel 5: tip: `cd -` switches to the folder you were previously in ### panel 6: notes on `mv file.txt dest` * if `dest` is a file: renames `file.txt` * if `dest` is a folder: moves `file.txt` to that folder
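
The symlink gotcha from panel 1 is easy to reproduce in a scratch directory. This is a sketch with invented folder names (`real/parent/target`, `link`, `sibling.txt`) standing in for the Dropbox example.

```shell
# Scratch area; resolve it first so the temp dir itself isn't behind a symlink
base=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$base/real/parent/target"
touch "$base/real/parent/sibling.txt"
ln -s "$base/real/parent/target" "$base/link"

cd "$base/link"
logical=$(pwd -L)     # the path the shell remembers: .../link
physical=$(pwd -P)    # where we really are: .../real/parent/target
echo "shell thinks: $logical"
echo "really in:    $physical"

# ls is a separate program, so `..` means the real parent folder
listing=$(ls ..)
echo "$listing"

# cd is handled by the shell, so `..` just drops the last part of .../link
cd ..
after_cd=$(pwd -L)
echo "after cd ..: $after_cd"
```

`ls ..` shows `sibling.txt` from the real parent folder, while `cd ..` lands back in the scratch directory that holds the symlink. If you want `cd` to follow the real folders instead, `cd -P ..` does that.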

Comics!

Saturday Morning Comics!