### every HTML element is in a box
```
<div class="1">
<div class="2"></div>
<div class="3"></div>
</div>
```
Illustration of a larger box, labelled 1. Nested inside it are two boxes. The one on top is labelled 2, and the one below 2 is labelled 3.
### boxes have padding, borders, and a margin
Illustration of a series of nested boxes. The middle box is empty. The area around the middle box is labelled "padding". The area around the padding is labelled "border". The area around the border is labelled "margin".
### width & height don't include any of those
The same illustration from the previous panel, but with two lines measuring the width and height of only the middle box, not the padding, border, or margin.
### margins are allowed to overlap sometimes
Illustration of two sets of nested boxes, similar to the diagrams above. One is on top of the other, and the area between the sets of boxes is shaded in green, showing that the bottom margin of the first set of boxes, and the top margin of the second set of boxes, overlap.
the browser combines these top/bottom margins.
look up "margin collapse" to learn more
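Here is a minimal sketch of the arithmetic behind margin collapse. It assumes both margins are positive, because negative margins follow extra rules not covered here. `vertical_gap` is a hypothetical helper, not a browser API:

```python
def vertical_gap(bottom_margin_px: float, top_margin_px: float, collapse: bool = True) -> float:
    """Space between one box's border and the next box's border (positive margins only)."""
    if collapse:
        # collapsed margins: only the larger of the two counts
        return max(bottom_margin_px, top_margin_px)
    # if the margins didn't collapse, they would simply add up
    return bottom_margin_px + top_margin_px


print(vertical_gap(20, 30))                  # 30
print(vertical_gap(20, 30, collapse=False))  # 50
```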
### `box-sizing: border-box;` includes border + padding in the width/height
Illustration of a series of nested boxes with a middle box surrounded by padding, border, and margin. In this version, the lines measuring width and height extend all the way to the edge of the border (but don't include the margin surrounding the border.)
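Here is the width arithmetic for both `box-sizing` values as a small Python sketch. `outer_width` and `content_width` are hypothetical helpers. They assume the same padding and border on the left and right, and they leave out margin because neither mode includes it:

```python
def outer_width(width: float, padding: float, border: float,
                box_sizing: str = "content-box") -> float:
    """Width from the left border edge to the right border edge."""
    if box_sizing == "content-box":
        # default: `width` is only the content, so padding and border are added on
        return width + 2 * padding + 2 * border
    if box_sizing == "border-box":
        # `width` already includes padding and border
        return width
    raise ValueError(f"unknown box-sizing: {box_sizing}")


def content_width(width: float, padding: float, border: float,
                  box_sizing: str = "content-box") -> float:
    """Width left over for the content itself."""
    return outer_width(width, padding, border, box_sizing) - 2 * padding - 2 * border


# width: 100px; padding: 10px; border: 2px solid;
print(outer_width(100, 10, 2))                  # 124
print(content_width(100, 10, 2, "border-box"))  # 76
```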
### inline elements ignore other inline elements' vertical padding
Illustration of two dotted line boxes stacked directly on top of one another. Each has the word "`span`" inside it.
you can set vertical padding but the other span won't move
### by default, bash will continue after errors
bash, represented by a box with a smiley face: oh, was that an error? who cares, let's keep running!!!
programmer, represented by a nonplussed stick figure with short curly hair: uh that is NOT what I wanted
### `set -e` stops the script on errors
```
set -e
unzip fle.zip
```
(typo! script stops here!)
programmer, smiling: this makes your scripts WAY more predictable
### by default, unset variables don't error
`rm -r "$HOME/$SOMEPTH"`
bash, happily: `$SOMEPTH` doesn't exist? no problem, I'll just use an empty string!
programmer: OH NOOOO that means `rm -r "$HOME/"`
### `set -u` stops the script on unset variables
```
set -u
rm -r "$HOME/$SOMEPTH"
```
bash, concerned: I've never heard of `$SOMEPTH`! STOP EVERYTHING!!!
### by default, a command failing doesn't fail the whole pipeline
`curl yxqzq.ca | grep 'panda'`
bash, pleased with itself: `curl` failed but `grep` succeeded so it's fine! success!
### `set -o pipefail` makes the pipe fail if any command fails
you can combine `set -e`, `set -u`, and `set -o pipefail` into one command I put at the top of all my scripts:
`set -euo pipefail`
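The sketch below shows all three options in action by running tiny bash scripts from Python with `subprocess`. It assumes `bash` is on your `PATH`. `false | true` stands in for the `curl ... | grep ...` pipeline above so that it doesn't need the network, and `run_bash` is a hypothetical helper:

```python
import os
import subprocess


def run_bash(script: str) -> tuple:
    """Run `script` with bash and return (exit status, stdout)."""
    env = {k: v for k, v in os.environ.items() if k != "SOMEPTH"}  # make sure it's unset
    result = subprocess.run(["bash", "-c", script], capture_output=True, text=True, env=env)
    return result.returncode, result.stdout


# default: bash keeps going after `false` fails
print(run_bash("false; echo kept going"))          # (0, 'kept going\n')
# set -e: stop at the first failing command
print(run_bash("set -e; false; echo kept going"))  # (1, '')
# set -u: using an unset variable is an error, so echo never runs
print(run_bash('set -u; echo "$HOME/$SOMEPTH"'))
# pipefail: the pipeline fails if any command in it fails
print(run_bash("false | true"))                    # (0, '')
print(run_bash("set -o pipefail; false | true"))   # (1, '')
```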
cross-origin resource sharing
Cross-origin requests are not allowed by default: (because of the same origin policy!)
Javascript from clothes.com: POST request to api.clothes.com?
Firefox (thought bubble): same origin flow chart
Firefox: NOPE. api.clothes.com is a different origin from clothes.com
If you run api.clothes.com, you can allow clothes.com to make requests to it using the `Access-Control-Allow-Origin` header. Here's what happens:
javascript on clothes.com: `POST /buy_thing`
`Host: api.clothes.com`
Firefox (thought bubble): That's cross-origin. I'm going to need to ask api.clothes.com if this request is allowed.
Firefox: `OPTIONS /buy_thing`
`Host: api.clothes.com` ("hey, what requests are allowed?" preflight request)
api.clothes.com: `204 No Content`
`Access-Control-Allow-Origin: https://clothes.com`
Firefox (thought bubble): cool, the request is allowed!
Firefox: `POST /buy_thing`
`Host: api.clothes.com`
`Referer: clothes.com/checkout`
api.clothes.com: `200 OK`
`{"thing_bought": true}`
This OPTIONS request is called a "preflight" request, and it only happens for some requests, like we described in the diagram on the same-origin policy page. Most GET requests will just be sent by the browser without a preflight request first, but POST requests that send JSON need a preflight.
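Here is a simplified sketch of the browser's decision. It assumes only the method and `Content-Type` matter, while the real rules in the Fetch standard also look at other request headers. `needs_preflight` and `origin_allowed` are hypothetical names:

```python
from typing import Optional

SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method: str, content_type: Optional[str] = None) -> bool:
    """Would the browser send an OPTIONS request first? (simplified)"""
    if method.upper() not in SIMPLE_METHODS:
        return True
    if content_type is not None:
        mime_type = content_type.split(";")[0].strip().lower()
        if mime_type not in SIMPLE_CONTENT_TYPES:
            return True  # e.g. application/json
    return False


def origin_allowed(page_origin: str, allow_origin_header: Optional[str]) -> bool:
    """Does Access-Control-Allow-Origin permit this page's origin?
    (requests with credentials can't use "*"; that case is ignored here)"""
    return allow_origin_header in ("*", page_origin)


print(needs_preflight("GET"))                       # False
print(needs_preflight("POST", "application/json"))  # True
print(origin_allowed("https://clothes.com", "https://clothes.com"))  # True
```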
### duplication is annoying
Illustration of a frowning stick figure with curly hair.
person, thinking: ugh, I have `color: #f79` set in 27 places and now I need to change it in 27 places
### define variables in any selector
```
body {
--text-color: #f79;
}
```
(applies to everything)
```
#header {
--text-color: #c50;
}
```
(applies to children of `#header`)
### use variables with `var()`
```
body {
color: var(--text-color);
}
```
(variables always start with `--`)
### do math on them with `calc()`
```
#sidebar {
width: calc(
var(--my-var) + 1em
);
}
```
### you can change a variable's value in Javascript
```
let root =
document.documentElement;
root.style.setProperty(
'--text-color', 'black');
```
### changes to variables apply immediately
JS, represented by a box with a smiley face: set `--text-color` to red
css renderer, also represented by a box with a smiley face: ok everything using it is red now!
These 15 lines of bash will start a container running the fish shell. Try it! (download this script at bit.ly/containers-arent-magic)
It only runs on Linux because these features are all Linux-only.
```
wget bit.ly/fish-container -O fish.tar  # 1. download the image
mkdir container-root; cd container-root
tar -xf ../fish.tar  # 2. unpack image into a directory
cgroup_id="cgroup_$(shuf -i 1000-2000 -n 1)"  # 3. generate random cgroup name
cgcreate -g "cpu,cpuacct,memory:$cgroup_id"  # 4. make a cgroup & set CPU/memory limits
cgset -r cpu.shares=512 "$cgroup_id"
cgset -r memory.limit_in_bytes=1000000000 \
    "$cgroup_id"
# 5. use the cgroup
# 6. make and use some namespaces
# 7. change root directory
# 8. use the right /proc
# 9. change the hostname
# 10. finally, start fish!
cgexec -g "cpu,cpuacct,memory:$cgroup_id" \
    unshare -fmuipn --mount-proc \
    chroot "$PWD" \
    /bin/sh -c "
        /bin/mount -t proc proc /proc &&
        hostname container-fun-times &&
        /usr/bin/fish"
```
### your computer has physical memory
Illustration of a stick of RAM labelled "memory" (8GB 204-PIN SODIMM DDR3)
### physical memory has addresses, like 0-8GB
but when your program references an address like 0x5c69a2a2, that's not a physical memory address! It's a virtual address.
### every program has its own virtual address space
program 1: 0x129520 → "puppies"
program 2: 0x129520 → "bananas"
### Linux keeps a mapping from virtual memory pages to physical memory pages, called the page table
a "page" is a 4KB chunk of memory (or sometimes bigger)
PID -- virtual addr -- physical addr
1971 -- 0x20000 -- 0x192000
2310 -- 0x20000 -- 0x228000
2310 -- 0x21000 -- 0x9788000
### when your program accesses a virtual address
CPU: I'm accessing 0x21000
MMU "memory management unit" (hardware): I'll look that up in the page table and then access the right physical address
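The lookup that the MMU does in hardware can be sketched in Python as a dictionary lookup. This uses the example rows from the table above and assumes 4KB pages. A real page table is a per-process, multi-level structure rather than one dictionary keyed by PID, and `translate` is a hypothetical helper:

```python
PAGE_SIZE = 4096  # 4KB pages

# (pid, start of virtual page) -> start of physical page, from the example table
page_table = {
    (1971, 0x20000): 0x192000,
    (2310, 0x20000): 0x228000,
    (2310, 0x21000): 0x9788000,
}


def translate(pid: int, virtual_addr: int) -> int:
    """Turn a virtual address into a physical one, the way the MMU would."""
    page_start = virtual_addr - virtual_addr % PAGE_SIZE  # which page?
    offset = virtual_addr % PAGE_SIZE                     # where inside the page?
    if (pid, page_start) not in page_table:
        # real hardware raises a page fault here and lets Linux decide what to do
        raise LookupError(f"page fault: pid {pid} has no page at {virtual_addr:#x}")
    return page_table[(pid, page_start)] + offset


print(hex(translate(2310, 0x21abc)))  # 0x9788abc
print(hex(translate(1971, 0x20000)))  # 0x192000
```

The same virtual address (`0x20000`) lands in different physical memory for PIDs 1971 and 2310, which is how every program gets its own address space.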
### every time you switch which process is running, Linux needs to switch the page table
Linux: here's the address of process 2950's page table
MMU: thanks, I'll use that now!
In 2004, if your website suddenly got popular, often the webserver wouldn't be able to handle all the requests.
slashdot:
person 1: I want cat picture!
person 2: me too!
person 3: me 300,000!
server, on fire: <no response>
web host: now you owe me $1000 for bandwidth
you: how will I pay for this?
A CDN (content delivery network) can make your site faster and save you money by caching your site and handling most requests itself.
20 million requests for 1 cute cat picture -> CDN (many powerful computers) ->
just 1 request: hey send me that cat picture?
server: here you go!
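The core idea fits in a few lines of Python. This is a toy sketch, not how any real CDN is built. `TinyCDN` is hypothetical and keeps everything in one process's memory. It also ignores the `Cache-Control` and `Vary` headers that real caches use to decide what to store and for how long. The request count is scaled down from 20 million to 20,000:

```python
class TinyCDN:
    """Toy cache in front of an origin server: no expiry, no Cache-Control, no Vary."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # function: path -> response body
        self.cache = {}
        self.origin_requests = 0

    def get(self, path):
        if path not in self.cache:
            # cache miss: ask the origin server, once
            self.origin_requests += 1
            self.cache[path] = self.fetch_from_origin(path)
        # every later request for the same path is answered from the cache
        return self.cache[path]


cdn = TinyCDN(lambda path: b"<cute cat picture>")
for _ in range(20_000):
    cdn.get("/cat.png")
print(cdn.origin_requests)  # 1
```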
Today, there are many free or cheap CDN services available, which means if your site gets popular suddenly you can easily keep it running!
This is great but caching can cause problems too!
I updated my site yesterday but people are still seeing the old site!
(Cache-Control header)
French users are seeing the English site?!? Why?
(Vary header)
Next, we'll explain the HTTP headers your CDN or browser uses to decide how to do caching.
### HTML elements default to inline or block
example inline elements: `<a> <span> <strong> <i> <small> <abbr> <img> <q> <code>`
example block elements: `<p> <div> <ol> <ul> <li> <h1>-<h6> <blockquote> <pre>`
### inline elements are laid out horizontally
text text text `<a>` text text
text text `<span>` text text
### block elements are laid out vertically by default
`<div>`
`<p>`
to get a different layout, use `display: flex` or `display: grid`
### inline elements ignore width & height*
Setting the width is impossible, but in some situations, you can use `line-height` to change the height
`*` img is an exception to this: look up "replaced elements" for more
### display can force an element to be inline or block
`display` determines 2 things:
1. whether the element itself is `inline`, `block`, `inline-block`, etc
2. how child elements are laid out (`grid`, `flex`, `table`, `default`, etc)
### display: inline-block;
TRY ME!
`inline-block` makes a block element be laid out horizontally like an inline element
inline text
more inline text
inline-block
inline text
[manager]
I used to ask for feedback like this:
Illustration of two stick figures, both smiling. Person 1, the employee, has short curly hair, and person 2, the manager, doesn't have hair.
person 1 (speech bubble): do you have any feedback for me?
person 2 (speech bubble): not right now!
person 1 (thought bubble): is there something they're not telling me?
person 2 (thought bubble): what specifically does she want feedback on?
I've learned that I get WAY BETTER answers if I ask more specific questions!
- what do you think of this design?
- did I prioritize these things well?
- should I be doing more or less of X?
- do you have any concerns about PROJECT?
- was that email clear?
Bonus: asking specific questions forces me to actually think about which areas I might want to focus on.
### ip
(Linux only)
lets you view + change network configuration.
`ip OBJECT COMMAND`
(`OBJECT` = addr, link, neigh, etc)
(`COMMAND` = add, show, delete, etc)
Here are some ways to use it!
### ip addr list
shows the IP addresses of your devices. Look for something like this:
```
2: eth0:
link/ether 3c:97...
inet 192.168.1.170/24
```
### ip route list
displays the route table.
`default via 192.168.1.1` (my router)
`169.240.0.0/16 dev docker`
`...`
to see all route tables:
`ip route list table all`
### change your MAC address
good for cafés with time limits (devil face emoji)
```
$ ip link set wlan0 down
$ ip link set wlan0 address 3c:a9:f4:d1:00:32
$ ip link set wlan0 up
$ service network-manager restart
```
(or whatever you use)
### `ip link`
network devices! (like eth0)
### `ip neigh`
view/edit the ARP table
### `ip xfrm`
is for IPsec
### `ip route get IP`
what route will packets with $IP take?
### `--color`
(the letters of "color" are in various rainbow colours)
pretty colourful output!
### `--brief`
show a summary
`https://examplecat.com:443/cats?color=light%20gray#banana`
- scheme (`https://`): Protocol to use for the request. Encrypted (`https`), insecure (`http`), or something else entirely (`ftp`).
- domain (`examplecat.com`): Where to send the request. For HTTP(S) requests, the Host header gets set to this (`Host: examplecat.com`)
- port (`:443`): Defaults to 80 for HTTP and 443 for HTTPS.
- path (`/cats`): Path to ask the server for. The path and the query parameters are combined in the request, like: `GET /cats?color=light%20gray HTTP/1.1`
- query parameters (`color=light gray`): Query parameters are usually used to ask for a different version of a page ("I want a light gray cat!"). Example: `hair=short&color=black&name=mr%20darcy`. `hair` is the name and `short` is the value; each name=value pair is separated by `&`.
- URL encoding (`%20`): URLs aren't allowed to have certain special characters like spaces, @, etc. To put them in a URL, you need to percent-encode them as % + the hex value of the character's ASCII code: space is %20, % is %25, etc.
- fragment id (`#banana`): This isn't sent to the server at all. It's used either to jump to an HTML tag (`<a id="banana"..>`) or by Javascript on the page.
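Python's standard `urllib.parse` module can take the example URL apart into the same pieces. This is a sketch for exploring the parts; the prints simply show what each piece looks like:

```python
from urllib.parse import parse_qs, quote, urlsplit

url = "https://examplecat.com:443/cats?color=light%20gray#banana"
parts = urlsplit(url)
print(parts.scheme)    # https
print(parts.hostname)  # examplecat.com
print(parts.port)      # 443
print(parts.path)      # /cats
print(parts.query)     # color=light%20gray
print(parts.fragment)  # banana  (never sent to the server)

# the request line combines the path and query; the fragment is left out
print(f"GET {parts.path}?{parts.query} HTTP/1.1")  # GET /cats?color=light%20gray HTTP/1.1

# query parameters: name=value pairs separated by &, percent-decoded
print(parse_qs("hair=short&color=black&name=mr%20darcy"))
# {'hair': ['short'], 'color': ['black'], 'name': ['mr darcy']}

# percent-encoding: space -> %20, % -> %25
print(quote("light gray"), quote("%"))  # light%20gray %25
```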
### CSS has 2 kinds of units: absolute & relative
absolute:
- px
- pt
- pc
- in
- cm
- mm
relative:
- em
- rem
- vw
- vh
- %
### `rem`
the root element's font size
`1rem` is the same everywhere in the document. `rem` is a good unit for setting font sizes!
### `em`
the parent element's font size
```
.child {
font-size: 1.5em;
}
```
Illustration of a box labelled "parent". Inside it is a box labelled, in larger text, "child". An arrow is pointing to the "child" text, labelled "font size is 1.5 x parent".
### 0 is the same in all units
```
.btn {
margin: 0;
}
```
also, `0` is different from `none`. `border: 0` sets the border width and `border: none` sets the style
### 1 inch = 96 px
on a screen, 1 CSS "inch" isn't really an inch, and 1 CSS "pixel" isn't really a screen pixel. look up "device pixel ratio" for more.
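Here is the unit arithmetic as a small sketch. It assumes the common browser default font size of 16px for both the root and the parent (users can change this), and it treats `em` the way `font-size` does, relative to the parent. `to_px` is a hypothetical helper:

```python
PX_PER_INCH = 96  # CSS fixes 1in at 96px

ABSOLUTE_UNITS_IN_PX = {
    "px": 1,
    "in": PX_PER_INCH,
    "cm": PX_PER_INCH / 2.54,
    "mm": PX_PER_INCH / 25.4,
    "pt": PX_PER_INCH / 72,  # 1pt = 1/72 inch
    "pc": PX_PER_INCH / 6,   # 1pc = 12pt = 1/6 inch
}


def to_px(value: float, unit: str, root_font_px: float = 16, parent_font_px: float = 16) -> float:
    """Convert a CSS length to px (relative units as used in `font-size`)."""
    if unit in ABSOLUTE_UNITS_IN_PX:
        return value * ABSOLUTE_UNITS_IN_PX[unit]
    if unit == "rem":
        return value * root_font_px    # same value everywhere in the document
    if unit == "em":
        return value * parent_font_px  # depends on where the element is
    raise ValueError(f"this sketch doesn't handle {unit!r}")


print(to_px(1, "in"))    # 96
print(to_px(20, "rem"))  # 320: 20rem at a 16px root font size
child = to_px(1.5, "em")                             # 24.0: 1.5 x a 16px parent
grandchild = to_px(1.5, "em", parent_font_px=child)
print(grandchild)        # 36.0: em compounds when nested
```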
### rem & em help with accessibility
```
.modal {
width: 20rem;
}
```
this scales nicely if the user increases their browser's default font size
person: every user has a different email right?
1 query later... person, now sad: oh no
This query uses `HAVING` to find all emails that are shared by more than one user:
```
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1
```
users:
id 1, email asdf@fake.com
id 2, email bob@builder.com
id 3, email asdf@fake.com
query output:
email asdf@fake.com, `COUNT(*)` 2
`HAVING` is like `WHERE`, but with 1 difference: `HAVING` filters rows AFTER grouping and `WHERE` filters rows BEFORE grouping.
Because of this, you can use aggregates (like `COUNT(*)`) in a `HAVING` clause but not in a `WHERE` clause.
Here's another `HAVING` example that finds months with more than $6.00 in income:
```
SELECT month
FROM sales
GROUP BY month
HAVING SUM(price) > 6
```
sales:
month: Jan item: catnip price: 5
month: Feb item: laser price: 8
month: March item: food price: 4
month: March item: food price: 3
query output:
month: Feb
month: March
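You can try both queries with Python's built-in `sqlite3` module and the example tables above. This sketch runs against an in-memory database. It adds `ORDER BY` so the output order is predictable, and it shows that putting the aggregate in `WHERE` is rejected:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, email TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)",
               [(1, "asdf@fake.com"), (2, "bob@builder.com"), (3, "asdf@fake.com")])
db.execute("CREATE TABLE sales (month TEXT, item TEXT, price INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)",
               [("Jan", "catnip", 5), ("Feb", "laser", 8),
                ("March", "food", 4), ("March", "food", 3)])

# HAVING filters the groups after GROUP BY has built them
dupes = db.execute("""
    SELECT email, COUNT(*) FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('asdf@fake.com', 2)]

months = db.execute("""
    SELECT month FROM sales
    GROUP BY month
    HAVING SUM(price) > 6
    ORDER BY month
""").fetchall()
print(months)  # [('Feb',), ('March',)]

# WHERE runs before grouping, so it can't use an aggregate
try:
    db.execute("SELECT email FROM users WHERE COUNT(*) > 1 GROUP BY email")
    where_worked = True
except sqlite3.Error as err:
    where_worked = False
    print("error:", err)
```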
To wrap up, let's talk about one last wizard skill: confidence. When there's a hard project, sometimes I think:
maybe someone better than me should work on this?
and I imagine this magical human:
- codes really fast
- knows everything about every technology
- understands the business well
- great communicator
- has time for the project
- 20 years of experience
But in programming:
- we're changing the tech we use all the time.
- every project is different, and it's rarely obvious how to do it.
- there aren't many experts, and they certainly don't have time to do everything.
So instead, we have me:
- learns fast
- works hard
- 6 years of experience
- good at debugging
I figure "someone's gotta do this", write down a plan, and get started! A lot of the time, it turns out well. I learn something and feel a little more like a WIZARD.
HTTP requests always have:
- a domain (like `examplecat.com`)
- a resource (like `/cat.png`)
- a method (`GET`, `POST`, or something else)
- headers (extra information for the server)
There's an optional request body. `GET` requests usually don't have a body, and `POST` requests usually do.
This is an HTTP 1.1 request for `examplecat.com/cat.png`. It's a `GET` request, which is what happens when you type a URL in your browser. It doesn't have a body.
```
GET /cat.png HTTP/1.1
Host: examplecat.com
User-Agent: Mozilla...
Cookie: .....
```
`GET` = method (usually GET or POST)
`/cat.png` = resource being requested
`HTTP/1.1` = HTTP version
`Host: examplecat.com` = header: the domain being requested
`User-Agent: Mozilla...` = header
`Cookie: .....` = header
Here's an example POST request with a JSON body:
```
POST /add_cat HTTP/1.1
Host: examplecat.com
Content-Type: application/json
Content-Length: 20

{"name": "mr darcy"}
```
`POST` = method
`Host: examplecat.com` = header
`Content-Type: application/json` = content type of body, header
`Content-Length: 20` = header
`{"name": "mr darcy"}` = request body: the JSON we're sending to the server
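Here is a sketch that builds the raw bytes of that POST request. HTTP/1.1 separates lines with `\r\n` and puts a blank line between the headers and the body. `build_post` is a hypothetical helper, and it computes `Content-Length` from the encoded body instead of hardcoding it:

```python
import json


def build_post(host: str, path: str, payload: dict) -> bytes:
    """Build a raw HTTP/1.1 POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"  # size of the body in bytes
        "\r\n"                              # blank line: headers end, body starts
    )
    return head.encode("ascii") + body


request = build_post("examplecat.com", "/add_cat", {"name": "mr darcy"})
print(request.decode())
```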
### HTTP responses have:
- a status code (200 OK! 404 not found!)
- headers
- a body (HTML, an image, JSON, etc)
### Here's the HTTP response from `examplecat.com/cat.txt`:
```
HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: public, max-age=0
Content-Length: 33
Content-Type: text/plain; charset=UTF-8
Date: Mon, 09 Sep 2019 01:57:35 GMT
Etag: "ac5affa59f554a1440043537ae973790-ssl"
Strict-Transport-Security: max-age=31536000
Age: 0
Server: Netlify

[ASCII image of a cat, labelled "cat!" with a smiley face]
```
The first line, `HTTP/1.1 200 OK`, is the status line. "200" is the status code.
The lines from `Accept-Ranges` to `Server` are the headers.
The cat picture is the body.
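Here is a rough sketch of how a client could split a raw response into those three parts. `parse_response` is a hypothetical helper, and `meow` stands in for the cat picture. It skips things a real parser handles, like repeated headers, case-insensitive header names, and chunked bodies:

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line separates headers from body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = status_line.split(" ", 2)  # "HTTP/1.1", "200", "OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(status), reason, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/plain; charset=UTF-8\r\n"
       b"Server: Netlify\r\n"
       b"\r\n"
       b"meow")
print(parse_response(raw))
# (200, 'OK', {'Content-Type': 'text/plain; charset=UTF-8', 'Server': 'Netlify'}, b'meow')
```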
### There are a few kinds of response headers:
- when the resource was sent/modified:
```
Date: Mon, 09 Sep 2019 01:57:35 GMT
Last-Modified: Fri, 03 Feb 2017 13:00:00 GMT
```
- about the response body:
```
Content-Language: en-US
Content-Length: 33
Content-Type: text/plain; charset=UTF-8
Content-Encoding: gzip
```
- caching:
```
ETag: "ac5affa..."
Vary: Accept-Encoding
Age: 255
Cache-Control: public, max-age=0
```
- security: (see page 25)
```
X-Frame-Options: DENY
X-XSS-Protection: 1
Strict-Transport-Security: max-age=31536000
Content-Security-Policy: default-src https:
```
- and more:
```
Connection: keep-alive
Accept-Ranges: bytes
Via: nginx
Set-Cookie: cat=darcy; HttpOnly; expires=27-Feb-2020 13:18:57 GMT;
```
By default, if you run `SELECT * FROM cats WHERE name = 'mr darcy'` the database needs to look at every single row to find matches.
database, sad: reading 30 GB of data from disk takes like 60 seconds by itself, you know!
(at 500 MB/s SSD speed)
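Indexes are tree structures that make it faster to find rows. Here's what an index on the 'name' column might look like:
Illustration of a tree. The top node covers "a-z" and has 60 children, from "aaron to ahmed" through "molly to nasir" to "waseem to zahra". The "aaron to ahmed" node has its own children, like "aaron to abdullah" and "agnes to ahmed".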
database indexes are b-trees and the nodes have lots of children (like 60) instead of just 2.
$\log_{60}(1{,}000{,}000{,}000) \approx 5.06$
This means that if you have 1 billion names to look through, you'll only need to look at maybe 5 nodes in the index to find the name you're looking for (5 is a lot less than 1 billion!!!).
person 1: are you saying indexes can make my queries 1,000,000x faster?
person 2: yes! actually some queries on large tables are basically impossible (or would take weeks) without using an index!
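The numbers on this page can be checked with a few lines of arithmetic. This sketch assumes 1 GB = 1000 MB and treats each level of the tree as one node read:

```python
import math

# full table scan: read every row off disk
table_size_gb = 30
ssd_speed_mb_per_s = 500
scan_seconds = table_size_gb * 1000 / ssd_speed_mb_per_s
print(scan_seconds)  # 60.0

# b-tree lookup: about one node per level, with ~60 children per node
rows = 1_000_000_000
children_per_node = 60
levels = math.log(rows, children_per_node)
print(round(levels, 2))  # 5.06
```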
Cookies are a way for a server to store a little bit of information in your browser.
They're set with the `Set-Cookie` response header, like this:
### first request: server sets a cookie
browser, represented by a box with a smiley face: `GET /my-cats`
server, also represented by a box with a smiley face:
```
200 OK
Set-Cookie: user=b0rk; HttpOnly
<response body>
```
(`user` is the name, `b0rk` is the value. `HttpOnly` is a cookie option; other options, like the expiry date, go here too)
### Every request after: browser sends the cookie back
browser:
```
GET /my-cats
Cookie: user=b0rk
```
server, thinking: oh, this is b0rk! I don't need to ask them who they are then!
Cookies are used by many websites to keep you logged in. Instead of `user=b0rk` they'll set a cookie like `sessionid=long-incomprehensible-id`. This is important because if they just set a simple cookie like `user=b0rk`, anyone could pretend to be b0rk by setting that cookie!
Designing a secure login system with cookies is quite difficult— to learn more about it, google "OWASP Session Management Cheat Sheet".
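Here is a sketch of the session-ID idea using Python's standard `secrets` and `http.cookies` modules. The `sessions` dictionary and the `log_in` / `who_is_this` functions are hypothetical. This is nowhere near a complete login system, since it has no expiry, no `Secure` flag, and no persistence:

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

sessions = {}  # hypothetical server-side store: session id -> username


def log_in(username: str) -> str:
    """Create a session and return the Set-Cookie header line for it."""
    session_id = secrets.token_urlsafe(32)  # long and random, so nobody can guess it
    sessions[session_id] = username
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["httponly"] = True
    return cookie.output()  # "Set-Cookie: sessionid=...; HttpOnly"


def who_is_this(cookie_header: str) -> Optional[str]:
    """Look up the user from the Cookie header the browser sends back."""
    morsel = SimpleCookie(cookie_header).get("sessionid")
    return sessions.get(morsel.value) if morsel is not None else None


set_cookie = log_in("b0rk")
cookie_sent_back = set_cookie.split(": ", 1)[1].split(";")[0]  # "sessionid=..."
print(who_is_this(cookie_sent_back))   # b0rk
print(who_is_this("sessionid=guess"))  # None
```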
### On Linux, you start new processes using the fork() or clone() system call.
calling fork creates a child process that's a copy of the caller
### the cloned process has EXACTLY the same memory.
- same heap
- same stack
- same memory maps
if the parent has 3GB of memory, the child will too.
### copying all that memory every time we fork would be slow and a waste of RAM
often processes call `exec` right after `fork`, which means they don't use the parent process's memory basically at all!
### so Linux lets them share physical RAM and only copies the memory when one of them tries to write
process: I'd like to change that memory
Linux: okay! I'll make you your own copy!
### Linux does this by giving both the processes identical page tables.
(same RAM)
but it marks every page as read only.
### when a process tries to write to a shared memory address:
1. there's a page fault
2. Linux makes a copy of the page & updates the page table
3. the process continues, blissfully ignorant
process, happily: It's just like I have my own copy
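Here is a sketch of what the child sees after `fork()`, using Python's `os.fork` on Linux or another Unix. You can't watch copy-on-write itself from Python, because CPython updates reference counts in memory even on reads. What the sketch does show is the result: the child starts with the parent's data, and its write doesn't change the parent's copy. `fork_and_write` is a hypothetical name:

```python
import os


def fork_and_write():
    """Return (what the child saw, what the child changed it to, what the parent still sees)."""
    data = ["puppies"]                 # lives in the parent's memory before the fork
    read_fd, write_fd = os.pipe()      # lets the child report back to the parent
    pid = os.fork()
    if pid == 0:                       # child process
        try:
            os.close(read_fd)
            before = data[0]           # same contents as the parent's memory
            data[0] = "bananas"        # a write: the kernel copies the touched page(s)
            os.write(write_fd, f"{before},{data[0]}".encode())
        finally:
            os._exit(0)                # leave right away; don't run the parent's code
    os.close(write_fd)                 # parent process
    with os.fdopen(read_fd, "rb") as pipe:
        report = pipe.read().decode()  # reads until the child's end is closed
    os.waitpid(pid, 0)
    child_before, child_after = report.split(",")
    return child_before, child_after, data[0]


print(fork_and_write())  # ('puppies', 'bananas', 'puppies')
```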
HTTP/2 is a new version of HTTP. Here's what you need to know:
### A lot isn't changing
All the methods, status codes, request/response bodies, and headers mean exactly the same thing in HTTP/2.
before (HTTP/1.1):
```
method: GET
path: /cat.gif
headers:
- Host: examplecat.com
- User-Agent: curl
```
after (HTTP/2):
```
method: GET
path: /cat.gif
authority: examplecat.com
headers:
- User-Agent: curl
```
one change:
Host header => authority
### HTTP/2 is faster
Even though the data sent is the same, the way HTTP/2 sends it is different. The main differences are:
- It's a binary format (it's harder to `tcpdump` traffic and debug)
- Headers are compressed
- Multiple requests can be sent on the same connection at a time
before (HTTP/1.1):
→ request 1
response 1 ←
→ request 2
response 2 ←
after (HTTP/2):
→ request 1
→ request 2
response 2 ←
response 1 ← (out of order is ok)
(one TCP connection)
All these changes together mean that HTTP/2 requests often take less time than the same HTTP/1.1 requests.
### Sometimes you can switch to it easily
A lot of software (CDNs, nginx) lets clients connect with HTTP/2 even if your server still only supports HTTP/1.1.
1. Firefox to CDN: HTTP/2 request
2. CDN to your server: HTTP/1.1 request
3. your server to CDN: HTTP/1.1 response
4. CDN to Firefox: HTTP/2 response
### CSS seems simple at first
```
h2 {
font-size: 22px;
}
```
Illustration of a smiling stick figure with curly hair.
person: ok this is easy!
### and it is easy for simple tasks
image of a page with header and text underneath
a layout like this is simple to implement!
### but website layout is not an easy problem
image of a page with a logo, header, text, sidebar, and multiple images
this needs to adjust to so many screen sizes!
### the spec can be surprising
TRY ME!
CSS 2.1: setting `overflow: hidden;` on an inline-block element changes its vertical alignment
Illustration of a stick figure with curly hair, looking worried.
person: weird!
### and all browsers have bugs
Safari: I don't support flexbox for `<summary>` elements
person: ok fine
### accept that writing CSS is gonna take time
person: if I'm patient I can fix all the edge cases in my CSS and make my site look great everywhere!
### what's mmap for?
person 1: I want to work with a VERY LARGE FILE but it won't fit in memory
person 2: You could try mmap!
(mmap = "memory map")
### load files lazily with mmap
When you mmap a file, it gets mapped into your program's memory.
2 TB file: 2 TB of virtual memory
but nothing is ACTUALLY read into RAM until you try to access the memory.
(how it works: page faults!)
### how to mmap in Python
```
import mmap
f = open("HUGE.txt", "rb")
mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```
(this won't read the file from disk! Finishes ~instantly.)
`print(mm[-1000:])`
this will read only the last 1000 bytes!
### sharing big files with mmap
three processes: we all want to read the same file!
mmap: no problem!
Even if 10 processes mmap a file, it will only be read into memory once
### dynamic linking uses mmap
program: I need to use libc.so.6 (standard library)
ld dynamic linker: you too eh? no problem. I always mmap, so that file is probably loaded into memory already.
### anonymous memory maps
- not from a file (memory set to 0 by default)
- with `MAP_SHARED`, you can use them to share memory with a subprocess!
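Here is a sketch of an anonymous shared memory map in Python on Linux or another Unix. `mmap.mmap(-1, size)` creates a map that isn't backed by a file. On Unix, Python uses `MAP_SHARED` by default, so a child created with `os.fork()` writes into the same memory the parent reads. `share_with_child` is a hypothetical name:

```python
import mmap
import os


def share_with_child() -> bytes:
    """The child writes into an anonymous shared map; the parent reads it back."""
    shared = mmap.mmap(-1, 16)  # -1 = not backed by a file; starts out as 16 zero bytes
    pid = os.fork()
    if pid == 0:                # child process
        try:
            shared[:5] = b"hello"
        finally:
            os._exit(0)
    os.waitpid(pid, 0)          # wait until the child has finished writing
    message = shared[:5]
    shared.close()
    return message


print(share_with_child())  # b'hello'
```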
### panel 1: unix programs have 1 input and 2 outputs
When you run a command from a terminal, the input & outputs go to/from the terminal by default.
Picture of a program (represented by a box with a smiley face) with 1 arrow coming in and 2 arrows out. The arrows are numbered 0, 1, and 2, and there's a comment: (each input/output has a number, its "file descriptor")
**arrow 0 (coming into program): `<` redirects stdin**
`wc < file.txt` and `cat file.txt | wc` both send `file.txt` to wc's stdin
```
wc < file.txt
cat file.txt | wc
```
**arrow 1 (coming out of program): `>` redirects stdout**
```
cmd > file.txt
```
**arrow 2 (coming out of program): `2>` redirects stderr**
```
cmd 2> file.txt
```
### panel 2: `2>&1` redirects stderr to stdout
```
cmd > file.txt 2>&1
```
Illustration of cmd, represented by a box with a smiley face. There is one arrow, labelled "stdout(1)", leading to a box labelled "file.txt". There is a second arrow coming out of cmd, labelled "stderr(2)". Then, there's a squiggly third arrow, labelled "2>&1", that leads from "stderr(2)" to "file.txt".
### panel 3: `/dev/null`
your operating system ignores all writes to `/dev/null`
```
cmd > /dev/null
```
picture of stdout going to a trash can (`/dev/null`) and stderr still going to the terminal
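Python's `subprocess` module can set up the same redirects for a child process. In this sketch, `CHILD` is a tiny stand-in program that writes one line to stdout and one to stderr, and the two function names are hypothetical. The order of the two lines in the file isn't guaranteed, because the streams are buffered differently:

```python
import subprocess
import sys
import tempfile

# stand-in program: one line to stdout (fd 1), one line to stderr (fd 2)
CHILD = "import sys; print('out'); print('err', file=sys.stderr)"


def stdout_and_stderr_to_file() -> str:
    """Like `cmd > file.txt 2>&1`."""
    with tempfile.TemporaryFile(mode="w+") as f:
        subprocess.run([sys.executable, "-c", CHILD],
                       stdout=f,                  # > file.txt
                       stderr=subprocess.STDOUT)  # 2>&1: stderr goes wherever stdout goes
        f.seek(0)
        return f.read()


def stdout_to_dev_null() -> str:
    """Like `cmd > /dev/null`; stderr is captured here so we can look at it."""
    result = subprocess.run([sys.executable, "-c", CHILD],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE, text=True)
    return result.stderr
```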
### panel 4: sudo doesn't affect redirects
your bash shell opens the file you're redirecting to, and it's running as you. So
```
$ sudo echo x > /etc/xyz
```
won't work. do this instead:
```
$ echo x | sudo tee /etc/xyz
```
### shellcheck finds problems with your shell scripts
`$ shellcheck my-script.sh`
shellcheck: oops, you can't use in an `if [ ... ]`!
### it checks for hundreds of common shell scripting errors
shellcheck: hey, that's a bash-only feature but your script starts with `#!/bin/sh`
### every shellcheck error has a number (like "SC2013")
and the shellcheck wiki has a page for every error with examples! I've learned a lot from the wiki.
### it even tells you about misused commands
shellcheck: hey, it looks like you're not using `grep` correctly here
person: wow I'm not! thanks!
### your text editor probably has a shellcheck plugin
shellcheck: I can check your shell scripts every time you save!
### basically, you should probably use it
bash has too many weird edge cases for me to remember, I love that shellcheck can help me out!
### networking protocols are complicated
book: TCP/IP Illustrated, Volume 1, by Stevens (600 pages)
person: what if I just want to download a cat picture?
### Unix systems have an API called the "socket API" that makes it easier to make network connections
Unix: you don't need to know how TCP works. I'll take care of it!
### here's what getting a cat picture with the Socket API looks like:
1. Create a socket: `fd = socket(AF_INET, SOCK_STREAM...)`
2. Connect to an IP/port: `connect(fd, 12.13.14.15:80)`
3. Make a request: `write(fd, "GET /cat.png HTTP/1.1...")`
4. Read the response: `cat_picture = read(fd...)`
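The same four steps work with Python's `socket` module, which wraps the socket API closely. This is a sketch: `http_get` is a hypothetical helper that speaks plain HTTP only (no HTTPS) and returns the raw response bytes, headers included, without handling chunked encoding or redirects:

```python
import socket


def http_get(host: str, port: int, path: str) -> bytes:
    """Fetch `path` over plain HTTP using only the socket API."""
    # 1. create a socket: AF_INET = internet, SOCK_STREAM = TCP
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # 2. connect to an IP/port
        sock.connect((host, port))
        # 3. make a request ("Connection: close" asks the server to hang up when done)
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 4. read the response until the server closes the connection
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling `sock.fileno()` inside the `with` block would return the integer file descriptor behind the socket.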
### Every HTTP library uses sockets under the hood
`$ curl awesome.com`
Python: `requests.get("https://yay.us")`
(sockets)
person: oh, cool, I could write an HTTP library too if I wanted`*`. Neat!
`*` SO MANY edge cases though! :)
### AF_INET? What's that?
AF_INET means basically "internet socket": it lets you connect to other computers on the internet using their IP address.
The main alternative is AF_UNIX ("unix domain socket") for connecting to programs on the same computer.
### 3 kinds of internet (AF_INET) sockets:
1. `SOCK_STREAM` = TCP (curl uses this)
2. `SOCK_DGRAM` = UDP (dig (DNS) uses this)
3. `SOCK_RAW` = just let me send IP packets. I will implement my own protocol. (ping uses this)
### Unix systems use integers to track open files
Process, represented by a box with a smiley face: Open `foo.txt`
kernel, also represented by a box with a smiley face: okay! that's file #7 for you.
these integers are called file descriptors
### `lsof` (list open files) will show you a process's open files
`$ lsof -p 4242`
(4242 is the PID we're interested in)
```
FD NAME
0 /dev/pts/tty1
1 /dev/pts/tty1
2 pipe: 29174
3 /home/bork/awesome.txt
5 /tmp/
```
(FD is for file descriptor)
### file descriptors can refer to:
- files on disk
- pipes
- sockets (network connections)
- terminals (like `xterm`)
- devices (your speaker! `/dev/null`!)
- LOTS MORE (`eventfd`, `inotify`, `signalfd`, `epoll`, etc.)
little tiny smiling stick figure: not EVERYTHING on Unix is a file, but lots of things are
### When you read or write to a file/pipe/network connection you do that using a file descriptor
person: connect to google.com
OS: ok! fd is 5!
person: write `GET / HTTP/1.1` to fd #5
OS: done!
### Let's see how some simple Python code works under the hood:
Python:
```
f = open("file.txt")
f.readlines()
```
Behind the scenes:
Python program: open file.txt
OS: ok! fd is 4
Python program: read from file #4
OS: here are the contents!
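The same open-then-read exchange can be sketched with Python's low-level `os` functions, which map closely onto the system calls. `read_with_fd` is a hypothetical helper. The exact number you get back depends on which descriptors the process already has open:

```python
import os


def read_with_fd(path: str) -> tuple:
    """Open `path` with raw system calls; return (file descriptor, contents)."""
    fd = os.open(path, os.O_RDONLY)        # "open file.txt" -> the OS answers with an integer
    try:
        chunks = []
        while chunk := os.read(fd, 4096):  # "read from file #fd"
            chunks.append(chunk)
        return fd, b"".join(chunks)
    finally:
        os.close(fd)                       # give the number back to the OS
```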
### (almost) every process has 3 standard FDs:
- `stdin`: 0
- `stdout`: 1
- `stderr`: 2
"read from stdin"
means
"read from the file descriptor 0"
(could be a pipe or file or terminal)
### a container is a group of Linux processes
Illustration of a smiling stick figure with curly hair.
person: on a Mac, all your containers are actually running in a Linux virtual machine
### panel 2
person: I started 'top' in a container. Here's what that looks like in ps:
- outside the container
```
$ ps aux | grep top
USER PID START COMMAND
root 23540 20:55 top
bork 23546 20:57 top
```
- inside the container
```
$ ps aux | grep top
USER PID START COMMAND
root 25 20:55 top
```
(`root 23540 20:55 top` and `root 25 20:55 top` are the same process!)
### container processes can do anything a normal process can...
Illustration of a smiling stick figure with curly hair, and Linux, represented by its penguin mascot
person: I want my container to do X Y Z W!
Linux: sure! your computer, your rules!
### but usually they have restrictions
(there are drawings of locks on either side of the word "restrictions")
Illustration of a container, represented by a box with a smiley face. Around it are arrows with the following labels:
- different PID namespace
- different root directory
- cgroup memory limit
- limited capabilities
- not allowed to run some system calls
### the restrictions are enforced by the Linux kernel
Linux: NO, you can't have more memory!
person: on the next page we'll list all the kernel features that make this work!
### panel 1
Illustration of a smiling stick figure with curly hair.
person: CSS grid is a big topic, so I just want to show you one of my favourite grid features: areas!
### let's say you want to build a layout
Illustration of a long rectangle, labelled "header". Underneath it are two rectangles, side by side, labelled "sidebar" and "content"
### `grid-template-areas` lets you define your layout in an almost visual way
```
grid-template-areas:
"header header"
"sidebar content"
```
I think of it like this:
Illustration of two rectangles side-by-side, both labelled "header". Underneath them are two rectangles, side by side, labelled "sidebar" and "content"
### 1. write your HTML
```
<div class="grid">
<div class="top"></div>
<div class="side"></div>
<div class="main"></div>
</div>
```
### 2. define the areas
```
.grid {
display: grid;
grid-template-columns: 200px 800px;
grid-template-areas: "header header"
"sidebar content";
}
```
### 3. set grid-area
```
.top {grid-area: header}
.side {grid-area: sidebar}
.main {grid-area: content}
```
result:
Illustration of a long rectangle, labelled "`.top`". Underneath it are two rectangles, side by side, labelled "`.side`" and "`.main`"
### `position: absolute;` doesn't mean absolutely positioned on the page...
```
#star {
position: absolute;
top: 1em;
left: 1em;
}
```
doesn't always place element at the top left of the page!
### ... it's relative to the "containing block"
the "containing block" is the closest ancestor with a `position` that isn't `static`, or the body if there isn't one. (`position: static` is the default)
Illustration of a larger box, labelled "body", with a smaller box, labelled "`#star`", nested inside it. The smaller box is off-centre within the larger box. The smaller box is labelled "this element has `position: relative` set"
### `top, bottom, left, right` will place an absolutely positioned element
```
top: 50%;
bottom: 2em;
right: 30px;
left: -2in;
```
"`left: -2in;`" is labelled "negative works too"
Illustration of two overlapping boxes. The top of the smaller one is halfway down the height of the larger one. The gap between the tops of the two boxes is labelled "50%". The smaller one extends to the left of the larger one, representing "`left: -2in;`", and its right and bottom sides are nested inside the larger one, representing "`right: 30px;`" and "`bottom: 2em;`".
### left: 0; right: 0; != width: 100%;
`left: 0; right: 0;`
Illustration of two boxes. The smaller box is nested within the larger box. It is the same width as the larger box, and is aligned to the top of it. This illustration is labelled "left and right borders are both 0px away from containing block".
`width: 100%;`
Illustration of two boxes. The smaller box is nested within the larger box, but its right edge extends past the right edge of the larger box. This illustration is labelled "width is the same as width of containing block".
### absolutely positioned elements are taken out of the normal flow
Illustration of two stick figures having a conversation.
Person 1: will a parent element expand to fit an absolutely positioned child?
Person 2: nope!
### panel 1: web design is really hard
Illustration of a stick figure with short curly hair, looking pensive.
person (thinking): "wow, forms are way more complicated than I thought"
### panel 2: writing CSS is also hard
person (thinking): "ok, how exactly does flexbox work again?"
### panel 3: remember that they're 2 different skills
person (thinking): "hmm, I have NO IDEA what I want this site to look like, maybe that's the problem and not CSS"
### panel 4: CSS is easier when you have a good design
Illustration of a box with smaller boxes arrayed inside it.
person (thinking, and now smiling): "I can make it look like that!"
### panel 5: usually you have to adjust the design
person (thinking): "oh right, I didn't think about how that menu should look on desktop"
### panel 6: sketching a design in advance can help!
Illustration of a box with text reading "title", and a grid of smaller boxes underneath.
even a simple sketch can help you think!
Every HTTP request has a method. It's the first thing in the first line:
`GET /cat.png HTTP/1.1`
`GET` means it's a `GET` request
There are 9 methods in the HTTP standard. 80% of the time you'll only use 2 (`GET` and `POST`).
### `GET`
When you type a URL into your browser, that's a `GET` request.
examplecat.com/cat.png
client, represented by a box with a smiley face:
```
GET /cat.png
Host: examplecat.com
```
server, also represented by a box with a smiley face:
```
200 OK
Content-Type: image/png
<the cat picture>
```
### `POST`
When you hit submit on a form, that's (usually) a `POST` request.
client:
```
POST /add_cat
Content-Type: application/json
{"name": "mr darcy"}
```
(`POST` requests usually have a request body)
server:
```
200 OK
Content-Type: text/html
<after sign up page>
```
The big difference between `GET` and `POST` is that `GET`s are never supposed to change anything on the server.
### `HEAD`
Returns the same result as GET, but without the response body.
client:
```
HEAD /cat.png
```
server:
```
200 OK
Content-Type: image/png
```
(no image, just headers)
### the same process has different PIDs in different namespaces
| PID in host | PID in container |
|---|---|
| 23512 | 1 (PID 1 is special) |
| 23513 | 4 |
| 23518 | 12 |
### PID namespaces are in a tree
Diagram showing "host PID namespace (the root)" with three arrows coming down from it, each pointing to a label that says "child".
Often the tree is just 1 level deep (every child is a container)
### you can see processes in child PID namespaces
Illustration of a host, represented by a box with heart eyes and a big smile.
host: aw! look at all those containers running!
### if PID 1 exits, everyone gets killed
Illustration of PID 1, represented by a box with a smiley face, and Linux, represented by its penguin mascot.
PID 1: ok I'm done!
Linux: I'm kill -9'ing everyone else in this PID namespace IMMEDIATELY
### Killing PID 1 accidentally would be bad
Illustration of a container process, represented by a box with a smiley face, and Linux, represented by its penguin mascot.
container process: `kill 1`
Linux: do you WANT everyone to die? I'm not gonna let you do that
### rules for signaling PID 1
- from same container: only works if the process has set a signal handler
- from the host: only SIGKILL and SIGSTOP are ok, or if there's a signal handler
### panel 1: defining functions is easy
```
say_hello() {
echo "hello!"
}
```
and so is calling them:
```
say_hello
```
(no parentheses when calling a function!)
### panel 2: functions have exit codes
```
failing_function () {
return 1
}
```
`0` is a success, everything else is a failure. A program's exit codes work the same way -- 0 is success, everything else is failure.
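A minimal bash sketch of how a caller can use that exit code: `if` branches on it directly, and `$?` holds the exit code of the last command that ran.

```shell
#!/usr/bin/env bash
# failing_function is the example from above: it always "fails"
failing_function() {
  return 1
}

# `if` runs the function and branches on its exit code (0 = success)
if failing_function; then
  echo "it worked"
else
  echo "it failed"
fi

# $? holds the exit code of the last command
failing_function
echo "exit code: $?"   # prints: exit code: 1
```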
### panel 3: you can't return a string
you can only return an exit code from 0 to 255
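Since `return` only carries a number, the usual bash workaround (a common idiom, not something from this panel) is to `echo` the result and capture it with `$(...)`. `get_greeting` and `big_return` are hypothetical names for this sketch; the second function shows what bash does with a number outside 0 to 255 (it keeps only the lowest 8 bits).

```shell
#!/usr/bin/env bash
# get_greeting is a hypothetical helper: it "returns" a string by printing it
get_greeting() {
  echo "hello, $1"
}

# $(...) captures whatever the function prints to stdout
greeting=$(get_greeting "panda")
echo "$greeting"   # prints: hello, panda

# return only keeps the lowest 8 bits, so 300 becomes 300 mod 256
big_return() {
  return 300
}
big_return
echo "$?"          # prints: 44
```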
### panel 4: arguments are `$1`, `$2`, `$3`, etc
```
say_hello() {
echo "Hello, $1!"
}
say_hello "Ahmed"
```
the above code prints `Hello, Ahmed!`. Again, `say_hello "Ahmed"`, not `say_hello("Ahmed")`
### panel 5: The `local` keyword declares local variables
```
say_hello() {
local x
x=$(date) # this is a local variable
y=$(date) # this is a global variable
}
```
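Here's a small sketch of the difference, using fixed strings instead of `$(date)` so the output is predictable:

```shell
#!/usr/bin/env bash
say_hello() {
  local x
  x="inside"   # local: this x disappears when the function returns
  y="inside"   # no `local`, so this overwrites the global y
}

x="outside"
y="outside"
say_hello
echo "x is $x"   # prints: x is outside
echo "y is $y"   # prints: y is inside
```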
### panel 6: `local x=VALUE` suppresses errors
this line never fails, even if `asdf` doesn't exist:
```
local x=$(asdf)
```
but this will fail (as you would expect) -- if you have `set -e` set, it'll stop the program
```
local x
x=$(asdf) # this line will fail
```
person: "I really have NO IDEA why it's like this, bash is weird sometimes"
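One explanation, as far as I understand it: `local` is itself a command, so the exit code of `local x=$(asdf)` is `local`'s own exit code (it succeeded at declaring `x`), and the failure inside `$(...)` gets thrown away. A plain assignment like `x=$(asdf)` takes its exit code from the command substitution instead. In this sketch `false` stands in for a failing command like `asdf`, and `combined` / `separate` are hypothetical names.

```shell
#!/usr/bin/env bash
combined() {
  local x=$(false)   # exit code is local's own, which succeeds
  echo "$?"
}

separate() {
  local x
  x=$(false)         # plain assignment: exit code comes from `false`
  echo "$?"
}

combined   # prints: 0
separate   # prints: 1
```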
### a z-index can push an element up/down...
```
.first {
position: relative; /* z-index only affects positioned elements */
z-index: 3;
}
.second {
position: relative;
z-index: 0;
}
```
Illustration of two boxes. The one labelled "`.first`" is layered over top of the other one.
### TRY ME: but a higher z-index doesn't always put an element on top
Illustration of a box labelled "`z-index: 0`". On top of that is a box labelled "`z-index: 10`". Another box is on top of that one. Layered over top of all of these is a box labelled "`z-index: 2`".
`z-index: 2` is on top! why?
### every element is in a stacking context
The same illustration as the previous panel, but a label pointing to both the "`z-index: 10`" and "`z-index: 2`" boxes says, "these 2 elements are in different stacking contexts"
### a stacking context is like a Photoshop layer
Illustration of two boxes, each with three smiley faces and an "ok" button in it, one layered on top of the other. These are labelled "two 'layers'".
by default, an element's children share its stacking context
### setting z-index creates a stacking context
```
#modal {
z-index: 5;
position: absolute;
}
```
this is a common way to create a stacking context
### stacking contexts are confusing
You can do a lot without understanding them at all. But if `z-index` ever isn't working the way you expect, that's the day to learn about stacking contexts (smiley face)
### there are 4 ways to set padding
`padding: 1em;`
(all sides)
`padding: 1em 2em;`
(first value is vertical, second is horizontal)
`padding: 1em 2em 3em;`
(first value is top, second is horizontal, third is bottom)
`padding: 1em 2em 3em 4em;`
(first value is top, second is right, third is bottom, fourth is left)
### tricks to remember the order
1. "trouble" (TRouBLe): top, right, bottom, left
2. it's clockwise
### you can also set padding on just 1 side
```
padding-top: 1em;
padding-right: 10px;
padding-bottom: 3em;
padding-left: 4em;
```
### TRY ME: differences between padding & margin
- padding is "inside" an element: the background color covers the padding, you can click padding to click an element, etc. Margin is "outside".
- you can center with `margin: auto`, but not with padding
- margins can be negative, padding can't
### margin syntax is the same as padding
`border-width` also uses the same order:
top, right, bottom, left
[dns]
### there are 2 ways to set up DNS for a website
1. set an A record with an IP
`www.cats.com A 1.2.3.4`
2. set a CNAME record with a domain name
`www.cats.com CNAME cats.github.io`
### CNAME records redirect every DNS record, not just the IP
I like to use them whenever possible so that if my web host's IP changes, I don't need to change anything!
### what actually happens during a CNAME redirect
Illustration of a conversation between a resolver, represented by a box with a smiley face holding a magnifying glass, and an authoritative nameserver, represented by a box with a smiley face wearing a crown.
resolver: what's the A record for `www.cats.com`?
authoritative nameserver: `www.cats.com CNAME cats.github.io`
resolver (thinking): okay, I'll look up the A record for `cats.github.io`!
### rules for when you can use CNAME records
1. you can only set CNAME records on subdomains (like `www.example.com`), not root domains (like `example.com`)
2. if you have a CNAME record for a subdomain, that subdomain can't have any other records
(technically you can ignore these rules, but it can cause problems, the RFCs say you shouldn't, and many DNS providers enforce these rules)
### some DNS providers have workarounds to support CNAME for root domains
Look up "CNAME flattening" or "ANAME" to learn more.
[manager]
### Every so often I'll start with a small problem
Illustration of a stick figure with short curly hair, looking nonplussed.
employee: hmm this isn't great
### and forget to talk about it until I'm REALLY MAD
Illustration of a stick figure with short curly hair, looking very upset, and another stick figure, the manager, who has medium length straight hair, and looks confused, with question marks over their head.
employee: THIS IS TERRIBLE
manager, thinking: whoa where did that come from?
### It's way better to bring up a problem early and figure it out before it turns into a big deal!
Illustration of a stick figure with short curly hair, looking nonplussed, and their manager, a stick figure with medium length straight hair, who is smiling.
employee: I got paged 15 times this week, can we talk about how to improve this?
manager: yes let's work on that!
### display: flex;
set on a parent element to lay out its children with a flexbox layout.
by default, it sets `flex-direction: row;`
### flex-direction: row;
Illustration of three boxes, one with a star, one with a heart, and one with a starburst. They are side-by-side in a single row.
by default, children are laid out in a single row.
the other option is `flex-direction: column`
### flex-wrap: wrap;
Illustration of three boxes, one with a star, one with a heart, and one with a starburst. The star and heart boxes are side-by-side, then an arrow winds around to the starburst box, which is underneath the other two, aligned to the left.
will wrap instead of shrinking everything to fit on one line
### justify-content: center;
Illustration of three boxes, one with a star, one with a heart, and one with a starburst. The star and heart boxes are side-by-side. The starburst box is centred underneath them.
horizontally center (or vertically if you've set `flex-direction: column`)
### align-items: center;
Illustration of three boxes, one with a star, one with a heart, and one with a starburst. The boxes are different heights, and are placed side-by-side in a single row, centred horizontally.
vertically center (or horizontally if you've set `flex-direction: column`)
### you can nest flexboxes
A box labelled `display: flex`. Inside it are two smaller boxes, side-by-side. Each is also labelled `display: flex`. One of the smaller boxes has three boxes side-by-side in it. The other smaller box has three boxes stacked on top of one another, inside it.
[dns]
### to make a subdomain, you just have to set a DNS record!
To set up cats.yourdomain.com, create a DNS record like this in your authoritative nameservers:
`cats.yourdomain.com A 1.2.3.4`
- `cats.yourdomain.com` is the name
- `A` is the record type
- `1.2.3.4` is the value
### there are 2 ways a nameserver can handle subdomains
1. Store their DNS records itself
nameserver, represented by a box with a smiley face wearing a crown: here's the IP for cats.yourdomain.com!
2. Redirect to another authoritative nameserver
(this happens if you set an NS record for the subdomain, it's called "delegation")
nameserver: ask this other DNS server instead!
### you can create multiple levels of subdomains
For example, you can make:
a.b.c.d.e.f.g.example.com
up to 127 levels are allowed!
### www is a common subdomain
Usually www.yourdomain.com and yourdomain.com point to the exact same IP address.
If you wanted to confuse people, you could make them totally different websites!
### panel 5
Illustration of a smiling stick figure with curly hair.
person: I love using subdomains for my projects (like dns-lookup.jvns.ca) because they're free, I can give a subdomain a different IP, and it keeps projects separate.
[manager]
Where I work, my manager wants people on the team to get promoted. If people are being promoted, it (hopefully) means that they're growing & getting more awesome at their jobs, which makes the team's manager look good!
Illustration of a smiling stick figure with short curly hair.
person, thinking: huh, maybe promotions are just a normal thing we can have a conversation about?
Some ways to start conversations:
- can we walk through the expectations for the next level to make sure I understand them?
- what areas do you think I should focus on?
- if I accomplished X Y Z, do you think that would be enough to get promoted?
If this is something you care about, keep checking in periodically! The person who cares the most about your career is you ♡♡
I've had periods with some managers where, every time we talk, we're talking about SOME problem:
Two illustrations of the same stick figure with curly hair, looking unhappy.
me: why did Y happen?
me: X has been a problem for a year and it's STILL not fixed
These days, I try to bring up problems that I'm interested in fixing and bring ideas for solutions when I can. Often we just talk about our work:
Each item is illustrated with a smiling stick figure with curly hair saying them.
- here's an idea I had...
- my intern is doing awesome work!
- did you see that great thing this other team did?
- here's an interesting bug from this past week...
- I thought of an onboarding project for the new person!
Sometimes venting can be useful too, though! If there's a problem, it's often helpful to bring it up even if I don't have a solution.
Let's start with some fundamentals! If you understand the basics about how git works, it's WAY easier to fix mistakes. So let's explain what a git commit is!
Every git commit has an id like 3f29abcd233fa, also called a SHA ("Secure Hash Algorithm").
A SHA refers to both:
- the changes that were made in that commit (see them with `git show`)
- a snapshot of the code after that commit was made
No matter how many weird things you do with git, checking out a SHA will always give you the exact same code. It's like saving your game so that you can go back if you die. You can check out a commit like this:
`git checkout 3f29ab`
SHAs are long, but you can just use the first 6 characters
This makes it way easier to recover from mistakes!
person at 10 am: ok, let's commit, that's a2992b
person at 11 am: I really screwed up this file, let's go back to the version from a2992b
### OPTIONS
`OPTIONS` is mostly used for `CORS` requests. The `CORS` page has more about that.
It also tells you which methods are available.
### DELETE
Used in many APIs (like the Stripe API) to delete resources.
box with a smiley face 1: `DELETE /v1/customers/cus_12345`
("delete this customer please!")
box with a smiley face 2: `200 OK`
("deleted!")
### PUT
Used in some APIs (like the S3 API) to create or update resources. `PUT /cat/1234` lets you `GET /cat/1234` later.
### PATCH
Used in some APIs for partial updates to a resource ("just change this 1 field").
### TRACE
I've never seen a server that supports this, you probably don't need to know about it.
### CONNECT
Different from all the others: instead of making a request to a server directly, it asks for a proxy to open a connection.
If you set the `HTTPS_PROXY` environment variable to a proxy server, many HTTP libraries will use this protocol to proxy your requests.
client, represented by a box with a smiley face:
`CONNECT test.com`
`$AFO XXRTZ`
(encrypted request)
proxy, also represented by a box with a smiley face, thinking: ok, I'll open a connection to test.com.
proxy: `$AFO XXRTZ`
test.com, represented by a box with a smiley face: [is here]
### all major browsers have a CSS inspector
usually you can get to it by right-clicking on an element and then choosing "inspect element", but sometimes there are extra steps
### see overridden properties
`button {`
`display: inline-block;`
`color: var(--orange);` (this line in strikethrough)
`}`
### edit CSS properties
```
element {
}
```
(lets you change this element's properties)
```
button {
display: inline-block;
border: 1px solid black;
}
```
(this lets you change the border of every `<button>`!)
### see computed styles
person, represented by a smiling stick figure: here's a website with 12000 lines of CSS, what `font-size` does this link have?
browser, represented by a box with a smiley face: 12px, because of `x.css` line 436
### look at margin & padding
Box Model
Illustration of a small box labelled 1261 x 26. On the outside of that box is the word "padding". Surrounding the padding is the border. Surrounding the border is the margin.
### and LOTS more
different browsers have different tools!
For example, Firefox has special tools for debugging grid/flexbox.
[manager]
Sometimes I fall into a trap where I think my manager should be able to solve EVERY problem on the team and if they're not then they're not doing their job. (the word "every" is surrounded by glowing lines for emphasis)
It's helpful for me to remember that at any given time they're probably dealing with a lot!
Illustration of a smiling stick figure, representing the manager, surrounded by spiky bubbles containing the following items.
- hire 2 people
- coordinate with other teams
- make sure the intern gets an offer on time (illustration of a clock)
- write 10 performance reviews
- finalize plans for next quarter
- make sure we have an onboarding plan for the new person
- interview new manager candidate
- a team member is unhappy, figure out what's going on
- ... personal life (smiley face)
I try to be somewhat aware of what my manager is dealing with & help out when I can.
Illustration of two smiling stick figures, one with curly hair representing the employee, and one with medium length straight hair, representing the manager.
employee: Here's a project I think could be a good fit for the new person!
manager: good idea, thanks!
### network namespaces are kinda confusing
Illustration of an unhappy-looking stick figure with curly hair.
person: what does it MEAN for a process to have its own network??
### namespaces usually have 2 interfaces
(+ sometimes more)
- the loopback interface (127.0.0.1/8, for connections inside the namespace)
- another interface (for connections from outside)
### every server listens on a port and network interface(s)
`0.0.0.0:8080` means "port 8080 on every network interface in my namespace"
### 127.0.0.1 stays inside your namespace
Illustration of a server, represented by a box with a smiley face, and a smiling stick figure with curly hair.
server, thinking: I'm listening on 127.0.0.1
person: that's fine, but nobody outside your network namespace will be able to make requests to you!
### your physical network card is in the host network namespace
Illustration of a rectangular box drawn with a dotted line. Inside it are:
- the label "host network namespace"
- 192.168.1.149, with an arrow pointing to it reading "requests from other computers"
- network card
### other namespaces are connected to the host namespace with a bridge
Illustration of a rectangular box drawn with a dotted line. Inside it are:
- the label "host network namespace"
- three boxes, each labelled "container"
### media queries let you use different CSS in different situations
```
@media print {
#footer {
display: none;
}
}
```
(`print` is the media query, and the rest is the CSS to apply)
### max-width & min-width
```
@media (max-width: 500px) {
/* CSS for small screens */
}
@media (min-width: 950px) {
/* CSS for large screens */
}
```
### print and screen
`screen` is for computer/mobile screens
`print` is used when printing a webpage
there are more: `tv`, `tty`, `speech`, `braille`, etc
### accessibility queries
you can sometimes find out a user's preferences with media queries
examples:
`prefers-reduced-motion: reduce`
`prefers-color-scheme: dark`
### you can combine media queries
it's very common to write something like this:
```
@media screen and (max-width: 1024px) {
/* CSS for screens up to 1024px wide */
}
```
### the viewport meta tag
`<meta name="viewport" content="width=device-width, initial-scale=1">`
Your site will look bad on mobile if you don't add a tag like this to the `<head>` in your HTML. Look it up to learn more!
(thanks to Allison Kaptur for teaching me this attitude! she has a great talk called "Love Your Bugs".)
Debugging is a great way to learn. First, the harsh reality of bugs in your code is a good way to reveal problems with your mental model.
program: error: too many open files
person: I can't just open as many files as I want? Interesting!
Fixing bugs is also a good way to learn to write more reliable code!
person, thinking: hmm, I should put in error handling here in case that database query times out.
Also, you get to solve a mystery and get immediate feedback about whether you were right or not.
person 1: that's weird...
person 1: oh goodness, that's a lot of errors
person 1: I have an idea!
person 1: [coding a fix]
person 1: it works now!
person 2: great work!
Nobody writes great code without writing + fixing lots of bugs. So let's talk about debugging skills a bit!
[manager]
Most of the rest of this zine is about
COMMUNICATION
(The word "communication" is surrounded by hearts, smiley faces, stars, and exclamation marks)
Basically your manager's job is to make sure that your team is getting work done that will help the business.
This is awesome because it means that if you just communicate with them well, then you can mostly focus on programming!!!
(the word "awesome" is surrounded by glowing lines and hearts)
Communicating well can help you:
- get awesome opportunities
- solve problems
- build trust
- understand priorities
- get promoted
- get feedback
(each of the above items is in a spikey bubble)
To start, let's talk about 1:1s (which hopefully your manager schedules regularly).
### there are many ways to make an element disappear
Illustration of a smiling stick figure with curly hair.
person: which one to use depends: do you want the empty space it left to be filled?
### TRY ME: display: none;
other elements will move to fill the empty space
Illustration of three boxes side-by-side, with a heart, x, and star, respectively. When the "x" box is set to `display: none;`, the heart and star boxes will now be side-by-side.
### visibility: hidden;
the empty space will stay empty
Illustration of three boxes side-by-side, with a heart, x, and star, respectively. When the "x" box is set to `visibility: hidden;`, the heart and star boxes will have a gap between them the size of the "x" box.
### opacity: 0;
like `visibility: hidden`, but you can still click on the element & it'll still be visible to screen readers. Usually `visibility: hidden` is better.
### how to slowly fade out
```
#fade:hover {
transition: all 1s ease;
visibility: hidden;
opacity: 0;
}
```
set `opacity` too: `visibility` can't fade gradually, so the opacity change is what the transition actually animates
### TRY ME: z-index
z-index sets the order of overlapping positioned elements
Illustration of two boxes, a smaller one with an "x" in it, that is overlapped over a larger empty box. There is an arrow pointing to a second illustration where the boxes are stacked in the opposite order, so that the small box is underneath of the large box.
[manager]
Being assigned a new manager is a little scary. Not all of my managers have been great!
Illustration of a stick figure with short curly hair, looking uncertain.
person: OH NO what if my new manager is hard to work with ?!?!
But! More than once I've started out thinking,
Illustration of a stick figure with short curly hair, looking scared.
person: who is this person they seem suspicious
and ended up, a year later, at
Illustration of a stick figure with short curly hair, smiling.
person: wow they have helped me and the team so much, this is AMAZING
so I try to assume that's where we'll end up.
Some things I've found helpful:
- write a document explaining my past work to them
- ask them about any concerns directly - often they have great answers!
- pay close attention to what they do well
- tell them when they do something great
[linux2]
### programs can be slow for a lot of reasons
Illustration of two programs, each represented by a box with a smiley face.
program 1: I'm waiting for a database query, you?
program 2: I'm using SO MUCH CPU!
### it's not obvious when a program is using CPU
Illustration of a stick figure with curly hair, looking unhappy.
person: my webserver took 6 seconds to respond to that request! why?
### panel 3
person: how can I tell how much CPU time was used in this part of my code?
### `clock_gettime`
`clock_gettime` is a system call. It can tell you how much CPU time your process/thread used since it started.
### how to track CPU time
1. run `clock_gettime`
2. do the thing (e.g. handle an HTTP request)
3. run `clock_gettime`
4. subtract!
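A shell script can't call `clock_gettime` directly, so this sketch swaps in a different tool built on the same before/after/subtract idea: bash's `time` keyword records CPU usage before and after a command and reports the difference. `TIMEFORMAT='%3U'` asks for just the user-mode CPU seconds. `busy_work` is a hypothetical stand-in for "do the thing".

```shell
#!/usr/bin/env bash
# busy_work is a hypothetical stand-in for "do the thing"
busy_work() {
  local i sum=0
  for ((i = 0; i < 100000; i++)); do
    sum=$((sum + i))
  done
}

# %3U = user-mode CPU seconds, with 3 digits after the decimal point
TIMEFORMAT='%3U'

# `time` reports the CPU used between the start and end of the command.
# It writes its report to stderr, so 2>&1 inside $(...) captures it.
cpu_seconds=$( { time busy_work; } 2>&1 )
echo "busy_work used ${cpu_seconds}s of user CPU time"
```

The number will vary from machine to machine; the point is that it counts CPU time, not time spent waiting.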
### this trick works when you have 1 HTTP request per thread at a time
Illustration of Ruby and node.js, each represented by a box with a smiley face.
Ruby: I can use `clock_gettime`
node.js: doesn't work for me, I have an event loop!
### how to set a variable
- `var=value` right (no spaces!)
- `var = value` wrong
`var = value` will try to run the program var with the arguments "`=`" and "`value`"
### how to use a variable: "$var"
```
filename=blah.txt
echo "$filename"
```
they're case sensitive. environment variables are traditionally all-caps, like `$HOME`
### there are no numbers, only strings
```
a=2
a="2"
```
both of these are the string "2"
technically bash can do arithmetic, but I avoid it
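A quick sketch of what "only strings" means in practice, including the `$(( ))` syntax bash uses when you do want arithmetic:

```shell
#!/usr/bin/env bash
a=2
b="2"

# both are the same string, so a string comparison says they're equal
if [ "$a" = "$b" ]; then
  echo "same string"
fi

# `+` inside a string is just a character, not addition
echo "$a+1"        # prints: 2+1

# arithmetic only happens inside $(( ... ))
echo "$((a + 1))"  # prints: 3
```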
### always use quotes around variables
`$ filename="swan 1.txt"`
`$ cat $filename` (wrong)
bash: ok, I'll run `cat swan 1.txt`
2 files! oh no! we didn't mean that!
cat: um, `swan` and `1.txt` don't exist...
`$ cat "$filename"` (right!)
bash: ok, I'll run `cat "swan 1.txt"`
cat: `swan 1.txt`! that's a file! yay!
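You can see the splitting without any files at all. `count_args` is a hypothetical helper for this sketch that just prints how many arguments it received (`$#`):

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: prints how many arguments it got
count_args() {
  echo "$#"
}

filename="swan 1.txt"
count_args $filename     # prints: 2  (split into "swan" and "1.txt")
count_args "$filename"   # prints: 1  (one argument, "swan 1.txt")
```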
### ${varname}
To add a suffix to a variable like "2", you have to use `${varname}`. Here's why:
`$ zoo=panda`
`$ echo "$zoo2"` prints an empty line, because `zoo2` isn't a variable
`$ echo "${zoo}2"` this prints "`panda2`" like we wanted
### `border` has 3 components
`border: 2px solid black;`
is the same as
```
border-width: 2px;
border-style: solid;
border-color: black;
```
### `border-style` options
- `solid`
- `dotted`
- `dashed`
- `double`
(each word is surrounded by the border it describes)
+ lots more (`inset`, `groove`, etc)
### `border-{side}`
you can set each side's border separately:
```
border-bottom:
2px solid black;
```
### `border-radius`
border-radius lets you have rounded corners
`border-radius: 10px;`
`border-radius: 50%;` will make a square into a circle!
### box-shadow
lets you add a shadow to any element
`box-shadow: 5px 5px 8px black;`
the first "5px" is the x offset, the second "5px" is the y offset, "8px" is the blur radius, and "black" is the color.
### outline
`outline` is like `border`, but it doesn't change an element's size when you add it
outlines on `:focus` help with accessibility: with keyboard navigation, you need an outline to see what's focused
### browsers support old HTML + CSS forever
Illustration of a smiling stick figure with long hair, talking to a browser from 2020, represented by a box with a smiley face.
person: I wrote this CSS in 1998
2020 browser: still works great!
### this makes CSS hard to write...
Illustration of two stick figures talking
person 1: why are CSS units so weird?
person 2, with grey hair: let me tell you a story from 20 years ago...
### but it means it's worth the investment
Illustration of a smiling stick figure with long hair, talking to a browser, represented by a box with a smiley face.
person: I spent DAYS getting this CSS to work
browser: I'll make sure it keeps working forever!
### if you don't follow the standards, you're not guaranteed backwards compatibility
my site broke!
(oh yeah, Firefox dropped support for that experiment)
### your CSS doesn't have to support browsers from 1998
Illustration of a smiling stick figure with short curly hair.
person: just test that your CSS works on the browsers that your users are using!
### newer features are often easier to use
what people expect from a website has changed a LOT since 1998. Newer CSS features make responsive design easy
### CSS has specifications
CSS 2.1, represented by an image of a document with many lines of text: hello, this is how max-width works in excruciating detail
### there used to be just one specification
Illustration of a smiling stick figure with curly hair.
person: it's called "CSS 2" and I still like to reference it to learn the basics
### today, every CSS feature has its own specification
you can find them all at https://www.w3.org/TR/CSS/
there are dozens of specs, for example: colors, flexbox, and transforms
### major browsers usually obey the spec
but sometimes they have bugs
Illustration of a happy little caterpillar-type bug.
browser, represented by a box with a smiley face: oops, I didn't quite implement that right...
### levels
CSS versions are called "levels".
new levels only add new features. They don't change the behaviour of existing CSS code
### new features take time to implement
https://caniuse.com
(The URL is surrounded by little hearts and stars)
can tell you which browser versions support a CSS feature
## panel 1: a string is an array of bytes
ASCII is the simplest string encoding: 1 character = 1 byte. Let's see how it works!
(We usually use UTF-8, which is WAY more complicated)
## panel 2: every printable ASCII character
```
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
```
There are no accents because it's an English encoding: the "A" in ASCII is for "American".
## panel 3: there are 128 ASCII characters
Only the bytes 0 to 127 are defined.
It's very limited: you can really see why we need more powerful encodings like UTF-8!
## panel 4: how bytes map to characters
Here's a partial list, look up "ASCII table" for the full list. Bytes (in base 10) are on the left, characters are on the right.
33 is !, 34 is "
48 is 0, 49 is 1
65 is A, 66 is B
97 is a, 98 is b
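You can check any row of an ASCII table from bash. In `printf`, an argument that starts with a quote character is converted to that character's numeric code; going the other way, `\NNN` in the format string prints the byte with octal value `NNN`. A sketch:

```shell
#!/usr/bin/env bash
# a leading quote makes %d print the character's code
printf '%d\n' "'A"    # prints: 65
printf '%d\n' "'a"    # prints: 97
printf '%d\n' "'0"    # prints: 48

# code -> character: %03o gives 3 octal digits, \NNN prints that byte
printf "\\$(printf '%03o' 66)\n"   # prints: B
```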
## panel 5: a trick to translate from lowercase to uppercase
In ASCII, the lowercase letters are 32 more than the uppercase letters. So you can just subtract 32!
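A sketch of the subtract-32 trick in bash, reusing the same `printf` conversions. `to_upper_ascii` is a hypothetical helper that only handles a single lowercase ASCII letter (it doesn't check its input).

```shell
#!/usr/bin/env bash
# to_upper_ascii is a hypothetical helper for ONE lowercase ASCII letter
to_upper_ascii() {
  local code
  code=$(printf '%d' "'$1")            # 'a' -> 97
  code=$((code - 32))                  # 97 - 32 = 65
  printf "\\$(printf '%03o' "$code")"  # 65 -> 'A'
}

to_upper_ascii a; echo   # prints: A
to_upper_ascii z; echo   # prints: Z
```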
[bash, shell]
### panel 1: every process has environment variables
how to see any process's environment variables on Linux:
```
cat /proc/$PID/environ | tr '\0' '\n'
```
### panel 2: shell scripts have 2 kinds of variables
1. environment variables
2. shell variables
unlike in most languages, in shell you access both of these in the exact same way: `$VARIABLE`
### panel 3: export sets environment variables
```
export ANIMAL=panda
```
`export ANIMAL=panda` means that every child process will have `ANIMAL` set to `panda`
### panel 4: child processes inherit environment variables
this is why the variables set in your `.bash_profile` work in all programs you start from the terminal. They're all child processes of your bash shell!
### panel 5: shell variables aren't inherited
```
var=panda
```
in this example, `$var` only gets set in this process, not in child processes
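A sketch that shows the difference, using `bash -c` to start a child process. `${var:-unset}` prints the word `unset` when the variable is empty or missing. This assumes neither variable was already in your environment.

```shell
#!/usr/bin/env bash
var=panda               # shell variable: only this process has it
export ANIMAL=panda     # environment variable: children inherit it

# bash -c '...' runs in a child process
bash -c 'echo "var=${var:-unset}"'        # prints: var=unset
bash -c 'echo "ANIMAL=${ANIMAL:-unset}"'  # prints: ANIMAL=panda
```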
### panel 6: you can set environment variables when starting a program
Illustration of a smiling stick figure with curly hair, talking to env, represented by a box with a smiley face.
Person: `env VAR=panda ./myprogram`
env: OK! I'll set `VAR` to `panda` and then start `./myprogram`
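The same idea from Python: passing `env=` to `subprocess.run` works a lot like `env VAR=panda ./myprogram`, setting the variable for that one child only. In this sketch a POSIX `sh` stands in for `./myprogram`.

```python
import os
import subprocess

# Copy our environment and add VAR only for the child
# (like `env VAR=panda ./myprogram`); os.environ itself is not modified.
child_env = {**os.environ, "VAR": "panda"}
out = subprocess.run(
    ["sh", "-c", 'echo "$VAR"'],
    env=child_env, capture_output=True, text=True, check=True,
).stdout.strip()
print(out)  # panda
```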
## title: the gaps between floats
## panel 1: floating point numbers have to fit into 32 or 64 bits
This means there are only 2^64 64-bit floats, the same way there are only 2^64 64-bit integers
## panel 2: this means floating point numbers have to be spread out
you can imagine them all spaced out on a number line, like this: (picture of a bunch of lines, with small gaps between them. The gaps are smaller on the left and bigger on the right)
## panel 3: the gaps start small.
the next 64-bit float after 1.0 is 1 point (lots of 0s) 2
the gap between these two floats is 0 point (lots of 0s) 2, or 2^-52
gaps are always a power of 2
## panel 4: the gaps get bigger as the numbers get bigger
the next 64-bit float after 100000000000000000000 (10^20) is that number plus 16384.
so the gap is 16384, or 2^14!
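You can poke at these gaps in Python 3.9+: `math.nextafter(x, y)` returns the next float after `x` in the direction of `y`, and `math.ulp(x)` returns the size of the gap at `x`. A quick sketch:

```python
import math

after_one = math.nextafter(1.0, 2.0)
print(after_one)                    # 1.0000000000000002
print(after_one - 1.0 == 2**-52)    # True: the gap right after 1.0 is 2^-52

print(math.ulp(1e18))               # 128.0 (2^7): the gap at 10^18
print(math.ulp(1e20))               # 16384.0 (2^14): the gap at 10^20
```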
## panel 5: the gaps make calculations inaccurate
when you do math on floating point numbers, often you have to round the result to the nearest float
usually this doesn’t make a big difference, but small mistakes can add up
## panel 6: this inaccuracy is inevitable
if you want math to be fast, you have to store the numbers in a fixed number of bits, like 64 bits. So you’re always going to have accuracy issues.
## signed vs unsigned integers
## there are 2 ways to interpret every integer
unsigned:
- always 0 or more
- example: 8 bit unsigned ints are `0` to `255`
signed:
- half positive, half negative
- example: 8 bit signed ints are `-128` to `127`
## negative integers are represented in a counterintuitive way
You might think that this is -5: `10000101`
(1 is the sign bit, and 101 in binary is 5)
But actually this is -5: `11111011`
this looks weird, but we'll explain why!
## integer addition wraps around
for example, for 8-bit integers `255 + 1 = 0`
for 16-bit integers, `65535 + 1 = 0`
by "addition", we mean "what the x86 `add` instruction does"
## panel:
but if `255 + 1 = 0`, you could also say `255 = -1`
## examples of bytes and their signed/unsigned ints
| byte | unsigned | signed |
|----------|----------|--------|
| `00000000` | 0 | 0 |
| `01111111` | 127 | 127 |
| `10000000` | 128 | -128 |
| `10000001` | 129 | -127 |
| `11111011` | 251 | -5 |
| `11111111` | 255 | -1 |
for bytes from 128 to 255, subtract 256 from the unsigned number to get the signed number (bytes from 0 to 127 mean the same thing both ways)
## this way of handling signed integers is called "two's complement"
It's popular because you can use the same circuits to add signed and unsigned integers.
`5 + 255` has exactly the same result as `5 + (-1)`: they're both 4!
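A Python sketch of how the same 8 bits get read both ways. Python's own integers never wrap around, so this masks with `& 0xFF` to imitate an 8-bit register; `to_signed8` and `add8` are helper names made up for this example.

```python
def to_signed8(byte: int) -> int:
    """Read an 8-bit pattern (0..255) as a two's complement signed integer."""
    return byte - 256 if byte >= 128 else byte

def add8(a: int, b: int) -> int:
    """8-bit addition: keep only the lowest 8 bits, so results wrap around."""
    return (a + b) & 0xFF

print(to_signed8(0b11111011))   # -5
print(to_signed8(0b10000000))   # -128
print(add8(255, 1))             # 0
# same bits, same circuit: 5 + 255 and 5 + (-1) both give 4
print(add8(5, 255), add8(5, -1 & 0xFF))  # 4 4
```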
## science <3 floating point
## floating point was invented to do scientific computation
- weather simulations!
- earthquake modeling!
- orbital mechanics!
## scientists don't need unlimited precision...
we only know an electron's mass to 9 decimal places anyway...
9 decimal places is already VERY precise!
## but they do need TINY numbers and GIANT numbers
mass of hydrogen atom:
`1.6735575 * 10^-24` grams
distance to Andromeda galaxy:
`2.4 * 10^22` meters
## floating point is inspired by scientific notation
`1.6735575 x 10^-24`
The idea in floating point is to store a number by splitting it into:
- the exponent (like `-24`)
- the multiplier (like `1.6735575`)
- and its sign (+ or -)
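Python's `math.frexp` does this same split, but in base 2 instead of base 10: it returns a multiplier `m` (with 0.5 <= |m| < 1) and an exponent `e` such that `x == m * 2**e`. A small sketch using the hydrogen atom's mass:

```python
import math

mass = 1.6735575e-24            # grams
m, e = math.frexp(mass)
print(m, e)                     # a multiplier between 0.5 and 1, and a power of 2
print(m * 2**e == mass)         # True: splitting and rejoining is exact
print(math.copysign(1, -2.5))   # -1.0: the sign is tracked separately
```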
## floating point isn't just used for science though
For example, Javascript's number type is floating point. Before `BigInt` was added to the language (it became part of the standard in 2020), Javascript didn't have integers at all!
Similarly, numbers in JSON are often interpreted as floating point numbers.
## panel:
people usually explain floating point as "it's scientific notation, but in binary!" That's true, but I've never found it intuitive so we're going to explain it a different way.
## meet the byte
## You might have heard that a computer's memory is a series of bits (0s and 1s)...
`010100110101010110110111`
but you only access them in groups of 8 bits - a byte!
`01010011 01010101 10110111`
## 2 ways to think about a byte
1. 8 bits
2. an integer from 0 to 255
`00000000` = `0`
`00000001` (8 bits!) = `1` (integer!)
`00000010` = `2`
`01011001` = `89`
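Both views are one function call apart in Python: `int(bits, 2)` reads a string of bits as an integer, and `format(n, "08b")` writes an integer back out as 8 bits.

```python
print(int("01011001", 2))          # 89
print(format(89, "08b"))           # 01011001
print(format(1, "08b"))            # 00000001
print(list(bytes([0, 1, 2, 89])))  # [0, 1, 2, 89]: a bytes object holds integers 0-255
```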
## you can't just access 1 bit
Every byte in your computer's memory has an address.
If you want to fetch 1 bit, you need to fetch the whole byte at that address and then extract the bit.
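The "fetch the byte, then extract the bit" step is usually a shift plus a mask. Here's a Python sketch; `get_bit` is a hypothetical helper, and it counts bit 0 as the rightmost (least significant) bit of each byte.

```python
def get_bit(memory: bytes, bit_index: int) -> int:
    """Return one bit, by fetching the whole byte that contains it."""
    byte = memory[bit_index // 8]   # fetch the whole byte at that address
    offset = bit_index % 8          # which bit inside the byte (0 = rightmost)
    return (byte >> offset) & 1     # shift it down and mask off everything else

memory = bytes([0b01010011, 0b01010101, 0b10110111])
print(get_bit(memory, 0))   # 1: the rightmost bit of 01010011
print(get_bit(memory, 2))   # 0
```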
## some things that are 1 byte
- the boolean `true` (in C) `00000001`
- the ASCII character F `01000110`
- the red part of the colour `#FF00FF` `11111111`
## most things are more than one byte
- integers and floats are usually 4 bytes or 8 bytes
- strings are LOTS of bytes (for example, in UTF-8 a heart emoji is 3 bytes)
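A quick way to count a string's bytes in Python is `len(s.encode("utf-8"))`. This sketch assumes the heart is ❤ (U+2764); other heart emoji take a different number of bytes, for example 💖 (U+1F496) takes 4.

```python
print(len("F".encode("utf-8")))           # 1
print(len("\u2764".encode("utf-8")))      # 3: the heart ❤ (U+2764)
print(len("\U0001F496".encode("utf-8")))  # 4: the sparkling heart 💖
```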
## bytes weren't always 8 bits
In the past, people experimented with lots of different byte sizes (2, 3, 4, 5, 6, 8, and 10 bits!)
But now we've standardized on 8 bits pretty much everywhere.
## little endian / big endian
## we write dates in two main orders
1. 2023-03-17 ("big endian")
2. 17-03-2023 ("little endian")
3. 03-17-2023 ("american")
"big endian" means that the big unit (the year) is at the start ("big end first")
## similarly: computers order bytes in 2 ways
Here are 2 ways your computer might represent the integer 271:
1. big endian: `00000001 00001111`
2. little endian: `00001111 00000001`
How this corresponds to 271:
`00000001 00001111` is 271 in binary
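Python's `int.to_bytes` lets you pick the byte order, so you can see both layouts of 271 (that's 256 + 15):

```python
n = 271  # 256 + 15
big = n.to_bytes(2, "big")
little = n.to_bytes(2, "little")
print([format(b, "08b") for b in big])     # ['00000001', '00001111']
print([format(b, "08b") for b in little])  # ['00001111', '00000001']
print(int.from_bytes(little, "little"))    # 271
```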
## When you send integers on a computer network, they're usually big endian ("network byte order"). Here's how that works:
Computer A has the 16-bit integer "271" in its memory: `00001111 00000001`
Computer A flips the bytes and sends it as big endian: `00000001 00001111`
Computer B receives the big endian integer
Computer B flips the bytes and stores it in memory as little endian: `00001111 00000001`
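Here's the same round trip sketched with Python's `struct` module: `"<H"` means a little endian unsigned 16-bit integer and `"!H"` means network (big endian) order. Both "computers" are simulated in one process, so this is an illustration rather than real networking code.

```python
import struct

# Computer A: 271 as it sits in (little endian) memory
memory_a = struct.pack("<H", 271)                           # b'\x0f\x01'
# Computer A flips it into network byte order before sending
wire = struct.pack("!H", struct.unpack("<H", memory_a)[0])  # b'\x01\x0f'
# Computer B reads the big endian value and stores it little endian again
value = struct.unpack("!H", wire)[0]
memory_b = struct.pack("<H", value)
print(wire, value, memory_b)   # b'\x01\x0f' 271 b'\x0f\x01'
```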
## a little history
Before 1980, computers ordered their bytes in different ways.
In 1980, the Internet started being standardized, causing a huge fight over which byte order to use on the Internet.
The terms "big/little endian" come from that fight: they were coined in an article called "On Holy Wars and a Plea For Peace" which compares the byte order fight to the Big/Little Endians in Gulliver's Travels.
Big endian won that fight, so most Internet protocols (IPv4, TCP, UDP, etc.) are big endian.
But almost all modern computers are little endian. Some machines, like the Xbox 360, are big endian though.
## integers
## panel 1:
To decode bytes as integers, we need to know 3 things:
1. the integer's size (8 bit, 16 bit, 32 bit, or 64 bit)
2. is it little or big endian?
3. is it signed or unsigned?
## panel 2:
How signed integers work is the hardest part to understand (I only learned how it works a couple months ago!). Just knowing that unsigned and signed integers are different will take you a long way.
## 2 bytes, 3 interpretations
`254 | 0 `
We could interpret these 2 bytes as:
1. `254` (little endian)
2. `65024` (big endian, unsigned)
3. `-512` (big endian, signed)
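You can reproduce all 3 readings with Python's `int.from_bytes`, which takes the byte order and a `signed` flag:

```python
data = bytes([254, 0])
print(int.from_bytes(data, "little"))            # 254
print(int.from_bytes(data, "big"))               # 65024
print(int.from_bytes(data, "big", signed=True))  # -512
```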
## how you decode bytes depends on the context
- in a program's memory, the type of the variable tells you the integer's size and if it's signed/unsigned
- your CPU determines if integers are big or little endian (you don't have a choice)
- for a binary network protocol (like DNS), the specification (for DNS, that's RFC 1035) will tell you how to decode the bytes
## examples of types
- in Rust, an `i64` is a signed 64-bit integer
- in Go, a `uint32` is an unsigned 32-bit integer
- in C, a `short` is usually a signed 16-bit integer, depending on the platform
## integer overflow
### integers have a limited amount of space
The 4 usual sizes for integers are 8 bits, 16 bits, 32 bits, and 64 bits
### the biggest 8-bit unsigned integer is 255
... so what happens if you do 255 + 1?
going above/below the limits is called overflow
the result wraps around to the other side
255 + 1 = 0
255 + 3 = 2
200 * 2 = 144
0 - 2 = 254
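Python's own `int` never overflows, but the standard `ctypes` module has fixed-size C integer types that silently keep only the low bits, which makes it easy to reproduce the examples above:

```python
import ctypes

# c_uint8 keeps only the lowest 8 bits of whatever you give it (no error on overflow)
print(ctypes.c_uint8(255 + 1).value)   # 0
print(ctypes.c_uint8(255 + 3).value)   # 2
print(ctypes.c_uint8(200 * 2).value)   # 144
print(ctypes.c_uint8(0 - 2).value)     # 254
```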
### maximum numbers for different sizes
| bits | signed max | unsigned max |
|------|------------|--------------|
| 8 | 127 | 255 |
| 16 | 32767 | 65535 |
| 32 | ~2 billion | ~4 billion |
| 64 | ~9 quintillion | ~18 quintillion |
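The limits follow a pattern: an n-bit unsigned integer goes up to 2^n - 1, and an n-bit signed one goes up to 2^(n-1) - 1 (and down to -2^(n-1)). A Python sketch that prints the exact values; `limits` is a made-up helper name:

```python
def limits(bits: int) -> tuple[int, int]:
    """Largest signed and unsigned values for an integer of this many bits."""
    signed_max = 2 ** (bits - 1) - 1   # half of the bit patterns are used for negative numbers
    unsigned_max = 2 ** bits - 1
    return signed_max, unsigned_max

for bits in (8, 16, 32, 64):
    signed_max, unsigned_max = limits(bits)
    print(f"{bits} bits: signed max {signed_max:,}, unsigned max {unsigned_max:,}")
```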
### overflows often don't throw errors
computer (thinking): "255 + 1? that number is 8 bits, so the answer is 0! that's what you wanted right?"
This can cause VERY tricky bugs
### some languages where integer overflow happens
- Java/Kotlin
- C/C++
- Rust
- Swift
- C#
- SQL
- R
- Go
- Dart
- Python (only in numpy)
Some throw errors on overflow, some don't, for some it depends on various factors. Look up how it works in your language!
### Binary formats often pack information into bytes very tightly to save space.
For example, here are 2 bytes from a real TCP packet:
`10000000 00010000`
The first "`1000`" is the offset (4 bits)
The following "`000`" is reserved (3 bits)
The remaining "`000010000`" are the flags (9 bits)
Here's how `&`, `|`, `<<`, `>>` can be used to pack/unpack data into bytes.
### bit masking
Let's say we have the 2 bytes from the previous panel, and we want to extract just the flags part. Here's how to do it with `&` (bitwise and):
The idea is that you put a mask "on top" of the bytes to erase bits:
`x:          10000000 00010000` (number)
`0x01FF:     00000001 11111111` (bit mask)
`x & 0x01FF: 00000000 00010000` (how they combine)
the mask's first 7 bits are 0, so those 7 bits of `x` all get set to 0
the mask's last 9 bits are 1, so those 9 bits of `x` stay the same
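Here's the same masking step as a Python sketch, with the 2 bytes written as one 16-bit number:

```python
x = 0b10000000_00010000   # the 2 bytes from the TCP packet
mask = 0x01FF             # 00000001 11111111: keep only the last 9 bits
flags = x & mask
print(format(flags, "016b"))  # 0000000000010000
print(hex(flags))             # 0x10
```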
### check/set bit flags
(see page 16 for more)
set a bit flag with or:
```
x = x | 0b010000;
```
check a bit flag with and:
```
if ((x & 0b010000) != 0) {
    // the flag is set
}
```
(this example is in C)
### unpack/pack bits
Now let's talk about the offset from the first panel. We can't do calculations with it in its packed form, so we need to unpack it.
You can unpack with `>>`:
```
10000000 -> 00001000
X -> X >> 4
```
and pack with `<<`:
```
00001000 -> 10000000
X -> X << 4
```
1000 in binary is 8, which in this case is the TCP offset value.
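Putting the shifts and masks together, here's a Python sketch that unpacks all three fields from those 2 bytes and packs them back. It uses the field layout from the first panel (4-bit offset, 3 reserved bits, 9 flag bits) and works on the whole 16-bit value, so the offset shift is 12 instead of 4. The function names are made up for this example.

```python
def unpack_fields(x: int) -> tuple[int, int, int]:
    """Split a 16-bit value into (offset, reserved, flags)."""
    offset = x >> 12              # top 4 bits
    reserved = (x >> 9) & 0b111   # next 3 bits
    flags = x & 0x1FF             # last 9 bits
    return offset, reserved, flags

def pack_fields(offset: int, reserved: int, flags: int) -> int:
    """Shift each field back into place and combine them with |."""
    return (offset << 12) | (reserved << 9) | flags

x = 0b10000000_00010000
print(unpack_fields(x))                     # (8, 0, 16)
print(pack_fields(*unpack_fields(x)) == x)  # True
```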
### panel 1:
Floats need to fit into 64 bits. But how do we actually convert a number like 10.87 into 64 bits?
First, we split the number into 3 parts: the sign, a power of 2 and an offset
(The usual term is "significand", but I find that term confusing, so I'm calling it "offset")
`10.87 = + (8 + 2.87) `
(8 is the biggest power of 2 that's less than 10.87)
Next, we encode the sign, power of 2, and offset into bits!
### encoding the sign (1 bit)
`+ is 0`
`- is 1`
### floating point encoding is defined in the IEEE 754 standard
since it's standardized, it works the same way on every computer!
it was originally defined in 1985
### encoding the exponent (11 bits, 2^-1023 to 2^1023)
`8`
↓ `2^3 = 8`
`3`
↓ add 1023 (this makes sure that the result is positive)
`1026`
↓ write it in binary, in 11 bits
`10000000010`
### encoding the offset (52 bits)
`2.87`
↓ divide by the gap size, 2^-49
(in general, the gap size is 2^(exponent - 52))
`1615666366319165.44`
↓ round
`1615666366319165`
↓ write it in binary, 52 bits
`01011011110101110000101000 `
`11110101110000101000111101`
### And here's `10.87`!
`01000000 00100101 10111101 01110000 10100011 11010111 00001010 00111101`
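You can check this encoding in Python by asking `struct` for the raw 8 bytes of the float (big endian, to match the zine) and slicing out the three fields:

```python
import struct

raw = struct.pack(">d", 10.87)        # the 64 bits, big endian
bits = int.from_bytes(raw, "big")
print(" ".join(format(b, "08b") for b in raw))

sign = bits >> 63                     # 1 bit
exponent = (bits >> 52) & 0x7FF       # 11 bits, with 1023 added
offset = bits & ((1 << 52) - 1)       # 52 bits
print(sign, exponent, exponent - 1023, offset)  # 0 1026 3 1615666366319165
# rebuild the number: the power of 2, plus offset * gap size
print(2.0**3 + offset * 2.0**-49)     # 10.87
```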
### the (64-bit) floating point number line
Floating point numbers aren't evenly distributed. Instead, they're organized into windows: [0.25, 0.5], [0.5, 1], [1,2], [2,4], [4,8], [8,16], all the way up to [2^1023, 2^1024].
Every window has 2^52 floats in it.
The windows [-2, -1], [-1, -1/2], [-1/2, -1/4], ..., [1/4, 1/2], [1/2, 1], [1, 2], [2, 4], and [4, 8] each have 2^52 numbers (the windows keep shrinking as they get closer to 0).
Illustration of a horizontal line, with the windows plotted out on it, showing that each window doubles in size as it moves away from zero.
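One way to check that every window holds the same number of floats: for positive floats, the bit patterns (read as integers) go up by exactly 1 from each float to the next, so subtracting the bit patterns of a window's endpoints counts the floats in it. A Python sketch; `float_bits` is a made-up helper name:

```python
import struct

def float_bits(x: float) -> int:
    """The 64 bits of a float, read as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

for lo, hi in [(0.25, 0.5), (1.0, 2.0), (8.0, 16.0), (2.0**1000, 2.0**1001)]:
    count = float_bits(hi) - float_bits(lo)  # number of floats in [lo, hi)
    print(lo, hi, count == 2**52)            # True for every window
```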
### the windows go from REALLY small to REALLY big
The window closest to 0 is [2^-1023, 2^-1022]
This is TINY: a hydrogen atom weighs about 2^-79 grams.
The biggest window is [2^1023, 2^1024].
This is HUUUGE: the farthest galaxy we know about is about 2^90 meters away.
### the gaps between floats double with every window
window: [1, 2] gap: 2^-52
window: [2, 4] gap: 2^-51
window: [4, 8] gap: 2^-50
window: [8, 16] gap: 2^-49
### why does `10000000000000000.0 + 1 = 10000000000000000.0`?
- In the window [2^n, 2^(n+1)], the gap between floats is 2^(n-52)
- `10000000000000000.0` is in the window [2^53, 2^54], where the gap is 2^1 (or 2)
- So the next float after `10000000000000000.0` is `10000000000000002.0`
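A Python 3.9+ sketch to check both the doubling gaps and the `10000000000000000.0` example, using `math.ulp` (the gap at a number) and `math.nextafter` (the next float):

```python
import math

for x in (1.5, 3.0, 6.0, 12.0):   # one number from each window [1,2], [2,4], [4,8], [8,16]
    print(x, math.ulp(x) == 2.0 ** (math.floor(math.log2(x)) - 52))  # True

big = 10000000000000000.0         # 10^16, in the window [2^53, 2^54]
print(math.ulp(big))              # 2.0
print(math.nextafter(big, math.inf))  # 1.0000000000000002e+16
print(big + 1 == big)             # True
```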
## floating point math
let's deconstruct `0.1 + 0.2`
1. The closest 64-bit float to 0.1 is (roughly) `0.1000000000000000055511151231`
2. For 0.2, it's (roughly) `0.2000000000000000111022302462`
3. `0.1000000000000000055511151231 + 0.2000000000000000111022302462 = 0.3000000000000000166533453693`
4. Inconveniently, `0.3000000000000000166533453693` is exactly in between 2 floating point numbers: `0.2999999999999999888977` and `0.30000000000000004440892`
5. How do we pick the answer? `0.30000000000000004440892` has an even offset, so we round to that one
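Python's `decimal.Decimal` can show the exact value stored in a float (if you build it from the float, not from a string), which makes these steps easy to check:

```python
from decimal import Decimal

print(Decimal(0.1))        # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.2))        # 0.200000000000000011102230246251565404236316680908203125
print(0.1 + 0.2)           # 0.30000000000000004
print(Decimal(0.1 + 0.2))  # 0.3000000000000000444089209850062616169452667236328125
```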
## losing a little precision is okay
`0.1 + 0.2 = 0.30000000000000004` is usually no big deal. Do you REALLY need your answer to be accurate to 16 decimal places? Probably not!
## the more numbers you add, the more precision you lose
This Go code:
`var meters float32 = 0.0`
`for i := 0; i < 100000000; i++ { meters += 0.01 }`
`fmt.Println(meters)`
prints out `262144`, not `1000000`, because in float32 `262144.0 + 0.01 = 262144.0`
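Running 100 million additions in Python would be slow, but you can check the stuck point directly. This sketch fakes float32 arithmetic by rounding each result through `struct`'s 4-byte `"f"` format; `to_float32` is a helper made up for this example. At 262144 (2^18) the gap between float32s is 2^-5 = 0.03125, so adding 0.01 (less than half a gap) always rounds back down.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

meters = to_float32(262144.0)
step = to_float32(0.01)
# float32 addition: compute the sum, then round it to float32
print(to_float32(meters + step))   # 262144.0
# with 64-bit floats the same step still makes progress
print(262144.0 + 0.01 > 262144.0)  # True
```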
## adding a number to a MUCH smaller number is bad
For example:
`2**53 + 1.0 = 2**53`
`1.0 + 2**-57 = 1.0`
(try it!)
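Here's the "try it!" in Python, where `**` means "to the power of":

```python
print(2.0**53 + 1.0 == 2.0**53)  # True: the gap at 2^53 is 2, so +1 gets rounded away
print(1.0 + 2.0**-57 == 1.0)     # True: 2^-57 is far smaller than the gap at 1.0 (2^-52)
print(2.0**53 + 2.0 == 2.0**53)  # False: adding a whole gap does change the result
```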
## Use scientific computing libraries if you can
There are special algorithms for adding up lots of small floating point numbers without losing accuracy!
For example `numpy` implements them.
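Python's standard library has one of these too: `math.fsum` keeps track of the rounding errors as it goes, while the built-in `sum` just adds left to right. (numpy's `sum` uses a different technique, pairwise summation.) A quick comparison:

```python
import math

values = [0.1] * 10
print(sum(values))        # 0.9999999999999999
print(math.fsum(values))  # 1.0
```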
## more floating point alternatives
## there are many alternative ways to represent numbers
These are all implemented in software (not hardware) so they're a lot slower, and different languages have different libraries.
## alternative 1: decimal floating point
This is like regular floating point, but in base 10 instead of base 2. It's also standardized in IEEE 754.
Examples: Python's `decimal` module or Java's `BigDecimal`
## alternative 2: fractions
This lets you do exact calculations with fractions (1/10 + 2/10 = 3/10)
Examples: Python's fractions module in the standard library, Lisps have first-class support
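Both of these ship with Python's standard library. A quick sketch comparing them with a regular float (note that the `Decimal` values are built from strings, since `Decimal(0.1)` would copy the float's rounding error):

```python
from decimal import Decimal
from fractions import Fraction

print(0.1 + 0.2 == 0.3)                                      # False (binary floats)
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))     # True (decimal floating point)
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True (exact fractions)
print(Fraction(1, 3) * 3)                                    # 1
```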
## alternative 3: symbolic computation
For example, `sqrt(2)` instead of `1.414`.
You'll see this in computer algebra systems like Mathematica, Maple, or sympy.
## alternative 4: interval arithmetic
The idea is to store every number as a range so that you can precisely track your error bars.
Probably the least mainstream of these alternatives.
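Here's a toy Python sketch of the idea. Real interval libraries switch the CPU's rounding mode so lower bounds round down and upper bounds round up; this sketch fakes that by nudging each bound outward by one float with `math.nextafter` (Python 3.9+), which is slightly wider than necessary but never too narrow. The `Interval` class is made up for this example.

```python
import math
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    @classmethod
    def around(cls, x: float) -> "Interval":
        """An interval containing the real number that x was rounded from."""
        return cls(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))

    def __add__(self, other: "Interval") -> "Interval":
        # push the lower bound down and the upper bound up,
        # so the true sum stays inside even after rounding
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def contains(self, value: Fraction) -> bool:
        return Fraction(self.lo) <= value <= Fraction(self.hi)

total = Interval.around(0.1) + Interval.around(0.2)
print(total.lo, total.hi)               # a narrow range around 0.3
print(total.contains(Fraction(3, 10)))  # True: the exact answer is inside
```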
## alternative 5: binary-coded decimal
This is how floating point numbers (and integers) were stored on IBM computers in the 60s, and you can still occasionally see it today in old formats like ISO 8583 for financial transactions.
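In (packed) binary-coded decimal, each decimal digit gets its own 4 bits, so one byte holds 2 digits. A minimal Python sketch for non-negative integers; `to_packed_bcd` and `from_packed_bcd` are made-up names, and real formats add details (like sign nibbles) that this ignores:

```python
def to_packed_bcd(n: int) -> bytes:
    """Encode a non-negative integer as packed BCD: 4 bits per decimal digit."""
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits   # pad to an even number of digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

def from_packed_bcd(data: bytes) -> int:
    """Read each byte back as two decimal digits."""
    return int("".join(f"{b >> 4}{b & 0xF}" for b in data))

print(to_packed_bcd(1234).hex())     # 1234: the hex digits match the decimal digits
print(from_packed_bcd(b"\x01\x23"))  # 123
```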
## fixed point
## just because you see 0.23, doesn't mean it's floating point
For example, in this RGBA color: `rgba(211, 7, 23, 0.23)`
`0.23` isn't a float at all, it's the 8-bit integer `59`. Let's see how that works!
## fixed point numbers are integers
You interpret them as the integer divided by some fixed number (like 255 or 10000)
For example, that opacity should be divided by 255
`59 / 255 = 0.23ish`
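A Python sketch of the opacity example: store the fraction as an 8-bit integer by multiplying by 255 and rounding, and divide by 255 to get it back. The helper names are made up for illustration.

```python
def encode_opacity(alpha: float) -> int:
    """Store an opacity between 0 and 1 as an integer from 0 to 255."""
    return round(alpha * 255)

def decode_opacity(stored: int) -> float:
    return stored / 255

stored = encode_opacity(0.23)
print(stored)                            # 59
print(round(decode_opacity(stored), 4))  # 0.2314: close to 0.23, not exact
```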
## things fixed point is often used for
money: `$1.23 => 123`
time: `0.1 seconds => 100000 microseconds`
opacity: `0.23 => 59`
## fixed point is the most common alternative to floating point
It's very simple and it's pretty easy to implement!
## implementing fixed point is easy
(especially if you only need to add and subtract)
You just need:
- an integer
- some code to display it (by dividing by 255 or something)
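For money, the integer is a count of cents and the display code divides by 100. A minimal Python sketch; the `format_dollars` helper is hypothetical and only handles non-negative amounts:

```python
def format_dollars(cents: int) -> str:
    """Display an integer number of cents as dollars."""
    return f"${cents // 100}.{cents % 100:02d}"

price = 123                   # $1.23 stored as 123 cents
total = price * 3 + 10        # adding and multiplying integers is exact
print(format_dollars(total))  # $3.79
```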
## fixed point can help avoid accuracy issues
If you try to represent the current Unix epoch in nanoseconds as a 64-bit float, you'll lose accuracy.
But if it's a 64-bit integer, it'll be fine.
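A quick Python check with a made-up timestamp in nanoseconds (about the size of a 2023 Unix timestamp). Around 1.7 × 10^18, the gap between 64-bit floats is 256, so most nanosecond values can't be stored exactly.

```python
import math

ns = 1_700_000_000_123_456_789  # hypothetical Unix time in nanoseconds
as_float = float(ns)
print(int(as_float) == ns)      # False: the float had to round
print(int(as_float) - ns)       # how far off it is, in nanoseconds
print(math.ulp(as_float))       # 256.0: the gap between floats at this size
```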
## bit flags
## bit flags are a clever way to store lots of information in one integer
If you have many options which are true or false, you can encode them all into an integer, with 1 bit for each option. 32 bits = 32 options!
For example, some of the bit flags the open function in C uses:
- nofollow
- append
- truncate
- create
- write only
- read write
(this is on Linux)
## where you'll see bit flags
In libc, the open, socket, and mmap functions use bit flags to pass options.
The TCP header has a flags field made of bit flags (and so does the IPv4 header).
## bit flags are used a lot in C code
Here's some C code that opens a new file:
`fd = open("file.txt", O_RDWR | O_CREAT, 0666);`
`O_RDWR` is: `00000010`
`O_CREAT` is: `01000000`
`O_RDWR | O_CREAT` is: `01000010`
You can check if a bit flag is set in C like this:
`if (flags & O_RDWR) { ... }`
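Python's `os` module exposes the same flags, so you can poke at them there. The exact numbers depend on the platform (the ones above are from Linux), so this sketch only checks which bits are set:

```python
import os

flags = os.O_RDWR | os.O_CREAT
print(format(flags, "b"))         # the combined bits (values vary by platform)
print(bool(flags & os.O_CREAT))   # True: the O_CREAT bit is set
print(bool(flags & os.O_APPEND))  # False: O_APPEND was never set
```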
## fun example: tic tac toe!
Here's a way to encode the state of a tic tac toe game in 18 bits:
x positions:
`100`
`010`
`010`
O positions:
`010`
`001`
`100`
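A Python sketch of that encoding: read each 3×3 grid row by row as 9 bits, put the x bits above the O bits, and use `&` to check that no square is claimed twice. The function name and bit layout are one possible choice, not necessarily the zine's.

```python
def grid_to_bits(rows: list[str]) -> int:
    """Turn 3 rows like "100" into a 9-bit integer, reading row by row."""
    return int("".join(rows), 2)

x_bits = grid_to_bits(["100", "010", "010"])
o_bits = grid_to_bits(["010", "001", "100"])
state = (x_bits << 9) | o_bits    # 18 bits: x positions, then O positions

print(format(state, "018b"))      # 100010010010001100
print(x_bits & o_bits == 0)       # True: no square has both an x and an O
# the center square is the 5th of 9 bits, i.e. bit 4 counting from the right
print(bool(x_bits & (1 << 4)))    # True: x is in the center
```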
## big integers
## integers don't have to overflow
Instead, integers can expand to use more space as they get bigger. Integers that expand are called "big integers".
big integer: I'm going to use ONE THOUSAND bytes of space!
## big integer math is slower
It's slower because it's implemented in software, not hardware.
So a big integer addition is actually turned into lots of smaller additions.
## how big integers are represented (in Go, as of 2023)
A big integer is stored as an array of 64-bit integers (on a 64-bit computer). You can think of this array as the number written in base 2^64: each 64-bit integer is one "digit".
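A Python sketch of that idea: split a number into base-2^64 "digits" (least significant first), then add two such arrays the schoolbook way, one digit at a time with a carry. These helpers are made up for this example; they aren't Go's `math/big` code.

```python
BASE = 2**64

def to_limbs(n: int) -> list[int]:
    """Split a non-negative integer into base-2^64 digits, least significant first."""
    limbs = []
    while n:
        limbs.append(n % BASE)
        n //= BASE
    return limbs or [0]

def add_limbs(a: list[int], b: list[int]) -> list[int]:
    """Add two big integers one 64-bit digit at a time, carrying like on paper."""
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        result.append(total % BASE)   # the low 64 bits of this column
        carry = total // BASE         # 0 or 1, carried into the next column
    if carry:
        result.append(carry)
    return result

def from_limbs(limbs: list[int]) -> int:
    return sum(limb * BASE**i for i, limb in enumerate(limbs))

a, b = 2**64 - 1, 1
print(add_limbs(to_limbs(a), to_limbs(b)))  # [0, 1]: the sum carried into a 2nd digit
print(from_limbs(add_limbs(to_limbs(a), to_limbs(b))) == a + b)  # True
```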
## some languages only have big integers
Python 3 and Ruby: we'd rather have slower math and no weird overflow problems!
This works because people don't do a lot of math in Ruby/Python (except with numpy, which doesn't use big integers).
## some languages offer big integers as an option
Go, Javascript, Java, and lots more.
Each language has its own big integer implementation.
## when are big integers useful?
- they're used in cryptography (e.g. for large key sizes)
- for math on really big integers
## NaN and infinity
## NaN stands for "not a number"
It means the result of the calculation is undefined.
`0/0 = NaN`
`sqrt(-1) = NaN`
`log(-1) = NaN`
## infinity
"Infinity" just means "this number is too big for floating point to handle." There are two infinities: one positive, one negative.
`2.0**1024 = inf`
(`2.0**1024` means `2^1024`)
`-1/0 = -inf`
`inf + 10 = inf`
`inf - inf = NaN`
## NaNs spread
As soon as one NaN gets in, it gets everywhere
`NaN * 5 = NaN`
`NaN + 2 = NaN`
## NaN != NaN
NaN isn't equal to anything (including itself)
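A Python sketch of these rules. One catch: Python raises an exception for `0/0`, `-1/0`, and `math.sqrt(-1)` instead of returning NaN or infinity, so this sketch builds them from `math.inf` instead.

```python
import math

inf = math.inf
nan = inf - inf                 # inf - inf = NaN

print(inf + 10 == inf)          # True
print(-inf < -1e308)            # True: -inf is below every finite float
print(math.isnan(nan * 5), math.isnan(nan + 2))  # True True: NaNs spread
print(nan == nan, nan != nan)   # False True: NaN isn't equal to itself
print(2.0**1023 * 2 == inf)     # True: too big for a float, so it becomes inf
```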
## NaN and infinity: the bits
A floating point value is `NaN` or `infinity` if the bits in the exponent are all 1. For example, this is a `NaN`:
`01111111 11110001 00000000 00000000 00000000 00000000 00000000 00000000`
It's `infinity` if the offset bits are all 0, otherwise it's `NaN`.
There are 2^53 bit patterns like this (2^52 possible offsets, times 2 for the sign bit): 2 of them are `±infinity` and the other 2^53-2 are `NaN`.
We usually treat `NaN` like a single value though.
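You can decode these bit patterns with Python's `struct` module. This sketch builds floats from hex strings (big endian, like the zine) and checks the rule above; `from_hex` is a made-up helper name.

```python
import math
import struct

def from_hex(h: str) -> float:
    """Interpret 16 hex digits (64 bits, big endian) as a float."""
    return struct.unpack(">d", bytes.fromhex(h))[0]

print(from_hex("7ff0000000000000"))              # inf: exponent all 1s, offset all 0s
print(from_hex("fff0000000000000"))              # -inf: same, with the sign bit set
print(math.isnan(from_hex("7ff1000000000000")))  # True: the NaN example above
print(math.isnan(from_hex("7ff8000000000000")))  # True: a different NaN bit pattern
```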
## a note on byte order
All of the floating point examples in this zine use a big endian byte order, because it's easier to read. But most computers use a little endian byte order.
You can see this in action at `https://memory-spy.wizardzines.com`
### 1
An illustration of a smiling stick figure with curly hair, talking to a browser, represented by the Firefox logo of a fox wrapped around a globe.
person: I want to go to https://example.com
browser: hmm, I don't have an IP address for example.com cached. I'll ask a resolver!
### 2
An illustration of a browser talking to a resolver, represented by a box with a smiley face holding a magnifying glass.
browser: what's the IP for example.com?
resolver: hmm, I'll look in my cache...
### 3
❤ DNS cache ❤
archive.org: 207.241.224.2
jvns.ca: 172.64.80.1
resolver: nope, I don't have it cached, I need to ask the authoritative nameservers! I have the root nameserver IPs hardcoded.
note: we're pretending the resolver has no .com domains cached. Normally it would use its cache to skip step 4.
### 4
An illustration of a resolver talking to a root nameserver, represented by a box with a smiley face wearing three crowns.
resolver: What's the IP for example.com?
root nameserver: ask a .com nameserver! It's at a.gtld-servers.net
→ com NS a.gtld-servers.net.
ca NS a.ca-servers.net.
horse NS a.nic.horse.
(NS stands for "nameserver")
### 5
An illustration of a resolver talking to a .com nameserver, represented by a box with a smiley face wearing two crowns.
resolver: what's the IP for example.com?
.com nameserver: ask an example.com. nameserver! It's at a.iana-servers.net
list of DNS records:
neopets.com, NS, ns-42.awsdns-05.com.
→ example.com, NS, a.iana-servers.net.
### 6
An illustration of a resolver talking to an example.com nameserver, represented by a box with a smiley face wearing one crown.
resolver: what's the IP for example.com?
example.com nameserver: it's 93.184.216.34!
resolver: great, I'll tell the browser!
→ example.com, A, 93.184.216.34
### Let's look at the actual data being sent during a DNS query:
I literally mean everything: I copied this verbatim from a real DNS request using Wireshark. (DNS packets are binary, but we're showing a human-readable representation here.)
Illustration of a browser, represented by the Firefox logo of a fox wrapped around a globe, talking to a resolver, represented by a box with a smiley face holding a magnifying glass.
browser: what's the IP for example.com?
resolver: 93.184.216.34!
### request
`Query ID: 0x05a8`
(randomly generated)
`Flags: 0x0100`
(these flags just mean "this is a request")
`Questions: 1`
`Answer records: 0`
`Authority records: 0`
`Additional records: 0`
`Question:`
`Name: example.com`
`Type: A`
(A is for IPv4 address. other types: MX, CNAME, AAAA, etc)
`Class: IN`
(IN stands for "INternet")
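To see how little data that is, here's a Python sketch that packs the same request into bytes, following the header and question layout from RFC 1035. It assumes the usual flags for a standard query with "recursion desired" set, `0x0100`; the function name is made up for this example.

```python
import struct

def build_query(query_id: int, name: str, qtype: int = 1, qclass: int = 1,
                flags: int = 0x0100) -> bytes:
    """Build a DNS query: a 12-byte header followed by one question.

    qtype 1 is A, qclass 1 is IN; flags 0x0100 = standard query, recursion desired.
    """
    # header: ID, flags, then the question/answer/authority/additional counts
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # the name is stored as length-prefixed labels, ending with a 0 byte
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)

packet = build_query(0x05A8, "example.com")
print(len(packet))  # 29 bytes
print(packet.hex())
```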
### response
`Query ID: 0x05a8`
(matches request ID)
`Flags: 0x8580`
the response code is encoded in the last 4 bits of these flags. The 3 main response codes are:
- NOERROR (success!)
- NXDOMAIN (doesn't exist!)
- SERVFAIL (error!)
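Pulling the response code out of the flags is a bit mask: keep the last 4 bits with `& 0xF`. A Python sketch using the standard code numbers (0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN); the `RCODES` table and `response_code` helper are made up for display:

```python
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}

def response_code(flags: int) -> str:
    """The response code lives in the last 4 bits of the DNS flags."""
    code = flags & 0xF
    return RCODES.get(code, f"other ({code})")

print(response_code(0x8580))  # NOERROR
print(response_code(0x8183))  # NXDOMAIN
```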
```
Questions: 1
Answer records: 1
Authority records: 0
Additional records: 0
```
(copied from request)
```
Question:
Name: example.com
```
(domain names aren't case sensitive)
```
Type: A
Class: IN
Answer records:
Name: example.com
Type: A
Class: IN
TTL: 86400
Content: 93.184.216.34
```
(the IP we asked for)
```
Authority records:
(empty)
Additional records:
(empty)
```
page 12 ("NS records") talks more about these 2 sections
Illustration of a smiling stick figure with curly hair.
Person: I'm always surprised by how little is actually in a DNS packet!
### every DNS resolver starts with a root nameserver
Illustration of a conversation between a resolver, represented by a box with a smiley face holding a magnifying glass, and a root nameserver, represented by a box with a smiley face wearing a stack of crowns.
resolver: what's the IP for example.com?
root nameserver: You should ask a `.com` nameserver! They're at `a.gtld-servers.net, b....`
### root nameserver IP addresses almost never change
`a.root-servers.net`'s IP (`198.41.0.4`) hasn't changed since 1993. DECADES ago!
### there are thousands of physical root nameservers, but only 13 IP addresses
Each IP refers to multiple physical servers; you'll get the one closest to you. (this is called "anycast")
There's a map at https://root-servers.org
### if they didn't exist, resolvers wouldn't know where to start
resolver, distressed: I need an IP address of an initial server to query, and I can't use DNS to get that IP!
### every resolver has the root IPs hardcoded in its source code
example: https://wzrd.page/bind
You can query one like this:
`dig @198.41.0.4 example.com`
All the IPs will give you the exact same results, there are just lots of them for redundancy.
Here they are!
```
a.root-servers.net 198.41.0.4
b.root-servers.net 199.9.14.201
c.root-servers.net 192.33.4.12
d.root-servers.net 199.7.91.13
e.root-servers.net 192.203.230.10
f.root-servers.net 192.5.5.241
g.root-servers.net 192.112.36.4
h.root-servers.net 198.97.190.53
i.root-servers.net 192.36.148.17
j.root-servers.net 192.58.128.30
k.root-servers.net 193.0.14.129
l.root-servers.net 199.7.83.42
m.root-servers.net 202.12.27.33
```
### panel 1
One reason DNS is confusing is that the DNS server you query (a resolver) is different from the DNS server where the records are stored (a network of authoritative nameservers).
Beside "resolver" there is an illustration of a smiling little box holding a magnifying glass, and beside "authoritative nameserver" there is an illustration of a smiling little box with a crown.
### anytime your browser makes a DNS query, it's asking a resolver
Illustration of a conversation between a browser, represented by the Firefox logo of a fox wrapped around a globe, and a resolver, represented by a smiling little box holding a magnifying glass
browser: what's the IP for `example.com`?
resolver: I'll find out for you!
### anytime you update a domain's DNS records, you're updating an authoritative nameserver
Illustration of a conversation between a smiling stick figure with curly hair, and an authoritative nameserver, represented by a pink box with a smiley face wearing a crown.
person: set the IP for example.com to 1.2.3.4
authoritative nameserver: got it! Next time someone asks, that's what I'll tell them.
### how a resolver handles queries
1. check its cache, or (if that fails)
2. find the right authoritative nameserver and ask it
### how an authoritative nameserver handles queries
1. check its database for a match
2. that's it, there's no step 2. It's the authority! (illustration of a crown)
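Here's a toy Python sketch of those two behaviours. The "nameservers" are plain dictionaries standing in for real servers (the data mirrors the example.com walkthrough earlier, and a real resolver would send DNS packets over the network instead), and the resolver follows referrals down from the root.

```python
from typing import Optional

# Each hypothetical "authoritative nameserver" is just a lookup table:
# it either has the answer or refers you to another nameserver.
NAMESERVERS = {
    "root": {"com": ("NS", "a.gtld-servers.net")},
    "a.gtld-servers.net": {"example.com": ("NS", "a.iana-servers.net")},
    "a.iana-servers.net": {"example.com": ("A", "93.184.216.34")},
}

def ask_authoritative(server: str, name: str):
    """An authoritative nameserver checks its database, and that's it."""
    table = NAMESERVERS[server]
    labels = name.split(".")
    for i in range(len(labels)):          # most specific match first: "example.com", then "com"
        suffix = ".".join(labels[i:])
        if suffix in table:
            return table[suffix]
    return None

cache: dict[str, str] = {}

def resolve(name: str) -> Optional[str]:
    """A resolver checks its cache, or else walks down from the root."""
    if name in cache:
        return cache[name]
    server = "root"                       # the root nameservers are hardcoded
    while True:
        answer = ask_authoritative(server, name)
        if answer is None:
            return None
        kind, value = answer
        if kind == "A":
            cache[name] = value
            return value
        server = value                    # a referral: ask the next nameserver

print(resolve("example.com"))  # 93.184.216.34 (walks root -> .com -> example.com)
print(resolve("example.com"))  # 93.184.216.34 (this time straight from the cache)
```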
### the terminology is really confusing
Other names for resolvers:
- recursive resolver
- DNS recursor
- public DNS server
- recursive nameserver
- DNS resolution service
- caching-only nameserver
Types of authoritative nameservers:
- root nameserver
- TLD nameserver (like `.com` or `.ca`)
### Here's a problem I've had many times
Illustration of a stick figure with curly hair and a distressed expression.
Person's thought bubble: I set up my new domain, everything looks good, but it's not working?!?!
### I finally learned last year that my problem was "negative caching"
Same person, now smiling: now I never have this problem anymore!
### resolvers cache negative results
Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass, and an authoritative nameserver, represented by a box with a smiley face wearing a crown.
resolver: what's the IP for `bees.jvns.ca`?
authoritative nameserver: I don't have any records for that!
resolver (thought bubble) `caching: no A records for bees.jvns.ca`
### the TTL for caching negative results comes from the SOA record
`example.com. 3600 IN SOA ns.icann.org. noc.dns.icann.org. 2021120741 7200 3600 1209600 3600`
it's the smaller of the first number and the last number (in this case 3600 seconds)
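A Python sketch of that rule, splitting the SOA record above into whitespace-separated fields (name, TTL, class, type, primary nameserver, admin email, serial, refresh, retry, expire, minimum). The function name is made up, and it assumes the record is written on one line like the example.

```python
def negative_cache_ttl(soa_line: str) -> int:
    """Negative caching TTL: the smaller of the SOA record's TTL and its last field."""
    fields = soa_line.split()
    record_ttl = int(fields[1])   # the TTL of the SOA record itself
    minimum = int(fields[-1])     # the last number in the SOA record
    return min(record_ttl, minimum)

soa = ("example.com. 3600 IN SOA ns.icann.org. noc.dns.icann.org. "
       "2021120741 7200 3600 1209600 3600")
print(negative_cache_ttl(soa))  # 3600
```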
### what you need to know about SOA records
1. they control the negative caching TTL
2. you can't change them (unless you run your own authoritative nameserver)
3. how to find yours: `dig SOA yourdomain.com`
### how to avoid this problem
Just make sure not to visit your domain before creating its DNS record!
That's it! (if you really want more details, see RFC 2308)
## panel 1: using 32-bit integers is dangerous
Let's see some examples of how it can go wrong and why it's almost always better to use 64-bit integers instead!
(32-bit floats are bad too, for similar reasons)
## panel 2: 32 bit integers are at most 4 billion
unsigned 32-bit ints go from 0 to 4,294,967,295 (4 billion)
signed 32-bit ints go from
-2,147,483,648 to 2,147,483,647
## panel 3: times "4 billion" wasn't enough
**Database primary keys**: 4 billion records really isn't that much.
**IPv4 addresses**: turns out we want more than 4 billion computers on the internet. Oops.
**Registers**: in the 90s, registers were 32 bits. 4 billion bytes of RAM is 4GB. We need more than that.
**Unix timestamps**: 2 billion seconds after Jan 1, 1970 is Jan 19, 2038. That's going to be an exciting day. (look up "2038 problem"!)
## panel 4: 64 bits is usually big enough
For example, 2^64 seconds after Jan 1, 1970 is over 100 billion years in the future: well after the death of the sun.
So a 64-bit timestamp is definitely enough space.
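You can check both claims with Python's `datetime` module. This sketch uses 2^31 - 1, the largest signed 32-bit timestamp, and estimates how far away the largest signed 64-bit timestamp is (in years of 365.25 days, so it's approximate):

```python
from datetime import datetime, timezone

max32 = 2**31 - 1
print(datetime.fromtimestamp(max32, tz=timezone.utc))  # 2038-01-19 03:14:07+00:00

seconds_per_year = 365.25 * 24 * 60 * 60
print(round((2**63 - 1) / seconds_per_year / 1e9))     # 292 (billion years)
```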
## panel 5: be wary of using 32-bit integers by accident
Systems that were designed in the 90s often have a 32-bit integer as the default.
For example, in MySQL an INTEGER is 32 bits.
### the loud newbie
newbie: wait, HOW does X work??
other person, thinking: I'm so glad they asked, I was wondering that too...
### the grumpy old timer
new person: X is so cool!
grumpy old timer: it is! let me tell you about some ways it can break though....
### the bug chronicler
that bug was so gnarly, I'm going to write an EXTREMELY CLEAR description of what happened so we can all learn from it
### the documentarian
person 1: here's how you do X...
documentarian: I'll put those instructions in our wiki!
### the "today I learned..."
I just learned this cool new tool...
check out this weird bug!
### the "I've read the entire internet"
person: how does X work?
TAB GIRL: ah, I read about that recently... here's a link from my 200 browser tabs
### the tool builder
everyone keeps getting confused by X! I'm going to fix it with CODE.
### the question answerer
person 1: hey can you explain how X works?
question answerer: I would LOVE to
### blank final panel
?
Step 3 in our plan is "open a TCP connection!" Let's learn what this "TCP" thing even is
### When you send a packet sometimes it gets lost
jvns.ca server → Cat packets → lightning bolt
laptop: nope never got it
### TCP lets you send a stream of data reliably, even if packets get lost or sent in the wrong order.
five butterflies, labelled TCP C, TCP D, TCP D (duplicates), TCP A, and TCP B
laptop: it says "abcd"!
### how does TCP work, you ask? WELL!
### how to know what order the packets should go in:
Every packet says what range of bytes it has.
Like this:
once upon a ti ← bytes 0-13
agical oyster ← bytes 30-42
me there was a m ← bytes 14-29
Then the client can assemble all the pieces into:
"once upon a time there was a magical oyster"
The position of the first byte (0,14,30 in our example) is called the "sequence number"
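A Python sketch of how a receiver could use sequence numbers to put the pieces back in order and drop duplicates. It's a simplification (real TCP sequence numbers start at a random value and wrap around), and `reassemble` is a made-up name.

```python
def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Rebuild a byte stream from (sequence number, data) pairs in any order."""
    stream = bytearray()
    for seq, data in sorted(set(segments)):  # set() drops exact duplicates
        if seq > len(stream):
            raise ValueError(f"missing bytes {len(stream)}-{seq - 1}")
        # skip any bytes we already have; append only the new ones
        stream += data[len(stream) - seq:]
    return bytes(stream)

segments = [(30, b"agical oyster"), (0, b"once upon a ti"),
            (14, b"me there was a m"), (30, b"agical oyster")]
print(reassemble(segments).decode())  # once upon a time there was a magical oyster
```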
### how to deal with lost packets:
When you get TCP data, you have to acknowledge it (ACK):
jvns.ca server: here is part of a cat picture! that should be 28832 bytes so far!
jvns.ca server (thinking): yay
laptop: ACK! I have received all 28832 bytes
If the server doesn't get an acknowledgement, it will retry sending the data.
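A toy stop-and-wait sketch in Python of the retry rule: the sender keeps resending a chunk until it gets an ACK for it. The "network" is simulated by a made-up `dropped` set listing which transmissions get lost, so the example is deterministic; real TCP keeps many segments in flight at once and uses timeouts rather than this simple loop.

```python
def send_reliably(chunks: list[bytes], dropped: set[int]) -> tuple[bytes, int]:
    """Send chunks over a lossy link, retrying until each one is ACKed.

    `dropped` lists which transmission attempts (0, 1, 2, ...) the network loses.
    Returns the received data and the total number of transmissions.
    """
    received = bytearray()
    attempt = 0
    for chunk in chunks:
        while True:
            lost = attempt in dropped
            attempt += 1
            if not lost:
                received += chunk       # the receiver got it...
                ack = len(received)     # ...and ACKs every byte so far
                print(f"ACK! I have received all {ack} bytes")
                break
            # no ACK came back, so the loop sends the same chunk again

    return bytes(received), attempt

data, transmissions = send_reliably([b"cat ", b"pic", b"ture"], dropped={1})
print(data, transmissions)  # b'cat picture' 4
```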
### 1. inspect, don't squash
Try to fix the bug (crossed out, bad)
Understand what happened (checkmarks, smiley faces)
### 2. Being stuck is temporary.
person (thinking): I WILL NEVER FIGURE THIS OUT
... 20 minutes later...
person (thinking): Wait, I haven't tried X...
### 3. Trust nobody and nothing
person (thinking): This library can't be buggy...
person (thinking): Or CAN IT???
(slowly growing horror)
off to the side, a bug looks on, with a sneaky expression
### 4. It's probably your code
person (thinking): I KNOW my code is right
... 2 hours later ...
person (thinking): Ugh, my code WAS the problem?!!?
### 5. don't go it alone
person 1: "WHAT IS HAPPENING?!?"
person 2: "What if we try X?"
### 6. There's always a reason.
A computer, illustrated by a box with a smiley face, surrounded by ones and zeros: Computers are always logical, even when it doesn't feel that way.
### 7. Build your toolkit
person (thinking, holding a box labelled TOOLZ): "wow, the CSS inspector makes debugging SO much easier"
### 8. It can be an adventure.
person: "You wouldn't BELIEVE the weird bug I found!"
adorable weird bug, standing beside them: hi!
Error messages are a goldmine of information, but they can be very annoying to read:
(image of an error message, represented by a stack of squiggly lines, with 2 notes pointing to it):
- giant 50 line stack trace full of impenetrable jargon, often seems totally unrelated to your bug
- can even be misleading, like "permission denied" sometimes means "doesn't exist"
Tricks to extract information from giant error messages:
- If there are many different error messages, start with the first one. Fixing it will often fix the rest.
- If the end of a long error message isn't helpful, try looking at the beginning (scroll up!)
- On the command line, pipe it to `less` so that you can scroll/search it
```
./my_program 2>&1 | less
```
Note: if you don't include `2>&1`, only the regular output (stdout) goes into `less`. The error messages (stderr) skip the pipe and print straight to your terminal.
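If you're running a program from Python instead of a shell, the equivalent of `2>&1` in the `subprocess` module is `stderr=subprocess.STDOUT`. In this sketch a tiny inline Python program stands in for the hypothetical `./my_program`.

```python
import subprocess
import sys

# Stand-in for ./my_program: one line to stdout, one line to stderr.
cmd = [
    sys.executable,
    "-c",
    "import sys; print('normal output'); print('oh no, an error!', file=sys.stderr)",
]

# Like `./my_program | less`: only stdout goes into the pipe.
# (stderr is sent to DEVNULL here to keep the demo quiet; in a shell it
# would go straight to your terminal instead.)
only_stdout = subprocess.run(
    cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True
).stdout

# Like `./my_program 2>&1 | less`: stderr goes into the same pipe as stdout.
both = subprocess.run(
    cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
).stdout

print("oh no" in only_stdout)  # False
print("oh no" in both)         # True
```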
After I've read the error message, I sometimes run into one of these 3 problems:
Each person is represented by a stick figure with curly hair.
### 1. misreading the message
person (thinking): ok, it says the error is in file X
spoiler: it actually said file Y
### 2. disregarding what the message is saying
person (thinking): well, the message says X, but that's impossible...
spoiler: it was possible
### 3. not actually reading it
person (thinking): ok, I read it...
spoiler: she did not read it
Once I have a list of suspects, I can think about how to eliminate them.
Illustration of a pensive stick figure with curly hair.
person (thinking): "I'm really confused, but I can at least check if the server returned the right HTTP response here..."
Illustration of a box that says "client", and a box that says "server", with arrows going back and forth between them. Both boxes are labelled "suspicious".
person (thinking): "that response looks good! the server isn't the problem!"
Illustration of a box that says "client", and a box that says "server", with arrows going back and forth between them. The client box is labelled "suspicious", with exclamation marks and question marks surrounding it, but the "server" box is labelled "ok", with a check mark and smiley faces.
note: here we're assuming that was the only request being made. Otherwise this wouldn't be a safe conclusion :)
Some ideas:
### network diagram
An illustration of a network, with a cylinder labelled DB, and boxes labelled "factory", "handler", "obj", "model 1", and "model 2", with arrows amongst them showing their relationships.
### flowchart
A flowchart with boxes "set flag", "run cmd", "if failed, retry", and "return result", with arrows amongst them illustrating a process.
### state diagram
A diagram with boxes labelled "inventory page", "cart page", and "checkout page", with arrows amongst them labelled "cart icon", "continue shopping", "checkout", and "cancel".
### or anything else (like a data structure!)
A box labelled "on | off | on | off". The first "off" is labelled "[1, 1, 1, 0, 0, 1, 1, 1, 0", and the second "off" is labelled "5 seconds".
I love to add print statements that print out 1, 2, 3, 4, 5...
An illustration of a printer printing out lines of text.
```
console.log(1)
console.log(2)
console.log(3)
```
Using descriptive strings is smarter, but I usually use numbers or "wtf???"
This helps me construct a timeline of which parts of my code ran and in what order:
Illustration of timeline of code, with some arrows pointing at it numbered 1, 3, 2. Between 1 and 3, it says "everything is okay". Between 3 and 2 it says "the cause", with a picture of a bug, and after 2, it says "the error message" with a picture of a page of text.
Often I'll discover something surprising, like "wait, 3 never got printed??? Why not???".
If the bug is totally new to you, find out if there's a name people use for that type of bug!
Illustration of two stick figures. Person 1 has curly hair and looks worried, Person 2 has straight hair and is smiling.
person 1: "this bug is happening intermittently, it's so weird."
person 2: "that sounds like it might be a race condition..."
person 1 (thinking): "oh, what's a race condition?"
examples:
- `terminated by signal SIGSEGV (address boundary error)`
segmentation fault
- `flexbox: div doesn't fit in other div (CSS)`
item overflowing container
- `nodename nor servname provided, or not known`
DNS lookup failure
- `RecursionError: maximum recursion depth exceeded`
stack overflow
It's tempting to try lots of fixes at once to save time:
Illustration of a smiling stick figure with curly hair.
dream: I'm going to add Z, replace X with Y, and improve C. That'll definitely fix it!
Illustration of the same stick figure, now sad.
reality: ... now there's a new problem AND it's still broken
If I find I've done this by accident, I'll:
- undo all my changes (`git stash`!)
- make a list of things to investigate, one at a time
I find investigating a bug with someone else SO MUCH more fun than doing it alone.
Illustrations of two smiling stick figures, one with short curly hair, and one with longer straight hair.
Debugging together lets you:
- Teach each other new tools!
person 1: I wish we could find out x, but that's impossible...
person 2: Let's use my favourite tool, strace!!!!!!
- Learn new concepts!
person 2: What is this CORS thing?!?!
person 1: Oh, I can explain that!
- Keep each other on track
person 2: Maybe the problem is Y?
person 1: We already ruled that out!
person 2: Right, I forgot!
Sometimes I need to trick myself into getting started:
Illustrations of a stick figure with short curly hair.
person (thinking, looking unhappy): "UGH, I do NOT want to look at this CSS bug!!!!"
Giving myself a time limit really helps:
Illustration of an alarm clock
person (thinking, now smiling): "Okay, I'll just see what I can figure out in 15 minutes..."
You can't always solve it in 15 minutes, but this works surprisingly often!
... 15 minutes later ...
person (thinking, happy): "all fixed! That wasn't so hard!"
When I'm REALLY stuck, I'll write an email to a friend:
- "Here's what I'm trying to do..."
- "I did X and I expected Y to happen, but instead..."
- "Could this be because....?"
- "This seems impossible because..."
- "I've tried A, B, and C to fix it, but...."
This helps me organize my thoughts, and often by the time I finish writing, I've magically fixed the problem on my own!
It has to be a specific person, so that the imaginary version of them in my mind will say useful things :)
Explaining what's going wrong out loud is magic.
Illustrations of two stick figures. One has curly hair, and one has short straight hair and is wearing a big t-shirt with a picture of a rubber duck.
person (looking sad): "so, when I do X thing, I'm getting an error, and it doesn't make any sense because I already checked that A and B are working...."
other person: huh...
person (now smiling, with an exclamation mark above their head): "OH I SEE WHAT I DID WRONG"
other person (also smiling): "happy to help!"
People call this "rubber ducking" because the other person might as well be a rubber duck.
Illustrations of an unhappy-looking stick figure with short curly hair.
Sometimes when I'm debugging, there are things I'll refuse to try because they take too long.
person (thinking): ugh, that part of the code is so confusing, I don't want to look at it...
But as I become more and more desperate, eventually I'll give in and do the annoying thing. Often it helps!
person (thinking): FINE, I'll look at that code... oh, yeah, here's the bug.
Here are some tools I've found useful:
- debuggers! (most languages have one!)
- profilers: `perf, pprof, py-spy`
- tracers: `strace, ltrace, ftrace, BPF tools`
- network spy tools: `tcpdump, wireshark, ngrep, mitmproxy`
- web automation tools: `selenium, playwright`
- load testers: `ab, wrk`
- test frameworks: `pytest, RSpec`
- linters/static analysis tools: `black, eslint, pyright`
- data formatting tools: `xxd, hexdump, jq, graphviz`
- dynamic analysis tools: `valgrind, asan, tsan, ubsan`
- fuzzers/property testing: `hypothesis, quickcheck, Go's fuzzer`
(I've never used those last two but lots of people say they're helpful.)
Sometimes you print out an object, and it just prints the class name and reference ID, like this:
`MyObject<#18238120323>`
Illustration of a frowning stick figure with curly hair.
person (thinking): "ugh, thanks, very helpful... "
Implementing a custom string representation for a class you're often printing out can save a LOT of time. The name of the method you need to implement is:
- Python: `.__str__` (and `.__repr__`, which is what you see when the object is inside a list)
- Ruby: `.to_s`
- JavaScript: `.toString`
- Java: `.toString`
- Go: `String()`
Also, pretty-printing libraries (like `pprint` in Python or `awesome_print` in Ruby) are great for printing out arrays/hashmaps.
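A small Python sketch (the `MyObject` class and its fields are hypothetical). `__str__` is what `print()` uses; `__repr__` is what you see when the object is inside a list or dict, in the REPL, and in most debuggers, so it's often the more useful one to implement.

```python
import pprint


class MyObject:
    def __init__(self, name, size):
        self.name = name
        self.size = size

    def __repr__(self):
        # used by the REPL, debuggers, and when the object is inside a list/dict
        return f"MyObject(name={self.name!r}, size={self.size})"

    def __str__(self):
        # used by print() and str()
        return f"{self.name} ({self.size} bytes)"


obj = MyObject("cat.png", 28832)
print(obj)    # cat.png (28832 bytes)
print([obj])  # [MyObject(name='cat.png', size=28832)]

# pprint wraps long nested structures across several lines
config = {"servers": [f"10.0.0.{i}" for i in range(8)], "retries": 3, "timeout_s": 30}
pprint.pprint(config, width=40)
```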
Once you've solved it, don't forget to celebrate! Take a break! Feel smart!
Illustration of a smiling stick figure with curly hair.
person (thinking): "i did it, i did it, i'm amazing" (now is not the time for humility)
The best part of understanding a bug is that it makes it SO MUCH easier for you to solve similar future bugs.
Illustration of a smiling stick figure with curly hair, and another figure with short spiky hair.
person (thinking): I've seen something like this before, maybe the problem is X?
colleague: (annotation, saying that they're awestruck at your brilliance)
### panel 1: combining different versions of files is core to git
Illustration showing two boxes, each with three symbols, being added together.
(a, b, y) + (x, b, c) = ???
it's very hard
### panel 2: to merge files, you need to know what the original was
picture:
original is (a, b, c)
one side changed a -> x so it's (x, b, c)
the other side changed c -> y so it's (a, b, y)
### panel 3: git merges by combining all changes
merge machine with everything from panel 2 in a thought bubble:
(a, b, c)
(x, b, c)
(a, b, y)
result is (x, b, y)
### panel 4: if both changed the same line it's a merge conflict
merge machine with a thought bubble showing:
(a, b, c)
(x, b, c)
(z, b, y)
result is (x/z, b, y)
The result has red question marks around it because the first position has two values in conflict. There are also red sad faces and x's around the illustration.
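A minimal Python sketch of panels 3 and 4, assuming each "file" is a fixed-length list and nothing is inserted or deleted. Real git diffs the files and merges whole hunks, so this shows only the core three-way idea, not git's actual algorithm. `merge3` is a hypothetical name.

```python
def merge3(base, ours, theirs):
    """Toy three-way merge of equal-length sequences, position by position.

    Returns (merged, conflict_positions). A conflicting position is shown
    as "ours/theirs".
    """
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (both unchanged, or the same change)
            merged.append(o)
        elif o == b:      # only "theirs" changed this position
            merged.append(t)
        elif t == b:      # only "ours" changed this position
            merged.append(o)
        else:             # both changed it, differently: merge conflict
            merged.append(f"{o}/{t}")
            conflicts.append(i)
    return merged, conflicts


original = ["a", "b", "c"]
print(merge3(original, ["x", "b", "c"], ["a", "b", "y"]))  # (['x', 'b', 'y'], [])
print(merge3(original, ["x", "b", "c"], ["z", "b", "y"]))  # (['x/z', 'b', 'y'], [0])
```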
### panel 5: git figures out the original version by looking at commit history
Illustration showing a path with a starting point labelled "original", with a note that this is called the "merge base". Two paths, labelled v1 and v2, diverge from it.
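Here's a simplified Python sketch of finding a merge base in a commit graph, assuming the history is a dict from each commit to its parents (the commit names are hypothetical). Git exposes the real thing as `git merge-base v1 v2`; its implementation is more involved than this.

```python
from collections import deque


def ancestors(commit, parents):
    """Every commit reachable from `commit` by following parent links (including itself)."""
    seen = {commit}
    queue = deque([commit])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(a, b, parents):
    """Common ancestors of a and b that aren't ancestors of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return sorted(
        c for c in common
        if not any(c != d and c in ancestors(d, parents) for d in common)
    )


# Hypothetical history: "root" -> "original", which then diverges into v1 and v2.
parents = {
    "root": [],
    "original": ["root"],
    "v1a": ["original"],
    "v1": ["v1a"],
    "v2": ["original"],
}
print(merge_bases("v1", "v2", parents))  # ['original']
```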
### panel 6: `cherry-pick`, `revert`, `rebase`, and `merge` all need to combine files
Illustration of a smiling stick figure with curly hair.
person: "they all use the same merge algorithm, using some clever tricks! we'll talk about that next."
### You might have heard that DNS updates need time to "propagate".
What's actually happening is that there are old cached records which need to expire.
### DNS records are cached in many places
- browser caches
- DNS resolver caches
- operating system caches
google.com, represented by a box with a smiley face: my DNS records are cached on billions of devices!
### let's see what happens when you update an IP
bananas.com A▾
300 [changed to] 60
1.2.3.4 [changed to] 5.6.7.8
beware: even if you change the TTL to 60s, you still have to wait 300 seconds for the old record to expire
### 30 seconds later...
(you go to bananas.com in your browser)
Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass, and a browser, represented by the Firefox logo of a fox wrapped around a globe
browser: hey what's the IP for bananas.com?
resolver, thinking: let's check my cache for bananas.com... found it!!
resolver: it's 1.2.3.4!
### 400 seconds later...
(you refresh the page again)
browser: hey what's the IP for bananas.com?
resolver, thinking: The TTL (300s) is up, better ask for a new IP...
resolver: it's 5.6.7.8!
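The timeline above can be sketched as a tiny TTL cache in Python. This assumes the resolver last fetched `bananas.com` at t=0, right before you changed the record; the `TTLCache` class and the fake clock are hypothetical, and a real resolver does much more.

```python
class TTLCache:
    """A resolver-style cache: an entry is reused until its TTL runs out."""

    def __init__(self, clock):
        self.clock = clock   # function returning the current time in seconds
        self.entries = {}    # name -> (ip, expires_at)

    def lookup(self, name, ask_authoritative):
        now = self.clock()
        cached = self.entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]                  # still fresh: answer from the cache
        ip, ttl = ask_authoritative(name)     # expired or missing: ask again
        self.entries[name] = (ip, now + ttl)
        return ip


# Hypothetical authoritative data: name -> (IP, TTL in seconds)
records = {"bananas.com": ("1.2.3.4", 300)}
now = 0
cache = TTLCache(clock=lambda: now)

first = cache.lookup("bananas.com", records.get)       # t=0: caches 1.2.3.4 for 300s
records["bananas.com"] = ("5.6.7.8", 60)                # you update the IP and TTL
now = 30
after_30s = cache.lookup("bananas.com", records.get)   # still the old cached IP
now = 400
after_400s = cache.lookup("bananas.com", records.get)  # old TTL expired: new IP
print(first, after_30s, after_400s)  # 1.2.3.4 1.2.3.4 5.6.7.8
```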
### 12 hours later...
(you check 1.2.3.4's logs to make sure all the traffic has moved over)
Illustration of a stick figure with curly hair looking confused, and a rogue DNS resolver, which looks like the other resolvers except that it is wearing a burglar mask.
person: that's weird, the old server is still getting a few requests...
rogue DNS resolver: I don't care about your TTL! I just cache everything for 24 hours!
the culprit: a rogue DNS resolver
### panel 1:
One weird thing about DNS is that different programs on a single computer can get different results for the same domain name.
Let's talk about why!
Illustration of a program, represented by a box with a smiley face, and a resolver (server), represented by a box with a smiley face holding a magnifying glass. Between them is a function, represented by a rectangle with squiggly lines on it. There are arrows going back and forth between the function and both the program and the resolver (server).
The function is the problem.
### reason 1: many (but not all!!) programs use the function getaddrinfo for DNS lookups...
ping, represented by a box with a smiley face: I use getaddrinfo!
dig, also represented by a box with a smiley face: I don't!
So if you see an error message like "`getaddrinfo: nodename nor servname provided, or not known`", that's a DNS error.
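Python's `socket.getaddrinfo` wraps the system's `getaddrinfo` on Linux and macOS, so it's a quick way to poke at that function. This sketch assumes `localhost` is listed in `/etc/hosts` (as it is on most systems); the `failed` flag is just there for the demo.

```python
import socket

# Ask the system's getaddrinfo for localhost, which it usually answers
# from /etc/hosts without talking to a DNS resolver at all.
results = socket.getaddrinfo("localhost", 80, type=socket.SOCK_STREAM)
for family, _type, _proto, _canonname, sockaddr in results:
    print(family.name, sockaddr[0])

# getaddrinfo failures show up as socket.gaierror, carrying the C library's
# own error text (the exact wording differs between glibc, musl, and macOS).
failed = False
try:
    socket.getaddrinfo("localhost", "no-such-service")
except socket.gaierror as err:
    failed = True
    print("getaddrinfo failed:", err)
```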
### and not using getaddrinfo might give a different result
- the program might not use `/etc/hosts` (dig doesn't)
- the program might use a different DNS resolver (some browsers do this)
### reason 2: there are many different versions of `getaddrinfo`...
- the one in `glibc`
- the one in `musl libc`
- the one in Mac OS
And of course, they all behave slightly differently :)
### you can have multiple getaddrinfos on your computer at the same time
For example on a Mac, there's your system `getaddrinfo`, but you might also be running a container that's using `musl`.
### glibc and musl getaddrinfo are configured with `/etc/resolv.conf`
the `nameserver` lines give the IP of the resolver to use:
```
# Generated by NetworkManager
nameserver 192.168.1.1
nameserver fd13:d987:748a::1
```
On a Mac, `/etc/resolv.conf` exists, but it's not used by the system `getaddrinfo`.
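To make the file format concrete, here's a simplified Python parser for the `nameserver`, `search`, and `options` lines. It's a sketch, not what glibc or musl actually do: they handle more keywords and edge cases. `parse_resolv_conf` is a hypothetical name.

```python
def parse_resolv_conf(text):
    """Simplified parser for nameserver / search / options lines in resolv.conf."""
    config = {"nameservers": [], "search": [], "options": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":      # blank line or comment
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            config["nameservers"].append(args[0])
        elif keyword == "search":
            config["search"] = args           # the last search line wins
        elif keyword == "options":
            for opt in args:
                name, _, value = opt.partition(":")
                config["options"][name] = value if value else True
    return config


example = """\
# Generated by NetworkManager
nameserver 192.168.1.1
nameserver fd13:d987:748a::1
"""
print(parse_resolv_conf(example)["nameservers"])
# ['192.168.1.1', 'fd13:d987:748a::1']
```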
### When a resolver gets a DNS query, it has 2 options:
Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass.
resolver: I could tell you what the authoritative nameservers said... or I could LIE!
### block ads / malware
Illustration of a conversation between a resolver and a browser, represented by the Firefox logo of a fox wrapped around a globe
browser: what's the IP for doubleclick.net?
(ad domain, definitely exists)
resolver: that domain doesn't exist
PiHole blocks ads this way.
### reason to lie: to show you ads (rude!)
browser: what's the IP for zzz.jvns.ca?
(doesn't exist)
resolver: here's an IP that will show you ads!
This is called "DNS hijacking".
### reason to "lie": internal domain names
browser: what's the IP for corp.examplecat.com?
(doesn't exist on the public internet)
corporate resolver: here's an internal IP address!
### reason to lie: airport DNS resolvers sometimes lie
browser: what's the IP for google.com?
airport resolver: you didn't log in yet so I will lie! here is our login page's IP!
### how does your computer know which resolver to use?
When you connect to a network, the router tells your computer which search domain and resolver to use (using DHCP).
Illustration of a router, represented by a box with antennae and a smiley face
router: `192.168.1.1 search domain: lan`
### What's actually happening when the root nameserver redirects to the .com nameserver, on page 6?
Illustration of a resolver, represented by a box with a smiley face holding a magnifying glass, and a root nameserver, represented by a pink box with a smiley face, wearing a stack of three crowns
resolver: what's the IP for example.com?
root nameserver: I am not concerned with petty details like that. Here's the address of the .com nameserver
(this is an NS record)
### The root nameserver can return two kinds of DNS records:
NS records: (in the Authority section)
```
com. 172800 NS a.gtld-servers.net
com. 172800 NS b.gtld-servers.net
```
com. is the name
172800 is the TTL
NS is the type
b.gtld-servers.net is the value
glue records: (in the Additional section)
```
a.gtld-servers.net 86400 A 192.5.6.30
b.gtld-servers.net 86400 A 192.33.14.30
```
a.gtld-servers.net is the name
86400 is the TTL
A is the type
192.33.14.30 is the value
### The NS record gives you the domain name of the server to talk to next, but not its IP address.
resolver: But I need the IP for `a.gtld-servers.net` to communicate with it!
is there a glue record?
### 2 ways the resolver gets the IP address
1. If it sees a glue record for a.gtld-servers.net, the resolver will use that IP
2. otherwise, it'll start a whole separate DNS lookup for a.gtld-servers.net
### glue records help resolvers avoid infinite loops
without a glue record for `a.gtld-servers.net`: disaster!
resolver: what's the IP for `a.gtld-servers.net`?
root nameserver: You should ask `a.gtld-servers.net`
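A toy Python model of the two options, using hypothetical data: `.com` gets a referral with a glue record, while `.net` pretends to have no glue (in reality it does). To keep it short, the TLD nameserver answers directly instead of returning another referral. Without glue, looking up `a.gtld-servers.net` requires asking `a.gtld-servers.net`, and the sketch detects that loop.

```python
# Hypothetical toy data. ROOT: TLD -> (NS name from the NS record, glue IP or None)
ROOT = {
    "com.": ("a.gtld-servers.net.", "192.5.6.30"),  # NS record + glue record
    "net.": ("a.gtld-servers.net.", None),          # NS record, but pretend there's no glue
}
# What the TLD nameserver at each IP answers (simplified: it answers directly).
TLD_ANSWERS = {"192.5.6.30": {"example.com.": "93.184.216.34"}}


def resolve(name, in_progress=()):
    if name in in_progress:
        raise RuntimeError(f"infinite loop while looking up {name}")
    tld = name.rstrip(".").rsplit(".", 1)[-1] + "."
    ns_name, glue_ip = ROOT[tld]                        # the root's referral
    if glue_ip is not None:
        ns_ip = glue_ip                                 # way 1: use the glue record
    else:
        ns_ip = resolve(ns_name, in_progress + (name,))  # way 2: a separate lookup
    return TLD_ANSWERS[ns_ip][name]


print(resolve("example.com."))  # 93.184.216.34

loop_error = None
try:
    resolve("example.net.")
except RuntimeError as err:
    loop_error = str(err)
print(loop_error)
```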
### terminology note
NS records are DNS records with type "NS".
Also, an "A record" means "record with type A", "MX record" means "record with type MX", etc.
(confusingly, this is not true for glue records: glue records have type A or AAAA. It's weird, I know.)
### dig is my favourite tool for investigating DNS issues
I find its default output unnecessarily confusing, but it's the only standard tool I know that will give you all the details.
### tiny guide to dig's full output
```
$ dig example.com
; <<>> DiG 9.16.24 <<>> +all example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27580
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
; example.com. IN A
;; ANSWER SECTION:
example.com. 86400 IN A 93.184.216.34
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Jan 26 11:32:03 EST 2022
;; MSG SIZE rcvd: 56
```
`NOERROR` is the response code
`example.com. 86400 IN A 93.184.216.34` is the answer to our DNS query. The "." at the end means that example.com isn't a subdomain of some other domain (like it's not example.com.degrassi.ca). This might seem obvious, but DNS tools like to be unambiguous.
### panel 3:
Illustration of a smiling stick figure with curly hair.
person: `$ dig +noall +answer` means "Just show me the answer section of the DNS response." It's a lot less to look at!
### panel 4:
`$ dig +noall +answer example.com`
`example.com. 86400 IN A 93.184.216.34`
example.com is the name
86400 is the TTL
IN is the class
A is the record type
93.184.216.34 is the content
just the answer! so much less overwhelming!
### there are two kinds of IP addresses: IPv4 and IPv6
Every website needs an IPv4 address.
IPv6 addresses are optional.
### panel 2:
A stands for IPv4 Address
Example: `93.184.216.34`
AAAA stands for IPv6 AAAAddress (joke, but kinda true)
Example: `2606:2800:220:1:248:1893:25c8:1946`
it's called AAAA (4 As) because IPv6 addresses have 4x as many bytes
### in theory, the Internet is moving from IPv4 to IPv6
This is because there are only 4 billion IPv4 addresses (the internet has grown a LOT since the 1980s when IPv4 was designed!)
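Python's standard `ipaddress` module makes the size difference easy to check, using the example addresses from above:

```python
import ipaddress

v4 = ipaddress.ip_address("93.184.216.34")                       # an A record's value
v6 = ipaddress.ip_address("2606:2800:220:1:248:1893:25c8:1946")  # an AAAA record's value

print(v4.version, len(v4.packed))  # 4 4   (IPv4 addresses are 4 bytes)
print(v6.version, len(v6.packed))  # 6 16  (IPv6 addresses are 16 bytes, 4x as many)
print(2 ** 32)                     # 4294967296 possible IPv4 addresses
```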
### happy eyeballs*
If your domain has both an A and an AAAA record, many clients use an algorithm called "happy eyeballs": they start connecting over IPv6, and if that hasn't succeeded after a short delay, they also try IPv4 and use whichever connection works first.
`*` yes that is the real name
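Here's a simplified asyncio sketch of the race, assuming neither connection attempt ever fails and using fake connections that just sleep. The 50 ms head start is shortened for the demo (RFC 8305 recommends 250 ms). For real sockets, Python's `loop.create_connection` has a built-in `happy_eyeballs_delay` parameter (Python 3.8+).

```python
import asyncio


async def fake_connect(family, seconds):
    """Pretend to open a connection that takes `seconds` to succeed."""
    await asyncio.sleep(seconds)
    return family


async def happy_eyeballs(v6_seconds, v4_seconds, head_start=0.05):
    """Simplified happy eyeballs: give IPv6 a head start, then race IPv4 against it."""
    v6 = asyncio.create_task(fake_connect("IPv6", v6_seconds))
    done, _ = await asyncio.wait({v6}, timeout=head_start)
    if done:
        return v6.result()                  # IPv6 connected quickly: use it
    v4 = asyncio.create_task(fake_connect("IPv4", v4_seconds))
    done, pending = await asyncio.wait({v6, v4}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                       # give up on the slower attempt
    return done.pop().result()


print(asyncio.run(happy_eyeballs(v6_seconds=0.01, v4_seconds=0.01)))  # IPv6
print(asyncio.run(happy_eyeballs(v6_seconds=1.0, v4_seconds=0.01)))   # IPv4
```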
### using IPv6 isn't always easy
- not all web hosts give you an IPv6 address
- lots of ISPs don't support IPv6 (mine doesn't!)
### IP addresses have owners
You can find any IP's owner by looking up its ASN ("Autonomous System Number").
(except local IPs like `192.168.x.x`, `127.x.x.x`, `10.x.x.x`, and `172.16.x.x` through `172.31.x.x`)
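If you're not sure whether an address is one of those local ones, Python's `ipaddress` module can tell you. Note that the `172.16.0.0/12` range runs from `172.16.x.x` to `172.31.x.x`:

```python
import ipaddress

for ip in ["192.168.1.1", "127.0.0.1", "10.0.0.5", "172.16.4.2",
           "172.31.255.1", "172.32.0.1", "93.184.216.34"]:
    # is_private is True for addresses that don't belong to a public owner
    print(ip, ipaddress.ip_address(ip).is_private)
```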
### panel 1:
In an internal network (like in a company or school), sometimes you can connect to a machine by just typing its name, like this:
`$ ping labcomputer-23`
Let's talk about how that works!
### many DNS lookup functions support "local" domain names
browser, represented by a box with a smiley face: where's lab23?
function, represented by a rectangle with squiggly lines: where's lab23.degrassi.ca?
arrow pointing to resolver (server) represented by a box with a smiley face holding a magnifying glass
(the function appends a base domain `degrassi.ca` to the end)
### the base domain is called a "search domain"
On Linux, search domains are configured in `/etc/resolv.conf`
Example:
`search degrassi.ca`
this tells `getaddrinfo` to turn `lab23` into `lab23.degrassi.ca`
### getaddrinfo doesn't always use search domains
It uses an option called ndots to decide.
```
search degrassi.ca
options ndots:5
```
this means "only use search domains if the domain name contains fewer than 5 dots"
### search domains can make DNS queries slower
browser: where's `jvns.ca`?
getaddrinfo, represented by a rectangle with squiggly lines: okay, first I'll try `jvns.ca.degrassi.ca`
this is silly but it can happen!
### avoid search domains by putting a "." at the end
Use `http://jvns.ca.` instead of `http://jvns.ca`
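A rough Python model of the order in which names get tried. It's based on how glibc behaves as I understand it (names with enough dots are tried as-is first, and search domains are still a fallback); implementations differ in the details, so treat this as an approximation. `candidate_names` is a hypothetical name.

```python
def candidate_names(name, search, ndots=1):
    """Names a resolv.conf-style resolver would try, in order (simplified)."""
    if name.endswith("."):
        return [name]                           # trailing dot: never use search domains
    searched = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [name + "."] + searched          # enough dots: try the name as-is first
    return searched + [name + "."]              # too few dots: search domains first


print(candidate_names("lab23", ["degrassi.ca"], ndots=5))
# ['lab23.degrassi.ca.', 'lab23.']
print(candidate_names("jvns.ca", ["degrassi.ca"], ndots=5))
# ['jvns.ca.degrassi.ca.', 'jvns.ca.']
print(candidate_names("jvns.ca.", ["degrassi.ca"], ndots=5))
# ['jvns.ca.']
```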
Illustration of a smiling stick figure with curly hair.
person: "local" domain names like this mostly exist inside of big institutions like universities
### When you make DNS changes for your domain, you're editing a DNS record
Type: A
Name (subdomain): paw
Use @ for root
IPv4 address: 1.2.3.4
TTL: 1 min
Here's what the same record looks like with dig
(we'll explain dig on page 18)
```
$ dig +noall +answer paw.examplecat.com
paw.examplecat.com. 60 IN A 1.2.3.4
```
### DNS records have 5 parts
- name (eg `tail.examplecat.com`)
- type (eg `CNAME`)
- value (eg `tail.jvns.ca`)
- TTL (eg `60`)
- class (eg `IN`)
different record types have different kinds of values: `A` records have an IP address, and `CNAME` records have a domain name.
### name
`paw.examplecat.com`
When you create a record, you'll usually write just the subdomain (like `paw`).
When you query for a record, you'll get the whole domain name (like `paw.examplecat.com`).
### TTL
`60`
"time to live". How long to cache the record for, in seconds.
### class
`IN`
"IN" stands for "INternet". You can ignore it, it's always the same.
### record type
`A`
"A" stands for "IPv4 Address".
### value
`1.2.3.4`
the IP address we asked for!
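Here's a sketch of pulling those 5 parts out of one line of `dig +noall +answer` output, assuming one record per line with whitespace between fields. `DnsRecord` and `parse_answer_line` are hypothetical names; the field is called `klass` because `class` is a Python keyword.

```python
from typing import NamedTuple


class DnsRecord(NamedTuple):
    name: str
    ttl: int
    klass: str   # "class" is a reserved word in Python
    type: str
    value: str


def parse_answer_line(line):
    # split at most 4 times so values containing spaces (like TXT records) stay whole
    name, ttl, klass, rtype, value = line.split(None, 4)
    return DnsRecord(name, int(ttl), klass, rtype, value)


record = parse_answer_line("paw.examplecat.com. 60 IN A 1.2.3.4")
print(record)
# DnsRecord(name='paw.examplecat.com.', ttl=60, klass='IN', type='A', value='1.2.3.4')
```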
### when you register a domain, your registrar runs your authoritative nameservers by default
your registrar, represented by a box with a smiley face wearing a crown: I'm taking care of your DNS!
You can change your nameservers in your registrar's control panel.
### LOTS of services can be your authoritative nameserver
your registrar: I can manage your DNS records!
AWS, also represented by a box with a smiley face wearing a crown: me too!
shopify, also also represented by a box with a smiley face wearing a crown: me three!
Nonplussed stick figure with curly hair: ok chill I only need one of you to do it
### how to find your domain's nameservers
```
$ dig +short NS neopets.com
ns-42.awsdns-05.com.
ns-1191.awsdns-20.org.
```
`neopets.com` is using AWS's nameservers right now
### how to change your nameservers
1. Copy your DNS records to the new nameservers (use dig to check that it worked)
2. On your registrar's website, update your nameservers
3. Wait 48 hours
4. Delete the old DNS records (to save your future self confusion)
### why changing your nameservers is slow
registrar: here's the new nameserver for example.com!
.com nameserver, represented by a box with a smiley face, wearing a stack of three crowns: ok great, I've saved this record: `example.com NS newns.com 172800`
updates are VERY SLOW because this TTL is 2 days
### what can go wrong if you don't delete the old records
Illustration of a nonplussed stick figure with curly hair.
person: I'll go to $OLD_NAMESERVER to change my DNS records!
person: WHY doesn't it WORK?!?!?
person: oh right, I changed this domain's nameservers last year, oops!
getting started: git init, git clone
move between branches: git branch, git checkout, git switch
restore old files: git checkout, git restore
preparing to commit: git status, git add, git mv, git rm, git diff, git reset
combining branches: git merge, git rebase, git cherry-pick
working with others: git pull, git push, git fetch, git remote
making commits: git commit
configuring git: git config, git remote
code archaeology: git blame, git log FILENAME, git log -S SEARCH, git show, git diff
trash changes: git stash, git checkout ., git reset --hard, git rebase -i
git troubleshooting: git log BRANCH, git status, git diff, git reflog
editing history: git rebase -i, git reset --hard
You can think about a Git branch in 3 different ways.
Each of the three ways is illustrated with a diagram of a vertical line divided up into four nodes, labelled "main". A diagonal line with three nodes is coming off the second node from the bottom, labelled "branch".
### way 1: just the commits that "branch" off
In this diagram, the two nodes that are on the branch, but not on the main, are labelled "these two".
This is what you're probably thinking about when you `merge` or `rebase` a branch into another one.
Git doesn't keep track of which branch another branch is "based" on though: that's why you have to run `git merge main` (you have to tell it which base branch to merge with!)
You can see these commits with:
```
git log main..BRANCHNAME
```
### way 2: every previous commit
In this diagram, the same two nodes are indicated as in way 1, plus the node on main that the branch comes out of, and the node on main before the branch.
This is what `git log BRANCHNAME` shows you.
When we say a commit is "on" a branch, we mean that it's somewhere in the history for that branch.
### way 3: just the commit at the end
In this diagram, the one node on the branch farthest from where main and branch diverge is labelled "this one".
This is how git represents a branch internally. You can run:
```
cat .git/refs/heads/BRANCHNAME
```
to see the commit ID for the branch.
That commit's parent (and grandparents, great-grandparents, etc) determine the branch's history.
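The three ways can be modelled in a few lines of Python, using a hypothetical history that matches the diagram (main is `m1` to `m4`, and the branch forks at `m2`). This is only a sketch of the idea: git stores parents inside commit objects, not in a dict.

```python
from collections import deque


def history(commit, parents):
    """Way 2: every commit reachable from `commit` through parent links."""
    seen, queue = {commit}, deque([commit])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical history matching the diagram.
parents = {"m1": [], "m2": ["m1"], "m3": ["m2"], "m4": ["m3"], "b1": ["m2"], "b2": ["b1"]}
# Way 3: a branch is just a name pointing at one commit.
refs = {"main": "m4", "branch": "b2"}

branch_history = history(refs["branch"], parents)
print(sorted(branch_history))  # ['b1', 'b2', 'm1', 'm2']  (like `git log branch`)

# Way 1: commits on the branch but not on main.
only_branch = branch_history - history(refs["main"], parents)
print(sorted(only_branch))     # ['b1', 'b2']  (like `git log main..branch`)
```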
### you might expect git to enforce some rules about branches
some rules you might imagine:
* you can't remove commits from a branch, only add them
* the `main` branch has to stay more or less in sync with `origin/main`
But there are no rules.
git character with demon hat: want to do something horrible to your branch? no problem!
### there are literally no rules
commands that you can use to do weird stuff to a branch:
* `git reset`
* `git rebase`
### instead of rules, we have conventions
for example:
* run `git pull` often to keep your `main` up to date
* if you're working with a big team, don't commit to `main` directly
Illustration of the git demon talking to a nonplussed stick figure with curly hair.
git demon: you've just gotta be really careful to not do the wrong thing and not mess up your branch
person: um... thanks?
### our only saviour: the reflog
`git reflog BRANCHNAME`
will show you the history of every change to the branch, so you can always undo
the reflog is a VERY unfriendly UI, but it's always there.
Illustration of a smiling stick figure with short curly hair.
Person: git has 17 million options but this is how I use it!
### getting started
#### start a new repo:
`git init`
#### clone an existing repo:
`git clone $URL`
### know where you are
`git status`
### prepare to commit
#### add untracked file:
(or unstaged changes) `git add $FILE`
#### add ALL untracked files and unstaged changes:
`git add -A`
#### choose which parts of a file to stage:
`git add -p`
#### delete or move file:
```
git rm $FILE
git mv $OLD $NEW
```
#### tell git to forget about a file without deleting it:
`git rm --cached $FILE`
#### unstage everything:
`git reset HEAD`
### make commits
#### make a commit:
(and open a text editor to write the message)
`git commit`
#### make a commit with the message on the command line:
`git commit -m 'message'`
#### commit all unstaged changes:
`git commit -am 'message'`
### move between branches
#### switch branches:
`git switch $NAME` OR `git checkout $NAME`
#### create a branch:
`git switch -c $NAME` OR `git checkout -b $NAME`
#### list branches:
`git branch`
#### delete a branch
`git branch -d $NAME`
#### force delete a branch:
`git branch -D $NAME`
#### list branches by most recently committed to:
```
git branch --sort=-committerdate
```
### look at a branch's history
#### log the branch
`git log main`
#### show how two branches relate to each other:
`git log --graph a b`
#### one line log:
`git log --oneline`
### code archaeology
#### show who last changed each line of a file:
`git blame $FILENAME`
#### show every commit that modified a file:
`git log $FILENAME`
#### find every commit that added or removed some text:
`git log -S banana`
### diff commits
#### show diff between a commit and its parent:
`git show $COMMIT_ID`
#### show diff between a merge commit and its merged parents:
`git show --remerge-diff $COMMIT_ID`
#### diff two commits:
`git diff $COMMIT_ID $COMMIT_ID`
#### just show diff for one file:
`git diff $COMMIT_ID $FILENAME`
#### show a summary of a diff:
`git diff $COMMIT_ID --stat` OR `git show $COMMIT_ID --stat`
### diff staged/unstaged changes
#### diff all staged and unstaged changes:
`git diff HEAD`
#### diff just staged changes:
`git diff --staged`
#### diff just unstaged changes:
`git diff`
### configure git
#### set a config option:
`git config user.name 'Julia'`
#### see all possible config options:
`man git-config`
#### set option globally:
`git config --global ...`
#### add an alias:
`git config alias.st status`
### important git files
#### local git config:
`.git/config`
#### global git config:
`~/.gitconfig`
#### list of files to ignore:
`.gitignore`
### combine diverged branches
#### how the branches look before:
Diagram of two boxes in a row, connected by lines. The first one has a heart, the second one has a star. Branching off from the star, there is one branch with a box with a hashtag symbol, labelled "main". The second branch consists of a box with a spiral and a box with a squiggle. The second branch is labelled "banana".
#### combine with rebase:
```
git switch banana
git rebase main
```
Diagram of two boxes in a row, connected by lines. The first one has a heart, the second one has a star. Branching off from the star, there is one branch with a box with a hashtag symbol, labelled "main". The box with the spiral and the box with the squiggle have been added on after the box with the hashtag. The box with the squiggle is labelled "banana". The second branch, with the box with a spiral and the box with a squiggle, are drawn with dotted lines and labelled "lost".
#### combine with merge:
```
git switch main
git merge banana
git commit  # only needed if the merge stopped because of conflicts
```
This diagram is like the "before" diagram, except now the two branches converge into a new box, with a diamond in it, labelled "main".
#### combine with squash merge:
```
git switch main
git merge --squash banana
git commit
```
This diagram is like the "before" diagram, except now, in the first of the two branches, after the hashtag symbol, there is a new box with both a spiral and a squiggle in it, labelled "main".
### bring a branch up to date with another branch
(aka "fast-forward merge")
before: `banana` is a few commits ahead of `main`
```
git switch main
git merge banana
```
after: `main` has moved forward to the same commit as `banana`
### copy one commit onto another branch
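before: `main` and `banana` have diverged, and you want one commit from `banana` on `main`
`git switch main`
`git cherry-pick $COMMIT_ID`
after: `main` has a new commit at the end that's a copy of that commit; `banana` is unchanged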
### add a remote
`git remote add $NAME $URL`
### push your changes
#### push the main branch to the remote origin:
`git push origin main`
#### push a branch to the remote origin that you've never pushed before:
`git push -u origin $NAME`
#### push the current branch to its remote "tracking branch":
`git push`
#### force push:
`git push --force-with-lease`
#### push tags:
`git push --tags`
### pull changes
#### fetch changes:
(but don't change any of your local branches)
`git fetch origin main`
#### fetch changes and then merge them into your current branch:
`git pull origin main` OR `git pull`
#### fetch changes and then rebase your current branch:
`git pull --rebase`
#### fetch all branches:
`git fetch --all`
### ways to refer to a commit
every time we say $COMMIT_ID, you can use any of these:
* a branch (`main`)
* a tag (`v0.1`)
* a commit ID (`3e887ab`)
* a remote branch (`origin/main`)
* current commit (`HEAD`)
* 3 commits ago (`HEAD^^^`)
* 3 commits ago (`HEAD~3`)
### there are 3 options for combining branches
- merge
- rebase
- squash
for example, let’s say we’re combining these 2 branches:
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of one box with a hash symbol, and branch 2, which consists of a box with a spiral, followed by a box with a squiggle.
### panel 2:
git rebase
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a box with a spiral, then a box with a squiggle. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 is made up of dotted lines and labelled “lost”.
git merge
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branches 1 and 2 both lead into a new box, with a diamond.
git merge --squash
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a new box containing both a squiggle and a spiral. Branch 2 consists of a box with a spiral, followed by a box with a squiggle.
### all 3 methods result in the EXACT SAME FILES
some differences are:
- the diff git shows you for the final commit
- the commit ids
- the specific flavour of suffering the method causes
### rebase
pro: you can keep your git history simple:
Diagram: a git history that is just a series of boxes in a straight line.
pain:
- harder to learn [sad face]
- harder to undo [sad face]
- easier to mess up [sad face]
(I love rebase though!)
### merge
pro: if you mess something up, the original commits are still in your branch’s history
pain: when I look at histories like this I feel dread [sad face]
Diagram: a complicated git history with a number of different branches.
### squash
pro: have 20 messy commits? nobody needs to know!
And it’s pretty simple to use.
pain: “ugh, someone squashed their 3000-line branch into 1 commit” [sad face]
### the (64-bit) floating point number line
Floating point numbers aren't evenly distributed. Instead, they're organized into windows: [0.25, 0.5], [0.5, 1], [1, 2], [2, 4], [4, 8], [8, 16], all the way up to [2^1023, 2^1024].
Every window has 2^52 floats in it.
Illustration of a number line from -2 to 2, split into windows by marks at -2, -1, -1/2, -1/4, 0, 1/4, 1/2, 1, and 2. The windows get narrower closer to 0.
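One way to check the "every window has 2^52 floats" claim is to look at the bits. Here's a minimal Python sketch, assuming IEEE 754 64-bit floats (which is what Python's `float` is on mainstream platforms): for positive floats, counting up through the 64-bit patterns also counts up through the floats, so subtracting the bit patterns of a window's endpoints counts the floats in it. The helper names `float_bits` and `floats_in_window` are made up for this example.

```python
import struct


def float_bits(x: float) -> int:
    """Reinterpret the 8 bytes of a 64-bit float as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def floats_in_window(lo: float, hi: float) -> int:
    """Count the floats in [lo, hi), assuming 0 < lo < hi.

    For positive floats, a bigger bit pattern means a bigger number, so the
    difference between the two bit patterns is how many floats lie in between.
    """
    return float_bits(hi) - float_bits(lo)


for lo in [0.25, 0.5, 1.0, 2.0, 2.0 ** 100]:
    print(f"[{lo}, {2 * lo}]: {floats_in_window(lo, 2 * lo)} floats")
```

Every window it prints reports 4503599627370496 floats, which is 2**52.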
### the windows go from REALLY small to REALLY big
The smallest window (not counting the special "subnormal" numbers right next to 0) is [2^-1022, 2^-1021].
This is TINY: a hydrogen atom weighs about 2^-79 grams.
The biggest window is [2^1023, 2^1024].
This is HUUUGE: the farthest galaxy we know about is about 2^90 meters away.
### the gaps between floats double with every window
- window: [1, 2] gap: 2^-52
- window: [2, 4] gap: 2^-51
- window: [4, 8] gap: 2^-50
- window: [8, 16] gap: 2^-49
### why does `10000000000000000.0 + 1 = 10000000000000000.0`?
- In the window [2^n, 2^(n+1)], the gap between floats is 2^(n-52)
- `10000000000000000.0` is in the window [2^53, 2^54], where the gap is 2^1 (or 2)
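- So the next float after `10000000000000000.0` is `10000000000000002.0`. `10000000000000000.0 + 1` lands exactly halfway between the two, and the default rounding rule (round half to even) rounds it back down to `10000000000000000.0`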
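Python can confirm these numbers. This small sketch assumes Python 3.9+, where `math.ulp` and `math.nextafter` were added: `math.ulp(x)` gives the gap between a positive float `x` and the next float up, and `math.nextafter(x, math.inf)` gives that next float. The last line shows that adding 1 to `10000000000000000.0` rounds straight back down.

```python
import math  # math.ulp and math.nextafter need Python 3.9+

# The gap doubles with every window: 2**-52, 2**-51, 2**-50, 2**-49
for x in [1.0, 2.0, 4.0, 8.0]:
    gap = math.ulp(x)
    print(f"window [{x}, {2 * x}]: gap = 2**{int(math.log2(gap))}")

big = 10000000000000000.0
print(2.0 ** 53 <= big < 2.0 ** 54)                          # True
print(math.ulp(big))                                         # 2.0
print(math.nextafter(big, math.inf) == 10000000000000002.0)  # True
print(big + 1 == big)                                        # True
```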
### the "up to date" in `git status` is misleading
```
$ git status
Your branch is up to date with origin/main
```
this does NOT mean that you're up to date with the remote main branch. But why not???
### some old version control systems only worked if you were online
Illustration of a sad stick figure with short curly hair.
person (thinking): my internet went out, guess I can't work
### git works offline
Illustration of a smiling stick figure with short straight hair.
git developer (thinking): I want to be able to code on a train with no internet
git developer (thinking): NOTHING in git will use the internet except `git pull`, `git push`, and `git fetch`
### this makes `git status` weird
git developer (thinking): we need to tell people if their branch is up to date... with NO INTERNET??? how?
### solution: CACHING
Every remote branch has a local cache named like `origin/mybranch` (`origin` is the remote name, `mybranch` is the branch name)
Git doesn't call it a cache though, it calls it a "remote tracking branch"
local branch: `mybranch`
cache: `origin/mybranch` (only updated on `git pull`, `git push`, `git fetch`)
remote branch: `origin mybranch` (`git push origin mybranch` updates this)
(git has no easy way to see when `origin/mybranch` was last updated)
### commits in git are usually saved forever
Except! Orphaned commits are deleted periodically.
Illustration of a little garbage can.
Commits are orphaned when you:
- `git commit --amend`
- `git rebase`
- delete a branch that hasn't been merged
### what is an orphaned commit?
it's a commit that isn't in the history of any branch
they're almost totally invisible, since Git will usually only show you commits on branches
### orphan #1: `git commit --amend`
before:
An illustration of a box that says `parent`, with a line to a second box that says `fix color buug` (typo!). The second box is labelled `main` branch.
after:
The same diagram as above, but there is now a second line coming out of the `parent` box, going to a third box that says `fix color bug`. The `fix color buug` box is now labelled "now it's an orphan!" and the `fix color bug` box is labelled "`main` branch".
### orphan #2: `git rebase`
before:
A box with two branches coming out of it. The top one is labelled "`main` branch". The second branch has two boxes, one with a heart, and one with a star. This branch is labelled "`feature` branch".
after:
A box with two branches coming out of it. The top branch consists of three boxes, one blank, one with a heart, and one with a star. The blank box is labelled "`main` branch", and the box with the star is labelled "`feature` branch". The second branch consists of two boxes, one with a heart, and one with a star. This branch is labelled "now these two are orphans!"
### orphan #3: `deleting unmerged branch`
before:
A box with two branches coming out of it. The first branch consists of one blank box, labelled "`main` branch". The second branch consists of two boxes, one with a heart, and one with a star. This branch is labelled "`feature` branch".
after deleting `feature`:
The same diagram as above, except that the second branch is now labelled "now these two are orphans!"
### how to find orphan commits
the only way to find them is with `git reflog` (or by memorizing their commit ID somehow)
### `HEAD`
`HEAD` is a tiny file that just contains the name of your current branch
`.git/HEAD`
`ref: refs/heads/main`
`HEAD` can also be a commit ID, that's called "detached `HEAD` state"
### branches
a branch is stored as a tiny file that just contains 1 commit ID. It's stored in a folder called `refs/heads`.
`7622629` - (actually 40 characters)
tags are in `refs/tags`, the stash is in `refs/stash`
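Since `HEAD` and branches are just tiny files, figuring out which commit you're on is two file reads. Here's a simplified Python sketch; `resolve_head` is a made-up name, and it only reads loose ref files (real git can also store refs in `.git/packed-refs`, which this ignores). The demo builds a fake `.git` folder with a made-up commit ID.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_head(git_dir: Path) -> Tuple[Optional[str], str]:
    """Return (current branch, commit ID); the branch is None in detached HEAD state.

    Simplified: only reads loose ref files, and ignores .git/packed-refs.
    """
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]                    # e.g. "refs/heads/main"
        commit_id = (git_dir / ref).read_text().strip()
        return ref[len("refs/heads/"):], commit_id
    return None, head                                # detached: HEAD holds a commit ID


# Demo on a fake .git directory with a made-up commit ID
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    fake_id = "7622629" + "0" * 33
    (git_dir / "refs" / "heads" / "main").write_text(fake_id + "\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    print(resolve_head(git_dir))     # the branch name "main" and its commit ID
    (git_dir / "HEAD").write_text(fake_id + "\n")
    print(resolve_head(git_dir))     # detached HEAD: the branch is None
```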
### commit
a commit is a small file containing its parent(s), message, tree, and author
`.git/objects/7622629`
```
tree c4e6559
parent 037ab87
author Julia <x@y.com> 1697682215
committer Julia <x@y.com> 1697682215
commit message goes here
```
these files are compressed, so the best way to see objects is with `git cat-file -p HASH`
### trees
trees are small files with directory listings. The files in it are called "blobs"
`.git/objects/c4e6559`
```
100644 blob e351d93 404.html
100644 blob cab4165 hello.py
040000 tree 9de29f7 lib
```
the permissions here LOOK like unix permissions, but they're actually super restricted, only 644 and 755 are allowed
### blobs
blobs are the files that contain your actual code
`.git/objects/cab4165`
`print("hello world!!!!")`
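A blob's ID isn't random: it's the SHA-1 hash of the file's contents with a small header in front, and the file in `.git/objects` is that same data compressed with zlib. Here's a minimal Python sketch, assuming the default SHA-1 object format (git can also be set up to use SHA-256) and loose objects (git also bundles objects into pack files, which this ignores). On disk, a loose object's path uses the first 2 characters of its ID as a folder name and the other 38 as the file name. The function names are made up for this example.

```python
import hashlib
import zlib
from typing import Tuple


def blob_object(content: bytes) -> bytes:
    """The uncompressed blob: a "blob <size>" header, a NUL byte, then the file's bytes."""
    return b"blob " + str(len(content)).encode() + b"\0" + content


def blob_id(content: bytes) -> str:
    """The blob's ID is the SHA-1 hash of the uncompressed object."""
    return hashlib.sha1(blob_object(content)).hexdigest()


def read_loose_object(compressed: bytes) -> Tuple[str, bytes]:
    """Decompress a loose object and split its header from its content."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("size in header doesn't match content")
    return obj_type.decode(), content


code = b'print("hello world!!!!")\n'
print(blob_id(code))  # should match `git hash-object` on a file with this content
print(read_loose_object(zlib.compress(blob_object(code))))  # ('blob', b'print("hello world!!!!")\n')
```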
### reflog
the reflog stores the history of every branch, tag, and `HEAD`
`.git/logs/refs/heads/main`
```
2028ee0 c1f9a4c Julia Evans <x@y.com> 1683751582 commit: no ligatures in code
```
each line of the reflog has:
- before/after commit IDs
- user + timestamp
- log message
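The example line above is shortened. Assuming git's usual on-disk format (full 40-character IDs, then `Name <email>`, a Unix timestamp, a timezone offset, and a tab before the log message), a minimal Python parser could look like this. `ReflogEntry` and `parse_reflog_line` are made-up names, and the padded IDs and the `-0400` timezone in the demo are fake.

```python
import re
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old_id: str
    new_id: str
    user: str
    timestamp: int
    message: str


# old ID, new ID, "Name <email>", Unix timestamp, timezone, then a tab and the message
REFLOG_LINE = re.compile(
    r"^(?P<old>[0-9a-f]+) (?P<new>[0-9a-f]+) (?P<user>.*?<[^>]*>) "
    r"(?P<ts>\d+) [+-]\d{4}\t(?P<msg>.*)$"
)


def parse_reflog_line(line: str) -> ReflogEntry:
    m = REFLOG_LINE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a reflog line: {line!r}")
    return ReflogEntry(m["old"], m["new"], m["user"], int(m["ts"]), m["msg"])


line = ("2028ee0" + "0" * 33 + " " + "c1f9a4c" + "0" * 33 +
        " Julia Evans <x@y.com> 1683751582 -0400\tcommit: no ligatures in code")
entry = parse_reflog_line(line)
print(entry.user, entry.timestamp, entry.message)
```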
### remote-tracking branches
remote-tracking branches store the most recently seen commit ID for a remote branch
`.git/refs/remotes/origin/main`
`a9bbcae`
when git status says "you're up to date with `origin/main`", it's just looking at this
### .git/config
.git/config is a config file for the repository. it's where you configure your remotes
`.git/config`
```
[remote "origin"]
	url = git@github.com:jvns/int-exposed
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "main"]
	remote = origin
	merge = refs/heads/main
```
git has local and global settings: the local settings are here, and the global ones are in `~/.gitconfig`
### hooks
hooks are optional scripts that you can set up to run (eg before a commit) to do anything you want
`.git/hooks/pre-commit`
```
#!/bin/bash
any-commands-you-want
```
### the staging area
the staging area stores files when you're preparing to commit
`.git/index`
`(binary file)`
### there are 3 options for combining branches
* `merge`
* `rebase`
* `squash`
for example, let's say we're combining these 2 branches:
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of one box with a hash symbol, and branch 2, which consists of a box with a spiral, followed by a box with a squiggle.
### panel 2:
1. `git rebase`
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a box with a spiral, then a box with a squiggle. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 is made up of dotted lines and labelled "orphan".
2. `git merge`
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branches 1 and 2 both lead into a new box, with a diamond.
3. `git merge --squash`
Diagram: A box with a heart. To its right is a box with a star. From here, it branches out into branch 1, which consists of a box with a hash symbol, followed by a new box containing both a squiggle and a spiral. Branch 2 consists of a box with a spiral, followed by a box with a squiggle. Branch 2 is made up of dotted lines and labelled "orphan".
### all 3 methods result in the EXACT SAME FILES
some differences are:
* the diff git shows you for the final commit
* the specific flavour of suffering the method causes
### merge
pro: if you mess something up, the original commits are still in your branch's history
pain: when I look at histories like this I feel dread
Diagram: a complicated git history with a number of different branches.
### rebase
pro: you can keep your git history simple:
Diagram: a git history that is just a series of boxes in a straight line.
pain:
- harder to learn [sad face]
- harder to undo [sad face]
- easier to mess up [sad face]
(I love rebase though!)
### squash
pro: have 20 messy commits? nobody needs to know!
And it's pretty simple to use.
pain: "ugh, someone squashed their 3000-line branch into 1 commit"
### when pushing/pulling, the hardest problems are caused by diverged branches
sad error messages:
```
! [rejected]        main -> main (non-fast-forward)
```
`fatal: Not possible to fast-forward, aborting`
`fatal: Need to specify how to reconcile divergent branches.`
### what are diverged branches
it looks like this:
Diagram with two blank boxes, followed by a box with a heart in it, that then branches out into two branches, one with a hash symbol in it, labelled "local main", and one with a squiggle in it, labelled "remote main".
### there are 4 possibilities with a remote branch
1. up to date (with a heart)
Illustration of three boxes in a row, labelled both "local" and "remote"
2. need to pull
Illustration of four boxes in a row. The second box in the sequence is labelled "local", and the fourth box is labelled "remote".
3. need to push
Illustration of four boxes in a row. The second box in the sequence is labelled "remote", and the fourth box is labelled "local".
4. diverged (need to decide how to solve it) (sad face)
Illustration of two boxes in a row, that then branches out into two branches. One of the branches has one box, labelled "remote", and the other branch has two boxes, labelled "local".
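The four cases come down to one question: is one branch's latest commit already in the other branch's history? Here's a toy Python sketch of that check, assuming the history is a made-up dictionary from each commit to its parents. It's just the idea, not how git stores or computes it. The commit names are the symbols from the diverged-branches diagram.

```python
from typing import Dict, List, Set


def ancestors(commit: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the commit plus every commit reachable by following parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def compare(local: str, remote: str, parents: Dict[str, List[str]]) -> str:
    if local == remote:
        return "up to date"
    if local in ancestors(remote, parents):
        return "need to pull"   # remote has commits that local doesn't
    if remote in ancestors(local, parents):
        return "need to push"   # local has commits that remote doesn't
    return "diverged"           # each side has commits the other doesn't


# Hypothetical history: star <- heart, then heart splits into hash and squiggle
parents = {"heart": ["star"], "hash": ["heart"], "squiggle": ["heart"]}
print(compare("hash", "squiggle", parents))   # diverged
print(compare("heart", "squiggle", parents))  # need to pull
print(compare("hash", "heart", parents))      # need to push
```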
### how to tell your branches have diverged: `git status`
1. `$ git fetch` (get the latest remote state first)
2. `$ git status`
Your branch and '`origin/main`' have diverged, and have 1 and 1 different commits each, respectively.
(use "`git pull`" to merge the remote branch into yours)
(diverged is highlighted)
### fix diverged branches before making more commits
First illustration: two boxes in a row, then branches out into two branches, each with one box. It's labelled "not so bad to resolve..."
Second illustration: two boxes in a row, then branches out into two branches, but each branch has a whole bunch of boxes.
Illustration of a sad stick figure with curly hair.
person: oh no
### there's no one solution
Illustration of a smiling stick figure with curly hair.
person: on the next page we'll talk about some options!
### ways to reconcile two diverged branches
Illustration of a sequence of boxes joined with lines. The first box is a star, the second box is a heart, and then it branches out into two boxes, one with a hash symbol and one with a squiggle. Hash symbol box is labelled "local main" and squiggle box is labelled "remote main"
- combine the changes from both with (1) rebase or (2) merge!
- throw out your local changes (3), optionally saving them to a new branch first!
- throw out the remote changes (4) to get rid of something you accidentally pushed (be REAL careful with this one)
### 1. rebase
```
git pull --rebase
git push
```
Illustration of four boxes (star, heart, squiggle, hash) in a straight line, labelled "local main" and "remote main"
Illustration of a tiny little smiling stick figure with puffy hair in the corner of the panel.
person: this one's my favourite!
### 2. merge
```
git pull --no-rebase
git push
```
Illustration of two boxes (star and heart) that then diverge into two branches (hash and squiggle) then reconvene into a fifth box, with a diamond in it, labelled "local main" and "remote main"
### 3. throw away local changes
```
git switch -c newbranch
git switch main
git reset --hard origin/main
```
(the first line is labelled "optional: save your changes on `main` to `newbranch` so they're not orphaned")
Illustration of two boxes (star and heart) that then diverge into two branches (hash and squiggle), which are labelled "new branch" and "local main, remote main" respectively.
### 4. throw away remote changes (DANGER!)
```
git push --force
```
Illustration of two boxes (star and heart) that then diverge into two branches one with a hash symbol, labelled "local main, remote main", and one with a squiggle, whose box is a dotted line, and that's labelled "orphan".
(`--force` is always dangerous, `--force-with-lease` is a little safer)
### reasons to throw away changes
- I'll throw away local changes if I accidentally committed to `main` instead of a new branch
- I'll throw away remote changes if I want to amend a commit after pushing it, and I'm the only one working on that branch
### HEAD is a tiny file containing the name of the current branch
Diagram of three boxes in a row, joined by lines. One has a heart, one has a star, and one has a squiggle. The final one, with the squiggle, is labelled "`main`".
`HEAD` = `main`
`main` = [squiggle]
### when you commit, git updates the current branch to point at the new commit
Diagram of three boxes in a row, joined by lines. One has a heart, one has a star, and one has a squiggle. The final one, with the squiggle, is labelled "`main`".
`HEAD` = `main`
`main` = [squiggle]
Diagram of four boxes in a row, joined by lines. One has a heart, one has a star, one has a squiggle, and one has a spiral. The final one, with the spiral, is labelled "`main`".
`HEAD` = `main`
`main` = [spiral]
### SO MANY things in git use the current branch
* `git commit` moves it forward
* `git merge` merges into it
* `git rebase` copies commits from it
* `git push` and `git pull` sync it with a remote
### many git disasters are caused by accidentally running a command while on the wrong branch
Illustration of a sad stick figure
person: `git commit`
person, thinking: UGH I didn't mean to do that on `main`
### I keep my current branch in my shell prompt
`~/work/homepage (main) $`
to me it's as important as knowing what directory I'm in
### panel 6
Illustration of a smiling stick figure with curly hair.
person: I think `HEAD` is a weird name for the current branch (why not `CURRENT` or something?) but we're stuck with it
### `HEAD` isn't always a branch
it can be a commit instead
`git checkout a3ffab9`
(a3ffab9 isn't a branch!)
git calls this "detached `HEAD` state"
### by itself, HEAD being a commit ID is okay
Illustration of a smiling stick figure with curly hair.
person: it's a great way to look at an old version of your code! I do it all the time!
### the only problem is that new commits you make will be orphaned
Diagram of a series of circles connected by lines, labelled "main". The first circle is labelled `HEAD`. There is a dotted line branching off `HEAD` to an additional circle. The additional circle is labelled "new commit will go here, danger! it won't be on any branch!"
### some ways `HEAD` can become a commit ID
`git checkout a3ffab3`
(`a3ffab3` is the commit id)
`git checkout origin/main`
(`origin/main` is the "remote-tracking branch")
`git checkout v1.3`
(`v1.3` is a tag)
### if you accidentally create some orphaned commits, it's SUPER easy to fix
just create a new branch!
`git switch -c newbranch`
### my shell prompt tells me if `HEAD` is a commit
`~/work (d63b29) $`
`d63b29` tells me to avoid creating new commits
(no `git commit`, `git merge`, or `git rebase`)
### branches have very few rules
git lets you move branches forwards, backwards, or sideways if you want
Illustration of three circles in a vertical line, with an additional branch extending out of the middle circle. The top circle is labelled "`main`". The middle circle is labelled "You could move `main` here." The circle in its own branch is labelled "or here."
### all changes to a branch are recorded in its reflog
You can look at the reflog like this:
`git reflog BRANCHNAME`
reflog stands for "reference log"
### when you delete a branch, its reflog is deleted
Illustration of a sad stick figure with short curly hair, talking to a box with a smiley face representing git.
person: what if I wanted to look at the history of that branch to recover something?
git: too bad!
### git will eventually delete any commit that isn't on a branch/tag/etc
Illustration of four circles in a vertical line. The top one is labelled "`main`". There is a branch coming off of the second-from-bottom circle, and it is labelled "will be deleted by garbage collection after ~90 days unless you put it on a branch."
### `git branch -d` won't let you delete unmerged branches
Illustration of three circles in a vertical line. The top one is labelled "`main`". There is a branch coming off of the bottom circle, labelled "my branch (not merged)"
to delete an unmerged branch, you need to force it with `-D`
### rules git doesn't have about branches
- when you push/pull a branch, the name doesn't have to match
- the main branch doesn't have any special protections in git itself (though tools like GitHub can protect it)
### git often uses the term "reference" in error messages
```
$ git switch asdf
fatal: invalid reference: asdf
$ git push
To github.com:jvns/int-exposed
! [rejected] main -> main
error: failed to push some refs to 'github.com:jvns/int-exposed'
```
"ref" and "reference" mean the same thing
Illustration of a tiny worried-looking stick person with a thought bubble reading "!"
### "reference" often just means "branch"
in those two error messages, you can replace "reference" with "branch"
in my experience, it's:
96% "branch"
3% "tag"
3% "HEAD"
0.01% something else
### it's an umbrella term
Illustration of git, represented by a box with a smiley face
git, thinking: "well, I COULD check if the thing we failed to push is a branch or tag or what, and customize the error message based on that...."
git, thinking: "seems complicated, let's just print out 'reference'"
sad person: "why?"
### reference: the definition
References are files: either `.git/HEAD` or files in `.git/refs`. There are 5 main types.
Here's a list of every type of git reference that I have ever used:
- HEAD: `.git/HEAD`
- branches: `.git/refs/heads/BRANCH`
- tags: `.git/refs/tags/TAG`
- remote-tracking branches: `.git/refs/remotes/REMOTE/BRANCH`
- stash: `.git/refs/stash`
all of these files contain a commit ID, but the way that commit ID is used depends on what type of reference it is
(examples of more obscure references are `.git/FETCH_HEAD` and `.git/refs/notes/...` but I've never needed to think about those and your repository probably doesn't even have notes)
### git's garbage collection starts with references
the algorithm is:
1. find all references, and every commit in every reference's reflog
2. find every commit in the history of any of those commits
3. delete every commit that wasn't found
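Those three steps can be sketched in Python as a graph search, assuming a simplified model where the history is a made-up dictionary from each commit to its parents. The real `git gc` also expires old reflog entries and gives unreachable objects a grace period before deleting them; this sketch skips all of that. The demo reuses the `git commit --amend` example from earlier: `buug` survives only while a reflog still mentions it.

```python
from typing import Dict, Iterable, List, Set


def reachable(starts: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Step 2: every commit in the history of any starting commit."""
    found: Set[str] = set()
    stack = list(starts)
    while stack:
        commit = stack.pop()
        if commit not in found:
            found.add(commit)
            stack.extend(parents.get(commit, []))
    return found


def garbage_collect(all_commits: Set[str],
                    refs: Dict[str, str],
                    reflogs: Dict[str, List[str]],
                    parents: Dict[str, List[str]]) -> Set[str]:
    """Return the set of commits that would be deleted."""
    # Step 1: commits that references point at, plus every commit in their reflogs
    starts = set(refs.values())
    for entries in reflogs.values():
        starts.update(entries)
    # Steps 2 and 3: delete whatever isn't in the history of those commits
    return all_commits - reachable(starts, parents)


parents = {"buug": ["parent"], "bug": ["parent"]}
commits = {"parent", "buug", "bug"}
refs = {"refs/heads/main": "bug"}
print(garbage_collect(commits, refs, {"HEAD": ["buug", "bug"]}, parents))  # set()
print(garbage_collect(commits, refs, {}, parents))                         # {'buug'}
```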
### many git disasters are caused by accidentally running a command while on the wrong branch...
Illustration of a stick figure with a neutral expression.
person: `git commit`
person, thinking: UGH I didn't mean to do that on `main`
### ... or by forgetting you're in the middle of a multistep operation
smiling stick figure with curly hair: la la la just writing code
same person, now distressed and surrounded by exclamation marks: OMG I FORGOT I WAS IN THE MIDDLE OF A MERGE CONFLICT
### I always keep track of 2 things
1. am I on a branch, or am I in detached `HEAD` state?
2. am I in the middle of some kind of multistep operation? (`rebase`, `merge`, `bisect`, etc)
### I keep my current branch in my shell prompt
`~/work/homepage (main) $`
to me it's as important as knowing what directory I'm in
git comes with a script to do this in bash/zsh called `git-prompt.sh`
### decoder ring for the default git shell prompt
`(main)`
on a branch, everything is normal
`((2e832b3...))`
`((v1.0.13))`
the double brackets (( )) mean `detached HEAD state`. this prompt can only happen if you explicitly `git checkout` a commit/tag/remote-tracking branch
`(main|CHERRY-PICK)`
`(main|REBASE 1/1)`
`(main|MERGING)`
`(main|BISECTING)`
in the middle of a cherry-pick/rebase/merge/bisect
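A prompt script can tell these states apart by looking at a few files in `.git`. Here's a simplified Python sketch, assuming these marker files: git creates `MERGE_HEAD`, `CHERRY_PICK_HEAD`, `BISECT_LOG`, or a `rebase-merge` / `rebase-apply` folder while those operations are in progress. It is not the actual logic of `git-prompt.sh`, and it skips details like the `1/1` rebase progress and showing tag names when `HEAD` is detached. The function names are made up.

```python
import tempfile
from pathlib import Path
from typing import Optional


def operation_in_progress(git_dir: Path) -> Optional[str]:
    """Guess which multistep operation (if any) is in progress."""
    if (git_dir / "rebase-merge").is_dir() or (git_dir / "rebase-apply").is_dir():
        return "REBASE"
    if (git_dir / "MERGE_HEAD").exists():
        return "MERGING"
    if (git_dir / "CHERRY_PICK_HEAD").exists():
        return "CHERRY-PICK"
    if (git_dir / "BISECT_LOG").exists():
        return "BISECTING"
    return None


def prompt_label(git_dir: Path) -> str:
    """Build a label like "(main)", "(main|MERGING)", or "((2e832b3...))"."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: refs/heads/"):
        where = head[len("ref: refs/heads/"):]      # on a branch
    else:
        where = "(" + head[:7] + "...)"              # detached HEAD: short commit ID
    operation = operation_in_progress(git_dir)
    return f"({where}|{operation})" if operation else f"({where})"


# Demo on a fake .git directory
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    print(prompt_label(git_dir))                     # (main)
    (git_dir / "MERGE_HEAD").write_text("2e832b3\n")
    print(prompt_label(git_dir))                     # (main|MERGING)
```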
### merging 2 diverged branches creates a commit
`git merge mybranch`
Diagram of two boxes in a row, one with a heart, and one with a star. From the star, it branches out into a branch with a hash symbol, labelled `main`. The other branch coming off of the star has a box with a spiral followed by a box with a spiky symbol. The two branches converge in a box with a diamond symbol, labelled "merge commit!".
merge commits have a few surprising gotchas!
### gotcha: merging isn't symmetric
normal:
```
git checkout main
git merge mybranch
```
weird:
```
git checkout mybranch
git merge main
```
these two result in the same code, but the merge commit's parents have a different order
This comes up when you use `HEAD^`: it refers to the first parent, and usually you want that to be the commit from the main branch
### gotcha: you can keep coding during a merge
If you forget you're doing a merge, it's easy to accidentally keep writing code and add a bunch of unrelated changes into the merge commit.
I use my prompt to remind me.
### gotcha: git show doesn't tell you what the merge commit did
It'll often just show the merge commit as "empty" even if the merge did something important (like discard changes from one side).
Illustration of a tiny sad stick person with curly hair
person: why
### tip: see what a merge did with `git show --remerge-diff`
`git show --remerge-diff COMMIT_ID`
will re-merge the parents and show you the difference between that automatic re-merge and what's actually in the merge commit
### panel 1
Illustration of a smiling stick figure with curly hair.
person: I find submodules confusing and I avoid them if possible, but here's what I've learned from other people's writing on submodules
(especially Dmitry Mazin's great "Demystifying Git Submodules" post)
### submodules let you store another git repository as a subdirectory
```
git submodule add https://github.com/jvns/myrepo ./myrepo
```
(`https://github.com/jvns/myrepo` is the remote repository URL, `./myrepo` is the local path)
Git will store the commit ID and URL of the submodule
### gotcha: cloning a repository doesn't download its submodules
To get the submodules, you can run this after cloning the repository:
`git submodule update --init`
### gotcha: git pull and git checkout don't update submodules
To actually update them, you have to run:
`git submodule update`
every single time you switch branches or pull
### gotcha: git submodule update puts the submodule in detached HEAD state
might not be a big deal if you're only using the submodule in a read-only way, but seems like it could get weird if you're editing it
### some submodule config options
automatically update submodules after a pull/checkout:
`submodule.recurse true`
show which commits were added/removed in `git diff/git status`:
```
status.submoduleSummary true
diff.submodule log
```
### git worktree lets you have 2 branches checked out at the same time
Illustration of a smiling stick figure with curly hair, and a git worktree, represented by a box with a smiley face
person: ugh, I want to take a look at this other branch, but I have all these uncommitted changes...
git worktree: i can help!
### creating a worktree
You can check out a branch into a new directory like this:
`git worktree add ~/my/repo mybranch`
(`~/my/repo` is the directory, `mybranch` is the branch)
Then you can run any normal git commands in the new directory:
```
$ cd ~/my/repo
$ git pull
```
### two worktrees can't have the same branch checked out
Here's what happens if you try:
```
$ git checkout main
fatal: main is already checked out at /home/bork/work/homepage
```
### it's way faster (and uses less space!) than cloning the repository again
Because worktrees share a .git directory, it just needs to check out the files from the branch you want to use!
### other worktree commands
List all worktrees:
`$ git worktree list`
Delete a worktree:
`$ git worktree remove ~/my/repo`
### sometimes I use worktrees to keep my .git directory and its checkout separate
this lets me put the checkout in Dropbox but not the .git directory:
```
$ git clone --bare git@github.com:jvns/myrepo
$ cd myrepo.git
$ git worktree add ~/Dropbox/myrepo main
```
(`~/Dropbox/myrepo` is the directory, `main` is the branch)
### `git add -p` lets you stage some changes and not others
I use this if I want to commit my real changes, but not the random debugging code I added.
(this is one of the tasks GUIs and IDEs are best at, but I always use `git add -p` anyway)
### what the interface looks like
```
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
  "name": "homepage",
- "version": "1.0.0",
+ "version": "1.0.1",
  "devDependencies": {
- "dart-sass": "^1.25.0"
+ "dart-sass": "^1.26.0",
(1/1) Stage this hunk [y,n,q,a,d,s,e,?]?
```
package.json is the filename
lines 4-9 are the diff
`[y,n,q,a,d,s,e,?]` is your choice
### y(es)/n(o)/q(uit)
y means "stage this change"
n means "don't"
q quits, keeping what you did so far. pretty straightforward.
### how to check your work
`git diff --cached`
will show your staged changes
### s: split into two parts
s will split a diff into smaller diffs you can say y or n to individually, like this:
```
+++ b/package.json
@@ -1,7 +1,7 @@
- "version": "1.0.0",
+ "version": "1.0.1",
  "devDependencies": {
```
BUT! This only works if there's an unchanged line between the two parts.
### how to split a diff if there's no unchanged line between the changes
You can use the e ("edit") option to edit the diff manually:
- to remove a - line, replace "-" with a space
- to remove a + line, delete the whole line
version 1:
```
  "name": "homepage",
- "version": "1.0.0",
- "devDependencies": {
+ "version": "1.0.1",
+ "devDependenciezzz": {
```
version 2:
```
  "name": "homepage",
- "version": "1.0.0",
+ "version": "1.0.1",
  "devDependencies": {
```
(or you can just say 'n' and edit your code! that's what I do!)
### PATH is how your shell knows where to find programs
Illustration of a smiling stick figure with curly hair, and shell, represented by a box with a smiley face.
person: run `python3`
PATH is
```
/bin
/home/bork/bin
/usr/bin
```
shell, thinking:
`/bin/python3`? nope, doesn't exist
`/home/bork/bin/python3`? nope, doesn't exist
`/usr/bin/python3`? there it is!!! I'll run that!
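The shell's search can be sketched in a few lines of Python, assuming a Unix-like system. `find_program` is a made-up name; Python's standard library already has `shutil.which`, which does this for real. Because the loop returns the first match, the order of the folders in `PATH` decides which program wins.

```python
import os
import tempfile
from typing import Optional


def find_program(name: str, path: str) -> Optional[str]:
    """Return the first executable called `name` in the folders listed in `path`."""
    for folder in path.split(os.pathsep):       # os.pathsep is ":" on Unix
        candidate = os.path.join(folder, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate                    # first match wins
    return None                                 # "command not found"


# Demo: two fake PATH folders, and only the second one has python3
with tempfile.TemporaryDirectory() as tmp:
    bin_dir = os.path.join(tmp, "bin")
    usr_bin = os.path.join(tmp, "usr", "bin")
    os.makedirs(bin_dir)
    os.makedirs(usr_bin)
    program = os.path.join(usr_bin, "python3")
    with open(program, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(program, 0o755)
    print(find_program("python3", os.pathsep.join([bin_dir, usr_bin])))
```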
### how to add a program to your PATH
1. find the folder the program is in
2. update your shell config to add it to your `PATH`
3. restart your shell, for example by opening a new terminal tab
### ...but how do you find the folder
* think about how you installed it
person (thinking): hmm, I used the Rust installer, where does that install things?
* a brute force search
`find / -name python3 | grep bin`
### `PATH` ordering drama
person (thinking): ugh, no, don't run THAT `python3`, run the other one!
You can prioritize a folder by adding it to the beginning of your `PATH`
### gotcha: not everything uses your shell's `PATH`
cron jobs usually have a very basic `PATH`, maybe just `/bin` and `/usr/bin`
In a cron job I'll use the absolute path, like:
`/home/bork/bin/someprogram`
### quitting a terminal program isn't always easy
Illustration of a stick figure with short curly hair. They look distressed and have an exclamation mark above their head.
person (thinking): "I pressed `Ctrl-C` 17 times and NOTHING HAPPENED"
### ways to quit
- `Ctrl-C` - the default
- `Ctrl-D` - if you're at a REPL prompt (like Python's `>>>`)
- `q` - if it's a full screen program
- `Ctrl-\` - sometimes works if `Ctrl-C` doesn't
- `kill -9` - the last resort
### how `Ctrl-D` works
programs that read input will usually have some code like this:
```
text = read_line()
if (text == EOF) {
exit()
}
```
`Ctrl-D` is how you send an EOF to the program ("I'm done!")
important: `Ctrl-D` ONLY works if you press it on an empty line
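Here's the same idea as runnable Python, as a sketch. In Python, `readline()` returns an empty string only at EOF, while a blank line comes back as `"\n"`. This also explains the empty-line rule: in the terminal's normal line-editing mode, Ctrl-D on a line with text just sends that text to the program (without a newline), and it only reads as EOF when nothing is pending. `read_until_eof` is a made-up name; to try it interactively, call `read_until_eof(sys.stdin)`, type a few lines, and press Ctrl-D on an empty line.

```python
import io
from typing import List, TextIO


def read_until_eof(stream: TextIO) -> List[str]:
    """Collect lines until EOF, the way a simple REPL loop would."""
    lines = []
    while True:
        text = stream.readline()
        if text == "":              # "" means EOF; a blank line would be "\n"
            break
        lines.append(text.rstrip("\n"))
    return lines


# io.StringIO stands in for the terminal here
print(read_until_eof(io.StringIO("hello\n\nworld\n")))   # ['hello', '', 'world']
```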
### how `Ctrl-C` works *
`*` unless your program is in "raw mode", we'll talk about that later
person, smiling: "`ctrl-C`"
terminal emulator, represented by a box with a dollar sign: "ok, C is the 3rd letter of the alphabet, I'll write 3 to the tty"
OS terminal driver, represented by a box labelled "OS": ah, a 3, that means I should send the `SIGINT` signal to the current program
program, represented by a box with a smiley face: ooh, a `SIGINT`, I will [shutdown gracefully, immediately exit, ignore it, stop a subtask, etc]
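Two pieces of this are easy to poke at from Python. This is a small sketch that assumes a Unix-like system. First, the "3": Ctrl plus a letter keeps only the low 5 bits of the letter's ASCII code, which works out to its position in the alphabet (so Ctrl-D is 4). Second, a program chooses what to do with `SIGINT` by installing a handler. Here the handler just records that it happened. `signal.raise_signal` (Python 3.8+) stands in for the OS delivering the signal, and Python only lets the main thread install signal handlers. `control_code` and `on_sigint` are made-up names.

```python
import signal


def control_code(letter: str) -> int:
    """The byte a terminal sends for Ctrl+<letter>: the letter's ASCII code & 0x1F."""
    return ord(letter.upper()) & 0x1F


interrupted = False


def on_sigint(signum, frame):
    """Handle SIGINT by remembering it instead of exiting immediately."""
    global interrupted
    interrupted = True


print(control_code("C"), control_code("D"))         # 3 4

previous = signal.signal(signal.SIGINT, on_sigint)  # install our handler
signal.raise_signal(signal.SIGINT)                  # simulate pressing Ctrl-C
signal.signal(signal.SIGINT, previous)              # put the old handler back
print("interrupted:", interrupted)                  # interrupted: True
```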
### some programs have weird quitting incantations
for example every text editor (vim, nano, emacs, etc) has its own completely unique way to quit
### editing text you typed in seems so basic:
`>>> print("helo")`
oops, forgot an l!
but there's actually no standard system
### programs need to implement even the most basic things
Illustration of a little smiling stick figure with curly hair.
person: "left arrow"
program, represented by a box with a smiley face: "ok I will move the cursor to the left"
often programs will use the readline library for this
### option 1: NOTHING
person (angry): "even the ARROW KEYS don't work???"
program (blissfully content): arrow keys? what's that?
* Only `Ctrl-W`, `Ctrl-U`, and backspace work
* Examples: `cat`, `nc`, `git`
* You're probably in this situation if you press the left arrow key and it prints `^[[D`
* You can often add readline shortcuts with `rlwrap`, like this:
`$ rlwrap nc`
### option 2: READLINE
person (neutral): "it's a little awkward but at least I can use those weird keyboard shortcuts from emacs!"
* LOTS of keyboard shortcuts: `Ctrl-A`, `Ctrl-E`, arrow keys, many more
* You can use `Ctrl-R` to look at history
* Examples: `bash`, `irb`, `psql`
* If you press `Ctrl-R` and you see "reverse-i-search", you're probably using readline
* Configurable with the `~/.inputrc` config file
### option 3: CUSTOM
person (smiling): "wow, I can type a multiline command without it being a total disaster?? amazing!"
* The keyboard shortcuts are probably influenced by readline
* Examples: `fish`, `zsh`, `ipython`
* usually you only see custom implementations in bigger projects