Notes: Computer Networks

This is an ongoing course, which is so far just half complete. These are my notes from the same.

Introduction

Jan 1 1983 — internet flag day. Arpanet transitioned to TCP/IP

1982 DNS was rolled out that replaced the host.txt file that used to have a list of all the hosts of the network

1988 TCP congestion control was rolled out

1989 BGP and other routing protocols were rolled out

Packet switching

Used to achieve the goal of multiplexing on a network connection

Which means that multiple users can use the same network to send and receive packets. Unlike circuit switching, which is used while making phone calls, no connection is reserved by any party for the duration that it is used.

Pros: better use of resources,

Cons: variable delays, may sometimes lead to lost packets.

Switches

The problems like flooding or broadcasting hell that arise due to a cyclic physical topology in a subnet is solved by using spanning trees to decide how the network packets should be transmitted

Cyclic networks are created to have redundant paths in case of connection failures

Intra Domain AS topology

AS -- Autonomous Systems

nodes - Points of presence (POPs)

edges — constrained by fiber paths

two types of Intra Domain routing

Distance vector

sends vetors to neighbors
routers compute cost to each destination for shortest path
based on bellman ford algorithm
Problem — count to infinity problem. Solution — poison reverse
Eg. Routing information protocol (RIP)
- split horizon
- circa 1982
- refresh 30 sec or when updates occur
- edges have unit cost
- inf = 16

Link State

Eg: OSPF, ISIS
uses dijkstra for shortest path computation
Doesn’t scale well, so has the concept of dividing networks into hierarchy layers

Inter domain AS routing

The protocol → BGP

Nodes publish route advertisements

border router of two neighbouring AS would/might be in the same subnet, typically a /30 subnet
BGP has two types
- External BGP (EBGP): To exchange routing information between AS
- Internal BGP (IBGP): To share the received routing information (received on border router) to the multiple internal routers of the AS
  - should not be confused with IGP which is an intra domain routing protocol
    
    IGP vs iBGP

BGP route selection process

Local preference

not advertised in route advertisements
Set by the operator of an AS to prefer one route over the other
- Control outbound traffic
AS can also use BGP communities to control inbound traffic paths
- It can define a “primary” and “backup” route in route advertisements to prefer one ingress route over the other
setting BGP communities requires prior agreements

Multiple Exit Discriminator (MED)

used in cases where there are multiple peer connections between two AS
MED supersedes IGP path cost calculation, hence avoiding hot potato routing in some cases
IGP cost changes can lead to too many changes in internally saved routes for IP prefixes. Hence MED helps decouple BGP from IGP
Eg. When a Transit provider peers with a CDN. Transit provider avoids to carry too much of the CDN’s traffic and advertises med values in a way that makes the CDN route long distances within its network

Inter domain routing business

The concept of routing can be seen as the concept of routing money.

Types of neighbors

There are 3 types of neighboring networks for an AS

Customers
Peers
Providers

How money flows

When routing, money always flows from the customer to the next network. Hence customers are a source of income
It is free to route to peers
Money flows towards the providers. Hence providers have to be paid whenever you route through or for them

How money dictates routing decisions on the internet

Rules

Based on the incentive of making more money.

If every network would follow these rules then routing would be guaranteed on the internet, but it turns out that routing is not guaranteed on the internet.

Ranking

Which network will be preferred for the next hop (although these rules do not seem to be a part of BGP)

Customer > Peer > provider

Filtering/Export

An operator has to decide which of it’s learned routes should be advertised

It will always advertise routes to a neighboring customer node to everyone because this will help it make money
It will only advertise peer and provider routes to a customer node because only then it has a chance to make money
- Otherwise, worst case, it advertises a provider’s route to another Provider and then traffic between these two providers passes through it and hence it now has to pay both these providers for the privilege of routing data for them!
  
  How filtering is done to maximise profits

IP addressing

Classful addressing

IP classes -- Identifier, Network ID and Host ID masks

IPv4 prefixes recorded in BGP announcements Routing table growth

IP address allocation on modern internet

IANA is responsible to assign IP addresses to various IP registeries

whois is a tool used to query an IP address in an IP registery

CIDR

in case of overlapping address prefixes, those with longest prefix for net id are preferred in case of routing

Multihoming

Not good for hierarchical IP aggragation

Happens when a stub AS decides to advertise it’s address to other carriers.

Even though it’s original aggregator is already advertising a parent address space (say /8) but another network might now be advertising the longer AS prefix (of say /24)

The longer prefix will be chosen because of the rules, and so to avoid that the original parent network also has to advertise and store IPs of its hosts in its routing table. This leads to longer prefix addresses populating in global address registries

Multihoming leads to longer prefixes in higher levels

Just like AS path pretending, Longest prefix match can be used to control inbound traffic

The address of D is subnet into two equal sized addresses and advertised separatedly, giving D control over loadbalancing routes for ingress

The weekly CIDR report shows which AS are advertising overlapping prefixes and the amount of improvements that can make on their current aggregation techniques

Using tries for lookups

Single bit trie — Efficient memory usage but requires too many memory accesses for a 32 bit lookup. This increases lookup times significantly
Direct trie — instead of single bit lookup we partition the address into two parts of variable bit lengths
- Fast lookup compared to Single bit trie
- But this results in inefficient memory use
  
  Breaking address into parts of 24 and 8 bits for Direct trie

Fast and memory efficient

Use multi-ary trie

Eg.

Further optimizations can include techniques like

creating a leaf pushed trie to avoid one extra memory access on leaf nodes when the node above it is empty

The parent of these nodes is empty and can be used to store the data of it's children to avoid an extra memory lookup for next level in the trie

Now that previously empty node that only contained the address of the two children, contains the actual data of those nodes (probably indexed).

Other techniques are called
- Lulea
- patricia

Alternatives

Didn’t quite understand this

NAT — Network Address translator

Due to scarcity of ip addresses, networks have internal private IPs. These are specified in RFC 3130 (as said in the course, but found concrete details in RFC 1918)

Private IP ranges as mentioned in RFC 1918

So a NAT just creates a mapping from the private ip and port to the public ip and port. This way they can use the same public ip and keep changing the port, thereby eliminating the need to assign unique address to all the devices on the network.

NAT violates internet rules

The e2e rule is violated because if a NAT fails, then the received packets will not reach the right host inside the network
They also restrict direct bi-directional communication between two hosts in different private networks

IPv6

need to better understand the breakdown of the address bits

Note that the address has 13 bits reserved for Tier-1 ISP

So if you change providers the source address changes

To Look into How does QUIC fix this problem with streams? Is it just because it uses UDP for now and might break when it used independently on the transport layer?

Incremental deployment

Deployment is hard due to protocol ossification (will link to soon to be released notes of QUIC, but here's the gist)

Wikipedia: Protocol ossification is the loss of flexibility, extensibility and evolvability of network protocols This means that introducing new protocols is hard because routers and other middleboxes violate the end-to-end principle.

In short, the current middle boxes are only designed to understand IPv4, not v6

But we can’t just replace these with IPv6 compatible versions of the equipment because it would then break the currently running IPv4 backed infrastructure ⇒ Hence they did incremental deployment

Dual stack deployment

One approach to incremental deployment would be to use a dual stack middle box.

Meaning one that can understand both IPv4 and IPv6

There can be two approaches to building a dual stack infrastructure

Different approaches to dual stack deployments

v6 to 4 tunneling

Basically sending the v6 packet as data inside a v4 packet in a v4 silo

The v4 network acts as carrier for v6 packets between compatible networks

I wonder how the routing negotiations for this work, and if this is even considered. Would this be advertised in BGP route announcements? Or is it factored in the calculation for MED? Would this even have a significant increase in RTT to even be considered as a route deciding factor? Probably not

Router

A router contains line card that is I guess the core of what executes all the functionality

Every line card has these components to carry out the following roles

Different components and tasks of a Line Card

Roles of a router

Receive a packet
Read packet headers to determine destination
Search for entry in forwarding table to determine the correct output interface for the packet
Modify packet headers, like
- Decrement the TTL
- Updating packet checksum appropriately
Send the packet to the output interface
- This can be more trickier than just sending it to the designated port, because the port will be accessible only through the router’s interconnection frabric
  
  Routers have an interconnection fabric that connects input ports to output ports

Important decisions

Copies of forwarding tables

Earlier, all the line cards on a router used a central lookup table over a shared bus.

But this became a bottle neck

Now every line card has a copy of the look up table.

This although has it’s own complications of handling coherence among the various copies, but it is much more faster

Crossbar Switching

Switching happens in timeslots

Switching algorithm: Maximal matching

Even though we can only have one connection between the input and output port through the interconnect, per timeslot, we can speed up matching by doing multiple matching in the same time slot, if there are multiple matching required for a given port in a given timeslot

For eg, if the interfaces have a 10Gbps connection, then it is possible to increase number of matching per timeslots by have interconnect speed more than that

This is termed speedup!

Head of line blocking

But speedup does not work if the port sending the packet only has one queue, because even though the interconnect is capable of handling multiple connections now, the packets are still processed synchronously because of the single packet queue

Solution: virtual output queues

Having separate output queues for designated output ports

Breakdown the packet buffer into virtually separate queues for individual output ports

Scheduling

The decision about which ports should be matched in any particular time slot

During which timeslots should input and output ports be matched

Considerations

Efficiency: If there is a packet to be sent and the output port of that packet is available, then the scheduling should be such that it is possible to switch connections quickly to send that packet

Fairness: Given demands at the inputs, each queue at the input is scheduled fairly!

max-min fairness

Here flow allocations are flow rates at input port

Eg:

The dividing by 4 is based on the number of inputs

Note:

that the demand exceeds capacity
demands of small users are met fully
for the large demand users (those whose demands are not fully met) the remaining capacity is shared equally

Different ways to achieve max-min fairness

There are various scheduling techniques to achieve max-min fairness

DNS

A DNS query leads to a recursive search, if the result is not already in cache

Caches

Caches are stored along with a TTL

Caches include NameServer record

Global records like those of google.com rarely change have a larger TTL

Local records like smaller websites have a smaller TTL

Record types

A: Name → IP
NS: Name → Authoritative nameserver ⇒ Leads to further queries, helps implement herarchy
MX: Name → Mail Server
CNAME: Canonical name ⇒ This is a pointer to another domain name that needs to be looked up
PTR: IP → Name ⇒ reverse Lookup
AAAA: Name → IPv6 Address

dig

There are various commands explained in the course, which are easy to learn with a simple 5 minute tutorial

But it is interesting to follow the trace for a PTR query 👇

Note the name server referral to in-addr.arpa at level 2 of the search

in-addr.arpa is an interesting top level nameserver for this

Congestion control

Congestion

Congestion collapse

Increase in load results in Decrease in useful work

After a certain level of load, effective work done decreases because of lost packets and retransmission delays

Causes:

Spurious retransmission of packets. Packets that do not receive ACK. Can be solved by better timers
Undelivered Packets. Resources consumed but dropped later

Congestion collapse isn’t just theory and has been observed in many networks

Congestion collapse can be prevented by proper congestion control

Goals of congestion control

Same as scheduling in router interconnect (efficiency + fairness) + avoid congestion collapse

Approaches

TCP congestion control

Sender increases rate until packets are dropped

TCP interprets packet loss as congestion control and slows down

Although this might not always be true. For eg. in wireless networks, packets can be lost due to interference

But this is generally a good assumption

I remember reading an article referring to TCP slow start that is a hack on this behaviour of TCP for better performance Link: Why your website should be under 14kB in size