Notes: Computer Networks

  1. internet
  2. networks
  3. router
  4. bgp
  5. routing
  6. tcp
  7. dns
  8. ip
  9. autonomous systems

This is an ongoing course, which is so far just half complete. These are my notes from the same.

Introduction

Jan 1 1983 — internet flag day. Arpanet transitioned to TCP/IP

1982 DNS was rolled out that replaced the host.txt file that used to have a list of all the hosts of the network

1988 TCP congestion control was rolled out

1989 BGP and other routing protocols were rolled out

Internet-problems
Internet-problems


Packet switching

Used to achieve the goal of multiplexing on a network connection

Which means that multiple users can use the same network to send and receive packets. Unlike circuit switching, which is used while making phone calls, no connection is reserved by any party for the duration that it is used.

Pros: better use of resources,

Cons: variable delays, may sometimes lead to lost packets.

Switches

The problems like flooding or broadcasting hell that arise due to a cyclic physical topology in a subnet is solved by using spanning trees to decide how the network packets should be transmitted

Spanning tree to avoid cyclic routing
Spanning tree to avoid cyclic routing

Cyclic networks are created to have redundant paths in case of connection failures


Intra Domain AS topology

AS -- Autonomous Systems

nodes - Points of presence (POPs)

edges — constrained by fiber paths

two types of Intra Domain routing

Distance vector

  • sends vetors to neighbors
  • routers compute cost to each destination for shortest path
  • based on bellman ford algorithm
  • Problem — count to infinity problem. Solution — poison reverse
  • Eg. Routing information protocol (RIP)
    • split horizon
    • circa 1982
    • refresh 30 sec or when updates occur
    • edges have unit cost
    • inf = 16

Link State

link state routing
link state routing

  • Eg: OSPF, ISIS
  • uses dijkstra for shortest path computation
  • Doesn’t scale well, so has the concept of dividing networks into hierarchy layers

Inter domain AS routing

The protocol → BGP

Nodes publish route advertisements

  • border router of two neighbouring AS would/might be in the same subnet, typically a /30 subnet
  • BGP has two types
    • External BGP (EBGP): To exchange routing information between AS
    • Internal BGP (IBGP): To share the received routing information (received on border router) to the multiple internal routers of the AS
      • should not be confused with IGP which is an intra domain routing protocol

        IGP vs iBGP
        IGP vs iBGP

BGP route selection process

BGP route selection
BGP route selection

Local preference

  • not advertised in route advertisements
  • Set by the operator of an AS to prefer one route over the other
    • Control outbound traffic
  • AS can also use BGP communities to control inbound traffic paths
    • It can define a “primary” and “backup” route in route advertisements to prefer one ingress route over the other
  • setting BGP communities requires prior agreements

Local preference in BGP announcements
Local preference in BGP announcements

Multiple Exit Discriminator (MED)

  • used in cases where there are multiple peer connections between two AS
  • MED supersedes IGP path cost calculation, hence avoiding hot potato routing in some cases
  • IGP cost changes can lead to too many changes in internally saved routes for IP prefixes. Hence MED helps decouple BGP from IGP
  • Eg. When a Transit provider peers with a CDN. Transit provider avoids to carry too much of the CDN’s traffic and advertises med values in a way that makes the CDN route long distances within its network

Inter domain routing business

The concept of routing can be seen as the concept of routing money.

Types of neighbors

There are 3 types of neighboring networks for an AS

  1. Customers
  2. Peers
  3. Providers

How money flows

  • When routing, money always flows from the customer to the next network. Hence customers are a source of income
  • It is free to route to peers
  • Money flows towards the providers. Hence providers have to be paid whenever you route through or for them

How money dictates routing decisions on the internet
How money dictates routing decisions on the internet

Rules

Based on the incentive of making more money.

If every network would follow these rules then routing would be guaranteed on the internet, but it turns out that routing is not guaranteed on the internet.

Ranking

Which network will be preferred for the next hop (although these rules do not seem to be a part of BGP)

Customer > Peer > provider

Filtering/Export

An operator has to decide which of it’s learned routes should be advertised

  • It will always advertise routes to a neighboring customer node to everyone because this will help it make money
  • It will only advertise peer and provider routes to a customer node because only then it has a chance to make money
    • Otherwise, worst case, it advertises a provider’s route to another Provider and then traffic between these two providers passes through it and hence it now has to pay both these providers for the privilege of routing data for them!

      How filtering is done to maximise profits
      How filtering is done to maximise profits


IP addressing

Classful addressing

IP classes -- Identifier, Network ID and Host ID masks
IP classes -- Identifier, Network ID and Host ID masks

IPv4 prefixes recorded in BGP announcements Routing table growth
IPv4 prefixes recorded in BGP announcements Routing table growth

IP address allocation on modern internet

IANA is responsible to assign IP addresses to various IP registeries

IANA IP regions
IANA IP regions

whois is a tool used to query an IP address in an IP registery

Whois result for Georgia Tech IP
Whois result for Georgia Tech IP

CIDR

in case of overlapping address prefixes, those with longest prefix for net id are preferred in case of routing

Classless Addressing
Classless Addressing

Multihoming

Not good for hierarchical IP aggragation

Happens when a stub AS decides to advertise it’s address to other carriers.

Even though it’s original aggregator is already advertising a parent address space (say /8) but another network might now be advertising the longer AS prefix (of say /24)

The longer prefix will be chosen because of the rules, and so to avoid that the original parent network also has to advertise and store IPs of its hosts in its routing table. This leads to longer prefix addresses populating in global address registries

Multihoming leads to longer prefixes in higher levels
Multihoming leads to longer prefixes in higher levels

Just like AS path pretending, Longest prefix match can be used to control inbound traffic

The address of D is subnet into two equal sized addresses and advertised separatedly, giving D control over loadbalancing routes for ingress
The address of D is subnet into two equal sized addresses and advertised separatedly, giving D control over loadbalancing routes for ingress

The weekly CIDR report shows which AS are advertising overlapping prefixes and the amount of improvements that can make on their current aggregation techniques

Using tries for lookups

  • Single bit trie — Efficient memory usage but requires too many memory accesses for a 32 bit lookup. This increases lookup times significantly
  • Direct trie — instead of single bit lookup we partition the address into two parts of variable bit lengths
    • Fast lookup compared to Single bit trie

    • But this results in inefficient memory use

      Breaking address into parts of 24 and 8 bits for Direct trie
      Breaking address into parts of 24 and 8 bits for Direct trie

Fast and memory efficient

Use multi-ary trie

Benifits of multi-ary tire
Benifits of multi-ary tire

Eg.

mutliary trie example
mutliary trie example

Further optimizations can include techniques like

  • creating a leaf pushed trie to avoid one extra memory access on leaf nodes when the node above it is empty

The parent of these nodes is empty and can be used to store the data of it's children to avoid an extra memory lookup for next level in the trie
The parent of these nodes is empty and can be used to store the data of it's children to avoid an extra memory lookup for next level in the trie

Now that previously empty node that only contained the address of the two children, contains the actual data of those nodes (probably indexed).
Now that previously empty node that only contained the address of the two children, contains the actual data of those nodes (probably indexed).

  • Other techniques are called
    • Lulea
    • patricia

Alternatives

Didn’t quite understand this

Alternatives to longest prefix match with tries
Alternatives to longest prefix match with tries

NAT — Network Address translator

Due to scarcity of ip addresses, networks have internal private IPs. These are specified in RFC 3130 (as said in the course, but found concrete details in RFC 1918)

Private IP ranges as mentioned in RFC 1918
Private IP ranges as mentioned in RFC 1918

So a NAT just creates a mapping from the private ip and port to the public ip and port. This way they can use the same public ip and keep changing the port, thereby eliminating the need to assign unique address to all the devices on the network.

NAT violates internet rules

  • The e2e rule is violated because if a NAT fails, then the received packets will not reach the right host inside the network
  • They also restrict direct bi-directional communication between two hosts in different private networks

IPv6

need to better understand the breakdown of the address bits

IPv6 headers and address bit breakdown
IPv6 headers and address bit breakdown

Note that the address has 13 bits reserved for Tier-1 ISP

So if you change providers the source address changes

To Look into How does QUIC fix this problem with streams? Is it just because it uses UDP for now and might break when it used independently on the transport layer?

Incremental deployment

Deployment is hard due to protocol ossification (will link to soon to be released notes of QUIC, but here's the gist)

Wikipedia: Protocol ossification is the loss of flexibility, extensibility and evolvability of network protocols This means that introducing new protocols is hard because routers and other middleboxes violate the end-to-end principle.

In short, the current middle boxes are only designed to understand IPv4, not v6

But we can’t just replace these with IPv6 compatible versions of the equipment because it would then break the currently running IPv4 backed infrastructure ⇒ Hence they did incremental deployment

Dual stack deployment

One approach to incremental deployment would be to use a dual stack middle box.

Meaning one that can understand both IPv4 and IPv6

There can be two approaches to building a dual stack infrastructure

Different approaches to dual stack deployments
Different approaches to dual stack deployments

v6 to 4 tunneling

Basically sending the v6 packet as data inside a v4 packet in a v4 silo

The v4 network acts as carrier for v6 packets between compatible networks
The v4 network acts as carrier for v6 packets between compatible networks

I wonder how the routing negotiations for this work, and if this is even considered. Would this be advertised in BGP route announcements? Or is it factored in the calculation for MED? Would this even have a significant increase in RTT to even be considered as a route deciding factor? Probably not


Router

A router contains line card that is I guess the core of what executes all the functionality

Every line card has these components to carry out the following roles

Different components and tasks of a Line Card
Different components and tasks of a Line Card

Roles of a router

  1. Receive a packet
  2. Read packet headers to determine destination
  3. Search for entry in forwarding table to determine the correct output interface for the packet
  4. Modify packet headers, like
    • Decrement the TTL
    • Updating packet checksum appropriately
  5. Send the packet to the output interface
    • This can be more trickier than just sending it to the designated port, because the port will be accessible only through the router’s interconnection frabric

      Routers have an interconnection fabric that connects input ports to output ports
      Routers have an interconnection fabric that connects input ports to output ports

Important decisions

Copies of forwarding tables

Earlier, all the line cards on a router used a central lookup table over a shared bus.

But this became a bottle neck

Now every line card has a copy of the look up table.

This although has it’s own complications of handling coherence among the various copies, but it is much more faster

Crossbar Switching

Switching happens in timeslots

Rules for Crossbar switching
Rules for Crossbar switching

Switching algorithm: Maximal matching

Even though we can only have one connection between the input and output port through the interconnect, per timeslot, we can speed up matching by doing multiple matching in the same time slot, if there are multiple matching required for a given port in a given timeslot

For eg, if the interfaces have a 10Gbps connection, then it is possible to increase number of matching per timeslots by have interconnect speed more than that

This is termed speedup!

Head of line blocking

But speedup does not work if the port sending the packet only has one queue, because even though the interconnect is capable of handling multiple connections now, the packets are still processed synchronously because of the single packet queue

Blocking packet queue
Blocking packet queue

Solution: virtual output queues

Having separate output queues for designated output ports

Breakdown the packet buffer into virtually separate queues for individual output ports
Breakdown the packet buffer into virtually separate queues for individual output ports

Scheduling

The decision about which ports should be matched in any particular time slot

During which timeslots should input and output ports be matched

Considerations

Efficiency: If there is a packet to be sent and the output port of that packet is available, then the scheduling should be such that it is possible to switch connections quickly to send that packet

Fairness: Given demands at the inputs, each queue at the input is scheduled fairly!

max-min fairness

Here flow allocations are flow rates at input port

Rules for max-min fairness algorithm
Rules for max-min fairness algorithm

Eg:

max-min fairness example
max-min fairness example

The dividing by 4 is based on the number of inputs

Note:

  • that the demand exceeds capacity
  • demands of small users are met fully
  • for the large demand users (those whose demands are not fully met) the remaining capacity is shared equally

Different ways to achieve max-min fairness

There are various scheduling techniques to achieve max-min fairness
There are various scheduling techniques to achieve max-min fairness


DNS

A DNS query leads to a recursive search, if the result is not already in cache
A DNS query leads to a recursive search, if the result is not already in cache

Caches

Caches are stored along with a TTL

Caches include NameServer record

Global records like those of google.com rarely change have a larger TTL

Local records like smaller websites have a smaller TTL

Record types

  • A: Name → IP
  • NS: Name → Authoritative nameserver ⇒ Leads to further queries, helps implement herarchy
  • MX: Name → Mail Server
  • CNAME: Canonical name ⇒ This is a pointer to another domain name that needs to be looked up
  • PTR: IP → Name ⇒ reverse Lookup
  • AAAA: Name → IPv6 Address

dig

There are various commands explained in the course, which are easy to learn with a simple 5 minute tutorial

But it is interesting to follow the trace for a PTR query 👇

Note the name server referral to in-addr.arpa at level 2 of the search
Note the name server referral to in-addr.arpa at level 2 of the search

in-addr.arpa is an interesting top level nameserver for this


Congestion control

Congestion

Simple example of how congestion occurs
Simple example of how congestion occurs

Congestion collapse

Increase in load results in Decrease in useful work

After a certain level of load, effective work done decreases because of lost packets and retransmission delays
After a certain level of load, effective work done decreases because of lost packets and retransmission delays

Causes:

  • Spurious retransmission of packets. Packets that do not receive ACK. Can be solved by better timers
  • Undelivered Packets. Resources consumed but dropped later

Congestion collapse isn’t just theory and has been observed in many networks

Congestion collapse can be prevented by proper congestion control

Goals of congestion control

Same as scheduling in router interconnect (efficiency + fairness) + avoid congestion collapse

Goals for congenstion control
Goals for congenstion control

Approaches

Approaches to congenstion control
Approaches to congenstion control

TCP congestion control

Sender increases rate until packets are dropped

TCP interprets packet loss as congestion control and slows down

Although this might not always be true. For eg. in wireless networks, packets can be lost due to interference

But this is generally a good assumption

I remember reading an article referring to TCP slow start that is a hack on this behaviour of TCP for better performance Link: Why your website should be under 14kB in size

Approaches to Adjusting rate

  1. **Window-based (**Generally used // widely implemented )

    Additive increase Multiplicative decrease (AIMD)

Number of packet streams sent increases incrementeally until congestion is hit (packet lost). Then rate is halved.
Number of packet streams sent increases incrementeally until congestion is hit (packet lost). Then rate is halved.

  1. Rate Based

    Rate based congestion control rules
    Rate based congestion control rules

Phase plot

Graph of flow rates of different input ports agoinst each other. The optimal point is where the effectiveness curve intersects the fairness curve

The goal is to keep the flow rates such that they lie on the optimal point
The goal is to keep the flow rates such that they lie on the optimal point

AIMD always converges to optimal point (fairness + efficiency)

AIMD over few iterations helps converge to optimal point
AIMD over few iterations helps converge to optimal point

Additive increase ⇒ increases efficiency

Multiplicative decrease ⇒ increases fairness

TCP saw tooth

This happens as a result of AIMD

The TCP flow rate oscilates between max and half of max window size
The TCP flow rate oscilates between max and half of max window size

Here Wm is the max window size

Note that Throughput is inversly proportional to RTT and LossRate

TCP incast problem

Generally seen in data centers

Too many requests from servers fill the small packet buffers of switches
Too many requests from servers fill the small packet buffers of switches

Caused when the server make several requests that fill up the buffers in the intermediary switches (which are generally small)

This results in retransmissions after only after timeouts

In TCP, timeouts occur after several hundred milliseconds, but the RTT inside a data center is generally less than 1ms


Multimedia & Streaming

Mininet assignment (deleted)

TODO

mininet assignment screenshot
mininet assignment screenshot


To be continued