Notes - IP

Network numbers

Definitions

The IP address space is partitioned into intervals called network numbers.

The length and starting address of a network number must satisfy these constraints:

Length must be a power of 2.
Let 2^k be the length of the network number. The binary representation of the starting address must have the k rightmost bits equal to 0.

Let 2^k be the length of a network number. The following claims follow from the definition of network number and the properties of binary representations.

The smallest IP address in a network number has the k rightmost bits equal to 0.
The largest IP address in a network number has the k rightmost bits equal to 1.
Any two distinct IP addresses in the same network number are identical in the 32-k leftmost bits and differ in at least one of the k rightmost bits.
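The constraints and claims above can be checked mechanically. The following is a minimal Python sketch, not part of the original notes: addresses are treated as 32-bit integers, and the helper names `is_valid_network_number` and `smallest_and_largest` are hypothetical.

```python
import ipaddress

def is_valid_network_number(start, length):
    """Check the two constraints: length is 2^k and start has k rightmost bits at 0."""
    if length <= 0 or length > 2**32 or length & (length - 1) != 0:
        return False              # length is not a power of 2
    return start % length == 0    # the k rightmost bits of start are 0

def smallest_and_largest(start, length):
    """Smallest address: k rightmost bits at 0. Largest address: k rightmost bits at 1."""
    return start, start | (length - 1)

# Example values chosen only for illustration: start 192.168.1.64, length 2^4 = 16.
start = int(ipaddress.IPv4Address("192.168.1.64"))
lowest, highest = smallest_and_largest(start, 16)
```

With these example values, the largest address is 192.168.1.79: the 28 leftmost bits are those of the starting address and the 4 rightmost bits are all 1.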

Assignment to networks

A network number may be assigned to a network, which means that hosts in that network have an IP address in that network number.

We make the following assumptions:

There is a one-to-one correspondence between networks and assigned network numbers. That is, each network is assigned only one network number and each assigned network number is assigned to only one network.
Assigned network numbers do not overlap.

These assumptions imply that the number of hosts in a given network cannot be greater than the length of the corresponding network number; and, unused IP addresses of a network number cannot be used for hosts of other networks.

In practice, there may be assignments that violate assumption 1 (a network may be assigned multiple network numbers; a network number may be used by hosts in multiple networks) and/or assumption 2 (an assigned network number may overlap with one or more other assigned network numbers, either partly or completely). These scenarios are not considered in this course and we will always assume that these assumptions are satisfied.

Allocation of network numbers

A network number may be allocated to an organization, which means that the organization has exclusive use of IP addresses in that network number.

Let N be a network number allocated to an organization. The organization may partition N into two or more network numbers. The corresponding procedure is called subnetting (each of the resulting network numbers must satisfy the constraints listed above).

Having partitioned N into two or more network numbers, the organization can:

allocate the resulting network numbers to other organizations; or,
assign the resulting network numbers to its networks.
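As an illustration of subnetting, the sketch below uses Python's standard `ipaddress` module to partition a network number into four smaller ones. The address range is an arbitrary example, and the "/24" notation (not used in these notes) means that the 24 leftmost bits are fixed, that is, k = 8 and the length is 2^8 = 256.

```python
import ipaddress

# N is a hypothetical network number of length 256.
N = ipaddress.ip_network("192.0.2.0/24")

# Partition N into 4 network numbers of length 64 each
# (2 more leftmost bits become fixed in each resulting network number).
parts = list(N.subnets(prefixlen_diff=2))

starts = [str(p.network_address) for p in parts]
lengths = [p.num_addresses for p in parts]
```

Each resulting network number satisfies the constraints by construction: its length is a power of 2 and its starting address is a multiple of that length.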

Definition of network numbers and their allocation to organizations occurs in a hierarchical way, as follows.

IANA (Internet Assigned Numbers Authority) is an organization that manages the entire IP address space. It has partitioned the address space into network numbers by means of subnetting and has allocated the resulting network numbers to 5 organizations called Regional Internet Registries (RIRs), roughly one for each continent.
Each RIR has partitioned its network numbers through subnetting and has allocated the resulting network numbers to organizations called Local Internet Registries (LIRs). LIRs may be either enterprises, or academic institutions, or Internet Service Providers. A LIR must be located in the same continent as the corresponding RIR.
Each LIR has partitioned its network numbers through subnetting and then allocates these network numbers as follows:
- Internet Service Providers (ISPs): Partition the network number with subnetting and assign the resulting network numbers to their customers.
- Enterprises/academic institutions: Partition the network number with subnetting and assign the resulting network numbers to their networks.
- Residential customers usually take only one IP address from an ISP (which corresponds to a network number of length 4, that is, the smallest length available).

A fundamental property of the above framework is that allocations and assignments are managed in a completely local way. That is, let N be a network number allocated to organization X:

X may decide locally how to use N (how to apply subnetting, whether to allocate or assign the resulting network numbers, to which organizations or networks);
X need not inform any other organization of how X uses N.

In the earliest years of the Internet, the role of the IANA was played by a single researcher (Jon Postel). In some cases, a RIR does not allocate network numbers directly to LIRs; a RIR may allocate network numbers to a National Internet Registry (NIR) that manages allocation to all LIRs of the corresponding country.

Identifying the owner

Each organization in the hierarchy must maintain a local description of the allocation of each network number.

The collection of the databases at all the organizations makes it possible to determine the organization to which a specific network number has been allocated: information at the IANA identifies the RIR; information at the RIR identifies the LIR; if the LIR is an ISP, information at the LIR identifies the customer.

The information associated with a network number in each database is expressed in a standard format. A protocol called WHOIS describes this format and the network protocol for querying each database.

Many programs, as well as freely accessible web applications, take an IP address and return the corresponding WHOIS record describing the owner of that address (an enterprise, academic institution or ISP). When an IP address is owned by an ISP, the information publicly available through WHOIS does not allow identifying the customer that has been allocated that address. Such information must be maintained by the LIR internally, though, and must be made available to the public authority if required.
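A WHOIS query is simple enough to sketch in a few lines of Python. The sketch below assumes the usual WHOIS conventions (TCP port 43, a one-line query terminated by CR LF, the server closing the connection after replying) and uses `whois.iana.org` as the starting server. The function names are hypothetical, and the format of the reply depends on the server being queried.

```python
import socket

def build_query(ip):
    """A WHOIS query is a single line of text terminated by CR LF."""
    return ip.encode("ascii") + b"\r\n"

def whois(ip, server="whois.iana.org", port=43, timeout=10.0):
    """Send the query over TCP and return the full textual reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_query(ip))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling `whois("193.0.0.1")` requires network access. The reply from the IANA server typically names the RIR to query next, which mirrors the hierarchy described above.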

IP geolocation services are usually based on the information available through WHOIS and associate geographical coordinates to that information based on service-specific technology, independent of the IP protocol.

Node configuration

IP configuration

A node connected to an IP network, in particular, connected to the Internet, must be provided with:

IP address;
network number;
IP address of the default gateway.

The default gateway is a router connected to the same network as the node; the node will send to this router all packets whose destination is not in the network of the node.

The network number is specified indirectly, by means of a 32-bit sequence composed of a sequence of 1 bits followed by a sequence of 0 bits. Such a sequence is called subnet mask. If the subnet mask ends with k bits equal to 0 (that is, it starts with 32-k bits equal to 1), then the length of the network number is 2^k. The starting address of the network number is the result of the bitwise AND between the IP address of the node and the subnet mask.
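The derivation of the network number from an IP address and a subnet mask can be sketched as follows. This is an illustrative Python fragment, with a hypothetical helper name and arbitrary example values. It assumes a well-formed mask (1 bits followed by 0 bits) and uses "length" as defined in the Definitions section, that is, 2^k where k is the number of 0 bits in the mask.

```python
import ipaddress

def network_number(ip, mask):
    """Return (starting address, length) of the network number of a node."""
    ip_i = int(ipaddress.IPv4Address(ip))
    mask_i = int(ipaddress.IPv4Address(mask))
    start = ip_i & mask_i                    # bitwise AND of address and mask
    k = 32 - bin(mask_i).count("1")          # number of 0 bits in the mask
    return str(ipaddress.IPv4Address(start)), 2 ** k

# Example: the mask 255.255.255.192 ends with 6 bits equal to 0.
start, length = network_number("10.1.2.77", "255.255.255.192")
```

Here the network number starts at 10.1.2.64 and has length 2^6 = 64.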

In practice, a node must be provided also with the IP address of a DNS server.

Static vs dynamic configuration

Static configuration: The IP configuration is provided by the user, with information provided by a network administrator. In this case the information is stored in the operating system and is immutable, that is, the node will always have the same IP configuration.

Dynamic configuration: The IP configuration is provided by a dedicated configuration server automatically. In this case the information has a limited temporal validity and the node may have different IP configurations across its lifetime.

Routers and servers usually have static configuration. User devices may have either static or dynamic configuration, but dynamic configuration is much more prevalent. Dynamic configuration may occur with a variety of technologies and protocols. The following paragraph describes a possible procedure for dynamic configuration. Such a description is an oversimplification: it does not exist in reality and the real procedure is different. In this course we will only consider this oversimplified description.

A configuration server must be in the same network as the device to be configured dynamically. The protocol for interacting with the configuration server is DHCP. The device broadcasts a DHCP request message encapsulated in a frame. The configuration server allocates an IP address to the requesting device and responds with a DHCP response message encapsulated in a frame addressed to that device. The DHCP response message contains all the configuration information. Note that if a configuration server was not in the same network as the device to be configured, then the device would not be able to send a frame to the configuration server (a device without IP configuration can only communicate with nodes in its network; it cannot communicate with nodes in other networks).
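The simplified exchange described above can be modeled with a toy configuration server. This sketch follows the oversimplified description of these notes, not the real DHCP protocol: the class name, the field names and the address pool are all hypothetical, and lease expiration is not modeled.

```python
class ConfigServer:
    """Toy model of the simplified configuration server described above."""

    def __init__(self, pool, subnet_mask, gateway, dns):
        self.free = list(pool)        # addresses not yet allocated
        self.leases = {}              # Ethernet address -> allocated IP address
        self.subnet_mask = subnet_mask
        self.gateway = gateway
        self.dns = dns

    def handle_request(self, eth_addr):
        """Allocate an address to the requester and build the response message."""
        if eth_addr not in self.leases:
            if not self.free:
                return None           # pool exhausted: no response is sent
            self.leases[eth_addr] = self.free.pop(0)
        return {
            "ip": self.leases[eth_addr],
            "subnet_mask": self.subnet_mask,
            "default_gateway": self.gateway,
            "dns_server": self.dns,
        }

server = ConfigServer(["192.168.1.10", "192.168.1.11"],
                      "255.255.255.0", "192.168.1.1", "192.168.1.2")
reply = server.handle_request("02:00:00:00:00:01")
```

The request carries no IP address of the sender: the server identifies the requesting device by the Ethernet address of the frame, which is why server and device must be in the same network.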

Routing

Sending a packet

Let S be a node configured with IP-s, SM-s, IP-gw. The IP layer of S (briefly, IP@S) sends a packet P addressed to IP-d as follows (we denote the network address associated with IP address IP-x as Phys(IP-x)).

IP@S encapsulates P as the payload of a frame.
If IP-d is in the same network as IP-s
- Then IP@S sends the frame to Phys(IP-d)
- Otherwise IP@S sends the frame to Phys(IP-gw).

IP@S translates IP addresses to network addresses (i.e., Phys()) by means of the ARP protocol.

IP@S cannot execute the test at step 2 “If IP-d is in the same network as IP-s” because IP@S does not have local knowledge of the set of IP addresses in its network.

IP@S executes instead the test “(IP-d AND SM-s) == (IP-s AND SM-s)” that can be executed locally and provides the same result. To prove that the two tests are equivalent observe what follows:

IP-d and IP-s are in the same network ⇔ IP-d and IP-s are in the same network number: this follows from the assumption “There is a one-to-one correspondence between networks and assigned network numbers”.
IP-d and IP-s are in the same network number ⇔ (IP-d AND SM-s) == (IP-s AND SM-s): the proof of this claim is in the Appendix and is based on the assumption “Assigned network numbers do not overlap”.
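The local test executed by IP@S can be sketched in Python as follows. The function name `next_hop` and the example addresses are hypothetical. The function returns the IP address whose Phys() the frame must be addressed to.

```python
import ipaddress

def to_int(addr):
    """Convert a dotted-decimal IPv4 address to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))

def next_hop(ip_s, sm_s, ip_gw, ip_d):
    """Apply the test (IP-d AND SM-s) == (IP-s AND SM-s)."""
    same_network = (to_int(ip_d) & to_int(sm_s)) == (to_int(ip_s) & to_int(sm_s))
    return ip_d if same_network else ip_gw

# Node S: IP-s = 192.168.1.20, SM-s = 255.255.255.0, IP-gw = 192.168.1.1
local = next_hop("192.168.1.20", "255.255.255.0", "192.168.1.1", "192.168.1.99")
remote = next_hop("192.168.1.20", "255.255.255.0", "192.168.1.1", "8.8.8.8")
```

In the first case the frame is addressed to Phys(192.168.1.99), in the second case to Phys(192.168.1.1), the default gateway.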

Address Resolution Protocol (ARP)

Let S be a node with IP address IP-s. S may need to know the network address of the node D with IP address IP-d. In this course we assume that this need may arise only if D is in the same network as S. ARP allows S to determine Phys(IP-d), as follows.

The software layer that implements ARP is placed at the same abstraction level as IP. When IP@S needs to know Phys(IP-d), IP@S obtains this information from ARP@S. The module ARP@S maintains a table, called ARP cache, that maps IP addresses to Ethernet addresses. This table is maintained in a fully automatic way, with the following rules.

When ARP@S needs Phys(IP-d) and this information is not in the ARP cache, it broadcasts a frame in the ARP protocol. The payload of this frame is an ARP request and describes both the requested information and who is requesting that information. We denote ARP requests with the self-explaining notation “(IP-s, Eth-s, IP-d, ?)”.
When a node R receives an ARP request, its ARP module (ARP@R) adds to its ARP cache the mapping describing the sender, i.e. “IP-s, Eth-s”.
If the ARP request is for the IP address of the receiving node, then the receiving node also responds to the sender of the request with a frame in the ARP protocol. The payload of this frame is an ARP response and describes both endpoints. We denote ARP responses with the self-explaining notation “(IP-d, Eth-d, IP-s, Eth-s)” (the first pair in an ARP message always describes the sender of the message).
When a node connects to the Internet, the node sends a gratuitous ARP. A gratuitous ARP is an ARP request for the IP address of the sending node, i.e., “(IP-s, Eth-s, IP-s, ?)”. If the network is configured correctly, no ARP response should arrive. We do not discuss the handling of responses to a gratuitous ARP.

Since ARP requests are transmitted to a broadcast address, they are received by every node in the network. Each node in the network will thus execute step 2 (and only one node will execute step 2-a). Note that a gratuitous ARP not only allows detecting possible misconfigurations, it also inserts an entry for the connecting node into all the ARP caches in the network.

An ARP cache cannot contain two entries for the same IP address (this is an oversimplification that does not hold in certain real scenarios). In case of multiple, conflicting information for the same IP address, only the most recent one is retained. Thus, an ARP request sent by a given IP address will overwrite all entries in the network that describe any previous usage of that IP address by another node.
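The ARP rules described so far can be simulated with a small Python sketch. This is a model of the simplified behavior in these notes, not of a real ARP implementation: the class and function names are hypothetical, the broadcast is modeled as a loop over all nodes, and the Ethernet addresses are arbitrary.

```python
class Node:
    def __init__(self, ip, eth):
        self.ip, self.eth = ip, eth
        self.arp_cache = {}               # IP address -> Ethernet address

    def on_arp_request(self, ip_s, eth_s, ip_d):
        """Learn the sender; respond only if the request is for this node."""
        if eth_s == self.eth:
            return None                   # a node ignores its own broadcast
        self.arp_cache[ip_s] = eth_s      # the most recent information wins
        if ip_d == self.ip:
            return (self.ip, self.eth, ip_s, eth_s)   # ARP response
        return None

def resolve(sender, ip_d, network):
    """Use the cache if possible, otherwise broadcast an ARP request."""
    if ip_d not in sender.arp_cache:
        for node in network:              # a broadcast reaches every node
            response = node.on_arp_request(sender.ip, sender.eth, ip_d)
            if response is not None:
                sender.arp_cache[response[0]] = response[1]
    return sender.arp_cache.get(ip_d)

a = Node("10.0.0.1", "02:00:00:00:00:0a")
b = Node("10.0.0.2", "02:00:00:00:00:0b")
c = Node("10.0.0.3", "02:00:00:00:00:0c")
eth_b = resolve(a, "10.0.0.2", [a, b, c])

# A new node d takes over the IP address of a and sends a gratuitous ARP.
d = Node("10.0.0.1", "02:00:00:00:00:0d")
gratuitous = resolve(d, d.ip, [b, c, d])
```

After the first request, both b and c have learned the mapping for a, even though only b responded. After the gratuitous ARP, no response arrives and the entries for 10.0.0.1 in b and c point to the Ethernet address of d.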

An ARP entry must be periodically refreshed. A node that has not received, for a predefined time interval, any IP or ARP traffic proving that an entry is still valid, will remove that entry.
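Entry expiration can be sketched by storing, for each entry, the time of its last refresh. The following Python fragment is only illustrative: the class name is hypothetical, the maximum age is an arbitrary parameter, and entries are removed lazily at lookup time, which is one possible design among several.

```python
import time

class ArpCache:
    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock            # injectable, so that the sketch can be tested
        self.entries = {}             # IP -> (Ethernet address, time of last refresh)

    def refresh(self, ip, eth):
        """Called whenever traffic proves that the mapping is still valid."""
        self.entries[ip] = (eth, self.clock())

    def lookup(self, ip):
        """Return the Ethernet address, or None if the entry is absent or expired."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        eth, last_seen = entry
        if self.clock() - last_seen > self.max_age:
            del self.entries[ip]      # not refreshed in time: remove the entry
            return None
        return eth
```

A removed entry is simply learned again with a new ARP request the next time it is needed.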

ARP entries may be defined statically by an administrator. Statically defined ARP entries are always present in the ARP cache of the corresponding node and are never removed (they need not be refreshed). Unless stated otherwise, in this course we will not consider statically defined entries.

The above description of ARP is focussed on the mapping from IP addresses to Ethernet addresses, which is the only case of interest in this course. ARP allows more general forms of mapping and supports many different kinds of addresses. For this reason, the technical literature often uses other terminologies: logical or virtual addresses (IP, in our case); physical or hardware addresses (Ethernet, in our case).

Router

Let R be a router, i.e., a node with multiple network interfaces. Each interface has a local identifier, an IP address and a subnet mask. The IP layer of R (briefly, IP@R) handles a packet P addressed to IP-d as follows.

R has a routing table. The routing table is an immutable data structure provided by the administrator. The routing table has a range column that contains a network number and an action column that describes how to handle a packet addressed to that network number. The possible actions are:

Deliver, if R has an interface connected to the network to which the corresponding range has been assigned. The action contains the identifier of that interface. In this case R sends the frame on that interface and addresses the frame to Phys(IP-d).
Route, otherwise. The action contains the identifier of an interface and the IP address of another router connected to the network of that interface, say IP-r. In this case R sends the frame on that interface and addresses the frame to Phys(IP-r).

When handling a packet P addressed to IP-d (i.e., when routing a packet), IP@R proceeds as follows:

Determine the rows of the routing table whose range includes IP-d;
If one or more rows are found, then execute the action in the row whose range is the smallest (i.e., the most specific rule).
Otherwise, if no row is found, then discard P.
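The lookup procedure above can be sketched in Python with the standard `ipaddress` module. The routing table below is a hypothetical example: ranges are written in prefix notation, and interface identifiers and router addresses are invented for illustration.

```python
import ipaddress

def route(routing_table, ip_d):
    """Return the action of the most specific matching row, or None to discard."""
    dest = ipaddress.ip_address(ip_d)
    matches = [(rng, action) for rng, action in routing_table if dest in rng]
    if not matches:
        return None                   # no row found: the packet is discarded
    # The most specific row is the one whose range contains the fewest addresses.
    rng, action = min(matches, key=lambda row: row[0].num_addresses)
    return action

table = [
    (ipaddress.ip_network("10.0.0.0/16"), ("route", "eth1", "10.0.255.1")),
    (ipaddress.ip_network("10.0.1.0/24"), ("deliver", "eth0")),
]
```

An address such as 10.0.1.7 is included in both ranges: the row with the smaller range (length 256) is chosen over the row with the larger one (length 65536).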

Note that if the union of all ranges does not include all the network numbers that exist in the internetwork, then packets addressed to ranges not described by the routing table will be discarded.

Of course, assuming that a routing table must contain one row for each network number that exists in the Internet is not realistic: even leaving storage and computation issues aside, a router administrator cannot have knowledge of all the network numbers worldwide. In practice, a router administrator can only have knowledge of the network numbers that exist within the organization where the router is placed. There is thus the problem of how to describe all the network numbers that are external to the organization. Such a description is necessary because, without such a description, packets addressed to the outside of the organization would be discarded.

The range column of a routing table may contain a special value that describes all the network numbers not described by the other rows. We denote this special value by the term “default”. A row with this special value thus makes it possible to describe all the network numbers that are external to an organization. The corresponding action will route packets toward a router closer to the outside of the organization (or to the ISP, if the router is on the border of the organization).
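One common way to represent the “default” value, assumed here for illustration (these notes treat it as an abstract special value), is the range that includes the entire address space, written 0.0.0.0/0 in prefix notation. Since it is the largest possible range, it is selected only when no more specific row matches. The internal range and the helper name below are hypothetical.

```python
import ipaddress

DEFAULT = ipaddress.ip_network("0.0.0.0/0")        # includes every IPv4 address
internal = ipaddress.ip_network("172.16.0.0/16")   # hypothetical internal range

def matching_ranges(ip_d, ranges):
    """Return the ranges that include ip_d, most specific first."""
    dest = ipaddress.ip_address(ip_d)
    return sorted((r for r in ranges if dest in r), key=lambda r: r.num_addresses)

inside = matching_ranges("172.16.5.5", [DEFAULT, internal])
outside = matching_ranges("8.8.8.8", [DEFAULT, internal])
```

For an internal destination the default range matches too, but it comes last. For an external destination it is the only match.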

Routing a packet

An endpoint may belong to a residential customer, a non-residential customer, an enterprise, or an academic institution. To simplify the discussion, and without loss of generality, we assume that:

any endpoint belongs to an “organization”;
the organization consists of an internetwork;
the organization is connected to an Internet Service Provider (ISP);
the connection between organization and ISP is implemented by means of two routers, one on the boundary of the organization and the other on the boundary of the ISP.

Routing within an organization

Routing within an organization is usually static, that is, routing tables are configured by network administrators and never modified.

Administrators configure routing tables based on the knowledge of the network numbers internal to the organization and the corresponding topology. Packets addressed to network numbers not allocated to the organization will be routed toward the ISP, by properly using the default ranges described in the previous section.

Since the path toward a destination is immutable, a destination might become unreachable in case of failures, even when alternative working paths toward that destination are available.

Routing outside of an organization

A packet that has left the organization reaches its destination by means of a routing procedure that is not part of this course. We only sketch the basic idea below.

The portion of the Internet that routes packets toward ISPs is subdivided in autonomous systems. An autonomous system (AS) is a portion of the Internet (an internetwork) with complete autonomy on the choice of routing policies and technologies to use within that portion. There are more than 60,000 ASes in the Internet.

An AS is connected with one or more ASes, each connection being implemented by a pair of routers, one on each AS boundary. An AS exchanges routing information with neighboring ASes by means of the Border Gateway Protocol (BGP). BGP messages follow this pattern:

An AS informs its neighbors of the network numbers that have been allocated to it.
When AS-j becomes aware that network number N has been allocated to one of its neighbors, say AS-i, AS-j informs its neighbors that the path toward N is (AS-j, AS-i);
neighbors of AS-j (e.g., AS-k) will then inform their neighbors that the path toward N is (AS-k, AS-j, AS-i), and so on. The corresponding information will eventually reach all ASes. Several complex algorithms are then applied by each AS for translating this information into rows for its routing tables. Note that an AS may receive multiple alternative paths toward a given network number.
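The propagation pattern can be illustrated with a toy Python simulation in which each AS prepends its own identifier to the path it has learned before informing its neighbors. This is a drastic simplification and not an implementation of BGP: the topology is hypothetical and each AS keeps only the first path it learns, whereas a real AS may receive several alternative paths and choose among them according to its policies.

```python
from collections import deque

def propagate(neighbors, origin):
    """Return, for each AS, the first path it learns toward the origin's network number."""
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for n in neighbors[current]:
            if n not in paths:                   # keep only the first path learned
                paths[n] = [n] + paths[current]  # the AS prepends itself to the path
                queue.append(n)
    return paths

# Hypothetical topology: AS-i -- AS-j -- AS-k
neighbors = {
    "AS-i": ["AS-j"],
    "AS-j": ["AS-i", "AS-k"],
    "AS-k": ["AS-j"],
}
paths = propagate(neighbors, "AS-i")
```

In this topology AS-k learns that the network numbers of AS-i are reachable through AS-j.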

The above procedure is dynamic, that is, routing tables may be modified automatically as a result of the dynamically changing flow of BGP information. This dynamic flow is able to find alternative paths toward a given destination in case of failures.

Thus, a packet that has left an organization will travel across ASes until reaching the ISP of the destination. The ISP will then route the packet toward the corresponding organization.

Firewall

A firewall is a software module that analyzes all packets and determines whether a packet can proceed toward its destination or must be discarded. A firewall is configured with a list of filtering rules: a packet can proceed only if it satisfies at least one of these rules. A firewall is thus a module that forbids all network traffic that has not been explicitly authorized. A firewall must be configured with only the rules that are strictly necessary for executing the applications that must be allowed. Any kind of traffic that is not strictly necessary must be forbidden (thus, rules must not be excessively permissive). This requirement is particularly important for security reasons not analyzed in this course.

A firewall may be placed on a router or on an endpoint.

Organizations usually have a firewall on the border router. This firewall thus controls all the inbound and outbound network traffic of the organization. An organization may place additional firewalls in internal routers. Network architectures with multiple levels of firewalls (segmented networks) are not very common but are particularly desirable for security reasons not analyzed in this course.

Firewalls on endpoints are part of the operating system. These firewalls usually have a default set of rules that allow all the traffic required by common applications, in order to spare users the burden of configuring firewalls themselves. The default set of rules is usually excessively permissive, because a default installation of the operating system must accommodate a broad set of applications. It follows that firewalls on endpoints provide very little security, unless their configuration is modified and made more restrictive.

Firewall rules

Rules are boolean expressions with four operands: IP addresses (source and destination) and port numbers (source and destination). Usually, each operand is compared for equality to a constant (or for belonging to a range, or for equality to any constant in a list) and all comparisons are joined with AND operators. An expression need not use all operands. When an operand is not part of a rule, the value of that operand is irrelevant for that rule.

When the firewall has to handle a packet, the firewall:

Takes the IP addresses from the IP header and the port numbers from the TCP header (which is part of the IP payload);
Evaluates each rule;
If at least one rule is true, then the firewall allows the packet to proceed, otherwise it drops the packet.
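The evaluation procedure can be sketched in Python by representing each rule as a set of equality tests joined by AND. The representation (dictionaries with hypothetical field names) and the example addresses are assumptions made for illustration. Ranges and lists of constants are not modeled.

```python
def matches(rule, packet):
    """A rule is an AND of equality tests; operands absent from the rule are irrelevant."""
    return all(packet[field] == value for field, value in rule.items())

def allowed(rules, packet):
    """The packet proceeds if at least one rule is true, otherwise it is dropped."""
    return any(matches(rule, packet) for rule in rules)

rules = [
    {"dst_ip": "192.0.2.10", "dst_port": 443},   # toward a hypothetical web server
    {"src_ip": "192.0.2.10", "src_port": 443},   # from that server back to its clients
]

request = {"src_ip": "198.51.100.7", "src_port": 50000,
           "dst_ip": "192.0.2.10", "dst_port": 443}
ssh_attempt = dict(request, dst_port=22)
```

With an empty list of rules every packet is dropped, which matches the principle that all traffic not explicitly authorized is forbidden.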

In principle, a firewall could have rules for inbound packets that are completely independent of those for outbound packets. In practice, for each rule that allows a certain kind of inbound packets there must be a rule allowing outbound packets of the same kind. Stated differently, there must be two rules for each kind of allowed TCP connection: one rule for inbound packets and another rule for outbound packets. If either of these rules is missing, then the corresponding TCP connection cannot even be opened.

The traffic to be allowed is often specified textually. A simple way for deriving the corresponding firewall rules is as follows:

Analyze the textual specification and list the kind of operations to be allowed.
For each kind of operation, list the required TCP connections.
For each TCP connection:
1. By using a diagram of the network architecture, try to place the two ends of the connection on the opposite sides with respect to the firewall.
2. If the connection can never cross the firewall, then this connection does not need any firewall rule.
3. Otherwise:
  1. Write as much information as possible on each end of the connection (IP address and port number at each end);
  2. Use that information for writing a rule describing packets in one direction;
  3. Use that information again for writing another rule describing packets in the opposite direction.
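The last step of the procedure, deriving two rules from what is known about the two ends of a connection, can be sketched as follows. The function name and the rule representation (dictionaries with hypothetical field names) are assumptions. Information that is not known about an end is simply omitted from both rules.

```python
def rules_for_connection(client_ip=None, client_port=None,
                         server_ip=None, server_port=None):
    """Return the two rules (client-to-server, server-to-client) for one TCP connection."""
    def rule(src_ip, src_port, dst_ip, dst_port):
        fields = {"src_ip": src_ip, "src_port": src_port,
                  "dst_ip": dst_ip, "dst_port": dst_port}
        # Unknown operands are left out of the rule.
        return {name: value for name, value in fields.items() if value is not None}

    return (rule(client_ip, client_port, server_ip, server_port),
            rule(server_ip, server_port, client_ip, client_port))

# Any client may connect to a hypothetical web server at 192.0.2.10, port 443.
to_server, to_client = rules_for_connection(server_ip="192.0.2.10", server_port=443)
```

The two rules contain the same information with source and destination swapped, which is why both can be derived from a single description of the connection.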

When discussing firewall rules, the terminology might be confusing. For example, “allow DNS client traffic”: for a border firewall, is the “client” within the organization or outside of it?

The terminology always refers to the entity protected by the firewall, as follows:

Client traffic: Client internal to the organization (border firewall) or endpoint acting as a client (endpoint firewall).
Server traffic: Server internal to the organization or endpoint acting as a server.
Inbound connection: Server internal to the organization or endpoint acting as a server.
Outbound connection: Client internal to the organization or endpoint acting as a client.

Note that inbound and outbound connections must not be confused with inbound and outbound traffic. Inbound and outbound traffic are terms that refer to IP packets. A connection is bidirectional, thus a connection always requires both inbound (IP) traffic and outbound (IP) traffic.

Practical considerations

Operands that may be used in real firewalls include the protocol transported by the IP packet, e.g., TCP, UDP, ICMP.

Rule syntax and user interface depend on the firewall implementation.

Windows has a firewall accessible in Control Panel, Windows Defender. The Windows firewall has several user interfaces, all of them quite different from the one described in the previous section (which is very close to the firewall implementation).

The basic interface specifies a very broad set of “applications”. Most of them are enabled by default. When an application is enabled the corresponding network traffic is allowed and vice versa. The correspondence between applications and TCP connections is fully hidden. Specific constraints (e.g., “use application X only with remote node Y”) cannot be expressed.

The interface available in Advanced Configuration is in terms of “connections”. It distinguishes between “inbound” and “outbound” connections: the former are those in which Windows acts as a server, the latter are those in which Windows acts as a client.

There is a large set of predefined rules for inbound connections and a similar set for outbound connections. Each rule may be enabled or disabled. There are many operands available for writing rules. Operands may be modified and new rules may be added.

By default, all outbound connections are allowed, even those that do not satisfy any rule.

Appendix

Let N be a network number and let 2^k be its length. Let IP-1 be an IP address in N.

Claim:

IP-2 is in the same network number as IP-1 ⇔ IP-2 and IP-1 are identical in the 32-k leftmost bits.

Proof:

- ⇒ Trivial consequence of the definition of network number and the properties of binary representations.
- ⇐ Let N1 and N2 be the network numbers of IP-1 and IP-2, respectively (thus N1 = N). Suppose, by contradiction, that N1 <> N2.
  - If N2.length = 2^k then N1.start = N2.start, thus it must be N1 = N2.
  - If N2.length > 2^k then N2.start <= N1.start (N2.start has k+j (j >= 1) rightmost bits equal to 0). N2.end has k+j rightmost bits equal to 1, thus N2.end >= N1.start. Since N1 <> N2, N1 and N2 overlap, but this violates the assumption about overlapping network numbers.
  - If N2.length < 2^k then N1.start <= N2.start (N2.start has k-j (j >= 1) rightmost bits equal to 0). N1.end has k rightmost bits equal to 1, thus N1.end >= N2.start. Since N1 <> N2, N1 and N2 overlap, but this violates the assumption about overlapping network numbers.

Note that the proof of ⇐ is based on assumption 2: “Assigned network numbers do not overlap”.

Thus, if assigned network numbers may overlap, the property holds in only one direction: IP-2 is in the same network number as IP-1 ⇒ IP-2 and IP-1 are identical in the 32-k leftmost bits.