Notes - Introduction

Clients and Servers

A running program is called a process. An Internet application is a set of processes distributed across different computers and communicating among themselves.

Processes in Internet applications may be categorized depending on whether they make services available to other processes, or they use services made available by other processes. In the former case the process is called a server, in the latter case the process is called a client. A server is usually used by many clients simultaneously. A client usually uses one or a few servers simultaneously. Many different interaction patterns between processes may exist, however.

The above categorization is only for descriptive and introductory purposes. A process may act both as a server and as a client, either at the same time or at different times.

A process communicates with another process by means of an abstraction called a connection. Processes operate on connections by invoking procedures implemented within the operating system (system calls). In this course we are concerned only with the functionality (usage) of connections, not with their implementation.

Each process in an Internet application has a globally unique identifier.

A client process is usually structured as follows:

  1. Determine the identifier of the server;
  2. Open a connection to the server by specifying the server identifier;
  3. Send requests into the connection and receive responses from that connection;
  4. Close the connection.

The client need not know its own identifier (but it may obtain this identifier by asking the operating system). Usually the client sends a request and does not send any further requests until receiving a response. Many different request-response patterns are possible, however.
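The four client steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `request_once` is hypothetical, and the toy message convention (the request ends when the client stops sending, the response ends when the server closes the connection) is chosen here for brevity and is not a protocol defined in these notes.

```python
import socket


def request_once(server_host: str, server_port: int, request: bytes) -> bytes:
    """Send one request to a server and return its whole response."""
    # Steps 1 and 2: the server identifier is the (host, port) pair;
    # create_connection opens a connection to it.
    with socket.create_connection((server_host, server_port), timeout=5) as conn:
        # Step 3: send the request into the connection...
        conn.sendall(request)
        # ...and signal that nothing more will be sent (assumed convention).
        conn.shutdown(socket.SHUT_WR)
        # Receive the response until the server closes its side.
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # Step 4: leaving the with-block closes the connection.
    return b"".join(chunks)
```

Note that the client never chooses or inspects its own identifier: the operating system assigns it when the connection is opened.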

A server process is usually structured as follows:

  1. Choose its own identifier;
  2. Wait for connections opened by clients;
  3. For each open connection, receive requests from the connection and send responses into that connection.

The server need not know the client identifier (but it may obtain this identifier by asking the operating system). Steps 2 and 3 may be executed in parallel and step 3 may be executed in parallel for all connections currently open.
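A minimal Python sketch of the three server steps, with one thread per connection so that step 3 runs in parallel for all open connections. The names `start_server`, `serve` and `handle_connection` are hypothetical, the "service" (returning the request in upper case) is a placeholder, and treating the bytes of each read as one request is a simplification that a real protocol would not make.

```python
import socket
import threading


def handle_connection(conn: socket.socket) -> None:
    """Step 3: receive requests from one connection and send responses."""
    with conn:
        while True:
            request = conn.recv(4096)
            if not request:  # the client closed the connection
                break
            conn.sendall(request.upper())  # placeholder service


def serve(listener: socket.socket) -> None:
    """Step 2: wait for connections; each one gets its own thread."""
    while True:
        try:
            conn, _client_identifier = listener.accept()
        except OSError:  # the listening socket was closed
            break
        threading.Thread(
            target=handle_connection, args=(conn,), daemon=True
        ).start()


def start_server(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Step 1: choose the identifier (port 0 lets the OS pick a free port)."""
    listener = socket.create_server((host, port))
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener
```

The server accepts whoever connects; the client identifier returned by `accept` is available but is not needed to provide the service.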

Note the asymmetry of behavior: a client process specifies the identifier of the server with which it wants to connect; a server process accepts connections from any client process.

Note that processes need not know how connections are implemented by the operating systems. Processes need to know only the functionality provided by the operating system, not how it is implemented.

Distinguishing between functionality and implementation is a fundamental concept in every engineering field (when you use a TV remote control, you need to know what happens when you press a button; you do not need to know what happens inside the remote control and the TV) and especially in computer engineering.

Protocols

Connections transport byte streams in both directions. Communicating processes must thus implement rules for:

  • Subdividing a byte stream into application messages.
  • Describing the structure of each message (syntax).
  • Describing the meaning of each message (semantics).
  • Describing the pattern of message exchange (when each of the two parties sends a message).

The set of such rules is called a protocol. Any two communicating processes must obviously implement the same protocol. Those processes may be written in different programming languages and may run on different operating systems.
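As an illustration of the first rule, one common way to subdivide a byte stream into messages is to precede each message with its length. The sketch below assumes a 4-byte big-endian length prefix; this convention and the function names are chosen for the example and are not prescribed by these notes.

```python
import struct

# Assumed convention: each message is preceded by its length,
# encoded as a 4-byte unsigned integer in network byte order.
HEADER = struct.Struct("!I")


def encode_message(payload: bytes) -> bytes:
    """Return the bytes to write into the connection for one message."""
    return HEADER.pack(len(payload)) + payload


def split_messages(stream: bytes) -> tuple[list[bytes], bytes]:
    """Return the complete messages in `stream` and the leftover bytes."""
    messages = []
    offset = 0
    while len(stream) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        if len(stream) - start < length:
            break  # the rest of this message has not arrived yet
        messages.append(stream[start:start + length])
        offset = start + length
    return messages, stream[offset:]
```

The receiver keeps the leftover bytes and prepends them to whatever arrives next, so a message split across several reads is still reassembled correctly.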

A protocol may be either non-proprietary or proprietary.

A non-proprietary protocol is specified in a public document and can be implemented by anyone. Clients and servers that interact with a non-proprietary protocol may thus have been developed by different organizations. Most Internet protocols, including those for email and the web, are non-proprietary. Documents specifying these protocols are called RFCs (Requests for Comments) and are published by an organization called the IETF (Internet Engineering Task Force).

A proprietary protocol may be developed by any organization, usually a private company, and cannot be freely implemented by others. Clients and servers that interact with a proprietary protocol must have been developed by the same organization that developed the protocol. This is either because the internals of the protocol are kept secret (although, in principle, many such details can be inferred by analyzing the network traffic) or because usage of the protocol has been restricted with forms of copyright. The protocols used by WhatsApp and Skype are examples of proprietary protocols.

Networking software

Connections are implemented by the networking software within the operating system of the hosts where the processes execute.

In each operating system, the networking software is conceptually composed of 3 distinct layers. Information sent by a process travels through the 3 layers in the local operating system of that process, then through the Internet, then through the 3 layers in the local operating system of the receiving process.

From a logical point of view, the two processes communicate directly among themselves and understand each other because they implement the same protocol. The same logical structuring exists for the layers of the networking software:

  • The two upper layers of the networking software (i.e., the upper layer at one side and the upper layer at the other side) communicate directly among themselves and implement the same protocol. This protocol is the one that implements connections and is called TCP.
  • The two intermediate layers communicate directly among themselves and implement the same protocol. This protocol is the one that allows hosts located anywhere in the world to communicate and is called IP.

The implementation of TCP is not part of this course. The implementation of IP will be analyzed later.

The operating systems where the communicating processes execute need not be the same. All that is required is that all layers implement the same protocols.

Protocols implemented by processes are called application protocols to distinguish them from protocols used within the networking software.

Properties of a communication service

A communication service may appear to its users as:

  • Connection-oriented: a connection must be opened before any communication may take place. Communication then occurs through the connection. When the connection is no longer required it may be closed.
  • Connectionless: communication may occur at any time, without any need of creating a connection in advance.

A communication service may appear to its users as:

  • Message-oriented: the service transports and delivers transmission units, i.e., messages.
  • Byte-oriented: the service has no notion of message. Any piece of information sent through the service becomes part of a single byte stream that flows toward the other end. Application-level messages that are sent separately are thus merged together by the service.

A communication service may appear to its users as:

  • Unreliable: information may be lost, delivered out-of-order, duplicated.
  • Reliable: information is not lost, is delivered in order and not duplicated.

TCP is connection-oriented, byte-oriented and reliable. All the applications that we will study in this course use TCP. The software that implements those applications must thus handle connections, must be able to handle the conceptual mismatch between messages and byte stream and, most importantly, may assume a reliable service.
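The mismatch between messages and byte stream shows up as soon as a program reads from a connection: a single read may return only part of a message, or bytes belonging to several messages. A minimal sketch of a helper that keeps reading until a known number of bytes has arrived is given below; the name `recv_exactly` is hypothetical, and it assumes the protocol tells the receiver how long the next message is.

```python
import socket


def recv_exactly(conn: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes from a byte-oriented connection."""
    buffer = bytearray()
    while len(buffer) < size:
        # recv may return fewer bytes than requested, so keep asking
        # for whatever is still missing.
        chunk = conn.recv(size - len(buffer))
        if not chunk:
            raise ConnectionError(
                "connection closed before the full message arrived"
            )
        buffer.extend(chunk)
    return bytes(buffer)
```

The boundaries of the sender's write operations are not visible to the receiver: bytes written as "hel" and then "lo" can be read as the single 5-byte message "hello".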

IP is connectionless, message-oriented, unreliable. Since the implementation of TCP (not studied in this course) uses IP, it must be able to implement a reliable service over an unreliable service.

Some applications use an upper layer that implements a protocol called UDP. The properties of UDP are the same as those of IP. Applications that use UDP, thus, must be prepared to handle duplicates, out-of-order delivery and losses.
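To give an idea of what handling duplicates and out-of-order delivery means in practice, the sketch below assumes that the sender numbers its messages 0, 1, 2, ... and that the receiver wants to deliver them in that order. The class name and the numbering scheme are assumptions made for this example. Losses are not recovered here: a lost message would have to be retransmitted by the sender, which this sketch does not cover.

```python
class InOrderReceiver:
    """Deliver numbered messages in order, discarding duplicates."""

    def __init__(self) -> None:
        self.next_seq = 0                    # next number to deliver
        self.pending: dict[int, bytes] = {}  # messages that arrived early

    def receive(self, seq: int, payload: bytes) -> list[bytes]:
        """Accept one message; return those that can now be delivered."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        ready = []
        # Deliver the longest run of consecutive messages available.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

With TCP this bookkeeping is done by the networking software; with UDP it becomes the responsibility of the application.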

Note that time is never mentioned in the properties of communication services. The Internet does not provide any guarantee regarding the time it takes for information to reach its intended destination (or for concluding that it will never reach the destination).

The definition of reliable service provided above is oversimplified. Obviously, if an atomic bomb explodes over a communication service while some information is travelling through that service, that information will never be delivered. A reliable service provides the properties listed above only when it exists; however, it may cease to exist at any time and applications will eventually be notified of this fact. These properties have subtle and important implications that will not be analyzed in this course. We only point out that when a process has sent a message and the operating system has returned a “success”, this fact does not imply that the message has been delivered or that it will certainly be delivered. It only implies that the message has been copied from the process to the local operating system. A process can be certain that a message has been received by the other process only upon receiving an explicit response message from that process.
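The last point can be sketched as follows. The acknowledgement message `OK\n` and the function name are hypothetical; the only claim illustrated is that a successful send is not treated as proof of delivery, while an explicit response from the other process is.

```python
import socket

ACK = b"OK\n"  # hypothetical acknowledgement defined by the application protocol


def send_and_wait_for_ack(
    conn: socket.socket, message: bytes, timeout: float = 5.0
) -> bool:
    """Return True only if the other process explicitly acknowledged."""
    conn.settimeout(timeout)
    try:
        # Success here only means the bytes were copied to the local
        # operating system, not that they reached the other process.
        conn.sendall(message)
        received = b""
        while len(received) < len(ACK):
            chunk = conn.recv(len(ACK) - len(received))
            if not chunk:  # connection closed without an acknowledgement
                return False
            received += chunk
    except OSError:  # includes timeouts and broken connections
        return False
    return received == ACK
```

A return value of False does not mean the message was not received: it only means the sender cannot be certain that it was.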

Process identifiers and process structuring

A TCP process identifier is a pair of items called IP address and port number.

  • An IP address is a 32-bit globally unique host identifier assigned by a network administrator (this definition of an IP address will be refined later in the course).
  • A port number is a 16-bit identifier locally unique on the host where the process is executing and is assigned by the local operating system.

IP addresses are represented in dotted decimal notation: the 32 bits are split into 4 bytes; each byte represents a natural number in the range [0,255]; the 4 numbers are concatenated and separated by a dot character (e.g. 131.114.9.252). Port numbers are represented as natural numbers (e.g., 80).
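The dotted decimal notation can be made concrete with a short conversion sketch. The function names are chosen for this example; Python's standard `ipaddress` module offers the same conversions and would normally be preferred.

```python
def dotted_to_int(address: str) -> int:
    """Convert dotted decimal notation to the 32-bit address value."""
    parts = [int(part) for part in address.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a dotted decimal IP address: {address!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part  # append one byte at a time
    return value


def int_to_dotted(value: int) -> str:
    """Convert a 32-bit address value to dotted decimal notation."""
    if not 0 <= value < 2**32:
        raise ValueError("an IP address must fit in 32 bits")
    # Extract the 4 bytes from the most significant to the least.
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

For example, `int_to_dotted(dotted_to_int("131.114.9.252"))` gives back `"131.114.9.252"`.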

Each connection is associated with two TCP process identifiers, i.e., two (IP address, port number) pairs: one at each end of the connection. It is common to refer to those identifiers as either local or remote; these terms must always be accompanied by the reference point, which may be either the client or the server. In other words, the local port and remote port for the client are, respectively, the remote port and the local port for the server.

TCP process identifiers are selected as follows. A process cannot choose the IP address and uses the IP address of the host where it is executing. Concerning port numbers:

  • A server process chooses the port number before starting to wait for connections from clients and tells the operating system the chosen port number (if the port number is already in use by another process, then the operating system at the server will reject the choice). Internet standards define the server port number for each application protocol. This port number is usually below 1024.
  • A client process specifies the IP address and port number of the server process. The operating system of the client process will assign to the client process an unused port number, usually greater than 1024.

Note that the port number of the client is different from the port number of the server.

Note also that all clients connected to a given server identify that server with the same port number, that is, all connections to that server will have the same port number on the end at the server.
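The identifiers at the two ends of a connection can be inspected by asking the operating system. The sketch below opens a connection on the local host and returns the four identifiers; the function name is hypothetical and, to keep the example free of port conflicts, the server port defaults to 0, which asks the operating system for any free port instead of a standard one.

```python
import socket


def connection_identifiers(server_port: int = 0) -> dict:
    """Open one local connection and return the identifiers at both ends."""
    with socket.create_server(("127.0.0.1", server_port)) as listener:
        server_identifier = listener.getsockname()  # (IP address, port)
        # The client specifies only the server identifier; its own port
        # is assigned by the operating system.
        with socket.create_connection(server_identifier, timeout=5) as client:
            server_side, _ = listener.accept()
            with server_side:
                return {
                    "client_local": client.getsockname(),
                    "client_remote": client.getpeername(),
                    "server_local": server_side.getsockname(),
                    "server_remote": server_side.getpeername(),
                }
```

In the result, what is local for the client is remote for the server and vice versa, and the port at the server end is the one on which the server was waiting for connections.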

How client and server programs are written depends on the networking libraries available for the programming language used. Those libraries usually hide many of the details that must be taken into account when directly invoking the procedures provided by the operating system. These procedures are collectively called the socket interface. The details of the socket interface differ between operating systems, but its general structure is the same. Some of these procedures are used by servers, some others by clients, some others by both servers and clients. These procedures conceptually operate on objects implemented within the operating system that represent one end of a connection. Each of these objects is called a socket (there is no “plug” object, though).

The programming languages and libraries used by the communicating processes need not be the same. All that is required is that the processes implement the same protocol.