Internet 2.0

This is the original white paper, written before I started implementing it.

This is a white paper about how I would reimplement the Internet if I had the chance to do it from scratch, without the accumulated cruft of the past decades. Each part of the Internet has its own set of problems and needs separate solutions. I will describe problems and solutions in the order of the network stack, starting with layers 2 and 3.

IPv6 solves some of the Internet's problems (like the limited address space) and creates others. One problem of IPv6 is that backward compatibility is a significant concern. The goal of a next Internet should be to take care of compatibility while giving the benefit of new features wherever possible. It is important to have benefits for everyone (from the backbone to the local user) to make the transition happen. There's a "Legacy" chapter describing how to connect Internet 2.0 to Internet 1.0.

There's one point to be made about the number of layers: the ISO OSI model defines seven "onion" layers. TCP/IP reduces that to five layers. The minimum is three layers: the physical and application layers are essential, and one intermediate layer is sufficient to link them together (an application that directly uses the physical layer is networked as well, but we can't call that a "protocol").

The requirements for Internet 2.0 are these:

Publications so far:

Check out the Fossil repository of the work in progress with

mkdir net2o; cd net2o
fossil clone http://fossil.net2o.de/net2o net2o.fossil
fossil open net2o.fossil
./do

Eventually, net2o will reside on its own version control system.

Other food for thought:

Topology

The current Internet separates local nets (which are either on a shared medium like the original Ethernet or WLAN, or connected together with switches) from larger nets, which are connected by routers.

Routing tables, especially in routers that transfer a lot of data, keep growing and will present a performance problem (they limit the scalability of the current Internet; this is often worked around with MPLS or similar approaches to reduce the effort). Each packet needs to be routed again, even though most traffic in the current Internet is connection based; the only widely used connection-less service is DNS, and even DNS queries typically go to a local resolver. It is usually sufficient to obtain a route once per connection and keep routes cached for further connections to the same host (just as with DNS).

So here are some details about how to make an efficient route field that allows world-wide communication via switched networks on layer 2 (no LAN/WAN separation with the LAN switched and the WAN routed):

To route to a destination, the source has to ask a (distributed) routing server for the route. This routing server can directly resolve names to routes, thereby taking over both the DNS system and the routing tables. Since routing now happens only once per new host and connection, and frequent routes are cached close to the user, routing isn't a bottleneck any more.
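
As an illustration, here is a minimal Python sketch of such per-connection route caching; the lookup_route() call standing in for the actual query to the routing server, as well as the TTL value, are hypothetical.

import time

# Minimal sketch of per-host route caching.  lookup_route() is a placeholder
# for the actual query to the distributed routing/name server.
ROUTE_TTL = 300            # seconds a cached route stays valid (arbitrary choice)
_route_cache = {}          # name -> (route, expiry timestamp)

def lookup_route(name):
    """Placeholder for the actual query to the routing server."""
    raise NotImplementedError("ask the distributed routing server here")

def route_to(name):
    """Return a route for 'name', asking the routing server only on a cache miss."""
    entry = _route_cache.get(name)
    now = time.time()
    if entry and entry[1] > now:
        return entry[0]                       # cache hit: no routing work at all
    route = lookup_route(name)                # one query per new host/connection
    _route_cache[name] = (route, now + ROUTE_TTL)
    return route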

Note that this is not identical to "source routing" as in IPv4 and IPv6. Source routed packets describe the hops of the packet and can be abused to bounce traffic off endpoints in the net. This system describes the path of the packet through the switches and can't abuse endpoints (those just take the packet and consume it). The fact that all traffic is based on these routes makes it possible to inspect the routes in the firewall and filter out unwanted packets just as well as in the current Internet - or even better, since faked source addresses are not possible.

Virtual routes still allow hiding details of the system, and make it possible to use load balancers in larger nets without relying on the cooperation of the users. Virtual routes greatly reduce the routing tables, since only the relevant part of the route needs to be translated with a table (effectively compressing a many:one relationship into a one:one relationship).

This design has two important features: no spoofing of the return address is possible, and the space for the to and from addresses is actually shared, compressing the header. A 64 bit field is IMHO completely sufficient. For the convenience of small devices, a 16 bit address field can optionally be used (it allows only short distance calls).
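
The following Python sketch shows one way such a shared to/from field could work; the 4-bit port width and the consume-at-the-head, record-at-the-tail layout are assumptions for illustration, not the actual net2o encoding.

# Illustrative sketch of a shared to/from route field: each switch consumes
# the leading bits as its output port, shifts the rest up, and records the
# port the packet came in on at the tail.  The destination can derive the
# return route from these recorded incoming ports.
FIELD_BITS = 64
PORT_BITS = 4              # assume 16-port switches for this example

def switch_hop(route, in_port):
    out_port = route >> (FIELD_BITS - PORT_BITS)            # leading bits: where to send
    rest = (route << PORT_BITS) & ((1 << FIELD_BITS) - 1)   # shift the remaining path up
    return out_port, rest | in_port                          # record the return hop at the tail

# Example: a two-hop path, out on ports 3 then 5.
route = (3 << (FIELD_BITS - PORT_BITS)) | (5 << (FIELD_BITS - 2 * PORT_BITS))
out1, route = switch_hop(route, in_port=1)   # first switch: out on port 3, records port 1
out2, route = switch_hop(route, in_port=7)   # second switch: out on port 5, records port 7
# The low bits of 'route' now hold the incoming ports (1, then 7).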

Packet size in the current Internet is byte-granular; for better performance, a limited set of power-of-two packet sizes is recommended. The header of a packet must contain all necessary information about the size in a few bits. Suggested is a packet size of either 32, 128, 512, or 2048 bytes. For secure encryption, each packet must contain some randomness to prevent known-source attacks (e.g. the data stream of an encrypted block starts with 64 random bits).
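
A sketch of how a two-bit length code could cover the four suggested sizes (the position of the code inside the header byte is an assumption for illustration):

# Two-bit length code for the four suggested power-of-two packet sizes.
SIZES = (32, 128, 512, 2048)          # bytes, as suggested above

def encode_size(nbytes):
    return SIZES.index(nbytes)        # 0..3, fits in two bits

def decode_size(code):
    return SIZES[code & 0b11]

assert decode_size(encode_size(512)) == 512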

The header byte of an Internet 2.0 frame allows determining the size of the packet:

The second byte describes how the packet is going to be treated by the switch:

Flow control

The current Internet (TCP) does flow control only on the end nodes: you acknowledge packets you received. There are several downsides to this:

Doing the flow control inside the network, however, is not such a big deal; it should be possible to produce better overall throughput while at the same time reducing attack methods. If a switch gets jammed through one port, it can send a "jam" message back, based on some simple statistics it has taken of the packets. This can prevent overload before the switch actually has to drop packets. Also, a switch can notify the sender if it did drop a packet.
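
A sketch of such in-network congestion signalling; the send_back() call, the packet's source_route attribute, and the queue threshold are assumptions for illustration.

# A switch output port that watches its queue and sends a "jam" notice back
# towards the sender before it is forced to drop packets.
JAM_THRESHOLD = 0.75      # fraction of queue capacity that triggers a jam notice

class Port:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []

    def enqueue(self, packet, send_back):
        if len(self.queue) >= self.capacity:
            send_back(packet.source_route, "drop")   # notify the sender of the drop
            return
        self.queue.append(packet)
        if len(self.queue) > JAM_THRESHOLD * self.capacity:
            send_back(packet.source_route, "jam")    # ask the sender to slow down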

For end node flow control, secure acknowledges should echo the additional random number to show that the packet was actually received - either that or a checksum. Also, the routing information can contain enough data to estimate the initial window size, so that the slow start can adapt to the actually available bandwidth instead of being "one size fits all".

For legacy end-to-end flow control, a LEDBAT-derived algorithm should be used for low priority data transmission. High priority data can simply send at the assigned rate, and signal failure when the bandwidth can't be achieved (too many drops).
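
A sketch of a LEDBAT-style, delay-based controller for the low priority case; the constants follow the spirit of RFC 6817 but are only illustrative.

# Delay-based rate control: back off as soon as the measured queueing delay
# approaches a target, long before packets get dropped.
TARGET = 0.100   # target queueing delay in seconds
GAIN   = 1.0     # how aggressively the window reacts

class LedbatLike:
    def __init__(self, cwnd=10.0):
        self.base_delay = float("inf")   # lowest one-way delay seen so far
        self.cwnd = cwnd                 # congestion window, in packets

    def on_delay_sample(self, one_way_delay):
        self.base_delay = min(self.base_delay, one_way_delay)
        queueing = one_way_delay - self.base_delay
        off_target = (TARGET - queueing) / TARGET      # negative when over target
        self.cwnd = max(1.0, self.cwnd + GAIN * off_target / self.cwnd)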

Multicasting

One important requirement for future Internet usage is efficient multicasting. Unlike direct connections, multicasts can't find their destination by embedding the route into the source packet: the route of a multicast packet is a tree, not a path. Therefore, each branch needs to know the list of destinations. Switches thus still have to maintain tables, in this case tables for all active multicast streams that actually branch in that node (nodes where multicast data just passes through to one port don't need to know that it is multicast).

A multicast packet addresses the branching switch directly; the switch chooses the destinations based on the remaining bits of the address (an address does not have to be completely zero to address the switch itself). Each destination entry contains a path either to another switch, which again creates multiple copies, or to an endpoint which consumes the data.

Multicasting can therefore use a table driven approach, where small tables are sufficient for local switches, and wide area switches need larger tables. The multicast resource is limited, but dynamically allocated, so no regulation is needed.

Multicast routing gets adjusted by clients joining a multicast domain. Such a join needs to pass from switch to switch, hop by hop, in the direction of the sender, until a joinable table entry is found. Multicasting will be a relatively scarce switch resource, and is best only used when the bandwidth saving is worth the price.
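
A sketch of the per-switch multicast state described above; the data model and the forward() call are assumptions for illustration.

# A switch keeps tables only for streams that actually branch here,
# mapping a stream id to the list of outgoing routes.
class MulticastSwitch:
    def __init__(self):
        self.branches = {}   # stream id -> list of routes to the next switch/endpoint

    def join(self, stream_id, route_towards_member):
        """Handle a join coming up from a new member.  Returns True if the
        stream is already known here; otherwise the caller has to pass the
        join one hop further towards the sender."""
        if stream_id in self.branches:
            self.branches[stream_id].append(route_towards_member)
            return True
        return False

    def dispatch(self, stream_id, packet, forward):
        for route in self.branches.get(stream_id, []):
            forward(route, packet)          # one copy per branch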

Broadcasts

Broadcasts are similar to multicasts, but use a block of fixed addresses. Broadcasts get dispatched by each switch, and users can filter broadcasts in rather than joining multicast domains. Joining a broadcast likewise has to hop through the switches until one is found that already carries the broadcast.

Two implementation strategies are possible: bitmaps and content addressed memory (CAM). Small local switches can use CAMs, larger wide area switches can use bitmaps. Broadcast numbers are unique, need regulation, and are generally a limited resource. Wide area broadcasts should be used for content similar to today's on-air broadcasts (TV, radio). It's also possible for a wide area switch to pass complete blocks of broadcast channels when the likelihood is high that most of them will be used downstream.
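
A sketch of the bitmap variant for a wide area switch; channel numbering and port handling are assumptions for illustration.

# One bit per broadcast channel and per port, set when someone downstream
# has filtered that channel in.
class BroadcastFilter:
    def __init__(self, channels, ports):
        self.wanted = [bytearray(channels // 8 + 1) for _ in range(ports)]

    def subscribe(self, port, channel):
        self.wanted[port][channel >> 3] |= 1 << (channel & 7)

    def ports_for(self, channel):
        """All ports the broadcast on 'channel' has to be copied to."""
        return [p for p, bits in enumerate(self.wanted)
                if bits[channel >> 3] & (1 << (channel & 7))]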

Broadcasts can have domains and may not cross their boundaries. E.g. a television station may want to broadcast only within one country, because it only has a license for broadcasting there. Then all cross-country switches have to block that broadcast at the border.

Local broadcasts reach all computers in a LAN without being blocked; all of them are blocked between LANs. Local broadcasts are used to announce services and similar information.

A note on the number of channels necessary: a single TV program can use several channels. Users might choose different resolutions (depending on their connection bandwidth), and different languages for the audio channel.

Both multicasting and broadcast need filtering against intruders - there's only one legitimate source of a multicast or broadcast.

Legacy

Since layer 2 systems can be separated by routers, connecting old systems simply requires protocol translation.

However, there are other options for a transition phase as well. It is possible to use other lower level layers to tunnel Internet 2.0 packets through existing equipment. For local networks, you can pack them into Ethernet frames (if available, jumbo frames for 2k byte packets), using a new dedicated Ethernet type field. And for WAN tunnels, pack them into UDP packets.

The reverse is possible, too: pack (and pad) IP packets into Internet 2.0 packets. Providers which use an Internet 2.0 infrastructure internally can use similar approaches as MPLS for Internet 1.0 customers.

Plug&Play

To connect independently configured parts together without much thinking, systems should be open about their capabilities. The basic connecting infrastructure is the name and routing server, which gives or takes names from connected end-points, can enlist their capabilities, and respond to capability based queries from other hosts (like "who's offering file server capabilities"). For administrators, capabilities can also contain further information, like which node is the file server for a specific user. Sign-on can and should be done with certificates, making sure that only trusted hosts can connect, or that untrusted hosts get different responses to queries (e.g. being isolated from the local net, and only able to connect to the outside - a limited "guest" subnet). The typical certification chain in a company is that the IT department issues certificates; the typical certification chain at an ISP is probably similar. See the PKI subsection below.
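
A minimal sketch of the capability side of such a name/routing server; the data model (plain sets of strings) is purely illustrative.

# End-points announce what they offer, other hosts query by capability.
class CapabilityRegistry:
    def __init__(self):
        self.capabilities = {}    # host name -> set of capability strings

    def announce(self, name, caps):
        self.capabilities.setdefault(name, set()).update(caps)

    def who_offers(self, capability):
        """Answer queries like 'who is offering file server capabilities?'"""
        return [name for name, caps in self.capabilities.items()
                if capability in caps]

registry = CapabilityRegistry()
registry.announce("storage1", {"file server", "backup"})
assert registry.who_offers("file server") == ["storage1"]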

Real Time

Media transmissions like VoIP or video have quality of service (QoS) requirements, both for timing and packet loss (forward error correction can tolerate some packet loss, and buffering can cover up some timing problems, but for telephony systems, both affect the perceived quality, since they introduce unnecessary delays and echoes).

To fulfill realtime requirements, a two-fold strategy is necessary. First of all, bandwidth for real time services needs to be allocated beforehand. The switch will calculate the available real time bandwidth, and deny a request if the available bandwidth is already used up. This has to go from start to end: through the entire chain, all switches have to accept the required bandwidth. Larger switches might allocate bandwidth in larger quantization steps, so not all allocations have to bother them.
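
A sketch of the per-switch admission decision; capacities and method names are assumptions for illustration, and the end-to-end part (asking every switch along the route) is not shown.

# Admission control for real time bandwidth on one link of one switch.
class RealTimeLink:
    def __init__(self, capacity_bps):
        self.capacity = capacity_bps
        self.reserved = 0

    def request(self, bandwidth_bps):
        """Accept or deny a real time reservation on this link."""
        if self.reserved + bandwidth_bps > self.capacity:
            return False                  # deny: the real time budget is used up
        self.reserved += bandwidth_bps
        return True

    def release(self, bandwidth_bps):
        self.reserved = max(0, self.reserved - bandwidth_bps)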

Real time packets have to be tagged with a priority. High priority packets will go out first; low priority packets may even be dropped if not enough bandwidth is available - at least they are buffered up and can generate jam responses. The pre-allocated bandwidth for high priority data can be monitored against what is actually used, and surplus traffic can be reduced in priority, or excessive connections can be dropped with notice. Note that a wide area switch is possibly not able to track individual connections, so in order to work well, Internet service providers have to follow the rules. Wide area switches can track violations on a per-port basis; this should be enough to keep the system going.

Abstraction

Avoid unnecessary abstraction.

Distributed Shared Memory

The most successful abstraction for bus systems is that of distributed shared memory, where remote memory is best write-only. The packet thus contains an in-memory address portion where the data is written to (actually, the destination address is virtualized, and can also depend on the source address if necessary). Data should come in chunks of 2^n bytes, starting with a minimal chunk of 32 bytes. The size of the chunks has been carefully chosen around typical requirements (e.g. disks transmitting 512 byte sectors). The application is informed which parts of the memory were written to, and in case of packet drops it can re-request those parts that weren't written. The subset of memory addresses used by a connection is roughly equivalent to the "port" (base address) plus "window" (address range) in TCP/IP.
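
A sketch of the receive side of this model: chunks are written at their target addresses, and whatever is missing can be re-requested. Chunk size handling and the method names are illustrative only.

# Write-only shared memory, receive side: track which chunks have arrived.
CHUNK = 32   # minimal chunk size in bytes

class ReceiveBuffer:
    def __init__(self, size):
        self.memory = bytearray(size)
        self.written = set()                 # offsets of chunks that arrived

    def write(self, offset, data):
        assert offset % CHUNK == 0 and len(data) % CHUNK == 0
        self.memory[offset:offset + len(data)] = data
        self.written.update(range(offset, offset + len(data), CHUNK))

    def missing(self, upto):
        """Chunk offsets below 'upto' that still need to be re-requested."""
        return [o for o in range(0, upto, CHUNK) if o not in self.written]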

Most Internet protocols interleave data and meta-data in one connection. That's because TCP/IP just opens a byte stream, where data and meta-data have to be interleaved. Some Internet protocols like FTP open another connection for the data, to separate data and meta-data. These protocols have caused problems due to the question of who initiates the data connection.

It's far easier to separate data and meta-data right from the start. The suggestion is quite simple: meta-data and data are on different addresses within the mapped memory. Meta-data can be used to change the predefined allocation regions if a protocol needs more data or meta-data space.

Modern hardware with IOMMUs and similar arrangements can handle this sort of connection almost completely in hardware, so the software has nothing to do but to map and access memory - the only additional requirement is an event queue. Only the setup of a connection needs to go through the operating system - and only the setup of a connection needs a FIFO paradigm, because the sender doesn't know yet which address to use. So address 0 is redirected to a FIFO that goes to the connection maker in the OS.

Legacy

The current Internet allows byte-granular connections. Sideband meta-data can carry additional information that allows byte-granular transfers when necessary.

Active Messages

The suggestion for the meta-data abstraction is active messages. Meta-data thus is an executable program in a limited programming language - the interpretation of the message is its execution in a virtual machine. Note that all commands of this language must be checked for validity and properly sandboxed, since these programs come from untrusted sources. Of course, my suggestion is to use a stack machine VM.
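
A minimal sketch of such a sandboxed stack machine; the instruction set and handler names shown here are purely illustrative, not the actual net2o command set.

# Active message interpreter: only whitelisted commands execute, anything
# unknown aborts interpretation.
class ActiveMessageVM:
    def __init__(self, handlers):
        self.handlers = handlers       # name -> callable (the whitelist)
        self.stack = []

    def run(self, program):
        for op, *args in program:
            if op == "lit":            # push a literal argument
                self.stack.append(args[0])
            elif op in self.handlers:  # only whitelisted commands execute
                n = self.handlers[op].__code__.co_argcount
                params = [self.stack.pop() for _ in range(n)]
                self.stack.append(self.handlers[op](*params))
            else:
                raise ValueError("rejected unknown opcode: %r" % op)

# Example: a message that sets an attribute on the receiving side.
vm = ActiveMessageVM({"set-attribute": lambda value, name: print(name, "=", value)})
vm.run([("lit", "subject"), ("lit", "hello"), ("set-attribute",)])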

The language of this VM is essentially dealing with objects of the next abstraction layer.

Files with Attributes

The next layer of abstraction is how data is organized, and the answer is: in files. Everything is a file, and files can have attributes (an attribute is a <name, value> pair, where the data type can vary). A lot of the current Internet protocols deal with files in one way or the other. E-mails are files (or containers of several files), SMTP sends files, POP3 and IMAP receive files. HTTP basically delivers files (even when delivering a file means generating its content from a script). FTP deals with files, anyway. Having files and containers of several files (directories) maps quite well to the Unix view of things.

The key to putting all these protocols together is adding properties to the files, and allowing queries on those properties (e.g. searching an e-mail by subject - the subject is an attribute of the message). Sending an e-mail then means putting it into the send folder, and the MTA can afterwards add attributes about where the message has been delivered.
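
A sketch of files with attributes and an attribute query; the in-memory classes and the example attribute names are illustrative only.

# Files carry <name, value> attributes; folders can be queried by attribute.
class File:
    def __init__(self, content, **attributes):
        self.content = content
        self.attributes = attributes      # <name, value> pairs, any value type

class Folder:
    def __init__(self):
        self.files = []

    def query(self, **wanted):
        """All files whose attributes match every <name, value> pair given."""
        return [f for f in self.files
                if all(f.attributes.get(k) == v for k, v in wanted.items())]

send = Folder()
send.files.append(File(b"...", subject="status report", to="alice@example.org"))
assert len(send.query(subject="status report")) == 1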

Caching, P2P, Clouds

The current Internet can cache files with transparent or explicit proxies, and has P2P protocols, which are separated from the other protocols (as usual on the current Internet: too many protocols doing essentially the same thing - file transfer).

You should view Internet 2.0 as a distributed file system. Not as many file systems connected by networking, but as a single one. To make this happen, only one step is necessary: apart from the URL (the uniform resource locator), you also need a URI (the unique resource identifier - I deliberately vary the term here). This should identify the document as such, regardless of where it is. The easiest way to generate such a URI is a secure hash key.
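
A sketch of deriving such a content-based identifier with a secure hash; the "uri:sha256:" prefix is an assumed notation, not a defined scheme.

import hashlib

def content_uri(data: bytes) -> str:
    # The identifier depends only on the content, not on where it is stored.
    return "uri:sha256:" + hashlib.sha256(data).hexdigest()

# The same document always gets the same identifier, wherever it lives.
assert content_uri(b"hello") == content_uri(b"hello")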

The infrastructure necessary to obtain such a URI cheaply is similar to current P2P networks. You can use trackers, or distributed hash tables, or a compromise: a distributed network of automatic trackers. It is in the interest of an ISP to install such a tracker to save peering costs (external bandwidth); ISPs already deploy transparent proxies and cloud cohosting opportunities to reduce traffic. It is possible to couple a tracker with a cache.

Caching is not limited to open data. Encrypted files can be cached as well; it is highly recommended to cache only the file itself, not the list of encrypted keys. That list is an attribute of the file, part of the envelope (and for multiple blind copy recipients, each recipient should only get his own envelope). Mail servers can do the encryption themselves, even when the public key of the user is unknown. In this case, the key is passed as "plain text", secured only by transport layer encryption (which should be sufficient). If a mail server knows the plain text file and its hash, it can identify all potential copies and keep the file as a single copy (plus reference counting).

Text formatting

Text on the current Internet is typically either plain text or HTML. Plain text is easy for users to write, HTML is easy for machines to render. Often people use something in between: simple markup languages that use plain text where possible and a few special characters for the markup; very common in Wikis, but also in e-mail, where by convention some markup is possible.

The fact that Wikis resemble the original intention of the Web much more than the HTML based part of it probably means that the Wiki syntax principle really is the right thing to do. It allows structuring the text while keeping usage easy and programs simple. Layout and formatting are best left to style sheets, anyway. For examples, see the different Wiki languages, and for more structured data YAML and JSON.

The original Internet was ASCII only, and developed an abundance of different coding systems before converging on UTF-8. Internet 2.0 will have all text in UTF-8, and where possible also uses approaches like Unihan to reduce unnecessary diversification of code points.

Single application as frontend

The "browser" is the Internet: Many people like this association, so while it is wrong today, just make use of it. If every server deals with file abstractions as much as possible, the remaining application logic is just how to display and organize these files. A web browser is scriptable, anyway, so all a mail server has to do is to put an appropriate script into the top index file.

Security

The requirements of a secure Internet are multiple:

Multiple implementations for this problem already exist: IPSec, SSL, PGP, and SSH, to name a few popular ones. These protocols have common parts, like key exchange, and use common algorithms, like RSA or ElGamal and AES or Blowfish. They also handle some things differently, like key trust. SSL uses institutional trust, where a "trust center" provides a certificate, while PGP uses a trust network, where peers add trust to a key by signing it. In SSL, the certificate is presented while establishing the connection, while in PGP, the key may be obtained from a key server (or by other, non-formalized means). Other people might like hierarchical trust chains, and so on.

Ideally, all these systems can be unified as well. This can't be done with one of the existing systems, as each of them has its own shortcomings. One important part is the key certificate: a container file like the PGP key can handle both the institutional approach (by simply containing a trust center certificate) and the peer-to-peer approach (by containing signatures of peers). It is also beneficial if key distribution and message distribution work with the same protocol, i.e. you can use the mail transfer protocol to obtain the key of your e-mail partner as well as to deliver the message.

Keys can be cached, both public keys of endpoints as well as session keys - if you connect to the same server again, it's faster to resume a cached session key (a "shared secret") than to do the key exchange again.

Connection initiation is limited to one single packet (e.g. down to 128 bytes; very simple devices like mice or keyboards, capable of only sending and receiving 32 byte packets, likely can't encrypt anyway). This single packet can contain a session key encrypted with the public key of the receiver - if that public key is available from the routing servers. This public key can't be long, so elliptic curve cryptography is highly recommended (128 bit security can be obtained with a 256 bit key). The initiator sends a session key encrypted with the public key of the server. If authentication is necessary, a certificate exchange can happen afterwards over the encrypted channel (certificates are of unknown size).
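
A sketch of how short elliptic curve keys yield a session key, here done Diffie-Hellman style with X25519 via the generic Python 'cryptography' package rather than by encrypting a key directly; the actual handshake may differ.

from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

def derive_session_key(my_private, peer_public):
    # An X25519 public key is only 32 bytes, so it fits a single small packet.
    shared = my_private.exchange(peer_public)
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"session key").derive(shared)

# Initiator and server each combine their own private key with the other's
# public key (obtained e.g. from the routing/key server) and get the same secret.
server_priv = X25519PrivateKey.generate()
client_priv = X25519PrivateKey.generate()
k1 = derive_session_key(client_priv, server_priv.public_key())
k2 = derive_session_key(server_priv, client_priv.public_key())
assert k1 == k2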

If the public key is not available on a key server (routing server), a query for the public key can be the first packet sent by the initiator. The reply will not initiate a connection; the initiator will use this public key to initiate the secure connection.

Encryption technologies face challenges, and may turn out to be weak. So it is important that more than one secure hash, symmetric, and asymmetric encryption method is available.

Anonymity often requires hiding your destination addresses as well (onion routing). Onion routers need a setup period for the key exchange. An onion router takes packets that are fully encrypted (except for the address field), determines the key and target address based on the source address and the remainder of the destination, decrypts one layer of the packet, and passes it on to the next hop after a random delay. Only the destination drops the random seed from the packet.
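
A sketch of the layered encryption itself, using the Python 'cryptography' package; key setup, addressing, and the random delays are left out.

from cryptography.fernet import Fernet

def wrap(payload, hop_keys):
    # The sender wraps once per hop; the last hop's layer goes innermost.
    for key in reversed(hop_keys):
        payload = Fernet(key).encrypt(payload)
    return payload

def peel(packet, hop_key):
    # Each onion router removes exactly one layer before forwarding.
    return Fernet(hop_key).decrypt(packet)

keys = [Fernet.generate_key() for _ in range(3)]
packet = wrap(b"hello", keys)
for k in keys:                              # hops peel in path order
    packet = peel(packet, k)
assert packet == b"hello"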

PKI, identity, and security tokens

As already mentioned, one important part of security is to actually have a public key infrastructure (PKI), i.e. to be able to retrieve public keys and establish trust in these keys. With DNSSEC, Internet 1.0 is finally starting to get there. Internet 2.0 should embrace this approach: the DNS/routing service infrastructure is hierarchical, and should provide a PKI for name services. The physical connection (ISP) is also sort-of hierarchical, and can provide another PKI, this time for physical connections rather than for names. Mail hosters can provide a PKI for their members. And independent certification authorities can link names with real persons or corporations, if not already provided by the other PKIs - or with audit results (e.g. an audit on information privacy for service providers like banks or freemailers).

Public keys are also used to manage identity. A public key login is a lot more secure than a password login: the private key can't be guessed, and it stays in the control of the user. For users who use different computers, including ones they don't control themselves (employer, internet cafe), a small "security token" or smart card is a good way to carry the key with them - it can be part of their mobile phone or a small USB token to be inserted into the computer. Mobile phones already use a similar approach (the SIM card). The SIM card is a good example of how a PKI with physical tokens can be handled through provider/customer relationships.

The security token should store the private key(s), allow backing them up onto another security token, but never expose them. The token should have at least one button to separate access permission from the hostile environment, which includes the keyboard of the computer being used. Adding a token reader to the keyboard itself (to allow entering PINs without host interaction) could be a good idea, as could adding fingerprint readers or other ways to improve secure authentication.

Some users might completely outsource their identity to a web service, as OpenID proposes. I don't consider this a good idea, especially when the sign-in approach is prone to phishing (people don't inspect the source code of the web page). The current user+password authentication with backup questions for lost passwords is worse, though.


Created 03sep2005. Last modified: 8jan2014 by Bernd Paysan.