Merchant Silicon and Bare Metal Changed Everything

In the third part of this technical blog series, I talk about how the emerging merchant silicon and “bare metal” switching movements have permanently changed software-defined networking, enabling production-grade SDN solutions in the data center.  I’ll describe how these two complementary technology trends provide unprecedented hardware access, a departure from the networking industry’s history of vertical integration.  I’ll then segue into how they’ve affected SDN by providing a greater degree of control over the underlying hardware and simplifying the hardware abstraction problem.  And then I’ll finish with how they allow companies such as Big Switch Networks and others to produce more complete, polished solutions than traditional hardware-vendor OpenFlow alone, in terms of streamlined upgrades, improved management, and “Zero Touch” network installation.

Crouching Silicon, Hidden Metal

The first trend, “merchant silicon”, refers to the horizontalization of the switching silicon industry; that is, there are now companies that make and sell only the low-level packet forwarding chips that are the workhorses of modern networking.  Historically, networking companies (e.g., Cisco, HP, Juniper, Brocade, etc.) made their own packet forwarding processors and sold them only as part of a larger, bundled solution.  But in recent years, companies like Broadcom, Marvell, Fulcrum (purchased by Intel), Mellanox, and others have started to give the chip makers at the traditional networking companies a run for their money.  That is, rather than producing an apples-to-apples, vertically integrated competing network solution, these merchant silicon companies compete only with the traditional networking vendors’ in-house silicon design teams by offering better products in terms of features, performance, and time-to-market.  In other words, merchant silicon vendors are filling the same role that Intel, AMD, and others fill in the server ecosystem.

The resulting change has been amazing.  First, every major networking manufacturer today is shipping products built with third-party merchant silicon.  Second, the competition among merchant silicon manufacturers is heating up in terms of advanced features, lower prices, and higher performance -- as well as a growing number of new players (MediaTek, Centec) and competitive startups (Barefoot Networks, XPliant).  Perhaps most interesting from an SDN perspective, the merchant silicon companies have been more open about publishing the low-level software APIs to control their chips (see Broadcom’s OF-DPA, Mellanox’s OpenEthernet, and Centec’s Lantern Project), allowing a collection of startups and DIY types direct access to the lowest level of packet forwarding memory.

The second trend, “bare metal” switches, does for switch hardware boxes what merchant silicon did for packet forwarding chips: there are now companies that make and sell only the networking hardware box, without any bundled software.  I use the term bare metal to mean you can buy the hardware unbundled from the software.  Unlike the term “white box”, bare metal covers both branded (Dell, with more coming) and white-label (Quanta, Edge-Core, Agema, Celestica, Alpha, Interface Masters, etc.) switches.  These bare metal manufacturers produce the rest of the hardware for a switch or router, including the sheet metal case, PCBs, SFP housings, power supplies, fans, temperature sensors, etc.  In other words, bare metal vendors are fulfilling the role that Dell, SuperMicro, HP, Lenovo, and others play in the server ecosystem.

Bare Metal Packet Control

Looking back, OpenFlow is actually a technical solution to a non-technical problem.  Originally, other researchers at Stanford University and I wanted to create new networking protocols that worked on real hardware.  To do that, we needed to control the low-level packet forwarding memory in networking devices.  But a non-technical problem held us back: unlike servers, our networking products were closed and proprietary.  So we could not deploy our own programs on the box or directly manage the networking device’s low-level forwarding tables.  OpenFlow was the eventual work-around to this problem: a remote API for running new forwarding algorithms off-box (e.g., on a controller) that exposes a rough pass-through interface to the underlying forwarding tables.  In retrospect, this pass-through interface was problematic because it only exposed a limited subset of the forwarding memory (see the previous blog) and evolved slowly over time because it required help from network device vendors to deploy.
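
To make that pass-through idea concrete, here is a minimal sketch of a controller writing a single entry into a switch’s forwarding table over OpenFlow.  It uses the open-source Ryu framework purely for illustration (this post doesn’t prescribe a particular controller), and the match fields, priority, and output port are made-up values.

```python
# Minimal illustration of OpenFlow's "pass-through" model: the controller
# writes an entry into the switch's forwarding table from off-box.
# Framework choice (Ryu) and all match/priority/port values are illustrative.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class PassThroughExample(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def on_switch_connect(self, ev):
        dp = ev.msg.datapath
        ofp = dp.ofproto
        parser = dp.ofproto_parser

        # Match IPv4 traffic destined to 10.0.0.0/8 ...
        match = parser.OFPMatch(eth_type=0x0800,
                                ipv4_dst=('10.0.0.0', '255.0.0.0'))
        # ... and forward it out port 2.
        actions = [parser.OFPActionOutput(2)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]

        # The FlowMod is the "remote write" into the forwarding table.
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100,
                                      match=match, instructions=inst))
```

The point is not the specific framework: the controller is reaching through a narrow, standardized window into whatever subset of the forwarding memory the switch vendor chose to expose.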

Fast forward to the present: with the advent of merchant silicon and bare metal switching, the need to use OpenFlow as a technical work-around to a non-technical problem has disappeared.  With merchant silicon’s APIs, it’s now possible for developers to get direct, raw access to the forwarding memory and create their own, new forwarding algorithms.  With bare metal, it’s possible to run new processes on the box itself, so the decision of which control plane processes to leave on the box (e.g., when distributed processing is needed) and which to move to a centralized controller becomes a design choice rather than something enforced by the API or architecture.  Further, protocols and products can evolve faster because bare metal removes the hardware vendor from the software development loop.
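
As a rough sketch of what “running on the box” looks like, the snippet below shows an on-box agent writing routes directly through a merchant-silicon SDK binding.  The sdk class, table names, and function signatures here are invented for illustration (real APIs such as OF-DPA look different); the interesting part is structural: this code runs as an ordinary local process on a bare metal switch, and nothing forces the choice of which logic stays here versus on a controller.

```python
# Hypothetical on-box agent: all "sdk" names below are invented for
# illustration; real merchant-silicon APIs (e.g., OF-DPA) differ.
import ipaddress


class HypotheticalAsicSdk:
    """Stand-in for a vendor SDK binding that exposes raw table writes."""

    def write_l3_route(self, prefix, next_hop, egress_port):
        print(f"programming ASIC: {prefix} -> {next_hop} via port {egress_port}")


def install_local_routes(sdk, routes):
    """Runs as a normal Linux process on the bare metal switch.  Whether this
    logic lives here or on a central controller is now a design choice, not
    something the API forces on you."""
    for prefix, next_hop, port in routes:
        ipaddress.ip_network(prefix)  # sanity-check the prefix before writing
        sdk.write_l3_route(prefix, next_hop, port)


if __name__ == "__main__":
    install_local_routes(HypotheticalAsicSdk(),
                         [("192.0.2.0/24", "198.51.100.1", 4)])
```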

Easier, Application-Specific Hardware Abstractions

An implicit problem with the original OpenFlow model was that it forced everyone to use a single, common forwarding abstraction.  This was problematic in two ways.  First, a single API made it technically difficult to expose vendor-specific, innovative packet forwarding differentiation (e.g., differences in pipeline, shared tables, programmable parsing engines) because everything had to fit the same Official OpenFlow abstraction.  Second, because there was a single, agreed-upon API, OpenFlow had to be everything to everyone.  That is, if your application required specific hardware forwarding features (e.g., an egress TCAM table, L3 VRF, hardware MAC learning, etc.), then you needed to make sure your feature was included in the official standard or else your application wouldn’t work.  This led to a “kitchen sink” approach to API creation where everyone’s features got thrown in.  Additionally, it made API evolution unnecessarily slow simply because all changes had to be approved by consensus.  Both of these issues held back the deployment velocity of initial OpenFlow deployments and contributed to their scalability and reliability challenges.

But with merchant silicon and bare metal, developers can now create their own, custom forwarding abstractions: abstractions that best fit their specific hardware and applications.  This is not to say that OpenFlow as a standard is irrelevant: it is still incredibly useful to have a common, agreed-upon RPC protocol between controller and switch.  These benefits extend both to the development side in terms of tooling (e.g., the Wireshark plugin!) and to the user side for control plane debugging and monitoring.  But the specifics of the packet processing pipeline order, which tables are exposed, and access to underlying metadata structures differ heavily across applications.  Our experience has shown that there is, as yet, no “one size fits all” abstraction.  With modern OpenFlow on top of merchant silicon and bare metal switches, by writing both the switch and controller logic, the networking application designer can create their own application-specific abstraction of the hardware, dramatically simplifying the abstraction challenge and speeding up the overall innovation cycle.
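
As a toy example of what an application-specific abstraction might look like, the sketch below exposes a single attach_host() operation to the application and compiles it down to whatever table entries this particular pipeline needs.  Every name here is hypothetical; the point is that the abstraction is shaped by the application and its hardware, not by a one-size-fits-all standard.

```python
# Toy application-specific abstraction: the application sees one verb
# ("attach a host to a segment"); how that compiles into table entries is
# private to this controller/switch pair.  All names are hypothetical.

def attach_host(mac, port, segment_id):
    """Return the low-level entries this particular pipeline needs to attach
    one host.  A different application (or different silicon) would compile
    the same intent into different tables."""
    return [
        {"table": "l2_learning", "match": {"in_port": port, "eth_src": mac},
         "action": "goto:segment_lookup"},
        {"table": "segment_lookup", "match": {"eth_dst": mac},
         "action": f"set_segment:{segment_id},output:{port}"},
    ]


# Usage: the application never thinks about tables, only about hosts.
for entry in attach_host("00:11:22:33:44:55", port=12, segment_id=3001):
    print(entry)
```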

As an aside, creating application-specific forwarding abstractions completely blows the doors off the “declarative vs. imperative” discussion.  Despite initial claims and subsequent attempts by the actual authors to clarify the discussion, there is still a belief that OpenFlow is strictly an imperative protocol.  Putting aside the question of whether imperative or declarative actually matters in terms of real scalability or performance (hint: it doesn’t), with modern OpenFlow it’s possible to design an abstraction that is as imperative or declarative as the application author fancies.  That is, the controller can and does use OpenFlow to communicate policy information just as well as low-level forwarding memory changes, because in practice the OpenFlow prioritized “match action” table primitive fits well into many policy language frameworks.  For example, the actual state that is transferred from controller to switches to map a host into a VLAN versus an End-Point Group (EPG) is virtually identical.
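
To illustrate that last point, compare the state a controller might push to map a host into a VLAN versus into an EPG.  These are simplified, hypothetical entries (the field names are illustrative, not any product’s actual wire format), but the structure of what crosses the wire is the same in both cases.

```python
# Simplified, hypothetical flow entries illustrating that mapping a host
# into a VLAN vs. an End-Point Group transfers virtually identical state:
# a prioritized match on where the host attaches, plus a group/segment tag.
vlan_mapping = {
    "priority": 1000,
    "match":  {"in_port": 7, "eth_src": "00:11:22:33:44:55"},
    "action": {"set_vlan": 100, "goto_table": "forwarding"},
}

epg_mapping = {
    "priority": 1000,
    "match":  {"in_port": 7, "eth_src": "00:11:22:33:44:55"},
    "action": {"set_epg": 100, "goto_table": "policy"},
}

# Only the name of the tag differs; the prioritized match-action structure --
# and the state actually sent from controller to switch -- is the same.
assert vlan_mapping["match"] == epg_mapping["match"]
```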

More to SDN than OpenFlow

The last big benefit of building on bare metal switches is that we can create complete SDN solutions that use more than OpenFlow.  For example, our Switch Light SDN OS has a number of OpenFlow-independent management interfaces (e.g., SNMP, syslog, CLI, and environmental controls) that can be managed directly from each individual switch or in aggregate from the controller, whichever best fits the deployment environment.  Further, using our Zero Touch Networking (ZTN) protocol, we solve the chicken-and-egg problem of how a switch discovers its controllers.  As I demonstrated at ONS last year, ZTN builds on technologies such as DHCP, ONIE, and others to dynamically discover controllers and network boot the switch operating system directly from the controller.  Not only does ZTN automate installation, it also automates software upgrades -- another operational woe.  This level of operational efficiency and no-hands deployment simplicity is simply beyond traditional OpenFlow capabilities and has only been possible with SDN on top of bare metal switches.
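
To give a feel for the general pattern (this is not Big Switch’s actual ZTN protocol, whose details I’m not reproducing here), the sketch below shows the shape of a DHCP-and-ONIE-style bootstrap: the switch learns its controller’s address out-of-band, asks it what to run, and installs the image the controller points to.  The URLs, manifest format, and controller address are all invented for illustration.

```python
# Shape of a ZTN-style bootstrap; the URLs, manifest format, and addresses
# below are invented for illustration and are NOT the actual ZTN protocol.
import json
import subprocess
import urllib.request


def ztn_style_bootstrap(controller_addr):
    """controller_addr: learned out-of-band, e.g. via a DHCP option."""
    # 1. Ask the controller what this switch should be running.
    manifest_url = f"http://{controller_addr}/ztn/manifest.json"  # hypothetical
    with urllib.request.urlopen(manifest_url) as resp:
        manifest = json.load(resp)

    # 2. Hand the image URL to the ONIE installer already present on a bare
    #    metal switch; the same path can drive later software upgrades.
    subprocess.run(["onie-nos-install", manifest["switch_os_image"]],
                   check=True)


if __name__ == "__main__":
    ztn_style_bootstrap("192.0.2.10")  # placeholder controller address
```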

--Rob Sherwood, Big Switch CTO