Robot Operating System 2: Design, architecture, and uses in the wild
SCIENCE ROBOTICS • 11 May 2022 • Vol 7, Issue 66 • DOI: 10.1126/scirobotics.abm6074
Abstract
The next chapter of the robotics revolution is well underway with the deployment of robots for a broad range of commercial use cases. Even in a myriad of applications and environments, there exists a common vocabulary of components that robots share—the need for a modular, scalable, and reliable architecture; sensing; planning; mobility; and autonomy. The Robot Operating System (ROS) was an integral part of the last chapter, demonstrably expediting robotics research with freely available components and a modular framework. However, ROS 1 was not designed with many necessary production-grade features and algorithms. ROS 2 and its related projects have been redesigned from the ground up to meet the challenges set forth by modern robotic systems in new and exploratory domains at all scales. In this Review, we highlight the philosophical and architectural changes of ROS 2 powering this new chapter in the robotics revolution. We also show through case studies the influence ROS 2 and its adoption has had on accelerating real robot systems to reliable deployment in an assortment of challenging environments.
INTRODUCTION
Many software platforms, sometimes called middlewares, have been proposed to introduce modular and adaptable features that make it easier to build robot systems. Over time, some middlewares have grown to become rich ecosystems of utilities, algorithms, and sample applications. Few rival the Robot Operating System (ROS 1) in its significance to the maturing robotics industry.
ROS 1 was popularized by the robotics incubator Willow Garage (1). Every effort was made to create a quality and performant system, but security, network topology, and system up-time were not prioritized. Regardless, ROS 1 has become influential in nearly every intelligent machine sector. Its commercial rise was the result of flagship projects providing autonomous navigation, simulation, visualization, control, and more (2–4). As commercial opportunities transitioned into products, ROS’s foundation as a research platform began to show its limitations. Security, reliability in nontraditional environments, and support for large-scale embedded systems became essential to push the industry forward. Further, many companies were building workarounds on top of or inside ROS 1 to create reliable applications (5).
The second generation of the Robot Operating System, ROS 2, was redesigned from the ground up to address these challenges while building on the success of its community-driven capabilities (6). ROS 2 is based on the Data Distribution Service (DDS), an open standard for communications that is used in critical infrastructure such as military, spacecraft, and financial systems (7). It solves many of the problems in building reliable robotics systems. DDS enables ROS 2 to obtain best-in-class security, embedded and real-time support, multirobot communication, and operations in nonideal networking environments. DDS was selected after considering other communication technologies, e.g., ZeroMQ and RabbitMQ, because of its breadth of features including a User Datagram Protocol (UDP) transport, distributed discovery, and a built-in security standard (8).
In this Review, we will establish ROS 2’s state-of-the-art suitability for modern robot systems and showcase the technological and philosophical changes that have driven its success. Then, we will expand on that foundation to demonstrate how ROS 2 is influencing the deployment of autonomous systems in several unique domains. Five case studies explore how ROS 2 has enabled or accelerated robots into the wild on land, sea, air, and even space.
RELATED WORK
The history of robot software is long and storied, going back more than 50 years with robots like Shakey (9). Over time, much has been written about how to structure classical planners, concurrent behaviors, and three-layer architectures (10–12). An early example of this is the Task Control Architecture (TCA), which was used to control a variety of robots. For example, Carnegie Mellon Robot Navigation Toolkit (CARMEN) was built on TCA’s message-passing system called IPC (interprocess communications) (13, 14). Message passing has its own rich history in distributed systems, including IBM’s work on message queuing, Java’s Jini, and middlewares such as MQ Telemetry Transport (MQTT) (15–17).
Robotics frameworks provide architectural methods to decompose complex software into smaller and more manageable pieces. Some of these components can find reuse in other systems and may be established into libraries to be leveraged by users. An early attempt to manage this complexity was via a client/server approach in Player (18). A Player server communicates with robot hardware and runs the algorithms needed to perform its task. Clients can connect to the server to extract data and control the robot over a Transmission Control Protocol (TCP) connection. However, its architecture hampered reliability, code reuse, and ability to change out components.
Yet Another Robot Platform (YARP) aids in building control systems organized as peers, communicating over several protocols (19). It facilitates research development and collaboration by promoting code reuse and modularity while retaining high performance. YARP can be used for any application, but it supports only C++, and its community has focused on humanoid and legged robotics, such as iCub and the Massachusetts Institute of Technology’s Cheetah.
Lightweight Communications and Marshalling (LCM) is a middleware that uses a publish/subscribe model with bindings in many languages. It concentrates on handling messaging and data marshaling in high-bandwidth low-latency environments (20). This limits the range of robotic applications for which LCM can be effectively used. Open Robot Control Software (OROCOS) is a set of libraries for robot control, focused on real-time control systems and related topics, such as computing kinematic chains and Bayesian filtering (21). The project has grown into a full framework integrating the Common Object Request Broker Architecture (CORBA) middleware and tooling for deterministic computation in real-time applications. The LCM and OROCOS frameworks each concentrate on smaller pieces of the overall system, with a nontrivial proportion of the overall robotics problem left to the end-user.
ROS 1 contains a set of libraries that are useful when building many kinds of robots (1). There are utilities for monitoring processes, introspecting communications, receiving time-series transformations, and more. ROS 1 also has a large ecosystem of sensor, control, and algorithmic packages made available by community contributions, enabling a small team to build complex robotics applications. Although ROS 1 solves many of the complexity issues inherent to robotics, it struggles to consistently deliver data over lossy links (like WiFi or satellite links), has a single point of failure, and does not have any built-in security mechanisms. Table 1 summarizes the key differences between ROS 1 and ROS 2.
Category | ROS 1 | ROS 2
Network transport | Bespoke protocol built on TCP/UDP | Existing standard (DDS), with abstraction supporting addition of others
Network architecture | Central name server (roscore) | Peer-to-peer discovery
Platform support | Linux | Linux, Windows, and macOS
Client libraries | Written independently in each language | Sharing a common underlying C library (rcl)
Node versus process | Single node per process | Multiple nodes per process
Threading model | Callback queues and handlers | Swappable executor
Node state management | None | Lifecycle nodes
Embedded systems | Minimal experimental support (rosserial) | Commercially supported implementation (micro-ROS)
Parameter access | Auxiliary protocol built on XMLRPC | Implemented using service calls
Parameter types | Type inferred when assigned | Type declared and enforced
Table 1. Summary of ROS 2 features compared with ROS 1.
The ROS 1 community attempted to address some of these concerns, but in nearly all cases, compromises were made because of architectural and engineering limitations. For example, to address the single point of failure (“rosmaster”), every existing client library had to be patched individually with a bespoke solution. In other cases, it was possible to extend ROS 1 for security, via the SROS project. Although successful, SROS was difficult to maintain and needed further development to meet security trends. These are just two of the many attempts to patch ROS 1, which extended its useful lifetime but did not solve its core limitations.
ROS 2
ROS 2 is a software platform for developing robotics applications, also known as a robotics software development kit (SDK). Importantly, ROS 2 is open source and distributed under the Apache 2.0 License, which grants users broad rights to modify, apply, and redistribute the software, with no obligation to contribute back (22). ROS 2 relies on a federated ecosystem in which contributors are encouraged to create and release their own software. Most additional packages also use the Apache 2.0 License or similar. Making code free is fundamental to driving mass adoption—it allows users to leverage ROS 2 without constraining how they use or distribute their applications.
Scope
ROS 2 supports a broad range of robotics applications, from education and research to product development and deployment. It comprises a large set of interrelated software components that are commonly used to develop robotics applications. The software ecosystem is divided into three categories:
1) Middleware: Referred to as the plumbing, the ROS 2 middleware encompasses communication among components, from network application program interfaces (APIs) to message parsers.
2) Algorithms: ROS 2 provides many of the algorithms commonly used when building robotics applications, e.g., perception, Simultaneous Localization and Mapping (SLAM), planning, and beyond.
3) Developer tools: ROS 2 includes a suite of command-line and graphical tools for configuration, launch, introspection, visualization, debugging, simulation, and logging. There is also a large suite of tools for source management, build processes, and distribution.
In this section, we will explore the first category, the middleware, as the foundation of ROS 2.
Design
Design principles
The design of ROS 2 has been guided by a set of principles and a set of specific requirements. The following principles are asserted:
Distribution
As with similarly complex domains, problems in robotics are best tackled with a distributed systems approach (23). Requirements are separated into functionally independent components, such as device drivers for hardware, perception systems, control systems, executives, and so on. At runtime, these components have their own execution context and share data via explicit communication. This composition should be conducted in a decentralized and secure manner.
Abstraction
To govern communication, interface specifications must be established. These messages define the semantics of the data exchanged. A favorable abstraction balances the benefits of exposing the details of a component against the costs of overfitting the rest of the application to that component, thereby making it difficult to substitute an alternative. This approach leads to an ecosystem of interoperable components abstracted away from specific vendors of hardware or software components (24).
Asynchrony
The messages defined are communicated among the components asynchronously, creating an event-based system (25). With this approach, an application can work across the multiple time domains that arise from combining physical devices with a host of software components, each of which may have its own frequency for providing data, accepting commands, or signaling events.
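To make the idea of multiple time domains concrete, the following is a minimal sketch rather than ROS 2 code. It assumes a hypothetical `ToyEventLoop` that runs on simulated integer milliseconds, with two invented sources (`lidar_scan` at 100 ms and `imu_sample` at 20 ms) that each fire at their own frequency while a single event-driven loop reacts to whichever event is due next.

```python
import heapq
from typing import Callable, List, Tuple

# Hypothetical toy scheduler on simulated integer milliseconds; not a ROS 2 API.
TimerEntry = Tuple[int, int, int, Callable[[int], None]]


class ToyEventLoop:
    def __init__(self) -> None:
        self._queue: List[TimerEntry] = []
        self._seq = 0  # tie-breaker so the heap never compares callbacks

    def add_timer(self, period_ms: int, callback: Callable[[int], None]) -> None:
        """Schedule `callback` to fire every `period_ms` of simulated time."""
        self._push(period_ms, period_ms, callback)

    def _push(self, due_ms: int, period_ms: int,
              callback: Callable[[int], None]) -> None:
        heapq.heappush(self._queue, (due_ms, self._seq, period_ms, callback))
        self._seq += 1

    def run_until(self, end_ms: int) -> None:
        """Dispatch events in time order until simulated time passes `end_ms`."""
        while self._queue and self._queue[0][0] <= end_ms:
            due_ms, _, period_ms, callback = heapq.heappop(self._queue)
            callback(due_ms)
            self._push(due_ms + period_ms, period_ms, callback)


# Two components with different frequencies share one event-driven loop.
events: List[Tuple[int, str]] = []
loop = ToyEventLoop()
loop.add_timer(100, lambda t: events.append((t, "lidar_scan")))
loop.add_timer(20, lambda t: events.append((t, "imu_sample")))
loop.run_until(200)

lidar_count = sum(1 for _, name in events if name == "lidar_scan")
imu_count = sum(1 for _, name in events if name == "imu_sample")
```

Neither source waits on the other. The loop only orders events by their due time, which is the property that lets components with unrelated rates coexist.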
Modularity
The UNIX design goal to “make each program do one thing well” is mirrored (26). Modularity is enforced at multiple levels, across library APIs, message definitions, command-line tools, and even the software ecosystem itself. The ecosystem is organized into a large number of federated packages, as opposed to a single codebase.
We do not pretend that these design principles are universal and without trade-offs. Asynchrony can also make it more difficult to achieve deterministic execution. For any single, well-defined problem, it is possible to construct a special-purpose monolithic solution that is more computationally efficient because it does not involve abstractions or distributed communication.
However, after a decade of experience with the ROS 1 project, we claim that adherence to these principles will generally lead to better outcomes. This approach facilitates code reuse, software testing, fault isolation, collaboration within interdisciplinary project teams, and cooperation at a global scale.
Design requirements
ROS 2 aims to meet certain requirements based on the design principles and needs of robotics developers.
Security
Any software that interacts with a network must include features to secure that interaction against accidental and malicious misuse. ROS 2’s integrated security system includes authentication, encryption, and access control (27–29). Designers can configure ROS 2 to meet their needs through access control policies that define who can communicate about what (30).
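The phrase "who can communicate about what" can be illustrated with a small default-deny check. This is a sketch under stated assumptions, not the ROS 2 security implementation: `ToyPermissions`, `TOY_POLICY`, `is_allowed`, and all node and topic names are hypothetical, and authentication and encryption are not modeled at all.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ToyPermissions:
    """Topics a node may publish to or subscribe to (hypothetical model)."""
    publish: FrozenSet[str] = frozenset()
    subscribe: FrozenSet[str] = frozenset()


# Hypothetical policy: node and topic names are invented for illustration.
TOY_POLICY: Dict[str, ToyPermissions] = {
    "camera_driver": ToyPermissions(publish=frozenset({"/image"})),
    "planner": ToyPermissions(
        publish=frozenset({"/cmd_vel"}),
        subscribe=frozenset({"/image"}),
    ),
}


def is_allowed(policy: Dict[str, ToyPermissions], node: str,
               action: str, topic: str) -> bool:
    """Default-deny: unknown nodes, actions, or topics are rejected."""
    perms = policy.get(node)
    if perms is None:
        return False
    allowed = {"publish": perms.publish, "subscribe": perms.subscribe}
    return topic in allowed.get(action, frozenset())


planner_can_drive = is_allowed(TOY_POLICY, "planner", "publish", "/cmd_vel")
camera_can_drive = is_allowed(TOY_POLICY, "camera_driver", "publish", "/cmd_vel")
```

Denying by default means that a node missing from the policy cannot communicate at all, which is usually the safer failure mode.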
Embedded systems
As a general rule, a robot includes sensors, actuators, and other peripherals. These devices can be relatively sophisticated, containing microcontrollers that need to communicate with CPU(s) where ROS 2 is running. A full ROS 2 stack is not expected to run on small embedded devices, although ROS 2 should facilitate and standardize integration of CPUs and microcontrollers. Micro-ROS allows ROS 2 to be reused on embedded systems (31).
Diverse networks
Robots are used in a variety of networking environments, from wired LAN for robot arms on assembly lines to multihop satellite connections for planetary rovers. In addition, robots will often use internal networks to connect processes within and across CPUs. ROS 2 provides quality of service that configures how data flow through the system, thereby adapting to the constraints of a network (32).
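As a rough illustration of how quality-of-service settings change what a subscriber sees on a lossy link, consider the toy model below. It assumes only two settings, a keep-last history depth and a reliable versus best-effort flag. `ToyQoS`, `ToyLink`, and `deliver` are hypothetical names, and the deterministic drop pattern stands in for real packet loss.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List


@dataclass(frozen=True)
class ToyQoS:
    depth: int      # keep only the last `depth` samples
    reliable: bool  # True: retry lost sends; False: best effort


class ToyLink:
    """Drops every `drop_every`-th transmission attempt, deterministically."""

    def __init__(self, drop_every: int) -> None:
        self.drop_every = drop_every
        self.attempts = 0

    def transmit(self) -> bool:
        self.attempts += 1
        return self.attempts % self.drop_every != 0


def deliver(samples: Iterable[int], qos: ToyQoS, link: ToyLink,
            max_retries: int = 3) -> List[int]:
    """Return the samples left in the subscriber's bounded history."""
    inbox: Deque[int] = deque(maxlen=qos.depth)
    for sample in samples:
        tries = 1 + (max_retries if qos.reliable else 0)
        for _ in range(tries):
            if link.transmit():
                inbox.append(sample)
                break
    return list(inbox)


samples = [1, 2, 3, 4, 5, 6]
best_effort = deliver(samples, ToyQoS(depth=10, reliable=False), ToyLink(3))
reliable = deliver(samples, ToyQoS(depth=10, reliable=True), ToyLink(3))
latest_only = deliver(samples, ToyQoS(depth=2, reliable=True), ToyLink(3))
```

In this model, best effort suits high-rate sensor data where a stale sample has little value, while reliable delivery with a small depth suits commands where only the most recent values matter.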
Real-time computing
From humanoids to self-driving cars, it is common for robot applications to include real-time computing requirements. To meet safety and/or performance goals, some parts of a system must execute in deterministic amounts of time. ROS 2 offers APIs for developers of real-time systems to enforce application-specific constraints (33, 34).
Product readiness
When a robot moves beyond the laboratory and into commercial use, new constraints are introduced. ROS 2 aims to meet product requirements spanning design, development, and project governance. One objective result of these efforts is Apex.AI’s functional safety [International Organization for Standardization (ISO) 26262] certification of their ROS 2-based autonomous vehicle software (35). This allows ROS 2 to be run in safety critical systems like autonomous vehicles and heavy machinery.
Communication patterns
The ROS 2 APIs provide access to communication patterns. These are notably topics, services, and actions that are organized under the concept of a node. ROS 2 also provides APIs for parameters, timers, launch, and other auxiliary tools that can be used to design a robotic system.
Topics
The most common pattern that users will interact with is topics, which are an asynchronous message-passing framework. This is similar to other asynchronous frameworks, such as ASIO (36). ROS 2 provides the same publish-subscribe functionality but focuses on using asynchronous messaging to organize a system using strongly typed interfaces. It does so by organizing end points in a computational graph under the concept of a node. The node, shown in Fig. 1, is an important organizational unit that allows a user to reason about a complex system.
Fig. 1. ROS 2 node interfaces: Topics, services, and actions.
The anonymous publish-subscribe architecture allows many-to-many communication, which is advantageous for system introspection. A developer may observe any messages passing on a topic by creating a subscription to that topic without any changes.
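The sketch below illustrates anonymous, strongly typed publish-subscribe and the introspection property just described. It is a simplified in-process model and not the ROS 2 client API: `ToyBus` and `ToyTwist` are hypothetical, and callbacks are invoked inline for determinism, whereas real delivery is asynchronous.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, Dict, List


@dataclass(frozen=True)
class ToyTwist:
    """Hypothetical velocity command message."""
    linear_x: float
    angular_z: float


class ToyBus:
    """Publishers and subscribers know the topic name and type, never each other."""

    def __init__(self) -> None:
        self._types: Dict[str, type] = {}
        self._subs: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def _check_type(self, topic: str, msg_type: type) -> None:
        # The first user of a topic fixes its type; later mismatches are rejected.
        known = self._types.setdefault(topic, msg_type)
        if known is not msg_type:
            raise TypeError(f"{topic} carries {known.__name__}, "
                            f"not {msg_type.__name__}")

    def subscribe(self, topic: str, msg_type: type,
                  callback: Callable[[Any], None]) -> None:
        self._check_type(topic, msg_type)
        self._subs[topic].append(callback)

    def publish(self, topic: str, msg: Any) -> None:
        self._check_type(topic, type(msg))
        for callback in list(self._subs[topic]):
            callback(msg)


bus = ToyBus()
motor_inbox: List[ToyTwist] = []
monitor_inbox: List[ToyTwist] = []

bus.subscribe("/cmd_vel", ToyTwist, motor_inbox.append)
# Introspection: a second subscription observes traffic without touching
# the publisher or the existing subscriber.
bus.subscribe("/cmd_vel", ToyTwist, monitor_inbox.append)
bus.publish("/cmd_vel", ToyTwist(linear_x=0.5, angular_z=0.0))
```

Because the publisher never names its subscribers, adding the monitor required no change to any existing component.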
Services
Asynchronous communication is not always the right tool. ROS 2 also provides a request-response style pattern, known as services. Request-response communication provides easy data association between a request and response pair, which can be useful when ensuring a task was completed or received (Fig. 1). Uniquely, ROS 2 allows a service client’s process to not be blocked during a call. Services are also grouped under a node for organization and introspection, allowing a subsystem’s interfaces to appear together in system diagnostics.
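A nonblocking request-response call can be sketched with a future, as below. This is an illustration built on Python's standard `concurrent.futures` module, not the ROS 2 service API: `ToyService`, `ToyAddRequest`, and `ToyAddResponse` are hypothetical, and a single worker thread stands in for the remote server.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyAddRequest:
    a: int
    b: int


@dataclass(frozen=True)
class ToyAddResponse:
    total: int


class ToyService:
    """Runs the handler on a worker thread so callers are not blocked."""

    def __init__(self, handler: Callable[[ToyAddRequest], ToyAddResponse]) -> None:
        self._handler = handler
        self._pool = ThreadPoolExecutor(max_workers=1)

    def call_async(self, request: ToyAddRequest) -> Future:
        # Returns immediately; the response is paired with this request
        # through the returned future.
        return self._pool.submit(self._handler, request)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


service = ToyService(lambda req: ToyAddResponse(total=req.a + req.b))
future = service.call_async(ToyAddRequest(a=2, b=3))
client_kept_working = True  # the client is free to do other work here
response = future.result(timeout=5.0)  # collect the paired response later
service.shutdown()
```

The future is what provides the data association between a request and its response: each call yields its own handle, so concurrent calls cannot be confused.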
Actions
A unique communication pattern of ROS 2 is the action. Actions are goal-oriented and asynchronous communication interfaces with a request, response, periodic feedback, and the ability to be canceled (Fig. 1). This pattern is used in long-running tasks such as autonomous navigation or manipulation, although it has a variety of uses. Similar to services, actions are nonblocking and organized under the node.
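The life of a goal, with periodic feedback, a final result, and cancellation, can be sketched as a small state machine. This is a cooperative, single-threaded toy rather than the ROS 2 action API; `ToyDriveGoal`, `ToyGoalStatus`, and the one-metre step size are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, List


class ToyGoalStatus(Enum):
    EXECUTING = "executing"
    SUCCEEDED = "succeeded"
    CANCELED = "canceled"


class ToyDriveGoal:
    """Hypothetical long-running goal: drive `target_m` metres, one per step."""

    def __init__(self, target_m: int, on_feedback: Callable[[int], None]) -> None:
        self.target_m = target_m
        self.travelled_m = 0
        self.status = ToyGoalStatus.EXECUTING
        self._on_feedback = on_feedback
        self._cancel_requested = False

    def cancel(self) -> None:
        """Request cancellation; it takes effect on the next step."""
        self._cancel_requested = True

    def step(self) -> None:
        """Advance the goal by one unit of work and report feedback."""
        if self.status is not ToyGoalStatus.EXECUTING:
            return
        if self._cancel_requested:
            self.status = ToyGoalStatus.CANCELED
            return
        self.travelled_m += 1
        self._on_feedback(self.travelled_m)
        if self.travelled_m >= self.target_m:
            self.status = ToyGoalStatus.SUCCEEDED


# A goal that runs to completion, emitting feedback along the way.
feedback_log: List[int] = []
goal = ToyDriveGoal(3, feedback_log.append)
while goal.status is ToyGoalStatus.EXECUTING:
    goal.step()

# A goal that is canceled partway through.
canceled_goal = ToyDriveGoal(5, lambda metres: None)
canceled_goal.step()
canceled_goal.step()
canceled_goal.cancel()
canceled_goal.step()
```

Because each `step` returns quickly, the caller stays free between steps, which mirrors the nonblocking property described above.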
Middleware architecture
Adhering to the previous design philosophies, the architecture of ROS 2 consists of several important abstraction layers distributed across many decoupled packages. These abstraction layers make it possible to have multiple solutions for required functionality, e.g., multiple middlewares or loggers. In addition, the distribution across many packages allows users to replace components or take only the pieces of the system they require, which may be important for certification.
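The value of an abstraction layer with multiple interchangeable implementations can be shown in miniature. In the sketch below, which is not the actual ROS 2 layering, a hypothetical `ToyClientLibrary` depends only on an abstract `ToyMiddleware` interface, so either of two invented back ends can be substituted without changing user code.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class ToyMiddleware(ABC):
    """Hypothetical abstraction boundary between client library and transport."""

    @abstractmethod
    def send(self, topic: str, payload: bytes) -> None:
        ...


class InMemoryMiddleware(ToyMiddleware):
    """Back end that records every message, e.g. for tests."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, bytes]] = []

    def send(self, topic: str, payload: bytes) -> None:
        self.sent.append((topic, payload))


class CountingMiddleware(ToyMiddleware):
    """Back end that only tracks bandwidth, discarding the payload."""

    def __init__(self) -> None:
        self.bytes_sent = 0

    def send(self, topic: str, payload: bytes) -> None:
        self.bytes_sent += len(payload)


class ToyClientLibrary:
    """User-facing API; it never names a concrete middleware."""

    def __init__(self, middleware: ToyMiddleware) -> None:
        self._middleware = middleware

    def publish_text(self, topic: str, text: str) -> None:
        self._middleware.send(topic, text.encode("utf-8"))


recorder = InMemoryMiddleware()
counter = CountingMiddleware()
for backend in (recorder, counter):
    # Identical user code runs against either back end.
    ToyClientLibrary(backend).publish_text("/status", "ok")
```

Swapping the back end changes what happens beneath the interface while the user-facing call stays the same, which is the property that lets a system adopt a different middleware or logger without rewriting applications.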
Abstraction layers
Figure 2 displays the abstraction layers within ROS 2. They are generally hidden behind the client library during development, and developers would only need to be aware of them for unusually application-specific needs. Most users will experience only the client libraries.