
Development and Iterative Learning Control of a Two-Wheeled Inverted-Pendulum-Robot

From Fachgebiet Regelungssysteme TU Berlin

Fig. 1: CAD rendering of the two-wheeled-inverted-pendulum-robot.

Over the last decades, the two-wheeled inverted-pendulum robot (TWIPR) has become a popular testbed in control research. The robot consists of a chassis housing a computing unit, motors, and sensors, with two wheels mounted at its bottom. The resulting dynamics are similar to those of an inverted pendulum and therefore require feedback control to stabilize the system. Due to its nonlinear, underactuated dynamics, the TWIPR poses a challenging control engineering problem, which contributes to the system's popularity as a testbed.
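To illustrate why feedback is indispensable, the following sketch builds a cart-pole-style linearization of a TWIP about its upright equilibrium. The model structure and all parameter values are generic assumptions for illustration, not the identified parameters of this robot.

```python
import numpy as np

# Cart-pole-style linearization of a two-wheeled inverted pendulum about
# its upright equilibrium. All parameter values are illustrative
# placeholders, not identified parameters of the actual robot.
g = 9.81    # gravity [m/s^2]
l = 0.2     # distance from wheel axis to center of gravity [m] (assumed)
M = 1.0     # effective base/wheel mass [kg] (assumed)
m = 0.5     # pendulum body mass [kg] (assumed)

# State vector: [position, velocity, pitch angle, pitch rate]
A = np.array([[0.0, 1.0, 0.0,                   0.0],
              [0.0, 0.0, -m * g / M,            0.0],
              [0.0, 0.0, 0.0,                   1.0],
              [0.0, 0.0, (M + m) * g / (M * l), 0.0]])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * l)]])

# The upright equilibrium is unstable: one eigenvalue of A has a
# positive real part, so feedback control is required for balancing.
unstable = np.linalg.eigvals(A).real.max()
print(unstable > 0)   # True
```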

In contrast to the majority of previous research, which has focused on developing stabilizing feedback control, this project aims at extending the TWIPR's areas of application. Namely, we intend to use the TWIPR as an intelligent agent that can solve complex tasks and, at the same time, serve as a testbed for learning algorithms. To illustrate this idea, consider a TWIPR as depicted in Fig. 2, which has an electromagnet mounted on top. In front of the robot, there is a box with a magnetic surface. If the TWIPR dives forward following a pitch and position trajectory such that the electromagnet connects with the box, the robot can pick up and transport the object. Afterwards, the TWIPR can deliver the object to a container by again diving forward and disabling the electromagnet. In a similar fashion, the TWIPR can solve tasks such as diving beneath obstacles, as visualized in Fig. 3.

Fig. 2: CAD rendering of the two-wheeled-inverted-pendulum-robot picking up an object

Fig. 3: Animation of the TWIPR diving beneath an obstacle

This novel idea of using the TWIPR as an intelligent agent interacting with its environment raises the question of how a TWIPR has to be designed to perform such highly dynamic maneuvers. This question becomes particularly relevant when considering the existing body of literature, in which the TWIPR was typically operated only at pitch angles of roughly 20 degrees. In contrast, the maneuvers described above require the TWIPR to operate at pitch angles of up to 90 degrees.

In order to build a TWIPR capable of performing such highly dynamic maneuvers, a model-based analysis of the interaction between the robot's design and the achievable closed-loop performance was carried out. In these simulations, the effects of the robot's weight distribution and inertia on the region of operation were investigated, leading to an optimal combination of the chosen hardware and weight distribution. Furthermore, it was found that adding artificial mass to lower the pendulum's center of gravity enlarges the achievable region of attraction. Similarly, adding mass to the wheels increases the maneuverability of the robot.

Fig. 4: Video of the TWIPR reaching its upright equilibrium from an initial pitch angle of 90 degrees

To stabilize the system, linear state feedback is applied. By appropriately tuning the parameters of the feedback controller, the closed-loop system achieves sufficient performance despite the disturbing effects of the nonlinear dynamics and significant friction. In particular, the TWIPR is capable of reaching its upright equilibrium from a pitch angle of 90 degrees, as shown in Fig. 4.
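A minimal sketch of such a stabilizing state-feedback design is given below, using a generic cart-pole-style linearization of a TWIP; the model, sampling time, and LQR weights are all assumed values for illustration, not the controller actually implemented on the robot. The gain is obtained by iterating the discrete-time Riccati difference equation.

```python
import numpy as np

# Generic cart-pole-style linearization of a TWIP about its upright
# equilibrium; all parameter values are illustrative assumptions.
g, l, M, m = 9.81, 0.2, 1.0, 0.5
A = np.array([[0.0, 1.0, 0.0,                   0.0],
              [0.0, 0.0, -m * g / M,            0.0],
              [0.0, 0.0, 0.0,                   1.0],
              [0.0, 0.0, (M + m) * g / (M * l), 0.0]])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * l)]])

# Euler discretization with a small sampling time (assumed)
dt = 0.005
Ad = np.eye(4) + dt * A
Bd = dt * B

# Discrete-time LQR gain via backward iteration of the Riccati
# difference equation (weights Q, R are assumed tuning values).
Q, R = np.eye(4), np.array([[1.0]])
P = Q.copy()
for _ in range(5000):
    S = R + Bd.T @ P @ Bd
    K = np.linalg.solve(S, Bd.T @ P @ Ad)
    P = Q + Ad.T @ P @ (Ad - Bd @ K)

# The state feedback u = -K x places all closed-loop eigenvalues of
# Ad - Bd K inside the unit circle, i.e. the loop is stable.
rho = np.abs(np.linalg.eigvals(Ad - Bd @ K)).max()
print(rho < 1)   # True
```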

In order to perform maneuvers such as the dives described above, the TWIPR has to track given reference trajectories precisely. This is difficult to achieve with feedback control alone, as the system is subject to unknown disturbances such as friction and the system parameters are not precisely known. To solve this problem, iterative learning control (ILC) is applied, which aims at improving tracking performance in repeated tasks. The fundamental concept of ILC consists of applying a feed-forward input trajectory; based on the resulting output trajectory, the input is updated to decrease the error in the following trial. Since ILC exploits the error information of previous trials, it often achieves better tracking performance than conventional feedback control.
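The trial-to-trial update can be sketched on a toy example. The first-order plant, reference, and learning gain below are hypothetical and chosen only to show how a simple P-type ILC update reduces the tracking error over repeated trials; this is not the controller used on the robot.

```python
import numpy as np

N = 50                  # samples per trial
a, b = 0.2, 0.5         # toy first-order plant (assumed): y[k+1] = a*y[k] + b*u[k]
gamma = 1.0             # learning gain (assumed; converges since |1 - gamma*b| < 1)

def run_trial(u):
    """Simulate one trial of the plant from rest and return the output."""
    y = np.zeros(N)
    for k in range(N - 1):
        y[k + 1] = a * y[k] + b * u[k]
    return y

r = np.sin(np.linspace(0.0, np.pi, N))   # illustrative reference trajectory
u = np.zeros(N)                          # initial feed-forward input
norms = []
for trial in range(15):
    y = run_trial(u)
    e = r - y
    norms.append(np.linalg.norm(e))
    # P-type ILC update: correct u[k] with the error one sample ahead,
    # since the input u[k] first affects the output at sample k+1.
    u[:-1] += gamma * e[1:]
print(norms[0] > norms[-1])   # True: the error shrinks over the trials
```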

In the case of the TWIPR, the learning law is determined using quadratic optimal design, which results in a monotonically convergent ILC system, meaning that the norm of the error trajectory decreases in each trial. Properties such as monotonic convergence are a major advantage of ILC over machine learning methods such as reinforcement learning, for which few or no guarantees regarding convergence and the error progression during learning can be given. In contrast, the theory of ILC has produced criteria guaranteeing stability of the learning system, perfect tracking, and monotonic convergence of the error. The capability of the ILC system is displayed in Fig. 5, which shows the learning process of the TWIPR and the resulting dive.
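The quadratic optimal design idea can be sketched in the lifted-system framework. The toy plant, weight, and reference below are assumptions chosen for illustration, but the resulting norm-optimal update exhibits the property of monotonic convergence: the error norm shrinks in every trial.

```python
import numpy as np

N = 30
a, b = 0.2, 0.5   # toy first-order plant (assumed): y[k+1] = a*y[k] + b*u[k]

# Lifted plant matrix G: maps the inputs u[0..N-1] of one trial to the
# shifted outputs y[1..N]; lower-triangular Toeplitz of Markov parameters.
G = np.zeros((N, N))
for i in range(N):
    for k in range(i + 1):
        G[i, k] = b * a ** (i - k)

# Quadratic optimal (norm-optimal) ILC: choose u_{j+1} to minimize
# ||e_{j+1}||^2 + w * ||u_{j+1} - u_j||^2, which yields the update
# u_{j+1} = u_j + (G^T G + w I)^{-1} G^T e_j.
w = 0.1                          # input-change weight (assumed tuning value)
L = np.linalg.solve(G.T @ G + w * np.eye(N), G.T)

r = np.linspace(0.0, 1.0, N)     # illustrative reference trajectory
u = np.zeros(N)
norms = []
for trial in range(10):
    e = r - G @ u
    norms.append(np.linalg.norm(e))
    u = u + L @ e

# Monotonic convergence: the error norm decreases in every trial.
print(all(n2 < n1 for n1, n2 in zip(norms, norms[1:])))   # True
```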

Fig. 5: Videos of the TWIPR diving beneath an obstacle and the learning process

People involved



The drive solution used in this project is sponsored by Maxon Motor AG.

Related Publications

Z. Music, F. Molinari, S. Gallenmüller, O. Ayan, S. Zoppi, W. Kellerer, G. Carle, T. Seel, J. Raisch. Design of a Networked Controller for a Two-Wheeled Inverted Pendulum Robot. November 2018.

T. Seel, T. Schauer, J. Raisch. Monotonic Convergence of Iterative Learning Control with Variable Pass Length. International Journal of Control, 90 (3):409–422, 2016.
M. Guth, T. Seel, J. Raisch. Iterative Learning Control with Variable Pass Length Applied to Trajectory Tracking on a Crane with Output Constraints. In Proceedings of the 52nd IEEE Conference on Decision and Control, pages 6676–6681, Firenze, Italy, December 2013.
T. Seel, T. Schauer, J. Raisch. Iterative Learning Control for Variable Pass Length Systems. In Proceedings of the 18th IFAC World Congress, pages 4880–4885, Milan, Italy, 2011.
J. Beuchert, J. Raisch, T. Seel. Design of an Iterative Learning Control with a Selective Learning Strategy for Swinging up a Pendulum. In European Control Conference (ECC), 2018 (accepted).
