The limits of optimal control, from the maximalist and minimalist perspectives
This is a live blog of Lecture 4 of my graduate seminar “Feedback, Learning, and Adaptation.” A table of contents is here. The dramatic conclusion of Steampunk Data Science will appear later this week.
Though ostensibly chasing similar questions, the fields of artificial intelligence and control theory couldn’t be more temperamentally different. In AI, much like the rest of computer science, there’s a mindset to “just do stuff” and patch bugs as they come. There’s a spirit of tinkering and benchmaxxing, and the expectation that systems will get more robust as they get more capable. I mean, the AI Safety people seem to think that the way to get safe AI is to spend billions of dollars building a superintelligent AI. They say this out loud to anyone willing to listen.
Control theory, by contrast, wants you to learn non-self-adjoint operator theory before you think about building anything. If you’re designing an airplane or any other system where failure can lead to catastrophe, safety has to come first. And the path to safety is always more math.
Finding a nice middle ground between these two mindsets has been much harder than I’d like. And yet, the two fields connect at optimal control. The foundation of reinforcement learning (and its successor, recursive self-improvement) is optimal control. Modern control theory emerged in the Cold War planning of the 50s, where aerospace engineers developed optimal control to plot rocket trajectories. In class today, we’ll look at both perspectives and see why they not only fail to meet in the middle, but also leave us with rather unsatisfactory views of the role of optimization in sequential decision-making.
In its most grandiose formulation, one often adopted by rationalist AI maximalists, optimal control claims a universal model for making decisions in the face of uncertainty. You just need the right cost function and proper Bayesian updating.
This view falls apart once you move beyond the toy problems people nerd out about in online rationalist forums. Most problems have more stages than the Monty Hall problem, and once you have a modestly long horizon, exact solutions of optimal control problems are computationally out of reach. We all quickly learn that with any reasonable complexity, you have to move from dynamic programming to approximate dynamic programming, meaning you are never sure you are actually solving your original problem. Moreover, as soon as you have to incorporate measurement uncertainty and those clever Bayesian updates, the associated optimization problems are at best PSPACE-complete (for finite horizons) and at worst undecidable (for infinite ones). You could try to convince me that your particular POMDP isn’t a worst-case instance, and that enough iterations of GFYPO will find the answer. But then your superintelligent optimizer isn’t actually solving optimization problems. What it’s actually doing is anyone’s guess.
Perhaps we can retreat to the more modest view promoted in contemporary control classes. Optimal control is a nice scaffolding for engineering design. You get a framework to locally make sense of a huge parameter space. If you’re trying to tune multiple PID loops at once, you’d probably rather link things together with a Q and an R than a few dozen controller gains without a clear relation between them. But it’s a local framework (a small world framework, if you will), not a global system.
And the optimization framework often gives you some robustness. In the LQR problem, we know that the optimal control policy is stabilizing. We know that LQR gives us a way to bound the error we’ll accumulate when unmodeled noise perturbs the system. And we know that near-optimal solutions are often pretty good. Some exciting early work even showed that LQR was robust to misspecification of the system model.
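To make the scaffolding claim concrete, here’s a minimal sketch of the LQR design loop: one Q and one R in place of a pile of individually tuned gains. The plant and cost matrices below are my own illustrative choices (a double integrator), not anything from the lecture. The recursion solves the discrete-time Riccati equation by value iteration and confirms the resulting policy is stabilizing.

```python
import numpy as np

# Double-integrator plant (illustrative values): x_{t+1} = A x_t + B u_t
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])

# One state-cost matrix Q and one control-cost matrix R, instead of
# hand-tuning a few dozen loop gains with no clear relation between them.
Q = np.eye(2)
R = np.array([[1.0]])

# Value iteration on the discrete-time Riccati equation.
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # feedback gain
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

# The LQR policy u = -K x is stabilizing: the closed-loop matrix
# A - B K has spectral radius strictly less than 1.
rho = max(abs(np.linalg.eigvals(A - B @ K)))
print(f"spectral radius of A - BK: {rho:.4f}")
```

Tweaking Q and R and rerunning is the “local framework” in action: every choice in that two-matrix space yields a stabilizing controller for the nominal model.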
The gain margins, sadly, turned out to be a bit of a mathematical illusion. As soon as you incorporate the Bayesian reasoning that rationally summarizes the measurement uncertainty (that is, a Kalman filter), the resulting LQG policy becomes exceptionally fragile to model misspecification. This is Doyle’s famous “There Are None” example, and we’ll work through the details today.
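You can see the fragility numerically. The sketch below uses the closed-form gains from Doyle’s example (state cost q\*11', process noise sigma\*11', unit control and measurement weights, giving f = 2 + sqrt(4 + q) and d = 2 + sqrt(4 + sigma)); the particular q = sigma = 100 and the 10% actuator-gain error are my choices for illustration. The nominal closed loop is stable, but the small unmodeled input gain destabilizes it.

```python
import numpy as np

# Doyle's LQG example: xdot = A x + B (m u) + noise, y = C x + noise,
# where m is an unmodeled actuator gain (m = 1 is the nominal plant).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

q = sigma = 100.0           # cost and noise weights (illustrative choice)
f = 2 + np.sqrt(4 + q)      # optimal control gain:  K = f * [1, 1]
d = 2 + np.sqrt(4 + sigma)  # optimal Kalman gain:   L = d * [1, 1]^T
K = f * np.array([[1.0, 1.0]])
L = d * np.array([[1.0],
                  [1.0]])

def closed_loop_real_part(m):
    """Largest real part of the closed-loop eigenvalues when the true
    actuator gain is m but the controller was designed assuming m = 1."""
    top = np.hstack([A, -m * B @ K])                # plant driven by -K xhat
    bot = np.hstack([L @ C, A - B @ K - L @ C])     # Kalman filter + control
    return np.linalg.eigvals(np.vstack([top, bot])).real.max()

print(closed_loop_real_part(1.0))  # nominal: stable (negative)
print(closed_loop_real_part(1.1))  # 10% gain error: unstable (positive)
```

The punchline: as q and sigma grow, the tolerable gain error shrinks toward zero, so the “rational” estimator buys you optimality at the price of vanishing margins.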
You can incorporate robustness directly into the formulation, but this comes at its own computational and conceptual costs. Robust control is not easy and is seldom taught. My colleagues can correct me, but I’m not sure we ever teach it at Berkeley. Why that is will be the topic of a future blog post.
So from both the maximalist RL perspective and the minimalist control perspective, we’re stuck…with LQR. Any other optimal control problem of reasonable complexity is intractable and inherently fragile to model mismatch and measurement uncertainty. This feels like a funny place to lay the foundation of an engineering philosophy!
So why would we build an entire system around optimization? I suppose I understand the allure of just writing down cost functions and having robots come out. But even people who don’t work on optimization know that there are always tradeoffs in designing systems and making decisions.1 If the goal is literally just “minimize cost,” we get cheap, fragile garbage out. We have to account for everything we might care about, and write it explicitly into the cost, the constraints, or the solution heuristics.
Ultimately, real robustness gets added in with engineering expertise rather than mathematical rigor. There’s no getting around the years of hard work in simulation and pilot programs before you can convince everyone your system is actually ready for prime time. Optimal control gives you a false sense of interpretability and rigor, and it’s important to be reminded periodically that the rigor is illusory.
1. You could argue people who don’t work on optimization probably have a better perspective on this than those of us in the field.