I built an interactive Shiny application that uses simulated annealing to solve the famous traveling salesman problem. You can play around with it to create and solve your own tours at the bottom of this post. Here's an animation of the annealing process finding the shortest path through the 48 state capitals of the contiguous United States:
We start by picking an arbitrary initial tour from the set of all valid tours. From that initial tour we "move around" and check random neighboring tours to see how good they are. There are so many valid tours that we won't be able to test every possible solution, but a well-designed annealing process eventually reaches a solution that, if it is not the global optimum, is at least good enough. Here's a step-by-step guide:
The key to the simulated annealing method is in step 4: even if we're considering a tour that is worse than the tour we already have, we still sometimes accept the worse tour temporarily, because it might be the stepping stone that gets us out of a local minimum and ultimately closer to the global minimum. The temperature is usually pretty high at the beginning of the annealing process, so that initially we'll accept more tours, even the bad ones. Over time, though, we lower the temperature until we're only accepting new tours that improve upon our solution.
If you look at the bottom 2 graphs of the earlier USA animation, you can see that at the beginning the "Current Tour Distance" jumps all over the place while the temperature is high. As we turn the temperature down, we accept fewer longer tours and eventually we converge on the globally optimal tour.
That's all well and good, but why do we need the annealing step at all? Why not do the same process with 0 temperature, i.e. accept the new tour if and only if it's better than the existing tour? It turns out if we follow this naive "hill climbing" strategy, we're far more likely to get stuck in a local minimum. Histograms of the results for 1,000 trials of the traveling salesman through the state capitals show that simulated annealing fares significantly better than hill climbing:
Simulated annealing doesn't guarantee that we'll reach the global optimum every time, but it does produce significantly better solutions than the naive hill climbing method. The results via simulated annealing have a mean of 10,690 miles with standard deviation of 60 miles, whereas the naive method has mean 11,200 miles and standard deviation 240 miles.
And so, while you might not think that Nikolay Chernyshevsky, college football coaches, and Chief Wiggum would be the best people to offer an intuition behind simulated annealing, it turns out that these characters, along with cliche-spewers everywhere, understand the simple truth behind simulated annealing: sometimes things really do have to get worse before they can get better.