This blog post is about motorsport. What does motorsport have to do with software engineering, you ask? Read on!
I’m a big fan of motorsport, and of the Le Mans 24 Hours in particular. Le Mans is a 24-hour motor race with about 50 race cars in four different classes competing in the same race. Le Mans is also a legend, first run in the 1920s. Running a race over 24 hours is very challenging for teams and machinery. An F1 race lasts only about 2 hours, and the cars are only a bit faster. There are 240,000 of us spectators, and about 40,000 Danes travel the 1500 km to get there – including me and two of my boys, so it’s also a big, great party.
But to me as an engineer, Le Mans is also intellectually inspiring. Le Mans is a reminder that while we can do a lot with technology, there’s also a lot that we can’t do and that the laws of physics will always set a limit on the track. In order to try to win, race car manufacturers and teams will constantly try to push that limit, but it will always be there.
When cars are withdrawn from an F1 race, it’s typically because of an accident – drivers making mistakes. At Le Mans, driver mistakes are unavoidable over such a long race, but withdrawals are actually more often due to technical reasons: The equipment breaking down, engines blowing up, or just electrical gremlins pulling the plug. The fascinating part is that it has been like this since the very beginning.
So failures are more or less expected: 50 cars at the start line, and usually only some 25 at the finish. But Le Mans 2010 was a little different: It was Peugeot’s “Black Swan Year”.
The Peugeot 908s were again extremely fast, perfectly tuned, and ready to race. Audi had gone through a challenging development process with their new R15, which in 2009 turned out not to be as fast as they had hoped. It was improved for 2010, so we all thought that 2010 would be a year where Audi could compete with Peugeot on speed. But Peugeot again set impressive lap times never before seen at Le Mans. Couple that with the fact that their team finally seemed to be a well-functioning machine (proved by the 2009 overall win), and it seemed that Audi could only hope for a podium.
Until a conrod broke on the leading Peugeot at Tertre Rouge on Sunday morning. I was there with my camera, enjoying the early morning and the race, but I had left that area only 10 minutes earlier, so I didn’t have a chance to catch the action (aren’t you always in the wrong place at the right time?).
It came as a shock to everyone. I watched the TV pictures on the big screens around the track showing the team completely in shock about what had happened, and I looked down into the pit area where the Eurosport TV crew was trying to get comments from the team, which seemed to be paralyzed. But it seemed to be a coincidence at the time. Until a few hours later, when another Peugeot failed in a similar way. We started wondering what was going on. And with only one hour of the race remaining, the customer-entered Peugeot 908 failed and the race was lost. Audi finished 1-2-3 with their three R15+ cars.
It was devastating. The Peugeot Sport director was seen crying on TV. The French spectators and press went home early from the race. This was a nation losing a battle with their neighbors.
Of course we didn’t know the technical reason why all the Peugeots had failed at the race, but it seemed as if they had been ‘programmed’ to fail. About a month later, Peugeot released a statement that the three cars had suffered from the same failure and that the fourth car (which retired before the others due to a broken suspension) would have suffered the same problem if it had still been running on Sunday. Peugeot said that the breakdown came as a surprise, that they had tested the cars and engines and never expected this. I’m sure it was a surprise. I’m also sure their sports director didn’t expect this embarrassing disaster in front of a whole nation of supporters. I’m sure they thought everything was hunky-dory.
But at the same time, I have no doubt that the problem was rooted in history: That an engineer somewhere knew that there was a risk, but for reasons which are probably rooted in groupthink and organisational behaviour, kept the knowledge to himself – or simply chose to ignore it. Conrods have failed in cars since the first reciprocating engine was built, but engineers have learnt to handle this, so today we have reliable engines that can easily do more than 300,000 km. When engineering has made something inherently unreliable reliable, people tend to forget about it. Management expects it to be under control.
This is true even for competition engines, even though they are of course pushed much harder and designed to be minimal and as light as possible in order to maximise power output: I’m sure Peugeot management thought the conrod supplier had everything under control, which they might have had – but they could have been working to the wrong specification. We won’t know the details, and they’re not important either.
To win Le Mans you have to be running at the end. The Peugeots weren’t. They obviously forgot what it takes to make something inherently unreliable reliable: It takes focus on what can possibly go wrong. Software is no different: When software fails, it’s often also because someone forgot to raise an issue or because someone chose to ignore it. Many disasters in systems are rooted in history, which also means that they could have been prevented.
This is where professional pessimists on a team can help, and where testers’ negative attitude can mean the difference between success and failure.
For me, Peugeot’s black swan event at Le Mans 2010 is a reminder that we all have a shared responsibility for seeking out and communicating these details: by testing, careful inspection, talking to developers and users, and by constantly focusing on problems. We’re on a mission to prevent disaster by making the risks known so managers can make informed decisions.
Le Mans 2011 will be interesting in a new way, since the cars will be technologically different, with hybrid engines. This is new technology, so we should probably expect it to fail more at random, or just to affect performance during the race: Longer pit stops and the like. But let’s see, it’s a big and long race. Anything can happen! I’ve got our tickets booked, so let me know if you’ll be at Le Mans in June – and we can meet up and enjoy the cars. And perhaps discuss engineering and testing?