Flying Blind: The Perils of Relying on Machine Learning Without Accurate Data

Summary: Even the most advanced machine learning produces inaccurate results when the underlying problem is defined incorrectly. A scheduling application for aviation companies illustrates the point: because its algorithm rested on flawed definitions, pilots were dissatisfied with the schedules it produced.

Introduction: Building schedules for large aviation companies is a complex task that involves many factors: seniority, regulations, individual preferences, and routes for thousands of pilots and crew members. All of these factors must be integrated correctly to create a fair and efficient schedule that satisfies pilots and crew.

User experience review: The scheduling application was evaluated through one-on-one interviews and observation sessions with 20 pilots from five aviation companies, including Delta, United, Air Transat, and Air Canada. The pilots found the application frustrating to use and felt they had no control over it. They did not know how to enter their information, and in the end they did not receive the schedules they wanted.

Algorithm review: The algorithm behind the application was then reviewed to determine why the pilots were unhappy. The review found that it relied on flawed definitions of pilot preferences and on weights that were not set consistently. As a result, the machine learning was built on the wrong definitions and the wrong data.

Redesign recommendations: The team recommended redesigning the preferences: list them in a sequence, then define satisfaction as meeting a preference and dissatisfaction as not meeting it. This approach would ensure that the data accurately reflects the pilots’ preferences.
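The recommended definition can be illustrated with a minimal Python sketch. It is not the application's actual code. The `Preference` class, the schedule fields, and the sample preferences are all hypothetical, and the sketch assumes that "in a sequence" means the pilot's order of priority.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical schedule representation: attribute name -> value.
Schedule = dict


@dataclass(frozen=True)
class Preference:
    """One pilot preference with an unambiguous met / not-met test."""
    name: str
    is_met: Callable[[Schedule], bool]


def satisfaction(ranked, schedule):
    """Return one met / not-met flag per preference, in the pilot's order."""
    return tuple(pref.is_met(schedule) for pref in ranked)


def best_schedule(ranked, candidates):
    """Pick the candidate that meets the highest-priority preferences.

    Tuples of booleans compare position by position, so an earlier
    preference always outranks every later one (an assumption of this
    sketch, replacing hand-tuned numeric weights).
    """
    return max(candidates, key=lambda s: satisfaction(ranked, s))


# Hypothetical preferences, listed in one pilot's order of priority.
ranked_preferences = [
    Preference("weekends off", lambda s: s["weekend_duty_days"] == 0),
    Preference("no overnight flights", lambda s: s["overnight_flights"] == 0),
    Preference("at most 4 legs per day", lambda s: s["max_legs_per_day"] <= 4),
]

# Hypothetical candidate schedules.
candidates = [
    {"id": "A", "weekend_duty_days": 2, "overnight_flights": 0, "max_legs_per_day": 3},
    {"id": "B", "weekend_duty_days": 0, "overnight_flights": 1, "max_legs_per_day": 5},
]

chosen = best_schedule(ranked_preferences, candidates)
print(chosen["id"], satisfaction(ranked_preferences, chosen))
```

Because each preference is either met or not met, the recorded outcome is a plain list of flags that a pilot can check against the schedule, which is the property the recommendation relies on.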

However, the head of the machine learning department initially resisted the recommendations. Even after the team demonstrated the effectiveness of the new approach, he maintained that his expertise in mathematics, backed by a Ph.D. and years of research, was superior. Eventually, after an unnecessary delay, the company implemented the recommendations.

Conclusion: The scheduling application was built on the wrong definitions, which left the pilots dissatisfied. Reviewing and redesigning the preferences was necessary to reflect what the pilots actually wanted and to create efficient, satisfactory schedules. This example highlights the importance of defining the problem correctly: even advanced machine learning and data science algorithms produce inaccurate results when they are built on flawed definitions and data.