Tourism & Hospitality

Booking Cancellation Prediction

Predicting booking cancellations for hospitality resilience.


The project

Booking cancellations were already a concern a few years ago, but during the pandemic they hit the industry hard, which is why we started working on this project.

The project consists of training algorithms on our customer's booking history to predict booking cancellations in their system, so that hotels and travel agencies can anticipate cancellations and act preventively.

The process

To train the algorithm, we used 80% of the bookings created between 2018 and 2019 at the most active hotels. The remaining 20%, the test set, was used to validate the performance of our model.
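
As a minimal sketch of the 80/20 split described above: the code below shuffles a list of bookings and holds out 20% as the test set. The random shuffle, the fixed seed, the `train_test_split` helper, and the `booking_id` field are assumptions made for illustration; the original project may have split its data differently.

```python
import random


def train_test_split(bookings, test_fraction=0.2, seed=42):
    """Shuffle the bookings and hold out `test_fraction` of them for testing.

    Hypothetical helper: random shuffling with a fixed seed is an assumption.
    """
    shuffled = list(bookings)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data standing in for real booking records.
bookings = [{"booking_id": i} for i in range(1000)]
train_set, test_set = train_test_split(bookings)
print(len(train_set), len(test_set))
```

Fixing the seed keeps the split reproducible, so a later experiment is evaluated on the same held-out bookings.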

We also sorted the predictions for the bookings in the test set into three categories: to be canceled, to be confirmed, and uncertain. An uncertain prediction is one whose probability is close to 50%, so neither outcome of the booking is clearly more likely.
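
The three categories can be sketched as a simple rule on the predicted cancellation probability. The width of the uncertain band (`margin=0.1`, that is, probabilities between 40% and 60%) is an assumed value for illustration; the text only says "close to 50%", and the `categorize` function name is hypothetical.

```python
def categorize(cancel_probability, margin=0.1):
    """Map a predicted cancellation probability to one of three categories.

    `margin` is an assumed half-width of the uncertain band around 50%.
    """
    if not 0.0 <= cancel_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if cancel_probability >= 0.5 + margin:
        return "to be canceled"
    if cancel_probability <= 0.5 - margin:
        return "to be confirmed"
    return "uncertain"


for p in (0.92, 0.55, 0.08):
    print(p, categorize(p))
```

A wider margin moves more bookings into the uncertain category, while a narrower one commits to more predictions at the risk of more mistakes.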

Of the 24,652 bookings used to test the model, the predictions were:

  • 5,256 (21%) to be canceled;

  • 15,500 (63%) to be confirmed;

  • 3,896 (16%) uncertain.

Of the 5,256 to-be-canceled predictions, 4,955 (94%) were correct.

Of the 15,500 to-be-confirmed predictions, 13,887 (90%) were correct.

With 94% of the predicted cancellations and 90% of the predicted confirmations turning out to be correct, we can reliably use the booking predictions of our model. The next step is to decrease the number of uncertain bookings while maintaining the accuracy of the predictions.
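
The percentages above can be re-derived from the reported counts. The short calculation below uses only the numbers given in the text; the variable names are illustrative. Each per-category figure is the share of predictions in that category that turned out to be correct, which is usually called precision.

```python
total = 24_652
predicted = {"to be canceled": 5_256, "to be confirmed": 15_500, "uncertain": 3_896}
correct = {"to be canceled": 4_955, "to be confirmed": 13_887}

# The three categories should account for every booking in the test set.
assert sum(predicted.values()) == total

# Share of the test set that falls into each category.
share = {name: count / total for name, count in predicted.items()}

# Share of predictions in each decided category that were correct.
precision = {name: correct[name] / predicted[name] for name in correct}

# Bookings for which the model committed to an outcome, and how often it was right.
decided = predicted["to be canceled"] + predicted["to be confirmed"]
coverage = decided / total
overall_correct = sum(correct.values()) / decided

for name in predicted:
    print(f"{name}: {share[name]:.0%} of the test set")
for name in precision:
    print(f"{name}: {precision[name]:.0%} correct")
print(f"decided bookings: {coverage:.0%} of the test set, {overall_correct:.0%} correct")
```

Reducing the uncertain share raises `coverage`; the goal stated above is to do that without lowering the per-category figures.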

After implementing this project, we continued with improvements to make the ML service easy to operate and the trained models easy to manage, monitor, and deploy, including:

  • Implementation of an ML pipeline with DVC, consisting of five stages:

  1. Data selection

  2. Feature extraction

  3. Train-test split

  4. Train model

  5. Model evaluation

  • DVC and Git version the dependencies, parameters, and outputs of every pipeline stage, which lets us reproduce different versions of our model and gives us a simple way to run multiple model experiments with different training parameters;

  • Development of the ML API, which receives requests for predictions, makes the predictions with the already trained models, and sends back the results;

  • A generalized training process that trains algorithms for the different GODO properties sequentially and stores and versions the generated models in the ML pipeline, which makes changes simpler to trace and the different models easier to manage.
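
To illustrate the last two points, here is a minimal sketch of training one model per property in sequence, tagging each with a version, and serving a prediction from the stored model. Everything in it is an assumption made for illustration: the "model" is just a property's historical cancellation rate, the version is a content hash rather than anything DVC produces, and the `property_id` and `canceled` fields and all function names are hypothetical.

```python
import hashlib
import json
from collections import defaultdict


def train_cancellation_rate_model(bookings):
    """Stand-in 'model': the historical cancellation rate of one property."""
    canceled = sum(1 for b in bookings if b["canceled"])
    return {"cancel_rate": canceled / len(bookings)}


def model_version(model, params):
    """Derive a short, deterministic version id from the model and its parameters."""
    payload = json.dumps({"model": model, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def train_all_properties(bookings, params):
    """Train the properties one after another and keep each model with its version."""
    by_property = defaultdict(list)
    for booking in bookings:
        by_property[booking["property_id"]].append(booking)
    registry = {}
    for property_id in sorted(by_property):
        model = train_cancellation_rate_model(by_property[property_id])
        registry[property_id] = {
            "version": model_version(model, params),
            "model": model,
        }
    return registry


def predict(registry, property_id):
    """Answer a prediction request from the already trained model of a property."""
    entry = registry[property_id]
    return {
        "property_id": property_id,
        "model_version": entry["version"],
        "cancel_probability": entry["model"]["cancel_rate"],
    }


# Toy booking history for two hypothetical properties.
bookings = [
    {"property_id": "hotel-a", "canceled": True},
    {"property_id": "hotel-a", "canceled": False},
    {"property_id": "hotel-b", "canceled": True},
    {"property_id": "hotel-b", "canceled": False},
    {"property_id": "hotel-b", "canceled": False},
    {"property_id": "hotel-b", "canceled": False},
]
registry = train_all_properties(bookings, params={"min_bookings": 1})
print(predict(registry, "hotel-b"))
```

Returning the model version with each prediction is one way to keep results traceable to the exact model that produced them.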

The result

The final result is a light yet powerful solution: one model to rule them all.
