This article has been authored by Vish Bhatia, Co-Founder, Hotelsoft.
It is for this reason there has been significant research into different forecasting methodologies and their success in understanding demands in different market structures. Our article will go through some of these demand forecasting techniques and how to simulate them in simple tools like Excel. However, programmatically we use “R” for our forecasting needs.
Some of the existing approaches are:
Pick-Up Methods – The most commonly used class of demand forecasting methods are advanced bookings called additive and multiplicative pick-up models. The basic concept behind those methods is identifying increase of booking in different booking periods and then accumulating it into total demand that is expected in the future.
For example, in the case of additive pickup, the forecast for any stay-date is computed by:
where x(ij) is the number of reservation already booked by time for the date of stay i, and Aj is the average number of increased bookings that were seen for this property for stay dates which were T days away from completion.
In case of multiplicative pickup this formula would be:
where is the rate of increased bookings during time j-1 to time j and Mj is the average rate of increased bookings during time j -1 to time j and T is booking horizon.
So a stay date’s forecast depends both on the existing booking for the date in question, but also focusses on the average pickup in bookings seen for past stay dates which were at a similar distance from their actual dates as the date we are currently analyzing. So if we found that on average a stay date 30 days away from being completed gains 100 rooms within those 30 days, we could add 100 rooms average to a new stay date if it was also 30 days away from today.
Linear Regression – The regression model consists of estimating the demand (room nights) in a variable Y based on existing bookings for the same stay date at any time t:
where Y(t-n),…,Y(t-n-k) are number of reservations t-n days before arrival date to t-n-k days before arrival. β0 is a constant variable in the regression and β1, β2 and .. βp are variables which determine the influence of that respective variable (on-the-books reservations) in the overall equation. So we have k observations in hand to estimate the final number of rooms booked. As can be seen, the regression model goes through past data of bookings done for this stay date to see a close fit equation which can just allow revenue managers to plug in on-the-book reservations for the stay date and get the forecasted stay date booked rooms.
For example, in the below scenario, we have on-the-books demand for a stay date when the booking dates are 3 to 54 days away. We can use the increase in bookings to estimate where we will land when the actual stay date occurs. This can simply be done by using the TREND function within excel. This excel function will take the known y values as shown in second row, along with known x values given in first row and come up with a Y value for x=0 (which is 56 as shown in the table below). This Y value is essentially the estimated occupied rooms for the stay date in question by getting an equation shown above fit over the observed bookings for the stay date so far.
|Days to Arrival||0||1||3||7||14||21||28||35||42||49||54|
So the TREND function goes through the booking pace shown in the two rows, calculates the values of variables in above equation and comes up with the demand forecast, i.e. 56, which is the left side of the equation.
Bayesian Models – One must keep in mind that the pickup seen for every stay date as we get closer to the date is time series data (data which changes based on time variable). Due to this reason, application of Bayesian Models is also possible on pickup data to come up an expected occupancy forecast. Essentially, Bayesian Models treat past data of pickup as the existing “state” of the stay date and use the existing state data to predict various possible states of the future and the chance/probability of that state occuring. The expected forecast is a weighted average of these various states weighed by the probability of that state.
The Bayesian formula is:
Where we discuss the probablity of an event A(say room demand) being 100 when event B has already occurred (say room demand has already touched 90). The Bayesian probability formula essentially mentions that probability of A, given B has already occurred, is equal to probability of A and B occurring together (ie., room demand exceeding both 90 and 100) divided by probability of B occuring in the first place (ie., what was the probability that room demand would have touched 90 at all).
Thus by using Bayesian probability formula and placing in the appropriate probability chances, a revenue manager could try to check the probability of room demand hitting a certain number (such as event A) with an existing number already on books (event B) and see which value of A has the highest probability (most likely) to occur given B has already occurred.
Recent research has found that additive pickup method has some of the best prediction rates across multiple types of properties. Regardless, each model comes with its strength and weaknesses, specifically Linear regression is known for better accuracy in far-out prediction ranges and Pickup models work better in medium to short stay-date ranges. This variability has lead Hotelsoft to come up with non-linear combination models which tend to combine the predictions from multiple models for a future stay date in question. The combination forecasts from these disparate models are based on previous attempts to combine these numbers in a weighted average (i.e. giving one forecast importance over another) and then measuring the accuracy levels of these attempts to achieve the highest accuracy.
On the question of prediction capacity, it would be ideal to discuss what factors tell a good model from even better one. In the world of statistics, the best way to compare model prediction either against actual data or against each other is through three numbers which check a model’s forecast against actual data for accuracy. These numbers are:
In the Error formulae shown above, A represents Actuals and F represents forecast. MAE simply takes the absolute difference (modulus difference) between forecast and actual values for say, n, historical data points and takes an average of this absolute difference. MSE does the same average but of takes a square of the difference at any data point.
MAE and MSE as shown above are usually acceptable numbers when we have to compare two models running on the same data to check which is more accurate. Since we want higher accuracy, the lower the MAE and MSE numbers, the better the model.
In case we just want to check the accuracy of a demand forecasting model along (not a comparison against another model), MAPE is the only benchmark as MAPE always delivers a percentage between 0 and 100%. The lower the percentage, the better the model is in predicting room nights. MAPE is similar to MAE in that it too uses modulus difference but divides the difference by the actual value to check percentage of difference. This percentage figure allows us to check a model if its forecasts are acceptable and whether the model can be applied in the real world.
For example, if on 1st Jan 2016, we see that the number of booked rooms for Dec 31st 2015 (yesterday) were 60 even though the forecast for the same was 80 rooms, MAE would count the error as |60-80| = |-20| = 20. MAPE would be considered as 20/60 = 33.33% and MSE would be 202 = 400.