The power of clean data: a baseline for good forecasting

Data management

Sometimes the simplest tasks can be made problematic by the most obscure hurdles. This is particularly true for data management.

Imagine a Database Administrator is charged with sorting a range of historic texts by estimated publishing date. Let’s imagine these texts have estimated publication dates ranging from the early 18th century to the mid 20th, and that the headline information (including dates) for these texts are all stored in Excel. At first, you might assume this would simply require selecting the column containing the dates, and sorting by time.

However, in Excel the world began on the 1st January 1900 and is due to end on the 31st December 9999. After researching how to overcome this problem, the Database Administrator has installed a handy plugin that captures pre-1900 dates. Then the next hurdle presents itself, the eleven “lost days” between 2-14 September 1752. (Which arose from the Julian to Gregorian calendar switch.) By now the “simple task” has revealed itself to have far more steps than originally estimated.

The value in data

A good data set is a large data set. Large data sets, however, come with increased complexities. As we saw with the example above, even fairly simple data sets can throw up difficulties. The larger the volume of data, the harder these problems are to unpick. The main issue with large data sets, though, is the increased likelihood of anomalies.

In short, the larger the data set becomes, the more difficult it becomes to work with. This catch-22 has led to increased investment in the field of data cleansing. A task that is necessary before any data mining activities, which is where the real value-add is found, can take place.

The data sets that are used in a cash forecasting process are no different. Just like all other forms of data analysis, cash forecasts suffer the same decreased output quality if the data integrity is compromised.

Key cash flow data sources

In the field of cash forecasting, having a clean set of historic data is enormously powerful as it forms a baseline from which accurate forecasts can be made. The data sources that feed into a cash forecasting process can be broadly broken into two categories, sources of actual data, and sources of forecast data.

In most cases the actual cash flow data is sourced from:

  • Enterprise Resource Planning (ERP) / Accounts systems
  • Bank file downloads (BAI2 or MT940 files)

Whereas forecast cash flow data is usually sourced from:

  • ERP systems
  • Treasury Management Systems (TMS)
  • Data modelling software
  • People (individual business units)

Data sources infographic

The extraction of this data can either be automated, through the use of specialised cash flow forecasting tools, or it can be drawn manually. Because of the range of data sources that input into the process, it is important that the data is standardised into a common format before it is used. A common cause of data defects is that incorrect information has been input, possibly due to human error (if the process is manual).

As previously stated, this standardisation step can be an automated part of the process, but it is important that input feeds are mapped carefully, by someone with the appropriate expertise, to ensure the correct fields are captured.

What to watch out for

Ultimately, the goal of cleaning a historical cash flow data set is to create to a representative view of what’s happened in the past, which in turn provides a reliable basis for building a view of the future. Key to this exercise is removing items and once off cash flows that will not be repeated in the future as they will ultimately pollute the modelled forecast output.

Things to watch out for when cleaning a cash flow data source include:

Accounting Journals

Any information exported from an ERP or accounting system could potentially contain accounting journals such as reversals or currency adjustments that will impact the quality of the underlying data that will be used for forecasting. These journals don’t represent underlying business activity and will need to be removed.

Acquisitions & Divestures

Acquisitions are likely to be the biggest cash out flow, or series of cash flows, leaving a business over the course of year. It’s highly unlikely that the amount spent and timing of an acquisition will be repeated in the future and therefore it will be necessary to remove these cash flows before using the data set for forecasting purposes. The same is true for divestures.

Investment Capital Expenditure

Aside from acquisitions, investment capital expenditure, particularly for spend on large once off projects, can be the lumpiest cash out flows over a period of time. Even if capital expenditure levels are expected to remain consistent with previous periods, the amount and timing of this expenditure will likely be very different to previous periods. Which in turn warrants their removal from the data set.

Debt Movements

Debt drawdowns, repayments and refinancing can have a huge impact on total cash movements over a period of time. Typically, these movements aren’t representative of day-to-day business activity and should be removed when using the data set to model future activity.


In most mid to large size companies intercompany cash flows between business units can sometimes equal external cash flows in total volume over a period of time. Of course, intercompany shouldn’t have an impact on net liquidity but when analysing a particular business unit or segment it is important to remove non-trading intercompany movements that are unlikely to be repeated in the future.

General Outliers

In the regular course of business unusually large cash movements, be they payments or receipts, will occur for a number of reasons. The winning of a large new customer account could lead to a once off large cash receipt, for example, or the prepayment for raw materials to secure a discount could lead to large once off payment. The removal of these types of cash movements may require a little more judgement than the previous items but, in some cases, data is improved when they are removed.

Data cleaning – a continuous process

Owing to the anomalies listed above, data sets often undergo a “cleaning” exercise before any in-depth analysis is performed. However, as noted, care must be taken not to conflate data anomalies with data defects. While it goes without saying that removing data defects improves data quality, callously removing all anomalies might mean removing some important signals from the data.

In cash forecasting, data cleansing is a continuous process. The various measures that a large organisation must take, such as those listed above, mean that there are regularly factors that cause sizable distortions to these data sets.

Maintenance made easier

Specialised software simplifies the continuous data cleaning process, along with all other elements of the cash forecasting process. As referenced in the section on data sources above, automating the process expedites data cleaning. Elements such as intercompany cash flows should always net to zero. However, with a manual process this balancing task can be hugely time-consuming and inaccurate. Specialised software, with a dedicated counterparty driven intercompany tool, simplifies this task to a touch-of-a-button exercise.

This simplification enables the person analysing the data to quickly review and amend outputs and therefore take the right anomalies into consideration.

Having the right personnel at the right stage of the process

As we discussed in a recent article on the analytics skills gap, it is important to have people with the right skills at the right stages of the process. While not an absolute requirement, some companies may choose to recruit a Database Administrator (DBA) to ensure that all data is formatted and structured properly as it goes into the model.

In any event, the person charged with managing the database should be familiar working with large data sets and have a good knowledge of database theory and database design. This expertise will help with the identification and rectification of defects in the data, as well as managing the integrity and security of the data as a whole.

Progression through iteration

Once all checks have been made, appropriate actions have been taken, and everybody is confident that the data is of sufficient quality and integrity, it can be loaded into the forecasting model. The focus now switches to the output side of the process, i.e. on the reporting, forecasting, and analytics. Here, quality is improved through a process of careful measurement and adjustment. A gradual process, but one that becomes increasingly valuable to the company as these improvements are made.

Help for Corporate Treasurers

Going back to our hypothetical Database Administrator, struggling to sort historical dates in Excel, their best option is to re-format the database. They’ll need to sort their dates into separate columns; one containing year, another for month, another for day. Then filter with a cascading sort, factoring in all three columns.

For Corporate Treasurers, however, we can be of far greater assistance. We provide dedicated cash flow forecasting software to large, multinational organisations and have extensive experience helping clients across a range of industries.

In addition, to assist those considering updating their old processes, we have written a whitepaper which outlines the steps involved in setting up a cash forecasting process. If you have any questions on this, or would like to see a demo of our software in action, please do not hesitate to contact us.