Britain’s top stats expert’s new book
THE UPS AND DOWNS OF SYSTEMS

By Peter May

Everyone interested in horserace betting either uses a system or knows someone who does. This may seem to be an over-generalisation but it is nonetheless true.

Systems are a very popular betting method. Unlike form study, they are easy to use and take very little time to check each day, which is a major advantage for those who do not rely 100 per cent on the sport for income.

The bulk of the work is undertaken during a period of system development resulting in one or several systems which can then be employed during the season. Another advantage with this approach is that systems can be precisely structured.

This rigid style allows them to be tested against known race results and thus provides the bettor with invaluable information regarding their likely performance. Unfortunately this level of rigidity can be a drawback since it does not allow the system to develop and adapt to changing circumstances.

Like many things in life, systems are based on past experience. The developer either uses his or her personal knowledge of the sport to generate a system, or uses a more structured technique based on an analysis of previous results.

The method is very effective, but like any form of retrospective analysis, it needs to be conducted and validated in an unbiased manner to achieve the best results.

A horseracing system has three requirements. First, it must be expressible as a rule that is unambiguous and straightforward to apply. Second, it should generate an acceptable success rate and return a profit. Third, and most importantly, it should be likely to reproduce its good historical performance in future races.

To satisfy the first objective the system should rely on quantifiable variables, such as the number of days since the horse last raced or its position in the betting market. Qualitative variables, such as suitability of the going, or race distance, should be avoided unless they can be defined precisely. These opinion-based variables are always difficult to validate and often their assessment does not remain constant over time.
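As a minimal sketch of what an unambiguous rule looks like in practice, a system can be written as a function that gives a yes/no answer from quantifiable inputs only. The field names, horse names and thresholds below are hypothetical and chosen purely for illustration; they are not a recommended system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runner:
    """Quantifiable facts about one runner (hypothetical field names)."""
    name: str
    days_since_last_run: int
    market_rank: int  # 1 = favourite, 2 = second favourite, and so on


def qualifies(runner: Runner, max_days: int = 30, max_market_rank: int = 3) -> bool:
    """Return True only if the runner meets every condition of the rule.

    Both thresholds are illustrative assumptions.
    """
    return (
        runner.days_since_last_run <= max_days
        and runner.market_rank <= max_market_rank
    )


runners = [
    Runner("Horse A", days_since_last_run=14, market_rank=2),
    Runner("Horse B", days_since_last_run=120, market_rank=1),
    Runner("Horse C", days_since_last_run=21, market_rank=6),
]

# Only runners that satisfy every condition become selections.
selections = [r.name for r in runners if qualifies(r)]
print(selections)
```

Because every input is a number that can be read straight from the results, two people applying the rule to the same race will always arrive at the same selections.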

Simple systems are always preferred to highly complex ones. A simple system consisting of just a few conditions is easy to implement and hence more likely to be applied correctly. Furthermore, these systems tend to be more general, require less data to validate and are less prone to over-fitting. Though highly refined systems map the historical data more precisely, they may not perform as well in the future.

Ideally the system will also be unique. Using the usual data items in a conventional fashion is unlikely to return a long-term profit. However, a new relationship between the data and the odds is more likely to return a profit since it will not already have been accounted for in the prices on offer.

Finally, the system must be structured so that the bettor is able to implement it. It is no good developing a highly profitable system that cannot be implemented. For instance, someone who is employed full-time may not be able to monitor the betting shows closely and so would not be able to run a system with such a requirement. This may appear to be obvious, but it is surprising how many systems are developed without thought for how they are going to be run.

Although it is possible to identify specific combinations of variables which return a long-term profit as a result of general form study, the most common approach to systems development is by data mining.

Data mining methods require a data set containing historical results which include all of the critical factors. This is a key step in the whole process. It is necessary for the systems developer to base his or her work on a data set that is representative of the domain, accurate and unbiased.

This may seem obvious, but it is not necessarily a straightforward task, especially when extracting the data manually from a formbook. Furthermore, it is important to use data from more than a single season. Seasonal variations need to be accounted for in any system, so a data set covering three or more seasons is desirable.

The data mining process then requires the analyst to search the data for profitable relationships between the data elements. Consequently, the search is driven by the profit variable. With form study the result is generally the horse with the highest chance of success; the bettor can then convert these findings into acceptable prices before considering a bet.

However, with system development the target is profit from the outset and this changes the way the variables need to be viewed. When analysing a race with the aim of determining the most likely winner, the factors considered generally have a linear relationship with the outcome of the race.

For instance, the higher the speed rating, with respect to the other runners, the greater the chance of success; the higher the trainer’s success rate the more likely the horse is to win and so on.

With systems development the relationships that need to be considered change from linear to non-linear simply because the target variable is not chance of success but profit.

The betting market mirrors the horse’s chance of winning and the more likely a horse is to win the shorter the price on offer will be. Consequently a horse with ideal credentials is unlikely to be on offer at a value price. This also applies to horses at the other end of the market where again the odds will understate the chance of success.

The aim of a system is to find a combination of factors which generate a profit in the long term. This may require the bettor to invest in horses which are far from the most likely winning candidates.
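The point can be illustrated with a short expected-profit calculation. This is only a sketch: the win probabilities and prices are invented, and in practice the true probability is never known exactly.

```python
def expected_profit_per_pound(win_probability: float, fractional_odds: float) -> float:
    """Expected profit on a 1 pound stake at fractional odds (4.0 means 4/1).

    A winner returns the odds as profit; a loser costs the 1 pound stake.
    """
    return win_probability * fractional_odds - (1.0 - win_probability)


# Hypothetical likely winner: a 50 per cent chance, but only 4/5 on offer.
likely_winner = expected_profit_per_pound(0.50, 0.8)

# Hypothetical outsider: a 12 per cent chance, on offer at 9/1.
outsider = expected_profit_per_pound(0.12, 9.0)

print(f"Likely winner: {likely_winner:+.2f} per pound staked")
print(f"Outsider:      {outsider:+.2f} per pound staked")
```

Under these assumed figures the more likely winner loses money in the long run while the outsider shows a profit, which is why a system chases profit rather than the most probable winner.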

Although the combination of variables that constitutes a system may not match the idealistic view of horserace analysis, it must still be logical. However, this logic is based not on conventional approaches but on the profit/loss statistics.

For example, a profitable system may have as a condition that the horse has been unraced for over 100 days. This defies the conventional logic which dictates that horses returning from a long course absence are less likely to be fit and hence less likely to win. However, the system view is that these horses are more likely to be overpriced by the market simply due to this factor and the general assessment of its importance by the betting public.

So although it may at first seem illogical to include such a condition, it is in fact a sensible factor to include when considered in the context of the profit/loss variable.

The easiest way to develop a horseracing system is to identify an initial key variable (or base factor), and then build the system around it. This initial factor may be something as simple as a rating, or it may be more complex such as good horses running below par.

Once identified, the race results should be analysed with respect to this key variable to determine a benchmark profit/loss value, and then the other factors considered in turn to determine whether they make any significant improvement to this benchmark figure.

However, with detailed databases examining the other variables is a time-consuming task, and the simplest method is to adopt a sensitivity analysis approach.

Sensitivity analysis is used in many different numerical disciplines, and essentially it monitors how a system reacts to changes in its influencing variables. For horseracing systems, the initial condition is fixed and the other critical variables are allowed to vary across their entire range, with profit and loss figures calculated for each value.
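The procedure can be sketched in a few lines of Python. The records, field names and prices below are invented purely for illustration; only the method, holding the base factor fixed and tabulating level-stake profit and loss for each value of a second variable, comes from the text.

```python
from collections import defaultdict

# Hypothetical results for horses that already satisfy the base factor.
# "sp" is the starting price as profit per 1 pound staked (2.0 means 2/1).
bets = [
    {"last_finish": "won", "won": True, "sp": 1.5},
    {"last_finish": "won", "won": False, "sp": 2.0},
    {"last_finish": "won", "won": False, "sp": 1.0},
    {"last_finish": "placed", "won": True, "sp": 2.0},
    {"last_finish": "placed", "won": False, "sp": 3.0},
    {"last_finish": "unplaced", "won": True, "sp": 6.0},
    {"last_finish": "unplaced", "won": False, "sp": 5.0},
    {"last_finish": "unplaced", "won": False, "sp": 8.0},
]


def level_stake_profit(won: bool, sp: float) -> float:
    """Profit on a 1 pound level stake: the price if it wins, minus 1 if not."""
    return sp if won else -1.0


def sensitivity(records, variable):
    """Tabulate bets, wins and profit for each value of one variable."""
    summary = defaultdict(lambda: {"bets": 0, "wins": 0, "profit": 0.0})
    for record in records:
        row = summary[record[variable]]
        row["bets"] += 1
        row["wins"] += int(record["won"])
        row["profit"] += level_stake_profit(record["won"], record["sp"])
    return dict(summary)


results = sensitivity(bets, "last_finish")
for value, row in results.items():
    per_pound = row["profit"] / row["bets"]
    print(f"{value:9s} bets={row['bets']} wins={row['wins']} "
          f"profit={row['profit']:+.2f} ({per_pound:+.2f} per pound)")
```

The same function can be run over each candidate variable in turn, comparing every row against the benchmark figure for the base factor alone. With a sample this small the figures mean nothing; a real analysis needs several seasons of results.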

As an example consider a system which takes the top-rated horses, based on speed figures, in novice and maiden hurdle races as the base factor. Over the four UK seasons 1998/99-2001/02, these horses returned a level stake profit at starting price of 7p for every £1 staked on the 2,014 races. This is an excellent starting point, and in fact could be used as the system itself since a 7 per cent return is not insignificant and a success rate of 31 per cent is more than acceptable. However, it is always desirable to check other influences in case the system can be improved without losing any generality.

Conventional form study states that winning recent form is preferred. However, the data may show that horses which finished unplaced on their latest start are favoured over winners, simply based on the level of profit.

Applying this one condition reduces the number of races to 625 but increases the profit to 25p for every £1 staked. The overall success rate does drop to just over 20 per cent though, which may be too low for some bettors so this condition is reset and ignored at this stage of the process.
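A little arithmetic on the quoted figures shows where the profit is coming from. The sketch below assumes a single £1 level-stake bet in each race and treats the published returns as exact, although they are rounded, so the output is approximate.

```python
# Figures quoted in the text (rounded to the nearest penny per pound).
total_races, total_return = 2014, 0.07    # all top-rated horses
subset_races, subset_return = 625, 0.25   # unplaced on their latest start

# Assumption: one 1 pound level-stake bet per race.
total_profit = total_races * total_return
subset_profit = subset_races * subset_return

# Whatever is left over belongs to the top-rated horses that were
# not unplaced last time out.
rest_races = total_races - subset_races
rest_profit = total_profit - subset_profit
rest_return = rest_profit / rest_races

print(f"All top-rated:      {total_races} bets, profit {total_profit:.2f}")
print(f"Unplaced last time: {subset_races} bets, profit {subset_profit:.2f}")
print(f"Remainder:          {rest_races} bets, profit {rest_profit:.2f} "
      f"({rest_return:+.3f} per pound)")
```

On these rounded figures the 625 qualifiers account for slightly more than the whole of the base profit, which implies that the remaining races roughly broke even or showed a small loss.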

In 1936, the Literary Digest published the results of a poll designed to forecast the result of the forthcoming American presidential election. The poll indicated that Alfred Landon would win the election with 370 electoral votes; his opponent, Franklin D. Roosevelt, was expected to receive only 161. In fact the election produced 523 electoral votes for Roosevelt; Landon secured a paltry eight.

Clearly the poll was misleading. The error here was far from intentional and lay in the incorrect method of sampling employed. Rather than sampling the views of the public in a representative way, the pollsters drew much of their sample from telephone directories.

In 1930s America the telephone was far from universal and it was mainly the better-off who had one at home. These people tended to support Landon, which resulted in a bias in the poll results.

There are very often connections, in statistical terms, between sets of information which are not fully realised. In the previous example, the company conducting the poll had not appreciated that people who owned telephones were not representative of the country and that they were likely to provide a biased sample.

The same can happen in horseracing systems with third party variables affecting the outcome of the system for reasons the system developer does not fully understand.

This can lead to invalid systems which suddenly under-perform for no apparent reason. A change in race structuring may not appear to have any bearing on a system, yet this apparently unrelated variable could be affecting the results.

Furthermore, an understanding of the relationships can produce more meaningful analyses of the races and produce more reliable systems.

When developing the system, success rate may not appear to be important. After all, the main aim is to return a good profit, so provided this is achieved the rate at which the system identifies winning bets seems irrelevant. However, once a system is running its success rate becomes a crucial factor.

The problem facing most system followers is whether to continue with a system that is performing poorly with long losing runs. Naturally this is more likely to happen if the system has a low success rate. For instance, a system with a 10 per cent success rate is very likely to produce a losing run of more than 20 bets during a run of 200 bets.
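The likelihood of such a run can be checked directly. The following sketch assumes every bet is independent and has the same chance of winning, which is a simplification of real betting; the function name and its parameters are illustrative.

```python
def prob_losing_run(n_bets: int, strike_rate: float, run_length: int) -> float:
    """Probability of at least one run of `run_length` consecutive losers
    somewhere in `n_bets` independent bets with a fixed strike rate."""
    lose = 1.0 - strike_rate
    # state[k] = probability that the current losing streak is k bets long
    # and no run of `run_length` losers has been seen so far.
    state = [1.0] + [0.0] * (run_length - 1)
    hit = 0.0  # probability that the long run has already happened
    for _ in range(n_bets):
        new_state = [0.0] * run_length
        new_state[0] = sum(state) * strike_rate   # a winner resets the streak
        for k in range(run_length - 1):
            new_state[k + 1] = state[k] * lose    # a loser extends the streak
        hit += state[run_length - 1] * lose       # the streak reaches run_length
        state = new_state
    return hit


# "More than 20" losers in a row means a run of at least 21.
p_low = prob_losing_run(n_bets=200, strike_rate=0.10, run_length=21)
p_high = prob_losing_run(n_bets=200, strike_rate=0.20, run_length=21)
print(f"10 per cent strike rate: {p_low:.2f}")
print(f"20 per cent strike rate: {p_high:.2f}")
```

Under these assumptions the 10 per cent system is far more likely than not to hit such a run, and doubling the success rate cuts the risk sharply.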

The worst response to a losing run is to stop playing the system. Provided the system is well founded and adequately tested, there is no need to doubt that it will return a profit at the end of the season, as long as other conditions remain stable.

Systems have good and poor runs, which is only to be expected. So unless there are significant changes to the conditions that directly affect the system, it should be followed for the predetermined time period.

During the Foot and Mouth crisis in the UK in 2001, system players were well advised to stop following their methods, simply because this unexpected outbreak had a significant impact on the structure and results of races. Under normal circumstances this action should be avoided in order to give the method a chance to recover.

In order to avoid long losing runs, it is preferable to follow systems that possess a high success rate. As a guide, a 20 per cent success rate is considered a minimum, and one method to ensure this outcome is to restrict the system to qualifiers that are priced nearer the front of the market. As an example, a maximum price of the qualifiers can be set to a relatively low figure, such as 9/1.
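The effect of a price cap can be sketched as follows. The results list is invented for illustration, and `summarise` and `parse_fractional` are hypothetical helper names; only the idea of excluding qualifiers priced above a maximum such as 9/1 comes from the text.

```python
def parse_fractional(price: str) -> float:
    """Convert a price such as '9/1' or '11/4' to profit per 1 pound staked."""
    numerator, denominator = price.split("/")
    return int(numerator) / int(denominator)


def summarise(results, max_price=None):
    """Level-stake summary, optionally dropping qualifiers above max_price."""
    if max_price is not None:
        cap = parse_fractional(max_price)
        results = [(p, won) for p, won in results if parse_fractional(p) <= cap]
    bets = len(results)
    wins = sum(1 for _, won in results if won)
    profit = sum(parse_fractional(p) if won else -1.0 for p, won in results)
    strike_rate = wins / bets if bets else 0.0
    return {"bets": bets, "wins": wins, "strike_rate": strike_rate, "profit": profit}


# Hypothetical qualifiers: (starting price, whether the horse won).
results = [
    ("2/1", True), ("11/4", False), ("9/1", False), ("14/1", True),
    ("20/1", False), ("5/1", True), ("33/1", False), ("7/2", False),
    ("25/1", False), ("16/1", False),
]

uncapped = summarise(results)
capped = summarise(results, max_price="9/1")
print("No cap: ", uncapped)
print("9/1 cap:", capped)
```

In this made-up sample the cap lifts the success rate but gives up the profit from the one long-priced winner, so the trade-off has to be judged on real data during development rather than assumed.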

Other methods include staking plans that reduce the stake for the longer-priced qualifiers. However, these staking methods need to be fully accounted for in the system development and validation phases.

This is an extract from Peter May’s latest book Horseracing: A Guide to Profitable Betting (a collection of authoritative articles covering all aspects of betting and form analysis). It is available from www.racingpost.com/bookshop.

PRACTICAL PUNTING - JULY 2004