AVM Testing: A Crash Course in Best Practices
By Lee Kennedy | December 13, 2006
What does AVM testing generally entail?
Good testing requires several key steps. First, testers must find a source of clean data to be tested. The industry-accepted practice is to use recent purchase properties, as they represent an arm’s length transaction between a willing seller and willing buyer through a negotiated process that derives a true market value for the property. Other types of transactions can have pressures that skew or bias the value, rendering it unusable as a benchmark value for testing purposes. Lenders who have or purchase a production origination source can tap that data, which is anywhere from one to four weeks old. However, for even the largest lenders, this process often does not yield adequate numbers for a statistically significant sample size.
For lenders and users of AVMs who do not have access to a purchase pipeline, the use of an independent source for this purchase data may be an option. Above all, the data must be current, so that little of it has made it to the public records system.
Next, send the property addresses, without the sales price, to each AVM vendor whose model(s) you wish to evaluate and have them provide a predicted value as well as other data points for each of the properties. At this point, an analysis of the difference between the sales price (benchmark value) and the AVM-predicted value must be completed. This analysis determines which AVMs perform well under different conditions. From that analysis, you can establish the order in which the AVMs will be used and under what scenarios. This is a very simplified explanation of a testing process. Normally, we’d take at least a day of lecture to adequately touch on all the salient points.
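As a rough illustration of the comparison described above, the following Python sketch computes a few summary figures for one model from pairs of benchmark sale price and AVM-predicted value. The function names, the 10% tolerance, and the sample records are assumptions made for illustration only; they are not a prescribed set of metrics.

```python
from statistics import mean, median

def percent_error(avm_value, sale_price):
    """Signed error of the AVM value relative to the benchmark sale price."""
    return (avm_value - sale_price) / sale_price

def summarize(results, tolerance=0.10):
    """results: list of (sale_price, avm_value or None) pairs for one model.

    None means the model returned no value for that address (a "no hit").
    The 10% tolerance is an assumed threshold, not an industry rule.
    """
    errors = [percent_error(avm, sale) for sale, avm in results if avm is not None]
    return {
        "hit_rate": len(errors) / len(results),
        "mean_error": mean(errors),  # positive = overvaluing on average
        "median_abs_error": median(abs(e) for e in errors),
        "share_within_tolerance": sum(abs(e) <= tolerance for e in errors) / len(errors),
    }

# Hypothetical test records: (benchmark sale price, AVM-predicted value)
sample = [
    (200_000, 210_000),
    (350_000, 330_000),
    (500_000, None),      # the model could not value this property
    (150_000, 180_000),
]
summary = summarize(sample)
print(summary)
```

Running the same summary for each vendor's model on the same set of addresses gives a like-for-like basis for ranking them.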
How can AVM testing be made indicative of true AVM performance?
One of the most important issues is the currency and size of your test data set. If the data you provide to the AVMs as part of the test is old or seasoned, then there is a very good chance that the models will have access to those properties and will, effectively, know the answers. So, current purchase data is paramount in getting a test off on the right foot.
Secondly, use a large sample size. For the statistically inclined, use as large a sample as possible so as to best reflect the population. The more data you can provide and the closer it represents the lender’s geographic footprint, the more closely it will approximate the lender’s origination or production.
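To see why sample size matters, consider a sketch that uses the standard normal-approximation margin of error for a proportion, such as the share of AVM values falling within some tolerance of the sale price. The 70% share and the sample sizes below are hypothetical, and the approximation assumes a random sample.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion p
    measured on n test properties (normal approximation)."""
    return z * sqrt(p * (1 - p) / n)

# Hypothetical: 70% of values fell within tolerance, at three sample sizes.
for n in (50, 500, 5000):
    print(n, round(margin_of_error(0.70, n), 3))
```

The uncertainty shrinks with the square root of the sample size, so a test on a few dozen properties can leave the measured accuracy uncertain by more than ten percentage points.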
If these first issues (current data, a large sample, and coverage of the lender’s geographic footprint) are covered, then you can move on to the output metrics used to evaluate the performance of the models. No one model performs perfectly in all situations, so the analysis performed must identify where each model performs well and where it falls short. Accomplishing this will allow you to apply the correct AVM under the right set of circumstances and achieve, in production, what you observed in testing.
What are some areas where there’s wide variation among AVM products or providers?
The most common parameters under which AVMs perform differently are geography and price tier, or range. For a given county, as an example, different models can have radically different hit rates and accuracy. In addition, models with similar overall performance may diverge when a housing price tier or housing type is introduced. Most of this variation is driven by the source and completeness of the data being employed by the models and how the models’ underlying algorithms are tuned.
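One way to surface this kind of variation is to break test results out by county and price tier rather than looking only at overall figures. The sketch below is illustrative: the tier boundaries, county names, and records are all hypothetical.

```python
from collections import defaultdict
from statistics import median

def price_tier(sale_price):
    """Assign a price tier; the boundaries here are hypothetical."""
    if sale_price < 250_000:
        return "low"
    if sale_price < 600_000:
        return "mid"
    return "high"

def segment_report(records):
    """records: iterable of (county, sale_price, avm_value or None)."""
    groups = defaultdict(list)
    for county, sale, avm in records:
        groups[(county, price_tier(sale))].append((sale, avm))

    report = {}
    for key, rows in groups.items():
        # Absolute percentage errors for the properties the model could value.
        abs_errors = [abs(avm - sale) / sale for sale, avm in rows if avm is not None]
        report[key] = {
            "n": len(rows),
            "hit_rate": len(abs_errors) / len(rows),
            "median_abs_error": median(abs_errors) if abs_errors else None,
        }
    return report

# Hypothetical test results for a single model.
records = [
    ("County A", 200_000, 190_000),
    ("County A", 220_000, None),
    ("County A", 700_000, 770_000),
    ("County B", 300_000, 303_000),
]
report = segment_report(records)
for segment, stats in sorted(report.items()):
    print(segment, stats)
```

Segments with very few records should be read with caution, since a handful of properties says little about how a model behaves in that county or tier.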
Another substantial variation in model performance is in the correlation of the model’s confidence score to the predicted value. Many in the valuation industry are discussing the standards surrounding the various confidence scoring schemes.
How much do reputable AVM providers test their own products to ensure the highest accuracy, hit rate, or confidence score?
First, let me say that all the AVM vendors with whom we work are reputable. Second, the AVM vendors constantly test their models so they can adjust the models to reflect the most current conditions affecting property values. A model builder would be better suited to answer this question; however, I have observed through our testing programs that the vendors’ ongoing testing or validation of model performance in some respects mirrors what any good testing program entails. Their process is very similar to how we at AVMetrics independently test models.
Modelers will hold out the incoming purchase data as an out-of-sample data set and run the model against those addresses. They then compare the predicted value with the actual sales price and make corrections to the model based on those observations. For most models this is done frequently, if not daily.
Do most AVM providers guarantee a certain level of accuracy or successful hit rate?
It would be very difficult for an AVM vendor to guarantee the performance of a model or the correlation of a given confidence score to a predicted value except in a general sense. Think of the FICO score as an analogy.
The FICO score predicts the chance of a borrower failing to make his payments as a percentage of the population in that score range. An AVM value, when correlated to a confidence score and validated and standardized through a testing process, can deliver a value in much the same manner as the FICO score, predicting the accuracy of a value within a specified range or percentage of accuracy. In general, when AVMs are created, they reflect the best fit to the underlying data used to build them. By definition, all models are imperfect. Therefore, they will never be a perfect predictor of collateral value in all situations, and neither will a traditional appraisal.
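The analogy can be made concrete with a small sketch that groups test results by confidence score band and reports, for each band, the share of values that landed within a tolerance of the sale price. The 0 to 100 score scale, the 10-point bands, the 10% tolerance, and the records are assumptions for illustration; vendors' actual scoring schemes differ.

```python
from collections import defaultdict

def accuracy_by_confidence(records, band_width=10, tolerance=0.10):
    """records: iterable of (confidence_score, sale_price, avm_value).

    Returns {band_floor: share of values within tolerance of the sale price}.
    Assumes an integer confidence score on a 0-100 scale (hypothetical).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for score, sale, avm in records:
        band = (score // band_width) * band_width
        totals[band] += 1
        if abs(avm - sale) / sale <= tolerance:
            hits[band] += 1
    return {band: hits[band] / totals[band] for band in sorted(totals)}

# Hypothetical test results: (confidence score, sale price, AVM value)
records = [
    (92, 300_000, 306_000),
    (95, 410_000, 398_000),
    (71, 250_000, 290_000),
    (78, 180_000, 185_000),
]
table = accuracy_by_confidence(records)
print(table)
```

If a model's confidence score is informative, the share within tolerance should rise with the score band on a sufficiently large test set.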
How do financial institutions qualify the AVM providers they use?
Lenders should qualify the AVM providers through a series of steps based on their intended use of the model. Some of these steps that would apply across most uses for AVMs are listed below:
- Define how you are using or plan to use AVMs in your different lending channels.
- Ensure that your integration solution is flexible and scalable enough to meet your current and forecasted needs.
- Get the training you need to understand how to effectively employ AVMs in your processes.
- Perform due diligence on the potential vendors, which would include pre-testing engagement questions about how the models work, about the vendor’s data sources, how frequently the models and data are updated, and so forth.
- Thoroughly test the models available in your geographic lending areas.
- Employ a rigorous back testing or audit program to ensure that the performance expectations established through testing are reflected in the production environment.
As you can see from this list, most of the qualification criteria for the use of AVMs rest within the lending institution and not with the AVM provider.
Are there different levels of AVM testing based on which products the AVMs are used in conjunction with? For example, do you test differently for conforming properties and HELOCs or other uses?
The AVM is blind to the loan product it is being used for; therefore, the process for testing AVMs is the same regardless of the final application. However, once the models have been evaluated and deemed acceptable for use, the lender must develop risk-based rules which, among other things, define when an AVM can be used and under what conditions a value is accepted or declined.
Lenders must also define the use of multiple AVMs for a certain area, property type, or price tier within a geographic area. This introduces another layer of complexity when a second or third AVM is allowed to run for the same transaction. We refer to the use of more than one AVM for a single transaction as Cascading Logic. These decisions usually fall under the purview of risk management rather than being a question of how thoroughly the AVMs are tested.
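A minimal sketch of how such a cascade might be expressed follows. It assumes the models have already been ranked for the relevant area and price tier, and that a value is accepted only above a minimum confidence score. The function name, the threshold of 80, and the stand-in models are hypothetical, not a description of any lender's or vendor's actual rules.

```python
def run_cascade(address, ranked_models, min_confidence=80):
    """Try each model in preference order and return the first acceptable result.

    ranked_models: list of (name, callable) pairs, where callable(address)
    returns (value, confidence_score), or None when the model has no hit.
    """
    for name, model in ranked_models:
        result = model(address)
        if result is None:
            continue  # no hit: move on to the next model in the cascade
        value, confidence = result
        if confidence >= min_confidence:
            return {"model": name, "value": value, "confidence": confidence}
    return None  # no acceptable AVM value; another valuation method is needed

# Hypothetical stand-ins for vendor models.
def model_a(address):
    return None             # no hit

def model_b(address):
    return (325_000, 72)    # value returned, but confidence below threshold

def model_c(address):
    return (318_000, 88)    # acceptable value

choice = run_cascade("123 Main St", [("A", model_a), ("B", model_b), ("C", model_c)])
print(choice)
```

In practice the ranking and the acceptance rule would vary by geography, property type, and price tier, which is where the risk-management decisions described above come in.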
Do you think the industry will develop a set of uniform standards or methods for AVM performance testing? How is this effort progressing?
Industry best practices for AVM testing have become better defined over the last several years and will probably continue to do so in the near future. The world of AVMs continues to be both collaborative and contentious. Several advisory groups have been formed in the last four years, with the two most prominent being the Joint Industry Task Force (JITF) and the Collateral Assessment and Technology Committee (CATC).
Recently, the Mortgage Bankers Association released a draft on AVM standards and appears to be ready to form a subcommittee. In addition, venues such as the Predictive Methods Conference allow AVM users, AVM vendors, rating agencies, MI companies, regulatory agencies and investors to share information about testing and evaluating the different models.
Lenders, vendors, and advisory groups seem to have reached a consensus on a basic set of quantitative and qualitative metrics that virtually everyone agrees should be used to measure AVM performance, without going as far as adopting a single metric. For the most part, yes, there is a fairly uniform set of metrics used to evaluate the models.
Is testing across the industry pretty standard, or does every institution put its personal stamp on the process? Do some institutions do no testing at all?
Is testing a standard practice for the users of AVMs? Well, yes and no. The larger, more visible lenders all have some degree of AVM testing in place, but that falls off rapidly outside of the top 20. Are there lenders out there using AVMs to directly fund loans or to qualify values from wholesale or correspondent channels that have no testing program in place? The short and scary answer is yes. Are there common best practices for testing? Yes, but the adoption is spotty at best.
As the regulatory bodies become more involved, specifically with regulated lenders outside of the top 20, I predict increased guidance concerning the level and frequency of model testing and evaluation that should be completed. In the future, this regulatory guidance will prove to be the biggest driver for the standardization of AVM testing and use.
Is there some level of AVM testing or documentation that is mandatory or stipulated by a regulatory body?
At this point in time, the OCC’s primary guidance on model testing and validation is OCC 2000-16. In essence, this document outlines the basic steps necessary to validate model performance. For any lending institution developing an internal or outsourced testing program, this is where to start. The OCC provides additional guidance with bulletins 2004-59 and 2005-22. Each of these documents provides additional insight into the direction that the regulatory bodies are heading.
About Lee Kennedy
Lee Kennedy is the founder and managing director of AVMetrics. Kennedy founded the company in 2005 in response to a need for true independent testing and auditing of AVM models. The company’s services are designed to assist clients with the technical and subject matter support necessary for the use of Alternative Valuation Products, specifically the use of Automated Valuation Models (AVMs).
Before founding AVMetrics, Kennedy was a vice president and alternative valuation products manager at Washington Mutual.