Question:
What is the best address standardization and geocoding software available?
Aaron A
2006-04-11 10:04:29 UTC
Six answers:
fop_5
2006-04-11 10:18:37 UTC
Background

The intent of this guideline is to be a technical reference for geocoding street addresses, and to provide some background on the technologies used by the Washington State Department of Health (DOH). The DOH Division of Information Resource Management (DIRM) currently provides address standardization and geocoding services. These services are available to all divisions of DOH as well as other State agencies, Local Health Jurisdictions and other health related agencies. DIRM attempts to provide the highest quality and number of street level address matches. To this end, DOH has entered into data sharing agreements with many Washington State counties to share accurate street and parcel ownership data. Combining these data with commercially available data allows DIRM to maximize the number and quality of matches. Appendix A provides definitions of technical terms used in this guideline.



Address Standardization

Address standardization takes a street address and ZIP code and attempts to correct misspellings and changes in ZIP codes. The address is parsed into standard pieces: the house number, street name, direction prefix, direction suffix, and street type. Once the address has been parsed, the values are standardized (e.g. AV becomes AVE, LP becomes LOOP). Currently we use the Centrus software from Group 1 Software, Inc. (http://www.centrus.com). The data used by Centrus are proprietary and come from both the USPS and Geographic Data Technologies. The data are updated quarterly, and the standardized addresses are CASS certified by the U.S. Postal Service (USPS) for bulk mailing rates. The Centrus software compares addresses to a USPS national database. This step is critical for increasing geocoding match rates. (See Appendix B for examples.)
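To make the parsing step concrete, here is a minimal Python sketch of splitting an address into the pieces named above and standardizing the street type. It is not how Centrus works internally; the lookup tables and the function name `standardize` are assumptions made up for illustration, and a real tool relies on USPS reference data rather than a short dictionary.

```python
import re

# Hypothetical lookup tables for illustration only.
STREET_TYPES = {"AV": "AVE", "AVE": "AVE", "AVENUE": "AVE",
                "LP": "LOOP", "LOOP": "LOOP",
                "ST": "ST", "STREET": "ST", "RD": "RD", "ROAD": "RD"}
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}

def standardize(address):
    """Split an address into house number, prefix, name, type and suffix."""
    tokens = re.sub(r"[.,]", "", address.upper()).split()
    parts = {"number": "", "prefix": "", "name": "", "type": "", "suffix": ""}
    if tokens and tokens[0].isdigit():
        parts["number"] = tokens.pop(0)
    # Keep at least one token for the street name before peeling off more pieces.
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        parts["prefix"] = tokens.pop(0)
    if len(tokens) > 1 and tokens[-1] in DIRECTIONS:
        parts["suffix"] = tokens.pop()
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        parts["type"] = STREET_TYPES[tokens.pop()]
    parts["name"] = " ".join(tokens)
    return parts
```

For example, `standardize("123 n. Main Av")` yields the number 123, prefix N, name MAIN and type AVE. Spelling correction and ZIP code correction are deliberately left out of this sketch.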



Address Matching

Address matching is the process of matching the street address and ZIP code in the original dataset to another address and ZIP code. Typically, the second address and ZIP code represent street centerlines or ownership parcels. The street centerlines can have address ranges and ZIP codes assigned to each side of the street. The ownership parcels have a single address and ZIP code assigned to a point.
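A rough Python sketch of matching against street centerlines with an address range and ZIP code on each side of the street follows. The `Segment` record layout and the odd/even parity rule are assumptions for illustration; actual centerline files and matching engines differ.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    # Hypothetical street centerline record with an address range per side.
    name: str
    left_from: int
    left_to: int
    left_zip: str
    right_from: int
    right_to: int
    right_zip: str

def match_address(number, name, zip_code, segments):
    """Return (segment, side) for the first segment whose range holds the address."""
    for seg in segments:
        if seg.name != name:
            continue
        sides = (("left", seg.left_from, seg.left_to, seg.left_zip),
                 ("right", seg.right_from, seg.right_to, seg.right_zip))
        for side, lo, hi, side_zip in sides:
            low, high = min(lo, hi), max(lo, hi)
            # Assumption: odd and even house numbers sit on opposite sides,
            # so the number must share parity with the start of the range.
            if side_zip == zip_code and low <= number <= high and number % 2 == lo % 2:
                return seg, side
    return None
```

A parcel match is simpler, since each parcel carries a single address and ZIP code that either equals the input or does not.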



Geocoding

There are three main types of geocoding functions. The first type assigns latitude and longitude to a street address that has been matched to a street centerline or ownership parcel. This is the only type of geocoding covered in this guideline. These addresses can then be displayed as points on a map, or aggregated to larger areas (e.g. city limits, wellhead protection areas, school districts). For example, this type of geocoding can be used to show points on a map for all the addresses in the Washington State Cancer Registry. CAUTION: In general, the latitude and longitude at which a health event occurred are confidential information. Just as publishing someone’s address is most often a violation of confidentiality, data users need to be sensitive to the scale at which they display dots representing health events on maps. Before disseminating such maps they need to be sure that this method of visualizing data does not violate confidentiality.



The second type of geocoding is used for data without a street address. If the data in the original dataset has a geographic reference (e.g. ZIP code, county, U.S. Census tract) it can be geocoded to those geographic features. The data can be displayed as counts in graduated colors on a map. For example, survey results that contain only ZIP codes can be shown on a map, by the number of results in each ZIP code.
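As a small illustration of this second type, the Python sketch below counts records per ZIP code; the counts could then be joined to ZIP code polygons and drawn in graduated colors. The field name `zip` is an assumption about the input layout.

```python
from collections import Counter

def counts_by_zip(records):
    """Count records per ZIP code, skipping records with no ZIP code."""
    return Counter(r["zip"] for r in records if r.get("zip"))
```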



The third type of geocoding is used for data without a street address or a specific geographic reference. This requires a common link between the data in a given data set and an existing geographic feature. For example, a data set that contains a hospital name and bed capacity can be shown at the hospital locations on a map. This is accomplished by linking the hospital name with previously geocoded hospitals that also contain the name.
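The linking step described here amounts to a join on a shared key. A minimal Python sketch, assuming both tables carry a `name` field and the geocoded table carries `lat` and `lon` (all field names are hypothetical):

```python
def link_by_name(rows, geocoded):
    """Attach coordinates to rows by matching a normalized facility name."""
    def key(name):
        # Normalize case and whitespace so trivial differences do not block a link.
        return " ".join(name.upper().split())

    lookup = {key(g["name"]): (g["lat"], g["lon"]) for g in geocoded}
    linked = []
    for row in rows:
        coords = lookup.get(key(row["name"]))
        linked.append({**row,
                       "lat": coords[0] if coords else None,
                       "lon": coords[1] if coords else None})
    return linked
```

Rows whose name has no counterpart keep `None` coordinates so they can be reviewed by hand.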

The Importance of Geocoding

The majority of data that DOH uses has an address or other geographic reference. It has also been estimated that over 90% of corporate America’s data has some kind of geographic reference. Geocoding allows DOH to use these data to display health-related information on maps and to conduct geospatial analyses to determine whether there are geographic patterns in rates of health-related events. For example:

determining rates of health-related events by county may require geocoding;

investigating disease outbreaks and potential clusters requires accurately geocoded locations; and

geocoding to larger areas, like ZIP codes, allows sensitive data to be displayed while maintaining confidentiality.

A partial listing of geocoded data used at the DOH includes vital records, cancer registry data, daycare facilities, cases of sexually transmitted disease, tobacco retailers, schools, hospitals, pharmacies, hazardous waste sites, and drug labs.



Geocoding Software

DIRM staff evaluated five software packages for accuracy and overall match rates: ArcView, Centrus, MapMarker, GeoVista and Maptitude. After detailed benchmarking in 2000, it was still unclear which software performed best. While some software, such as Centrus, provided address standardization that improved the match rates, the quality of the underlying data seemed to make the most difference. DIRM decided to use the combination of Centrus and ArcView GIS.



Street Centerline Data

DIRM staff evaluated a variety of street centerline data: U.S. Census TIGER (1992 through 2000), ESRI Streetmap, GDT Dynamap 2000 and Navigation Technologies. These data sets were provided by the U.S. Census or purchased from commercial vendors. While no data set was complete, Navigation Technologies was the most accurate and complete for the entire state. Local level street data were also evaluated. The overall accuracy of street data obtained from counties and cities is higher than that of the statewide data sets. Appendix C shows the counties in Washington for which we have digital data available for streets or parcel centroids. Since no data set was complete, DIRM recommends using more than one source.



Accuracy

The process of address matching and geocoding involves many variables that affect the accuracy of the results. Below is a partial list of some of the potential inaccuracies.



•The input address or ZIP code is incorrect.

•The address standardization software incorrectly parses the address or ZIP code.

•The street centerline attribute data may be incorrect for the address range, street name or ZIP code.

•A street may be “flipped” so the address is placed on the wrong side or at the opposite end of the street. This can place a geocode in the adjacent U.S. Census tract or even county.

•The various street and parcel data files do not exactly overlay with U.S. Census tracts. The boundaries of the tracts are based on TIGER streets. Latitude and longitude may be more positionally accurate than the TIGER data, resulting in incorrect tract assignments.



Each successful geocode generates a match score (called “Av_score”) that reflects the accuracy of the match. Match scores for street or parcel matches range from 70 to 100. A score of 100 indicates that, after the geocoding software parsed the address, a street or parcel was found where everything matched. A score of 0 indicates a centroid match or an unmatched address. Appendix D contains some examples of address matches and the assigned scores.



CAUTION: Rates that are based on geocoded data can change significantly over time. For example, rates based on data geocoded in 2000 could differ from rates if the same data were geocoded in 2003. As data and technology improve, both the number and accuracy of matched records is expected to increase, and this might affect rate calculations. Thus, it is important to assess the proportion of geocoded records and the accuracy of the matches when interpreting rates or other statistics based on geocoded data. (See Using Geocoded Data.)



Using Geocoded Data

In order to use the geocoded data, especially at relatively small geographies such as the sub-county level, there must be a way to evaluate the accuracy of each geocode. At a minimum, the “Av_score” field should always accompany the output data. For example, when there are no street centerline or parcel matches for the street addresses in the Washington State Cancer Registry, DIRM uses the 5-digit ZIP code or city name to assign addresses to the centroid of a ZIP code, city, or populated place. This process maximizes the number of records that can be assigned to a county and is useful for county level rates and reports. The user can use the “Av_score” to know which records were geocoded using centroids and which were matched at the street level. These centroid geocodes may not be appropriate for small area analysis like cluster investigations or census tract level analysis. (See Processing Unmatched Addresses.)
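A short Python sketch of using the score this way: it separates street-level matches from centroid or unmatched records and reports the street-level share. The threshold of 70 follows the score range stated in the guideline; the record layout (a dictionary with an `Av_score` key) and the function names are assumptions for illustration.

```python
def split_by_match_level(records, min_street_score=70):
    """Separate street-level matches from centroid or unmatched records."""
    street, other = [], []
    for rec in records:
        if rec["Av_score"] >= min_street_score:
            street.append(rec)
        else:
            other.append(rec)
    return street, other

def street_match_rate(records, min_street_score=70):
    """Proportion of records matched at street or parcel level."""
    if not records:
        return 0.0
    street, _ = split_by_match_level(records, min_street_score)
    return len(street) / len(records)
```

Only the first list would normally feed a tract-level or cluster analysis, while both lists can still contribute to county-level counts.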



Current Geocoding Process at the DOH



DIRM uses the Centrus software to perform address standardization and ArcView software to perform the geocoding and the assignment of spatial attributes. This process is automated using the Avenue scripting language inside ArcView, which allows the use of multiple street and parcel datasets. The accuracy and source of the geocodes are also tracked. See Figure 1 for an overview of this process.



Address Standardization

1.Address data are provided to DIRM in a digital format (e.g. Access, ASCII, dBase).

2.The addresses are standardized using the Centrus software to fix misspellings and ZIP code errors. (See Appendix B.) Centrus also attempts to geocode the addresses; these geocodes are used as approximate matches in Step 8 below.

Address Matching

3.Inside ArcView, the tolerances are set to accept only close matches.

4.The original addresses are matched to street centerlines using the following data sets. Once a match is made the address is not used for the next data set.

•Local Government streets or parcel databases. See Appendix C.

•NAVTEQ GPS Streets, Navigation Technologies

•Streetmap 1000, Environmental Systems Research Institute (ESRI)

•TIGER 2000, U.S. Census Bureau

•TIGER 1998, U.S. Census Bureau

•TIGER 1998, U.S. Census Bureau (up to 10 additional address ranges per street segment)

•TIGER 1995, U.S. Census Bureau

•TIGER 1992, U.S. Census Bureau

5.For records that are not matched in Step 4, Step 4 is repeated using the standardized addresses. Tolerances continue to be set to close. This is done after Step 4, because we first want to use the original address exactly as it was entered.

6.Inside ArcView, the matching tolerances are set to accept “approximate” matches only.

7.Steps 4 and 5 are repeated for records not matched in Steps 4 and 5.

8.If Centrus geocoded any addresses that ArcView did not, they are included as approximate matches.
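Steps 4 through 7 above form a cascade: each record is tried against the data sets in priority order, first with the original address and then the standardized one, first with close and then approximate tolerances, and it stops at its first match. The DOH process is written in Avenue inside ArcView; the Python sketch below only illustrates the control flow. The `matcher` callback, the record fields (`id`, `original`, `standardized`) and the tolerance labels are all assumptions.

```python
def cascade_match(records, datasets, matcher):
    """Try each dataset in priority order; a record stops at its first match.

    datasets is a list of (source_name, source) pairs, highest priority first.
    matcher(address, source, tolerance) returns a hit or None (hypothetical).
    """
    passes = (("original", "close"), ("standardized", "close"),
              ("original", "approximate"), ("standardized", "approximate"))
    matched, unmatched = {}, list(records)
    for field, tolerance in passes:
        for source_name, source in datasets:
            still_unmatched = []
            for rec in unmatched:
                hit = matcher(rec[field], source, tolerance)
                if hit is not None:
                    # Track the source and tolerance so accuracy can be assessed later.
                    matched[rec["id"]] = {"hit": hit, "source": source_name,
                                          "field": field, "tolerance": tolerance}
                else:
                    still_unmatched.append(rec)
            unmatched = still_unmatched
    return matched, unmatched
```

Step 8, adding any Centrus geocodes for records still unmatched, would be one more pass over the returned `unmatched` list.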

Geocoding

9.Inside ArcView, the latitude and longitude are calculated for each matched address. This estimates the coordinates by averaging along a street segment and applying a 30-foot offset from the centerline or using the centroid of a parcel.
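One common way to estimate a position along a segment is linear interpolation within the address range followed by a perpendicular offset. The Python sketch below shows that idea; it is a reading of the step above, not ArcView's actual algorithm. It assumes a straight segment and planar coordinates measured in feet, so converting to latitude and longitude would be a separate step.

```python
import math

def interpolate(number, range_from, range_to, start, end, side, offset=30.0):
    """Place a house number along a straight segment, offset from the centerline.

    start and end are (x, y) points in a planar system measured in feet
    (an assumption); side is "left" or "right" of the start-to-end direction.
    """
    span = range_to - range_from
    t = 0.5 if span == 0 else (number - range_from) / span
    dx, dy = end[0] - start[0], end[1] - start[1]
    x, y = start[0] + t * dx, start[1] + t * dy
    length = math.hypot(dx, dy)
    if length == 0:
        return x, y
    # The unit normal pointing left of the direction of travel is (-dy, dx) / length.
    sign = 1.0 if side == "left" else -1.0
    return x + sign * offset * (-dy / length), y + sign * offset * (dx / length)
```

A “flipped” street, as mentioned under Accuracy, corresponds to `start` and `end` being swapped, which moves the point to the opposite end and the opposite side of the segment.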

Assigning Attributes

10.Each matched address is assigned U.S. Census attributes and other geographic values. This is accomplished by comparing the latitude and longitude to other GIS spatial layers, using a point-in-polygon operation.

11.Two output files in dBase format are created containing the matched addresses (with additional attributes) and the unmatched addresses. See Appendix E for the file structures.
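The point-in-polygon operation in step 10 can be illustrated with the standard ray-casting test. This Python sketch is not ArcView's implementation; the layer layout (a list of attribute and polygon pairs) is an assumption, and points lying exactly on a boundary are not handled specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_attribute(x, y, layer):
    """Return the attribute of the first polygon containing the point, else None."""
    for attribute, polygon in layer:
        if point_in_polygon(x, y, polygon):
            return attribute
    return None
```

Running `assign_attribute` once per layer (census tracts, counties, school districts and so on) gives each geocoded address its set of geographic values.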

Processing Unmatched Addresses (not automated)

Depending on the data type, intended use, and the number of unmatched records there are other options for geocoding.



If there are only a few unmatched records, interactive matching can be completed using GIS software like ArcView. If the user does not have GIS software, the following link provides for simple geocoding through a Web browser interface: http://ww4.doh.wa.gov/scripts/esrimap.dll?Name=geoview&Cmd=Map. The user will need to edit the output files by hand to add the appropriate attributes.



If there are a large number of unmatched records, the ZIP code or city name can be used instead of the street address. If a match is found, the center (or centroid) of a ZIP code or city is used to calculate the latitude/longitude. Using this approximate location, U.S. Census and other geographic values are assigned. These types of matches can be used at the level at which the match occurs or at larger aggregations, but will not be accurate for other purposes. Centroid matches are not included in the DOH’s standard process, but are used with selected data sets, such as the Washington State Cancer Registry.
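The centroid fallback can be sketched in Python as a pair of lookups, ZIP code first and city name second. The centroid tables, the field names and the `match_level` label are assumptions for illustration; the `Av_score` of 0 follows the guideline's statement that centroid matches and unmatched addresses score 0.

```python
def centroid_fallback(record, zip_centroids, city_centroids):
    """Assign a ZIP or city centroid to a record with no street-level match."""
    zip5 = (record.get("zip") or "")[:5]
    if zip5 in zip_centroids:
        lat, lon = zip_centroids[zip5]
        return {**record, "lat": lat, "lon": lon,
                "match_level": "zip_centroid", "Av_score": 0}
    city = (record.get("city") or "").strip().upper()
    if city in city_centroids:
        lat, lon = city_centroids[city]
        return {**record, "lat": lat, "lon": lon,
                "match_level": "city_centroid", "Av_score": 0}
    return {**record, "lat": None, "lon": None,
            "match_level": "unmatched", "Av_score": 0}
```

Keeping the match level on each record lets a later user restrict these rows to county-level or larger aggregations.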



http://www.esri.com/partners/alliances/group1/
anonymous
2016-12-30 12:17:57 UTC
Address Standardization Software
?
2017-01-22 05:11:27 UTC
1
Patricia
2016-03-14 02:00:53 UTC
For a long time, I've debated skeptics as well as non-skeptics on what I think astrology is. I don't think it can be compared to science as defined by the strict rules used in scientific research. I'm not saying this in a derogatory way; instead, I'm trying to clarify that the science world has made the definition of what defines a science, not I. I think astrology can only be described as astrology, and not pigeon-holed into some category of science, religion, or mysticism, where it finally finds a home agreed on by everyone. Astrology is astrology. I'm saying all this because I've attended astrology meetings at the Rosicrucian Society and the American Federation of Astrologers. At these gatherings, I learned first hand just how much disagreement there is between what astrologers and enthusiasts think is accurate. I think the closest that astrology can come to standardization is when someone experienced writes a book, and it becomes the respected book of most persons into astrology. I belong to a group involved with new types of medical research and treatments for endocrine cancers. When I have attended meetings regarding this area of science, the bickering and disagreements are about as livid as the ones at the astrology gatherings. Even in science, there are areas where no standardization can be reached. There are some modern and more accurate tests for types of endocrine cancers, yet these tests are still not considered the "Gold Standard" of tests, though some experts in the field feel that fact should be obvious. Other researchers and doctors say that using a "Gold Standard" of testing, and ignoring the older tests, might possibly miss unknown types of endocrine tumors, and that would become a great problem. Again, I'm not comparing astrology and science. I'm saying that there are certain subjects which can't be successfully standardized to suit everyone. Astrology is a hobby for me, but I've studied it for a long time.
One thing that annoys me about most astrology books is the lack of a reminder that an astrology chart is not describing only the person's past and present. I've done charts for people, and been told that a certain situation I'm describing is not something they have done, or what they have been. I remind them that the chart describes a lifetime. So if you are asking for a standardization in the hopes that more people will find it easier to study, rather than relying on stereotypes, that is not likely either. Most people want information now, now, NOW, and don't really want to read the details.
stayer
2016-11-14 07:29:58 UTC
Geocoding Software
myrtguy
2006-04-12 04:25:54 UTC
That is some answer ... I'd just go with Arc View.


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.