Frequently Asked Questions (FAQ) on ITS Data Quality
Special for the ICDN Newsletter
by
James Pol, Transportation Specialist
USDOT ITS Joint Program Office
Last Updated 1/31/04
- When you talk about "ITS data," what types of information are you talking about, and where does that data come from?
- What is "data quality" and why is it important?
- What application does Data Quality affect the most?
- What other applications does data quality affect?
- What is a typical institutional barrier to achieving appropriate data quality?
- What are some near-term payoffs for improving data quality at TMCs?
- Does poor public sector data quality affect only the public sector?
- How can one describe and measure Data Quality?
- What actions have been identified to address substandard Data Quality?
- What new initiatives are ongoing to address data quality, and how can I participate?
1. When you talk about "ITS data," what types of information are you talking about, and where does that data come from?
ITS data can be generated from a variety of sources to serve the needs of many applications. Many agencies seek to maximize the utility of data across multiple applications as a means to control the costs of data collection. Because the scope of the problem is so wide-ranging, FHWA and its stakeholders are focusing first on traffic-related data. The most common traffic data types are volume, vehicle classification, truck weight, occupancy, speed, travel time, and queue length. Not all of these data types are collected in the same way, and some of them, particularly speed and travel time, often require a certain amount of calculation to produce.
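As an illustration of the kind of calculation involved, travel time is often estimated from speeds reported at fixed detectors rather than measured directly. The following is a minimal sketch in Python, assuming each detector's spot speed holds along its whole segment; the function names and the sample route are hypothetical and do not come from any FHWA procedure.

```python
def segment_travel_time_minutes(length_miles, speed_mph):
    """Travel time over one segment, assuming the spot speed holds along its length."""
    if speed_mph <= 0:
        raise ValueError("speed must be positive")
    return 60.0 * length_miles / speed_mph


def route_travel_time_minutes(segments):
    """Sum segment travel times for a list of (length_miles, speed_mph) pairs."""
    return sum(segment_travel_time_minutes(length, speed) for length, speed in segments)


# Hypothetical route: three detector-covered segments (length in miles, speed in mph).
route = [(0.5, 60.0), (1.0, 30.0), (0.5, 15.0)]
total = route_travel_time_minutes(route)
print(f"Estimated route travel time: {total:.1f} minutes")
```

Because the estimate divides by speed, an error in a low reported speed changes the result far more than the same error in a high one, which is one reason travel time is sensitive to detector data quality.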
2. What is "data quality" and why is it important?
Data quality is an issue that pervades every application of ITS. It challenges the credibility of the information products that agencies produce, and it has a significant impact on the private sector market opportunities for data. It is as much an institutional issue as a technical one, as much a matter of equipment maintenance as of equipment installation, and as much a question of resource availability as of coverage demand.
3. What application does Data Quality affect the most?
Traveler information, or Advanced Traveler Information Systems (ATIS), has been the focus of attention of FHWA and its stakeholder agencies. The universal axiom of "garbage in, garbage out" quickly conveys the impact of data quality; in other words, traveler information systems based on inaccurate or obsolete data will not be helpful to anyone. Of course, the data quality issue is more complex than that. Customer service remains at the core of traveler information products. For instance, some agencies have been conservative in deploying traveler information services such as 511 because they are concerned about the quality of their available data related to roadway congestion. There is no effective market for services built upon poor data quality.
Data quality guidelines were identified as part of a workshop investigating the data gaps for ATIS. The workshop report, Closing the Data Gap: Guidelines for Quality Advanced Traveler Information System Data (2000), identifies guidelines for attaining good, better and best quality data from traffic sensors and incident/event data sources.
The issue of travel time calculation and accuracy has recently been explored. The report Travel Time Data Collection for Measurement of Advanced Traveler Information Systems Accuracy (2003) examines both the accuracy that is achieved and the extent of coverage.
4. What other applications does data quality affect?
A recent Transportation Research Circular titled The Roadway INFOstructure: What? Why? How? (2003) documented a workshop that explored the elements necessary to realize the "Infostructure." Establishing standards for data collection and quality, making the most effective use of available data, and ensuring data accuracy and reliability were identified as key issues throughout the document. That workshop affirmed the pervasive impact that data quality can have upon users.
5. What is a typical institutional barrier to achieving appropriate data quality?
A common comment that practitioners in Transportation Management Centers (TMCs) make is: "I only need enough quality to group the data into 3-5 broad 'bins' for color coding traffic maps -- providing greatly improved data to others is not feasible, since planners, researchers, and commercial reporters don't pay any of our expenses to install and maintain traffic detection equipment." Limited public agency funding often forces TMCs to conservatively manage their funds. Although TMC managers often recognize that there is broad potential for data sharing, they may remain concerned that there are too many "customers" for them to manage.
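The difference in needs can be seen with a small example. The sketch below, in Python, uses hypothetical speed thresholds and a hypothetical 5-mile segment (neither comes from any agency's practice) to show how a speed reading that is 10 percent high may leave the map color unchanged while still shifting a travel time estimate.

```python
def congestion_color(speed_mph):
    """Map a speed to a traffic-map color using hypothetical thresholds."""
    if speed_mph >= 50:
        return "green"
    if speed_mph >= 30:
        return "yellow"
    return "red"


true_speed = 40.0      # hypothetical actual speed, mph
measured_speed = 44.0  # detector reading that is 10 percent high

# The coarse color bin is the same for both values.
same_bin = congestion_color(true_speed) == congestion_color(measured_speed)

# A travel time estimate over a hypothetical 5-mile segment is not.
segment_miles = 5.0
true_minutes = 60.0 * segment_miles / true_speed
measured_minutes = 60.0 * segment_miles / measured_speed

print(f"Same map color: {same_bin}")
print(f"Travel time error: {true_minutes - measured_minutes:.2f} minutes")
```

A reading near a threshold would of course change color with a much smaller error; the point is only that data good enough for a few broad bins may still fall short for other users' applications.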
6. What are some near-term payoffs for improving data quality at TMCs?
Better data quality would help TMCs provide more accurate travel times on dynamic message signs (DMS). It would also support ramp metering, incident and work zone analyses, more credible reporting of performance measures to senior leaders, and better integration with the planning process, which would likely lead to better support of traffic management projects and funding.
7. Does poor public sector data quality affect only the public sector?
Emerging opportunities for the private sector to collect data and develop and market effective traveler information products are often dependent upon the availability of good quality public-sector data. To gain market share, private-sector ATIS vendors must produce consistently reliable and credible information. These firms often depend on public-sector data, which complements and validates their own privately produced information. Thus, poor quality public sector data can prevent these firms from gaining traction in the marketplace and adversely affect customer retention and new subscriber growth.
8. How can one describe and measure Data Quality?
A white paper published by the Federal Highway Administration, Defining and Measuring Traffic Data Quality (2002), recommended these six traffic data quality measures:
- Accuracy – The measure or degree of agreement between a data value or set of values and a source assumed to be correct. Also, a qualitative assessment of freedom from error, with a high assessment corresponding to a small error.
- Completeness (also referred to as "availability") – The degree to which data values are present in the attributes (e.g., volume and speed are attributes of traffic) that require them.
- Validity – The degree to which data values satisfy acceptance requirements of the validation criteria or fall within the respective domain of acceptable values.
- Timeliness – The degree to which data values or a set of values are provided at the time required or specified.
- Coverage – The degree to which data values in a sample accurately represent the whole of that which is to be measured.
- Accessibility (also referred to as "usability") – The relative ease with which data can be retrieved and manipulated by data consumers to meet their needs.
An FHWA-led initiative is now underway to develop guidelines for data quality measures based upon these criteria. The final report from this initiative should be available in the summer of 2004.
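Several of the six measures lend themselves to simple calculation. The following Python sketch shows one possible way to compute completeness, validity, accuracy, and timeliness for a batch of speed records. It is an illustration only, not the method of the white paper: the record layout, the 0 to 100 mph acceptance range, the 60-second delivery limit, and the use of mean absolute percent error for accuracy are all assumptions made for this example, and the functions assume a non-empty batch. Coverage and accessibility are left out because they depend on network layout and on how data are delivered rather than on individual values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeedRecord:
    speed_mph: Optional[float]  # None means the detector reported no value
    reference_mph: float        # hypothetical trusted value, e.g. from a probe vehicle
    delay_s: float              # seconds between measurement and delivery


def completeness(records: List[SpeedRecord]) -> float:
    """Share of records in which the speed value is present."""
    return sum(r.speed_mph is not None for r in records) / len(records)


def is_valid(r: SpeedRecord, low: float = 0.0, high: float = 100.0) -> bool:
    """True when a speed is present and inside the assumed acceptable range."""
    return r.speed_mph is not None and low <= r.speed_mph <= high


def validity(records: List[SpeedRecord]) -> float:
    """Share of present values that pass the range check."""
    present = [r for r in records if r.speed_mph is not None]
    return sum(is_valid(r) for r in present) / len(present)


def accuracy_mape(records: List[SpeedRecord]) -> float:
    """Mean absolute percent error of valid speeds against the reference."""
    errors = [100.0 * abs(r.speed_mph - r.reference_mph) / r.reference_mph
              for r in records if is_valid(r)]
    return sum(errors) / len(errors)


def timeliness(records: List[SpeedRecord], max_delay_s: float = 60.0) -> float:
    """Share of records delivered within the assumed time limit."""
    return sum(r.delay_s <= max_delay_s for r in records) / len(records)


# Hypothetical batch: one missing value and one out-of-range, late value.
batch = [
    SpeedRecord(55.0, 50.0, 20.0),
    SpeedRecord(45.0, 50.0, 40.0),
    SpeedRecord(None, 60.0, 30.0),
    SpeedRecord(130.0, 60.0, 90.0),
]

print(f"Completeness: {completeness(batch):.2f}")
print(f"Validity:     {validity(batch):.2f}")
print(f"Accuracy:     {accuracy_mape(batch):.1f} percent error")
print(f"Timeliness:   {timeliness(batch):.2f}")
```

Reporting the measures separately, rather than as one combined score, keeps it clear whether a problem lies in missing values, implausible values, measurement error, or late delivery.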
9. What actions have been identified to address substandard Data Quality?
In early 2003, the FHWA conducted two regional workshops with a representative group of agencies to identify the scope of data quality issues, and to learn what could be done to ensure consistent levels of data quality. The report Traffic Data Quality Workshop Proceedings and Action Plan (2003) summarizes the workshop discussions, and identifies the following recommended actions:
- Develop guidelines and standards for calculating traffic data quality measures
- Synthesize data validation procedures that are employed among the states
- Prepare a best practices document on the installation and maintenance of monitoring devices
- Establish a clearinghouse for vehicle detector information
- Conduct sensitivity analyses on how data quality affects distinct user applications
- Develop guidelines for sharing resources for traffic monitoring
- Create a life-cycle cost methodology for data collection
- Prepare guidelines for innovative contracting approaches for traffic data collection
- Conduct a case study to evaluate the return on investment realized from improved data quality
- Provide guidance on innovative and emerging uses of existing monitoring technologies to stretch the life cycle value
- Prepare guidance on quality parameters for key data elements
10. What new initiatives are ongoing to address data quality, and how can I participate?
ITS America will be hosting a national Data Quality Workshop on Feb. 23-24, 2004, in Houston, Texas. The goal of the workshop is to prioritize the actions identified in the 2003 workshops (described above) and to elicit guidance on what policies and procedures public agencies should consider incorporating into their everyday practices. While the number of workshop attendees is limited, representatives from interested agencies or private-sector data providers can inquire about participation by contacting James Pol (see contact information below).
Additionally, ITS America is forming a Special Interest Group on Data Quality under its Information Forum to continue to address the issue and to make recommendations to the ITS Community.
For more information regarding these activities, please contact James Pol of the FHWA ITS Joint Program Office (James.Pol@fhwa.dot.gov, 202-366-4374) or Ralph Gillmann of the FHWA Office of Policy (Ralph.Gillmann@fhwa.dot.gov, 202-366-5042).
