Chronic Disease Teaching Tools - Population Data

What population data are available?

  • In general, two types of population data are useful in chronic disease and injury programs. The first is data about the size of the population, which is used to calculate incidence, prevalence and mortality rates. The second is data about the sociodemographic make-up of a community, such as education, income and housing value information, which is used to assess community risk and to determine the size of the target population for intervention programs.
  • Population size data can be counts, estimates or projections. In the United States, a 100% count of the population is performed every ten years (the decennial census). In the years when there is no decennial census, the size of the population is either estimated or projected. Projections are different from estimates in that they are based on previous patterns of change to predict what will happen in the future. Estimates, on the other hand, use the 1990 census counts and supplementary data to determine the size of a population in non-census years.
  • Estimates of the size of the population following the decennial census are called postcensal estimates. After each decennial census, population estimates for the years between the two most recent censuses are prepared to replace the postcensal estimates. These intercensal population estimates are more accurate than postcensal estimates because they take into account the census population at the beginning and end of the decade.
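The relationship between postcensal and intercensal estimates can be illustrated with a small sketch. It assumes the gap between the last postcensal estimate and the new census count is spread evenly across the decade; this linear proration is an illustrative simplification, not the Census Bureau's published method, and the function name and population figures are hypothetical.

```python
def prorate_closure_error(postcensal, census_end):
    """Adjust a postcensal series so it ends on the new census count.

    postcensal: estimates for years 0..n, where index 0 is the base
                census count and index n is the next census year.
    census_end: the actual census count for year n.
    Assumption: the end-of-period gap is spread linearly over time.
    """
    if len(postcensal) < 2:
        raise ValueError("need at least a base year and an end year")
    n = len(postcensal) - 1
    gap = postcensal[-1] - census_end
    # Year 0 gets no adjustment; year n gets the full adjustment.
    return [est - gap * t / n for t, est in enumerate(postcensal)]


# Hypothetical area, shortened to a four-year span for readability.
postcensal = [100_000, 101_000, 102_000, 103_000, 104_000]
intercensal = prorate_closure_error(postcensal, census_end=103_000)
print(intercensal)  # [100000.0, 100750.0, 101500.0, 102250.0, 103000.0]
```

The adjusted series agrees with both census counts, which is the property that makes intercensal estimates more accurate than the postcensal estimates they replace.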

What population data do DCDPAH programs use to calculate disease rates?

  • For 1990 and earlier years, programs within the Division of Chronic Disease Prevention and Adult Health use the decennial population modified count and the intercensal population estimates released by the U.S. Census Bureau. For the 1990's, Division programs primarily use the postcensal population estimates released by the U.S. Census Bureau to calculate disease rates.
  • Because the U.S. Census estimates are not always as timely as some other sources, occasionally programs use estimates produced internally by the Health Department's demographer in the Bureau of Biometrics or estimates purchased by the Department from Claritas, which is a third party vendor.

What is the modified count?

  • After the 1990 census was conducted, some modifications in the age and race distribution were made to make these data conform to the definitions used in other data systems. Age was corrected because some respondents who completed the census forms after April 1, 1990 used their current age instead of the age they were on April 1. The result was an overestimation of age. In addition, there was some confusion about how to report the age of infants younger than 12 months old.
  • Race was corrected to conform to the Office of Management and Budget (OMB) racial groupings. A large proportion of respondents, particularly those who reported themselves as being of Hispanic origin, did not report a race which could be classified into one of the OMB groupings (White, Black, American Indian, Eskimo, Aleut, Asian, or Pacific Islander). These persons were assigned to a race category.
  • Neither the Census estimates nor the Biometrics estimates correct for undercounting in the decennial census. Post-enumeration surveys were conducted to analyze the degree of undercounting, but the Census Bureau did not elect to change the official counts.

What are the population estimates based on?

  • To produce the postcensal population estimates, the Census Bureau uses the decennial U.S. census modified count of the population and supplementary data to derive the estimated population for an area through computer modeling techniques. The supplementary information, which the Census Bureau calls components of change, includes births, deaths, domestic migration, and international migration. Births and deaths are counted from the birth and death certificates in every state. Domestic migration is estimated from federal income tax and social security data. International migration is estimated using data that account for legal immigration, federal citizen emigration, Puerto Rican migration, refugees and undocumented immigration. Other state specific data, such as school enrollment information, are also used to determine the state populations. Because the Census Bureau must wait until this supplementary information is compiled, the currency of the estimates depends on the degree of detail desired. In general the total population of the U.S. is made available before the state or county population estimates. Estimates for counties by age, sex, race and ethnicity take the longest.
  • The Bureau of Biometrics postcensal population estimates are based on much of the same data as the Census Bureau estimates, but use a slightly different methodology to determine the population of each geographic area. The Census Bureau determines the total population of each state based on the total population of the nation and the components of change in the states. It then apportions the state population to each county. The Bureau of Biometrics method instead determines the population of each county through modeling and then sums the counties to determine the statewide population. Because the Census Bureau predetermines what the state population should be, its estimates tend to be lower than the Bureau of Biometrics estimates. In addition, the data available for the components of change may be different. Bureau of Biometrics estimates are a year or so more current, but they do not include any data on race or ethnicity.
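The two approaches described above can be contrasted in a short sketch. It applies the components of change to a base count for each county, sums the counties (the bottom-up approach), and then rescales the same county figures to a state total fixed in advance (the top-down approach). The county names, all figures, and the use of simple proportional scaling for apportionment are assumptions made for illustration; neither agency's actual model is reproduced here.

```python
def component_estimate(base_pop, births, deaths, net_domestic, net_international):
    """Base count plus the components of change for one area."""
    return base_pop + births - deaths + net_domestic + net_international


# Hypothetical counties and hypothetical components of change.
counties = {
    "County A": dict(base_pop=50_000, births=700, deaths=500,
                     net_domestic=-300, net_international=100),
    "County B": dict(base_pop=200_000, births=2_800, deaths=1_900,
                     net_domestic=400, net_international=700),
}

# Bottom-up: estimate each county, then sum to get the state.
county_estimates = {name: component_estimate(**c) for name, c in counties.items()}
bottom_up_state = sum(county_estimates.values())

# Top-down: the state total is fixed first (hypothetical control total),
# and county estimates are scaled proportionally to add up to it.
state_control = 250_000
scale = state_control / bottom_up_state
top_down = {name: est * scale for name, est in county_estimates.items()}

print(county_estimates)  # {'County A': 50000, 'County B': 202000}
print(bottom_up_state)   # 252000
for name, est in top_down.items():
    print(name, round(est))
```

In this made-up case the control total is below the sum of the county models, so every top-down county figure comes out lower than its bottom-up counterpart, which mirrors the pattern the text describes.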

Does it make a difference which estimates are used to calculate disease rates?

  • All data concerning the size of the population are subject to error. The farther removed the estimates are from the 100% count of the population, the larger the degree of error will be. This means that estimates for 1991 are probably more accurate than estimates for 1997. In addition, estimates concerning smaller populations will be affected by error much more than estimates of large populations. This compounds the instability of incidence and mortality rates for smaller populations, which also have fewer incident cases and deaths to base the rates on.
  • "Error of closure" is the difference between the estimated population at the end of a decade and the census count for that date. The error of closure at the national level was quite small during the 1960's (379,000 persons). The error of closure for the 1970's, however, amounted to almost 5 million persons. For the 1980's, it was 1.5 million persons.
  • Small differences between the population estimates usually would not create meaningful differences in the rates of disease at the state level, but may make a difference for less populated counties. In general, the estimates produced by the Bureau of Biometrics tend to be higher than those produced by the Census Bureau, which would cause the incidence and mortality rates to be lower.
  • Whenever possible, the same source of denominator should be used if comparisons are desired. When comparing New York to other areas of the country or world (e.g. breast cancer mortality in New York compared to the national average), rates based on the U.S. Census Bureau figures are most appropriate. When comparing different outcomes within New York (e.g. cardiovascular mortality versus injury mortality), either the Biometrics or Census based denominators can be used, so long as there is consistency.
  • If you have access to the Internet, the most readily available and comprehensive source of population and demographic information is the U.S. Census Web Site. The Census Bureau also releases data on CD-ROM and computer tapes.
  • Population data can also be purchased from proprietary sources, such as Claritas. Additional sources of population data may be available for local areas. These include estimates from county Planning Boards and utility companies.
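The effect of the denominator on a rate, discussed in the points above, can be seen in a brief worked sketch. It computes a crude rate per 100,000 from two denominators that differ by 1,000 persons, once for a small county and once for a state-sized population. All counts and populations are hypothetical, and the function name is introduced only for this illustration.

```python
def rate_per_100k(events, population):
    """Crude rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events * 100_000 / population


# Hypothetical small county: 30 deaths, two competing denominators.
county_low = rate_per_100k(30, 24_000)    # lower population estimate
county_high = rate_per_100k(30, 25_000)   # higher population estimate
print(county_low, county_high)  # 125.0 120.0

# Hypothetical state: the same 1,000-person gap barely moves the rate.
state_low = rate_per_100k(22_500, 18_000_000)
state_high = rate_per_100k(22_500, 18_001_000)
print(round(state_low - state_high, 3))
```

The higher denominator yields the lower rate in both cases, but only the county rate changes by an amount large enough to matter, which is why a comparison should use one denominator source throughout.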

What data are available about the demographic make-up of a community?

  • The U.S. Census Bureau puts a great deal of effort into collecting and distributing data about the population. In addition to the decennial census, they conduct many surveys of the population. The surveys usually are designed to get national estimates of social and economic trends, and therefore are not always valid for determining state or local level estimates. Information about these surveys and some data query applications can be found at the Census web site.
  • For local level data, the decennial census (1990 census) is the most widely used source of demographic information. These data include population characteristics (e.g. age, gender, race, ethnicity, employment, education, military service, place of birth, migration) and household characteristics (e.g. household income, rental status, age of building, year moved in, water source). The census data are released to the public in two forms: the Summary Table Files (STF) and the Public Use Microdata Sample (PUMS).

What are the Summary Table Files (STF)?

  • These files contain frequency and magnitude data in the form of precalculated tables pertaining to person and household characteristics. Frequency data means the data are in the form of counts (e.g. the number of children ages four to nine years old in the county). Magnitude data consist of summary statistics (e.g. median household income or percent of persons below the poverty level). These tables are calculated for many geographic levels: nation, state, county, standard metropolitan statistical areas (SMSAs), congressional districts, towns and villages, census tracts, and block groups. STF1 contains data compiled from the short form, which was completed by every household (100% count). STF3 is compiled from the long form, which is a representative sample of the population, completed by one out of every six households (17 percent) in areas with 2,500 or more people and every other household (50 percent) in areas with fewer than 2,500 people.
  • There are also estimates available for zip codes (STF3B). The zip code estimates were derived after the fact by combining block level data to re-create the 1990 zip code boundaries. The re-created boundaries do not necessarily match the service delivery areas for a particular zip code. At times, entire sections of zip codes were assigned to the wrong area. In addition, zip code boundaries are not static like political or census-designated boundaries, so changes in service delivery areas cannot be taken into account. These problems have a large impact on the calculation of disease or injury rates at the zip code level. When determining the sociodemographic make-up of a community, however, problems with the zip code boundaries not matching exactly have less of an effect.

What are the Public Use Microdata Samples (PUMS)?

  • The PUMS data are individual level data compiled from the census long form. Unlike the STF files, these data are not pretabulated, so the user can get estimates of characteristics of the population which are not available from the STF files. For example, a user might want to determine the number of women age 40 and older who have family incomes of 250% of the Federal Poverty Level or below. This is not available from the STF files, which do not have a table listing age by sex by poverty level.
  • For confidentiality reasons, the Census Bureau has only released a sample of the records from the long form (5% sample). In addition, the smallest geographic level available is for places with 100,000 or more persons. If a county or city does not have at least 100,000 residents, the data are combined with a neighboring area. When possible, areas with similar demographic characteristics were grouped together.
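The custom tabulation used as an example above (women age 40 and older with family incomes at or below 250% of the Federal Poverty Level) might be sketched as follows. The records, field names, and weights are hypothetical and do not follow the actual PUMS record layout; the sketch assumes each sampled person carries a weight giving the number of people he or she represents, roughly 20 for a 5% sample.

```python
# Hypothetical person records; field names are illustrative only.
records = [
    {"sex": "F", "age": 45, "pct_poverty": 180, "weight": 20},
    {"sex": "F", "age": 38, "pct_poverty": 120, "weight": 22},
    {"sex": "M", "age": 60, "pct_poverty": 90,  "weight": 19},
    {"sex": "F", "age": 67, "pct_poverty": 310, "weight": 21},
    {"sex": "F", "age": 52, "pct_poverty": 250, "weight": 18},
]


def weighted_count(rows, predicate):
    """Sum the weights of the sample records that satisfy the predicate."""
    return sum(r["weight"] for r in rows if predicate(r))


def in_target_group(r):
    """Women age 40 and older at or below 250% of the poverty level."""
    return r["sex"] == "F" and r["age"] >= 40 and r["pct_poverty"] <= 250


target_population = weighted_count(records, in_target_group)
print(target_population)  # 38
```

Because the result is built from a sample, it is an estimate of the target population rather than a count, and it becomes less stable as the group or the geographic area gets smaller.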

Where can I go for more information?