EARN 201: Using microdata, what makes a great “State of Working X” report, and other data tools

Janelle Jones

Economic Analyst
Economic Policy Institute / Economic Analysis and Research Network
Friday, December 16, 2016

Data Sources

ACS – American Community Survey (sub-state income and poverty, more robust state-level data)

CPS – Current Population Survey (historical state- and national-level survey back to 1979)

  • March CPS – Also known as the Annual Social and Economic Supplement (state and national poverty, annual income, health insurance, unionization)
  • CPS ORG – Outgoing Rotation Group
  • Basic Monthly – Source of labor force stats at state level

BLS – Bureau of Labor Statistics (state and national jobs, unemployment)

SWXX – State of Working X (pre-created tables utilizing all of the above sources)

EARN 201 Data Sources


ACS March CPS CPS ORG SWX Spreadsheet
National X (CPS recommended) X X
State X X X
Sub-state X
Annual Income
National X X X
State X X X
Sub-state X
Hourly Wages
National X X
State X X
Health Insurance
National X X
State X X
National X X
State X X
National X X X X
State X X X X
Sub-state X
Unemployment Rate
National X X
State X X
Sub-state X


Current Population Survey

The Current Population Survey (CPS) is the data source most commonly used by EARN groups, and most of the SWXX data comes from it. The CPS is a monthly survey of about 60,000 households conducted by the Census Bureau for the BLS.

The CPS is the primary source of information on the labor force characteristics of the US. Respondents are interviewed to obtain information about the employment status of each member of the household 15 years of age and older. However, published data focus on those ages 16 and over.

Estimates obtained from the CPS include employment, unemployment, earnings, hours of work, and other indicators. They are available by a variety of demographic characteristics including age, sex, race, marital status, and educational attainment. They are also available by occupation, industry, union status, and class of worker.

Accessing EPI CPS Extracts



EARN Data Files



Years available: 2000-present

Search for variables by: household, person, alphabetically, search bar

Options for each variable: description, codes, comparability, universe, availability, questionnaire, text flags

The IPUMS FAQ page is a worthwhile resource.

Other Sources for Microdata Access

CPS website: directly download raw microdata: http://www.census.gov/programs-surveys/cps/data-detail.html

DataFerrett: Create basic tables, or extract the variables you need for a microdata run: http://dataferrett.census.gov/

CPS Table Creator: Similar to DataFerrett, but much easier to navigate: http://www.census.gov/cps/data/cpstablecreator.html

NBER Website: Complete CPS extracts back to mid-1970s: http://www.nber.org/cps/

State of Working X Data

The State of Working X project is an annual data dump that EARN staff provide to EARN groups as the basis for writing their State of Working X reports. These super-charged Excel workbooks contain state-level runs of March CPS, CPS ORG, CPS Basic Monthly, ACS, CES, and other Census and BLS data.

What’s available?

National and State Level

  • SWX-Wages: Wages by decile, race, sex, age, education, union status from 1979-2015
  • SWX-Jobs: Monthly jobs by industry from 1990-June 2016
  • SWX-Labor Force: Labor force statistics (including unemployment rates, labor force participation, etc.) by age, gender, race, and education
  • SWX-Misc: Annual income, poverty, health insurance rates, pension coverage, unemployment insurance recipiency and exhaustion, union coverage, and state GDP from the March CPS and ACS

BLS Website

State level

National level

Census Website

National and State level

Each year, usually in the fall around “Poverty Day”, the Census Bureau releases the March CPS and annual ACS data on income, poverty, and health insurance. The March CPS provides national and state-level breakdowns of all three. The ACS allows for state and sub-state comparisons of income and poverty.

Example .do files

Thank you!

Economic Policy Institute: epi.org

Get this presentation at: go.epi.org/______

Appendix Slide: What years should I compare?

The short answer is that it depends on what you are trying to prove. What makes sense in one paper or state may not make sense in another. You should use your knowledge of your state’s economy to help guide your choice of years.

However, a general rule of thumb is to compare time at consistent points in the business cycle. Economic indicators fluctuate considerably with short-term swings in the business cycle. For example, incomes tend to fall in recessions and rise during expansions. Therefore, economists usually compare business cycle peaks with other peaks and compare troughs with other troughs so as not to mix apples and oranges. In SWA, we examine changes between business cycle peaks. At the national level, those years are 1979, 1989, 2000, 2007, and 2015.

In some cases, it is desirable to separately present trends for the 1995-2000 period in order to highlight the differences between those years and those of the early 1990s and earlier business cycles. This departs from the convention of presenting only business cycle comparisons or comparisons of recoveries. We depart from convention because there was a marked shift in a wide variety of trends after 1995.

States have slightly different business cycles from the US as a whole, and these state-specific peaks and troughs should be taken into consideration when making comparisons.
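As a rough illustration of the peak-to-peak convention described on this slide, the sketch below computes the change in an indicator between consecutive national peak years. The wage figures are made up for the example, and the annualized rate is an added convenience (the periods differ in length), not something the slide prescribes. For a state analysis, the list of peak years would be replaced with that state's own peaks.

```python
# National business cycle peak years as listed on the slide.
PEAK_YEARS = [1979, 1989, 2000, 2007, 2015]


def peak_to_peak_changes(series, peaks=PEAK_YEARS):
    """Return (start, end, total_change, annualized_change) for consecutive peaks.

    `series` maps a year to an inflation-adjusted value for that year.
    """
    results = []
    for start, end in zip(peaks, peaks[1:]):
        ratio = series[end] / series[start]
        total = ratio - 1
        # Annualized rate, so periods of different length can be compared.
        annualized = ratio ** (1 / (end - start)) - 1
        results.append((start, end, total, annualized))
    return results


# Hypothetical real median hourly wages, for illustration only.
median_wage = {1979: 16.00, 1989: 16.40, 2000: 17.60, 2007: 17.80, 2015: 18.00}

changes = peak_to_peak_changes(median_wage)
for start, end, total, annualized in changes:
    print(f"{start}-{end}: {total:+.1%} total, {annualized:+.2%} per year")
```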

Appendix Slide

Statistics available in SWXX

Each of the seven statistics available in SWXX-Labor Force is a proportion, so it is important to understand exactly what the numerator and denominator represent.


Labor force participation rate = Civilian labor force / Civilian non-institutional population, ages 16+
Employment / population ratio = Employed / Civilian non-institutional population, ages 16+
Unemployment rate = Unemployed / Civilian labor force
Long-term unemployment share = Long-term unemployed / Unemployed
Underemployment rate = ( Unemployed + Marginally attached workers + Part-time for economic reasons )  /  ( Civilian labor force + Marginally attached workers )
Part-time workers share = Part-time workers / Employed
Part-time for economic reasons share = Part-time for economic reasons / Part-time workers
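To make the numerators and denominators concrete, here is a minimal sketch that computes all seven ratios from raw counts. The class name, field names, and the counts themselves are hypothetical. The only thing assumed beyond the definitions above is that the civilian labor force equals employed plus unemployed.

```python
from dataclasses import dataclass


@dataclass
class LaborForceCounts:
    """Counts of people ages 16+ (e.g. in thousands). Names are illustrative."""
    population: float            # civilian non-institutional population
    employed: float
    unemployed: float
    long_term_unemployed: float  # subset of unemployed
    marginally_attached: float   # outside the labor force
    part_time: float             # subset of employed
    part_time_economic: float    # subset of part_time


def labor_force_stats(c):
    # Assumption: civilian labor force = employed + unemployed.
    labor_force = c.employed + c.unemployed
    underemployed = c.unemployed + c.marginally_attached + c.part_time_economic
    return {
        "labor_force_participation_rate": labor_force / c.population,
        "employment_population_ratio": c.employed / c.population,
        "unemployment_rate": c.unemployed / labor_force,
        "long_term_unemployment_share": c.long_term_unemployed / c.unemployed,
        "underemployment_rate": underemployed / (labor_force + c.marginally_attached),
        "part_time_share": c.part_time / c.employed,
        "part_time_economic_share": c.part_time_economic / c.part_time,
    }


# Hypothetical counts, for illustration only.
example = LaborForceCounts(
    population=1000, employed=600, unemployed=40, long_term_unemployed=10,
    marginally_attached=10, part_time=120, part_time_economic=30,
)
stats = labor_force_stats(example)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

Note that the denominators differ from one statistic to the next, which is why, for example, the two part-time measures cannot be added together.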

Appendix Slide

LF stats diagram

Appendix Slide: Real vs. Nominal Values

When comparing two dollar values from different years it is often desirable to adjust for inflation so that a meaningful comparison can be made. Nominal values are simply the dollar value of something without being adjusted for inflation. Real values adjust nominal values to account for differences in the price level.

Real Value (in Year x dollars) = Nominal Value (Year y) * [ CPI (Year x) / CPI (Year y) ]
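The adjustment can be sketched in a few lines. The function name and the index values below are hypothetical and chosen only to keep the arithmetic easy to follow. In practice the index values would come from a published price series.

```python
def to_real(nominal_value, cpi_target_year, cpi_nominal_year):
    """Express a nominal dollar value in the dollars of the target year."""
    return nominal_value * (cpi_target_year / cpi_nominal_year)


# Hypothetical price index values, for illustration only.
cpi = {2000: 100.0, 2015: 140.0}

wage_2000_nominal = 15.00
wage_2000_in_2015_dollars = to_real(wage_2000_nominal, cpi[2015], cpi[2000])

print(f"$15.00 in 2000 is about ${wage_2000_in_2015_dollars:.2f} in 2015 dollars")
```

With both years expressed in 2015 dollars, a 2015 wage can then be compared directly against the adjusted 2000 wage.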

Appendix Slide: What’s the difference between CPI-U and CPI-U-RS?

The Consumer Price Index for All Urban Consumers (CPI-U) is the most commonly used price index to adjust dollar values for inflation. However, some analysts hold that the CPI-U overstated inflation in the late 1970s and early 1980s by measuring housing costs inappropriately. The methodology for the CPI-U from 1983 onward was revised to address these objections. Other changes were introduced into the CPI in the mid-1990s but not incorporated into the historical series.

To allow a historically consistent series, the CPI-U-RS (Research Series) was created. This index uses the new methodology for housing inflation over the entire 1967-2001 period and incorporates the 1990s changes into the historical series. The CPI-U-RS is now used by the Census Bureau (and EPI) in their presentations of real income data.

Appendix Slide: How is statistical significance calculated?

The statistics in the data sources commonly used by EARN groups come from surveys of American households. Because they are drawn from a sample, there is always a chance that the statistics you are observing are not representative of what is actually going on in the population. “Statistical significance” is a way of trying to get a handle on how big that chance is. Often we care about whether a change in a statistic reflects a real change in the population, or whether the two values are close enough that the difference could be due to sampling alone.

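The formula for calculating a Z statistic, in the standard form for comparing two independent estimates A and B with standard errors SE(A) and SE(B):

Z = ( A - B ) / sqrt( SE(A)^2 + SE(B)^2 )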
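A minimal sketch of the test follows. It assumes the standard form for two independent estimates, Z = (A - B) / sqrt(SE(A)^2 + SE(B)^2), and that standard errors are recovered from ACS margins of error, which are published at the 90 percent confidence level (SE = MOE / 1.645). The function names and the poverty-rate figures are hypothetical.

```python
import math

# ACS margins of error are published at the 90 percent confidence level.
ACS_MOE_FACTOR = 1.645


def se_from_moe(moe, factor=ACS_MOE_FACTOR):
    """Convert a published margin of error into a standard error."""
    return moe / factor


def z_statistic(est_a, se_a, est_b, se_b):
    """Z statistic for the difference between two independent estimates."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def is_significant(z, critical_value=1.645):
    """True if |Z| exceeds the critical value (1.645 for 90 percent)."""
    return abs(z) > critical_value


# Hypothetical poverty rates (percent) and margins of error for two years.
z = z_statistic(15.2, se_from_moe(0.8), 13.9, se_from_moe(0.7))
print(f"Z = {z:.2f}, significant at 90 percent: {is_significant(z)}")
```

The independence assumption matters: comparisons between overlapping samples call for a different standard error calculation.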

If the absolute value of Z is >1.645 then the difference is significant at the 90 percent confidence level. You can find this and other ACS statistical testing formulas at