Blog | IMPLAN

The Many Tributaries of Big Data: Where Does U.S. Economic Data Come From?

Written by Tim French | April 9, 2019

Economic data flows and collects from sources both varied and unique. But which sources are significant and why? And how complex does this world of big data get when it comes to trying to explore the economic landscape? Cue Ms. Frizzle’s “Seatbelts, everyone!” line and let’s take a tour of economic data sources on the Magic School Bus!

Getting Started

When it comes to sectoring schemes, it’s practically a requirement to start with the North American Industry Classification System (NAICS). For background, NAICS standardizes industry categories so federal statistical agencies can classify business establishments when collecting, analyzing, and publishing statistical data about the U.S. business economy. The classification is hierarchical: a two-digit code identifies one of 20 broad sectors, and each added digit narrows the category, down to a six-digit code that identifies a national industry specific to the U.S., Canada, or Mexico (roughly a thousand of them in the U.S. version).
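To make that composite code concrete, here is a minimal Python sketch that peels a NAICS code apart into its parent levels. The level names follow the standard NAICS hierarchy; the function name naics_hierarchy and the sample code 311811 (retail bakeries) are only illustrations, not part of any official tool.

```python
# Level names follow the standard NAICS hierarchy, keyed by number of digits.
NAICS_LEVELS = {
    2: "sector",
    3: "subsector",
    4: "industry group",
    5: "NAICS industry",
    6: "national industry",
}


def naics_hierarchy(code: str) -> dict[str, str]:
    """Split a NAICS code into its parent codes, broadest level first.

    naics_hierarchy is a hypothetical helper name used for illustration.
    """
    if not code.isdigit() or not 2 <= len(code) <= 6:
        raise ValueError(f"expected a 2- to 6-digit NAICS code, got {code!r}")
    # Each level is simply a longer prefix of the same code.
    return {NAICS_LEVELS[n]: code[:n] for n in range(2, len(code) + 1)}


# Walk from the broad sector down to the six-digit national industry.
for level, prefix in naics_hierarchy("311811").items():
    print(f"{level:>17}: {prefix}")
```

One wrinkle the sketch ignores: a few sectors span a range of two-digit codes (manufacturing is 31-33), so a prefix of 31 really means "part of sector 31-33."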

For historical context on the importance and purpose of NAICS, allow me to defer to the U.S. Census Bureau:

“NAICS was developed under the auspices of the Office of Management and Budget (OMB), and adopted in 1997 to replace the Standard Industrial Classification (SIC) system. It was developed jointly by the U.S. Economic Classification Policy Committee (ECPC), Statistics Canada, and Mexico's Instituto Nacional de Estadistica y Geografia, to allow for a high level of comparability in business statistics among the North American countries.”

This effort was a huge win for economists at the time that NAICS was introduced. The system only appreciates in value as the world economy and its industries become more diverse on their trend toward globalization—helping more detailed economic policy studies take wing.

Awesome, we can now accurately classify a business with NAICS as our encyclopedia of “What industry is this business in?” But if our next question is “What data exists about this industry and what are the behaviors of the industry?” then things get very complicated.

Major Economic Data Sources

The Bureau of Labor Statistics (BLS)

Let’s start our tour chronologically, taking a look at the oldest established sources. First up is the Bureau of Labor Statistics (BLS).

Congress established the Bureau of Labor, the forerunner of today’s BLS, within the Department of the Interior through the Bureau of Labor Act (23 Stat. 60) on 27 June 1884 to collect information about employment and labor, and the BLS does an impressive job of it. There are a lot of key historical milestones in the 130+ years that the BLS has existed but, for our purposes, let’s highlight three events that showcase a few key datasets.

  • 1915: The first monthly surveys of employment and payrolls began, laying the foundation for today’s Current Employment Statistics (CES) program. This program produces detailed industry estimates of nonfarm employment, hours, and earnings of workers on payrolls for all 50 States, the District of Columbia, Puerto Rico, the Virgin Islands, and about 450 metropolitan areas and divisions.
  • 1983: The BLS assumed full responsibility for federal-state cooperative programs on labor market information. This helped standardize a handful of reports, including the CES mentioned above, Local Area Unemployment Statistics and, soon after in 1984, Employment and Wages (Quarterly Census of Employment and Wages).
  • 1996: The BLS expanded its datasets beyond wage data to include occupational employment data for every state through the Occupational Employment Statistics program.

Employment, wage, and salary data are crucial to any economic analysis: labor income touches every part of the economy, so you need it whether you want to model a potential change or simply understand the current breakdown of industries. Furthermore, BLS data is very detailed in the number of sectors it describes (its reports go down to 6-digit NAICS categories). That matters because every industry is unique, and the more granular the data, the more accurate the base of your analysis can be. These data are published quarterly as part of the aptly named Quarterly Census of Employment and Wages (QCEW). QCEW’s sister dataset, the Consumer Expenditure Survey (CE), comes out every year (although lagged by one year; the data released this year describes economic activity from two years ago) and breaks household income out into categories based on income levels, which lets us see the spending patterns of each household category.

U.S. Economic Census

The Economic Census is a pretty broad and, as a result, poor term to use to lump together all of these various programs and surveys. However, all of these datasets are housed on census.gov so for our purposes it’ll work.

For some context, per records, the census began with the Census of Manufactures in 1810 to take economic account of a handful of producing and service-based industries. And then industrialization changed everything.

In 1902, Congress authorized the establishment of a permanent Census Bureau and directed that a census of manufactures be taken every five years. The 1905 manufacturing census was a milestone, marking the first time a census of any kind was taken separately from the decennial population census. The rest, as they say, is history.

Today, the Economic Census provides a handful of types of data which add even more depth to our understanding of the economy:

  • County Business Patterns (CBP): This is very similar to the BLS’s QCEW data in that it also relies on the NAICS sectoring scheme, but this dataset is one-year lagged. What sets it apart from QCEW data, however, is that it describes how many businesses are in a region and how many employees those businesses have on staff. Crucially, the data breaks its business counts down to the zip-code level, which gives us a level of granularity that few other data sources can provide.
  • Annual Survey of Manufactures (ASM): This provides output and inventory figures for manufacturing sectors, which give us a sort of mooring for checking the regional output estimates for these industries that we might extrapolate from other data sources.
  • U.S.-level construction sector output: Same story as for ASM, only this time we get all things construction-related from new residential buildings to major national infrastructure projects.
  • U.S.-level foreign exports and imports: This describes some of the leakages from the national economy and where domestic sectors get inputs.
  • Census of Government Finances: As expected, this includes revenue and spending by state, county, and city governments.

Bureau of Economic Analysis (BEA)

This branch of the U.S. data tree got its start shortly after the Great Depression in an effort to better understand the links and significance of production to local economies across the country. The Department of Commerce even tipped its hat to the BEA, describing its estimation process for GDP as “the greatest achievement of the 20th century.”

What the BEA provides is an annually-released National Income and Product Accounts (NIPAs) dataset which includes total numbers for data points like U.S. employment, GDP, capital investment, and Personal Consumption Expenditure (PCE) spending. You can think of this as an atlas for the economy—you get the big picture but some of the finer details need filling in from other sources like the BLS and U.S. Census.

The BEA also releases benchmark input-output tables every 5 years. These are the proverbial “national checkbooks” which describe what any given industry pays any other industry to provide inputs for production. The number of rows in this checkbook increases every time the BEA defines a new industry sector to describe the unique production functions which emerge as new business types enter the national economy.
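If the checkbook metaphor feels abstract, here is a toy version in Python. Everything in it is hypothetical: the three industries, the dollar figures, and the helper name input_shares are made up for illustration and are not taken from any BEA table. It shows the basic calculation an input-output table supports, namely how many dollars of input an industry buys from each supplier per dollar of its own output.

```python
# Hypothetical three-industry "checkbook" (all dollar figures are made up).
industries = ["farming", "manufacturing", "services"]

# purchases[buyer][seller]: what the buyer pays the seller for inputs, in $ millions.
purchases = {
    "farming":       {"farming": 10, "manufacturing": 20, "services": 10},
    "manufacturing": {"farming": 30, "manufacturing": 50, "services": 40},
    "services":      {"farming": 5,  "manufacturing": 25, "services": 60},
}

# Total output of each industry, in $ millions (also made up).
total_output = {"farming": 100, "manufacturing": 400, "services": 300}


def input_shares(buyer: str) -> dict[str, float]:
    """Dollars of input bought from each seller per dollar of the buyer's output."""
    return {
        seller: purchases[buyer][seller] / total_output[buyer]
        for seller in industries
    }


# Print one "recipe" per industry: its spending on each supplier per dollar of output.
for buyer in industries:
    recipe = ", ".join(
        f"{seller} {share:.3f}" for seller, share in input_shares(buyer).items()
    )
    print(f"{buyer}: {recipe}")
```

Whatever is left of each dollar of output after these input purchases goes to things like wages, taxes, and profits, which is where the other data sources on this tour come in.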

Regional Economic Accounts (REA) follow NIPAs and input-output tables but are lagged by 1 year. They contain information about employee compensation and proprietor employment and income at state- and county-level detail. Of the data sources we’re covering here, this is the only one that captures total employee compensation (wages plus benefits and payroll taxes).

And there’s more! Output for most service sectors, past-year deflators, state-level tax data, and net commuting rates also enrich the information available in that atlas of the U.S. economy. All you need now is to figure out what the names of all those side roads and alleys are and where they lead, which brings us to...

USDA

As its name suggests, this is where all the agricultural economic activity lives. The USDA is especially valuable as a data source because many other sources introduced in this article treat agriculture as a single sector or industry whereas the USDA provides far richer detail.

The additional detail finds its way into three major USDA data sets:

As you can imagine, production and sales for farmers differ widely depending on the crops that they’re growing and where those crops are grown. Also, what a farmer produces in one year may not be sold until the following year (or later). We can use all three of these USDA data sets together to reconcile what’s materially contributed to the economy in our “checkbook” snapshot of the economy for an isolated year. The USDA data helps fill in some of those gaps in the map which the other data sources don’t provide.

Oak Ridge National Laboratory

This one isn’t strictly an economic data provider, but their data does inform the way that you might accurately describe the trade that counties in the United States share with each other.

Specifically, these good folks provide data which can be used to extrapolate a travel-cost index which details how much it costs in terms of time and money to move goods from one county to another relative to all other counties. This is especially useful for building a gravity model of trade, but that’s a topic for a whole other article. If, for example, two counties share a border then it’s reasonable to assume that trade might flow freely between them. But if that border is defined by an impassable mountain range, then more adjoining counties may be involved in the trade process between our first two mountainous counties. These data are also broken out by mode of transportation so if there ain’t no trains in a county, then you’re going to have to hire a company in the trucking sector to deliver.
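Without getting into that other article, the basic intuition fits in a few lines of Python. This is the generic textbook gravity form, not Oak Ridge’s data or any particular modeling method, and every name and number here (gravity_flow, cost_exponent, the supply and demand figures) is a hypothetical chosen for illustration.

```python
def gravity_flow(supply_origin, demand_destination, travel_cost,
                 cost_exponent=2.0, scale=1.0):
    """Textbook gravity form: trade grows with the size of both counties
    and shrinks as the cost of moving goods between them rises."""
    if travel_cost <= 0:
        raise ValueError("travel_cost must be positive")
    return scale * supply_origin * demand_destination / travel_cost ** cost_exponent


# Same hypothetical county pair, two different travel-cost index values:
# a shared open border versus a long detour around a mountain range.
open_border = gravity_flow(100.0, 80.0, travel_cost=2.0)
mountain_detour = gravity_flow(100.0, 80.0, travel_cost=8.0)

print(f"open border:     {open_border}")
print(f"mountain detour: {mountain_detour}")
```

With these made-up inputs, quadrupling the travel cost cuts the predicted flow to one-sixteenth, which is the mountain-range effect described above expressed as arithmetic.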

Railroad Retirement Board

This one sounds oddly specific, I know, but if you want to get a complete picture of the U.S. employment landscape, then you’ll have to make a stop here. BLS QCEW data only covers employees eligible for Federal unemployment insurance programs. Railroad employees don’t fall under that federal umbrella—they’re covered by their own program.

NCES Integrated Postsecondary Education Data System

This details the employment data for colleges and universities across the United States. As you can imagine, layering this dataset into those we’ve already talked about gives you even more granularity into how higher education sectors pay and structure their workforce.

NOAA Fisheries Statistics Division

And, finally, what atlas of the U.S. economy could be complete without output for fishing sectors? NOAA’s got you covered.

Wrapping It Up

Whew! That’s quite a haul of information to help meet our initial goals: identifying the sector a project of interest falls into, understanding the relationships and characteristics that sector exhibits, and knowing where to find standardized data about it. The big takeaway as we step off our Magic School Bus tour of U.S. economic data sources is that no two data sets are created equal, or equally thorough. And this is just the 30,000-foot view; there are even more data sources to explore. In many cases, the economic questions you’re trying to answer require more than one source. Getting a complete, holistic snapshot of the economy takes consulting multiple data sets, knowing what’s not represented, and filling in the missing pieces wherever possible.