ITEP Tax Model Methodology

The ITEP model is a tool for calculating revenue yield and incidence, by income group, of federal, state and local taxes. It calculates revenue yield for current tax law and proposed amendments to current law. Separate incidence analyses can be done for categories of taxpayers specified by marital status, the existence of children in the family and age. To forecast future revenue and incidence the model relies on government or other widely respected economic projections.

I. Construction of the Data Set

Three separate data files were constructed: a database of resident records containing information on income, deductions, property values, consumption and other data for the residents of each state; an input-output (I-O) model of the industry structure and level of intensity of operation in each state; and estimates of visitor expenditures in each state.

1. Residents

Most microsimulation models used for tax policy analysis start with a sample of income tax returns representative of the tax-filing population of interest. To this sample, records are usually imputed or obtained from an auxiliary data source to represent the non-filing population. While this procedure is not strictly necessary for certain types of income tax analysis, to do meaningful incidence analysis, all major taxes and all segments of the population must be accounted for.

Once this representative file is assembled, a statistical match with auxiliary data sources and a number of imputations must be performed to fill in the gaps for data which do not appear in the primary set of records. Next, since tax returns data are usually available only with a lag of several years, the file must be extrapolated to a future year of interest to the analyst. This section describes how these procedures were implemented on the ITEP model.

a. Data

The centerpiece of the ITEP microsimulation tax model is a stratified random sample of approximately 365,000 federal tax returns. The returns were chosen so as to be statistically valid at the state level for items relating to income, deductions, federal taxes paid and exemptions. Since most states couple their income tax to federal law, and this data set has proven to be generally reliable for a wide variety of data items, this is a logical starting point for estimating state individual income tax collections as well as ascertaining certain other information.

Since, however, many who do not file federal tax returns pay state and local taxes, it was particularly important to augment these data with information representative of the total population in each state. In addition, certain information that is necessary for the calculation of state income, property and consumption taxes is not available in the federal tax return data.

For creating non-filer records and for obtaining some of the information not available on tax returns, the 1990 Public Use Microdata Sample (PUMS) of census records was used. This data source is a five-percent sample taken from the Decennial Census and provided a sufficient number of records to allow for statistically valid results.

b. Identifying Filers and Non-Filers

The first step in using the Census PUMS data was to identify tax-filing units on the census. This is, in some ways, the most challenging aspect of the process since it was necessary to identify in the census data how individuals within households are likely to be grouped for tax purposes. This meant applying criteria to each possible grouping identified to determine whether they are likely to be tax filers based on filing thresholds, eligibility for credits (especially the Earned Income Tax Credit), the desire to receive refunds or some other criteria. There are, however, a substantial number of filers where the reason for filing is not apparent (particularly among the elderly) and adjustments were made accordingly. Also, individuals in households were combined into economic units in a way that would produce meaningful results. (For example, one does not want to treat a 5 year old child living with a family to which she is unrelated as a separate family for creating an income distribution or as an actual, or potential, taxpayer.)
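The filing-unit classification described above can be illustrated with a small sketch. Everything here is hypothetical: the threshold value, the field names and the particular criteria are stand-ins for the actual rules applied to the census data.

```python
# Hypothetical sketch of the filer/non-filer classification step.
# The threshold, field names and criteria are illustrative only.

def classify_unit(wages, other_income, filing_threshold=5000,
                  eitc_eligible=False, had_withholding=False):
    """Return True if a census tax unit is likely to file a return."""
    total_income = wages + other_income
    if total_income >= filing_threshold:      # required to file
        return True
    if eitc_eligible and wages > 0:           # files to claim the EITC
        return True
    if had_withholding:                       # files to recover a refund
        return True
    return False

units = [
    {"wages": 30000, "other_income": 500},
    {"wages": 2000, "other_income": 0, "eitc_eligible": True},
    {"wages": 0, "other_income": 1200},
]
flags = [classify_unit(**u) for u in units]
```

In practice additional adjustments, such as those for elderly filers whose reason for filing is not apparent, would be layered on top of rules like these.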

This procedure leaves us with the Census PUMS data divided into filer and non-filer groupings. For the non-filers, this is the basic set of data which we use for calculating income and property taxes and in imputing consumption (described later). The filer data is attached to federal tax return data in a process of statistical matching.

The final production file contains 761,363 tax unit records representing both the filing and non-filing population in the states. These records represent (a) economically independent family units or individuals (b) dependent filers and (c) economically dependent families living with non-relatives. The vast majority of the records fall into the first of these categories. Note that, typically, only records in the first category are used when conducting incidence analyses.

c. Statistical Matching

Augmenting the income tax return data is a necessary step in building a comprehensive state and local microsimulation model. First, many items relevant to the proper calculation of state income tax liability are not available on federal income tax returns. For example, some states allow special deductions and/or exemptions based on the age of the taxpayer and this information is largely absent on the federal return (it is possible, however, in most cases, to identify those taxpayers who are over 65). Additionally, information on income that is not reported on the federal return is needed for calculation of total income for classification purposes. Finally, information on property taxes and consumption is limited on federal tax returns.

The ITEP model augments the tax return data in two ways. First, through a statistical match with census data and second through imputations of consumption and other data.

Statistical matching is a process by which information from two or more data files--usually microdata sets collected from a survey--is combined in order to augment information that is contained on one file and not the other. A typical situation is as follows: the analyst has observations from one data file, File A (the "recipient" file) say, with information on one set of variables--call them X-variables--as well as another set of Y-variables. The analyst also has access to another data file, File B (the "donor" file), containing information on the same set of X-variables plus additional information on a different set of Z-variables. Statistical matching involves creating a new data file, File C, containing information on X, Y and Z. In tax policy analysis, File A could be a sample of tax records and the X-variables usually represent income and selected demographic data (e.g., family size) while the Y-variables contain deduction and tax information. File B could be a census survey of some type where, again, the X-variables represent the same income and demographic information and the Z-variables are usually information on family composition (e.g., children's ages), labor force attachment, consumption and savings components or health-related indicators. Although the two files almost always contain different numbers of records, it is common, but not necessary, for the weighted number of records on each file to be constrained to be equal.

When performing a statistical match, the analyst would like to ensure that records on File A are only matched to records on File B that are sufficiently "close". This means that some sort of metric need be devised to measure closeness and much empirical work on matching involves the construction of such a metric. Cohen (1991) provides an excellent and up-to-date survey of statistical matching methods.
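A minimal sketch of such a closeness metric follows. The particular form (a weighted sum of absolute differences over two X-variables) and the weights are assumptions for illustration; real matching metrics are tuned empirically.

```python
# Illustrative distance metric over shared X-variables (income, family size).
# The weights are hypothetical; actual matches tune these empirically.

def distance(rec_a, rec_b, income_weight=1.0, famsize_weight=5000.0):
    """Smaller values mean the two records are 'closer' for matching."""
    d_income = abs(rec_a["income"] - rec_b["income"])
    d_famsize = abs(rec_a["famsize"] - rec_b["famsize"])
    return income_weight * d_income + famsize_weight * d_famsize

recipient = {"income": 42000, "famsize": 3}
donors = [{"income": 41000, "famsize": 3},
          {"income": 42000, "famsize": 1}]
best = min(donors, key=lambda d: distance(recipient, d))
```

Note how the weight on family size makes a $1,000 income gap preferable to a two-person difference in family size, which is exactly the kind of trade-off a matching metric encodes.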

An interesting aspect of this particular matching problem is that it is logically equivalent to a famous problem in network optimization, the transportation problem. Briefly, consider the records in File A as factories and those on File B as warehouses with the weights on File A as the productive capacity of the factory and the weights on File B as the available storage in each warehouse. A simple formulation of the transportation problem tries to minimize the cost of shipping the production ("supply") to its destination ("demand") subject to the capacity constraints.

Mathematically, let cij be the cost of shipping one unit of production from factory i to warehouse j, ai the production at factory i, and bj the capacity of warehouse j. Then the transportation problem becomes:

Minimize (over the xij):   Σi,j cij xij

subject to:

Σj xij = ai   for all i

Σi xij = bj   for all j.

This is simply the matching problem with the ai's the weights on File A, the bj's the weights on File B and the xij's the weights on the records in File C that one obtains from matching record i with record j. This particular approach to statistical matching is termed "constrained matching" and has several desirable properties. In particular, it is not hard to show that a constrained match guarantees that the moments (means and variances) of the matched variables are maintained. Unfortunately, constrained matches are difficult to implement in practice due to the "curse of dimensionality". For a typical problem in tax policy analysis both input files might contain approximately 100,000 records resulting in over 10 billion possible combinations to be evaluated. Consequently, researchers using constrained matching must first reduce the size of the problem by appropriate partitioning of the two input datasets. See Barr and Turner (1983) for details.
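A toy version of this transportation formulation can be solved directly with an off-the-shelf linear programming routine. The costs and weights below are invented for illustration; as the text notes, real files have on the order of 100,000 records each, which is why partitioning is needed in practice.

```python
import numpy as np
from scipy.optimize import linprog

# Toy constrained match posed as a transportation problem.
a = [3, 2]                  # File A weights ("supply")
b = [2, 3]                  # File B weights ("demand")
cost = np.array([[1, 4],
                 [2, 1]])   # cost[i][j]: distance between record i and record j

n, m = cost.shape
A_eq, b_eq = [], []
for i in range(n):          # each File A record ships out exactly its weight
    row = np.zeros(n * m)
    row[i * m:(i + 1) * m] = 1
    A_eq.append(row)
    b_eq.append(a[i])
for j in range(m):          # each File B record receives exactly its weight
    row = np.zeros(n * m)
    row[j::m] = 1
    A_eq.append(row)
    b_eq.append(b[j])

res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq)
weights = res.x.reshape(n, m)   # x[i][j]: weight of the matched record on File C
```

The row and column sums of the solution reproduce the original weights on both files, which is the property that guarantees the moments of the matched variables are preserved.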

Another practical disadvantage of constrained matches is that predicting a general set of constraints that will produce the optimum results is difficult. In other words, depending on certain characteristics of records, sometimes allowing greater distance in the income criteria between two records might be preferable to greater distance in family size. But, for other types of records the converse might be true.

In constructing the matched data file for the ITEP study, a different approach was implemented, based on random sampling from equivalence classes of records in both files. We chose this approach because it is conceptually quite similar to the partitioning solution and appears to perform well in practice (Armstrong, 1989). Sampling rates were adjusted according to the population weight of the sampled record in the course of performing the match to ensure the final matched file had desirable properties. The sampling was done without replacement so no oversampling of records occurred. Results from this procedure were then compared with the targets for each state and sampling rates were adjusted where necessary.(2)
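The equivalence-class sampling step can be sketched as follows. The class keys, record fields and single-key pooling are simplifications of the multi-attribute classes listed in Table 4; real sampling rates are also weight-adjusted, which is omitted here.

```python
import random

# Hypothetical sketch: match by sampling donors, without replacement,
# from within the recipient's equivalence class.
random.seed(0)

def match_by_class(recipients, donors):
    """Pair each recipient with a donor drawn from the same class."""
    pools = {}
    for d in donors:
        pools.setdefault(d["cls"], []).append(d)
    for pool in pools.values():
        random.shuffle(pool)                 # randomize within each class
    return [(r, pools[r["cls"]].pop()) for r in recipients]

recipients = [{"id": 1, "cls": "low"}, {"id": 2, "cls": "high"}]
donors = [{"id": "a", "cls": "low"}, {"id": "b", "cls": "low"},
          {"id": "c", "cls": "high"}]
pairs = match_by_class(recipients, donors)
```

Because donors are popped from the pool, no donor record is matched twice, mirroring the no-oversampling property described above.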

Table 4. - Equivalence Classes Used In Constructing The Matched File

                               Number of Classes
  Total Income                 11
  Homeowner Status             3
  Number of Dependents         maximum on file
  Age of Taxpayers             2
  Family Structure             4
  Existence of Wage Income     2
  Number of Earners            3

Table 4 lists the properties that define the equivalence classes that were constructed. The equivalence classes were defined by income, type of income, filing status (family structure), age, number of dependents and home ownership. For most classes, 11 income groups were used. For a few classes where the sample was extremely small, usually in atypical high-income categories, the equivalence classes were broadened. For example, in some cases only 3 income groups were used and in others the number of dependents was capped. This was necessary because of the apparently poor sampling in census products at higher incomes.

Each tax record from the sample was, in addition to having its equivalence class identified, categorized as being home-owning, non-home-owning, or unknown home-ownership. This categorization was based on itemized deduction information. Also, each married-filing-jointly tax return record was identified as being two-earner, not two-earner or unknown. This categorization was based on dependent care credit information and self-employment tax payments.

All home-owning tax return records were matched with home-owning census records and all non-home-owning SOI records were matched with non-home-owning census records. Tax records where home ownership status was unknown (principally non-itemizers) were matched with census records so as to maintain the census percentage of home ownership for the equivalence class.

Two-earner couple records used the wage and self-employment income split from the tax return record where such was calculable and used the income split from the census record if it was within the range of income splits consistent with data on the tax record. On records where the precise income split was not obtainable from the tax return record, yet the census record provided a split inconsistent with information on the tax return record, splits consistent with the record were chosen to maintain the average split and percentage of two-earner couples found in the census data for the equivalence class.

Using these techniques we were able to obtain additional information from the match for use with our tax return records, consistent with the information already on those records. A final step was then taken to hit control totals using both filer and non-filer records. These adjustments were only necessary for a few income types that are unavailable from the SOI and where there is underreporting on the census--public assistance payments being an example.

Lastly, many tests were done to ensure that the matching process produced accurate results for all of the items for which the match was to be relied on. In addition, a number of tax-change simulations that make use of the match were performed and the results compared with macro estimates for the same tax change. This ensured that, while hitting targets, we were not creating any aberrational relationships between data items.

d. Consumption Imputations

A different approach was used for assigning consumption to the data set records. This information was imputed onto the file instead of being obtained through a statistical match.

Detailed consumption amounts were imputed to each household in the matched file using estimated relationships obtained from the Consumer Expenditure Survey (CES), a quarterly survey of approximately 7,800 U.S. households conducted on a rotating basis and administered by the Bureau of Labor Statistics. Since the CES was designed to augment development of the Consumer Price Index, the quarterly sample frame of the survey can give reliable estimates of consumption patterns quickly. This creates problems, however, for researchers wishing to construct annual measures of consumption since households are rotated out of the survey every quarter and the quarterly data has to be merged to construct the annual files. Because the survey is chosen based upon the housing unit, renters tend to be under-represented when annual files are constructed. Our approach was to split the sample and perform separate imputations for homeowners and renters.

Six annual waves of quarterly consumption data were assembled, with the first wave beginning in the fourth quarter of 1991 and the last wave ending in the second quarter of 1993.

In all, a total of 5,958 household units representing those with complete income reporting were included in our sample. Summary statistics are included in Table 5.

Table 5. - Summary Statistics From the CES

                              Homeowners                  Renters
                          Mean     Std. Dev.         Mean     Std. Dev.
Total Expenditures       24,684     17,166          18,434     13,916
Non-Durable Exp.         20,959     13,892          16,713     12,684
Total Income             43,617     31,896          24,203     20,423
Food Share                .2555      .0904           .2289      .0931
Shelter Share             .2270      .1271           .4062      .1267
Clothing Share            .0676      .0500           .0561      .0425
Transportation Share      .1740      .0840           .1188      .0809

Our procedure for imputing consumption onto individual tax records can be thought of as involving two distinct steps: (i) econometrically estimating the necessary relationships for each of the desired consumption items from the Consumer Expenditure Survey (CES); and (ii) using the resulting regression coefficients to simulate consumption on the merged data file for non-dependents. Implicit in this approach is reliance on the strong separability of a utility function over different categories of consumption; i.e., we used a "utility tree" approach to estimate several systems of share equations.

The first step was to estimate "lumpy" purchases of household durables and automobiles (both new and used) using a limited dependent variable ("Tobit") specification. "Tobit" equations were estimated separately on homeowners and renters for automobile purchases and major household durables. Independent variables used in the equations were age, family size, marital status and total family (cash) income. The resulting coefficients from these equations were used to impute first a probability of making a purchase and, conditional on making the purchase, the mean amount. This probability was compared with a uniform random number to select records for imputation.
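The two-step selection just described (a purchase probability compared against a uniform random draw, then a conditional mean amount) can be sketched as below. The coefficients, the probit-style index and the mean purchase amount are invented placeholders, not estimates from the CES.

```python
import math
import random

random.seed(1)

# Illustrative two-step durable-purchase imputation.
# All coefficients below are hypothetical stand-ins.

def purchase_probability(income, famsize, b0=-1.2, b_inc=1.5e-5, b_fam=0.1):
    """Probability of a durable purchase via a normal CDF of a linear index."""
    index = b0 + b_inc * income + b_fam * famsize
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2)))

def impute_purchase(income, famsize, mean_amount=12000):
    p = purchase_probability(income, famsize)
    if random.random() < p:      # record selected to make a purchase
        return mean_amount       # conditional mean purchase amount
    return 0.0

amounts = [impute_purchase(40000, 3) for _ in range(1000)]
share_buying = sum(a > 0 for a in amounts) / len(amounts)
```

Comparing the probability against a uniform random number, rather than assigning every record a fractional purchase, preserves the lumpiness of durable spending across the simulated population.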

Next, total non-durable consumption expenditures were imputed in a similar manner: separate ordinary least squares (OLS) regressions were estimated from the CES on both samples with a similar set of predictor variables. Coefficients from these equations were then used to impute mean (non-durable) consumption expenditures to each household and a normally distributed error term with a mean of zero and a standard deviation equal to the standard error of the regression was added to each imputed amount. Two sets of adjustments were then made to the imputed amounts.

First, the particular functional form used was unstable at very low levels of income, resulting in extraordinary amounts of imputed consumption for several records. For nondurable consumption, our OLS specification included two terms, 1/Y and 1/Y2, where Y is total family income, that presented problems at both ends of the income distribution. For very low incomes, the nonlinearity introduced by 1/Y and 1/Y2 caused estimates of mean consumption to approach infinity. This was handled by constraining consumption for these records to be no more than 1.5 times income. This limit was based on analysis of the CES data independent of the imputation process.

Second, the tax return data that formed the basis of the income information for filers contained income amounts far outside the range observed on the CES and caused problems when our regression coefficients were used. Our approach was to assume that the estimated equation was valid for incomes within the range of the CES and to fit a spline function for the portion of income in excess of this amount for those households (about 2.5%) with reported incomes outside the range of that reported in the CES.

This error term, with a zero mean and standard deviation equal to the standard error of the regression, imparts some variance to the final imputations. For the systems of share equations (discussed below), the technique was repeated, except that a vector of multivariate normal deviates was added to each system of mean shares.

In the next stage of the imputation procedure, share equations were estimated for non-durable consumption in the following major categories: food, shelter, clothing, transportation and other consumption.(3) Since the sum of the shares must equal 1.0, one equation is redundant and can be dropped. The resulting system of equations was estimated using a "seemingly unrelated" regression (SUR) approach(4). (This was not technically necessary since our list of predictor variables ended up being the same for each of the equations and OLS would have given the same estimates, but we wanted to save the variance-covariance matrix.) Mean consumption shares were then imputed and a vector of multinormally distributed random error terms having the same variance-covariance matrix was drawn and added to the mean shares. Finally, these shares were then multiplied by total non-durable consumption to obtain dollar amounts.
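The share-imputation step can be sketched as follows. The mean shares and the covariance matrix are placeholders, not the estimated SUR quantities; the point is the mechanics of adding a multivariate normal deviate, recovering the dropped (redundant) share as a residual and converting shares to dollars.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative share imputation. Mean shares roughly echo Table 5;
# the covariance matrix is an invented stand-in for the SUR var-cov matrix.
mean_shares = np.array([0.26, 0.23, 0.07, 0.17])   # food, shelter, clothing, transport
cov = np.diag([0.09, 0.13, 0.05, 0.08]) ** 2

def impute_shares(nondurable_total):
    draw = rng.multivariate_normal(mean_shares, cov)
    draw = np.clip(draw, 0.0, None)          # no negative shares
    other = max(0.0, 1.0 - draw.sum())       # dropped "other" equation as residual
    shares = np.append(draw, other)
    shares /= shares.sum()                   # enforce adding-up exactly
    return shares * nondurable_total         # dollar amounts by category

dollars = impute_shares(20000.0)
```

By construction the five dollar amounts are non-negative and sum to the household's total non-durable consumption, matching the adding-up constraint in the text.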

A similar approach was used for several more "levels" of consumption (e.g., food was further split into food at home and food away from home) resulting in a set of seventy-two consumption imputations. Because of the importance of alcohol and tobacco taxes, it was decided to separate these items and estimate Tobit equations for them in a similar manner. Also, for alcohol consumption we further apportioned these expenditures into alcohol consumed at home and alcohol consumed away from home and further subdivided the expenditures into those on beer, wine and distilled spirits. Table 6 lists all the imputed items and the system of equations they were grouped by.

Once the first set of consumption imputations were obtained, it was necessary to make two further sets of adjustments to reflect both national control totals and state-by-state per capita consumption amounts for selected items. Detailed estimates of Personal Consumption Expenditures (PCE) for the U.S. are contained in the National Income and Product Accounts (NIPA) published by the Commerce Department. Our overall consumption imputations were constrained to hit these control totals after adjusting the NIPA totals to correspond to our definition of consumption.(5)
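Constraining imputations to a control total amounts to a proportional scaling of the weighted sum. A minimal sketch, with a hypothetical PCE target and made-up record-level amounts and weights:

```python
# Minimal sketch of constraining imputed amounts to a national control total.
# The target, amounts and weights are all hypothetical.

def scale_to_target(amounts, weights, target):
    """Proportionally scale imputed amounts so the weighted sum hits target."""
    weighted_sum = sum(a * w for a, w in zip(amounts, weights))
    factor = target / weighted_sum
    return [a * factor for a in amounts]

amounts = [1200.0, 800.0, 2000.0]      # imputed consumption per record
weights = [100.0, 250.0, 50.0]         # population weights
scaled = scale_to_target(amounts, weights, target=500000.0)
```

In the actual procedure, the NIPA totals are first adjusted to match the model's definition of consumption before being used as targets.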

Table 6. - Detailed Consumption Imputations

Total Expenditures
Major Household Durables
Automobiles (New & Used)
Total Durable Expenditures

Food
  At Home
    Food
    Alcohol
      Beer
      Wine
      Spirits
  Away From Home
    Food
    Alcohol
      Beer
      Wine
      Spirits
    Tobacco

Housing
  Shelter
  Fuel
    Electricity
    Natural Gas
    Oil & Coal
    Other Fuels
  Utilities
    Water & Sewer
    Telephone
    Other Utilities
  Household Operations
    Repair and Cleaning
    Lawn Service
    Other Misc. Household
  Miscellaneous

Clothing
  Footwear
  Jewelry
  Other Clothing
    Infant Clothing
    Children's Clothing
    Adult Clothing

Transportation
  Gasoline & Oil
  Maintenance & Repairs
  Other
  Local Travel
    Mass Transit
    Taxicabs
  Intercity Travel
    Airline
    Railway
    Bus
    Other Travel

Other Expenditures
  Medical & Health
    Prescription Drugs
    Physicians
    Dentists
    Hospital
    Insurance
    Other Medical
  Miscellaneous
    Education
    Recreation/Entert.
    Books, Magazines
    Personal Care
    Other Miscellaneous

For certain consumption items--including alcohol, tobacco, gasoline and most utilities--reliable state-by-state per capita consumption figures are available from government agencies or other private sources, and our consumption imputations were constrained to hit these targets.(6)

e. Extrapolating the Residents Database

After creation of the data set, the next step was to extrapolate the data to various years for which analyses are to be performed. This was accomplished in two stages. In Stage I, selected data items, including most types of income and some deductions, were adjusted by per-capita factors so as to match targeted totals for these items on a state-by-state basis for every year, up to and including the latest year for which extrapolation is performed. These targets were based on detailed federal tax return data published by the Internal Revenue Service, national IRS microsample data from years subsequent to the base data year, state personal income and population growth as provided by the U.S. Commerce Department, and other data from a variety of sources including the National Income and Product Accounts and estimates of Gross State Product provided by the Commerce Department. Tests were performed to ensure that certain population and demographic totals were reached for each state.

Once the data have been adjusted in Stage I, the individual weights on the file are adjusted in order to hit aggregate control totals for each state. These control totals can be simple aggregate amounts and/or number of returns or they can be further broken down by income class. The Stage II extrapolation involves solving an optimization problem that involves minimizing (some function of) the changes in weights subject to hitting the targets. The ITEP extrapolation is formulated as follows:

Min   max |zi|
{zi}    i

where zi is the percentage change in the weight on the ith record and δ is the maximum allowable percentage change for all records on the file. Operationally, this can be transformed into a linear programming problem subject to equality constraints. Let

ri = zi+ = max(0, zi)

si = zi- = abs[min(0, zi)]

be the positive and negative parts, respectively, of zi. Then |zi| equals ri + si (and zi = ri - si) and the reformulated problem becomes:

Min     Σi (ri + si).
{ri,si}

As for the constraints, let wi be the original weight on the file for record i and xi be an arbitrary income item. Then w*i = wi(1 + zi) is the new weight to be solved for, and

Σi w*i xi = X   and   Σi w*i = N

(where X and N denote the targeted aggregates) represent the income and return targets, respectively, while the ri and si are constrained to be less than δ. This linear programming problem is then solved iteratively over δ.
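For a fixed cap on the percentage changes, the Stage II weight adjustment is an ordinary linear program. The sketch below solves a two-record toy version with invented weights, incomes and targets; the real problem spans hundreds of thousands of records and is re-solved while tightening the cap.

```python
import numpy as np
from scipy.optimize import linprog

# Toy Stage II weight-adjustment LP for a fixed cap delta.
# Weights, incomes and targets are invented for illustration.
w = np.array([10.0, 20.0])        # original record weights
x = np.array([1000.0, 2000.0])    # an income item on each record
income_target = 52000.0           # target for sum of new_w * x
returns_target = 30.6             # target for sum of new_w
delta = 0.10                      # max allowable percentage change

# z_i = r_i - s_i; variables ordered [r_1, r_2, s_1, s_2]
c = np.ones(4)                    # minimize sum of (r_i + s_i)
A_eq = np.array([
    np.concatenate([w * x, -(w * x)]),   # income constraint
    np.concatenate([w, -w]),             # return-count constraint
])
b_eq = np.array([income_target - (w * x).sum(),
                 returns_target - w.sum()])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, delta))
r, s = res.x[:2], res.x[2:]
new_w = w * (1 + r - s)           # adjusted weights hitting both targets
```

Shrinking delta and re-solving until the problem becomes infeasible locates the smallest maximum percentage change consistent with the targets, which is the iterative procedure the text describes.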

2. Business Database

There is considerable diversity with respect to the types of taxes businesses pay in each of the states. These differences can have important consequences when one attempts to allocate the burden of business taxes to residents of each of the states.

Since we did not have access to tax return information for businesses operating in each of the states, our approach entailed first estimating the revenue collected by each of the major taxes (income, consumption and property) and then allocating these amounts according to general principles of state and local tax incidence. A principal tool in our incidence analysis was a forty-nine sector input-output model for each of the states calibrated to reflect current economic conditions.

a. Input-Output (I-O) Model

The starting point for constructing the regional input-output models for each of the states was the 1987 Benchmark I-O Table of the U.S. published by the Commerce Department (1992). The technical coefficients in the table were used to calibrate estimates of gross state product (GSP) across forty-nine industry groups to arrive at estimates of intermediate purchases and capital investment.(7) Our approach relied upon the "location quotient" method of constructing the state tables: each industry's intensity of operation in a particular state was assumed proportional to its contribution to final demand in the state. For example, if aij represents the per unit input of industry i to the output in industry j from the national table, then the dollar amount of (intermediate) purchases of industry j from industry i in the regional table was calculated as:

Qijk = aij * (dik / Dk)

where dik represents the contribution to final demand of industry i in state k and Dk is total final demand in state k.(8) Of course, implicit in this approach are two very strong assumptions relating to the aij's: that they have remained constant since 1987 and that they are the same across states. Neither assumption is likely to hold in any state. Miller and Blair (1985) discuss each of these issues as well as the construction of regional I-O tables from national estimates.
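The location-quotient step is a simple row scaling of the national coefficient matrix. In the sketch below the national coefficients and state final-demand figures are invented, and the model is collapsed to two industries instead of forty-nine.

```python
import numpy as np

# Sketch of the location-quotient regionalization step.
# Coefficients and final-demand figures are hypothetical.
a = np.array([[0.10, 0.25],
              [0.30, 0.05]])   # a[i][j]: input of industry i per unit output of j
d = np.array([40.0, 60.0])     # d[i]: industry i's contribution to final demand in state k
D = d.sum()                    # total final demand in the state

# Q[i][j] = a[i][j] * (d[i] / D): regional purchases of industry j from industry i
Q = a * (d / D)[:, None]
```

Each row of the national table is scaled by that industry's share of the state's final demand, which is exactly the proportionality assumption the text flags as strong.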

3. Visitors Database

State-by-state estimates of expenditures by travelers are provided by the U.S. Travel Data Center. These expenditures are broken down into six categories: public transportation, auto transportation, lodging, food service, entertainment and recreation, and general retail sales. In order to properly reflect each state's sales tax base, these categories were further subdivided based on proportions contained in the CES with minor adjustments reflecting the likely composition of expenditures by tourists. The last year for which data were available was 1993; expenditure data for subsequent years were adjusted to reflect the change in the consumer price index (CPI).

Information on non-traveler visitors (e.g., commuters) was obtained by using aggregate data and subtracting the portions attributable to residents, businesses and travelers. Based on the success of the model in predicting the aggregates in states where non-traveler visitors are expected to be insignificant, the use of this residual amount was assumed to be reliable in states where non-traveler visitors are significant.

II. Computing the Taxes

The most important, and difficult, part of developing a valid state and local microsimulation tax model is constructing the data file. If the data file is constructed with all (or the overwhelming majority) of elements necessary for calculating any tax in the nation, developing the actual tax calculator is relatively straightforward. It was partly for this reason that the process of developing the ITEP model focused very heavily on creating and extensively verifying a statistical match and high quality imputations, and using sophisticated extrapolation techniques.

Personal Income Tax

Once the matched data file was assembled, developing a tax calculator to compute state income tax liability for each taxpayer was a straightforward but non-trivial task. The great variety in state and local income tax provisions required a model with great flexibility.

Consumption Taxes

Once the consumption amounts were imputed for residents, each state's sales and excise tax law was simulated. For most items in the state sales tax base, this procedure was straightforward: food away from home, clothing, gasoline, etc. were added to the taxable sales tax base for each household. For some (mostly minor) items where the CES did not have sufficient observations to perform a meaningful imputation (e.g., opera tickets), the same overall mean proportion of these expenditures across all households was used. For purposes of computing excise tax revenues, dollar values of consumption were converted to physical units by relying on average prices in the state for the affected commodities.
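The household-level sales and excise calculation can be sketched as below. The tax rate, the set of taxable categories, the excise rate and the average gasoline price are all hypothetical; in the model these vary state by state.

```python
# Hypothetical sketch of the household sales/excise calculation.
# Rates, the taxable-category list and prices are illustrative only.

SALES_TAX_RATE = 0.05
TAXABLE = {"food_away", "clothing", "other_retail"}   # varies by state
GAS_EXCISE_PER_GALLON = 0.20
AVG_GAS_PRICE = 1.25                                  # dollars per gallon

def consumption_taxes(household):
    # Sum the imputed spending in categories inside the state's sales tax base
    base = sum(amt for item, amt in household.items() if item in TAXABLE)
    sales_tax = base * SALES_TAX_RATE
    # Excise taxes apply per physical unit, so convert dollars to gallons
    gallons = household.get("gasoline", 0.0) / AVG_GAS_PRICE
    excise_tax = gallons * GAS_EXCISE_PER_GALLON
    return sales_tax + excise_tax

hh = {"food_home": 3000, "food_away": 1200, "clothing": 800,
      "gasoline": 500, "other_retail": 2500}
tax = consumption_taxes(hh)
```

The dollars-to-gallons conversion mirrors the text's use of average state prices to translate imputed consumption dollars into the physical units on which excise taxes are levied.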

Businesses operating in a state are generally required to pay sales and excise taxes on many of their purchases, although most states exempt raw materials that are part of the final product to eliminate the "cascading" of these taxes. Numerous other sales tax exemptions are also available to businesses in particular states. For example, goods used in the production of agricultural commodities are often exempt as are some utilities and certain purchases of machinery and equipment.

The approach to estimating the revenue from consumption taxes collected from businesses in each state was to use the input-output model to construct a taxable sales tax base for each industry operating in the state. This allowed the model to identify the commodities specifically exempted from the base since these would vary depending on the industry. The ITEP model's ultimate distribution of business consumption and other business taxes is discussed below.

Consumption taxes on visitors were calculated using the visitors database described above.

Property Taxes


The property tax estimation relied heavily on the statistical match. The data used in the property tax estimate were property taxes reported on the tax return file by itemizers, property taxes reported on the census for non-itemizers and home values reported on the census. In addition, property tax law and practice information was obtained for each state. Specifically, information on assessment practices, statutory assessment ratios, property tax rates, homestead exemptions and other relevant information was collected for each state for 1989 and target years (1995 initially). The availability of all information sought varied among states.

Using this information and the tax return and census data on property taxes and home value, a property tax rate and assessment ratio were calculated for each record for the data year of 1989. In some states with idiosyncratic rules, additional special calculations were necessary. For instance, in the states that have homestead exemptions that are not consistent statewide, a homestead exemption was calculated. After this step we have all information necessary for calculating the property tax in 1989.
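The per-record calculation can be sketched as a back-out of the effective rate implied by reported taxes and assessed value. Field names and the example figures are hypothetical; actual assessment rules differ by state.

```python
# Illustrative back-out of a record's effective property tax rate for the
# base year. The numbers and field names are hypothetical.

def effective_rate(reported_tax, home_value, assessment_ratio,
                   homestead_exemption=0.0):
    """Rate implied by reported tax, given assessed value net of exemptions."""
    assessed = home_value * assessment_ratio - homestead_exemption
    return reported_tax / assessed

def property_tax(home_value, assessment_ratio, rate, homestead_exemption=0.0):
    assessed = home_value * assessment_ratio - homestead_exemption
    return assessed * rate

rate = effective_rate(reported_tax=1500.0, home_value=100000.0,
                      assessment_ratio=0.8, homestead_exemption=5000.0)
# Recomputing the tax with the backed-out rate reproduces the reported amount
check = property_tax(100000.0, 0.8, rate, 5000.0)
```

Separating the rate, the assessment ratio and the exemption in this way is what allows each element to be aged independently to the analysis year, as described below.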

The next step was extrapolation. Home values were increased from 1989 to 1995 (initially; some states have since been brought to later years) using the average increase in home values; 1995 assessment practices and law were imposed; and rates were increased to hit the 1995 average. Taxes were then recalculated for 1995. For states with simple property tax structures that had gone largely unchanged, this was little different from simply adjusting the property taxes reported in 1989 to 1995 levels. For states with homestead exemptions or other complexities, the separate aging of each element was critical to producing accurate results. Where available, the results were compared with data on homeowner taxes.
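The separate aging of each element can be sketched in one step, again with invented numbers: grow the 1989 home value by the average appreciation, apply the target year's assessment ratio and homestead exemption, then apply the rate calibrated to hit the target-year average.

```python
# Sketch of the aging step. Growth rate, ratio, exemption and rate are
# hypothetical inputs, not actual state parameters.

def aged_tax(home_value_1989, growth, ratio_1995, exemption_1995, rate_1995):
    """Recalculate one record's property tax under target-year law."""
    value_1995 = home_value_1989 * (1.0 + growth)
    assessed = max(value_1995 * ratio_1995 - exemption_1995, 0.0)
    return assessed * rate_1995

tax = aged_tax(100000.0, 0.25, 0.40, 5000.0, 0.034)
# value: 125,000; assessed: 125,000 * 0.40 - 5,000 = 45,000; tax ~ 1,530
```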

"Circuit breakers" and other income-dependent property tax relief measures were calculated either in the income tax model using property tax data imported from the property tax model, or in the property tax model using income data imported from the income tax model. The choice generally depended on whether the relief was provided in a lowering of taxable value or a rebate of taxes paid.


Rental property taxes were computed by imputing a property value to each renter, calculated as rent multiplied by a factor. These factors were derived from state assessment data or, in some cases, from research conducted in the state. The quality of the data for this purpose varied widely among states; for states where data were not available, we used the average factor for the other states. Rental property provisions are generally calculated in the personal income tax model, since they use rent data from the matched data file.
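The imputation with its fallback can be sketched as follows; the state names and rent-to-value factors are invented for illustration.

```python
# Sketch: impute a property value to a renter as rent times a state-specific
# factor, falling back to the cross-state average where no factor exists.

rent_to_value = {"StateA": 9.0, "StateB": 11.0}   # hypothetical factors

def imputed_rental_value(annual_rent, state, factors):
    """Rent times the state factor, or the average factor if unknown."""
    default = sum(factors.values()) / len(factors)
    return annual_rent * factors.get(state, default)

value_a = imputed_rental_value(6000.0, "StateA", rent_to_value)  # 6,000 * 9 = 54,000
value_c = imputed_rental_value(6000.0, "StateC", rent_to_value)  # 6,000 * 10 (average) = 60,000
```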

Ad-Valorem Automobile Taxes

For itemizers, the personal property tax deduction is used to impute auto values. With this information, tax is then recalculated for the analysis year, incorporating value extrapolations and tax law changes since the data year. For non-itemizers auto value is imputed using income and number of household members. Aggregate results are compared with data from the state, where available.

Intangible Property Taxes

For itemizers, personal property tax deductions are used to impute the value of intangible assets. With this information, tax is then recalculated for the analysis year incorporating value extrapolations and tax law changes since the data year. For non-itemizers the value of intangibles is imputed using income from intangibles. Aggregate results are compared with data from the state, where available.

Business Property Taxes

Business property taxes were computed as the residual after residential property taxes were computed in the individual simulation model: total property tax collections, taken from the most recent data available for each state, less the simulated residential amount.

III. Incidence Assumptions

Public finance theory suggests that the person or entity that initially remits a tax or fee is often not the one that bears its ultimate burden. Some useful recent surveys of the tax incidence literature are Kotlikoff and Summers (1985) and Bradford (1995), the latter being particularly concerned with the distributional analysis of tax burdens across income groups. Our approach in the ITEP model was not to break new ground in the incidence debate but to rely on generally accepted and reasonable guidelines on which to base our analysis. Since assumptions about state and local tax incidence can often differ considerably from, say, those about the incidence of a national tax, owing to the mobility of factors of production (capital and labor), a number of interesting issues present themselves. Our approach followed closely, in principle, that which the Minnesota Department of Revenue has employed in its incidence analyses, with some differences in underlying assumptions.

First, individuals were assumed to bear the burden of the individual income tax directly, according to their liability. Similarly, consumption taxes paid by individuals were assumed to be borne directly. Visitor taxes were assumed to be borne by the visitors; in our analyses, however, we do not show this visitor tax burden because we generally show only taxes paid by taxpayers to their own state and local governments. Direct payments of individual property taxes were also assumed to be borne by the payer.

Second, taxes on business income, capital and property were generally treated as taxes on capital and allocated to individuals--both residents and out-of-state owners of capital-- according to the ownership of capital. For purposes of computing this amount, half of total business property taxes on residential rental property was assigned to individual tenants and distributed based on rents paid. Capital income was defined to include interest, dividends, realized capital gains, passive income reported on Schedule E and seventy percent of taxable pension income (the approximate amount that reflects the return to capital rather than deferred wages). The distribution of these items was computed using the microsimulation model.
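The capital-income definition above translates directly into a per-record calculation. The record fields below are hypothetical names for the corresponding items on the matched data file.

```python
# Sketch: capital income used to allocate capital taxes — interest,
# dividends, realized gains, passive Schedule E income, plus 70 percent of
# taxable pensions (the approximate return-to-capital share).

CAPITAL_PENSION_SHARE = 0.70

def capital_income(record):
    return (record["interest"] + record["dividends"]
            + record["capital_gains"] + record["sched_e_passive"]
            + CAPITAL_PENSION_SHARE * record["taxable_pension"])

rec = {"interest": 500.0, "dividends": 300.0, "capital_gains": 1000.0,
       "sched_e_passive": 200.0, "taxable_pension": 10000.0}
# 500 + 300 + 1,000 + 200 + 0.70 * 10,000 = 9,000
```

Summing this measure across records gives the denominator used to assign each record its share of capital taxes.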

Third, in computing each state's share of the overall burden of capital taxes, each state was assumed to retain its share of national capital income, with an adjustment to reflect the fact that residents of a particular state are somewhat more likely than out-of-state residents to own in-state taxable capital. For residential rental property, this adjustment was generally 50 percent of the remaining portion of the tax. For other business taxes, the adjustment was 20 percent. In cases where a state's personal income as a share of its GSP was below the national median (e.g., Alaska, Delaware and the District of Columbia), these amounts were reduced ratably.

Fourth, in states that imposed very high taxes on capital, those taxes may not be borne entirely by capital owners and may instead be shifted back to wages or forward to consumers. To account for this effect we computed, roughly, the total amount of corporate income and capital taxes and non-residential business property taxes as a share of output for several different types of industries: mining, timber, national market agriculture, national market financial activities, national market hotel activities, other national market activities, tourist activities, and other domestic market activities. These shares were based on the input-output model. In cases where one of these computed taxes was significantly above the national median, we assigned the excess to either in-state wages or in- and out-of-state consumption depending on the type of activity.
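The over-median test can be illustrated with a rough sketch; the tax amounts, output and median share below are invented.

```python
# Sketch: compute an industry group's capital taxes as a share of its
# output and treat only the excess above the national median share as
# shifted off capital (to wages or consumers, depending on the activity).

def shifted_excess(capital_taxes, output, median_share):
    """Dollars of capital tax shifted to wages or consumption."""
    share = capital_taxes / output
    if share <= median_share:
        return 0.0
    return (share - median_share) * output

# An industry paying 5 percent of output in capital taxes against a
# 3 percent national median shifts 2 percent of output:
excess = shifted_excess(50.0, 1000.0, 0.03)   # about 20.0
```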

Fifth, sales and excise taxes paid by businesses were divided between taxes paid by industries principally engaged in producing output sold in national markets and those selling in domestic markets. Taxes on domestic market items were assumed to be borne by the residents of each state (except for amounts paid by visitors) according to their share of total consumption. Taxes on national market items were assigned to national consumption, with an adjustment to reflect a proportion (about 15 percent) assumed to be retained in-state. These retention factors were adjusted downward in states where personal income as a share of state GSP was below the national median.
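The market-based split can be sketched as below, using invented dollar amounts; the 15 percent retention figure follows the text, but is applied here only for illustration.

```python
# Sketch: domestic-market business sales taxes stay with resident
# consumers; national-market taxes are exported except for an assumed
# in-state retention share.

IN_STATE_RETENTION = 0.15   # illustrative, per the "about 15 percent" in the text

def allocate_business_sales_tax(domestic_tax, national_tax, retention=IN_STATE_RETENTION):
    """Split business sales taxes into in-state and exported burdens."""
    in_state = domestic_tax + national_tax * retention
    exported = national_tax * (1.0 - retention)
    return in_state, exported

in_state, exported = allocate_business_sales_tax(100.0, 200.0)
# in-state burden: 100 + 200 * 0.15 = 130; exported: 200 * 0.85 = 170
```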

Finally, in states that imposed abnormally high sales and excise taxes on national market activities, those taxes may not entirely be borne by consumers because of competitive factors. Instead, they may be shifted back to wages or capital. We computed total national market business sales and excise taxes as a share of each state's national market GSP and where these national market business taxes were significantly above the national median, we assigned 50 percent of the excess to in-state wages and the remaining half to capital. The latter amounts were allocated according to our rules for allocating capital.
References


Armstrong, J., "An Evaluation of Statistical Matching Methods", Working Paper no. BSMD 90-003E, Methodology Branch, Statistics Canada, Ottawa, 1989.

Barr, R.S. and J.S. Turner, "A New, Linear Programming Approach to Microdata File Merging", in 1978 Compendium of Tax Research, Office of Tax Analysis, U.S. Department of Treasury, Washington, DC.

Bradford, D. F., Editor, Distributional Analysis of Tax Policy, American Enterprise Institute Press, Washington, DC, 1995.

Citro, C.F. and E.A. Hanushek, Editors, Improving Information for Social Policy Decisions: The Uses of Microsimulation Modeling, Vol II, National Research Council, Washington, DC, 1991.

Cohen, M.L., "Statistical Matching and Microsimulation Models", in Citro, C.F. and E. A. Hanushek, Improving Information for Social Policy Decisions: The Uses of Microsimulation Modeling, Vol II, National Research Council, Washington, DC, 1991.

Kotlikoff, L.J. and L.H. Summers, "Tax Incidence", in Auerbach, A.J. and M. Feldstein, eds., Handbook of Public Economics, vol. 2., (Amsterdam: North Holland, 1985).

Miller, R.E. and P.D. Blair, Input-Output Analysis: Foundations and Extensions, Englewood Cliffs, NJ: Prentice-Hall, 1985.

Minnesota Department of Revenue, 1995 Minnesota Tax Incidence Study: Who Pays Minnesota's Household and Business Taxes?, Tax Research Division, March 1995.

U.S. Congress, Joint Committee on Taxation, Methodology and Issues in Measuring Changes in the Distribution of Tax Burdens, (JCS-7-93), Washington, DC, June 14, 1993.

U.S. Department of Commerce, Bureau of Economic Analysis, Regional Multipliers: A User Handbook for the Regional Input-Output Modeling System (RIMS II), Second Edition, Washington, DC: U.S. Government Printing Office, May 1992.
Notes


1. This is an updated and revised version of a paper presented at the National Tax Association's 89th Annual Conference on Taxation in November of 1996, by Michael P. Ettlinger of the Institute on Taxation and Economic Policy, and John F. O'Hare of the Urban Institute. The title of the original paper was "Revenue and Incidence Analysis of State and Local Tax Systems: Tools, Technique and Tradition."

2. Our procedure is very similar to the approach reported in Armstrong (1989) and performs quite well in practice (Cohen, 1991).

3. For homeowners, shelter costs did not include mortgage payments because of some concern about the reliability of the data on the CES.

4. In our final imputations, SUR was not necessary since we used the same set of regressors in each of the equations, but we needed estimates of the variance-covariance matrix as explained below.

5. Two major places where this was important were owner-occupied housing, where NIPA imputes the rental value of homes, and medical expenditures, where we impute only unreimbursed expenditures.

6. For example, U.S. Department of Energy data on energy consumption by state were used. Further adjustment to such data is needed to account for large numbers of tourists visiting the state.

7. The original GSP data contained detail on 61 industries while the Benchmark I-O table had detail on 97 separate industry groupings. Because of overlap in the definitions of several of the industries in the two datasets, our final set of industries totaled 49.

8. An alternative method of calculating the location coefficients is to use wages to represent the intensity of each industry's operation. We tried both approaches and found only minor differences.