LR moment conditions have reduced bias and so are important when the first step is machine learning. We derive LR moment conditions for dynamic discrete choice based on first step machine learning estimators of conditional choice probabilities.

We provide simple and general asymptotic theory for LR estimators based on sample splitting. This theory uses the additive decomposition of LR moment conditions into an identifying condition and a first step influence adjustment. Our conditions require only mean square consistency and a few (generally either one or two) readily interpretable rate conditions.

LR moment functions have the advantage of being less sensitive to first step estimation. Some LR moment functions are also doubly robust, meaning they continue to hold if one first step is incorrect. We give novel classes of doubly robust moment functions and characterize double robustness. For doubly robust estimators our asymptotic theory only requires one rate condition.
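As a hedged illustration of double robustness (not one of the paper's new moment functions), consider the textbook augmented inverse probability weighting moment for a mean when the outcome is missing at random. The population, the names `M0`, `PI0` and `expected_moment`, and all numbers below are assumptions chosen so that the expectation can be computed exactly rather than simulated.

```python
from fractions import Fraction as F

# Hypothetical population: X is binary, Y is observed only when D = 1,
# and D is independent of Y given X (missing at random).
P_X = {0: F(1, 2), 1: F(1, 2)}            # P(X = x)
M0 = {0: F(1), 1: F(3)}                   # true E[Y | X = x]
PI0 = {0: F(1, 4), 1: F(3, 4)}            # true P(D = 1 | X = x)
THETA0 = sum(P_X[x] * M0[x] for x in P_X)  # target parameter E[Y]


def expected_moment(mu, pi):
    """Exact E[mu(X) + D * (Y - mu(X)) / pi(X)] - theta0 for candidate first steps mu, pi."""
    total = F(0)
    for x, px in P_X.items():
        # Under missing at random, E[D * (Y - mu(x)) | X = x] = PI0(x) * (M0(x) - mu(x)).
        total += px * (mu[x] + PI0[x] * (M0[x] - mu[x]) / pi[x])
    return total - THETA0


wrong_mu = {0: F(0), 1: F(0)}             # misspecified outcome regression
wrong_pi = {0: F(1, 2), 1: F(1, 2)}       # misspecified selection probability

bias_wrong_mu = expected_moment(wrong_mu, PI0)        # only the regression is wrong
bias_wrong_pi = expected_moment(M0, wrong_pi)         # only the probability is wrong
bias_both_wrong = expected_moment(wrong_mu, wrong_pi)  # both first steps are wrong
```

In this toy population the moment has mean zero when either first step is correct, and is biased only when both are wrong, which is the sense in which a moment function "holds if one first step is incorrect".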

We use data from the English Longitudinal Study of Ageing, which surveyed a representative sample of the English household population aged 50 and over between 2002–03 and 2014–15, and the ONS 2014-based life tables for England and Wales.

**Subjective expectations of survival**

**Modern surveys ask individuals about their probability of survival to specific older ages. In only a small proportion of cases is there clear evidence that these questions are not understood.** 98% of individuals gave an answer to a question asking their chances of surviving to older ages and of these just 14% – i.e. fewer than one-in-six – showed clear evidence of misunderstanding (e.g. by reporting no chance of death in the coming 10-year period).

**Individuals’ stated beliefs about their probability of survival are correlated with known risk factors such as smoking and the age that their parents died.** Those who currently smoke report on average 6–8 percentage points lower chance of surviving to an age 11–15 years ahead than do people who have never smoked. Those whose mother died at age 85 or older report on average 5–7 percentage points higher chance of surviving to an age 11–15 years ahead than do those whose mother died aged 60–64.

**Beliefs about probability of survival are also correlated with the individual’s actual age of death and respond to new diagnoses of health conditions.** Those reporting a 10% or less chance of survival to an age 11–15 years ahead were more than twice as likely to die in the following 10 years than those who reported a 50% or greater chance of survival. A new cancer diagnosis was associated with a 5 percentage point reduction in the stated probability of surviving to an age 11–15 years ahead.

**Comparing subjective expectations with life table estimates and mortality data**

**Relative to life tables, individuals from a range of ages and birth cohorts underestimate their chances of survival to ages 75, 80 and 85, on average.** Those in their 50s and 60s underestimate their chances of survival to age 75 by around 20 percentage points and to 85 by around 5 to 10 percentage points. For example, men born in the 1940s who were interviewed at age 65 reported a 65% chance of making it to age 75, whereas the official estimate was 83%. For women, the equivalent figures were 65% and 89%.

**Individuals in their late 70s and 80s are, on average, optimistic about surviving to ages 90, 95 and above**. This optimism becomes larger at older ages (10–15 percentage points when looking at age 95) and is larger for men than for women, amongst those born in the 1920s and 1930s. For example, men born in the 1930s who were interviewed at age 80 reported a 32% chance of making it to age 95, whereas the official estimate was 17%. For women, the equivalent figures were 37% and 24%.

**Figure 1.1. Comparing subjective reports and “objective” life table estimates of survival probabilities (for men born 1930-39)**

**Subjective survival curves, estimated using individuals’ stated survival expectations, capture the general patterns in expectations.** These show growing pessimism relative to official life tables for ages up to the mid 70s before turning to optimism from the late 90s onwards.

**Actual survival probabilities differ significantly according to individuals’ education, wealth and marital status.** Women aged 60 in the bottom household wealth quintile had a 65% chance of surviving to age 80, compared with 87% for those in the top quintile, based on actual mortality data.

**Subjective survival curves reflect differences in actual mortality rates between groups to varying degrees.** Widows and widowers aged 60 show the greatest survival pessimism. While their estimated chances of surviving to age 80 were 77% and 67%, respectively, their subjective survival curves implied a 49% and 39% chance, respectively, a gap of almost 30 percentage points in both cases.

**Subjective expectations and economic behaviour**

**Survival pessimism is a potential driver of the unpopularity of annuities.** An annuity priced according to average survival chances should represent a fair deal (or better) for around half of individuals. But given individuals’ own survival expectations, around two-thirds of individuals in their 60s would perceive an annuity priced according to average survival chances as offering a less than fair deal.
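The notion of a "fair deal" can be made concrete with a simple expected present value calculation. The sketch below is only illustrative: the function names, the five-year horizon, the zero discount rate and both survival curves are assumptions, not figures or methods from the report.

```python
def expected_present_value(annual_payment, survival_probs, discount_rate=0.0):
    """Sum over years t of payment * P(alive in year t) / (1 + r)^t."""
    return sum(
        annual_payment * p / (1.0 + discount_rate) ** t
        for t, p in enumerate(survival_probs, start=1)
    )


def fair_annual_payment(price, survival_probs, discount_rate=0.0):
    """Annual payment at which expected discounted payments equal the price."""
    return price / expected_present_value(1.0, survival_probs, discount_rate)


def moneys_worth(price, annual_payment, survival_probs, discount_rate=0.0):
    """Expected discounted payments per unit of price; 1.0 is a fair deal."""
    return expected_present_value(annual_payment, survival_probs, discount_rate) / price


# Hypothetical probabilities of being alive 1..5 years after purchase.
average_curve = [0.95, 0.90, 0.80, 0.65, 0.45]      # used by the provider to set the price
pessimistic_curve = [0.90, 0.80, 0.65, 0.45, 0.25]  # the buyer's own (lower) beliefs

payment = fair_annual_payment(100.0, average_curve)
value_at_average_beliefs = moneys_worth(100.0, payment, average_curve)
value_at_own_beliefs = moneys_worth(100.0, payment, pessimistic_curve)
```

An annuity that is exactly fair at average survival chances has a money's worth of one under the average curve, but below one for anyone whose own survival beliefs lie below that curve.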

**Deferral of the state pension, a choice analogous to annuity purchase, is rarely taken up despite being offered at a favourable rate.** While individuals are roughly twice as likely to defer the state pension if it represents a ‘fair deal’ given their survival expectations, the overall level of deferral is sufficiently low that we cannot make strong statements about this relationship.

**Optimism about survival at the very oldest ages may lead to reluctance to spend remaining wealth** if an individual survives through their 80s and into their 90s or beyond.

**As individuals are given more control over saving for retirement and use of accumulated wealth, the divergence between subjective expectations and official projections of survival is a concern.**

- Survival pessimism may mean that individuals save less during working life, and spend more in the earlier years of retirement, than they would, given their actual survival chances.

- The risk of this faster-than-optimal spending down of wealth, combined with optimism about survival at the very oldest ages, means that if individuals survive through their 80s and into their 90s, they may not only have relatively low levels of wealth, but may then be reluctant to spend this, with negative consequences for their living standards.

This is the first draft of a paper examining the distributional impacts of VAT exemptions and reduced rates and direct cash (and near-cash) transfer schemes in a series of low and middle income (LMIC) countries. All results presented are preliminary; it is being shared in order to elicit comments and provide early sight of findings we consider robust.

First, we establish that the K-ML estimator is consistent and asymptotically normal for any K. This complements findings in Aguirregabiria and Mira (2007), who focus on K = 1 and K large enough to induce convergence of the estimator. Furthermore, we show that the asymptotic variance of the K-ML estimator can exhibit arbitrary patterns as a function of K.

Second, we establish that the K-MD estimator is consistent and asymptotically normal for any K. For a specific weight matrix, the K-MD estimator has the same asymptotic distribution as the K-ML estimator. Our main result provides an optimal sequence of weight matrices for the K-MD estimator and shows that the optimally weighted K-MD estimator has an asymptotic distribution that is invariant to K. This new result is especially unexpected given the findings in Aguirregabiria and Mira (2007) for K-ML estimators. Our main result implies two new and important corollaries about the optimal 1-MD estimator (derived by Pesendorfer and Schmidt-Dengler (2008)). First, the optimal 1-MD estimator is optimal in the class of K-MD estimators for all K. In other words, additional policy iterations do not provide asymptotic efficiency gains relative to the optimal 1-MD estimator. Second, the optimal 1-MD estimator is at least as asymptotically efficient as any K-ML estimator for all K.

The approach relies on moment conditions that have an additional orthogonality property with respect to nuisance parameters. Moreover, estimation of high-dimensional nuisance parameters is carried out via new pivotal procedures. To achieve simultaneously valid confidence regions, we use a multiplier bootstrap procedure to compute critical values and establish its validity.

This is an updated version of a previous working paper; see here.

We propose two distinct QGMs. First, Conditional Independence Quantile Graphical Models (CIQGMs) characterize conditional independence at each quantile index, revealing the distributional dependence structure. Second, Prediction Quantile Graphical Models (PQGMs) characterize the best linear predictor under asymmetric loss functions. A key difference between those models is the (non-vanishing) misspecification between the best linear predictor and the conditional quantile functions.

We also propose estimators for those QGMs. Due to high-dimensionality, the two distinct QGMs require different estimators. The estimators are based on high-dimensional techniques including (a continuum of) L1-penalized quantile regressions (and low biased equations), which allow us to handle the potentially large number of variables. We build upon a recent literature to obtain new results for valid choice of the penalty parameters, rates of convergence, and confidence regions that are simultaneously valid.
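The asymmetric loss that underlies quantile prediction can be illustrated with the standard check (pinball) loss. The sketch below shows only the loss and a brute-force fit of a constant; it is not the L1-penalized estimator proposed in the paper, and the function names and data are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss: tau * u for u >= 0 and (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(ys, tau):
    """Observation that minimises the total check loss, i.e. an empirical tau-quantile."""
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))


ys = [1, 2, 3, 4, 100]
median_fit = best_constant(ys, 0.5)   # symmetric loss picks the median
upper_fit = best_constant(ys, 0.9)    # heavier penalty on under-prediction picks the upper tail
```

Raising `tau` penalises under-prediction more heavily than over-prediction, which pulls the fitted value towards the upper tail; a penalized quantile regression replaces the constant with a linear index and adds an L1 penalty on its coefficients.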

We illustrate how to use QGMs to quantify tail interdependence (instead of mean dependence) between a large set of variables, which is relevant in applications concerned with extreme events. We show that the associated tail risk network can be used for measuring systemic risk contributions. We also apply the framework to study international financial contagion and the impact of market downside movement on the dependence structure of assets' returns.

Key findings of the research include:

**Leaving education when the economy is weak has a direct impact on employment and pay at least five years afterwards**. Young adults five years after leaving education are still 1 percentage point less likely to be employed if they started out when the unemployment rate was 10% rather than 6% (unemployment has risen by 4 percentage points on average during the last three recessions). The average negative impact on the pre-tax earnings of young adults and their partners (if they have one) five years after leaving education is 4% – or £1,100 per year. These effects have faded away almost completely after a further five years.

**Some of the impact is offset by lower taxes and higher benefits**. Once you account for taxes and benefits, the effect of leaving education when unemployment is high on the combined incomes of young adults and their partners (if they have one) five years later falls from 4% to 2%.

**Another important potential safety net is that most people live with their parents in the first few years after leaving education**, irrespective of economic conditions. This is particularly important in the years immediately after leaving education, when the effects on employment and pay are the largest: between 2010 and 2015, 74% of young adults lived with their parents a year after leaving education, 54% three years in and 38% five years in.

**As a result, starting working life during a recession has little impact on the total resources available to the households that young adults live in, on average**. Young adults’ net household incomes (including the incomes of all members of their household – most importantly parents) are hit by only 1% five years after leaving education. While there is no guarantee that resources in the household are always shared equally, at the very least this implies that many parents have the capacity to provide an important safety net for their children after they leave education.

**The safety net provided by parents is particularly significant because lower-educated people are both most affected by starting out in a recession and most likely to live with their parents**. Five years after joining the labour market, the pre-tax earnings of young adults and their partners are 4% lower for those who left education at 16 but there is no effect on those who left when they were 19 or older. At the same five-year stage, 60% of those who left education at 16 still live with their parents, compared with 21% of those who left when they were 19 or older.

**For those young adults not living with parents, there are significant lasting effects on overall incomes and household spending**. For this group, starting working life when the economy is weak causes household net incomes and spending to be 2–3% lower even five years later.

The extreme points of the calibrated projection confidence interval are obtained by extremizing the value of the component (or function) of interest subject to a proper relaxation of studentized sample analogs of the moment (in)equality conditions. The degree of relaxation, or critical level, is calibrated so that the component (or function) of the parameter vector θ, not θ itself, is uniformly asymptotically covered with prespecified probability. This calibration is based on repeatedly checking feasibility of linear programming problems, rendering it computationally attractive.

Nonetheless, the program defining an extreme point of the confidence interval is generally nonlinear and potentially intricate. We provide an algorithm, based on the response surface method for global optimization, that approximates the solution rapidly and accurately. The algorithm is of independent interest for inference on optimal values of stochastic nonlinear programs. We establish its convergence under conditions satisfied by canonical examples in the moment (in)equalities literature.

Our assumptions and those used in the leading alternative approach (a profiling based method) are not nested. An extensive Monte Carlo analysis confirms the accuracy of the solution algorithm and the good statistical as well as computational performance of calibrated projection, including in comparison to other methods.

An updated version of this working paper can be accessed here.

We provide a small Monte Carlo experiment to study the estimators' finite sample properties and an application to the estimation of gasoline demand functions.

It is shown that whenever g is Lipschitz, though not necessarily differentiable, the posterior distribution of g(theta) and the bootstrap distribution of theta_n coincide asymptotically. One implication is that Bayesians can interpret bootstrap inference for g(theta) as approximately valid posterior inference in a large sample. Another implication, built on known results about bootstrap inconsistency, is that credible sets for a nondifferentiable parameter g(theta) cannot be presumed to be approximately valid confidence sets (even when this relation holds true for theta).
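To fix ideas, the sketch below computes a nonparametric bootstrap distribution for a Lipschitz but nondifferentiable transformation of a sample mean. Everything in it is an assumption made for illustration: the choice g(theta) = max(theta, 0), the use of the sample mean as the estimator, the function names and the data.

```python
import random
import statistics


def g(theta):
    """A Lipschitz map with a kink at zero: g(theta) = max(theta, 0)."""
    return max(theta, 0.0)


def bootstrap_g(sample, n_boot=2000, seed=0):
    """Draws of g(mean of a resample), resampling the data with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    draws = []
    for _ in range(n_boot):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        draws.append(g(statistics.fmean(resample)))
    return draws


# Hypothetical data whose mean is close to the kink of g.
data = [-1.0, 0.5, 0.2, -0.3, 0.1]
draws = bootstrap_g(data, n_boot=500, seed=1)
share_at_kink = sum(d == 0.0 for d in draws) / len(draws)
```

When the estimate is near the kink, the bootstrap draws pile up at zero, which is the kind of situation in which, according to the abstract, bootstrap or credible sets cannot be presumed to be valid confidence sets.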

This working paper is an updated version of W16/17.

In this paper we use panel data methods in an attempt to strip out the impact of forestalling, and estimate the underlying taxable income elasticity of those affected by the 50% tax rate, and thus the revenue effect of the reform. In particular, we develop a new method of correcting for forestalling by averaging income over the (three year) period during which forestalling is likely to have taken place. This approach yields an estimate of the taxable income elasticity of 0.31, lower than earlier estimates by HMRC (2012) based on the same reform (but a different method), and consistent with the 50% tax rate raising around £1 billion a year (relative to the current 45% rate).
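For orientation, a taxable income elasticity is conventionally defined as the percentage change in taxable income divided by the percentage change in the net-of-tax rate (one minus the marginal rate). The sketch below applies that definition in logs. The function names, the income level and the marginal rates are illustrative assumptions; it is not the panel method or the data used in the paper.

```python
import math


def taxable_income_elasticity(income_before, income_after, rate_before, rate_after):
    """Change in log taxable income divided by change in log net-of-tax rate."""
    d_log_income = math.log(income_after) - math.log(income_before)
    d_log_net_of_tax = math.log(1.0 - rate_after) - math.log(1.0 - rate_before)
    return d_log_income / d_log_net_of_tax


def implied_income(income_before, elasticity, rate_before, rate_after):
    """Taxable income after the rate change implied by a constant elasticity."""
    net_of_tax_ratio = (1.0 - rate_after) / (1.0 - rate_before)
    return income_before * net_of_tax_ratio ** elasticity


# Hypothetical taxpayer facing a rise in the marginal rate from 40% to 50%.
low_response = implied_income(200_000.0, 0.31, 0.40, 0.50)
high_response = implied_income(200_000.0, 0.71, 0.40, 0.50)
recovered = taxable_income_elasticity(200_000.0, low_response, 0.40, 0.50)
```

A larger elasticity implies a larger fall in reported taxable income for the same rate rise, which is why the revenue estimate is so sensitive to the elasticity chosen.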

Three things are worth noting, however. First, estimated elasticities are very sensitive to changes in specification, and to the inclusion or exclusion of a small number of individuals with extremely high (and volatile) incomes. Second, at the same time the 50% rate was introduced, restrictions were placed on the amount of pension contributions some taxpayers could deduct from their taxable incomes (in advance of more general restrictions in place from 2011–12). Those forced to reduce their pension contributions (or unable to increase them) would have higher taxable income than they would have if these restrictions were not put in place: this may downwardly bias our estimate of the taxable income elasticity. Indeed, our estimates of the elasticity of broad income (before personal pension contributions are deducted) are higher – 0.71 using the same method. Finally, it is worth noting that the panel approach adopted here, by focusing on individuals who are observed both pre- and post-reform, excludes some forms of response (such as migration). Taken together, these three issues imply that higher figures for the taxable income elasticity (including those in HMRC, 2012) are plausible. Thus it is also plausible that the re-introduction of the 50% rate could reduce revenues somewhat: an elasticity of 0.71 would imply a reduction of around £1.75 billion if none of the lost income tax or NICs revenues were recouped from other tax bases or in other time periods.

We also explore in more detail the nature of the response to the 50% tax rate. Two findings stand out. First, when we restrict our sample to those just around the £150,000 threshold, we consistently estimate the taxable income elasticity to be between 0.1 and 0.2, implying that behavioural response to the higher tax rate is concentrated among those with the very highest incomes. Second, we find little evidence that individuals responded to the higher tax rate by increasing use of tax deductions. However, this must be a tentative conclusion as not all deductible items are recorded on the tax return data available. Particularly relevant in this context is the possibility that owners of closely-held incorporated businesses chose to respond to the 50% tax rate by retaining income in their business, for extraction at a later date (perhaps in the form of capital gains rather than dividends). Analysis of such responses would require the linking of personal and corporate income tax returns, which is a subject for future research.

This forestalling hampered an attempt by HMRC (the UK’s tax authority) to estimate the revenue effects of the tax rise (HMRC, 2012) using incomplete data from the first year of the higher tax rate (2010–11). This analysis used an aggregate difference-in-difference approach. In this paper we update this analysis, using more complete data on the first year following the reform (2010–11) and an additional year of data (2011–12) that was unavailable when HMRC conducted their analysis. Using a similar method to HMRC (2012), we estimate an elasticity of around 0.31 based on the response in 2010–11, and 0.83 based on the response in 2011–12.

We next refine HMRC (2012)’s methodology for estimating how much of the forestalled income came from 2010–11 and how much from subsequent years. We find that all else equal, HMRC's method for estimating from which years forestalled income came – which suggests that around 70% came from 2010–11 – is likely to lead to overestimates of how much came from these initial post-reform years, and hence underestimate the underlying taxable income elasticity. An alternative method that better accounts for these issues suggests around 45% was unwound in 2010–11, and around one-sixth unwound in 2011–12, implying an elasticity of 0.58 based on the response in 2010–11 and 0.95 based on the response in 2011–12. These would both imply negative revenues from the increase in the top tax rate to 50%.

Finally, we show the sensitivity of HMRC (2012)’s estimates to changes in the specification of the model used to estimate the counterfactual incomes of the group affected by the 50% tax rate. We find that relatively small changes to the specification yield very different results, with higher taxable income elasticity estimates frequently in excess of unity. The range of reasonable central estimates that the UK’s Office for Budget Responsibility could use to estimate the revenue effects of changes to the UK’s top income tax rate is therefore wide.

However, it is important to sound three notes of caution here. First, if individuals anticipated (correctly) the 50% rate being reduced in later years (or were able to respond to the announcement made towards the end of the 2011–12 tax year that it would be reduced to 45% in 2013–14), they may also have delayed receiving income. We still obtain higher taxable income elasticity estimates than HMRC (2012) when we assume that individuals were able to delay as much income from 2011–12 to 2013–14 as they were able to bring forward from 2011–12 to 2009–10, but it may be the case that delaying income is easier than bringing it forward. If this were the case, more of the overall response to the 50% tax rate may represent temporary timing effects as opposed to underlying response, which would imply that the estimates of the underlying taxable income elasticity may be overestimates. Second, some behavioural responses, such as additional occupational pension contributions, or retention of income in businesses, while reducing income tax revenues in the short-term, generate at least some revenue in the longer-term. Third, the estimate of the counterfactual is very imprecisely defined, meaning that the estimates from the different specifications are not statistically significantly different from each other, or indeed from zero. The central estimates of HMRC (2012) are therefore still very much within the margin of error of our estimates. There is therefore still significant uncertainty in both directions around HMRC’s estimates of the taxable income elasticity of high earners, and hence the revenue effects of the 50% rate.

intergenerational income persistence.

may improve life expectancy, but also impose serious short term risks; reducing class sizes may improve performance of good students, but not help weaker ones or vice versa. Quantile regression methods can help to explore these heterogeneous effects. Some recent developments in quantile regression methods are surveyed below.

One of the main objectives of empirical analysis of experiments and quasi-experiments is to inform policy decisions that determine the allocation of treatments to individuals with different observable covariates. We study the properties and implementation of the Empirical Welfare Maximization (EWM) method, which estimates a treatment assignment policy by maximizing the sample analog of average social welfare over a class of candidate treatment policies. The EWM approach is attractive in terms of both statistical performance and practical implementation in realistic settings of policy design. Common features of these settings include: (i) feasible treatment assignment rules are constrained exogenously for ethical, legislative, or political reasons, (ii) a policy maker wants a simple treatment assignment rule based on one or more eligibility scores in order to reduce the dimensionality of individual observable characteristics, and/or (iii) the proportion of individuals who can receive the treatment is a priori limited due to a budget or a capacity constraint. We show that when the propensity score is known, the average social welfare attained by EWM rules converges at least at n^(-1/2) rate to the maximum obtainable welfare uniformly over a minimally constrained class of data distributions, and this uniform convergence rate is minimax optimal. We examine how the uniform convergence rate depends on the richness of the class of candidate decision rules, the distribution of conditional treatment effects, and the lack of knowledge of the propensity score. We offer easily implementable algorithms for computing the EWM rule and an application using experimental data from the National JTPA Study.
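A minimal sketch of the idea, assuming a known propensity score and a deliberately simple class of candidate policies (treat everyone whose single eligibility score is at or above a cutoff, or treat no one). The function names, the exhaustive search over cutoffs and the toy data are illustrative assumptions, not the algorithms or data of the paper.

```python
def empirical_welfare(data, treat_rule, propensity):
    """Sample analog of average welfare under treat_rule, via inverse propensity weighting.

    data is a list of (score, treated, outcome) triples with treated in {0, 1}.
    """
    total = 0.0
    for x, d, y in data:
        e = propensity(x)
        if treat_rule(x):
            total += y * d / e              # reweighted outcome of treated units
        else:
            total += y * (1 - d) / (1 - e)  # reweighted outcome of control units
    return total / len(data)


def ewm_cutoff(data, propensity):
    """Cutoff c maximising empirical welfare over rules 'treat if score >= c'.

    float('inf') stands for the rule that treats no one.
    """
    candidates = sorted({x for x, _, _ in data}) + [float("inf")]
    return max(
        candidates,
        key=lambda c: empirical_welfare(data, lambda x: x >= c, propensity),
    )


# Hypothetical experiment with treatment probability 0.5 at every score:
# treatment lowers outcomes at scores 1 and 2 and raises them at scores 3 and 4.
toy_data = [
    (1, 1, 0.0), (1, 0, 1.0),
    (2, 1, 0.0), (2, 0, 1.0),
    (3, 1, 2.0), (3, 0, 1.0),
    (4, 1, 2.0), (4, 0, 1.0),
]
best_cutoff = ewm_cutoff(toy_data, lambda x: 0.5)
best_welfare = empirical_welfare(toy_data, lambda x: x >= best_cutoff, lambda x: 0.5)
```

Restricting the search to a small class of rules is what makes the exogenous constraints described above (simple eligibility scores, capacity limits) easy to impose: a candidate that violates them is simply left out of `candidates`.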

This April 2017 version is an updated version of the January 2017 version. The original version of the working paper is available here.

only under dense graph sequences.

Watch IFS researcher, Kate Smith, talking about the design of alcohol taxes.

A more recent version of this working paper is available here.

agents present choice options based on quality, but as agents of health authorities also consider their financial implications.

This is an updated version of W15/22 New joints: private providers and rising demand in the English National Health Service.

We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Finally, of independent interest is our introduction of a new, practical version of Lepski’s method, which computes the optimal, non-conservative resolution levels via a Gaussian multiplier bootstrap method.
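The flavour of a Gaussian multiplier bootstrap band can be conveyed with a simplified sketch for a kernel density estimator on a finite grid with a fixed bandwidth. It is a sketch under strong assumptions: the Gaussian kernel, the grid, the fixed bandwidth and all names are illustrative, smoothing bias is ignored, and neither the data-driven resolution choice nor the Lepski-type procedure of the paper is implemented.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def multiplier_bootstrap_band(sample, grid, bandwidth, alpha=0.1, n_boot=500, seed=0):
    """Uniform band over the grid from the supremum of a studentized multiplier process."""
    rng = random.Random(seed)
    n = len(sample)
    # k[j][i] is the scaled kernel weight of observation i at grid point j.
    k = [[gaussian_kernel((x - obs) / bandwidth) / bandwidth for obs in sample] for x in grid]
    f_hat = [sum(row) / n for row in k]
    sigma = [math.sqrt(sum((v - f) ** 2 for v in row) / n) for row, f in zip(k, f_hat)]

    sup_stats = []
    for _ in range(n_boot):
        mult = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. N(0, 1) multipliers
        sup_stats.append(max(
            abs(sum(m * (v - f) for m, v in zip(mult, row))) / (math.sqrt(n) * s)
            for row, f, s in zip(k, f_hat, sigma)
        ))
    sup_stats.sort()
    # Empirical (1 - alpha) quantile of the bootstrap suprema.
    crit = sup_stats[min(n_boot - 1, math.ceil((1.0 - alpha) * n_boot) - 1)]

    half_width = [crit * s / math.sqrt(n) for s in sigma]
    lower = [f - w for f, w in zip(f_hat, half_width)]
    upper = [f + w for f, w in zip(f_hat, half_width)]
    return f_hat, lower, upper, crit
```

The critical value comes from simulated suprema rather than from a limiting distribution, which is why a procedure of this kind can be used even when that limit does not exist or is unknown.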

an incomplete model of English auctions, improving on the pointwise bounds available till now. Application of many of the results of the paper requires no familiarity with random set theory.

We apply our method to analyze the distributional impact of insurance coverage on health care utilization and to provide a distributional decomposition of the racial test score gap. Our analysis generates new and interesting findings, and complements previous analyses that focused on mean effects only. In both applications, the outcomes of interest are discrete, rendering standard inference methods invalid for obtaining uniform confidence bands for quantile and quantile effects functions.

moment conditions continue to hold when one first step component is incorrect. Locally robust moment conditions also have smaller bias that is flatter as a function of first step smoothing, leading to improved small sample properties. Series first step estimators confer local robustness on any moment conditions and are doubly robust for affine moments, in the direction of the series approximation. Many new locally and doubly robust estimators are given here, including for economic structural models. We give simple asymptotic theory for estimators that use cross-fitting in the first step, including machine learning.

the young.

cycle, using a survey dataset from rural Tanzania. We find that adverse shocks during teenage years increase the probability of early marriages and early fertility among women.

The original version of the working paper, posted on 01 April, 2016, is available here.

Both the Smith Commission Agreement and the UK Government’s subsequent Command Paper, ‘An Enduring Settlement’ recognised that the devolution of fiscal powers has to be accompanied by the development of a new Fiscal Framework for Scotland.

Without such a framework there could be no fiscal devolution. It is essential in order to set out rules such as: how the Scottish Government’s block grant will be calculated in light of its new fiscal powers; what level of borrowing powers Scotland will have to enable it to deal with the additional economic risks and revenue volatility that it will face; the extent and scope of fiscal rules governing Scottish Government deficits and debt; arrangements for independent fiscal scrutiny, including fiscal forecasting; and arrangements for governing the increasingly complex interactions between Scottish and UK fiscal policy, including dispute resolution.

The Fiscal Framework is not part of the Scotland Bill: it is instead an agreement between the UK and Scottish governments (and therefore does not have the same legal standing as the Bill). It was finally published on 25 February 2016 after many months of negotiations between the two governments. The process of reaching agreement was protracted, and there were a number of contentious areas. But it seems the most significant area of disagreement was how the Scottish Government’s block grant should be adjusted to reflect its new powers.

The Smith Commission Agreement established that Scotland’s underlying block grant funding would continue to be determined by the Barnett Formula. But the Barnett-determined block grant would then have to be adjusted to reflect the new powers. On the one hand, the grant would have to be reduced to reflect the transfer of tax revenues from the UK to the Scottish Government, while on the other, an addition would need to be made to reflect the transfer of new welfare spending responsibilities to the Scottish Government.

The Smith Commission Agreement also established a number of high-level principles which it felt the Fiscal Framework should adhere to, and which were expected to govern the development of a proposal to adjust Scotland’s block grant. But, as we showed in our previous report, it is not possible to design a method for adjusting Scotland’s block grant that meets all of the Smith Commission principles simultaneously.

This inconsistency between the Smith principles was the main cause of the protracted negotiations between the two governments, and for several months it seemed likely to undermine the progress of the Scotland Bill. Each government interpreted the principles somewhat differently and chose to prioritise them differently, with the result that each favoured an alternative approach to adjusting Scotland’s block grant. Compromise was finally reached in February 2016, with an agreement on how to adjust the block grant for the next five years. While the mechanism chosen is complex and seems to blend elements of the UK and Scottish governments’ preferred approaches, ultimately it is the Scottish government’s approach that will determine the block grant available to Scotland during this period. After five years, an independent assessment will be carried out and negotiations will take place on how to adjust the block grant in the years beyond 2022.

This report reviews and appraises the Fiscal Framework Agreement, with a particular focus on this issue of block grant adjustment.

*The work was carried out jointly with authors at the ESRC Centre on Constitutional Change. The Centre is the hub for research on the UK’s changing constitutional relationships. Its fellows examine how the evolving relationships between governments and parliaments in London, Edinburgh, Cardiff, Belfast and Brussels impact on the polity, economy and society of the UK and its component nations.*

generous federal aid.

Moreover, the data suggest that the wife and the husband retire at the same time for a nonnegligible fraction of couples. Our approach takes as a starting point a stylized economic model that leads to a univariate generalized accelerated failure time model. The covariates of that generalized accelerated failure time model act as utility-flow shifters in the economic model. We introduce simultaneity by allowing the utility flow in retirement to depend on the retirement status of the spouse. The econometric model is then completed by assuming that the observed outcome is the Nash bargaining solution in that simple economic model. The advantage of this approach is that it includes independent realizations from the generalized accelerated failure time model as a special case, and deviations from this special case can be given an economic interpretation. We illustrate the model by studying the joint retirement decisions in married couples using the Health and Retirement Study. We provide a discussion of relevant identifying variation and estimate our model using indirect inference. The main empirical finding is that the simultaneity seems economically important. In our preferred specification the indirect utility associated with being retired increases by approximately 5% when one's spouse retires. The estimated model also predicts that the marginal effect of a change in the husbands' pension plan on wives' retirement dates is about 3.3% of the direct effect on the husbands'.

less entry into technologies regardless of a firm’s size.

The critical level is by construction smaller (in finite sample) than the one used if projecting confidence regions designed to cover the entire parameter vector. Hence, our confidence interval is weakly shorter than the projection of established confidence sets (Andrews and Soares, 2010), if one holds the choice of tuning parameters constant. We provide simple conditions under which the comparison is strict. Our inference method controls asymptotic coverage uniformly over a large class of data-generating processes. Our assumptions and those used in the leading alternative approach (a profiling-based method) are not nested. We explain why we employ some restrictions that are not required by other methods and provide examples of models for which our method is uniformly valid but profiling-based methods are not.

satisfy standard norm bounds, and (3) functions with unbounded domains. In all three cases we provide two kinds of results, compact embedding and closedness, which together allow one to show that parameter spaces defined by a ||·||

Using differential geometry and functional delta methods, we establish that the estimated sorted effects are consistent for the true sorted effects, and derive asymptotic normality and bootstrap approximation results, enabling construction of pointwise confidence bands (pointwise with respect to percentile indices). We also derive functional central limit theorems and bootstrap approximation results, enabling construction of simultaneous confidence bands (simultaneous with respect to percentile indices). The derived statistical results in turn rely on establishing Hadamard differentiability of the multivariate sorting operator, a result of independent mathematical interest.
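As a rough illustration of what a sorted effect is, the sketch below computes unit-level partial effects, sorts them, and reads off percentiles. The interaction model, the coefficient values and the function names are assumptions made for illustration only; they are not taken from the paper, and no inference step is attempted.

```python
import math

def partial_effects(xs, beta_d=1.0, beta_dx=0.5):
    """Partial effect of a binary regressor d at each covariate value x in the
    hypothetical model y = beta_d * d + beta_dx * d * x + error."""
    return [beta_d + beta_dx * x for x in xs]

def sorted_effect(effects, u):
    """u-th percentile of the effects (0 < u <= 1): sort them and return the
    smallest value whose empirical distribution function is at least u."""
    ordered = sorted(effects)
    k = max(0, math.ceil(u * len(ordered)) - 1)
    return ordered[k]

# Hypothetical covariate values; the effects differ across units because of
# the interaction term.
effects = partial_effects([0.0, 4.0, 2.0, 6.0])
percentile_curve = [sorted_effect(effects, u) for u in (0.25, 0.5, 0.75, 1.0)]
```

Plotting `percentile_curve` against the percentile index gives the kind of curve for which the abstract describes pointwise and simultaneous bands.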

Also available: Executive Summary

Click here to view accompanying sample size calculators for this paper.

Leading important special cases encompassed by the framework we study include: (i) Tests of shape restrictions for infinite dimensional parameters; (ii) Confidence regions for functionals that impose shape restrictions on the underlying parameter; (iii) Inference for functionals in semiparametric and nonparametric models defined by conditional moment (in)equalities; and (iv) Uniform inference in possibly nonlinear and severely ill-posed problems.

Find a Spanish language version of this working paper here.

Supplementary material for this paper is available here.

Supplementary material for this paper is available here.

We also analyze the properties of fixed effects estimators of functions of the data, parameters and individual and time effects including average partial effects. Here, we uncover that the incidental parameter bias is asymptotically of second order, because the rate of convergence of the fixed effects estimators is slower for average partial effects than for model parameters. The bias corrections are still effective in improving finite-sample properties.

View the supplementary document for this paper here.

For the case of discretely-valued covariates we present analog estimators and characterize their large sample properties. When the number of time periods (*T*) exceeds the number of random coefficients (*P*), identification is regular, and our estimates are *√N*-consistent. When *T* = *P*, our identification results make special use of the subpopulation of stayers - units whose regressor values change little over time - in a way which builds on the approach of Graham and Powell (2012). In this just-identified case we study asymptotic sequences which allow the frequency of stayers in the population to shrink with the sample size. One purpose of these “discrete bandwidth asymptotics” is to approximate settings where covariates are continuously-valued and, as such, there is only an infinitesimal fraction of exact stayers, while keeping the convenience of an analysis based on discrete covariates. When the mass of stayers shrinks with *N*, identification is irregular and our estimates converge at a slower than *√N* rate, but continue to have limiting normal distributions.

We apply our methods to study the effects of collective bargaining coverage on earnings using the National Longitudinal Survey of Youth 1979 (NLSY79). Consistent with prior work (e.g., Chamberlain, 1982; Vella and Verbeek, 1998), we find that using panel data to control for unobserved worker heterogeneity results in sharply lower estimates of union wage premia. We estimate a median union wage premium of about 9 percent and, in a more novel finding, substantial heterogeneity across workers. The 0.1 quantile of union effects is insignificantly different from zero, whereas the 0.9 quantile effect is over 30 percent. Our empirical analysis further suggests that, on net, unions have an equalizing effect on the distribution of wages.

Supporting material is available in a supplementary appendix here.

The amendments to the initial proposed reforms were made to make the tax change more ‘progressive’. We find that, measured as a proportion of income or expenditure, poorer households did gain most from the amendments, but that the cash-terms gains were much larger for households with high levels of income and expenditure. In other words, the reduction in tax take from the amendments was weakly targeted at poorer households; even simple universal cash transfers would have been much more beneficial to poor households. This shows the distributional case for zero rates of VAT on goods like food is weak, especially given the growing sophistication of cash transfer programmes, particularly in middle-income countries.

We then examine the efficiency implications of Mexico’s VAT rate structure. We find that deviations from uniformity have a notable effect on spending patterns, but very little effect on aggregate welfare and economic efficiency as estimated by a standard QUAIDS model of consumer demand. We then argue that economic informality may actually provide an efficiency reason for lower rates of tax on goods like food for which informal production and transactions seem to be much more prevalent. This may turn the typical arguments about differential VAT rates on their head. Rather than being justifiable on distributional grounds, but entailing an efficiency cost, the reverse may actually be true.

A version of this paper appeared in Spanish in the December 2014 issue of *Panorama Social*, available here.

Technical supporting material is available in a supplementary appendix here.

Supplementary material for this paper is available here.

This paper was presented at the 'Are you prepared for retirement?' conference.

Supplementary material for this paper is available here.

These findings will be presented at a briefing on 9 September, alongside several other pieces of work which shed light on how financial preparedness for retirement differs across cohorts and important differences within cohorts.

*This working paper was updated in May 2015.*

In this note, we point to a simple explanation that is fully consistent with rational behaviour on the part of Indian farmers. In computing the return on cows and buffaloes, the authors used data from a single year. Cows are assets whose return varies through time. In drought years, when fodder is scarce and expensive, milk production is lower and profits are low. In non-drought years, when fodder is abundant and cheaper, milk production is higher and profits can be considerably higher. The return on cows and buffaloes, like that of many stocks traded on Wall Street, is positive in some years and negative in others. We report evidence from three years of data on the return on cows and buffaloes in the district of Anantapur and show that in one of the three years returns are very high, while in drought years they are similar to the figures obtained by Anagol, Etang and Karlan (2013).

This paper is also published as part of the NBER working paper series no. 20304

We also analyze the properties of fixed effects estimators of functions of the data, parameters and individual and time effects including average partial effects. Here, we uncover that the incidental parameter bias is asymptotically of second order, because the rate of convergence of the fixed effects estimators is slower for average partial effects than for model parameters. The bias corrections are still useful for improving finite-sample properties.

This paper sets out the methodology, assumptions, and modelling specifications used to produce the report *The changing face of retirement* by Emmerson, Heald and Hood (2014), which aims to shed some light on how the demographic and financial circumstances of this group will change.

This paper is also published as part of the Inter-American Development Bank Working Paper series No. IDB-WP-527.

The commands clrbound, clr2bound, and clr3bound provide bound estimates that can be used directly for estimation or to construct asymptotically valid confidence sets. clrtest performs an intersection bound test of the hypothesis that a collection of lower intersection bounds is no greater than zero. The command clrbound provides bound estimates for one-sided lower or upper intersection bounds on a parameter, while clr2bound and clr3bound provide two-sided bound estimates based on both lower and upper intersection bounds. clr2bound uses Bonferroni’s inequality to construct two-sided bounds that can be used to perform asymptotically valid inference on the identified set or the parameter of interest, whereas clr3bound provides a generally tighter confidence interval for the parameter by inverting the hypothesis test performed by clrtest. More broadly, inversion of this test can also be used to construct confidence sets based on conditional moment inequalities as described in Chernozhukov et al. (2013). The commands include parametric, series, and local linear estimation procedures, and can be installed from within Stata by typing “ssc install clrbound”.
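The Bonferroni step mentioned above can be sketched in a few lines: two one-sided bounds, each at level 1 − α/2, jointly hold with probability at least 1 − α. The code is a simplified normal-approximation illustration with a single lower and a single upper bound; it is not the implementation used by clr2bound, and the function name, inputs and standard errors are hypothetical.

```python
from statistics import NormalDist

def bonferroni_two_sided(lower_est, lower_se, upper_est, upper_se, alpha=0.05):
    """Combine a one-sided lower bound and a one-sided upper bound, each at
    level 1 - alpha/2, into a two-sided interval with coverage >= 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return lower_est - z * lower_se, upper_est + z * upper_se

# Hypothetical bound estimates and standard errors.
low, high = bonferroni_two_sided(0.0, 1.0, 1.0, 1.0, alpha=0.05)
```

Splitting α equally between the two sides is the simplest choice and is typically conservative, which is consistent with the abstract's remark that inverting the test gives a generally tighter interval.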

the first stage and then the preference parameters in the second stage based on Manski's (1975, 1985) maximum score estimator using the choice data and first stage estimates. This setting can be extended to maximum score estimation with nonparametrically generated regressors. The paper establishes consistency and derives the rate of convergence of the two-stage maximum score estimator. Moreover, the paper also provides sufficient conditions under which the two-stage estimator is asymptotically equivalent in distribution to the corresponding single-stage estimator that assumes the first stage input is known. The paper also presents some Monte Carlo simulation results for finite-sample behavior of the two-stage estimator.

We propose two new specification tests, denoted Tests RS and RC, that achieve uniform asymptotic size control and dominate Test BP in terms of power in any finite sample and in the asymptotic limit. Test RC is particularly convenient to implement because it requires little additional work beyond the confidence set construction. Test RS requires a separate procedure to compute, but has the best power. The separate procedure is computationally easier than confidence set construction in typical cases.

In the second part of the paper, we present a generalization of the treatment effect framework to a much richer setting, where possibly a continuum of target parameters is of interest and the Lasso-type or post-Lasso type methods are used to estimate a continuum of high-dimensional nuisance functions. This framework encompasses the analysis of local treatment effects as a leading special case and also covers a wide variety of classical and modern moment-condition problems in econometrics. We establish a functional central limit theorem for the continuum of the target parameters, and also show that it holds uniformly in a wide range of data-generating processes *P*, with continua of approximately sparse nuisance functions. We also establish validity of the multiplier bootstrap for resampling the first order approximations to the standardized continuum of the estimators, and also establish uniform validity in *P*. We propose a notion of the functional delta method for finding limit distribution and multiplier bootstrap of the smooth functionals of the target parameters that is valid uniformly in *P*. Finally, we establish rate and consistency results for continua of Lasso or post-Lasso type methods for estimating continua of the (nuisance) regression functions, also providing practical, theoretically justified penalty choices. Each of these results is new and could be of independent interest.

A supplement to this paper can be downloaded here.

We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Furthermore, our approach is asymptotically honest at a polynomial rate: the error in coverage level converges to zero at a fast, polynomial speed (with respect to the sample size). In sharp contrast, the approach based on extreme value theory is asymptotically honest only at a logarithmic rate: the error converges to zero at a slow, logarithmic speed. Finally, of independent interest is our introduction of a new, practical version of Lepski's method, which computes the optimal, non-conservative resolution levels via a Gaussian multiplier bootstrap method.

*p* could be large in comparison to the sample size *n*, but only *s ≪ n* of them are needed to accurately describe the regression function. Our new methods are based on the instrumental median regression estimator that assembles the optimal estimating equation from the output of the post ℓ1-penalized median regression and post ℓ1-penalized least squares in an auxiliary equation. The estimating equation is immunized against non-regular estimation of the nuisance part of the median regression function, in the sense of Neyman. We establish that in a homoscedastic regression model, the instrumental median regression estimator of a single regression coefficient is asymptotically root-n normal uniformly with respect to the underlying sparse model. The resulting confidence regions are valid uniformly with respect to the underlying model. We illustrate the value of uniformity with Monte-Carlo experiments which demonstrate that standard/naive post-selection inference breaks down over large parts of the parameter space, and the proposed method does not. We then generalize our method to the case where *p*₁ ≫ *n* regression coefficients are of interest in a non-smooth Z-estimation framework with approximately sparse nuisance functions, containing median regression with a single target regression coefficient as a very special case. We construct simultaneous confidence bands on all *p*₁ coefficients, and establish their uniform validity over the underlying approximately sparse model.

These technical tools allow us to contribute to the series literature, specifically the seminal work of Newey (1997), as follows. First, we weaken considerably the condition on the number k of approximating functions used in series estimation from the typical k²/n → 0 to k/n → 0, up to log factors, which was available only for spline and local polynomial partition series before. Second, under the same weak conditions we derive L2 rates and pointwise central limit theorem results when the approximation error vanishes. Under an incorrectly specified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates and functional central limit theorems that hold whether or not the approximation error vanishes. That is, we derive the strong approximation for the entire estimate of the nonparametric function. Finally, we derive uniform rates and inference results for linear functionals of interest of the conditional expectation function such as its partial derivative or conditional average partial derivative.

*(A typo on page 27 that erroneously resulted in the OLS estimator instead of the 2SLS estimator was corrected in July 2015).*

Supplementary material for this paper is available here.

Supplementary material relating to this working paper can be viewed here.

This paper is forthcoming in the *Journal of Multivariate Analysis*.

An online appendix to accompany this publication is available here.

The main attractive feature of our method is that it allows for imperfect selection of the controls and provides confidence intervals that are valid uniformly across a large class of models. In contrast, standard post-model selection estimators fail to provide uniform inference even in simple cases with a small, fixed number of controls. Thus our method resolves the problem of uniform inference after model selection for a large, interesting class of models. We also present a simple generalisation of our method to a fully heterogeneous model with a binary treatment variable. We illustrate the use of the developed methods with numerical simulations and an application that considers the effect of abortion on crime rates.

A supplement to this article, which outlines theoretical properties underpinning the methodology and provides a proof of the theorem, can be viewed here.

This article is accompanied by a web appendix in which we present omitted discussions, an algorithm to implement the proposed method for the sharp RSS and proofs for the main results.

This research was funded by the Nuffield Foundation.

As part of developing the main results, we introduce distribution regression as a comprehensive and flexible tool for modelling and estimating the *entire* conditional distribution. We show that distribution regression encompasses the Cox duration regression and represents a useful alternative to quantile regression. We establish functional central limit theorems and bootstrap validity results for the empirical distribution regression process and various related functionals.
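A minimal sketch of the idea behind distribution regression, under stated assumptions: for each threshold t, a binary regression of the indicator 1{Y ≤ t} on the covariates gives an estimate of the conditional distribution function at t. The logit link, the single covariate, the hand-written Newton solver and all function names below are illustrative choices, not the paper's specification.

```python
import math

def fit_logit(xs, ds, steps=25):
    """Newton-Raphson for a logit of binary outcomes ds on (1, x).
    Returns the intercept and slope (a, b)."""
    a = b = 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d in zip(xs, ds):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += d - p              # score for the intercept
            g1 += (d - p) * x        # score for the slope
            h00 += w                 # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:              # stop if the information matrix is singular
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def distribution_regression(xs, ys, thresholds):
    """Fit one logit per threshold t, with outcome 1{y <= t}."""
    return {t: fit_logit(xs, [1.0 if y <= t else 0.0 for y in ys])
            for t in thresholds}

def conditional_cdf(coefs, t, x):
    """Estimated P(Y <= t | X = x) from the threshold-t logit."""
    a, b = coefs[t]
    return 1.0 / (1.0 + math.exp(-(a + b * x)))
```

Because each threshold has its own coefficients, the covariate can shift different parts of the distribution differently, which is what makes the approach a natural alternative to quantile regression.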

This is a revision of CWP05/12 and CWP09/09

This paper is supplemented by an online appendix which can be viewed **here**

This working paper is supplemented by an online appendix which can be viewed **here**

2SLS has the advantage of providing an easy to compute point estimator of a slope coefficient which can be interpreted as a local average treatment effect (LATE). However, the 2SLS estimator does not measure the value of other useful treatment effect parameters without invoking untenable restrictions.
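To make the "easy to compute" point concrete: with a single binary instrument and no covariates, the 2SLS slope equals the Wald ratio of the reduced-form difference to the first-stage difference. The sketch below computes that ratio on made-up data; the function name and the numbers are hypothetical, and the LATE interpretation requires the usual instrument validity and monotonicity conditions, which are not checked here.

```python
def wald_estimate(y, d, z):
    """Ratio of the instrument's effect on the outcome to its effect on
    treatment take-up, for a binary instrument z and no covariates."""
    def mean_where(values, flag):
        picked = [v for v, zi in zip(values, z) if zi == flag]
        return sum(picked) / len(picked)
    reduced_form = mean_where(y, 1) - mean_where(y, 0)
    first_stage = mean_where(d, 1) - mean_where(d, 0)
    return reduced_form / first_stage

# Made-up data: the instrument moves one of two units into treatment and
# raises the mean outcome by 1, so the ratio is 1 / 0.5.
z = [0, 0, 1, 1]
d = [0, 0, 0, 1]
y = [1.0, 1.0, 1.0, 3.0]
late = wald_estimate(y, d, z)
```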

The nonparametric instrumental variable (IV) model has the advantage of being weakly restrictive, so more generally applicable, but it usually delivers set identification. Nonetheless it can be used to consistently estimate bounds on many parameters of interest including, for example, average treatment effects. We illustrate using data from Angrist & Evans (1998) and study the effect of family size on female employment.

This October 2015 version corrects an error in the paper, as explained in footnote 1. The original version of the working paper is available here.

The current version of this working paper was published in January 2014 and replaces an earlier version originally published in March 2013.

We propose two hypothesis tests that use the infimum of the sample criterion function over the parameter space as the test statistic together with two different critical values. We obtain two main results. First, we show that the two tests we propose are asymptotically size correct in a uniform sense. Second, we show our tests are more powerful than the test that checks whether the confidence set for the parameters of interest is empty or not.

This report provides findings from a series of focus groups investigating how people think about household expenditure and what issues people may have in reporting household expenditure in a social survey context. The information collected in the focus groups will be used as a starting point for designing new questions on household spending for use in future social surveys. Subsequent stages of work will include cognitively testing any new questions produced and consulting a panel of experts over the proposed questions.

This project was funded by the Nuffield Foundation.

This paper is a revised version of CWP13/09.

The main attractive feature of our method is that it allows for imperfect selection of the controls and provides confidence intervals that are valid uniformly across a large class of models. In contrast, standard post-model selection estimators fail to provide uniform inference even in simple cases with a small, fixed number of controls. Thus our method resolves the problem of uniform inference after model selection for a large, interesting class of models. We illustrate the use of the developed methods with numerical simulations and an application to the effect of abortion on crime rates.

This paper is a revision of CWP42/11.

The paper focuses on household ‘tax planning’ in the context of tax reliefs for retirement saving in the United Kingdom. It examines whether take-up of retirement saving instruments increases at the higher rate threshold for income tax, since tax relief is given at the marginal tax rate and should be more attractive to those just above this threshold than to those just below it. It then examines a more complex case where the tax system provides an incentive for pension saving to be done by one member of a couple. Econometric results are obtained from the Family Resources Survey on these two tests of household responses to complex incentives.

This is a revision of CWP09/09.

Various methods have been used to overcome the point identification problem inherent in the linear age-period-cohort model. This paper presents a set-identification result for the model and then considers the use of the maximum-entropy principle as a vehicle for achieving point identification. We present two substantive applications (US female mortality data and UK female labor force participation) and compare the results from our approach to some of the solutions in the literature.
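The identification problem arises because cohort equals period minus age, so a linear trend can be moved between the three effects without changing any fitted value. The sketch below checks this on a small grid; the slopes, the shift and the function names are arbitrary illustrative choices, and the maximum-entropy step discussed in the paper is not implemented.

```python
def apc_prediction(age, period, a, p, c):
    """Linear age-period-cohort prediction; cohort is the birth year."""
    cohort = period - age
    return a * age + p * period + c * cohort

def shift_trends(a, p, c, delta):
    """Move a linear trend of size delta between the three effects."""
    return a + delta, p - delta, c + delta

# Arbitrary illustrative slopes and an arbitrary shift.
base = (2, 3, 5)
other = shift_trends(*base, delta=7)
cells = [(age, period)
         for age in range(20, 60, 10)
         for period in range(1990, 2020, 10)]
identical = all(apc_prediction(g, t, *base) == apc_prediction(g, t, *other)
                for g, t in cells)
```

Every value of `delta` gives the same predictions, so the data can pin down the three slopes only up to this one-dimensional family; that is the sense in which the model is set-identified rather than point-identified.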

We show that the model delivers set identification of the latent utility functions and we characterize sharp bounds on those functions. We develop easy-to-compute outer regions which in parametric models require little more calculation than what is involved in a conventional maximum likelihood analysis. The results are illustrated using a model which is essentially the parametric conditional logit model of McFadden (1974) but with potentially endogenous explanatory variables and instrumental variable restrictions. The method employed has wide applicability and for the first time brings instrumental variable methods to bear on structural models in which there are multiple unobservables in a structural equation.

This paper presents the findings from two experiments designed to test the hypothesis that individuals' notions of distributive justice are associated with their economic status relative to others within their own society. In the experiments, each participant played a specially designed distribution game. This game allowed us to establish whether and to what extent the participants perceived inequalities owing to differences in productivity rather than luck as just and, hence, not in need of redress. A type of participant that distinguished between inequalities owing to productivity and luck, redressing the latter but not, or to a lesser extent, the former, is said to be subject to an earned endowment effect. Drawing on previous work in both economics and psychology, we hypothesised that the richer members of any society would be more likely to be subject to an earned endowment effect, while the poorer members would be more inclined towards redistribution irrespective of whether the inequality was owing to productivity or luck.

We conducted our first experiment in the UK. We selected unemployed residents of one city to represent low economic status individuals and student and employed residents of the same city to represent relatively high economic status individuals. We found a statistically significant earned endowment effect among the students and employed and no effect among the unemployed. The difference between the unemployed and the others was also statistically significant.

Our second experiment was designed to test the generalizability of the findings from our first. It was conducted in Cape Town, South Africa. Exploiting the fact that Cape Town is home to one of the continent's best universities, we built a participant sample that was highly comparable to the UK sample in many regards. However, the states of employment and unemployment are less distinct in South Africa than in the UK, and a number of interventions are in place to ensure that the student body of the University of Cape Town includes young people from not only rich and middle income but also poorer households. So, in South Africa we chose to rely on responses to a survey question to distinguish between high and low economic status individuals. The findings from this second experiment also supported the hypothesis: among individuals who classified their households as rich or high or middle income there was a statistically significant earned endowment effect, among individuals who classified their households as poor or low income there was not, and the difference between the two participant types was significant.

We conclude that individuals' notions of distributive justice are associated with their relative economic status within society and that this is a generalizable result.

The paper studies the partial identifying power of structural single equation threshold crossing models for binary responses when explanatory variables may be endogenous. The paper derives the sharp identified set of threshold functions for the case in which explanatory variables are discrete and provides a constructive proof of sharpness. There is special attention to a widely employed semiparametric shape restriction which requires the threshold crossing function to be a monotone function of a linear index involving the observable explanatory variables. It is shown that the restriction brings great computational benefits, allowing direct calculation of the identified set of index coefficients without calculating the nonparametrically specified threshold function. With the restriction in place the methods of the paper can be applied to produce identified sets in a class of binary response models with mis-measured explanatory variables.

This is a further revised version (Oct 7th 2011) of CWP23/09 "Single equation endogenous binary response models"

mechanism that generates the high degree of wealth inequality in the model is the dynamic of the "wage ladder" resulting from the search process. There is an important asymmetry between the incremental wage increases generated by on-the-job search (climbing the ladder) and the drop in income associated with job loss (falling off the ladder). The behavior of workers in low paying jobs is primarily governed by the expectation of wage growth, while the behavior of workers near the top of the distribution is driven by the possibility of job loss.

Part of the success of China has been to attract the investment of foreign multinationals. This is also true for a number of other emerging economies. Europe's largest multinational firms increasingly file patent applications that are based on inventor activities located in emerging economies, often working alongside inventors from the firm's home country.

school system, in particular for individuals originating from homes with low educated fathers. This study estimates the impact of the reform on criminal behavior: both within the generation directly affected by the reform as well as their children. We use census data on all individuals born in Sweden between 1945 and 1955 and all their children merged with individual register data on all convictions between 1981 and 2008. We find a significant inverse effect of the reform on criminal behavior of men and on sons of fathers who went through the new school system.

Quantile regression (QR) is a principal regression method for analyzing the impact of covariates on outcomes. The impact is described by the conditional quantile function and its functionals. In this paper we develop the nonparametric QR series framework, covering many regressors as a special case, for performing inference on the entire conditional quantile function and its linear functionals. In this framework, we approximate the entire conditional quantile function by a linear combination of series terms with quantile-specific coefficients and estimate the function-valued coefficients from the data. We develop large sample theory for the empirical QR coefficient process, namely we obtain uniform strong approximations to the empirical QR coefficient process by conditionally pivotal and Gaussian processes, as well as by gradient and weighted bootstrap processes.
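Quantile regression coefficients are obtained by minimising the check (pinball) loss at each quantile index. The sketch below shows the simplest special case, a series consisting of a constant only, where the minimiser is a sample quantile; the function names are illustrative, and the full series estimator with quantile-specific coefficient vectors is not implemented here.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile(ys, tau):
    """Constant-only quantile regression: minimise the summed check loss.
    A minimiser always lies at a data point, so searching the observed
    values is enough."""
    return min(sorted(ys),
               key=lambda q: sum(check_loss(y - q, tau) for y in ys))

median = sample_quantile([5, 1, 4, 2, 3], 0.5)
lower_quartile = sample_quantile([5, 1, 4, 2, 3], 0.25)
```

Replacing the constant by a vector of series terms and solving the same minimisation for each quantile index gives the function-valued coefficients that the abstract refers to.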

We apply these results to obtain estimation and inference methods for linear functionals of the conditional quantile function, such as the conditional quantile function itself, its partial derivatives, average partial derivatives, and conditional average partial derivatives. Specifically, we obtain uniform rates of convergence, large sample distributions, and inference methods based on strong pivotal and Gaussian approximations and on gradient and weighted bootstraps. All of the above results are for function-valued parameters, holding uniformly in both the quantile index and in the covariate value, and covering the pointwise case as a by-product. If the function of interest is monotone, we show how to use monotonization procedures to improve estimation and inference. We demonstrate the practical utility of these results with an empirical example, where we estimate the price elasticity function of the individual demand for gasoline, as indexed by the individual unobserved propensity for gasoline consumption.
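One simple monotonization procedure is rearrangement: if the target function is known to be increasing, estimates taken on an increasing grid are sorted. The sketch below assumes this particular procedure for illustration; the abstract does not say which procedures the paper uses, and the fitted values are made up.

```python
def rearrange(fitted_values):
    """Monotone rearrangement: given estimates of an increasing function on
    an increasing grid of points, return them sorted in increasing order."""
    return sorted(fitted_values)

# Made-up estimates that dip at the third grid point.
monotone_fit = rearrange([0.1, 0.4, 0.3, 0.7])
```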

We examine the "home bias" of knowledge spillovers (the idea that knowledge spreads more slowly over international boundaries than within them) as measured by the speed of patent citations. We present econometric evidence that the geographical localization of knowledge spillovers has fallen over time, as we would expect from the dramatic fall in communication and travel costs. Our proposed estimator controls for correlated fixed effects and censoring in duration models and we apply it to data on over two million patent citations between 1975 and 1999. Home bias is exaggerated in models that do not control for fixed effects. The fall in home bias over time is weaker for the pharmaceuticals and information/communication technology sectors where agglomeration externalities may remain strong.

When consumption goods are indivisible, individuals have to hold enough resources to cross a purchasing threshold. If individuals are liquidity constrained, they are unable to borrow to cross that threshold. Instead, we show that such individuals, even if risk averse, may choose to gamble by playing lotteries to have a chance of crossing the threshold. One implication of this model is that income effects for individuals who choose to play lotteries are likely to be larger than for the general population. This in turn implies that estimating income effects through the random allocation of lottery winnings is likely to give a biased estimate of income effects for the broader population who choose not to gamble. Using UK data on lottery wins, other windfalls and durable good purchases, we show that lottery players display higher income effects than non-players but only amongst those likely to be credit constrained. This is consistent with credit constrained, risk-averse agents gambling in order to cross a purchase threshold and to convexify their budget set.
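The convexification argument can be illustrated with a small worked example. The agent below has concave (square-root) utility over leftover wealth and receives a fixed utility bonus from owning an indivisible good; with wealth below the price and no borrowing, a fair lottery can raise expected utility. The functional form, the price, the bonus and the function names are all hypothetical and chosen only to show the mechanism.

```python
import math

def utility(wealth, price=100.0, bonus=20.0):
    """Concave utility from leftover wealth plus a fixed bonus if the
    indivisible good (hypothetical price and bonus) can be bought."""
    if wealth >= price:
        return bonus + math.sqrt(wealth - price)
    return math.sqrt(wealth)

def fair_gamble_utility(wealth, stake, price=100.0, bonus=20.0):
    """Expected utility of staking `stake` on an actuarially fair lottery
    whose prize lifts wealth exactly to the purchase price."""
    prize = price - (wealth - stake)   # gross payout needed to reach the price
    win_prob = stake / prize           # fairness: win_prob * prize == stake
    win = utility(price, price, bonus)
    lose = utility(wealth - stake, price, bonus)
    return win_prob * win + (1.0 - win_prob) * lose

no_gamble = utility(50.0)
with_gamble = fair_gamble_utility(50.0, stake=50.0)
```

In this parameterisation the gamble is preferred even though utility over divisible consumption is concave, because the purchase threshold makes overall utility non-concave in wealth.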

]]>

We show that the model delivers set, not point, identification of the latent utility functions and we characterize sharp bounds on those functions. We develop easy-to-compute outer regions which in parametric models require little more calculation than what is involved in a conventional maximum likelihood analysis. The results are illustrated using a model which is essentially the parametric conditional logit model of McFadden (1974) but with potentially endogenous explanatory variables and instrumental variable restrictions.

The method employed has wide applicability and for the first time brings instrumental variable methods to bear on structural models in which there are multiple unobservables in a structural equation.

This paper has now been revised and the new version is available as CWP39/11.

]]>

**Figure: Distributional impact of tax and benefit reforms from 1978 to 2009**

Notes: Households divided into ten equally sized groups based on their disposable income, adjusted for family size. Assumes full take-up of means-tested benefits. Excludes most ‘business taxes’ (notably corporation tax and business rates, though not employer National Insurance contributions) and capital taxes (notably inheritance tax, stamp duties and capital gains tax). Source: Authors’ calculations using TAXBEN run on uprated data from the 2005–06 EFS.

]]>This paper is a revised version of cemmap working paper CWP33/07.

]]>I present an application to the study of segregation in school friendship networks, using data from Add Health containing the actual social networks of students in a representative sample of US schools. My results suggest that for white students, the value of a same-race friend decreases with the fraction of whites in the school. The opposite is true for African American students.

The model is used to study how different desegregation policies may affect the structure of the network in equilibrium. I find an inverted u-shaped relationship between the fraction of students belonging to a racial group and the expected equilibrium segregation levels. These results suggest that desegregation programs may decrease the degree of interracial interaction within schools.

]]>Optimal instruments are conditional expectations; and in developing the IV results, we also establish a series of new results for LASSO and Post-LASSO estimators of non-parametric conditional expectation functions which are of independent theoretical and practical interest. Specifically, we develop the asymptotic theory for these estimators that allows for non-Gaussian, heteroscedastic disturbances, which is important for econometric applications. By innovatively using moderate deviation theory for self-normalized sums, we provide convergence rates for these estimators that are as sharp as in the homoscedastic Gaussian case under the weak condition that log p = o(n^{1/3}). Moreover, as a practical innovation, we provide a fully data-driven method for choosing the user-specified penalty that must be provided in obtaining LASSO and Post-LASSO estimates and establish its asymptotic validity under non-Gaussian, heteroscedastic disturbances.

In this paper we use English school level data from 1993 to 2008 aggregated up to small neighbourhood areas to look at the determinants of the demand for private education in England from the ages of 7 until 15 (the last year of compulsory schooling). We focus on the relative importance of price and quality of schooling. However, there are likely to be unobservable factors that are correlated with private school prices and/or the quality of state schools that also impact on the demand for private schooling which could bias our estimates. Our long regional and local authority panel data allows us to employ a number of strategies to deal with this potential endogeneity. Because of the likely presence of incidental trends in our unobservables, we employ a double difference system GMM approach to remove both fixed effects and incidental trends. We find that the demand for private schooling is inversely related to private school fees as well as the quality of state schooling in the local area at the time families were making key schooling choice decisions at the ages of 7, 11 and 13. We estimate that a one standard deviation increase in the private school day fee when parents/students are making these key decisions reduces the proportion attending private schools by around 0.33 percentage points which equates to an elasticity of around -0.26. This estimate is only significant for choices at age 7 (but the point estimates are very similar at the ages of 11 and 13). At age 11 and age 13, an increase in the quality of local state secondary schools reduces the probability of attending private schools. At age 11, a one standard deviation increase in state school quality reduces participation in private schools by 0.31 percentage points which equates to an elasticity of -0.21. The effect at age 13 is slightly smaller, but still significant.
Demand for private schooling at the ages of 8, 9, 10 and 12, 14 and 15 is almost entirely determined by private school demand in the previous year for the same cohort, and price and quality do not impact significantly on this decision other than through their initial influence on the key participation decisions at the ages of 7, 11 and 13.

]]>Childcare costs are often viewed as one of the biggest barriers to work, particularly among lone parents on low incomes. Children in England are eligible to attend free part-time nursery classes (equivalent to pre-kindergarten) from the academic term after they turn 3, and are typically eligible to start free full-time public education on 1 September after they turn 4. These rules mean that children born one day apart may start nursery classes up to four months apart, and may start school up to one year apart. We exploit these discontinuities to investigate the impact of a youngest child being eligible for part-time nursery education and full-time primary education on welfare receipt and employment patterns amongst lone parents receiving welfare. In contrast to previous studies, we are able to estimate the precise timing (relative to the date on which part-time or full-time education begins) of any impact on labour supply, by using rich administrative data. Amongst those receiving welfare when their youngest child is aged approximately three and a half, we find a small but significant effect of free full-time public education on both employment and welfare receipt (of around 2 percentage points, or 10-15 per cent), which peaks eight to nine months after the child becomes eligible (aged approximately 4 years and 9 months). We find weaker evidence of an even smaller effect of eligibility for part-time nursery education. This suggests that the expansion of public education programmes to younger disadvantaged children may only encourage a small number of low income lone parents to return to work (although, of course, this is not the primary aim of such programmes).

]]>We examine the effect of large cash transfers on the consumption of food by poor households in rural Mexico. The transfers represent 20% of household income on average, and yet the budget share of food is unchanged following receipt of this money. This is an important puzzle to solve, particularly so in the context of a social welfare programme designed in part to improve nutrition of individuals in the poorest households. We estimate an Engel curve for food. We rule out price increases, changes in the quality of food consumed and homotheticity of preferences as explanations for this puzzle. We also show that food is a necessity, with a strong negative effect of income on the food budget share. The decrease in food budget share caused by the large increase in income is cancelled by some other relevant aspect of the programme so that the net effect is nil. We argue that the programme has not changed preferences and that there is no labelling of money. We propose that the key to the puzzle resides in the fact that the transfer is put in the hands of women and that the change in control over household resources is what leads to the observed changes in behaviour.

]]>We provide a tractable characterization of the sharp identification region of the parameters θ in a broad class of incomplete econometric models. Models in this class have set valued predictions that yield a convex set of conditional or unconditional moments for the observable model variables. In short, we call these models with convex moment predictions. Examples include static, simultaneous move finite games of complete and incomplete information in the presence of multiple equilibria; best linear predictors with interval outcome and covariate data; and random utility models of multinomial choice in the presence of interval regressors data. Given a candidate value for θ, we establish that the convex set of moments yielded by the model predictions can be represented as the Aumann expectation of a properly defined random set. The sharp identification region of θ, denoted Θ_{1}, can then be obtained as the set of minimizers of the distance from a properly specified vector of moments of random variables to this Aumann expectation. Algorithms in convex programming can be exploited to efficiently verify whether a candidate θ is in Θ_{1}. We use examples analyzed in the literature to illustrate the gains in identification and computational tractability afforded by our method.

This paper is a revised version of CWP27/09.

]]>We evaluate the German apprenticeship system, which combines on-the-job training with classroom teaching, by modelling individual careers from the choice to join such a scheme through subsequent employment, job-to-job transitions and wages over the lifecycle. Our data are drawn from administrative records that accurately report job transitions and pay. We find that apprenticeships increase wages, and change wage profiles with more growth upfront, while wages in the non-apprenticeship sector grow at a lower rate but for longer. Non-apprentices face a much higher variance of the shocks to their match specific effects and a substantially larger variance in the initial level of offered wages. We find no evidence that qualified apprentices are harder to reallocate following job loss. The average life-cycle return to an apprenticeship career is about 14% and the return is mainly driven by the differences in the wage profile.

]]>We illustrate the approach using scanner data on food purchases to estimate bounds on willingness to pay for the organic characteristic. We combine these estimates with information on households' stated preferences and beliefs to show that on average quality is the most important factor affecting bounds on household willingness to pay for organic, with health concerns coming second, and environmental concerns lagging far behind.

]]>Social experiments are powerful sources of information about the effectiveness of interventions. In practice, initial randomization plans are almost always compromised. Multiple hypotheses are frequently tested. "Significant" effects are often reported with p-values that do not account for preliminary screening from a large candidate pool of possible effects. This paper develops tools for analyzing data from experiments as they are actually implemented.

We apply these tools to analyze the influential HighScope Perry Preschool Program. The Perry program was a social experiment that provided preschool education and home visits to disadvantaged children during their preschool years. It was evaluated by the method of random assignment. Both treatments and controls have been followed from age 3 through age 40.

Previous analyses of the Perry data assume that the planned randomization protocol was implemented. In fact, as in many social experiments, the intended randomization protocol was compromised. Accounting for compromised randomization, multiple-hypothesis testing, and small sample sizes, we find statistically significant and economically important program effects for both males and females. We also examine the representativeness of the Perry study.

]]>This paper is a revised version of CWP18/09.

]]>In this paper we study post-penalized estimators which apply ordinary, unpenalized linear regression to the model selected by first-step penalized estimators, typically LASSO. It is well known that LASSO can estimate the regression function at nearly the oracle rate, and is thus hard to improve upon. We show that post-LASSO performs at least as well as LASSO in terms of the rate of convergence, and has the advantage of a smaller bias. Remarkably, this performance occurs even if the LASSO-based model selection 'fails' in the sense of missing some components of the 'true' regression model. By the 'true' model we mean here the best s-dimensional approximation to the regression function chosen by the oracle. Furthermore, post-LASSO can perform strictly better than LASSO, in the sense of a strictly faster rate of convergence, if the LASSO-based model selection correctly includes all components of the 'true' model as a subset and also achieves a sufficient sparsity. In the extreme case, when LASSO perfectly selects the 'true' model, the post-LASSO estimator becomes the oracle estimator. An important ingredient in our analysis is a new sparsity bound on the dimension of the model selected by LASSO which guarantees that this dimension is at most of the same order as the dimension of the 'true' model. Our rate results are non-asymptotic and hold in both parametric and nonparametric models. Moreover, our analysis is not limited to the LASSO estimator in the first step, but also applies to other estimators, for example, the trimmed LASSO, Dantzig selector, or any other estimator with good rates and good sparsity. Our analysis covers both traditional trimming and a new practical, completely data-driven trimming scheme that induces maximal sparsity subject to maintaining a certain goodness-of-fit. 
The latter scheme has theoretical guarantees similar to those of LASSO or post-LASSO, but it dominates these procedures as well as traditional trimming in a wide variety of experiments.
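
The two-step idea described above (select a model with LASSO, then refit the selected regressors by ordinary least squares) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the coordinate-descent solver, the simulated data, and the fixed penalty `lam=0.2` are all hypothetical choices, and the data-driven penalty and trimming schemes discussed in the abstract are not reproduced here.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by the LASSO update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # x_j'x_j / n for each column
    r = y.copy()                        # current residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new)  # keep residual in sync with b
            b[j] = new
    return b

def post_lasso(X, y, lam):
    """First step: LASSO selection. Second step: unpenalized OLS on the support."""
    b_lasso = lasso_cd(X, y, lam)
    support = np.flatnonzero(b_lasso)
    b_post = np.zeros(X.shape[1])
    if support.size:
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        b_post[support] = coef
    return b_lasso, b_post, support

# Hypothetical sparse design: 3 relevant regressors out of 50.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + 0.5 * rng.standard_normal(n)

b_lasso, b_post, support = post_lasso(X, y, lam=0.2)
print("selected regressors:", support)
print("LASSO coefficients on support:     ", b_lasso[support])
print("post-LASSO coefficients on support:", b_post[support])
```

Comparing the two printed coefficient vectors shows the shrinkage that the penalty induces in the first step and that the OLS refit removes, which is the bias reduction the abstract refers to.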

]]>]]>

This paper has three aims:

- We provide a framework for weighing up the insurance value of disability benefits against the incentive cost of inducing healthy individuals to stop work at different points of their life-cycle.
- We estimate the risks to health that may lead to work-limiting disabilities and the risk to wages that may lead to individuals choosing not to work. We also estimate the extent of false awards made through the DI program alongside the proportion of awards to those in genuine need.
- We use our model and estimates to characterize the economic effects of disability insurance and to consider how policy reforms would affect behaviour and standard measures of household welfare.

We differentiate disability status by its severity, and show that a severe disability shock leads to a decline in wages of 40%, as well as a substantial rise in the fixed cost of going to work. In terms of the effectiveness of the DI program, we estimate high levels of rejections of genuine applicants. In our counterfactual simulations, this means that household welfare increases as the program becomes less strict, despite the worsening incentives for false applications that this implies. On the other hand, incentives for false applications are reduced by reducing generosity and increasing reassessments, and these policies increase household welfare, despite the worse insurance implied.

]]>This paper develops a formal language for study of treatment response with social interactions, and uses it to obtain new findings on identification of potential outcome distributions. Defining a person's treatment response to be a function of the entire vector of treatments received by the population, I study identification when shape restrictions and distributional assumptions are placed on response functions. An early key result is that the traditional assumption of individualistic treatment response (ITR) is a polar case within the broad class of constant treatment response (CTR) assumptions, the other pole being unrestricted interactions. Important non-polar cases are interactions within reference groups and distributional interactions. I show that established findings on identification under assumption ITR extend to assumption CTR. These include identification with assumption CTR alone and when this shape restriction is strengthened to semi-monotone response. I next study distributional assumptions using instrumental variables. Findings obtained previously under assumption ITR extend when assumptions of statistical independence (SI) are posed in settings with social interactions. However, I find that random assignment of realized treatments generically has no identifying power when some persons are leaders who may affect outcomes throughout the population. Finally, I consider use of models of endogenous social interactions to derive restrictions on response functions. I emphasize that identification of potential outcome distributions differs from the longstanding econometric concern with identification of structural functions.

This paper is a revised version of CWP01/10.

]]>We develop a general class of nonparametric tests for treatment effects conditional on covariates. We consider a wide spectrum of null and alternative hypotheses regarding conditional treatment effects, including (i) the null hypothesis of the conditional stochastic dominance between treatment and control groups; (ii) the null hypothesis that the conditional average treatment effect is positive for each value of covariates; and (iii) the null hypothesis of no distributional (or average) treatment effect conditional on covariates against a one-sided (or two-sided) alternative hypothesis. The test statistics are based on L1-type functionals of uniformly consistent nonparametric kernel estimators of conditional expectations that characterize the null hypotheses. Using the Poissonization technique of Giné et al. (2003), we show that suitably studentized versions of our test statistics are asymptotically standard normal under the null hypotheses and also show that the proposed nonparametric tests are consistent against general fixed alternatives. Furthermore, it turns out that our tests have non-negligible power against some local alternatives that are n^{-1/2} different from the null hypotheses, where n is the sample size. We provide a more powerful test for the case when the null hypothesis may be binding only on a strict subset of the support and also consider an extension to testing for quantile treatment effects. We illustrate the usefulness of our tests by applying them to data from a randomized job training program (LaLonde, 1986) and by carrying out Monte Carlo experiments based on this dataset.

]]>In this paper we consider endogenous regressors in the binary choice model under a weak median exclusion restriction, but without further specification of the distribution of the unobserved random components. Our reduced form specification with heteroscedastic residuals covers various heterogeneous structural binary choice models. As a particularly relevant example of a structural model where no semiparametric estimator has as yet been analyzed, we consider the binary random utility model with endogenous regressors and heterogeneous parameters. We employ a control function IV assumption to establish identification of a slope parameter β by the mean ratio of derivatives of two functions of the instruments. We propose an estimator based on direct sample counterparts, and discuss the large sample behavior of this estimator. In particular, we show √n-consistency and derive the asymptotic distribution. In the same framework, we propose tests for heteroscedasticity, overidentification and endogeneity. We analyze the small sample performance through a simulation study. An application of the model to discrete choice demand data concludes this paper.

]]>This paper gives identification and estimation results for quantile and average effects in nonseparable panel models, when the distribution of period specific disturbances does not vary over time. Bounds are given for interesting effects with discrete regressors that are strictly exogenous or predetermined. We allow for location and scale time effects and show how monotonicity can be used to shrink the bounds. We derive rates at which the bounds tighten as the number T of time series observations grows and give an empirical illustration.

]]>This paper is a revised version of cemmap working paper CWP15/08.

]]>

In this paper we analyse the findings from a series of 'public good' games that were conducted between November 2005 and February 2007 in 104 municipalities in rural and urban Colombia with mainly poor participants. The data cover municipalities both with ('treatment') and without ('control') a PRDP in place, and within the 'treatment' municipalities, both beneficiaries and non-beneficiaries of the PRDP initiative. The data for 'control' municipalities were collected as part of the evaluation of Familias en Accion (FeA), Colombia's conditional cash transfer programme.

The game is structured as a typical free-rider problem with the act of contributing to the 'public good' (a collective money pot) being always dominated by non-contribution. We interpret contribution as an act consistent with a high degree of social capital.
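
The free-rider structure described above can be illustrated with a stylized linear public good game. The sketch below is an assumption for illustration only: the endowment, group size, and multiplier are hypothetical and are not the parameters of the games played in the study. It shows that withholding one's contribution gives a higher individual payoff whatever the others do, even though everyone is better off when all contribute.

```python
def payoff(own, others_total, endowment=10.0, n_players=4, multiplier=2.0):
    """Individual payoff in a stylized linear public good game.

    All parameter values are hypothetical. The pot is multiplied and shared
    equally, so each unit contributed returns multiplier / n_players = 0.5
    to the contributor, which is less than the unit it costs.
    """
    pot = own + others_total
    return endowment - own + multiplier * pot / n_players

# Non-contribution dominates: it pays more for any behaviour of the others.
for others in (0.0, 15.0, 30.0):
    print(f"others give {others:>4}: "
          f"free-ride = {payoff(0.0, others):.1f}, "
          f"contribute all = {payoff(10.0, others):.1f}")

# Yet universal contribution beats universal free-riding.
print("all contribute:", payoff(10.0, 30.0), "| none contribute:", payoff(0.0, 0.0))
```

In this setting a positive contribution cannot be rationalized by individual payoff maximization alone, which is why it can be read as a signal of cooperation within the group.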

Potentially endogenous selection into the programme makes identifying programme effects difficult but we find strong and suggestive evidence that exposure to PRDPs improves social capital and that this extends beyond direct beneficiaries of the programme. In particular, the duration of programme operation and the proportion of programme beneficiaries in a game session increase contribution to the public good, suggesting that in order to have a major impact the programme must be sufficiently 'intensive'.

]]>