Statistical Models: Theory and Practice by Freedman - A Comprehensive and Critical Guide (PDF)
If you are looking for a comprehensive and accessible introduction to statistical models and their applications, you might want to check out Statistical Models: Theory and Practice by David A. Freedman. This book explains the basic ideas of association and regression, and takes you through the current models that link these ideas to causality. The focus is on applications of linear models, including generalized least squares and two-stage least squares, with probits and logits for binary variables. The book also covers the bootstrap as a technique for estimating bias and computing standard errors. Careful attention is paid to the principles of statistical inference. The book is rich in exercises, most with answers, and includes relevant journal articles at the back.
But who is David A. Freedman? And why should you read his book? In this article, we will answer these questions and more. We will give you an overview of the main concepts and techniques covered in the book, as well as some examples of how they are applied in the social and health sciences. We will also discuss some of the modeling pitfalls and challenges that Freedman highlights in his book, and how to avoid them. Finally, we will conclude with a summary of the main points, some recommendations for readers, and some limitations and future directions.
What is the book about?
Statistical Models: Theory and Practice is a lively and engaging textbook that explains the things you have to know in order to read empirical papers in the social and health sciences, as well as the techniques you need to build statistical models of your own. The book is organized around published studies, as are many of the exercises. Freedman makes a thorough appraisal of the statistical methods in these papers and in a variety of other examples. He illustrates the principles of modeling, and the pitfalls. The discussion shows you how to think about the critical issues, including the connection (or lack of it) between the statistical models and the real phenomena.
The book has four parts: Part I introduces association and regression; Part II discusses causality and models; Part III covers linear models and extensions; Part IV deals with the bootstrap and inference. The book also has three appendices: Appendix A reviews study design; Appendix B reviews bivariate regression; Appendix C reviews matrix algebra. The book assumes some familiarity with basic probability and statistics, but provides enough background material to help readers refresh their knowledge.
Who is the author?
David A. Freedman (1938-2008) was Professor of Statistics at the University of California, Berkeley. He was a distinguished mathematical statistician whose theoretical research ranged over martingale inequalities, Markov processes, de Finetti's theorem, consistency of Bayes estimators, sampling, the bootstrap, procedures for testing and evaluating models, and methods for causal inference. Freedman published widely on the application, and misapplication, of statistics in the social sciences, including epidemiology, demography, public policy, and law. He emphasized exposing and checking the assumptions that underlie standard methods, as well as understanding how those methods behave when the assumptions are false; for example, how regression models behave when fitted to data from randomized experiments.
Freedman was also an excellent teacher and communicator, who received several awards for his teaching and writing. He was known for his no-nonsense, direct style, and his ability to explain complex statistical ideas in a clear and vivid way. He wrote several books, including the widely used introductory text Statistics (with Robert Pisani and Roger Purves) and the graduate texts Markov Chains and Brownian Motion and Diffusion; his papers on the use and misuse of statistical models were collected posthumously in Statistical Models and Causal Inference. He also served as an expert witness in many legal cases involving statistical issues, such as the 1990 census adjustment, the death penalty, and discrimination.
Why is it important?
Statistical Models: Theory and Practice is an important book for anyone who deals with applied statistics, especially in the social and health sciences. The book provides a solid foundation for understanding and using statistical models, as well as a critical perspective on their limitations and pitfalls. The book helps readers to develop the skills and intuition needed to read, interpret, and evaluate empirical papers, as well as to build their own models. The book also exposes readers to a variety of real-world examples and applications, showing how statistics can be used to answer important questions and inform decisions.
The book is also important because it reflects the author's deep knowledge and experience in both theory and practice of statistics. Freedman was not only a brilliant mathematician and statistician, but also a keen observer and analyst of social phenomena. He had a rare ability to bridge the gap between abstract models and concrete reality, and to communicate his insights in a clear and compelling way. He was not afraid to challenge conventional wisdom or to point out the flaws and fallacies in popular methods. He was also generous in sharing his ideas and data with other researchers and students. By reading his book, you can learn from one of the best statisticians of our time.
Main concepts and techniques
In this section, we will briefly summarize some of the main concepts and techniques covered in the book. We will not go into too much detail or technicality, but rather give you a general idea of what they are and how they are used. For more details and examples, we recommend reading the book itself.
Association and regression
Association is a measure of how two variables are related to each other. For example, height and weight are positively associated: taller people tend to weigh more than shorter people. Association can be measured by correlation coefficients, such as Pearson's r or Spearman's rho. Correlation coefficients range from -1 to 1, where -1 means perfect negative association, 0 means no association, and 1 means perfect positive association.
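As a concrete illustration, Pearson's r takes only a few lines of NumPy to compute. The height and weight figures below are invented for the example:

```python
import numpy as np

# Hypothetical height (cm) and weight (kg) measurements,
# invented for illustration only.
height = np.array([150, 160, 165, 170, 175, 180, 185], dtype=float)
weight = np.array([50, 56, 61, 65, 70, 77, 82], dtype=float)

# Pearson's r is the covariance of the two variables divided by the
# product of their standard deviations. np.corrcoef returns the full
# 2x2 correlation matrix; the off-diagonal entry is r.
r = np.corrcoef(height, weight)[0, 1]
print(f"Pearson's r = {r:.3f}")  # close to 1: strong positive association
```

Since taller people in this made-up sample are uniformly heavier, r lands near the top of the [-1, 1] range.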
Regression is a technique for modeling the relationship between one variable (called the dependent variable or the response) and one or more variables (called the independent variables or the predictors). For example, we can use regression to model how weight depends on height, age, gender, and other factors. Regression can take different forms depending on the type of variables involved. For example, linear regression assumes that the relationship is linear (i.e., a straight line), while logistic regression assumes that the response is binary (i.e., yes or no).
Association and regression are closely related concepts. Regression can be used to estimate the strength and direction of association between variables, as well as to test hypotheses about the association. Association can be used to assess how well a regression model fits the data, as well as to identify potential confounders or moderators of the relationship.
Causality and models
Causality is a concept that goes beyond association. Causality implies that there is a causal mechanism or process that links one variable (called the cause) to another variable (called the effect). For example, smoking causes lung cancer: there is a biological mechanism that explains how smoking damages the cells in the lungs and leads to cancer. Causality also implies that there is a counterfactual or hypothetical scenario that shows what would have happened if the cause had not occurred. For example, if a person had not smoked, he or she would not have developed lung cancer.
Linear models and extensions
Linear models are a class of models that assume that the response variable is a linear function of the predictor variables, plus some random error. For example, a simple linear model for weight and height is: $$ \text{weight} = \beta_0 + \beta_1 \times \text{height} + \epsilon $$ where $\beta_0$ and $\beta_1$ are parameters that represent the intercept and the slope of the line, and $\epsilon$ is the error term that captures the variability in weight that is not explained by height. Linear models can be estimated by various methods, such as ordinary least squares (OLS), which minimizes the sum of squared errors.
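A minimal sketch of fitting this model by OLS, using the same invented height-weight data as before (the numbers are illustrative, not from the book):

```python
import numpy as np

# Hypothetical height (cm) and weight (kg) data, for illustration only.
height = np.array([150.0, 160.0, 165.0, 170.0, 175.0, 180.0, 185.0])
weight = np.array([50.0, 56.0, 61.0, 65.0, 70.0, 77.0, 82.0])

# Design matrix: a column of ones for the intercept beta_0, then height.
X = np.column_stack([np.ones_like(height), height])

# OLS chooses beta to minimize the sum of squared errors
# ||weight - X @ beta||^2.
beta, _, _, _ = np.linalg.lstsq(X, weight, rcond=None)
intercept, slope = beta
print(f"weight = {intercept:.1f} + {slope:.2f} * height")
```

The slope estimate says how many kilograms of weight the fitted line associates with each additional centimeter of height; the intercept is the line's (extrapolated, and here meaningless on its own) value at height zero.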
Linear models can be extended to handle more complex situations, such as multiple predictors, nonlinear relationships, interactions, categorical variables, heteroskedasticity, endogeneity, and measurement error. Some of these extensions include generalized least squares (GLS), which allows for different error variances across observations; two-stage least squares (2SLS), which uses instrumental variables to deal with endogenous predictors; probit and logit models, which model binary responses using a latent variable approach; and polynomial and spline models, which model nonlinear relationships using higher-order terms or piecewise functions.
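A polynomial extension stays within the linear-model framework because the fitted function, though nonlinear in the predictor, is still linear in the coefficients. A sketch with simulated data (the true coefficients 1, 2, and 0.5 are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y is quadratic in x plus a little noise
# (coefficients chosen arbitrarily for the example).
x = np.linspace(-3.0, 3.0, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)

# Fitting a degree-2 polynomial is still OLS, on the predictors
# (1, x, x^2): nonlinear in x, linear in the coefficients.
c2, c1, c0 = np.polyfit(x, y, deg=2)  # highest-degree coefficient first
print(f"y = {c0:.2f} + {c1:.2f} x + {c2:.2f} x^2")
```

With this much data and this little noise, the three estimated coefficients land close to the values used to generate the data.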
The bootstrap and inference
The bootstrap is a technique for estimating the sampling distribution of a statistic by resampling from the original data. For example, if we want to estimate the standard error of the OLS slope coefficient $\beta_1$, we can use the bootstrap as follows:
1. Draw a random sample of size $n$ from the original data with replacement. This is called a bootstrap sample.
2. Estimate $\beta_1$ using OLS on the bootstrap sample. This is called a bootstrap estimate.
3. Repeat steps 1 and 2 many times (e.g., 1000 times) to obtain many bootstrap estimates.
4. Calculate the standard deviation of the bootstrap estimates. This is called the bootstrap standard error.
The bootstrap standard error can be used to construct confidence intervals or perform hypothesis tests for $\beta_1$. The bootstrap can also be used to estimate bias or other properties of estimators.
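The steps above can be sketched directly in code. The data here are simulated, so the true slope (0.95) and error scale are known choices made for the example, not quantities from the book:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data: weight = -90 + 0.95 * height + noise
# (all values chosen arbitrarily for the example).
n = 100
height = rng.normal(loc=170.0, scale=10.0, size=n)
weight = -90.0 + 0.95 * height + rng.normal(scale=5.0, size=n)

def ols_slope(x, y):
    """OLS slope estimate from a simple bivariate regression."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Nonparametric bootstrap: resample (height, weight) PAIRS with
# replacement, re-estimating the slope on each bootstrap sample.
B = 1000
boot_slopes = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)  # indices of one bootstrap sample
    boot_slopes[b] = ols_slope(height[idx], weight[idx])

# The spread of the bootstrap estimates is the bootstrap standard error.
boot_se = boot_slopes.std(ddof=1)
print(f"bootstrap SE of slope = {boot_se:.3f}")
```

Note the resampling of whole (height, weight) pairs: the resampling scheme should match the data-generating process, which here treats each observation as an independent draw.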
The bootstrap is a powerful and versatile tool for statistical inference, especially when the sampling distribution of a statistic is unknown or difficult to derive analytically. However, the bootstrap also has some limitations and assumptions that need to be checked. For example, the bootstrap assumes that the original data are representative of the population of interest, and that the resampling scheme matches the data-generating process. The bootstrap also depends on the choice of statistic, sample size, number of repetitions, and method of calculation.
Applications and examples
In this section, we will briefly describe some of the applications and examples that Freedman uses in his book to illustrate the concepts and techniques discussed above. We will not go into too much detail or technicality, but rather give you a general idea of what they are and how they are analyzed. For more details and examples, we recommend reading the book itself.
Social and health sciences
The book covers many applications and examples from various fields in the social and health sciences, such as sociology, economics, political science, psychology, epidemiology, medicine, and public health. Some of these examples include:
- The effect of class size on student achievement
- The effect of cigarette taxes on smoking behavior
- The effect of campaign spending on election outcomes
- The effect of birth weight on infant mortality
- The effect of hormone replacement therapy on breast cancer risk
- The effect of aspirin on heart attack prevention
These examples show how statistical models can be used to address important questions and inform decisions in these fields. They also show how different types of data (observational vs. experimental, cross-sectional vs. longitudinal) require different models and methods, and how different assumptions (linearity vs. nonlinearity, homoskedasticity vs. heteroskedasticity, exogeneity vs. endogeneity) affect the validity and interpretation of the results.
Study design and data analysis
The book also covers aspects of study design and data analysis that are essential for conducting and evaluating empirical research, including:
- Randomization and control groups
- Matching and stratification
- Instrumental variables and natural experiments
- Regression discontinuity and difference-in-differences
- Sensitivity analysis and robustness checks
- Model selection and specification tests
These topics show how study design and data analysis affect the quality and credibility of the evidence, how different designs and methods address different threats to validity (confounding, selection bias, reverse causation, omitted variables, measurement error, and misspecification), and how various checks and tests can assess the reliability and sensitivity of the results.
Modeling pitfalls and challenges
The book also covers some of the pitfalls and challenges that arise when using statistical models in practice, including:
- Extrapolation and interpolation
- Collinearity and multicollinearity
- Nonlinearity and interaction
- Heterogeneity and aggregation
- Causality and identification
- Interpretation and communication
These pitfalls show how modeling can go wrong or mislead if not done carefully. They also show that modeling demands judgment and expertise, as well as communication and collaboration: it is not a mechanical or objective process, but a creative and, in part, subjective one.
In this article, we have given you an overview of Statistical Models: Theory and Practice by David A. Freedman. We have summarized some of the main concepts and techniques covered in the book, as well as some of the applications and examples that illustrate them. We have also discussed some of the modeling pitfalls and challenges that Freedman highlights in his book, and how to avoid them.
We hope that this article has sparked your interest in reading the book, or at least in learning more about statistical models and their applications. The book is a valuable resource for anyone who deals with applied statistics, especially in the social and health sciences. The book is also a testament to the author's wisdom and experience in both theory and practice of statistics.
Here are some recommendations for readers who want to get the most out of the book:
- Read the book carefully and critically. Don't just skim the text or skip the exercises. Try to understand the logic and intuition behind each concept and technique, along with the assumptions and limitations that come with them.
- Compare and contrast different models and methods. Don't accept or reject a method based on its popularity or convenience; evaluate its strengths and weaknesses, and its suitability for the problem at hand.
- Apply what you learn to your own data and questions. Don't rely only on the examples in the book; find data sources and questions that interest you, and analyze them with the models and methods you learn.
- Discuss what you learn with others. Share your thoughts and questions with other readers or learners, online or offline; explaining an idea to someone else, or asking for feedback, is one of the best tests of understanding.
There are also some limitations to keep in mind, and directions for going beyond the book:
- The book is not comprehensive or exhaustive. It focuses mainly on linear models and their extensions, with attention to causality and inference, and does not cover nonlinear models, multilevel models, survival analysis, time series analysis, or machine learning.
- The book is not cutting-edge. The second edition was published in 2009, so it does not reflect more recent developments such as causal inference with graphical models, synthetic control methods, or regression kink designs.
- The book is a static text. It offers no online resources such as data sets, code, quizzes, or videos to enhance the learning experience.
Readers who want to go further may therefore want to complement the book with other materials: other books by Freedman or by other authors on related topics, online courses and lectures on statistics, online communities or forums for discussing statistics with other learners, and statistical software for hands-on analysis.
Here are some frequently asked questions (FAQs) about Statistical Models: Theory and Practice by David A. Freedman:
Where can I find the book online?
The book is published by Cambridge University Press; electronic and print copies are available from the publisher, from major booksellers, and through many university libraries.
What are the prerequisites for reading the book?
The book assumes some familiarity with basic probability and statistics, such as mean, variance, standard deviation, normal distribution, hypothesis testing, etc. The book also assumes some familiarity with basic calculus and linear algebra, such as derivatives, integrals, matrices, vectors, etc. The book provides some background material on these topics in the appendices, but they are not meant to be comprehensive or rigorous. Therefore, readers who do not have a solid background in these topics may want to review them before reading the book.
How can I use the book for teaching or learning?
The book can be used as a textbook for a course on statistical models and their applications, or as a reference book for self-study or review. The book is suitable for advanced undergraduate or beginning graduate students in statistics, as well as students and professionals in the social and health sciences who want to learn more about applied statistics. The book has four parts that can be covered in one or two semesters, depending on the level and pace of the course. The book also has plenty of exercises, most with answers, that can be used for homework or practice. The book also includes relevant journal articles at the back that can be used for further reading or discussion.
What are some other books by the same author?
David A. Freedman wrote several other books on probability, statistics, and their applications, including:
- Statistics (with Robert Pisani and Roger Purves), a popular introductory textbook that covers descriptive statistics, probability, inference, and regression.
- Markov Chains and Brownian Motion and Diffusion, two graduate-level texts on probability theory.
- Statistical Models and Causal Inference: A Dialogue with the Social Sciences (edited by David Collier, Jasjeet Sekhon, and Philip Stark), a posthumous collection of his papers on the use and misuse of statistical models for causal inference.