P.Mean >> Category >> Unusual data (created 2007-06-20).
These pages describe data analysis that does not fit easily into the more traditional categories of data analysis. If I get a sufficient number of pages on the same general topic, I will create a new category. Also see Category: Modeling issues, Category: Statistical theory. Other entries about unusual data can be found in the unusual data page at the StATS website.
2011
Kanji GK. One hundred statistical tests. 3rd ed. Thousand Oaks, Calif: Sage Publications; 2006. Description: Gopal Kanji lists specific details of many statistical tests, some quite obscure. This book is for students who want more mathematical details.
Journal article: Jason S Haukoos, Roger J Lewis. Advanced statistics: bootstrapping confidence intervals for statistics with "difficult" distributions. Acad Emerg Med. 2005;12(4):360-365. Abstract: "The use of confidence intervals in reporting results of research has increased dramatically and is now required or highly recommended by editors of many scientific journals. Many resources describe methods for computing confidence intervals for statistics with mathematically simple distributions. Computing confidence intervals for descriptive statistics with distributions that are difficult to represent mathematically is more challenging. The bootstrap is a computationally intensive statistical technique that allows the researcher to make inferences from data without making strong distributional assumptions about the data or the statistic being calculated. This allows the researcher to estimate confidence intervals for statistics that do not have simple sampling distributions (e.g., the median). The purposes of this article are to describe the concept of bootstrapping, to demonstrate how to estimate confidence intervals for the median and the Spearman rank correlation coefficient for non-normally-distributed data from a recent clinical study using two commonly used statistical software packages (SAS and Stata), and to discuss specific limitations of the bootstrap." [Accessed on September 21, 2011]. http://gcrc.labiomed.org/Biostat/Education/Case%20studies%202005/session2/Haukoos%20and%20Lewis%20Bootstrapping.pdf.
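The article works its examples in SAS and Stata. As a rough illustration of the same idea, here is a minimal Python sketch of a percentile bootstrap confidence interval for the median. The function name, the sample values, and the choice of 2,000 resamples are assumptions made for this illustration and do not come from the article.

```python
import random
import statistics

def bootstrap_median_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median.

    Resample the data with replacement, compute the median of each
    resample, and read the interval off the sorted resampled medians.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]

# Hypothetical right-skewed sample (illustrative values only).
sample = [1, 2, 2, 3, 3, 4, 5, 7, 9, 14, 21, 30]
low, high = bootstrap_median_ci(sample)
print(f"median = {statistics.median(sample)}, 95% CI = ({low}, {high})")
```

The percentile interval is only the simplest bootstrap interval. With a sample this small the interval is coarse, which is one of the limitations the article discusses.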
Gunnes N, Seierstad T, Aamdal S, et al. Assessing quality of life in a randomized clinical trial: Correcting for missing data. BMC Medical Research Methodology. 2009;9(1):28. Available at: http://www.biomedcentral.com/1471-2288/9/28 [Accessed May 20, 2009]. Excerpt: Use of proper methodology developed for analysing data subject to missingness is necessary to reduce potential estimation bias. The quality of life of patients receiving radiation therapy with concurrent chemotherapy (docetaxel) appears somewhat worse than that of patients receiving radiation therapy alone in the period during which treatment is given. The conclusions are robust for the choice of statistical methods.
Westaby S, Archer N, Manning N, et al. Comparison of hospital episode statistics and central cardiac audit database in public reporting of congenital heart surgery mortality. BMJ. 2007;335(7623):759. Available at: http://www.bmj.com/cgi/content/abstract/335/7623/759 [Accessed March 4, 2009]. Description: One of the more lively debates in medicine today is the use of report cards to summarize performance of hospitals and/or individual physicians. This paper takes individual statistics compiled by hospitals (hospital episode statistics) and compares them to a centralized database. There are large discrepancies between the two, and the authors suggest that individual hospitals should spend the effort to more rigorously collect and validate their data.
Micheloud F. Jean Paul Benzécri's Correspondence Analysis. Available at: http://www.micheloud.com/FXM/COR/E/index.htm [Accessed March 4, 2009]. Excerpt: This paper is an introduction to correspondence analysis, a statistical method allowing to analyze and describe graphically and synthetically big contingency tables, that is tables in which you find at the intersection of a row and a column the number of individuals who share the characteristic of the row and that of the column.
Molly Kelton, Cynthia LeardMann, Besa Smith, et al. Exploratory factor analysis of self-reported symptoms in a large, population-based military cohort. BMC Medical Research Methodology. 2010;10(1):94. Abstract: "BACKGROUND: US military engagements have consistently raised concern over the array of health outcomes experienced by service members postdeployment. Exploratory factor analysis has been used in studies of 1991 Gulf War-related illnesses, and may increase understanding of symptoms and health outcomes associated with current military conflicts in Iraq and Afghanistan. The objective of this study was to use exploratory factor analysis to describe the correlations among numerous physical and psychological symptoms in terms of a smaller number of unobserved variables or factors. METHODS: The Millennium Cohort Study collects extensive self-reported health data from a large, population-based military cohort, providing a unique opportunity to investigate the interrelationships of numerous physical and psychological symptoms among US military personnel. This study used data from the Millennium Cohort Study, a large, population-based military cohort. Exploratory factor analysis was used to examine the covariance structure of symptoms reported by approximately 50,000 cohort members during 2004-2006. Analyses incorporated 89 symptoms, including responses to several validated instruments embedded in the questionnaire. Techniques accommodated the categorical and sometimes incomplete nature of the survey data. RESULTS: A 14-factor model accounted for 60 percent of the total variance in symptoms data and included factors related to several physical, psychological, and behavioral constructs. A notable finding was that many factors appeared to load in accordance with symptom co-location within the survey instrument, highlighting the difficulty in disassociating the effects of question content, location, and response format on factor structure. 
CONCLUSIONS: This study demonstrates the potential strengths and weaknesses of exploratory factor analysis to heighten understanding of the complex associations among symptoms. Further research is needed to investigate the relationship between factor analytic results and survey structure, as well as to assess the relationship between factor scores and key exposure variables." [Accessed October 25, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/94.
Rigdon E. Frequently Asked Questions about SEM. Available at: http://www2.gsu.edu/~mkteer/semfaq.html [Accessed March 4, 2009]. Description: This is the first place you should look if you have questions about Structural Equation Models.
Wikipedia. Instrumental variable. Available at: http://en.wikipedia.org/wiki/Instrumental_variable [Accessed March 4, 2009]. Excerpt: In statistics and econometrics, an instrumental variable (IV, or instrument) can be used to produce a consistent estimator of a parameter when the explanatory variables (covariates) are correlated with the error terms. Such correlation can be caused by endogeneity, by omitted covariates, or by measurement errors in the covariates. In this situation, ordinary linear regression produces biased and inconsistent estimates. However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation, that is correlated with the suspect explanatory variable, and that is uncorrelated with the error term.
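To make the excerpt concrete, here is a small simulation sketch in Python. The data-generating model, the coefficient values, and the variable names are all assumptions chosen for illustration. An unobserved variable u affects both x and y, so the ordinary regression slope is biased, while the simple instrumental variable estimate cov(z, y) / cov(z, x) stays close to the true slope.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

rng = random.Random(1)  # fixed seed for reproducibility
n = 5000
true_beta = 2.0  # assumed true effect of x on y

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: affects x, unrelated to u
u = [rng.gauss(0, 1) for _ in range(n)]  # omitted variable: affects both x and y
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_beta * xi + 3 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

ols_slope = cov(x, y) / cov(x, x)  # biased, because x is correlated with the error
iv_slope = cov(z, y) / cov(z, x)   # consistent, because z is uncorrelated with the error

print(f"true slope = {true_beta}, OLS = {ols_slope:.2f}, IV = {iv_slope:.2f}")
```

Under this assumed model the regression slope converges to 3 rather than 2, because cov(x, u) / var(x) = 1/3 and u enters y with coefficient 3. The ratio-of-covariances form applies to a single instrument and a single explanatory variable; more general settings use two-stage least squares.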
Kenny DA. SEM: Instrumental Variables. Available at: http://davidakenny.net/cm/iv.htm [Accessed March 4, 2009]. Excerpt: One way of identifying models that cannot be estimated by using multiple regression is through the use of instrumental variables. For path analysis, the disturbance must not be correlated with each causal variable. There are three reasons why such a correlation might exist: * Spuriousness (Third Variable Causation): A variable causes both the endogenous variable and one of its causal variables and that variable is not included in the model. * Reverse Causation (Feedback Model): The endogenous variable causes, either directly or indirectly, one of its causes. * Measurement Error: There is measurement error in a causal variable.
Michel Chavance, Sylvie Escolano, Monique Romon, et al. Latent variables and structural equation models for longitudinal relationships: an illustration in nutritional epidemiology. BMC Medical Research Methodology. 2010;10(1):37. Abstract: "BACKGROUND: The use of structural equation modeling and latent variables remains uncommon in epidemiology despite its potential usefulness. The latter was illustrated by studying cross-sectional and longitudinal relationships between eating behavior and adiposity, using four different indicators of fat mass. METHODS: Using data from a longitudinal community-based study, we fitted structural equation models including two latent variables (respectively baseline adiposity and adiposity change after 2 years of follow-up), each being defined, by the four following anthropometric measurement (respectively by their changes): body mass index, waist circumference, skinfold thickness and percent body fat. Latent adiposity variables were hypothesized to depend on a cognitive restraint score, calculated from answers to an eating-behavior questionnaire (TFEQ-18), either cross-sectionally or longitudinally. RESULTS: We found that high baseline adiposity was associated with a 2-year increase of the cognitive restraint score and no convincing relationship between baseline cognitive restraint and 2-year adiposity change could be established. CONCLUSIONS: The latent variable modeling approach enabled presentation of synthetic results rather than separate regression models and detailed analysis of the causal effects of interest. In the general population, restrained eating appears to be an adaptive response of subjects prone to gaining weight more than as a risk factor for fat-mass increase." [Accessed May 6, 2010]. Available at: http://www.biomedcentral.com/1471-2288/10/37.
Kelly PA. Overview of Computer-Intensive Statistics. Available at: http://www.hsrd.houston.med.va.gov/AdamKelly/resampling.html [Accessed March 4, 2009]. Description: This page provides a nice overview of the permutation test, randomization test, Monte Carlo estimation, bootstrapping, the jackknife, and Markov Chain Monte Carlo methods.
Walters S, Campbell M. The use of bootstrap methods for analysing health-related quality of life outcomes (particularly the SF-36). Health and Quality of Life Outcomes. 2004;2(1):70. Available at: http://www.hqlo.com/content/2/1/70 [Accessed March 4, 2009]. Description: The article provides an illustrative example of how to use the bootstrap method.
Li P. The Zoo of Loglinear Analysis. Available at: http://facultystaff.richmond.edu/~pli/psy538/loglin02/index.html [Accessed March 4, 2009]. Excerpt: "Loglinear Analysis is a multivariate extension of Chi Square. You use Loglinear when you have more than two qualitative variables. Chi Square is insufficient when you have more than two qualitative variables because it only tests the independence of the variables. When you have more than two, it cannot detect the varying associations and interactions between the variables. Loglinear is a goodness-of-fit test that allows you to test all the effects (the main effects, the association effects and the interaction effects) at the same time."
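As a small illustration of the excerpt, the sketch below fits the simplest loglinear model for three qualitative variables, mutual independence, to a 2x2x2 table and computes the likelihood-ratio goodness-of-fit statistic. The counts are hypothetical and the code is a minimal Python sketch, not the method shown on the linked page. The mutual independence model is used because its fitted counts have a closed form; models with association or interaction terms generally need iterative fitting.

```python
import math
from itertools import product

# Hypothetical 2x2x2 table of counts indexed by (a, b, c); illustrative only.
counts = {
    (0, 0, 0): 30, (0, 0, 1): 20,
    (0, 1, 0): 25, (0, 1, 1): 25,
    (1, 0, 0): 10, (1, 0, 1): 40,
    (1, 1, 0): 15, (1, 1, 1): 35,
}
total = sum(counts.values())

def margin(axis, level):
    """Sum of counts over all cells where the given variable equals `level`."""
    return sum(n for cell, n in counts.items() if cell[axis] == level)

# Fitted counts under mutual independence: product of the three one-way
# margins divided by total squared.
fitted = {
    cell: margin(0, cell[0]) * margin(1, cell[1]) * margin(2, cell[2]) / total**2
    for cell in product(range(2), repeat=3)
}

# Likelihood-ratio statistic G^2 = 2 * sum(observed * log(observed / fitted)).
g2 = 2 * sum(n * math.log(n / fitted[cell]) for cell, n in counts.items() if n > 0)
df = 8 - 1 - 3  # eight cells, minus the intercept, minus three main effects

print(f"G^2 = {g2:.2f} on {df} degrees of freedom")
```

A large value of G^2 relative to a chi-square distribution with 4 degrees of freedom suggests that main effects alone do not describe the table, so association or interaction terms are needed.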
All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-05-06. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. copyright law.
2008