**P.Mean: 2010 archive**

**December 2010**

P.Mean: Creating LaTeX formulas on the fly (created 2010-12-20). I don't use LaTeX a lot (though I should) because I am fairly happy with a proprietary product that I use for formulas, MathType. Still, there are some times when it would be nice to use a bit of LaTeX, and there's a web site that makes this easy.

P.Mean: Location of my UMKC office (created 2010-12-09). I work part-time as an independent statistical consultant and part time at the University of Missouri-Kansas City (UMKC). If you need to meet with me for UMKC related work, here is how to get to my office.

P.Mean: Are certain CAM therapies undeserving of further study (created 2010-12-01). I have become something of a celebrity on the Science Based Medicine site, as I have noted in an earlier webpage. In addition to the blog post I noted earlier, there is a new post: Of SBM and EBM Redux. Part I: Does EBM Undervalue Basic Science and Overvalue RCTs? These posts are reminding me how important it is to write precisely, which is good. I largely agree with many of the comments written in these particular entries and in others at the Science Based Medicine site, but there are still areas of fundamental disagreement. One of the major areas where we disagree is over the value of running randomized controlled trials for certain CAM (Complementary and Alternative Medicine) therapies that are biologically implausible.

**November 2010**

P.Mean: Poem to help you remember the quotient rule (created 2010-11-26). I was working on some derivatives that involved a fraction, and the formula is a bit tricky to remember. There was a short poem that I learned a long time ago for the derivative of a fraction, and I can't find it anywhere on the Internet. There are some variants that are close, but nothing quite like the poem I remember. Everything important has to be found somewhere on the Internet, so I am posting the poem here. If anyone can attribute this poem to the original source, please let me know.
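For reference, here is the rule the poem encodes, written in LaTeX, for differentiable functions f and g with g(x) ≠ 0:

```latex
\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)\,f'(x) - f(x)\,g'(x)}{\left[g(x)\right]^{2}}
```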

P.Mean: Transforming the parameter also transforms the prior distribution (created 2010-11-25). All my work on Bayesian models recently has forced me to remember some of my mathematical statistics that I had not touched since college. Here's another example of this. Suppose you have a prior distribution on a parameter θ and you want to find the comparable prior for a transformation φ=u(θ).
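The standard change-of-variables result, written in LaTeX, is sketched below. It assumes u is monotone and differentiable, so that θ = u⁻¹(φ) is well defined; p_θ and p_φ denote the prior densities of θ and φ.

```latex
p_\phi(\phi) = p_\theta\!\left(u^{-1}(\phi)\right)\left|\frac{d\,u^{-1}(\phi)}{d\phi}\right|
```

For example, with φ = log θ we have θ = e^φ, so p_φ(φ) = p_θ(e^φ) e^φ. A prior that is flat in θ is therefore not flat in φ.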

P.Mean: The odds ratio in logistic regression is the opposite of what it should be (created 2010-11-22). I have data in the following table that clearly shows a positive association, but when I run a logistic regression model, the odds ratio is reported as less than 1. How can this be?
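A frequent cause of this symptom is that the software models the probability of the other outcome category, or treats the other group as the reference. That is an assumption here, not necessarily the cause in this case. Either swap inverts the odds ratio, as this Python sketch with hypothetical counts shows:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as
                 outcome yes   outcome no
        group 1       a            b
        group 0       c            d
    i.e., odds of 'yes' in group 1 divided by odds of 'yes' in group 0."""
    return (a / b) / (c / d)

# Hypothetical counts with a positive association for group 1.
a, b, c, d = 30, 10, 15, 25
or_as_intended = odds_ratio(a, b, c, d)      # models "yes", group 0 is the reference
or_outcome_flipped = odds_ratio(b, a, d, c)  # software models "no" instead of "yes"
or_groups_flipped = odds_ratio(c, d, a, b)   # software uses group 1 as the reference
print(or_as_intended, or_outcome_flipped, or_groups_flipped)
```

The two flipped versions give 1/5 instead of 5, which is why it pays to check which category the software treats as the "event" and which group it treats as the baseline.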

P.Mean: BUGS is more than just one program (created 2010-11-19). I am working on some Bayesian models that use a program called BUGS. BUGS stands for Bayesian Inference Using Gibbs Sampling. There are several ways you can run BUGS, and it is worthwhile to note why there are multiple programs.

P.Mean: Ambiguity in the definition of the exponential distribution (created 2010-11-16). I'm trying to run some Bayesian analyses using a program called BUGS (Bayesian inference Using Gibbs Sampling), and this requires me to specify a prior distribution for the parameter associated with an exponential waiting time. I'm having more trouble than I should because the exponential distribution is defined two different ways.
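The two definitions are the rate form, f(x) = λe^(−λx), and the mean (scale) form, f(x) = (1/β)e^(−x/β), with β = 1/λ. As I understand it, BUGS's `dexp` uses the rate form, while NumPy's `Generator.exponential` takes a scale (mean) argument. Check your own software's documentation before relying on this. A small Python sketch confirming the two forms agree once you convert between them:

```python
import math

def exp_pdf_rate(x, lam):
    """Rate parameterization: f(x) = lam * exp(-lam * x); the mean is 1 / lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_pdf_mean(x, beta):
    """Mean (scale) parameterization: f(x) = exp(-x / beta) / beta; the mean is beta."""
    return math.exp(-x / beta) / beta if x >= 0 else 0.0

# A mean waiting time of 4 days corresponds to a rate of 0.25 per day.
mean_wait = 4.0
rate = 1.0 / mean_wait
for x in (0.0, 1.0, 10.0):
    assert math.isclose(exp_pdf_rate(x, rate), exp_pdf_mean(x, mean_wait))
```

A prior stated for the mean (say, centered at 4 days) has to be translated before it can be used as a prior on a rate parameter, and vice versa.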

P.Mean: The Science-Based Medicine blog defends itself (created 2010-11-09). I get a few fan letters from people, which are greatly appreciated, but when I get the rare critical response, I am even more grateful. It doesn't matter if the criticism is valid or not. Someone who takes on the unpleasant task of critiquing my work offers valuable insight into what I wrote poorly because it was incorrect, what I wrote so poorly that it was misinterpreted, or what I wrote well but still drew a dissenting opinion. One of my webpages, P.Mean: Is there something better than Evidence Based Medicine out there (created 2010-09-20), was highlighted and criticized on the Science Based Medicine blog by David Gorski, and here are some of the things I learned from that criticism. This is an expansion of comments I left on their blog entry.

P.Mean: Would you hire someone who knew theory
or someone who knew practice (created 2010-11-03). Someone on LinkedIn
asked if it was better to hire someone who knew theory or someone who knew
practice. Here's my response.

**October 2010**

P.Mean: Poster presentation at the Missouri Technology conference (created 2010-10-04). I will be presenting a poster about the Bayesian model for accrual at the Missouri Technology conference in Columbia, Missouri. There was some confusion about this, partly because I submitted an abstract at the last minute. Here is the abstract that I turned in.

P.Mean: Why the least squares regression line has to pass through XBAR, YBAR (created 2010-10-01). An issue came up about whether the least squares regression line has to pass through the point (XBAR,YBAR), where the terms XBAR and YBAR represent the arithmetic mean of the independent and dependent variables, respectively. The line does have to pass through that point, and it is easy to show why.
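One way to see it: setting the derivative of the sum of squared errors with respect to the intercept to zero gives Σ(y − b0 − b1·x) = 0. Dividing by n gives YBAR = b0 + b1·XBAR, so the fitted line evaluated at XBAR returns YBAR. A Python sketch with hypothetical data checks this numerically:

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar  # this formula forces the line through (xbar, ybar)
    return intercept, slope

xs = [1, 2, 4, 7, 9]
ys = [2.1, 3.9, 8.2, 13.8, 18.5]
b0, b1 = least_squares(xs, ys)
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
print(b0 + b1 * xbar, ybar)  # fitted value at XBAR matches YBAR (up to rounding)
```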

P.Mean: If you knew that failure was not an
option, what would you do (created 2010-10-01). There is a question and
answer forum on LinkedIn where people ask all sorts of questions. Some people
there like to ask motivational questions, and I sometimes respond with an
off-beat answer. There was a question along these
lines: "If you knew that failure was not an option, what would you do?" I
started off with a rather flippant answer, but then realized that there was a
more serious answer.

**September 2010**

P.Mean: Is there something better than Evidence Based Medicine out there (created 2010-09-20). Someone asked me about a claim made on an interesting blog, Science Based Medicine. The blog promotes Science Based Medicine (SBM) and tries to draw a distinction between that practice and Evidence Based Medicine (EBM). It claims that SBM is better because "EBM, in a nutshell, ignores prior probability (unless there is no other available evidence) and falls for the p-value fallacy; SBM does not." Here's what I wrote.
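To see why prior probability matters to this argument, here is a simple Bayes' theorem calculation in Python. It is a sketch: the significance level of 0.05 and power of 0.80 are assumptions chosen for illustration, not values from the blog. It gives the probability that a statistically significant result reflects a real effect:

```python
def prob_effect_is_real(prior, alpha=0.05, power=0.80):
    """Probability that a statistically significant result reflects a real effect,
    given the prior probability that the effect is real (Bayes' theorem)."""
    true_positive = power * prior         # real effect, and the test detects it
    false_positive = alpha * (1 - prior)  # no effect, but the test is significant anyway
    return true_positive / (true_positive + false_positive)

for prior in (0.5, 0.1, 0.01):
    print(prior, round(prob_effect_is_real(prior), 3))
```

With these assumed values, a significant result is quite convincing when the prior probability is 50%. It is much less convincing when the prior probability is 1%, which is the situation SBM says applies to biologically implausible therapies.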

P.Mean: Putting variable names into a model automatically (created 2010-09-20). I always have trouble with including a changing variable name into a sequence of statistical models in R, so when someone wrote about it on the R-Help list, I thought I should try some of the suggestions and then write them down here so I don't forget.

P.Mean: Oh those pesky interactions! (created 2010-09-16). Someone was fitting a binary logistic regression model and regretfully (that was his word) found two significant (p < 0.05) interactions. The tone suggested that he was testing for interactions using some type of stepwise approach, but was hoping that no interactions would appear. When they did appear, he panicked, not about how to interpret the interactions, but about whether he should include them in his publication. Here's the advice I offered.

P.Mean: My new twitter account (created 2010-09-15). I started a new twitter account, mostly to follow the twitter feed of the Department of Biomedical and Health Informatics at UMKC. I work in that department part-time. I may use my twitter account to announce new updates to my website. My twitter feed is @profmean.

P.Mean: Can you compute a confidence interval for your p-value? (created 2010-09-10). A question that comes up from time to time is whether you can calculate a confidence interval for a p-value. It always gets statisticians into a tizzy because it seems to be such a logical thing to do, but no one does it. Here's how I like to think about the issue.

P.Mean: Using information theory to identify
discrepancies within and between text files (created 2010-09-02). I have been experimenting with the use of information theory to identify
patterns in text data files. This work is somewhat preliminary, but it has
some exciting possibilities. If there are certain patterns that occur
frequently at a given column of a text data file (e.g., always the letters "A"
or "B"), then these columns become important for looking for aberrant data
that might be caused by a typographical error, a misalignment of the row of
data, or a deviation from the code book. I want to show some preliminary
graphs that illustrate what these patterns look like for some files I am
working with. **Warning: this is a very large webpage with graphics that
extend across dozens of pages!!**
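A minimal Python sketch of the general idea follows. It assumes the measure is per-column Shannon entropy; the webpage's exact method and graphs are not reproduced here. Low-entropy columns are the ones with strong patterns, and rare characters in those columns are candidates for typos or misaligned rows. The function names and the 5% threshold are hypothetical.

```python
import math
from collections import Counter

def column_entropies(lines):
    """Shannon entropy (bits) of the characters found at each column position
    of a fixed-format text file. Short lines are padded with spaces."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    n = len(padded)
    entropies = []
    for col in range(width):
        counts = Counter(row[col] for row in padded)
        # entropy = sum of p * log2(1/p) over the characters seen in this column
        entropies.append(sum((k / n) * math.log2(n / k) for k in counts.values()))
    return entropies

def rare_characters(lines, col, max_share=0.05):
    """Row numbers whose character at `col` occurs in at most `max_share` of rows."""
    width = max(len(line) for line in lines)
    chars = [line.ljust(width)[col] for line in lines]
    counts = Counter(chars)
    return [i for i, ch in enumerate(chars) if counts[ch] / len(chars) <= max_share]

lines = ["A12", "B12", "A12", "B13"]  # hypothetical fixed-format records
print([round(h, 3) for h in column_entropies(lines)])  # [1.0, 0.0, 0.811]
print(rare_characters(lines, col=2, max_share=0.25))   # [3]
```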

P.Mean: Is it ethical to recruit a panhandler
that you see on the street into your research study (created 2010-09-01).
Someone asked a question about the ethics of approaching a panhandler and
sharing information about a research study. I don't know all the details, but
apparently, this study was examining veterans of the Iraq war, and this
panhandler was holding a sign saying something like please help a veteran of
the Iraq war. There was some concern about whether the monetary incentive
would be disproportionate for someone who had to beg for a living, or it might
be a problem if the panhandler was given money and a flyer about the research
study at the same time. I discussed some of my concerns about this study, but
from the perspective of statistical validity rather than from an
ethical perspective.

**August 2010**

P.Mean: Pooling different measures of risk in a meta-analysis (created 2010-07-26). Someone on the MEDSTATS email discussion group asked about how to pool results in a meta-analysis where some of the summary measures are reported as odds ratios, others as relative risks, and still others as hazard ratios. There's actually a fourth measure that is commonly used when the outcome measure is binary (live/dead, improved/not improved, relapsed/relapse free, etc.). That is the risk difference, and its inverse, the number needed to treat. Here's what I wrote in response.
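Odds ratios, relative risks, and risk differences are linked through the risk in the control group. The Python sketch below assumes that control group risk is known; in practice it has to be estimated, and this conversion ignores that uncertainty. It converts an odds ratio into the other measures, which shows why odds ratios and relative risks can't be pooled directly without some such assumption. Hazard ratios are not covered by this simple algebra.

```python
def convert_odds_ratio(odds_ratio, control_risk):
    """Given an odds ratio and the risk in the control group, return the implied
    treated-group risk, relative risk, risk difference, and number needed to treat."""
    control_odds = control_risk / (1 - control_risk)
    treated_odds = odds_ratio * control_odds
    treated_risk = treated_odds / (1 + treated_odds)
    relative_risk = treated_risk / control_risk
    risk_difference = treated_risk - control_risk
    # Number needed to treat is the inverse of the (absolute) risk difference.
    nnt = float("inf") if risk_difference == 0 else 1 / abs(risk_difference)
    return treated_risk, relative_risk, risk_difference, nnt

# Hypothetical: odds ratio of 2 with a 10% control group risk.
print(convert_odds_ratio(2.0, 0.10))
```

The same odds ratio of 2 implies a different relative risk and number needed to treat for every control group risk, so the conversion has to be done study by study.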

P.Mean: What is a Generalized Estimating Equations model? (created 2010-08-19). Generalized Estimating Equations (GEE) provide a way to model your data that can account for dependence among some of your measurements due to repeated measures, cluster sampling, or a longitudinal data set. GEE represents an extension of the Generalized Linear Model (GLM). Like the GLM, the GEE model allows you to specify a link function and a mean-variance relationship. With the appropriate choice of these two items, you can specify a wide variety of models.
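Here is a sketch of those "two items" for three familiar models, written as plain Python functions. The dictionary layout is hypothetical and is not any library's API. A GEE fit also needs a working correlation structure (such as independence, exchangeable, or autoregressive) describing how measurements within a cluster are related. That part isn't shown.

```python
import math

# Common link functions g(mu) and variance functions V(mu). In a GLM, and in a
# GEE model, the mean mu is tied to the linear predictor by g(mu) = b0 + b1*x + ...,
# and Var(Y) is assumed proportional to V(mu).
FAMILIES = {
    # Linear regression: identity link, constant variance.
    "normal": {"link": lambda mu: mu, "variance": lambda mu: 1.0},
    # Logistic regression: logit link, binomial variance.
    "binomial": {"link": lambda mu: math.log(mu / (1 - mu)), "variance": lambda mu: mu * (1 - mu)},
    # Poisson regression: log link, variance equal to the mean.
    "poisson": {"link": lambda mu: math.log(mu), "variance": lambda mu: mu},
}

for name, fam in FAMILIES.items():
    mu = 0.25
    print(name, round(fam["link"](mu), 4), fam["variance"](mu))
```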

P.Mean: Is Evidence-Based Medicine too rigid (created 2010-08-19). Someone asked about criticisms of Evidence-Based Medicine (EBM): is its reliance on grading schemes and the hierarchy of evidence too rigid, or does EBM provide heuristics that can be adapted as needed? This is hard to respond to, but it is an important question. I view checklists and hierarchies as a necessary evil, and I agree that they are sometimes applied too rigidly.

P.Mean: Competing books to the book I am planning to write (created 2010-08-16). I have been asked by several publishers to list competing books to the book I am planning to write. My book is quite different from anything else out there, but perhaps the closest competition would be books that talk about research methods. Here are some possible competitors in that area.

P.Mean: What should clients get from you at the end of the first consulting session (created 2010-08-14). There has been a lot of discussion about the nature and role of consulting on the message boards of the Statistical Consulting section of the American Statistical Association. One especially valuable question was what you should do when starting a new consulting job. Here is an adaptation of one particularly good response.

P.Mean: Glossary for my second book (created 2010-08-11). As I mentioned in an earlier webpage, I am talking to some publishers about writing a second book. Here's a tentative glossary for that book. I'm only including the terms in the glossary for now, but will eventually add definitions.

P.Mean: What's a fair price for SPSS? (created 2010-08-06). There was a discussion on an email discussion group devoted to SPSS about how the SPSS software package was too expensive and how the company should consider offering a discount price for the home user. Everyone was in favor of lower prices, of course, and people compared the pricing of SPSS to that of Stata and R. In the spirit of debate, I offered a contrarian viewpoint. It also applies to similar complaints I have heard about the pricing of SAS software.

P.Mean: Fighting the claim that any size
difference is clinically important (created 2010-08-05). When working with people to select an appropriate sample size, it is
important to establish the minimum clinically important difference (MCID).
This is a difference such that any value smaller would be clinically trivial,
but any value larger would be clinically important. I get told
quite often that any difference that might be detected is important. I could
be flippant here and then tell them that their sample size is now infinite and
my consulting rate is proportional to the sample size, but I don't make
flippant comments (out loud, at least). Here's how I might challenge such a
claim.
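The flippant answer has a real basis. In the usual sample size formula, the required sample size scales with 1/MCID², so it grows without bound as the difference deemed important shrinks toward zero. Below is a Python sketch. It assumes a two-group comparison of means with a known standard deviation and a normal approximation; that design is an assumption, since the entry doesn't specify one.

```python
import math
from statistics import NormalDist

def n_per_group(mcid, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means (normal approximation):
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / mcid^2, rounded up."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / mcid ** 2)

# Halving the MCID roughly quadruples the sample size.
for mcid in (0.5, 0.25, 0.1, 0.01):
    print(mcid, n_per_group(mcid, sd=1.0))
```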

**July 2010**

P.Mean: Standard operating procedures for a statistical consulting center (created 2010-07-30). I asked a question on one of the American Statistical Association message boards about setting up a consulting service at the University of Missouri-Kansas City (UMKC), where I work part-time. I wanted to develop some SOPs (Standard Operating Procedures) for this center that would supplement the guidance already available on the web. I asked if anyone else had SOPs (or anything similar) that I could look at so I wouldn't re-invent the wheel. I got a lot of responses.

P.Mean: When should research in a given area end? (created 2010-07-26). Someone asked a rather philosophical question: is there ever an end to research in a given area? Will there ever be a "last word" on a research topic? Here's what I wrote in response.

P.Mean: Sample chapter: The first three steps in selecting an appropriate sample size (created 2010-07-24). As I mentioned in an earlier webpage, I am talking to some publishers about writing a second book. The working title is "Jumpstart Statistics: How to Restart Your Stalled Research Project." Here's a tentative chapter from that book. It is not quite complete yet, but I'm hoping to finish it soon. One of your most critical choices in designing a research study is selecting an appropriate sample size. A sample size that is either too small or too large will be wasteful of resources and will raise ethical concerns.

P.Mean: Tentative table of contents for my second book (created 2010-07-24). As I mentioned in an earlier webpage, I am talking to some publishers about writing a second book. The working title is "Jumpstart Statistics: How to Restart Your Stalled Research Project." Here's a tentative table of contents.

P.Mean: Jumpstart Statistics, a proposal for my second book (created 2010-07-23). I want to talk to some publishers about writing a second book. Here is what I will propose to them.

P.Mean: Salary survey for Biostatisticians (created 2010-07-21). I am working part-time at UMKC in the Department of Informatic Medicine and Personalized Health. They like me and want me to increase my hours from 10 hours a week (25% time) to something more. I'll talk to them about this, but at the same time, I want to point out that my salary is not competitive with those of my peers. Here's a table from a recent survey on salaries, published in the Amstat News.

P.Mean: What is principal components analysis? (created 2010-07-19). I was asked to help someone who was reviewing a paper that used principal components analysis (PCA) as part of the statistical methodology. I have not yet seen the article, so I could only offer very general advice.
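As general background, PCA finds the directions of greatest variance: the eigenvectors of the covariance matrix (or of the correlation matrix, when variables are on different scales). The eigenvalues give the variance explained by each component. Below is a minimal Python sketch for just two variables, where the eigenvalues have a closed form. Eigenvectors are omitted, and the data are hypothetical.

```python
import math

def pca_2d(xs, ys):
    """Principal components of two variables via the eigenvalues of their
    sample covariance matrix; returns eigenvalues (largest first) and the
    proportion of total variance explained by each component."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] from the characteristic equation.
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [half_trace + disc, half_trace - disc]
    total = sxx + syy  # total variance equals the sum of the eigenvalues
    return eigenvalues, [ev / total for ev in eigenvalues]

# Perfectly correlated variables: the first component explains all the variance.
print(pca_2d([1, 2, 3, 4], [2, 4, 6, 8]))
```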

P.Mean: Another counter-intuitive
probability problem (created 2010-07-04). A recent article in Science
News rekindled the two children problem and offered an odd twist. Here's the
simple version. Suppose you have two children, at least one of whom is a boy. What is
the probability that both children are boys? The obvious, but incorrect
choice is 1/2. The correct answer is 1/3. How does this work?
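One way to see the 1/3 is to enumerate the equally likely birth orders. Of the four, three contain at least one boy, and only one of those three has two boys. The Python sketch below assumes boys and girls are equally likely and that the two children's sexes are independent:

```python
from fractions import Fraction
from itertools import product

# All equally likely birth orders for two children: BB, BG, GB, GG.
families = list(product("BG", repeat=2))
at_least_one_boy = [f for f in families if "B" in f]
both_boys = [f for f in at_least_one_boy if f == ("B", "B")]
print(Fraction(len(both_boys), len(at_least_one_boy)))  # 1/3
```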

**June 2010 (8 entries)**

P.Mean: Resources using Stack Overflow (created 2010-06-30). A bunch of Internet resources fell into my lap all at once. Some of them relate to a new technology (Stack Overflow/Stack Exchange) that allows people to pose questions as they would on an Internet email discussion group, but it is web-based and has some of the capabilities associated with blogs and wikis.

P.Mean: The SPSS t-test is confusing (created 2010-06-29). I have always disliked how SPSS (now IBM SPSS) presented the output from their independent samples t-test. I want to explain why it is confusing and show you an alternative based on the general linear model.
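The general linear model alternative rests on a known equivalence: the pooled-variance (equal variances assumed) t-test gives the same t statistic as a regression of the outcome on a 0/1 group indicator. A quick numerical check in Python with hypothetical data follows. It covers only the equal-variance version, not the unequal-variance row that SPSS also prints.

```python
import math

def pooled_t(a, b):
    """Independent samples t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)
    ssb = sum((x - mb) ** 2 for x in b)
    sp = math.sqrt((ssa + ssb) / (na + nb - 2))  # pooled standard deviation
    return (mb - ma) / (sp * math.sqrt(1 / na + 1 / nb))

def regression_t(a, b):
    """t statistic for the slope when y is regressed on a 0/1 group indicator."""
    x = [0] * len(a) + [1] * len(b)
    y = list(a) + list(b)
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope / se_slope

group_a = [5.1, 4.8, 6.0, 5.5, 5.2]
group_b = [6.3, 6.9, 5.8, 7.1]
print(pooled_t(group_a, group_b), regression_t(group_a, group_b))
```

The regression slope is the difference in group means, and its standard error matches the t-test's standard error, so the two statistics agree.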

P.Mean: Classic references in Statistics (created 2010-06-29). A prominent statistician, Christian Robert, listed some classic research papers in Statistics that he wanted to present to his students in a special readings class. This was commented on by another prominent statistician, Andrew Gelman. I'm not a prominent statistician, but that won't stop me from adding my two cents.

P.Mean: What I use for talks instead of PowerPoint (created 2010-06-28). Someone on LinkedIn asked a question about what technologies people use for their presentations (laptop, flipchart, or whiteboard). For most of my presentations, I use none of these technologies. Instead I create a webpage of my presentation and then print it and hand out copies.

P.Mean: The futility of small sample sizes for evaluating a binary outcome (created 2010-06-16). I'm helping out with a project that involves a non-randomized comparison of two groups of patients. One group gets a particular anesthetic drug and the other group does not. The researcher wants to compare rates of hypotension, respiratory depression, apnea, and hypoxia. I suggested using continuous outcomes like O2 saturation levels rather than discrete events like hypoxia, but for a variety of reasons, they cannot use continuous outcomes. Their original goal was to collect data on about 20 patients in each group.
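One way to see the problem is sketched below; this is not necessarily the analysis from the entry. Suppose none of the 20 patients in a group has an event such as hypoxia. The exact one-sided 95% upper confidence bound for the event rate is still about 14%, so the data cannot rule out a sizable rate.

```python
def upper_bound_zero_events(n, alpha=0.05):
    """Exact one-sided upper confidence bound for a proportion when 0 events
    are seen in n patients: solves (1 - p) ** n = alpha for p."""
    return 1 - alpha ** (1 / n)

print(round(upper_bound_zero_events(20), 3))  # 0.139
print(3 / 20)                                 # "rule of three" approximation: 0.15
```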

P.Mean: An example of a bad survey (created 2010-06-11). I was asked to fill out an Internet survey to define my "consulting needs." That's a rather strange invitation, and sounds almost like a cheap way to develop business leads. But it was a request through LinkedIn, so I thought it was worth filling out. I want to try to build my contacts at LinkedIn, and filling out a short survey seemed like a small price to pay to get a potential lead for my own consulting business. When I went to the webpage with the actual survey, though, I was shocked and disappointed with what I found.

P.Mean: An interesting alternative to power calculations (created 2010-06-09). Someone on the MedStats Internet discussion group mentioned an alternative to power calculations called accuracy in parameter estimation (AIPE). It looks interesting. Here are some relevant references.
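The core idea of AIPE is to choose the sample size that makes a confidence interval narrow enough, rather than the size that achieves a given power. The simplest version is sketched below in Python. It assumes a known standard deviation and a z-based interval for a single mean. Published AIPE methods refine this (for example, to allow for the sample standard deviation itself varying), so treat this as an illustration only.

```python
import math
from statistics import NormalDist

def n_for_ci_width(sd, width, conf=0.95):
    """Smallest n such that a z-based confidence interval for a mean, with a
    known standard deviation, has total width no larger than `width`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((2 * z * sd / width) ** 2)

print(n_for_ci_width(sd=10, width=5))  # 62
```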

P.Mean: Minimum sample size needed for a time
series prediction (created 2010-06-08). Someone asked what minimum
sample size was needed in a time series analysis model to forecast
future observations. Strictly speaking, you can forecast with two
observations. Draw a straight line connecting the two points and then extend
that line as far as you want in the future. But you wouldn't want to do that.
So a better question might be what is the minimum number of data points that
you would need in order to provide a good forecast of the future.

**May 2010 (9 entries)**

P.Mean: What is the premier conference for statistical consulting (created 2010-05-28). Someone asked what the premier conference for statistical consulting is. That's a rather ambiguous question, because different people will interpret terms like "premier conference" and "statistical consulting" differently. The answer, however, is pretty unambiguous. In North America, it would have to be the Joint Statistical Meetings (JSM).

P.Mean: Lessons learned the hard way: don't presume to know how your software handles missing value codes (created 2010-05-28). I'm working on an interesting project that involves summing up RVUs (relative value units) across certain records for a given patient. Some of the RVUs are missing. How should the program handle these missing RVUs? We discussed this by email and agreed to ignore missing RVUs in the sum. This is effectively the same as replacing the missing RVUs with zero. There are two cases worth worrying about, though, and handling those cases makes me realize just how tricky missing values are.
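One case that commonly trips up software is a patient whose values are all missing. This is an assumption about what matters here, not necessarily one of the two cases in the entry. "Ignore missing values and sum" can silently turn that patient's total into 0; for instance, R's `sum(x, na.rm = TRUE)` returns 0 when every element of `x` is `NA`. Below is a Python sketch of one way to keep "no data" distinct from "zero", with hypothetical patients and values:

```python
def sum_ignoring_missing(values):
    """Sum the non-missing values (None marks a missing value). If every value
    is missing, return None rather than 0, so 'no data' is not mistaken for 'zero'."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None

patients = {
    "p1": [1.5, None, 2.0],  # some values missing: they are ignored in the sum
    "p2": [None, None],      # all values missing
    "p3": [],                # no records at all
}
totals = {pid: sum_ignoring_missing(vals) for pid, vals in patients.items()}
print(totals)  # {'p1': 3.5, 'p2': None, 'p3': None}
```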

P.Mean: How I got started in my career as an
independent statistical consultant (created 2010-05-24). LinkedIn has a
question and answer board, and one of the questions inspired me to write up
the story of how I got started in my career as an independent statistical
consultant. Here's the original question: *I'm very curious as to what
events or conversations enabled you to change direction in your career. What
thought process did you go through? What resources did you use or uncover?*

P.Mean: How do I handle criticism (created 2010-05-21). Someone asked how I handle criticism. To be honest, I don't get criticized all that much. Possibly it is that I do very little that deserves criticism, and possibly, people are intimidated by the area I work in (unjustifiably intimidated, by the way, but many people are just plain scared of numbers). It is also important to note that most people don't like to share negative opinions directly. They certainly will tell others, of course, if something is wrong, but it takes some boldness and some bravery to confront a person directly.

P.Mean: How to avoid charges of plagiarism (created 2010-05-15). I'm not an expert on this, but I got a question about how to avoid charges of plagiarism in a thesis, especially the sections of the thesis that reviewed existing research and theoretical background. Here's how I responded.

P.Mean: Withdrawing from a study and taking your data with you (created 2010-05-15). Someone asked me what the phrase "you can withdraw from the study at any time" really means. Can a research subject withdraw and take their data with them (that is, ask that their data be expunged from the database)? What if they raise the objection after the data analysis is done, because they don't like the results of the study? Can they ask for their data to be expunged then? What if they raise the objection after the data is published?

P.Mean: Lessons learned the hard way: don't throw good money after bad (created 2010-05-14). I am helping out with data management for a project involving 19 million records from an insurance database. The file is too big to be read into R in one piece, so I decided to read in successive segments of 100,000 records and then write them out again as separate files. This was a big mistake and showed me the importance of the saying: "Don't throw good money after bad."

P.Mean: What is a good surrogate measure for socioeconomic status (created 2010-05-03). I received a question, indirectly, about what might be a good surrogate measure for socioeconomic status (SES). That raises two questions, actually. What is SES, and how can we tell if a surrogate is a good surrogate for SES?

P.Mean: More discussion on
instrumental variables (created 2010-05-03). I attended the May meeting
of the KUMC Statistics Journal Club. The topic of discussion was a paper
outlining the properties and applications of instrumental variables.
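With one instrument and one exposure, the simplest instrumental variable estimator divides the instrument-outcome covariance by the instrument-exposure covariance. Below is a toy Python sketch with hypothetical, noise-free data (not from the paper discussed). An unmeasured confounder biases ordinary regression, but the instrument recovers the true effect.

```python
def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

# Hypothetical data: z is the instrument, c is an unmeasured confounder.
z = [1, -1, 1, -1]
c = [1, 1, -1, -1]                          # uncorrelated with z, affects both x and y
x = [2 * zi + ci for zi, ci in zip(z, c)]   # exposure depends on z and on c
y = [3 * xi + ci for xi, ci in zip(x, c)]   # true effect of x on y is 3

ols_slope = cov(x, y) / cov(x, x)  # biased by the confounder (approximately 3.2)
iv_slope = cov(z, y) / cov(z, x)   # instrumental variable estimate (approximately 3.0)
print(ols_slope, iv_slope)
```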

**April 2010 (7 entries)**

P.Mean: My life so far: fails to meet expectations (created 2010-04-21). I'm learning how to use LinkedIn, and there are some people on that site who ask general philosophical questions. Some are a bit silly but they are still fun to answer. One person asked people to apply the traditional performance evaluation categories (Exceeds expectations, Meets expectations, Fails to meet expectations) to their own lives. So here is what I wrote.

P.Mean: Interpreting p-values in a
published abstract, part 1 (created 2010-04-14). In one of my recent
webinars, I asked people to read the following abstract and interpret the
p-values presented within. *The Outcome of Extubation Failure in a
Community Hospital Intensive Care Unit: A Cohort Study. Seymour CW,
Martinez A, Christie JD, Fuchs BD. Critical Care 2004, 8:R322-R327 (20 July
2004) Introduction: Extubation failure has been associated with poor
intensive care unit (ICU) and hospital outcomes in tertiary care medical
centers. Given the large proportion of critical care delivered in the
community setting, our purpose was to determine the impact of extubation
failure on patient outcomes in a community hospital ICU. Methods: A
retrospective cohort study was performed using data gathered in a 16-bed
medical/surgical ICU in a community hospital. During 30 months, all patients
with acute respiratory failure admitted to the ICU were included in the
source population if they were mechanically ventilated by endotracheal tube
for more than 12 hours. Extubation failure was defined as reinstitution of
mechanical ventilation within 72 hours (n = 60), and the control cohort
included patients who were successfully extubated at 72 hours (n = 93).
Results: The primary outcome was total ICU length of stay after the initial
extubation. Secondary outcomes were total hospital length of stay after the
initial extubation, ICU mortality, hospital mortality, and total hospital
cost. Patient groups were similar in terms of age, sex, and severity of
illness, as assessed using admission Acute Physiology and Chronic Health
Evaluation II score (P > 0.05). Both ICU (1.0 versus 10 days; P < 0.01) and
hospital length of stay (6.0 versus 17 days; P < 0.01) after initial
extubation were significantly longer in reintubated patients. ICU mortality
was significantly higher in patients who failed extubation (odds ratio =
12.2, 95% confidence interval [CI] = 1.5–101; P < 0.05), but there was no
significant difference in hospital mortality (odds ratio = 2.1, 95% CI =
0.8–5.4; P < 0.15). Total hospital costs (estimated from direct and indirect
charges) were significantly increased by a mean of US$33,926 (95% CI =
US$22,573–45,280; P < 0.01). Conclusion: Extubation failure in a community
hospital is univariately associated with prolonged inpatient care and
significantly increased cost. Corroborating data from tertiary care centers,
these adverse outcomes highlight the importance of accurate predictors of
extubation outcome.* It is a bit dangerous to read only the abstract, of
course, but this was intended as a general illustration.

P.Mean: Quiz about p-values (created
2010-04-14). In one of my webinars, I offered the following quiz
question: *A research paper computes a p-value of 0.45. How would you
interpret this p-value? 1. Strong evidence for the null hypothesis; 2. Strong
evidence for the alternative hypothesis; 3. Little or no evidence for the
null hypothesis; 4. Little or no evidence for the alternative hypothesis; 5.
More than one answer above is correct; 6. I do not know the answer.* This
is actually a bit of a trick question.

P.Mean: Using entropy and the surprisal value to measure the degree of agreement with the consensus finding (created 2010-03-02). One of the research problems that I am working on involves evaluation of a subjective rating system. I have been using information theory to try to identify objects where the evaluators agree well and objects where the evaluators do not agree well. I am also working on identifying objects that an individual rater rates poorly. The method is to flag cases where the surprisal of the category that a rater selected is much higher than the entropy (the average surprisal across all raters).
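A rater who picks a rare category for an object has a surprisal above that object's entropy. A rater who agrees with the majority has a surprisal below it. Below is a minimal Python sketch of that comparison. The raters and the 1-bit threshold (`margin`) standing in for "much higher" are hypothetical.

```python
import math
from collections import Counter

def flag_raters(ratings, margin=1.0):
    """For one object, compare the surprisal (-log2 p) of each rater's category
    with the entropy (average surprisal over all raters). Return the entropy and
    the raters whose surprisal exceeds the entropy by more than `margin` bits."""
    n = len(ratings)
    p = {cat: k / n for cat, k in Counter(ratings.values()).items()}
    surprisal = {cat: math.log2(1 / prob) for cat, prob in p.items()}
    entropy = sum(p[cat] * surprisal[cat] for cat in p)
    flagged = [r for r, cat in ratings.items() if surprisal[cat] - entropy > margin]
    return entropy, flagged

ratings = {"rater1": "normal", "rater2": "normal", "rater3": "normal", "rater4": "abnormal"}
print(flag_raters(ratings))
```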

P.Mean: What makes a good website (created 2010-04-07). Someone posed a series of questions about what makes a perfect website design. I am not a big fan of "design" and tried to make that point in my responses.

P.Mean: Should I learn R instead of SAS (created 2010-04-05). I got a question from a statistician beginning her career asking whether she should learn SAS or R. That's a very personal question and there is no perfect answer. Here is what I wrote.

P.Mean: Dealing with a large text file that
crashes your computer (created 2010-04-02). At a meeting, a colleague was
describing a text file that he had received that had crashed his system. No
way, I thought, could a simple text file crash your system. I offered to
investigate and he was right. The text file crashed my system too, and
repeatedly. Here's what I did to figure out how a simple text file could
crash your computer.

**March 2010 (6 entries)**

P.Mean: What to say when any data analysis is pointless (created 2010-03-25). Someone on the MEDSTATS email discussion group asked for help. They were trying to establish a normal range or reference interval for a set of observations involving gastric emptying. The sample size, 14, was much too small to produce reliable results, but it got worse than that. For one of the outcomes, the result was fourteen zeros. What can you do with such a data set? What can you say? That's a difficult question, and here is how I would approach such a problem.

P.Mean: Calculating weights to correct for over and under sampling (created 2010-03-22). Someone asked how to use weights to adjust for the fact that certain strata in a study were recruited more vigorously than other strata. For example, suppose you sampled at four communities and noted the age distribution as 0-14 years, 15-39 years, and 40+ years. How would you adjust for differential age distributions?
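One common approach is poststratification weighting; this is an assumption, and the entry may use a different scheme. Each person's weight is the population share of their stratum divided by that stratum's share of the sample. Below is a Python sketch with hypothetical counts and population shares for the three age groups. The same calculation could be repeated within each community.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each stratum = population share / sample share, so that the
    weighted sample matches the population distribution."""
    n = sum(sample_counts.values())
    return {s: population_shares[s] / (sample_counts[s] / n) for s in sample_counts}

# Hypothetical numbers: the sample over-represents children.
sample_counts = {"0-14": 50, "15-39": 30, "40+": 20}
population_shares = {"0-14": 0.30, "15-39": 0.40, "40+": 0.30}
weights = poststratification_weights(sample_counts, population_shares)
print(weights)  # roughly {'0-14': 0.6, '15-39': 1.33, '40+': 1.5}
```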

P.Mean: Ordinal surprisals (created 2010-03-20). Closely related to the concept of ordinal entropy is the concept of ordinal surprisal. The surprisal is the negative log base 2 of the probability, and if you multiply the probabilities with the surprisals and add them up, you get entropy. Can you define an ordinal surprisal in such a way that when you multiply the ordinal surprisals by the probabilities, you get the ordinal entropy?
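In symbols, for a variable whose k categories have probabilities p_1, ..., p_k, the surprisal of category i and the entropy are:

```latex
s_i = -\log_2 p_i, \qquad H = \sum_{i=1}^{k} p_i\, s_i = -\sum_{i=1}^{k} p_i \log_2 p_i
```

For example, two categories with probability 1/2 each have surprisals of 1 bit apiece, so H = 1.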

P.Mean: Can sex be an outcome variable (created 2010-03-16). Someone asked whether it was legitimate to use sex (gender) as a dependent variable or outcome variable in a logistic regression model. It seems wrong, on the face of it, to think that various factors can influence whether we are male or female. It actually is perfectly fine to use sex as an outcome variable. Here is how I would justify its use.

P.Mean: Ordinal entropy (created 2010-03-11). I have been using the concept of entropy to evaluate a sperm morphology classification system and to identify aberrant records in large fixed format text files. Some of the data I have been using in these areas is ordinal with three levels: normal, borderline, and abnormal. In all of my work so far, I have treated all three categories symmetrically. So, for example, the entropy of a system where 50% of the probability is associated with normal and 50% is associated with borderline is 1. The entropy of a system where 50% of the probability is associated with normal and 50% is associated with abnormal is also 1. This has always bothered me a bit because it seems that the second case, where the probabilities are placed at the two extremes, should have a higher level of entropy. Here is a brief outline of how I think entropy ought to be redefined to take into account the ordinal nature of a variable.

P.Mean: Finding duplicate records in a 19
million record database (created 2010-03-02). I was asked to help find
duplicate records in a large database (19 million records). The number of
duplicates was suspected to be small, possibly around 90. My
colleague's approach was running PROC FREQ in SAS on the "unique" id and then
looking for ids that have a frequency greater than 1. That did not work--it
took too long or it overloaded the system, or both. So I wanted to look at
alternatives for identifying duplicate records that would do this more
efficiently.
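One possible alternative, offered here only as a sketch and not necessarily the approach taken in this entry, is a single pass over the ids that remembers which ones have already been seen. Each lookup in a hash set takes roughly constant time, so the work grows linearly with the number of records, while memory grows with the number of distinct ids. The function name and the sample ids are hypothetical.

```python
from collections.abc import Iterable

def find_duplicate_ids(ids: Iterable) -> set:
    """Return every id that appears more than once, using a single pass.

    Memory grows with the number of distinct ids, so 19 million ids
    require room for a set of roughly that size.
    """
    seen = set()
    duplicates = set()
    for record_id in ids:
        if record_id in seen:
            duplicates.add(record_id)
        else:
            seen.add(record_id)
    return duplicates

# Hypothetical ids; in practice they would be streamed from the file one record at a time.
print(sorted(find_duplicate_ids([101, 102, 103, 102, 104, 101, 102])))  # [101, 102]
```

Sorting the ids and comparing each one with its neighbor is another common alternative; it needs less extra memory but requires a full sort.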

**February 2010 (9 entries)**

P.Mean: Is intuition real? (created 2010-02-25). Someone asked if intuition is real. My hunch is that intuition may be real, but it is grossly overrated.

P.Mean: Abstract submitted to Missouri Regional Life Sciences Summit (created 2010-02-13). Yesterday, I submitted the following abstract for a poster session in the Missouri Regional Life Sciences Summit. I'll find out on Monday if it will be accepted. "Slipped deadlines and sample size shortfalls in clinical trials: a proposed remedy using a Bayesian model with an informative prior distribution."

P.Mean: Meta-analysis for a single mean estimate (created 2010-02-11). Someone noted that the usual meta-analysis is carried out on studies comparing two treatment groups, usually for a difference in means. What if you had several studies estimating not a difference in means, but just a single mean? Could you conduct a meta-analysis in this situation?
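One common way to pool several estimates of a single mean is the fixed-effect, inverse-variance approach: weight each study's mean by one over its squared standard error. The sketch below assumes each study reports a mean and its standard error (the standard deviation divided by the square root of n); it is an illustration, not necessarily the method recommended in this entry, and a random-effects model would be more appropriate if the studies disagree by more than sampling error can explain. The function name and the study values are hypothetical.

```python
import math

def pooled_mean(means, std_errors):
    """Fixed-effect (inverse-variance) pooled estimate of a single mean.

    Each study is weighted by 1 / SE^2. Returns (pooled mean, pooled SE).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * m for w, m in zip(weights, means)) / total
    return estimate, math.sqrt(1.0 / total)

# Hypothetical studies: the second is less precise, so it gets a quarter of the weight.
est, se = pooled_mean([10.0, 20.0], [1.0, 2.0])
print(round(est, 3), round(se, 3))  # 12.0 0.894
```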

P.Mean: Exponential interpolation (created 2010-02-11). Someone wanted an exponential interpolation formula. It's not quite a statistics question, but it caught my interest.
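A standard form of exponential interpolation between two points (x0, y0) and (x1, y1) is y = y0 * (y1/y0)^((x - x0)/(x1 - x0)), which is the same as linear interpolation on the log scale. The sketch assumes both y values are positive; the function name is illustrative, and this may not be the exact formula given in the entry.

```python
def exp_interpolate(x, x0, y0, x1, y1):
    """Exponential interpolation between (x0, y0) and (x1, y1).

    Assumes y0 and y1 are both positive. The curve is y0 * (y1 / y0) ** t,
    where t = (x - x0) / (x1 - x0) runs from 0 at x0 to 1 at x1.
    """
    if y0 <= 0 or y1 <= 0:
        raise ValueError("exponential interpolation needs positive y values")
    t = (x - x0) / (x1 - x0)
    return y0 * (y1 / y0) ** t

# Halfway between y = 1 and y = 4 on an exponential curve is the geometric mean, 2.
print(exp_interpolate(1, 0, 1.0, 2, 4.0))  # 2.0
```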

P.Mean: Fan page for The Monthly Mean (created 2010-02-11). I've been getting some advice about Facebook. One suggestion was to set up a "fan page". There are some differences between being a "friend" on Facebook and being a "fan".

P.Mean: Humility is a good thing for researchers to have (created 2010-02-08). I've been writing a series of articles about the seven deadly sins of researchers. One of these sins is pride. I might need to talk about the alternative to pride, which is humility. I believe that researchers should adopt a humble outlook. Humility is often misunderstood as a bad thing. It is not.

P.Mean: Consulting remotely versus consulting in person (created 2010-02-08). Someone was asking whether there is a trend in consulting to demand a local presence rather than allowing a consultant to work remotely. I was unable to comment on work trends, as I have only been an independent consultant for 14 months. I did point out, however, some of the issues associated with remote consulting.

P.Mean: What are the characteristics of a good statistical consultant? (created 2010-02-07). Someone was considering a career as a statistical consultant. Besides building up a network and gaining experience, what traits would be necessary to be successful in such a career?

P.Mean: Proposed poster for the Missouri Regional Life Sciences Summit (created 2010-02-03). I am preparing a poster for the Missouri Regional Life Sciences Summit. The poster guidelines are a bit unusual in that there is only room for a square poster, four feet by four feet. Normally, these posters can be much wider. The tentative title is "Slipped deadlines, sample size shortfalls, and a proposed Bayesian solution using an informative prior distribution," and here is a proposed abstract.

**January 2010 (7 entries)**

P.Mean: Facebook account (created 2010-01-25). Several people have been encouraging me to set up an account on Facebook. I did it this evening and two hours later, I had two friends.

P.Mean: Abstracts for a possible upcoming talk (created 2010-01-20). I might be asked to give a talk in February and I wanted to offer two possible choices. Here are the titles and abstracts of those talks.

P.Mean: SPSS or Stata? (created 2010-01-19). *I am an SPSS user. Some of my friends are choosing to leave SPSS and learn STATA. What are the advantages of STATA over SPSS?*

P.Mean: Master's or PhD in Statistics? (created 2010-01-19). Someone asked me about careers in Statistics and whether you get the best career with a master's degree or a PhD. That's a very subjective choice, and individual preferences should weigh strongly in it.

P.Mean: Power calculations for comparison of Poisson counts across two groups (created 2010-01-11). Suppose you want to compare Poisson count variables across two groups. How much data would you need to collect? It's a tricky question and there are several approaches that you can consider.
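One of the simpler approaches is a normal approximation. If each subject contributes one Poisson count with mean λ1 in one group and λ2 in the other, the difference in group means has variance (λ1 + λ2)/n, which gives n = (z_{α/2} + z_β)^2 (λ1 + λ2)/(λ1 - λ2)^2 subjects per group. The sketch below assumes equal group sizes, equal follow-up for every subject, and a two-sided test; it is only one of several approaches, and others (for example, a square root transformation or an exact conditional test) give somewhat different answers. The function name is illustrative.

```python
import math
from statistics import NormalDist

def poisson_sample_size(rate1, rate2, alpha=0.05, power=0.80):
    """Per-group sample size to compare two Poisson means (normal approximation).

    Each subject contributes one count with mean rate1 or rate2, so the
    difference in group means has variance (rate1 + rate2) / n.
    """
    if rate1 == rate2:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 * (rate1 + rate2) / (rate1 - rate2) ** 2
    return math.ceil(n)

print(poisson_sample_size(2.0, 3.0))  # 40
```

With means of 2 and 3 counts per subject, a 5% two-sided significance level, and 80% power, the formula gives about 39.2, which rounds up to 40 subjects per group.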

P.Mean: Where can I find free online textbooks? (created 2010-01-07). Someone was away from their personal library for a while and needed a free online statistics reference book. With a free textbook, you get what you pay for, of course, but there are some exceptions.

P.Mean: What is residual confounding? (created 2010-01-06). Residual confounding is a frequent explanation for unusual research findings. Before I define the term and show an example, I need to address a more basic issue. The term "confounding" is used frequently but often without careful consideration of its true definition. I tend to shy away from this term and typically use "covariate imbalance" instead.