Welcome to my blog, a place to explore and learn about the experience of running a psychiatric practice. I post about things that I find useful to know or think about. So, enjoy, and let me know what you think.

Thursday, January 14, 2016

DIY Study Evaluation

If you have any interest at all in being able to evaluate the results of clinical trials on your own, say because you don't trust what the pharmaceutical companies are telling you, then I HIGHLY recommend you head on over to 1 Boring Old Man and read through his posts from the last few weeks. Basically, he's writing a statistics manual for clinicians, complete with downloadable spreadsheets of his own devising.

His explanations are clear, but I wanted to make sure I could do this on my own, so I tried it out. Here's how it worked.

I would categorize myself as a fairly conservative prescriber, by which I mean that I'm not eager to jump on the new drug bandwagon, and I like to wait a year or two, until we know a little about the effects and side effects of a new drug, before I write for it. I also wait a few weeks before upgrading my iOS for the same reason, so there ya go. But I recently had occasion to prescribe the antidepressant, Brintellix, or vortioxetine. I can't get into the clinical details, but suffice it to say there were reasons. So with Brintellix on my mind, I decided to try out the 1 Boring Old Man spreadsheet on one of their studies that I found on clinicaltrials.gov, specifically, Efficacy Study of Vortioxetine (LuAA 21004) in Adults with Major Depressive Disorder, the results of which were submitted to clinicaltrials.gov in October 2013.

From the get-go, it looks like a poor study. There were 50 study sites scattered all over Asia, Europe, Australia, and Africa, and it looks like they did something to the outcome measures midstream. But I'm just trying out the spreadsheet, so I'm ignoring all that for now.

The primary outcome measure was change in HAM-D score, which means that I needed to use the spreadsheet for continuous variables, because mean change could have been any number. If the measure had been "achieved remission," however they define "remission," then the results would have been tabulated in Yes/No form, and I would have needed a different spreadsheet, the one designed for categorical variables.

But let me pause here and ask a question: Just what am I looking for? Well, I'm looking for effect size, which generally isn't given in results. Usually, we just get to see p-values, but I'll get to why that's not sufficient in a later post.

As a reminder, effect size is the difference between treatment groups, expressed in standard deviations. Roughly speaking, a large effect size is 0.8, medium is 0.5, and small is 0.2. So, for example, if the effect size of A v. B is 0.8, then A did 0.8 of a standard deviation better than B, and this is considered a large effect. So if I know the effect size, then I can tell how much better one group did than another. I can quantify the difference between groups. Cohen's d is often used as a measure of effect size.

It turns out that you only need three pieces of information to determine effect size, all generally available in typical papers. For each arm of the study, you need the number of subjects in that arm (n), the mean, and either the standard error of the mean (SEM) or the standard deviation (SD), which are interchangeable via the formula (sorry, I don't have Greek letters in my font):

SEM = SD / sqrt(n)

That's it: n, mean, SEM.
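For anyone who would rather see the arithmetic than take a spreadsheet on faith, here is a rough sketch in Python of how an effect size can be worked out from n, mean, and SEM. It assumes the common pooled-standard-deviation form of Cohen's d; the 1BOM spreadsheet may do the calculation somewhat differently. The function names and the numbers are made up for illustration and are NOT the vortioxetine results.

```python
import math

def sem_to_sd(sem, n):
    """Convert a standard error of the mean into a standard deviation."""
    return sem * math.sqrt(n)

def cohens_d(n1, mean1, sem1, n2, mean2, sem2):
    """Cohen's d for two independent arms, using the pooled SD (an assumption)."""
    sd1 = sem_to_sd(sem1, n1)
    sd2 = sem_to_sd(sem2, n2)
    # Pooled variance weights each arm's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Hypothetical arms (mean change in a rating scale), illustration only.
d = cohens_d(n1=100, mean1=-14.0, sem1=0.8,   # made-up active arm
             n2=100, mean2=-10.0, sem2=0.8)   # made-up placebo arm
print(round(d, 3))  # -0.5
```

The sign only reflects which arm is subtracted from which. With change scores where a bigger drop is better, a d of -0.5 here means the first arm improved by half a standard deviation more than the second.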

Here is that information from the study report.  Note that there were four arms: Placebo; Vortioxetine 1mg, 5mg, and 10mg.

Let's plug 'em all in to the 1BOM spreadsheet. Note that I'm not including the ANOVA here, which you really need to do first. The ANOVA checks whether the four groups differ from each other at all. If it can't tell them apart, then any result you get from comparing one group directly with one other group is invalid. Just so you know, I computed the ANOVA using this calculator, also recommended by 1BOM, which requires exactly the same information as you need to compute effect sizes, and it turns out that the groups are NOT all the same (this is another thing related to p-values, which I plan to discuss in a later post).
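To show why the same three numbers per arm are enough for the ANOVA, here is a minimal sketch of a one-way ANOVA F statistic built from summary statistics alone. This is the textbook calculation, not necessarily what the calculator mentioned above does internally, and the function name and the four arms are invented for illustration.

```python
import math

def anova_from_summary(arms):
    """One-way ANOVA F statistic from a list of (n, mean, sem) tuples."""
    total_n = sum(n for n, _, _ in arms)
    grand_mean = sum(n * mean for n, mean, _ in arms) / total_n
    # Between-group sum of squares: how far each arm's mean is from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in arms)
    # Within-group sum of squares, rebuilt from each arm's SD (SD = SEM * sqrt(n)).
    ss_within = sum((n - 1) * (sem * math.sqrt(n)) ** 2 for n, _, sem in arms)
    df_between = len(arms) - 1
    df_within = total_n - len(arms)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical four-arm trial, illustration only (not the study's data).
arms = [
    (100, -10.0, 0.8),  # made-up placebo
    (100, -13.0, 0.8),  # made-up low dose
    (100, -14.0, 0.8),  # made-up middle dose
    (100, -14.5, 0.8),  # made-up high dose
]
f_stat, df1, df2 = anova_from_summary(arms)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

A large F means the arm means are spread out more than the within-arm noise would explain. Turning F into a p-value requires the F distribution; if SciPy happens to be installed, `scipy.stats.f.sf(f_stat, df1, df2)` should give it.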

The top three rows show the effect sizes for the three active arms, compared with placebo. Note that the effect sizes are in the moderate range, 0.423 to 0.591.

In the next three rows, I also checked to see how the active arms compared with each other in a pairwise fashion, and the 10mg really doesn't do much better than the 5mg or even the 1mg, with 0.170 the largest effect size.
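The six comparisons described above (three active arms against placebo, then the active arms against each other) are just every possible pairing of four arms. A small sketch of that loop, again assuming the pooled-SD form of Cohen's d and using invented numbers rather than the study's:

```python
import math
from itertools import combinations

def cohens_d(a, b):
    """Cohen's d (pooled SD, an assumption) for two arms given as (n, mean, sem)."""
    (n1, m1, sem1), (n2, m2, sem2) = a, b
    var1, var2 = sem1**2 * n1, sem2**2 * n2  # SD^2 = SEM^2 * n
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical arms, illustration only (not the study's data).
arms = {
    "placebo": (100, -10.0, 0.8),
    "1mg": (100, -13.0, 0.8),
    "5mg": (100, -13.5, 0.8),
    "10mg": (100, -14.0, 0.8),
}

# Every unordered pair of arms: 4 arms give 6 comparisons.
pairs = {(x, y): cohens_d(arms[x], arms[y]) for x, y in combinations(arms, 2)}
for (x, y), d in pairs.items():
    print(f"{x} vs {y}: d = {d:+.3f}")
```

Here a positive d means the second arm in the pair had the larger drop in score.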

Just considering effect sizes in this one study, Brintellix looks okay.

So you can see that there are powerful things you can do in the privacy of your home, to understand what a study is really telling you, using only minimal information. That feels pretty good. At the same time, you have to take into account other elements, like the fact that they seem to have changed outcome measures after the protocol was already established. That should invalidate the whole kit and caboodle, but sometimes you need to try out a new drug, and even when the studies aren't great, they're the best you've got.

1 comment:

  1. Thanks for trying it out. That's exactly what it's for. That study made no adjustments for multiple outcome parameters:

    "There were improvements (nominal P values < .05 with no adjustment for multiplicity) in HDRS-24 total score, response and remission rates, CGI-I score, MADRS total score, and HDRS-24 total score in subjects with baseline HARS score ≥ 20 at week 8 for all Lu AA21004 treatment groups vs placebo."