His explanations are clear, but I wanted to make sure I could do this on my own, so I tried it out. Here's how it worked.

I would categorize myself as a fairly conservative prescriber, by which I mean that I'm not eager to jump on the new drug bandwagon, and I like to wait a year or two, until we know a little about the effects and side effects of a new drug, before I write for it. I also wait a few weeks before upgrading my iOS for the same reason, so there ya go. But I recently had occasion to prescribe the antidepressant Brintellix (vortioxetine). I can't get into the clinical details, but suffice it to say there were reasons. So with Brintellix on my mind, I decided to try out the 1 Boring Old Man spreadsheet on one of their studies that I found on clinicaltrials.gov, specifically, Efficacy Study of Vortioxetine (LuAA 21004) in Adults with Major Depressive Disorder, the results of which were submitted to clinicaltrials.gov in October 2013.

From the get-go, it looks like a poor study. There were 50 study sites scattered all over Asia, Europe, Australia, and Africa, and it looks like they did something to the outcome measures midstream. But I'm just trying out the spreadsheet, so I'm ignoring all that for now.

The primary outcome measure was change in HAM-D score, which means that I needed to use the spreadsheet for continuous variables, because mean change could have been any number. If the measure had been "achieved remission" (however they define "remission"), then the results would have been tabulated in yes/no form, and I would have had to use a different spreadsheet designed for categorical variables.

But let me pause here and ask a question: Just what am I looking for? Well, I'm looking for effect size, which generally isn't given in results. Usually, we just get to see p-values, but I'll get to why that's not sufficient in a later post.

As a reminder, effect size is the difference between treatment groups, expressed in standard deviations. Roughly speaking, a large effect size is 0.8, medium is 0.5, and small is 0.2. So, for example, if the effect size of A v. B is 0.8, then A did 0.8 of a standard deviation better than B, and this is considered a large effect. So if I know the effect size, then I can tell how much better one group did than another. I can quantify the difference between groups. Cohen's d is often used as a measure of effect size.

It turns out that you only need three pieces of information to determine effect size, all generally available in typical papers. For each arm of the study, you need the number of subjects in that arm, the mean, and the standard error of measure (SEM) or standard deviation (SD), which are interchangeable via the formula (sorry, I don't have Greek letters in my font): SD = SEM x sqrt(n).
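For anyone who'd rather script this than use the spreadsheet, here's a minimal sketch of the same calculation: convert SEM to SD if needed, pool the standard deviations of the two arms, and divide the difference in means by the pooled SD to get Cohen's d. The numbers in the usage example are made up for illustration; they are not from the study.

```python
import math

def sem_to_sd(sem, n):
    """Convert standard error of the mean to standard deviation: SD = SEM * sqrt(n)."""
    return sem * math.sqrt(n)

def cohens_d(n1, mean1, sd1, n2, mean2, sd2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    )
    return (mean1 - mean2) / pooled_sd

# Illustrative (invented) numbers: two arms of 100 subjects each,
# mean HAM-D changes of -14.0 (drug) and -11.0 (placebo), both with SD 6.0.
d = cohens_d(100, -14.0, 6.0, 100, -11.0, 6.0)
print(abs(d))  # |d| = 3.0 / 6.0 = 0.5, a medium effect
```

The sign of d just tells you which arm did better; for HAM-D change, a more negative mean is the better outcome, so the magnitude is what you compare against the 0.2/0.5/0.8 benchmarks.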

Here is that information from the study report. Note that there were four arms: Placebo; Vortioxetine 1mg, 5mg, and 10mg.

The top three rows show the effect sizes for the three active arms, compared with placebo. Note that the effect sizes are in the moderate range, 0.423 to 0.591.

In the next three rows, I also checked to see how the active arms compared with each other in a pairwise fashion, and the 10mg really doesn't do much better than the 5mg or even the 1mg, with 0.170 the largest effect size.

Just considering effect sizes in this one study, Brintellix looks okay.

So you can see that there are powerful things you can do in the privacy of your home, to understand what a study is really telling you, using only minimal information. That feels pretty good. At the same time, you have to take into account other elements, like the fact that they seem to have changed outcome measures after the protocol was already established. That should invalidate the whole kit and kaboodle, but sometimes you need to try out a new drug, the studies aren't great, and it's the best you can do.

Thanks for trying it out. That's exactly what it's for. That study made no adjustments for multiple outcome parameters:

"There were improvements (nominal P values < .05 with no adjustment for multiplicity) in HDRS-24 total score, response and remission rates, CGI-I score, MADRS total score, and HDRS-24 total score in subjects with baseline HARS score ≥ 20 at week 8 for all Lu AA21004 treatment groups vs placebo."