I've been reluctant to write this post, even though I've referenced it previously. It's just that I've been trying to gain a better grasp of statistics, and reading 1boringoldman's posts has really helped me. Based on them, I finally managed to explain to myself what p-values actually are. But I figured that must be completely obvious to everyone but me, so why bother writing about it?
What finally convinced me was thinking about how many studies report p-values as a way of proving one drug is "significantly" better than another drug or placebo, without bothering to include effect sizes. That's a great obfuscating tactic, and it only works if people misunderstand p-values, so whoever's conducting these studies must be counting on that misunderstanding. And that makes it worth writing this post.
This is the story of how I clarified p-values to myself.
Let's say we're doing a study comparing two compounds, Hubba Bubba and Bubble Yum, to determine which is better at curing the common cold.
We start out with the null hypothesis, which states that there is no difference between the two compounds or in what they can do. If the p-value turns out to be less than 0.05, then we can reject the null hypothesis. I don't like thinking about the null hypothesis because it confuses me. It's like trying to decipher a triple negative. So we're gonna put it aside for now.
We randomize 100 patients to each arm, and follow up the next day, and the day after, with a rating scale, the CQ-7. And let's say this is what we find: every one of the 100 subjects taking Bubble Yum gets better, and none of the 100 subjects taking Hubba Bubba do.
Let's assume we've done all our work honestly and accurately, and we get a p-value less than 0.05. Does this mean that Bubble Yum is significantly better at curing the common cold than Hubba Bubba? It does not. It means we can reject the null hypothesis. But what does THAT mean?
Think of it this way.
Suppose that on the night before the study begins, I sneak into the lab and change the wrappers so that there is no Bubble Yum, only Hubba Bubba. And then suppose we do the study, and we get exactly the same results as above. Can it be? Is it possible that all 100 subjects taking Hubba Bubba wrapped as Bubble Yum got better, and all 100 subjects taking Hubba Bubba wrapped as Hubba Bubba didn't? Yes, it is possible. It's just extremely unlikely. Extremely improbable. How improbable? Well, if the two compounds really were exactly the same, there would be less than a 5% chance of getting such freakishly different results. That's why the "p" in p-value stands for probability.
In other words, we've rejected the null hypothesis.
Let me repeat. If the p-value is less than 0.05, it means that if the null hypothesis were true, i.e. if the compounds really were the same, there would have been less than a 5% chance of getting results this disparate. Since we did get results this disparate, we conclude the compounds are probably not the same. And we chose the significance level to be 0.05, but we could just as easily have chosen 0.10, or 0.01.
So a very small p-value does not mean that Bubble Yum is significantly better than Hubba Bubba at curing the common cold. It just means it would be extremely unlikely to see results this different if Bubble Yum were actually no better than Hubba Bubba.
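To make that logic concrete, here's a minimal simulation sketch in Python. It's my own illustration, not anything from a real study or from 1boringoldman: using the made-up bubble-gum numbers above, pretend the wrappers don't matter, reshuffle who got which gum, and count how often a split this lopsided turns up by chance.

```python
import random

cured = [1] * 100 + [0] * 100   # 100 people got better, 100 didn't
observed_diff = 100             # cures on "Bubble Yum" minus cures on "Hubba Bubba"

extreme = 0
n_shuffles = 100_000
for _ in range(n_shuffles):
    random.shuffle(cured)       # the null hypothesis in action: wrappers don't matter
    arm_a, arm_b = cured[:100], cured[100:]
    if abs(sum(arm_a) - sum(arm_b)) >= observed_diff:
        extreme += 1

# The p-value is (roughly) this fraction: the chance of a split at least this
# extreme *if* the two compounds really were identical.
print(f"splits this extreme: {extreme} out of {n_shuffles:,} reshuffles")
```

In 100,000 reshuffles you will essentially never see it, which is another way of saying the p-value here is microscopically small.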
In order to determine how much better Bubble Yum is than Hubba Bubba, you need to look at effect size, and as we have seen any number of times, a small p-value does not imply a large effect size. For example, in the CBT study I recently looked at, p was <0.001, but the effect size was 0.45, only moderate.
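And just to make the p-value versus effect-size distinction concrete, here's a small sketch with made-up numbers (mine, not from any study): a small trial with a big effect and a huge trial with a tiny effect can both produce an impressive-looking p-value.

```python
from scipy.stats import ttest_ind_from_stats

sd = 10.0   # same spread of scores in every arm, so Cohen's d = (difference in means) / 10
scenarios = {
    "small trial, big effect (d = 0.8)": (8.0, 20),     # 8-point difference, 20 per arm
    "huge trial, tiny effect (d = 0.1)": (1.0, 5000),   # 1-point difference, 5000 per arm
}
for label, (diff, n) in scenarios.items():
    result = ttest_ind_from_stats(diff, sd, n, 0.0, sd, n)
    print(label, "-> p =", f"{result.pvalue:.2g}")
```

Both comparisons come out "significant" at p < 0.05, but they tell very different stories about how much the drug actually helps.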
This is why many studies leave the effect size out of their publications.
Sunday, January 17, 2016
Long Term Efficacy of CBT?
I get email updates from several places I consider reasonably reputable, like NEJM, that have lists of new and interesting articles. I consider those kinds of updates helpful ways of staying current. I also get other kinds of email updates that feel more like ads, or infomercials, like this one, from Psychiatric News Alert:
Study Finds Long-Term Benefits of CBT for Patients With Treatment-Resistant Depression
Patients with treatment-resistant depression who receive cognitive-behavioral therapy (CBT) in addition to antidepressants over several months may continue to benefit from the therapy years later, according to a study in Lancet Psychiatry...
“Our findings provide robust evidence for the effectiveness of CBT given as an adjunct to usual care that includes medication in reducing depressive symptoms and improving quality of life over the long term,” the study authors wrote. “As most of the CoBalT participants had severe and chronic depression, with physical or psychological comorbidity, or both, these results should offer hope for this population of difficult-to-treat patients.”
You can link to the Lancet Study, Long-term effectiveness and cost-effectiveness of cognitive behavioural therapy as an adjunct to pharmacotherapy for treatment-resistant depression in primary care: follow-up of the CoBalT randomised controlled trial, by Wiles et al, here. It's full text.
In brief:
Background
Cognitive behavioural therapy (CBT) is an effective treatment for people whose depression has not responded to antidepressants. However, the long-term outcome is unknown. In a long-term follow-up of the CoBalT trial, we examined the clinical and cost-effectiveness of cognitive behavioural therapy as an adjunct to usual care that included medication over 3–5 years in primary care patients with treatment-resistant depression.
Methods
CoBalT was a randomised controlled trial done across 73 general practices in three UK centres. CoBalT recruited patients aged 18–75 years who had adhered to antidepressants for at least 6 weeks and had substantial depressive symptoms (Beck Depression Inventory [BDI-II] score ≥14 and met ICD-10 depression criteria). Participants were randomly assigned using a computer generated code, to receive either usual care or CBT in addition to usual care. Patients eligible for the long-term follow-up were those who had not withdrawn by the 12 month follow-up and had given their consent to being re-contacted. Those willing to participate were asked to return the postal questionnaire to the research team. One postal reminder was sent and non-responders were contacted by telephone to complete a brief questionnaire. Data were also collected from general practitioner notes. Follow-up took place at a variable interval after randomisation (3–5 years). The primary outcome was self-report of depressive symptoms assessed by BDI-II score (range 0–63), analysed by intention to treat. Cost-utility analysis compared health and social care costs with quality-adjusted life-years (QALYs)...
They took an old study, with subjects who had taken antidepressants for at least 6 weeks and had substantial depression symptoms characterized by a BDI-II score of at least 14, and followed up with a questionnaire and GP notes. Primary outcome was self-report of depressive symptoms assessed by BDI-II score. They also did a cost analysis.
Findings
Between Nov 4, 2008, and Sept 30, 2010, 469 eligible participants were randomised into the CoBalT study. Of these, 248 individuals completed a long-term follow-up questionnaire and provided data for the primary outcome (136 in the intervention group vs 112 in the usual care group). At follow-up (median 45·5 months [IQR 42·5–51·1]), the intervention group had a mean BDI-II score of 19·2 (SD 13·8) compared with a mean BDI-II score of 23·4 (SD 13·2) for the usual care group (repeated measures analysis over the 46 months: difference in means −4·7 [95% CI −6·4 to −3·0, p<0·001]). Follow-up was, on average, 40 months after therapy ended. The average annual cost of trial CBT per participant was £343 (SD 129). The incremental cost-effectiveness ratio was £5374 per QALY gain. This represented a 92% probability of being cost effective at the National Institute for Health and Care Excellence QALY threshold of £20 000.
Follow-up was a median of 45.5 months, at which point the CBT group had a mean BDI-II of 19.2, and the control group a mean BDI-II of 23.4.
Interpretation
CBT as an adjunct to usual care that includes antidepressants is clinically effective and cost effective over the long-term for individuals whose depression has not responded to pharmacotherapy. In view of this robust evidence of long-term effectiveness and the fact that the intervention represented good value-for-money, clinicians should discuss referral for CBT with all those for whom antidepressants are not effective.
Note that "individuals whose depression has not responded to pharmacotherapy," were taking antidepressants for 6 weeks. The study states later that, "This definition of treatment-resistant depression was inclusive and directly relevant to primary care."
Let's look at the details. I'll start by stating that I'm not going to consider the cost effectiveness, because I don't know how. And it may be that even if the clinical effects turn out to not be impressive (spoiler!), the treatment may be worthwhile from a financial standpoint.
At the start of the current study, all patients were taking antidepressants, and were randomized to 12-18 sessions of CBT, or usual care from their GPs. I find this confusing. It seems like medication ought to be a confounder, since depression is cyclic to begin with and people respond to medications at variable rates. Also, if you consider these patients to be treatment resistant, why continue them on antidepressants?
I also find what they did with the outcome measures confusing:
The primary outcome was self-report of depressive symptoms assessed by BDI-II score (range 0–63). Secondary outcomes were response (≥50% reduction in depressive symptoms relative to baseline); remission (BDI-II score <10); quality of life (Short-Form health survey 12 [SF-12]); and measures of depression (PHQ-9) and anxiety (Generalised Anxiety Disorder assessment 7 [GAD-7])...
The primary outcome for the main trial was a binary response variable; for this follow-up, the primary outcome was specified as a continuous outcome (BDI-II score) to maximise power. The change in the specification of the primary outcome for the long-term follow-up was made at the time the request for additional funding was submitted to the funder (Nov 6, 2012).
Does this mean they changed the primary outcome? In the original CoBalT trial, "The primary outcome was response, defined as at least 50% reduction in depressive symptoms (BDI score) at 6 months compared with baseline." Did the present study start out using response, and then switch to the continuous BDI-II score after the fact, which we all know is a no-no? They claim the change was made when they requested funding for the current study, but was that before or after they had established their primary outcome measure?
Or did the current study start with the continuous BDI-II score as the primary outcome measure, and is that okay? In other words, if you're basing your current study on a previous study, is it valid to establish your protocol with a different outcome measure than the original study used? I don't know.
Moving on. The study makes a lot of claims about secondary outcomes, and whether or not subjects were still taking antidepressants, but I'm restricting myself to thinking about the primary outcome, and the BDI-II measures are as follows:
The effect size, according to this chart, is 0.45, which is on the low side of moderate. I don't know how they did their computation, but when I used 1BoringOldman's spreadsheet (see this post), I got a Cohen's d effect size of 0.31, which is low.
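For anyone who wants to check that number without the spreadsheet, here's the arithmetic as a minimal Python sketch of my own (not 1BoringOldman's), using the group sizes, means, and SDs reported in the abstract. It reproduces the 0.31.

```python
from math import sqrt

n_cbt, mean_cbt, sd_cbt = 136, 19.2, 13.8   # CBT + usual care
n_uc,  mean_uc,  sd_uc  = 112, 23.4, 13.2   # usual care alone

# Pooled SD, weighted by each arm's degrees of freedom
pooled_sd = sqrt(((n_cbt - 1) * sd_cbt**2 + (n_uc - 1) * sd_uc**2) / (n_cbt + n_uc - 2))
d = (mean_uc - mean_cbt) / pooled_sd   # lower BDI-II is better, so this favours CBT
print(round(d, 2))                     # ~0.31
```

How the paper arrived at 0.45 instead, I can't say; presumably it was computed some other way.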
I'm not sure how this constitutes "robust evidence." I'm also not sure what's robust about a mean BDI-II of 19.2, when by their own definition, a BDI-II score of 14 or more counts as "substantial depressive symptoms."
Look. I'm not a big fan of CBT, but I'm willing to consider it as a useful treatment if you show me good data. Just don't go hyping your at-best-mediocre data like it's amazing. But of course, Psychiatric News is a product of the APA.
Thursday, January 14, 2016
DIY Study Evaluation
If you have any interest at all in being able to evaluate the results of clinical trials on your own, say because you don't trust what the pharmaceutical companies are telling you, then I HIGHLY recommend you head on over to 1 Boring Old Man and read through his posts from the last few weeks. Basically, he's writing a statistics manual for clinicians, complete with downloadable spreadsheets of his own devising.
His explanations are clear, but I wanted to make sure I could do this on my own, so I tried it out. Here's how it worked.
I would categorize myself as a fairly conservative prescriber, by which I mean that I'm not eager to jump on the new drug bandwagon, and I like to wait a year or two, until we know a little about the effects and side effects of a new drug, before I write for it. I also wait a few weeks before upgrading my iOS for the same reason, so there ya go. But I recently had occasion to prescribe the antidepressant Brintellix (vortioxetine). I can't get into the clinical details, but suffice it to say there were reasons. So with Brintellix on my mind, I decided to try out the 1 Boring Old Man spreadsheet on one of their studies that I found on clinicaltrials.gov, specifically, Efficacy Study of Vortioxetine (LuAA 21004) in Adults with Major Depressive Disorder, the results of which were submitted to clinicaltrials.gov in October 2013.
From the get-go, it looks like a poor study. There were 50 study sites scattered all over Asia, Europe, Australia, and Africa, and it looks like they did something to the outcome measures midstream. But I'm just trying out the spreadsheet, so I'm ignoring all that for now.
The primary outcome measure was change in HAM-D score, which means I needed to use the spreadsheet for continuous variables, because the mean change could have been any number. If the measure had been "achieved remission," however they define "remission," then the results would have been tabulated in Yes/No form, and I would have had to use a different spreadsheet designed for categorical variables.
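I don't know exactly what the categorical spreadsheet computes, but for a Yes/No outcome the raw material is just counts per arm, and the simplest things you can do with them look like this. A minimal sketch with made-up numbers, purely for illustration:

```python
# Hypothetical remission counts -- not from any real study.
remit_drug, n_drug = 45, 100
remit_placebo, n_placebo = 30, 100

rate_drug = remit_drug / n_drug
rate_placebo = remit_placebo / n_placebo
risk_difference = rate_drug - rate_placebo   # absolute difference in remission rates
nnt = 1 / risk_difference                    # number needed to treat for one extra remission

print(f"remission: {rate_drug:.0%} vs {rate_placebo:.0%}")
print(f"risk difference {risk_difference:.0%}, NNT about {nnt:.0f}")
```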
But let me pause here and ask a question: Just what am I looking for? Well, I'm looking for effect size, which generally isn't given in results. Usually, we just get to see p-values, but I'll get to why that's not sufficient in a later post.
As a reminder, effect size is the difference between treatment groups, expressed in standard deviations. Roughly speaking, a large effect size is 0.8, medium is 0.5, and small is 0.2. So, for example, if the effect size of A v. B is 0.8, then A did 0.8 of a standard deviation better than B, and this is considered a large effect. So if I know the effect size, then I can tell how much better one group did than another. I can quantify the difference between groups. Cohen's d is often used as a measure of effect size.
It turns out that you only need three pieces of information to determine effect size, all generally available in typical papers. For each arm of the study, you need the number of subjects in that arm, the mean, and the standard error of the mean (SEM) or the standard deviation (SD), which are interchangeable via the formula SEM = SD / √n.
That's it: n, mean, SEM.
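If you'd rather see the arithmetic spelled out than trust a spreadsheet, here's a minimal sketch of the same calculation. The function names are my own, not 1BOM's:

```python
from math import sqrt

def sem_to_sd(sem, n):
    # SEM = SD / sqrt(n), so SD = SEM * sqrt(n)
    return sem * sqrt(n)

def cohens_d(n1, mean1, sd1, n2, mean2, sd2):
    # Pooled standard deviation, weighted by each arm's degrees of freedom
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd
```

Feed it the two arms you want to compare, and the result is the difference between them in standard-deviation units.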
Here is that information from the study report. Note that there were four arms: Placebo; Vortioxetine 1mg, 5mg, and 10mg.
Let's plug 'em all into the 1BOM spreadsheet. Note that I'm not showing the ANOVA here, which you really need to do first, to make sure the four groups aren't all the same in comparison to each other; if they are, then any result you get when you compare one group directly with another is invalid. Just so you know, I computed the ANOVA using this calculator, also recommended by 1BOM, which requires exactly the same information as you need to compute effect sizes, and it turns out that the groups are NOT all the same (this is another thing related to p-values, which I plan to discuss in a later post).
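I don't know what that calculator does under the hood, but a one-way ANOVA can be worked out from the same three summary numbers per arm. A minimal sketch, assuming scipy is available, into which you'd plug the four arms' n, mean, and SD from the study report:

```python
from scipy.stats import f as f_dist

def anova_from_summary(groups):
    """One-way ANOVA from per-arm summary stats; groups is a list of (n, mean, sd)."""
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between, df_within = k - 1, total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = f_dist.sf(f_stat, df_between, df_within)   # small p: the arms aren't all the same
    return f_stat, p_value
```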
The top three rows show the effect sizes for the three active arms, compared with placebo. Note that the effect sizes are in the moderate range, 0.423 to 0.591.
In the next three rows, I also checked to see how the active arms compared with each other in a pairwise fashion, and the 10mg really doesn't do much better than the 5mg or even the 1mg, with 0.170 the largest effect size.
Just considering effect sizes in this one study, Brintellix looks okay.
So you can see that there are powerful things you can do in the privacy of your home, to understand what a study is really telling you, using only minimal information. That feels pretty good. At the same time, you have to take into account other elements, like the fact that they seem to have changed outcome measures after the protocol was already established. That should invalidate the whole kit and kaboodle, but sometimes you need to try out a new drug, and the studies aren't great, but it's the best you can do.
Tuesday, January 12, 2016
Shrinks, Once More, Again
Yes, I thought I was done with Jeffrey Lieberman's Shrinks: The Untold Story of Psychiatry, but it was not to be.
Clinical Psychiatry News asked me to write a shorter review than the one on my blog, from the angle of whether it would be a good book for a psychiatrist to recommend to patients.
So how could I resist? This one is much shorter, and less of a rant.
So please surf over there and check it out. It feels good to have my opinion expressed beyond these confines. I think the site is free but you may have to register. Also, the print version will be out in a few weeks.
Enjoy, and come back here to comment, if you like.