Welcome to my blog, a place to explore and learn about the experience of running a psychiatric practice. I post about things that I find useful to know or think about. So, enjoy, and let me know what you think.

Wednesday, September 18, 2013

HAM-D'ing It Up

Last month I published a post entitled, Lifelong Learning-A New Frontier. In it, I introduced the idea of an online journal club, and I threw down the gauntlet with a challenge-let's talk about this paper:

A Rating Scale For Depression, by Max Hamilton

So here I am, talking about it. In written form.

In case it isn't obvious, this article introduced the HAM-D, or Hamilton Rating Scale for Depression, which is still in use.

And in case you happen to think there's something new under the sun, the paper begins with, "The appearance of yet another rating scale for measuring symptoms of mental disorder may seem unnecessary, since there are already so many in existence and many of them have been extensively used."

The year is 1960.

I'm gonna go on to delineate some random thoughts and reactions to the paper, in the hope that this will encourage dialogue, as might take place in an in-person journal club.

The first thing old Max H does is describe the purpose and appropriate usage of this particular rating scale. Or more accurately, what its purpose isn't:

1. It's not devised for normal subjects
2. It's not self-rating
3. It's not about social adjustment/behavior
4. It's not broad range

Rather, it focuses on the measurement of symptoms in individuals already diagnosed with depression.

The present scale has been devised for use only on patients already diagnosed as suffering from affective disorder of depressive type. It is used for quantifying the results of an interview, and its value depends entirely on the skill of the interviewer in eliciting the necessary information… It has been found to be of great practical value in assessing results of treatment.

One question I have is, who makes the diagnosis? And based on what diagnostic system? The DSM-II wasn't published until 1968, eight years after this paper, which means the HAM-D was developed to assess depression in people who may or may not have met the DSM-5 criteria for Major Depression, were they being assessed today. So is it still appropriate to use the scale?

The scale includes 17 variables related to depression, plus 4 additional variables (diurnal variation, derealization, paranoid symptoms, and obsessional symptoms) that either relate to the type of depression rather than its severity or intensity (diurnal variation), or are seen only rarely in the context of depression (the other three). Each variable is rated on either a 5-point (0-4) or a 3-point (0-2) scale, with the latter used when quantification is difficult, e.g. insomnia and agitation. It's interesting to note that on the modern HAM-D form, agitation is measured on a five-point scale, which Hamilton found "impracticable".

The scale was written with the intention of having a given patient rated by two different raters. Where only one rater is available, the score should be doubled.
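
For what it's worth, the arithmetic in that last paragraph can be sketched in a few lines of Python. This is only my reading of the rule as described above, not anything lifted from the paper: the function name `total_score`, the item names, and the ratings are all made up for illustration.

```python
from typing import Mapping, Optional


def total_score(
    rater_a: Mapping[str, int],
    rater_b: Optional[Mapping[str, int]] = None,
) -> int:
    """Combine item ratings into one total (hypothetical helper).

    With two raters, the total is the sum of both raters' item ratings.
    With one rater, that rater's sum is doubled so totals stay comparable.
    """
    sum_a = sum(rater_a.values())
    if rater_b is None:
        return 2 * sum_a
    if set(rater_a) != set(rater_b):
        raise ValueError("both raters must rate the same items")
    return sum_a + sum(rater_b.values())


# Made-up ratings for three items (not real patient data).
first_rater = {"depressed mood": 3, "guilt": 2, "insomnia, initial": 1}
second_rater = {"depressed mood": 2, "guilt": 2, "insomnia, initial": 1}

print(total_score(first_rater, second_rater))  # 6 + 5 = 11
print(total_score(first_rater))                # 2 * 6 = 12
```

Doubling a lone rater's score keeps single-rater totals on the same range as two-rater totals, which seems to be the point of the rule.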

Some caveats for the raters:

1. No distinction is made between intensity and frequency of a symptom-the rating is at the discretion of the rater, who is expected to take both into account.
2. Depressive Triad: depressive mood, guilt, suicidal tendencies-the rater needs to avoid a halo effect, e.g. giving guilt and depressive mood the same rating because they're closely related.

Table 1 is the correlation matrix.

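It shows how strongly each individual symptom correlated with each of the other individual symptoms. So, for example, Depression has a 1.0 against Depression, since any variable correlates perfectly with itself. Guilt has a correlation of 0.491 with Depression, and 1.0 with Guilt, etc. Note that these are correlation coefficients, which run from -1 to 1, not percentages of the time.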
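
To make the idea of a correlation matrix concrete, here's a minimal sketch in Python. The ratings are invented for six imaginary patients (they are not Hamilton's data), and I'm assuming plain Pearson correlation, which is the usual choice for a table like this.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up ratings for six imaginary patients (not Hamilton's data).
ratings: Dict[str, List[float]] = {
    "depressed mood": [3, 4, 2, 1, 3, 0],
    "guilt": [2, 3, 1, 1, 3, 0],
    "agitation": [1, 2, 0, 1, 2, 1],
}

symptoms = list(ratings)
# One row and one column per symptom; entry [r][c] correlates r with c.
matrix = [[pearson(ratings[r], ratings[c]) for c in symptoms] for r in symptoms]

for name, row in zip(symptoms, matrix):
    print(f"{name:>15}", "  ".join(f"{value:6.3f}" for value in row))
```

The diagonal comes out as 1 (each symptom against itself), and the matrix is symmetric, so the entry for guilt against depressed mood equals the one for depressed mood against guilt.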

This is followed by the extraction of some data, summarized into 4 factors-not sure how these are obtained.
As I understand it (poorly), factor analysis is a way to take your data and look at it as fewer variables than you started with. I briefly perused the Wiki Article, which seemed to involve some Linear Algebra. And since it's been many a year since I was intimate with eigenvectors, I'm gonna leave it at that. In other words, it's magic.
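
For the curious, here's a tiny peek behind the curtain. It's a minimal sketch, assuming a made-up correlation matrix for four symptoms: it pulls out only the single largest factor, principal-components style, using power iteration. I'm not claiming this is the procedure Hamilton used, and the numbers, symptom selection, and function names are invented for illustration.

```python
from math import sqrt
from typing import List, Tuple

# Hypothetical correlation matrix (invented numbers, not Table 1).
symptoms = ["depressed mood", "guilt", "retardation", "agitation"]
corr = [
    [1.0, 0.5, 0.4, -0.1],
    [0.5, 1.0, 0.3, -0.2],
    [0.4, 0.3, 1.0, -0.3],
    [-0.1, -0.2, -0.3, 1.0],
]


def mat_vec(matrix: List[List[float]], vec: List[float]) -> List[float]:
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def first_factor(
    matrix: List[List[float]], iterations: int = 200
) -> Tuple[float, List[float]]:
    """Return (eigenvalue, loadings) for the dominant eigenvector.

    Power iteration: repeatedly multiply a vector by the matrix and
    rescale it to unit length until it settles on the top eigenvector.
    """
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = sqrt(sum(p * p for p in product))
        vec = [p / norm for p in product]
    # The Rayleigh quotient of a unit vector gives its eigenvalue.
    eigenvalue = sum(v * p for v, p in zip(vec, mat_vec(matrix, vec)))
    # Loadings are the eigenvector scaled by sqrt(eigenvalue).
    loadings = [v * sqrt(eigenvalue) for v in vec]
    return eigenvalue, loadings


eigenvalue, loadings = first_factor(corr)
print(f"share of total variance: {eigenvalue / len(corr):.0%}")
for name, loading in zip(symptoms, loadings):
    print(f"{name:>15} {loading:6.2f}")
```

With these invented numbers, the first three symptoms load positively on the factor and agitation loads negatively, which is the kind of pattern that gets a factor its name.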

But, for example, Factor 1 has high correlations with depressed mood, guilt, suicide, delayed insomnia, work and interests, retardation, genital symptoms, and insight, and low correlations with agitation and anxiety, so the paper calls it a "retarded depression".
This, so the article claims, corresponds well with the classical description of depression.

Which one? Melancholia? Seems like.

Finally, the end of the paper includes several case descriptions, not just scores. This is in stark contrast to today's style. I suppose this is knowable, but I don't know it-were most papers written with case descriptions then?

Please comment so we can get a discussion going. It's a short paper. Check it out.


  1. Bravo! I’ll steer our residents to this because those who responded to the poll said they liked the idea of an online journal club.

  2. I have used the HAM-D (and HAM-A) in clinical trials of antidepressants and anxiolytics. In the course of doing that there were occasions where all of the researchers were transported to a common site and asked to rate vignettes in order to establish inter-rater reliability scores, and they were generally high. Although there was no literature on it at the time, the folks with the higher HAM-D scores were more likely to respond to antidepressants. We were also into neuroendocrine research at the time and there were papers published in JCEM showing the correlation between positive HAM-D scores (>30) and number of positive markers.

    Hamilton has clarity about the use of this scale that is far superior to what passes today. The best example I can think of is the PHQ-9 that doubles as a depression diagnosis, marker of recovery for depression over time, and end point of treatment. I think people have not critically thought about that problem and the current idea that checking off a list of symptoms is the truth.

    How in the world is that possible?

    1. I have very little research experience, so thanks for the info about inter-rater reliability. But it occurred to me as I was writing that the HAM-D was originally validated based on a diagnostic system not currently in use, and I wondered what the implications are.
      I'm, frankly, suspicious of the use of checklists. If used well, they can be helpful markers. But what worries me is the idea that, as we're increasingly encouraged to streamline and provide "cost and time-efficient but patient-centered care", interactions with patients will be replaced by checklists like the PHQ-9.

    2. We should all be very skeptical of checklists. The HAM-D was used across several diagnostic systems to measure depression after the diagnosis was made. One of the dimensions of the clinical interview that is missed is the tremendous amount of information that must be considered to come up with a mood disorder diagnosis and treatment plan. Nobody knows the full extent of that information and how much information exchange must actually occur. The checklist crowd assumes that practically no information exchange needs to occur.

      "Cost and time-efficient but patient centered care" is managed care rhetoric. There is no patient I know of who thinks that a checklist diagnosis and a 2 minute discussion of a citalopram prescription is patient centered, much less dose adjustments from somebody reviewing PHQ-9 scores who they have never met.

    3. I couldn't agree more about the managed care rhetoric. And it's not just managed care. The PIP modules required for Maintenance of Certification all recommend using rating scales as part of treatment, and imply that you're not doing your job if you don't use them. It's just one step away from using them instead of treatment.
      One thing that impressed me in the Hamilton paper was the assumption that raters would be skilled interviewers adept at eliciting data, and that the quality of the rating was dependent on the interviewing skills of the rater.
      "The present scale has been devised for use only on patients already diagnosed as suffering from affective disorder of depressive type. It is used for quantifying the results of an interview, and its value depends entirely on the skill of the interviewer in eliciting the necessary information. The interviewer may, and should, use all information available to help him with his interview and in making the final…"
      This is not a 2 minute, nameless, faceless rating scale that can be implemented by anyone with a pencil.

  3. I agree with the comments in this string. Have you seen the RDQ by Zimmerman? I have a post about it at http://thepracticalpsychosomaticist.com/2013/09/16/the-rdq-for-depression-from-dr-zimmerman-and-colleagues/

    I realize it's another checklist, but it contains elements that patients might think are more important to them than symptoms alone. It's too long for me to use on a consult service.

    I admit I've tried using the QIDS-SR and the CUDOS in outpatient clinics. I tried to use them as points of departure. How a patient answered a particular item usually led to a more detailed conversation about what was going on in their lives.

    I think it's interesting that Zimmerman compared the PHQ-9 with other tools, including his own (the CUDOS) and the HAM-D. The abstract is at the end of my post, http://thepracticalpsychosomaticist.com/2013/07/08/integrated-care-marginalizing-psychiatrists-or-optimizing-access-to-psychiatric-treatment/

    I think Zimmerman rightly cautions us about using these instruments to determine depression severity.