The Inter- and Intraexaminer Reliability of a Paraspinal Skin Temperature Differential Instrument

Plaugher G, Lopes MA, Melch PE, Cremata EE.  J Manipulative Physiol Ther. 1991 Jul-Aug;14(6):361-7.
An experiment was undertaken to determine the intra- and interexaminer reliability of a paraspinal skin temperature differential instrument. Nineteen pain-free female chiropractic college students participated as subjects for the investigation. Three separate areas of the spine were evaluated. Kappa statistics indicated slight to moderate agreement in the region C4-T2 and substantial agreement in the region T4-T8. The lumbar region could not be evaluated with the Kappa statistic due to limited variation. Following agreement for a positive finding in a given area, the numerical ratings were evaluated for agreement with the intraclass correlation coefficient (ICC). The first observation between examiners indicated fair agreement (ICC=0.2756, p=0.0478). The second observation between examiners had substantial agreement (ICC=0.6402, p=0.042). Intraexaminer agreement was moderate for one examiner (ICC=0.5078, p=0.0016). The other examiner showed an excellent level of agreement (ICC=0.8588, p<0.001) between observations.
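As a rough illustration of how an ICC summarizes agreement on numerical ratings, the sketch below computes a one-way random-effects ICC from ANOVA mean squares. The abstract does not state which ICC form was used, so the choice of the one-way form is an assumption, and the function name `icc_oneway` and the sample readings are hypothetical rather than taken from the study.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings (assumed form).

    ratings: list of rows, one row per subject, each row holding the
    k ratings given to that subject (e.g. one per examiner).
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # ratings per subject
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-subject mean square
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-subject mean square (disagreement between ratings of a subject)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical readings from two examiners on four subjects
example = [[2, 3], [5, 5], [8, 6], [4, 4]]
print(round(icc_oneway(example), 3))
```

When the two examiners agree closely relative to the spread between subjects, the value approaches 1; when their disagreement is as large as the spread between subjects, it falls toward 0 or below.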

The Interexaminer Reliability of a Galvanic Skin Response Instrument

Plaugher G, Haas M, Doble RW Jr, Lopes MA, Cremata EE, Lantz C. J Manipulative Physiol Ther. 1993 Sep;16(7):453-9.
OBJECTIVE: To determine the interexaminer reliability of a protocol of use of a galvanic skin resistance device for detection of low resistance areas along the spinal column, in relatively pain-free subjects.
DESIGN: A blinded investigation of concordance of skin resistance examination findings over the spinal column using two clinicians experienced in the use of the instrument.
SETTING: A private practice chiropractic outpatient clinic.
PATIENTS: Sixty-four male and female chiropractic college students (mean age: 35 yr). The Visual Analog Pain Scale indicated a mean response of 7.6 mm on a 100-mm range.
MAIN OUTCOME MEASURES: Concordance of examiners evaluated with the kappa statistic.
RESULTS: The results indicated modest levels of concordance for the first study sample (n = 46). The average kappa was 0.37. The second group assessed (n = 18) also demonstrated only modest levels of interexaminer concordance. The average kappa value for this sample was 0.36.
CONCLUSION: The use of the Electrical Conductor Scanner instrument for evaluating putative spinal pathology through manifestations in skin resistance in relatively asymptomatic subjects is not supported by the results of this experiment. The unevenness of data generated from this experiment in certain spinal regions necessitates further investigation prior to making any strong conclusions regarding the usefulness of this instrument in the clinical setting.

Skin Temperature Assessment for Neuromusculoskeletal Abnormalities of the Spinal Column

Plaugher G. J Manipulative Physiol Ther. 1992 Jul-Aug;15(6):365-81.
OBJECTIVE: A qualitative review of the scientific literature on thermographic instrumentation for detecting neuromusculoskeletal abnormalities of the spinal column was made. Electronic infrared instrumentation (telethermography), liquid crystal thermography and various hand-held devices were scrutinized in terms of reliability and comparison with other diagnostic tests (e.g., computed tomography, myelography, electromyography, magnetic resonance imaging).
DATA SOURCES: A Medline literature search was performed from 1966 through 1990. English language material was retrieved using the following key words: thermography and spine, spinal injuries, cervical vertebrae, thoracic vertebrae, lumbar vertebrae, sacroiliac joint, lumbosacral region, back or neck. The Index to Chiropractic Literature was also reviewed. The categories of skin temperature and thermography were scrutinized. Chapters of texts and nonpublished works were not incorporated.
STUDY SELECTION: Studies involving the comparison of thermographic findings with those of other tests were the primary focus of the review. Case reports, as well as the use of thermography as an outcome measure, were also studied. Interexaminer reliability studies are reported.
DATA EXTRACTION: The study populations are characterized as well as blinding procedures, if any. The authors' statistical work, if applicable, is presented and criticized.
DATA SYNTHESIS: Relatively few reliability studies exist for thermography. Emphasis has been on validity studies that compare the results of the thermogram with other reference tests. There has been a general lack of high-quality research design (e.g., blinding) throughout the thermographic literature base. The sensitivity of the various thermographic instrumentation has shown encouraging results, although this must be tempered with the generally poor design of many studies. Specificity, in contrast, has shown mixed results. The review indicated telethermography to be a sensitive diagnostic procedure for detecting abnormalities, such as disc protrusion, of the lumbar and cervical spine. Liquid crystal thermography effectiveness is difficult to determine due to the paucity of blinded investigations, although normative data for the cervical spine and upper extremities is present. Literature on the various hand-held instruments has revealed moderate levels of examiner reliability for infrared devices, with less information available for thermocouple instruments. Normative data for hand-held instruments is absent.
CONCLUSION: Continued investigation is needed in the area of thermographic research in light of the paucity of blinded and/or controlled investigations. More sensitive neurophysiological and anatomical measures must be used when comparing the results from thermography. The lack of an available gold standard for comparing thermographic findings has been problematic. Future research should focus on thermography as a noninvasive outcome measure and interpreter reliability.

Paraspinal Skin Temperature Assessment Rating Incongruent with the Data from Studies

Letter to the editor:
Lopes MA, Coleman RR.  Chiropractic and Manual Ther (2013-11-27 04:00).
We found the article, "Review of methods used by chiropractors to determine the site for applying manipulation" by Triano et al., to be, in many ways, very applicable and clinically important, and we commend the authors on this accomplishment. However, parts of the review process are subjective, which in our opinion has led to at least one inaccurate determination. We found the rating of 'unfavorable' applied by the authors to skin temperature assessment to be inappropriate, primarily based on the studies accepted for review by the authors.

The P.A.R.T.S. concept, described as a widely utilized method to justify treatment, was used as an integral part of this review and was the format for the sections in the article. The 'T' in P.A.R.T.S. stands for tissue temperature, texture, and tone.
With regard to the tissue temperature portion of this 'T' section, the authors stated: "The evidence from high quality studies is unfavorable toward the use of paraspinal skin temperature measures to locate the site of care, due to limited reliability." We found this statement particularly interesting given that our study (1) was the top rated study by the authors in this category. We note that the authors generally refer to a range of Kappa values from our study, but do not mention the ICC values, the regional concordance differences, or the differences between the first and second set of scans.

Instrumentation for paraspinal thermography is also one of the oldest methods of chiropractic assessment. Such instrumentation, including the relatively new technology in some of the instruments used in studies accepted for this review, has in most cases shown acceptable to excellent reliability, as noted in the following articles reviewed by the authors:

Owens: "Intraexaminer and interexaminer reliability of paraspinal thermal scans using the TyTron C-3000 were found to be very high, with ICC values between 0.91 and 0.98. Changes seen in thermal scans when properly done are most likely due to actual physiological changes rather than equipment error (2)."

Hart: Reliability testing with 10 minute intervals between samples showed good ICC values of > 0.75 (3).

Roy: "…the infrared cameras showed that they were valid tools in a controlled environment (4)."

Plaugher, Lopes, et al.: Following agreement for a positive finding in a given area, the interexaminer reliability of the first set of observations showed fair agreement with ICC values of 0.28. Intraexaminer agreement was moderate (ICC: 0.51) for one examiner and excellent (ICC: 0.86) for the other examiner. For the second set of observations between examiners the ICC values showed substantial agreement (0.64) and intraexaminer agreement was moderate for one examiner (0.51) and excellent for the other (0.86). Concordance measured with Kappa statistics was slight to moderate in the C4-T2 region and substantial in the T4-8 region. There was excessive overlap of the observations in the lumbar region, which contraindicated the use of Kappa statistics for that region (given a pre-requisite of some variation needed for Kappa), but this overlap more likely indicated high levels of interexaminer agreement in skin temperature differential findings frequently occurring at the same spinal levels (1).
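The point about Kappa requiring some variation can be made concrete with a small sketch. It computes Cohen's kappa for two examiners and shows that when nearly every observation falls in the same category, kappa can be zero or undefined even though raw agreement is high. The function name `cohen_kappa` and all of the data are hypothetical illustrations; they are not the lumbar observations from the 1991 study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters; returns None when it is undefined."""
    n = len(rater_a)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if p_e == 1.0:
        return None  # both raters used a single category: no variation
    return (p_o - p_e) / (1 - p_e)


# Hypothetical findings (1 = positive, 0 = negative) with ample variation
varied_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
varied_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(cohen_kappa(varied_a, varied_b))

# Hypothetical findings with almost no variation: 90% raw agreement
flat_a = [1] * 10
flat_b = [1] * 9 + [0]
print(cohen_kappa(flat_a, flat_b))

# No variation at all: kappa cannot be computed
print(cohen_kappa([1] * 10, [1] * 10))
```

In the second case the examiners agree on nine of ten observations, yet chance-expected agreement is also 0.9, so kappa carries no information about how well they agree.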

Noting the range of possible determinations that could be applied to each procedure being reviewed, it appears that 'unfavorable' was not consistent with the evidence the authors accepted for their review. Here is the range of choices given by the authors:

"Favorable: For general use by clinicians to determine site of care

Favorable with limitations: Favorable for determining site of care although limits exist such as number and quality of studies, limited generalizability, etc.

Unclear: Based on the evidence available, it is unclear whether or not this procedure should be recommended for use

Unfavorable with exceptions: Procedure is not recommended for general use but may be used in limited circumstances
(e.g. other techniques unavailable.)

Unfavorable: Procedure is not recommended for use (limited number of studies, significant flaws in methods, not generalizable, high quality evidence against validity and/or reliability)"

Triano et al. mention that the unfavorable paraspinal skin temperature rating was based on "high quality studies," and since our study (1) was rated the highest and other studies on this subject showed favorable findings for reliability, it appears that the authors may have utilized our study as their primary source to opine that skin temperature assessment is unfavorable. Taken in totality, the findings of our study mentioned above were more positive than negative for reliability, which leaves us with some confusion as to what Triano et al. based a completely "unfavorable" rating on.

The authors also seem to overlook the fact that various methods of skin temperature assessment exist. Contact thermocouple instrumentation is not the same as infrared thermography, which has shown very favorable reliability. The contact thermocouple instrument from our study in 1991, commonly known as a Nervoscope, was the original, non-amplified version. There are now electronically amplified versions of that unit that need testing, and it should not be assumed that one instrument study, no matter the quality, answers all questions about paraspinal thermography. Given the favorable results of the other paraspinal skin temperature studies utilizing different instruments and technology than ours, it seems those studies met the inclusion criteria and then were completely ignored in determining this rating.

We understand that there are some questions about the use of paraspinal skin temperature assessment: environmental controls, skin contact possibly affecting a reading or pattern, validity inadequately tested, etc. But many or most of the other assessment procedures deemed favorable in this study have similar questions about them.

There are enough data from the studies accepted for this review showing moderate to excellent reliability, however, that at least a conditional designation such as 'favorable with limitations' or 'unclear' should have been given for the paraspinal skin temperature assessment, although a 'favorable' rating appears more appropriate. The noninvasive nature of the assessment, lack of an expense burden to a patient, and a reasonable number of studies showing decent reliability should be enough to suggest this as a favorable assessment or at least unclear or favorable with limitations. Instrumentation thermography is close to a gold standard for this aspect of the P.A.R.T.S. concept.

Further, when comparing the designation of unfavorable for skin temperature assessment to tissue texture assessment (another part of the 'T' section), which was given a 'favorable' designation, we felt that tissue texture assessment proved to be no more or possibly less supported by the evidence presented in this review article than that presented for paraspinal skin temperature assessment. Tissue texture was listed as favorable based on only five studies, with three showing reliability of slight, fair and moderate respectively. It appears, therefore, that a more rigorous standard was applied to the paraspinal thermography than to tissue texture assessment.
Mark A. Lopes, D.C.
Roger R. Coleman, D.C.

1. Plaugher G, Lopes MA, Melch PE, Cremata EE: The inter- and intraexaminer reliability of a paraspinal skin temperature differential instrument. J Manipulative Physiol Ther 1991, 14:361-367.
2. Owens EF Jr, Hart JF, Donofrio JJ, Haralambous J, Mierzejewski E: Paraspinal skin temperature patterns: an interexaminer and intraexaminer reliability study. J Manipulative Physiol Ther 2004, 27:155-159.
3. Hart JH, Omolo B, Boone WR, Brown C, Ashton A: Reliability of three methods of computer-aided thermal pattern analysis. J Can Chiropract Assoc 2007, 51:175-185.
4. Roy R, Boucher JP, Comtois AS: Validity of infrared thermal measurements of segmental paraspinal skin surface temperature. J Manipulative Physiol Ther 2006,