Tools and templates for measuring subjective intensity in behavioral research.
Standard labeled scales (Likert, VAS) are the dominant tool in affective science and much of psychology. They work well for many purposes, but they have two documented problems that psychophysics has developed methods to address:
Interpersonal relativity. Labels mean different things to different people. If a group experiences sensations more intensely, they also interpret "extremely intense" more intensely — the label anchors scale with the person's experience. This is the El Greco fallacy, and it can cause real group differences to vanish or illusory differences to appear.
See: Chituc & Scholl (2025). The El Greco fallacy, this time with feeling. Affective Science, 6, 526–533.
Nonlinearity. Labeled scale responses are logarithmically related to actual subjective magnitude. Averaging Likert responses directly is analogous to averaging Richter scale values to estimate average earthquake energy: because the logarithm is concave, the average of the logged values corresponds to a magnitude at or below the true mean, so the estimate is systematically too low (Jensen's inequality).
See: Chituc, Crockett & Scholl (2026). How to show that a cruel prank is worse than a war crime. Cognition, 266, 106315.
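The Richter analogy can be made concrete with a small numerical sketch. It assumes, purely for illustration, that a rating is the base-10 logarithm of the underlying magnitude; the magnitudes and the helper name `naive_vs_true_mean` are invented for this example and do not come from the cited papers.

```python
import math

def naive_vs_true_mean(magnitudes):
    """Compare two estimates of the mean of some positive magnitudes.

    Illustrative assumption: the rating a participant gives is
    log10(magnitude), like a Richter value.
    """
    ratings = [math.log10(m) for m in magnitudes]
    mean_rating = sum(ratings) / len(ratings)
    # Back-transforming the averaged ratings yields the geometric mean.
    naive = 10 ** mean_rating
    # The arithmetic mean of the magnitudes themselves.
    true = sum(magnitudes) / len(magnitudes)
    return naive, true

naive, true = naive_vs_true_mean([1, 10, 1000])
print(f"back-transformed mean rating: {naive:.1f}")
print(f"mean of actual magnitudes:    {true:.1f}")
```

With these made-up values the back-transformed average is about 21.5, while the actual mean is 337: averaging on the log scale understates the mean by more than an order of magnitude.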
The gLMS (general Labeled Magnitude Scale) is a vertical visual analog scale with quasi-logarithmically spaced semantic labels. The participant clicks or drags along a continuous scale (0–100), guided by the following labels:
| Position | Label |
|---|---|
| 100 | Strongest imaginable sensation of any kind |
| 52.5 | Very strong |
| 34.7 | Strong |
| 17.2 | Moderate |
| 6.1 | Weak |
| 1.4 | Barely detectable |
| 0 | No sensation |
The key innovation: the top anchor is modality-general. It asks about the strongest sensation of any kind — not the strongest taste, or the strongest emotion. This breaks the El Greco problem: group differences in one modality (e.g., taste for supertasters, emotion for high-trait-anger individuals) don't distort the scale anchors.
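For analysis or custom rendering, the label positions in the table above can be stored as plain data. The sketch below is illustrative only: `GLMS_LABELS` and `nearest_label` are hypothetical names, not part of the toolkit's API, and describing a rating by its nearest label (in linear scale units) is just one convenient convention.

```python
# Label positions copied from the table above (0-100 scale).
GLMS_LABELS = [
    (0.0, "No sensation"),
    (1.4, "Barely detectable"),
    (6.1, "Weak"),
    (17.2, "Moderate"),
    (34.7, "Strong"),
    (52.5, "Very strong"),
    (100.0, "Strongest imaginable sensation of any kind"),
]

def nearest_label(rating):
    """Return the label whose position is closest to a 0-100 rating."""
    if not 0 <= rating <= 100:
        raise ValueError("gLMS ratings must lie between 0 and 100")
    _, label = min(GLMS_LABELS, key=lambda pair: abs(pair[0] - rating))
    return label

print(nearest_label(34.7))  # Strong
print(nearest_label(20))    # Moderate
```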
Some gLMS protocols include 15 cross-modal practice trials before collecting target ratings (e.g., "the brightness of the sun on a clear day," "the pain of a paper cut"), following Hayes et al. (2013). The idea is to calibrate participants' use of the full scale range. However, our data suggest this may be unnecessary — the Scale × Gender interaction is significant regardless of whether practice trials are included. If you want to be thorough, the practice items are listed below, but for most online studies this is probably overkill.
This scale captures the entire range of how intense experiences can possibly be. The top of the scale (100) represents the strongest sensation you could ever imagine experiencing — the absolute maximum intensity across all sensory modalities. The bottom (0) is the complete absence of any sensation.
The gVAS (generalized Visual Analog Scale) is a horizontal slider anchored at "No sensation" (0) and "Strongest imaginable sensation of any kind" (100) — the same modality-general top anchor as the gLMS, but without the intermediate labels. It is substantially simpler to implement: in Qualtrics, it's just a standard horizontal slider with no custom JavaScript. (The gLMS requires custom JS to display quasi-logarithmically spaced labels on a vertical axis; the gVAS avoids this entirely.)
The gVAS retains the critical feature of the gLMS — a modality-general top anchor — while being much easier to deploy online. If you need a quick, no-custom-code alternative to the gLMS, the gVAS is a solid choice.
Magnitude estimation is a free-response numerical judgment relative to a named benchmark. Participants assign any non-negative number, and all judgments are interpreted as ratios relative to the benchmark. This provides ratio-scale measurement: a 20 really means "twice as much" as a 10.
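Because every response is read as a ratio to the benchmark, converting raw numbers to ratios is a single division. A minimal sketch, using a hypothetical helper `to_ratio` and an illustrative benchmark value of 10:

```python
def to_ratio(response, benchmark_value=10.0):
    """Express a magnitude-estimation response as a multiple of the benchmark."""
    if benchmark_value <= 0:
        raise ValueError("benchmark_value must be positive")
    if response < 0:
        raise ValueError("responses must be non-negative")
    return response / benchmark_value

print(to_ratio(20))  # 2.0
print(to_ratio(5))   # 0.5
# Ratios compose: a 20 is twice a 10, whatever the benchmark value was.
print(to_ratio(20) / to_ratio(10))  # 2.0
```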
There are two key design decisions when using magnitude estimation:
Below are the exact wordings used across nine experiments in Chituc, Crockett & Scholl (2026) and related studies. All produced valid ratio-scale data.
In this study, we will ask you to read a few scenarios and rate them using a special kind of measurement. After you read each scenario, we will ask you to rate how moral a person or action is by assigning it a number. Please use a 0 to mean "neither moral nor immoral." (Imagine something like playing with a pen. It's neither morally bad nor morally good, it just is).
As a 10, we want you to think about the morality of the following event: stealing a wallet.
This event is called your benchmark.
In this study, we will ask you to rate one scenario relative to this benchmark. If the scenario describes something just as immoral as the benchmark, you should also rate it a 10. If it describes something half as immoral as the benchmark, you should rate it a 5. If it is twice as immoral as the benchmark, rate it a 20, and so on. You can use decimals (e.g. "7.5") if you feel like you need to.
Please try to keep the benchmark and these instructions carefully in mind when you give your response. If you read about something that seems extremely immoral for example, please don't respond with an arbitrarily high number. Instead, really think about just how much more immoral it is relative to the benchmark, and rate it accordingly. There's no right or wrong answer to this question, so just do your best and try to be as careful and thoughtful as possible.
On the next page, we would like you to rate how morally wrong something is by coming up with a number that captures how you feel. To do this, we would like you to think of how morally wrong it is to steal a wallet. However morally wrong that is, we'll call that a 10. From there, we would like you to rate a different moral event in reference to stealing a wallet. If something is twice as morally wrong as stealing a wallet, give it a 20. If it is half as morally wrong as stealing a wallet, give it a 5. You can use numbers as big or as small as you'd like (you can go over 100), but please try to give a realistic answer.
If stealing a wallet is a 10, this event is a...
If killing an innocent stranger for fun is a 100, how morally wrong is it to steal a wallet? (Remember: a 50 would mean "half as immoral," 200 would mean "twice as immoral," etc. You can use any number you'd like in your response, including fractions, decimals, and numbers over 100).
For this kind of rating, we'll ask you to describe the badness of a crime by directly comparing it to a different crime: stealing a wallet.
We want you to first just give that crime any number that feels right to you: people tend to overthink this, but it really doesn't matter what you choose. What does matter is that you assign the other crimes a number in relationship to stealing a wallet. Suppose you gave it a "20"
If another crime is just as bad as stealing a wallet, then you should also give it a 20. If it is half as bad, you should give it a 10. If it's ten times as bad, give it a 200, and so on. Please use decimals if you feel like you need to, and try to remember that there's no right or wrong answer to these questions, so just do your best.
Now, think about the [immorality] of the actions described in the benchmark story. However [immoral] you think it is, let's give that a number. If the benchmark seems very [immoral], you could give it a relatively large number. If it seems hardly [immoral] at all, you can give it a very small number. People tend to overthink this, but it really doesn't matter, and you can choose whatever number you like.
What does matter, though, is that you give the other stories you're about to read a number in relation to whatever number you choose for the benchmark story. For example, if you read a story that is twice as [immoral] as the benchmark, you would give a number that is twice as big. If you read a story that is half as [immoral] as the benchmark, you would give a number that is half as big. If it were ten times as [immoral], you would give a number ten times as big. And if it were only 1% as [immoral], you would give it a number that is 100 times smaller (or .1 times as large). You can use any number you want, big or small (10000 or .0001 are totally acceptable answers — just make sure you really mean it and aren't exaggerating).
There's no right or wrong answer to these questions, so just do your best.
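When participants pick the benchmark number themselves, raw responses from different people are not directly comparable until each is divided by that person's own benchmark. The sketch below assumes a simple record layout (`benchmark`, `responses`) invented for this example; it is not the toolkit's data format, and dropping participants with a non-positive benchmark is one possible policy rather than a recommendation from the source.

```python
def normalize_by_own_benchmark(participants):
    """Divide each participant's responses by the number they gave the benchmark."""
    normalized = {}
    for pid, record in participants.items():
        benchmark = record["benchmark"]
        if benchmark <= 0:
            # A zero or negative benchmark makes ratios undefined; skip it.
            continue
        normalized[pid] = [r / benchmark for r in record["responses"]]
    return normalized

# Hypothetical data: two participants who chose very different benchmark numbers.
data = {
    "p1": {"benchmark": 20, "responses": [10, 20, 200]},
    "p2": {"benchmark": 0.5, "responses": [0.25, 0.5, 5]},
}
print(normalize_by_own_benchmark(data))
```

After normalization both hypothetical participants yield the same ratios (0.5, 1.0, 10.0), even though their raw numbers differ by a factor of 40.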
All of these variants produce valid ratio-scale data. Rules of thumb:
The Python package is a full implementation with PsychoPy integration and standalone analysis helpers.
```python
from glms import gLMS, MagnitudeEstimation

# Collect a gLMS rating (opens PsychoPy window)
scale = gLMS()
rating = scale.collect_rating("How intense was the taste?")

# Or get a PsychoPy Slider to embed in your own experiment
slider = scale.to_psychopy_slider(win)

# gVAS variant (horizontal)
scale_h = gLMS(orientation="horizontal")

# Magnitude estimation
me = MagnitudeEstimation(
    benchmark_label="stealing a wallet",
    benchmark_value=10,
    domain="moral wrongness"
)
print(me.instructions)

# Full gLMS session with practice trials
from glms import run_glms_with_practice
results = run_glms_with_practice("How intense was the anger?")
```
The R script provides Shiny UI components (vertical gLMS and horizontal gVAS), analysis helpers, and a publication-ready plot function.
```r
source("glms.R")

# Shiny UI: vertical gLMS
glms_input("my_rating", "How intense was the taste?")

# Shiny UI: horizontal gVAS
glms_input("my_rating", "How intense?", orientation = "horizontal")

# Standard label positions
glms_labels()

# Analysis helpers
me_log_transform(x)  # log(x + 1)
geometric_mean(x)    # exp(mean(log(x)))

# Publication-ready gLMS figure
plot_glms(title = "gLMS Scale", accent_color = "#1b9e77")

# Magnitude estimation instructions
me_instructions(
  benchmark_label = "stealing a wallet",
  benchmark_value = 10,
  zero_label = "neither moral nor immoral",
  domain = "moral wrongness"
)
```
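The two analysis helpers are simple enough to sketch in Python as well. The functions below follow the formulas given in the comments above, log(x + 1) and exp(mean(log(x))). They are an illustrative port written for this document, not the package's own Python API, and the sample ratings are made up.

```python
import math

def me_log_transform(values):
    """log(x + 1) for each value, so that zeros map to 0 instead of -inf."""
    return [math.log1p(v) for v in values]

def geometric_mean(values):
    """exp(mean(log(x))); defined only for non-empty, strictly positive input."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

ratings = [5, 10, 20]  # hypothetical magnitude-estimation responses
print(me_log_transform([0, 9]))
print(geometric_mean(ratings))      # close to 10
print(sum(ratings) / len(ratings))  # arithmetic mean, for comparison
```

Note the trade-off between the two: the geometric mean is undefined when any response is zero, which is the case the log(x + 1) transform is meant to handle.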
The jsPsych plugin for online experiments supports vertical and horizontal orientations, mobile touch, and custom labels.
```javascript
import gLMSPlugin from './plugin-glms.js';

const trial = {
  type: gLMSPlugin,
  prompt: "How intense was the anger you felt?",
  show_instructions: true,
  // Optional:
  // orientation: "horizontal", // gVAS variant
  // scale_height: 500,
  // button_label: "Continue",
  // required: true,
};

// Data saved: { response: 34.7, rt: 2341 }
```
For magnitude estimation in jsPsych, use the built-in survey-text plugin with number validation. Helper function included:
```javascript
import { meInstructionText } from './plugin-glms.js';

const instructions = meInstructionText({
  benchmark_label: "stealing a wallet",
  benchmark_value: 10,
  domain: "moral wrongness"
});
```
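Whichever front end collects the free-text response, the same non-negative-number check can be repeated at analysis time. A minimal Python sketch, assuming responses arrive as strings; `parse_me_response` is a hypothetical helper, and it deliberately rejects fraction notation such as "1/2", which would need separate handling if participants are allowed to enter it.

```python
import math

def parse_me_response(text):
    """Parse free-text input into a non-negative finite float, or return None."""
    try:
        value = float(text.strip())
    except ValueError:
        return None
    # Reject NaN, infinities, and negative numbers.
    if not math.isfinite(value) or value < 0:
        return None
    return value

for raw in ["7.5", " 20 ", "1e4", "-3", "lots", "inf"]:
    print(repr(raw), parse_me_response(raw))
```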
Custom JavaScript that transforms a standard Qualtrics slider into a vertical gLMS with quasi-logarithmically spaced labels.
Paste glms-qualtrics.js into the addOnReady function.
No custom JavaScript required. Just a standard Qualtrics horizontal slider:
This is what was used in the gVAS pilot (N = 783) that replicated the gLMS results.
No custom JavaScript needed:
Log-transform responses with log(x + 1) to handle zeros.
Bartoshuk, L. M., Duffy, V. B., Green, B. G., et al. (2004). Valid across-group comparisons with labeled scales: the gLMS versus magnitude matching. Physiology & Behavior, 82, 109–114.
Chituc, V., Crockett, M. J., & Scholl, B. J. (2026). How to show that a cruel prank is worse than a war crime: Shifting scales and missing benchmarks in the study of moral judgment. Cognition, 266, 106315.
Chituc, V. & Scholl, B. J. (2025). The El Greco fallacy, this time with feeling: How (not) to measure group differences in emotional intensity. Affective Science, 6, 526–533.
Green, B. G., Dalton, P., Cowart, B., et al. (1996). Evaluating the 'Labeled Magnitude Scale' for measuring sensations of taste and smell. Chemical Senses, 21, 323–334.
Green, B. G., Shaffer, G. S., & Gilmore, M. M. (1993). Derivation and evaluation of a semantic scale of oral sensation magnitude with apparent ratio properties. Chemical Senses, 18, 683–702.
Hayes, J. E., Allen, A. L., & Bennett, S. M. (2013). Direct comparison of the generalized Visual Analog Scale (gVAS) and general Labeled Magnitude Scale (gLMS). Food Quality and Preference, 28, 36–44.