Psychophysical Measures of Subjective Experience

Tools and templates for measuring subjective intensity in behavioral research.


Why these methods?

Standard labeled scales (Likert, VAS) are the dominant tool in affective science and much of psychology. They work well for many purposes, but they have two documented problems that psychophysics has developed methods to address:

Interpersonal relativity. Labels mean different things to different people. If one group experiences sensations more intensely, its members also take "extremely intense" to denote a more intense experience: the label anchors scale with the person's own experience. This is the El Greco fallacy, and it can make real group differences vanish or illusory ones appear.

See: Chituc & Scholl (2025). The El Greco fallacy, this time with feeling. Affective Science, 6, 526–533.

Nonlinearity. Labeled scale responses are logarithmically related to actual subjective magnitude. Averaging Likert responses directly is analogous to averaging Richter scale values to estimate average earthquake energy: because the mean of a logarithm is never greater than the logarithm of the mean (Jensen's inequality), the averaged rating systematically underestimates the mean magnitude.

See: Chituc, Crockett & Scholl (2026). How to show that a cruel prank is worse than a war crime. Cognition, 266, 106315.
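The underestimation can be seen in a small numerical sketch. It assumes an exactly logarithmic (base-10) response function and uses made-up magnitudes, both of which are simplifications for illustration rather than anything taken from the cited papers.

```python
import math
from statistics import mean

# Hypothetical "true" subjective magnitudes spanning several orders of magnitude.
magnitudes = [1.0, 10.0, 100.0, 1000.0]

# Assumed log-compressed responses, standing in for labeled-scale ratings.
log_ratings = [math.log10(m) for m in magnitudes]

# Averaging on the compressed scale and converting back yields the
# geometric mean, which can never exceed the arithmetic mean.
naive_estimate = 10 ** mean(log_ratings)
true_mean = mean(magnitudes)

print(f"back-transformed mean of ratings: {naive_estimate:.2f}")  # 31.62
print(f"arithmetic mean of magnitudes:    {true_mean:.2f}")       # 277.75
```

The gap grows as the spread of the underlying magnitudes grows, which is why the problem is most severe when stimuli differ by orders of magnitude.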

The generalized Labeled Magnitude Scale (gLMS)

A vertical visual analog scale with quasi-logarithmically spaced semantic labels. The participant clicks or drags along a continuous scale (0–100), guided by the following labels:

Position	Label
100	Strongest imaginable sensation of any kind
52.5	Very strong
34.7	Strong
17.2	Moderate
6.1	Weak
1.4	Barely detectable
0	No sensation

The key innovation: the top anchor is modality-general. It asks about the strongest sensation of any kind, not the strongest taste or the strongest emotion. This avoids the El Greco problem: group differences in one modality (e.g., taste for supertasters, emotion for high-trait-anger individuals) don't distort the scale anchors.
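For descriptive reporting it can be handy to keep the label positions from the table above in a small lookup. The sketch below is one way to do that; `GLMS_LABELS` and `nearest_label` are hypothetical names, not part of the packages described on this page, and "nearest" here means nearest in linear distance on the 0–100 axis.

```python
# Label positions copied from the table above.
GLMS_LABELS = [
    (0.0, "No sensation"),
    (1.4, "Barely detectable"),
    (6.1, "Weak"),
    (17.2, "Moderate"),
    (34.7, "Strong"),
    (52.5, "Very strong"),
    (100.0, "Strongest imaginable sensation of any kind"),
]


def nearest_label(rating: float) -> str:
    """Return the label whose position is closest to a 0-100 rating."""
    if not 0.0 <= rating <= 100.0:
        raise ValueError("gLMS ratings must lie between 0 and 100")
    _, label = min(GLMS_LABELS, key=lambda item: abs(item[0] - rating))
    return label


print(nearest_label(20.0))  # Moderate
```

The lookup is only a convenience for describing results in words; analyses should use the continuous rating itself.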

Try the interactive gLMS demo →

Practice trials

Some gLMS protocols include 15 cross-modal practice trials before collecting target ratings (e.g., "the brightness of the sun on a clear day," "the pain of a paper cut"), following Hayes et al. (2013). The idea is to calibrate participants' use of the full scale range. However, our data suggest this may be unnecessary — the Scale × Gender interaction is significant regardless of whether practice trials are included. If you want to be thorough, the practice items are listed below, but for most online studies this is probably overkill.

Practice trial items

The warmth of lukewarm water
The brightness of a dimly lit room
The loudness of a whisper
The sweetness of a ripe strawberry
The pain of biting your tongue
The brightness of the sun on a clear day
The warmth of holding a hot cup of coffee
The loudness of a fire alarm
The sweetness of honey
The pain of a paper cut
The strength of a firm handshake
The warmth of a summer breeze
The loudness of normal conversation
The brightness of a candle in a dark room
The pain of stubbing your toe

Instructions text

This scale captures the entire range of how intense experiences can possibly be. The top of the scale (100) represents the strongest sensation you could ever imagine experiencing — the absolute maximum intensity across all sensory modalities. The bottom (0) is the complete absence of any sensation.

The gVAS (horizontal variant)

The gVAS (generalized Visual Analog Scale) is a horizontal slider anchored at "No sensation" (0) and "Strongest imaginable sensation of any kind" (100) — the same modality-general top anchor as the gLMS, but without the intermediate labels. It is substantially simpler to implement: in Qualtrics, it's just a standard horizontal slider with no custom JavaScript. (The gLMS requires custom JS to display quasi-logarithmically spaced labels on a vertical axis; the gVAS avoids this entirely.)

Try the interactive gVAS demo →

Pilot validation (N = 783). In a gVAS pilot for Chituc & Scholl (2025), participants recalled an anger episode and rated its intensity on a horizontal gVAS. The gender difference in anger intensity was not significant on the gVAS: women M = 63.5 (SD = 22.3), men M = 61.4 (SD = 23.9), d = 0.09, t(779) = 1.25, p = .21. Combining these gVAS data with the Likert data from Experiment 1a of the published study (using the same analysis procedure: log-transform and z-score the gVAS data, z-score the Likert data, test the Scale × Gender interaction), the interaction was significant: F(3, 1779) = 3.45, p = .016, η² = .006. The gender gap on the Likert (d = 0.17, p = .009) shrank to near zero on the gVAS (d = 0.09, p = .21), replicating the pattern observed with the gLMS.
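The preprocessing described above (log-transform and z-score the gVAS ratings, then compare groups) might look roughly like the following. This is a sketch under stated assumptions: the ratings are made up and are not the pilot data, `log(x + 1)` is assumed so that a rating of 0 stays defined, and the Scale × Gender interaction test itself is not implemented here.

```python
import math
from statistics import mean, stdev


def zscore(values):
    """Standardize values to mean 0 and sample SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Made-up ratings for illustration only.
gvas = {"women": [70.0, 55.0, 62.0, 48.0], "men": [66.0, 51.0, 60.0, 45.0]}

# Log-transform (assumed form: log(x + 1)), then z-score across the whole sample.
logged = {g: [math.log(v + 1) for v in vals] for g, vals in gvas.items()}
all_z = zscore(logged["women"] + logged["men"])
n_women = len(logged["women"])
z = {"women": all_z[:n_women], "men": all_z[n_women:]}

print(round(cohens_d(z["women"], z["men"]), 3))
```

Z-scoring within each scale puts Likert and gVAS responses on a common footing, so the two gender gaps can be compared in a single model.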

The gVAS retains the critical feature of the gLMS — a modality-general top anchor — while being much easier to deploy online. If you need a quick, no-custom-code alternative to the gLMS, the gVAS is a solid choice.

Magnitude Estimation

Free-response numerical judgment relative to a named benchmark. Participants assign any non-negative number, and all judgments are interpreted as ratios relative to the benchmark. This provides ratio-scale measurement — a 20 really means "twice as much" as a 10.

Try the interactive ME demo →

There are two key design decisions when using magnitude estimation:

Modulus type: Does the experimenter assign a fixed number to the benchmark (e.g., "stealing a wallet = 10"), or does the participant choose their own number?
Instruction length: How much explanation do you give?

Below are the exact wordings used across nine experiments in Chituc, Crockett & Scholl (2026) and related studies. All produced valid ratio-scale data.

Fixed modulus (experimenter assigns the number)

Two-page instructions (benchmark = wallet, modulus = 10)

Page 1

In this study, we will ask you to read a few scenarios and rate them using a special kind of measurement. After you read each scenario, we will ask you to rate how moral a person or action is by assigning it a number. Please use a 0 to mean "neither moral nor immoral." (Imagine something like playing with a pen. It's neither morally bad nor morally good, it just is).

As a 10, we want you to think about the morality of the following event: stealing a wallet.

This event is called your benchmark.

Source: Chituc, Crockett & Scholl (2026), Studies 1a–1c

Page 2

In this study, we will ask you to rate one scenario relative to this benchmark. If the scenario describes something just as immoral as the benchmark, you should also rate it a 10. If it describes something half as immoral as the benchmark, you should rate it a 5. If it is twice as immoral as the benchmark, rate it a 20, and so on. You can use decimals (e.g. "7.5") if you feel like you need to.

Please try to keep the benchmark and these instructions carefully in mind when you give your response. If you read about something that seems extremely immoral for example, please don't respond with an arbitrarily high number. Instead, really think about just how much more immoral it is relative to the benchmark, and rate it accordingly. There's no right or wrong answer to this question, so just do your best and try to be as careful and thoughtful as possible.

Source: Chituc, Crockett & Scholl (2026), Studies 1a–1c

One-paragraph instructions (benchmark = wallet, modulus = 10)

Single instruction block

On the next page, we would like you to rate how morally wrong something is by coming up with a number that captures how you feel. To do this, we would like you to think of how morally wrong it is to steal a wallet. However morally wrong that is, we'll call that a 10. From there, we would like you to rate a different moral event in reference to stealing a wallet. If something is twice as morally wrong as stealing a wallet, give it a 20. If it is half as morally wrong as stealing a wallet, give it a 5. You can use numbers as big or as small as you'd like (you can go over 100), but please try to give a realistic answer.

Source: Young Replication study (Study 6)

Minimal one-liner (benchmark = wallet, modulus = 10)

Rating prompt only

If stealing a wallet is a 10, this event is a...

Source: Used across Gamut Pilot, Construal Pilots, Moral Pilots 2.15

Fixed modulus with high anchor (benchmark = killing, modulus = 100)

Rating prompt

If killing an innocent stranger for fun is a 100, how morally wrong is it to steal a wallet? (Remember: a 50 would mean "half as immoral," 200 would mean "twice as immoral," etc. You can use any number you'd like in your response, including fractions, decimals, and numbers over 100).

Source: Moral Luck studies (Studies 8–9)

Free modulus (participant picks the number)

Full instructions (free modulus)

For this kind of rating, we'll ask you to describe the badness of a crime by directly comparing it to a different crime: stealing a wallet.

We want you to first just give that crime any number that feels right to you: people tend to overthink this, but it really doesn't matter what you choose. What does matter is that you assign the other crimes a number in relationship to stealing a wallet. Suppose you gave it a "20"

If another crime is just as bad as stealing a wallet, then you should also give it a 20. If it is half as bad, you should give it a 10. If it's ten times as bad, give it a 200, and so on. Please use decimals if you feel like you need to, and try to remember that there's no right or wrong answer to these questions, so just do your best.

Source: Chituc, Crockett & Scholl (2026), Studies 3–5 (Crimes 3.0)

Free modulus — story-based benchmark

Now, think about the [immorality] of the actions described in the benchmark story. However [immoral] you think it is, let's give that a number. If the benchmark seems very [immoral], you could give it a relatively large number. If it seems hardly [immoral] at all, you can give it a very small number. People tend to overthink this, but it really doesn't matter, and you can choose whatever number you like.

What does matter, though, is that you give the other stories you're about to read a number in relation to whatever number you choose for the benchmark story. For example, if you read a story that is twice as [immoral] as the benchmark, you would give a number that is twice as big. If you read a story that is half as [immoral] as the benchmark, you would give a number that is half as big. If it were ten times as [immoral], you would give a number ten times as big. And if it were only 1% as [immoral], you would give it a number that is 100 times smaller (or .01 times as large). You can use any number you want, big or small (10000 or .0001 are totally acceptable answers; just make sure you really mean it and aren't exaggerating).

There's no right or wrong answer to these questions, so just do your best.

Source: Morality-Immorality study. Bracketed terms were piped from condition variables (e.g., [immorality] / [morality]).
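Piping bracketed terms from a condition variable can be sketched with plain string formatting. The template below reuses the first two sentences of the instructions above; the slot names and the `CONDITIONS` table are hypothetical, since the study's actual variable names are not given here.

```python
# Hypothetical template: {noun} and {adjective} stand in for the bracketed terms.
TEMPLATE = (
    "Now, think about the {noun} of the actions described in the benchmark story. "
    "However {adjective} you think it is, let's give that a number."
)

# Hypothetical condition table mapping each condition to its piped terms.
CONDITIONS = {
    "immorality": {"noun": "immorality", "adjective": "immoral"},
    "morality": {"noun": "morality", "adjective": "moral"},
}


def render_instructions(condition: str) -> str:
    """Fill the template with the terms for one condition."""
    return TEMPLATE.format(**CONDITIONS[condition])


print(render_instructions("morality"))
```

Keeping the wording in one template makes it harder for the two conditions to drift apart in anything other than the piped terms.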

Which variant should I use?

All of these variants produce valid ratio-scale data. Rules of thumb:

Fixed modulus reduces between-subject variance in number use, simplifying analysis. Use this by default.
Free modulus more closely follows classical psychophysics (Stevens, 1956) and avoids potential anchoring effects. Use this if you want to be methodologically conservative or if your benchmark is ambiguous.
Two-page instructions are best for naïve online participants who have never done magnitude estimation before.
Minimal one-liners work surprisingly well once participants understand the task (e.g., after a practice trial or in within-subject designs).

Code Implementations

Python / PsychoPy

Full implementation with PsychoPy integration and standalone analysis helpers.

from glms import gLMS, MagnitudeEstimation

# Collect a gLMS rating (opens PsychoPy window)
scale = gLMS()
rating = scale.collect_rating("How intense was the taste?")

# Or get a PsychoPy Slider to embed in your own experiment
# (`win` is the psychopy.visual.Window your experiment already created)
slider = scale.to_psychopy_slider(win)

# gVAS variant (horizontal)
scale_h = gLMS(orientation="horizontal")

# Magnitude estimation
me = MagnitudeEstimation(
    benchmark_label="stealing a wallet",
    benchmark_value=10,
    domain="moral wrongness"
)
print(me.instructions)

# Full gLMS session with practice trials
from glms import run_glms_with_practice
results = run_glms_with_practice("How intense was the anger?")

View source →

R / Shiny

Shiny UI components (vertical gLMS and horizontal gVAS), analysis helpers, and a publication-ready plot function.

source("glms.R")

# Shiny UI: vertical gLMS
glms_input("my_rating", "How intense was the taste?")

# Shiny UI: horizontal gVAS
glms_input("my_rating", "How intense?", orientation = "horizontal")

# Standard label positions
glms_labels()

# Analysis helpers
me_log_transform(x)      # log(x + 1)
geometric_mean(x)         # exp(mean(log(x)))

# Publication-ready gLMS figure
plot_glms(title = "gLMS Scale", accent_color = "#1b9e77")

# Magnitude estimation instructions
me_instructions(
  benchmark_label = "stealing a wallet",
  benchmark_value = 10,
  zero_label = "neither moral nor immoral",
  domain = "moral wrongness"
)

View source →

jsPsych 7+

Plugin for online experiments. Supports vertical and horizontal orientations, mobile touch, and custom labels.

import gLMSPlugin from './plugin-glms.js';

const trial = {
  type: gLMSPlugin,
  prompt: "How intense was the anger you felt?",
  show_instructions: true,
  // Optional:
  // orientation: "horizontal",  // gVAS variant
  // scale_height: 500,
  // button_label: "Continue",
  // required: true,
};

// Data saved: { response: 34.7, rt: 2341 }

For magnitude estimation in jsPsych, use the built-in survey-text plugin with number validation. Helper function included:

import { meInstructionText } from './plugin-glms.js';

const instructions = meInstructionText({
  benchmark_label: "stealing a wallet",
  benchmark_value: 10,
  domain: "moral wrongness"
});

View source →

Qualtrics

gLMS (vertical)

Custom JavaScript that transforms a standard Qualtrics slider into a vertical gLMS with quasi-logarithmically spaced labels.

Create a Slider question
Set min = 0, max = 100, step = 0.1
Click the question → gear icon → Add JavaScript
Paste the contents of glms-qualtrics.js into the addOnReady function

View source →

gVAS (horizontal) — the easy option

No custom JavaScript required. Just a standard Qualtrics horizontal slider:

Create a Slider question (horizontal is the default)
Set min = 0, max = 100, step = 0.1
Add the gLMS instruction text (from the instructions section above) as a preceding text block
The label text describes the scale anchors; participants respond on the 0–100 horizontal slider

This is what was used in the gVAS pilot (N = 783) that replicated the gLMS results.

Magnitude estimation in Qualtrics

No custom JavaScript needed:

Create a Text Entry question
Set validation: Number, Min = 0
Adapt the wording from the magnitude estimation section above
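Even with validation switched on, it is worth re-checking exported text-entry responses before analysis. A minimal sketch of such a check follows; `parse_me_response` is a hypothetical helper, and it applies the same rule as the validation above (a number no smaller than 0) while also rejecting non-finite values.

```python
import math
from typing import Optional


def parse_me_response(raw: str) -> Optional[float]:
    """Return a non-negative finite number, or None if the entry is unusable."""
    try:
        value = float(raw.strip())
    except ValueError:
        return None  # empty strings, words, "1,000", and similar entries
    if not math.isfinite(value) or value < 0:
        return None  # negative numbers, "inf", and "nan"
    return value


print(parse_me_response(" 7.5 "))  # 7.5
print(parse_me_response("-3"))     # None
```

Returning None rather than raising lets you count and report excluded responses instead of silently dropping them.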

Analysis Tips

Magnitude estimation data

Start by dividing each response by the modulus (10 if you assigned one; the participant's own number if they chose it), so that every rating is a ratio to the benchmark.
Log-transform before analysis: use log(ME + 1) so that zeros remain defined.
Report geometric means rather than arithmetic means.
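These steps can be sketched in a few lines of Python. The function names are hypothetical (they loosely mirror the R helpers listed earlier), the ratings are made up, and the geometric mean shown is defined only for strictly positive values.

```python
import math
from statistics import mean


def normalize(ratings, modulus):
    """Express each rating as a ratio to the benchmark."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    return [r / modulus for r in ratings]


def log_transform(ratios):
    """log(x + 1), so that a rating of zero maps to zero."""
    return [math.log(r + 1) for r in ratios]


def geometric_mean(values):
    """exp(mean(log(x))); requires strictly positive values."""
    return math.exp(mean(math.log(v) for v in values))


# Made-up fixed-modulus responses ("stealing a wallet = 10").
ratings = [5.0, 10.0, 20.0, 40.0]
ratios = normalize(ratings, modulus=10)   # [0.5, 1.0, 2.0, 4.0]

print(log_transform(ratios))              # values to feed into the analysis
print(round(geometric_mean(ratios), 4))   # 1.4142
print(mean(ratios))                       # 1.875
```

With a free modulus, call `normalize` once per participant using the number that participant assigned to the benchmark.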

gLMS / gVAS data

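Data are continuous (0–100) and suitable for standard parametric tests. Note that the gVAS pilot analysis described above log-transformed the ratings before z-scoring them.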
The scale is not linearly spaced — the labels are quasi-logarithmically distributed. This is by design.
For group comparisons, the gLMS/gVAS avoids the El Greco confound that plagues standard Likert scales.

References

Bartoshuk, L. M., Duffy, V. B., Green, B. G., et al. (2004). Valid across-group comparisons with labeled scales: the gLMS versus magnitude matching. Physiology & Behavior, 82, 109–114.

Chituc, V., Crockett, M. J., & Scholl, B. J. (2026). How to show that a cruel prank is worse than a war crime: Shifting scales and missing benchmarks in the study of moral judgment. Cognition, 266, 106315.

Chituc, V., & Scholl, B. J. (2025). The El Greco fallacy, this time with feeling: How (not) to measure group differences in emotional intensity. Affective Science, 6, 526–533.

Green, B. G., Dalton, P., Cowart, B., et al. (1996). Evaluating the 'Labeled Magnitude Scale' for measuring sensations of taste and smell. Chemical Senses, 21, 323–334.

Green, B. G., Shaffer, G. S., & Gilmore, M. M. (1993). Derivation and evaluation of a semantic scale of oral sensation magnitude with apparent ratio properties. Chemical Senses, 18, 683–702.

Hayes, J. E., Allen, A. L., & Bennett, S. M. (2013). Direct comparison of the generalized Visual Analog Scale (gVAS) and general Labeled Magnitude Scale (gLMS). Food Quality and Preference, 28, 36–44.
