A brief introduction on latent variable based ordinal regression models with an application to survey data

Johannes Wieditz^†,‡, Clemens Miller^*, Jan Scholand^‡, Marcus Nemeth^‡
^† Department of Medical Statistics, University Medical Centre Göttingen, Humboldtallee 32, 37073 Göttingen, Germany,
^‡ Department of Anaesthesiology, University Medical Centre Göttingen, Robert-Koch-Straße 40, 37075 Göttingen, Germany
^* Neurosurgical Critical Care Unit, Department of Neurosurgery, Medical University Innsbruck, Anichstrasse 35, 6020 Innsbruck, Austria

Abstract: The analysis of survey data is a frequently arising issue in clinical trials, particularly when capturing quantities that are difficult to measure. Typical examples are questionnaires about patient’s well-being, pain, or consent to an intervention. In these, data is captured on a discrete scale containing only a limited number of possible answers, from which the respondent has to pick the answer which fits best his/her personal opinion. This data is generally located on an ordinal scale as answers can usually be arranged in an ascending order, e.g., “bad”, “neutral”, “good” for well-being.

Since responses are usually stored numerically for data processing purposes, analysis of survey data using ordinary linear regression models are commonly applied. However, assumptions of these models are often not met as linear regression requires a constant variability of the response variable and can yield predictions out of range of the response categories. By using linear models, one only gains insights about the mean response, which may affect representativeness.

In contrast, ordinal regression models can provide probability estimates for all response categories and yield information about the full response scale beyond the mean. In this work, we provide a concise overview of the fundamentals of latent variable based ordinal models, applications to a real dataset, and outline the use of state-of-the-art software for this purpose. Moreover, we discuss strengths, limitations, and typical pitfalls. This is a companion work to a current vignette-based structured interview study in paediatric anaesthesia.

Keywords: Ordinal regression, latent variable, cumulative link models, logistic regression, response distribution, factors influencing willingness to consent to participation

1 Introduction

Data derived from surveys or patient interviews are often subject of research in medicine. As answers are often encoded numerically, e.g., using the numeric rating scale for pain, data analysis using linear models does often seem reasonable. Whereas for finely graduated response scales summary statistics such as mean or median response are often of interest, these are often not very meaningful or representative for response scales with only a few categories, e.g., “On a scale of 1 (absolutely yes) to 5 (absolutely no), do you consent in the participation in the following study?”. In this case, probabilities for the individual response levels or a proportion of responders who at least “rather consenting” for participation would be more revealing. The application of this statistical methodology is illustrated using a vignette-based interview study from paediatric anaesthesia (German Clinical Trials Register DRKS00027090.

1.1 Motivation

The setting of the study is as follows: Parents or legal representatives of inpatient children were asked for consent for participation in three fictional studies containing potential objectives for clinical studies in paediatric anaesthesia that widely differed in terms of invasiveness. Aim of this investigation was to identify factors influencing the willingness to participate (acronym FILIPPA) in the following studies:

a prospective observational study on a non-invasive temperature measurement sensor,
a randomised controlled trial (RCT) of inducing anaesthesia by intravenous vs. inhalative agents, and
a pharmacological study on intravenous ibuprofen (painkiller) designed to collect data for regular approval in children, corresponding to an open label phase-II pharmacological study.

Interviews were conducted by the same investigator to ensure consistent wording and in the same order (defined by increasing invasiveness as outlined above). Legal representatives were asked if they were willing to have their child participating in the corresponding studies. Responses were captured on a five-point Likert-scale of “absolutely consent”, “rather consent”, “unsure”, “rather decline” and “absolutely decline” participation. For further details on experimental design and the exact descriptions of the studies we refer to (Miller et al. 2024).

Figure 1 portrays a descriptive statistic of the response distribution stratified by the three studies with increasing invasiveness (bar charts, left to right) in form of an alluvial diagram. The streams between the bar charts show the migration between the answers from one question to the following one. It appears that as the study becomes more invasive, willingness to participate generally decreases. Note that this behaviour is, however, not present in all participants. This might be due to the fact that a certain understanding of research in medicine is beneficial to understand the different degrees of invasiveness between the studies, particularly the RCT and the pharmacological study.

Alluvial diagram of the migration of the level of consent for three studies with increasing level of invasiveness (left to right) indicated by the streams between the bars. The bar charts show the frequency distribution of the corresponding responses for a given study.

Figure 1: Alluvial diagram of the migration of the level of consent for three studies with increasing level of invasiveness (left to right) indicated by the streams between the bars. The bar charts show the frequency distribution of the corresponding responses for a given study.

Statistical inference, particularly identification of significant influencing factors or testing, however, is only possible within a statistical model. For this purpose, we perform an ordinal regression on the response depending on the study type and including various covariates such as child’s sex, age, preceding participation in studies, as well as age and professional medical degree of the legal representatives.

The applied statistical approach presented here is based on the literature briefly summarised below. (Agresti 2012) provides a comprehensive introduction to categorial data analysis but early work can already be found in (Hildebrand, Laing, and Rosenthal 1977) and (Johnson and Albert 2006). (Agresti 2013) and (McCullagh 1980) focus particularly on ordinal models, and (Fullerton 2009) provides a concise comparison of different types of models. Goodness of fit and model selection problems as well as summary measures of the model’s predictive power are addressed in (Fagerland and Hosmer 2016; Agresti and Kateri 2017; Agresti and Tarantola 2018). Most ordinal regression fall into the category of generalised linear models. To this end, (Agresti 2015) provide an detailed derivation. A software implementation based on these generalised linear models (so-called cumulative link models) as well as a comprehensive how-to-apply-it tutorial is given in (Christensen 2015, 2018). A clear online introduction using software examples can also be found under (Dunn 2020 a, 2020 b); the latter also covers issues from Bayesian ordinal regression. A broad overview about the topic and the framework including online resources is, moreover, also provided by (Harrell Jr 2015, 2023 a, 2023 d, 2023 c, 2023 b). More recent articles demonstrate the applications of cumulative link models in deep learning classification algorithms, see e.g. (Vargas, Gutiérrez, and Hervás-Martı́nez 2020). Regarding applications in medical research, (Norris et al. 2006) present a concise comparison between different approaches. For more finely graduated responses scales, as it is often the case for e.g. verbal, visual analogue or numeric scales, (Heller, Manuguerra, and Chow 2016) present an approach for data analysis. (Manuguerra, Heller, and Ma 2020) provide corresponding software to do so.

1.2 Audience

The target audience of this article is statisticians and medical researchers with a profound quantitative background who have not applied ordinal methods so far. We provide a brief introduction to the fundamentals of latent variable based ordinal models—an approach which yields an easy starting point to the topic in our opinion. This article particularly focuses on the analysis of data on response scales with only few levels and provides practical recommendation, enriched with corresponding $\texttt{R}$ code sections. For additional information such as alternative approaches, software or model extensions we provide references to the literature at appropriate positions.

1.3 R package ordinal

For the analyses of this tutorial we employ the $\texttt{R}$ package $\texttt{ordinal}$ by (Christensen 2015) which is available on CRAN, or via https://github.com/runehaubo/ordinal. Code sections are presented at suitable positions and were executed using $\texttt{R}$ version 4.3.2, (R Core Team 2023). The package supports fitting of ordinal fixed effects models ($\texttt{clm}$) as well as mixed effects models ($\texttt{clmm}$). To use the package no extraordinary data preprocessing is required. An overview of the data set is provided below:

data %>% glimpse()
#> Rows: 318
#> Columns: 8
#> $ id           <fct> 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, …
#> $ age          <dbl> 1.6438356, 1.6438356, 1.6438356, 13.3315068, 13.3315068, …
#> $ sex          <fct> Female, Female, Female, Female, Female, Female, Male, Mal…
#> $ partner      <chr> "Mother", "Mother", "Mother", "Father", "Father", "Father…
#> $ partner_age  <dbl> 35, 35, 35, 53, 53, 53, 38, 38, 38, 34, 34, 34, 32, 32, 3…
#> $ prev_studies <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
#> $ study        <fct> Observational study, Randomised controlled trial, Pharmac…
#> $ response     <ord> Absolutely consent, Rather consent, Rather consent, Absol…

Presented results are based moreover on the packages $\texttt{emmeans}$, $\texttt{ggstream}$ and $\texttt{tidyverse}$, (Lenth 2022; Sjoberg 2021; Wickham et al. 2019). Several graphical parameters have been outsourced; code is provided in the Appendix.

1.4 Further reading

There is a number of alternative approaches for ordinal regression and software packages providing tools to this end beyond the latent variable based approach presented here. The introduction of these approaches and software is out of the scope of this paper. We state some reading recommendations in the following.

(Yee 2015) present vector generalized linear and additive models requiring only that the regression coefficients enter through a set of linear predictors. The corresponding $\texttt{R}$ package $\texttt{VGAM}$ can be found in (Yee 2023). This package is highly flexible and allows, among other things, for multivariate ordinal responses or relaxing the proportional odds assumptions in considering partial proportional odds models.

Moreover, (Harrell Jr 2015) provides with the corresponding $\texttt{R}$ package $\texttt{rms}$ (Harrell Jr 2023 c) a large number of functions for fitting models, estimation, tests, confidence intervals, interpreting and displaying results. To be emphasised should be the functions $\texttt{orm}$ and $\texttt{lrm}$ which are primarily used for continuous and discrete ordinal data, respectively. These functions also implement cumulative link models, but without scale effects, partial or structured thresholds. A Bayesian alternative is implemented in the $\texttt{brms}$ package by (Bürkner 2017) or by the $\texttt{blrm}$ function of the $\texttt{rmsb}$ package by (Harrell Jr 2023 b) which often promises better convergence behaviours in presence of random effects in the model. For many additional resources for ordinal models, we refer to (Harrell Jr 2023 a, 2023 d).

1.5 Outline

This article is structured as follows: Section 2 puts ordinary linear regression and ordinal models in juxtaposition and points out each strengths and limitations. Section 3 introduces the fundamental ideas of ordinal regression models based on the latent variable approach. Section 4 considers the influence of discrete and continuous covariates on the response outcome and examine different aspects on estimated parameters and confidence intervals, prediction, goodness of fit and outline numerical issues that may arise. We allude to advanced topics in Section 5 and summarise our findings in Section 6. All computations are stated along with corresponding $\texttt{R}$ code sections. The corresponding manuscript can be found at https://arxiv.org/abs/2306.13538. This website was built using $\texttt{bookdown}$ and (Barnier 2022).

2 Ordinal models: strengths, limitations and alternatives

The question of how to evaluate ordinal response data appropriately has been widely discussed in the literature, cf. (Agresti 2012, Chapter 1). So far, however, no broad consensus has been found as the chosen approach often depends on the original question to be answered.

Approaches considering ordinal responses only as mere categorical data often do not exploit the additional structure. In contrast to these nominal approaches, ordinal models can provide often descriptive statistics similar to ordinary linear regression, such as means, slopes or correlations. Furthermore, ordinal analysis can use a greater variety of models. These models are usually more efficient yielding higher power for detecting trends or location alternatives using fewer parameters. Moreover, these parameters are often simpler in their interpretation than parameters in standard models for nominal variables, cf. (Agresti 2012, Section 1.2).

For ordinal data with many response levels, as for instance for visual analogue/ numeric scales, ordinal models are often inappropriate as individual response levels are not that meaningful or fitting might even be infeasible if the number of parameters to be estimated grows too large (Heller, Manuguerra, and Chow 2016). For this kind of data, responses are commonly already encoded numerically on the questionnaire. As a result, ordinary least squares is usually less problematic and provides more interpretable insights than an ordinal analysis, cf. (Agresti 2013, Section 1.2).

In contrast, for ordinal data consisting of only a few response levels, ordinal regression analysis is often the better choice even though a linear regression analysis might be useful for identifying variables that clearly affect the response variable. (Agresti 2013, Section 1.3.2) points out a number of reasons why ordinary linear regression is in this case often inappropriate:

There is usually not a clear-cut choice for the scores, i.e. a particular response outcome might be consistent with a range of values of some underlying latent score, modelling an abstract quantity causing the response, see Section 3. Ordinary regression analysis, however, does not allow for such an error.
An ordinary regression approach does not provide probability estimates for the response levels but only a prediction of the estimated value given covariates (possibly with corresponding confidence interval to quantify uncertainty).
Linear regression may yield predictions beyond the original response scale (i.e. above the highest or below the lowest level).
Linear regression ignores different variabilities in the response categories: usually there is only a small variability at predictor values for which observations fall mainly in the highest (or lowest) category, but there is a considerable variability at predictor values for which observations tend to spread among the categories.

As for the presented application, see Section 1.1, the distribution of consent and the number of people responding “rather consent” was of particular interest, we argue that an ordinal analysis is the most appropriate in this case. For further applications and discussions about model choices we refer to (Agresti 2012, Section 1.3) and the references therein.

3 Ordinal models

In ordinal statistics, the quantity of interest is typically an ordinal response (e.g., of a survey), modelled by a random variable $Y$ which is assumed to take values on an ordinal scale with $L\ge 2$ levels, $1 \le 2 \le \dots \le L$, (e.g. the levels of consent). Note that although the levels are encoded numerically, it is neither assumed that we can interpret between-level distances (e.g., $2-1$) nor that the distances between two levels are equidistant—there might be a large difference between “unsure” and “rather consent” but only a small one between “rather consent” and “absolutely consent”.

3.1 Latent variable approach

For ordinal regression, we assume that $Y$ can be acquired as the discretisation of an unobserved, continuous latent score $S$ as

\[ \begin{equation} Y = \ell \quad \text{if and only if} \quad \theta_{\ell-1} < S \le \theta_{\ell} \label{eq:defY} \end{equation} \] for all levels $\ell = 1,2,\dots, L$ where the $\theta_{\ell}$’s are cut-points (also threshold coefficients) corresponding to the level boundaries on the scale of the latent variable, see Figure 2. The cut-points $\theta_\ell$ are assumed to be ordered in a strictly increasing manner $-\infty = \theta_0 < \theta_1 <\dots < \theta_{L-1} < \theta_L = +\infty$ and have to be estimated within a regression framework. For an ordinal variable with $L$ levels we have to estimate $L-1$ cut-points $\theta_1,\theta_2,\dots,\theta_{L-1}$. Of note, the term cut-points is a traditional term for an endpoint in one inequality of ($\ref{eq:defY}$); it is not meant to suggest cutting data and is not related to categorisation or loss of information.

The latent variable includes all external parameters which can influence the response behaviour of the responder and is often thought of as an abstract quantity, e.g., consent, quality of life or pain, of which $Y=1,2,\dots,L$ represent the ordinal levels e.g., “absolutely consent”, “rather consent”, “unsure”, “rather decline”, “absolutely decline” (for $L=5$).

$Probability distribution of response $Y$ (hatched regions) to a questionnaire with $L=5$ possible answers (encoded as $1$ to $5$). The response probabilities are acquired as the area (coloured regions) under the density curve of $S$ between two consecutive cut points $\theta_{\ell}, \theta_{\ell+1}$, ${\ell}=0,1,2,3,4$.$

Figure 2: Probability distribution of response $Y$ (hatched regions) to a questionnaire with $L=5$ possible answers (encoded as $1$ to $5$). The response probabilities are acquired as the area (coloured regions) under the density curve of $S$ between two consecutive cut points $\theta_{\ell}, \theta_{\ell+1}$, ${\ell}=0,1,2,3,4$.

From Equation ($\ref{eq:defY}$) follows that the probability distributions of $Y$ and $S$ are related as $\mathbb{P}(Y \le \ell) = \mathbb{P}(S \le \theta_{\ell})$ and in particular \[\mathbb{P}(Y = \ell) = \mathbb{P}( \theta_{\ell-1} < S \le \theta_{\ell}) = \mathbb{P}(S \le \theta_\ell) - \mathbb{P}( S \le \theta_{\ell-1}).\] Within an ordinal regression framework, the influence of the covariates $\textbf{x}=(x_1,x_2,\dots,x_K)^\top \in \mathbb R^K$ on the response $Y$ is typically modelled on the latent scale as

\[ \begin{align} \label{eq:nd_eps} S = \textbf{x}^\top \boldsymbol{\beta} + \varepsilon= \sum_{k=1}^K x_k \beta_k + \varepsilon \end{align} \]

where $\boldsymbol{\beta}\in\mathbb R^K$ contains the regression parameters and $\varepsilon$ is a random variable whose distribution needs to be specified, typically with zero mean and known variance. Some common choices are stated below in Section 3.2.

3.2 Methodological details

3.2.1 Link functions

Assume $\varepsilon\sim\mathcal{N}(0,1)$ to be standard normally distributed. Then, we obtain the probability distribution of the survey response $Y$ given covariates $\textbf{x}$ via $S$ as \[ \begin{align} \notag \mathbb{P}(Y \le \ell \mid \textbf{x}) &= \mathbb{P}\left( S \le \theta_{\ell} \mid \textbf{x} \right) = \mathbb{P}\big({S - \textbf{x}^\top \boldsymbol{\beta}} \le \theta_\ell - \textbf{x}^\top \boldsymbol{\beta} \mid \textbf{x} \big)\\ \label{eq:prob} &\stackrel{(\ref{eq:nd_eps})}{=} \Phi\left(\theta_\ell - \textbf{x}^\top \boldsymbol{\beta}\right), \end{align} \] where $\Phi$ is the distribution function of the standard normal distribution. Particularly, the probability for a response answer can be obtained as the area under a normal curve between two consecutive cut points (possibly shifted by the term including information about covariates), see Figure 2. This model is, in relation to the probit model for binary responses, also called ordered probit model.
Another popular choice is to assume $\varepsilon$ to follow a standard logistic distribution, i.e., $\varepsilon$ has cumulative distribution function $F(t) = \mathbb{P}(\varepsilon\le t) = 1 / (1 + \mathrm{e}^{-t})$. Then, similarly to Equation ($\ref{eq:prob}$), we obtain \[ \begin{align} \notag \mathbb{P}(Y \le \ell \mid \textbf{x}) = F\left(\theta_\ell - \textbf{x}^\top \boldsymbol{\beta}\right) = \frac{1}{1+\mathrm{e}^{-\left(\theta_\ell - \textbf{x}^\top \boldsymbol{\beta}\right)}} \end{align} \] and for the log-odds holds \[ \begin{align} \label{eq:logodds} \log\left(\frac{\mathbb{P}(Y\le \ell \mid \textbf{x})}{1-\mathbb{P}(Y\le \ell \mid \textbf{x})}\right) = \theta_\ell - \textbf{x}^\top \boldsymbol{\beta}, \quad \ell=1,2,\dots, L. \end{align} \] Thus this model is also called proportional odds or ordered logistic regression model, (McCullagh 1980; Fullerton 2009). Note that for $L=2$, i.e. for binary responses, this model reduces to the ordinary logistic regression model.
More generally, $S$ can be modelled to follow an arbitrary distribution. (Christensen 2018, Section 2.2) provide a comprehensive overview about the choice of common link functions beyond the ones presented above, depending on the assumptions of the rating behaviour of the responder, e.g., choose a Cauchy distribution if extreme ratings are assumed to be more likely. Moreover, (Agresti 2012, Chapter 5) yield a differentiated view on the field of ordinal models with various examples and highlight aspects about practical and theoretical issues.

3.2.2 Cumulative link models

From Equation ($\ref{eq:prob}$) follows (for an arbitrary choice of $\varepsilon$) that we can write the cumulative distribution function (CDF) of $Y$ given the covariates $\textbf{x}$ as

\[ \begin{align} \label{eq:clm} \mathbb{P}(Y\le \ell \mid \textbf{x}) = F(\theta_\ell - \textbf{x}^\top \boldsymbol{\beta}), \quad \ell = 1,2,\dots,L, \end{align} \]

where $F$ is the distribution function of $\varepsilon$. As the function $F$ links the CDFs of $Y$ and $S$, ordinal models of the form ($\ref{eq:clm}$) are called cumulative link models and $F$ is called inverse link function (as it takes the linear predictor $\theta_\ell-\textbf{x}^\top \boldsymbol{\beta}$ from the latent space back and maps it to predicted probabilities for $Y$), cf. (Christensen 2018).

3.2.3 Intercept

Note, that in contrast to ordinary linear regression, the model from Equation ($\ref{eq:nd_eps}$) deliberately does not include an intercept term as a model with $S=\alpha + \textbf{x}^\top \boldsymbol{\beta} + \varepsilon$ and shifted cut points $\tilde\theta_\ell = \theta_\ell-\alpha$ would result in the same distribution for the response $Y$. Thus, the cut-points $\theta_\ell$ would not be identifiable in a model including a fixed unknown intercept. A model including a fixed known intercept, however, would result in the same parameter estimates $\boldsymbol{\beta}$ as for the model ($\ref{eq:nd_eps}$). For this reason, without loss of generality $\alpha=0$ is assumed.

3.3 Interpretation of model parameters

To conclude this section, let us briefly address the influence of the regression coefficients. An interpretation of this influence on the response variable $Y$ is, in general, difficult and depends on the chosen link function, cf. (Agresti 2012, Section 5.1.3). For the logit-link, there is the following relation for the difference in log-odds (i.e., the logarithm of the odds ratio). Note that here, the odds are the ratio of responding at most $\ell$ vs. responding at least $\ell +1$, $\ell=1,2,\dots, L$. Denote the vector of covariates by $\textbf{x}=(x_1,x_2,\dots, x_K)$ and let $\tilde{\textbf{x}}=(x_1,\dots,x_{k-1}, x_k+1, x_{k+1}, \dots,x_K)$ be differing from $\textbf{x}$ only at the $k$-th component by one, then

\[ \begin{align*} \log\left(\frac{\mathbb{P}(Y\le \ell \mid \textbf{x})}{1-\mathbb{P}(Y\le \ell \mid \textbf{x})}\right) - \log\left(\frac{\mathbb{P}(Y\le \ell \mid \tilde{\textbf{x}})}{1-\mathbb{P}(Y\le \ell \mid \tilde{\textbf{x}})}\right) &\stackrel{(\ref{eq:logodds})}{=} \theta_\ell - \textbf{x}^\top \boldsymbol{\beta} - \left( \theta_\ell - \tilde{\textbf{x}}^\top \boldsymbol{\beta} \right)\\ &= \left(\tilde{\textbf{x}}-\textbf{x}\right)^\top \boldsymbol{\beta} = \beta_k, \end{align*} \]

for all $\ell=1,2,\dots,L$.

Thus, for a binary group variable, the coefficient $\beta_k$ is the change in the log-odds for a change in groups for each cumulative probability, keeping all other parameters fixed. More general, for a continuous covariate $x_k$, $\beta_k$ is the effect of a unit increase in $x_k$ on the log-odds for each cumulative probability, controlling for all the other predictors. As a result, $\exp(\beta_k)$ is a cumulative odds ratio using any collapsing of the ordinal response for values of $x_k$ that differ by 1 unit. The corresponding consequences for the question analysed and an interpretation beyond that are, however, often debatable. Here, the interpretation of predicted response probabilities is more recommended and represents the true strength of ordinal models.

For an arbitrary distribution of $S$, we can still claim that a unit increase in $x_k$ corresponds to an increase in the mean $\mathop{\mathrm{\mathbb{E}}}S$ of $S$ by $\beta_k$, keeping the other predictor values fixed, cf. (Agresti 2012, Sections 5.1.3 and 5.2.1). Stating a relation between the regression parameters and the response distribution in general is, however, not possible.

4 Identification of influencing factors and effect quantification

Let us now investigate the influences of covariates on the response outcome. To this end, we consider at first discrete covariates, say, a binary group variable having two levels 0/1 (e.g., two different studies with one question each having the same possible answers). Then, the distribution of the latent variable $S$ for group 1 is the one for group 0 shifted by the regression coefficient $\beta$, see Figure 3. This results in a shift of the probability masses for all possible responses. In the example, this means that positive values for $\beta$ make responses encoded with high values more likely (and responses encoded with low values less likely), and vice versa for negative values of $\beta$.

$Graphical representation of the influence of a binary group variable to the distribution of the latent variable $S$ and effects on the probabilities of the survey responses $Y$: The entire distribution of the latent variable, particularly its mean ${\alpha}$ (dashed lines), is shifted by the regression coefficient ${\beta}$ to the right (left) if $\beta$ > $0$ ($\beta$ < $0$). The probabilities (coloured areas) for all response levels change accordingly.$

Figure 3: Graphical representation of the influence of a binary group variable to the distribution of the latent variable $S$ and effects on the probabilities of the survey responses $Y$: The entire distribution of the latent variable, particularly its mean ${\alpha}$ (dashed lines), is shifted by the regression coefficient ${\beta}$ to the right (left) if $\beta$ > $0$ ($\beta$ < $0$). The probabilities (coloured areas) for all response levels change accordingly.

Using the framework presented above, we are moreover able to test for significant differences in the response behaviour between two groups. Two response behaviours are considered to differ significantly if the corresponding regression coefficient $\beta$ on the latent scale differs significantly from zero, i.e., there is a significant shift in the distribution of the latent variable $S$. Testing procedures are of asymptotic Wald-type (Christensen 2015, Section 4.1).

4.1 Model application

We consider the data from the FILIPPA study from Section 1.1. We are interested whether and how the invasiveness of the study (observational/ RCT/ pharmacological study) and the child’s sex influence the response behaviour of the legal representatives. Therefore, we fit an ordinal regression model for the response with covariates $\texttt{study}$ type and child’s $\texttt{sex}$ with logit-link using the $\texttt{clm}$ function from the $\texttt{ordinal}$ package:

clmodel <- clm(response ~ study + sex, data = data, link = 'logit')
summary(clmodel)
#> formula: response ~ study + sex
#> data:    data
#> 
#>  link  threshold nobs logLik  AIC    niter max.grad cond.H 
#>  logit flexible  318  -359.80 733.60 6(1)  5.47e-12 2.3e+02
#> 
#> Coefficients:
#>                                  Estimate Std. Error z value Pr(>|z|)    
#> studyRandomised controlled trial   2.2127     0.3723   5.943 2.79e-09 ***
#> studyPharmacological study         2.8428     0.3682   7.721 1.16e-14 ***
#> sexMale                           -0.4778     0.2350  -2.033    0.042 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Threshold coefficients:
#>                                   Estimate Std. Error z value
#> Absolutely consent|Rather consent   1.9020     0.3362   5.658
#> Rather consent|Unsure               2.3924     0.3446   6.942
#> Unsure|Rather decline               2.9745     0.3541   8.401
#> Rather decline|Absolutely decline   3.6105     0.3670   9.837

This model provides:

$L-1 = 4$ threshold coefficients (cut-points), corresponding to the $\theta_{\ell}$’s in Figure 3;
two coefficients for the levels “RCT” ($\texttt{studyRCT}$) and “pharmacological study” ($\texttt{studyPharma}$) of invasiveness (in comparison to the “observational study”);
and one coefficient $\texttt{sexMale}$ for the influence of the child’s sex (here: boys in comparison to the girls).

The estimated coefficients for study invasiveness and sex are statistically significant (according to a Wald test, $p<0.001$ and $p=0.042$, respectively) at the two-sided 5% significance level. As the coefficient for study type “RCT” is positive, the latent score distribution of $Y$ for an RCT study is shifted to the right by 2.2127 in comparison to the observational study. See Figure 3 for an illustration where 1 denotes the highest level of consent (“absolutely consent”) and 5 the least (“absolutely decline”). As the cut-points do not depend on the covariates, by this shift we predict levels with lesser consent with higher probability.

A similar argument holds for the pharmacological study compared to the observational study (here, the shift is 2.8428, i.e. another 0.6301 units w.r.t. “RCT”). Moreover, as the coefficient for $\texttt{sex}$ is negative (–0.4778), legal representatives of boys were significantly more likely to consent to participate in the studies than representatives of girls ($p=0.042$).

To assess whether there is also a significantly different response behaviour between the RCT and the pharmacological study, we have to conduct one additional test. Note that to this, we have to adjust for multiple testing. For this purpose, the $\texttt{emmeans}$ method from the package of the same name $\texttt{emmeans}$ by (Lenth 2022) provides several possibilities using the $\texttt{adjust}$ parameter. By default, $p$-value adjustment using Tukey’s method, cf. (Tukey 1953), for multiple comparisons is applied:

emmeans(clmodel, specs = list(pairwise ~ study), mode = 'latent')[[2]]
#>  1                                                   estimate    SE  df z.ratio
#>  Observational study - Randomised controlled trial      -2.21 0.372 Inf  -5.943
#>  Observational study - Pharmacological study            -2.84 0.368 Inf  -7.721
#>  Randomised controlled trial - Pharmacological study    -0.63 0.252 Inf  -2.496
#>  p.value
#>   <.0001
#>   <.0001
#>   0.0336
#> 
#> Results are averaged over the levels of: sex 
#> P value adjustment: tukey method for comparing a family of 3 estimates

See also (Hsu 1996, Section 5) for the theoretical background and a comprehensive overview about multiple testing problems. From the last column it follows that there are significant differences in the response behaviour of voters between all three study types at the significance level 5%.

Note that in the model summary there are no $p$-values provided for the threshold coefficients as testing against zero would not make much sense—the actual position of the cut-points has no meaning but only the distances between each other and the relative position to the mean of the latent variable, cf. Section 3.2.3.

For this study, the odds for a legal representative of a girl to respond “absolutely consent” for the observational study are about $\exp(2.2127)=9.1$ times the odds for the RCT. This estimate is, however, relatively imprecise which can probably be reduced to the small number of legal representatives of a girl absolutely willing to participate in the RCT or the small sample size of the study overall. The 95%-confidence interval is [1.5154, 2.9858], corresponding to [4.551, 19.803] for the odds ratio. As descriptive statistics of effects (here: probabilities) can compare the cumulative probabilities more suitably and easier to interpret, we recommend to describe effects for ordinal regression quantitatively by presenting comparisons of probabilities e.g. at their extreme values, see also (Agresti 2012, Section 8.2.4). The procedure how to do so is described in the following.

4.2 Predicting probabilities, confidence intervals

The package $\texttt{emmeans}$ provides moreover a method for computing predicted marginal response probabilities for each possible answer (columns $\texttt{response}$ and $\texttt{prob}$), possibly stratified by given covariates (first line in each paragraph below):

emmeans(clmodel, ~ response | study / sex, mode = 'prob')
#> study = Observational study, sex = Female:
#>  response             prob      SE  df asymp.LCL asymp.UCL
#>  Absolutely consent 0.8701 0.03799 Inf   0.79565    0.9446
#>  Rather consent     0.0461 0.01408 Inf   0.01853    0.0737
#>  Unsure             0.0352 0.01162 Inf   0.01239    0.0579
#>  Rather decline     0.0223 0.00799 Inf   0.00660    0.0379
#>  Absolutely decline 0.0263 0.00941 Inf   0.00789    0.0448
...

Note, that as the regression parameters and cut-points are subject to uncertainties (the data is random), so are the estimated probabilities for the possible responses. Corresponding asymptotic (indicated by the infinite number of degrees of freedom $\texttt{df=Inf}$) confidence intervals, given as $[\texttt{asymp.LCL}$, $\texttt{asymp.UCL}]$, can be derived using the delta method and are provided in addition to the predictions in the $\texttt{emmeans}$ method above (last two columns). We refer to (Christensen 2018, Section 4.7) for further details on the derivation of standard errors (column $\texttt{SE}$) to compute these confidence intervals.

The $\texttt{emmeans}$ package contains moreover a function to show the regression output graphically using $\texttt{emmip}$ for a clear and compact presentation:

emmip(clmodel, ~ response | study / sex, mode = 'prob', CI = TRUE,
            style = 'factor',
            CIarg = list(colour=response_colours, size=5, alpha=.5 )) +
labs(y = 'Estimated probability', x = 'Level of consent') +
scale_x_discrete(limits=rev) +
scale_y_continuous(labels = scales::percent) +
facet_grid(sex ~ study) + theme

$Estimated response probabilities stratified by study type (horizontally) and child's sex (vertically) within the ordinal regression model of Example \@ref(ex-clm1).$

Figure 4: Estimated response probabilities stratified by study type (horizontally) and child’s sex (vertically) within the ordinal regression model of Example 4.1.

Figure 4 clearly shows differences in the response behaviour of legal representatives depending on the study invasiveness. More precisely, the statement that willingness to consent for participation decreases with increasing invasiveness from Section 4.1 is confirmed. Further, we discern smaller differences between the willingness to consent between boys and girls. Particularly for the RCT and the pharmacological study, more participants were responding “absolutely consent” and less “absolutely decline” in the boys’ strata than in girls’. Differences were about 12% and 6% for the RCT and about 10% and 11% for the pharmacological study, respectively.

Finally we like to remark that $\texttt{emmeans}$ provides for the possibility to compute cumulative and exceedance (i.e. 1 – cumulative) probabilities using $\texttt{mode = 'cum.prob'}$ and $\texttt{mode = 'exc.prob'}$, respectively. For instance, the probability for a legal representative of at least “rather consenting” is:

emmeans(clmodel, ~ study / sex, mode = 'cum.prob', at = list(cut = "Rather consent|Unsure"))
#>  study                       sex    cumprob     SE  df asymp.LCL asymp.UCL
#>  Observational study         Female   0.916 0.0264 Inf     0.864     0.968
#>  Randomised controlled trial Female   0.545 0.0576 Inf     0.432     0.658
#>  Pharmacological study       Female   0.389 0.0533 Inf     0.285     0.494
#>  Observational study         Male     0.946 0.0180 Inf     0.911     0.982
#>  Randomised controlled trial Male     0.659 0.0504 Inf     0.560     0.757
#>  Pharmacological study       Male     0.507 0.0510 Inf     0.407     0.607
#> 
#> Confidence level used: 0.95

Note, that the cumulative and exceedance probablities can be obtained from the individual probability estimates above. This, however, does not hold for the confidence intervals.

For the model parameters, the $\texttt{ordinal}$ package provides moreover profile likelihood based confidence intervals for model covariates. These confidence intervals often possess better coverage than Wald-type confidence intervals, particularly in studies with small to moderate sample size. The computation can be performed via:

confint(clmodel)
#>                                      2.5 %      97.5 %
#> studyRandomised controlled trial  1.515385  2.98583513
#> studyPharmacological study        2.156130  3.60999306
#> sexMale                          -0.940631 -0.01824491

Profile likelihood based confidence intervals are implemented for regression and scale parameters, but are not available for threshold, nominal and flexible link parameters (Christensen 2018).

4.3 Numerics and convergence

The model coefficients are estimated numerically using a maximum likelihood approach by calculation of zeroes of the gradient of the negative log-likelihood. Parameter estimates are output after a convergence criterion is satisfied (typically small gradient) or a maximum number of iterations has been reached. Numerics and control parameters can be passed using $\texttt{clm.control}$.

To oversee the convergence of the approach, the $\texttt{summary}$ of the $\texttt{clm}$ method provides three parameters, see the third line ff. in the summary of Section 4.1: $\texttt{niter}$ (the number of Newton-Raphson iterations needed with the number of step-halvings in paranthesis), $\texttt{max.grad}$ (the maximum absolute gradient of log-likelihood) and $\texttt{cond.H}$ (the condition of the Hessian at the maximum). More detailed information can be obtained using the $\texttt{convergence}$ method applied to the model (cf. also Section 5.1 below). (Christensen 2015) states that large $\texttt{cond.H}$ values (like $>$ $\texttt{1e4}$) might indicate that the model is ill-defined. As in the case of Section 4.1 $\texttt{max.grad}$ is small and $\texttt{cond.H}$ is reasonably sized, we can conclude that the algorithm seems to have converged properly.

4.4 Interaction effects

One might wonder whether it is appropriate to include also an interaction term $\texttt{study:sex}$ in the model. To this end, we can fit a model including this interaction and compare it with the model from Section 4.1 (not containing factor interaction). For this purpose, the $\texttt{ordinal}$ package provides the $\texttt{anova}$ method conducting a likelihood ratio test between both models:

clmodel_inter <- clm(response ~ study * sex, data = data, link = 'logit')
anova(clmodel, clmodel_inter)

This implies that there is no statistical evidence for including interaction $\texttt{study:sex}$ between study-type and child’s sex into the model ($p=0.8577$).

4.5 Random effects

To conclude this first example, we consider the question whether there are substantial differences in the individual response behaviours, i.e. whether there are parents responding systematically e.g. particularly low or high values or answering always “unsure” etc. To this end, we fit an ordinal model as in Section 4.1 but with additional individual random effect using the $\texttt{clmm}$ function from the $\texttt{ordinal}$ package and compare this model, analogously to 4.4 with the model from Section 4.1:

clmmodel <- clmm(response ~ study + sex + (1 | id), data = data, link = 'logit')
anova(clmodel, clmmodel)

Again, there is no strong evidence ($p=0.1444$) for substantial subject specific response behaviours; a test at the 5% significance level would not reject the model from Section 4.1 in favour to a model including additional individual random effects.

Note that, as of now, the $\texttt{clmm}$ function does not support predicting probabilities of a model containing random effects. In contrast, the former implementation $\texttt{clmm2}$ of $\texttt{clmm}$ does support prediction, however, provides only fitted values for a random effect of zero, cf. (Christensen 2015). Beyond this, we present an approximate approach using the $\texttt{emmeans}$ method at the end of the following Section 4.6.

Of note, methods from the $\texttt{ordinal}$ package seem to often not converge when there are random effects included. In this case, it might be worthwhile to consider e.g. a Bayesian random effects ordinal logistic model, cf. the $\texttt{blrm}$ from the $\texttt{rmsb}$ package (Harrell Jr 2023 b). Moreover, a further alternative for including repeated measures into ordinal models is implemented in the $\texttt{repolr}$ package by (Parsons 2016) which extends the $\texttt{polr}$ from the $\texttt{MASS}$ package by (Venables and Ripley 2002). More sophisticated designs for analysing survey data often involve clustering and probability sampling weights. We refer the interested reader to (An and others 2002; Selosse, Jacques, and Biernacki 2021) or (Agresti 2012, Chapter 12ff.).

4.6 Continuous covariates

Considerung the influence of a continuous covariate on the response outcome, fitting of models is done analogously as described in Section 4.1 above. The influence of a continuous covariate is portrayed schematically in Figure 5 for a univariate covariate ${x}\in\mathbb R^1$. The covariate ${x}$ is allowed to vary along the horizontal axis whereas the latent score is plotted on the vertical axis. This corresponds to Figure 3 reflected at the 45° line. A change in ${x}$ by one unit results in a shift of the latent score distribution by the corresponding regression coefficient, i.e. here $\beta$ units. More generally, the mean of the latent score between two responders with covariates ${x}'$ and ${x}''$ is shifted by $\pm{\beta}({x}''-{x}')$ (depending on which direction one considers). The probabilities of the response categories shift, given unchanged cut-points, accordingly, see the coloured areas in Figure 5.

$Graphical representation of the influence of a univariate continuous covariate $\textbf{x}\in\R^1$ to the distribution of the latent variable $S$ and effects on the probabilities of the survey responses $Y$: A change of the covariate $\textbf{x}$ causes a change of the mean of the latent score $\E S$ (cf. $\alpha$ in Figure \@ref(fig:model-parameters)) along the regression line (dash-dotted $-\cdot$), i.e. if the line's slope is $\beta$, a change of $\textbf{x}$ by one unit results in a change of $\E S$ by $\beta$ units. Particularly, assuming fixed cut-points, if the regression coefficient $\beta$ is positive (negative), a higher value of $\textbf{x}$ results in tendentially higher (smaller) response values.$

Figure 5: Graphical representation of the influence of a univariate continuous covariate $\textbf{x}\in\mathbb R^1$ to the distribution of the latent variable $S$ and effects on the probabilities of the survey responses $Y$: A change of the covariate $\textbf{x}$ causes a change of the mean of the latent score $\mathop{\mathrm{\mathbb{E}}}S$ (cf. $\alpha$ in Figure 3) along the regression line (dash-dotted $-\cdot$), i.e. if the line’s slope is $\beta$, a change of $\textbf{x}$ by one unit results in a change of $\mathop{\mathrm{\mathbb{E}}}S$ by $\beta$ units. Particularly, assuming fixed cut-points, if the regression coefficient $\beta$ is positive (negative), a higher value of $\textbf{x}$ results in tendentially higher (smaller) response values.

Below you can find the result for an ordinal model for the response depending on study type and child’s sex, as before, and additionally the age of the legal representative (or discussion partner, respectively) for the data from the FILIPPA study. Note that whenever there was more than one legal representative present at the interview, we chose the age of the oldest one for this analysis.

clmodel <- clm(response ~ study + sex + partner_age, data = data, link = 'logit')
summary(clmodel)
#> formula: response ~ study + sex + partner_age
#> data:    data
#> 
#>  link  threshold nobs logLik  AIC    niter max.grad cond.H 
#>  logit flexible  306  -342.41 700.82 6(1)  6.29e-10 1.7e+05
#> 
#> Coefficients:
#>                                  Estimate Std. Error z value Pr(>|z|)    
#> studyRandomised controlled trial  2.28547    0.38863   5.881 4.08e-09 ***
#> studyPharmacological study        3.00952    0.38550   7.807 5.87e-15 ***
#> sexMale                          -0.52753    0.24277  -2.173   0.0298 *  
#> partner_age                      -0.03713    0.01546  -2.402   0.0163 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Threshold coefficients:
#>                                   Estimate Std. Error z value
#> Absolutely consent|Rather consent   0.6098     0.6546   0.931
#> Rather consent|Unsure               1.0961     0.6564   1.670
#> Unsure|Rather decline               1.7115     0.6573   2.604
#> Rather decline|Absolutely decline   2.3568     0.6602   3.570
#> (12 observations deleted due to missingness)

We observe that there are only minor changes in the coefficients $\texttt{studyRCT}$, $\texttt{studyPharma}$ and $\texttt{sex}$ in comparison to Section 4.1. The coefficient $\texttt{partner_age}$ states that for every increase in the age of the legal representative by one year, the mean latent score decreases by about –0.03713 units. This means that the older the interview partners are, the more likely they are responding smaller response levels, i.e. the more likely they are consenting for study participation.

The resulting matrix of estimates can then be visualised as a stream plot as follows:

ggplot(clm_fit, aes(x = partner_age, y = fit, fill = response)) +
geom_stream(type = 'proportional', alpha = 0.8) +
labs(y = 'Estimated probability', x = 'Age of legal representative [years]',
    fill = 'Level of consent') +
scale_x_continuous(expand = c(0,0), limits = c(20, 60)) +
scale_y_continuous(expand = c(0,0), labels = scales::percent_format()) +
scale_fill_manual(values = response_colours, labels = responses) +
theme +
facet_grid(sex ~ study)

$Estimated response probabilities stratified by study type (horizontally) within the ordinal regression model of Section \@ref(ex-clm2).$

Figure 6: Estimated response probabilities stratified by study type (horizontally) within the ordinal regression model of Section 4.6.

In all three studies the level of consent is increasing for an increasing age of the legal representative even though this is a bit more pronounced in males than in female inpatients, see Figure 6. If we included an interaction term into the regression model, e.g., $\texttt{study:sex}$ between study type and child’s sex, it might occur that this behaviour differs among the different study types. However, as neither interaction between $\texttt{study}$, $\texttt{sex}$ and $\texttt{partner_age}$ was significant we did not include parameter interaction into the model.

Note that Figure 6, in contrast to Figure 4 does not contain any information about the estimated uncertainty of estimates (e.g., in form of confidence intervals). This is, however, rather a limitation of the graphical representation than of the applied method. We can still obtain asymptotic Wald-type confidence intervals for the probabilities of each response level given the covariates (e.g., here for all studies and a legal representative of age 42) using:

emmeans(clmodel, ~ response | study / partner_age, mode = 'prob', at = list(partner_age=42))
#> study = Observational study, partner_age = 42:
#>  response             prob      SE  df asymp.LCL asymp.UCL
#>  Absolutely consent 0.9171 0.02658 Inf   0.86504    0.9692
#>  Rather consent     0.0301 0.01016 Inf   0.01022    0.0500
#>  Unsure             0.0235 0.00833 Inf   0.00716    0.0398
#>  Rather decline     0.0137 0.00524 Inf   0.00341    0.0239
#>  Absolutely decline 0.0156 0.00598 Inf   0.00386    0.0273
...

4.7 Invariance to choice of number of response categories

(Agresti 2012, Section 3.3.3) points out that the regression parameters $\boldsymbol{\beta}$ in the latent variable model do not depend on the particular way the continuous latent scale is cut by the cut-points $\theta_\ell$. Thus, the effect of the parameters $\boldsymbol{\beta}$ are independent of the choice of the categories of $Y$. For instance, the same effect parameters apply for a variable with five consent levels (as in Section 4.1 and 4.6 above) or to a response variable with e.g. ten or only three consent levels. This makes it possible to compare model parameters from studies using different scales.

5 Advanced topics

To conclude we like to allude to a number of advanced topics which we deem to be important in the context fitting of ordinal models.

5.1 Convergence checks

As already mentioned in Section 4.3, the model coefficients are determined numerically. Whereas the $\texttt{summary}$ command shows only a brief outline about model coefficients and convergence, the $\texttt{convergence}$ command from the $\texttt{ordinal}$ packages yields more comprehensive overview about that and provides moreover information about the number of correctly estimated decimals.

5.2 Goodness of fit, model selection

In applications after model fitting typically the question for goodness of fit arises, i.e. “How well does our model predict the present data?”. For ordinary linear regression, quantities such as an $R^2$ coefficient of determination, quantifying the amount of data variance is explained by the model, are often stated. As a result, goodness of fit testing is enabled. In a nutshell, a non-significant $p$-value of a goodness of fit test indicates that there is no evidence that observed and fitted values/ frequencies do differ in statistically significant way (thus indicating a reasonable fit). Comparisons of several nested models are possible using criteria such as the Akaike (AIC) or the Bayesian information criterion (BIC) to account for increasing goodness of fit for an increasing number of included parameters, cf. (Agresti 2015, Section 4.6).

Goodness of fit testing in ordinal models is considered by (Pulkstenis and Robinson 2004; Fagerland and Hosmer 2016). Corresponding algorithms are implemented in $\texttt{R}$ in the $\texttt{generalhoslem}$ package by (Jay 2019). For practical applications, (Fagerland and Hosmer 2013) recommend calculation of three different methods, the Lipsitz test, an ordinal version of the Hosmer-Lemeshow test, and the Pulkstenis-Robinson test, to assess goodness of fit, each covering slightly different aspects of the problem. Of note, these tests, however, do not have good power to detect a particular type of lack of fit (Hosmer et al. 1997; Agresti 2013). Particularly, a large value of the global fit statistic only indicates some lack of fit, but does not provide insights about its nature (Agresti 2013). For an elaborate discussion about these tests and further details we refer to (Fagerland and Hosmer 2013, Section 6) or (Hosmer Jr, Lemeshow, and Sturdivant 2013, Chapter 5).

(Harrell Jr 2015, Sections 13.3.5 and 13.3.6) propose checking model assumptions and model fit graphically, e.g. using nomograms for assessing the proportional odds assumption and quantifying the model’s predictive ability using pseudo $R^2$ coefficients. Moreover, goodness-of-fit can be assessed by comparing the log-likelihood of the model with the one of a hoped-for-simpler model or the one of a richer model using tests similarly to the ones presented in Sections 4.4 and 4.5.

Finally, we like to remark that there is a number of various pseudo $R^2$ measures and measures of predictive discrimination for ordinal models. A common variant is (Nagelkerke 1991)’s $R^2$ which is also part of the standard output of the $\texttt{lrm}$ function by (Harrell Jr 2023 d). A broader variety is provided by the $\texttt{PseudoR2}$ function in the $\texttt{DescTool}$ package (Signorell 2023). We refer to (Tjur 2009) for a literature summary and some theoretical background.

5.3 Trial design and sample size computation

In practice, often the question arises how many participants a study should include. In case of an ordinal analysis the $\texttt{Hmisc}$ package (Harrell Jr 2023 a) provides the $\texttt{posamsize}$ and $\texttt{popower}$ functions to determine power of tests and sample size estimates for ordinal proportional odds models. For example, for a comparison of the response behaviour between study types “Observational” and “RCT” at an expected response distribution of $p = (.87, .05, .04, .02 ,.02)^\top$ for the observational study and an expected odds ratio of 9 a two-sided test on the significance level 5% achieves a power of 80% if we include 58 patients:

Hmisc::posamsize(c(.87, .05, .04, .02, .02), 9)
#> Total sample size: 57.2 
#> Efficiency of design compared with continuous response: 0.341

5.4 Model generalisations

Finally, the $\texttt{ordinal}$ package provides for a number of generalisations of the ordinal model presented in Section 3. The general form of a cumulative link model (see Equation ($\ref{eq:clm}$)) can be written as

\[\begin{align} \mathbb{P}(Y \le \ell \mid \textbf{x}, \textbf{w}, \textbf{z} ) &= F_\lambda\left( \frac{ g_{\boldsymbol{\alpha}}(\theta_\ell) - \textbf{x}^\top \boldsymbol{\beta} - \textbf{w}^\top \tilde{\boldsymbol{\beta}}_\ell}{\exp(\textbf{z}^\top \boldsymbol{\zeta})} \right) %= F_\lambda\left( \frac{ g_{\boldsymbol{\alpha}}(\theta_\ell) - \sum_{i=1}^k x_i \beta_i - \sum_{j=1}^m w_j \tilde{\beta}_{\ell, j}}{\exp\left(\sum_{s=1}^S z_s \zeta_s\right)} \right) \end{align}\]

where the parameters affect the model as follows:

$F_\lambda$ is the inverse link function which may be parametrised by a parameter $\lambda \in \mathbb R$. Its inverse $F_\lambda^{-1}$ is also referred to as flexible link function.
Cut-points may be transformed via $g_{\boldsymbol{\alpha}}(\theta_\ell)$ to be more structured to reduce the number of model parameters, thus increasing efficiency of the estimators. Typical choices are assumptions of symmetric distribution around the mean or having equal distances between each other (equidistant cut-points). Unrestricted cut-points (corresponding to $g_{\boldsymbol{\alpha}}$ to be the identity) are also referred to as flexible.
$\textbf{x}^\top \boldsymbol{\beta}$ are the ordinary regression effects as in Equation ($\ref{eq:clm}$).
Regression effects (or the cut-points, respectively) might be allowed to depend on covariates to include nominal effects $\textbf{w}^\top \tilde{\boldsymbol{\beta}}_\ell$, see Figure 7 top. This allows for more flexibility in the modelling of rating behaviours. Computationally, to include nominal effects you have to pass the corresponding variable to the $\texttt{nominal}$ parameter of $\texttt{clm}$. Note, however, that parameters included in nominal effects cannot be included as covariates due to identifiability reasons. Models including these nominal effects with logit-link are also called partial or non-proportional odds models, see (Peterson and Harrell Jr 1990).
The variance of the latent variable might be depending on covariates via $\exp(\textbf{z}^\top \boldsymbol{\zeta})$ (scale effects, see Figure 7 bottom), e.g., reflecting different variances in the rating behaviour depending on group membership. To use scale effects in the $\texttt{clm}$ method, pass variable names to the $\texttt{scale}$ parameter while fitting the model.

$Effects of including (top) nominal and (bottom) scale effects to an ordinal regression model. (top) For every nominal effect we have to estimate another $L$ (number of response levels) regression parameters, or equivalently $L$ threshold coefficients $\tilde{\theta}_\ell = \theta_\ell - \tilde{\beta}_\ell$. (bottom) Including scale effects changes the variance of the underlying latent variable $S$.$ $Effects of including (top) nominal and (bottom) scale effects to an ordinal regression model. (top) For every nominal effect we have to estimate another $L$ (number of response levels) regression parameters, or equivalently $L$ threshold coefficients $\tilde{\theta}_\ell = \theta_\ell - \tilde{\beta}_\ell$. (bottom) Including scale effects changes the variance of the underlying latent variable $S$.$

Figure 7: Effects of including (top) nominal and (bottom) scale effects to an ordinal regression model. (top) For every nominal effect we have to estimate another $L$ (number of response levels) regression parameters, or equivalently $L$ threshold coefficients $\tilde{\theta}_\ell = \theta_\ell - \tilde{\beta}_\ell$. (bottom) Including scale effects changes the variance of the underlying latent variable $S$.

A more comprehensive overview is provided in (Christensen 2018, Section 2.3ff.) with corresponding remarks concerning implementations in (Christensen 2018, Section 4). Whether it is reasonable to include any of these modifications into the model is usually up to the analysing statistician. Models can again be compared using likelihood ratio tests as described in Section 4.4 and 4.5, respectively. For a fitted model, likelihood ratio tests for models after adding nominal or scale effects can be performed via

nominal_test(clmodel)

or $\texttt{scale_test(clmodel)}$, respectively. These tests can be viewed as goodness-of-fit tests. In the model above, there is no statistical evidence that including nominal effects into the model would increase the model fit significantly.

Finally, there are extensions to nonlinear ordinal regression models, i.e. model in which the predictors are included in a nonlinear way. We refer e.g. to the functions $\texttt{ordglm}$ from the $\texttt{gnlm}$ (Swihart and Lindsey 2019) package or $\texttt{nordr}$ from the (Turner and Firth 2023) package for further details.

6 Conclusion

The tools of the ordinal regression framework provide a method to appropriately analyse ordinally scaled data, particularly with only a few number of response categories. By applying ordinal regression we yield probabilities for each response category. This offers a more differentiated view, e.g., regarding proportions of participants consenting in contrast to considering just a mean score. Moreover, considering data on a latent scale, we have the possibility of testing group differences and marginal effects such as cumulative odds. If survey data is very finely graduated, such as for numeric rating scales or comprehensive quality-of-life questionnaires, the usage of ordinal methods is often limited as the estimated parameters are not that meaningful anymore in contrast to e.g. a mean or median score or, are even not feasible due to a too large number of parameters to be estimated. For the FILIPPA study, we demonstrated that a latent variable based ordinal regression analysis has the potential of identifying factors of willingness to study participation and to quantify the probability of consenting to the participation in fictional studies with differing levels of invasiveness given a range of demographic variables from the children and their legal representatives.

Data availability statement

The raw data underlying the presented analyses will be made available by the author upon reasonable request.

Acknowledgements

We thank Thomas Asendorf for helpful discussions.

Conflicts of interest and financial disclosures

The authors have no conflicts of interest to declare. We conducted this research with institutional resources only. Open access charges were funded by the Open Access Publication Funds of the Göttingen University, who had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Agresti, Alan. 2012. Analysis of Ordinal Categorical Data. 2nd ed. John Wiley & Sons.

———. 2013. Categorical Data Analysis. 3rd ed. John Wiley & Sons.

———. 2015. Foundations of Linear and Generalized Linear Models. John Wiley & Sons.

Agresti, Alan, and Maria Kateri. 2017. “Ordinal Probability Effect Measures for Group Comparisons in Multinomial Cumulative Link Models.” Biometrics 73 (1): 214–19. https://doi.org/https://doi.org/10.1111/biom.12565.

Agresti, Alan, and Claudia Tarantola. 2018. “Simple Ways to Interpret Effects in Modeling Ordinal Categorical Data.” Statistica Neerlandica 72 (3): 210–23.

An, Anthony B, and others. 2002. “Performing Logistic Regression on Survey Data with the New Surveylogistic Procedure.” In Proceedings of the Twenty-Seventh Annual Sas Users Group International Conference, 258–27. SAS Institute Inc. Cary, NC.

Barnier, Julien. 2022. Rmdformats: HTML Output Formats and Templates for ’Rmarkdown’ Documents. https://CRAN.R-project.org/package=rmdformats.

Bürkner, Paul-Christian. 2017. “Brms : An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (August). https://doi.org/10.18637/jss.v080.i01.

Christensen, Rune Haubo B. 2018. “Cumulative Link Models for Ordinal Regression with the R Package Ordinal.” Submitted in J. Stat. Software 35.

Christensen, Rune Haubo Bojesen. 2015. “Ordinal—Regression Models for Ordinal Data.” R Package Version 28: 2015.

Dunn, Taylor. 2020a. “Ordinal Regression in R: Part 1.” March 15, 2020. https://tdunn.ca/posts/2020-03-15-ordinal-regression-in-r-part-1.

———. 2020b. “Ordinal Regression in R: Part 2.” March 17, 2020. https://tdunn.ca/posts/2020-03-17-ordinal-regression-in-r-part-2.

Fagerland, Morten W, and David W Hosmer. 2013. “A Goodness-of-Fit Test for the Proportional Odds Regression Model.” Statistics in Medicine 32 (13): 2235–49.

———. 2016. “Tests for Goodness of Fit in Ordinal Logistic Regression Models.” Journal of Statistical Computation and Simulation 86 (17): 3398–3418.

Fullerton, Andrew S. 2009. “A Conceptual Framework for Ordered Logistic Regression Models.” Sociological Methods & Research 38 (2): 306–47. https://doi.org/10.1177/0049124109346162.

Harrell Jr, Frank E. 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer Series in Statistics. Springer International Publishing. https://books.google.de/books?id=sQ90rgEACAAJ.

———. 2023a. Hmisc: Harrell Miscellaneous. https://CRAN.R-project.org/package=Hmisc.

———. 2023b. Rmsb: Bayesian Regression Modeling Strategies. https://CRAN.R-project.org/package=rmsb.

———. 2023c. Rms: Regression Modeling Strategies. https://CRAN.R-project.org/package=rms.

———. 2023d. “Resources for Ordinal Regression Models.” May 2023. https://www.fharrell.com/post/rpo.

Heller, Gillian Z, Maurizio Manuguerra, and Roberta Chow. 2016. “How to Analyze the Visual Analogue Scale: Myths, Truths and Clinical Relevance.” Scandinavian Journal of Pain 13 (1): 67–75.

Hildebrand, David K, James D Laing, and Howard Rosenthal. 1977. Analysis of Ordinal Data. 8. Sage.

Hosmer, David W, Trina Hosmer, Saskia Le Cessie, and Stanley Lemeshow. 1997. “A Comparison of Goodness-of-Fit Tests for the Logistic Regression Model.” Statistics in Medicine 16 (9): 965–80.

Hosmer Jr, David W, Stanley Lemeshow, and Rodney X Sturdivant. 2013. Applied Logistic Regression. Vol. 398. John Wiley & Sons.

Hsu, Jason. 1996. Multiple Comparisons: Theory and Methods. CRC Press.

Jay, Matthew. 2019. Generalhoslem: Goodness of Fit Tests for Logistic Regression Models. https://CRAN.R-project.org/package=generalhoslem.

Johnson, Valen E, and James H Albert. 2006. Ordinal Data Modeling. Springer Science & Business Media.

Lenth, Russell V. 2022. Emmeans: Estimated Marginal Means, aka Least-Squares Means. https://CRAN.R-project.org/package=emmeans.

Manuguerra, Maurizio, Gillian Z. Heller, and Jun Ma. 2020. “Continuous Ordinal Regression for Analysis of Visual Analogue Scales: The R Package ordinalCont.” Journal of Statistical Software 96 (8): 1–24. https://doi.org/10.18637/jss.v096.i08.

McCullagh, Peter. 1980. “Regression Models for Ordinal Data.” Journal of the Royal Statistical Society: Series B (Methodological) 42 (2): 109–27.

Miller, Clemens, Jan Scholand, Johannes Wieditz, Carlo Pandaro, Hanna Haus, Hendrik Rosewich, and Marcus Nemeth. 2024. “Factors Influencing Willingness to Participate in Clinical Studies in Pediatric Anesthesia (FILIPPA): A Vignette-Based, Structured Interview Study.” BMC Medical Research Methodology.

Nagelkerke, Nico JD. 1991. “A Note on a General Definition of the Coefficient of Determination.” Biometrika 78 (3): 691–92.

Norris, Colleen M, William A Ghali, L Duncan Saunders, Rollin Brant, Diane Galbraith, Peter Faris, Merril L Knudtson, Approach Investigators, and others. 2006. “Ordinal Regression Model and the Linear Regression Model Were Superior to the Logistic Regression Models.” Journal of Clinical Epidemiology 59 (5): 448–56.

Parsons, Nick. 2016. Repolr: Repeated Measures Proportional Odds Logistic Regression. https://CRAN.R-project.org/package=repolr.

Peterson, Bercedis, and Frank E Harrell Jr. 1990. “Partial Proportional Odds Models for Ordinal Response Variables.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 39 (2): 205–17.

Pulkstenis, Erik, and Timothy J. Robinson. 2004. “Goodness-of-Fit Tests for Ordinal Response Regression Models.” Statistics in Medicine 23 (6): 999–1014. https://doi.org/https://doi.org/10.1002/sim.1659.

R Core Team. 2023. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Selosse, Margot, Julien Jacques, and Christophe Biernacki. 2021. “OrdinalClust: An R Package to Analyze Ordinal Data.” The R Journal 12 (2).

Signorell, Andri. 2023. DescTools: Tools for Descriptive Statistics. https://CRAN.R-project.org/package=DescTools.

Sjoberg, David. 2021. Ggstream: Create Streamplots in ’Ggplot2’. https://CRAN.R-project.org/package=ggstream.

Swihart, Bruce, and Jim Lindsey. 2019. Gnlm: Generalized Nonlinear Regression Models. https://CRAN.R-project.org/package=gnlm.

Tjur, Tue. 2009. “Coefficients of Determination in Logistic Regression Models—a New Proposal: The Coefficient of Discrimination.” The American Statistician 63 (4): 366–72.

Tukey, John Wilder. 1953. “The Problem of Multiple Comparisons.” Multiple Comparisons.

Turner, Heather, and David Firth. 2023. Gnm: Generalized Nonlinear Models. https://CRAN.R-project.org/package=gnm.

Vargas, Victor Manuel, Pedro Antonio Gutiérrez, and César Hervás-Martı́nez. 2020. “Cumulative Link Models for Deep Ordinal Classification.” Neurocomputing 401: 48–58.

Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. New York: Springer. https://www.stats.ox.ac.uk/pub/MASS4/.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Yee, Thomas W. 2015. Vector Generalized Linear and Additive Models: With an Implementation in R. 1st ed. Springer Publishing Company, Incorporated.

Yee, T. W. 2023. VGAM: Vector Generalized Linear and Additive Models. https://CRAN.R-project.org/package=VGAM.

Appendix

theme <- theme(panel.background = element_blank(),
    panel.spacing = unit(1, 'lines'),
    panel.border = element_rect(colour = 'black', fill=NA, linewidth=1),
    axis.title.x = element_text(),
    axis.title.y = element_text(),
    axis.text.y = element_text(),
    axis.text.x = element_text(angle = 30, vjust=1, hjust=1),
    strip.text = element_text(),
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_line(linewidth = 0.5, linetype = 'dotted', 
        colour = 'lightgrey'),
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_line(linewidth = 0.5, linetype = 'dotted', 
        colour = 'lightgrey')
    )
responses <- factor(c("Absolutely consent", "Rather consent", "Unsure", "Rather decline", "Absolutely decline"), ordered = T)
response_colours <- rep(c('#4DAF4A', '#377EB8', '#FFFF33', '#FF7F00', '#E41A1C'), 6)

Correspondence

Johannes Wieditz^†,*
^† Department of Medical Statistics, University Medical Centre Göttingen,
Humboldtallee 32, 37073 Göttingen, Germany,
^* Department of Anaesthesiology, University Medical Centre Göttingen,
Robert-Koch-Straße 40, 37075 Göttingen, Germany
Mail: johannes.wieditz@med.uni-goettingen.de