The effective sample size in Bayesian information criterion for level‐specific fixed and random‐effect selection in a two‐level nested model

Abstract

Popular statistical software provides the Bayesian information criterion (BIC) for multi-level models or linear mixed models. However, the combination of statistical literature and software documentation has produced discrepancies in the formulas for the BIC and uncertainty about its proper use when selecting a multi-level model with respect to level-specific fixed and random effects. These discrepancies and uncertainties stem from different specifications of sample size in the BIC's penalty term for multi-level models. In this study, we derive the BIC's penalty term for level-specific fixed- and random-effect selection in a two-level nested design. In the resulting criterion, called BICE1, the penalty term decomposes into two parts when the random-effect variance–covariance matrix has full rank: (a) a term involving the log of the average sample size per cluster and (b) the total number of parameters times the log of the total number of clusters. We further derive a version, called BICE2, for the case of redundant random effects. Via numerical demonstration, we show that the derived formulae, BICE1 and BICE2, match their empirical values, and a simulation study across various multi-level conditions shows that BICE (E indicating either E1 or E2) is the best global selection criterion, performing at least as well as BIC with the total sample size and BIC with the number of clusters. In addition, the use of BICE1 is illustrated with a textbook example dataset.
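To make the competing penalties concrete, the following schematic contrasts the two commonly implemented variants with the derived criterion. The notation is ours, not the authors': N is the total sample size, J the number of clusters, n-bar = N/J the average cluster size, p the total number of parameters, and q stands in for the parameter count attached to the within-cluster part of the decomposition (an illustrative placeholder; the paper derives the exact split).

```latex
% Schematic comparison of BIC penalty terms for a two-level model
% (notation ours; q is an illustrative placeholder for the parameter
% count tied to the within-cluster term of the derived decomposition).
\begin{align*}
\mathrm{BIC}_N    &= -2\log L + p \log N,   & N &= \textstyle\sum_{j=1}^{J} n_j, \\
\mathrm{BIC}_J    &= -2\log L + p \log J,   &   & \\
\mathrm{BIC}_{E1} &= -2\log L + q \log \bar{n} + p \log J, & \bar{n} &= N / J.
\end{align*}
```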

On generating plausible values for multilevel modelling with large‐scale‐assessment data

Abstract

Large-scale assessments (LSAs) routinely employ latent regressions to generate plausible values (PVs) for unbiased estimation of the relationship between examinees' background variables and performance. To handle the clustering effect common in LSA data, multilevel modelling is a popular choice. However, most LSAs use single-level conditioning methods, resulting in a mismatch between the imputation model and the multilevel analytic model. While some LSAs have implemented special techniques in single-level latent regressions to support random-intercept modelling, these techniques are not expected to support random-slope models. To address this gap, this study proposes two new single-level methods to support random-slope estimation. The existing and proposed methods are compared to the theoretically unbiased multilevel latent regression method in terms of their ability to support multilevel models. The findings indicate that the two existing single-level methods can support random-intercept-only models. The multilevel latent regression method provided mostly adequate estimates but was limited by its computational burden and did not perform best across all conditions. One of our proposed single-level methods offers an efficient alternative to multilevel latent regression and recovered acceptable estimates for all parameters. We provide recommendations, with some caveats, for the situations in which each method can be applied.
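For readers unfamiliar with the analytic side of the mismatch, the sketch below fits the kind of random-slope model the generated PVs must support. It is a minimal illustration with simulated data, and all variable names (pv1, ses, school) are ours, not the study's.

```r
# Minimal sketch (simulated data): the random-slope analytic model that a
# congenial imputation model for plausible values needs to support.
library(lme4)
set.seed(42)
J <- 50; n <- 20                                   # 50 schools, 20 students each
dat <- data.frame(school = factor(rep(1:J, each = n)), ses = rnorm(J * n))
u0 <- rnorm(J, 0, 0.5)                             # random intercepts
u1 <- rnorm(J, 0, 0.3)                             # random slopes
dat$pv1 <- 0.5 + u0[dat$school] + (0.4 + u1[dat$school]) * dat$ses +
  rnorm(J * n)                                     # one plausible value
fit <- lmer(pv1 ~ ses + (1 + ses | school), data = dat)
summary(fit)  # in practice, fit the model per PV and pool with Rubin's rules
```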

A correlated traits correlated (methods – 1) multitrait‐multimethod model for augmented round‐robin data

Abstract

We didactically derive a correlated traits correlated (methods – 1) [CTC(M – 1)] multitrait-multimethod (MTMM) model for dyadic round-robin data augmented by self-reports. The model is an extension of the CTC(M – 1) model for cross-classified data and can handle dependencies between raters and targets by including the reciprocity covariance parameters that are inherent in augmented round-robin designs. It can be specified as a traditional structural equation model. We present the variance decomposition as well as consistency and reliability coefficients. Moreover, we explain how to evaluate the fit of a CTC(M – 1) model for augmented round-robin data. In a simulation study, we explore the properties of full information maximum likelihood estimation of the model. Model (mis)fit can be detected quite accurately with the test of not close fit and dynamic root mean square errors of approximation. Even with few small round-robin groups, relative parameter estimation bias and coverage rates are satisfactory, but several larger round-robin groups are needed to minimize relative parameter estimation inaccuracy. Further, neglecting the reciprocity covariance structure of the augmented round-robin data does not severely bias the remaining parameter estimates. All analyses (including data, R scripts, and results) and the simulation study are provided in the Supporting Information. Implications and limitations are discussed.
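As a point of reference, the measurement structure of the generic CTC(M – 1) model reads as follows (a sketch of the base model only, omitting the reciprocity covariances that distinguish the round-robin extension; notation ours), with method m = 1 as the reference method:

```latex
% Generic CTC(M-1) measurement structure (sketch of the base model only,
% not the round-robin extension); method m = 1 is the reference method.
\begin{align*}
Y_{t1} &= \lambda_{t1}\, T_t + \varepsilon_{t1}, \\
Y_{tm} &= \lambda_{tm}\, T_t + \gamma_{tm}\, M_m + \varepsilon_{tm}, \qquad m > 1, \\
\mathrm{CON}(Y_{tm}) &= \frac{\lambda_{tm}^2 \mathrm{Var}(T_t)}
  {\lambda_{tm}^2 \mathrm{Var}(T_t) + \gamma_{tm}^2 \mathrm{Var}(M_m)}, \qquad
\mathrm{Rel}(Y_{tm}) = \frac{\lambda_{tm}^2 \mathrm{Var}(T_t) + \gamma_{tm}^2 \mathrm{Var}(M_m)}{\mathrm{Var}(Y_{tm})}.
\end{align*}
```

Here the consistency coefficient CON gives the proportion of true-score variance shared with the reference method, and Rel the reliability of the indicator.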

A Gibbs‐INLA algorithm for multidimensional graded response model analysis

Abstract

In this paper, we propose a novel Gibbs-INLA algorithm for Bayesian inference in graded response models with ordinal responses, based on multidimensional item response theory. By combining Gibbs sampling with the integrated nested Laplace approximation (INLA), the new framework avoids the cumbersome tuning that is inevitable in classical Markov chain Monte Carlo (MCMC) algorithms, has a low memory footprint, and achieves high computational efficiency with far fewer iterations while still attaining higher estimation accuracy. It can therefore handle large amounts of multidimensional response data with different item response formats. Simulation studies compare the new algorithm with the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm, and an application to the IPIP-NEO personality inventory data assesses its performance. Extensions of the proposed algorithm to more complicated models and different data types are also discussed.
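For context, the target model is the standard multidimensional graded response model with a logistic link (notation ours), whose cumulative category probabilities the Gibbs-INLA scheme must evaluate:

```latex
% Multidimensional graded response model (standard logistic form).
% Categories k = 0, ..., K_j with conventions P(Y >= 0) = 1 and
% P(Y >= K_j + 1) = 0; a_j are item j's discrimination parameters and
% b_{j1} < ... < b_{jK_j} its ordered thresholds.
\begin{align*}
P(Y_{ij} \ge k \mid \boldsymbol{\theta}_i)
  &= \frac{\exp\!\left(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i - b_{jk}\right)}
          {1 + \exp\!\left(\mathbf{a}_j^{\top}\boldsymbol{\theta}_i - b_{jk}\right)},
  \qquad k = 1, \dots, K_j, \\
P(Y_{ij} = k \mid \boldsymbol{\theta}_i)
  &= P(Y_{ij} \ge k \mid \boldsymbol{\theta}_i)
   - P(Y_{ij} \ge k + 1 \mid \boldsymbol{\theta}_i).
\end{align*}
```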

A Bayesian nonparametric approach for handling item and examinee heterogeneity in assessment data

Abstract

We propose a novel nonparametric Bayesian item response theory model that estimates clusters at the question level while simultaneously allowing for heterogeneity at the examinee level within each question cluster, characterized by a mixture of binomial distributions. The contribution of this work is threefold. First, we present the new model and demonstrate that it is identifiable under a set of conditions. Second, we show that the model can correctly identify question-level clusters asymptotically and that the parameters of interest, which measure the proficiency of examinees in solving certain questions, can be estimated at a $\sqrt{n}$ rate (up to a log term). Third, we present a tractable sampling algorithm for obtaining valid posterior samples from the proposed model. Compared to existing methods, our model reveals the multi-dimensionality of examinees' proficiency in handling different types of questions parsimoniously by imposing a nested clustering structure. We evaluate the proposed model in a series of simulations and apply it to an English proficiency assessment data set. This data analysis example nicely illustrates how the model can be used by test makers to distinguish different types of students and aid in the design of future tests.
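The nested structure can be sketched as follows (our reading and notation, offered as an illustration rather than the authors' exact specification): questions are partitioned into clusters, and within question cluster c an examinee's number-correct score follows a finite mixture of binomials over examinee subgroups.

```latex
% Nested clustering sketch (notation ours): question j belongs to cluster
% c(j); S_{ic} is examinee i's number-correct score on the n_c questions
% in cluster c, modeled as a mixture of binomials over examinee subgroups.
S_{ic} \;\sim\; \sum_{h=1}^{H_c} \pi_{ch}\, \mathrm{Binomial}\!\left(n_c,\ \theta_{ch}\right),
\qquad \sum_{h=1}^{H_c} \pi_{ch} = 1.
```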

Exploring examinees’ responses to constructed response items with a supervised topic model

Abstract

Textual data are increasingly common in test data, as many assessments include constructed response (CR) items as indicators of participants' understanding. The development of techniques based on natural language processing has made it possible for researchers to rapidly analyse large sets of textual data. One family of statistical techniques for this purpose is probabilistic topic models. Topic modelling detects the latent topic structure in a collection of documents and has been widely used to analyse texts in a variety of areas. The detected topics can reveal primary themes in the documents, and the relative use of topics can be useful in investigating the variability of the documents. Supervised latent Dirichlet allocation (SLDA) is a popular topic model in that family that jointly models textual data and paired responses, such as participants' textual answers to CR items and their rubric-based scores. SLDA assumes a homogeneous relationship between textual data and paired responses across all documents. This assumption, while useful for some purposes, may not hold when a population has subgroups with different relationships. In this study, we introduce a new supervised topic model that incorporates finite-mixture modelling into SLDA. The new model can detect latent groups of participants that differ in the relationship between their textual responses and associated scores. The model is illustrated with an analysis of textual responses and paired scores from a middle grades assessment of science inquiry knowledge. A simulation study investigates the performance of the proposed model under practical testing conditions.
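Schematically, the extension replaces SLDA's single regression of scores on empirical topic proportions with a group-specific one (a sketch in our notation, assuming SLDA's usual Gaussian response component):

```latex
% Response component of SLDA and the proposed mixture extension (sketch;
% notation ours). \bar{z}_d is document d's empirical topic proportions
% and g_d its latent group membership.
\begin{align*}
\text{SLDA:} \quad
  & y_d \mid \bar{\mathbf{z}}_d \sim
    \mathcal{N}\!\left(\boldsymbol{\eta}^{\top}\bar{\mathbf{z}}_d,\ \sigma^2\right), \\
\text{mixture SLDA:} \quad
  & y_d \mid \bar{\mathbf{z}}_d,\ g_d = g \sim
    \mathcal{N}\!\left(\boldsymbol{\eta}_g^{\top}\bar{\mathbf{z}}_d,\ \sigma_g^2\right),
  \qquad g_d \sim \mathrm{Categorical}(\boldsymbol{\pi}).
\end{align*}
```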

Evaluating the performance of existing and novel equivalence tests for fit indices in structural equation modelling

Abstract

It has been suggested that equivalence testing (otherwise known as negligible effect testing) should be used to evaluate model fit in structural equation modelling (SEM). In this study, we propose novel variations of equivalence tests based on two popular fit indices, the root mean squared error of approximation (RMSEA) and the comparative fit index (CFI). Using Monte Carlo simulations, we compare the performance of these novel tests to existing equivalence tests in SEM, as well as to other methods commonly used to evaluate model fit. Results indicate that equivalence tests in SEM have good Type I error control and considerable power for detecting well-fitting models in medium to large samples. At small sample sizes, relative to traditional fit indices, equivalence tests limit the chance of supporting a poorly fitting model. We also present an illustrative example demonstrating how equivalence tests may be incorporated into model fit reporting. Equivalence tests in SEM also have unique interpretational advantages over other methods of model fit evaluation. We recommend that equivalence tests be used in conjunction with descriptive fit indices to provide more evidence when evaluating model fit.
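To illustrate the general logic of an RMSEA-based equivalence test (a minimal sketch of the standard construction, not necessarily the authors' exact variants): the null hypothesis of non-negligible misfit, RMSEA >= eps0, is rejected when the observed chi-square falls below the alpha-quantile of the noncentral chi-square distribution implied by eps0.

```r
# Sketch of an RMSEA-based equivalence test (generic logic only):
# H0: RMSEA >= eps0 vs. H1: RMSEA < eps0 ("fit within tolerance").
rmsea_equiv_test <- function(chisq, df, n, eps0 = 0.05, alpha = 0.05) {
  ncp  <- (n - 1) * df * eps0^2            # noncentrality when RMSEA = eps0
  crit <- qchisq(alpha, df = df, ncp = ncp)
  list(p = pchisq(chisq, df = df, ncp = ncp),  # reject H0 when p < alpha
       reject = chisq < crit)
}
rmsea_equiv_test(chisq = 28.4, df = 24, n = 300)
```

Rejecting this null supports the conclusion that any model misfit is within the tolerance eps0.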

K‐Plus anticlustering: An improved k‐means criterion for maximizing between‐group similarity

Abstract

Anticlustering refers to partitioning elements into disjoint groups so as to obtain high between-group similarity and high within-group heterogeneity. Anticlustering thereby reverses the logic of its better-known twin, cluster analysis, and is usually approached by maximizing rather than minimizing a clustering objective function. This paper presents k-plus, an extension of the classical k-means objective for maximizing between-group similarity in anticlustering applications. K-plus represents between-group similarity as the discrepancy in distribution moments (means, variances, and higher-order moments), whereas the k-means criterion only reflects group differences with regard to means. Although k-plus constitutes a new criterion for anticlustering, we show that it can be implemented by optimizing the original k-means criterion after the input data have been augmented with additional variables. A computer simulation and practical examples show that k-plus anticlustering achieves high between-group similarity with regard to multiple objectives. In particular, optimizing between-group similarity with regard to variances usually does not compromise similarity with regard to means; the k-plus extension is therefore generally preferable to classical k-means anticlustering. Examples show how k-plus anticlustering can be applied to real norming data using the open-source R package anticlust, which is freely available via CRAN.
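Since the anticlust package is named in the abstract, here is a minimal runnable example assuming its documented interface (objective = "kplus" in anticlustering()), using a built-in R data set in place of real norming data:

```r
# k-plus anticlustering with the anticlust package (CRAN): partition the
# iris measurements into 3 groups that are similar in means and variances.
# install.packages("anticlust")
library(anticlust)
groups <- anticlustering(iris[, 1:4], K = 3, objective = "kplus")
# Inspect between-group similarity of the first two moments:
by(iris[, 1:4], groups, colMeans)
by(iris[, 1:4], groups, function(x) sapply(x, var))
```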

Testing indirect effect with a complete or incomplete dichotomous mediator

Abstract

Past methodological research on mediation analysis has mainly focused on situations where all variables are complete and continuous. When categorical variables are combined with missing data, additional methodological considerations arise. Specifically, appropriate decisions need to be made about how to estimate the indirect effects and how to construct confidence intervals for testing them while accommodating missing data. We compare strategies that address these issues in a model with a dichotomous mediator, aiming to provide guidelines for researchers facing such challenges in practice.
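As a minimal illustration of the moving parts involved (not the paper's specific estimators): the a-path comes from a logistic model for the dichotomous mediator, the b-path from a linear model for the outcome, and a bootstrap percentile interval tests the product. Note that the a-path is on the logit scale, which is exactly the kind of scale decision such analyses must address.

```r
# Illustrative sketch only: product-of-coefficients indirect effect with a
# dichotomous mediator and a bootstrap percentile CI. The a-path is on the
# logit scale, so a*b is not on the outcome scale without rescaling.
library(boot)
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n))
d$m <- rbinom(n, 1, plogis(0.5 * d$x))             # dichotomous mediator
d$y <- 0.4 * d$m + 0.2 * d$x + rnorm(n)            # continuous outcome
indirect <- function(data, idx) {
  dd <- data[idx, ]
  a <- coef(glm(m ~ x, family = binomial, data = dd))["x"]  # a-path (logit)
  b <- coef(lm(y ~ m + x, data = dd))["m"]                  # b-path (linear)
  unname(a * b)
}
res <- boot(d, indirect, R = 1000)
boot.ci(res, type = "perc")                        # percentile CI for a*b
```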

A sequential exploratory diagnostic model using a Pólya‐gamma data augmentation strategy

Abstract

Cognitive diagnostic models provide a framework for classifying individuals into latent proficiency classes, also known as attribute profiles. Recent research has examined a Pólya-gamma data augmentation strategy for binary response models with logistic item response functions within a Bayesian Gibbs sampling procedure. In this paper, we propose a sequential exploratory diagnostic model for ordinal response data using a logit-link parameterization at the category level and extend the Pólya-gamma data augmentation strategy to ordinal response processes. A Gibbs sampling procedure is presented for efficient Markov chain Monte Carlo (MCMC) estimation. We report results from a Monte Carlo study of model performance and present an application of the model.
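The augmentation rests on the Pólya-gamma identity of Polson, Scott and Windle (2013), stated here for a generic linear predictor psi. Conditional on the augmented variable omega, the likelihood in psi is Gaussian, which is what makes logit-link Gibbs updates of this kind conjugate:

```latex
% Polya-gamma identity (Polson, Scott & Windle, 2013): conditional on
% omega, the logistic likelihood in psi becomes Gaussian.
\frac{\left(e^{\psi}\right)^{a}}{\left(1 + e^{\psi}\right)^{b}}
  = 2^{-b}\, e^{\kappa\psi}
    \int_{0}^{\infty} e^{-\omega\psi^{2}/2}\, p(\omega)\, d\omega,
\qquad \kappa = a - \frac{b}{2}, \quad \omega \sim \mathrm{PG}(b, 0).
```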