| Description: |
Phraseological competence is crucial for language acquisition, processing, and fluency (Ellis et al., 2008; Paquot et al., 2020) but remains challenging for L2 learners (Laufer & Waldman, 2011; Paquot & Granger, 2012). Recent empirical studies have increasingly conceptualized phraseological competence as a multidimensional construct that includes facets such as accuracy, depth, and breadth (Naismith & Juffs, 2025; Paquot, 2019; Xu, 2018). This conceptualization raises important questions about how phraseological competence should be assessed and which dimensions are most salient in different contexts. In research on linguistic constructs more broadly, two complementary approaches are commonly adopted: (1) analyzing learner output to identify salient performance features, and (2) examining rater orientations to uncover the criteria underlying expert judgments (Ducasse & Brown, 2009). Within phraseological research, however, the first approach, which relies on corpus-based analyses and computational indices, has been overwhelmingly dominant (e.g., Paquot, 2019; Paquot & Naets, 2025). In contrast, little is known about the dimensions that human raters focus on when evaluating phraseological competence, even though their judgments are often regarded as the gold standard for establishing construct validity (Crossley et al., 2013). This study investigates how human raters assess phraseological competence, with the aim of identifying the specific dimensions to which they are most sensitive. To this end, it adopts comparative judgment (CJ), a methodological innovation increasingly used to assess complex and multidimensional linguistic constructs in applied linguistics (Thwaites & Paquot, 2024; Verhavert et al., 2019). In CJ, expert raters compare pairs of performances and decide which one demonstrates a higher level of the target construct. These pairwise comparisons are then modeled statistically using the Bradley-Terry-Luce (BTL) framework.
CJ has proven to be a reliable and valid method for ... |
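
The Bradley-Terry-Luce modeling of pairwise comparisons mentioned above can be illustrated with a minimal sketch. Under BTL, each performance i has a positive strength p_i, and the probability that i is judged better than j is p_i / (p_i + p_j). The code below estimates these strengths with the standard minorization-maximization update. The function names `fit_btl` and `win_probability` and the toy comparison data are hypothetical, chosen only for illustration; this is not necessarily the estimation procedure or software used in the study.

```python
def fit_btl(n_items, comparisons, n_iter=200):
    """Estimate BTL strengths from (winner, loser) index pairs.

    Illustrative sketch only: assumes every item wins at least once and
    the comparison graph is connected; otherwise estimates can degenerate
    (an item with zero wins receives strength 0).
    """
    wins = [0.0] * n_items
    pair_counts = {}  # (i, j) with i < j -> number of comparisons of that pair
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strengths = [1.0] * n_items
    for _ in range(n_iter):
        denom = [0.0] * n_items
        for (i, j), n in pair_counts.items():
            shared = n / (strengths[i] + strengths[j])
            denom[i] += shared
            denom[j] += shared
        # MM update: total wins divided by the expected-comparison term.
        updated = [
            wins[i] / denom[i] if denom[i] > 0 else strengths[i]
            for i in range(n_items)
        ]
        # Rescale so the strengths average 1 (BTL is scale-invariant).
        total = sum(updated)
        strengths = [s * n_items / total for s in updated]
    return strengths


def win_probability(strength_a, strength_b):
    """BTL probability that item a is judged better than item b."""
    return strength_a / (strength_a + strength_b)


# Toy data (hypothetical): three texts, each pair compared three times.
comparisons = [
    (0, 1), (0, 1), (1, 0),
    (1, 2), (1, 2), (2, 1),
    (0, 2), (0, 2), (2, 0),
]
strengths = fit_btl(3, comparisons)
ranking = sorted(range(3), key=lambda i: strengths[i], reverse=True)
print("Ranking (best first):", ranking)
print("P(text 0 beats text 2):", win_probability(strengths[0], strengths[2]))
```

In this toy setup, the estimated strengths place the texts on a single scale, so the ranking reflects all judgments jointly rather than any single rater's decision.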