| Title: |
The Insight-Inference Loop: Efficient Text Classification via Natural Language Inference and Threshold-Tuning |
| Language: |
English |
| Authors: |
Sandrine Chausson (ORCID 0009-0005-4415-4962); Marion Fourcade (ORCID 0000-0002-4821-9031); David J. Harding (ORCID 0000-0002-2121-0790); Björn Ross (ORCID 0000-0003-2717-3705); Grégory Renard |
| Source: |
Sociological Methods & Research. 2026 55(2):568-615. |
| Availability: |
SAGE Publications. 2455 Teller Road, Thousand Oaks, CA 91320. Tel: 800-818-7243; Tel: 805-499-9774; Fax: 800-583-2665; e-mail: journals@sagepub.com; Web site: https://sagepub.com |
| Peer Reviewed: |
Y |
| Page Count: |
48 |
| Publication Date: |
2026 |
| Document Type: |
Journal Articles; Reports - Research |
| Descriptors: |
Classification; Artificial Intelligence; Social Science Research; Natural Language Processing; Social Media; Elections |
| DOI: |
10.1177/00491241251326819 |
| ISSN: |
0049-1241; 1552-8294 |
| Abstract: |
Modern computational text classification methods have brought social scientists tantalizingly close to the goal of unlocking vast insights buried in text data, from centuries of historical documents to streams of social media posts. Yet three barriers still stand in the way: the tedious labor of manual text annotation, the technical complexity that keeps these tools out of reach for many researchers, and, perhaps most critically, the challenge of bridging the gap between sophisticated algorithms and the deep theoretical understanding social scientists have already developed about human interactions, social structures, and institutions. To counter these limitations, we propose an approach to large-scale text analysis that requires substantially less human-labeled data, and no machine learning expertise, and efficiently integrates the social scientist into critical steps in the workflow. This approach, which allows the detection of statements in text, relies on large language models pre-trained for natural language inference, and a "few-shot" threshold-tuning algorithm rooted in active learning principles. We describe and showcase our approach by analyzing tweets collected during the 2020 U.S. presidential election campaign, and benchmark it against various computational approaches across three datasets. |
| Abstractor: |
As Provided |
| Entry Date: |
2026 |
| Accession Number: |
EJ1502021 |
| Database: |
ERIC |