De-Identifying Student Personally Identifying Information with GPT-4

Title:	De-Identifying Student Personally Identifying Information with GPT-4
Language:	English
Authors:	Shreya Singhal; Andres Felipe Zambrano; Maciej Pankiewicz; Xiner Liu; Chelsea Porter; Ryan S. Baker
Source:	International Educational Data Mining Society. 2024.
Availability:	International Educational Data Mining Society. e-mail: admin@educationaldatamining.org; Web site: https://educationaldatamining.org/conferences/
Peer Reviewed:	Y
Page Count:	7
Publication Date:	2024
Document Type:	Speeches/Meeting Papers; Reports - Research
Descriptors:	MOOCs; Privacy; Confidential Records; Student Records; Information Security; Artificial Intelligence; Intelligent Tutoring Systems; Natural Language Processing; Identification; Discussion Groups; Computer Mediated Communication; Electronic Learning; Data Collection; Technology Uses in Education
Abstract:	Education is increasingly taking place in learning environments mediated by technology. This transition has made it easier to collect student-generated data including comments in discussion forums and chats. Although this data is extremely valuable to researchers, it often contains sensitive information like names, locations, social media links, and other personally identifying information (PII) that must be carefully redacted before utilizing the data for research to protect their privacy. Historically, this task of redacting PII has been painstakingly conducted by humans; more recently, some researchers have attempted to use regular expressions and supervised machine-learning methods. Nowadays, with the recent high performance shown by Large Language Models in a wide range of tasks, they have become another alternative to be explored for de-identifying educational data. In this work, we assess GPT-4's performance in de-identifying data from discussion forums in 9 Massive Open Online Courses. Our results show an average recall of 0.958 for identifying PII that needs to be redacted, suggesting that it is an appropriate tool for this purpose. Our tool is also successful at identifying cases missed by humans when redacting data. These findings indicate that GPT-4 can not only increase the efficiency but also enhance the quality of the redaction process. However, the precision of such redaction is considerably worse (0.526), over-redacting names and locations that do not represent PII, showing a need for further improvement. [For the complete proceedings, see ED675485.]
Abstractor:	As Provided
Entry Date:	2025
Accession Number:	ED675571
Database:	ERIC