| Description: |
Detecting-Warning-Labels-on-E-Cigarette-Content

Detecting Warning Labels on E-Cigarette Content Across Social Media Platforms

Introduction
This repository contains scripts that collect videos from TikTok and YouTube, process them, and feed the extracted text to a rule-based classifier. The pipeline consists of several steps: video downloading, screenshot extraction, OCR processing, language detection, classification, and statistical analysis.

Technical requirements
Before proceeding, ensure that you have the following:
- Python 3.x
- An Oracle Cloud instance
- A Box account with API access
- Basic command-line knowledge

Required Dependencies
Install the following Python libraries before running the scripts:
pip install opencv-python pandas numpy pytesseract langdetect requests boxsdk

Warning Label Detection Workflow
The figure illustrates the framework for collecting and extracting text from YouTube and TikTok videos, detailing the language detection process and the development of a rule-based classifier for warning labels.

Box Download
1. Sign up for a free Box account.
2. Open the developer console, go to the app console, and click "Create New App".
3. Choose "Custom App".
4. Select OAuth 2.0 as the authentication type and create the app.
5. In the Configuration tab of the new app, select "App Access Only".
6. Enable "Make API calls using the as-user header" and "Generate user access tokens".
7. Copy the client ID, client secret, and developer token. The developer token expires after 60 minutes, so a new one must be generated every 60 minutes.
8. Get the folder ID from Box.
9. On the Oracle instance, use a script (get_videos.py) to download all the videos.

Image Processing
Image processing procedures for each social media platform

Screenshots
The script takes a screenshot every second, up to a maximum of 79 seconds.
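The screenshot step can be sketched roughly as follows. This is a minimal illustration and not the repository's actual script: the function names `screenshot_times` and `extract_screenshots`, the output file naming, and the choice to include the frame at exactly 79 seconds are all assumptions. It relies only on `opencv-python`, which is already in the dependency list.

```python
from pathlib import Path

MAX_SECONDS = 79       # cap stated in this README
INTERVAL_SECONDS = 1   # one screenshot per second


def screenshot_times(duration_s, interval_s=INTERVAL_SECONDS, max_s=MAX_SECONDS):
    """Return the timestamps (in seconds) at which a frame should be saved.

    Assumption: the cap is inclusive, so a long video yields 0, 1, ..., 79.
    """
    last = min(duration_s, max_s)
    times = []
    t = 0
    while t <= last:
        times.append(t)
        t += interval_s
    return times


def extract_screenshots(video_path, out_dir):
    """Save one JPEG per timestamp and return the list of written paths."""
    import cv2  # opencv-python; imported here so the helper above has no dependencies

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        raise IOError(f"Cannot open video: {video_path}")

    # Estimate the duration from the frame count and frame rate.
    fps = cap.get(cv2.CAP_PROP_FPS) or 0
    frames = cap.get(cv2.CAP_PROP_FRAME_COUNT) or 0
    duration_s = frames / fps if fps > 0 else 0

    written = []
    for t in screenshot_times(duration_s):
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)  # seek to the timestamp
        ok, frame = cap.read()
        if not ok:
            break  # no more decodable frames
        path = out_dir / f"{Path(video_path).stem}_{int(t):03d}.jpg"
        cv2.imwrite(str(path), frame)
        written.append(path)

    cap.release()
    return written
```

Keeping the timestamp calculation separate from the OpenCV calls makes the 79-second cap easy to check or change without decoding any video.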
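For the Box download step described above, a download script might look like the sketch below. It is not the contents of get_videos.py: the function names, the list of video extensions, and the output layout are hypothetical. It assumes the `boxsdk` package and the client ID, client secret, developer token, and folder ID obtained during the Box setup.

```python
from pathlib import Path

# Assumed set of video extensions; adjust to the files stored in the Box folder.
VIDEO_EXTENSIONS = (".mp4", ".mov", ".webm")


def is_video(name, extensions=VIDEO_EXTENSIONS):
    """Return True if the file name ends with one of the video extensions."""
    return name.lower().endswith(extensions)


def download_videos(client_id, client_secret, developer_token, folder_id, out_dir):
    """Download every video file in a Box folder and return the saved paths."""
    from boxsdk import Client, OAuth2  # imported here so is_video has no dependencies

    # The developer token is passed as the access token; it expires after
    # 60 minutes, so a fresh one is needed for each long-running session.
    auth = OAuth2(
        client_id=client_id,
        client_secret=client_secret,
        access_token=developer_token,
    )
    client = Client(auth)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    for item in client.folder(folder_id).get_items():
        if item.type != "file" or not is_video(item.name):
            continue  # skip subfolders and non-video files
        target = out_dir / item.name
        with open(target, "wb") as fh:
            client.file(item.id).download_to(fh)
        saved.append(target)
    return saved
```

Because the developer token is short-lived, a large folder may need to be downloaded in batches, with a new token supplied for each run.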
Oracle Vision Takes the screenshots in form of ... |