Hassanpour S, Tomita N, DeLise T, Crosier B, Marsch L. (2019). Identifying substance use risk based on deep neural networks and Instagram social media data. Neuropsychopharmacology. 44: 487–494. doi: 10.1038/s41386-018-0247-x
Researchers built and tested a machine learning model to identify substance use risk among adults (n = 2,287) based on images, captions, and comments from Instagram posts. Participants were recruited through an incentivized crowdsourcing platform and word of mouth. Participants reported their substance use in a web-based survey, and researchers used the responses to classify each participant as “low-risk” or “high-risk” for substance use. Researchers randomly extracted 20 anonymized Instagram posts, with accompanying captions and comments, from each of the 2,287 participants’ Instagram accounts for analysis. Next, researchers divided the Instagram data into training (80%), validation (10%), and test (10%) sets. Over two weeks, researchers trained the model on the training set and used the validation set to tune model parameters. To evaluate the model, researchers used the test set (228 randomly selected Instagram users) to compare participant-reported risk level with the risk level assigned by the model. The resulting model detected alcohol risk significantly better than chance (precision: 68.6%, recall: 76.6%, F-measure: 72.4%) but was unable to reliably detect tobacco, prescription drug, or other illicit drug use. Participants who were younger, who were white, who had fewer captions and comments per post, and who posted more facial images had an elevated risk for alcohol use compared with their counterparts. Innovative deep-learning approaches can use social media data to provide new insights into low-impact identification of population-level alcohol use risk. Future research could develop models for substance use and other behavioral health risks (e.g., depression) on other platforms (e.g., Facebook, Twitter) and for targeted high-risk populations.
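The reported figures can be checked with simple arithmetic. The F-measure is the harmonic mean of precision and recall, and a 10% test share of 2,287 participants yields 228 users when rounded down. The Python sketch below illustrates both calculations. The function names are hypothetical, and the rounding rule and the resulting validation and training counts are assumptions, since the summary reports only the test set size.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_sizes(n, val_frac=0.10, test_frac=0.10):
    """Return (train, validation, test) counts for n participants.

    Assumption: held-out sets are rounded down. The summary does not state
    the rounding rule; flooring is used because it reproduces the reported
    test set of 228 users.
    """
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    return n - n_val - n_test, n_val, n_test


# Reported alcohol-risk metrics: precision 68.6%, recall 76.6%
f1 = f_measure(0.686, 0.766)
print(f"F-measure: {100 * f1:.1f}%")  # prints "F-measure: 72.4%"

train, val, test = split_sizes(2287)
print(f"Test set size: {test}")  # prints "Test set size: 228"
```

Both results agree with the values reported in the summary (F-measure of 72.4% and a test set of 228 users).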