Social Media Corpus

Dataset: Social Media Corpus

A select set of hand-curated, de-identified social media content (Facebook posts and comments) is available for access, along with the accompanying scored ground truth data. All personal identifiers, including names, locations, and other identifying content, have been removed, and the text has been reduced to its underlying words for the purpose of general linguistic analysis. Participants were asked to provide additional consent for this dataset.

Access to the dataset is subject to the Data Usage Agreement (DUA), under which researchers are forbidden from any attempt to re-identify participants.

Citation Information

Researchers using this corpus are requested to use the following citation and to confirm it prior to publication.

Tesserae Social Media Corpus, https://tesserae.nd.edu

BibTeX Entry

To be added in October 2018