Who is eligible to join the Consortium?
Consortium membership is by application. Our intent is to be inclusive while ensuring the privacy and security of the Consortium’s data and its ethical, public-interest use. The Consortium welcomes applications from researchers of diverse backgrounds and experiences, using varied methodologies, who undertake data-driven analysis related to content moderation.
To be an eligible candidate for membership, applicants must demonstrate the following:
- That they hold a primary institutional affiliation with an academic, journalistic, nonprofit, or civil society research organization. If they are students, they must be master’s- or PhD-level students; undergraduate students are ineligible at this time.
- Prior experience and relevant skills for data-driven analysis. Consortium datasets are primarily shared as JSON files and require technical skills to analyze.
- A specific public interest research use case for the data provided by the Consortium. (“Public interest research use case” means non-commercial research for journalistic, academic, or non-profit/civil society purposes.)
- Industry-standard plans and systems for safeguarding the privacy and security of the data provided by the Consortium. Consortium members are required to sign a data use agreement.
More information on eligibility and a link to the application is at the bottom of this page.
What data is shared with the Consortium?
To start, we are continuing our ongoing disclosures of persistent platform manipulation campaigns and information operations, which are prohibited by Twitter’s platform manipulation and spam policy. (Manipulation that we can reliably attribute to a government or state-linked actor is considered an information operation.) Over time, we intend to share similarly comprehensive data about persistent platform manipulation campaigns that are not attributable to state-backed actors, as well as other content moderation policy areas and enforcement decisions – and we will update this page with more information when we do. The exact data types we share may vary depending on the types of activity in question.
Members of the Consortium have access to an archive of information operations datasets dating back to 2018. We have attributed these information operations either publicly or internally. Once our teams have identified, removed, and investigated these campaigns and any associated violative content, we share datasets with Consortium members. These datasets include profile information, Tweets, and media (e.g., images and videos) from accounts we believe are connected to state-linked information operations. Tweets and media that were deleted are not included in the datasets. Unlike the public historic archive, the data the Consortium has access to is not hashed. Note that not all of the accounts we identified as connected to these campaigns actively Tweeted, so the number of accounts represented in the datasets may be smaller than the total number of accounts attributed to the information operation and enforced against.
Because of their size, all Consortium datasets require members to have the skills and tooling to analyze large volumes of data.
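As a minimal sketch of what working with these datasets can involve, the snippet below streams a newline-delimited JSON file record by record, which keeps memory use flat even for very large files. The field names (`user_screen_name`, `tweet_text`) are hypothetical placeholders; consult the documentation shipped with each Consortium dataset for the actual schema.

```python
import io
import json
from collections import Counter

def tweets_per_account(lines):
    """Stream JSON records (one per line) and count Tweets per account.

    Field names here are illustrative assumptions, not the real schema.
    """
    per_account = Counter()
    for line in lines:
        record = json.loads(line)
        per_account[record["user_screen_name"]] += 1
    return per_account

# A tiny in-memory sample standing in for a (much larger) dataset file.
sample = io.StringIO(
    '{"user_screen_name": "acct_a", "tweet_text": "hello"}\n'
    '{"user_screen_name": "acct_a", "tweet_text": "again"}\n'
    '{"user_screen_name": "acct_b", "tweet_text": "hi"}\n'
)
counts = tweets_per_account(sample)
print(counts["acct_a"])  # → 2
```

Streaming line by line (rather than loading the whole file with a single `json.load`) is the standard approach when a dataset is too large to fit in memory.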
How is the publicly accessible information operations archive different from what the Consortium has access to?
Beginning in October 2018, we published the first comprehensive, public archive of data related to state-backed information operations. From that date through early 2022, when we launched the Twitter Moderation Research Consortium, we publicly shared 37 datasets of attributed platform manipulation campaigns originating from 17 countries, spanning more than 200 million Tweets and nine terabytes of media.
With the advent of the Twitter Moderation Research Consortium, we have discontinued public dataset releases, instead focusing on releasing data to the Consortium. The existing archive of information operations datasets continues to be available for download below — while no content has been redacted, some account-specific information has been hashed to protect account privacy.
Why is the publicly accessible information operations archive hashed?
For accounts with fewer than 5,000 followers, we hashed certain identifying fields (such as user ID and screen name) in the publicly accessible archive. While we’ve taken precautions to minimize false positives in these datasets, we’ve nevertheless hashed select fields to reduce the potential for negative impact on authentic or compromised accounts — while still enabling longitudinal research, network analysis, and assessment of the underlying content created by these accounts.
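Hashing preserves analytical utility because the same input always maps to the same digest: a hashed account keeps a stable pseudonym across Tweets and datasets, so longitudinal and network analysis still work even though the real identifier is hidden. The sketch below illustrates the idea with SHA-256; Twitter has not published the exact scheme used for the public archive, so treat this as an assumption, not the actual implementation.

```python
import hashlib

def pseudonymize(identifier: str) -> str:
    """Map an identifier to a stable pseudonym via SHA-256.

    Illustrative only: the hash function actually used for the
    public archive is not documented here.
    """
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()

# The same account always yields the same pseudonym, so researchers can
# link its activity over time; different accounts yield different ones.
a1 = pseudonymize("12345")
a2 = pseudonymize("12345")
b = pseudonymize("67890")
print(a1 == a2, a1 == b)  # → True False
```

Because the digest is deterministic but not reversible, researchers can group, count, and graph hashed accounts without learning who they are.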
Members of the Consortium are provided access to unhashed versions of these datasets for research. Consortium members agree to the terms of a data license agreement limiting use of the unhashed datasets to research purposes, subject to specific limitations and appropriate security measures.
Where else can I access Twitter data for research purposes?
If you are an academic, check out free academic access to our API for research here. Learn more about general API access here.
What can I do if I believe I've been included here in error?
If you believe your account has been included in one of these datasets in error, please log into your Twitter account and file a suspension appeal here for our full review.