GDI’s work focuses on three primary areas. First is our neutral, independent, transparent index of a website’s risk of disinforming readers. We employ cutting-edge artificial intelligence combined with thorough analyses of journalistic practice to best serve and inform advertisers, the ad tech industry, search and social media companies, and researchers.
Second is our independent, non-profit open source intelligence (OSINT) hub, which tracks disinformation and extremism across platforms online. We serve a broad array of NGOs, ad tech intermediaries and online platforms.
Third is our policy team that provides data and research to support policy makers in governments, regulatory bodies and platforms around the world.
The core output of the Disinformation Index is our Dynamic Exclusion List (DEL) of global news publications rated high risk for disinformation. The DEL contains the highest risk websites and apps across multiple countries and languages and is continually updated to capture new disinformation sources and narratives. Ad tech companies and platforms can license GDI data to make more informed choices about their online ad purchases.
GDI’s Media Buying Audits provide in-depth analysis of a brand’s media buying strategy and assess the level of exposure to disinformation. The audits provide advertisers with metrics and qualitative analysis of advertising placements and include actionable recommendations tailored to brand values and corporate social responsibility goals.
Ad exchanges and supply-side platforms can use GDI data to screen properties for disinformation risk at scale. Inventory quality teams can access the website, article and app-level risk ratings to inform publisher onboarding and identify potential policy violations.
GDI has performed in-depth journalistic integrity assessments of high-profile media outlets in most major media markets using a transparent methodology compatible with the Journalism Trust Initiative’s international standard. With coverage of nearly two dozen of the world’s most impactful media markets, we index both the highest-risk and the lowest-risk media in each country. Find our individual country reports here.
GDI’s open source intelligence hub performs cross-platform OSINT tracking of disinformation and extremism online. Primarily serving GDI’s internal needs, this hub leverages both GDI’s internal resources and our network of global partners to provide intelligence on emerging disinformation threats to a broad array of NGOs, ad tech intermediaries and online platforms. Contact us to learn more about GDI’s disinformation intelligence capabilities.
We view disinformation as a byproduct of the attention-driven business models that power today’s internet. Disinformation is a global problem not contained by borders. Governments and policymakers around the world have a role to play in regulating and reforming the ad tech industry to counter disinformation and its harms. GDI provides access to our data and research to policy makers in government and elsewhere.
How does GDI use machine learning?
GDI has developed a new approach to disinformation detection using recent advances in Natural Language Processing (NLP) and Large Language Models (LLMs). Embedding text using LLMs allows for analysis of words and their relationship to each other in a sentence, as opposed to traditional counting-based text analysis techniques. LLM-based detection models are more accurate and more easily adapted to address the ever-evolving landscape of disinformation.
Our approach involves the tagging of sentences from previously assessed content that contains narratives GDI tracks. The sentence tagging process is performed by third-party researchers or GDI analysts trained to recognise disinformation according to the GDI definition of disinformation. To ensure consistency, all analysts use a codebook that sets rules for the repeatable and measurable identification of potential disinformation. The data from which these sentences are selected is anonymised for domain, author and any other identifying attributes.
The tagged sentences are used to construct digital filters which are then encoded using an LLM. These encoded filters are then used to identify potential disinformation in newly seen content. When new content is analysed, each sentence contained in that content is also encoded using an LLM. This allows us to use our machine learning models to determine how "close" each sentence in the article is to the tagged sentence filters. A website is flagged for Manual Review when a significant number of articles on that website contain sentences that match tagged sentence filters.
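The "closeness" step described above can be sketched as a similarity comparison between an encoded sentence and the encoded filters. The sketch below is illustrative only: the embedding function is a toy bag-of-words stand-in (a real pipeline would use an LLM encoder), and the filter sentence is a hypothetical example, not GDI data.

```python
import math
from collections import Counter

def embed(sentence: str) -> Counter:
    # Toy stand-in for an LLM sentence encoder: a bag-of-words vector.
    # It exists only so the sketch runs end to end.
    return Counter(sentence.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Narrative "filters": encoded example sentences tagged by analysts
# (the sentence here is hypothetical).
FILTERS = [embed("the election was stolen by fraud")]

def sentence_matches(sentence: str, threshold: float = 0.5) -> bool:
    # A sentence matches when it is "close" to any tagged filter.
    enc = embed(sentence)
    return any(cosine(enc, f) >= threshold for f in FILTERS)
```

In a production system the threshold and encoder would be tuned and validated; here they are arbitrary placeholders.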
How does GDI decide which websites to assess?
GDI looks at a number of different data points to find websites that might be suitable for assessment. These data points include:
Intelligence data: websites identified through GDI’s ongoing intelligence monitoring.
Intelligence partners: websites from third-party intelligence sources.
Website similarity: analysis of website traffic data to identify audience overlap.
Ad tech data: analysis of bid stream data sets provided by third parties.
Licensees: websites suggested for review by GDI’s ad tech partners.
Any new websites are queued for crawling in GDI’s Veracity.ai platform.
How does GDI assess disinformation risk?
Disinformation Risk Rating of the Open Web
GDI risk assessment of websites relies on a review process performed by a team of trained intelligence analysts. The Manual Review is run each week across sites identified by our machine learning classifiers as carrying the highest potential disinformation risk.
Websites are analysed to identify the presence of keywords that may relate to the disinformation narratives tracked by GDI. Any articles that contain these keywords are then run through our classifiers. The classifiers use a Large Language Model to encode sentences and analyse individual words and their relationship to others in each sentence. The sentences within each flagged article are then compared to narrative filters, written by trained analysts, that act as examples of the disinformation themes we are trying to identify.
The Manual Review assesses content against our adversarial narrative conflict framework. All content reviewed is anonymised for website, author and any other identifying attributes. Each website is reviewed by a minimum of two intelligence analysts who perform a “blind” review, meaning they do not see each other’s rating. Analysts can assign one of four labels:
If both analysts’ ratings agree then consensus is reached and the website is assigned that label. If there is no consensus a tie-break review is performed by a “resolver” who is also a trained intelligence analyst.
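The consensus and tie-break logic described above reduces to a small decision rule, sketched here with hypothetical label values (the document does not enumerate GDI's four labels):

```python
from typing import Callable

def resolve_rating(rating_a: str, rating_b: str,
                   resolver: Callable[[str, str], str]) -> str:
    # Blind dual review: agreement yields consensus; otherwise a
    # trained "resolver" analyst breaks the tie.
    if rating_a == rating_b:
        return rating_a
    return resolver(rating_a, rating_b)
```

In practice the `resolver` is a human analyst; it is modelled here as a callback purely to make the control flow explicit.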
Disinformation Risk Rating of Apps
App environments cannot be crawled and analysed in the same way as the open web, so we are unable to replicate our open web methodology there. Instead, the disinformation rating for app environments is based on the ability to verify a direct connection between apps and websites on the DEL. The verification process involves the identification of apps owned by or directly affiliated with websites rated as high risk for open web disinformation.
These apps are surfaced using a combination of data analysis and manual human review. GDI receives app store and app bid stream data from a third party research partner. This data is cleaned before being used to map mobile and Connected TV apps to open web domains. A manual review is performed to verify the ownership and connection between website and app to eliminate false positives. Any apps identified in this way are added to the Dynamic Exclusion List.
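One plausible form of the automated mapping step is matching an app's declared developer website against DEL domains, with matches held for manual review. The record fields and domains below are assumptions for illustration, not GDI's actual data model.

```python
from urllib.parse import urlparse

# Illustrative app-store records; the fields are assumed, not GDI's schema.
APPS = [
    {"app_id": "com.example.newsapp",
     "developer_site": "https://dubious-daily.net/about"},
    {"app_id": "com.other.game",
     "developer_site": "https://games.example.com"},
]

def candidate_apps(apps: list, del_domains: set) -> list:
    # Surface apps whose developer site resolves to a DEL domain.
    # Candidates still require manual human review before listing.
    matches = []
    for app in apps:
        domain = urlparse(app["developer_site"]).netloc
        if domain in del_domains:
            matches.append(app["app_id"])
    return matches
```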
Disinformation Risk Rating of YouTube Channels
There are three different phases to the risk rating of YouTube channels.
Verification - When a new website is added to the open web DEL a verification process is performed to identify any YouTube channels that are run by or directly affiliated with that website. Potentially linked channels are surfaced by extracting YouTube links from open web data and are manually matched to the official channel(s) of that same publisher.
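The link-extraction step might look like the following sketch, which pulls candidate channel identifiers out of page text. The regex is an assumption that covers only the common `/channel/` and `/@handle` URL forms; the extracted candidates still require manual matching.

```python
import re

# Matches youtube.com/channel/<id> and youtube.com/@<handle> links.
CHANNEL_RE = re.compile(r"youtube\.com/(?:channel/([\w-]+)|(@[\w.-]+))")

def extract_channels(page_text: str) -> set:
    # Collect candidate channel IDs/handles for manual matching
    # against a publisher's official channel(s).
    found = set()
    for m in CHANNEL_RE.finditer(page_text):
        found.add(m.group(1) or m.group(2))
    return found
```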
Adjacency - The next phase focuses on the assessment of channels that are adjacent to any channels on the DEL. Adjacent channels are those to which DEL channels have subscribed or those that are linked to within descriptions of videos hosted by the DEL channel.
Analysts then pull a sample of transcripts from videos hosted by these adjacent channels and run the transcript text through our machine learning classifiers. Any transcripts that trigger the classifiers are then manually reviewed in the same anonymised way as text content from the open web. Any channels rated as high risk are then added to the DEL.
Discovery - The final phase of YouTube risk ratings is achieved by identifying channels that have a high level of audience overlap with channels on the DEL. To do this, analysts extract other channels subscribed to by users already subscribed to DEL channels and measure the overlap in audience. They also look for any crossover in users who comment on the videos within these channels and use this as a secondary signal.
A sample of transcripts is then pulled and run through the process for rating disinformation risk. Any channels rated as high risk are added to the DEL.
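Audience overlap in the discovery phase can be quantified in several ways; one common choice (an assumption here, not a confirmed GDI metric) is the Jaccard similarity of the two channels' subscriber sets:

```python
def audience_overlap(subs_a: set, subs_b: set) -> float:
    # Jaccard similarity: shared subscribers over total unique subscribers.
    if not subs_a and not subs_b:
        return 0.0
    return len(subs_a & subs_b) / len(subs_a | subs_b)
```

The same measure could be applied to sets of commenting users as the secondary signal described above.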
How does GDI remove bias from its data?
Neutrality is of utmost importance to GDI and is one of the three core pillars on which GDI was founded. For risk assessments of news publishers to be widely accepted and implemented, it is critical that they are unbiased and non-political.
All data tagging follows a strict set of guidelines which are documented in a standardised codebook. This codebook was developed using best practices across the fields of research, data science and intelligence. The codebook sets out GDI’s methodology and approach to disinformation rating along with the rules for tagging data in a consistent and repeatable manner.
Establishing a codebook and methodology also allows research partners from GDI’s global network to contribute to the tagging process in a uniform way. The diversity of experience, language ability and cultural expertise supplied by partners is another crucial component that helps remove bias from the data.
Lastly, we use F1 scores to understand how well the machine learning models are performing with respect to precision and recall. F1 scores allow us to assess whether the models are working as intended and provide valuable model performance feedback to our data science team.
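For reference, the F1 score is the harmonic mean of precision and recall, computed from true positives, false positives and false negatives:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    # Precision: of everything flagged, how much was correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything that should be flagged, how much was caught.
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, F1 penalises models that trade one of precision or recall away for the other, which is why it is a useful single-number check that a classifier is working as intended.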
How does GDI avoid political bias?
Neutrality is a core pillar of GDI’s work. GDI's methodology does not measure partisanship or the political, religious or ideological orientation of a given site. Rather, the index measures indicators of disinformation risk based on the adversarial narrative framework. The review of content includes indicators of the degree of adversariality, and the researchers implementing the framework are trained by GDI to employ a highly structured methodology, including how to detect disinformation risk across the political spectrum.
Why does GDI charge for the Dynamic Exclusion List?
Disinformation risk rating is a market-demanded tool by advertisers who want to protect their brands. By licensing our data, GDI is able to receive feedback from the market. Companies tell us which data is most valuable as they seek to limit brand safety risk. That feedback is vital for GDI to ensure our data is meeting a real market need.
GDI believes competition is essential to provide high quality choices to advertisers. GDI does not want or claim a monopoly on assessing disinformation. Giving away the data for free would risk creating a de facto monopoly. A robust marketplace of entities that assess disinformation ensures rigour in our collective work.
What is the difference between GDI’s exclusion list and its Media Market Reviews?
Dynamic Exclusion List (DEL)
GDI’s DEL is the main output of our Disinformation Index which provides risk ratings for global news websites, apps and YouTube channels. The DEL is a proprietary data set licensed by digital media and ad tech companies and integrated into their broader approaches to brand safety and ad placement.
We combine algorithmic classification with expert human review of content to assess brand safety risk in the category of disinformation. Data inputs from web crawling and third-party sources are classified for risk by our machine learning models. Machine learning outputs are then assessed against our disinformation framework by our trained intelligence analysts using a blind review process. Our risk rating methodology is built to ensure consistency in data tagging and the ability to validate our models according to industry bias and fairness standards.
Only those websites, apps and YouTube channels found to persistently publish adversarial narratives are added to the DEL. All entities on the DEL are periodically re-reviewed to capture changes in content that may affect a news source’s risk ratings. All publishers and content creators have the ability to request a review of the results of our determinations.
Media Market Reviews (MMRs)
MMRs were studies conducted by GDI’s research team that evaluated the level of disinformation risk present in a country’s online news media market. These public-facing reports were typically funded by philanthropic organisations that seek to align market incentives to reduce disinformation across the globe. We conducted the studies in partnership with locally-based media experts, typically academic researchers. Researchers were trained to employ a highly structured methodology that assesses the content and operations of the websites sampled for the study.
All content reviewed was anonymised for website, author and any other identifying attributes. Each website’s risk rating was calculated relative to the other websites sampled for the study. There is no connection between the outputs of an MMR and the DEL. A website scored as “high risk” in an MMR was therefore not automatically added to the Dynamic Exclusion List.
Get in touch to find out more about any of our products or to discuss how GDI can provide customised data and intelligence solutions to help you reduce the risk posed by online disinformation.
Contact Us