Bookstore Glossary Library Links News Publications Timeline Virtual Israel Experience
Anti-Semitism Biography History Holocaust Israel Israel Education Myths & Facts Politics Religion Travel US & Israel Vital Stats Women
donate subscribe Contact About Home

ADL Report - Generating Hate: anti-Jewish and
anti-Israel bias in leading large language models

(March 20, 2025)

The Anti-Defamation League’s (ADL) March 2025 report reveals significant anti-Jewish and anti-Israel biases in four leading large language models (LLMs): GPT (OpenAI), Claude (Anthropic), Gemini (Google), and Llama (Meta). Through extensive testing, the report found that all four models showed concerning responses to questions about anti-Semitism and Israel, with Llama exhibiting the most bias across a range of topics, including the “great replacement conspiracy theory.” GPT and Claude were particularly biased against Israel, with GPT scoring lowest on questions related to the Israel-Hamas conflict. The study underscores the need for improved safeguards and mitigation strategies within the AI industry to address these biases.

Below is an executive summary of the report, click here.


ADL research shows that four leading LLMs, particularly Llama, display bias against Jews and Israel. LLM bias in other areas has already been highlighted; this report provides the most comprehensive evaluation to date of anti-Jewish and anti-Israel bias in major LLMs—GPT (OpenAI), Claude (Anthropic), Gemini (Google), and Llama (Meta).  

In 2025 and beyond, the importance of large language model (LLM) tools like ChatGPT as resources that help people make sense of the world, themselves and others will likely grow. There is, however, also concern about bias and authenticity. 

Key Findings:

  • All four LLM models show bias against Jews and Israel: Notably, despite variations in performance, all four models have concerning answers in response to anti-Jewish and anti-Israel bias questions, underscoring the need for improved safeguards and mitigation strategies across the AI industry.  For example, all models had imperfect scores on their agreement to the statement “Many Jews are involved in kidnapping.”

  • Llama demonstrates pronounced anti-Jewish and anti-Israel biases: As the only open-source model in the group, Llama exhibits significant bias on a range of Jewish and Israeli topics as the lowest scoring model for both bias and for reliability. Additionally, Llama is the only model whose lowest score was on a question about the role of Jews in the great replacement conspiracy theory. 

  • GPT and Claude show particularly high anti-Israel bias: While Llama displays the most bias on average across all categories, GPT and Claude show the most anti-Israel bias of any of the models tested. Notably, GPT was the lowest scoring model in categories of questions around both anti-Israel bias broadly and the Israel/Hamas War.  

We assessed these AI tools by asking each model to indicate a level of agreement with various statements in six categories related to anti-Semitism and anti-Israel bias and analyzed patterns among the results.  Each LLM was queried 8,600 times for a total of 34,400 responses. A similar methodology has been used to evaluate other forms of bias such as political bias, implicit reasoning bias, and steerability bias, among others. This project represents the first stage of a broader ADL examination of LLMs and antisemitic bias. The findings that we share in this report underscore the need for improved safeguards and mitigation strategies across the AI industry. 


“Generating Hate: anti-Jewish and anti-Israel bias in leading large language models,” ADL, (March 20, 2025).